Learn Excel - Clean Data with Power Query - Podcast 2037

If you like this content, please consider visiting the video on YouTube and subscribe to the MrExcel Channel to show your support!
This video has been published on Oct 2, 2016.
Two Power Query tricks today: Importing a list of files from a folder into Excel and then cleaning all CSV files in a folder. Episode Recap:
The Power Query tools are on the Data tab in Excel 2016
Free add-in for 2010 and 2013
List all files from a folder into the Excel grid using Power Query
Choose New Query, From File, From Folder
Not obvious: expand the attribute field to get size
If your data is in CSV files, you can import all of the files at once into a single grid
Promote the heading row
Delete the remaining header rows
Replace "" with null
Fill down for the outline view
Delete the grand total column
Unpivot the data
Formula to convert month names into dates
Complete list of steps - world's greatest Undo
Next day - refresh the query to re-do all of the steps
maxresdefault.jpg


Transcript of the video:
Learn Excel for MrExcel Podcast, Episode 2037. Clean Data with Power Query.
I am podcasting this entire book.
Click that "i" in the top right-hand corner to get to the playlist.
Alright, in the book-- in the book-- I always said I wanted to just introduce Power Query. Now, Power Query is built into Excel 2016 here under Get & Transform.
But, if you have 2010 or 2013, as long as you're running Windows and not Mac, everything that's here in Get & Transform, you can download for free from Microsoft-- from, just search for "download Power Query". And I'm interested in getting a file list today.
All right, I want to list all the files in a folder.
Maybe I need to see which files are the large files, or I need to sort, or I need, you know, to get a combination of, you know, the budget files that we sent out.
And then a different folder of which ones we came back with. Alright? So, something.
And I paste it in a folder path that I copied from Windows Explorer, click OK, and they show me this preview. I want to edit this.
Alright.
And a couple of things here-- you see we have the file Name, the Extension, the Date it was accessed, modified, created, and then it's really not obvious that this symbol here-- this Attributes-- there's more stuff in here.
And if you click this symbol, then I can go in and get things like file Size, or if it's ReadOnly and things like that.
So, in this case, I just want file Size, click OK. So, I have Attributes.Size, I can see how many bytes are in each one.
Alright, and maybe-- you know, maybe-- I don't need everything here.
Maybe I don't need the Date created, so I can right click and say that I want to Remove that column.
Alright, and this binary, I don't need that-- we'll Remove that column, and then just close, and load, and it now gives me a sortable view of everything in that folder.
If the folder changes, I can come in here and I can refresh the query, and it will go back out and pull that data in. Right.
This is-- for me-- this was a problem that we used to have all the time-- we would send out 200 budget files, and you get some of them back, not all them back, you need to be able to compare.
So now I can essentially do a VLOOKUP between Folders, right? Just amazing how cool it is.
Hey, but look, let's go beyond what I have in the book and show you how that's just the tip of the iceberg. Alright?
So, I'm going to create another Query here-- Data, New Query, From File, From Folder, I'll copy that folder path here, click Edit.
Okay, now, right here-- October 2016-- this next trick only works with CSV files.
The day that this works with .xlsx files, it'll be an amazing thing.
Here, I have a folder, a whole bunch of files, and I want to create one Excel grid with all of the data from all of these files.
It's not intuitive at all.
These two down-arrows pointing at the horizontal line, when I click that, BAM!-- it just pulled in every single record from every single file in that folder.
Isn't that amazing?
I mean, that was a VBA macro before, and it takes months to learn VBA macros; You can learn Power Query in ten minutes.
Alright, now, a couple of things here; you see that the word Customer, Product, Jan, Feb, Mar, is repeated multiple times, because it was at the top of every single file.
So, I'm going to take the top row, go to the Transform tab, and say, Use First Row As Headers, which is going to replace Column1, Column2, Column3, with these real headers.
And then, I'm going to open this Filter drop down and say, remove Customer, and that'll get rid of all of the other headers from all of the other files, but keep the header from the first file.
Alright, then we have this outline view right in regular Excel: Ctrl+G, Special Blanks = Up Arrow, Ctrl+Enter, Re-select the whole column, Ctrl+C, Alt+E S V, Enter.
Alright, that was a lot of steps. It's still some steps, now-- it's still some work.
We have to select this column and go to Replace Values, say that we're going to replace "nothing" with the word "null," click OK.
That'll give us nulls there, and then those nulls allow us to use this amazing man called Fill-- Fill down.
Watch that column when I choose Fill down-- BAM! It just pulled in all of that outline view and brought the value down a lot faster than the whole other set of steps.
Alright, the grand total I don't need-- right-click and Remove.
Now-- right now-- at this point, you say, "Oh, yeah, hey, we could pull this in," and it'd be awesome.
But if we wanted to create a PivotTable from this data, having a repeating group going across Jan, Feb, Mar, is not a good format for a PivotTable.
Right now, we have 47 rows; I need to have 47 times twelve rows.
And to do this in a regular Excel file, it is horrendous using a multiple consolidation range that I learned from Mike Alexander, at Data Peak Technologies.
But, check this out, I'm going to choose the labels-- these are the things that I don't want to change-- and then, say, Unpivot Other Columns, and we go from 47 rows to 564 rows. That's an amazing step.
Here, you can see that these are text, easy enough to change it to either Currency or a Whole Number, right-click, Rename, and call it Revenue. How about these months?
They're all texts.
Here's an awesome way to fix that: We go to Add Column, Add Custom Column, doesn't matter what the name is, "1, 2016"-- because they're all from 2016-- click OK, now we have this new Custom column.
I'm going to take the Attribute column-- the month-- and the "1, 2016", and say, I want to Merge Columns, with a Space in between, and call it Date, click OK. Alright.
And that looks enough like a real date, that when I go to Transform and change it to a Date, it converts to a true Excel date.
And at this point, these two temporary columns, I can right click and Remove.
Alright, now, you're sitting there saying to yourself, "Well, Bill, we could have done all of this in Excel". And that's absolutely true. We could have done all of it in Excel.
It would have been harder to get all the CSV files into one file; it would have taken longer to fill in the blanks; it definitely would have taken longer to do the Unpivot operation.
But here's the thing, look over on the right-hand side-- we haven't talked about APPLIED STEPS, at all. This is like the world's greatest undo.
If you needed an audit trail-- if the auditors come and say, "Well, how did you get from all these CSV files to this file that we're building our financial statements on?"-- you can go back and show what it looked like at each step along the way.
If you screwed something up back here, you could change or edit that step, and Close & Load.
Alright, so here's our data set-- this is based on all of the files in this folder.
Let's build a little PivotTable from here-- Insert, PivotTable, Existing Worksheet, right here, and I'll put Revenue in the Values area, Products down the left hand side You see that we have 6,051,849. Well, that's today's data.
Now, tomorrow-- tomorrow-- let's say that we get a couple of new customers, a couple of new files come along, and our IT department takes those and dumps them into our folder.
All I have to do is reopen this file, select the query, come over here and Refresh, and then come here, Analyze, Refresh the PivotTable, and we have the new data.
It's faster on day one, maybe by a factor of 20, 30%; on day two, it's faster by 99%.
It's an, absolutely, amazing product power query.
It's in Excel 2016, but if you're in 10 or 13, for Windows, you're more than welcome to go out and download it for free.
Well, in every podcast in this series, I've asked you to buy this book, but today I'm not going to do that.
Today, I think you should buy this book, from Ken Puls and Miguel Escobar, "M is for (DATA) MONKEY, A Guide to the M Language in Excel Power Query".
This book will teach you everything about the Power Query interface. It's an amazing book-- the best book on Power Query.
Everything I learned, I learned from this book.
Click the "I" in the top right hand corner to get to a place where you can buy this book.
It will have a-- honestly, on a flight from Orlando to Dallas, I read the whole book, and my knowledge of Power Query just soared in two hours.
You can be up to speed and replace things that you would have had to-- used to-- have done with VBA.
Alright, so, Power Query tools-- they're already built into Excel 2016, but you can get them for free from 2010 and 2013.
My first example in the example from the book was, List all files from a folder into the Excel grid; Choose New Query, From File, From Folder; Not obvious, that attribute field you have to expand it to find the file size.
But then, the second trick that I showed you-- if your data's in CSV files, you can import all of the files at once into a single grid, Promote the heading row.
Delete the remaining header rows, Replace the blanks "" with null, Fill down, got rid of the grand total column, and then the amazing Unpivot the data.
Unpivot other columns, and then we use the formula to convert month names into dates.
They have that complete list of steps on the right hand side-- the world's greatest undo.
And, the next day, when you have to do it all over again, just click the Refresh button.
In fact, if you're too-- lazy too lazy to click the Refresh button-- come out here under Data, Connections, Properties, and say that every time I open this book, just go-- this workbook-- go do the exact same thing, alright, you know, if you don't...
if clicking refresh is too much of a hassle, which is ridiculous.
But, you could have it setup that every time you open that workbook, it just goes out to that folder, pulls everything in, and you'll just have to Refresh the PivotTable here.
PowerQuery-- absolutely amazing, amazing tool.
 

Forum statistics

Threads
1,213,536
Messages
6,114,215
Members
448,554
Latest member
Gleisner2

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top