Learn Excel 2013 - "Scraping Web Pages with Excel": Podcast #1684

If you like this content, please consider visiting the video on YouTube and subscribe to the MrExcel Channel to show your support!
This video has been published on Apr 9, 2013.
As promised, this is the continuation of yesterday's Podcast. When we left Episode #1683, Bill had shown us how to extract Page URLs from Links via a little Macro that he'd cooked up in an earlier Podcast. Today, MrExcel will pick up where he left off, showing us how to get the actual Data from each of the URLs that we obtained yesterday. Follow along with Episode #1684 as Bill shows us how to set up another VBA Macro, Record and Modify another and pull in 70 Individual Worksheets of Data from his Web Query.

**This Podcast involves Intermediate to Advanced Excel User information; but test your skills and give it try - even if you've never done it before!! Challenge your abilities and learn what Microsoft Excel can really do for you! Have fun!**

This mini-series is a really cool overview of some of our earlier Podcasts and how to integrate a variety of Formulas and Functions to evaluate Real Estate.

...This blog is the video podcast companion to the book, Learn Excel 2007 through Excel 2010 from MrExcel. Download a new two minute video every workday to learn one of the 512 Excel Mysteries Solved! and 35% More Tips than the previous edition of Bill's book! http://www.mrexcel.com/learn2010/LE20...

And for more information on Excel 2013, check out "Microsoft 2013 InDepth" -- by Bill Jelen. Excel 2013 In Depth is the beyond-the-basics, beneath-the-surface guide for everyone working with Excel 2013. Excel expert and MVP Bill Jelen provides specific, tested, proven solutions to the problems Excel users run into every day: the types of challenges other books ignore or oversimplify. Jelen thoroughly covers all facets of working with Excel 2013. http://www.amazon.com/Excel-2013-Dept...
maxresdefault.jpg


Transcript of the video:
MrExcel Podcast is sponsored by Easy-XL.
Learn Excel from MrExcel podcast episode 1684: Scraping Web Pages.
Hey welcome back to the MrExcel netcast I'm Bill Jelen. Well yesterday episode 1683 I had a little formula here ISODD and ISEVEN to try and find the houses on a certain side of the street and then we had these hyperlinks here.
I showed you a quick little macro that would take those hyperlinks and put them out in column F. So we actually see the hyperlinked address.
Today the goal is to go and visit each one of those hyperlinks and retrieve some information from that page.
So I have the start of a macro wrap here.
Let's see, so we have two worksheets WSD is my first worksheet that's the one with the whole list of 70 properties and then WSW, I'm going to create on the fly.
Actually I'm going to create it 70 times on the fly.
So we're going to go from Row 2 to Row 71.
For each row we're going to build a variable called this ThisURL.
It's going to be the value from column 6 that's column F and I'm going to prepend that with URL semicolon this is what the web query needs.
This parcel is column A so your column 1 is column A.
Put a P for that and then we're going to add a new worksheet, Worksheets.Add.
Where do we want to add it?
We want to add it after the last worksheet so I just say hey what's the count right now?
I was tempted to use “I” but it would be one too many so I'd have to use I-1 and then rename that worksheet to be this parcel.
I used a “P” before the parcel number because I hate to have worksheet names that are numeric because it’s hard to select them later.
That's why I wrote the “P”.
Now select that worksheet and do a web query here.
Well that's a heck of a lot of code to do a web query here.
I don't remember that code so I'm going to do what I always do when I get to something hard I turn on the macro recorder.
So we come back here and we're going to add a little worksheet.
I'm going to copy this big long URL to the clipboard.
So I'm going to choose the whole thing in the formula bar and it's very important you do in the formula bar, Ctrl+C or on the clipboard and now escape.
All right and then here on Sheet 1 we're going to turn on the macro recorder.
Go to the View tab then Macros and select Record Macro.
Leave everything else the way it is and click OK.
The macro recorder is now recording all of my steps.
So I go to the Data tab and select “Get External Data from Web” We will paste in the URL that we copied here ctrl+V to paste.
Cliick “Go”.
All right, now maybe there's a particular table here that I want but I'm just going to get the whole webpage and we'll deal with that later.
Click Import.
It goes to A1, great.
There we go, that's it, we can stop recording now.
Let's go look at our code alt+F11.
Where did the code go?
It goes to a new module.
Module 2.
All right so right here you can see where they've put that big long connect string in so we're going to have to modify that to work with our data.
So I'm going to copy these lines of code go back to module 1 where my comment says “do a web query here” and we'll just paste.
All right, so they're using active sheet which is going to work because I selected WSW.
That's great.
The connection:= this big long string, I don't need that.
Shift+down arrow, there we go and then backspace a little bit all right.
So “ThisURL” is our connection and Destination:= Range($A$1).
CommandType = 0, that's something new and I don't know what that is so we're going to get rid of that and under Name we don't need to have this big long name every time.
I do like to make each query have a different name though so I'm going to use “Query” & i.
It’s just that the way names are unique each time.
Ok, now everything else here is generally ok.
I always go down and make sure that Refresh BackgroundQuery is equal to false.
You never want to have this one to be equal to true.
if you leave that as false then the macro has to pause until the query finishes.
If you would let it be as true, then the first sheet is trying to retrieve the query and the macro is already on sheet 2.
Now you have a whole bunch of different sheets trying to pull data from the same website if you don't irritate the website like sometimes the website only allows some of the connections from the same machine.
I just always let it go one at a time.
Next I will go back up and get our next line.
Let's just do debug compile here see if there's any obvious mistakes all right alt+F11 to go back to Excel.
You can actually get rid of this temporary sheet.
All right Alt+F8 and we're going to say RetrievePages and we'll click Run.
Okay, so down in the status bar; I'll zoom in down there, you can see that it's switching back and forth very rapidly between connecting to the web, copying data from the web page, retrieving web data and so on.
So every time we see that those come back to let's say copying we know that it's getting the next page.
There's 70 pages to get, it will take a while so I’ll just pause the recording here and we'll let it finish.
All right, there we go.
Look at that.
All those sheets hopefully there should be about 70 sheets there, ctrl+left click, right click and there is the whole list of sheets, all right.
So we have now retrieved data from 70 different web pages.
Tomorrow we just need to go through and write a little macro or maybe some indirect formulas.
I'm not even sure how I'm going to do it to grab the sales data from each of those pages.
All right, so using a macro and a web query to scrape data first, in this case 70 different web pages.
All right, well hey I want to thank you for stopping by.
You know by the way code like this is in VBA and macros and of course the Excel 2013 version is out.
It's been what 2003, 2007, 2010, 2013; four different editions of this book so in fact this very code that we're creating today has similar examples in that book.
All right, well, hey I want to thank you for stopping by.
We’ll see you next time for another netcast of MrExcel.
 

Forum statistics

Threads
1,214,979
Messages
6,122,557
Members
449,088
Latest member
davidcom

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top