MrExcel's Learn Excel #665 - Scraping Webpages

If you like this content, please consider visiting the video on YouTube and subscribe to the MrExcel Channel to show your support!
This video has been published on Feb 20, 2009.
In todays netcast, I will show you how to use a simple loop and some recorded macro code to pull all of the data from many web pages. Episode 665 shows you how.

This blog is the video podcast companion to the book, Learn Excel 97-2007 from MrExcel. Download a new two minute video every workday to learn one of the 377 tips from the book!
maxresdefault.jpg


Transcript of the video:
Hey! Welcome back to the MrExcel netcast.
I'm Bill Jelen.
Yesterday, I showed you how to take this column of links over here in column A and basically take the link, the url out of each item and bring it over here to column B, doing a couple of lines of VBA.
Today, we have something a little bit more complicated.
But basically, I want to take the contents of all of those web pages and bring them into Excel.
So, the first thing I'm going to do, is copy those characters from the formula bar.
I don't want to copy the whole cell.
I want to copy just the characters, and I'm going to go to a brand new workbook, will turn on the macro recorder.
I'll leave those settings the way they are and build the first web query.
Data, import external data, new web query.
In the address bar, I'm going to paste in, the first page that I want to grab data from and click [ go ].
Now, I'm going to wait for this to load.
Now, this happens to be a page where I don't unfortunately, have a table right around the article.
So, I'm going to have to grab the entire web page, but in your situation especially, if the web page is generated by some sort of a database application.
There's probably a table, it gets exactly what you need click [ import ] and then click [ ok ], for the next dialog that comes up, and you'll see that.
It does in fact pull in all the text from the web page.
I'm going to stop the macro recorder and go take a look at that recorded code.
It's macro 5, will edit that and as you look through this.
You can kind of see where it puts the Url in here.
If you ask for a specific table, it's going to ask for that table.
I'm going to copy this recorded code [ ctrl + C ] to copy and go back to my workbook, where I built just a few lines of an initial loop.
Basically it says hey, we're going to look at everything in range B1, through it's actually B10 and for each cell, we're going to add a new worksheet, and then I'm going to build a variable called 'This Url'.
It's equal to url semicolon (;) and the link, that's in the cell.
So, right here in the middle of this code.
I'm going to paste the recorded code and the one thing I have to change is the recorded code has hard-coded the url, instead of that I'm going to take out the hard coded value and put the variable called This Url.
Everything else is going to stay exactly the same, and we should be able to go back to Excel.
Now, I'll close my temporary workbook, that's where I recorded the code.
I don't need to save those changes.
I'm back in my original code.
We're going to go everywhere from B1 to B10.
I'll run that code, run loop through, click [ run ] and very quickly I'm going to get the 10 new worksheets inserted, and if we wait sure enough in each worksheet.
I'm going to get the web-page copied in.
So, very quick way to go through and grab data from a website and get all that data right here on your computer.
Now, my favorite example of this, is an article.
I wrote a couple of years ago.
It's out at the website MrExcel.com/tip 072.shtml.
This is where I basically had a website, where I was trying to get about 4,000 addresses from 4,000 different webpages and this web page goes through and tells the story of exactly how I built those queries, to go pull all of that data.
So, while today's netcast is a very simple example.
We just grabbed 10 web pages.
The website has a specific example, where we were able to get a whole lot of data out of a website very easily and very quickly.
Hey! Thanks for stopping by, we'll see you next time for another netcast from MrExcel.
 

Forum statistics

Threads
1,214,881
Messages
6,122,074
Members
449,064
Latest member
MattDRT

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top