Dueling Excel - Count Words, Clean Data Duel 169

If you like this content, please consider visiting the video on YouTube and subscribe to the MrExcel Channel to show your support!
This video has been published on Feb 7, 2015.
Today's problem is sent in by Bill. He is looking for a random selection of 250 items out of 2500. But, he also wants to clear out two kinds of garbage data first. Bill thinks he needs a =WORDCOUNT() function, but there are many ways to solve this problem. See Mike Girvin introduce Right-Click-O-O to sort, plus IFERROR, FIND ALL, Text to Columns, RAND and much much more.
maxresdefault.jpg


Transcript of the video:
Hey, welcome back, it's time for another Dueling Excel podcast.
I'm Bill Jelen from MrExcel be joined by Mike Girvin from Excel Is Fun.
This is our episode 169.
=WordCount.
Bill: Alright, welcome back.
Today's question sent in by Bill: I want a random selection of 250 values from a list of 2500+ street names.
But that is not the problem.
I first need to do some data cleansing . I want to throw out any entries that have MORE than one word.
Alright, see, so this one here that ends in BL for Boulevard?
Need to throw that one out.
Alright, so how do we come up with items that have just one word?
Might be really cool, if we could just come over here and use the =WORDCOUNT function in Excel.
But unfortunately, that function has not been invented yet.
Neither is =WORDS.
Alright, then why aren't these here?
These would be really, really useful things.
But unfortunately, they are not there, unless you have some add-ins.
I know some add-ins have the words function.
So we need to figure out how many words are over there in column A. And I'm thinking VBA, but I'm just in a hurry, I'm trying to get this thing done.
And so, I'm going to do this crazy set of steps.
Ctrl+C, copy the data over here to column C. So that way I'm not going to destroy the original data.
And then on the DATA tab, do “Text to Columns”, Delimited by a Space.
Which is going to split the words out into multiple columns.
So, Eisenhower Independence Boulevard is in three different cells.
And then to get my word count, I'm going to use =COUNTA(C2:O2).
Count of course would ignore all text, and just come over here way too far.
And that has three words, double-click to shoot that down.
Now I'll convert those for values and Ctrl+C and the Program Key+V. I can delete these extra columns out here.
Of course now to get my selection from there: Right, I just want to first sort descending to find anything with more than one word.
Using PgDn and PgUp.
Probably I could have searched for that number 1.
But anyway, here we are, I'll do Alt+EDR to delete those.
So now I have all the one-word names.
I also don't want a whole bunch of numbers, you see all these numbers up here.
I'm looking for real words.
I'm going to delete many, actually all of those.
And then to get a random selection of 250.
It looks like we have 1497.
I'm just going to come over here and use the rand function.
=RAND() It gives us a number between 0 and 1.
And we will sort that data A to Z.
And the first 250 items are my list.
Ctrl+C, and those are my list.
But Mike, there has to be a better way than using DATA and Text to Columns, so let's see if you can come up with a cool, cool formula for us.
Mike: Thanks MrExcel.
Oh, My Heavens!
I totally want a Words function.
Oh man, when you started to do that, I was so excited, I thought there was a new function.
Oh man, this is going to be a trick, because we don't want anything that has this space, and we don't want anything with a number.
Well, I guess I'll start with the number and I'll ask the =LEFT(A2,1) function to please go and get the first character from the left.
Ctrl+Enter, double-click, and send it down.
Now it'll pull out the number but that's actually not a number.
F2, but if I do any math operation, like =LEFT(A2,1)+0, Ctrl+Enter, to populate that all the way down, then it converts it back to the numbers.
But any text gets a value.
So, really, I'm interested in the values.
Now, that'll check for the number, right?
But I also need to check for the spaces.
So I'm going to use the SEARCH function: =SEARCH(“ ”,A2), and SEARCH FOR, in double quotes a space, comma, within this cell right here, Ctrl+Enter, double-click and send it down.
Oh, look at that, it's the error I'm interested in.
So, if I wrap the =ISERROR(LEFT(A2,1)+0).
(The first one ignores and the second one doesn't, I don't really care.) If I wrap that around this, it'll tell me TRUE when it sees an error.
Same with this one right here, =ISERROR(SEARCH(“ ”,A2)).
I'm going to take the second one, Ctrl+Enter, Ctrl+Enter, double-click and send it down.
So, it's only the TRUE TRUE TRUE TRUE, that tells me that record is good, so I'm going to copy this, Esc, Ctrl+Shift+Down Arrow, Delete, F2 and I'm going to put this inside the =AND(ISERROR(LEFT(A2,1)+0), just like in English that has to be true ,ISERROR(SEARCH(“ ”,A2))) and that has to be true.
Close parentheses, Ctrl+Enter and double-click and send it down.
So, the FALSEs mean I don't want it.
Don’t want it, do want it, don't want it.
Now, I'm going to do a little trickier.
Ctrl+Shift+Down Arrow, and I want to find all of the FALSEs to delete them.
So, Ctrl+F to find, and I'm going to put a FALSE.
And I'm going to make sure if this were Formulas, like we usually do.
It tells me nothing.
I want to make sure that this is Values and say Find All.
Instantly it gives me a list down here.
If I Ctrl+A, it highlights all them.
Esc to close that and I'm going to do what MrExcel did.
Alt+Edit, which is EDR for row, and Enter.
That wasn't as spectacular as MrExcel, cause it wasn't as fast.
Ctrl+home, get to the top, and now I have all of my records.
Now I'm going to copy MrExcel and do the =RAND() function, Ctrl+Enter, double-click and send it down.
That will give me a number between 0 and 1 with 15 digits.
And before I sort it, I'm going to do a little trick here.
I'm going to do a relative cell reference to the first one =A2, Ctrl+Enter, Ctrl+C to copy.
And I want to copy that formula exactly down 250 rows.
F5 to open up Go To and let's see: e251.
Now, before I hit Enter, which will enact that OK, I'm going to hold Shift+Enter.
Notice that sends it all the way down, Ctrl+V.
I've copied the relative cell reference all the way down.
Now, I can simply Right-Click, Sort.
Point to sort, whichever sort, or however you want to sort it; in instantly, every time I sort it, I'm going to do the Right-Click key and it's O+O, Right-Click key+O+O to sort.
I'm getting a new list of 250 people, Ctrl+Shift+Down Arrow.
I’ve got to make sure the count down here is looking good, Ctrl+Home.
Alright, throw it back to MrExcel.
Bill: Oh Mike you have to lov.
You have to love these Dueling Excel podcasts, cause it shows the completely different approach with how we solve these problems.
I mean, I had this data to clean.
First thing I attack is the number of words, but the first thing you attack.
is “let's get rid of the numbers”.
And you did it all with formulas.
Along the way we saw the ISERROR function, using the GO TO dialog box, but holding down the Shift key before clicking OK, FIND ALL and then Right-Click+O+O, dis- or descending -that's amazing.
Now, when I sent that video to you, I was fairly convinced, that you were going to rock out this crazy array formula: =SUMPRODUCT(--(MID(A2,ROW(INDIRECT(“1:”&LEN(A2))),1)=” “))+1.
Which, you don't even have to press Ctrl+Shift+Enter for, that takes the LEN of A2, concatenates that with a 1, does it INDIRECT, passes it to the ROW and then asks for the MID of A2.
And, in essence, this is giving the MID of A2.
The first position, second position and third position, seeing if it's equal to a space, changing those TRUEs and FALSEs to 1s and 0s with minus-minus.
Doing a sum product and then finally adding one to the end, to come up with a word count.
That's what I thought you were going to do.
And had you done that, I was ready to come back and hit you with some VBA, where we can actually create the WORDS function.
It's not a lot of VBA, it's one-two-three-four lines.
Start out with everything has one word (Words = 1) and then for i = 1, to the length of my value (For i = 1 To Len (MyValue), look for a space, if it is a space, words equals words+1 (If Mid(MyValue, i, 1) = “ “ Then Words = Words + 1).
That's how I thought the duel was going to come out.
But you come at it with a completely different approach and solve the problem much more easily than I ever thought we would do it.
Awesome.
Well, hey, it's great to be doing duels and I want to thank everyone out there for stopping by.
We'll see you next week for another Dueling Excel podcast from MrExcel and Excel Is Fun.
 

Forum statistics

Threads
1,213,511
Messages
6,114,054
Members
448,543
Latest member
MartinLarkin

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top