Comparing two columns containing grid refs to find duplicates.

squirrel31

New Member
Joined
Sep 15, 2014
Messages
30
Hello,

First off I'd like to point out that I originally had posted this query at the following link:- Comparing two columns containing grid refs to find duplicates. 5 days ago where I was unsucessful in gaining any replies that helped me out. (You will also find an attached example showing a snippet of my work so far)

As stated in the thread using the above link. I have a problem where I have in one column I have an ID, to the right I have Pos_X, to the right of that I have Pos_Y (Grid reference coords).

I have used a formula to find the duplicate IDs so that I could filter them out however I now need to analyse these duplicates further. Therefore I need to work out within the duplicate IDs which have duplicate grid references and which have unique grid references. Only the first of x amount of duplicate grid refs is then marked down in the Additional intersect? column using the format 'Y'. Any unique grid ref ID is also marked down as 'Y' in the Additional intersect? column.

Any scenario where a 'N' is marked would be the following:

  • The ID is not a duplicate therefore there is only one position for that ID.
  • There are multiple duplicate grid references and therefore the 2nd onwards are marked as N whilst the first is Y.

Any scenario where a 'Y' is marked are as follows:

  • There is a duplicate grid reference under the same ID which only the first duplicate grid ref is marked as Y
  • There are x amount of unique grid references under the same ID which all are marked as Y


Thanks in advance!
 

Excel Facts

How to create a cell-sized chart?
Tiny charts, called Sparklines, were added to Excel 2010. Look for Sparklines on the Insert tab.
To be clear do all 3 columns you mention have to be the same to be classed as a duplicate? Some sample data with expected results would be good

As stated in the thread using the above link. I have a problem where I have in one column I have an ID, to the right I have Pos_X, to the right of that I have Pos_Y (Grid reference coords).
 
Upvote 0
To be classified as a duplicate the following needs to be true:

  • ID is same

To be Y the following needs to be true:

  • ID is same
  • Pos_X and Pos_Y are unique

To be N the following needs to be true:


  • ID is same
  • Pos_X and Pos_Y are the same

As for that formula Andrew it doesn't quite work in the way I want (See the example I posted in the original thread I mentioned). It is currently telling me that there is an additional intersect at a non-duplicate record. Additionally, it is telling me there is an additional intersect for a record that is a duplicate however the grid refs are the same in both.
 
Upvote 0
Please post some sample data including examples where the formula I suggested isn't returning the expected result and the reason. We need to keep everything in this thread rather than referring to one in another Forum.
 
Upvote 0
Example in table form (Additional Intersect shows expected results):

Orange = Completed
Green = In progress
Red = No intersect
Violet = Duplicate

ID
Pos_X
Pos_Y
Con
Lookup
Pon Lookup
Checked?
Intersect?
Completed Entry?
Duplicate?
Additional Intersect?

363964822

<tbody>
</tbody>
151815
32546

3639648221518132545

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

363964822

<tbody>
</tbody>
Y
Y
Y
N
-

363611803

<tbody>
</tbody>
158801
37559

3636118031588037559

<tbody>
</tbody>

3636118031588037559

<tbody>
</tbody>

363611803

<tbody>
</tbody>
Y
Y
N
N
-

363611805

<tbody>
</tbody>
161140
39994

3636118051611439994

<tbody>
</tbody>

3636118051611439994

<tbody>
</tbody>

363611805

<tbody>
</tbody>
Y
Y
Y
N
-

363976725

<tbody>
</tbody>
167048
30340

3639767251670430340

<tbody>
</tbody>

3639767251670430340

<tbody>
</tbody>

363976725

<tbody>
</tbody>
Y
Y
N
N
-

363611789

<tbody>
</tbody>
166502
42025

3636117891665042025

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

363611789

<tbody>
</tbody>
Y
Y
Y
Y
Y

363976725

<tbody>
</tbody>
167085
30311

3639767251670830310

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

363976725

<tbody>
</tbody>
Y
Y
Y
Y
Y

630624513

<tbody>
</tbody>
178576
43780

6306245131785743780

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

630624513

<tbody>
</tbody>
Y
Y
Y
N
-

210182335

<tbody>
</tbody>

212551

<tbody>
</tbody>
200431

2101823352125520043

<tbody>
</tbody>

210182335

<tbody>
</tbody>
Y
N
-
N
-

210182301

<tbody>
</tbody>
242047
205823

2101823012420420582

<tbody>
</tbody>

#N/A

<tbody>
</tbody>


210182301

<tbody>
</tbody>

<tbody>
</tbody>
Y
Y
Y
N
-

210182310

<tbody>
</tbody>

241135

<tbody>
</tbody>
205965

2101823102411320596

<tbody>
</tbody>

210182310

<tbody>
</tbody>
Y
Y
N
N
-

210182300

<tbody>
</tbody>

242677

<tbody>
</tbody>
205715

2101823002426720571

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

210182300

<tbody>
</tbody>
Y
Y
Y
Y
Y

210182300

<tbody>
</tbody>

243785

<tbody>
</tbody>
205306

2101823002437820530

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

210182300

<tbody>
</tbody>
Y
Y
Y
Y
Y

1

<tbody>
</tbody>

123456

<tbody>
</tbody>
234567

11234523456

<tbody>
</tbody>
Y
Y
N
N
-

1

<tbody>
</tbody>

123456

<tbody>
</tbody>
234567

11234523456

<tbody>
</tbody>
Y
Y
Y
Y
N

<tbody>
</tbody>

Results that went wrong (Additional Intersect is current using formula):

210182109, 210182132 and 1000534854 are wrong because they are not duplicates. 363631355 is wrong because although they are duplicates (Same ID) the grid references are the same therefore no unique grid refs)

ID
Pos_X
Pos_Y
Con
Lookup
Pon Lookup
Checked
Intersect
Completed Entry?
Duplicate
Additional Intersect?

210182109

<tbody>
</tbody>

261133

<tbody>
</tbody>

193517

<tbody>
</tbody>

2101821092611319351

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

#N/A

<tbody>
</tbody>
Y
Y
N
N
Y

210182132

<tbody>
</tbody>

266185

<tbody>
</tbody>

193035

<tbody>
</tbody>

2101821322661819303

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

#N/A

<tbody>
</tbody>
Y
N
-
N
Y

363631355

<tbody>
</tbody>

255745

<tbody>
</tbody>

133052

<tbody>
</tbody>

3636313552557413305

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

Y

<tbody>
</tbody>
N
-
Y
N

363631355

<tbody>
</tbody>

255745

<tbody>
</tbody>

133052

<tbody>
</tbody>

3636313552557413305

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

Y

<tbody>
</tbody>
N
-
Y
Y

<tbody>
</tbody>

ID
Pos_X
Pos_Y
Con
Lookup
Pon Lookup
Checked
Intersect
Completed Entry?
Duplicate
Additional Intersect?

1000534854

<tbody>
</tbody>

272342

<tbody>
</tbody>

192919

<tbody>
</tbody>

10005348542723419291

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

1000534854

<tbody>
</tbody>
Y
Y
Y
N
Y

<tbody>
</tbody>
 
Upvote 0
Hi,

I've just retried =IF(COUNTIFS(A$2:A2,A1,B$2:B2,B1,C$2:C2,C2)>1,"Y","N") and for some reason it is now detecting whether or not it is a duplicate however it appears that it is not sensing whether the grid references are unique or not. 1st part correct now just need to fine tune it :)


Just a few examples where its now going wrong.

Reason why 210182300 should be Y is because it is a duplicate and has a unique grid reference.


IDPos_XPos_YConLookupPon LookupCheckedIntersectCompleted Entry?DuplicateAdditional Intersect?
363631360

<tbody>
</tbody><colgroup><col></colgroup>
255239

<tbody>
</tbody><colgroup><col></colgroup>
133365

<tbody>
</tbody><colgroup><col></colgroup>
3636313602552313336

<tbody>
</tbody><colgroup><col></colgroup>
#N/A

<tbody>
</tbody><colgroup><col></colgroup>
1000546862

<tbody>
</tbody><colgroup><col></colgroup>
Y

<tbody>
</tbody><colgroup><col></colgroup>
Y

<tbody>
</tbody><colgroup><col></colgroup>
Y

<tbody>
</tbody><colgroup><col></colgroup>
Y

<tbody>
</tbody><colgroup><col></colgroup>
Y

<tbody>
</tbody><colgroup><col></colgroup>
363631360

<tbody>
</tbody><colgroup><col></colgroup>
255239

<tbody>
</tbody><colgroup><col></colgroup>
133365

<tbody>
</tbody><colgroup><col></colgroup>
3636313602552313336

<tbody>
</tbody><colgroup><col></colgroup>
#N/A

<tbody>
</tbody><colgroup><col></colgroup>
1000546862

<tbody>
</tbody><colgroup><col></colgroup>
Y

<tbody>
</tbody><colgroup><col></colgroup>
Y

<tbody>
</tbody><colgroup><col></colgroup>
Y

<tbody>
</tbody><colgroup><col></colgroup>
Y

<tbody>
</tbody><colgroup><col></colgroup>
Y (SHOULD BE N)

<tbody>
</tbody><colgroup><col></colgroup>
210182300

<tbody>
</tbody><colgroup><col></colgroup>
242677

<tbody>
</tbody><colgroup><col></colgroup>
205715

<tbody>
</tbody><colgroup><col></colgroup>
2101823002426720571

<tbody>
</tbody><colgroup><col></colgroup>
#N/A

<tbody>
</tbody><colgroup><col></colgroup>
210182300

<tbody>
</tbody><colgroup><col></colgroup>
YYYYN (SHOULD BE Y)

<tbody>
</tbody>
 
Upvote 0
Shouldn't the formula be?

=IF(COUNTIFS(A$2:A2,A2,B$2:B2,B2,C$2:C2,C2)>1,"Y","N")

Rather than posting some isolated snippets from your data please create some which clearly demonstrates the results you are expecting.
 
Upvote 0
Sorry, my excel seems to have the mind of its own. Thought I had copied that formula correctly (even compared them in notepad...)

One example that shows the result I should be expecting:

IDPos_XPos_YIntersect?Duplicate?Additional Intersect?
210182300

<tbody>
</tbody><colgroup><col></colgroup>
243130

<tbody>
</tbody><colgroup><col></colgroup>
205587

<tbody>
</tbody><colgroup><col></colgroup>
YN-
210182300

<tbody>
</tbody><colgroup><col></colgroup>
243130

<tbody>
</tbody><colgroup><col></colgroup>
205587

<tbody>
</tbody><colgroup><col></colgroup>
YYN
210182300

<tbody>
</tbody><colgroup><col></colgroup>
242677

<tbody>
</tbody><colgroup><col></colgroup>
205715

<tbody>
</tbody><colgroup><col></colgroup>
YYY
210182300

<tbody>
</tbody><colgroup><col></colgroup>
243785

<tbody>
</tbody><colgroup><col></colgroup>

<tbody>
</tbody><colgroup><col></colgroup>
205306

<tbody>
</tbody><colgroup><col></colgroup>
YYY

<tbody>
</tbody>


Explained: The second row is the first time 210182300 appears so is not a duplicate. The third row is where an intersect appears however shares the same grid reference meaning it can be ignored. The following two have the same IDs however they both have unique grid references therefore they are additional intersects (because they also intersect).

Rules:

Same ID + Unique grid ref + Intersect is Y = Additional intersect is Y
Same ID + No unique grid ref + Intersect is Y = Additional intersect is N
Same ID + Intersect is N = Additional intersect is -
Same ID + Duplicate is N = Additional intersect is -

I have removed the columns that do not affect what I am trying to produce (and I aplogise for adding them in as looking back it over-complicates things here).

I shall try and sum up the process again: There are 21004 records of which there are duplicates (9144). I require to determine which ones out of the 9144 I need to deal with and which I can ignore.

I have revised the method to make it easier (I hope)

To be classified as a non-duplicate the following needs to be true:

  • Unique ID
  • if x amounts appear of IDs then the first is classified as non-duplicate

To be a duplicate the following needs to be true:

  • Same ID
  • if x amounts appear of IDs then the 2nd one onwards (2nd,3rd...) are all classified as duplicates.

To be an additional intersect the following needs to be true:

  • Record needs to be a duplicate
  • Grid ref needs to be unique
  • if x amounts of grid refs (needs to be different to the same ID classified as non-duplicate) appear then the first is classified as an additional intersect
  • Intersect = Y
 
Upvote 0
Maybe:


Excel 2010
ABCDEF
1IDPos_XPos_YIntersect?Duplicate?Additional Intersect?
2210182300243130205587YN 
3210182300243130205587YYN
4210182300242677205715YYY
5210182300243785205306YYY
Sheet1
Cell Formulas
RangeFormula
E2=IF(COUNTIF(A$2:A2,A1)>1,"Y","N")
F2=IF(E2="Y",IF(AND(D2="Y",COUNTIFS(A$2:A2,A2,B$2:B2,B2,C$2:C2,C2)>1),"N","Y"),"")
 
Upvote 0

Forum statistics

Threads
1,213,513
Messages
6,114,064
Members
448,545
Latest member
kj9

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top