Excel User Group
Microsoft Excel blogs, forums, files. Read, ask questions, provide answers.

Finding unique and common records in two different data sets

Latest post Tue, Apr 21 2009 11:07 AM by NoHelp. 6 replies.
  • Wed, Apr 15 2009 7:04 PM

    • NoHelp
    • Top 200 Contributor
    • Joined on Wed, Apr 15 2009
    • Posts 5
    • Points 57

    Finding unique and common records in two different data sets

    What is the easiest way to compare two sets of data and identify both (1) the records that appear in both data sets, and (2) those that are uniquely in each.

    Example:  Assume that I have 2 different spreadsheets with address information which I believe have a large number of entries in common.  I need to complie 3 sublists:  Addresses that appear in both spreadsheets, addresses that appear only in spreadsheet "A" and those that appear only in spreadsheet "B".

    I can think of some manual ways of doing this, but I have the sense that I am re-inventing the wheel here.  Since this task will have to be done repeatedly, I would be willing to spend some time writing functions to accomplish this.

    Any help or pointers would be appreciated.  (And bonus points if the solution works in excel 2003 and earlier).

     

    Thank you in advance.

    • Post Points: 21
  • Thu, Apr 16 2009 3:10 AM In reply to

    • Nick Hodge
    • Top 10 Contributor
    • Joined on Sun, Dec 23 2007
    • Southampton
    • Posts 543
    • Points 9,287
    • MVP

    Re: Finding unique and common records in two different data sets

    It depends what you mean by 'manual' and 'Spreadsheets' (I will presume they are in the same workbook and called Sheet1 and Sheet2) You could certainly achieve this in fairly short time using helper columns, so we will presume your addresses (in both sheets) are in Columns A to E (the rows doesn't really matter, but we will presume the actual data starts in row 2 and finishes in row 100 on both sheets).

    Starting with Sheet1

    In F2 type

    =A2&B2&C2&D2&E2

    Do the same in Sheet2

    This will concatenate the addresses into one long string making it easier to test for uniqueness

    Now in G2 in Sheet1 type

    =IF(ISNA(VLOOKUP(F2,Sheet2!$F$2:$F$100,1,FALSE)),"","x")

    Do the same in Sheet2, (replacing the Sheet2! reference with Sheet1)

    This will place 'x' in column G for every address in Sheet1 that also appears in Sheet2 and vice versa.

    Now, on both sheets copy the data in Column G and then 'Edit>Paste Special...+Values' over the same data to kill the formula

    If you now sort by column G on sheet1 all the 'x' entries will appear together. Copy these to a new list (duplicates on both lists) and delete all the rows with 'x' on both sheets and the remaining data on Sheet1 are unique to that sheet and the data on Sheet2 unique to that.

    Just in case your next question is duplicates within the same list, you should use a COUNTIF function to identify these or maybe look at this page.

    http://www.cpearson.com/Excel/Duplicates.aspx

     

    Regards
    Nick Hodge
    Microsoft MVP, Excel
    Southampton, UK

    • Post Points: 21
  • Thu, Apr 16 2009 6:05 AM In reply to

    Re: Finding unique and common records in two different data sets

    When building keys in helper columns, I always add a seperator to prevent errors.

    For example 101&1 will give the same as 10&11 (1011)

    but 101&"-"&1 will differ from 10&"-"&11 (101-1 and 10-11)

    Not a very common occurence, but it has caused issues in the past...

    • Post Points: 21
  • Thu, Apr 16 2009 10:44 PM In reply to

    • Jim Cone
    • Top 10 Contributor
    • Joined on Tue, Jan 15 2008
    • Portland, Oregon - USA
    • Posts 233
    • Points 3,320

    Re: Finding unique and common records in two different data sets

    Simon,
    Very good point.  I made some code changes because of it. 
    Thanks,
    Jim Cone

    • Post Points: 21
  • Fri, Apr 17 2009 6:53 AM In reply to

    • NoHelp
    • Top 200 Contributor
    • Joined on Wed, Apr 15 2009
    • Posts 5
    • Points 57

    Re: Finding unique and common records in two different data sets

    Thank you to all of you.

    I assume that to use the VLOOKUP function, I have to sort both sheets based on the concatenated keys first, correct?

    Is there a way to automatically determine the last (non-empty) row of the data and use that in a formula to make this easier for an excel novice to use once I write it?

     

    Thanks

    • Post Points: 21
  • Fri, Apr 17 2009 9:22 AM In reply to

    • Nick Hodge
    • Top 10 Contributor
    • Joined on Sun, Dec 23 2007
    • Southampton
    • Posts 543
    • Points 9,287
    • MVP

    Re: Finding unique and common records in two different data sets

    As you are looking for exact duplicates we have use the FALSE final parameter, therefore the data does not need to be sorted. You only need to sort when using the TRUE parameter to find either the next highest or next lowest determined by sort order.

    To make the range dynamic you could use an OFFSET(COUNTA(...)) function as your lookup range (or name a range using the function and use that in your function). Follow the instructions here.

    http://www.contextures.com/xlNames01.html#Dynamic

    Regards
    Nick Hodge
    Microsoft MVP, Excel
    Southampton, UK

    • Post Points: 21
  • Tue, Apr 21 2009 11:07 AM In reply to

    • NoHelp
    • Top 200 Contributor
    • Joined on Wed, Apr 15 2009
    • Posts 5
    • Points 57

    Re: Finding unique and common records in two different data sets

    Thanks to all of you.  This worked as advertised, and I appreciate the help.  I think I need to play with the "dynamic range" feature a little more before I try to automate this task, but it really wasn't that hard once you showed me the way.  While I have used VLOOKUP regularly, I had not used it with the "FALSE" value set for an unsorted list--so I learned something useful in the process.

     

    Thanks to everyone who posted.

     

     

    • Post Points: 5
Page 1 of 1 (7 items) | RSS
Copyright Excel User Group and the relevant contributors, 2010. All Trade Marks acknowledged. This site is a peer-to-peer site and NOT affiliated in any way to Microsoft Corp. All rights reserved.