TITLE
    Eliminating Duplicate Records (DB)
Article ID:
Created:
Modified:
55339
6/13/93
3/11/98

TOPIC

    Using a spreadsheet document, it is possible to automatically identify duplicate records in a ClarisWorks database.

    This information was provided by Claris Corporation on 16 March 1998, and incorporated into Apple Computer's Tech Info Library.


DISCUSSION

    Although ClarisWorks has a sophisticated database environment, there are certain useful features usually found only in dedicated database programs like FileMaker Pro. One of these useful features is the ability to identify duplicate records.

    However, ClarisWorks has a significant advantage because of its modular document structure: you can look at database data in the spreadsheet environment and make direct comparisons between records, which are treated as separate rows in the spreadsheet. This method allows you to find and eliminate duplicates from your ClarisWorks database.

    A strong understanding of ClarisWorks database and spreadsheet creation and usage is necessary to implement this article.

    1. From a copy of your file choose "Show all Records" from the Organize menu. Sort the database by the field which has the duplicate values. (If you are searching for duplicate addresses, for instance, you would sort by Address.)

    2. Define a the following field in your database:

    DUPLICATECHECK (Number)

    3. Choose Save As... from the Edit menu. From the Save as... dropdown menu choose DIF. Name the file DATA.DIF. Click the Save button. You have now saved a data file for use later. Now Delete all of the records in the copy of your file by using the Clear command in the Edit menu.


    4. Use the Open command to create a new spreadsheet document using the DATA.DIF file just created.

    5. In the Spreadsheet, click in row 2 of the DuplicateCheck column and enter the following formula:

    =IF(AND(A2=A3,A1<>A2),1,IF(AND(A2<>A3,A2<>A1),2,0))

    6. Use the Fill Down command to fill the formula down for as many records as you had in your database.

    7. Choose Select All from the Edit menu to select your entire spreadsheet then choose Copy from the Edit menu.

    8. Switch back to the empty DB document copy (use the View menu) and choose Paste from the Edit menu.

    9. Use the Find command to find all zeros in the DuplicateCheck field.

    10. Choose Clear from the edit menu. Your duplicates will be gone.

    Explanation:
    To identify duplicates, you need to be able to directly relate information between two records in your database. That is not possible in the Database document itself, but we can use a spreadsheet formula to perform that comparison for us, because any cell in the spreadsheet can be referenced from any other cell.


    In this case, you export the information in the DIF format so that you can see the field names below the column titles. The formula is designed to compare the entries in column A for unique entries: if you need to compare a different field, you can substitute that column for column A in the formula. If you need to compare more than one field, you can use the Concat function with the previous formula in the following manner:

    IF(AND(Concat(A2,B2)=Concat(A3,B3), Concat(A1,B1)<>Concat(A2,B2)),1, IF(AND(Concat(A2,B2)<>Concat(A3,B3),Concat(A2,B2)<>Concat(A1,B1)),2,0))

    Both formulas always result in either the value 0, 1 or 2. The number 2 is used for entries in your database that are already unique. The number 1 is used for entries in your database which are the first in a series of entries (the first duplicate in a group of duplicates). The number 0 is used to identify the second and all subsequent duplicates in a group of duplicates.


Document Information
Product Area: Apple Software
Category: Claris Software
Sub Category: ClarisWorks
Keywords: kclaris

Copyright © 2000 Apple Computer, Inc. All rights reserved.