Deduplicate Results

The Copac database brings together records from many libraries and, where possible, merges records for the same item (except for pre-1800 material). However, because of variations or errors in catalogue records from the many data sources used to create Copac, duplicate records for the same item can remain; even records with the same ISBN may be listed separately. You may therefore choose a deduplication option to merge additional records in your results, with the understanding that this is likely to introduce some misconsolidation (records for different items being merged), to a degree that varies by option.

  • You can choose to deduplicate the results by ISBN or by one of three multi-field options, explained below.
  • Copac's own record merging errs on the side of caution, so some duplicate records remain.
  • Deduplicating CCM Tools search results removes some of this remaining duplication, but use it with care, depending on what you want the results to show: even ISBN deduplication will sometimes merge records for different items (i.e. misconsolidation).
  • Leaving duplicates in place means an item's holdings may be split across several records, so it can appear rarer than it actually is; the benefit is that records for different items are not wrongly merged.

ISBN deduplication

ISBN deduplication merges records that share an ISBN. Because of errors and variations in the records, this can produce some misconsolidation.
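As an illustration only, ISBN deduplication can be sketched as grouping records on a normalised ISBN. This is not the CCM Tools implementation: the record fields ("id", "isbn", "title"), the normalisation (removing hyphens and spaces, upper-casing a check digit "x") and the sample ISBNs are all assumptions made for the example. Record 3 shows how a mis-keyed ISBN leads to misconsolidation.

```python
from collections import defaultdict


def normalise_isbn(raw):
    """Assumed normalisation: drop hyphens/spaces and upper-case (so 'x' matches 'X')."""
    if not raw:
        return None
    cleaned = raw.replace("-", "").replace(" ", "").upper()
    return cleaned or None


def dedupe_by_isbn(records):
    """Group records sharing a normalised ISBN; records without an ISBN stay separate."""
    groups = defaultdict(list)
    singletons = []
    for rec in records:
        key = normalise_isbn(rec.get("isbn"))
        if key is None:
            singletons.append([rec])
        else:
            groups[key].append(rec)
    return list(groups.values()) + singletons


# Made-up example records (hypothetical ISBNs and titles).
sample = [
    {"id": 1, "isbn": "1-234-56789-x", "title": "A history of widgets"},
    {"id": 2, "isbn": "123456789X", "title": "History of widgets"},
    {"id": 3, "isbn": "123456789X", "title": "Gadgets for beginners"},  # mis-keyed ISBN
    {"id": 4, "isbn": None, "title": "Untitled pamphlet"},
]

print([[r["id"] for r in g] for g in dedupe_by_isbn(sample)])
```

Records 1 and 2 are correctly merged, but record 3 is swept into the same group purely because of its ISBN, which is the kind of misconsolidation described above.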

Multi-field deduplication

There are three levels of multi-field deduplication, each matching records more broadly than the last. The likelihood of records being matched incorrectly increases with each level; conversely, because of cataloguing variations, the more stringent levels retain more duplicate records:

  • Level 1: uses Date, Title, Pagination, Edition, Author, Publisher
  • Level 2: uses Date, Title, Author, Publisher
  • Level 3: uses Title, Author
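The trade-off between the levels can be illustrated with a deliberately simplified sketch that groups records whose fields for a given level are identical. The real procedure uses the fuzzy and conditional rules described below, not exact matching; the field names and sample records here are hypothetical. Records 1 and 2 describe the same item with differently recorded pagination; record 3 is a later edition.

```python
# Fields compared at each level, as listed above (lower-case names are assumed).
LEVEL_FIELDS = {
    1: ("date", "title", "pagination", "edition", "author", "publisher"),
    2: ("date", "title", "author", "publisher"),
    3: ("title", "author"),
}


def group_exact(records, level):
    """Group record ids whose level fields are identical (exact match: a simplification)."""
    fields = LEVEL_FIELDS[level]
    groups = {}
    for rec in records:
        key = tuple(rec.get(f) for f in fields)
        groups.setdefault(key, []).append(rec["id"])
    return list(groups.values())


base = {"title": "a history of widgets", "author": "smith j", "publisher": "acme press"}
sample = [
    dict(base, id=1, date="1990", pagination="300 p.", edition="1st"),
    dict(base, id=2, date="1990", pagination="xii, 300 p.", edition="1st"),
    dict(base, id=3, date="2001", pagination="320 p.", edition="2nd"),
]

for level in (1, 2, 3):
    print(level, group_exact(sample, level))
```

Level 1 keeps the duplicate pair (records 1 and 2) apart, level 2 merges them, and level 3 also merges the second edition, a misconsolidation.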

The fields are matched as follows:

  • Date: Monographs must have a date in common.
  • Title: The titles must match, allowing one inserted, deleted or changed character for every twenty characters in the title.
  • Pagination: If there is pagination information, it must match.
  • Edition: If there is edition information, it must match.
  • Author: If there is an author, it must match. Where an author match is carried out, corporate authors are matched fuzzily.
  • Publisher: A fuzzy match on publisher is required.
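The rules above can be sketched as pairwise checks for a Level 1 comparison. Several details are assumptions, not documented behaviour: the title allowance is taken as one edit per full twenty characters of the longer title (so very short titles must match exactly); "if there is" information is read as "if both records have a value"; dates are modelled as sets; and the unspecified fuzzy publisher and corporate-author matching is replaced by a hypothetical stand-in (`fuzzy_name`) that only lower-cases and strips punctuation, applied here to every author name as a simplification.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def titles_match(t1: str, t2: str) -> bool:
    # Assumption: one edit allowed per full 20 characters of the longer title.
    allowed = max(len(t1), len(t2)) // 20
    return levenshtein(t1, t2) <= allowed


def fuzzy_name(name):
    # Hypothetical stand-in for the unspecified fuzzy match: lower-case,
    # replace punctuation with spaces and collapse whitespace.
    if not name:
        return None
    return " ".join(re.sub(r"[^\w\s]", " ", name.lower()).split())


def if_present_match(v1, v2) -> bool:
    # Assumption: the field is only compared when both records carry a value.
    return v1 is None or v2 is None or v1 == v2


def level1_match(r1: dict, r2: dict) -> bool:
    return (bool(r1["dates"] & r2["dates"])            # a date in common
            and titles_match(r1["title"], r2["title"])
            and if_present_match(r1.get("pagination"), r2.get("pagination"))
            and if_present_match(r1.get("edition"), r2.get("edition"))
            and if_present_match(fuzzy_name(r1.get("author")),
                                 fuzzy_name(r2.get("author")))
            and fuzzy_name(r1["publisher"]) == fuzzy_name(r2["publisher"]))


# Made-up example records.
r1 = {"dates": {"1990"}, "title": "A history of widgets", "pagination": "300 p.",
      "edition": None, "author": "Smith, J.", "publisher": "Acme Press Ltd."}
r2 = {"dates": {"1990", "1991"}, "title": "A history of widgts", "pagination": "300 p.",
      "edition": "2nd", "author": "smith j", "publisher": "acme press ltd"}

print(level1_match(r1, r2))
print(level1_match(r1, dict(r2, pagination="320 p.")))
```

The 20-character title in `r1` is allowed a single edit, so the misspelt title in `r2` still matches, whereas "Physics" and "Physic" would not match under the floor-based allowance assumed here.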

If you have questions about the deduplication procedure, contact help@jisc.ac.uk, mentioning CCM Tools in the subject line of your email. To explore the deduplication options, a worksheet demonstrating them is available.