Ways to check the differences between references in two bib files?

JabRef has a 'detect duplicate' feature: http://help.jabref.org/en/FindDuplicates

The results vary. It detects the duplicate fine for the following two entries

@Article{Sigfridsson1998,
  author  = {Emma Sigfridsson and Ulf Ryde},
  title   = {Comparison of methods for deriving atomic charges from the electrostatic potential and moments},
  journal = {Journal of Computational Chemistry},
  year    = {1998},
  volume  = {19},
  number  = {4},
  pages   = {377--395},
}

@Article{SigfridssonRyde,
  author  = {Emma Sigfridsson and Ulf Ryde},
  title   = {Comparison of Methods for Deriving Atomic Charges from the Electrostatic Potential and Moments},
  journal = {J. Comput. Chem.},
  year    = {1998},
  volume  = {19},
  number  = {4},
  pages   = {377--395},
  doi     = {10.1002/(sici)1096-987x(199803)19:4<377::aid-jcc1>3.0.co;2-p},
}

But if there are more differences between the entries (for example, if you remove the number field from one of the two), it no longer recognises the two as duplicates.

Duplicate detection is quite a hard task: you don't want to create too many false positives, but at the same time you want to catch less obvious duplicates that differ by typos, abbreviations and so on. If you have a more robust idea of how this should work, I'm sure the JabRef developers would not mind a feature request (or, even better, a pull request): https://github.com/JabRef/jabref/issues/
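
To illustrate the trade-off, here is a minimal sketch of a fuzzy field comparison using Python's standard difflib module. It is not JabRef's algorithm (I don't know what JabRef uses internally); the function names and the 0.9 threshold are assumptions chosen for illustration only.

```python
import re
from difflib import SequenceMatcher


def normalise(text):
    """Lowercase, drop BibTeX braces and punctuation, collapse whitespace."""
    text = re.sub(r"[{}]", "", text.lower())
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return " ".join(text.split())


def similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0 for two field values."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def is_probable_duplicate(a, b, threshold=0.9):
    """Hypothetical rule: flag two values as duplicates above the threshold.

    A lower threshold catches more typos and abbreviations but also
    produces more false positives.
    """
    return similarity(a, b) >= threshold
```

With this rule the two titles above compare as identical once capitalisation is ignored, whereas "Journal of Computational Chemistry" and "J. Comput. Chem." fall well below the threshold. That is exactly the kind of less obvious duplication that a simple string comparison misses.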

Other tools are mentioned in Cleaning up a .bib file, Find Duplicated article titles in my .bib file and Find and match corresponding arXiv preprints and journal articles. Some of these try to retrieve information for an entry from an external source to detect duplicates. Others just rely on duplication of field contents or title comparison.
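
For the specific case in the question (the same reference with a different year in two files), a small script along the following lines might be enough. This is only a sketch under strong assumptions: every field sits on its own line in the form name = {value}, the closing brace of each entry is on a line of its own, and entries are matched by their normalised title because the keys may differ between the files. The function names are made up for this example, and a real BibTeX parser would be more robust.

```python
import re

# Assumes the closing brace of an entry stands alone at the start of a line.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*?)\n\}", re.S)
# Assumes one brace-delimited field per line, e.g.  year = {1998},
FIELD_RE = re.compile(r"^[ \t]*(\w+)[ \t]*=[ \t]*\{(.*)\}[ \t]*,?[ \t]*$", re.M)


def normalise_title(title):
    """Lowercase, drop braces and punctuation, collapse whitespace."""
    title = re.sub(r"[{}]", "", title.lower())
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", title).split())


def parse_bib(text):
    """Return {entry key: {field name: value}} for simply formatted entries."""
    entries = {}
    for _entry_type, key, body in ENTRY_RE.findall(text):
        entries[key] = {
            name.lower(): value.strip() for name, value in FIELD_RE.findall(body)
        }
    return entries


def year_mismatches(text_a, text_b):
    """List (key_a, year_a, key_b, year_b) for equal titles with unequal years."""
    by_title = {}
    for key_a, fields_a in parse_bib(text_a).items():
        title = normalise_title(fields_a.get("title", ""))
        if title:  # entries without a title cannot be matched this way
            by_title[title] = (key_a, fields_a)

    mismatches = []
    for key_b, fields_b in parse_bib(text_b).items():
        match = by_title.get(normalise_title(fields_b.get("title", "")))
        if match is None:
            continue
        key_a, fields_a = match
        year_a = fields_a.get("year", "")
        year_b = fields_b.get("year", "")
        if year_a != year_b:
            mismatches.append((key_a, year_a, key_b, year_b))
    return mismatches
```

Calling year_mismatches(open("ref1.bib").read(), open("ref2.bib").read()) would then return one tuple per reference whose year differs, for instance an entry that is "in press" in one file and 2017 in the other. Titles that differ by more than capitalisation or punctuation are not matched by this sketch.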

Author: PHC_123

Updated on May 30, 2020

Comments

  • PHC_123
    PHC_123 over 3 years

    I am wondering whether there are any existing ways to check if the same reference has different years in two different bib files. For example, if the reference is "in press" in one file and 2017 in the other, there should be a warning saying that the year is different.

    Thanks in advance

    • moewe
      moewe over 5 years
      JabRef can detect some duplicates, maybe that can help.
    • naphaneal
      naphaneal over 5 years
      On the CLI: diff ref1.bib ref2.bib | grep '<authoryear>', where <authoryear> is the key of the reference in question. I don't know whether this works in the Windows command prompt. Alternatively, you could check the warning log, or see if your IDE offers a diff in the menu bar.
    • moewe
      moewe over 5 years
      tex.stackexchange.com/q/76420/35864 and tex.stackexchange.com/q/300962/35864 explain several methods of different levels of sophistication to spot duplicate entries. See also tex.stackexchange.com/q/35334/35864
    • moewe
      moewe over 5 years
      Any news here? As it stands now the question is pretty much a duplicate of the questions I linked in my comment above. If there is no response here in due time, I will vote to close as a duplicate.
    • PHC_123
      PHC_123 over 5 years
      @moewe I think JabRef can't report inconsistencies between entries without manual work (but maybe I just don't know how). Thank you for the links to the previous questions, though!
    • moewe
      moewe over 5 years
      I think I got it to work once when I tested it, but you are asking the programme to do a fuzzy comparison, so to avoid a flood of false-positives it might happen that a few positives are not flagged. I don't know about the algorithm they use, maybe you can ask the developers at github.com/JabRef/jabref/issues if there is a way to fine-tune that behaviour.
  • PHC_123
    PHC_123 over 5 years
    I noticed that function in JabRef, but as you stated, it is sometimes not that sensitive to inconsistencies. The other tools I tried don't look at the contents of the references or at inconsistencies between them at all: they discard the contents and work only with the entry keys when manipulating the references. I will ask the developers about the function. Thank you for your advice!