Locality-Sensitive Hashing


Contents:

  1. Exact duplicates and near-duplicates
  2. Duplicate detection: naive approach
  3. Detecting duplicates by hashing
  4. Adler-32 hashcode
  5. Clarification
  6. Error rates for exact duplicate detection
  7. Properties of conventional hashcodes
  8. Locality-sensitive hashing: the idea
  9. Locality-sensitive hashing: how it works
  10. False positive and negative errors of LSH
  11. Hashcode length and number of hashtables
  12. Simhash algorithm
  13. Clarification