Identifying and Filtering Near-Duplicate Documents
A. Broder
2000 · DOI: 10.1007/3-540-45123-4_1
Annual Symposium on Combinatorial Pattern Matching · 473 Citations
TLDR
The algorithm for filtering near-duplicate documents discussed here has been successfully implemented and has been used for the last three years in the context of the AltaVista search engine.
