On the resemblance and containment of documents

A. Broder

1997 · DOI: 10.1109/SEQUEN.1997.666900
2,265 citations

TLDR

The basic idea is to reduce the problems of measuring document resemblance ("roughly the same") and containment ("roughly contained") to set intersection problems, which can be evaluated easily by a process of random sampling that can be done independently for each document.
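
As a rough illustration of that reduction, here is a minimal Python sketch, not the paper's exact procedure. It assumes each document is represented as a set of w-word shingles, and it uses seeded hash functions as a stand-in for random sampling: each document keeps only the minimum hash value per seed, computed without looking at any other document. All function names and parameters (`shingles`, `sketch`, `num_hashes`, and so on) are hypothetical and chosen for this example only.

```python
import hashlib


def shingles(text, w=4):
    """Return the set of contiguous w-word sequences (shingles) in text."""
    words = text.lower().split()
    if len(words) < w:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a, b):
    """Exact resemblance of two shingle sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def containment(a, b):
    """Exact containment of A in B: |A & B| / |A|."""
    return len(a & b) / len(a) if a else 1.0


def _hash(shingle, seed):
    """64-bit hash of a shingle under a given seed (assumed stand-in
    for one random ordering of all possible shingles)."""
    data = f"{seed}:{' '.join(shingle)}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def sketch(shingle_set, num_hashes=128):
    """Per-document sample: the minimum hash value for each seed.

    Assumes a non-empty shingle set. Each document is processed
    independently of every other document.
    """
    return [min(_hash(s, seed) for s in shingle_set)
            for seed in range(num_hashes)]


def estimate_resemblance(sketch_a, sketch_b):
    """Fraction of positions where the two sketches agree; this
    approximates the resemblance of the underlying shingle sets."""
    matches = sum(1 for x, y in zip(sketch_a, sketch_b) if x == y)
    return matches / len(sketch_a)
```

To compare two documents under these assumptions, build a shingle set and a sketch for each one separately, then call `estimate_resemblance` on the two sketches. The exact `resemblance` and `containment` functions are included only as a reference point for checking the estimate on small inputs; a larger `num_hashes` tightens the estimate at the cost of a longer sketch.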