Google Patent On Detecting Duplicate And Near-Duplicate Files
United States Patent 6,658,423
Pugh, et al.
December 2, 2003
Abstract
Improved duplicate and near-duplicate detection techniques may assign a number of fingerprints to a given document by (i) extracting parts from the document, (ii) assigning the extracted parts to one or more of a predetermined number of lists, and (iii) generating a fingerprint from each of the populated lists. Two documents may be considered to be near-duplicates if any one of their fingerprints match.
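The three steps in the abstract can be sketched roughly as follows. This is only an illustration, not the patented method: the abstract does not say what a "part" is, how parts are assigned to lists, or how a fingerprint is computed. The sketch assumes parts are lowercased words, assigns each word to exactly one list by hashing it, and fingerprints each populated list with a truncated SHA-1. The names `NUM_LISTS`, `extract_parts`, `fingerprints`, and `near_duplicates` are hypothetical.

```python
import hashlib

# Hypothetical value; the abstract only says "a predetermined number of lists".
NUM_LISTS = 4


def _hash64(text: str) -> int:
    """Stable 64-bit hash of a string (assumed stand-in for the patent's hashing)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest()[:8], "big")


def extract_parts(document: str) -> list[str]:
    """Step (i): extract parts. Assumption: parts are lowercased words."""
    return document.lower().split()


def fingerprints(document: str, num_lists: int = NUM_LISTS) -> set[tuple[int, int]]:
    """Steps (ii) and (iii): fill the lists, then fingerprint each populated one."""
    lists: list[list[str]] = [[] for _ in range(num_lists)]
    for part in extract_parts(document):
        # Assumption: each part goes to exactly one list, chosen by its hash.
        lists[_hash64(part) % num_lists].append(part)
    # Empty lists yield no fingerprint; each fingerprint is tagged with its list index.
    return {
        (index, _hash64("\x00".join(parts)))
        for index, parts in enumerate(lists)
        if parts
    }


def near_duplicates(a: str, b: str, num_lists: int = NUM_LISTS) -> bool:
    """Two documents are near-duplicates if any one fingerprint matches."""
    return not fingerprints(a, num_lists).isdisjoint(fingerprints(b, num_lists))


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog near the quiet river bank"
    edited = original.replace("lazy", "sleepy")
    print(near_duplicates(original, original))
    print(near_duplicates(original, edited))
```

Under these assumptions, replacing a single word disturbs at most two lists (the one the old word was in and the one the new word lands in), so the fingerprints of the remaining populated lists still match. That is why a single matching fingerprint is enough to flag a near-duplicate.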
Assignee: Google, Inc. (Mountain View, CA)
Appl. No.: 768947
Filed: January 24, 2001
Current U.S. Class: 707/102; 707/3
Intern'l Class: G06F 017/30; G06F 007/00
Field of Search: 707/1,4,102,203,3,6,103
See also:
I've finally gotten permission from Google to talk about this work, so I've put up [a web page with more information about this patent].