Google Patent On Detecting Duplicate Files - Conclusions

Google Patents: Conclusions For Detecting duplicate and near-duplicate files

United States Patent 6,658,423

Conclusions

As can be appreciated from the foregoing, improved near-duplicate detection techniques are disclosed. These techniques are robust and reduce processing and storage requirements. Such reductions are particularly important when processing large document collections.

The near-duplicate detection techniques have a number of important practical applications. In the context of a search engine, for example, these techniques can be used during a crawling operation to speed up the crawl and to save bandwidth by not crawling near-duplicate Web pages or sites, as determined from documents uncovered in a previous crawl. Further, by reducing the number of Web pages or sites crawled, these techniques can reduce the storage requirements of a repository and, therefore, of other downstream stored data structures. Alternatively, these techniques can be applied later, in response to a query, so that a user is not annoyed with near-duplicate search results. These techniques may also be used to "fix" broken links. That is, if a document (e.g., a Web page) no longer exists at a particular location or URL, a link to a near-duplicate page can be provided.
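Two of the applications described above, suppressing near-duplicate search results and substituting a near-duplicate page for a broken link, can be illustrated with a minimal sketch. The similarity test used here (Jaccard overlap of three-word shingles) is a common stand-in chosen for brevity and is not claimed to be the method of the patent. The function names, the 0.8 threshold, and the (url, text) data layout are all assumptions made for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def is_near_duplicate(a, b, threshold=0.8):
    """Hypothetical test: Jaccard similarity of shingle sets >= threshold."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return True  # two empty documents are treated as duplicates
    return len(sa & sb) / len(sa | sb) >= threshold


def filter_results(results, threshold=0.8):
    """Keep the first of each group of near-duplicate (url, text) results."""
    kept = []
    for url, text in results:
        if not any(is_near_duplicate(text, t, threshold) for _, t in kept):
            kept.append((url, text))
    return kept


def replacement_for_broken_link(dead_text, live_pages, threshold=0.8):
    """Return the URL of the first live page that is a near-duplicate of
    the last known text of a dead page, or None if there is none."""
    for url, text in live_pages:
        if is_near_duplicate(dead_text, text, threshold):
            return url
    return None
```

The pairwise comparison in filter_results is quadratic in the number of results, which is tolerable for a single page of query results but would not scale to a whole crawl. That scaling concern is the reason reduced processing and storage requirements matter for large collections.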

Last edited April 8, 2007 3:01 am