Google Patent On Detecting Duplicate Files - Conclusions

Google Patents: Conclusions For Detecting duplicate and near-duplicate files

United States Patent 6,658,423

Conclusions

As can be appreciated from the foregoing, improved near-duplicate detection techniques are disclosed. These techniques are robust, and they reduce processing and storage requirements. Such reductions are particularly important when processing large document collections.

The near-duplicate detection techniques have a number of important practical applications. In the context of a search engine, for example, these techniques can be used during a crawling operation to speed up the crawling and to save bandwidth by not crawling near-duplicate Web pages or sites, as determined from documents uncovered in a previous crawl. Further, by reducing the number of Web pages or sites crawled, these techniques can be used to reduce the storage requirements of a repository and, therefore, of other downstream stored data structures. These techniques can instead be used later, in response to a query, so that a user is not annoyed by near-duplicate search results. These techniques may also be used to "fix" broken links. That is, if a document (e.g., a Web page) no longer exists at a particular location or URL, a link to a near-duplicate page can be provided.
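The query-time application can be illustrated with a minimal sketch. It is not the method claimed in the patent: it swaps in a simple, widely used stand-in (Jaccard similarity over word shingles) purely to show where a near-duplicate test would sit in a result list. The function names, the shingle size `k`, and the `threshold` value are all assumptions made for illustration.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased).

    Illustrative stand-in for a document fingerprint; k is an assumed value.
    """
    words = text.lower().split()
    if len(words) < k:
        # Short documents: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def filter_near_duplicates(results, threshold=0.8):
    """Keep each (url, text) result unless it nearly duplicates a kept one.

    threshold is a hypothetical cutoff, not a value taken from the patent.
    Returns the URLs of the kept results in their original order.
    """
    kept_urls = []
    kept_shingles = []
    for url, text in results:
        s = shingles(text)
        if any(jaccard(s, t) >= threshold for t in kept_shingles):
            continue  # near-duplicate of an earlier result: suppress it
        kept_urls.append(url)
        kept_shingles.append(s)
    return kept_urls
```

The same similarity test could in principle serve the other applications mentioned above, for example skipping a page during a crawl when it closely matches one stored from a previous crawl, or choosing a replacement target for a broken link. Comparing every pair of documents this way does not scale to large collections, which is the kind of cost the disclosed techniques are meant to reduce.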

