Google Patent On Detecting Duplicate Files - Conclusions
United States Patent 6,658,423
As can be appreciated from the foregoing, improved near-duplicate detection techniques are disclosed. These near-duplicate detection techniques are robust, and reduce processing and storage requirements. Such reduced processing and storage requirements are particularly important when processing large document collections.
The near-duplicate detection techniques have a number of important practical applications. In the context of a search engine, for example, these techniques can be used during a crawling operation to speed up the crawl and to save bandwidth by not crawling near-duplicate Web pages or sites, as determined from documents uncovered in a previous crawl. Further, by reducing the number of Web pages or sites crawled, these techniques can be used to reduce the storage requirements of a repository and, therefore, of other downstream stored data structures. These techniques can instead be used later, in response to a query, so that the user is not annoyed by near-duplicate search results. These techniques may also be used to "fix" broken links. That is, if a document (e.g., a Web page) no longer exists at a particular location or URL, a link to a near-duplicate page can be provided.
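To make two of these applications concrete, here is a minimal Python sketch of filtering near-duplicate search results and suggesting a replacement for a broken link. It does not implement the method claimed in the patent. It substitutes a simple, well-known stand-in: Jaccard similarity over word shingles. The function names, the shingle size of 3, and the 0.8 threshold are all assumptions chosen for illustration.

```python
# Illustrative sketch only: word-shingle Jaccard similarity is used as a
# stand-in similarity measure. It is NOT the technique claimed in the patent.
# All names, the shingle size k=3, and the 0.8 threshold are hypothetical.

def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts yield a single shingle (or none if the text is empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a, doc_b, threshold=0.8):
    """True if the two texts share at least `threshold` of their shingles."""
    return jaccard(shingles(doc_a), shingles(doc_b)) >= threshold


def filter_results(results, threshold=0.8):
    """Query-time use: drop results that near-duplicate an earlier result.

    `results` is a list of (url, text) pairs in ranked order.
    """
    kept = []
    for url, text in results:
        if not any(is_near_duplicate(text, kept_text, threshold)
                   for _, kept_text in kept):
            kept.append((url, text))
    return kept


def suggest_replacement(dead_text, live_pages, threshold=0.8):
    """Broken-link use: return the URL of the most similar live page.

    `dead_text` is the last known content of the missing page and
    `live_pages` is a list of (url, text) pairs. Returns None if no live
    page reaches the threshold.
    """
    dead = shingles(dead_text)
    best_url, best_score = None, threshold
    for url, text in live_pages:
        score = jaccard(dead, shingles(text))
        if score >= best_score:
            best_url, best_score = url, score
    return best_url
```

The pairwise comparison here is quadratic in the number of documents, so it only suits small result lists. Avoiding that cost on large collections is exactly the reduction in processing and storage that the patent emphasizes.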