Welcome to P K Kelkar Library, Online Public Access Catalogue (OPAC)

Normal view MARC view ISBD view

On the efficient determination of most near neighbors : horseshoes, hand grenades, Web search, and other situations when close is close enough /

By: Manasse, Mark S.
Material type: materialTypeLabelBookSeries: Synthesis digital library of engineering and computer science: ; Synthesis lectures on information concepts, retrieval, and services: # 24.Publisher: San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool, c2012Description: 1 electronic text (xv, 72 p.) : ill., digital file.ISBN: 9781608450893 (electronic bk.).Subject(s): Internet searching | Nearest neighbor analysis (Statistics) | Statistical matching | nearest neighbor | search algorithms | information retrieval | IR | multi-dimensionalDDC classification: 025.04 Online resources: Abstract with links to resource Also available in print.
Contents:
1. Introduction -- 1.1 On similarity, resemblance, look-alikes, and entity resolution -- 1.2 You must know at least this much math to read this book -- 1.3 Cumulative distribution and probability density functions --
2. Comparing web pages for similarity: an overview -- 2.1 Choosing the features of a web page to compare -- 2.2 Turning features into integers (Rabin Hashing) -- 2.3 How should we measure the proximity of features? -- 2.4 Feature reduction -- 2.5 Putting it together with supershingling --
3. A personal history of web search -- 3.1 Complexity issues and implementation -- 3.2 Implementing duplicate suppression -- 3.3 Rabin Hashing revisited --
4. Uniform sampling after Alta Vista -- 4.1 Using less randomness, to improve sampling efficiency -- 4.2 Conjectures vs. theorems -- 4.3 Finding the first point of divergence efficiently -- 4.4 Uniform consistent sampling summarized --
5. Why weight (and how)? -- 5.1 Constant expected time weighted consistent sampling -- 5.2 Constant time weighted consistent sampling -- 5.3 Accelerating weighted sampling --
6. A few applications -- 6.1 Web deduplication -- 6.2 File systems: winnowing and friends -- 6.3 Further applications --
Afterword -- Bibliography -- Author's biography.
Abstract: The time-worn aphorism "close only counts in horseshoes and hand-grenades" is clearly inadequate. Close also counts in golf, shuffleboard, archery, darts, curling, and other games of accuracy in which hitting the precise center of the target isn't to be expected every time, or in which we can expect to be driven from the target by skilled opponents. This lecture is not devoted to sports discussions, but to efficient algorithms for determining pairs of closely related web pages--and a few other situations in which we have found that inexact matching is good enough; where proximity suffices. We will not, however, attempt to be comprehensive in the investigation of probabilistic algorithms, approximation algorithms, or even techniques for organizing the discovery of nearest neighbors.We are more concerned with finding nearby neighbors; if they are not particularly close by, we are not particularly interested.
    average rating: 0.0 (0 votes)
Item type Current location Call number Status Date due Barcode Item holds
E books E books PK Kelkar Library, IIT Kanpur
Available EBKE451
Total holds: 0

Mode of access: World Wide Web.

System requirements: Adobe Acrobat Reader.

Part of: Synthesis digital library of engineering and computer science.

Series from website.

Includes bibliographical references (p. 69-70).

1. Introduction -- 1.1 On similarity, resemblance, look-alikes, and entity resolution -- 1.2 You must know at least this much math to read this book -- 1.3 Cumulative distribution and probability density functions --

2. Comparing web pages for similarity: an overview -- 2.1 Choosing the features of a web page to compare -- 2.2 Turning features into integers (Rabin Hashing) -- 2.3 How should we measure the proximity of features? -- 2.4 Feature reduction -- 2.5 Putting it together with supershingling --

3. A personal history of web search -- 3.1 Complexity issues and implementation -- 3.2 Implementing duplicate suppression -- 3.3 Rabin Hashing revisited --

4. Uniform sampling after Alta Vista -- 4.1 Using less randomness, to improve sampling efficiency -- 4.2 Conjectures vs. theorems -- 4.3 Finding the first point of divergence efficiently -- 4.4 Uniform consistent sampling summarized --

5. Why weight (and how)? -- 5.1 Constant expected time weighted consistent sampling -- 5.2 Constant time weighted consistent sampling -- 5.3 Accelerating weighted sampling --

6. A few applications -- 6.1 Web deduplication -- 6.2 File systems: winnowing and friends -- 6.3 Further applications --

Afterword -- Bibliography -- Author's biography.

Abstract freely available; full-text restricted to subscribers or individual document purchasers.

Compendex

INSPEC

Google scholar

Google book search

The time-worn aphorism "close only counts in horseshoes and hand-grenades" is clearly inadequate. Close also counts in golf, shuffleboard, archery, darts, curling, and other games of accuracy in which hitting the precise center of the target isn't to be expected every time, or in which we can expect to be driven from the target by skilled opponents. This lecture is not devoted to sports discussions, but to efficient algorithms for determining pairs of closely related web pages--and a few other situations in which we have found that inexact matching is good enough; where proximity suffices. We will not, however, attempt to be comprehensive in the investigation of probabilistic algorithms, approximation algorithms, or even techniques for organizing the discovery of nearest neighbors.We are more concerned with finding nearby neighbors; if they are not particularly close by, we are not particularly interested.

Also available in print.

Title from PDF t.p. (viewed on December 10, 2012).

There are no comments for this item.

Log in to your account to post a comment.

Powered by Koha