000 04626nam a2200649 i 4500
001 6813539
003 IEEE
005 20200413152908.0
006 m eo d
007 cr cn |||m|||a
008 121210s2012 caua foab 000 0 eng d
020 _a9781608450893 (electronic bk.)
020 _z9781608450886 (pbk.)
024 7 _a10.2200/S00444ED1V01Y201208ICR024
_2doi
035 _a(CaBNVSL)swl00401758
035 _a(OCoLC)819423537
040 _aCaBNVSL
_cCaBNVSL
_dCaBNVSL
050 4 _aTK5105.884
_b.M256 2012
082 0 4 _a025.04
_223
100 1 _aManasse, Mark S.
245 1 0 _aOn the efficient determination of most near neighbors
_h[electronic resource] :
_bhorseshoes, hand grenades, Web search, and other situations when close is close enough /
_cMark S. Manasse.
260 _aSan Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) :
_bMorgan & Claypool,
_cc2012.
300 _a1 electronic text (xv, 72 p.) :
_bill., digital file.
490 1 _aSynthesis lectures on information concepts, retrieval, and services,
_x1947-9468 ;
_v# 24
538 _aMode of access: World Wide Web.
538 _aSystem requirements: Adobe Acrobat Reader.
500 _aPart of: Synthesis digital library of engineering and computer science.
500 _aSeries from website.
504 _aIncludes bibliographical references (p. 69-70).
505 0 _a1. Introduction -- 1.1 On similarity, resemblance, look-alikes, and entity resolution -- 1.2 You must know at least this much math to read this book -- 1.3 Cumulative distribution and probability density functions --
505 8 _a2. Comparing web pages for similarity: an overview -- 2.1 Choosing the features of a web page to compare -- 2.2 Turning features into integers (Rabin Hashing) -- 2.3 How should we measure the proximity of features? -- 2.4 Feature reduction -- 2.5 Putting it together with supershingling --
505 8 _a3. A personal history of web search -- 3.1 Complexity issues and implementation -- 3.2 Implementing duplicate suppression -- 3.3 Rabin Hashing revisited --
505 8 _a4. Uniform sampling after Alta Vista -- 4.1 Using less randomness, to improve sampling efficiency -- 4.2 Conjectures vs. theorems -- 4.3 Finding the first point of divergence efficiently -- 4.4 Uniform consistent sampling summarized --
505 8 _a5. Why weight (and how)? -- 5.1 Constant expected time weighted consistent sampling -- 5.2 Constant time weighted consistent sampling -- 5.3 Accelerating weighted sampling --
505 8 _a6. A few applications -- 6.1 Web deduplication -- 6.2 File systems: winnowing and friends -- 6.3 Further applications --
505 8 _aAfterword -- Bibliography -- Author's biography.
506 1 _aAbstract freely available; full-text restricted to subscribers or individual document purchasers.
510 0 _aCompendex
510 0 _aINSPEC
510 0 _aGoogle scholar
510 0 _aGoogle book search
520 3 _aThe time-worn aphorism "close only counts in horseshoes and hand-grenades" is clearly inadequate. Close also counts in golf, shuffleboard, archery, darts, curling, and other games of accuracy in which hitting the precise center of the target isn't to be expected every time, or in which we can expect to be driven from the target by skilled opponents. This lecture is not devoted to sports discussions, but to efficient algorithms for determining pairs of closely related web pages--and a few other situations in which we have found that inexact matching is good enough; where proximity suffices. We will not, however, attempt to be comprehensive in the investigation of probabilistic algorithms, approximation algorithms, or even techniques for organizing the discovery of nearest neighbors.We are more concerned with finding nearby neighbors; if they are not particularly close by, we are not particularly interested.
530 _aAlso available in print.
588 _aTitle from PDF t.p. (viewed on December 10, 2012).
650 0 _aInternet searching.
650 0 _aNearest neighbor analysis (Statistics)
650 0 _aStatistical matching.
653 _anearest neighbor
653 _asearch algorithms
653 _ainformation retrieval
653 _aIR
653 _amulti-dimensional
776 0 8 _iPrint version:
_z9781608450886
830 0 _aSynthesis digital library of engineering and computer science.
830 0 _aSynthesis lectures on information concepts, retrieval, and services ;
_v# 24.
_x1947-9468
856 4 2 _3Abstract with links to resource
_uhttp://ieeexplore.ieee.org/servlet/opac?bknumber=6813539
999 _c561951
_d561951