MARC View

000			05427nam a2200697 i 4500
001			7302713
003			IEEE
005			20200413152918.0
006			m eo d
007			cr cn \|\|\|m\|\|\|a
008			150917s2015 cau foab 000 0 eng d
020			_a9781627058094 _qebook
020			_z9781627058087 _qprint
024	7		_a10.2200/S00661ED1V01Y201508ICR044 _2doi
035			_a(CaBNVSL)swl00405557
035			_a(OCoLC)921518060
040			_aCaBNVSL _beng _erda _cCaBNVSL _dCaBNVSL
050		4	_aTK5105.884 _b.M256 2015
082	0	4	_a025.04 _223
100	1		_aManasse, Mark S., _eauthor.
245	1	0	_aOn the efficient determination of most near neighbors : _bhorseshoes, hand grenades, Web search, and other situations when close is close enough / _cMark S. Manasse.
250			_aSecond edition.
264		1	_aSan Rafael, California (1537 Fourth Street, San Rafael, CA 94901 USA) : _bMorgan & Claypool, _c2015.
300			_a1 PDF (xix, 80 pages)
336			_atext _2rdacontent
337			_aelectronic _2isbdmedia
338			_aonline resource _2rdacarrier
490	1		_aSynthesis lectures on information concepts, retrieval, and services, _x1947-9468 ; _v# 44
538			_aMode of access: World Wide Web.
538			_aSystem requirements: Adobe Acrobat Reader.
500			_aPart of: Synthesis digital library of engineering and computer science.
504			_aIncludes bibliographical references (pages 75-77).
505	0		_a1. Introduction -- 1.1 On similarity, resemblance, look-alikes, and entity resolution -- 1.2 You must know at least this much math to read this book -- 1.3 Cumulative distribution and probability density functions --
505	8		_a2. Comparing web pages for similarity: an overview -- 2.1 Choosing the features of a web page to compare -- 2.2 Turning features into integers (Rabin hashing) -- 2.3 How should we measure the proximity of features? -- 2.4 Feature reduction -- 2.5 Putting it together with supershingling --
505	8		_a3. A personal history of web search -- 3.1 Complexity issues and implementation -- 3.2 Implementing duplicate suppression -- 3.3 Rabin hashing revisited --
505	8		_a4. Uniform sampling after Alta Vista -- 4.1 Using less randomness to improve sampling efficiency -- 4.2 Conjectures vs. theorems -- 4.3 Finding the first point of divergence efficiently -- 4.4 Uniform consistent sampling summarized --
505	8		_a5. Why weight (and how)? -- 5.1 Constant expected-time consistent weighted sampling -- 5.2 Constant time consistent weighted sampling -- 5.3 Accelerating weighted sampling --
505	8		_a6. A few applications -- 6.1 Web deduplication -- 6.2 File systems: winnowing and friends -- 6.3 Further applications --
505	8		_a7. Forks in the road: Flajolet and slightly biased sampling -- 7.1 Flajolet-Martin -- 7.2 Li's rediscovery -- 7.3 Approximation by randomized rounding -- 7.4 Scaling --
505	8		_aAfterword -- Bibliography -- Author's biography.
506	1		_aAbstract freely available; full-text restricted to subscribers or individual document purchasers.
510	0		_aCompendex
510	0		_aINSPEC
510	0		_aGoogle scholar
510	0		_aGoogle book search
520	3		_aThe time-worn aphorism "close only counts in horseshoes and hand grenades" is clearly inadequate. Close also counts in golf, shuffleboard, archery, darts, curling, and other games of accuracy in which hitting the precise center of the target isn't to be expected every time, or in which we can expect to be driven from the target by skilled opponents. This book is not devoted to sports discussions, but to efficient algorithms for determining pairs of closely related web pages, and a few other situations in which we have found that inexact matching is good enough - where proximity suffices. We will not, however, attempt to be comprehensive in the investigation of probabilistic algorithms, approximation algorithms, or even techniques for organizing the discovery of nearest neighbors. We are more concerned with finding nearby neighbors; if they are not particularly close by, we are not particularly interested. In thinking of when approximation is sufficient, remember the oft-told joke about two campers sitting around after dinner. They hear noises coming towards them. One of them reaches for a pair of running shoes, and starts to don them. The second then notes that even with running shoes, they cannot hope to outrun a bear, to which the first notes that most likely the bear will be satiated after catching the slower of them. We seek problems in which we don't need to be faster than the bear, just faster than the others fleeing the bear.
530			_aAlso available in print.
588			_aTitle from PDF title page (viewed on September 17, 2015).
650		0	_aInternet searching.
650		0	_aNearest neighbor analysis (Statistics)
650		0	_aStatistical matching.
653			_anearest neighbor
653			_asearch algorithms
653			_ainformation retrieval
653			_aIR
653			_amulti-dimensional
776	0	8	_iPrint version: _z9781627058087
830		0	_aSynthesis digital library of engineering and computer science.
830		0	_aSynthesis lectures on information concepts, retrieval, and services ; _v# 44. _x1947-9468
856	4	2	_3Abstract with links to resource _uhttp://ieeexplore.ieee.org/servlet/opac?bknumber=7302713
999			_c562157 _d562157