000 - LEADER |
fixed length control field |
05427nam a2200697 i 4500 |
001 - CONTROL NUMBER |
control field |
7302713 |
003 - CONTROL NUMBER IDENTIFIER |
control field |
IEEE |
005 - DATE AND TIME OF LATEST TRANSACTION |
control field |
20200413152918.0 |
006 - FIXED-LENGTH DATA ELEMENTS--ADDITIONAL MATERIAL CHARACTERISTICS |
fixed length control field |
m eo d |
007 - PHYSICAL DESCRIPTION FIXED FIELD--GENERAL INFORMATION |
fixed length control field |
cr cn |||m|||a |
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION |
fixed length control field |
150917s2015 cau foab 000 0 eng d |
020 ## - INTERNATIONAL STANDARD BOOK NUMBER |
International Standard Book Number |
9781627058094 |
Qualifying information |
ebook |
020 ## - INTERNATIONAL STANDARD BOOK NUMBER |
Canceled/invalid ISBN |
9781627058087 |
Qualifying information |
print |
024 7# - OTHER STANDARD IDENTIFIER |
Standard number or code |
10.2200/S00661ED1V01Y201508ICR044 |
Source of number or code |
doi |
035 ## - SYSTEM CONTROL NUMBER |
System control number |
(CaBNVSL)swl00405557 |
035 ## - SYSTEM CONTROL NUMBER |
System control number |
(OCoLC)921518060 |
040 ## - CATALOGING SOURCE |
Original cataloging agency |
CaBNVSL |
Language of cataloging |
eng |
Description conventions |
rda |
Transcribing agency |
CaBNVSL |
Modifying agency |
CaBNVSL |
050 #4 - LIBRARY OF CONGRESS CALL NUMBER |
Classification number |
TK5105.884 |
Item number |
.M256 2015 |
082 04 - DEWEY DECIMAL CLASSIFICATION NUMBER |
Classification number |
025.04 |
Edition number |
23 |
100 1# - MAIN ENTRY--PERSONAL NAME |
Personal name |
Manasse, Mark S., |
Relator term |
author. |
245 10 - TITLE STATEMENT |
Title |
On the efficient determination of most near neighbors : |
Remainder of title |
horseshoes, hand grenades, Web search, and other situations when close is close enough / |
Statement of responsibility, etc. |
Mark S. Manasse. |
250 ## - EDITION STATEMENT |
Edition statement |
Second edition. |
264 #1 - PRODUCTION, PUBLICATION, DISTRIBUTION, MANUFACTURE, AND COPYRIGHT NOTICE |
Place of production, publication, distribution, manufacture |
San Rafael, California (1537 Fourth Street, San Rafael, CA 94901 USA) : |
Name of producer, publisher, distributor, manufacturer |
Morgan & Claypool, |
Date of production, publication, distribution, manufacture, or copyright notice |
2015. |
300 ## - PHYSICAL DESCRIPTION |
Extent |
1 PDF (xix, 80 pages) |
336 ## - CONTENT TYPE |
Content type term |
text |
Source |
rdacontent |
337 ## - MEDIA TYPE |
Media type term |
electronic |
Source |
isbdmedia |
338 ## - CARRIER TYPE |
Carrier type term |
online resource |
Source |
rdacarrier |
490 1# - SERIES STATEMENT |
Series statement |
Synthesis lectures on information concepts, retrieval, and services, |
International Standard Serial Number |
1947-9468 ; |
Volume/sequential designation |
# 44 |
538 ## - SYSTEM DETAILS NOTE |
System details note |
Mode of access: World Wide Web. |
538 ## - SYSTEM DETAILS NOTE |
System details note |
System requirements: Adobe Acrobat Reader. |
500 ## - GENERAL NOTE |
General note |
Part of: Synthesis digital library of engineering and computer science. |
504 ## - BIBLIOGRAPHY, ETC. NOTE |
Bibliography, etc. note |
Includes bibliographical references (pages 75-77). |
505 0# - FORMATTED CONTENTS NOTE |
Formatted contents note |
1. Introduction -- 1.1 On similarity, resemblance, look-alikes, and entity resolution -- 1.2 You must know at least this much math to read this book -- 1.3 Cumulative distribution and probability density functions -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
2. Comparing web pages for similarity: an overview -- 2.1 Choosing the features of a web page to compare -- 2.2 Turning features into integers (Rabin hashing) -- 2.3 How should we measure the proximity of features? -- 2.4 Feature reduction -- 2.5 Putting it together with supershingling -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
3. A personal history of web search -- 3.1 Complexity issues and implementation -- 3.2 Implementing duplicate suppression -- 3.3 Rabin hashing revisited -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
4. Uniform sampling after Alta Vista -- 4.1 Using less randomness to improve sampling efficiency -- 4.2 Conjectures vs. theorems -- 4.3 Finding the first point of divergence efficiently -- 4.4 Uniform consistent sampling summarized -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
5. Why weight (and how)? -- 5.1 Constant expected-time consistent weighted sampling -- 5.2 Constant time consistent weighted sampling -- 5.3 Accelerating weighted sampling -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
6. A few applications -- 6.1 Web deduplication -- 6.2 File systems: winnowing and friends -- 6.3 Further applications -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
7. Forks in the road: Flajolet and slightly biased sampling -- 7.1 Flajolet-Martin -- 7.2 Li's rediscovery -- 7.3 Approximation by randomized rounding -- 7.4 Scaling -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
Afterword -- Bibliography -- Author's biography. |
506 1# - RESTRICTIONS ON ACCESS NOTE |
Terms governing access |
Abstract freely available; full-text restricted to subscribers or individual document purchasers. |
510 0# - CITATION/REFERENCES NOTE |
Name of source |
Compendex |
510 0# - CITATION/REFERENCES NOTE |
Name of source |
INSPEC |
510 0# - CITATION/REFERENCES NOTE |
Name of source |
Google Scholar |
510 0# - CITATION/REFERENCES NOTE |
Name of source |
Google Book Search |
520 3# - SUMMARY, ETC. |
Summary, etc. |
The time-worn aphorism "close only counts in horseshoes and hand grenades" is clearly inadequate. Close also counts in golf, shuffleboard, archery, darts, curling, and other games of accuracy in which hitting the precise center of the target isn't to be expected every time, or in which we can expect to be driven from the target by skilled opponents. This book is not devoted to sports discussions, but to efficient algorithms for determining pairs of closely related web pages, and a few other situations in which we have found that inexact matching is good enough - where proximity suffices. We will not, however, attempt to be comprehensive in the investigation of probabilistic algorithms, approximation algorithms, or even techniques for organizing the discovery of nearest neighbors. We are more concerned with finding nearby neighbors; if they are not particularly close by, we are not particularly interested. In thinking of when approximation is sufficient, remember the oft-told joke about two campers sitting around after dinner. They hear noises coming towards them. One of them reaches for a pair of running shoes, and starts to don them. The second then notes that even with running shoes, they cannot hope to outrun a bear, to which the first notes that most likely the bear will be satiated after catching the slower of them. We seek problems in which we don't need to be faster than the bear, just faster than the others fleeing the bear. |
530 ## - ADDITIONAL PHYSICAL FORM AVAILABLE NOTE |
Additional physical form available note |
Also available in print. |
588 ## - SOURCE OF DESCRIPTION NOTE |
Source of description note |
Title from PDF title page (viewed on September 17, 2015). |
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Internet searching. |
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Nearest neighbor analysis (Statistics) |
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Statistical matching. |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
nearest neighbor |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
search algorithms |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
information retrieval |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
IR |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
multi-dimensional |
776 08 - ADDITIONAL PHYSICAL FORM ENTRY |
Relationship information |
Print version: |
International Standard Book Number |
9781627058087 |
830 #0 - SERIES ADDED ENTRY--UNIFORM TITLE |
Uniform title |
Synthesis digital library of engineering and computer science. |
830 #0 - SERIES ADDED ENTRY--UNIFORM TITLE |
Uniform title |
Synthesis lectures on information concepts, retrieval, and services ; |
Volume/sequential designation |
# 44. |
International Standard Serial Number |
1947-9468 |
856 42 - ELECTRONIC LOCATION AND ACCESS |
Materials specified |
Abstract with links to resource |
Uniform Resource Identifier |
http://ieeexplore.ieee.org/servlet/opac?bknumber=7302713 |