000 - LEADER |
fixed length control field |
04626nam a2200649 i 4500 |
001 - CONTROL NUMBER |
control field |
6813539 |
003 - CONTROL NUMBER IDENTIFIER |
control field |
IEEE |
005 - DATE AND TIME OF LATEST TRANSACTION |
control field |
20200413152908.0 |
006 - FIXED-LENGTH DATA ELEMENTS--ADDITIONAL MATERIAL CHARACTERISTICS |
fixed length control field |
m eo d |
007 - PHYSICAL DESCRIPTION FIXED FIELD--GENERAL INFORMATION |
fixed length control field |
cr cn |||m|||a |
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION |
fixed length control field |
121210s2012 caua foab 000 0 eng d |
020 ## - INTERNATIONAL STANDARD BOOK NUMBER |
International Standard Book Number |
9781608450893 (electronic bk.) |
020 ## - INTERNATIONAL STANDARD BOOK NUMBER |
Canceled/invalid ISBN |
9781608450886 (pbk.) |
024 7# - OTHER STANDARD IDENTIFIER |
Standard number or code |
10.2200/S00444ED1V01Y201208ICR024 |
Source of number or code |
doi |
035 ## - SYSTEM CONTROL NUMBER |
System control number |
(CaBNVSL)swl00401758 |
035 ## - SYSTEM CONTROL NUMBER |
System control number |
(OCoLC)819423537 |
040 ## - CATALOGING SOURCE |
Original cataloging agency |
CaBNVSL |
Transcribing agency |
CaBNVSL |
Modifying agency |
CaBNVSL |
050 #4 - LIBRARY OF CONGRESS CALL NUMBER |
Classification number |
TK5105.884 |
Item number |
.M256 2012 |
082 04 - DEWEY DECIMAL CLASSIFICATION NUMBER |
Classification number |
025.04 |
Edition number |
23 |
100 1# - MAIN ENTRY--PERSONAL NAME |
Personal name |
Manasse, Mark S. |
245 10 - TITLE STATEMENT |
Title |
On the efficient determination of most near neighbors |
Medium |
[electronic resource] : |
Remainder of title |
horseshoes, hand grenades, Web search, and other situations when close is close enough / |
Statement of responsibility, etc. |
Mark S. Manasse. |
260 ## - PUBLICATION, DISTRIBUTION, ETC. |
Place of publication, distribution, etc. |
San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) : |
Name of publisher, distributor, etc. |
Morgan & Claypool, |
Date of publication, distribution, etc. |
c2012. |
300 ## - PHYSICAL DESCRIPTION |
Extent |
1 electronic text (xv, 72 p.) : |
Other physical details |
ill., digital file. |
490 1# - SERIES STATEMENT |
Series statement |
Synthesis lectures on information concepts, retrieval, and services, |
International Standard Serial Number |
1947-9468 ; |
Volume/sequential designation |
# 24 |
538 ## - SYSTEM DETAILS NOTE |
System details note |
Mode of access: World Wide Web. |
538 ## - SYSTEM DETAILS NOTE |
System details note |
System requirements: Adobe Acrobat Reader. |
500 ## - GENERAL NOTE |
General note |
Part of: Synthesis digital library of engineering and computer science. |
500 ## - GENERAL NOTE |
General note |
Series from website. |
504 ## - BIBLIOGRAPHY, ETC. NOTE |
Bibliography, etc. note |
Includes bibliographical references (p. 69-70). |
505 0# - FORMATTED CONTENTS NOTE |
Formatted contents note |
1. Introduction -- 1.1 On similarity, resemblance, look-alikes, and entity resolution -- 1.2 You must know at least this much math to read this book -- 1.3 Cumulative distribution and probability density functions -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
2. Comparing web pages for similarity: an overview -- 2.1 Choosing the features of a web page to compare -- 2.2 Turning features into integers (Rabin Hashing) -- 2.3 How should we measure the proximity of features? -- 2.4 Feature reduction -- 2.5 Putting it together with supershingling -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
3. A personal history of web search -- 3.1 Complexity issues and implementation -- 3.2 Implementing duplicate suppression -- 3.3 Rabin Hashing revisited -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
4. Uniform sampling after Alta Vista -- 4.1 Using less randomness, to improve sampling efficiency -- 4.2 Conjectures vs. theorems -- 4.3 Finding the first point of divergence efficiently -- 4.4 Uniform consistent sampling summarized -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
5. Why weight (and how)? -- 5.1 Constant expected time weighted consistent sampling -- 5.2 Constant time weighted consistent sampling -- 5.3 Accelerating weighted sampling -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
6. A few applications -- 6.1 Web deduplication -- 6.2 File systems: winnowing and friends -- 6.3 Further applications -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
Afterword -- Bibliography -- Author's biography. |
506 1# - RESTRICTIONS ON ACCESS NOTE |
Terms governing access |
Abstract freely available; full-text restricted to subscribers or individual document purchasers. |
510 0# - CITATION/REFERENCES NOTE |
Name of source |
Compendex |
510 0# - CITATION/REFERENCES NOTE |
Name of source |
INSPEC |
510 0# - CITATION/REFERENCES NOTE |
Name of source |
Google scholar |
510 0# - CITATION/REFERENCES NOTE |
Name of source |
Google book search |
520 3# - SUMMARY, ETC. |
Summary, etc. |
The time-worn aphorism "close only counts in horseshoes and hand-grenades" is clearly inadequate. Close also counts in golf, shuffleboard, archery, darts, curling, and other games of accuracy in which hitting the precise center of the target isn't to be expected every time, or in which we can expect to be driven from the target by skilled opponents. This lecture is not devoted to sports discussions, but to efficient algorithms for determining pairs of closely related web pages--and a few other situations in which we have found that inexact matching is good enough; where proximity suffices. We will not, however, attempt to be comprehensive in the investigation of probabilistic algorithms, approximation algorithms, or even techniques for organizing the discovery of nearest neighbors. We are more concerned with finding nearby neighbors; if they are not particularly close by, we are not particularly interested. |
530 ## - ADDITIONAL PHYSICAL FORM AVAILABLE NOTE |
Additional physical form available note |
Also available in print. |
588 ## - SOURCE OF DESCRIPTION NOTE |
Source of description note |
Title from PDF t.p. (viewed on December 10, 2012). |
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Internet searching. |
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Nearest neighbor analysis (Statistics) |
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Statistical matching. |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
nearest neighbor |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
search algorithms |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
information retrieval |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
IR |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
multi-dimensional |
776 08 - ADDITIONAL PHYSICAL FORM ENTRY |
Relationship information |
Print version: |
International Standard Book Number |
9781608450886 |
830 #0 - SERIES ADDED ENTRY--UNIFORM TITLE |
Uniform title |
Synthesis digital library of engineering and computer science. |
830 #0 - SERIES ADDED ENTRY--UNIFORM TITLE |
Uniform title |
Synthesis lectures on information concepts, retrieval, and services ; |
Volume/sequential designation |
# 24. |
International Standard Serial Number |
1947-9468 |
856 42 - ELECTRONIC LOCATION AND ACCESS |
Materials specified |
Abstract with links to resource |
Uniform Resource Identifier |
http://ieeexplore.ieee.org/servlet/opac?bknumber=6813539 |