000 - LEADER |
fixed length control field |
09141nam a2200673 i 4500 |
001 - CONTROL NUMBER |
control field |
6813533 |
003 - CONTROL NUMBER IDENTIFIER |
control field |
IEEE |
005 - DATE AND TIME OF LATEST TRANSACTION |
control field |
20200413152911.0 |
006 - FIXED-LENGTH DATA ELEMENTS--ADDITIONAL MATERIAL CHARACTERISTICS |
fixed length control field |
m eo d |
007 - PHYSICAL DESCRIPTION FIXED FIELD--GENERAL INFORMATION |
fixed length control field |
cr cn |||m|||a |
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION |
fixed length control field |
130814s2013 caua foab 001 0 eng d |
020 ## - INTERNATIONAL STANDARD BOOK NUMBER |
International Standard Book Number |
9781627050791 (electronic bk.) |
020 ## - INTERNATIONAL STANDARD BOOK NUMBER |
Canceled/invalid ISBN |
9781627050784 (pbk.) |
024 7# - OTHER STANDARD IDENTIFIER |
Standard number or code |
10.2200/S00494ED1V01Y201304ICR027 |
Source of number or code |
doi |
035 ## - SYSTEM CONTROL NUMBER |
System control number |
(CaBNVSL)swl00402647 |
035 ## - SYSTEM CONTROL NUMBER |
System control number |
(OCoLC)855858906 |
040 ## - CATALOGING SOURCE |
Original cataloging agency |
CaBNVSL |
Transcribing agency |
CaBNVSL |
Modifying agency |
CaBNVSL |
050 #4 - LIBRARY OF CONGRESS CALL NUMBER |
Classification number |
ZA3075 |
Item number |
.R645 2013 |
082 04 - DEWEY DECIMAL CLASSIFICATION NUMBER |
Classification number |
025.04 |
Edition number |
23 |
090 ## - LOCALLY ASSIGNED LC-TYPE CALL NUMBER (OCLC); LOCAL CALL NUMBER (RLIN) |
Classification number (OCLC) (R) ; Classification number, CALL (RLIN) (NR) |
|
Local cutter number (OCLC) ; Book number/undivided call number, CALL (RLIN) |
MoCl |
100 1# - MAIN ENTRY--PERSONAL NAME |
Personal name |
Roelleke, Thomas. |
245 10 - TITLE STATEMENT |
Title |
Information retrieval models |
Medium |
[electronic resource] : |
Remainder of title |
foundations and relationships / |
Statement of responsibility, etc. |
Thomas Roelleke. |
260 ## - PUBLICATION, DISTRIBUTION, ETC. |
Place of publication, distribution, etc. |
San Rafael, Calif. (1537 Fourth Street, San Rafael, CA 94901 USA) : |
Name of publisher, distributor, etc. |
Morgan & Claypool, |
Date of publication, distribution, etc. |
c2013. |
300 ## - PHYSICAL DESCRIPTION |
Extent |
1 electronic text (xxi, 141 p.) : |
Other physical details |
ill., digital file. |
490 1# - SERIES STATEMENT |
Series statement |
Synthesis lectures on information concepts, retrieval, and services, |
International Standard Serial Number |
1947-9468 ; |
Volume/sequential designation |
# 27 |
538 ## - SYSTEM DETAILS NOTE |
System details note |
Mode of access: World Wide Web. |
538 ## - SYSTEM DETAILS NOTE |
System details note |
System requirements: Adobe Acrobat Reader. |
500 ## - GENERAL NOTE |
General note |
Part of: Synthesis digital library of engineering and computer science. |
500 ## - GENERAL NOTE |
General note |
Series from website. |
504 ## - BIBLIOGRAPHY, ETC. NOTE |
Bibliography, etc. note |
Includes bibliographical references (p. 127-134) and index. |
505 0# - FORMATTED CONTENTS NOTE |
Formatted contents note |
1. Introduction -- 1.1 Structure and contribution of this book -- 1.2 Background: a timeline of IR models -- 1.3 Notation -- 1.3.1 The notation issue "term frequency" -- 1.3.2 Notation: Zhai's book and this book -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
2. Foundations of IR models -- 2.1 TF-IDF -- 2.1.1 TF variants -- 2.1.2 TFlog: Logarithmic TF -- 2.1.3 TFfrac: fractional (ratio-based) TF -- 2.1.4 IDF variants -- 2.1.5 Term weight and RSV -- 2.1.6 Other TF variants: lifted TF and pivoted TF -- 2.1.7 Semi-subsumed event occurrences: a semantics of the BM25-TF -- 2.1.8 Probabilistic IDF: The probability of being informative -- 2.1.9 Summary -- 2.2 PRF: the probability of relevance framework -- 2.2.1 Feature independence assumption -- 2.2.2 Non-query term assumption -- 2.2.3 Term frequency split -- 2.2.4 Probability ranking principle (PRP) -- 2.2.5 Summary -- 2.3 BIR: binary independence retrieval -- 2.3.1 Term weight and RSV -- 2.3.2 Missing relevance information -- 2.3.3 Variants of the BIR term weight -- 2.3.4 Smooth variants of the BIR term weight -- 2.3.5 RSJ term weight -- 2.3.6 On theoretical arguments for 0.5 in the RSJ term weight -- 2.3.7 Summary -- 2.4 Poisson and 2-Poisson -- 2.4.1 Poisson probability -- 2.4.2 Poisson analogy: sunny days and term occurrences -- 2.4.3 Poisson example: toy data -- 2.4.4 Poisson example: TREC-2 -- 2.4.5 Binomial probability -- 2.4.6 Relationship between Poisson and binomial probability -- 2.4.7 Poisson PRF -- 2.4.8 Term weight and RSV -- 2.4.9 2-Poisson -- 2.4.10 Summary -- 2.5 BM25 -- 2.5.1 BM25-TF -- 2.5.2 BM25-TF and pivoted TF -- 2.5.3 BM25: literature and Wikipedia end 2012 -- 2.5.4 Term weight and RSV -- 2.5.5 Summary -- 2.6 LM: language modeling -- 2.6.1 Probability mixtures -- 2.6.2 Term weight and RSV: LM1 -- 2.6.3 Term weight and RSV: LM (normalized) -- 2.6.4 Term weight and RSV: JM-LM -- 2.6.5 Term weight and RSV: Dirich-LM -- 2.6.6 Term weight and RSV: LM2 -- 2.6.7 Summary -- 2.7 PIN's: probabilistic inference networks -- 2.7.1 The Turtle/Croft link matrix -- 2.7.2 Term weight and RSV -- 2.7.3 Summary -- 2.8 Divergence-based models and DFR -- 2.8.1 DFR: divergence from randomness -- 2.8.2 DFR: sampling over documents and locations -- 2.8.3 DFR: binomial transformation step -- 2.8.4 DFR and KL-divergence -- 2.8.5 Poisson as a model of randomness: P(Kt [greater than] 0/d,c): DFR-1 -- 2.8.6 Poisson as a model of randomness: P(Kt [equals] TFd/d,c): DFR-2 -- 2.8.7 DFR: elite documents -- 2.8.8 DFR: example -- 2.8.9 Term weights and RSV's -- 2.8.10 KL-divergence retrieval model -- 2.8.11 Summary -- 2.9 Relevance-based models -- 2.9.1 Rocchio's relevance feedback model -- 2.9.2 The PRF -- 2.9.3 Lavrenko's relevance-based language models -- 2.10 Precision and recall -- 2.10.1 Precision and recall: conditional probabilities -- 2.10.2 Averages: total probabilities -- 2.11 Summary -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
3. Relationships between IR models -- 3.1 PRF: the probability of relevance framework -- 3.1.1 Estimation of term probabilities -- 3.2 P(d -> q): the probability that d implies q -- 3.3 The vector-space model (VSM) -- 3.3.1 VSM and probabilities -- 3.4 The generalised vector-space model (GVSM) -- 3.4.1 GVSM and probabilities -- 3.5 A general matrix framework -- 3.5.1 Term-document matrix -- 3.5.2 On the notation issue "term frequency" -- 3.5.3 Document-document matrix -- 3.5.4 Co-occurrence matrices -- 3.6 A parallel derivation of probabilistic retrieval models -- 3.7 The Poisson bridge: Pd(t/u) avgtf(t,u) [equals] PL(t/u) avgdl(u) -- 3.8 Query term probability assumptions -- 3.8.1 Query term mixture assumption -- 3.8.2 Query term burstiness assumption -- 3.8.3 Query term BIR assumption -- 3.9 TF-IDF -- 3.9.1 TF-IDF and BIR -- 3.9.2 TF-IDF and Poisson -- 3.9.3 TF-IDF and BM25 -- 3.9.4 TF-IDF and LM -- 3.9.5 TF-IDF and LM: side-by-side -- 3.9.6 TF-IDF and PIN's -- 3.9.7 TF-IDF and divergence -- 3.9.8 TF-IDF and DFR: risk times gain -- 3.9.9 TF-IDF and DFR: gaps between term occurrences -- 3.10 More relationships: BM25 and LM, LM and PIN's -- 3.11 Information theory -- 3.11.1 Entropy -- 3.11.2 Joint entropy -- 3.11.3 Conditional entropy -- 3.11.4 Mutual information (MI) -- 3.11.5 Cross entropy -- 3.11.6 KL-divergence -- 3.11.7 Query clarity: divergence(query collection) -- 3.11.8 LM = Clarity(query) - Divergence(query doc) -- 3.11.9 TF-IDF = Clarity(doc) - Divergence(doc query) -- 3.12 Summary -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
4. Summary & research outlook -- 4.1 Summary -- 4.2 Research outlook -- 4.2.1 Retrieval models -- 4.2.2 Evaluation models -- 4.2.3 A unified framework for retrieval and evaluation -- 4.2.4 Model combinations and "new" models -- 4.2.5 Dependence-aware models -- 4.2.6 "Query-log" and other more-evidence models -- 4.2.7 Phase-2 models: retrieval result condensation models -- 4.2.8 A theoretical framework to predict ranking quality -- 4.2.9 MIR: math for IR -- 4.2.10 AIR: abstraction for IR -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
Bibliography -- Author's biography -- Index. |
506 1# - RESTRICTIONS ON ACCESS NOTE |
Terms governing access |
Abstract freely available; full-text restricted to subscribers or individual document purchasers. |
510 0# - CITATION/REFERENCES NOTE |
Name of source |
Compendex |
510 0# - CITATION/REFERENCES NOTE |
Name of source |
INSPEC |
510 0# - CITATION/REFERENCES NOTE |
Name of source |
Google scholar |
510 0# - CITATION/REFERENCES NOTE |
Name of source |
Google book search |
520 3# - SUMMARY, ETC. |
Summary, etc. |
Information Retrieval (IR) models are a core component of IR research and IR systems. The past decade brought a consolidation of the family of IR models, which by 2000 consisted of relatively isolated views on TF-IDF (Term-Frequency times Inverse-Document-Frequency) as the weighting scheme in the vector-space model (VSM), the probabilistic relevance framework (PRF), the binary independence retrieval (BIR) model, BM25 (Best-Match Version 25, the main instantiation of the PRF/BIR), and language modelling (LM). Also, the early 2000s saw the arrival of divergence from randomness (DFR). Regarding intuition and simplicity, though LM is clear from a probabilistic point of view, several people stated: "It is easy to understand TF-IDF and BM25. For LM, however, we understand the math, but we do not fully understand why it works." This book takes a horizontal approach gathering the foundations of TF-IDF, PRF, BIR, Poisson, BM25, LM, probabilistic inference networks (PIN's), and divergence-based models. The aim is to create a consolidated and balanced view on the main models. A particular focus of this book is on the "relationships between models." This includes an overview of the main frameworks (PRF, logical IR, VSM, generalized VSM) and a pairing of TF-IDF with other models. It becomes evident that TF-IDF and LM measure the same thing, namely the dependence (overlap) between document and query. The Poisson probability helps to establish probabilistic, non-heuristic roots for TF-IDF, and the Poisson parameter, average term frequency, is a binding link between several retrieval models and model parameters. |
530 ## - ADDITIONAL PHYSICAL FORM AVAILABLE NOTE |
Additional physical form available note |
Also available in print. |
588 ## - SOURCE OF DESCRIPTION NOTE |
Source of description note |
Title from PDF t.p. (viewed on August 14, 2013). |
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Information retrieval |
General subdivision |
Mathematical models. |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
Information Retrieval (IR) Models |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
Foundations & Relationships |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
TF-IDF |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
probability of relevance framework (PRF) |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
Poisson |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
BM25 |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
language modelling (LM) |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
divergence from randomness (DFR) |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
probabilistic roots of IR models |
776 08 - ADDITIONAL PHYSICAL FORM ENTRY |
Relationship information |
Print version: |
International Standard Book Number |
9781627050784 |
830 #0 - SERIES ADDED ENTRY--UNIFORM TITLE |
Uniform title |
Synthesis digital library of engineering and computer science. |
830 #0 - SERIES ADDED ENTRY--UNIFORM TITLE |
Uniform title |
Synthesis lectures on information concepts, retrieval, and services ; |
Volume/sequential designation |
# 27. |
International Standard Serial Number |
1947-9468 |
856 42 - ELECTRONIC LOCATION AND ACCESS |
Materials specified |
Abstract with links to resource |
Uniform Resource Identifier |
http://ieeexplore.ieee.org/servlet/opac?bknumber=6813533 |
856 40 - ELECTRONIC LOCATION AND ACCESS |
Materials specified |
Abstract with links to full text |
Uniform Resource Identifier |
http://dx.doi.org/10.2200/S00494ED1V01Y201304ICR027 |