000 | 06856nam a2201477 i 4500 | ||
---|---|---|---|
001 | 7374857 | ||
003 | IEEE | ||
005 | 20200413152920.0 | ||
006 | m eo d | ||
007 | cr cn |||m|||a | ||
008 | 160122s2016 caua foab 000 0 eng d | ||
020 |
_a9781627058131 _qebook |
||
020 |
_z9781627058124 _qprint |
||
024 | 7 |
_a10.2200/S00662ED1V01Y201508ICR045 _2doi |
|
035 | _a(CaBNVSL)swl00406108 | ||
035 | _a(OCoLC)935806030 | ||
040 |
_aCaBNVSL _beng _erda _cCaBNVSL _dCaBNVSL |
||
050 | 4 |
_aTK5105.884 _b.C257 2016 |
|
082 | 0 | 4 |
_a005.758 _223 |
100 | 1 |
_aCambazoglu, B. Barla, _eauthor. |
|
245 | 1 | 0 |
_aScalability challenges in web search engines / _cB. Barla Cambazoglu, Ricardo Baeza-Yates. |
264 | 1 |
_aSan Rafael, California (1537 Fourth Street, San Rafael, CA 94901 USA) : _bMorgan & Claypool, _c2016. |
|
300 |
_a1 PDF (xv, 122 pages) : _billustrations. |
||
336 |
_atext _2rdacontent |
||
337 |
_aelectronic _2isbdmedia |
||
338 |
_aonline resource _2rdacarrier |
||
490 | 1 |
_aSynthesis lectures on information concepts, retrieval, and services, _x1947-9468 ; _v# 45 |
|
538 | _aMode of access: World Wide Web. | ||
538 | _aSystem requirements: Adobe Acrobat Reader. | ||
500 | _aPart of: Synthesis digital library of engineering and computer science. | ||
504 | _aIncludes bibliographical references (pages 93-120). | ||
505 | 0 | _a1. Introduction -- 1.1 Web search business -- 1.2 Basic search engine architecture -- 1.3 Scalability issues -- | |
505 | 8 | _a2. The web crawling system -- 2.1 Basic web crawling architecture -- 2.2 Extending the web repository -- 2.3 Refreshing the web repository -- 2.4 Managing the web repository -- 2.5 Distributed web crawling -- 2.6 Factors affecting crawling performance -- 2.7 Literature on web crawling -- 2.8 Open issues in web crawling -- | |
505 | 8 | _a3. The indexing system -- 3.1 Basic indexing architecture -- 3.2 Inverted index -- 3.3 Compressing an inverted index -- 3.4 Constructing an inverted index -- 3.5 Updating an inverted index -- 3.6 Partitioning an inverted index -- 3.7 Literature on indexing -- 3.8 Open issues in indexing -- | |
505 | 8 | _a4. The query processing system -- 4.1 Basic query processing architecture -- 4.2 Query processing on a search node -- 4.3 Query processing in a search cluster -- 4.4 Architectural optimizations -- 4.5 Caching -- 4.6 Query processing on multiple search sites -- 4.7 Literature on query processing -- 4.8 Open issues in query processing -- | |
505 | 8 | _a5. Concluding remarks -- Bibliography -- Authors' biographies. | |
506 | 1 | _aAbstract freely available; full-text restricted to subscribers or individual document purchasers. | |
510 | 0 | _aCompendex | |
510 | 0 | _aINSPEC | |
510 | 0 | _aGoogle Scholar | |
510 | 0 | _aGoogle Book Search | |
520 | 3 | _aIn this book, we aim to provide a fairly comprehensive overview of the scalability and efficiency challenges in large-scale web search engines. More specifically, we cover the issues involved in the design of three separate systems that are commonly available in every web-scale search engine: web crawling, indexing, and query processing systems. We present the performance challenges encountered in these systems and review a wide range of design alternatives employed as solutions to these challenges, specifically focusing on algorithmic and architectural optimizations. We discuss the available optimizations at different computational granularities, ranging from a single computer node to a collection of data centers. We provide some hints to both the practitioners and theoreticians involved in the field about the way large-scale web search engines operate and the adopted design choices. Moreover, we survey the efficiency literature, providing pointers to a large number of relatively important research papers. Finally, we discuss some open research problems in the context of search engine efficiency. | |
530 | _aAlso available in print. | ||
588 | _aTitle from PDF title page (viewed on January 22, 2016). | ||
650 | 0 | _aWeb search engines. | |
650 | 0 |
_aComputer networks _xScalability. |
|
653 | _acache invalidation | ||
653 | _acentral broker | ||
653 | _acompression | ||
653 | _acontent spam | ||
653 | _adelay attacks | ||
653 | _adistributed crawling | ||
653 | _adistributed query processing | ||
653 | _aDNS cache | ||
653 | _adocument id reassignment | ||
653 | _adownload throughput | ||
653 | _adynamic index pruning | ||
653 | _aearly exit optimization | ||
653 | _aeffectiveness | ||
653 | _aefficiency | ||
653 | _aforward index | ||
653 | _aindex construction | ||
653 | _aindex maintenance | ||
653 | _aindex partitioning | ||
653 | _aindex replication | ||
653 | _aindexing | ||
653 | _ainverted index | ||
653 | _ainverted list cache | ||
653 | _ainverted list | ||
653 | _alink exchange | ||
653 | _alink farm | ||
653 | _alink spam | ||
653 | _amachine-learned ranking | ||
653 | _amatching | ||
653 | _amultisite web search | ||
653 | _anear duplicate detection | ||
653 | _apage cache | ||
653 | _aperformance | ||
653 | _aposition list | ||
653 | _aposting list | ||
653 | _aquery-independent feature | ||
653 | _aquery expansion | ||
653 | _aquery forwarding | ||
653 | _aquery interpretation | ||
653 | _aquery processing | ||
653 | _aquery rewriting | ||
653 | _arelevance | ||
653 | _aquery scheduling | ||
653 | _aresponse latency | ||
653 | _aresult cache | ||
653 | _aresult freshness | ||
653 | _aresult preparation | ||
653 | _aresult retrieval | ||
653 | _ascalability | ||
653 | _asearch center | ||
653 | _asearch cluster | ||
653 | _asearch engine result page | ||
653 | _asearch quality | ||
653 | _aselective search | ||
653 | _ashingles | ||
653 | _askip pointer | ||
653 | _asnippet | ||
653 | _asoft 404 page | ||
653 | _aspider trap | ||
653 | _astatic index pruning | ||
653 | _atext processing | ||
653 | _athroughput | ||
653 | _atiering | ||
653 | _atime-to-live | ||
653 | _atwo-phase ranking | ||
653 | _aURL-seen test | ||
653 | _aURL caching | ||
653 | _aweb change | ||
653 | _aweb coverage | ||
653 | _aweb crawler | ||
653 | _aweb frontier | ||
653 | _aweb graph | ||
653 | _aweb repository | ||
653 | _aweb search engine | ||
653 | _awebsite mirror | ||
700 | 1 |
_aBaeza-Yates, R. _q(Ricardo), _eauthor. |
|
776 | 0 | 8 |
_iPrint version: _z9781627058124 |
830 | 0 | _aSynthesis digital library of engineering and computer science. | |
830 | 0 |
_aSynthesis lectures on information concepts, retrieval, and services ; _v# 45. _x1947-9468 |
|
856 | 4 | 2 |
_3Abstract with links to resource _uhttp://ieeexplore.ieee.org/servlet/opac?bknumber=7374857 |
999 |
_c562179 _d562179 |