000 06856nam a2201477 i 4500
001 7374857
003 IEEE
005 20200413152920.0
006 m eo d
007 cr cn |||m|||a
008 160122s2016 caua foab 000 0 eng d
020 _a9781627058131
_qebook
020 _z9781627058124
_qprint
024 7 _a10.2200/S00662ED1V01Y201508ICR045
_2doi
035 _a(CaBNVSL)swl00406108
035 _a(OCoLC)935806030
040 _aCaBNVSL
_beng
_erda
_cCaBNVSL
_dCaBNVSL
050 4 _aTK5105.884
_b.C257 2016
082 0 4 _a005.758
_223
100 1 _aCambazoglu, B. Barla,
_eauthor.
245 1 0 _aScalability challenges in web search engines /
_cB. Barla Cambazoglu, Ricardo Baeza-Yates.
264 1 _aSan Rafael, California (1537 Fourth Street, San Rafael, CA 94901 USA) :
_bMorgan & Claypool,
_c2016.
300 _a1 PDF (xv, 122 pages) :
_billustrations.
336 _atext
_2rdacontent
337 _aelectronic
_2isbdmedia
338 _aonline resource
_2rdacarrier
490 1 _aSynthesis lectures on information concepts, retrieval, and services,
_x1947-9468 ;
_v# 45
538 _aMode of access: World Wide Web.
538 _aSystem requirements: Adobe Acrobat Reader.
500 _aPart of: Synthesis digital library of engineering and computer science.
504 _aIncludes bibliographical references (pages 93-120).
505 0 _a1. Introduction -- 1.1 Web search business -- 1.2 Basic search engine architecture -- 1.3 Scalability issues --
505 8 _a2. The web crawling system -- 2.1 Basic web crawling architecture -- 2.2 Extending the web repository -- 2.3 Refreshing the web repository -- 2.4 Managing the web repository -- 2.5 Distributed web crawling -- 2.6 Factors affecting crawling performance -- 2.7 Literature on web crawling -- 2.8 Open issues in web crawling --
505 8 _a3. The indexing system -- 3.1 Basic indexing architecture -- 3.2 Inverted index -- 3.3 Compressing an inverted index -- 3.4 Constructing an inverted index -- 3.5 Updating an inverted index -- 3.6 Partitioning an inverted index -- 3.7 Literature on indexing -- 3.8 Open issues in indexing --
505 8 _a4. The query processing system -- 4.1 Basic query processing architecture -- 4.2 Query processing on a search node -- 4.3 Query processing in a search cluster -- 4.4 Architectural optimizations -- 4.5 Caching -- 4.6 Query processing on multiple search sites -- 4.7 Literature on query processing -- 4.8 Open issues in query processing --
505 8 _a5. Concluding remarks -- Bibliography -- Authors' biographies.
506 1 _aAbstract freely available; full-text restricted to subscribers or individual document purchasers.
510 0 _aCompendex
510 0 _aINSPEC
510 0 _aGoogle scholar
510 0 _aGoogle book search
520 3 _aIn this book, we aim to provide a fairly comprehensive overview of the scalability and efficiency challenges in large-scale web search engines. More specifically, we cover the issues involved in the design of three separate systems that are commonly available in every web-scale search engine: web crawling, indexing, and query processing systems. We present the performance challenges encountered in these systems and review a wide range of design alternatives employed as solutions to these challenges, specifically focusing on algorithmic and architectural optimizations. We discuss the available optimizations at different computational granularities, ranging from a single computer node to a collection of data centers. We provide some hints to both the practitioners and theoreticians involved in the field about the way large-scale web search engines operate and the adopted design choices. Moreover, we survey the efficiency literature, providing pointers to a large number of relatively important research papers. Finally, we discuss some open research problems in the context of search engine efficiency.
530 _aAlso available in print.
588 _aTitle from PDF title page (viewed on January 22, 2016).
650 0 _aWeb search engines.
650 0 _aComputer networks
_xScalability.
653 _acache invalidation
653 _acentral broker
653 _acompression
653 _acontent spam
653 _adelay attacks
653 _adistributed crawling
653 _adistributed query processing
653 _aDNS cache
653 _adocument id reassignment
653 _adownload throughput
653 _adynamic index pruning
653 _aearly exit optimization
653 _aeffectiveness
653 _aefficiency
653 _aforward index
653 _aindex construction
653 _aindex maintenance
653 _aindex partitioning
653 _aindex replication
653 _aindexing
653 _ainverted index
653 _ainverted list cache
653 _ainverted list
653 _alink exchange
653 _alink farm
653 _alink spam
653 _amachine-learned ranking
653 _amatching
653 _amultisite web search
653 _anear duplicate detection
653 _apage cache
653 _aperformance
653 _aposition list
653 _aposting list
653 _aquery-independent feature
653 _aquery expansion
653 _aquery forwarding
653 _aquery interpretation
653 _aquery processing
653 _aquery rewriting
653 _arelevance
653 _aquery scheduling
653 _aresponse latency
653 _aresult cache
653 _aresult freshness
653 _aresult preparation
653 _aresult retrieval
653 _ascalability
653 _asearch center
653 _asearch cluster
653 _asearch engine result page
653 _asearch quality
653 _aselective search
653 _ashingles
653 _askip pointer
653 _asnippet
653 _asoft 404 page
653 _aspider trap
653 _astatic index pruning
653 _atext processing
653 _athroughput
653 _atiering
653 _atime-to-live
653 _atwo-phase ranking
653 _aURL-seen test
653 _aURL caching
653 _aweb change
653 _aweb coverage
653 _aweb crawler
653 _aweb frontier
653 _aweb graph
653 _aweb repository
653 _aweb search engine
653 _awebsite mirror
700 1 _aBaeza-Yates, R.
_q(Ricardo),
_eauthor.
776 0 8 _iPrint version:
_z9781627058124
830 0 _aSynthesis digital library of engineering and computer science.
830 0 _aSynthesis lectures on information concepts, retrieval, and services ;
_v# 45.
_x1947-9468
856 4 2 _3Abstract with links to resource
_uhttp://ieeexplore.ieee.org/servlet/opac?bknumber=7374857
999 _c562179
_d562179