Welcome to P K Kelkar Library, Online Public Access Catalogue (OPAC)

Scalability challenges in web search engines /

By: Cambazoglu, B. Barla [author].
Contributor(s): Baeza-Yates, R [author].
Material type: Book
Series: Synthesis digital library of engineering and computer science; Synthesis lectures on information concepts, retrieval, and services, # 45.
Publisher: San Rafael, California (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool, 2016.
Description: 1 PDF (xv, 122 pages) : illustrations.
Content type: text
Media type: electronic
Carrier type: online resource
ISBN: 9781627058131.
Subject(s): Web search engines | Computer networks -- Scalability | cache invalidation | central broker | compression | content spam | delay attacks | distributed crawling | distributed query processing | DNS cache | document id reassignment | download throughput | dynamic index pruning | early exit optimization | effectiveness | efficiency | forward index | index construction | index maintenance | index partitioning | index replication | indexing | inverted index | inverted list cache | inverted list | link exchange | link farm | link spam | machine-learned ranking | matching | multisite web search | near duplicate detection | page cache | performance | position list | posting list | query-independent feature | query expansion | query forwarding | query interpretation | query processing | query rewriting | relevance | query scheduling | response latency | result cache | result freshness | result preparation | result retrieval | scalability | search center | search cluster | search engine result page | search quality | selective search | shingles | skip pointer | snippet | soft 404 page | spider trap | static index pruning | text processing | throughput | tiering | time-to-live | two-phase ranking | URL-seen test | URL caching | web change | web coverage | web crawler | web frontier | web graph | web repository | web search engine | website mirror
DDC classification: 005.758
Online resources: Abstract with links to resource
Also available in print.
Contents:
1. Introduction -- 1.1 Web search business -- 1.2 Basic search engine architecture -- 1.3 Scalability issues --
2. The web crawling system -- 2.1 Basic web crawling architecture -- 2.2 Extending the web repository -- 2.3 Refreshing the web repository -- 2.4 Managing the web repository -- 2.5 Distributed web crawling -- 2.6 Factors affecting crawling performance -- 2.7 Literature on web crawling -- 2.8 Open issues in web crawling --
3. The indexing system -- 3.1 Basic indexing architecture -- 3.2 Inverted index -- 3.3 Compressing an inverted index -- 3.4 Constructing an inverted index -- 3.5 Updating an inverted index -- 3.6 Partitioning an inverted index -- 3.7 Literature on indexing -- 3.8 Open issues in indexing --
4. The query processing system -- 4.1 Basic query processing architecture -- 4.2 Query processing on a search node -- 4.3 Query processing in a search cluster -- 4.4 Architectural optimizations -- 4.5 Caching -- 4.6 Query processing on multiple search sites -- 4.7 Literature on query processing -- 4.8 Open issues in query processing --
5. Concluding remarks -- Bibliography -- Authors' biographies.
Abstract: In this book, we aim to provide a fairly comprehensive overview of the scalability and efficiency challenges in large-scale web search engines. More specifically, we cover the issues involved in the design of three separate systems that are commonly available in every web-scale search engine: web crawling, indexing, and query processing systems. We present the performance challenges encountered in these systems and review a wide range of design alternatives employed as solutions to these challenges, specifically focusing on algorithmic and architectural optimizations. We discuss the available optimizations at different computational granularities, ranging from a single computer node to a collection of data centers. We provide some hints to both the practitioners and theoreticians involved in the field about the way large-scale web search engines operate and the adopted design choices. Moreover, we survey the efficiency literature, providing pointers to a large number of relatively important research papers. Finally, we discuss some open research problems in the context of search engine efficiency.
Item type: E books
Current location: E books, PK Kelkar Library, IIT Kanpur
Status: Available
Barcode: EBKE679
Total holds: 0

Mode of access: World Wide Web.

System requirements: Adobe Acrobat Reader.

Part of: Synthesis digital library of engineering and computer science.

Includes bibliographical references (pages 93-120).

Abstract freely available; full-text restricted to subscribers or individual document purchasers.

Compendex

INSPEC

Google scholar

Google book search

Title from PDF title page (viewed on January 22, 2016).
