000 06749nam a2200697 i 4500
001 7065199
003 IEEE
005 20200413152917.0
006 m eo d
007 cr cn |||m|||a
008 150320s2015 caua foab 001 0 eng d
020 _a9781627052245
_qebook
020 _z9781627052238
_qprint
024 7 _a10.2200/S00578ED1V01Y201404DTM040
_2doi
035 _a(CaBNVSL)swl00404797
035 _a(OCoLC)905421798
040 _aCaBNVSL
_beng
_erda
_cCaBNVSL
_dCaBNVSL
050 4 _aQA76.9.D343
_bD654 2015
082 0 4 _a006.312
_223
100 1 _aDong, Xin Luna,
_eauthor.
245 1 0 _aBig data integration /
_cXin Luna Dong, Divesh Srivastava.
264 1 _aSan Rafael, California (1537 Fourth Street, San Rafael, CA 94901 USA) :
_bMorgan & Claypool,
_c2015.
300 _a1 PDF (xx, 178 pages) :
_billustrations.
336 _atext
_2rdacontent
337 _aelectronic
_2isbdmedia
338 _aonline resource
_2rdacarrier
490 1 _aSynthesis lectures on data management,
_x2153-5426 ;
_v# 40
538 _aMode of access: World Wide Web.
538 _aSystem requirements: Adobe Acrobat Reader.
500 _aPart of: Synthesis digital library of engineering and computer science.
504 _aIncludes bibliographical references (pages 165-173) and index.
505 0 _a1. Motivation: challenges and opportunities for BDI -- 1.1 Traditional data integration -- 1.1.1 The flights example: data sources -- 1.1.2 The flights example: data integration -- 1.1.3 Data integration: architecture & three major steps -- 1.2 BDI: challenges -- 1.2.1 The "V" dimensions -- 1.2.2 Case study: quantity of deep web data -- 1.2.3 Case study: extracted domain-specific data -- 1.2.4 Case study: quality of deep web data -- 1.2.5 Case study: surface web structured data -- 1.2.6 Case study: extracted knowledge triples -- 1.3 BDI: opportunities -- 1.3.1 Data redundancy -- 1.3.2 Long data -- 1.3.3 Big data platforms -- 1.4 Outline of book --
505 8 _a2. Schema alignment -- 2.1 Traditional schema alignment: a quick tour -- 2.1.1 Mediated schema -- 2.1.2 Attribute matching -- 2.1.3 Schema mapping -- 2.1.4 Query answering -- 2.2 Addressing the variety and velocity challenges -- 2.2.1 Probabilistic schema alignment -- 2.2.2 Pay-as-you-go user feedback -- 2.3 Addressing the variety and volume challenges -- 2.3.1 Integrating deep web data -- 2.3.2 Integrating web tables --
505 8 _a3. Record linkage -- 3.1 Traditional record linkage: a quick tour -- 3.1.1 Pairwise matching -- 3.1.2 Clustering -- 3.1.3 Blocking -- 3.2 Addressing the volume challenge -- 3.2.1 Using MapReduce to parallelize blocking -- 3.2.2 Meta-blocking: pruning pairwise matchings -- 3.3 Addressing the velocity challenge -- 3.3.1 Incremental record linkage -- 3.4 Addressing the variety challenge -- 3.4.1 Linking text snippets to structured data -- 3.5 Addressing the veracity challenge -- 3.5.1 Temporal record linkage -- 3.5.2 Record linkage with uniqueness constraints --
505 8 _a4. BDI: data fusion -- 4.1 Traditional data fusion: a quick tour -- 4.2 Addressing the veracity challenge -- 4.2.1 Accuracy of a source -- 4.2.2 Probability of a value being true -- 4.2.3 Copying between sources -- 4.2.4 The end-to-end solution -- 4.2.5 Extensions and alternatives -- 4.3 Addressing the volume challenge -- 4.3.1 A MapReduce-based framework for offline fusion -- 4.3.2 Online data fusion -- 4.4 Addressing the velocity challenge -- 4.5 Addressing the variety challenge --
505 8 _a5. BDI: emerging topics -- 5.1 Role of crowdsourcing -- 5.1.1 Leveraging transitive relations -- 5.1.2 Crowdsourcing the end-to-end workflow -- 5.1.3 Future work -- 5.2 Source selection -- 5.2.1 Static sources -- 5.2.2 Dynamic sources -- 5.2.3 Future work -- 5.3 Source profiling -- 5.3.1 The Bellman system -- 5.3.2 Summarizing sources -- 5.3.3 Future work --
505 8 _a6. Conclusions -- Bibliography -- Authors' biographies -- Index.
506 1 _aAbstract freely available; full-text restricted to subscribers or individual document purchasers.
510 0 _aCompendex
510 0 _aINSPEC
510 0 _aGoogle Scholar
510 0 _aGoogle Book Search
520 3 _aThe big data era is upon us: data are being generated, analyzed, and used at an unprecedented scale, and data-driven decision making is sweeping through all aspects of society. Since the value of data explodes when it can be linked and fused with other data, addressing the big data integration (BDI) challenge is critical to realizing the promise of big data. BDI differs from traditional data integration along the dimensions of volume, velocity, variety, and veracity. First, not only can data sources contain a huge volume of data, but also the number of data sources is now in the millions. Second, because of the rate at which newly collected data are made available, many of the data sources are very dynamic, and the number of data sources is also rapidly exploding. Third, data sources are extremely heterogeneous in their structure and content, exhibiting considerable variety even for substantially similar entities. Fourth, the data sources are of widely differing qualities, with significant differences in the coverage, accuracy and timeliness of data provided. This book explores the progress that has been made by the data integration community on the topics of schema alignment, record linkage and data fusion in addressing these novel challenges faced by big data integration. Each of these topics is covered in a systematic way: first starting with a quick tour of the topic in the context of traditional data integration, followed by a detailed, example-driven exposition of recent innovative techniques that have been proposed to address the BDI challenges of volume, velocity, variety, and veracity. Finally, it presents emerging topics and opportunities that are specific to BDI, identifying promising directions for the data integration community.
530 _aAlso available in print.
588 _aTitle from PDF title page (viewed on March 20, 2015).
650 0 _aBig data.
650 0 _aData integration (Computer science)
653 _abig data integration
653 _adata fusion
653 _arecord linkage
653 _aschema alignment
653 _avariety
653 _avelocity
653 _averacity
653 _avolume
700 1 _aSrivastava, Divesh,
_eauthor.
776 0 8 _iPrint version:
_z9781627052238
830 0 _aSynthesis digital library of engineering and computer science.
830 0 _aSynthesis lectures on data management ;
_v# 40.
_x2153-5426
856 4 2 _3Abstract with links to resource
_uhttp://ieeexplore.ieee.org/servlet/opac?bknumber=7065199
999 _c562123
_d562123