000 -LEADER |
fixed length control field |
06749nam a2200697 i 4500 |
001 - CONTROL NUMBER |
control field |
7065199 |
003 - CONTROL NUMBER IDENTIFIER |
control field |
IEEE |
005 - DATE AND TIME OF LATEST TRANSACTION |
control field |
20200413152917.0 |
006 - FIXED-LENGTH DATA ELEMENTS--ADDITIONAL MATERIAL CHARACTERISTICS |
fixed length control field |
m eo d |
007 - PHYSICAL DESCRIPTION FIXED FIELD--GENERAL INFORMATION |
fixed length control field |
cr cn |||m|||a |
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION |
fixed length control field |
150320s2015 caua foab 001 0 eng d |
020 ## - INTERNATIONAL STANDARD BOOK NUMBER |
International Standard Book Number |
9781627052245 |
Qualifying information |
ebook |
020 ## - INTERNATIONAL STANDARD BOOK NUMBER |
Canceled/invalid ISBN |
9781627052238 |
Qualifying information |
print |
024 7# - OTHER STANDARD IDENTIFIER |
Standard number or code |
10.2200/S00578ED1V01Y201404DTM040 |
Source of number or code |
doi |
035 ## - SYSTEM CONTROL NUMBER |
System control number |
(CaBNVSL)swl00404797 |
035 ## - SYSTEM CONTROL NUMBER |
System control number |
(OCoLC)905421798 |
040 ## - CATALOGING SOURCE |
Original cataloging agency |
CaBNVSL |
Language of cataloging |
eng |
Description conventions |
rda |
Transcribing agency |
CaBNVSL |
Modifying agency |
CaBNVSL |
050 #4 - LIBRARY OF CONGRESS CALL NUMBER |
Classification number |
QA76.9.D343 |
Item number |
D654 2015 |
082 04 - DEWEY DECIMAL CLASSIFICATION NUMBER |
Classification number |
006.312 |
Edition number |
23 |
100 1# - MAIN ENTRY--PERSONAL NAME |
Personal name |
Dong, Xin Luna,
Relator term |
author. |
245 10 - TITLE STATEMENT |
Title |
Big data integration / |
Statement of responsibility, etc. |
Xin Luna Dong, Divesh Srivastava. |
264 #1 - PRODUCTION, PUBLICATION, DISTRIBUTION, MANUFACTURE, AND COPYRIGHT NOTICE |
Place of production, publication, distribution, manufacture |
San Rafael, California (1537 Fourth Street, San Rafael, CA 94901 USA) : |
Name of producer, publisher, distributor, manufacturer |
Morgan & Claypool, |
Date of production, publication, distribution, manufacture, or copyright notice |
2015. |
300 ## - PHYSICAL DESCRIPTION |
Extent |
1 PDF (xx, 178 pages) : |
Other physical details |
illustrations. |
336 ## - CONTENT TYPE |
Content type term |
text |
Source |
rdacontent |
337 ## - MEDIA TYPE |
Media type term |
electronic |
Source |
isbdmedia |
338 ## - CARRIER TYPE |
Carrier type term |
online resource |
Source |
rdacarrier |
490 1# - SERIES STATEMENT |
Series statement |
Synthesis lectures on data management, |
International Standard Serial Number |
2153-5426 ; |
Volume/sequential designation |
# 40 |
538 ## - SYSTEM DETAILS NOTE |
System details note |
Mode of access: World Wide Web. |
538 ## - SYSTEM DETAILS NOTE |
System details note |
System requirements: Adobe Acrobat Reader. |
500 ## - GENERAL NOTE |
General note |
Part of: Synthesis digital library of engineering and computer science. |
504 ## - BIBLIOGRAPHY, ETC. NOTE |
Bibliography, etc. note |
Includes bibliographical references (pages 165-173) and index. |
505 0# - FORMATTED CONTENTS NOTE |
Formatted contents note |
1. Motivation: challenges and opportunities for BDI -- 1.1 Traditional data integration -- 1.1.1 The flights example: data sources -- 1.1.2 The flights example: data integration -- 1.1.3 Data integration: architecture & three major steps -- 1.2 BDI: challenges -- 1.2.1 The "V" dimensions -- 1.2.2 Case study: quantity of deep web data -- 1.2.3 Case study: extracted domain-specific data -- 1.2.4 Case study: quality of deep web data -- 1.2.5 Case study: surface web structured data -- 1.2.6 Case study: extracted knowledge triples -- 1.3 BDI: opportunities -- 1.3.1 Data redundancy -- 1.3.2 Long data -- 1.3.3 Big data platforms -- 1.4 Outline of book -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
2. Schema alignment -- 2.1 Traditional schema alignment: a quick tour -- 2.1.1 Mediated schema -- 2.1.2 Attribute matching -- 2.1.3 Schema mapping -- 2.1.4 Query answering -- 2.2 Addressing the variety and velocity challenges -- 2.2.1 Probabilistic schema alignment -- 2.2.2 Pay-as-you-go user feedback -- 2.3 Addressing the variety and volume challenges -- 2.3.1 Integrating deep web data -- 2.3.2 Integrating web tables -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
3. Record linkage -- 3.1 Traditional record linkage: a quick tour -- 3.1.1 Pairwise matching -- 3.1.2 Clustering -- 3.1.3 Blocking -- 3.2 Addressing the volume challenge -- 3.2.1 Using MapReduce to parallelize blocking -- 3.2.2 Meta-blocking: pruning pairwise matchings -- 3.3 Addressing the velocity challenge -- 3.3.1 Incremental record linkage -- 3.4 Addressing the variety challenge -- 3.4.1 Linking text snippets to structured data -- 3.5 Addressing the veracity challenge -- 3.5.1 Temporal record linkage -- 3.5.2 Record linkage with uniqueness constraints -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
4. BDI: data fusion -- 4.1 Traditional data fusion: a quick tour -- 4.2 Addressing the veracity challenge -- 4.2.1 Accuracy of a source -- 4.2.2 Probability of a value being true -- 4.2.3 Copying between sources -- 4.2.4 The end-to-end solution -- 4.2.5 Extensions and alternatives -- 4.3 Addressing the volume challenge -- 4.3.1 A MapReduce-based framework for offline fusion -- 4.3.2 Online data fusion -- 4.4 Addressing the velocity challenge -- 4.5 Addressing the variety challenge -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
5. BDI: emerging topics -- 5.1 Role of crowdsourcing -- 5.1.1 Leveraging transitive relations -- 5.1.2 Crowdsourcing the end-to-end workflow -- 5.1.3 Future work -- 5.2 Source selection -- 5.2.1 Static sources -- 5.2.2 Dynamic sources -- 5.2.3 Future work -- 5.3 Source profiling -- 5.3.1 The Bellman system -- 5.3.2 Summarizing sources -- 5.3.3 Future work -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
6. Conclusions -- Bibliography -- Authors' biographies -- Index. |
506 1# - RESTRICTIONS ON ACCESS NOTE |
Terms governing access |
Abstract freely available; full-text restricted to subscribers or individual document purchasers. |
510 0# - CITATION/REFERENCES NOTE |
Name of source |
Compendex |
510 0# - CITATION/REFERENCES NOTE |
Name of source |
INSPEC |
510 0# - CITATION/REFERENCES NOTE |
Name of source |
Google Scholar
510 0# - CITATION/REFERENCES NOTE |
Name of source |
Google Book Search
520 3# - SUMMARY, ETC. |
Summary, etc. |
The big data era is upon us: data are being generated, analyzed, and used at an unprecedented scale, and data-driven decision making is sweeping through all aspects of society. Since the value of data explodes when it can be linked and fused with other data, addressing the big data integration (BDI) challenge is critical to realizing the promise of big data. BDI differs from traditional data integration along the dimensions of volume, velocity, variety, and veracity. First, not only can data sources contain a huge volume of data, but also the number of data sources is now in the millions. Second, because of the rate at which newly collected data are made available, many of the data sources are very dynamic, and the number of data sources is also rapidly exploding. Third, data sources are extremely heterogeneous in their structure and content, exhibiting considerable variety even for substantially similar entities. Fourth, the data sources are of widely differing qualities, with significant differences in the coverage, accuracy and timeliness of data provided. This book explores the progress that has been made by the data integration community on the topics of schema alignment, record linkage and data fusion in addressing these novel challenges faced by big data integration. Each of these topics is covered in a systematic way: first starting with a quick tour of the topic in the context of traditional data integration, followed by a detailed, example-driven exposition of recent innovative techniques that have been proposed to address the BDI challenges of volume, velocity, variety, and veracity. Finally, it presents emerging topics and opportunities that are specific to BDI, identifying promising directions for the data integration community. |
530 ## - ADDITIONAL PHYSICAL FORM AVAILABLE NOTE |
Additional physical form available note |
Also available in print. |
588 ## - SOURCE OF DESCRIPTION NOTE |
Source of description note |
Title from PDF title page (viewed on March 20, 2015). |
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Big data. |
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Data integration (Computer science) |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
big data integration |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
data fusion |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
record linkage |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
schema alignment |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
variety |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
velocity |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
veracity |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
volume |
700 1# - ADDED ENTRY--PERSONAL NAME |
Personal name |
Srivastava, Divesh,
Relator term |
author. |
776 08 - ADDITIONAL PHYSICAL FORM ENTRY |
Relationship information |
Print version: |
International Standard Book Number |
9781627052238 |
830 #0 - SERIES ADDED ENTRY--UNIFORM TITLE |
Uniform title |
Synthesis digital library of engineering and computer science. |
830 #0 - SERIES ADDED ENTRY--UNIFORM TITLE |
Uniform title |
Synthesis lectures on data management ; |
Volume/sequential designation |
# 40. |
International Standard Serial Number |
2153-5426 |
856 42 - ELECTRONIC LOCATION AND ACCESS |
Materials specified |
Abstract with links to resource |
Uniform Resource Identifier |
http://ieeexplore.ieee.org/servlet/opac?bknumber=7065199 |