000 - LEADER |
fixed length control field |
05679nam a22007451i 4500 |
001 - CONTROL NUMBER |
control field |
8540360 |
003 - CONTROL NUMBER IDENTIFIER |
control field |
IEEE |
005 - DATE AND TIME OF LATEST TRANSACTION |
control field |
20200413152928.0 |
006 - FIXED-LENGTH DATA ELEMENTS--ADDITIONAL MATERIAL CHARACTERISTICS |
fixed length control field |
m eo d |
007 - PHYSICAL DESCRIPTION FIXED FIELD--GENERAL INFORMATION |
fixed length control field |
cr cn |||m|||a |
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION |
fixed length control field |
181128s2019 caua foab 000 0 eng d |
020 ## - INTERNATIONAL STANDARD BOOK NUMBER |
International Standard Book Number |
9781681734477 |
Qualifying information |
ebook |
020 ## - INTERNATIONAL STANDARD BOOK NUMBER |
Canceled/invalid ISBN |
9781681734484 |
Qualifying information |
hardcover |
020 ## - INTERNATIONAL STANDARD BOOK NUMBER |
Canceled/invalid ISBN |
9781681734460 |
Qualifying information |
paperback |
024 7# - OTHER STANDARD IDENTIFIER |
Standard number or code |
10.2200/S00878ED1V01Y201810DTM052 |
Source of number or code |
doi |
035 ## - SYSTEM CONTROL NUMBER |
System control number |
(CaBNVSL)swl000408790 |
035 ## - SYSTEM CONTROL NUMBER |
System control number |
(OCoLC)1076493845 |
040 ## - CATALOGING SOURCE |
Original cataloging agency |
CaBNVSL |
Language of cataloging |
eng |
Description conventions |
rda |
Transcribing agency |
CaBNVSL |
Modifying agency |
CaBNVSL |
050 #4 - LIBRARY OF CONGRESS CALL NUMBER |
Classification number |
Z666.7 |
Item number |
.A233 2019 |
082 04 - DEWEY DECIMAL CLASSIFICATION NUMBER |
Classification number |
025.3 |
Edition number |
23 |
100 1# - MAIN ENTRY--PERSONAL NAME |
Personal name |
Abedjan, Ziawasch, |
Relator term |
author. |
245 10 - TITLE STATEMENT |
Title |
Data profiling / |
Statement of responsibility, etc. |
Ziawasch Abedjan, Lukasz Golab, Felix Naumann, Thorsten Papenbrock. |
264 #1 - PRODUCTION, PUBLICATION, DISTRIBUTION, MANUFACTURE, AND COPYRIGHT NOTICE |
Place of production, publication, distribution, manufacture |
[San Rafael, California] : |
Name of producer, publisher, distributor, manufacturer |
Morgan & Claypool, |
Date of production, publication, distribution, manufacture, or copyright notice |
2019. |
300 ## - PHYSICAL DESCRIPTION |
Extent |
1 PDF (xviii, 136 pages) : |
Other physical details |
illustrations. |
336 ## - CONTENT TYPE |
Content type term |
text |
Source |
rdacontent |
337 ## - MEDIA TYPE |
Media type term |
electronic |
Source |
isbdmedia |
338 ## - CARRIER TYPE |
Carrier type term |
online resource |
Source |
rdacarrier |
490 1# - SERIES STATEMENT |
Series statement |
Synthesis lectures on data management, |
International Standard Serial Number |
2153-5426 ; |
Volume/sequential designation |
# 52 |
538 ## - SYSTEM DETAILS NOTE |
System details note |
Mode of access: World Wide Web. |
538 ## - SYSTEM DETAILS NOTE |
System details note |
System requirements: Adobe Acrobat Reader. |
500 ## - GENERAL NOTE |
General note |
Part of: Synthesis digital library of engineering and computer science. |
504 ## - BIBLIOGRAPHY, ETC. NOTE |
Bibliography, etc. note |
Includes bibliographical references (pages 113-134). |
505 0# - FORMATTED CONTENTS NOTE |
Formatted contents note |
1. Discovering metadata -- 1.1 Motivation and overview -- 1.2 Data profiling and data mining -- 1.3 Use cases -- 1.4 Organization of this book -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
2. Data profiling tasks -- 2.1 Single-column analysis -- 2.2 Dependency discovery -- 2.3 Relaxed dependencies -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
3. Single-column analysis -- 3.1 Cardinalities -- 3.2 Value distributions -- 3.3 Data types, patterns, and domains -- 3.4 Data completeness -- 3.5 Approximate statistics -- 3.6 Summary and discussion -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
4. Dependency discovery -- 4.1 Dependency definitions -- 4.2 Search space and data structures -- 4.3 Discovering unique column combinations -- 4.4 Discovering functional dependencies -- 4.5 Discovering inclusion dependencies -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
5. Relaxed and other dependencies -- 5.1 Relaxing the extent of a dependency -- 5.1.1 Partial dependencies -- 5.1.2 Conditional dependencies -- 5.2 Relaxing attribute comparisons -- 5.2.1 Metric and matching dependencies -- 5.2.2 Order and sequential dependencies -- 5.3 Approximating the dependency discovery -- 5.4 Generalizing functional dependencies -- 5.4.1 Denial constraints -- 5.4.2 Multivalued dependencies -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
6. Use cases -- 6.1 Data exploration -- 6.2 Schema engineering -- 6.3 Data cleaning -- 6.4 Query optimization -- 6.5 Data integration -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
7. Profiling non-relational data -- 7.1 XML -- 7.2 RDF -- 7.3 Time series -- 7.4 Graphs -- 7.5 Text -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
8. Data profiling tools -- 8.1 Research prototypes -- 8.2 Commercial tools -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
9. Data profiling challenges -- 9.1 Functional challenges -- 9.1.1 Profiling dynamic data -- 9.1.2 Interactive profiling -- 9.1.3 Profiling for integration -- 9.1.4 Interpreting profiling results -- 9.2 Non-functional challenges -- 9.2.1 Efficiency and scalability -- 9.2.2 Profiling on new architectures -- 9.2.3 Benchmarking profiling methods -- |
505 8# - FORMATTED CONTENTS NOTE |
Formatted contents note |
10. Conclusions -- Bibliography -- Authors' biographies. |
506 ## - RESTRICTIONS ON ACCESS NOTE |
Terms governing access |
Abstract freely available; full-text restricted to subscribers or individual document purchasers. |
510 0# - CITATION/REFERENCES NOTE |
Name of source |
Compendex |
510 0# - CITATION/REFERENCES NOTE |
Name of source |
INSPEC |
510 0# - CITATION/REFERENCES NOTE |
Name of source |
Google Scholar |
510 0# - CITATION/REFERENCES NOTE |
Name of source |
Google Book Search |
520 3# - SUMMARY, ETC. |
Summary, etc. |
Data profiling refers to the activity of collecting data about data, i.e., metadata. Most IT professionals and researchers who work with data have engaged in data profiling, at least informally, to understand and explore an unfamiliar dataset or to determine whether a new dataset is appropriate for a particular task at hand. Data profiling results are also important in a variety of other situations, including query optimization, data integration, and data cleaning. Simple metadata are statistics, such as the number of rows and columns, schema and datatype information, the number of distinct values, statistical value distributions, and the number of null or empty values in each column. More complex types of metadata are statements about multiple columns and their correlation, such as candidate keys, functional dependencies, and other types of dependencies. This book provides a classification of the various types of profilable metadata, discusses popular data profiling tasks, and surveys state-of-the-art profiling algorithms. While most of the book focuses on tasks and algorithms for relational data profiling, we also briefly discuss systems and techniques for profiling non-relational data such as graphs and text. We conclude with a discussion of data profiling challenges and directions for future work in this area. |
530 ## - ADDITIONAL PHYSICAL FORM AVAILABLE NOTE |
Additional physical form available note |
Also available in print. |
588 ## - SOURCE OF DESCRIPTION NOTE |
Source of description note |
Title from PDF title page (viewed on November 28, 2018). |
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Metadata. |
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Data mining. |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
data analysis |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
data modeling |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
dependency discovery |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
data mining |
653 ## - INDEX TERM--UNCONTROLLED |
Uncontrolled term |
metadata |
700 1# - ADDED ENTRY--PERSONAL NAME |
Personal name |
Golab, Lukasz, |
Dates associated with a name |
1978-, |
Relator term |
author. |
700 1# - ADDED ENTRY--PERSONAL NAME |
Personal name |
Naumann, Felix, |
Relator term |
author. |
700 1# - ADDED ENTRY--PERSONAL NAME |
Personal name |
Papenbrock, Thorsten, |
Relator term |
author. |
776 08 - ADDITIONAL PHYSICAL FORM ENTRY |
Relationship information |
Print version: |
International Standard Book Number |
9781681734460 |
-- |
9781681734484 |
830 #0 - SERIES ADDED ENTRY--UNIFORM TITLE |
Uniform title |
Synthesis digital library of engineering and computer science. |
830 #0 - SERIES ADDED ENTRY--UNIFORM TITLE |
Uniform title |
Synthesis lectures on data management ; |
Volume/sequential designation |
# 52. |
International Standard Serial Number |
2153-5426 |
856 42 - ELECTRONIC LOCATION AND ACCESS |
Materials specified |
Abstract with links to resource |
Uniform Resource Identifier |
https://ieeexplore.ieee.org/servlet/opac?bknumber=8540360 |