000 05679nam a22007451i 4500
001 8540360
003 IEEE
005 20200413152928.0
006 m eo d
007 cr cn |||m|||a
008 181128s2019 caua foab 000 0 eng d
020 _a9781681734477
_qebook
020 _z9781681734484
_qhardcover
020 _z9781681734460
_qpaperback
024 7 _a10.2200/S00878ED1V01Y201810DTM052
_2doi
035 _a(CaBNVSL)swl000408790
035 _a(OCoLC)1076493845
040 _aCaBNVSL
_beng
_erda
_cCaBNVSL
_dCaBNVSL
050 4 _aZ666.7
_b.A233 2019
082 0 4 _a025.3
_223
100 1 _aAbedjan, Ziawasch,
_eauthor.
245 1 0 _aData profiling /
_cZiawasch Abedjan, Lukasz Golab, Felix Naumann, Thorsten Papenbrock.
264 1 _a[San Rafael, California] :
_bMorgan & Claypool,
_c2019.
300 _a1 PDF (xviii, 136 pages) :
_billustrations.
336 _atext
_2rdacontent
337 _aelectronic
_2isbdmedia
338 _aonline resource
_2rdacarrier
490 1 _aSynthesis lectures on data management,
_x2153-5426 ;
_v# 52
538 _aMode of access: World Wide Web.
538 _aSystem requirements: Adobe Acrobat Reader.
500 _aPart of: Synthesis digital library of engineering and computer science.
504 _aIncludes bibliographical references (pages 113-134).
505 0 _a1. Discovering metadata -- 1.1 Motivation and overview -- 1.2 Data profiling and data mining -- 1.3 Use cases -- 1.4 Organization of this book --
505 8 _a2. Data profiling tasks -- 2.1 Single-column analysis -- 2.2 Dependency discovery -- 2.3 Relaxed dependencies --
505 8 _a3. Single-column analysis -- 3.1 Cardinalities -- 3.2 Value distributions -- 3.3 Data types, patterns, and domains -- 3.4 Data completeness -- 3.5 Approximate statistics -- 3.6 Summary and discussion --
505 8 _a4. Dependency discovery -- 4.1 Dependency definitions -- 4.2 Search space and data structures -- 4.3 Discovering unique column combinations -- 4.4 Discovering functional dependencies -- 4.5 Discovering inclusion dependencies --
505 8 _a5. Relaxed and other dependencies -- 5.1 Relaxing the extent of a dependency -- 5.1.1 Partial dependencies -- 5.1.2 Conditional dependencies -- 5.2 Relaxing attribute comparisons -- 5.2.1 Metric and matching dependencies -- 5.2.2 Order and sequential dependencies -- 5.3 Approximating the dependency discovery -- 5.4 Generalizing functional dependencies -- 5.4.1 Denial constraints -- 5.4.2 Multivalued dependencies --
505 8 _a6. Use cases -- 6.1 Data exploration -- 6.2 Schema engineering -- 6.3 Data cleaning -- 6.4 Query optimization -- 6.5 Data integration --
505 8 _a7. Profiling non-relational data -- 7.1 XML -- 7.2 RDF -- 7.3 Time series -- 7.4 Graphs -- 7.5 Text --
505 8 _a8. Data profiling tools -- 8.1 Research prototypes -- 8.2 Commercial tools --
505 8 _a9. Data profiling challenges -- 9.1 Functional challenges -- 9.1.1 Profiling dynamic data -- 9.1.2 Interactive profiling -- 9.1.3 Profiling for integration -- 9.1.4 Interpreting profiling results -- 9.2 Non-functional challenges -- 9.2.1 Efficiency and scalability -- 9.2.2 Profiling on new architectures -- 9.2.3 Benchmarking profiling methods --
505 8 _a10. Conclusions -- Bibliography -- Authors' biographies.
506 _aAbstract freely available; full-text restricted to subscribers or individual document purchasers.
510 0 _aCompendex
510 0 _aINSPEC
510 0 _aGoogle Scholar
510 0 _aGoogle Book Search
520 3 _aData profiling refers to the activity of collecting data about data, i.e., metadata. Most IT professionals and researchers who work with data have engaged in data profiling, at least informally, to understand and explore an unfamiliar dataset or to determine whether a new dataset is appropriate for a particular task at hand. Data profiling results are also important in a variety of other situations, including query optimization, data integration, and data cleaning. Simple metadata are statistics, such as the number of rows and columns, schema and datatype information, the number of distinct values, statistical value distributions, and the number of null or empty values in each column. More complex types of metadata are statements about multiple columns and their correlation, such as candidate keys, functional dependencies, and other types of dependencies. This book provides a classification of the various types of profilable metadata, discusses popular data profiling tasks, and surveys state-of-the-art profiling algorithms. While most of the book focuses on tasks and algorithms for relational data profiling, we also briefly discuss systems and techniques for profiling non-relational data such as graphs and text. We conclude with a discussion of data profiling challenges and directions for future work in this area.
530 _aAlso available in print.
588 _aTitle from PDF title page (viewed on November 28, 2018).
650 0 _aMetadata.
650 0 _aData mining.
653 _adata analysis
653 _adata modeling
653 _adependency discovery
653 _adata mining
653 _ametadata
700 1 _aGolab, Lukasz,
_d1978-,
_eauthor.
700 1 _aNaumann, Felix,
_eauthor.
700 1 _aPapenbrock, Thorsten,
_eauthor.
776 0 8 _iPrint version:
_z9781681734460
_z9781681734484
830 0 _aSynthesis digital library of engineering and computer science.
830 0 _aSynthesis lectures on data management ;
_v# 52.
_x2153-5426
856 4 2 _3Abstract with links to resource
_uhttps://ieeexplore.ieee.org/servlet/opac?bknumber=8540360
999 _c562332
_d562332