Welcome to P K Kelkar Library, Online Public Access Catalogue (OPAC)

Data profiling

By: Abedjan, Ziawasch [author].
Contributor(s): Golab, Lukasz, 1978- [author] | Naumann, Felix [author] | Papenbrock, Thorsten [author].
Material type: Book
Series: Synthesis digital library of engineering and computer science ; Synthesis lectures on data management, # 52
Publisher: [San Rafael, California] : Morgan & Claypool, 2019
Description: 1 PDF (xviii, 136 pages) : illustrations
Content type: text
Media type: electronic
Carrier type: online resource
ISBN: 9781681734477
Subject(s): Metadata | Data mining | data analysis | data modeling | dependency discovery
DDC classification: 025.3
Online resources: Abstract with links to resource
Also available in print.
Contents:
1. Discovering metadata -- 1.1 Motivation and overview -- 1.2 Data profiling and data mining -- 1.3 Use cases -- 1.4 Organization of this book --
2. Data profiling tasks -- 2.1 Single-column analysis -- 2.2 Dependency discovery -- 2.3 Relaxed dependencies --
3. Single-column analysis -- 3.1 Cardinalities -- 3.2 Value distributions -- 3.3 Data types, patterns, and domains -- 3.4 Data completeness -- 3.5 Approximate statistics -- 3.6 Summary and discussion --
4. Dependency discovery -- 4.1 Dependency definitions -- 4.2 Search space and data structures -- 4.3 Discovering unique column combinations -- 4.4 Discovering functional dependencies -- 4.5 Discovering inclusion dependencies --
5. Relaxed and other dependencies -- 5.1 Relaxing the extent of a dependency -- 5.1.1 Partial dependencies -- 5.1.2 Conditional dependencies -- 5.2 Relaxing attribute comparisons -- 5.2.1 Metric and matching dependencies -- 5.2.2 Order and sequential dependencies -- 5.3 Approximating the dependency discovery -- 5.4 Generalizing functional dependencies -- 5.4.1 Denial constraints -- 5.4.2 Multivalued dependencies --
6. Use cases -- 6.1 Data exploration -- 6.2 Schema engineering -- 6.3 Data cleaning -- 6.4 Query optimization -- 6.5 Data integration --
7. Profiling non-relational data -- 7.1 XML -- 7.2 RDF -- 7.3 Time series -- 7.4 Graphs -- 7.5 Text --
8. Data profiling tools -- 8.1 Research prototypes -- 8.2 Commercial tools --
9. Data profiling challenges -- 9.1 Functional challenges -- 9.1.1 Profiling dynamic data -- 9.1.2 Interactive profiling -- 9.1.3 Profiling for integration -- 9.1.4 Interpreting profiling results -- 9.2 Non-functional challenges -- 9.2.1 Efficiency and scalability -- 9.2.2 Profiling on new architectures -- 9.2.3 Benchmarking profiling methods --
10. Conclusions -- Bibliography -- Authors' biographies.
Abstract: Data profiling refers to the activity of collecting data about data, i.e., metadata. Most IT professionals and researchers who work with data have engaged in data profiling, at least informally, to understand and explore an unfamiliar dataset or to determine whether a new dataset is appropriate for a particular task at hand. Data profiling results are also important in a variety of other situations, including query optimization, data integration, and data cleaning. Simple metadata are statistics, such as the number of rows and columns, schema and datatype information, the number of distinct values, statistical value distributions, and the number of null or empty values in each column. More complex types of metadata are statements about multiple columns and their correlation, such as candidate keys, functional dependencies, and other types of dependencies. This book provides a classification of the various types of profilable metadata, discusses popular data profiling tasks, and surveys state-of-the-art profiling algorithms. While most of the book focuses on tasks and algorithms for relational data profiling, we also briefly discuss systems and techniques for profiling non-relational data such as graphs and text. We conclude with a discussion of data profiling challenges and directions for future work in this area.
Item type: E books
Current location: PK Kelkar Library, IIT Kanpur
Status: Available
Barcode: EBKE832
Total holds: 0

Mode of access: World Wide Web.

System requirements: Adobe Acrobat Reader.

Part of: Synthesis digital library of engineering and computer science.

Includes bibliographical references (pages 113-134).

Abstract freely available; full-text restricted to subscribers or individual document purchasers.

Compendex

INSPEC

Google scholar

Google book search

Title from PDF title page (viewed on November 28, 2018).
