000 | 05679nam a22007451i 4500 | ||
---|---|---|---|
001 | 8540360 | ||
003 | IEEE | ||
005 | 20200413152928.0 | ||
006 | m eo d | ||
007 | cr cn |||m|||a | ||
008 | 181128s2019 caua foab 000 0 eng d | ||
020 | _a9781681734477 _qebook | ||
020 | _z9781681734484 _qhardcover | ||
020 | _z9781681734460 _qpaperback | ||
024 | 7 | _a10.2200/S00878ED1V01Y201810DTM052 _2doi | |
035 | _a(CaBNVSL)swl000408790 | ||
035 | _a(OCoLC)1076493845 | ||
040 | _aCaBNVSL _beng _erda _cCaBNVSL _dCaBNVSL | ||
050 | 4 | _aZ666.7 _b.A233 2019 | |
082 | 0 | 4 | _a025.3 _223
100 | 1 | _aAbedjan, Ziawasch, _eauthor. | |
245 | 1 | 0 | _aData profiling / _cZiawasch Abedjan, Lukasz Golab, Felix Naumann, Thorsten Papenbrock.
264 | 1 | _a[San Rafael, California] : _bMorgan & Claypool, _c2019. | |
300 | _a1 PDF (xviii, 136 pages) : _billustrations. | ||
336 | _atext _2rdacontent | ||
337 | _aelectronic _2isbdmedia | ||
338 | _aonline resource _2rdacarrier | ||
490 | 1 | _aSynthesis lectures on data management, _x2153-5426 ; _v# 52 | |
538 | _aMode of access: World Wide Web. | ||
538 | _aSystem requirements: Adobe Acrobat Reader. | ||
500 | _aPart of: Synthesis digital library of engineering and computer science. | ||
504 | _aIncludes bibliographical references (pages 113-134). | ||
505 | 0 | _a1. Discovering metadata -- 1.1 Motivation and overview -- 1.2 Data profiling and data mining -- 1.3 Use cases -- 1.4 Organization of this book -- | |
505 | 8 | _a2. Data profiling tasks -- 2.1 Single-column analysis -- 2.2 Dependency discovery -- 2.3 Relaxed dependencies -- | |
505 | 8 | _a3. Single-column analysis -- 3.1 Cardinalities -- 3.2 Value distributions -- 3.3 Data types, patterns, and domains -- 3.4 Data completeness -- 3.5 Approximate statistics -- 3.6 Summary and discussion -- | |
505 | 8 | _a4. Dependency discovery -- 4.1 Dependency definitions -- 4.2 Search space and data structures -- 4.3 Discovering unique column combinations -- 4.4 Discovering functional dependencies -- 4.5 Discovering inclusion dependencies -- | |
505 | 8 | _a5. Relaxed and other dependencies -- 5.1 Relaxing the extent of a dependency -- 5.1.1 Partial dependencies -- 5.1.2 Conditional dependencies -- 5.2 Relaxing attribute comparisons -- 5.2.1 Metric and matching dependencies -- 5.2.2 Order and sequential dependencies -- 5.3 Approximating the dependency discovery -- 5.4 Generalizing functional dependencies -- 5.4.1 Denial constraints -- 5.4.2 Multivalued dependencies -- | |
505 | 8 | _a6. Use cases -- 6.1 Data exploration -- 6.2 Schema engineering -- 6.3 Data cleaning -- 6.4 Query optimization -- 6.5 Data integration -- | |
505 | 8 | _a7. Profiling non-relational data -- 7.1 XML -- 7.2 RDF -- 7.3 Time series -- 7.4 Graphs -- 7.5 Text -- | |
505 | 8 | _a8. Data profiling tools -- 8.1 Research prototypes -- 8.2 Commercial tools -- | |
505 | 8 | _a9. Data profiling challenges -- 9.1 Functional challenges -- 9.1.1 Profiling dynamic data -- 9.1.2 Interactive profiling -- 9.1.3 Profiling for integration -- 9.1.4 Interpreting profiling results -- 9.2 Non-functional challenges -- 9.2.1 Efficiency and scalability -- 9.2.2 Profiling on new architectures -- 9.2.3 Benchmarking profiling methods -- | |
505 | 8 | _a10. Conclusions -- Bibliography -- Authors' biographies. | |
506 | _aAbstract freely available; full-text restricted to subscribers or individual document purchasers. | ||
510 | 0 | _aCompendex | |
510 | 0 | _aINSPEC | |
510 | 0 | _aGoogle Scholar | |
510 | 0 | _aGoogle Book Search | |
520 | 3 | _aData profiling refers to the activity of collecting data about data, i.e., metadata. Most IT professionals and researchers who work with data have engaged in data profiling, at least informally, to understand and explore an unfamiliar dataset or to determine whether a new dataset is appropriate for a particular task at hand. Data profiling results are also important in a variety of other situations, including query optimization, data integration, and data cleaning. Simple metadata are statistics, such as the number of rows and columns, schema and datatype information, the number of distinct values, statistical value distributions, and the number of null or empty values in each column. More complex types of metadata are statements about multiple columns and their correlation, such as candidate keys, functional dependencies, and other types of dependencies. This book provides a classification of the various types of profilable metadata, discusses popular data profiling tasks, and surveys state-of-the-art profiling algorithms. While most of the book focuses on tasks and algorithms for relational data profiling, we also briefly discuss systems and techniques for profiling non-relational data such as graphs and text. We conclude with a discussion of data profiling challenges and directions for future work in this area. | |
530 | _aAlso available in print. | ||
588 | _aTitle from PDF title page (viewed on November 28, 2018). | ||
650 | 0 | _aMetadata. | |
650 | 0 | _aData mining. | |
653 | _adata analysis | ||
653 | _adata modeling | ||
653 | _adependency discovery | ||
653 | _adata mining | ||
653 | _ametadata | ||
700 | 1 | _aGolab, Lukasz, _d1978-, _eauthor. | |
700 | 1 | _aNaumann, Felix, _eauthor. | |
700 | 1 | _aPapenbrock, Thorsten, _eauthor. | |
776 | 0 | 8 | _iPrint version: _z9781681734460 _z9781681734484
830 | 0 | _aSynthesis digital library of engineering and computer science. | |
830 | 0 | _aSynthesis lectures on data management ; _v# 52. _x2153-5426 | |
856 | 4 | 2 | _3Abstract with links to resource _uhttps://ieeexplore.ieee.org/servlet/opac?bknumber=8540360
999 | _c562332 _d562332 | ||