MARC View

000			05679nam a22007451i 4500
001			8540360
003			IEEE
005			20200413152928.0
006			m eo d
007			cr cn |||m|||a
008			181128s2019 caua foab 000 0 eng d
020			_a9781681734477 _qebook
020			_z9781681734484 _qhardcover
020			_z9781681734460 _qpaperback
024	7		_a10.2200/S00878ED1V01Y201810DTM052 _2doi
035			_a(CaBNVSL)swl000408790
035			_a(OCoLC)1076493845
040			_aCaBNVSL _beng _erda _cCaBNVSL _dCaBNVSL
050		4	_aZ666.7 _b.A233 2019
082	0	4	_a025.3 _223
100	1		_aAbedjan, Ziawasch, _eauthor.
245	1	0	_aData profiling / _cZiawasch Abedjan, Lukasz Golab, Felix Naumann, Thorsten Papenbrock.
264		1	_a[San Rafael, California] : _bMorgan & Claypool, _c2019.
300			_a1 PDF (xviii, 136 pages) : _billustrations.
336			_atext _2rdacontent
337			_aelectronic _2isbdmedia
338			_aonline resource _2rdacarrier
490	1		_aSynthesis lectures on data management, _x2153-5426 ; _v# 52
538			_aMode of access: World Wide Web.
538			_aSystem requirements: Adobe Acrobat Reader.
500			_aPart of: Synthesis digital library of engineering and computer science.
504			_aIncludes bibliographical references (pages 113-134).
505	0		_a1. Discovering metadata -- 1.1 Motivation and overview -- 1.2 Data profiling and data mining -- 1.3 Use cases -- 1.4 Organization of this book --
505	8		_a2. Data profiling tasks -- 2.1 Single-column analysis -- 2.2 Dependency discovery -- 2.3 Relaxed dependencies --
505	8		_a3. Single-column analysis -- 3.1 Cardinalities -- 3.2 Value distributions -- 3.3 Data types, patterns, and domains -- 3.4 Data completeness -- 3.5 Approximate statistics -- 3.6 Summary and discussion --
505	8		_a4. Dependency discovery -- 4.1 Dependency definitions -- 4.2 Search space and data structures -- 4.3 Discovering unique column combinations -- 4.4 Discovering functional dependencies -- 4.5 Discovering inclusion dependencies --
505	8		_a5. Relaxed and other dependencies -- 5.1 Relaxing the extent of a dependency -- 5.1.1 Partial dependencies -- 5.1.2 Conditional dependencies -- 5.2 Relaxing attribute comparisons -- 5.2.1 Metric and matching dependencies -- 5.2.2 Order and sequential dependencies -- 5.3 Approximating the dependency discovery -- 5.4 Generalizing functional dependencies -- 5.4.1 Denial constraints -- 5.4.2 Multivalued dependencies --
505	8		_a6. Use cases -- 6.1 Data exploration -- 6.2 Schema engineering -- 6.3 Data cleaning -- 6.4 Query optimization -- 6.5 Data integration --
505	8		_a7. Profiling non-relational data -- 7.1 XML -- 7.2 RDF -- 7.3 Time series -- 7.4 Graphs -- 7.5 Text --
505	8		_a8. Data profiling tools -- 8.1 Research prototypes -- 8.2 Commercial tools --
505	8		_a9. Data profiling challenges -- 9.1 Functional challenges -- 9.1.1 Profiling dynamic data -- 9.1.2 Interactive profiling -- 9.1.3 Profiling for integration -- 9.1.4 Interpreting profiling results -- 9.2 Non-functional challenges -- 9.2.1 Efficiency and scalability -- 9.2.2 Profiling on new architectures -- 9.2.3 Benchmarking profiling methods --
505	8		_a10. Conclusions -- Bibliography -- Authors' biographies.
506			_aAbstract freely available; full-text restricted to subscribers or individual document purchasers.
510	0		_aCompendex
510	0		_aINSPEC
510	0		_aGoogle scholar
510	0		_aGoogle book search
520	3		_aData profiling refers to the activity of collecting data about data, i.e., metadata. Most IT professionals and researchers who work with data have engaged in data profiling, at least informally, to understand and explore an unfamiliar dataset or to determine whether a new dataset is appropriate for a particular task at hand. Data profiling results are also important in a variety of other situations, including query optimization, data integration, and data cleaning. Simple metadata are statistics, such as the number of rows and columns, schema and datatype information, the number of distinct values, statistical value distributions, and the number of null or empty values in each column. More complex types of metadata are statements about multiple columns and their correlation, such as candidate keys, functional dependencies, and other types of dependencies. This book provides a classification of the various types of profilable metadata, discusses popular data profiling tasks, and surveys state-of-the-art profiling algorithms. While most of the book focuses on tasks and algorithms for relational data profiling, we also briefly discuss systems and techniques for profiling non-relational data such as graphs and text. We conclude with a discussion of data profiling challenges and directions for future work in this area.
530			_aAlso available in print.
588			_aTitle from PDF title page (viewed on November 28, 2018).
650		0	_aMetadata.
650		0	_aData mining.
653			_adata analysis
653			_adata modeling
653			_adependency discovery
653			_adata mining
653			_ametadata
700	1		_aGolab, Lukasz, _d1978-, _eauthor.
700	1		_aNaumann, Felix, _eauthor.
700	1		_aPapenbrock, Thorsten, _eauthor.
776	0	8	_iPrint version: _z9781681734460 _z9781681734484
830		0	_aSynthesis digital library of engineering and computer science.
830		0	_aSynthesis lectures on data management ; _v# 52. _x2153-5426
856	4	2	_3Abstract with links to resource _uhttps://ieeexplore.ieee.org/servlet/opac?bknumber=8540360
999			_c562332 _d562332