[Virtual Presenter] 18CSE355T Data Mining and Analytics.
[Virtual Presenter] Unit 1: Why data mining? What is data mining? Kinds of data meant for mining; kinds of patterns that can be mined; applications suitable for data mining; issues in data mining; data objects and attribute types; statistical descriptions of data; need for data preprocessing and data quality; data cleaning; data integration; data reduction; data transformation; data cube and its usage.
[Audio] Why data mining? The explosive growth of data: from terabytes to petabytes. Data collection and data availability: automated data collection tools, database systems, the Web, and a computerized society. Major sources of abundant data: business (Web, e-commerce, transactions, stocks, …); science (remote sensing, bioinformatics, scientific simulation, …); society and everyone (news, digital cameras, YouTube). We are drowning in data, but starving for knowledge! “Necessity is the mother of invention”: data mining is the automated analysis of massive data sets.
[Audio] What is data mining? Data mining (knowledge discovery from data) is the extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data. Is "data mining" a misnomer (an incorrectly applied name)? Alternative names: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging (misuse of data analysis to find patterns), information harvesting, business intelligence, etc.
[Audio] Knowledge discovery process. Data preprocessing comprises four steps: (1) data cleaning (remove noise and inconsistent data); (2) data integration (multiple data sources may be combined); (3) data selection (data relevant to the analysis task are retrieved from the database); (4) data transformation (data are transformed or consolidated into forms appropriate for mining). Once preprocessing is done, three steps follow: (5) data mining (an essential process where intelligent methods are applied to extract data patterns); (6) pattern evaluation (identify the truly interesting patterns); (7) knowledge presentation (mined knowledge is presented to the user with visualization or representation techniques).
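The knowledge discovery steps can be sketched as a small pipeline. This is only an illustrative sketch, not a method from the lecture. The two record lists (store_a, store_b), the function names, and the 0.5 support threshold are all assumptions made up for the example. Counting co-occurring item pairs stands in for the "intelligent methods" of the data mining step.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: two sources of purchase records (invented for illustration).
store_a = [
    {"customer": "c1", "item": "bread"},
    {"customer": "c1", "item": "milk"},
    {"customer": "c2", "item": "bread"},
    {"customer": "c2", "item": None},  # noisy record: the item is missing
]
store_b = [
    {"customer": "c2", "item": "milk"},
    {"customer": "c3", "item": "bread"},
    {"customer": "c3", "item": "milk"},
    {"customer": "c3", "item": "eggs"},
]

def clean(records):
    """Data cleaning: drop records with missing values."""
    return [r for r in records if r["customer"] and r["item"]]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records):
    """Data selection: keep only the fields relevant to the task."""
    return [(r["customer"], r["item"]) for r in records]

def transform(pairs):
    """Data transformation: consolidate rows into one item set per customer."""
    baskets = {}
    for customer, item in pairs:
        baskets.setdefault(customer, set()).add(item)
    return baskets

def mine(baskets):
    """Data mining: count how many baskets contain each pair of items."""
    counts = Counter()
    for items in baskets.values():
        counts.update(combinations(sorted(items), 2))
    return counts

def evaluate(counts, n_baskets, min_support=0.5):
    """Pattern evaluation: keep pairs whose support meets the (assumed) threshold."""
    return {pair: c / n_baskets for pair, c in counts.items()
            if c / n_baskets >= min_support}

def run_pipeline():
    records = integrate(clean(store_a), clean(store_b))
    baskets = transform(select(records))
    return evaluate(mine(baskets), len(baskets))

if __name__ == "__main__":
    # Knowledge presentation: show the surviving patterns to the user.
    for pair, support in run_pipeline().items():
        print(pair, f"support={support:.2f}")
```

With this toy data, only the pair (bread, milk) survives evaluation, because it appears in all three customer baskets. A real system would replace each function with far more capable tooling, but the order of the stages matches the process described above.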