IV - CSE - Data Warehousing and Data Mining


  • 7/27/2019 IV - CSE - Data Warehousing and Data Mining



    IV Year B.Tech. CSE - I Sem


    L T/P/D C
    4 1/-/- 4


    UNIT I
    Introduction: Fundamentals of data mining, Data Mining Functionalities, Classification of Data Mining systems, Data Mining Task primitives, Integration of a Data Mining System with a Database or a Data Warehouse System, Major issues in Data Mining.
    Data Preprocessing: Need for Preprocessing the Data, Data Cleaning, Data Integration and Transformation, Data Reduction, Discretization and Concept Hierarchy Generation.

    UNIT II
    Data Warehouse and OLAP Technology for Data Mining: Data Warehouse, Multidimensional Data Model, Data Warehouse Architecture, Data Warehouse Implementation, Further Development of Data Cube Technology, From Data Warehousing to Data Mining.

    Data Cube Computation and Data Generalization: Efficient Methods for Data Cube Computation, Further Development of Data Cube and OLAP Technology, Attribute-Oriented Induction.

    UNIT III
    Mining Frequent Patterns, Associations and Correlations: Basic Concepts, Efficient and Scalable Frequent Itemset Mining Methods, Mining various kinds of Association Rules, From Association Mining to Correlation Analysis, Constraint-Based Association Mining.

    UNIT IV
    Classification and Prediction: Issues Regarding Classification and Prediction, Classification by Decision Tree Induction, Bayesian Classification, Rule-Based Classification, Classification by Backpropagation, Support Vector Machines, Associative Classification, Lazy Learners, Other Classification Methods, Prediction, Accuracy and Error measures, Evaluating the accuracy of a Classifier or a Predictor, Ensemble Methods.

    UNIT V
    Cluster Analysis Introduction: Types of Data in Cluster Analysis, A Categorization of Major Clustering Methods, Partitioning Methods, Hierarchical Methods, Density-Based Methods, Grid-Based Methods.



    w.e.f. 2010-2011 academic year


    DATA WAREHOUSING AND DATA MINING

    Unit-I:
    Introduction to Data Mining: What is data mining, motivating challenges, origins of data mining, data mining tasks, Types of Data - attributes and measurements, types of data sets, Data Quality (Tan)

    Unit-II:
    Data preprocessing, Measures of Similarity and Dissimilarity: Basics, similarity and dissimilarity between simple attributes, dissimilarities between data objects, similarities between data objects, examples of proximity measures: similarity measures for binary data, Jaccard coefficient, Cosine similarity, Extended Jaccard coefficient, Correlation, Exploring Data: Data Set, Summary Statistics (Tan)

    Unit-III:
    Data Warehouse: basic concepts, Data Warehousing Modeling: Data Cube and OLAP, Data Warehouse implementation: efficient data cube computation, partial materialization, indexing OLAP data, efficient processing of OLAP queries. (H & K)

    Unit-IV:
    Classification: Basic Concepts, General approach to solving a classification problem, Decision Tree induction: working of decision tree, building a decision tree, methods for expressing attribute test conditions, measures for selecting the best split, Algorithm for decision tree induction.
    Model overfitting: due to presence of noise, due to lack of representative samples, evaluating the performance of classifier: holdout method, random subsampling, cross-validation, bootstrap. (Tan)

    Unit-V:
    Classification - Alternative techniques: Bayesian Classifier: Bayes theorem, using Bayes theorem for classification, Naïve Bayes classifier, Bayes error rate, Bayesian Belief Networks: Model representation, model building (Tan)

    Unit-VI:
    Association Analysis: Problem Definition, Frequent Item-set generation - The Apriori principle, Frequent Item set generation in the Apriori algorithm, candidate generation and pruning, support counting (eluding support counting using a Hash tree), Rule generation, compact representation of frequent item sets, FP-Growth Algorithm. (Tan)

  • 7/27/2019 IV - CSE - Data Warehousing and Data Mining



    Unit-VII:
    Overview - types of clustering, Basic K-means, K-means - additional issues, Bisecting k-means, k-means and different types of clusters, strengths and weaknesses, k-means as an optimization problem.

    Unit-VIII:
    Agglomerative Hierarchical clustering, basic agglomerative hierarchical clustering algorithm, specific techniques, DBSCAN: Traditional density: center-based approach, strengths and weaknesses (Tan)

    TEXT BOOKS:

    1. Introduction to Data Mining: Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Pearson.
    2. Data Mining: Concepts and Techniques, 3/e, Jiawei Han, Micheline Kamber, Elsevier.

    REFERENCE BOOKS:
    1. Introduction to Data Mining with Case Studies, 2nd ed.: GK Gupta, PHI.
    2. Data Mining: Introductory and Advanced Topics: Dunham, Sridhar, Pearson.
    3. Data Warehousing, Data Mining & OLAP: Alex Berson, Stephen J Smith, TMH.
    4. Data Mining Theory and Practice: Soman, Diwakar, Ajay, PHI, 2006.