Quick R Tutorial for Data Mining - cis.csuohio.edu

Quick R Tutorial for Data Mining

CIS 660 Data Mining

Sunnie Chung

Page 1: Quick R Tutorial for Data Mining - cis.csuohio.edu

Quick R Tutorial for Data Mining

Page 2: Quick R Tutorial for Data Mining - cis.csuohio.edu

Getting data from Twitter Streaming API

• Install Tweepy

Page 3: Quick R Tutorial for Data Mining - cis.csuohio.edu

Getting data from Twitter Streaming API

• Install “pip”

To install pip, first download “get-pip.py”, then run the following command:

• “sudo python get-pip.py”

Page 4: Quick R Tutorial for Data Mining - cis.csuohio.edu

Getting data from Twitter Streaming API

• “sudo pip install tweepy”

Page 5: Quick R Tutorial for Data Mining - cis.csuohio.edu

Getting data from Twitter Streaming API

• Now, create a Twitter account.

• Go to https://apps.twitter.com/ and log in with your Twitter credentials.

• Click "Create New App"

• Fill out the form, agree to the terms, and click "Create your Twitter application"

• On the next page, click the "API keys" tab, and copy your "API key" and "API secret".

• Scroll down and click "Create my access token", and copy your "Access token" and "Access token secret".
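
One way to keep the four copied values out of source code is to read them from environment variables. The sketch below is only an illustration: the variable names and the `load_credentials` helper are hypothetical and are not part of the slides or of Tweepy.

```python
import os

# Hypothetical environment variable names; any naming scheme works.
REQUIRED_KEYS = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Collect the four credentials from a mapping, failing if any is missing."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}


# Example with placeholder values instead of real credentials.
creds = load_credentials({key: "placeholder" for key in REQUIRED_KEYS})
print(sorted(creds))
```

Calling `load_credentials()` with no argument reads from the real environment instead of the placeholder mapping.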

Page 6: Quick R Tutorial for Data Mining - cis.csuohio.edu

Connecting to Twitter streaming API and downloading data

Page 7: Quick R Tutorial for Data Mining - cis.csuohio.edu

Page 8: Quick R Tutorial for Data Mining - cis.csuohio.edu

Extract user specific data

• Each JSON line contains user data, for example:

• { ,User:{ID:{ },Name:{ },------ }, }

• Extract the portion below from the JSON data:

• User:{ID:{ }, Name:{ },----------- }

• From the line above, extract individual attributes such as:

• ID:{ }

• Name:{ }

• Language:{ }

• Then get the value of each attribute from the JSON data, such as 12345 and Sunnie in the example below:

• ID:{12345}

• Name:{Sunnie}
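
The extraction steps above can be sketched in Python with the standard `json` module. This is a minimal sketch, not the code from the slides: it assumes each downloaded line is one JSON object and that the keys are `user`, `id`, `name`, and `lang` (the slide writes them schematically as User, ID, Name, Language). The helper name `extract_user_fields` and the sample line are hypothetical.

```python
import json


def extract_user_fields(json_line, fields=("id", "name", "lang")):
    """Return the selected attributes of the "user" object in one JSON line."""
    record = json.loads(json_line)
    # Step 1: pull out the user portion of the record.
    user = record.get("user") if isinstance(record, dict) else None
    if not isinstance(user, dict):
        return None  # the line carries no user object
    # Step 2: keep only the individual attributes and their values.
    return {field: user.get(field) for field in fields}


# Hypothetical sample standing in for one line of downloaded data.
sample_line = '{"text": "hello", "user": {"id": 12345, "name": "Sunnie", "lang": "en"}}'
user_info = extract_user_fields(sample_line)
print(user_info["id"], user_info["name"])  # prints: 12345 Sunnie
```

Lines without a user object return `None`, so a loop over a downloaded file can skip them.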

Page 9: Quick R Tutorial for Data Mining - cis.csuohio.edu

Page 10: Quick R Tutorial for Data Mining - cis.csuohio.edu

Data Mining with Twitter Data

Page 11: Quick R Tutorial for Data Mining - cis.csuohio.edu

Page 12: Quick R Tutorial for Data Mining - cis.csuohio.edu

Twitter Data

Page 13: Quick R Tutorial for Data Mining - cis.csuohio.edu

Twitter Data

Page 14: Quick R Tutorial for Data Mining - cis.csuohio.edu

Association Rule Mining

Page 15: Quick R Tutorial for Data Mining - cis.csuohio.edu

• Data Preprocessing

Page 16: Quick R Tutorial for Data Mining - cis.csuohio.edu

Association Rule Mining

Page 17: Quick R Tutorial for Data Mining - cis.csuohio.edu

Association Rule Mining

Page 18: Quick R Tutorial for Data Mining - cis.csuohio.edu

Association Rule Mining

Page 19: Quick R Tutorial for Data Mining - cis.csuohio.edu

Association Rule Mining

Page 20: Quick R Tutorial for Data Mining - cis.csuohio.edu

Association Rule Mining

• Removing Redundancy

Page 21: Quick R Tutorial for Data Mining - cis.csuohio.edu

Association Rule Mining

Page 22: Quick R Tutorial for Data Mining - cis.csuohio.edu

Association Rule Mining

Page 23: Quick R Tutorial for Data Mining - cis.csuohio.edu

Association Rule Mining

Page 24: Quick R Tutorial for Data Mining - cis.csuohio.edu

Association Rule Mining

Page 25: Quick R Tutorial for Data Mining - cis.csuohio.edu

Decision Tree

Page 26: Quick R Tutorial for Data Mining - cis.csuohio.edu

Decision Tree

Page 27: Quick R Tutorial for Data Mining - cis.csuohio.edu

Decision Tree

• Prune decision tree:

Page 28: Quick R Tutorial for Data Mining - cis.csuohio.edu

Neural Networks

Page 29: Quick R Tutorial for Data Mining - cis.csuohio.edu

Neural Networks

Page 30: Quick R Tutorial for Data Mining - cis.csuohio.edu

Neural Networks

Page 31: Quick R Tutorial for Data Mining - cis.csuohio.edu

Neural Networks

Page 32: Quick R Tutorial for Data Mining - cis.csuohio.edu

Neural Networks

Page 33: Quick R Tutorial for Data Mining - cis.csuohio.edu

Neural Networks

Page 34: Quick R Tutorial for Data Mining - cis.csuohio.edu

K-Nearest Neighbor Classification

Page 35: Quick R Tutorial for Data Mining - cis.csuohio.edu

K-Nearest Neighbor Classification

• Data Preprocessing

Page 36: Quick R Tutorial for Data Mining - cis.csuohio.edu

K-Nearest Neighbor Classification

Page 37: Quick R Tutorial for Data Mining - cis.csuohio.edu

K-Nearest Neighbor Classification

Page 38: Quick R Tutorial for Data Mining - cis.csuohio.edu

K-Nearest Neighbor Classification

Page 39: Quick R Tutorial for Data Mining - cis.csuohio.edu

Bayesian Classifier

Page 40: Quick R Tutorial for Data Mining - cis.csuohio.edu

Bayesian Classifier

Page 41: Quick R Tutorial for Data Mining - cis.csuohio.edu

Bayesian Classifier

Page 42: Quick R Tutorial for Data Mining - cis.csuohio.edu

Bayesian Classifier

Page 43: Quick R Tutorial for Data Mining - cis.csuohio.edu

Support Vector Machine

Page 44: Quick R Tutorial for Data Mining - cis.csuohio.edu

Support Vector Machine

Page 45: Quick R Tutorial for Data Mining - cis.csuohio.edu

Support Vector Machine

Page 46: Quick R Tutorial for Data Mining - cis.csuohio.edu

Support Vector Machine

Page 47: Quick R Tutorial for Data Mining - cis.csuohio.edu

K-means clustering

• Data Preprocessing

Page 48: Quick R Tutorial for Data Mining - cis.csuohio.edu

K-means clustering

Page 49: Quick R Tutorial for Data Mining - cis.csuohio.edu

K-means clustering

Page 50: Quick R Tutorial for Data Mining - cis.csuohio.edu

K-means clustering

Page 51: Quick R Tutorial for Data Mining - cis.csuohio.edu

K-means clustering

Page 52: Quick R Tutorial for Data Mining - cis.csuohio.edu

K-Medoids Clustering

Page 53: Quick R Tutorial for Data Mining - cis.csuohio.edu

K-Medoids Clustering

Page 54: Quick R Tutorial for Data Mining - cis.csuohio.edu

K-Medoids Clustering

Page 55: Quick R Tutorial for Data Mining - cis.csuohio.edu

K-Medoids Clustering

Page 56: Quick R Tutorial for Data Mining - cis.csuohio.edu

K-Medoids Clustering

Page 57: Quick R Tutorial for Data Mining - cis.csuohio.edu

K-Medoids Clustering

Page 58: Quick R Tutorial for Data Mining - cis.csuohio.edu

Hierarchical Clustering

Page 59: Quick R Tutorial for Data Mining - cis.csuohio.edu

Hierarchical Clustering

Page 60: Quick R Tutorial for Data Mining - cis.csuohio.edu

Hierarchical Clustering

Page 61: Quick R Tutorial for Data Mining - cis.csuohio.edu

Hierarchical Clustering

Page 62: Quick R Tutorial for Data Mining - cis.csuohio.edu

Hierarchical Clustering

Page 63: Quick R Tutorial for Data Mining - cis.csuohio.edu

Hierarchical Clustering

Page 64: Quick R Tutorial for Data Mining - cis.csuohio.edu

Hierarchical Clustering

Page 65: Quick R Tutorial for Data Mining - cis.csuohio.edu

Hierarchical Clustering

Page 66: Quick R Tutorial for Data Mining - cis.csuohio.edu

Cluster Validation

Page 67: Quick R Tutorial for Data Mining - cis.csuohio.edu

Cluster Validation

Page 68: Quick R Tutorial for Data Mining - cis.csuohio.edu

Cluster Validation

Page 69: Quick R Tutorial for Data Mining - cis.csuohio.edu

Cluster Validation

Page 70: Quick R Tutorial for Data Mining - cis.csuohio.edu

Density based Clustering

Page 71: Quick R Tutorial for Data Mining - cis.csuohio.edu

Density based Clustering

Page 72: Quick R Tutorial for Data Mining - cis.csuohio.edu

Density based Clustering

Page 73: Quick R Tutorial for Data Mining - cis.csuohio.edu

Density based Clustering

Page 74: Quick R Tutorial for Data Mining - cis.csuohio.edu

Density based Clustering

Page 75: Quick R Tutorial for Data Mining - cis.csuohio.edu

Outlier Detection

Page 76: Quick R Tutorial for Data Mining - cis.csuohio.edu

Outlier Detection

Page 77: Quick R Tutorial for Data Mining - cis.csuohio.edu

Term frequency – Inverse document frequency Weighting

Page 78: Quick R Tutorial for Data Mining - cis.csuohio.edu

Term frequency – Inverse document frequency Weighting

Page 79: Quick R Tutorial for Data Mining - cis.csuohio.edu

Term frequency – Inverse document frequency Weighting

Page 80: Quick R Tutorial for Data Mining - cis.csuohio.edu

Term frequency – Inverse document frequency Weighting

Page 81: Quick R Tutorial for Data Mining - cis.csuohio.edu

Term frequency – Inverse document frequency Weighting
