Approaches to ML Techniques on Real World Data
A Demo on Behavioral Analysis in Social Networking Sites.
--Venkata Ramana C
Real World Data
• Social Networking Sites
• Blogs
• Forums
• Tweets
• …
Aim
• Create an environment in which users set their own rules for how their data is shared on the Web.
Technique
• Active Learning
Active Learning
• The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer labeled training instances if it is allowed to choose the data from which it learns.
• An active learner may pose queries, in the form of unlabeled instances, to be labeled by an oracle (e.g., a human annotator).
Scenario
Uncertainty Sampling
• The technique used in the project is as follows:
• For each friend, the user is asked one of the questions below, followed by which profile features the user wants to share with that friend.
• How are you associated with your friend? (or)
• What do you have in common with your friend?
• 1. Personal
• 2. Don't know how (may take some time to decide)
• 3. We have ...x.y.z.... (specify) in common. [This will form a group.]
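A minimal sketch of how the answers to these questions could be turned into friend groups, assuming a hypothetical encoding: each answer is an option number (1, 2 or 3) plus the list of things the user has in common with that friend. The friend names are placeholders, and filing option 1 under a "personal" group is an assumption; friends answered with option 2 are set aside to be asked again later.

```python
from collections import defaultdict

# Hypothetical answers: friend -> (option, things in common).
# Option 1 = personal, 2 = don't know yet, 3 = specified common ground.
answers = {
    "friend_a": (3, ["project", "srm"]),
    "friend_b": (1, []),
    "friend_c": (2, []),
    "friend_d": (3, ["srm"]),
}

def build_groups(answers):
    """Turn per-friend answers into groups and collect undecided friends."""
    groups = defaultdict(set)   # group name -> friends in that group
    undecided = []              # option 2: ask again later
    for friend, (option, common) in answers.items():
        if option == 1:
            groups["personal"].add(friend)
        elif option == 2:
            undecided.append(friend)
        else:                   # option 3: each shared item forms a group
            for item in common:
                groups[item].add(friend)
    return dict(groups), undecided

groups, undecided = build_groups(answers)
print(groups)
print(undecided)  # ['friend_c']
```

A friend can land in several groups at once (friend_a is in both "project" and "srm"), which is what lets a group act as a Yes/No feature later on.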
Pseudo Algorithm
• Input: an initial small training set L and a pool of unlabeled data U.
• Use L to train the initial classifier C.
– Repeat:
  • Use the current classifier C to label all unlabeled examples in U.
  • Use the uncertainty sampling technique to select the m most informative unlabeled examples, and ask the oracle H to label them.
  • Augment L with these m new examples, and remove them from U.
  • Use L to retrain the current classifier C.
– Until the predefined stopping criterion SC is met.
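The loop above can be sketched in Python under toy assumptions: examples are single numbers, the hidden rule known to the oracle is "label 1 if x >= 0.62", the classifier C is a threshold halfway between the two classes labeled so far, and uncertainty is scored with least-confidence sampling (1 minus the probability of the predicted label). The stopping criterion SC is taken to be a fixed query budget. All helper names (`train`, `prob_one`, `least_confidence`, `active_learning`, `oracle`) are hypothetical.

```python
import math

def train(labeled):
    """Fit C: a threshold midway between the largest 0-example and smallest 1-example."""
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    return (max(zeros) + min(ones)) / 2.0

def prob_one(threshold, x, steepness=10.0):
    """C's estimate of P(label = 1 | x): a logistic curve centred on the threshold."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - threshold)))

def least_confidence(threshold, x):
    """Uncertainty score: 1 - probability of the predicted label (at most 0.5 here)."""
    p = prob_one(threshold, x)
    return 1.0 - max(p, 1.0 - p)

def active_learning(L, U, oracle, m=1, budget=5):
    L, U = list(L), list(U)
    C = train(L)                                  # initial classifier
    asked = 0
    while U and asked < budget:                   # stopping criterion SC: query budget
        U.sort(key=lambda x: least_confidence(C, x), reverse=True)
        batch, U = U[:m], U[m:]                   # m most informative examples, removed from U
        L.extend((x, oracle(x)) for x in batch)   # oracle H labels them; L is augmented
        asked += len(batch)
        C = train(L)                              # retrain on the augmented L
    return C, L

def oracle(x):
    """Stands in for the human annotator (hypothetical hidden rule)."""
    return 1 if x >= 0.62 else 0

# L starts with one example of each class; U is the grid 0.05, 0.10, ..., 0.95.
C, L = active_learning([(0.0, 0), (1.0, 1)], [i / 20 for i in range(1, 20)], oracle)
print(C, len(L))  # 0.625 7
```

Each query lands on the example closest to the current threshold, so the oracle is only asked about the ambiguous region; five answers are enough here to pin the boundary between 0.6 and 0.65.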
So When is This Useful?
Friend Groups
Active Learning for Privacy
Courtesy: Privacy Wizards for Social Networking Sites. WWW2010
Concepts
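• Gini impurity: the expected error rate if one of the outcomes in a set were randomly applied to one of the items in the set. Gini = 1 - sum of p(i)^2 for all outcomes.
• Entropy: how mixed up a set is. p(i) = frequency(outcome) = count(outcome) / count(total rows); Entropy = -(sum of p(i) x log2(p(i)) for all outcomes).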
Courtesy: Programming Collective Intelligence by Toby Segaran
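Both measures can be computed from the label column alone. A minimal sketch with hypothetical function names, applied to the DOB decisions from the demo (Share, NotShare, Share); it assumes a non-empty list of labels.

```python
import math
from collections import Counter

def gini_impurity(labels):
    """1 - sum of p(i)^2: chance an item is mislabeled by a label drawn from the set's distribution."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def entropy(labels):
    """-(sum of p(i) * log2(p(i))): 0 for a pure set, larger when labels are mixed."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

dob_labels = ["Share", "NotShare", "Share"]
print(round(gini_impurity(dob_labels), 3))  # 0.444
print(round(entropy(dob_labels), 3))        # 0.918
```

Both scores drop to 0 for a pure set such as ["Share", "Share"], which is why a tree builder prefers splits whose halves score low.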
CART (Classification and Regression Trees)
• Decision tree classifiers are easy to view and interpret.
• A trained tree can be read directly as a set of If-Then rules.
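A sketch of how a tree doubles as a set of If-Then rules, assuming a tuple representation: a leaf is a dict of label counts, and an inner node is `(column, value, true_branch, false_branch)`, read as "if row[column] == value". The tree itself is hand-built and hypothetical; only equality tests are used, which suits Yes/No columns.

```python
# A tree is either a leaf (dict of label counts) or a tuple
# (column, value, true_branch, false_branch).

def classify(row, tree):
    """Walk the If-Then tests from the root down to a leaf."""
    while not isinstance(tree, dict):
        col, value, tb, fb = tree
        tree = tb if row[col] == value else fb
    return tree

def rules(tree, conditions=()):
    """Flatten the tree into one If-Then rule per leaf."""
    if isinstance(tree, dict):
        cond = " and ".join(conditions) or "True"
        return [f"If {cond} then {tree}"]
    col, value, tb, fb = tree
    return (rules(tb, conditions + (f"col{col} == {value!r}",))
            + rules(fb, conditions + (f"col{col} != {value!r}",)))

# Hypothetical hand-built tree over two Yes/No group columns.
tree = (0, "Yes", {"Share": 2}, (1, "Yes", {"NotShare": 1}, {"Share": 1}))

print(classify(["No", "Yes"], tree))  # {'NotShare': 1}
for rule in rules(tree):
    print(rule)
# If col0 == 'Yes' then {'Share': 2}
# If col0 != 'Yes' and col1 == 'Yes' then {'NotShare': 1}
# If col0 != 'Yes' and col1 != 'Yes' then {'Share': 1}
```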
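Application Data
• Each row is a friend: the group columns record membership (Yes/No), and the last column records whether to share the DOB field (Share/NotShare).
• Id group1 group2 ..... DOB
• 123 Yes No ..... Share
• 124 No Yes ..... NotShare
• ...... ... ..... .....
• ...... ... ..... .....
• ...... ... ..... .....
• ...... ... ..... .....
• 129 Yes Yes ..... Share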
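CART grows its tree by splitting the rows in two on a single column test and keeping the split that leaves the purest halves. A minimal sketch of the splitting step on the three fully visible rows of the table (ids 123, 124 and 129, keeping only group1, group2 and the DOB decision); `divide_set` is a hypothetical helper name.

```python
def divide_set(rows, column, value):
    """Split rows into those whose `column` equals `value` and the rest."""
    matches = [row for row in rows if row[column] == value]
    rest = [row for row in rows if row[column] != value]
    return matches, rest

# [group1, group2, DOB decision]; elided rows and columns are left out.
rows = [
    ["Yes", "No", "Share"],     # id 123
    ["No", "Yes", "NotShare"],  # id 124
    ["Yes", "Yes", "Share"],    # id 129
]

yes_g1, no_g1 = divide_set(rows, 0, "Yes")
print([r[-1] for r in yes_g1])  # ['Share', 'Share']
print([r[-1] for r in no_g1])   # ['NotShare']
```

On these three rows the test "group1 == Yes" already separates the two decisions completely, so both halves have zero impurity.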
Demo
• We will use decision trees, trained on data about some of our friends in a social network, to set privacy preferences.
Demo Results:
• The groups [u'project', u'srm', u'sssg'] become the feature-vector columns [0, 1, 2] used by the CART algorithm.
• The goal is to decide 'Share' or 'NotShare' for each profile feature: DOB, zip, religion, phone, email.
• For DOB:
  • ['yes', 'No', 'No', 'Share'] (from Vasu)
  • ['No', 'Yes', 'No', 'NotShare'] (from Cigith)
  • ['No', 'No', 'Yes', 'Share'] (from Harish)
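A sketch of how each friend's row could be built from the group-to-column mapping. The membership sets are read off the DOB rows above (Vasu in project, Cigith in srm, Harish in sssg), which is an assumption about how those rows were produced; `to_row` is a hypothetical helper, and plain Python 3 strings replace the `u'...'` literals.

```python
groups = ["project", "srm", "sssg"]                       # feature names from the demo
feature_index = {g: i for i, g in enumerate(groups)}      # {'project': 0, 'srm': 1, 'sssg': 2}

# Assumed group membership per friend, and each friend's DOB decision.
membership = {"Vasu": {"project"}, "Cigith": {"srm"}, "Harish": {"sssg"}}
dob_decision = {"Vasu": "Share", "Cigith": "NotShare", "Harish": "Share"}

def to_row(friend):
    """One Yes/No entry per group column, followed by the Share/NotShare label."""
    row = ["No"] * len(groups)
    for g in membership[friend]:
        row[feature_index[g]] = "Yes"
    row.append(dob_decision[friend])
    return row

rows = [to_row(f) for f in ["Vasu", "Cigith", "Harish"]]
print(rows)
# [['Yes', 'No', 'No', 'Share'], ['No', 'Yes', 'No', 'NotShare'], ['No', 'No', 'Yes', 'Share']]
```

Keeping the label as the last element of each row lets a tree builder treat every other position as a candidate feature.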
Decision Tree for DOB:
0:Yes?
T-> {'NotShare': 1}
F-> {'Share': 2}
Zip Code:
• [['Yes', 'No', 'No', 'Share'], ['No', 'Yes', 'No', 'NotShare'], ['No', 'No', 'Yes', 'Share']]
1:Yes?
T-> {'NotShare': 1}
F-> {'Share': 2}
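A sketch of a greedy growth step that produces trees in this format, run on the Zip Code rows. Assumptions: the split score is information gain measured with entropy, every feature column is Yes/No so the only test tried is `column == 'Yes'` (testing 'No' would give the same split with the branches swapped), and growth stops when no test improves the gain. The helper names are hypothetical and this is not the demo's original code; on these rows it picks column 1 (srm), matching the Zip Code tree above.

```python
import math
from collections import Counter

def entropy(rows):
    """Entropy of the label column (last element of each row)."""
    counts = Counter(r[-1] for r in rows)
    total = len(rows)
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def build_tree(rows):
    """Leaf = dict of label counts; inner node = (column, 'Yes', true_branch, false_branch)."""
    current = entropy(rows)
    best_gain, best = 0.0, None
    for col in range(len(rows[0]) - 1):           # last column is the label
        tb = [r for r in rows if r[col] == "Yes"]
        fb = [r for r in rows if r[col] != "Yes"]
        if not tb or not fb:
            continue                              # this test does not split the rows
        p = len(tb) / len(rows)
        gain = current - p * entropy(tb) - (1 - p) * entropy(fb)
        if gain > best_gain:
            best_gain, best = gain, (col, tb, fb)
    if best is None:                              # no test improves purity: make a leaf
        return dict(Counter(r[-1] for r in rows))
    col, tb, fb = best
    return (col, "Yes", build_tree(tb), build_tree(fb))

def print_tree(tree, indent=""):
    if isinstance(tree, dict):
        print(tree)
        return
    col, value, tb, fb = tree
    print(f"{col}:{value}?")
    print(indent + "T->", end=" ")
    print_tree(tb, indent + "  ")
    print(indent + "F->", end=" ")
    print_tree(fb, indent + "  ")

zip_rows = [["Yes", "No", "No", "Share"],
            ["No", "Yes", "No", "NotShare"],
            ["No", "No", "Yes", "Share"]]
print_tree(build_tree(zip_rows))
# 1:Yes?
# T-> {'NotShare': 1}
# F-> {'Share': 2}
```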
References
• Toby Segaran. Programming Collective Intelligence. O'Reilly, 2007.
• Lujun Fang and Kristen LeFevre. Privacy Wizards for Social Networking Sites. WWW 2010.