JIIT 2013-14 Project Presentation: Aniket Mishra

Major Project

Event Based News Clustering

Submitted By: Aniket Mishra

Problem Statement:
• To implement a clustering system that groups related news into the same cluster, so that one can follow what is happening from one event to the next. In short, the task is to implement an event-based news clustering system using a clustering algorithm.

Implementation Steps Followed:
• Crawled news data about the election campaign using the Bing API over different time periods.
• Used the sub-categories AAP, BJP and Congress.
• Applied k-means first, with 10 clusters.
• Then applied modified k-means to the same data to improve the clustering results.
• Implemented the algorithms using TF-IDF weighting, centroid calculation and cosine similarity.
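The steps above (TF-IDF weighting, cosine similarity, centroid calculation, k-means) can be sketched as follows. This is a minimal illustration, not the project's code: the whitespace tokenizer, the sparse `{term: weight}` vectors and the helper names (`tfidf_vectors`, `cosine`, `centroid`, `kmeans`) are assumptions, and the optional `init` argument is a hypothetical hook for fixing the starting documents instead of sampling them at random.

```python
import math
import random
from collections import Counter, defaultdict

def tokenize(text):
    # Assumed tokenizer: lowercase + whitespace split (no stop-word removal).
    return text.lower().split()

def tfidf_vectors(docs):
    """Turn each document into a sparse TF-IDF vector {term: weight}."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Length-normalised term frequency times inverse document frequency.
        vectors.append({t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Mean of sparse vectors: the cluster centre used in the update step."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def kmeans(vectors, k, iterations=20, seed=0, init=None):
    """Lloyd-style k-means with cosine similarity; `init` optionally fixes start docs."""
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(range(len(vectors)), k)
    centroids = [dict(vectors[i]) for i in start]
    labels = []
    for _ in range(iterations):
        # Assignment step: every document joins its most similar centroid.
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        # Update step: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[j] = centroid(members)
    return labels
```

For example, with the four headlines `"rahul gandhi congress rally"`, `"rahul gandhi congress punjab"`, `"modi bjp campaign varanasi"` and `"modi bjp campaign gujarat"`, calling `kmeans(tfidf_vectors(docs), 2, init=[0, 2])` returns `[0, 0, 1, 1]`: the two Congress items and the two BJP items end up in separate clusters.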

Table 1: Comparison of clustering results

Algorithm          RSS     Purity (%)   Rand Index
K-means            73.52   65.9         0.66
Modified K-means   73.70   71.5         0.649

Table 1 shows the results obtained by our system for the k-means and modified k-means algorithms.
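The three measures in Table 1 can be computed with their standard textbook definitions, sketched below. This assumes each crawled item carries a gold label (for example the AAP/BJP/Congress sub-category or a hand-assigned event); the slides do not say which labels, vectors or normalisation were used, so this sketch will not reproduce the exact numbers in the table. RSS is taken here as the sum of squared Euclidean distances from each vector to its cluster centroid (lower is better), while purity and Rand index are higher-is-better.

```python
from collections import Counter
from itertools import combinations

def purity(labels, classes):
    """Fraction of items that belong to the majority gold class of their cluster."""
    clusters = {}
    for lab, cls in zip(labels, classes):
        clusters.setdefault(lab, []).append(cls)
    majority = sum(Counter(members).most_common(1)[0][1] for members in clusters.values())
    return majority / len(labels)

def rand_index(labels, classes):
    """Share of item pairs on which the clustering and the gold classes agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(labels)), 2))
    agree = sum((labels[i] == labels[j]) == (classes[i] == classes[j]) for i, j in pairs)
    return agree / len(pairs)

def rss(vectors, labels):
    """Residual sum of squares: squared distance of each dense vector to its cluster centroid."""
    total = 0.0
    for lab in set(labels):
        members = [v for v, l in zip(vectors, labels) if l == lab]
        c = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((x - y) ** 2 for x, y in zip(v, c)) for v in members)
    return total
```

For six items clustered as `[0, 0, 0, 1, 1, 1]` with gold classes `["a", "a", "b", "b", "b", "b"]`, purity is 5/6 (cluster 0 has a majority of 2, cluster 1 of 3) and the Rand index is 10/15, since 4 same-cluster pairs and 6 different-cluster pairs agree with the gold classes.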

When calculating purity and Rand index for both algorithms, we found that modified k-means, which repeats the clustering 10 times and takes its k initial points from the k different clusters (one from each) instead of using a random restart, gives clearly better purity (71.5 vs 65.9). Its Rand index is slightly lower (0.649 vs 0.66) and its RSS slightly higher (73.70 vs 73.52), so the gain is mainly in purity.
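The seeding idea can be sketched as below. The slide describes it only briefly, so this is one plausible reading, not the project's implementation: in each of 10 rounds, one seed is taken from each cluster of the previous round (here, the member most similar to that cluster's centroid, which is an assumption), and standard k-means refinement runs from those seeds. Vectors are dense lists for brevity; `lloyd`, `seeds_from_clusters` and `modified_kmeans` are hypothetical names.

```python
import math

def cosine(a, b):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mean(vectors):
    """Component-wise mean, i.e. the centroid of a group of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def lloyd(vectors, centroids, iterations=10):
    """Standard k-means refinement starting from the given centroids."""
    labels = []
    for _ in range(iterations):
        labels = [max(range(len(centroids)), key=lambda j: cosine(v, centroids[j]))
                  for v in vectors]
        for j in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = mean(members)
    return labels

def seeds_from_clusters(vectors, labels, k):
    """Assumed seeding rule: one seed per previous cluster, namely the member
    most similar to that cluster's centroid. Empty clusters give no seed."""
    seeds = []
    for j in range(k):
        members = [v for v, lab in zip(vectors, labels) if lab == j]
        if members:
            c = mean(members)
            seeds.append(max(members, key=lambda v: cosine(v, c)))
    return seeds

def modified_kmeans(vectors, initial_labels, k, rounds=10):
    """Re-seed from the previous clustering `rounds` times instead of restarting randomly."""
    labels = initial_labels
    for _ in range(rounds):
        labels = lloyd(vectors, seeds_from_clusters(vectors, labels, k))
    return labels
```

Starting from a deliberately poor clustering such as `[0, 1, 0, 1]` of the points `[1, 0]`, `[0.9, 0.1]`, `[0, 1]`, `[0.1, 0.9]`, the re-seeding rounds separate the two groups. The design choice is that each round starts from points already spread across existing clusters, rather than from k points drawn at random, which may all land in the same topic.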

Results Demonstration
These are the results in cluster 9: the news items grouped here are related, as all 4 of them concern Rahul Gandhi. I took the news from 29-05-14; these items were initially scattered, and k-means clustering grouped them together into this cluster.

In the second example, the news is mostly related to the Punjab unit of the Congress, which indicates that the news I collected was clustered largely correctly. However, 2 of the news items are not related, so this cluster is not 100% pure.

Conclusion
• In this project I have designed and evaluated a clustering system. Our clustering system crawls incoming news reports through the Bing API and clusters them according to the event they describe. Incoming news reports are represented as a bag of words with TF-IDF weighting and clustered with a variation of the k-means algorithm that works in a single pass, without reorganizing clusters. The number of clusters to produce is fixed at 29 for every query, and new events are detected automatically. The clustering process takes 1-2 minutes to fetch the news from the website.
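A single-pass variant of this kind can be sketched as below, under stated assumptions: the slides give only the cluster limit (29) and the "no re-organization" property, so the similarity threshold for opening a new cluster (a "new event") is a hypothetical parameter, and `SinglePassClusterer` is a hypothetical name. Each document is compared once against the current centroids, joins the most similar one (whose centroid is updated as a running mean), or opens a new cluster if it is not similar enough and the limit has not been reached. Documents are never moved afterwards.

```python
import math

def cosine(a, b):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class SinglePassClusterer:
    """Hypothetical single-pass clusterer: each document is assigned once and never moved."""

    def __init__(self, max_clusters=29, threshold=0.3):
        self.max_clusters = max_clusters  # cluster limit taken from the slides
        self.threshold = threshold        # assumed cut-off for "same event"
        self.centroids = []
        self.sizes = []

    def add(self, vec):
        """Assign one incoming document vector and return its cluster index."""
        best, best_sim = None, -1.0
        for j, c in enumerate(self.centroids):
            s = cosine(vec, c)
            if s > best_sim:
                best, best_sim = j, s
        # Open a new cluster (new event) if nothing is similar enough and there is room.
        if best is None or (best_sim < self.threshold and len(self.centroids) < self.max_clusters):
            self.centroids.append(list(vec))
            self.sizes.append(1)
            return len(self.centroids) - 1
        # Otherwise join the closest cluster and update its centroid as a running mean.
        n = self.sizes[best] + 1
        self.centroids[best] = [c + (x - c) / n for c, x in zip(self.centroids[best], vec)]
        self.sizes[best] = n
        return best
```

Usage would be one `add` call per incoming report, e.g. `[clusterer.add(v) for v in stream]`. Because centroids only move incrementally and no document is reassigned, the result depends on arrival order, which is the usual trade-off of single-pass clustering against full k-means.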

• The evaluation results show that our system is very effective at clustering documents into highly specific clusters, but performs rather poorly when clustering them into more general categories. Modified k-means achieves higher purity than plain k-means.

Future Work:
• In my opinion, our clustering approach can be applied to domains other than online news. For example, it could be used to cluster social media feeds, grouping posts according to the item being discussed by different people.
• In future, a user interface can be created to make the system easier to use.
• Its scalability can also be improved.

Thank you!
