Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

20
Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI

Transcript of Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

Page 1: Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

Finding high-Quality contents in Social media

BY : APARNA TODWAL

GUIDED BY : PROF. M. WANJARI

Page 2: Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

IntroductionMain challenges What we are targeting?Papers readDiscussion on findings of the base paperFurther plans

OUTLINE:

Page 3: Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

INTRODUCTION:

In the early 1990s, the majority of web users were consumers of content, created by a relatively small amount of publishers.

In the early 2000s, user-generated content has become increasingly popular on the web: more and more users participate in content creation, rather than just consumption.

Page 4: Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

Popular user generated content domains include web forums, photo and video sharing communities, as well as social networking platforms such as Facebook and MySpace.

Community-driven question/answering portals are a particular form of user-generated content that is gaining a large audience in recent years.

These portals, in which users answer questions posed by other users, provide an alternative channel for obtaining information on the web: rather than browsing results of search engines.

Page 5: Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

The main challenge posed by content in social media sites is the fact that the distribution of quality has high variance: from very high-quality items to low-quality, sometimes abusive content.

This makes the tasks of filtering and ranking in such systems more complex than in other domains.

Our main task is identifying high quality content in community-driven question/answering sites, As a test case, we focus on Yahoo! Answers.

MAIN CHALLENGES:

Page 6: Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

“ Yahoo! Answers ”Yahoo! Answers, a large

community question/answering portal that is particularly rich in the amount and types of content where people ask and answer questions on any topic.

A user can vote for answers of other users, gives rating “thumbs up” or “thumbs down”, mark interesting questions, and even report abusive behavior.

Page 7: Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

Each user has a threefold role: asker, answerer & evaluator.

The central element of the Yahoo! Answers system are questions. Each question has a lifecycle :

It starts in an “open” state where it receives answers. After automatic timeout in the system, the question is

considered “closed” Once a best answer is chosen, the question is “resolved”

For the community question/answering domain, we show that our system is able to separate high-quality items from the rest .

Page 8: Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

? What are the elements of social media that can be used to facilitate automated discovery of high-quality content ?

? How are these different factors related ? Is content alone enough for identifying high-quality items ?

? Can community feedback approximate judgments of specialists ?

WHAT WE ARE TARGETING?

Page 9: Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

Sr. No.

Paper Name Author Year Conclusion

1 Finding High-Quality Content in Social Media

Carlos Castillo 2008 A general classification framework for quality estimation in social media.

2 Internet-scale collection of human-reviewed data.

W. C. Baker. 2007 Conducted experiments analysing data quality of an Internet-scale system for human-reviewed data.

3 Automated essay scoringusing bayes.

L. M. Rudner and T. Liang.

2002 Presented a Bayesian approach to essay scoring based on the well developed text classification literature.

PAPERS READ:

Page 10: Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

Related workLink analysis in social media.

Link-based ranking algorithms is successful in estimating the quality of web pages. It uses User Answers

Graph.where - Graph is represented as G.

- Vertex is represented as V corresponding to the users of a ques /ans system.

- Directed edge e = (u, v) ∈ E from a user u ∈ V to a user v ∈ V if user u has answered to at least one question of user v.

The two most prominent link-based ranking algorithms are PageRank and HITS.

Discussion On Findings Of Base Paper:

Page 11: Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

Text analysis for content quality Some of the features employed in systems are lexical, such as

word length, measures of vocabulary irregularity via repetitiveness, Other features take into account usage of punctuation and detection of common grammatical error.

Implicit feedback for ranking

Implicit feedback from millions of web users has been shown to be a valuable source of result quality and ranking information.

Page 12: Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

CONTENT QUALITY ANALYSIS IN SOCIAL MEDIA Evaluation of content quality is an essential module for

performing more advanced information - retrieval tasks on the question/answering system.

Punctuation and typos : A particular form of low visual quality are misspellings and typos; additional features in our set quantify the number of spelling mistakes, as well as the number of out-of-vocabulary words.

Grammaticality : The content is annotated with part-of-speech (POS) tags and use the tag n-grams. This allows us to capture, to some degree, the level of “correctness” of the grammar used.

Page 13: Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

MODELING APPROACH:Application-specific user relationships :

Page 14: Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

The relationships between questions, users asking and answering questions, & answers can be captured by a tripartite.

Since a user is not allowed to answer his/her own questions, there are no triangles in the graph

Page 15: Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

EXPERIMENTAL SETTING: Let Q0 and A0 be the sets of

questions and answers, respectively, included in the evaluation dataset.

Let U1 be the set of users who have made a question in Q0 or given an answer in A0.

Additionally we select Q1 to be the set of all questions asked by all users in U1.

Page 16: Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

EXPERIMENTAL SETTING: Similarly we select A1 to be the

set of answers given by users in U1 and A2 to be the set of all the answers to questions in Q1.

Obviously Q0 ⊆ Q1 and A0 ⊆ A1.

Our dataset is then defined by the nodes (Q1, A1 ∪ A2, U1) and the edges induced from the whole dataset.

Page 17: Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

There is a positive correlation between question quality and answer quality.

Good answers are much more likely to be written in response to good questions, and bad questions are the ones that attract more bad answers.

Page 18: Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

FURTHER PLAN: Further plan is to refine the analysis of high quality

contents.

We can integrate the results from all question/answer sites for concluding the best contents amongst them to provide users with more efficient services.

Future plan is to more specifically explore the relationships and usage features to automatically identify malicious users.

Page 19: Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

REFERENCES:

E. Agichtein, E. Brill, S. T. Dumais, and R. Ragno. Learning user interaction models for predicting web search result preferences. In SIGIR, pages 3–10, 2006.

K. Ali and M. Scarr. Robust methodologies for modeling web click distributions. In www, pages 511–520, 2007.

E. B. Page. Computer grading of student prose, using modern concepts and software. Journal of Experimental Education, 62(2), 1994.

Page 20: Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.

THANK YOU !