
Can Web Search Be Enhanced For User-Generated Content?

Evan Atkinson
Elon University
2434 W. Webb Ave, Apt. 205
Burlington, NC
1(862) 579-7039
[email protected]

ABSTRACT

With so much content on the web, this paper asks how users can get the most out of their web searches with regard to user-generated content. The systems we use for web search can be modified or optimized to return better results for users. The main goal of this literature review is to examine the research on web search optimization, focusing mainly on tags, algorithms, and URLs. These sources show that web search can be enhanced through a few different avenues. Enhanced web search capabilities would allow users to gather better results tailored specifically to them.

Keywords

Algorithms, URLs, tags, tag cloud, folksonomy, user-generated content

1. INTRODUCTION

To understand user-generated content, one must first explain the concept of social media. Social media has been one of the biggest technological booms of recent years. It is evident that social media has a strong hold on the average Internet consumer today. According to Forrester Research, 75 percent of Internet users used social media as of 2008, up from 56 percent the year before (Kaplan & Haenlein 2010). Social media is not a new concept, however; people have always longed for different platforms to connect with one another, and each new way to communicate was revolutionary.

Kaplan & Haenlein (2010) state that in 1998 Bruce and Susan Abelson founded “Open Diary,” an early social media site that connected online diary writers. Even though the term social media is huge today, these social media sites have been around longer than most people realize. People are always interested in communicating with others, which is why the printing press and telephones were so radical when they were released to the public.

Blogs were the first forms of social media, and today there are over 100 million blogs available online (Kietzmann, Hermkens, McCarthy, & Silvestre 2011). People have debated different ways to make these blogs accessible for users' consumption, and there are search engines that specifically target blogs, such as Technorati. Information such as this is very valuable to many online users, but it cannot be accessed through regular web search engines. With the rise of micro-blogging through Twitter, much of this information can be vital in searches.

According to Kietzmann et al. (2011), there are seven functional blocks of social media: Identity, Presence, Relationships, Sharing, Reputation, Groups, and Conversations. Not every block needs to be present in every social media platform, but the blocks can be used to analyze each platform more thoroughly.

Rainie & Wellman (2012) reference the triple revolution, which they propose is currently under way: the Social Network, Internet, and Mobile revolutions are coming together at the same time. With this boom in all three platforms, it has become much easier for people to create content and connect with one another. The Social Network revolution has provided opportunities for people to reach beyond their individual tight-knit worlds. The Internet revolution has given people the power to communicate and the power to access an immense amount of information. The Mobile revolution has given people the ability to use powerful devices wherever they go.

With this understanding of social media, we can now delve into the subject of UGC and web search optimization. Web searches have always been a way for users to find results on the web for a wide range of topics, but in recent years the web has grown tremendously, and much of that growth is due to user-generated content. The question posed earlier is whether web searches can be enhanced to improve search results for user-generated content (UGC).

2. Discussion

2.1 User-Generated Content

There is no standard definition of UGC, but according to Balasubramaniam (2009) the Organisation for Economic Co-operation and Development (OECD) defines UGC as the following:

i) Content which is made publicly available over the Internet

ii) Content which boasts a certain level of creativity

iii) And, perhaps most importantly, content created outside of professional practices.

Millions of people contribute to this online UGC culture, which has grown in recent years thanks to websites such as Delicious, Wikipedia, and YouTube, all of which offer very different platforms for users to generate content on.

For example, Wikipedia has more than two million articles created by users in English alone, and it is one of the fastest growing collaborative content outlets in the world (Nov 2007). Balasubramaniam (2009) also states that people contributing user-generated content are looking to connect with people, to express themselves, and to receive recognition or prestige for their work. Shirky (2008) adds, “This desire to make a meaningful contribution where we can is part of what drives Wikipedia’s spontaneous division of labor.”

UGC is reshaping the way people use the Internet. It is creating new social interactions and giving users the power to be more creative, along with the ability to develop new business or marketing opportunities. UGC has had a huge impact on online video viewing with the creation of YouTube. Constant waves of new videos created for the Web have made watching videos a quick, personal viewing experience, leading to great variability in user behavior and attention span (Cha, Kwak, Rodriguez, Ahn, & Moon 2007). UGC of this nature has greatly affected strategies for marketing, recommendation, and search engines.

There may be many different reasons for users to want to generate content and ideas on the Internet, but that is not the main focus here. The main focus is strictly on whether UGC can be made more searchable in web searches. Whether users enjoy these new collaborative platforms for expressing themselves or use them to practice their skills and receive feedback, millions of users are creating this content, and it needs to be accessible through web search.

2.2 Social Media

Social networking sites also create a space for users to generate content. With its rise in recent years, Twitter has become a place for users to micro-blog and to share important information. Twitter is known for breaking news, real-time content, and popular trends on a national and global scale. This information could be quite interesting and important to users who, for example, are searching for a news topic in a web search. According to Teevan, Ramage, & Morris (2011), users reported that their biggest motivation for searching Twitter was to find timely information, yet such results do not surface in web searches. Teevan et al. (2011) report that almost 50 percent of the searches have to do with the news or news trends, which shows that users are searching for informational content on social media UGC sites.

Since Facebook became popular, the way people connect has never been the same. I believe these forums for idea sharing, if they garner the right momentum, could completely change the way people gather “important information,” which would be filtered for them, and could potentially change the way people “surf” the Internet. UGC is growing rapidly, and users of the web need to be able to access all pertinent knowledge on their search topics, even if it is UGC and not a conventional search result.

To gain access to UGC and other information through web searches, researchers have come up with a few different solutions that can be grouped into categories. The main ways researchers have focused on web search optimization have been to use tags, URLs, and/or algorithms.

2.3 Tags

One of the most complicated challenges in navigating this new user-generated world is how to organize relevant information. In the world of UGC, bookmarking is one of the most popular ways information is being organized. The rise in popularity of websites such as Delicious has made this use of tags very apparent in online culture. Delicious is considered one of the first successful social bookmarking sites.

Golder & Huberman (2006) state that these bookmarks are useful because they can be accessed from any computer, not just the user’s own browser. Each bookmark records the web page’s URL and its title, as well as the time at which the bookmark was created. Users can also choose one or more tags for each bookmark.

According to Sinclair & Cardew-Hall (2008), tagging services allow a participant to associate freely determined keywords (‘tags’) with a particular resource. Tagging services exist to tag an enormous variety of things, such as photographs, URLs, podcasts, computer games, music, and videos. The dataset arising from all the participants’ tags is commonly referred to as a ‘folksonomy’. Thomas Vander Wal, who coined the term folksonomy, says, however, that a folksonomy is not collaborative but is the result of personal free tagging of information for one’s own retrieval.

Although tagging in itself is not collaborative, it leads to a collaborative function on the web that can aid web search. Access to these tags, or metadata, can be vital to users’ search engine experiences. Tags serve many purposes, and with UGC creating new forums for creativity, bookmarking sites have flourished. Golder & Huberman (2006) identify several functions that tags perform for bookmarks: identifying what or who the bookmark is about, identifying what it is, identifying who owns it, refining categories, identifying qualities or characteristics, self-reference, and task organizing.

As reported by Golder & Huberman (2006), tags have many different functional aspects and have been used successfully to help users organize their bookmarks. The study also focuses on finding regularities in user activity and tag frequencies. The results showed that after the first 100 or so bookmarks for a given URL, each tag’s frequency is a nearly fixed proportion of the total frequency of all tags used. The results also showed that this stability often appears after fewer than 100 bookmarks, which shows that a URL does not need to become very popular for the tag data to be useful.
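The idea of a tag's proportion settling as bookmarks accumulate can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the procedure used by Golder & Huberman (2006): the function name tag_proportions_over_time and the sample bookmarks are hypothetical, and each bookmark is assumed to be a simple list of tags for one URL.

```python
from collections import Counter


def tag_proportions_over_time(bookmarks):
    """Return, after each bookmark, every tag's share of all tag uses so far.

    `bookmarks` is a list of tag lists, one per bookmark of a single URL,
    in the order the bookmarks were added (a hypothetical input format).
    """
    counts = Counter()
    history = []
    for tags in bookmarks:
        counts.update(tags)
        total = sum(counts.values())
        history.append({tag: n / total for tag, n in counts.items()})
    return history


# Hypothetical bookmarks for one URL, oldest first.
bookmarks = [
    ["search", "web"],
    ["search"],
    ["search", "tagging"],
    ["web", "search"],
]

history = tag_proportions_over_time(bookmarks)
final = history[-1]
```

With real data, one would plot each tag's share against the number of bookmarks and look for the point where the curves flatten.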

In the Delicious interface, users adding a bookmark can also see the tags already applied by others who have bookmarked that URL. This leads many users to imitate earlier tag selections, which can produce a consensus on the common tags for a URL. In this way, Delicious is not directly a recommendation system, but it acts like one because popular tags steer users in a certain direction.

Bischoff, Firan, Nejdl, & Paiu (2008) say that users’ motivations for tagging include organization, opinion expression, attraction of attention, self-presentation, and providing context to friends. Results show that in a free-for-all system, opinion expression, self-presentation, and activism seem to be major motivating factors, while in self-tagging systems, such as Delicious or Flickr, users’ tagging motivations seem to be predominantly for their own benefit, such as better organization of information.

Results also show that tags in bookmarking systems for the most part provide a good summary of the web page they are attached to, and that they can indicate the popularity of a page. This being the case, access to popular information through web search is vital, and successful tags would yield better results for users of web search. Guy, Zwerdling, Ronen, Carmel, & Uziel (2010) focused on different types of tag recommendation engines: a people-based recommender (PBR); a tags-based recommender (TBR); two types of hybrid recommender (PTBR), namely a combination of people or tags (or-PTBR) and a combination of people and tags (and-PTBR, suggesting only items related to both people and tags); and a popularity-based recommender (POPBR).


Guy et al.’s (2010) results showed that the combination of incoming tags and used tags is the most effective in representing a user’s topics of interest. Recommendations based on a TBR, with a tag profile that combines incoming and used tags, are rated significantly more interesting than those of the most effective PBR studied.

Recommended items are shown to be highly different between the PBR and the TBR, with less than 2 percent overlap. A hybrid PTBR recommender including explanations improves the results slightly further, leading to an over 70:30 ratio between interesting and non-interesting items. It also presents other potential benefits over a TBR, such as a lower percentage of already known items and higher diversity of item types (Guy et al. 2010).
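The difference between the or-PTBR and and-PTBR hybrids, and the notion of overlap between two recommenders, can be sketched with plain sets. This is a simplified illustration and not the system of Guy et al. (2010), which scores and ranks items rather than merely collecting them. The function names and item identifiers below are hypothetical.

```python
def or_ptbr(people_items, tag_items):
    """Items related to the user's people OR the user's tags."""
    return people_items | tag_items


def and_ptbr(people_items, tag_items):
    """Items related to BOTH the user's people and the user's tags."""
    return people_items & tag_items


def overlap_percent(a, b):
    """Percentage of all distinct items that both recommenders suggested."""
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0


# Hypothetical candidate items from a people-based and a tags-based recommender.
pbr_items = {"blog-12", "wiki-3", "file-7", "bookmark-9"}
tbr_items = {"wiki-3", "bookmark-41", "blog-88"}

either = or_ptbr(pbr_items, tbr_items)   # broader candidate pool
both = and_ptbr(pbr_items, tbr_items)    # stricter candidate pool
overlap = overlap_percent(pbr_items, tbr_items)
```

The union gives a wider and more diverse pool, while the intersection keeps only items supported by both kinds of evidence. When the two base recommenders rarely overlap, as reported above, the intersection will be small.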

These data show that different types of tag recommendation engines offer clear benefits. If the right tag recommendation engine is chosen, combining a search system with a tag-based engine that can access UGC and other metadata could enhance web search capabilities.

With so many tags populating the Internet, tag-cloud systems have also been created. Tag clouds are a new way to find and visualize information that can be accessed in one click. In research on tag-cloud search engine interfaces, Trattner et al.’s (2012) results show that, from the users’ perspective, both tag-based browsing interfaces were perceived to be better than the baseline interface (i.e. Google, Yahoo, Bing). The users indicated that these interfaces provided significantly enhanced support, and they reported significantly higher levels of confidence that relevant information would be found. They also ranked both tag-based browsing interfaces significantly higher overall than the baseline interface.
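As a rough illustration of how a tag cloud turns tag frequencies into a visual display, the sketch below maps each tag's count to a font size by linear scaling. This is one common and simple choice, assumed here for illustration only; it is not taken from Trattner et al. (2012), and the function name, size limits, and counts are hypothetical.

```python
def tag_cloud_sizes(tag_counts, min_size=10, max_size=32):
    """Map tag counts to font sizes by linear scaling between two limits."""
    lo = min(tag_counts.values())
    hi = max(tag_counts.values())
    if hi == lo:
        # All tags are equally frequent, so give them all the same size.
        return {tag: min_size for tag in tag_counts}
    span = max_size - min_size
    return {
        tag: round(min_size + (count - lo) / (hi - lo) * span)
        for tag, count in tag_counts.items()
    }


# Hypothetical tag counts for a small collection.
counts = {"search": 50, "web": 30, "folksonomy": 10}
sizes = tag_cloud_sizes(counts)
```

Frequent tags are drawn larger, so a user can spot the dominant topics at a glance and click a tag to browse the items behind it.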

Millen & Feinberg (2006) were interested in how social tagging can improve social navigation. Their study focused on a service called Dogear. The Dogear social bookmarking service was designed to display bookmarks within a navigation model that allows users to manage and explore the collection in different ways. Results from Millen & Feinberg (2006) confirm that social tags are used by a large number of the users of the social bookmarking application under study. Approximately 60 percent of the Dogear service visitors explored the bookmark collection using one or more of the pivot links (tags or people). These tags are linked to the URLs they describe, and a few studies have examined how useful these URLs are for web search.

2.4 URLs

Kolay & Dasdan’s (2009) study shows that Delicious URLs provide significant value to web search engines from the perspective of content discovery and user search satisfaction. The URLs lead to faster discovery of good-quality content. Also, when they are shown to users in search results, both the number of these URLs that receive clicks and the total number of clicks they receive are significantly high.

Kandylas & Dasdan’s (2010) study focused instead on Twitter. Twitter is being used to recommend popular articles in real time, to track breaking news stories, for work-related communication, and for brand marketing. Much of this UGC would be of importance to users’ search results.

Kandylas & Dasdan’s (2010) results show that extracting bitly URLs from Twitter can be useful for a web search engine. The average URL quality is higher than that of a randomly selected set. However, URL tweet count and lifetime provide insight into only some of the URLs and are not enough to filter out a large portion of bad or spam URLs. These results are still a bright spot in web search optimization.

Wetzker, Zimmermann, & Bauckhage (2008) also agree with Kandylas & Dasdan’s (2010) conclusion that more research needs to be done. Wetzker et al.’s (2008) analysis of social bookmarking systems showed that social bookmarking provides a valuable source for information retrieval and social data examination. However, the study found that spam could highly distort any analysis.

Chen, Scripps, & Tan (2008) found similar evidence of the spam problem. Their results show that social bookmarking is a rich domain for applying link mining, although it presents interesting research problems, such as how to identify potential collusion between users or tag spam in social bookmarking data. Spam can invade user-generated sites and the web as a whole, and more research is clearly needed on how to better filter spam from user-generated social media sites such as Twitter. Besides research on how folksonomies and URLs can improve web search, there has also been research on improving algorithms for web search.

2.5 Algorithms

Bateman, Muller, & Freyne (2009) presented an algorithm that has the potential to improve the user experience of the popular pivot browsing mechanism by improving bookmark orderings, thus reducing the number of query term refinements needed to find a bookmark of interest.

Bogers & Van Den Bosch (2011) also studied this topic with a focus on algorithmic improvements. Their approaches, which use tag overlap and metadata, provide better results on social bookmarking data sets than the transaction patterns traditionally used in recommender systems research. They find that fusing recommendations can indeed produce significant improvements in recommendation accuracy. They also find that it is often better to combine approaches that use different data representations, such as tags and metadata, than to combine approaches that vary only in the algorithms they use. The best results are obtained when both of these aspects of the recommendation task are varied in the fusion process.
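To make the idea of fusing recommendations concrete, the sketch below combines two ranked runs by a weighted sum of min-max normalized scores. This is one simple, widely used fusion scheme chosen for illustration; it should not be read as the exact method of Bogers & Van Den Bosch (2011). The function names, weights, item identifiers, and scores are all hypothetical.

```python
def normalize(scores):
    """Min-max normalize a dict of item scores into the range [0, 1]."""
    lo = min(scores.values())
    hi = max(scores.values())
    if hi == lo:
        return {item: 1.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def fuse(runs, weights):
    """Rank items by the weighted sum of their normalized scores across runs.

    An item missing from a run simply contributes nothing for that run.
    """
    fused = {}
    for scores, weight in zip(runs, weights):
        for item, s in normalize(scores).items():
            fused[item] = fused.get(item, 0.0) + weight * s
    # Highest fused score first; ties are broken alphabetically by item id.
    return sorted(fused, key=lambda item: (-fused[item], item))


# Hypothetical scores from a tag-based run and a metadata-based run.
tag_run = {"a": 0.9, "b": 0.5, "c": 0.1}
metadata_run = {"b": 8.0, "c": 6.0, "d": 2.0}

ranking = fuse([tag_run, metadata_run], weights=[0.5, 0.5])
```

Normalizing first matters because the two runs score items on different scales. Item "b" rises to the top here because both data representations support it, which is the intuition behind combining runs built on different representations.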

Heymann, Koutrika, & Garcia-Molina (2008) found, through their study of tags, URLs, and social bookmarking, that social bookmarking as a data source for search has URLs that are often actively updated and prominent in search results. The study also found that tags were overwhelmingly relevant and objective. Bao et al. (2007) also believe that web search can be optimized and improved through social annotation. Their study focused on optimizing web search through two aspects, similarity ranking and static ranking. The results showed that social annotations provide not only a summary of web pages but also a good indicator of their quality, so social annotations could benefit web search in both similarity ranking and static ranking. The results also showed that similarity ranking can successfully find the relations among annotations and that static ranking can provide information from the web annotators’ perspective.

3. Conclusion

Whether through tag, URL, or algorithmic improvements, web search optimization is vital for many reasons. Users of web search want to be able to access any and all information that is relevant to their search topics. If social bookmarking continues to grow at the rate it has over the past several years, it will quickly reach the scale of the current web. These tags will have a large impact on improving the quality of web search.

The studies presented provide a basis for future research on web search optimization using tags, URLs, and algorithmic improvements. UGC has proliferated throughout our virtual worlds, and many users want and need to access this information. Through future improvements in some fundamental aspects of web search, these golden nuggets of UGC will be more easily and readily accessed.

4. REFERENCES

[1] Balasubramaniam, N. (2009). User-generated content. Business Aspects of the Internet of Things, 28.

[2] Bao, S., Xue, G., Wu, X., Yu, Y., Fei, B., & Su, Z. (2007, May). Optimizing web search using social annotations. In Proceedings of the 16th international conference on World Wide Web (pp. 501-510). ACM. Doi: 10.1145/1242572.1242640

[3] Bateman, S., Muller, M. J., & Freyne, J. (2009, May). Personalized retrieval in social bookmarking. In Proceedings of the ACM 2009 international conference on Supporting group work (pp. 91-94). ACM. Doi: 10.1145/1531674.1531688

[4] Bischoff, K., Firan, C. S., Nejdl, W., & Paiu, R. (2008, October). Can all tags be used for search? In Proceedings of the 17th ACM conference on Information and knowledge management (pp. 193-202). ACM. Doi: 10.1145/1458082.1458112

[5] Bogers, T., & Van Den Bosch, A. (2011). Fusing recommendations for social bookmarking web sites. International Journal of Electronic Commerce, 15(3), 31-72. Doi: 10.2753/JEC1086-4415150303

[6] Cha, M., Kwak, H., Rodriguez, P., Ahn, Y. Y., & Moon, S. (2007, October). I tube, you tube, everybody tubes: analyzing the world's largest user generated content video system. In Proceedings of the 7th ACM SIGCOMM conference on Internet measurement (pp. 1-14). ACM. Doi: 10.1145/1298306.1298309

[7] Chen, F., Scripps, J., & Tan, P. N. (2008, December). Link mining for a social bookmarking web site. In Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT'08. IEEE/WIC/ACM International Conference on (Vol. 1, pp. 169-175). IEEE. Doi: 10.1109/WIIAT.2008.369

[8] Golder, S. A., & Huberman, B. A. (2006). Usage patterns of collaborative tagging systems. Journal of information science, 32(2), 198-208. Doi: 10.1177/0165551506062337

[9] Guy, I., Zwerdling, N., Ronen, I., Carmel, D., & Uziel, E. (2010, July). Social media recommendation based on people and tags. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval (pp. 194-201). ACM. Doi: 10.1145/1835449.1835484

[10] Heymann, P., Koutrika, G., & Garcia-Molina, H. (2008, February). Can social bookmarking improve web search? In Proceedings of the international conference on Web search and web data mining (pp. 195-206). ACM. Doi: 10.1145/1341531.1341558

[11] Kandylas, V., & Dasdan, A. (2010, April). The utility of tweeted URLs for web search. In Proceedings of the 19th international conference on World wide web (pp. 1127-1128). ACM. Doi: 10.1145/1772690.1772837

[12] Kaplan, A. M., & Haenlein, M. (2010). Users of the world, unite! The challenges and opportunities of Social Media. Business horizons, 53(1), 59-68. Doi: 10.1016/j.bushor.2009.09.003

[13] Kietzmann, J. H., Hermkens, K., McCarthy, I. P., & Silvestre, B. S. (2011). Social media? Get serious! Understanding the functional building blocks of social media. Business Horizons, 54(3), 241-251. Doi: 10.1016/j.bushor.2011.01.005

[14] Kolay, S., & Dasdan, A. (2009, April). The value of socially tagged urls for a search engine. In Proceedings of the 18th international conference on World wide web (pp. 1203-1204). ACM. Doi: 10.1145/1526709.1526929

[15] Millen, D. R., & Feinberg, J. (2006, June). Using social tagging to improve social navigation. In Workshop on the Social Navigation and Community based Adaptation Technologies.

[16] Nov, O. (2007). What motivates wikipedians? Communications of the ACM, 50(11), 60-64.

[17] Rainie, L., & Wellman, B. (2012). Networked: The new social operating system. The MIT Press.

[18] Shirky, C. (2008). Here comes everybody: The power of organizing without organizations. Penguin.

[19] Sinclair, J., & Cardew-Hall, M. (2008). The folksonomy tag cloud: when is it useful? Journal of Information Science, 34(1), 15-29. Doi: 10.1177/0165551506078083

[20] Spiteri, L. F. (2013). The structure and form of folksonomy tags: The road to the public library catalog. Information technology and libraries, 26(3), 13-25. Doi: 10.6017/ital.v26i3.3272

[21] Teevan, J., Ramage, D., & Morris, M. R. (2011, February). # TwitterSearch: a comparison of microblog search and web search. In Proceedings of the fourth ACM international conference on Web search and data mining (pp. 35-44). ACM. Doi: 10.1145/1935826.1935842

[22] Trattner, C., Lin, Y. L., Parra, D., Yue, Z., Real, W., & Brusilovsky, P. (2012, June). Evaluating tag-based information access in image collections. In Proceedings of the 23rd ACM conference on Hypertext and social media (pp. 113-122). ACM. Doi: 10.1145/2309996.2310016

[23] Wetzker, R., Zimmermann, C., & Bauckhage, C. (2008, July). Analyzing social bookmarking systems: A del.icio.us cookbook. In Proceedings of the ECAI 2008 Mining Social Data Workshop (pp. 26-30).