Search and the ‘Net in 2013
Michael Hunter, Reference Librarian
Hobart and William Smith Colleges
For Rochester Regional Library Council Member Libraries’ Staff
Sponsored by the Rochester Regional Library Council
Supported by Regional Bibliographic Databases and Resources Sharing (RBDB) funds granted by the New York State Library, 2013
For today . . .
The Searchscape
Entity-based Search
New Services and Tools
The Social Web
Bing, Blekko, DuckDuckGo
News from Google
A Privacy Primer
Trends and Future Directions
Linklist
http://people.hws.edu/hunter/searchnet13links.htm
America at the Digital Turning Point
Center for the Digital Future – USC Annenberg School for Communication
www.digitalcenter.org/pdf/CDF_10_year_digital_turning_point.pdf
Longitudinal study over 10 years
Over 2,000 US households surveyed each year
“…online behavior changes relentlessly.”
“…constant social connection, unlimited access to information, and unprecedented abilities to purchase.”
“…online technology creates extraordinary demands on our time, major concerns about privacy, and fundamental questions about the proliferation of the digital realm…”
America at the Digital Turning Point: Selected highlights
Americans view the Internet as an important information source, yet many Internet users do not trust much of the information (there)
Our privacy is lost.
Most printed daily newspapers will be gone in about five years.
The sheer overwhelming nature of technology may be reaching a critical point.
Because of online technology, work is increasingly a 24/7 experience.
America at the Digital Turning Point
[Chart: Time spent face-to-face with family in the household since the Internet]
The Web Worldwide
Data from the International Telecommunication Union, 2011
Total population: ca. 7 b.
Connected to the Web: ca. 2 b.
Mobile subscriptions: ca. 6 b.
Mobile subscriptions forecast for 2017: 9 b., with 5 b. mobile broadband connections
Global: 5,981,000,000
Developed nations: 1,461,000,000
Developing nations: 4,520,000,000
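As a quick arithmetic check on the figures above, the sketch below assumes the three totals are 2011 mobile subscription counts (which is consistent with the "ca. 6 b." figure) and works out each group's share. The variable and function names are illustrative only.

```python
# Figures copied from the slide; treating them as mobile subscription
# counts is an assumption based on the "ca. 6 b." line.
global_total = 5_981_000_000
developed = 1_461_000_000
developing = 4_520_000_000

# The two groups should add up to the global total.
groups_match = developed + developing == global_total


def share(part, whole):
    """Percentage of whole represented by part, rounded to one decimal place."""
    return round(100 * part / whole, 1)


developed_share = share(developed, global_total)
developing_share = share(developing, global_total)
```

On these numbers, roughly three of every four subscriptions are in developing nations.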
New Top Level Domains (ICANN 1/11/12)
.com domains almost exhausted for new website names
  “Someone got there first”
New businesses must pay domain brokers for an address or register a new one with unnatural, insignificant words
Now possible to purchase a unique TLD (.mycompany or .ourtrademark or .ourbrand)
Fee: $185,000, with a waiting period of 2 years.
Domain Registration
Currently unrestricted: .com .info .net .org
Currently require proof of eligibility: .edu .coop .mil .gov .int .museum .xxx .aero .asia
Search engines and satisfaction
mdgadvertising.com (data from Pew Research)
[Chart: How often do you actually find the information you’re looking for with search engines?]
Entity-based Search:
Google’s Knowledge Graph
Bing’s Satori
Entity-based search
The back end: how search engines worked until now
Matched query terms to terms in their crawler-created database
Results refined by
  Linkage patterns
  Popularity
  Personalization
  Other (?????)
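The term-matching approach described above can be sketched as a tiny inverted index. This is a minimal illustration, not any engine's actual implementation: the page URLs, the `inlinks` counts standing in for "linkage patterns", and the function names are all made up.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query, inlinks):
    """Return pages containing every query word, most-linked pages first."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    # Hypothetical refinement step: rank by number of inbound links.
    return sorted(matches, key=lambda url: (-inlinks.get(url, 0), url))


# Hypothetical crawl results and link counts.
pages = {
    "a.example/cats": "jaguar big cat",
    "b.example/cars": "jaguar car dealer",
    "c.example/zoo": "zoo cat",
}
inlinks = {"a.example/cats": 5, "b.example/cars": 9}
index = build_index(pages)
```

A search for "jaguar" here returns both the animal page and the car page, ordered only by link count. The index matches strings and has no notion of which jaguar the searcher meant.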
Ambiguous terms abound
“kings” “jaguar” “Apollo”
Can a system know????
“Charles Dickens”
  This searcher wants information about and books by him
“Frank Lloyd Wright”
  This searcher wants information about and pictures of buildings designed by him
The basics….
Entity database seeded with a large “bag of nouns” and supplemented with nouns from web crawls identified through natural language processing (NLP)
These nouns are mapped to another database of information related and/or relevant to those nouns through NLP, beyond simple text matches
Results can be customized based on click responses from previous anonymous searches for that query
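A minimal sketch of the idea, assuming a hand-built entity table in place of a real NLP pipeline: a term maps to candidate entities, each entity maps to related information, and anonymous click counts decide the order. Every name and number below is hypothetical.

```python
from collections import Counter

# Hypothetical seed "bag of nouns": each term maps to candidate entities.
ENTITY_INDEX = {
    "jaguar": ["Jaguar (animal)", "Jaguar (car maker)"],
    "apollo": ["Apollo (Greek god)", "Apollo program (NASA)"],
}

# Hypothetical second database: information keyed by entity, not by query text.
RELATED = {
    "Jaguar (animal)": ["habitat", "diet", "photos"],
    "Jaguar (car maker)": ["models", "dealers", "reviews"],
    "Apollo (Greek god)": ["myths", "temples"],
    "Apollo program (NASA)": ["missions", "astronauts"],
}

# Made-up anonymous click counts from earlier searches: (term, entity) -> clicks.
clicks = Counter({
    ("jaguar", "Jaguar (car maker)"): 70,
    ("jaguar", "Jaguar (animal)"): 30,
})


def lookup(term):
    """Return (entity, related info) pairs for a term, most-clicked entity first."""
    key = term.lower()
    candidates = ENTITY_INDEX.get(key, [])
    ranked = sorted(candidates, key=lambda e: clicks[(key, e)], reverse=True)
    return [(entity, RELATED.get(entity, [])) for entity in ranked]
```

With these counts, `lookup("Jaguar")` lists the car maker first and still offers the animal as an alternative, which is roughly what a disambiguation box does.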
Yahoo Research paper, 2009
http://research.yahoo.com/files/pods09-woc.pdf
Extract structured data (addresses, prices, item #, etc.) from web documents and associate it with an entity
Link relationships between entities
  An actor to his films and other actors he has worked with
Discover categorizing information in the document’s content
  Subject headings
  Reviews (positive or negative)
  Type of food served
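To make the extraction step concrete, here is a small sketch that pulls prices and item numbers out of page text with regular expressions and attaches them to an entity. The patterns, the sample text and the record layout are assumptions for illustration. The paper's own methods are not reproduced here.

```python
import re

# Simple illustrative patterns; real pages would need far more robust rules.
PRICE = re.compile(r"\$\d+(?:\.\d{2})?")
ITEM = re.compile(r"Item\s*#\s*(\w+)")


def extract_record(entity, text):
    """Collect prices and item numbers found in text and attach them to an entity."""
    return {
        "entity": entity,
        "prices": PRICE.findall(text),
        "item_numbers": ITEM.findall(text),
    }


# Hypothetical product page text.
page = "Acme Widget, Item # A123, now $19.99 (was $25)."
record = extract_record("Acme Widget", page)
```

The resulting record ties both prices and the item number to the "Acme Widget" entity rather than to the page they came from.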
The front end
Google’s Knowledge Graph: focused on questions and answers
  Contextual box for ambiguous terms with short descriptions
Bing’s Satori: focused on potential “actions” associated with the entity
  Searchers for a rock band usually want to buy a recording, find lyrics or get tickets
  “Snapshot” panel: entity-based results from the social web (yours and others)
Benefits of entity-based search
Greater predictability of searcher satisfaction
Discovers related information that does not contain the search term(s)
Disambiguates many terms
Collocates related information from across the Web in a variety of filetypes
The Long Tail
http://searchengineland.com/search-illustrated-b2b-long-tail-seo-13237
Future challenges: the “long tail”
Entities are now limited to the most popular topics
Currently no way to map complex queries to an entity or entity group
  “volcanic eruptions in the 18th century”
  “Lady Gaga concerts in a warm location”
Currently limited to English only
Including more entities in English and other languages will greatly increase processing and impact response time
New Services and Tools
Vertical, Realtime, Metas
iSeek Education
http://education.iseek.com
Targeted discovery engine for students, teachers and administrators
Sources limited to “university, government and established non commercial providers”
Limited to Safe Search
Lesson plans
Results clusters include
  Subject areas
  State standards
  Specific topics
  Grade levels
  Places
  People
iSeek Web
http://iseek.com
Small database
  iSeek crawler
  Google
  Public-contributed “favorites”
Results clusters include
  Specific topics
  Organizations
  People
  Date & time
  Places
  Source
MySeek (personal account service) problematic (7/15/13)
Topsy – www.topsy.com
Real-time search of the social web
Results from Twitter and Google+
Ranking factors include
  How often the page is cited in tweets
  “Influence” algorithm
“Who is listening to you?”
Dynamic process assigns influence score based on
  Number of followers
  Their influence
  How often your tweets are re-tweeted
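Topsy's actual influence algorithm is not described here, so the following is only a PageRank-style stand-in showing how the three listed factors could interact: each user's score depends on how many followers they have, how influential those followers are, and how often the user is retweeted. The damping value, the retweet boost and all names are assumptions.

```python
def influence_scores(followers, retweets, rounds=20, damping=0.85):
    """Score users from their followers' scores, then boost by retweet share.

    followers: user -> list of users who follow them
    retweets:  user -> number of times the user's tweets were retweeted
    """
    users = set(followers) | {f for fs in followers.values() for f in fs}

    # How many accounts each user follows; a follower splits their
    # influence evenly across everyone they follow.
    following_count = {u: 0 for u in users}
    for fs in followers.values():
        for f in fs:
            following_count[f] += 1

    score = {u: 1.0 for u in users}
    for _ in range(rounds):
        new_score = {}
        for u in users:
            passed = sum(score[f] / following_count[f] for f in followers.get(u, []))
            new_score[u] = (1 - damping) + damping * passed
        score = new_score

    # Hypothetical retweet boost: scale by the user's share of all retweets.
    total_retweets = sum(retweets.values()) or 1
    return {u: score[u] * (1 + retweets.get(u, 0) / total_retweets) for u in users}


# Made-up network: alice has three followers, bob has one.
followers = {"alice": ["bob", "carol", "dave"], "bob": ["carol"], "carol": [], "dave": []}
retweets = {"alice": 10, "bob": 1}
scores = influence_scores(followers, retweets)
```

In this toy network alice ends up with the highest score because she has the most followers, one of them (bob) is himself followed, and she collects most of the retweets.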
Topsy – www.topsy.com
Real-time search of the social web
Unlike other real-time search engines, ranking is based on a deep archive of social media
Trending metrics used in ranking
  What’s viral right now?
Experts Search locates authoritative Twitter users on topics of your choice
Advanced search filters
  Site/domain
  Twitter user
  Language (10)
  Date, time posted
Lexipedia (in beta)
Lexicographic visualization tool based on NLP
Maps parts of speech related to specific terms
  Nouns, verbs, adverbs, adjectives
Gives synonyms, antonyms and “fuzzynyms”
  e.g. happy – well, fortuitous, volitional
Hover for definition and usage examples
Currently available in English, Spanish, German, French, Dutch and Italian (all meanings and usage given in English)
Powered by iSeek
Zapmeta
Searches Web, Images and Video engines
Web search includes Yahoo, Bing, Gigablast, AV, Entireweb
Results grouped into “concept clusters”
Advanced search offers
  Full Boolean
  Limit by country of page’s origin
  Limit by domain type
  Highlighting search term(s)
Polymeta
Web search includes Google, Bing, Ask, Yahoo, Exalead
Source selection available for each search type
  Web, News, Images, Videos, Twitter, Blogs
Twitter search is limited to top 50 containing your search terms
Faceted and graphed results available
Related results from other search types appear to the right
Searchteam.com
Search engine with wiki-like, real-time collaborative work spaces
“Collective knowledge from your trusted social network circles”
Web sites
Videos (YouTube)
Images
Reference (Wikipedia)
Educational
Books and Articles (Amazon)
Faceted results and suggested searches
  Related main topics
  Subtopics
  Related searches (suggested)
Searchteam.com
SearchSpaces
  Organize and share links
  Online forum for collaborative searching with friends
  Must search while in a searchspace to add to it
Educational tab not inclusive of all .edu domains
Results counts unreliable
The Shape of Today’s Social Web
Why search the social web???
Public responses/attitudes/primary sources
Breaking news
Trending topics and people
Latest product reviews
Companies and competition
Security, technology topics (latest virus, etc.)
Locate individuals and their networks
  Who they follow, who follows them
  People interested in a topic/hobby
Monitor collaborations
Social Networks in the Egyptian Revolution
1/25/11-2/11/11
Enabling protesters to become citizen journalists
Mining Today’s Social Web: The trust factors
People you don’t know
  Wikipedia
  Human-created databases, directories
    “I need a few good sites on solar energy”
    Mahalo, Ipl2.org
  Q&A Services
    “How do I repair my garage door opener?”
    Yahoo Answers, Answers.com, Mahalo Answers
Mining Today’s Social Web: The trust factors
People you follow
  Twitter: human-created Tweets
    “What’s the buzz on Beyonce?”
People you know
  Post a question to friends and family
    “What type of Mac should I buy?”
  Facebook, LinkedIn, Google+, Bing (login via Facebook)
http://marketingland.com/new-social-discovery-engine-bottlenose-aims-to-take-over-real-time-exploration-17024
Twittermining
Some tweets are more “authoritative” than others…
Access to unfiltered, real-time perspective on what people are thinking and doing
Authority (and usefulness) of a tweet depends on
  Who sent it
  The number and “authority” of their followers
  When it was sent
  Documents/sites it refers to
Twittermining Tools
Twitter.com
Requires a (free) account
Only the latest 2 weeks available
Searchable by hashtag (#)
  Author-designated keyword or significant term or phrase
  #rochester #jobs #marketing
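Because a hashtag is just a marked keyword inside the tweet text, finding tweets by hashtag comes down to simple pattern matching. The sketch below works on a made-up list of tweets held in memory. It does not call Twitter, and the function names are illustrative.

```python
import re

# A hashtag here is "#" followed by letters, digits or underscores.
HASHTAG = re.compile(r"#(\w+)")


def hashtags(tweet):
    """Return the hashtags in a tweet, lowercased and without the # sign."""
    return [tag.lower() for tag in HASHTAG.findall(tweet)]


def search_by_hashtag(tweets, tag):
    """Return the tweets that carry the given hashtag, ignoring case."""
    wanted = tag.lstrip("#").lower()
    return [t for t in tweets if wanted in hashtags(t)]


# Made-up sample tweets.
tweets = [
    "Hiring now #Rochester #jobs",
    "Great #marketing tips for small libraries",
    "No tags here",
]
```

Searching this list for "#rochester" finds the first tweet even though its author typed "#Rochester".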
Twittermining Tools
Discover Tab (access via your account)
Launched 5/12
Offers
  Personalized content based on your Twitter activity
  Favorites, follows, retweets, and more by people you follow
  Who to follow: Twitter accounts suggested for you based on who you follow
  Browse categories (<25) and people/organizations heavily associated with the categories
Twittermining Tools
https://twitter.com/search-advanced
No account required
Only the latest 2 weeks available
Advanced search features
  Booleans
  Hashtag
  Language limit
  Author search (tweets from or to)
  “Near this place”
  Attitude – positive, negative, question
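Several of these features can also be typed directly into the search box as operators. The sketch below assembles such a query string and a search URL. It assumes the commonly documented operators `from:`, `to:`, `:)`, `:(` and `?`, and that `https://twitter.com/search?q=...` accepts them; the helper names are hypothetical, and language and location limits are left out.

```python
from urllib.parse import urlencode

# Assumed attitude markers used by Twitter search.
ATTITUDE_MARKS = {"positive": ":)", "negative": ":(", "question": "?"}


def build_query(words="", hashtag=None, from_user=None, to_user=None, attitude=None):
    """Combine plain words with optional hashtag, author and attitude operators."""
    parts = []
    if words:
        parts.append(words)
    if hashtag:
        parts.append("#" + hashtag.lstrip("#"))
    if from_user:
        parts.append("from:" + from_user)
    if to_user:
        parts.append("to:" + to_user)
    if attitude:
        parts.append(ATTITUDE_MARKS[attitude])
    return " ".join(parts)


def search_url(query):
    """URL-encode the query for use in a browser address bar."""
    return "https://twitter.com/search?" + urlencode({"q": query})


query = build_query(words="library jobs", hashtag="rochester", attitude="question")
```

Passing `query` to `search_url` produces a link that can be bookmarked or shared with colleagues.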
Twittermining Tools
Storify.com
Users build social stories, bringing together media scattered across the Web into a coherent narrative
Access material shared with and by you and public posts
Postings, status updates, photos, videos, podcasts from Twitter, Facebook, YouTube, Flickr, Instagram and more
Discover others with similar interests
Requires (free) account, via Facebook or
Social Networks and Results: Users Respond
A distraction and concerns about privacy
Established Services:
Bing, Blekko, DuckDuckGo
The Fallacy of the Superior Search Engine
Conrad Saam*
Is there a difference in the quality of search results from Google and Bing?
Data set of 100 difficult queries
  “clean crayon off an led t.v. screen”
  “Who was Kim Jong Un’s mother?”
  “wii new release rumors”
*http://searchengineland.com/google-fails-to-trounce-bing-again-the-fallacy-of-the-superior-search-engine-revisited-107238
The Fallacy of the Superior Search Engine
Evaluative factors
  Timeliness
  One-click access to information
  Volume of content
  Lack of spam
  Authoritative sites appear in first 3 results
The winner???
  Google 296, Bing 274
“Bing needs to be a much better search engine than Google to make it worth the switch”
Source: www.comscore.com
Microsoft’s Bing
Redesigned 6/8/12
Social search results now located in the new Social Sidebar (Facebook-based)
When logged in through Facebook
  Ask friends
  Friends who might know
  People who know
  Feed of questions you’ve asked your FB friends through Bing
Without a FB login, Sidebar results come from public posts
What Bing is NOW
Travel: Price Predictor
Video: Hover and get a preview
Music:
  Artists: All content related to the artist (entity-based search)
  Events: FanSnap (meta for ticket purchasing)
Shopping: Hottest deals on the web right now
Maps: Malls and Airports added
Everywhere: Xbox, Mobile, iPad
Curating the web with Blekko
http://blekko.com
Human/crawler service
Blekko (human) editors create “topic” and “built-in” slashtags used to label content in the Blekko crawler database.
Registered users can create their own tags for any site in the Blekko database for a personal, searchable web
Slashtags help refine results and eliminate spam
Small but well curated database
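A slashtag works like a filter appended to the query. The sketch below separates slashtags from the search terms and keeps only results from sites labeled with that tag. The tag-to-site table and the result records are invented for illustration, and this is not Blekko's actual implementation.

```python
# Hypothetical editor-curated table: slashtag -> sites labeled with it.
SLASHTAG_SITES = {
    "health": {"nih.gov", "mayoclinic.org"},
    "news": {"nytimes.com", "bbc.co.uk"},
}


def parse_slashtags(query):
    """Split a query into its plain search terms and its slashtags."""
    terms, tags = [], []
    for token in query.split():
        if token.startswith("/") and len(token) > 1:
            tags.append(token[1:].lower())
        else:
            terms.append(token)
    return " ".join(terms), tags


def filter_results(results, tags):
    """Keep only results whose site carries one of the given slashtags."""
    if not tags:
        return list(results)
    allowed = set()
    for tag in tags:
        allowed |= SLASHTAG_SITES.get(tag, set())
    return [r for r in results if r["site"] in allowed]


# Made-up raw results for the query "flu symptoms".
results = [
    {"site": "nih.gov", "title": "Flu symptoms and treatment"},
    {"site": "spam.example", "title": "Miracle flu cure!!!"},
]
terms, tags = parse_slashtags("flu symptoms /health")
curated = filter_results(results, tags)
```

Only the nih.gov result survives the /health filter, which is how a curated tag list can screen out spam.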
Blekko this year
Slashtags now automatically added to searches based on aggregated anonymous search behavior.
Adding /monte gives you results from 3 engines; sources revealed only after you select the most relevant results set
Received substantial investment from major Russian search engine Yandex
DuckDuckGo – http://ddg.gg
Home and search results pages redesigned
Related “Search Suggestions” on results pages
“Goodies”: user-supplied questions with answers in 20 broad categories
  Entertainment
  Programming
  Food & Drink
  Sysadmin
  Travel
  Web Design
Google – The Highlights
Google+ plus.google.com
Google’s social network (requires a Google account)
Launched 9/19/11 (access to Twitter ended 7/2/11)
Currently over 400 m. users, 100 m. active on a monthly basis
  Facebook currently over 1.01 b. active users
Offers “hangouts”: video chat rooms within the social network
Businesses and organizations allowed
Google+
“Google +1” allows a Google+ member to give a site a vote of approval
Web search results include +1 votes, sometimes location-based
Best access to content is through Google: site:plus.google.com search term(s)
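The site: technique above can be turned into a reusable link. This small sketch URL-encodes a `site:plus.google.com` query for Google web search; the function name is made up, and the `https://www.google.com/search?q=` address is assumed to be the standard search URL.

```python
from urllib.parse import urlencode


def google_plus_search_url(terms):
    """Build a Google search URL restricted to plus.google.com pages."""
    query = f"site:plus.google.com {terms}"
    return "https://www.google.com/search?" + urlencode({"q": query})


url = google_plus_search_url("library outreach")
```

The same pattern works for restricting a search to any other site by swapping the domain after `site:`.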
Google Now
An Intelligent Digital Assistant
Turns spoken natural language queries into a search, returning customized answers
Uses search and other data from your mobile devices, Gmail and other Google services
The more personal, contextualizing data accessible, the more customized the answers
Currently available for mobile devices only
Search Lesson Plans and Common Core Standards
Part of Google’s search education initiative
5 main topics with beginner, intermediate and advanced levels
  Picking the right search terms
  Understanding search results
  Narrowing a search to get the best results
  Searching for evidence for research tasks
  Evaluating credibility of sources
google.com/insidesearch/searcheducation/lessons.html
Personalization and Social Networks: Search Plus Your World
Boosts in results ranking
  Based on IP search behavior (Opt-out)
  Based on personal search behavior (Opt-in)
  Based on your social networks (Opt-in)
  Based on Google+ public posts (Default; multiple steps needed to opt-out)
  Based on your private Google+ network posts (Opt-in)
IP-based personalization
To permanently opt-out go to Search Settings
To opt-out on a per-search basis use the toggle (top right)
Personalization based on your personal search behavior is still opt-in
AAP Lawsuit settled
2005: Association of American Publishers and McGraw-Hill, Pearson, Penguin, John Wiley, Simon & Schuster allege copyright violation in the Library scanning project
2012: Google settles with publishers, who may now remove their books or journals from the Library project
Authors Guild suit remains unsettled (back in court 5/8/13, Second Circuit Court of Appeals)
SHARING USER INFORMATION HAS BECOME THE INDUSTRY NORM
A Privacy Primer
Search engines and privacy
NSA and Personal Data
Corporations historically restricted to reporting only total number of government information requests
6/5/13: Snowden leaks NSA documents on PRISM surveillance program; implicates 9 major Internet corporations
6/15/13: Yahoo, MS, Apple, FB successfully petition to disclose certain data on requests; not allowed to specify number of classified FISA requests
6/18/13: Google petitions FISA court for permission to reveal requesting agency, time frame and other details of the requests
Google’s policy for its account-based services
New unified privacy policy in effect 3/1/12
User profiles and individual search behavior will be shared among all Google services that require a login
Account holders cannot opt-out of this sharing
Separate privacy policies still in effect for Google Books and Chrome
Google’s policy for services not requiring an account
Covers Search, YouTube
IP-based personalization in effect since 2009
“We will not combine DoubleClick cookie information with personally identifiable information unless we have your opt-in consent”
Bing’s privacy policy
For MS services that require a Windows Live ID
  “…information collected through one MS service may be combined with information obtained through other Microsoft services.”
  Signing into one service may automatically sign you into other Microsoft services
To opt-out
  Use separate browsers for each MS service you access
  Sign in and out of your accounts throughout the day to de-couple specific activities
DuckDuckGo
Does not collect or share personal information
No browser cookies stored
No personally identifiable or IP-based search histories stored
No IP addresses stored
Very comprehensive with high-quality search results
Current Trends and Future Directions
Search Engine Trends in 2012
Reversal in transparency at the major services
Increasing personalization as the norm
Explosion of social network influence
Stronger anti-competitive allegations
Modest Bing marketshare gains
“The nature of the Internet is undergoing a paradigm shift” – Matthew Berk (Zyxt Labs)
http://zyxt.com/post/26851542949/study-of-1-3-billion-urls-22-of-web-pages-reference
2012 study of 1.3 billion URLs
22% of web pages contain Facebook URLs
Among 500 m. hardcoded links to Facebook only 3.5 m. are unique
URLs from Common Crawl (open repository of web crawl data that can be accessed and analyzed by everyone)
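The study's figures are easier to grasp with a little arithmetic. The sketch below takes the slide's numbers at face value and treats "22% of web pages" as 22% of the 1.3 billion URLs, which is an assumption about how the study counted. Variable names are illustrative.

```python
# Figures as given on the slide.
total_urls = 1_300_000_000
facebook_page_percent = 22
facebook_links = 500_000_000
unique_facebook_links = 3_500_000

# Pages referencing Facebook, using integer arithmetic to avoid rounding.
pages_with_facebook = total_urls * facebook_page_percent // 100

# Share of Facebook links that are unique, and average reuse per unique URL.
unique_share = unique_facebook_links / facebook_links
links_per_unique_url = facebook_links / unique_facebook_links
```

That works out to about 286 million pages, with fewer than 1% of the Facebook links being unique and each unique URL repeated roughly 143 times on average.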
“The Internet is shifting….” – M. Berk
from unstructured to structured content
  Structured content can be parsed and formatted into any other type of content
  Unstructured content: static html
from websites to entities
  Nodes in social and other networks that contain or link to websites and other content
from links to connection
  Growth of business and personal presence on the social web
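The claim that structured content can be reformatted into any other type of content can be shown with one small record rendered three ways. The record fields and function names below are invented for illustration.

```python
import json
from html import escape

# A hypothetical structured record: named fields instead of free text.
record = {
    "type": "Book",
    "title": "Great Expectations",
    "author": "Charles Dickens",
    "year": 1861,
}


def to_html(r):
    """Render the record as an HTML snippet."""
    return f"<p><cite>{escape(r['title'])}</cite> by {escape(r['author'])} ({r['year']})</p>"


def to_text(r):
    """Render the record as a plain-text citation."""
    return f"{r['author']}. {r['title']}. {r['year']}."


def to_json(r):
    """Serialize the record for exchange with another service."""
    return json.dumps(r, sort_keys=True)
```

Going the other way, from a static HTML page back to these fields, would require guessing where the title and author are, which is the extra work unstructured content imposes.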
In the future ---
Mobile search will continue to grow rapidly
Entity-based search will continue to develop
Personalization will grow but more slowly as users better understand the consequences
Social networks will continue as powerful tools for grassroots political movements
Web access and web search will attract more government scrutiny worldwide
Thank You and Enjoy Your Searching!
Michael Hunter, Reference Librarian
Hobart and William Smith Colleges
Geneva, NY 14456
(315) 781-3014 [email protected]