Searching the Internet More Effectively


Transcript of Searching the Internet More Effectively

Page 1: Searching the Internet More Effectively

Searching the Internet More Effectively

Barnsley, 29th February 2012

Karen Blakeman

RBA Information Services

Slides are available at http://www.rba.co.uk/as/

[email protected]

Twitter: @karenblakeman

http://www.rba.co.uk/

This presentation is licensed under a Creative Commons Attribution 3.0 License

Page 2: Searching the Internet More Effectively

How it all started

Before 1992: priced electronic databases (for example Lexis for legal, Nexis for news, and technical/scientific data) and print (government Daily Lists, Annual Reports, directories, local newspapers, official statistics)

1992 – the Internet can be accessed by anyone, but it was another 2-3 years before significant information started appearing on the web

Increase in amount of data and information led to the development of tools that indexed and searched the content of web pages

Lycos, Excite, AltaVista, Hotbot

11/04/23 www.rba.co.uk 2

Page 3: Searching the Internet More Effectively

How the search tools worked (and still do in part)

"Crawl" the internet looking for new and updated pages by following links

Copies of pages and documents added to a database that is publicly searchable

Results sorted according to:

– how often the words you looked for appear in the page

– where they appear (words in the title and first few sentences given higher ranking)

– and many other criteria not disclosed by the search engines

They do not cover:

– password protected sites

– databases or sites where you have to fill in a form to find the information, for example Companies House

Page 4: Searching the Internet More Effectively

Then along came.....

11 November 1998, The Internet Archive www.archive.org

Page 5: Searching the Internet More Effectively

How was Google different?

Links (citations) a major part of ordering search results

http://www.seobook.com/learn-seo/collateral-damage.php

Page 6: Searching the Internet More Effectively

Where is Google now?

2001: Revenues $86,426 thousand; Net Income $10,964 thousand

2011: Revenues $37,905 million; Net Income $9,737 million

http://investor.google.com/financial/tables.html

2011 – 96% of revenues are from advertising. Google is mass market and consumer oriented. Serious researchers wanting reliable, structured search are a minuscule fraction of its customer base.

Page 7: Searching the Internet More Effectively

How Google organises and sorts information

Google has a primary index of higher "quality" documents and a secondary index. Only the primary index is searched when running straightforward searches. The secondary index comes into play with more complex searches and when only a small number of results are found.

“Dear Bing, We Have 10,000 Ranking Signals To Your 1,000. Love, Google” http://searchengineland.com/bing-10000-ranking-signals-google-55473

Over 200 “signals”, each of which may have over 50 variations (200 × 50 = 10,000, which matches the headline figure)

Page 8: Searching the Internet More Effectively

How Google ranks and organises your results

Google personalizes and tailors your results depending on your location, computer/device, browser, past searches, what you have looked at in the past, your +1s, your Google+ account, what you had for breakfast...and anything else it can find by rummaging around in your Google dashboard

To see what's in your dashboard, log in to your Google account and go to http://www.google.com/dashboard/

Also see "Google personalisation: web history isn’t the only problem" http://www.rba.co.uk/wordpress/2012/02/22/google-personalisation-web-history-isnt-the-only-problem/

Page 9: Searching the Internet More Effectively

What I see on my screen for a search is not what you’ll see on yours.

Page 10: Searching the Internet More Effectively

Google knows best!

Hewish mild

Google decided to change my search to Jewish mild without asking

Placing a phrase within quote marks – "Hewish mild" – will usually force an exact match

Google automatically looks for variations of your search terms

Page 11: Searching the Internet More Effectively

For 10 days in February 2011: coots = lions

Google decides that coots are really lions – http://www.rba.co.uk/wordpress/2011/02/12/google-decides-that-coots-are-really-lions/

Update on coots vs. lions – http://www.rba.co.uk/wordpress/2011/02/21/update-on-coots-vs-lions/

Page 12: Searching the Internet More Effectively

Coots = lions

Page 13: Searching the Internet More Effectively

Three search tricks

These three techniques can change what Google (and other search engines) decides to give you and also the order of the results.

Repeat important search terms:

– coots coots mating behaviour (found coots)

Change the order of your terms:

– mating behaviour coots (found coots)

Change one of your search terms:

– coots mating behaviour (found lions)

– coots courtship behaviour (found coots)

– coots mating ritual (found coots)
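
The three tricks can be applied mechanically to build a set of alternative searches to try. The following is a minimal sketch only: the function name query_variants, the choice of the first term as the "important" one, and the hand-supplied table of alternative words are all assumptions made for illustration. It says nothing about what any search engine will actually return for each variant.

```python
def query_variants(terms, alternatives=None):
    """Return reworded versions of a search using the three tricks."""
    alternatives = alternatives or {}
    original = " ".join(terms)
    variants = []

    # Trick 1: repeat the important term (assumed here to be the first one).
    variants.append(" ".join([terms[0]] + terms))

    # Trick 2: change the order by moving the first term to the end.
    variants.append(" ".join(terms[1:] + [terms[0]]))

    # Trick 3: change one term at a time, using alternatives we supply.
    for i, term in enumerate(terms):
        for alt in alternatives.get(term, []):
            variants.append(" ".join(terms[:i] + [alt] + terms[i + 1:]))

    # Drop duplicates and the original query, keeping the order.
    seen = {original}
    unique = []
    for variant in variants:
        if variant not in seen:
            seen.add(variant)
            unique.append(variant)
    return unique


variants = query_variants(
    ["coots", "mating", "behaviour"],
    {"mating": ["courtship"], "behaviour": ["ritual"]},
)
for variant in variants:
    print(variant)
```

Each printed line is a search to paste into the search box by hand, so that the results can be compared with those for the original wording.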

Page 14: Searching the Internet More Effectively

Excluding pages containing words

Want to exclude pages containing a term? Place a - (minus sign) immediately before the term, with no space between the sign and the term

Use with care as may miss important material

Excluding lions from our bizarre coots search

coots mating behaviour -lions

gave us:

Page 15: Searching the Internet More Effectively

Coots=lions was an extreme example of how Google can work

We think Google was doing the following:

- assumed a typing error or was running a mobile/smartphone predictive text algorithm (coots=cats)

- ran an automatic variation/synonym search on cats

- used a search frequency rule and found that lions mating behaviour was requested more than cats

Page 16: Searching the Internet More Effectively

Dear Google, stop messing with my search http://www.rba.co.uk/wordpress/2011/11/08/dear-google-stop-messing-with-my-search/

Google no longer looks for all of your terms in a page

Page 17: Searching the Internet More Effectively

See what Google sees

Hover over a result and a "preview" of the page should appear to the right together with a Cached link – this is Google's copy

Page 18: Searching the Internet More Effectively

“When you do a multi-term query on Google (even with quoted terms), the algorithm sometimes backs-off from hard ANDing all of the terms together.......it’s clear that people will often write long queries (with anywhere from 5 to 10 terms) for which there are no results. Google will then selectively remove the terms that are the lowest frequency to give you some results (rather than none)....Soft AND is a way to reduce the overall frustration and give the searcher something to examine (and with luck, a chance to reformulate their query).”

Dan Russell

http://www.rba.co.uk/wordpress/2011/11/08/dear-google-stop-messing-with-my-search/#comments
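
The "soft AND" behaviour described in the quote can be illustrated with a toy example. This is a sketch under stated assumptions, not Google's algorithm: the function name search, the three sample pages, and the rule of dropping whichever term appears in the fewest pages are all invented for illustration.

```python
from collections import Counter


def search(pages, query_terms):
    """Toy 'soft AND': require every term, then relax if nothing matches."""
    # Split each page into a set of lower-case words.
    words_by_page = {name: set(text.lower().split()) for name, text in pages.items()}

    # Count how many pages each word appears in.
    page_count = Counter(word for words in words_by_page.values() for word in words)

    terms = [term.lower() for term in query_terms]
    while terms:
        # Hard AND: keep only pages containing every remaining term.
        hits = sorted(
            name for name, words in words_by_page.items()
            if all(term in words for term in terms)
        )
        if hits:
            return terms, hits
        # No page has every term: drop the rarest term and try again.
        rarest = min(terms, key=lambda term: page_count[term])
        terms.remove(rarest)
    return [], []


pages = {
    "page1": "coots mating behaviour on lakes",
    "page2": "lions mating behaviour in Africa",
    "page3": "coots nesting on lakes",
}

used, hits = search(pages, ["coots", "mating", "behaviour", "Barnsley"])
print(used, hits)
```

No sample page mentions Barnsley, so the sketch quietly drops that term and returns a page matching the other three. The searcher gets something rather than nothing, but not what was literally asked for.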

Page 19: Searching the Internet More Effectively

Verbatim

Forces Google to run an exact match search. Run your search first and then select Verbatim from the left hand menu on your results page

Cannot be combined with time options in the side bar

Google: Verbatim for exact match search

http://www.rba.co.uk/wordpress/2011/11/18/google-verbatim-for-exact-match-search/

Page 20: Searching the Internet More Effectively

Google doing its own thing can be good

Page 21: Searching the Internet More Effectively

Google's new(ish) social network Google Plus (Google+)

http://plus.google.com/

Google is trying to force people to create a Google+ profile http://marketingland.com/google-now-forcing-all-new-users-to-create-google-enabled-accounts-3912

Search Plus Your World (SPYW), also referred to as Search+, is now available on Google.com and is the default. It gives priority to content from people in your Google+ network if you are signed in to your account.

(And the next Google killer is….Google! http://www.rba.co.uk/wordpress/2012/01/30/and-the-next-google-killer-is-google/ )

Page 22: Searching the Internet More Effectively

Before After

SPYW Currently being tested on Google.com

Page 23: Searching the Internet More Effectively

SPYW Currently being tested on Google.com

Page 24: Searching the Internet More Effectively

Google results side bar

These help you focus your search

Vary depending on type of search e.g. web, news, images

Open up the "more" options to see everything

Page 25: Searching the Internet More Effectively

Google side bars

Images Videos News Books Blogs

Page 26: Searching the Internet More Effectively

Google images – not always what you expect

Search for patent and select the colour red from the side bar (Thanks to Arthur Weiss for the example)

Page 27: Searching the Internet More Effectively

Related searches

Page 28: Searching the Internet More Effectively

Translated foreign pages for a different perspective

Google suggests languages from context of search but you can choose your own

Your search is translated and the results are translated into your language

Page 29: Searching the Internet More Effectively

Problems finding information on a particular site?

Use Google's site: command

For example, trying to find information on Reading Borough Council's recycling policy by searching reading.gov.uk

Page 30: Searching the Internet More Effectively

Go to Google and type in

recycling policy site:reading.gov.uk

Page 31: Searching the Internet More Effectively

Or if you are interested in all government (central, departmental and local) recycling policies:

recycling policy site:gov.uk
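
If you run site: searches often, the search can be turned into a link to bookmark or share. A minimal sketch, assuming Google's usual results address of https://www.google.com/search with the search passed as q (that address and the function name site_search_url are not from the slides):

```python
from urllib.parse import urlencode


def site_search_url(terms, site, base="https://www.google.com/search"):
    """Build a results-page link for a search restricted to one site or domain."""
    query = f"{terms} site:{site}"
    # urlencode escapes the spaces and the colon so the link is safe to share.
    return f"{base}?{urlencode({'q': query})}"


# One council's site, then every site under gov.uk.
council = site_search_url("recycling policy", "reading.gov.uk")
all_gov = site_search_url("recycling policy", "gov.uk")
print(council)
print(all_gov)
```

The only difference between the narrow and the broad search is how much of the domain name follows site:.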

Page 32: Searching the Internet More Effectively

Combine with date option in the side bar

Page 33: Searching the Internet More Effectively

LGSearch

http://lgsearch.net/

Google Custom Search Engine (CSE)

Page 34: Searching the Internet More Effectively

Create your own Google custom search engine

http://www.google.com/cse/

For:

– regularly searched sites

– selected sites on a subject or type of organisation

Cannot include password protected sources or sites where you have to fill in a form to access the information

Information on setting up a Google Custom Search Engine (CSE)

http://www.rba.co.uk/as/GoogleCustomSearchEngines.doc

Google's blog on custom search http://googlecustomsearch.blogspot.com/

Page 35: Searching the Internet More Effectively

Looking for a particular type of information for example statistics, research report, expert presentation?

Use the filetype: command

For statistics:

– car ownership UK filetype:xls

– car ownership UK filetype:xlsx

For government, research, industry reports:

– UK oil consumption forecasts filetype:pdf

For conference presentations or trying to locate an expert:

– renewable energy UK filetype:ppt

– renewable energy UK filetype:pptx

Page 36: Searching the Internet More Effectively

Can combine commands

renewable energy UK filetype:ppt site:ac.uk
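
Because each type of document can come in more than one file format (xls and xlsx, ppt and pptx), a full search means running one query per format. The sketch below generates them, optionally combined with site:. The names FILETYPES and filetype_queries, and the grouping into "statistics", "reports" and "presentations", are assumptions for illustration based on the examples above.

```python
# Which file extensions to try for each kind of information.
FILETYPES = {
    "statistics": ["xls", "xlsx"],
    "reports": ["pdf"],
    "presentations": ["ppt", "pptx"],
}


def filetype_queries(terms, kind, site=None):
    """Return one search per file extension, optionally limited to a site."""
    queries = []
    for extension in FILETYPES[kind]:
        parts = [terms, f"filetype:{extension}"]
        if site:
            parts.append(f"site:{site}")
        queries.append(" ".join(parts))
    return queries


stats = filetype_queries("car ownership UK", "statistics")
academic_talks = filetype_queries("renewable energy UK", "presentations", site="ac.uk")
for query in stats + academic_talks:
    print(query)
```

Each printed line is a complete search to run separately, since a single filetype: command only matches one extension.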

Advanced search screen with more options at http://www.google.co.uk/advanced_search

Selected Google Commands

http://www.rba.co.uk/search/SelectedGoogleCommands.shtml

Page 37: Searching the Internet More Effectively

Google alternatives - Bing and Yahoo

Yahoo now uses Bing.com’s database and ranking

Many of the Advanced Search commands are similar to Google’s, see Search Tools Summary and Comparison http://www.rba.co.uk/search/compare.shtml

Most of the interesting developments and features are only available in the US version

Results tend to be more consumer/retail focused unless using advanced search features

Coverage not identical to Google’s - sometimes yields important unique content

Sometimes more up to date than Google

Page 38: Searching the Internet More Effectively

DuckDuckGo

http://duckduckgo.com/

DuckDuckGo – silly name but a neat little search tool http://www.rba.co.uk/wordpress/2011/11/07/duckduckgo-silly-name-but-a-neat-little-search-tool/

No tracking, no “filter bubble”

Commands: site:, filetype:, and sort:date to sort by date (uses results from Blekko)

Syntax and keyboard shortcuts at http://duckduckgo.com/goodies.html

Page 39: Searching the Internet More Effectively

Flickr to search for images

Use the default search box or Flickr Creative Commons http://www.flickr.com/creativecommons or advanced search screen http://www.flickr.com/search/advanced/

Page 40: Searching the Internet More Effectively

Statistics http://www.offstats.auckland.ac.nz/

Page 41: Searching the Internet More Effectively

MySociety http://www.mysociety.org/projects/

Page 42: Searching the Internet More Effectively

MySociety http://www.mysociety.org/more-projects/

Page 43: Searching the Internet More Effectively

Police.uk - Local crime and policing information for England and Wales : http://www.police.uk/

Page 44: Searching the Internet More Effectively

Linkedin.com

Professional network

For people and companies

For identifying experts in a field

Boolean Black Belt-Sourcing/Recruiting http://www.booleanblackbelt.com/

Page 45: Searching the Internet More Effectively

Facebook

Personal and business pages relatively easy to find

No easy way to search content within pages

Page 46: Searching the Internet More Effectively

Local "stuff"

Web pages, local papers, "what's on", local forums/discussion boards, Facebook pages, Twitter

Twitter search http://search.twitter.com/

Socialmention http://www.socialmention.com/

Topsy http://www.topsy.com/

Icerocket http://www.icerocket.com/

Set up 'lists' (can be kept private) - view through Twitter.com, desktop program or mobile app

Page 47: Searching the Internet More Effectively

My local stuff on Tweetdeck

Page 48: Searching the Internet More Effectively

Paper.li - create your own newspaper

Page 49: Searching the Internet More Effectively

Paper.li http://paper.li/karenblakeman/1330359266

Page 50: Searching the Internet More Effectively

Copyright

Always check the copyright of anything that you want to use or incorporate into a document or web page

Always, always check and double check the copyright of images - may have a digital watermark and be tracked e.g. Digimarc

Creative Commons does not mean you can do what you like with the text/image

– six licences http://creativecommons.org/licenses/

“Open-licencing your images. What it means and how to do it.” Andy Mabbett aka pigsonthewing

– http://pigsonthewing.org.uk/open-licencing-images-what-how/

Karen Blakeman's Blog “Free-to-use images might not be” – http://www.rba.co.uk/wordpress/2009/07/16/free-to-use-images-might-not-be/

Page 51: Searching the Internet More Effectively

Evaluating resources

Type of web site, for example: gov.uk, ac.uk, .gov, .edu

Who is really behind the site?

– use a domain name lookup (whois) service such as http://whois.domaintools.com

– you do NOT want to see that the domain name is hosted by an organisation such as this:

Page 52: Searching the Internet More Effectively

Evaluating resources

Date of publication, 'last updated'

Check text for clues of publication date

Stated date for a web page or document may be automatically generated when it is put onto the web site

After a web site redesign pages are re-uploaded and are given a new publication date

Some pages are generated "on the fly" so will always have today's date

Page 53: Searching the Internet More Effectively

Quoting and referencing

Make it clear when you are quoting someone else and always quote the source of data

Give at least the title of the article and URL in the text of a document

Full reference:

– author (and/or organisation), title of page/document, URL (web address – do not use shortened URLs), date of publication (if known), date you accessed the document

– George Monbiot, In Praise of Distrust http://www.monbiot.com/2012/02/27/in-praise-of-distrust/, 27th February 2012, [Accessed 28th February 2012]

– organisations and publishers may have their own preferred format

If the information is critical make a local copy
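
The full reference format above is regular enough to generate automatically. The following is a sketch only: the function names are hypothetical, and the short list of URL-shortening services is an illustrative assumption, not a complete list. Organisations with their own preferred format would need a different layout.

```python
from datetime import date
from urllib.parse import urlparse

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Illustrative examples of shortening services; extend as needed.
SHORTENERS = {"bit.ly", "t.co", "goo.gl", "tinyurl.com"}


def long_date(d):
    """Format a date as, for example, '27th February 2012'."""
    if 10 <= d.day % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(d.day % 10, "th")
    return f"{d.day}{suffix} {MONTHS[d.month - 1]} {d.year}"


def format_reference(author, title, url, accessed, published=None):
    """Build a full reference; the publication date is optional."""
    if urlparse(url).netloc.lower() in SHORTENERS:
        raise ValueError("Use the full web address, not a shortened URL")
    parts = [f"{author}, {title} {url}"]
    if published:
        parts.append(long_date(published))
    parts.append(f"[Accessed {long_date(accessed)}]")
    return ", ".join(parts)


reference = format_reference(
    "George Monbiot",
    "In Praise of Distrust",
    "http://www.monbiot.com/2012/02/27/in-praise-of-distrust/",
    accessed=date(2012, 2, 28),
    published=date(2012, 2, 27),
)
print(reference)
```

The date accessed is required because a web page can change or disappear after you have quoted it; the publication date is left out when it is not known.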

Page 54: Searching the Internet More Effectively

Keeping up to date

Inside Search http://insidesearch.blogspot.com/

Official Google Blog http://googleblog.blogspot.com/

Google Scholar Blog http://googlescholar.blogspot.com/

Search Engine Land http://searchengineland.com/

Search Engine Watch http://searchenginewatch.com/

Boolean Black Belt-Sourcing/Recruiting http://www.booleanblackbelt.com/

Karen Blakeman’s Blog http://www.rba.co.uk/wordpress/

Phil Bradley's weblog http://philbradley.typepad.com/

Page 55: Searching the Internet More Effectively

http://elgin.gov.uk/

Page 56: Searching the Internet More Effectively

When are road works not road works?

When they are classified as Network Rail bridge works!

http://www.flickr.com/photos/rbainfo/5911913498/ CC 3.0 Attribution Non-commercial