SEO scraping with Excel (Google suggest and more)

Page 1: SEO scraping with Excel (Google suggest and more)

#SMConnect @Zen2SEO

Search Marketing Connect - 20 and 21 November 2015

SEO Scraping with Excel: from an “infinite” Google Suggest to SERP extractions for several goals, at no cost and with no programming skills needed

Page 2: SEO scraping with Excel (Google suggest and more)

salsa dancing + travel + crime novels + lots of fun = Giuseppe Pastore (unconventional SEO manager)

Say hello!

@Zen2SEO

Page 3: SEO scraping with Excel (Google suggest and more)

Web Scraping - What

Web scraping = extracting information from websites by simulating human exploration with software

Page 4: SEO scraping with Excel (Google suggest and more)

Web Scraping - Why

Price comparison, contact scraping, weather data monitoring, website change detection, research, web mashup and web data integration.

Page 5: SEO scraping with Excel (Google suggest and more)

Web Scraping - How

Lots of techniques... that need coding.

I can’t code, but I like Excel.

Page 6: SEO scraping with Excel (Google suggest and more)

Excel

SEO Tools for Excel

RegEx

XPath

Page 7: SEO scraping with Excel (Google suggest and more)

http://seotoolsforexcel.com

Page 8: SEO scraping with Excel (Google suggest and more)

Regular Expression (regex or regexp) = a sequence of characters that defines a search pattern, mainly for use in pattern matching with strings

http://goo.gl/pqtNE0

Page 9: SEO scraping with Excel (Google suggest and more)

XPath = a query language for selecting nodes from an XML document

//*[@id="rso"]/div/div/h3/a

Page 10: SEO scraping with Excel (Google suggest and more)

SCRAPING (EVERY!!!) SUGGEST

Page 11: SEO scraping with Excel (Google suggest and more)

Google Suggest API to be discontinued

http://googlewebmastercentral.blogspot.it/2015/07/update-on-autocomplete-api.html

Page 12: SEO scraping with Excel (Google suggest and more)

UberSuggest (takes data from Bing)

http://ubersuggest.org

Page 13: SEO scraping with Excel (Google suggest and more)

Keyword Tool

http://keywordtool.io

Page 14: SEO scraping with Excel (Google suggest and more)

Target #1 – Google Suggest

Page 15: SEO scraping with Excel (Google suggest and more)

http://suggestqueries.google.com/complete/search?output=toolbar&hl=it&q=milan

Step 1
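
For readers who want to see the same request outside Excel, here is a minimal Python sketch that builds the Step 1 URL from a keyword and a language code. The function name `suggest_url` and the constant `BASE_URL` are illustrative names, not part of the deck.

```python
from urllib.parse import urlencode

BASE_URL = "http://suggestqueries.google.com/complete/search"


def suggest_url(query, lang="it"):
    """Build the suggest URL shown on the slide for a keyword and language."""
    # Same three parameters as the slide: output, hl (language), q (keyword).
    params = {"output": "toolbar", "hl": lang, "q": query}
    return BASE_URL + "?" + urlencode(params)


print(suggest_url("milan"))
```

`urlencode` also escapes spaces and accented characters, which a plain string concatenation does not.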

Page 16: SEO scraping with Excel (Google suggest and more)

Step 2

=DownloadString("http://suggestqueries.google.com/complete/search?output=toolbar&hl=it&q="&A2)

<?xml version="1.0"?><toplevel><CompleteSuggestion><suggestion data="milan"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan news"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano finanza"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano meteo"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano marittima"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milanotoday"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano expo"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano malpensa"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milanuncios"/></CompleteSuggestion></toplevel>

Downloading the entire page code
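
A rough Python counterpart of the download step, as a sketch only. The helper `download_string` mirrors the role of the sheet's DownloadString call; `fake_opener` is a hypothetical stand-in for the network so the example runs offline, and the UTF-8 decoding is an assumption about the response.

```python
import io
import urllib.request


def download_string(url, opener=urllib.request.urlopen):
    """Return the body of `url` as text."""
    with opener(url) as response:
        raw = response.read()
    # Assumption: the body is UTF-8; undecodable bytes are replaced.
    return raw.decode("utf-8", errors="replace")


def fake_opener(url):
    """Hypothetical offline replacement for urlopen (ignores the URL)."""
    return io.BytesIO(
        b'<?xml version="1.0"?><toplevel><CompleteSuggestion>'
        b'<suggestion data="milan"/></CompleteSuggestion></toplevel>'
    )


page = download_string(
    "http://suggestqueries.google.com/complete/search?output=toolbar&hl=it&q=milan",
    opener=fake_opener,
)
print(page)
```

Dropping the `opener=fake_opener` argument makes the call go to the real endpoint.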

Page 17: SEO scraping with Excel (Google suggest and more)

Step 3

=RegexpReplace(DownloadString("http://suggestqueries.google.com/complete/search?output=toolbar&hl="&B2&"&q="&A2);"((.*)toplevel>)?<CompleteSuggestion><suggestion(\s)data=";"")

Input: <?xml version="1.0"?><toplevel><CompleteSuggestion><suggestion data="milan"/></CompleteSuggestion>...</toplevel>

Output: "milan"/></CompleteSuggestion>...</toplevel>

Deleting the opening of each node
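
The same substitution can be tried in Python with the pattern from the Step 3 formula. This is a sketch on a two-suggestion sample; `XML`, `OPENING` and `strip_openings` are illustrative names.

```python
import re

# Shortened sample in the format returned by the suggest URL.
XML = (
    '<?xml version="1.0"?><toplevel>'
    '<CompleteSuggestion><suggestion data="milan"/></CompleteSuggestion>'
    '<CompleteSuggestion><suggestion data="milan news"/></CompleteSuggestion>'
    '</toplevel>'
)

# Pattern copied from the Step 3 formula: an optional prefix up to
# "toplevel>", then the opening of one suggestion node.
OPENING = r'((.*)toplevel>)?<CompleteSuggestion><suggestion(\s)data='


def strip_openings(xml_text):
    """Remove the XML header and the opening of every suggestion node."""
    return re.sub(OPENING, "", xml_text)


stripped = strip_openings(XML)
print(stripped)
```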

Page 18: SEO scraping with Excel (Google suggest and more)

Step 4

=RegexpReplace(A11;"/></CompleteSuggestion>(</toplevel>)?";",")

Input: "milan"/></CompleteSuggestion>"milano news"/></CompleteSuggestion>...</toplevel>

Output: "milan","milan news","milano finanza","milan","milano","milano meteo","milano marittima","milano expo","milano malpensa","milanotoday","milan store",

Deleting the closing of each node
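
A Python sketch of the Step 4 replacement, using the pattern from the formula on a short sample. `STRIPPED`, `CLOSING` and `closings_to_commas` are illustrative names.

```python
import re

# What is left once the node openings have been removed (short sample).
STRIPPED = '"milan"/></CompleteSuggestion>"milan news"/></CompleteSuggestion></toplevel>'

# Pattern copied from the Step 4 formula: the closing of one node,
# optionally followed by the closing root tag.
CLOSING = r'/></CompleteSuggestion>(</toplevel>)?'


def closings_to_commas(text):
    """Turn every node closing into a comma separator."""
    return re.sub(CLOSING, ",", text)


joined = closings_to_commas(STRIPPED)
print(joined)
```

The trailing comma is intentional: the later steps rely on every term being followed by one.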

Page 19: SEO scraping with Excel (Google suggest and more)

Step 5

=SINISTRA(A14;TROVA(",";A14;1))

(SINISTRA and TROVA are the Italian-locale names of LEFT and FIND.)

Input: "milan","milan news","milano finanza","milan","milano","milano meteo","milano marittima","milano expo","milano malpensa","milanotoday","milan store",

Output: "milan",

Finding the first comma and isolating everything to its left

Page 20: SEO scraping with Excel (Google suggest and more)

Step 6

=RegexpReplace(SINISTRA(A17;TROVA(",";A17;1));""",?";"")

Input: "milan",

Output: milan

Removing the quotes and the trailing comma: the first result is now isolated
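
Steps 5 and 6 together can be sketched in Python as follows. The function name `first_term` is illustrative, and the sketch assumes, like the formulas, that no suggestion contains a comma.

```python
import re


def first_term(text):
    """Return the first suggestion of a '"a","b",' style string."""
    # Step 5: keep everything up to and including the first comma.
    comma = text.index(",")  # 0-based here, while FIND/TROVA is 1-based
    head = text[: comma + 1]
    # Step 6: drop each quote, together with a comma that follows it.
    return re.sub(r'",?', "", head)


print(first_term('"milan","milan news","milano finanza",'))
```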

Page 21: SEO scraping with Excel (Google suggest and more)

Step 7

=DESTRA(A14;LUNGHEZZA(A14)-TROVA(",";A14;1))

(DESTRA, LUNGHEZZA and TROVA are the Italian-locale names of RIGHT, LEN and FIND.)

Input: "milan","milan news","milano finanza","milan","milano","milano meteo","milano marittima","milano expo","milano malpensa","milanotoday","milan store",

Output: "milan news","milano finanza","milan","milano","milano meteo","milano marittima","milano expo","milano malpensa","milanotoday","milan store",

From the 10-result string, I'm isolating the part to the right of the first term

143 characters (whole string) - 8 characters (first term and its comma) = 135 characters (remainder)
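
The Step 7 arithmetic (total length minus the position of the first comma) can be sketched in Python like this. `rest_after_first` is an illustrative name, and the sample string is shortened.

```python
def rest_after_first(text):
    """Return what follows the first term, as the Step 7 formula does."""
    position = text.index(",") + 1   # FIND/TROVA counts from 1
    keep = len(text) - position      # LEN minus the comma position
    return text[len(text) - keep:]   # RIGHT: the last `keep` characters


print(rest_after_first('"milan","milan news","milano finanza",'))
```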

Page 22: SEO scraping with Excel (Google suggest and more)

"milan news","milano finanza","milan","milano","milano meteo","milano marittima","milano expo","milano malpensa","milanotoday","milan store",

Iterating 5-6-7

=SINISTRA(A14;TROVA(",";A14;1))

=RegexpReplace(SINISTRA(A17;TROVA(",";A17;1));""",?";"")

=DESTRA(A14;LUNGHEZZA(A14)-TROVA(",";A14;1))

milan
milan news
milano
milano finanza
milano meteo
milano marittima
milano expo
milano malpensa
milanotoday
milan store
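
In a spreadsheet the three formulas are repeated row by row. In Python the same iteration is a loop. This is a sketch; `split_terms` is an illustrative name and it assumes that no suggestion contains a comma.

```python
import re


def split_terms(text):
    """Repeat steps 5, 6 and 7 until the string is used up."""
    terms = []
    while "," in text:
        comma = text.index(",")
        # Steps 5 and 7: cut at the first comma into head and remainder.
        head, text = text[: comma + 1], text[comma + 1:]
        # Step 6: remove the quotes and the trailing comma.
        terms.append(re.sub(r'",?', "", head).strip())
    return terms


print(split_terms('"milan","milan news","milano finanza",'))
```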

Page 23: SEO scraping with Excel (Google suggest and more)

Iterating 5-6-7

=RegexpReplace(RegexpReplace(DownloadString("http://suggestqueries.google.com/complete/search?output=toolbar&tbm=&hl="&B2&"&lang_"&B2&"&q="&A2);"((.*)toplevel>)?<(/?Complete)?suggestion((\s)data=)?>?(</toplevel>)?";"");"/>";",")
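
Outside Excel, the whole chain can be replaced by an XML parser. This is an alternative to the regular-expression approach of the deck, not the author's method. `parse_suggestions` is an illustrative name and the sample is shortened.

```python
import xml.etree.ElementTree as ET

XML = (
    '<?xml version="1.0"?><toplevel>'
    '<CompleteSuggestion><suggestion data="milan"/></CompleteSuggestion>'
    '<CompleteSuggestion><suggestion data="milan news"/></CompleteSuggestion>'
    '<CompleteSuggestion><suggestion data="milano finanza"/></CompleteSuggestion>'
    '</toplevel>'
)


def parse_suggestions(xml_text):
    """Read the data attribute of every suggestion node."""
    root = ET.fromstring(xml_text)
    return [node.get("data") for node in root.iter("suggestion")]


print(parse_suggestions(XML))
```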

Page 24: SEO scraping with Excel (Google suggest and more)

Target #2 – Bing Suggest

Page 25: SEO scraping with Excel (Google suggest and more)

http://api.bing.com/osjson.aspx?query=milan

Step 1

12 results

Based on IP

["milan",["milan news","milano finanza","milan","milano today","milano","milan live","milanotoday","milannews.it","milannews","milanofinanza.it","milano meteo","milan calciomercato"]]

https://hide.me/en/proxy
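
The Bing response is a JSON array (the keyword, then the list of suggestions), so in Python it can be read directly with the standard `json` module. A minimal sketch on the response shown on the slide; `parse_bing` is an illustrative name.

```python
import json

# Response copied from the slide.
RESPONSE = (
    '["milan",["milan news","milano finanza","milan","milano today","milano",'
    '"milan live","milanotoday","milannews.it","milannews","milanofinanza.it",'
    '"milano meteo","milan calciomercato"]]'
)


def parse_bing(text):
    """Return (keyword, suggestions) from the two-element array."""
    payload = json.loads(text)
    return payload[0], payload[1]


keyword, suggestions = parse_bing(RESPONSE)
print(keyword, len(suggestions))
```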

Page 26: SEO scraping with Excel (Google suggest and more)

Target #3 – Amazon Suggest

Page 27: SEO scraping with Excel (Google suggest and more)

http://completion.amazon.com/search/complete?method=completion&q=%q&search-alias=aps&mkt=1

http://completion.amazon.co.uk/search/complete?method=completion&q=%q&search-alias=aps&mkt=4

http://completion.amazon.co.jp/search/complete?method=completion&q=%q&search-alias=aps&mkt=6

Aps = All Product Selection (?)

Step 1

["milano",["milani","milano cookies","milano bride","milano knife","kiko milano","milano moda","milano lego","giorgio milano","milano poker chips","milanos"],[{"sc":"1","nodes":[{"name":"Beauty","alias":"beauty"},{"name":"Health & Personal Care","alias":"hpc"}]},{},{},{},{},{},{},{},{},{}],[]]
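
The Amazon response is also JSON, with a third array that carries extra data (here, department names) for some suggestions. The sketch below pairs each suggestion with its departments. It assumes the second and third arrays line up by position, which is what the slide's sample suggests; `parse_amazon` is an illustrative name.

```python
import json

# Response copied from the slide.
RESPONSE = (
    '["milano",["milani","milano cookies","milano bride","milano knife",'
    '"kiko milano","milano moda","milano lego","giorgio milano",'
    '"milano poker chips","milanos"],'
    '[{"sc":"1","nodes":[{"name":"Beauty","alias":"beauty"},'
    '{"name":"Health & Personal Care","alias":"hpc"}]},'
    '{},{},{},{},{},{},{},{},{}],[]]'
)


def parse_amazon(text):
    """Return a list of (suggestion, [department names]) pairs."""
    payload = json.loads(text)
    suggestions, extras = payload[1], payload[2]
    rows = []
    for suggestion, extra in zip(suggestions, extras):
        # Most entries are empty objects, so "nodes" may be missing.
        departments = [node["name"] for node in extra.get("nodes", [])]
        rows.append((suggestion, departments))
    return rows


rows = parse_amazon(RESPONSE)
print(rows[0])
```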

Page 28: SEO scraping with Excel (Google suggest and more)

Target #4 – Google Image Suggest

http://suggestqueries.google.com/complete/search?json&client=toolbar&ds=i&q=%q

<?xml version="1.0"?><toplevel><CompleteSuggestion><suggestion data="milano"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan napoli"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano expo"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano metro"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano skyline"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano marittima"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano metropolitana"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano navigli"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan news"/></CompleteSuggestion></toplevel>

Page 29: SEO scraping with Excel (Google suggest and more)

Target #5 – YouTube Suggest

http://suggestqueries.google.com/complete/search?json&client=toolbar&ds=yt&q=%q

<?xml version="1.0"?><toplevel><CompleteSuggestion><suggestion data="milano bangkok"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan napoli 0 4"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan napoli"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan palermo"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan palermo 3 2"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan napoli 0 4 auriemma"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan napoli 0 4 crudeli"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan udinese 3 2"/></CompleteSuggestion></toplevel>
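
The Google Image and YouTube targets use the same endpoint and differ only in the `ds` value (`i` and `yt` in the slides). A small Python sketch that builds both URLs; `DATA_SOURCES` and `vertical_suggest_url` are illustrative names.

```python
from urllib.parse import quote_plus

BASE = "http://suggestqueries.google.com/complete/search"

# ds values taken from the two slide URLs.
DATA_SOURCES = {"images": "i", "youtube": "yt"}


def vertical_suggest_url(query, vertical):
    """Fill the slide's URL template for the chosen vertical."""
    ds = DATA_SOURCES[vertical]
    # quote_plus replaces the %q placeholder with the escaped keyword.
    return f"{BASE}?json&client=toolbar&ds={ds}&q={quote_plus(query)}"


print(vertical_suggest_url("milano", "images"))
print(vertical_suggest_url("milan napoli", "youtube"))
```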

Page 30: SEO scraping with Excel (Google suggest and more)

Target #6 – Wikipedia Suggest

http://it.wikipedia.org/w/api.php?action=opensearch&search=%q

["milano",["Milano","Milano-Sanremo","Milano 2","Milano-Torino","Milano-Sanremo 2012","Milano-Sanremo 2014","Milano-Sanremo 2013","Milano-Sanremo 2015","Milano-Sanremo 2011","Milano-Sanremo 2010"],["Milano ( pronuncia /mi\u02c8lano/, in lombardo Milan, pronunciato /mi\u02c8l\u00e3\u02d0/ nel dialetto locale) \u00e8 una citt\u00e0 italiana di 1 342 806 abitanti, capoluogo dell'omonima citt\u00e0 metropolitana e della regione Lombardia, secondo comune italiano per numero di abitanti, tredicesimo comune dell'Unione europea e diciannovesimo del continente e, con l'agglomerato urbano, terza area metropolitana pi\u00f9 popolata d'Europa dietro Londra e Parigi.","La Milano-Sanremo \u00e8 una corsa in linea maschile di ciclismo su strada professionistico, una delle pi\u00f9 importanti corse ciclistiche del relativo circuito internazionale e prima grande classica nel calendario ciclistico stagionale.","Milano 2 (o anche Milano Due, abbreviato MI2 e M2) \u00e8 un quartiere residenziale sito nel territorio del comune italiano di Segrate, nella citt\u00e0 metropolitana di Milano.","La Milano-Torino \u00e8 una corsa in linea maschile di ciclismo su strada, che si svolge tra Milano e Torino, in Italia, ogni anno nel mese di ottobre, ed \u00e8 una delle classiche d'autunno.","La Milano-Sanremo 2012, centotreesima edizione della corsa, si \u00e8 disputata il 17 marzo 2012, per un percorso totale di 298 km.","La Milano-Sanremo 2014, centocinquesima edizione della corsa, valida come quarta prova del circuito UCI World Tour 2014, si svolse il 23 marzo 2014 su un percorso di 294km, con partenza da Milano ed arrivo a Sanremo.","La Milano-Sanremo 2013, centoquattresima edizione della corsa, si \u00e8 disputata il 17 marzo 2013 su un percorso accorciato per motivi meteorologici da 298 km a 255 km.","La Milano-Sanremo 2015, centoseiesima edizione della corsa, valida come quarta prova del circuito UCI World Tour 2015, si \u00e8 svolta il 22 marzo 2015 su un percorso di 293 km, con partenza 
da Milano ed arrivo a Sanremo.","La Milano-Sanremo 2011, centoduesima edizione della corsa, si \u00e8 disputata il 19 marzo 2011, per un percorso totale di 298 km.","La Milano-Sanremo 2010, centunesima edizione della corsa, si \u00e8 disputata il 20 marzo 2010 e ha affrontato un percorso totale di 298 km."],["https://it.wikipedia.org/wiki/Milano","https://it.wikipedia.org/wiki/Milano-Sanremo","https://it.wikipedia.org/wiki/Milano_2","https://it.wikipedia.org/wiki/Milano-Torino","https://it.wikipedia.org/wiki/Milano-Sanremo_2012","https://it.wikipedia.org/wiki/Milano-Sanremo_2014","https://it.wikipedia.org/wiki/Milano-Sanremo_2013","https://it.wikipedia.org/wiki/Milano-Sanremo_2015","https://it.wikipedia.org/wiki/Milano-Sanremo_2011","https://it.wikipedia.org/wiki/Milano-Sanremo_2010"]]
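
The Wikipedia response has four parts: the keyword, the titles, the descriptions and the page URLs. A Python sketch that zips the last three into one record per suggestion. `SAMPLE` is a trimmed, hypothetical version of the response above, with the descriptions replaced by placeholders; `parse_opensearch` is an illustrative name.

```python
import json

# Trimmed sample: two entries only, descriptions replaced by placeholders.
SAMPLE = json.dumps([
    "milano",
    ["Milano", "Milano-Sanremo"],
    ["(description trimmed)", "(description trimmed)"],
    ["https://it.wikipedia.org/wiki/Milano",
     "https://it.wikipedia.org/wiki/Milano-Sanremo"],
])


def parse_opensearch(text):
    """Return one dict per suggestion with title, description and url."""
    query, titles, descriptions, urls = json.loads(text)
    return [
        {"title": title, "description": description, "url": url}
        for title, description, url in zip(titles, descriptions, urls)
    ]


for row in parse_opensearch(SAMPLE):
    print(row["title"], row["url"])
```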

Page 31: SEO scraping with Excel (Google suggest and more)

SCRAPING (GOOGLE) SERPs

Page 32: SEO scraping with Excel (Google suggest and more)

Target #2 – Google SERP

Page 33: SEO scraping with Excel (Google suggest and more)

Xpath Identification

Step 1

//h3[@class='r']/a

Page 34: SEO scraping with Excel (Google suggest and more)

Href attribute extraction

Step 2

=XPathOnUrl("https://www.google.it/search?q=%q&hl=it&&tbs=lr:lang_1it,qdr:a&prmd=ivns&num=10&source=lnt";"(//h3[@class='r']/a)[1]";"href")
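
To show what the XPath selects, here is a Python sketch that runs the same expression on a small, hypothetical fragment. `SAMPLE` is invented markup that is well-formed XML. A real results page is not, so it would need an HTML parser rather than `xml.etree`. `result_href` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment shaped like the markup the XPath targets.
SAMPLE = (
    '<div id="rso">'
    '<div><h3 class="r"><a href="http://example.com/first">First</a></h3></div>'
    '<div><h3 class="r"><a href="http://example.com/second">Second</a></h3></div>'
    '</div>'
)


def result_href(markup, position=1):
    """Return the href of the result at `position` (1-based, as in XPath)."""
    root = ET.fromstring(markup)
    links = root.findall(".//h3[@class='r']/a")
    return links[position - 1].get("href")


print(result_href(SAMPLE))
print(result_href(SAMPLE, 2))
```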

Page 35: SEO scraping with Excel (Google suggest and more)

Target #3 – Google Cache

Page 36: SEO scraping with Excel (Google suggest and more)

http://webcache.googleusercontent.com/search?hl=it&q=cache:http://www.miosito.it

Step 1

Page 37: SEO scraping with Excel (Google suggest and more)

=RegexpFindOnUrl("http://webcache.googleusercontent.com/search?hl=it&q=cache%3Ahttp://www.giuseppepastore.com";"cache di Google di(.*)</a>\.(\s)")

Step 2

cache di Google di <a href="http://www.giuseppepastore.com" dir="ltr">http://www.giuseppepastore.com</a>.

=RegexpFindOnUrl("http://webcache.googleusercontent.com/search?hl=it&q=cache%3Ahttp://www.giuseppepastore.com";" visualizzata il(.*)GMT ")

visualizzata il 16 nov 2015 14:29:53 GMT
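
The two extractions can be reproduced in Python on the text fragments shown above. `TEXT` is a hypothetical sample made by joining the two slide fragments with a "[...]" placeholder. The first pattern is a tightened variant of the slide's regex that captures only the `href` value; the second is the slide's pattern. `cached_url` and `cache_date` are illustrative names.

```python
import re

# Hypothetical sample: the two fragments from the slide, joined by "[...]".
TEXT = (
    'cache di Google di <a href="http://www.giuseppepastore.com" dir="ltr">'
    'http://www.giuseppepastore.com</a>. [...] '
    'visualizzata il 16 nov 2015 14:29:53 GMT '
)


def cached_url(text):
    """Return the URL the cache page refers to, or None if not found."""
    match = re.search(r'cache di Google di\s*<a href="([^"]+)"', text)
    return match.group(1) if match else None


def cache_date(text):
    """Return the date between 'visualizzata il' and 'GMT', or None."""
    match = re.search(r' visualizzata il(.*)GMT ', text)
    return match.group(1).strip() if match else None


print(cached_url(TEXT))
print(cache_date(TEXT))
```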

Page 38: SEO scraping with Excel (Google suggest and more)

Conclusions

Google Suggest

Bing Suggest

Google Image Suggest

YouTube Suggest

Amazon Suggest

Wikipedia Suggest

(What-Ever-You-Want Suggest – as long as you can query a URL)

Page 39: SEO scraping with Excel (Google suggest and more)

Conclusions

Google SERPs

Google Cache

(What-Ever-You-Want from any web page)

Page 40: SEO scraping with Excel (Google suggest and more)

Thank you!

Giuseppe Pastore

@Zen2SEO