SEO scraping with Excel (Google suggest and more)
#SMConnect @Zen2SEO
Search Marketing Connect - 20-21 November 2015
SEO Scraping with Excel: from an “infinite” Google Suggest to SERP extractions for several goals, at no cost and with no programming skills needed
salsa dancing + travel + crime novels + lots of fun = Giuseppe Pastore (unconventional SEO manager)
Say hello!
@Zen2SEO
Web Scraping - What
Web scraping = extracting information from websites by using software that simulates human browsing
Web Scraping - Why
Price comparison, contact scraping, weather data monitoring, website change detection, research, web mashups and web data integration.
Web Scraping - How
Lots of techniques... that need coding.
I can’t code, but I like Excel.
Excel
SEO Tools for Excel
RegEx
XPath
http://seotoolsforexcel.com
Regular Expression (regex or regexp) = a sequence of characters that defines a search pattern, mainly for use in pattern matching with strings
http://goo.gl/pqtNE0
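As a rough illustration of what a search pattern does, here is a minimal Python sketch. Python is not part of the talk; it is used only to make the idea checkable outside Excel. The sample string is a shortened copy of the Google Suggest response shown later in the deck, and the pattern is an assumption chosen for this example.

```python
import re

# Shortened sample in the format of the Google Suggest response (not fetched live).
xml = (
    '<?xml version="1.0"?><toplevel>'
    '<CompleteSuggestion><suggestion data="milan"/></CompleteSuggestion>'
    '<CompleteSuggestion><suggestion data="milan news"/></CompleteSuggestion>'
    '</toplevel>'
)

# The group ([^"]*) captures whatever sits between data=" and the next quote.
suggestions = re.findall(r'<suggestion data="([^"]*)"/>', xml)
print(suggestions)
```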
XPath = a query language for selecting nodes from an XML document
//*[@id="rso"]/div/div/h3/a
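Node selection can be tried on the XML that Google Suggest returns. The sketch below uses the limited XPath support in Python's standard library on a shortened sample response, not on a live page. The expression `.//suggestion` is chosen for this example and is not the one on the slide.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the format of the Google Suggest response (not fetched live).
xml = (
    '<?xml version="1.0"?><toplevel>'
    '<CompleteSuggestion><suggestion data="milan"/></CompleteSuggestion>'
    '<CompleteSuggestion><suggestion data="milano meteo"/></CompleteSuggestion>'
    '</toplevel>'
)

root = ET.fromstring(xml)

# ".//suggestion" selects every <suggestion> node at any depth below the root;
# the text we want is stored in each node's "data" attribute.
data = [node.get("data") for node in root.findall(".//suggestion")]
print(data)
```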
SCRAPING (EVERY!!!) SUGGEST
Google Suggest API to be discontinued
http://googlewebmastercentral.blogspot.it/2015/07/update-on-autocomplete-api.html
UberSuggest (takes data from Bing)
http://ubersuggest.org
Keyword Tool
http://keywordtool.io
Target #1 – Google Suggest
Step 1
http://suggestqueries.google.com/complete/search?output=toolbar&hl=it&q=milan
Step 2
=DownloadString("http://suggestqueries.google.com/complete/search?output=toolbar&hl=it&q="&A2)
<?xml version="1.0"?><toplevel><CompleteSuggestion><suggestion data="milan"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan news"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano finanza"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano meteo"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano marittima"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milanotoday"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano expo"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano malpensa"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milanuncios"/></CompleteSuggestion></toplevel>
Downloading the entire page source
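The Excel formula concatenates the keyword straight into the URL. A hedged Python equivalent, which only builds the URL and downloads nothing, might look like the sketch below. `suggest_url` is a hypothetical helper name, and unlike the Excel formula it URL-encodes keywords that contain spaces.

```python
from urllib.parse import urlencode

# Endpoint taken from the slide; it may no longer answer requests.
BASE = "http://suggestqueries.google.com/complete/search"

def suggest_url(keyword, lang="it"):
    """Build the suggest URL for a keyword (cell A2) and a language (cell B2)."""
    params = {"output": "toolbar", "hl": lang, "q": keyword}
    return BASE + "?" + urlencode(params)

print(suggest_url("milan"))
print(suggest_url("milano meteo"))
```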
Step 3
=RegexpReplace(DownloadString("http://suggestqueries.google.com/complete/search?output=toolbar&hl="&B2&"&q="&A2);"((.*)toplevel>)?<CompleteSuggestion><suggestion(\s)data=";"")
Before: <?xml version="1.0"?><toplevel><CompleteSuggestion><suggestion data="milan"/></CompleteSuggestion>...</toplevel>
After: "milan"/></CompleteSuggestion>...</toplevel>
Deleting the opening tags
Step 4
=RegexpReplace(A11;"/></CompleteSuggestion>(</toplevel>)?";",")
Before: "milan"/></CompleteSuggestion>"milan news"/></CompleteSuggestion>...</toplevel>
After: "milan","milan news","milano finanza","milan","milano","milano meteo","milano marittima","milano expo","milano malpensa","milanotoday","milan store",
Deleting the closing tags
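Steps 3 and 4 can be replayed outside Excel to see what each replacement leaves behind. The sketch below reuses the two patterns from the slides with Python's `re.sub` on a shortened sample response. It assumes Python's regex engine behaves like the one in SEO Tools for Excel on these patterns, which has not been verified against the plugin.

```python
import re

# Shortened sample in the format of the Google Suggest response (not fetched live).
xml = (
    '<?xml version="1.0"?><toplevel>'
    '<CompleteSuggestion><suggestion data="milan"/></CompleteSuggestion>'
    '<CompleteSuggestion><suggestion data="milan news"/></CompleteSuggestion>'
    '<CompleteSuggestion><suggestion data="milano finanza"/></CompleteSuggestion>'
    '</toplevel>'
)

# Step 3: remove the XML header up to <toplevel> and every opening tag.
step3 = re.sub(r'((.*)toplevel>)?<CompleteSuggestion><suggestion(\s)data=', '', xml)

# Step 4: replace every closing sequence (and the final </toplevel>) with a comma.
step4 = re.sub(r'/></CompleteSuggestion>(</toplevel>)?', ',', step3)

print(step3)
print(step4)
```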
Step 5
=SINISTRA(A14;TROVA(",";A14;1))
Before: "milan","milan news","milano finanza","milan","milano","milano meteo","milano marittima","milano expo","milano malpensa","milanotoday","milan store",
After: "milan",
Finding the first comma and isolating everything to its left (SINISTRA and TROVA are LEFT and FIND in English-language Excel)
Step 6
=RegexpReplace(SINISTRA(A17;TROVA(",";A17;1));""",?";"")
Before: "milan",
After: milan
Removing the quotes and the comma: the first result is now isolated
Step 7
=DESTRA(A14;LUNGHEZZA(A14)-TROVA(",";A14;1))
Before (143 characters): "milan","milan news","milano finanza","milan","milano","milano meteo","milano marittima","milano expo","milano malpensa","milanotoday","milan store",
First term (8 characters): "milan",
After (135 characters): "milan news","milano finanza","milan","milano","milano meteo","milano marittima","milano expo","milano malpensa","milanotoday","milan store",
From the 10-result string, isolating the part to the right of the first term (DESTRA and LUNGHEZZA are RIGHT and LEN in English-language Excel)
Iterating 5-6-7
"milan news","milano finanza","milan","milano","milano meteo","milano marittima","milano expo","milano malpensa","milanotoday","milan store",
=SINISTRA(A14;TROVA(",";A14;1))
=RegexpReplace(SINISTRA(A17;TROVA(",";A17;1));""",?";"")
=DESTRA(A14;LUNGHEZZA(A14)-TROVA(",";A14;1))
milan
milan news
milano
milano finanza
milano meteo
milano marittima
milano expo
milano malpensa
milanotoday
milan store
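The loop over steps 5, 6 and 7 can be sketched in Python to check the logic: take everything up to the first comma, strip the quotes, keep the remainder, repeat. `split_suggestions` is a hypothetical name, and the input is a shortened version of the string produced in step 4.

```python
import re

def split_suggestions(text):
    """Repeat steps 5-7 until no comma is left in the remaining string."""
    results = []
    rest = text
    while "," in rest:
        cut = rest.find(",") + 1                   # position just past the first comma
        head = rest[:cut]                          # step 5: SINISTRA / LEFT
        results.append(re.sub(r'",?', "", head))   # step 6: drop quotes and comma
        rest = rest[cut:]                          # step 7: DESTRA / RIGHT
    return results

sample = '"milan","milan news","milano finanza",'
print(split_suggestions(sample))
```

In Excel the same loop is obtained by copying the three formulas across or down, each group reading the remainder left by the previous one.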
Iterating 5-6-7
=RegexpReplace(RegexpReplace(DownloadString("http://suggestqueries.google.com/complete/search?output=toolbar&tbm=&hl="&B2&"&lang_"&B2&"&q="&A2);"((.*)toplevel>)?<(/?Complete)?suggestion((\s)data=)?>?(</toplevel>)?";"");"/>";",")
Target #2 – Bing Suggest
Step 1
http://api.bing.com/osjson.aspx?query=milan
12 results, based on IP
["milan",["milan news","milano finanza","milan","milano today","milano","milan live","milanotoday","milannews.it","milannews","milanofinanza.it","milano meteo","milan calciomercato"]]
https://hide.me/en/proxy
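The Bing response is a JSON array rather than XML: the first element is the query and the second is the list of suggestions. A minimal Python sketch of reading that shape follows, using a shortened copy of the response from the slide and no network call.

```python
import json

# Shortened copy of the response shown on the slide (not fetched live).
raw = '["milan",["milan news","milano finanza","milan","milano today"]]'

# The outer array holds the query first, then the list of suggestions.
query, suggestions = json.loads(raw)

print(query)
for suggestion in suggestions:
    print(suggestion)
```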
Target #3 – Amazon Suggest
Step 1
http://completion.amazon.com/search/complete?method=completion&q=%q&search-alias=aps&mkt=1
http://completion.amazon.co.uk/search/complete?method=completion&q=%q&search-alias=aps&mkt=4
http://completion.amazon.co.jp/search/complete?method=completion&q=%q&search-alias=aps&mkt=6
Aps = All Product Selection (?)
["milano",["milani","milano cookies","milano bride","milano knife","kiko milano","milano moda","milano lego","giorgio milano","milano poker chips","milanos"],[{"sc":"1","nodes":[{"name":"Beauty","alias":"beauty"},{"name":"Health & Personal Care","alias":"hpc"}]},{},{},{},{},{},{},{},{},{}],[]]
Target #4 – Google Image Suggest
http://suggestqueries.google.com/complete/search?json&client=toolbar&ds=i&q=%q
<?xml version="1.0"?><toplevel><CompleteSuggestion><suggestion data="milano"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan napoli"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano expo"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano metro"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano skyline"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano marittima"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano metropolitana"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano navigli"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan news"/></CompleteSuggestion></toplevel>
Target #5 – YouTube Suggest
http://suggestqueries.google.com/complete/search?json&client=toolbar&ds=yt&q=%q
<?xml version="1.0"?><toplevel><CompleteSuggestion><suggestion data="milano bangkok"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan napoli 0 4"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan napoli"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan palermo"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan palermo 3 2"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milano"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan napoli 0 4 auriemma"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan napoli 0 4 crudeli"/></CompleteSuggestion><CompleteSuggestion><suggestion data="milan udinese 3 2"/></CompleteSuggestion></toplevel>
Target #6 – Wikipedia Suggest
http://it.wikipedia.org/w/api.php?action=opensearch&search=%q
["milano",["Milano","Milano-Sanremo","Milano 2","Milano-Torino","Milano-Sanremo 2012","Milano-Sanremo 2014","Milano-Sanremo 2013","Milano-Sanremo 2015","Milano-Sanremo 2011","Milano-Sanremo 2010"],["Milano ( pronuncia /mi\u02c8lano/, in lombardo Milan, pronunciato /mi\u02c8l\u00e3\u02d0/ nel dialetto locale) \u00e8 una citt\u00e0 italiana di 1 342 806 abitanti, capoluogo dell'omonima citt\u00e0 metropolitana e della regione Lombardia, secondo comune italiano per numero di abitanti, tredicesimo comune dell'Unione europea e diciannovesimo del continente e, con l'agglomerato urbano, terza area metropolitana pi\u00f9 popolata d'Europa dietro Londra e Parigi.","La Milano-Sanremo \u00e8 una corsa in linea maschile di ciclismo su strada professionistico, una delle pi\u00f9 importanti corse ciclistiche del relativo circuito internazionale e prima grande classica nel calendario ciclistico stagionale.","Milano 2 (o anche Milano Due, abbreviato MI2 e M2) \u00e8 un quartiere residenziale sito nel territorio del comune italiano di Segrate, nella citt\u00e0 metropolitana di Milano.","La Milano-Torino \u00e8 una corsa in linea maschile di ciclismo su strada, che si svolge tra Milano e Torino, in Italia, ogni anno nel mese di ottobre, ed \u00e8 una delle classiche d'autunno.","La Milano-Sanremo 2012, centotreesima edizione della corsa, si \u00e8 disputata il 17 marzo 2012, per un percorso totale di 298 km.","La Milano-Sanremo 2014, centocinquesima edizione della corsa, valida come quarta prova del circuito UCI World Tour 2014, si svolse il 23 marzo 2014 su un percorso di 294km, con partenza da Milano ed arrivo a Sanremo.","La Milano-Sanremo 2013, centoquattresima edizione della corsa, si \u00e8 disputata il 17 marzo 2013 su un percorso accorciato per motivi meteorologici da 298 km a 255 km.","La Milano-Sanremo 2015, centoseiesima edizione della corsa, valida come quarta prova del circuito UCI World Tour 2015, si \u00e8 svolta il 22 marzo 2015 su un percorso di 293 km, con partenza 
da Milano ed arrivo a Sanremo.","La Milano-Sanremo 2011, centoduesima edizione della corsa, si \u00e8 disputata il 19 marzo 2011, per un percorso totale di 298 km.","La Milano-Sanremo 2010, centunesima edizione della corsa, si \u00e8 disputata il 20 marzo 2010 e ha affrontato un percorso totale di 298 km."],["https://it.wikipedia.org/wiki/Milano","https://it.wikipedia.org/wiki/Milano-Sanremo","https://it.wikipedia.org/wiki/Milano_2","https://it.wikipedia.org/wiki/Milano-Torino","https://it.wikipedia.org/wiki/Milano-Sanremo_2012","https://it.wikipedia.org/wiki/Milano-Sanremo_2014","https://it.wikipedia.org/wiki/Milano-Sanremo_2013","https://it.wikipedia.org/wiki/Milano-Sanremo_2015","https://it.wikipedia.org/wiki/Milano-Sanremo_2011","https://it.wikipedia.org/wiki/Milano-Sanremo_2010"]]
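The Wikipedia response carries four parallel parts: the query, the titles, the descriptions and the page URLs. A hedged Python sketch of pairing titles with URLs follows. The sample keeps only the first two entries from the slide, and the descriptions are deliberately cut short with "..." for this example.

```python
import json

# First two entries from the slide; descriptions shortened for the example.
raw = (
    '["milano",'
    '["Milano","Milano-Sanremo"],'
    '["Milano ...","La Milano-Sanremo ..."],'
    '["https://it.wikipedia.org/wiki/Milano",'
    '"https://it.wikipedia.org/wiki/Milano-Sanremo"]]'
)

query, titles, descriptions, urls = json.loads(raw)

# The lists are parallel, so zip pairs each title with its URL.
pages = dict(zip(titles, urls))
print(pages)
```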
SCRAPING (GOOGLE) SERPs
Target #2 – Google SERP
Step 1
//h3[@class='r']/a
XPath identification
Step 2
=XPathOnUrl("https://www.google.it/search?q=%q&hl=it&&tbs=lr:lang_1it,qdr:a&prmd=ivns&num=10&source=lnt";"(//h3[@class='r']/a)[1]";"href")
Href attribute extraction
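To see what the XPath selects, the sketch below runs the same expression on a small, hypothetical HTML fragment modeled on the `h3.r` markup named on the slide. It is not a real Google page: real result pages are rarely well-formed XML, so this standard-library approach is only an illustration, and the example.com links are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment imitating the result markup from the slide.
html = (
    '<div id="rso">'
    '<div><h3 class="r"><a href="http://example.com/one">One</a></h3></div>'
    '<div><h3 class="r"><a href="http://example.com/two">Two</a></h3></div>'
    '</div>'
)

root = ET.fromstring(html)

# Same selection as //h3[@class='r']/a, then read the href attribute of each link.
links = [a.get("href") for a in root.findall(".//h3[@class='r']/a")]

# Index [1] in XPath is the first match, which is links[0] in Python.
first = links[0]
print(first)
```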
Target #3 – Google Cache
Step 1
http://webcache.googleusercontent.com/search?hl=it&q=cache:http://www.miosito.it
Step 2
=RegexpFindOnUrl("http://webcache.googleusercontent.com/search?hl=it&q=cache%3Ahttp://www.giuseppepastore.com";"cache di Google di(.*)</a>\.(\s)")
cache di Google di <a href="http://www.giuseppepastore.com" dir="ltr">http://www.giuseppepastore.com</a>.
=RegexpFindOnUrl("http://webcache.googleusercontent.com/search?hl=it&q=cache%3Ahttp://www.giuseppepastore.com";" visualizzata il(.*)GMT ")
visualizzata il 16 nov 2015 14:29:53 GMT
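The second pattern can be checked in Python against the text it is meant to match. The sample string below is assembled from the output shown on the slide and is not a fetched cache page; the surrounding word "pagina" is an assumption added only so the leading space in the pattern has something to match.

```python
import re

# Hypothetical snippet built around the output shown on the slide.
snippet = "pagina visualizzata il 16 nov 2015 14:29:53 GMT "

# Same pattern as the Excel formula: capture what sits between the two markers.
match = re.search(r" visualizzata il(.*)GMT ", snippet)
cached_on = match.group(1).strip() if match else None

print(cached_on)
```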
Conclusions
Google Suggest
Bing Suggest
Google Image Suggest
YouTube Suggest
Amazon Suggest
Wikipedia Suggest
(What-Ever-You-Want Suggest, as long as you can query a URL)
Conclusions
Google SERPs
Google Cache
(What-Ever-You-Want from any web page)
Thank you!
Giuseppe Pastore
@Zen2SEO