How to get your data into Sindice and Google with sitemap4rdf

Post on 28-Jan-2015


Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

How to get your data into Sindice and Google with sitemap4rdf

Boris Villazón-Terrazas (OEG), Richard Cyganiak (DERI)

Publishing Linked Data from a triple store

Linked Data frontends for triple stores

Source: Pubby website, http://www4.wiwiss.fu-berlin.de/pubby/

Search engines

Sindice: the best RDF search engine

- 120M+ documents
- Continuously updating since 2006
- Low-latency search API
- RDF/XML, Turtle, RDFa, microformats

The Sitemap protocol

Sitemap Protocol

- Used by web crawlers
- Efficiently find all your content and discover what has been updated

http://sitemaps.org/

Sitemap Protocol: Simple example

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://yoursite/</loc>
  </url>
  <url>
    <loc>http://yoursite/products/53546</loc>
  </url>
  <url>
    <loc>http://yoursite/products/98421</loc>
  </url>
  <url>
    <loc>http://yoursite/products/41003</loc>
  </url>
</urlset>

Sitemap Protocol: Optional parts

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://yoursite/</loc>
    <lastmod>2010-06-24</lastmod>
    <changefreq>daily</changefreq>
  </url>
</urlset>
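As a rough illustration of how such a file could be produced programmatically, here is a minimal Python sketch (Python 3.8+). The function name `build_sitemap` and the dict-based input are assumptions made for this example; they are not part of the Sitemap protocol or of sitemap4rdf.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as bytes (hypothetical helper for illustration).

    Each entry is a dict with a required "loc" key and optional
    "lastmod" and "changefreq" keys.
    """
    # Emit the sitemap namespace as the default xmlns, without a prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # Only write the optional elements that the entry actually provides.
        for tag in ("loc", "lastmod", "changefreq"):
            if tag in entry:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = entry[tag]
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_sitemap([
        {"loc": "http://yoursite/", "lastmod": "2010-06-24", "changefreq": "daily"},
        {"loc": "http://yoursite/products/53546"},
    ]).decode("utf-8"))
```

Using an XML library rather than string concatenation means characters such as `&` in URLs are escaped automatically.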

Sitemap Protocol: Huge sitemaps

- Gzip-compress your sitemap
- Limit: 50k URLs or 10MB
  - split into multiple sitemap files
  - add a sitemap index file
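The splitting step above can be sketched roughly as follows. This is an illustrative Python example, not what sitemap4rdf does internally: the file names (`sitemap1.xml.gz`, `sitemap_index.xml`) and the function `split_sitemap` are assumptions, and the sketch only enforces the URL-count limit, not the file-size limit.

```python
import gzip
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>'
MAX_URLS = 50_000  # per-file URL limit from the slide


def split_sitemap(urls, base_url, max_urls=MAX_URLS):
    """Return {filename: bytes}: gzip-compressed sitemap parts plus an index.

    File names are hypothetical; base_url is where the parts will be published.
    """
    files = {}
    index_entries = []
    for part, start in enumerate(range(0, len(urls), max_urls), start=1):
        chunk = urls[start:start + max_urls]
        body = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in chunk)
        xml = f'{XML_DECL}<urlset xmlns="{SITEMAP_NS}">{body}</urlset>'
        name = f"sitemap{part}.xml.gz"
        files[name] = gzip.compress(xml.encode("utf-8"))
        index_entries.append(
            f"<sitemap><loc>{escape(base_url + name)}</loc></sitemap>"
        )
    # The index file lists the location of every sitemap part.
    index_body = "".join(index_entries)
    index = f'{XML_DECL}<sitemapindex xmlns="{SITEMAP_NS}">{index_body}</sitemapindex>'
    files["sitemap_index.xml"] = index.encode("utf-8")
    return files
```

The location of the index file is then what gets advertised to crawlers, in place of a single sitemap file.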

Sitemap Protocol: Discovery

- Publish the sitemap file
- Add a line to http://yoursite/robots.txt

Sitemap: http://yoursite/sitemap.xml
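To check that the robots.txt line is picked up the way a crawler would read it, a short Python sketch using the standard library's `urllib.robotparser` (Python 3.8+ for `site_maps()`) may help. The helper name `sitemaps_from_robots` is an assumption for this example, and the robots.txt content is passed in as a string so that no network access is needed.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs declared in a robots.txt document (as text)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    example = "User-agent: *\nDisallow:\nSitemap: http://yoursite/sitemap.xml\n"
    print(sitemaps_from_robots(example))
```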

sitemap4rdf: Generate Sitemap files from a SPARQL endpoint

sitemap4rdf

- Simple command line tool
- Sends a SPARQL query to list all URIs
- Generates sitemap

sitemap4rdf http://yoursite/sparql http://yoursite/resource/
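The slides do not show the query that sitemap4rdf sends, so the following Python sketch is only a guess at the general idea: list the distinct subject URIs under a prefix and read them from a SPARQL JSON result. The query text, the function names, and the use of `STRSTARTS` (a SPARQL 1.1 function) are assumptions for illustration, not the tool's actual implementation.

```python
import json
from urllib.parse import urlencode


def list_uris_query(uri_prefix):
    """Build a SPARQL query listing subject URIs that start with uri_prefix.

    Assumes uri_prefix contains no double quotes or backslashes.
    """
    return (
        "SELECT DISTINCT ?s WHERE { ?s ?p ?o . "
        f'FILTER(isIRI(?s) && STRSTARTS(STR(?s), "{uri_prefix}")) }}'
    )


def request_url(endpoint, uri_prefix):
    """Return the GET URL that sends the query to the endpoint."""
    return endpoint + "?" + urlencode({"query": list_uris_query(uri_prefix)})


def uris_from_results(results_json):
    """Extract URI values of ?s from a SPARQL JSON results document."""
    data = json.loads(results_json)
    return [
        binding["s"]["value"]
        for binding in data["results"]["bindings"]
        if binding["s"]["type"] == "uri"
    ]
```

The returned URIs would then become the `<loc>` entries of the generated sitemap. Fetching `request_url(...)` typically also requires asking the endpoint for JSON results, for example through an `Accept: application/sparql-results+json` header.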

Submit the sitemap location - Sindice

http://sindice.com/main/submit

Submit the sitemap location - Google

https://www.google.com/webmasters/tools/

Summary

- Sitemap protocol informs search engines about available pages
  - Supported by Sindice!
- sitemap4rdf generates Sitemap files by listing URIs in a SPARQL endpoint
  - Open source, Java
  - http://lab.linkeddata.deri.ie/2010/sitemap4rdf/