Post on 28-Jan-2015
Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
How to get your data into Sindice and Google with sitemap4rdf
Boris Villazón-Terrazas (OEG), Richard Cyganiak (DERI)
Publishing Linked Data
from a triple store
Linked Data frontends for triple stores
Source: Pubby website, http://www4.wiwiss.fu-berlin.de/pubby/
Search engines
Sindice: the best RDF search engine
120M+ documents
Continuously updating since 2006
Low-latency search API
RDF/XML, Turtle, RDFa, microformats
The Sitemap protocol
Sitemap Protocol
Used by web crawlers
Efficiently find all your content and discover what has been updated
http://sitemaps.org/
Sitemap Protocol: Simple example
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://yoursite/</loc>
  </url>
  <url>
    <loc>http://yoursite/products/53546</loc>
  </url>
  <url>
    <loc>http://yoursite/products/98421</loc>
  </url>
  <url>
    <loc>http://yoursite/products/41003</loc>
  </url>
</urlset>
Sitemap Protocol: Optional parts
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://yoursite/</loc>
    <lastmod>2010-06-24</lastmod>
    <changefreq>daily</changefreq>
  </url>
</urlset>
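
For sites that generate their pages from data, the same file can be produced programmatically. The following is a minimal sketch in Python using only the standard library; the function name build_sitemap and the dictionary-based entry format are assumptions made for this illustration, not part of the Sitemap protocol or of sitemap4rdf.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as bytes for a list of entry dicts.

    Each entry needs a "loc" key; "lastmod" and "changefreq" are optional
    and are only written when present (hypothetical input format).
    """
    # Serialize the sitemap namespace as the default xmlns, as in the slides.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        for tag in ("loc", "lastmod", "changefreq"):
            if tag in entry:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = entry[tag]
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_sitemap([
        {"loc": "http://yoursite/", "lastmod": "2010-06-24", "changefreq": "daily"},
        {"loc": "http://yoursite/products/53546"},
    ])
    print(xml_bytes.decode("utf-8"))
```

Writing the optional elements only when a value is known keeps the file honest: a crawler is better served by a missing lastmod than by a guessed one.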
Sitemap Protocol: Huge sitemaps
Gzip-compress your sitemap
Limit: 50k URLs or 10MB
  split into multiple sitemap files
  add a sitemap index file
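
The splitting step can be sketched as follows, assuming Python and its standard library. The file names (sitemap1.xml.gz, sitemap_index.xml), the helper names, and the base URL are hypothetical choices for this illustration. The sketch enforces only the URL-count limit; a real generator would also need to check the size limit.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit quoted on the slide


def _locs_xml(root_tag, item_tag, locations):
    """Serialize <root_tag><item_tag><loc>...</loc></item_tag>...</root_tag>."""
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}{root_tag}")
    for location in locations:
        item = ET.SubElement(root, f"{{{SITEMAP_NS}}}{item_tag}")
        ET.SubElement(item, f"{{{SITEMAP_NS}}}loc").text = location
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def build_split_sitemaps(urls, base_url, size=MAX_URLS):
    """Return {filename: bytes}: gzipped sitemap parts plus an index file.

    base_url is where the files will be published (assumed to end in "/").
    """
    files = {}
    part_locations = []
    for number, start in enumerate(range(0, len(urls), size), start=1):
        name = f"sitemap{number}.xml.gz"
        part = urls[start:start + size]
        files[name] = gzip.compress(_locs_xml("urlset", "url", part))
        part_locations.append(base_url + name)
    # The index lists the location of every part file.
    files["sitemap_index.xml"] = _locs_xml("sitemapindex", "sitemap", part_locations)
    return files


if __name__ == "__main__":
    demo_urls = [f"http://yoursite/products/{n}" for n in range(5)]
    for name, body in build_split_sitemaps(demo_urls, "http://yoursite/", size=2).items():
        print(name, len(body), "bytes")
```

With this layout, the index file is the one whose location gets advertised to crawlers; the part files are found through it.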
Sitemap Protocol: Discovery
Publish the sitemap file
Add a line to http://yoursite/robots.txt
Sitemap: http://yoursite/sitemap.xml
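
To illustrate how a crawler might pick up that line, here is a small Python sketch that extracts every Sitemap location from the text of a robots.txt file. The function name sitemap_locations is an assumption for this example; real crawlers may parse robots.txt more leniently or more strictly.

```python
def sitemap_locations(robots_txt):
    """Return the URLs from every 'Sitemap:' line of a robots.txt body."""
    locations = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Split at the first colon only, so the URL's own "://" survives.
        field, separator, value = line.partition(":")
        if separator and field.strip().lower() == "sitemap":
            locations.append(value.strip())
    return locations


if __name__ == "__main__":
    example = (
        "User-agent: *\n"
        "Disallow: /private/\n"
        "Sitemap: http://yoursite/sitemap.xml\n"
    )
    print(sitemap_locations(example))
```

The field name is matched case-insensitively here, so "sitemap:" and "Sitemap:" are treated alike.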
sitemap4rdf
Generate Sitemap files from a SPARQL endpoint
sitemap4rdf
Simple command line tool
Sends a SPARQL query to list all URIs
Generates sitemap
sitemap4rdf http://yoursite/sparql http://yoursite/resource/
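
The general idea of listing URIs through a SPARQL endpoint can be sketched in Python as below. This is not sitemap4rdf's actual query or code, which the slides do not show. The query shape, the function names, and the reading of the second command-line argument as a URI prefix are all assumptions made for illustration. The sketch uses the standard SPARQL protocol (a GET request with a query parameter) and the SPARQL JSON results format.

```python
import json
import urllib.parse
import urllib.request


def build_query(prefix):
    """Build a query selecting distinct subject IRIs that start with prefix.

    Hypothetical query; STRSTARTS requires a SPARQL 1.1 endpoint.
    """
    return (
        "SELECT DISTINCT ?s WHERE { ?s ?p ?o . "
        f'FILTER(isIRI(?s) && STRSTARTS(STR(?s), "{prefix}")) }}'
    )


def run_query(endpoint, query):
    """Send the query with HTTP GET and return the raw JSON response text."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def uris_from_results(results_json):
    """Extract the ?s values from a SPARQL JSON results document."""
    data = json.loads(results_json)
    return [
        binding["s"]["value"]
        for binding in data["results"]["bindings"]
        if binding["s"]["type"] == "uri"
    ]


if __name__ == "__main__":
    # Prints the query only; no request is sent to the placeholder endpoint.
    print(build_query("http://yoursite/resource/"))
```

Each URI returned this way would become one loc entry in the generated sitemap. On a large store, a single query over all triples can be slow, so a real tool would likely page through results.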
Submit the sitemap location - Sindice
http://sindice.com/main/submit
Submit the sitemap location - Google
https://www.google.com/webmasters/tools/
Summary
Sitemap protocol informs search engines about available pages
Supported by Sindice!
sitemap4rdf generates Sitemap files by listing URIs in a SPARQL endpoint
Open source, Java
http://lab.linkeddata.deri.ie/2010/sitemap4rdf/