Develop open source search engine
![Page 1: Develop open source search engine](https://reader035.fdocuments.us/reader035/viewer/2022062319/55560407d8b42a3f168b484d/html5/thumbnails/1.jpg)
DEVELOP OPEN SOURCE SEARCH ENGINE
Ritesh Ambastha, CEO, iWillStudy.com
26th Feb 2012
![Page 2: Develop open source search engine](https://reader035.fdocuments.us/reader035/viewer/2022062319/55560407d8b42a3f168b484d/html5/thumbnails/2.jpg)
Open Source Search Engines
Sphinx Lucene DataparkSearch
Zettair YaCy Xapian
SWISH-E Seeks Recoll
OpenFTS Nutch Namazu
![Page 6: Develop open source search engine](https://reader035.fdocuments.us/reader035/viewer/2022062319/55560407d8b42a3f168b484d/html5/thumbnails/6.jpg)
We are going to talk about
Sphinx & Apache-Solr
![Page 7: Develop open source search engine](https://reader035.fdocuments.us/reader035/viewer/2022062319/55560407d8b42a3f168b484d/html5/thumbnails/7.jpg)
Sphinx
Sphinx is an open source full text search server.
It's written in C++ and works on Linux (RedHat, Ubuntu, etc.), Windows, MacOS, Solaris, FreeBSD, and a few other systems.
Sphinx lets you batch index and search data stored in an SQL database, NoSQL storage, or plain files, quickly and easily.
![Page 8: Develop open source search engine](https://reader035.fdocuments.us/reader035/viewer/2022062319/55560407d8b42a3f168b484d/html5/thumbnails/8.jpg)
Sphinx
Text processing features
Searching via SphinxAPI is as simple as 3 lines of code, and querying via SphinxQL is even simpler.
Sphinx clusters scale up to billions of documents and tens of millions of search queries per day, powering top websites such as Craigslist, DailyMotion, NetLog, etc.
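To give a feel for how short a SphinxQL query is, here is a minimal sketch in Python that only builds the query text. The index name `test1` and the helper `sphinxql_match_query` are hypothetical, and the escaping is deliberately simplified (it protects the string literal only, not Sphinx's own full-text operators).

```python
def sphinxql_match_query(index: str, text: str, limit: int = 10) -> str:
    """Build a SphinxQL full-text query string (illustrative helper, not part of Sphinx)."""
    # Escape backslashes first, then single quotes, so the search text
    # stays inside the MATCH('...') string literal.
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    # The index name is inserted as-is; a real application should validate it.
    return f"SELECT * FROM {index} WHERE MATCH('{escaped}') LIMIT {int(limit)}"


# "test1" is an assumed index name used only for illustration.
query = sphinxql_match_query("test1", "open source search")
print(query)
```

Because SphinxQL is spoken over the MySQL wire protocol, a string like this would typically be sent to the Sphinx search daemon with an ordinary MySQL client library; which client and which port to use depends on the local setup.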
![Page 9: Develop open source search engine](https://reader035.fdocuments.us/reader035/viewer/2022062319/55560407d8b42a3f168b484d/html5/thumbnails/9.jpg)
Performance and scalability
Indexing performance: Sphinx indexes up to 10-15 MB of text per second per single CPU core.
Searching performance: On the 1,000,000-document, 1.2 GB text collection that the Sphinx developers use for everyday development and testing, searching runs at 500+ queries/sec on a 2-core desktop machine with 2 GB of RAM.
Scalability: The biggest known Sphinx cluster indexes almost 5 billion documents, resulting in over 6 TB of data.
The busiest known one is, unsurprisingly, Craigslist, a top-10 website in the US that serves 50+ million search queries/day.
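The figures above can be put in perspective with some back-of-the-envelope arithmetic. This is a rough sketch: it assumes 1 GB = 1024 MB, a single CPU core for indexing, and queries spread evenly over the day (real traffic will have peaks well above the average).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Craigslist: 50+ million queries per day, expressed as an average rate.
queries_per_day = 50_000_000
avg_qps = queries_per_day / SECONDS_PER_DAY

# Time to index the 1.2 GB test collection on one core at 10-15 MB/s.
collection_mb = 1.2 * 1024          # assumption: 1 GB = 1024 MB
slow_seconds = collection_mb / 10   # at 10 MB/s
fast_seconds = collection_mb / 15   # at 15 MB/s

print(f"Average load: about {avg_qps:.0f} queries/sec")
print(f"Indexing 1.2 GB: about {fast_seconds:.0f} to {slow_seconds:.0f} seconds on one core")
```

So 50 million queries a day averages out to a little under 600 queries per second, roughly the rate quoted for a single 2-core desktop on the 1.2 GB test collection, and that test collection could be indexed in about a minute and a half to two minutes at the quoted indexing speed.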
![Page 10: Develop open source search engine](https://reader035.fdocuments.us/reader035/viewer/2022062319/55560407d8b42a3f168b484d/html5/thumbnails/10.jpg)
Key Features
Batch and Real-Time full-text indexes
Non-text attributes support
SQL database indexing
Non-SQL storage indexing
Easy application integration
Advanced full-text searching syntax
Rich database-like querying features
Better relevance ranking
Flexible text processing
Distributed searching
![Page 11: Develop open source search engine](https://reader035.fdocuments.us/reader035/viewer/2022062319/55560407d8b42a3f168b484d/html5/thumbnails/11.jpg)
http://lucene.apache.org/solr/
![Page 12: Develop open source search engine](https://reader035.fdocuments.us/reader035/viewer/2022062319/55560407d8b42a3f168b484d/html5/thumbnails/12.jpg)
Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project.
![Page 13: Develop open source search engine](https://reader035.fdocuments.us/reader035/viewer/2022062319/55560407d8b42a3f168b484d/html5/thumbnails/13.jpg)
Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search.
![Page 14: Develop open source search engine](https://reader035.fdocuments.us/reader035/viewer/2022062319/55560407d8b42a3f168b484d/html5/thumbnails/14.jpg)
Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat.
![Page 15: Develop open source search engine](https://reader035.fdocuments.us/reader035/viewer/2022062319/55560407d8b42a3f168b484d/html5/thumbnails/15.jpg)
Solr Features
Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces - XML, JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Scalability - Efficient Replication to other Solr Search Servers
Flexible and Adaptable with XML configuration
Extensible Plugin Architecture
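Since Solr is queried over plain HTTP, a search is just a URL with query parameters. The sketch below builds such a URL in Python. The base URL `http://localhost:8983/solr`, the field names `title` and `category`, and the helper `solr_select_url` are assumptions for illustration; the host, port and path depend on how Solr is deployed (for example, under Tomcat the port will usually differ).

```python
from urllib.parse import urlencode


def solr_select_url(base_url, query, facet_field=None, rows=10):
    """Build a Solr /select URL (illustrative helper, not part of Solr)."""
    # q = the search query, rows = number of hits, wt = response format.
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_field:
        # Ask Solr to also return facet counts for one field.
        params += [("facet", "true"), ("facet.field", facet_field)]
    return f"{base_url}/select?{urlencode(params)}"


# Host, port and field names below are assumed values.
url = solr_select_url("http://localhost:8983/solr", "title:lucene", facet_field="category")
print(url)
```

Fetching that URL with any HTTP client would return the matching documents as JSON; switching `wt` to `xml` would return the same results as XML.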
![Page 16: Develop open source search engine](https://reader035.fdocuments.us/reader035/viewer/2022062319/55560407d8b42a3f168b484d/html5/thumbnails/16.jpg)
What is it all about?
![Page 17: Develop open source search engine](https://reader035.fdocuments.us/reader035/viewer/2022062319/55560407d8b42a3f168b484d/html5/thumbnails/17.jpg)
![Page 18: Develop open source search engine](https://reader035.fdocuments.us/reader035/viewer/2022062319/55560407d8b42a3f168b484d/html5/thumbnails/18.jpg)
Solr is based on Lucene
![Page 19: Develop open source search engine](https://reader035.fdocuments.us/reader035/viewer/2022062319/55560407d8b42a3f168b484d/html5/thumbnails/19.jpg)
More about Lucene