Java Search Engine Framework

Description

Flavio Marchi talks about Java search engine frameworks at Appsterdam TalkLab.

Transcript of Java Search Engine Framework

1. Java Search Engine Framework

2. Solutions
• Regular expressions (can be slow and memory hungry)
• Lucene (full-text search engine library)
• Solr (standalone full-text search server)
• SolrJ (Java client for Solr)

3. Regular expressions
What it is: a sequence of symbols (that is, a string) that identifies a set of strings.
What it does: it defines a function that takes a string as input and returns a yes/no value, depending on whether or not the string follows a certain pattern.

4. Regular expressions (example)
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // ".*" matches any run of characters between "eur" and "usd"
    Pattern p = Pattern.compile("eur.*usd");
    Matcher m = p.matcher("In quel ramo del lago di eUr&uSd".toLowerCase());
    if (m.find()) {
        // found! But where in the string?
    }

5. Lucene
Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
Developer: Apache Software Foundation
Stable release: 4.3.0 (May 6, 2013)
Development status: Active

6. Lucene (example)
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.core.KeywordAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    // analyzer = new StandardAnalyzer(Version.LUCENE_43);
    Analyzer analyzer = new KeywordAnalyzer(); // the whole field value becomes a single token
    Directory index = new RAMDirectory();      // in-memory index
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_43, analyzer);
    IndexWriter w = new IndexWriter(index, config);

7. Lucene (example 2)
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;

    private void addDoc(long time, String value, String flag) throws Exception {
        Document doc = new Document();
        // StringField: indexed as a single, untokenized value and stored
        doc.add(new StringField("time", String.valueOf(time), Field.Store.YES));
        doc.add(new StringField("value", value, Field.Store.YES));
        doc.add(new StringField("flag", flag, Field.Store.YES));
        w.addDocument(doc);
    }

    w.commit(); // run once, at the end of the batch

8. Lucene (example 3)
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;

    IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(index));
    // Either a parser that searches several fields at once...
    MultiFieldQueryParser multiFieldParser = new MultiFieldQueryParser(
            Version.LUCENE_43,
            new String[] {"time", "value", "flag"},
            analyzer);
    // ...or one with a single default field
    QueryParser queryParser = new QueryParser(
            Version.LUCENE_43,
            "value",
            analyzer);
    // Field names are case-sensitive: "value", not "VALUE"
    TopDocs hits = searcher.search(queryParser.parse("value:(+eurusd)"), 50);
    System.out.println(hits.totalHits);
    for (ScoreDoc scoreDoc : hits.scoreDocs) {
        Document doc = searcher.doc(scoreDoc.doc);
        System.out.println(doc.toString());
    }

9. Solr
Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Jetty. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.
Developer: Apache Software Foundation
Stable release: 4.3.0 (May 6, 2013)
Development status: Active

10. SolrJ
SolrJ is a Java client for accessing Solr. It offers a Java interface to add, update, and query the Solr index.
Latest version: 1.4.X

11. SolrJ (example)
    import java.util.ArrayList;
    import java.util.Collection;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrInputDocument;

    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/");
    server.deleteByQuery("*:*"); // CAUTION: deletes everything!
    SolrInputDocument doc1 = new SolrInputDocument();
    doc1.addField("id", 23425);
    doc1.addField("name", "doc1");
    doc1.addField("price", 100980);
    SolrInputDocument doc2 = new SolrInputDocument();
    doc2.addField("id", 63432);
    doc2.addField("name", "doc2");
    doc2.addField("price", 205345);
    Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
    docs.add(doc1);
    docs.add(doc2);
    server.add(docs);
    server.commit();
    SolrQuery query = new SolrQuery();
    query.setQuery("+name:*c1 +price:100980"); // "+" marks each clause as required
    QueryResponse rsp = server.query(query);

12. SolrJ (example)
    import java.util.List;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;

    SolrDocumentList docsr = rsp.getResults();
    for (SolrDocument document : docsr) {
        Object id = document.getFieldValue("id");
        System.out.println(id);
    }
    // The same results, bound to Product beans
    List<Product> products = rsp.getBeans(Product.class);
    for (Product product : products) {
        String id = product.getId();
        System.out.println(id);
    }

13. SolrJ (Product class)
    import org.apache.solr.client.solrj.beans.Field;

    public class Product {
        private String id;

        public String getId() {
            return id;
        }

        @Field("id")
        public void setId(String id) {
            this.id = id;
        }

        // ...the same for the price and name attributes.
    }

14. SolrJ (file indexing)
    import java.io.File;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public static void indexPdfWithSolrJ(String fileName, String solrId) throws Exception {
        String urlString = "http://localhost:8983/solr";
        SolrServer solr = new HttpSolrServer(urlString);
        // /update/extract is Solr Cell (ExtractingRequestHandler), which parses the file with Apache Tika
        ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
        up.addFile(new File(fileName), "application/pdf");
        up.setParam("literal.id", solrId); // literal.<field> sets that field's value directly
        up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        solr.request(up);
        QueryResponse rsp = solr.query(new SolrQuery("*:*"));
        System.out.println(rsp);
    }

15. References
Lucene & Solr: http://lucene.apache.org/solr/
SolrJ: http://wiki.apache.org/solr/Solrj
Tika: http://tika.apache.org/
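The regular-expression example (slide 4) leaves open where in the string the match was found. A minimal sketch using only java.util.regex: after a successful find(), Matcher.start() and Matcher.end() give the match boundaries (end exclusive). It assumes the intended pattern is "eur.*usd", because in "eur*usd" the * applies only to the preceding "r", so "eur&usd" does not match at all. It also uses Pattern.CASE_INSENSITIVE in place of lowercasing the input. The names RegexSpan and findSpan are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexSpan {
    // Compiled once and reused; CASE_INSENSITIVE replaces the toLowerCase() call.
    // Assumption: ".*" (any run of characters) is what "eur*usd" was meant to express.
    private static final Pattern EUR_USD =
            Pattern.compile("eur.*usd", Pattern.CASE_INSENSITIVE);

    /** Returns {start, end} of the first match (end exclusive), or null if there is none. */
    public static int[] findSpan(String text) {
        Matcher m = EUR_USD.matcher(text);
        if (m.find()) {
            return new int[] {m.start(), m.end()};
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "In quel ramo del lago di eUr&uSd";
        int[] span = findSpan(text);
        if (span != null) {
            // prints: found at [25, 32): eUr&uSd
            System.out.println("found at [" + span[0] + ", " + span[1] + "): "
                    + text.substring(span[0], span[1]));
        }
    }
}
```

Matching case-insensitively keeps the original casing available for substring(), which lowercasing the whole input would lose.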
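The solutions slide (slide 2) warns that regular expressions can be slow and memory hungry: every search rescans all of the text. Full-text libraries such as Lucene instead build an inverted index, a map from each term to the documents that contain it, so a query becomes a few lookups plus a set intersection. The toy below shows only that core idea in plain Java; Lucene's real index adds segments, compression and relevance scoring. The class ToyInvertedIndex and its lowercase-and-split tokenizer are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class ToyInvertedIndex {
    // term -> sorted ids of the documents containing that term (its "postings")
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Simplistic analysis: lowercase, then split on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Indexing: record docId under every term of the text. */
    public void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** AND query: ids of the documents that contain every query term. */
    public SortedSet<Integer> searchAll(String query) {
        SortedSet<Integer> result = null;
        for (String term : tokenize(query)) {
            SortedSet<Integer> docs =
                    postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs); // intersect with the postings seen so far
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        ToyInvertedIndex idx = new ToyInvertedIndex();
        idx.add(1, "EUR/USD rose today");
        idx.add(2, "USD fell against the yen");
        idx.add(3, "EUR and USD quotes");
        System.out.println(idx.searchAll("eur usd")); // prints [1, 3]
    }
}
```

The cost of a query now depends on the size of the postings lists involved rather than on the total amount of indexed text.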
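The Solr slide (slide 9) notes that Solr is reachable over plain HTTP, so the SolrJ query on slide 11 corresponds to a request against a Solr request handler. This sketch only builds such a request URL with java.net.URLEncoder. It assumes the /select handler and the q and wt parameters of the standard Solr example configuration, and the default core at the base URL used on the slides; SolrSelectUrl and selectUrl are illustrative names, and sending the request is left out so the code runs without a server.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SolrSelectUrl {
    /**
     * Builds a GET URL for a Solr /select handler.
     * Assumptions: baseUrl ends with "/" and the handler accepts the q and wt parameters.
     */
    public static String selectUrl(String baseUrl, String query) {
        // Form encoding: a space becomes "+", while a literal "+" becomes "%2B",
        // so the "required clause" markers are not turned into spaces by the server.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "select?q=" + q + "&wt=json";
    }

    public static void main(String[] args) {
        // prints http://localhost:8983/solr/select?q=%2Bname%3A*c1+%2Bprice%3A100980&wt=json
        System.out.println(selectUrl("http://localhost:8983/solr/", "+name:*c1 +price:100980"));
    }
}
```

Pasting the query into a URL by hand would leave the "+" signs unencoded, and they would then be decoded as spaces, silently turning required clauses into optional ones.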
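In the SolrJ query on slide 11, each clause prefixed with "+" is required, so a document matches only if its name ends in "c1" (the leading wildcard *c1) and its price is exactly 100980. The toy evaluator below applies that rule to the two documents from the slide, stored as plain maps. RequiredClauses.matches is a hypothetical helper, not Solr's query parser: it understands only +field:value clauses with an optional leading *, and does no analysis or scoring.

```java
import java.util.List;
import java.util.Map;

public class RequiredClauses {
    /** True if doc satisfies every "+field:value" clause; a leading "*" in value means suffix match. */
    public static boolean matches(Map<String, String> doc, String query) {
        for (String clause : query.trim().split("\\s+")) {
            if (!clause.startsWith("+")) {
                throw new IllegalArgumentException("toy parser only supports +field:value: " + clause);
            }
            int colon = clause.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("missing ':' in clause: " + clause);
            }
            String field = clause.substring(1, colon);
            String pattern = clause.substring(colon + 1);
            String value = doc.get(field);
            if (value == null) {
                return false; // a required field is absent
            }
            boolean ok = pattern.startsWith("*")
                    ? value.endsWith(pattern.substring(1))
                    : value.equals(pattern);
            if (!ok) {
                return false; // one failed required clause rejects the document
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("id", "23425", "name", "doc1", "price", "100980"),
                Map.of("id", "63432", "name", "doc2", "price", "205345"));
        for (Map<String, String> doc : docs) {
            if (matches(doc, "+name:*c1 +price:100980")) {
                System.out.println(doc.get("id")); // prints 23425
            }
        }
    }
}
```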
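Slides 12 and 13 rely on rsp.getBeans(Product.class) filling each Product through its @Field("id") setter. The sketch below illustrates the general idea of annotation-driven binding with plain Java reflection. It is not SolrJ's implementation; the FieldName annotation, the nested Product class and the bind method are hypothetical stand-ins.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;

public class ToyBinder {
    // Hypothetical stand-in for SolrJ's @Field, kept at runtime so reflection can see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface FieldName {
        String value();
    }

    public static class Product {
        private String id;

        public String getId() {
            return id;
        }

        @FieldName("id")
        public void setId(String id) {
            this.id = id;
        }
    }

    /** Creates a bean and calls each annotated one-argument setter with the matching document value. */
    public static <T> T bind(Map<String, ?> doc, Class<T> type) throws ReflectiveOperationException {
        T bean = type.getDeclaredConstructor().newInstance();
        for (Method m : type.getMethods()) {
            FieldName ann = m.getAnnotation(FieldName.class);
            if (ann != null && m.getParameterCount() == 1 && doc.containsKey(ann.value())) {
                m.invoke(bean, doc.get(ann.value()));
            }
        }
        return bean;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Product p = bind(Map.of("id", "23425", "name", "doc1"), Product.class);
        System.out.println(p.getId()); // prints 23425
    }
}
```

Retaining the annotation at runtime is what makes it visible to getAnnotation(); with the default retention the binder would find no annotated setters and leave every field null.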