LexEVS 5.0 Advanced Topics
Advanced Topics: Query Optimization
LexEVS Boot Camp, November 2009

Session Details: Advanced Topics
Course Objectives

• When you complete this course you will be able to:
  • Understand ways to optimize searching and processing results using Query Optimization in LexEVS 5.0
    • Restrictions & Resolution
    • Iterator Handling
    • Combinatorial Queries

Session Details: Advanced Topics
Lesson Syllabus

• Lesson 1: Restrictions and Resolutions
• Lesson 2: Iterators
• Lesson 3: Combinatorial Queries

Lesson 1: Restrictions and Resolutions

• When you complete this lesson you will be able to:
  • Filter a coded node set based on the meaning of concept content utilizing restrictions
    • i.e. text matches within various property text fields
  • Structure and restrict the results of coded node set operations with resolving methods

Lesson 1: Restrictions and Resolutions
Restriction Overview

• User benefits of a coded concept set reference in the coding scheme:
  • Provides potential resolution for the entire set of concepts
• Reference returned as:
  • CodedNodeSet or CodedNodeGraph
• Acts as a container for query modifications, which are collected, listed, and sorted to provide the optimum execution order as a single query
• Restrictions on the types of concepts returned are the first of these query modifications
• Importance: Gives a meaningful result to the user
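The idea of a reference that only collects query modifications and runs them later as a single query can be sketched in plain Java. This is a minimal illustration under stated assumptions: `LazyNodeSet`, its `restrict` method and the `evaluations` counter are hypothetical names invented here, not part of the LexEVS API, and the in-memory list merely stands in for a coding scheme.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration only: LazyNodeSet is NOT a LexEVS class.
class LazyNodeSet {
    private final List<String> source;
    private final List<Predicate<String>> restrictions = new ArrayList<>();
    int evaluations = 0; // counts how often the data was actually scanned

    LazyNodeSet(List<String> source) {
        this.source = source;
    }

    // Record a restriction; no data is touched at this point.
    LazyNodeSet restrict(Predicate<String> restriction) {
        restrictions.add(restriction);
        return this;
    }

    // Apply every recorded restriction in one pass and cap the result size.
    // A negative maxToReturn is treated as "no limit" in this sketch.
    List<String> resolveToList(int maxToReturn) {
        evaluations++;
        return source.stream()
                .filter(s -> restrictions.stream().allMatch(r -> r.test(s)))
                .limit(maxToReturn < 0 ? Long.MAX_VALUE : maxToReturn)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        LazyNodeSet set = new LazyNodeSet(
                Arrays.asList("heart", "heart valve", "lung", "heart ventricle"));
        set.restrict(s -> s.startsWith("heart")).restrict(s -> s.contains(" "));
        System.out.println("Evaluations before resolve: " + set.evaluations);
        System.out.println(set.resolveToList(1));
        System.out.println("Evaluations after resolve: " + set.evaluations);
    }
}
```

The point of the design is that adding restrictions is cheap; the cost is paid once, at resolution time.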

Lesson 1: Restrictions and Resolutions
Resolution Overview

Resolution:
• After specifying optional restrictions, the nodes in a set or graph can be resolved as a list of ConceptReference objects which in turn contain references to one or more Concept objects.
• Resolving these objects gives the user an opportunity to structure what is returned in terms of overall volume or number of objects as well as how much is contained in the objects themselves.
• Final restrictions can also be applied during this method call.
• Resolving a node set or graph in a particular manner is important because it can affect performance.

Lesson 1: Restrictions and Resolutions
Restrictions Examples

• Restrictions
  • Create a basic service object for data retrieval

    LexBIGService lbSvc = LexBIGServiceImpl.defaultInstance();

  • Create a concept reference list appropriate for this coding scheme and this concept code (C13432). The parameters are a String array consisting of a single value and the name of the coding scheme, NCI_Thesaurus, where this concept resides.

    ConceptReferenceList crefs = ConvenienceMethods.createConceptReferenceList(
        new String[] {"C13432"}, "NCI_Thesaurus");

  • Initialize a coding scheme version object with the correct version number for NCI_Thesaurus.

    CodingSchemeVersionOrTag csvt = new CodingSchemeVersionOrTag();
    csvt.setVersion("08.09e");

Lesson 1: Restrictions and Resolutions
Restrictions Examples cont…

• Initialize a CodedNodeSet object with all concepts in our sample coding scheme by calling getCodingSchemeConcepts("NCI_Thesaurus", csvt).
• The final restrictToCodes(crefs) method call restricts the return to the single code in the previously initialized list of one.

    CodedNodeSet nodes = lbSvc.getCodingSchemeConcepts("NCI_Thesaurus", csvt)
        .restrictToCodes(crefs);

Lesson 1: Restrictions and Resolutions
Resolution Examples

• Resolution (and a little restriction too)
  • Build a list of references from the current (and already restricted) set, restrict them further to the single property “FULL_SYN”, and limit the result to a single entry (the value “1”) regardless of the size of the result set.

    ResolvedConceptReferenceList matches = nodes.resolveToList(
        null, ConvenienceMethods.createLocalNameList("FULL_SYN"), 1);

Lesson 1: Restrictions and Resolutions
Resolution Examples cont…

• Now initialize a ResolvedConceptReference with the result and initialize a Concept object by calling the getReferencedEntry() method. The Concept object is the base information model object and contains properties, presentations and definitions which help define and explain the concept.
• We’ll retrieve a presentation defining the concept with a call to the first element in the presentation list, getting the text and its accompanying content.

    ResolvedConceptReference ref = (ResolvedConceptReference)
        matches.enumerateResolvedConceptReference().nextElement();
    Concept entry = ref.getReferencedEntry();
    System.out.println("Matching synonym: " +
        entry.getPresentation(0).getValue());

Lesson 1: Restrictions and Resolutions
Restriction Example 2 Setup

Setup:

    LexBIGService lbSvc = LexBIGServiceImpl.defaultInstance();
    CodingSchemeVersionOrTag csvt = new CodingSchemeVersionOrTag();
    csvt.setVersion("08.09e");

Get a coded node reference:

    CodedNodeSet nodes = lbSvc.getCodingSchemeConcepts("NCI_Thesaurus", csvt);

Lesson 1: Restrictions and Resolutions
Restriction Example 2

Begin a series of restrictions on this node set:

    nodes.restrictToMatchingDesignations("heart",
        SearchDesignationOption.PREFERRED_ONLY, "LuceneQuery", null);

• Matching designations means we will be matching presentation type properties for this concept.
• “heart” is the text we will search on, and the PREFERRED_ONLY option gets us only the preferred designation for that concept.
• We are using a standard LuceneQuery type of search, and the “null” value ensures we are working in the default language for this scheme.
• We are basically restricting by text value containment, on a property type that has a particular flag set on it.

Lesson 1: Restrictions and Resolutions
Restriction Example 2

Let’s restrict it further:

    cns.restrictToMatchingProperties(null,
        new PropertyType[] {PropertyType.DEFINITION},
        null, null, null, "heart", "LuceneQuery", null);

• What’s wrong with doing it this way?
  • This would create a query that returns only those concepts with matches in both Presentation and Definition type properties.
• Instead you can get two different references into the coding scheme, apply one restriction to each, and make the following call on one of them:

    cns.union(codednodeset);

• This gets results from both.
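The difference between stacking two restrictions on one reference and taking the union of two separately restricted references is the difference between set intersection and set union. The following sketch shows this with plain Java sets; the class name and the placeholder codes (C1, C2, C3) are invented for illustration and do not come from the NCI Thesaurus or the LexEVS API.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: plain sets standing in for restricted node sets.
class RestrictionCombination {
    // Two restrictions on the same reference: a concept must satisfy both.
    static Set<String> intersection(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Two references, one restriction each, then a union: either match is enough.
    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    public static void main(String[] args) {
        // Placeholder codes: concepts whose presentation matches "heart"
        Set<String> presentationMatches = new TreeSet<>(Arrays.asList("C1", "C2"));
        // Placeholder codes: concepts whose definition matches "heart"
        Set<String> definitionMatches = new TreeSet<>(Arrays.asList("C2", "C3"));

        System.out.println("Both restrictions on one set: "
                + intersection(presentationMatches, definitionMatches));
        System.out.println("Union of two restricted sets: "
                + union(presentationMatches, definitionMatches));
    }
}
```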

Lesson 1: Restrictions and Resolutions
Resolution Example 2

Resolve a lightweight list as an option:
• If we use a resolve-to-list method that accepts a boolean and an integer limit for size, we can create a list that contains only minimal references to the concepts and is limited in size.

    cns.resolveToList(null, null, null, null, false, -1);

• The false boolean entry provides a list of concepts with minimal references to the coding scheme. This allows the full concept to be resolved later.

Lesson 1: Restrictions and Resolutions
Resolution Example 2

• If you are confident in your sorting algorithm, resolve the list to a limit of fifty concepts:
  • First we’ll create a sort option
  • Then set the name of the sorting extension we are calling
  • Next set whether it will be sorted ascending or descending
  • Then add it to a list of sort options.

    SortOption so = new SortOption();
    so.setExtensionName("code");
    so.setAscending(true);
    SortOptionList sol = new SortOptionList();
    sol.addEntry(so);

Lesson 1: Restrictions and Resolutions
Resolution Example 2

• Finally, we’ll pass the list of sort options to the resolve method and resolve it:

    cns.resolveToList(sol, null, null, null, false, 50);

• Adding a maximum size of 50 further limits the size of the returned list and limits the overhead of the method call.

Lesson 1: Restrictions and Resolutions
Restrict and Resolve a Node Graph

Let’s shift focus to the Coded Node Graph implementation.
• Remember how we restricted the coded node graph to a coded node set?

    cng.restrictToCodes(codednodeset);

• Remember how this provided the user with a set of starting points for associations connecting the list of coded nodes to other nodes above and below them in the hierarchy?
• And how we could further restrict the node graph by the kind of associations between nodes?

    cng.restrictToAssociations(Constructors.createNameAndValueList("subClassOf"), null);

Lesson 1: Restrictions and Resolutions
Node Graph Restrictions

Node graph restrictions are more focused on how associations are expressed in the terminology. So instead of building a set of nodes by determining the content of concept objects, node graphs focus on:

• the edges of the graph
• where they exist in relation to the nodes you provide as references
• what direction they can be navigated
• how the edges are referred to (naming conventions)

Lesson 1: Restrictions and Resolutions
Node Graph Restrictions

For example:

Restrict this graph to associations and any related qualifiers they may have

    cng.restrictToAssociations(listOfAssociations, associationQualifier);

Restrict this graph to any directional names and any qualifiers associated with this association name

    cng.restrictToDirectionalNames(listOfAssociations, associationQualifier);

Restrict this graph to a set of codes and their source codes and edges.

    cng.restrictToSourceCodes(codednodeset);

Restrict this graph to a set of codes and their target codes and edges.

    cng.restrictToTargetCodes(codednodeset);

Lesson 1: Restrictions and Resolutions
Node Graph Resolution

• Efficient resolution of graphs requires similar attention to when and whether concepts and associations to other concepts are resolved immediately or later.

    cng.resolveAsList(concept_reference, true, false, 1, 1, null, null, null, -1);

• Here we are resolving only the next layer “down” from the “concept_reference” focus code, for both coded entries and associations.
• No limit is placed on the number of nodes and associations to be resolved, but we ensure we have a layer of node references from which to begin our next call into a hierarchy layer.
• So what we’ve done is to set ourselves up to step through each layer of the hierarchy of fully resolved nodes.
• This is one technique to limit method call overhead. What would another be?
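Stepping through a hierarchy one layer at a time can be sketched as follows. This is a plain-Java illustration under stated assumptions: the adjacency map stands in for the terminology, `resolveOneLevel` is a hypothetical stand-in for a depth-one graph resolution from a focus code, and none of these names belong to the LexEVS API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: a map of parent code -> child codes replaces the graph.
class LayeredTraversal {
    // Stand-in for a one-level resolve: returns only the direct children of the focus code.
    static List<String> resolveOneLevel(Map<String, List<String>> graph, String focus) {
        return graph.getOrDefault(focus, Collections.emptyList());
    }

    // Walk the hierarchy layer by layer; each small call returns one level only.
    static List<List<String>> layers(Map<String, List<String>> graph, String root) {
        List<List<String>> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        seen.add(root);
        List<String> current = Collections.singletonList(root);
        while (!current.isEmpty()) {
            List<String> next = new ArrayList<>();
            for (String focus : current) {
                for (String child : resolveOneLevel(graph, focus)) {
                    if (seen.add(child)) { // skip nodes reached through an earlier path
                        next.add(child);
                    }
                }
            }
            if (!next.isEmpty()) {
                result.add(next);
            }
            current = next;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("root", Arrays.asList("a", "b"));
        graph.put("a", Arrays.asList("c"));
        graph.put("b", Arrays.asList("c", "d"));
        System.out.println(layers(graph, "root"));
    }
}
```

Each call handles a bounded amount of data, so the caller decides how deep to go instead of paying for the whole graph up front.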

Lesson 1: Restrictions and Resolutions
Review 1

How would you restrict a node set to a single concept?

Lesson 1: Restrictions and Resolutions
Answer 1

How would you restrict a node set to a single concept?
By creating a restriction by codes with a single unique identifier as a list element passed in as a parameter. This is an important use of restrictions when you have a reference to the concept and wish to resolve its details later.

Lesson 1: Restrictions and Resolutions
Review 2

What kind of user is likely to be interested in how restrictions are applied to the coded node set or graph?

Lesson 1: Restrictions and Resolutions
Answer 2

What kind of user is likely to be interested in how restrictions are applied to the coded node set or graph?
The end user will want to get some meaningful results.

Lesson 1: Restrictions and Resolutions
Review 3

What kind of user is likely to be interested in how resolutions are applied to the coded node set or graph?

Lesson 1: Restrictions and Resolutions
Answer 3

What kind of user is likely to be interested in how resolutions are applied to the coded node set or graph?
The developer will want to ensure performance.

Lesson 1: Restrictions and Resolutions
Review 4

You’d like to return a lightweight list to keep your method call overhead low. What will you do to ensure this?

Lesson 1: Restrictions and Resolutions
Answer 4

You’d like to return a lightweight list to keep your method call overhead low. What will you do to ensure this?
You’ll choose the resolveToList() method that accepts a boolean flag indicating you want no coded entities resolved.

Lesson 1: Restrictions and Resolutions
Review 5

Describe two ways a developer can resolve a coded node graph that won’t cause a huge object to be returned, yet still allow the end user to traverse an entire graph structure.

Lesson 1: Restrictions and Resolutions
Answer 5

Describe two ways a developer can resolve a coded node graph that won’t cause a huge object to be returned, yet still allow the end user to traverse an entire graph structure.
A developer can resolve the graph to one level of completely resolved nodes and associations and step through the hierarchy one level at a time.
Or the developer can resolve the entire graph and leave the coded entities unresolved, much as you might do with a coded node set.

Lesson 2: Iterators

• When you complete this lesson you will be able to:
  • Write better performing code when resolving to lists or iterators.
  • Return a list or a single concept reference, or advance the iterator, using the appropriate method
  • Understand how to resolve an iterator with lightweight objects.

Lesson 2: Iterators
Iterators in LexEVS

• Additional method for coded node sets
  • Helpful for resolving larger sets of nodes
  • Still ensures lower resource overhead
• Advantage of resolving iterators from coded node sets:
  • Obtain resolution without any calls to the database
  • Capable of referencing a local Lucene index instead

    ResolvedConceptReferencesIterator rcri = cns.resolve(sol, null, null, null, false);

• Allows the user to employ sort options, filter options and a final restriction option for restricting to property types and property names, just like resolving to a list
• Included is the option to return any resolved concept references without resolving the coded entry, using the boolean value “false”

Lesson 2: Iterators
Iterators in LexEVS

• Use a number of options for retrieving concept references, scrolling the iterator and returning concept reference lists.
• Get the next ResolvedConceptReference

    rcri.next();

• Get a ResolvedConceptReferenceList of the specified size.

    rcri.next(size);

• Get a ResolvedConceptReferenceList from the iterator based on indexed start and end points.

    rcri.get(arg0, arg1);

• Scroll the iterator, returning another iterator:

    rcri.scroll(scrolled_size);

• Get a ResolvedConceptReferenceList from the last scroll of the iterator.

    rcri.getNext();
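Pulling results from an iterator in fixed-size pages, in the spirit of the next(size) call listed above, can be sketched with a standard java.util.Iterator. The `nextPage` helper and the placeholder codes are assumptions made for this illustration; they are not LexEVS methods or real concept codes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only: a java.util.Iterator stands in for the LexEVS iterator.
class PagedResolution {
    // Take at most 'size' items from the iterator; a shorter list means the end was reached.
    static <T> List<T> nextPage(Iterator<T> iterator, int size) {
        List<T> page = new ArrayList<>();
        while (page.size() < size && iterator.hasNext()) {
            page.add(iterator.next());
        }
        return page;
    }

    public static void main(String[] args) {
        // Placeholder codes, not real concept identifiers
        Iterator<String> codes = Arrays.asList("A", "B", "C", "D", "E").iterator();
        List<String> page = nextPage(codes, 2);
        while (!page.isEmpty()) {
            System.out.println("Page: " + page);
            page = nextPage(codes, 2);
        }
    }
}
```

Only one page is held in memory at a time, which is the property that keeps resource overhead low for large result sets.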

Lesson 2: Iterators
Review 1

What iterator method returns another, potentially smaller iterator?

Lesson 2: Iterators
Answer 1

What iterator method returns another, potentially smaller iterator?
The scroll method returns another iterator.

Lesson 2: Iterators
Review 2

You can get a single concept reference from the iterator. What’s the method for that?

Lesson 2: Iterators
Answer 2

You can get a single concept reference from the iterator. What’s the method for that?
next()

Lesson 2: Iterators
Review 3

How is resolving an iterator potentially a better performing method than resolving to a list?

Lesson 2: Iterators
Answer 3

How is resolving an iterator potentially a better performing method than resolving to a list?
The iterator resolves against the Lucene index, which is a much faster call.

Lesson 2: Iterators
Review 4

Describe how you would resolve a list of concept references from an iterator and keep that list relatively lightweight.

Lesson 2: Iterators
Answer 4

Describe how you would resolve a list of concept references from an iterator and keep that list relatively lightweight.
First use the resolve method that accepts a boolean flag indicating whether to resolve coded entities in concept reference lists.
Then retrieve a concept reference list from the iterator using the next(size) method.

Lesson 3: Combinatorial Queries

• When you complete this lesson you will be able to:
  • Provide a useful result set to the end user by using a combination of fields applied as parameters
  • Expand your understanding of the SortOption type filter criteria.
  • Choose from a variety of text matching algorithms
  • Discuss how the Lucene Query text matching algorithm, in particular, can be leveraged by LexEVS.

Lesson 3: Combinatorial Queries
Overview of Combinatorial Queries

• Combinatorial Queries: putting it all together
  • One of the most powerful features of the LexEVS architecture is the ability to define multiple search and sort criteria without intermediate retrieval of data from the LexEVS service.
  • The following example shows a simple, yet powerful, query to search a code system based on a ‘sounds like’ match algorithm
    • (the list of all available match algorithms can be listed using the ‘ListExtensions -m’ admin script.)

Lesson 3: Combinatorial Queries
Combinatorial Queries Example cont…

To be exact, this is a double restriction query with an additional application of sort criteria and restricted return values.

• Declare the service...

    LexBIGService lbs = LexBIGServiceImpl.defaultInstance();

• Start with an unconstrained set of all codes for the vocabulary

    CodingSchemeVersionOrTag csvt = new CodingSchemeVersionOrTag();
    csvt.setVersion("08.09e");
    CodedNodeSet cns = lbs.getCodingSchemeConcepts("NCI_Thesaurus", csvt);

Lesson 3: Combinatorial Queries
Combinatorial Queries Example cont…

• Constrain to concepts with designations (assigned text presentations) that contain text that sounds like 'heart ventricle'

    cns.restrictToMatchingDesignations(
        "hart ventrickle",
        SearchDesignationOption.ALL,
        MatchAlgorithms.DoubleMetaphoneLuceneQuery.toString(),
        null);

• Further restrict the results to concepts with a semantic type of 'Anatomical Structure'

    cns.restrictToMatchingProperties(
        Constructors.createLocalNameList("Semantic_Type"),
        "Anatomical Structure", "exactMatch", null);

Lesson 3: Combinatorial Queries
Combinatorial Queries Example cont…

• Indicate that the resulting list should be sorted with the best results first and then sorted by code if there is a tie.

    SortOptionList sortCriteria = Constructors.createSortOptionList(
        new String[] {"matchToQuery", "code"});

• Indicate to return only the assigned UMLS_CUI and textualPresentation properties.

    LocalNameList restrictTo = ConvenienceMethods.createLocalNameList(
        new String[] {"UMLS_CUI", "textualPresentation"});

• Still nothing computed yet.
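The effect of a two-key sort, best match first with code as the tie-breaker, can be sketched with a standard Comparator. The `Hit` class, its `score` field and the placeholder codes are assumptions made for this illustration; how LexEVS scores matches internally is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only: Hit is an invented holder for a code and a match score.
class TwoKeySort {
    static final class Hit {
        final String code;
        final double score;

        Hit(String code, double score) {
            this.code = code;
            this.score = score;
        }
    }

    // Highest score first; ties are broken by code in ascending order.
    static List<Hit> sortByScoreThenCode(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score)
                .reversed()
                .thenComparing(h -> h.code));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("C3", 0.5), new Hit("C2", 0.9), new Hit("C1", 0.9));
        for (Hit hit : sortByScoreThenCode(hits)) {
            System.out.println(hit.code + " " + hit.score);
        }
    }
}
```

The order of the comparators mirrors the order of the strings in the sort option list: the first entry is the primary key, the second only decides ties.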

Lesson 3: Combinatorial Queries
Combinatorial Queries Example cont…

• Perform the query and resolve the sorted/filtered list with a maximum of 6 items returned.

    ResolvedConceptReferenceList list = cns.resolveToList(
        sortCriteria, restrictTo, null, 6);

• Print the results

    ResolvedConceptReference[] rcr = list.getResolvedConceptReference();
    for (ResolvedConceptReference rc : rcr) {
        System.out.println("Resolved Concept: " + rc.getConceptCode());
    }

Lesson 3: Combinatorial Queries
Code Sample Review

Declare the target concept space
• The coded node set (variable ‘cns’) is initially declared to query the NCI Thesaurus vocabulary.
• At this point the concept space included by the set can be thought of as unrestricted, addressing every defined coded entry (a ‘false’ value on the declaration indicates that inactive concepts are also included).
• However, it is important to note that no search is performed by the LexEVS service at this time.

Lesson 3: Combinatorial Queries
Applying Filter Criteria

• No computation is performed (to realize query results) during invocation of the restrictToMatchingDesignations() and restrictToMatchingProperties() methods.
• These calls effectively narrow the target space even further, indicating that filters should be applied to the information returned by the LexEVS query service.

Lesson 3: Combinatorial Queries
Using Lucene Query Syntax

Using the Lucene Query Syntax and other text matching functions
• Text Matches:
  • The text criteria applied in methods such as restrictToMatchingDesignations() use one of a number of powerful text matching algorithms to provide the user with broad capability for text based searches.
  • Text matches can be simple applications of “exactMatch”, “startsWith” or “contains” algorithms
  • Regular expressions
  • Lucene Query syntax (used in the LuceneQuery function)
  • As shown in the preceding slides, these options are passed into the restrictToMatchingDesignations() method as parameters.

Lesson 3: Combinatorial Queries
Lucene Query

• Lucene Queries are well documented and can be very powerful
• Uninitiated users may need some background on their use
  • Users should start with the official Lucene Query Parser documentation
• Keep in mind: Some LexEVS queries such as "startsWith" and "contains" use wild card searches under the covers
  • Use of wild cards in this context can cause errors in searches involving these search types
  • Wild card queries should use the flexibility of the Lucene Query searches in restrictToMatchingDesignations() instead
  • There they work much as described in the query syntax documentation.

Lesson 3: Combinatorial Queries
Special Characters: Lucene Query

• Special characters in the Lucene Query search can cause unexpected results. For example:
  • If you are not using special characters as recommended for various Lucene search mechanisms, your searches may not return expected results or may return an error
  • Example: If the value you are searching on contains, say, parentheses, you will need to place the value in quotation marks.
  • The escape characters described in the Lucene documentation do not work at this time
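One way to follow the advice about quoting is a small helper that wraps a literal search value in double quotes whenever it contains a character with special meaning in the Lucene query syntax. This is a minimal sketch under stated assumptions: `LuceneQuoting` and `quoteIfNeeded` are names invented here, the character list follows the classic Lucene Query Parser syntax, and the helper is meant for literal values only, not for queries that intentionally use wild cards or operators.

```java
// Hypothetical illustration only: a helper for literal search values, not a LexEVS or Lucene API.
class LuceneQuoting {
    // Characters with special meaning in the classic Lucene query syntax.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    static boolean hasSpecial(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (SPECIAL.indexOf(text.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    // Wrap the value in double quotes if it contains special characters.
    // Embedded double quotes are dropped, since escaping is reported not to work.
    static String quoteIfNeeded(String text) {
        if (!hasSpecial(text)) {
            return text;
        }
        return "\"" + text.replace("\"", "") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(quoteIfNeeded("heart"));
        System.out.println(quoteIfNeeded("Heart (Left Ventricle)"));
    }
}
```

The returned string would then be passed as the match text of a LuceneQuery search; whether a quoted phrase gives the desired matches should be checked against the terminology in question.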

Lesson 3: Combinatorial Queries
Narrowing Searches: Lucene Query

• Contains method: should be used to narrow down search results as a progressively longer substring, more closely matching the term of interest, is entered
• The same narrowing should not be expected when a Lucene Query is used.

Lesson 3: Combinatorial Queries
Lucene Query

• Narrowing Searches:
  • You should not expect to see a Lucene Query narrow down search results as you progressively enter a longer substring more closely matching your term of interest.
  • Instead use the contains method.

Lesson 3: Combinatorial Queries
Review 1

You are not sure of the spelling for a term. What kind of match algorithm would help?

Lesson 3: Combinatorial Queries
Answer 1

You are not sure of the spelling for a term. What kind of match algorithm would help?
The DoubleMetaphone algorithm

Lesson 3: Combinatorial Queries
Review 2

You know the terminology well enough to understand that the text you want to search for exists in a particular property. What kind of restriction method will you use?

Lesson 3: Combinatorial Queries
Answer 2

You know the terminology well enough to understand that the text you want to search for exists in a particular property. What kind of restriction method will you use?
restrictToMatchingProperties().

Lesson 3: Combinatorial Queries
Review 3

You wish to sort a list of returned values, first by the match in the text query, next by the unique identifying code. How will you do this?

Lesson 3: Combinatorial Queries
Answer 3

You wish to sort a list of returned values, first by the match in the text query, next by the unique identifying code. How will you do this?
Create a sort option list with “matchToQuery” as the first string and “code” as the second and pass it to the resolve method.

Lesson 3: Combinatorial Queries
Review 4

You need a very flexible text query with complete flexibility in design. What matching algorithm will you choose?

Lesson 3: Combinatorial Queries
Answer 4

You need a very flexible text query with complete flexibility in design. What matching algorithm will you choose?
Regular expressions

Lesson 3: Combinatorial Queries
Review 5

What kinds of adjustments can you make to the Lucene Query?

Lesson 3: Combinatorial Queries
Answer 5

What kinds of adjustments can you make to the Lucene Query?
Boolean queries, Levenshtein Distance, wild card searches.

Lesson 3: Combinatorial Queries
Review 6

What kinds of characters should you avoid searching on in the LuceneQuery?

Lesson 3: Combinatorial Queries
Answer 6

What kinds of characters should you avoid searching on in the LuceneQuery?
Special characters such as ~, (), * since these can have a special meaning or just be stripped when normalized by Lucene.

Session Details: Query Optimization
Questions?