B0F2 Text Search and Portal Integration - IBM Research | IBM · Lotus Software –WebSphere Portal...
Lotus Software – WebSphere Portal
B0F2 – Text Search and Portal Integration © 2004 IBM Corporation
IBM IT Training Services
IBM WebSphere Portal and Lotus Workplace technical symposium
Session Number: B0F2
Session Title: Text Search and Portal Integration
Speaker's e-mail: [email protected]
Aya Soffer, Manager, Search Technologies Dept.
Agenda
• WebSphere Portal Search Engine (PSE)
  o Overview and Architecture
  o Main functions
  o Usage Examples and Planning Guidelines
• Common Components: Lotus Workplace Search
• Demo
• Information Resources
• Q & A
What is the Portal Search Engine (PSE)?
High-level functional overview
• Administrator: indexing / collecting content and documents
  o HTTP crawler
  o Indexer component
  o Text analysis functions (taxonomy, categorizer, language tools, summarizer)
  o Simple workflow to control what gets indexed and how
• End user: search
  o Web-style search
  o High-precision relevance ranking
  o Browsing through the collection
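To make the split between the administrator side (indexing) and the end-user side (search) concrete, here is a minimal in-memory inverted index in Java, the language PSE itself is written in. It is a sketch under stated assumptions, not PSE code: the class name `TinyIndex`, the lower-case tokenizer, and the term-count scoring are all hypothetical, and PSE's actual relevance-ranking formula is not described in this session.

```java
import java.util.*;

/** Hypothetical sketch: a tiny inverted index (term -> docId -> term count). Not the PSE API. */
final class TinyIndex {
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();

    /** Lower-case and split on anything that is not a letter or digit (a simplifying assumption). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Indexer side: record how often each term occurs in the document. */
    void add(String docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new HashMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    /** Search side: score = sum of query-term counts, highest first, ties broken by docId. */
    List<String> search(String query) {
        Map<String, Integer> scores = new HashMap<>();
        for (String term : tokenize(query)) {
            postings.getOrDefault(term, Collections.emptyMap())
                    .forEach((doc, tf) -> scores.merge(doc, tf, Integer::sum));
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int cmp = Integer.compare(scores.get(b), scores.get(a)); // higher score first
            return cmp != 0 ? cmp : a.compareTo(b);                  // deterministic tie-break
        });
        return ranked;
    }
}
```

Raw term counts were chosen only to keep the sketch short; the point is that the administrator side writes the index and the end-user side only reads it.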
General information
• Originally developed by IBM Research in Israel
• Proven technology base with an emphasis on search quality
• Backed by the joint Research and Software Group program – Institute for Search and Text Analysis
Full-text search technology
• 100% pure Java implementation
• Suitable for server as well as client environments
• Emphasis on highly accurate results, constantly benchmarked and evaluated in official forums such as TREC and INEX
• Internal interfaces allow for convenient integration into IBM products and solutions
  o Rich set of APIs suitable for simple and complex implementations
  o Easy to customize and extend: adapt ranking formulas, extend built-in methods, add new document types
• IBM strategic component
Portal Search Engine – where it is used
• Portal Search Engine portlet application:
  o Administer multiple indexes (collections), each of which may include multiple sites
  o End-user search portlet for handling search requests and for browsing through the documents in the collection
• Integrated with Portal Document Manager (PDM)
• Integrated with Lotus Workplace 1.1
• Integrated with WebSphere Portal Content Publisher (WPCP)
New key features in WebSphere Portal Version 5
• Taxonomies and categorization
  o A taxonomy is a hierarchical representation of a set of categories; it includes rules per category, which a categorizer applies to each document
  o Two types of taxonomies are available:
     - A predefined taxonomy that allows simple manipulation (such as renaming categories and defining new ones)
     - A rules-based taxonomy, which the user builds and defines
  o Categorization is the process of assigning a document to one or more categories
• Summarization
  o The top 3 key sentences are extracted
  o For CJK and BiDi languages, the first 250 characters of text are used instead
• Document filters
  o Support for more than 250 document formats
  o The technology is wrapped into the Document Conversion Services (DCS), which add support for additional document formats
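The summarization policy described above (top 3 key sentences, or the first 250 characters for CJK and BiDi text) can be sketched as follows. The slide does not say how key sentences are chosen, so this example assumes a simple average word-frequency score; `SimpleSummarizer` and the caller-supplied `isCjkOrBidi` flag are hypothetical, and no real language detection is performed.

```java
import java.util.*;

/** Hypothetical sketch of the summarization policy described above; not the PSE implementation. */
final class SimpleSummarizer {
    static final int KEY_SENTENCES = 3;    // "top 3 key sentences"
    static final int CJK_BIDI_CHARS = 250; // "first 250 characters" fallback

    static String summarize(String text, boolean isCjkOrBidi) {
        if (isCjkOrBidi) {
            // Counted in Unicode code points so a surrogate pair is never cut in half.
            if (text.codePointCount(0, text.length()) <= CJK_BIDI_CHARS) return text;
            return text.substring(0, text.offsetByCodePoints(0, CJK_BIDI_CHARS));
        }
        String[] sentences = text.trim().split("(?<=[.!?])\\s+");
        // Assumed scoring: average document-wide frequency of the words in each sentence.
        Map<String, Integer> freq = new HashMap<>();
        for (String w : words(text)) freq.merge(w, 1, Integer::sum);
        double[] score = new double[sentences.length];
        Integer[] order = new Integer[sentences.length];
        for (int i = 0; i < sentences.length; i++) {
            order[i] = i;
            List<String> ws = words(sentences[i]);
            for (String w : ws) score[i] += freq.getOrDefault(w, 0);
            if (!ws.isEmpty()) score[i] /= ws.size();
        }
        // Take the highest-scoring sentences, then restore document order for readability.
        Arrays.sort(order, (a, b) -> Double.compare(score[b], score[a]));
        Integer[] top = Arrays.copyOf(order, Math.min(KEY_SENTENCES, sentences.length));
        Arrays.sort(top);
        StringJoiner summary = new StringJoiner(" ");
        for (int i : top) summary.add(sentences[i]);
        return summary.toString();
    }

    private static List<String> words(String s) {
        List<String> out = new ArrayList<>();
        for (String w : s.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }
}
```

The prefix fallback reflects the slide's choice for CJK and BiDi text, where splitting on `.`, `!` and `?` followed by whitespace does not reliably find sentence boundaries.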
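Conceptual overview – Index build process
[Diagram: content is fetched by the Crawler and passed through a Filter; the text analysis components (Categorizer, Summarization, Document filters) inject metadata into the original content; the results wait in a content “In-basket”, where an Approval Workflow selects the approved set of content that the Indexer adds to the Collection.]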
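The flow in the diagram can be sketched as a small Java pipeline in which nothing reaches the collection without passing the approval step. All names here (`IndexBuildPipeline`, `Doc`, `collect`, `approveAndIndex`) are hypothetical, a `Map` stands in for the real index, and dropping rejected documents from the in-basket is an assumption of this sketch.

```java
import java.util.*;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical sketch of the index build flow; not a PSE API. */
final class IndexBuildPipeline {
    /** A crawled page plus the metadata that text analysis injects into it. */
    static final class Doc {
        final String url;
        final String text;
        final Map<String, String> metadata = new LinkedHashMap<>();
        Doc(String url, String text) { this.url = url; this.text = text; }
    }

    private final Predicate<String> urlFilter;                 // crawler filter: which URLs to keep
    private final List<Consumer<Doc>> analyzers;               // e.g. categorizer, summarizer
    private final Predicate<Doc> approver;                     // approval workflow decision
    final List<Doc> inBasket = new ArrayList<>();              // analyzed content awaiting approval
    final Map<String, Doc> collection = new LinkedHashMap<>(); // stand-in for the real index

    IndexBuildPipeline(Predicate<String> urlFilter, List<Consumer<Doc>> analyzers, Predicate<Doc> approver) {
        this.urlFilter = urlFilter;
        this.analyzers = analyzers;
        this.approver = approver;
    }

    /** Crawl results in: filter, run every analyzer to inject metadata, park in the in-basket. */
    void collect(Map<String, String> crawledPages) {
        crawledPages.forEach((url, text) -> {
            if (!urlFilter.test(url)) return; // filtered out before any analysis
            Doc doc = new Doc(url, text);
            analyzers.forEach(a -> a.accept(doc));
            inBasket.add(doc);
        });
    }

    /** Approval workflow: approved documents go to the indexer; rejected ones are dropped (assumption). */
    int approveAndIndex() {
        int indexed = 0;
        for (Iterator<Doc> it = inBasket.iterator(); it.hasNext(); ) {
            Doc doc = it.next();
            if (approver.test(doc)) {
                collection.put(doc.url, doc);
                indexed++;
            }
            it.remove();
        }
        return indexed;
    }
}
```

Keeping approval as a separate step means a rejected page never reaches the collection, which is the role the diagram gives to the “approved set of content”.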
Creating a new document collection is easy
• Select the ‘Manage search collections’ portlet
• Create a new collection
• Specify a web site to collect information/content from
• Click the ‘Start collecting’ icon or link to start the index build process
• The processing status and the index status are shown at the bottom of the portlet for:
  o the selected site
  o the selected collection (index)
A look at the Manage Collections portlet
Manage Collections Portlet – Options and Status
Select ‘Portal Settings’, then ‘Manage Search Index’
End user – Search portlet – detailed view
Administration – advanced options
Portlet for defining a new collection
Administration – advanced options
Portlet for defining a new site
Administration – advanced options
Portlet for defining a schedule for periodic indexing of a site
Administration – advanced options
Portlet for defining filters for sites
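Site filters typically decide which URLs the crawler may follow. The slide shows only the portlet, so the semantics below are assumptions: a hypothetical `SiteUrlFilter` with `*` wildcard include and exclude rules, where a matching exclude rule always wins and an empty include list accepts everything.

```java
import java.util.*;
import java.util.regex.Pattern;

/** Hypothetical crawl filter for one site: wildcard include/exclude rules, exclusion wins. */
final class SiteUrlFilter {
    private final List<Pattern> includes = new ArrayList<>();
    private final List<Pattern> excludes = new ArrayList<>();

    SiteUrlFilter include(String wildcard) { includes.add(toRegex(wildcard)); return this; }
    SiteUrlFilter exclude(String wildcard) { excludes.add(toRegex(wildcard)); return this; }

    /** Crawl a URL if it matches no exclude rule and at least one include rule (or none are defined). */
    boolean accepts(String url) {
        for (Pattern p : excludes) if (p.matcher(url).matches()) return false;
        if (includes.isEmpty()) return true;
        for (Pattern p : includes) if (p.matcher(url).matches()) return true;
        return false;
    }

    /** Turn each '*' into ".*" and quote every other character literally. */
    private static Pattern toRegex(String wildcard) {
        String[] parts = wildcard.split("\\*", -1); // -1 keeps a trailing empty part after a final '*'
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) re.append(".*");
            re.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(re.toString());
    }
}
```

For instance, including `http://www.example.com/*` while excluding `*/private/*` keeps the crawl on one host and out of one subtree.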
Administration – advanced options
Portlet for defining destination categories for the site
Administration – advanced options
‘Browse document’ portlet
Administration – advanced options
‘Search’ portlet
Administration – advanced options
‘Advanced search’ portlet
Usage example
Goal: provide a community of users with information about competitors in the market
How: catalog news articles and related material from external websites
Additional steps to take:
When creating a collection, select “User-defined” from the taxonomy pull-down
From the main administration portlet choose “Category tree” in the Manage Collections frame
Category Tree portlet
• Build the taxonomy tree
• Then go to ‘Manage Rules’ to define the rules for each category
What the rule set looks like
• A ‘rule’ is essentially the search query one would use to find the documents that belong in a category
• You can use ‘+’, ‘-’, ‘"’ (quotes) and ‘*’ within a rule
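A sketch of how such rules could drive a categorizer follows. The slide lists the operators but not their exact semantics, so this example assumes common web-search conventions: `+word` must appear, `-word` must not, a quoted phrase must appear as consecutive words, and `*` matches any run of characters within a word. A rule matches when all `+` clauses are present, no `-` clause is present, and at least one positive clause is found. `RuleCategorizer` is hypothetical; the slash-separated category paths merely stand in for the taxonomy tree built in the Category Tree portlet.

```java
import java.util.*;
import java.util.regex.*;

/** Hypothetical categorizer; the rule semantics are assumptions, not documented PSE behavior. */
final class RuleCategorizer {
    // A clause is an optional '+'/'-' operator followed by a quoted phrase or a single word.
    private static final Pattern CLAUSE = Pattern.compile("([+-]?)(\"[^\"]*\"|\\S+)");
    private final Map<String, String> rules = new LinkedHashMap<>(); // category path -> rule

    void addRule(String categoryPath, String rule) { rules.put(categoryPath, rule); }

    /** Returns every category whose rule matches; a document may fall into several categories. */
    List<String> categorize(String text) {
        List<String> words = split(text, "[^\\p{L}\\p{N}]+");
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : rules.entrySet()) {
            if (matches(e.getValue(), words)) hits.add(e.getKey());
        }
        return hits;
    }

    static boolean matches(String rule, List<String> words) {
        boolean anyPositive = false;
        Matcher m = CLAUSE.matcher(rule);
        while (m.find()) {
            // Keep '*' inside clause words so it can act as a wildcard.
            List<String> phrase = split(m.group(2).replace("\"", ""), "[^\\p{L}\\p{N}*]+");
            if (phrase.isEmpty()) continue;
            boolean found = containsPhrase(words, phrase);
            switch (m.group(1)) {
                case "+": if (!found) return false; anyPositive = true; break;
                case "-": if (found) return false; break;
                default:  anyPositive |= found;
            }
        }
        return anyPositive;
    }

    /** True if the phrase words occur consecutively somewhere in the document. */
    private static boolean containsPhrase(List<String> words, List<String> phrase) {
        for (int i = 0; i + phrase.size() <= words.size(); i++) {
            int j = 0;
            while (j < phrase.size() && wordMatches(phrase.get(j), words.get(i + j))) j++;
            if (j == phrase.size()) return true;
        }
        return false;
    }

    /** '*' matches any run of characters within one word, e.g. compet* matches competitor. */
    private static boolean wordMatches(String pattern, String word) {
        if (pattern.indexOf('*') < 0) return pattern.equals(word);
        String[] parts = pattern.split("\\*", -1);
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) re.append(".*");
            re.append(Pattern.quote(parts[i]));
        }
        return word.matches(re.toString());
    }

    private static List<String> split(String s, String delimiterRegex) {
        List<String> out = new ArrayList<>();
        for (String t : s.toLowerCase(Locale.ROOT).split(delimiterRegex)) if (!t.isEmpty()) out.add(t);
        return out;
    }
}
```

Under these assumptions a rule such as `+compet* launch release -rumor` assigns a news item about a competitor's release to the category, unless the item also mentions a rumor.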
Last step – assign categories to each website
Result: search and browse
Planning numbers, performance
• Index size: approximately 40% to 60% of the textual content size of the indexed documents/pages
• Indexing throughput: a crawling/indexing rate of 100 to 200 documents per minute
• Search responsiveness: a search result page is typically complete and ready for transmission in less than 0.5 seconds
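The rules of thumb above translate directly into a back-of-the-envelope calculation. `PseCapacityEstimate` is a hypothetical helper that uses only the slide's figures (an index of 40% to 60% of the textual content size, 100 to 200 documents per minute) and ignores every other factor that affects a real deployment.

```java
/** Hypothetical planning helper built only from the rule-of-thumb figures above. */
final class PseCapacityEstimate {
    static final double INDEX_RATIO_LOW = 0.40, INDEX_RATIO_HIGH = 0.60; // index size / textual content size
    static final double DOCS_PER_MIN_LOW = 100, DOCS_PER_MIN_HIGH = 200; // crawl + index throughput

    /** Returns {low, high} index size, in the same unit as textSize. */
    static double[] indexSize(double textSize) {
        return new double[] {textSize * INDEX_RATIO_LOW, textSize * INDEX_RATIO_HIGH};
    }

    /** Returns {fastest, slowest} build time in minutes for the given number of documents. */
    static double[] buildMinutes(long documents) {
        return new double[] {documents / DOCS_PER_MIN_HIGH, documents / DOCS_PER_MIN_LOW};
    }
}
```

For example, 2 GB of extracted text suggests an index of roughly 0.8 to 1.2 GB, and a 60,000-page collection suggests 300 to 600 minutes (5 to 10 hours) for a full crawl and index.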
Additional Information and Resources
IBM Resources:
• WebSphere Portal - http://www-3.ibm.com/software/genservers/portal/
• WebSphere Portal Catalog - http://www-3.ibm.com/software/genservers/portal/portlet/catalog
• WebSphere Portal Developer’s Zone - http://www-106.ibm.com/developerworks/websphere/zones/portal/
• WebSphere Portal Toolkit - http://www-3.ibm.com/software/info1/websphere/index.jsp?tab=products/portaltoolkit
• Documentation - http://www-3.ibm.com/software/genservers/portal/library/
• Education - http://www-3.ibm.com/software/genservers/portal/education/
• WebSphere Commerce Portal - http://www-3.ibm.com/software/genservers/commerce/portal/
• IBM Lotus Workplace - http://www.lotus.com/engine/jumpages.nsf/wdocs/ondemand