IBM Software Group
© 2004 IBM Corporation
WebSphere Information Integrator Content Edition and OmniFind Technical Overview
IBM Software Group | WebSphere software
WebSphere Information Integrator Content Edition
• The Problem and the Solution
• Integration and Connectors
• Federation Services
• Developer and End-user Services
• Portfolio Integrations
• Questions
The Problem: Multiple Silos of Content
[Chart: number of content repositories in use. Response categories: 1 repository, 2-5, 6-10, 10-15, more than 15, don't know. Source: “The Future of Content in the Enterprise,” Connie Moore and Robert Markham. Base: 81 North American decision-makers (multiple responses accepted).]
Multiple Content Sources Complicate Enterprise Initiatives
[Diagram: business functions (Customer Service, Marketing, Customers & Partners, Legal, HR, R&D, Finance, Sales & Support) and enterprise initiatives (Self-Service, Compliance, Call Center, CRM / ERP, Websites) each drawing on separate content sources: Imaging/Document Mgmt, Report Mgmt, Web Content/Media Asset Mgmt, Database/Custom Systems, Network File Systems, Workflow/Business Process Mgmt]
The Solution: II Content Edition
[Diagram: the same business functions, initiatives and content sources, now connected through a single Information Integrator Content Edition layer]
The Solution: II Content Edition (formerly known as VeniceBridge)
II Content Edition Functionality
• Real-time, actionable search
  - Federated search
  - IICE-enabled indexed search
• “Portal” application
  - Single access point for all enterprise assets
  - Customer or partner access to relevant content for self-service
• Virtual records management
  - Manage retention of all enterprise content
• Enterprise Information Integration (EII)
  - Complete view of all information about a specific topic, object, project or process
• Production workflow
  - Access all the required content to enable people to perform actions in a business process
The Tao of II Content Edition
• Every user is known to the repository
  - Repositories handle their own authentication and authorization
  - This is sometimes supplemented, but NEVER bypassed
  - Every II CE user has named sessions with the repositories being accessed
• Repositories can only do what they can do
  - II CE doesn't generally compensate for missing features in repositories
  - II CE offers a superset of functionality
  - Repository profiles describe the capabilities of individual repositories
• Users know of only abstract repositories
  - There is never an API class or method specific to a repository
  - Users do know that multiple repositories are being dealt with
  - Repositories are abstracted, not virtualized (only exception: Virtual Repositories)
• Content and metadata are never stored by II CE
  - II CE is not a content management system; it doesn't have a repository
  - Using II CE doesn't require replicating your content or metadata
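To make the “repositories can only do what they can do” principle concrete, here is a minimal Java sketch of a client consulting a repository profile before attempting an optional operation. `Capability`, `RepositoryProfile` and `ProfileCheck` are hypothetical names invented for illustration, not II CE API classes, and the capability list is an assumption.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical capability flags; the real II CE profile model may differ.
enum Capability { VERSIONING, CHECKOUT, ANNOTATIONS, FULL_TEXT_SEARCH, WORKFLOW }

// A profile simply describes what one repository can do.
final class RepositoryProfile {
    private final String repositoryName;
    private final Set<Capability> capabilities;

    RepositoryProfile(String repositoryName, Set<Capability> capabilities) {
        this.repositoryName = repositoryName;
        // Defensive copy; EnumSet.copyOf rejects an empty non-EnumSet collection, hence the guard.
        this.capabilities = capabilities.isEmpty()
                ? EnumSet.noneOf(Capability.class)
                : EnumSet.copyOf(capabilities);
    }

    boolean supports(Capability c) { return capabilities.contains(c); }

    String name() { return repositoryName; }
}

final class ProfileCheck {
    // Reports the missing feature instead of emulating it on the repository's behalf.
    static String checkOut(RepositoryProfile profile, String itemId) {
        if (!profile.supports(Capability.CHECKOUT)) {
            return profile.name() + " does not support checkout; " + itemId + " left unchanged";
        }
        return "checked out " + itemId + " from " + profile.name();
    }
}
```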
[Architecture diagram: II Content Edition hosted in WebSphere Application Server
• Developer and End User Services: Web Components, Web Services API (SOAP), Java API, URL Addressability; used by the Web Client, Enterprise Applications and Custom Applications
• Federation Services: Federated Search, Virtual Repositories, Metadata Mapping, View Services, Authentication/Security, Subscription Event Services, Subscriptions
• Admin Tools
• Integration Services: Access Services, Session Pools, Connector Service Provider Interface (SPI); connectors run locally, behind the Connector RMI Proxy, or behind the Web Services Proxy, each attached to a data source]
Integration Services – Unified Content Model
Content Functions
• CRUD
  - Metadata
  - Native content
• Content classes (meta-metadata)
• Check-in / checkout
• Versioning, version history
• Security
• CRUD: renditions, annotations, compound documents
Folder Hierarchy Functions
• CRUD
• Content classes
• Folder contents
• Folders filed in
• Security
Integration Services – Unified Workflow Model
Work Item Functions
• CRUD: work items, attachments
• Work item classes (meta-metadata)
• Get history
• Complete
• Lock/unlock
• Suspend
• Resume/reassign
• Ad-hoc route
Workflow Functions
• Queue enumeration
• Groups for a queue
• Users for a queue
• Queue contents
• Security
• CRUD
Integration Services - Connectors
• Connectors do the work to support the Integration Services content and workflow models
• Out-of-the-box connectors to many sources (~20)
• Numerous options for distributing connectors
  - Remote EJBs
  - RMI connector
  - Web Services connector
• Session pooling
  - Configurable stand-by pool of repository sessions improves performance and scalability
  - Supports named and generic users
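A stand-by session pool that still honors named users can be sketched as below. This is a minimal, hypothetical illustration (`RepoSession`, `SessionPool` and the `maxIdlePerUser` limit are not II CE names): idle sessions are keyed by user, so a session opened for one identity is never handed to another, and a generic user is assumed to be simply one more shared key.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Stand-in for a repository session; a real one would hold a connection and credentials.
final class RepoSession {
    final String user;
    final int serial;
    RepoSession(String user, int serial) { this.user = user; this.serial = serial; }
}

// Identity-aware pool: idle sessions are only handed back to the same user,
// so the repository still sees a named user on every call.
final class SessionPool {
    private final Map<String, Deque<RepoSession>> idle = new HashMap<>();
    private final Function<String, RepoSession> opener; // performs the repository login
    private final int maxIdlePerUser;

    SessionPool(Function<String, RepoSession> opener, int maxIdlePerUser) {
        this.opener = opener;
        this.maxIdlePerUser = maxIdlePerUser;
    }

    synchronized RepoSession acquire(String user) {
        Deque<RepoSession> q = idle.get(user);
        if (q != null && !q.isEmpty()) {
            return q.pop(); // reuse a stand-by session for this identity
        }
        return opener.apply(user); // otherwise open a new session
    }

    synchronized void release(RepoSession s) {
        Deque<RepoSession> q = idle.computeIfAbsent(s.user, u -> new ArrayDeque<>());
        if (q.size() < maxIdlePerUser) {
            q.push(s); // keep it on stand-by
        }
        // Otherwise the session is dropped; a real pool would also log it out.
    }

    synchronized int idleCount(String user) {
        Deque<RepoSession> q = idle.get(user);
        return q == null ? 0 : q.size();
    }
}
```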
Integration Services - Connectors
Connector for Documentum
• Documentum 4i and workflow
• Documentum 5 and workflow
Connector for IBM
• DB2 Content Manager 8
• DB2 CM OnDemand
• WebSphere MQ Workflow
• WebSphere Portal Doc. Mgr.
• Lotus Notes
• Lotus Domino.Doc
Connector for FileNet
• FileNet Image Services and workflow
• FileNet Content Services
• FileNet Image Services Resource Adapter (ISRA)
• FileNet P8 Content Manager
• FileNet P8 BPM
Connector for Microsoft
• Microsoft Index Server/NTFS
• Microsoft SharePoint Services
Connector for OpenText
• OpenText Livelink
Connector for Stellent
• Stellent Content Server
Connector for Interwoven
• Interwoven TeamSite
Connector for Hummingbird
• Hummingbird Enterprise 2004 DM 5.1
Chargeable part numbers
RDBMS
• DB2 UDB
• Oracle
• Others through II Federation
Others
• Microsoft NTFS
• File system (sample)
Integration Services - Connectors
Connector SDK
• Same toolkit used to build all of our connectors
• Design is complete (mapping is what's required!)
• ~25 content methods, ~15 workflow methods
• Repository profile to simplify development
• Simple Java classes
• Robust J2EE benefits
  - Immediate leverage by platform
  - No platform bleed into connector
• 100% forward compatibility
• Excellent docs / examples
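The point that “mapping is what's required” can be pictured with a drastically reduced connector. The `ContentConnector` interface below is hypothetical, with three methods standing in for the SDK's roughly 25 content methods, and `InMemoryConnector` uses a `HashMap` in place of a real repository's native API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, heavily reduced connector SPI; not the actual II CE SDK interface.
interface ContentConnector {
    String create(Map<String, String> metadata, byte[] content);
    Optional<Map<String, String>> readMetadata(String id);
    boolean delete(String id);
}

// Toy connector: the work is translating SPI calls to a native store,
// here a HashMap standing in for a repository's own API.
final class InMemoryConnector implements ContentConnector {
    private final Map<String, Map<String, String>> metadataById = new HashMap<>();
    private final Map<String, byte[]> contentById = new HashMap<>();
    private int nextId = 1;

    @Override
    public String create(Map<String, String> metadata, byte[] content) {
        String id = "doc-" + nextId++;
        metadataById.put(id, new HashMap<>(metadata)); // copy so callers cannot mutate stored data
        contentById.put(id, content.clone());
        return id;
    }

    @Override
    public Optional<Map<String, String>> readMetadata(String id) {
        Map<String, String> md = metadataById.get(id);
        return md == null ? Optional.empty() : Optional.of(new HashMap<>(md));
    }

    @Override
    public boolean delete(String id) {
        contentById.remove(id);
        return metadataById.remove(id) != null;
    }
}
```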
Federation Services
Federated Search
• Cross-repository, data-mapped searches
• Metadata and full-text searches
• Parallel search, single unified result set
• “Actionable” results
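Parallel search with a single unified result set can be sketched with the JDK's `ExecutorService`. Each source is a plain `Function` standing in for a connector call; `Hit`, `FederatedSearch` and the merge-by-score ordering are assumptions for illustration, since scores from different repositories generally need normalizing before they can be compared.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// One hit from one repository; scores are assumed to be comparable across sources.
final class Hit {
    final String repository, id;
    final double score;
    Hit(String repository, String id, double score) { this.repository = repository; this.id = id; this.score = score; }
}

final class FederatedSearch {
    static List<Hit> search(String query, List<Function<String, List<Hit>>> sources) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, sources.size()));
        try {
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Function<String, List<Hit>> source : sources) {
                pending.add(pool.submit(() -> source.apply(query))); // fan out in parallel
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : pending) {
                try {
                    merged.addAll(f.get()); // wait for each repository in turn
                } catch (ExecutionException e) {
                    // A failing repository is skipped rather than failing the whole search.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return merged; // single unified result set, best score first
        } finally {
            pool.shutdownNow();
        }
    }
}
```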
Virtual Repositories
• Create virtual repositories of all the content and work items related to a specific project, process, business object or topic
• Works at many different levels of granularity
• Virtual repositories can contain links to:
  - Content, work items, folders, queues, virtual folders, smart folders, URLs, custom objects
• Supplemental metadata and security
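A virtual repository holds links rather than copies. The hypothetical `ItemRef` and `VirtualFolder` classes below show one way to keep cross-repository links together with supplemental metadata; the class names and the repository and item identifiers are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// A link to an item that stays in its native repository; nothing is copied.
final class ItemRef {
    final String repository, itemId;
    ItemRef(String repository, String itemId) { this.repository = repository; this.itemId = itemId; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof ItemRef)) return false;
        ItemRef r = (ItemRef) o;
        return repository.equals(r.repository) && itemId.equals(r.itemId);
    }

    @Override public int hashCode() { return Objects.hash(repository, itemId); }
}

// Virtual folder for one project or topic: links plus supplemental metadata kept beside them.
final class VirtualFolder {
    final String name;
    private final List<ItemRef> links = new ArrayList<>();
    private final Map<ItemRef, Map<String, String>> supplemental = new HashMap<>();

    VirtualFolder(String name) { this.name = name; }

    // Adds the link once; later calls only merge extra metadata.
    void link(ItemRef ref, Map<String, String> extraMetadata) {
        if (!links.contains(ref)) links.add(ref);
        supplemental.computeIfAbsent(ref, r -> new HashMap<>()).putAll(extraMetadata);
    }

    List<ItemRef> links() { return Collections.unmodifiableList(links); }

    String supplementalValue(ItemRef ref, String key) {
        return supplemental.getOrDefault(ref, Map.of()).get(key);
    }
}
```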
Federation Services
Metadata mapping (data maps)
• Discovery of content and workflow classes (schemas)
• Map disparate indexing schemes across repositories
  - Effective for search and CRUD
  - Named data maps for specific applications or uses
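A named data map can be thought of as a table from one federated attribute name to each repository's native name. This sketch (hypothetical `DataMap` class; the attribute names are invented) rewrites a federated query for one repository and, as an assumed design choice, drops attributes that repository has no mapping for rather than guessing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical named data map; not the II CE data map API.
final class DataMap {
    final String name;
    // federated attribute -> (repository -> native attribute)
    private final Map<String, Map<String, String>> mappings = new HashMap<>();

    DataMap(String name) { this.name = name; }

    DataMap map(String federated, String repository, String nativeName) {
        mappings.computeIfAbsent(federated, f -> new HashMap<>()).put(repository, nativeName);
        return this; // allows chained definitions
    }

    // Rewrites a federated query (attribute -> value) into one repository's native names.
    Map<String, String> toNative(String repository, Map<String, String> federatedQuery) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> e : federatedQuery.entrySet()) {
            Map<String, String> perRepo = mappings.get(e.getKey());
            if (perRepo != null && perRepo.containsKey(repository)) {
                result.put(perRepo.get(repository), e.getValue());
            }
            // Unmapped attributes are skipped for this repository.
        }
        return result;
    }
}
```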
Authentication and single sign-on
• Authenticate once, access all repositories
• Still a specific named user
• Multiple directory services supported
  - LDAP
  - Active Directory
  - Embedded
Federation Services
View Services
• Server-side conversion
  - Electronic document to HTML
  - Image conversion and processing
• Client-side Java viewer component (JavaBean)
  - Image viewing and processing
  - Annotations
  - Printing
  - Signed Java applet
Federation Services
Subscription Event Services
• Automated change notifications on content, searches and workflow
• Items are subscribed to, then monitored for change
• Events can be handled by custom handlers (fax, e-mail, workflow, synchronization)
• API, framework
• Possible use cases:
  - Content integration: services to facilitate synchronization between repositories
  - Portal: portlet interfaces to provide subscription notification of changes to specific documents or work items
  - Collaboration: when a final contract is received, e-mail notifications are sent to stakeholders
  - Publishing: notifications to agents, brokers and end users on policy addendums via a multi-channel strategy (e-mail, workflow, fax, etc.)
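The subscribe-then-monitor flow can be illustrated with a small polling monitor. Everything here is hypothetical: `ChangeHandler` and `SubscriptionMonitor` are not II CE classes, a version number is assumed to be how a change is detected, and the handler in the test only records a message where a real one would send e-mail or a fax.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Handler invoked when a subscribed item changes.
interface ChangeHandler {
    void onChange(String itemId, long oldVersion, long newVersion);
}

// Polling monitor: remembers the last version seen per subscribed item and
// notifies every handler when the repository reports a different one.
final class SubscriptionMonitor {
    private final Function<String, Long> currentVersion; // stands in for a repository lookup
    private final Map<String, Long> lastSeen = new HashMap<>();
    private final Map<String, List<ChangeHandler>> handlers = new HashMap<>();

    SubscriptionMonitor(Function<String, Long> currentVersion) { this.currentVersion = currentVersion; }

    void subscribe(String itemId, ChangeHandler handler) {
        lastSeen.putIfAbsent(itemId, currentVersion.apply(itemId)); // baseline at subscribe time
        handlers.computeIfAbsent(itemId, k -> new ArrayList<>()).add(handler);
    }

    // One polling pass; a real service would run this on a schedule. Returns the number of changed items.
    int poll() {
        int changes = 0;
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            long now = currentVersion.apply(e.getKey());
            long before = e.getValue();
            if (now != before) {
                for (ChangeHandler h : handlers.get(e.getKey())) h.onChange(e.getKey(), before, now);
                e.setValue(now);
                changes++;
            }
        }
        return changes;
    }
}
```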
Developer & End-user Services
• Applications can work with IICE at any of these levels:
  - Programmatic (API)
  - Loosely coupled (Web Services)
  - UI only (Web Components)
  - By links (URL Addressability)
• The IICE web client requires no development effort
  - It can be set up quickly to work with any number of repositories
  - It can be customized in different ways for different users, if desired
Developer Services – Integration Options
Java API
• Rich object-oriented content management API
• Abstracts all J2EE and system architecture complexity
• Many good examples included
• Finally, developers can write generic, platform-independent ECM applications!
Web Services
• All the key integration capabilities are also available through SOAP
• Supports both Java and .NET clients
Developer Services – Options (continued)
Web components
• Set of 20+ rich web UI components
• Building blocks for customizing applications or building custom applications
• J2EE-based: MVC, Struts, JSP, XSLT, JSR 168
• Components cooperate in coordinated component groups using a shared event model
• Create new custom components
URL addressability
• Create very loosely coupled applications using II Content Edition
• Use in Web applications, send in e-mail
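URL addressability relies on links that survive being pasted into e-mail, which means query parameters must be URL-encoded. The `/view?repo=<repository>&id=<item>` layout below is a made-up format for illustration, not the actual II CE URL scheme; only `java.net.URLEncoder` is a real API here.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ItemLinks {
    // Builds <base>/view?repo=<repository>&id=<itemId>; the path and parameter names are hypothetical.
    static String viewLink(String baseUrl, String repository, String itemId) {
        return baseUrl + "/view?repo=" + encode(repository) + "&id=" + encode(itemId);
    }

    // Form-encodes a value so spaces, '&' and '/' cannot break the query string.
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

For example, a repository named `Content Manager` and an item ID `A&B/7` become `repo=Content+Manager&id=A%26B%2F7`.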
Security
• Authentication and authorization are controlled by the underlying data sources; their security model is respected at all times
  - Sessions are created for each “user” with each data source by the data source's authentication mechanism
  - Authorization of access to the data source is then controlled for that session by the data source
  - Having a single user for all users of a data source is discouraged
• IICE provides supplemental security services:
  - Single sign-on system
  - Supplemental authorization system
  - Identity-aware session pooling
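The rule that native security is supplemented but never bypassed can be expressed as an ordering of checks. In this minimal sketch, the hypothetical `Authorizer` consults the repository's own decision first, and the supplemental layer is assumed to be deny-only, so it can narrow access but never grant what the repository refused.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical combination of native and supplemental authorization.
final class Authorizer {
    private final BiPredicate<String, String> repositoryAllows;     // (user, itemId) -> native ACL result
    private final Set<String> supplementalDenied = new HashSet<>(); // "user|itemId" keys

    Authorizer(BiPredicate<String, String> repositoryAllows) { this.repositoryAllows = repositoryAllows; }

    void denySupplemental(String user, String itemId) { supplementalDenied.add(user + "|" + itemId); }

    boolean canRead(String user, String itemId) {
        if (!repositoryAllows.test(user, itemId)) return false; // the repository's decision always wins
        return !supplementalDenied.contains(user + "|" + itemId); // supplemental rules can only narrow
    }
}
```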
Administration and Deployment
Administration
• Centralized configuration and logging
• Remote graphical and web-based administration
• Dynamic configuration (never take WebSphere down!)
• JMX-based administration of subscription event services
System architecture and deployment options
• Scale from one to dozens of servers
• Leverage J2EE application servers
  - Load balancing
  - Fault tolerance
• Network protocol options
  - EJB-to-EJB
  - RMI
  - SOAP
IICE Architecture
[Diagram. Legend: customer code; 3rd party / J2EE application server; II Content Edition platform; source licensed to customer; 3rd party repository.
• J2EE application server, servlet container: web application with a template processor (JSP/XSLT) over II CE services; DB2 II CE viewer servlet and viewer applet; administrator tool; applications using the Java API or WSDL
• EJB container: Access Services, View Services server, result set, logging, configuration
• Connectors: Connector 1 reaches Repository 1 directly; other connectors are reached through an RMI connector proxy and RMI connector proxy server, or through a SOAP connector proxy over Apache SOAP; each connector calls its repository's own API]
Scaling Model – Single server or multi-server
[Diagram. Single server: an application uses the II Content Ed. API; Access Services and the connectors run on one server and reach each repository. Multi server: several applications use the II Content Ed. API against multiple II Content Ed. servers, whose connectors reach multiple repositories.]
Scaling mechanisms:
• J2EE EJB clustering
• J2EE servlet clustering
• RMI connector pooling
• Web Services load balancing
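In a WebSphere deployment, clustering and load balancing come from the application server itself, so the following is not that mechanism. It is a hypothetical client-side sketch of the same idea (`RoundRobinClient` is an invented name): round-robin dispatch over several servers, failing over to the next server when a call throws.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Round-robin dispatch with simple failover; each Function stands in for a remote server call.
final class RoundRobinClient<R> {
    private final List<Function<String, R>> servers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinClient(List<Function<String, R>> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("no servers");
        this.servers = List.copyOf(servers);
    }

    R call(String request) {
        int start = Math.floorMod(next.getAndIncrement(), servers.size()); // rotate the starting server
        RuntimeException last = null;
        for (int i = 0; i < servers.size(); i++) {
            Function<String, R> server = servers.get((start + i) % servers.size());
            try {
                return server.apply(request);
            } catch (RuntimeException e) {
                last = e; // server unavailable: fail over to the next one
            }
        }
        throw new IllegalStateException("all servers failed", last);
    }
}
```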
II Content Edition and the II Portfolio
Information Integrator OmniFind Edition crawler
• Included with OmniFind
• Index and search enterprise content in the following repositories: FileNet, Documentum, Hummingbird
Information Integrator Content Edition wrapper
• Wrapper included with IICE
• Access unstructured content from the WebSphere II federated server
• Wrapper based on DB2 II 8.2's Java wrapper SDK
RDBMS connector for IICE
• Read-only access to database tables (columns appear as attributes)
• DB2 UDB V8.2 or higher, Oracle 10g, and others through II federation
II Content Edition and WebSphere
WebSphere Application Server
• II Content Edition is a native J2EE application hosted entirely in WebSphere
• Leverages WebSphere for fault tolerance and load balancing (clustering)
WebSphere Portal
• JSR 168-based integration between II Content Edition web components and WebSphere Portal
• The II Content Edition web client can be hosted in WebSphere or WebSphere Portal
Portal environments supported by IICE: BEA WebLogic, IBM WebSphere and Microsoft SharePoint
What about JSR-170?
What is it?
• A standard for Java access to content repositories
• Not yet widely supported, but that may change
JSR-170 is an incomplete standard
• Important ECM functionality is not covered (for example, no support for workflow)
• IBM is part of the group working to address this in a future version
• Only covers access to a single repository; no federation capabilities
• Originally intended only as a standard for accessing content within web sites
JSR-170 and IICE
• Currently no support for JSR-170 in IICE (or any other IBM CM product)
• However, since a JSR-170 store is just another type of repository, an IICE connector could easily be written for it if the standard ever becomes popular
How to Differentiate ECI and Search Opportunities
Enterprise Search
• One-way access
• Retrieval and display
• Casual access; generally part of a knowledge management strategy
• Indexes underlying content
• Focused on speed/quality of results
• Always about leaving content in place
Enterprise Content Integration
• Bi-directional access
• Retrieval, display, full CRUD, conversion, browse, foldering, workflow, etc.
• Deeper access to content; generally part of a production app/workflow
• Accesses native content in real time
• Focused on full unification of systems
• Sometimes also about migrating content
Why Enterprise Search Matters
Business users need to find information quickly and easily
• High-quality end-user search
• Great diversity in end-user search requirements
IT managers need a framework to integrate unstructured information
• Easy to install, configure and manage
• Enterprise scale
• Total cost of ownership: purchase cost, administration cost
Enterprise application developers need to
• Integrate into existing portals and applications
• Develop with existing tools using existing skills
Information Management
UIMA: A new standard for content processing and text analysis
• Defines a common interface for integrating text analysis modules
  - Enables interoperability of different analytics solutions and enterprise applications
• Provides an SDK for building and composing text analytics
  - Enables development of new components and re-use of existing components for analysis
[Diagram: text analysis modules, aka “annotators” (Identify Language, Find Words & Roots, Categorization, Named-Entity Extraction, Identify Relationships), produce extracted metadata and facts that feed a text database, a search index and applications.]
Identify relevant entities → build structure:
• People, places, organizations, relationships
• Parts, problems, conditions
• Topics, products, interests, sentiment
• Times, events, threats, plots, associations
UIMA Component Architecture
Key Concepts
• The Common Analysis Structure (“CAS”) enables pluggable annotators
• Annotators can leverage the CAS to build on each other
• Different annotators are relevant for different collections
• Analysis results can be sent to multiple “consumers”
[Diagram: Collection Processing Engine (CPE). A collection reader takes in text, chat, e-mail, audio and video; a CAS initializer populates the CAS; an aggregate analysis engine runs analysis engines, each wrapping an annotator, over the CAS; CAS consumers send the results to ontologies, search engine indexes, databases and knowledge bases.]
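The way annotators build on each other through a shared CAS can be shown without the real framework. The classes below (`Cas`, `Annotation`, `Annotator`, `Tokenizer`, `CapitalizedNameAnnotator`, `Pipeline`) are a hypothetical, self-contained imitation and do not use the actual UIMA SDK API; the second annotator reads the first one's `Token` annotations instead of re-scanning the text, and its capitalization rule is deliberately naive.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for a CAS: the document text plus typed annotations with character offsets.
final class Cas {
    final String text;
    final List<Annotation> annotations = new ArrayList<>();
    Cas(String text) { this.text = text; }

    List<Annotation> select(String type) {
        List<Annotation> out = new ArrayList<>();
        for (Annotation a : annotations) if (a.type.equals(type)) out.add(a);
        return out;
    }
}

final class Annotation {
    final String type;
    final int begin, end;
    Annotation(String type, int begin, int end) { this.type = type; this.begin = begin; this.end = end; }
}

interface Annotator { void process(Cas cas); }

// Annotator 1: marks whitespace-separated tokens.
final class Tokenizer implements Annotator {
    public void process(Cas cas) {
        int i = 0, n = cas.text.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(cas.text.charAt(i))) i++;
            int start = i;
            while (i < n && !Character.isWhitespace(cas.text.charAt(i))) i++;
            if (i > start) cas.annotations.add(new Annotation("Token", start, i));
        }
    }
}

// Annotator 2 builds on annotator 1: tags capitalized tokens as candidate names.
final class CapitalizedNameAnnotator implements Annotator {
    public void process(Cas cas) {
        for (Annotation t : cas.select("Token")) {
            if (Character.isUpperCase(cas.text.charAt(t.begin))) {
                cas.annotations.add(new Annotation("Name", t.begin, t.end));
            }
        }
    }
}

final class Pipeline {
    // Runs annotators in order over one shared CAS, like an aggregate analysis engine.
    static Cas run(String text, List<Annotator> annotators) {
        Cas cas = new Cas(text);
        for (Annotator a : annotators) a.process(cas);
        return cas;
    }
}
```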
WebSphere Information Integrator OmniFind Edition
• Delivers enhanced results with sub-second response
  - Sophisticated relevancy algorithms for corporate content
• Scales for large collections or enterprises
  - Up to 20 million documents
  - 1000s of concurrent users
• Fits easily into enterprise applications
  - Java APIs
  - Document-level security
• Eases administration and maintenance
  - Analysis features all under the covers
OmniFind Key Technologies
• Enterprise content crawling: scalable Web crawler, data source crawlers, content push
• Parsing/tokenizing: HTML/XML, 200+ document filters, advanced linguistics
• Categorization: taxonomy, rule-based
• Annotation: text analytics plug-in
• Indexing: global analysis, static ranking, store
• Searching: dynamic ranking, fielded search, dynamic summary, parametric search, spell checking
• Search collections
• Security
Search Quality: State-of-the-Art Ranking
Dynamic, term-based factors
• (term freq) x (1/doc freq)
• Lexical affinities
• Where the term is found: title, body, anchor text
• Weight of text: bold, italic, relative font size
Static or document-based factors
• Metadata
• Links
• URLs
• Duplicate detection
Factor weighting dynamically adjusted based on the type of query
• Navigational (e.g., “HR”): anchor text weighted higher, link analysis, …
• Informational (e.g., “Changing intranet password”): term frequency, lexical affinity, …
Search quality tuned to collection type
• Intranet (linked documents)
• Based on date (newsgroup, document currency)
• Document collection
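The term-based factors and query-type weighting above can be combined in a small scoring sketch. `Ranker` and its field weights are assumptions for illustration, not OmniFind's actual model: the slide abbreviates the core factor as (term freq) x (1/doc freq), and the code uses the common log-scaled form, tf * log(N / df). Static factors, lexical affinity and link analysis are left out.

```java
import java.util.List;
import java.util.Map;

// Hypothetical tf-idf scoring with field weights that shift with the query type.
final class Ranker {
    enum QueryType { NAVIGATIONAL, INFORMATIONAL }

    // A document reduced to per-field term counts, e.g. "anchor" -> {"hr": 3}.
    static final class Doc {
        final Map<String, Map<String, Integer>> fieldTermCounts;
        Doc(Map<String, Map<String, Integer>> fieldTermCounts) { this.fieldTermCounts = fieldTermCounts; }

        boolean contains(String term) {
            for (Map<String, Integer> counts : fieldTermCounts.values()) {
                if (counts.getOrDefault(term, 0) > 0) return true;
            }
            return false;
        }
    }

    // Navigational queries lean on anchor text; the numbers are illustrative.
    static Map<String, Double> weights(QueryType type) {
        return type == QueryType.NAVIGATIONAL
                ? Map.of("title", 2.0, "body", 1.0, "anchor", 4.0)
                : Map.of("title", 2.0, "body", 1.0, "anchor", 1.0);
    }

    // log(N / df): the usual log-scaled form of "1 / document frequency".
    static double idf(String term, List<Doc> collection) {
        long df = collection.stream().filter(d -> d.contains(term)).count();
        return df == 0 ? 0.0 : Math.log((double) collection.size() / df);
    }

    static double score(Doc doc, List<String> queryTerms, List<Doc> collection, QueryType type) {
        Map<String, Double> w = weights(type);
        double total = 0.0;
        for (String term : queryTerms) {
            double idf = idf(term, collection);
            for (Map.Entry<String, Map<String, Integer>> field : doc.fieldTermCounts.entrySet()) {
                int tf = field.getValue().getOrDefault(term, 0);
                total += w.getOrDefault(field.getKey(), 1.0) * tf * idf; // field weight x tf x idf
            }
        }
        return total;
    }
}
```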
Differentiated Value for IBM Clients
Enhances WebSphere Portal investments
• Accesses more sources
• Scales to larger implementations
• Leverages the taxonomy defined in the portal for navigation and classification
• Migrates rules for rule-based classification
• Surfaces a similar portlet with additional features
Extends DB2 Content Manager investment
• Integrates DB2 Content Manager repositories into enterprise search applications
• Provides a native DB2 Content Manager crawler
Leverages Notes and Domino investment
• Searches Lotus Notes file folders in enterprise search applications
• Provides a native Lotus Notes crawler
• Supports native Domino security, allowing authorized searches down to the application level
Comparing OmniFind to Other IBM Search Offerings
WebSphere Portal 5.1 Search Engine
• Embedded portal search technology to index and search Portal portlets and pages via Portal Site search technology, Web content (includes ILWWCM Web-published content), Portal Document Manager content, and attachments
Lotus Extended Search 4.0.2
• Search broker technology included with WebSphere Portal Extend
• Brokers search across supported data sources and indexes
• Complements WebSphere Portal search capabilities with reach to additional data, content and index search sources
Workplace 2.5 Search
• Embedded search included in Workplace products
• Index/search capabilities for: Team Collaboration, Web Conferencing, Collaborative Learning, People Finder, Workplace Messaging (client)
WebSphere II OmniFind Edition V8.2
• Enterprise search engine for intranets, extranets, public corporate websites and industry applications
• Enterprise scale and broad reach to additional content sources for upgrading WebSphere Portal, Lotus Domino and Workplace customers
IBM Search Competition
Verity, Autonomy, FAST, Convera
• Competitor weaknesses:
  - Small companies with limited resources (Verity is the largest, with 450 employees and $113M revenue in 2003)
  - Narrow focus on indexing and retrieval
  - Most products are very complex, with high TCO
  - Significantly higher price points
Google (Google Search Appliance)
• Competitor weakness: focused on Internet search
Microsoft
• Competitor weakness: entering the market; initial focus on Internet search; unlikely to reach non-MS content repositories
IBM strengths
• IBM views search as an extension to a comprehensive information integration infrastructure that includes the information management, content management and information retrieval technologies required for building enterprise applications in the on demand era
• Customers understand they can rely on IBM as a long-term partner
• Search technology is a strategic element in IBM's software stack and, as such, is receiving tremendous focus and investment: $50M annually
What about the Google Search Appliance?
Google is focused on the retail Internet market
• Advertising-revenue focused: 95% of revenue
• Page-ranking system is not optimized for enterprises
Corporate intranets are fundamentally different:
• Less content, lower chance of finding the perfect match
• Poorly linked; linking process more centrally controlled
• Content stored in many different systems besides the web
• Enterprise security needs differ
• “Black box” offering runs counter to enterprise HW/OS standards
OmniFind is designed to produce the best results from enterprise content
Proven Quality, Scale and Robustness on IBM Intranet
• Quality: preferred 2:1 over prior technology
• Scale: 80K queries/day with sub-second response over 7M pages
  - Indexes: 9M unique pages, 10,000 websites, 20K per document
  - Processes: 80K queries/day, 7K queries/hour peak, stressed to 10x higher
• Robustness: 99.9% availability since Sept., 24x7 operation
[Diagram: processing stages: 1 – Crawling (crawler), 2 – Parsing & Tokenizing, 3 – Indexing Build & Push (indexer), 4 – Searching (search servers)]
Enter the search terms that you want, such as: How to change CMVC password
Advanced Search for more options
Tabs allow you to direct where you want to look.
Excellent results! Note the highlighted terms and word “stemming”.
“Lexical affinity”: the proximity of the words helps search quality.
Quicklinks are predefined searches.
Each summary is built dynamically, based on the search terms that you entered.
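Lexical affinity rewards query terms that appear close together. A minimal sketch, assuming the document is already split into lowercase tokens: the hypothetical `LexicalAffinity.minDistance` finds the closest pair of occurrences in one pass, and the `1 / distance` boost is an illustrative choice, not OmniFind's formula.

```java
import java.util.List;

final class LexicalAffinity {
    // Smallest distance, in tokens, between any occurrence of a and any occurrence of b; -1 if either is missing.
    static int minDistance(List<String> tokens, String a, String b) {
        int lastA = -1, lastB = -1, best = Integer.MAX_VALUE;
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (t.equals(a)) lastA = i;
            if (t.equals(b)) lastB = i;
            // Comparing the latest positions of both terms is enough to find the global minimum.
            if (lastA >= 0 && lastB >= 0) best = Math.min(best, Math.abs(lastA - lastB));
        }
        return best == Integer.MAX_VALUE ? -1 : best;
    }

    // Adjacent terms (distance 1) get a boost of 1.0; farther pairs get less; missing terms get none.
    static double boost(List<String> tokens, String a, String b) {
        int d = minDistance(tokens, a, b);
        return d <= 0 ? 0.0 : 1.0 / d;
    }
}
```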
Business Partner Extensions
• Endeca: multi-faceted, or guided, navigation
• iPhrase: self-service applications, natural language processing
• Athoc: subscription and notification
• Sun & Son: expertise location, taxonomy services, knowledge management
• Muse Global: access to specialized data sources
• Upshot: document intelligence
When to choose OmniFind versus WebSphere Content Discovery Server
Choose OmniFind when:
• The need is for bidirectional languages
• The need is for text analytics
Summary
Most companies have multiple content management systems
Information in these systems is often isolated from key applications and security/compliance policies
IBM offers powerful, well differentiated products for content integration and enterprise search that enable organizations to better leverage and control distributed content assets
These products are key to business solutions related to customer service, records management/compliance, research & intelligence, various production workflows, and more