Alfresco WebScript Connector for Apache ManifoldCF
piergiorgio-lucidi
![Page 1: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/1.jpg)
Apache ManifoldCF
Alfresco WebScript Repository Connector
Alfresco Meetup Rome 2013
![Page 2: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/2.jpg)
About me
● Open Source ECM Specialist at Sourcesense
● Author and Technical Reviewer at Packt Publishing
○ Alfresco 3 Web Services (2010)
○ GateIn Cookbook (2012)
● Alfresco Community (nickname OpenPj)
○ Alfresco Community Star
○ Alfresco Wiki Gardener
○ Top 10 supporter (English and Italian)
○ Moderator of the Italian forum
● PMC Member and Committer at the Apache Software Foundation
● JBoss Community
○ Content editor for jboss.org
○ Project Leader and Committer for PortletSwap / Blog / Wiki
![Page 3: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/3.jpg)
Overview
● Introducing Apache ManifoldCF
○ What is ManifoldCF?
○ Why ManifoldCF?
○ Architecture
○ Who is using ManifoldCF?
○ The book
● How ManifoldCF supports Alfresco
● The goal of the new connector
○ Architecture
○ Roadmap
○ The team
● Resources
![Page 4: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/4.jpg)
The story
The original ManifoldCF code base was granted by MetaCarta to the Apache Software Foundation in December 2009. The MetaCarta effort represented more than five years of successful development and testing in multiple, challenging enterprise environments. The project graduated as an Apache Top Level Project in July 2012.
![Page 5: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/5.jpg)
What is ManifoldCF?
Open Source crawler
● crawling model (add, change, delete)
● schedule jobs to create indexes
○ get contents from repositories
○ push contents to search servers
[Diagram: Repository 1, 2, 3 → Apache ManifoldCF → Search Server 1, 2, 3]
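The add / change / delete crawling model can be illustrated with a minimal sketch. It assumes each document carries a version string and that the crawler remembers the version it last indexed; `plan_sync` and the sample data are hypothetical names for illustration, not ManifoldCF code.

```python
from typing import Dict, List, Tuple


def plan_sync(repository: Dict[str, str], index: Dict[str, str]) -> List[Tuple[str, str]]:
    """Compare repository versions with indexed versions.

    Returns sorted (action, doc_id) pairs, where action is
    "add", "change" or "delete".
    """
    actions = []
    for doc_id, version in repository.items():
        if doc_id not in index:
            actions.append(("add", doc_id))        # new in the repository
        elif index[doc_id] != version:
            actions.append(("change", doc_id))     # version differs from the index
    for doc_id in index:
        if doc_id not in repository:
            actions.append(("delete", doc_id))     # gone from the repository
    return sorted(actions)


# Hypothetical state: doc id -> version string
repository = {"a.pdf": "v2", "b.doc": "v1", "d.txt": "v1"}
index = {"a.pdf": "v1", "b.doc": "v1", "c.ppt": "v1"}

plan = plan_sync(repository, index)
print(plan)
```

Unchanged documents (here `b.doc`) produce no action, which is what keeps repeated job runs cheap.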
![Page 6: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/6.jpg)
What is ManifoldCF?
● Out-Of-The-Box it is distributed as a webapp
○ REST API
○ Authority Service
○ Crawler UI
● can be embedded in any Java application
![Page 7: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/7.jpg)
Why ManifoldCF?
● Reliability
● Incremental
● Flexible
● Multi repositories
● Security model
● Monitoring
![Page 8: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/8.jpg)
Why ManifoldCF? - Reliability
Job scheduling and configuration are stored in the database to maintain the state of all executions
[Diagram: the Pull Agent Daemon sits between the Repository and the Search Server and keeps configuration and scheduling in the Database]
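Why keeping state in a database helps can be shown with a small sketch. It uses SQLite from the Python standard library as a stand-in database; the `job_queue` table and the function names are assumptions made for this example and do not reflect the actual ManifoldCF schema.

```python
import sqlite3


def open_state(path=":memory:"):
    """Open (or create) the state database. Use a file path to survive restarts."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_queue ("
        "doc_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    return conn


def enqueue(conn, doc_ids):
    """Record documents to process; already known ids are left untouched."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO job_queue VALUES (?, 'pending')",
            [(doc_id,) for doc_id in doc_ids],
        )


def process_pending(conn, handler, limit=None):
    """Process pending documents, committing each one as soon as it is done."""
    rows = conn.execute(
        "SELECT doc_id FROM job_queue WHERE status = 'pending' ORDER BY doc_id"
    ).fetchall()
    done = []
    for (doc_id,) in rows[:limit]:
        handler(doc_id)
        with conn:
            conn.execute(
                "UPDATE job_queue SET status = 'done' WHERE doc_id = ?", (doc_id,)
            )
        done.append(doc_id)
    return done


conn = open_state()
enqueue(conn, ["a", "b", "c"])
first_run = process_pending(conn, handler=lambda doc_id: None, limit=1)  # interrupted early
second_run = process_pending(conn, handler=lambda doc_id: None)          # resumes where it stopped
print(first_run, second_run)
```

Because every completed document is committed individually, an interrupted run only has to redo the work that was still pending.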
![Page 9: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/9.jpg)
Why ManifoldCF? - Incremental
Get content changesets from the repository API
[Diagram: Repository → complete changesets → Apache ManifoldCF]
![Page 10: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/10.jpg)
Why ManifoldCF? - Flexible
If the repository can't supply all the changes, ManifoldCF can discover them through crawling
[Diagram: Repository → incomplete changesets → Apache ManifoldCF, with change discovery through nodes N1 and N2]
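Change discovery through crawling can be sketched as a breadth-first walk from a set of seed nodes, comparing the version found for each node with the version recorded at the previous crawl. Treating N1 and N2 as seeds is an assumption about the diagram, and `discover_changes` with its callbacks is a hypothetical helper, not the ManifoldCF API.

```python
from collections import deque


def discover_changes(seeds, get_children, get_version, known_versions):
    """Walk the repository from the seeds and report what changed.

    Returns (changed, deleted): ids that are new or have a different
    version, and previously known ids that are no longer reachable.
    """
    changed = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if known_versions.get(node) != get_version(node):
            changed.append(node)  # new node or different version
        for child in get_children(node):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    deleted = sorted(set(known_versions) - seen)
    return changed, deleted


# Hypothetical repository: two folders (N1, N2) containing documents
children = {"N1": ["doc1", "doc2"], "N2": ["doc3"]}
current = {"N1": "1", "N2": "1", "doc1": "1", "doc2": "2", "doc3": "1"}
previous = {"N1": "1", "N2": "1", "doc1": "1", "doc2": "1", "doc4": "1"}

changed, deleted = discover_changes(
    ["N1", "N2"],
    get_children=lambda node: children.get(node, []),
    get_version=lambda node: current[node],
    known_versions=previous,
)
print(changed, deleted)
```

The cost is a full traversal on every run, which is why complete changesets from the repository API are preferable when they are available.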
![Page 11: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/11.jpg)
Why ManifoldCF? - Multi repositories
Jobs can retrieve contents from the following repositories:
● CMIS-compliant
● Alfresco
● IBM FileNet
● EMC Documentum
● Microsoft SharePoint
● OpenText LiveLink
● Autonomy Meridio
● Memex Patriarch
● Windows Share/DFS
● Generic JDBC
● Generic Filesystem
● Generic RSS and Web
![Page 12: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/12.jpg)
Why ManifoldCF? - Multi repositories
Jobs can ingest contents into the following search servers:
● Apache Solr
● ElasticSearch
● OpenSearchServer
● MetaCarta GTS
![Page 13: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/13.jpg)
Why ManifoldCF? - Security model
Retrieve per-content ACLs
[Diagram: the Pull Agent Daemon reads Repository 1, 2, 3 and sends doc access tokens to the Search Server; the Authority Service reads Authority 1, 2, 3 and supplies user access tokens; the Search Server returns user specific search results]
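The token matching behind user specific search results can be sketched as follows. It assumes each indexed document stores the access tokens that may read it and that the authority returns the tokens held by the user; the field name `allow_tokens` and the sample tokens are hypothetical.

```python
def filter_results(hits, user_tokens):
    """Keep only the hits that share at least one access token with the user."""
    allowed = set(user_tokens)
    return [hit["id"] for hit in hits if allowed & set(hit["allow_tokens"])]


# Hypothetical search hits with the doc access tokens stored at indexing time
hits = [
    {"id": "doc1", "allow_tokens": ["GROUP_sales", "GROUP_admin"]},
    {"id": "doc2", "allow_tokens": ["GROUP_finance"]},
    {"id": "doc3", "allow_tokens": ["user_mario", "GROUP_sales"]},
]

# Hypothetical user access tokens returned by an authority
visible = filter_results(hits, ["GROUP_sales"])
print(visible)
```

In practice the same comparison would normally be expressed as a filter in the search query, so that hidden documents never leave the search server.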
![Page 14: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/14.jpg)
Why ManifoldCF? - Monitoring
The Crawler UI allows you to:
● configure jobs and connectors
● monitor job execution
● monitor content ingestion
○ status reports
■ document status
■ queue status
○ history reports
■ simple history
■ maximum activity
■ maximum bandwidth
■ result histogram
![Page 15: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/15.jpg)
Architecture - Job
[Diagram: Repository → Job → Search Server, with ACLs]
● Job
○ verbal description
○ crawling model
○ scheduling
● Repository Connector
○ query to retrieve contents
● Output Connector
○ metadata mapping
○ content ingestion
● Authority Connector
○ retrieve content ACL
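How a job ties the connectors together can be sketched with two small interfaces. All class and method names here (`RepositoryConnector`, `OutputConnector`, `Job.run`, the in-memory implementations) are hypothetical and far simpler than the real ManifoldCF connector interfaces; for brevity the ACL lookup is folded into the repository side instead of a separate authority connector.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Protocol


@dataclass
class Document:
    doc_id: str
    content: str
    metadata: Dict[str, str]
    acl: List[str] = field(default_factory=list)


class RepositoryConnector(Protocol):
    def find_documents(self, query: str) -> Iterable[Document]: ...


class OutputConnector(Protocol):
    def ingest(self, document: Document) -> None: ...


@dataclass
class Job:
    description: str                  # verbal description
    query: str                        # query to retrieve contents
    repository: RepositoryConnector
    output: OutputConnector
    field_mapping: Dict[str, str]     # metadata mapping

    def run(self) -> int:
        """Fetch matching documents, map their metadata and ingest them."""
        count = 0
        for doc in self.repository.find_documents(self.query):
            mapped = {self.field_mapping.get(k, k): v for k, v in doc.metadata.items()}
            self.output.ingest(Document(doc.doc_id, doc.content, mapped, doc.acl))
            count += 1
        return count


class InMemoryRepository:
    def __init__(self, documents):
        self.documents = documents

    def find_documents(self, query):
        return [d for d in self.documents if query in d.content]


class InMemoryIndex:
    def __init__(self):
        self.documents = {}

    def ingest(self, document):
        self.documents[document.doc_id] = document


repo = InMemoryRepository([
    Document("1", "quarterly report", {"cm:title": "Report"}, ["GROUP_finance"]),
    Document("2", "holiday photos", {"cm:title": "Photos"}, ["GROUP_everyone"]),
])
index = InMemoryIndex()
job = Job("Index reports", "report", repo, index, {"cm:title": "title"})
ingested = job.run()
print(ingested, index.documents["1"].metadata)
```

Because the job only depends on the two interfaces, swapping the repository or the search server means swapping one connector.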
![Page 16: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/16.jpg)
Who is using ManifoldCF?
![Page 17: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/17.jpg)
The book: ManifoldCF in Action
ManifoldCF in Action, by Karl Wright, published by Manning
Karl is the original developer and the principal committer of Apache ManifoldCF
The book is available at http://www.manning.com/wright
![Page 18: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/18.jpg)
How ManifoldCF supports Alfresco
● CMIS Repository Connector based on OpenCMIS
● The current Alfresco Repository Connector only supports CML
○ works on any version of Alfresco 2.x, 3.x and 4.x
○ no support for querying Solr from Alfresco
○ it will die at the end of the year
○ Please see the Alfresco Roadmap
![Page 19: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/19.jpg)
Alfresco Solr search subsystem
● Remote crawling of contents and ACLs into Solr
○ REST API for retrieving changesets from Alfresco db
● Solr server provided by Alfresco
○ based on Apache Solr 1.4.1 (uhm...really!!!???)
● hardcoded
● can't be used with your own Solr instance
○ customers have newer versions of Solr
■ interested in new features (SolrCloud, sharding...)
■ hundreds of improvements available in 3.x and 4.x
![Page 20: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/20.jpg)
Alfresco Solr search subsystem
[Diagram: Alfresco (alf_transaction, alf_acl_*, alf_node_*) → transactions and ACL → Alfresco REST Client → Solr 1.4.1 (provided by Alfresco) → Indexes]
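Retrieving changesets transaction by transaction can be sketched as a cursor loop. The `fetch_transactions` callback stands in for whatever call the Alfresco REST Client exposes; its signature and the shape of the returned records are assumptions for this example only.

```python
def poll_changes(fetch_transactions, last_txn_id, batch_size=2):
    """Collect node ids touched by transactions newer than last_txn_id.

    Returns (changed_nodes, new_cursor). The cursor only advances past
    transactions that were actually read, so the next poll resumes there.
    """
    changed_nodes = []
    while True:
        batch = fetch_transactions(last_txn_id, batch_size)
        if not batch:
            return changed_nodes, last_txn_id
        for txn in batch:
            changed_nodes.extend(txn["node_ids"])
            last_txn_id = txn["id"]


# Hypothetical transaction log, ordered by transaction id
TRANSACTIONS = [
    {"id": 1, "node_ids": ["n10"]},
    {"id": 2, "node_ids": ["n11", "n12"]},
    {"id": 3, "node_ids": ["n10"]},
]


def fake_fetch(min_id, max_results):
    """Stand-in for the remote call: transactions with id greater than min_id."""
    newer = [t for t in TRANSACTIONS if t["id"] > min_id]
    return newer[:max_results]


nodes, cursor = poll_changes(fake_fetch, last_txn_id=0)
print(nodes, cursor)
```

Storing the returned cursor between runs is what makes the next crawl incremental: only transactions committed after it are read again.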
![Page 21: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/21.jpg)
Roadmap
![Page 22: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/22.jpg)
Goal - 1
Create a new connector using the Alfresco REST Client
● provided and supported by Alfresco
○ for us it is a Maven dependency :)
● invokes the Alfresco Solr API
![Page 23: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/23.jpg)
Goal - 2 - check feasibility
Create a real Enterprise alternative for managing indexes
● compatibility with the SearchService of Alfresco
● repository takes care only of contents
● indexes are managed externally
● no redundancy for indexes
effort to redirect query execution
![Page 24: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/24.jpg)
Goal - 3 - Security
Implement an Alfresco authority connector
○ manages ACL indexing
![Page 25: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/25.jpg)
Goal - 4
Manage indexes using ManifoldCF against any supported search server
● Apache Solr 3.x / 4.x
● ElasticSearch
● OpenSearchServer
● MetaCarta
![Page 26: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/26.jpg)
Architecture
[Diagram: Alfresco (alf_transaction, alf_acl_*, alf_node_*) → Alfresco REST Client → ManifoldCF (Alfresco WebScript Repository Connector, Output Connector) → Search Server → Indexes]
![Page 27: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/27.jpg)
The team of the new connector
● Piergiorgio Lucidi (Sourcesense + ASF)
● Maurizio Pillitu (Alfresco)
● Aingaran Pillai (Zaizi) [new entry]
● Fran Alvarez (Zaizi) [new entry]
● Abraham Ayala (Zaizi) [new entry]
![Page 28: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/28.jpg)
Join us!
● We are looking for developers
● this is a work in progress
● don't fork the project, feel free to join us
^__^
![Page 29: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/29.jpg)
Resources
● Apache ManifoldCF: http://manifoldcf.apache.org/
● The connector, hosted on GitHub: https://github.com/maoo/alfresco-webscript-manifold-connector
● it will be included in Apache ManifoldCF
![Page 30: Alfresco WebScript Connector for Apache ManifoldCF](https://reader034.fdocuments.us/reader034/viewer/2022052216/54b6b4144a795942358b458a/html5/thumbnails/30.jpg)
Thank you for your attention!
http://www.open4dev.com