For Alfresco 4.0 Enterprise Edition
Alfresco Scalability Blueprint
Copyright 2012 by Alfresco and others.
Information in this document is subject to change without notice. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express written permission of Alfresco. The trademarks, service marks, logos, or other intellectual property rights of Alfresco and others used in this documentation ("Trademarks") are the property of Alfresco and their respective owners. The furnishing of this document does not give you license to these patents, trademarks, copyrights, or other intellectual property except as expressly provided in any written agreement from Alfresco.
The United States export control laws and regulations, including the Export Administration Regulations of the U.S. Department of Commerce, and other applicable laws and regulations apply to this documentation which prohibit the export or re-export of content, products, services, and technology to certain countries and persons. You agree to comply with all export laws, regulations, and restrictions of the United States and any foreign agency or authority and assume sole responsibility for any such unauthorized exportation.
You may not use this documentation if you are a competitor of Alfresco, except with Alfresco's prior written consent. In addition, you may not use the documentation for purposes of evaluating its functionality or for any other competitive purposes.
This copyright applies to the current version of the licensed program.
Document History
VERSION DATE AUTHOR DESCRIPTION OF CHANGE
0.1 2012-04-22 Gabriele Columbro Initial Table of Contents and first ideas gathering for review
0.2-DRAFT 2012-06-07 Gabriele Columbro Incorporated preliminary comments from Docs and Eng, and completed a full first version.
0.3-DRAFT-HM 2012-06-25 Helen Mullally Copyedit
0.4-DRAFT-RF 2012-06-26 Rui Fernandes Tech Review
0.5 2012-06-27 Derek Hulley Tech Review to pg 25
0.6 2012-06-29 Andy Hunt Tech Review to pg 29
0.7 2012-06-29 Briana Wherry Merged and accepted editing comments.
Table of Contents

Introduction
  Synopsis
  Scope
  Intended Audience
    Executives
    Advanced Alfresco 4 developers
  Readers' Prerequisites
  Assumptions
  Out of Scope
  Notice to Readers
  Acronyms and Symbols
Introduction to Alfresco ECM Scalability
  Alfresco ECM Solutions
    Alfresco core ECM Solutions
    Alfresco extended ECM Solutions
  Alfresco ECM Solutions Scalability
  ECM Solutions Scalability Factors
    Performance
    Load distribution
    Availability
    Out of scope performance scalability factors
  Alfresco 3 Architectures Scalability Gotchas
    In process (or in transaction) content and metadata indexing
    Lucene index tracked per node and Ehcache replicated independently
    ACL post query permission checking
    Non-HTTP based Virtual File systems not available in HA mode
  Alfresco 4.x New Scalability Frontiers
    Apache Solr indexing tier
    Clustered file system interfaces
    Alfresco Transformation Server
  A Note on the Alfresco Cloud Service
    Transparent multi-tenancy
    Index sharding
The Alfresco 4 Benchmarks
  Benchmark Types
    Scalability Benchmarks
    Comparative Benchmarks
    System boundary discovery benchmarks
Introduction

This chapter introduces the scope of this document, the intended audience, and the required level of Alfresco knowledge. It also includes a list of recommended reading.
Synopsis

This document presents the results of the Alfresco Enterprise benchmarks and uses them to analyze Alfresco 4 performance and scalability in an Enterprise collaboration scenario. Use this document as a reference for practical recommendations and best practices for successful sizing, architecture, and deployment of large-scale Alfresco solutions.
Scope

This document applies to Alfresco Enterprise 4.0 (and above), which is the target platform on which the first benchmarks were run. Benchmarks are a continuously evolving effort so further benchmarks might disprove or update the results and considerations presented in this document. Therefore, this is considered to be a “living document” and is generally intended for the whole Alfresco 4.x series. Specific tuning best practices might be superseded or obsolete once new benchmarks are run on future versions.
Intended audience

This document is intended for the Alfresco Enterprise customer and partner network with special focus on the most technical teams, such as Enterprise Architecture, Development, Support, and Operations. As it requires a deep understanding of the architecture, components, and technologies involved in the operations of the Alfresco platform, the ideal reader should hold Alfresco Certified Engineer (ACE1) or Alfresco Certified Administrator (ACA2) certification.
Executives

Although this document is intended for a technical audience, the benchmark results may also be of interest to a non-technical audience. Two sections are suitable for this audience:
The chapter that presents the benchmark results in the context of a real-life Enterprise deployment scenario
The chapter that provides general conclusions and statements applicable to the benchmark results
Advanced Alfresco 4 developers

Advanced developers who are familiar with Alfresco 4 and have hands-on experience of it may wish to skip directly to the chapter The Alfresco 4 Benchmarks and the chapters that follow it. These chapters provide a quantitative analysis of Alfresco 4 performance and scalability based on the benchmark results. For the complete reference of the Alfresco and underlying component configurations, refer to the configuration appendix.
1 http://university.alfresco.com/ACE.html
2 http://university.alfresco.com/ACA.html
Readers’ prerequisites

In addition to being ACA/ACE certified, readers should have a full understanding of the following technical areas:
Alfresco deployment stacks
Alfresco 4 architecture and components, especially Solr
Alfresco clustering and high availability
General hardware/infrastructure/network performance concepts
Load and performance testing
Alfresco 3.x scalability concepts and Lucene integration
The following documentation is recommended reading:
Scale your Alfresco Solution3 available on the Alfresco Support site - offers a general introduction to architectures, design, and tuning of Alfresco highly-scalable solutions. Since it is based on Alfresco 3.2, some concepts are superseded by this benchmark document, such as the introduction of a separate indexing tier with Apache Solr in Alfresco 4. However, it still represents a valid reference and qualitative introduction to the concepts presented in this document.
Alfresco Online Documentation – see the sections on High availability4 and Solr integration5
Alfresco DevCon developer presentations available in Slideshare6 - in particular, see the presentations on Solr integration7 and scalability8
Alfresco training9 or a dedicated Alfresco Consultancy10 developer enablement package
For additional questions and enablement on Alfresco scalability, contact Alfresco Support or your local Solution Engineer if you are an Alfresco Customer, or refer to the Alfresco Partner Team11 if you are an Alfresco Partner.
Assumptions

The benchmarks and the scalability analysis results presented in this document are based on a number of assumptions, ranging from technical choices to the process and scenario used. These were made to reduce the complexity of the benchmark exercise and to match the expectations of the widest possible audience within the Alfresco Enterprise network.
The following list outlines the high-level assumptions:
3 http://support.alfresco.com/ics/support/DLRedirect.asp?fileID=18158 (login required)
4 http://docs.alfresco.com/4.0/topic/com.alfresco.enterprise.doc/concepts/ha-intro.html
5 http://docs.alfresco.com/4.0/topic/com.alfresco.enterprise.doc/concepts/solr-intro.html
6 http://www.slideshare.net/alfresco
7 http://www.slideshare.net/alfresco/understanding-the-solr-integration
8 http://www.slideshare.net/alfresco/performance-and-scalability-10161703
9 http://university.alfresco.com
10 http://www.alfresco.com/services/consulting/
11 https://partners.alfresco.com/
Alfresco 4.0.0 Enterprise Edition was used, leveraging Apache Solr for the indexing tier. However, the focus of the benchmark was on the Alfresco repository, so Solr physical scaling is treated as a black box for the rest of this document.
Alfresco and Share usage as a large-scale Enterprise Collaboration platform is the main scenario/use case on which to prove the platform, especially in a search-intensive use case.
The Alfresco Out of the Box content model and functional configurations were used.
A pre-defined, reproducible, incremental approach was used to run the benchmark to ensure that a like-for-like comparison between results is as precise as possible. Nevertheless, certain configurations were changed during the benchmark while exploring larger data points, as product configuration bottlenecks were discovered. See the configuration appendix for a detailed report on the configurations used.
For specifics of the technical implementation, refer to the following sections.
Out of scope

Based on the process, software, hardware, scenario, assumptions, and resources used to run the benchmarks, the results and, therefore, the analysis presented in this document are NOT applicable to:
Alfresco Community and Cloud editions
Alfresco versions other than 4.0.0; it is expected that 4.0.x maintenance releases will provide improvements on the metrics presented in this document, also based on the product configuration and bottlenecks studied in the benchmark
Highly customized instances of Alfresco, either in terms of content model, configuration, or custom developed functionalities
Different components of the stack (for example, JVM, DBMS, Application Server, Operating System) and their respective configurations
Different scenarios and APIs used, other than the Enterprise Collaboration Platform scenario that was reproduced for the benchmarks, as described in Alfresco 4 Benchmark details
Notice to readers

Benchmarking is a formal and highly technical process that aims at optimal configuration and reproducibility between different tests. Metrics collected from the benchmark are the result of a very fragile balance between stack components’ configurations, client and server resources, and the exact scenario used for testing.
Alfresco sizing, configuration, and tuning, as for any cross-stack vertical platform, are dependent on the type of operations executed against the platform, their frequency, and the APIs used to perform those operations.
In this sense, the results presented in this document should always be referred to as a sizing and benchmarking model, rather than as blind performance requirements metrics.
Most importantly:
Do not apply the benchmark results and scalability analysis “as is” to your context. Instead, use them as a relative measurement to estimate the potential load requirements of your application. More simply, use them for what they are: results from an official benchmark to be used as an informative reference and source of inspiration.
Acronyms and symbols

The following acronyms and symbols are used throughout the document:
ACRONYM/SYMBOL DESCRIPTION
ACE Alfresco Certified Engineer
ACA Alfresco Certified Administrator
BFSIT Bulk File System Import Tool12
TXN Transaction
DBMS DataBase Management System
HA High Availability (Available)
OOTB Out of the Box
ECM Enterprise Content Management
DR Disaster Recovery
I&AM Identity & Access Management
12 http://docs.alfresco.com/4.0/topic/com.alfresco.enterprise.doc/concepts/Bulk-Import-Tool.html
Introduction to Alfresco ECM Scalability

This chapter defines the concepts of performance and scalability in the context of the typical ECM solutions and scenarios implemented on Alfresco Enterprise, discusses scalability limits of Alfresco 3, and provides an introduction to the Alfresco Enterprise 4 new architectural features that respond to high-scale content requirements.
Alfresco ECM Solutions

Thanks to its modern and flexible design, as well as to its modular architecture, Alfresco Enterprise supports, and has been successfully deployed across, a vast range of vertical and horizontal solutions.
Alfresco core ECM Solutions

Figure 1 Alfresco Core ECM Solutions shows a common classification of the core landscape of typical Alfresco-based solutions:
Figure 1 Alfresco Core ECM Solutions
Systems of Record (or Headless Content Management Platform)
o Typically characterized by the use of Alfresco as a pure Content Server, as a back-end system to store massive archival or controlled document management systems; optionally, uses the Records Management module
o Typical Features: Massive migration or batch injection (10M+), ID or search based retrieval of content, integration using remote APIs/CMIS, simple content model, batch mostly write operations, users mostly read operations
o Although some support information will be provided, these use cases are not the main focus of the benchmarks covered in this document.
Systems of Engagement (or Enterprise Collaboration Platform)
o Characterized by the use of Alfresco – and Alfresco Share – as an Enterprise Collaboration platform, leveraging all the potential of the Share interface to offer project/team/department-based collaboration spaces to the extended Enterprise
o Typical Features: High interactive user concurrency, especially during working hours, strong reliance on search, use of the full UI functionalities of Share, users performing mostly read and search operations but also writes, global deployments
o This represents the main use case/scenario/solution that was tested in the benchmarks presented in this document; therefore, these results typically apply better to the deployment of this type of solution.
Web Content Services (WCS Solutions)
o Typically characterized by the use of Alfresco as an editorial and publishing platform: Alfresco is used as a back-end system to support the Enterprise Web Content delivery process, internally or across the firewall, either directly or via remote content deployment
o Typical Features: Use of Workflow, Rendition APIs, Replication or Transfer Service, CMIS, Web Quick Start, from small to large write interaction, depending on the editorial concurrency, mostly read on the front-end (if directly connected to Alfresco)
o Although some support information might be found in this benchmark, these use cases are not the main focus of the benchmarks covered in this document.
For a full description of this classification with more detailed examples, refer to the “Scale your Alfresco Solutions” document.
Alfresco extended ECM Solutions
Figure 2 Alfresco extended ECM Solutions landscape
The capabilities of the Alfresco platform are not limited to the three main use cases already presented. Alfresco is increasingly being used to support use cases such as:
• Social Content Management and integration with social networks. This can be done via the Alfresco Social Publishing APIs or via supported integrations like the Alfresco/Jive integration13.
• Records Management and Archival based on the US DoD 5015.2 standard compliant module. This will soon be released in its new version (2.0) in order to simplify customization and extensibility, to support compliance with other national, regional, or domain-specific standards and specifications.
• Business Intelligence to perform Content and/or Workflow OLAP analysis. In the latter case the BPMN 2.0 compliant Activiti14 platform can be leveraged as it is seamlessly integrated in Alfresco.
• Cloud connected content platform to manage content leveraging the cloud infrastructure, either with Alfresco in the cloud15 (see section A note on the Alfresco Cloud) or with custom private cloud Enterprise deployments. Alfresco is active in this area and private <-> public cloud synchronization features are expected in the product in the near future to enable even more distributed cloud deployments.
As is clearly shown by the wide variety of solutions presented, in which Alfresco plays a central role, the interaction pattern with the platform is highly dependent on the particular Alfresco scenario. It can vary considerably in terms of user concurrency and operation type, typical size of the repository, and integration API or user interface in use. This typically translates into stressing potentially different internal components of Alfresco – or underlying components of the infrastructure – and therefore into potentially very different architecture, design, and tuning requirements.
Alfresco ECM Solutions Scalability

While general performance tuning concepts obviously apply, as previously explained, Alfresco ECM supports such a varied mix of use cases, processes, and APIs that scalability of the Alfresco platform is very strictly correlated with the usage scenario.
Quoting an interesting blog post16:
“ECM systems can be scalable or they can fail to scale well. They can have modular architectures that allow you to simply add more elements as required, rather than multiply the entire system as things expand. They can be scalable in that they have built in high availability, automatic failover support, run on enterprise grade application servers and databases. They can be scalable because they have been tested and proven to handle very high volumes (hundreds of millions of documents) in the repository and/or tested and proven to handle very high throughput rates (tens of thousands per hour or minute). There are many ways in which an ECM system can scale or not. But the biggest element determining whether the system can scale is your usage of it”

As ECM semantics grow alongside the content explosion we are experiencing with the advent of social networks, collaboration platforms, and content mobility increasingly penetrating the Enterprise, it is fundamental to have a solution-oriented
13 http://www.alfresco.com/products/integrations/jive/
14 http://activiti.org/
15 https://cloud.alfresco.com
16 http://www.realstorygroup.com/Blog/1403-Scalable-ECM
approach when trying to define your specific scalability requirements. In other words, there is no such concept as ECM Scalability per se, unless it is put in a specific deployment context and usage scenario, such as the one that can be re-created in a benchmark lab.
Therefore, for the remainder of this document, the word “scalability” refers to the scalability of the Alfresco platform for an Enterprise collaboration scenario that is typical of systems of engagement, rather than referring to platform scalability in general.
With this in mind, the following sections introduce the common factors typically required in a large-scale Alfresco ECM solution, on which the benchmark results presented in the following chapters are based.
ECM solutions scalability factors

Although ECM scalability is often presented as a fairly linear concept, it is heavily solution-dependent. Some areas of scalability may affect the way Alfresco is deployed and architected, and should be taken into account when designing your solution. The following section provides a suggested classification of the scalability factors, and also identifies whether or not they are covered in this document.
Performance

Performance is the most straightforward factor around scalability, as it is tightly related to the end user experience. In this sense, performance is typically one of the most stringent and discussed customer requirements. Clearly defining the required performance characteristics before the implementation of the project will help establish an estimate of the load that the Alfresco platform will be required to support. Performance is typically measured as Alfresco response time to user or batch operations: a simple, approximated, but still statistically valid way of defining performance of an ECM platform is the general average response time of all user operations. Throughput is another common measurement.
The type of user operations that compose the test scenario is fundamental, and, since they might hit different underlying components, it is also interesting to measure the average response time or throughput for a specific operation type (for example, read, write, or search).
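The per-operation breakdown described above can be sketched as follows. This is an illustrative aggregation only: the operation names, sample durations, and measurement window are hypothetical, not figures from the Alfresco benchmark.

```python
from collections import defaultdict

def summarize(samples, window_seconds):
    """Average response time (ms) and throughput (ops/s) per operation type,
    computed from (operation, duration_ms) samples collected over a window."""
    buckets = defaultdict(list)
    for op, duration_ms in samples:
        buckets[op].append(duration_ms)
    return {
        op: {
            "avg_ms": sum(durations) / len(durations),
            "throughput_ops_s": len(durations) / window_seconds,
        }
        for op, durations in buckets.items()
    }

# Hypothetical samples gathered over a 10-second window
samples = [("read", 120), ("read", 80), ("search", 400), ("write", 250)]
stats = summarize(samples, window_seconds=10)
```

Splitting the averages by operation type in this way makes it easy to see which underlying component (database, index, content store) a degradation is likely to come from.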
Performance snapshots (measurements taken during one test run) are of interest for the benchmark; however, in the context of scalability, performance is typically measured across dimensions, evaluating the impact on performance caused by changes to these dimensions. For the scope of this document, using the Enterprise collaboration scenario, a possible definition of performance requirements could be:
Baseline performance: Alfresco should support an average response time below t0 milliseconds for a baseline of u0 concurrent users on a reference repository of c0 content items.

Performance scalability on the users dimension: Alfresco should support an average response time degradation Δt_u below a Δt_uMax threshold with respect to the baseline, upon a Δu growth on the users dimension.

Performance scalability on the content dimension: Alfresco should support an average response time degradation Δt_c below a Δt_cMax threshold percentage with respect to the baseline, upon a Δc growth on the content dimension.
Combined users/content dimensions comparative performance analysis is also possible, but it involves additional caveats and complexity. We discuss this in the following chapters, providing more specific definitions of the exact dimensions, data points, and thresholds used to prove Alfresco scalability.
Performance scalability across the aforementioned dimensions is the main set of results and metrics analyzed in the benchmarks and presented in this document.
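A sketch of how such requirement definitions can be turned into concrete checks; t0 and the degradation thresholds are placeholders to be filled with project-specific values, not numbers from the benchmark:

```python
def baseline_ok(avg_ms, t0_ms):
    """Baseline requirement: average response time must stay below t0."""
    return avg_ms < t0_ms

def degradation_pct(baseline_ms, observed_ms):
    """Response-time degradation relative to the baseline, as a percentage."""
    return (observed_ms - baseline_ms) / baseline_ms * 100.0

def scaling_ok(baseline_ms, observed_ms, max_degradation_pct):
    """Scalability on a dimension (users or content): after growing that
    dimension, the degradation must stay below the agreed threshold."""
    return degradation_pct(baseline_ms, observed_ms) <= max_degradation_pct

# Hypothetical numbers: a 200 ms baseline, 230 ms observed after growing
# the users dimension, and a 20% maximum allowed degradation.
print(baseline_ok(200, t0_ms=500))                    # True
print(degradation_pct(200, 230))                      # 15.0
print(scaling_ok(200, 230, max_degradation_pct=20))   # True
```

The same degradation check applies unchanged to the content dimension; only the baseline and the threshold differ.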
Load distribution

Another fundamental aspect to take into account while defining a scalable architecture is how well incoming workload is distributed between the different components of the architecture, in order to provide a balanced framework that can be designed to scale arbitrarily.
While concepts like load balancing and distribution are even more fundamental in cloud deployments17,18, they represent a key factor of a successful on-premise Alfresco ECM implementation. There are two fundamental reasons to leverage a modular platform that can distribute the load:
As explained, load requirements on Alfresco deployments might vary considerably with increased user adoption or with an increase in the number of ECM processes based on it. A platform that is flexible and can reactively redistribute load will be able to cope with these variations with minimal impact.

Cost optimizations can be identified through a better spread of the Alfresco load across tiers and cluster instances, instead of having to rely on pure vertical scaling alone to meet increased load requirements on your Alfresco solution.
Alfresco has always offered a great deal of flexibility and modularity on architectural deployments, ranging from supporting different stack components to remote independent logical tiers able to spread the overall platform load across multiple physical machines.
Together with performance, distribution of the load upon growth across the different dimensions will be discussed in the next chapters, based on the benchmark results.
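As a generic illustration of load distribution with session affinity (this is not Alfresco's own mechanism, which is typically delegated to a front-end load balancer), a sticky-routing scheme can be sketched by hashing a session identifier onto a fixed set of cluster nodes; the node names below are hypothetical:

```python
import hashlib

NODES = ["alfresco-node-1", "alfresco-node-2", "alfresco-node-3"]  # hypothetical

def route(session_id, nodes=NODES):
    """Sticky routing: hash the session id to pick a node deterministically,
    so repeated requests from the same session land on the same node."""
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# The same session is always routed to the same node
assert route("user-42-session") == route("user-42-session")
```

In a real deployment, equivalent stickiness is usually configured at the load balancer (for example, via session cookies) rather than in application code.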
Availability

Availability is another factor determining the scalability of an ECM system. Alfresco offers core product answers to typical availability requirements, such as:
17 https://devcentral.f5.com/weblogs/macvittie/archive/2009/01/23/load-balancing-is-key-to-successful-cloud-based-dynamic-architectures.aspx
18 http://gojko.net/2010/01/25/designing-applications-for-cloud-deployment/
High availability: The system should be functional and running for X% of the time (where typically X is close to 100%). This is especially important in global deployments where the system should be up and running 24*7, or in projects characterized by frequent release cycles for which no maintenance windows are provided.
Avoiding single points of failure: This is a common requirement when building resilient architectures, which can support mission critical processes. As explained below, Alfresco components offer horizontal scalability features to respond to this requirement at the application level and integrate with lower layers of horizontal scaling.
Hot Backup and Disaster recovery site: This is defined as the ability to design architectures that allow batch data backup operations and – in mission-critical cases where single points of failure have not been actively avoided – to keep a passive, remotely-synchronized instance, without any noticeable degradation in application serviceability. Alfresco supports hot backup19 and advanced DR architectures.
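The X% availability requirement above translates directly into a downtime budget. As a worked example (the percentages are illustrative, not Alfresco targets):

```python
def downtime_budget_hours(availability_pct, period_hours=365 * 24):
    """Maximum allowed downtime (in hours) over a period for a given
    availability percentage."""
    return period_hours * (1 - availability_pct / 100.0)

# 99.9% availability over a year allows roughly 8.76 hours of downtime
print(round(downtime_budget_hours(99.9), 2))  # 8.76
```

Agreeing on this budget up front makes it clear whether maintenance windows fit inside the requirement or whether rolling upgrades and clustering are mandatory.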
Alfresco, since version 1.4, has supported application level clustering to provide a strong degree of horizontal scalability to the application, along with the more intuitive concept of vertical scalability (providing more processing power to a maxed out system will definitely raise the throughput). In general, Alfresco responds to availability scalability requirements with the following features:
Application Level Clustering via JGroups/Ehcache and index tracking
Mostly transparent integration with Application Server clustering on supported application servers
Support for HA and failover solutions on underlying components: for example, Oracle RAC (active/active) or MySQL Cluster (active/passive) for the database, as well as snapshot-enabled disks for the content store
Since Alfresco 3, Alfresco Share is the default remote, independently scalable Web Tier to support Enterprise wide ECM Platform deployments
With Alfresco 4 - and the introduction of a fully externalized indexing tier based on Apache Solr (Alfresco patched) - a new range of architectural options becomes available for deploying Alfresco in HA mode, avoiding single points of failure and transparently scaling the indexing tier in a dedicated manner.
We will discuss, especially in the “Alfresco 4.x new scalability frontiers” section, how Alfresco 4 can be leveraged to address availability requirements. However, availability is not the main focus of this scalability document, even though the benchmark architecture has high availability features, such as Alfresco clustering.
A note on stability
Application stability (or availability) is clearly distinguished from the concept of performance and is, therefore, treated separately. An unstable system at the core resource level will probably result in unsatisfactory performance and a poor user experience. The following issues are typical signs of a system that is overloaded,
19 http://docs.alfresco.com/4.0/index.jsp?topic=%2Fcom.alfresco.enterprise.doc%2Ftasks%2Fbackup-hot.html
undersized, or not properly tuned, and likely to exhibit resource leaks or bottlenecks:
System close to Out of Memory
Processes running out of CPU
OS running out of file handles or I/O bandwidth
Frequent crashes due to OS general failures
In these cases, the suggestion is to tackle stability/availability issues before performance fine-tuning, concentrating on reaching a predictable and stable system to use as a baseline for performance and scalability improvements to the original system.
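On Linux, some of these warning signs can be inspected programmatically. The sketch below checks file-descriptor usage against the process soft limit; it is Unix-only, and the 80% warning threshold is an arbitrary illustrative choice:

```python
import os
import resource

def fd_headroom():
    """Return (used, soft_limit) file descriptors for the current process.
    A process approaching its soft limit is a typical sign of an undersized
    or leaking system."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        used = len(os.listdir("/proc/self/fd"))  # Linux-specific
    except FileNotFoundError:
        used = -1  # /proc not available (for example, on macOS)
    return used, soft

used, soft = fd_headroom()
if used >= 0 and used / soft > 0.8:
    print("WARNING: file descriptor usage above 80% of the soft limit")
```

Similar checks on heap usage, CPU saturation, and I/O wait belong in regular monitoring, so stability problems surface before they show up as poor response times.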
Out of scope performance scalability factors

At least two more performance dimensions are considered while studying Alfresco scalability in the context of an Enterprise Collaboration platform deployment:
Geographic Performance scalability defined as: The average response time or throughput of the application is independent of, or negligibly impacted by, the geographic location of the user. This is especially important for global deployments in which an Alfresco “logical content tier” can be deployed across multiple physical Alfresco instances (for example, using the Content Transfer and Replication services).
Vertical Performance scalability defined as: The average response time or throughput is positively impacted by scaling up resources in one or more components of the infrastructure (typically adding CPU or RAM in one of the tiers). Typically, this type of scalability is better studied in a virtual environment where resources can be added seamlessly to a virtual machine. However, virtual environments have their own specific limitations with regards to performance.
Unfortunately, these types of scalability dimensions are very difficult to reproduce and prove in a single benchmark lab environment. Because of this, these scalability dimensions were left out of the analysis and are out of scope for this document.
Alfresco 3 architectures scalability gotchas
As discussed in the section Availability, Alfresco 3 already offers radical answers to scalability, including clustering/load balancing of the Alfresco repository and Share tiers, as well as a more sensible distribution of operation dependencies between database and indexes. This section refers to "Scale your Alfresco Solutions" for a comprehensive discussion of the main HA points.
There are a few areas where Alfresco 3 presents some limitations to the indefinite theoretical scalability of the solutions built on top of it. This is especially evident in systems that are not properly tuned or that are undersized.
To understand how Alfresco 4 overcomes these potential bottlenecks, it is important to provide a high-level view of the Alfresco 3 shortcomings around scalability. As a reference for a typical Alfresco 3 multi-layered large-scale architecture, we will use the architecture presented in Figure 3 Alfresco 3 typical high scale architecture. We will not focus on the details of this architecture, as the focus of this document is on Alfresco 4 architectures.
Figure 3 Alfresco 3 typical high scale architecture
In process (or in transaction) content and metadata indexing
In Alfresco 3, by default, all metadata (and possibly some content) is indexed during the create/update transaction ("In-Transaction Indexing").
Figure 4 Alfresco 3 - Internal Node Creation process
This model guarantees that the database and search indexes are kept in synch, which in turn allows developers to reliably use database- and search-based APIs side by side. For example, the NodeService API is database-based, while the SearchService API relies on the Lucene indexes. However, this model poses some important questions about system scalability:
Performance:
o Performance might be adversely impacted by growth in the user concurrency dimension, especially when performing write operations.
o The in-transaction indexing model involves I/O pressure and contention on the index which is proportional to the number of concurrent users, because of the synchronous attempts to access the low-level file system indexing segments.
o Search and write operations contend for the same index in a transactional fashion. This can create performance bottlenecks.
Load distribution:
o Apart from the normal content repository operations, Alfresco uses resources and transaction time updating Lucene indexes for every write transaction. This impacts the read and search requests that are being issued against the server.
o Since there is very little separation between the different concerns of content storage and indexing, the load is completely concentrated on the Alfresco repository.
Lucene index tracked per node and Ehcache replicated independently
All Alfresco HA clustering models require the two following processes to be carried out independently by the Alfresco repository tiers (see Figure 5 Alfresco 3 Shared Repository Cluster):
Pull Lucene Index Tracking: Each Alfresco instance keeps its own copy of the Lucene indexes and the Index sync (or tracking) process has to be enabled in every cluster node to keep the indexes up to date with the transactions added by the different nodes of the cluster in the shared database.
Push Ehcache replication: Each Alfresco instance will broadcast L2 cache invalidation and update messages to the other members of the cluster upon completion of a transaction (see Figure 4 Alfresco 3 - Internal Node Creation process), informing them of the changes required to prevent stale cache reads. This process runs frequently and keeps the cluster node caches in sync.
This approach has the potential for consistency issues between the DB, the caches, and the indexes.
Figure 5 Alfresco 3 Shared Repository Cluster
For the purpose of this document, the following scalability limiting factors can be identified in this approach:
Performance:
o Index maintenance (for example, Index Merging20) processes could have an impact on performance, especially during concurrent activity peaks.
o PATH queries – especially using //* wildcards – are very slow and resource intensive, limiting the use of hierarchical queries for high concurrent usage.
Load distribution:
o Index tracking and maintenance work has to be carried out by each and every Alfresco front-end node. Each node also has to cope with front-end user interactions, and there is no way of scaling out this set of operations (scaling up is the most common solution).
o The index-related load (tracking and maintenance) is proportional to the number of nodes in the cluster, with increased access to the DB and content stores.
ACL post query permission checking
Search is extensively used when Alfresco is deployed as a collaboration platform. In large global deployments, corporate I&AM systems are integrated with Alfresco, synchronizing large numbers of users and groups and mapping them in an arbitrarily complex matrix of hierarchical permissions on Alfresco spaces and sites. In these contexts, one of the most stressed Alfresco components is the one delegated to filter
20 http://wiki.alfresco.com/wiki/Index_Merging_Performance
search results according to the current user’s permissions: that is, evaluation of hierarchical ACLs against search results.
As shown in Figure 6 Alfresco 3 - Internal query process, permission checking is carried out after Lucene has generated the result set for the search.
Figure 6 Alfresco 3 - Internal query process
This model involves an additional iteration across all rows of the result set after selection, a process which, being resource and time intensive, is controlled by the following timeout parameters:
system.acl.maxPermissionChecks
system.acl.maxPermissionCheckTimeMillis
These two parameters respectively limit the number of results that undergo permission checking and the total time that is spent on the permission-checking phase. Therefore, it is possible that one search will not return all the results visible to the user if the permission-checking limits are hit. Also, permission-checking results are cached, so successive executions of the same search might return an increasing number of results that could be permission checked. This type of effect is typically seen when the repository content nodes are in the order of millions.
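The trimming behavior described above can be sketched in a few lines. This is an illustrative Python sketch of the semantics of the two system.acl.* limits, not Alfresco source code; `has_permission` is a hypothetical stand-in for the real hierarchical ACL evaluation.

```python
import time

def trim_results(raw_hits, has_permission,
                 max_permission_checks=1000,
                 max_permission_check_time_millis=10000):
    """Return only the hits the user may see, stopping early when either
    the check-count limit or the time budget is exhausted (so the result
    set may be incomplete, as discussed above)."""
    visible = []
    deadline = time.monotonic() + max_permission_check_time_millis / 1000.0
    for checks, hit in enumerate(raw_hits, start=1):
        if checks > max_permission_checks or time.monotonic() > deadline:
            break  # remaining hits are silently dropped from the result set
        if has_permission(hit):
            visible.append(hit)
    return visible

# With 5000 candidate hits but a limit of 1000 checks, at most 1000 rows
# are ever evaluated, so a large result set is returned only partially.
hits = list(range(5000))
print(len(trim_results(hits, lambda h: h % 2 == 0)))  # 500: only the first 1000 hits were checked
```

Raising the limits restores completeness at the cost of a longer permission-checking phase, which is exactly the performance/consistency trade-off discussed below.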
Putting aside for a moment the pure consistency implications, from a scalability standpoint, we can identify potential bottlenecks in:
Performance:
o Raising the system.acl.* properties might overcome the consistency implications, allowing more time to check permissions and potentially returning a constant number of results. On the other hand, performance is traded for consistency: the performance cost increases as the repository grows in size and the ACLs grow in complexity.
Load distribution:
o The search ACL checking load is an additional responsibility consolidated in the repository, inherently limiting the chances of scaling out such a complex operation.
Non-HTTP based Virtual File systems not available in HA mode
While all the Alfresco components can be deployed in HA mode with respect to the several UIs provided to end users, in Alfresco 3 there was a limitation on scaling virtual file systems horizontally. Virtual file systems are a powerful Alfresco feature that allows you to mount the Alfresco repository as a WebDAV, FTP, CIFS, or NFS shared folder.
Whereas WebDAV (HTTP based and relying on standard Alfresco servlets, caching, and clustering rules) has always been clusterable, the FTP, CIFS, and NFS virtual file system interfaces could not be clustered in Alfresco 3: this was mainly due to the presence of additional caching layers and limitations in handling intra-cluster locking operations.
There are obviously implications on the degree of scalability that could be reached:
Performance:
o Performance of virtual file systems could be adversely impacted by high concurrent load.
Load distribution:
o In the case of high usage of the virtual file systems, it is easy to overload a clustered production installation. All the HTTP traffic can be load balanced, while all the virtual file server traffic must be directed to a single node.
o HA architectures using this approach are by definition unbalanced clusters, as all virtual file server traffic is redirected to only one member of the cluster.
Availability:
o Virtual file servers become a single point of failure.
Alfresco 4.x new scalability frontiers
Alfresco 4 introduced a few key design and architecture innovations that were targeted at simplifying the deployment of large-scale solutions. Since these new deployment options are key to understanding the architecture of the benchmark, they are highlighted in this section.
In particular, we will see how Alfresco 4 addresses the scalability limitations presented in the section Alfresco 3 architectures scalability gotchas and opens new frontiers towards even larger scale deployments.
Apache Solr indexing tier
From an architecture and scalability standpoint, the main innovation in Alfresco 4 is the definition of an external, independent indexing tier, implemented by integrating the mature enterprise search engine Apache Solr21.
Index subsystem design
Solr is itself based on a more recent version of Lucene, but exposes a remotely addressable REST API to allow remote querying, update, and administration of the underlying Lucene indexes. Solr supports large-scale, search-intensive enterprise deployments, and this brings immediate scalability benefits to the Alfresco indexing components, whose responsibilities can now be fully delegated to an independent (possibly remote) subsystem like Solr.

21 http://lucene.apache.org/solr/

Accordingly, we can identify the design/architecture innovation in Alfresco 4 as being:
The full decoupling of Alfresco core and UI functionality from an embedded indexing system, encapsulating indexing features in a fully independent, and independently scalable, indexing tier based on Solr.
In Alfresco terms: indexing and search capabilities have been refactored into a cohesive software module – the 'Index subsystem'22 – which is now clearly focused on providing search functions to Alfresco users and client applications. Alfresco 4 does not depend on an indexing and search engine for the correct behavior of most of its services; Alfresco now relies on the DBMS to ensure the essential functionality of the different UIs.
The allowed values for the property index.subsystem.name are:
lucene: Alfresco will continue to use an instance of Lucene embedded in Alfresco, so all the considerations around Alfresco 3 apply.
solr: Enables the Solr integration.
noindex: Configures Alfresco not to use any indexing system. In this case, obviously, search requests from Alfresco Explorer, Share, or the HTTP APIs will not return any results.
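As a minimal configuration sketch, the subsystem is selected via this property in the Alfresco global configuration (typically alfresco-global.properties). The Solr connection settings shown here are illustrative assumptions; verify the exact property names for your version against the Alfresco documentation:

```properties
# Select the index subsystem: lucene | solr | noindex
index.subsystem.name=solr

# Illustrative Solr connection settings (check exact names for your version)
solr.host=localhost
solr.port=8080
```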
The introduction of an independent subsystem in the scalability context offers immediate advantages like:
Distribution of core repository and indexing loads on separate tiers: This can potentially enable very distributed deployments and scalable architectures, as we will see, by using load balancing and caching techniques.
Independent vertical and horizontal scaling of the repository and the index tier
Separation of the front-end query load from the indexing load: This will become clearer in the next section, when we will discuss the concept of eventual indexing (eventual consistency).
This new approach to indexing involves a number of deep changes in the way Alfresco manages the lifecycle of the indexes and interacts with the index subsystem. This refactoring and its effects are key to understanding how the load is distributed between tiers in Alfresco 4 and, therefore, how that impacts the independent scaling of the different tiers. We provide an introduction to these key scalability facts in the following sections.
Transactional consistency vs. eventual consistency
Maintaining a transactional indexing model (like the Alfresco 3 model described in the section In process (or in transaction) content and metadata indexing) in conjunction with an external indexing tier (involving distributed transactions and network latency) would have posed strong risks to the stability of the platform. Therefore, in order to overcome the scalability limitations of in-transaction indexing, Alfresco 4 introduces the concept of eventual consistency.
22 http://wiki.alfresco.com/wiki/Alfresco_And_SOLR#Configuring_Alfresco
Figure 7 Alfresco 4 - Node creation internal process
In Figure 7 Alfresco 4 - Node creation internal process, we show how the node creation process has been refactored in Alfresco 4 not to include any in-transaction indexing operation. Therefore, the Alfresco repository can focus on performing core content management operations, delegating the indexing duties to a pull index tracking scheduled job running in the Solr tier23.
In other words, Alfresco 4 removes the requirement to have the database and indexes in perfect synch at any given time and relies on an index that gets updated on a configurable interval (default: 15s) by Solr itself. The index tracker will take care of polling Alfresco for new transactions and will proceed to update its index, much as an Alfresco 3 cluster node tracks updates into its own copy of the Lucene indexes. In this sense, indexes will eventually be consistent with the database. Alfresco implementers using the Search API24 should be aware of the following:
Implementers should assume that the indexes will not be consistent with the database at any given time, since the index update delay might depend on the actual load on the repository/index tiers.
Note that Alfresco 4 does not rely on the index for its core functionality. Wherever transactional consistency was required (authentication, Document Library, bootstrap, check-in/out, etc.), the use of the Search API in Alfresco 3 has been refactored to use database queries.
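The pull tracking model behind eventual consistency can be sketched as follows. This is a conceptual Python sketch, not the real tracker code: `fetch_transactions_since` and `index_nodes` are hypothetical stand-ins for the tracking REST calls and the local Lucene index update.

```python
import time

TRACKING_INTERVAL_SECONDS = 15  # default tracking interval in Alfresco 4

def poll_once(fetch_transactions_since, index_nodes, last_txn_id):
    """One tracking cycle: fetch the transactions committed after the last
    indexed one, index their nodes, and return the new high-water mark."""
    for txn in fetch_transactions_since(last_txn_id):
        index_nodes(txn["nodes"])   # update the local index with the changed nodes
        last_txn_id = txn["id"]     # remember progress for the next poll
    return last_txn_id

def track(fetch_transactions_since, index_nodes, last_txn_id=0):
    """Run forever: the index lags the database by roughly one polling
    interval plus indexing time, i.e. it is *eventually* consistent."""
    while True:
        last_txn_id = poll_once(fetch_transactions_since, index_nodes, last_txn_id)
        time.sleep(TRACKING_INTERVAL_SECONDS)
```

The key design point is that the repository never waits for this loop: writes commit against the database alone, and the tracker catches up on its own schedule.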
In-query ACL checks
Another key Alfresco 4 scalability redesign is to delegate permission checking of search query results to Solr, when the index subsystem is configured to use it (note that if the embedded Lucene implementation is in use, the Alfresco 3 limitations around ACL performance remain). Leveraging the built-in Solr concept of filter queries25, Alfresco also provides user and group membership information over the wire so that Solr, which holds the document permission information, can return only results that are actually visible to the currently logged-in user.
23 http://wiki.alfresco.com/wiki/Alfresco_And_SOLR#Tracking
24 https://wiki.alfresco.com/wiki/Search#The_Search_API
25 http://wiki.apache.org/solr/CommonQueryParameters#fq
Figure 8 Alfresco - Search query internal process provides a graphical representation of the refactored flow of operations:
Figure 8 Alfresco - Search query internal process
From a load distribution point of view, ACL checks are removed from the Alfresco repository tier and distributed to the index with important positive effects, as we will see, on the overall Alfresco user experience. In particular, search results will not be trimmed by any ACL-checking time constraints.
Solr needs to have knowledge of authorization information that is created and modified in Alfresco; therefore, the aforementioned index tracker will have to gather this information together with the content and model updates.
Alfresco and Solr interaction overview
In the last sections we have presented the new repository/index interaction pattern of Alfresco 4. This section describes how this is practically implemented in terms of architectural interaction between the Content and Index (Solr) tiers, which need to exchange content-related information in a number of use cases.
Figure 9 Alfresco <-> Solr interaction diagram provides an overview of the remote integration between the two components. This diagram covers two main use cases:
Alfresco repository using Solr as an implementation of the Search API: that is, routing its search requests to Solr and retrieving results
1. Alfresco issues a REST request to Solr, specifying the query and user/group membership information via request parameters and a JSON request body.
2. Solr retrieves the hits, checks their ACL, and returns them in JSON format.
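The search use case above can be sketched as follows. This is an illustrative Python sketch, not the actual wire protocol: the endpoint path, the AUTHORITY field name, and the parameter layout are simplified assumptions (the real request also carries a JSON body), but it shows the idea of shipping the user's authorities as a Solr filter query (fq) so that permission filtering happens inside the index.

```python
from urllib.parse import urlencode

def build_solr_search_url(base_url, query, user, groups):
    """Build a hypothetical Solr search URL that restricts results to
    documents readable by any of the user's authorities."""
    authorities = [user] + list(groups) + ["GROUP_EVERYONE"]
    params = {
        "q": query,
        # filter query: only documents whose ACLs grant one of these authorities
        "fq": "AUTHORITY:(%s)" % " OR ".join('"%s"' % a for a in authorities),
        "wt": "json",  # ask Solr for a JSON response
    }
    return "%s/alfresco/afts?%s" % (base_url, urlencode(params))

url = build_solr_search_url("http://solr:8080/solr", "TEXT:contract",
                            "jdoe", ["GROUP_sales"])
print(url)
```

Because the filter is applied during result selection, no post-query trimming pass is needed and no permission-check timeout can truncate the result set.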
Figure 9 Alfresco <-> Solr interaction diagram
Solr periodically polls Alfresco and tracks index updates:
1. Solr issues a scheduled REST request to Alfresco every 15s (by default) to get index updates, specifying the last indexed transaction.
2. Alfresco returns index updates in JSON format concerning:
a. Node Properties and Content updated since the last index update
b. Content Model changes since the last index update, in order for Solr to start indexing according to new/changed types, properties, aspects, and so on
c. Content ACLs modified since the last index update so that Solr can keep them indexed and use them to filter authenticated searches from Alfresco
As you might notice from the picture, the mapping of Alfresco stores (for example, Workspace, Archive) is very straightforward: each store is mapped onto a separate core in Solr and can be configured and tuned separately26.
Thanks to the design changes described in the last sections, namely the consolidation of the indexing tier on Solr, the scalability of the platform is greatly increased: it is now possible to scale out and load balance the indexing tier independently from the content tier. By introducing a load balancer layer between the Content and the Indexing tiers, it is possible to balance requests between multiple physical instances of Alfresco and Solr, in both directions.
The architecture presented in Figure 10 Alfresco 4 Solr Load Balanced architecture shows that this could be useful in two use cases:
1. Alfresco -> Solr search load balancing
This is the most obvious use case for scalability purposes. Search requests are directed through Alfresco to a pool of Solr instances, each of which
26 http://wiki.alfresco.com/wiki/Alfresco_And_SOLR
contains a full copy of the index and is able to service requests in a purely stateless fashion.
2. Solr -> Alfresco index tracking balancing
In the other direction, Solr nodes could use a load balancer to redirect their index tracking requests to one or more dedicated/shared Alfresco nodes. This could be useful in the case of a large indexing load, due to a heavy concurrent write/update scenario.
Figure 10 Alfresco 4 Solr Load Balanced architecture
NOTE: An evolution of the scenario in use case 2 was actually used in the benchmarks, in which each Solr node had a dedicated Alfresco instance installed in the same application server (in process) to speed up and evenly distribute the index tracking load. This has proven to be a very beneficial deployment model, as we will discuss in the next sections.
The scalability of the indexing tier is at the base of most considerations around benchmark results in the following chapter. However, for completeness, the following sections provide a brief overview of the other scalability features of Alfresco 4. Although these features are not strictly covered in the benchmarks and are, therefore, not the primary objective of this document, it is important to understand them as the pillars on which to build scalable and reliable Alfresco architectures.
Clustered file system interfaces
As discussed in the section Non-HTTP based Virtual File systems not available in HA mode, Alfresco 3 did not allow clustering of file system interfaces like CIFS and FTP.
Alfresco 4 solved this shortcoming with a strong refactoring of the file system-related code and the introduction of a new caching mechanism (Hazelcast27). It is now supported28 to run CIFS and FTP in clustered mode, balancing requests across multiple Alfresco nodes.

27 http://www.hazelcast.com/
This allows the deployment of Alfresco solutions that scale transparently with respect to the interface in use, and it enables a more general, UI-independent definition of performance requirements, based on the type of operations performed on the platform.
Alfresco Transformation Server
The Alfresco Transformation Server29 is a component available as of Alfresco 3.4. It was integrated with Alfresco to provide increased fidelity, reliability, and potential scalability of the transformation tier. This Windows-based (Microsoft Office) application accepts requests for content transformations from the Alfresco content tier. If your scenario is strongly oriented towards document transformations (especially of Microsoft Office documents), you should consider this deployment option. Please contact Alfresco Support or your local Solution Engineer for further information about deployment and pricing of this component.
A note on the Alfresco Cloud service
In the last sections, we discussed how the Alfresco 4 Enterprise version delivers a revolutionary approach to ECM scalability for on-premise enterprise deployments. On a parallel track, Alfresco has recently launched a cloud-hosted service at http://my.alfresco.com.
The Alfresco Cloud service vision is to support consumer-oriented and Enterprise collaboration internally and across the firewall in a controlled and harmonized fashion by providing extreme scalability and a hybrid cloud model (with synchronization between private and public cloud, among other features). In other words, the Alfresco Cloud supports and services the quickly growing complex interconnected network of private, public, mobile, and social content that we identify as cloud-connected content. Figure 11 Cloud Connected Content roadmap presents a target roadmap for the cloud service and Figure 12 Alfresco Cloud Connected Content as a service shows the wide potential of cloud-connected content applications that could be built against the Alfresco cloud service in a pure content as a service style.
Figure 11 Cloud Connected Content roadmap
28 As of version ???. Please check with Alfresco Support for exact support boundaries.
29 http://docs.alfresco.com/4.0/topic/com.alfresco.enterprise.doc/concepts/transerv-intro.html
Although this is an exciting new horizon in the deployment of ubiquitous, super-scale Alfresco solutions, built on virtually infinite cloud resources and space-based architectures30, the cloud initiative is also, to a certain extent, driving the Alfresco roadmap with interesting features that will be ported back into Alfresco Enterprise, especially to support large on-premise deployments.
Figure 12 Alfresco Cloud Connected Content as a service
There are two main product evolutions currently under development for the cloud services that are of specific interest to scalability. These could make it into Alfresco Enterprise in later releases: Transparent Multi-Tenancy and Index Sharding.
Transparent multi-tenancy
Alfresco multi-tenancy has been strongly revamped and improved for the Cloud service in order to allow, among others, the following features:
Cross-user authentication between different tenants (or communities, as they are called in the cloud service)
Import/export/location independence of a specific tenant
These improvements would open new frontiers in terms of repository partitioning and geographical distribution, as well as for private cloud deployment scalability.
Index sharding
Currently, in Alfresco 4 each Solr instance deployed in load balancing mode has to keep a full copy of the indexes. Scaling out the Solr tier improves load balancing and, therefore, overall system stability, but cannot by itself improve the performance of a single search, which is in any case executed against a potentially large full index present on each Solr node.
One solution to this issue would be to separate the index into Solr shards: different subsets of the index deployed on separate Solr cores (potentially on different physical machines), which could then execute a user search in parallel, with a clear performance boost due to parallelization.
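The parallelization benefit can be illustrated generically. This Python sketch shows the scatter/gather idea behind distributed search (fan out a query to N smaller shard indexes, then merge the partial hits by score); it mirrors Solr's distributed search model in spirit only and, as the source stresses, is not a supported Alfresco Enterprise configuration. `search_one` is a hypothetical per-shard query function.

```python
from concurrent.futures import ThreadPoolExecutor

def search_sharded(shards, query, search_one, top_n=10):
    """Query every shard concurrently and merge results by descending score.
    Each shard holds only a subset of the index, so per-shard latency is
    lower than querying one large full index."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda shard: search_one(shard, query), shards)
        hits = [hit for part in partials for hit in part]
    # gather phase: keep the globally best-scoring hits
    return sorted(hits, key=lambda h: h["score"], reverse=True)[:top_n]
```

The overall latency approaches that of the slowest shard plus the merge, rather than the cost of scanning the full index.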
30 http://en.wikipedia.org/wiki/Space_based_architecture
Even though Apache Solr supports a distributed search model31, Solr index sharding is currently NOT supported in Alfresco Enterprise.
The requirements for highly scalable, high-performance search are clear, and Alfresco is working in this area to fully support an index sharding solution.
31 http://wiki.apache.org/solr/DistributedSearch
The Alfresco 4 Benchmarks
This chapter introduces the benchmark projects underway and completed by Alfresco on the Alfresco 4 release. Amongst the different benchmark practices, we will mostly focus on scalability benchmarks and Solr versus Lucene comparative benchmarks, providing details about the lab architecture used for running the benchmarks, the technologies used, and the implemented test scenario. Finally, we will present the raw results for the most relevant benchmarks as input for the scalability analysis presented in the next chapter.
Benchmark types
A full definition of the different potential benchmark types and practices is out of the scope of this document. To fully explain which benchmarks have been run and their characteristics/limitations, we will provide a brief ECM- and Alfresco-oriented classification of the potential benchmarks. For every benchmark type, an indication will be given on whether it has been run, is planned, or is out of the scope of the Alfresco benchmark project.
For the purpose of this document, there are three benchmark types:
1. Scalability Benchmarks
2. Comparative Benchmarks
3. Tuning/Optimization benchmarks
Scalability Benchmarks
The main objectives of a scalability benchmark are:
Study the evolution of system performance and resource usage by testing against different points in the scalability space, defined by typically growing values across one or more scalability dimensions
Prove the stability and provide an idea of response times when using Alfresco on a very high-end large scale deployment scenario, especially if scalability points are sparse (that is, growth on selected dimensions is massive)
The game rules for a scalability benchmark typically involve:
A fixed benchmark scenario to be run against the different data points
A fixed configuration/tuning of Alfresco and the underlying components to compare performance on like-for-like conditions between different test runs
A discrete number of scalability points to run the test against and record measurements and metrics around performance and resource usage
The following limitations apply to this type of benchmark:
System configuration and tuning is defined a priori and on a best effort basis, and it is not changed per data point. This means that results for each scalability point might be sub-optimal. This is not an issue in the scope of these benchmarks, as the main target is to study performance on a statistical basis between the different data points.
Scalability benchmarks were the initial focus of this Alfresco benchmark project and are, therefore, the main subject of this document.
See the section The Alfresco 4 Scalability benchmark for a detailed description of the scalability benchmark we ran; see the chapter Error: Reference source not found for a full analysis of the results of this benchmark.
Comparative Benchmarks
Although there are a number of potential combinations in a comparative benchmark, typically this type of benchmark has the common objective of comparing platform performance in different conditions, normally using different software, infrastructure components, architecture, or deployment configuration.
The game rules that apply for a comparative benchmark involve:
A fixed benchmark scenario to be run against the different configurations
A variable configuration/tuning of Alfresco and the underlying components to compare performance in different conditions between different test runs
One or a limited number of meaningful scalability points to run the test against, as well as recording measurements and metrics around performance and resource usage in the different configurations to compare
Comparative benchmarks were not the main focus of this Alfresco benchmark project. In order to reinforce the considerations from the previous chapter around index tier scalability, we ran and will present relevant results of an index tier comparative benchmark.
In the section Error: Reference source not found we present the results of this benchmark, which involved running a well-defined scenario (the same used for scalability benchmarks) against Alfresco 4 using the embedded Lucene and the Solr subsystem respectively.
System boundary discovery benchmarks
The main objectives of a system boundaries discovery benchmark are:
Discover the physical scalability limits of a well-defined, optimized Alfresco deployment on a well-defined scenario on very high scalability points
Identify the optimal Alfresco and underlying component tuning for the tested data points
Extrapolate exact sizing information about the maximum load supportable by a given Alfresco deployment
The game rules that apply for a system boundaries discovery benchmark involve:
A fixed benchmark scenario to be run multiple times against different platform tunings
A variable tuning (often not architecture) of Alfresco and the underlying components to compare with previous test runs and offer better performance/stability against growing data points
Incrementally higher scalability points to run the tests against, to increasingly stress the system and test different tuning configurations, up to the point that the system's physical boundaries are reached and no further tuning is found to improve the performance or stability of the system
System boundaries discovery benchmarks were not the main focus of this Alfresco benchmark project, so there is no exact detail for optimal tuning in each and every scalability point.
At the time of this writing, a benchmark effort is ongoing to discover the system boundaries of Alfresco on a well-defined scenario, similar to the one presented in the following sections. Results of this new benchmark effort might be added to newer versions of this document or published separately. Please check with Alfresco Support for further information.
Alfresco 4 Benchmark details
In the following sections we will provide the results for two types of benchmarks: the Alfresco 4 Scalability benchmarks and the Alfresco 4 Lucene vs. Solr Comparative benchmark.
First we will explain the common details between the two benchmarks. They share most of the definition aspects, such as testing scenario, scalability dimensions in which scalability/data points were defined, the lab hardware on which they were run, and some common architectural characteristics.
Scenario
The scenario that was tested is the Enterprise collaboration scenario, based on the Share user interface. The scenario was implemented using the Apache JMeter32 technology and the script is available in the Alfresco public SVN33.
Running the scenario involves defining a configurable number of JMeter threads (virtual users) that will in turn simulate either a read-only or a read/write user session. The percentage of read-only vs. read/write cycles that are executed is also configurable. For full installation and configuration steps of the suite, refer to the suite README34.
The main rationale behind this scenario is in line with the common objective of the benchmark: to demonstrate Alfresco 4 scalability limits, especially in the areas where earlier versions would present scalability boundaries, such as the index tier. In this sense, the scenario we are presenting is very search intensive and is mostly focused on document-oriented collaboration (that is, on the Document Library and Search functionalities of Alfresco Share).
In Table 1 Benchmark scenario user operations we provide an accurate description of the steps performed by the JMeter virtual users in the read-only and read/write cycles. Table 2 Benchmark search operations breakdown expands the Search test step into the exact types of search steps that are performed.

Table 1 Benchmark scenario user operations
READ ONLY | READ/WRITE
Login | Login
User dashboard | User dashboard
Site dashboard | Site dashboard
Site document library | Site document library
Search | Upload File
View Document Details | View Document Details
Logout | Logout

32 http://jmeter.apache.org/
33 http://svn.alfresco.com/repos/alfresco-open-mirror/benchmark/scripts/SHARE/share-0001/
34 http://svn.alfresco.com/repos/alfresco-open-mirror/benchmark/scripts/SHARE/share-0001/V4.0.0/readme.doc
SEARCH
Metadata search on current site
Full text search on current site
Metadata search on “All Sites”
Full text search on “All Sites”
Metadata search on “Full Repository”
Full text search on “Full Repository”
Table 2 Benchmark search operations breakdown

Based on these tables, we will formalize the exact number and type of operations performed by the virtual users following these assumptions:
Transactions are defined as high-level end-user operations and might involve multiple low-level HTTP or database requests (for example, dashlet or document library asynchronous requests)
Login and logout are out of the calculation scope
The Read Only flow has 4 read transactions and 6 search transactions, while the Read/Write flow has 4 read transactions and 1 write transaction.
With reference to the flows described previously, the Alfresco benchmarks used an 80% / 20% split between Read Only and Read/Write flows. This means that out of X users, 80% of X will execute Read Only flows and 20% will execute Read/Write flows.
The Enterprise collaboration scenario used in the benchmarks implements the following user transactions profile:

READ | SEARCH | WRITE
48% | 48% | 4%
Table 3 Benchmark scenario read/search/write split
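As a quick illustration of the 80/20 flow split described above, the following sketch (with an arbitrary user count) shows how virtual users divide between the two flows:

```python
def split_users(total_users, write_percentage=20):
    """Split virtual users between Read Only and Read/Write flows.

    Mirrors the benchmark's 80/20 split: write_percentage percent of the
    users run the Read/Write flow, the rest run the Read Only flow.
    """
    read_write = total_users * write_percentage // 100
    read_only = total_users - read_write
    return read_only, read_write

# For example, with 360 concurrent users (one of the benchmark data points):
print(split_users(360))  # → (288, 72)
```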
To calculate the actual load on the Alfresco platform, some additional details are essential to characterize the scenario and are, therefore, defined as mandatory input values for the JMeter test suite. The following table shows these details.
PARAMETER | DESCRIPTION | VALUE
think_time | Inter-transaction user wait time in milliseconds; defines the frequency of operations | 10000 (10s)
write_percentage | Defines the split between Read Only and Read/Write flows | 20
users_count | Number of threads/virtual users to simulate concurrent usage; this parameter was used to change the number of concurrent users hitting the platform | Depending on test run
full_loops_count | Full number of loops to be performed by a thread/virtual user; threads perform write_percentage Read/Write loops and (100 - write_percentage) Read Only loops | 1000
test_full_time | Test full execution time in seconds, including ramp up; if full_loops_count is not reached, threads are stopped gracefully (they wait to finish the current loop) | 7200 (2h)
ramp_up_time | Threads/virtual users ramp up time | 600 (10m)
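For instance, a run against one of the data points could be launched in JMeter non-GUI mode along these lines; the script filename and results path are assumptions, and the -J property names simply mirror the parameters in the table above (refer to the suite README for the authoritative steps):

```shell
# Hypothetical non-GUI JMeter invocation of the benchmark suite.
jmeter -n -t share-0001.jmx \
  -Jthink_time=10000 \
  -Jwrite_percentage=20 \
  -Jusers_count=360 \
  -Jfull_loops_count=1000 \
  -Jtest_full_time=7200 \
  -Jramp_up_time=600 \
  -l results-360-users.jtl
```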
Share asynchronous requests
With the scenario details provided, you should by now have an understanding of which tests were run and which Alfresco Share functionalities were tested.
However, one of the assumptions requires some additional information in order for you to understand how effective these tests are and the load they generate on Alfresco. We defined user transactions at a high level, ignoring in the calculations the multiple/nested asynchronous HTTP requests typically executed via AJAX by the Share UI upon user clicks.
While we define a coarse-grained analysis entity for the sake of simplifying interpretation of the results, we obviously had to replicate and execute such nested requests (or at least the main ones) to actually execute operations on the repository and simulate a realistic load on the Alfresco and Share tiers.
While a full 1-1 mapping of all the Share AJAX requests is beyond the scope of JMeter and load testing (functional HTTP testing frameworks work best in those scenarios), using a JMeter proxy35 we recorded the heaviest/most meaningful nested AJAX requests impacting performance and scalability of the platform, and we embedded those in the main user interaction test requests. In terms of measurements, in line with the stated assumptions, we generate only the main sample (time for main request + nested requests) rather than measuring the time of each single nested request.
In the following tables, we detail the nested requests in each test step to give an idea of the actual Share load simulation performed in the benchmark scenario.

USER DASHBOARD
User Dashboard main request
Dashlet “Recently Modified by Me”

SITE DASHBOARD
Site Dashboard main request
Dashlet “Recently Modified Docs”
Dashlet “Site Activities”

SITE DOCUMENT LIBRARY
Site Document Library main request
Site Tags (left sidebar)
Site Document Library data
Site Tree node (left sidebar)

SEARCH
Search main request
Load results YUI request

Despite not being a full simulation of each and every Share request, you can see from these tables that we have reconstructed a very large portion of the Share document-oriented collaboration pattern, which serves very well for the scalability/statistics purposes of this document.

Scenario limitations
While applying the results of this benchmark to real use cases, please keep in mind the following limitations:
Not all interfaces are being used: We only tested Share, so applying scalability considerations to other UIs should be done with care, or avoided altogether.
Document-oriented collaboration only: In Share we focused on document-oriented collaboration; therefore, scalability considerations on other Share features (for example, forums, wiki, and datalists) should be deferred to a future benchmark or a private dedicated load test.
Not the full AJAX interaction reproduced: Only the main AJAX Share requests were reproduced, so if scalability issues are experienced with other nested Alfresco functionalities, further investigation might be required.

35 http://jmeter.apache.org/usermanual/jmeter_proxy_step_by_step.pdf
Scalability dimensions
Also common to all Alfresco-run benchmarks was the selection of three scalability dimensions. These scalability dimensions define the space in which to identify the scalability points against which to run the Enterprise collaboration scenario:
Number of concurrent users
Content items in the repository
Number of Alfresco repository cluster nodes
As shown in Figure 13 Benchmark scalability dimensions, we identified scalability points (or data points in the standard testing jargon) in a three dimensional space defined by the three dimensions:
Figure 13 Benchmark scalability dimensions
Concurrent users
This is the most intuitive dimension to test any platform against: it involves running the defined scenario with a different, typically increasing, number of virtual threads simulating increasing user concurrency on the system.
As we will see in the specific benchmarks, we scaled this dimension up to 1100 concurrent users, creating 1100 users in the repository and 1100 associated Share sites, and then running the scenario with 1100 concurrent virtual threads. We will show how Alfresco performed in this scenario, proving scalability across the concurrent users dimension.
Content in repository
For an ECM platform, and especially looking at the huge scale implementations that are nowadays based on Alfresco, scalability across the content dimension is fundamental. Therefore, we loaded the Alfresco repository at different stages (up to 10 million documents) and ran the scenario against the loaded repository.
In order to define a realistic scenario, which would reproduce a potentially real use case while still stressing the underlying Alfresco components, we implemented the following repository loading strategies:
Content was bulk loaded using the Bulk File System Import tool (available in Google Code36 and integrated with Alfresco 4.037) prior to test runs and with no specific bulk loading tuning
Content was loaded in Share private sites to stress the ACL checking process, especially the Solr in-query results ACL checking feature
You can find some indicative results of the bulk loading process (up to 10 million documents) in the section Error: Reference source not found.
Additional details about the content loaded can help explain the exact scenario that was reproduced in the benchmarks:
Bulk loaded content had an average size of 250KB, a common office file size in collaboration environments
Loaded content had mixed MIME types, especially typical office document MIME types, such as Word (.docx), Excel (.xls), PowerPoint (.pptx), PDF, JPG, and plain text. This was done to simulate the typical load on the Alfresco transformation and indexing components in Enterprise collaboration environments.
One private site per user was created by the test setup procedures to:
a. Provide an area with appropriate permissions for virtual users to upload documents
b. Fragment the ACL matrix when performing local/global searches
c. Simulate an Enterprise collaboration environment with several Share sites
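To give a feel for this kind of corpus, the sketch below stages a tiny directory tree of roughly 250KB files such as a bulk import tool could ingest; all paths, counts, and file names here are illustrative only, not the actual benchmark data set:

```shell
# Illustrative staging only: create 5 files of ~250KB each under a site folder.
mkdir -p /tmp/bulk-staging/site-001/documents
for i in 1 2 3 4 5; do
  head -c 256000 /dev/urandom > "/tmp/bulk-staging/site-001/documents/doc-$i.bin"
done
ls /tmp/bulk-staging/site-001/documents
```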
Alfresco cluster nodes
The third dimension chosen for the scalability analysis was the number of Alfresco repository cluster nodes deployed in the architecture. Traditionally, Alfresco has supported horizontal scalability; testing against an increased number of Alfresco repository nodes is therefore a good way to scale out the platform and support larger numbers of concurrent users or larger content volumes in the repository.
One fundamental consideration must be covered on this third dimension. As already introduced, Alfresco 4 balances the content management and indexing loads between the repository and Solr tiers: as common sense suggests, scaling out the repository tier to handle additional load on the platform typically requires scaling out the Solr tier as well, if the scenario is search intensive enough.
While scaling out Solr was required during the benchmarks – and we will provide an indication of the data points for which it was required – Solr’s comparative scalability was not the main focus of this benchmark. In other words, we will provide information on how many Solr nodes were needed at each data point, but no comparison is available between the performance on the same scalability point against different Solr configurations.

36 http://code.google.com/p/alfresco-bulk-filesystem-import/
37 http://docs.alfresco.com/4.0/topic/com.alfresco.enterprise.doc/concepts/Bulk-Import-Tool.html
In this sense, while we will provide detailed setup information for platform scalability in a very search-intensive scenario, repository tier scalability – not index tier scalability – is the third scalability dimension used in the benchmarks.
Metrics
The metrics definition and collection process is also common to the benchmarks presented in this document. In this first version of the benchmarks, we collected the following metrics for every test run:
Figure 14 Responses over time JMeter plugins report
Availability Metrics
o Error rate per user operation/transaction (HTML reports)
o Aggregate error rate (HTML reports)
Performance Metrics
o Average response times per user operation/transaction (HTML reports)
o Aggregate average response times (HTML reports)
o Responses over time38 (and other JMeter plugins39 graphs); an example is shown in Figure 14 Responses over time JMeter plugins report
Resources Load Metrics
o For every machine:
sar40 utility report throughout the test run
JMX (JConsole) overview of CPU, memory, heap, loaded classes, and threads as shown in Figure 15 JConsole load graphs
o DB and file system usage statistics

38 http://code.google.com/p/jmeter-plugins/wiki/ResponseTimesOverTime
39 http://code.google.com/p/jmeter-plugins/
Figure 15 JConsole load graphs
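Collecting the JConsole figures above requires the monitored JVMs to expose a JMX endpoint. As an illustration only (the port is arbitrary, and authentication/SSL are disabled here purely for a closed lab network; the benchmark's actual settings may differ), a Tomcat node could be started with:

```shell
# Illustrative lab-only JMX settings; never disable authentication/SSL
# on an endpoint reachable from untrusted networks.
export CATALINA_OPTS="$CATALINA_OPTS \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=50500 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"
```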
In addition to these metrics, we also collected the Bulk File System Import tool reports during the bulk-loading phase; we will briefly present them in the section Error: Reference source not found. In the following sections, we will present excerpts of these results that are meaningful for the scope of studying scalability. For additional information on the results, please contact Alfresco Support.
Benchmark lab infrastructure
Benchmark lab hardware
Up to 15 machines were used during the benchmarks:
1 machine for the Balancing Tier
1 machine for the Web Tier
Up to 4 machines for the Content Tier
Up to 4 machines for the Index Tier
1 machine for Data Tier
1 machine dedicated to hosting the Shared Content Tier for the cluster nodes
Up to 3 Client machines in the Client Tier

40 http://linux.die.net/man/1/sar
Table 4 Benchmark Lab hardware details provides machine specifications.
TIER | NODES | CPU | RAM | HDD | NOTES
Balancing | lb1 | Quad Core Xeon L5520 (2.26GHz) | 6GB | 2*600GB RAID 1 | Software load balancing (Share to Alfresco, Alfresco to Solr)
Web | st1 | Dual Hex Core X5650 2.66GHz | 24GB | 4*128GB RAID 5 | Users’ main entry point (Share UI)
Content | at1, at2, at3, at4 | Dual Hex Core X5650 2.66GHz | 24GB | 4*600GB RAID 5 | Alfresco repository cluster
Data | dt1 | Dual Hex Core X5670 2.93GHz | 48GB | 4*600GB RAID 5 | For DBMS (PostgreSQL)
Index | sn1, sn2, sn3, sn4 | Dual Hex Core X5670 2.93GHz | 48GB | 4*600GB RAID 5 | For load-balanced Apache Solr
Shared Storage | ms1 | Quad Core Xeon L5520 2.26GHz | 6GB | 6 x 1TB 7k SATA 3.5” | Exposing an NFS share mounted on Alfresco nodes
Client | cld1, cld2, cld3 | Dual Quad Core Xeon L5520 2.26GHz | 12GB | 2 x 146GB 15K SAS 3.5” RAID 1 | Reproducing virtual usage running JMeter on Windows 7
Table 4 Benchmark Lab hardware details
All machines used were physical41 and the network connection between machines was realized with Gigabit Ethernet technology.
Benchmark lab software
Table 5 Benchmark Lab Software details provides information on the main software components.
TIER | OS | RELEVANT SOFTWARE | DETAILS
Balancing Tier | RHEL5 | Apache Httpd 2.3 | mod_proxy and mod_proxy_ajp42 used to balance requests
Web Tier | RHEL5 | Alfresco Share 4.0.0 (792) |
Alfresco Tier | RHEL5 | Alfresco 4.0.0 (792), Apache Tomcat 6.0.29 | Using Shared Storage for the shared content store via NFS
DBMS Tier | RHEL5 | PostgreSql 9.0.4 | Single node deployment
Index Tier | RHEL5 | Alfresco Solr 4.0.0 (792) | Indexes on local RAID 5 disk
Shared Storage | RHEL5 | NFSd43 | Exposing an NFS share mounted on Alfresco cluster nodes
Load Test Client Drivers | Windows 7 | Jakarta JMeter 2.5.1 | Running JMeter on Windows 7
Table 5 Benchmark Lab Software details

41 Dell R710: http://www.dell.com/us/enterprise/p/poweredge-r710/pd
42 http://httpd.apache.org/docs/2.2/en/mod/mod_proxy_ajp.html
The Alfresco 4 Scalability benchmark
The Alfresco 4 Scalability benchmark was the largest effort of the benchmarking project and the objective of this document. This section provides details of the tests run, architecture, configuration, and results, while detailed considerations and recommendations are left to the next chapter.
Scope
The benchmark should be as realistic as possible and representative of a large-scale Enterprise-wide deployment of Alfresco Share as a collaboration platform.
The objective of the Alfresco 4 Scalability benchmark is to study Alfresco performance and resource load variations while running the Enterprise collaboration Scenario using different scalability points: #content, #users, and #clusterNodes.
A stretch goal of this benchmark was to prove horizontal scalability of the Alfresco repository tier as a means of coping with more content and users.
Out of scope
The following was not in scope:
Study optimized configuration/tuning for each scalability point: Configuration was defined a priori and retained between test runs to analyze scalability on the selected dimensions.
Index tier horizontal scalability: While Solr had to be scaled up to 4 nodes in certain scalability points, the Index Tier should be considered a black box for the scope of this test.
Scalability points (data points)
With respect to the space identified by the scalability dimensions, each scalability point can be identified by a triplet: (#users, #content, #clusterNodes).
As shown in Table 6 Alfresco 4 Scalability benchmarks scalability points, the scalability points are therefore identified by selecting discrete values in each dimension. The test scenario is then run iteratively against combinations of the points identified across each scalability dimension in order to produce comparable results. If additional content is required in the repository, a phase of bulk loading (and Solr index tracking) is performed before the next test run.
43 http://linux.die.net/man/7/nfsd
DIMENSION | SELECTED POINTS | NOTES
Concurrent Users (x) | 180, 360, 720, 1080 | 0 is the baseline; scaling up to 1080 users required the 3 client load drivers
Content in repository (y) | 0, 500K, 2M, 10M | 0 is the baseline; content loaded using BFSIT
Alfresco Cluster Nodes (z) | 1, 2, 4 | To study effects of repository horizontal scalability; Solr was scaled as required up to 4 load-balanced nodes
Table 6 Alfresco 4 Scalability benchmarks scalability points
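The full grid spanned by these dimension values can be enumerated with a trivial sketch like the following (values taken from Table 6):

```python
from itertools import product

users = [180, 360, 720, 1080]
content = [0, 500_000, 2_000_000, 10_000_000]
cluster_nodes = [1, 2, 4]

# Every potential (#users, #content, #clusterNodes) scalability point.
all_points = list(product(users, content, cluster_nodes))
print(len(all_points))  # → 48
```

As explained next, only a meaningful subset (33 triplets) of these 48 potential points was actually run.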
The scalability test was not actually run against all the potential scalability points identified by the dimension values above (x*y*z = 4*4*3 = 48 total test runs); instead, a meaningful subset of those scalability points was identified to run the test against. The selection was made with the following main rationales:
Controlling the benchmark execution complexity
Avoiding redundant data collection
Focusing the effort on realistic deployment scenarios
In Table 7 Benchmark Tests overview matrix we provide an overview of the selected scalability points. This resulted in running the test scenario against 33 different triplets:
CONCURRENT USERS / CONTENT IN REPO | 180 | 360 | 720 | 1080
0 docs (baseline) | 1 node | 2 nodes | 4 nodes | N/A
500k docs | 1, 2, 4 nodes | 1, 2, 4 nodes | 1, 2, 4 nodes | 1, 2, 4 nodes
2M docs | 1, 2, 4 nodes | 1, 2, 4 nodes | 1, 2 nodes | 1 node
10M docs | 1, 2, 4 nodes | 1, 2, 4 nodes | 1, 2 nodes | 1 node
Table 7 Benchmark Tests overview matrix
A note on Solr usage
Although this topic is not the main objective of the benchmark, it is important to understand which index tier configuration/architecture was used throughout the benchmark. It is also helpful to know how an Enterprise infrastructure can cope with large-scale, highly search-intensive scenarios like the one implemented in the benchmark.
Based on the flexible load-balanced architecture described in the next sections, Solr nodes were added as required by the test. In parallel with Table 7 Benchmark Tests overview matrix, Table 8 Solr nodes usage throughout test executions provides an indication of how many Solr nodes were used in each test run:
Legend (in the original document, the shading of each cell in Table 8 indicates the number of Solr nodes used for that run):
1 Solr node
2 Solr nodes
4 Solr nodes

CONCURRENT USERS / CONTENT IN REPO | 180 | 360 | 720 | 1080
0 docs (baseline) | 1 node | 2 nodes | 4 nodes |
500k docs | 1, 2, 4 nodes | 1, 2, 4 nodes | 1, 2, 4 nodes | 1, 2, 4 nodes
2M docs | 1, 2, 4 nodes | 1, 2, 4 nodes | 1, 2 nodes | 1 node
10M docs | 1, 2, 4 nodes | 1, 2, 4 nodes | 1, 2 nodes | 1 node
Table 8 Solr nodes usage throughout test executions

We can make a few observations based on Table 8:
1. The vast majority of scalability points were covered with a single Solr node.
2. In both cases where more Solr nodes were needed, despite being in the same (#users, #content) scalability area, the need for increasing the number of Solr nodes originated from a step up in the Alfresco repository cluster nodes dimension. This suggests that Alfresco fully acts as a search client for Solr, which means that if the Alfresco repository is maxed out, scaling out Alfresco nodes typically requires scaling out Solr proportionally.
3. The proportionality factor between Alfresco and Solr scalability is tightly related to the read/search/write split of the scenario. This is discussed in more detail in the section Error: Reference source not found.
Architecture
Using the Benchmark lab infrastructure, the architecture was built on the following assumptions:
The architecture should provide high availability and be based on the Alfresco Simple Repository Clustering model44
The deployment should be optimized for performance and load balancing, with the goal of scaling to the most challenging scalability points without changing the deployment scheme
The architecture should be easily reconfigurable, allowing reconfiguration of the Alfresco cluster and underlying components in order to investigate the #clusterNodes dimension
Under these assumptions, the logical architecture that was designed involved the following layers and separate lab machines:
Client Tier: includes the Load Test Client driver machines running the tests
Web Tier: includes the single Alfresco Share instance used
Load Balancing Tier: leverages Apache Httpd as a balancer from Share to Alfresco and from Alfresco to Solr
Content Tier: includes the 1 to 4 Alfresco repository instances
Index Tier: includes the 1 to 4 Solr server instances
Storage Tier: includes a dedicated machine for the DB and one for NFS shared storage
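As a sketch of how such a Load Balancing Tier can be wired with Apache Httpd, the fragment below balances AJP traffic across two repository nodes; the hostnames follow Table 4, but the ports, route names, and sticky-session setup are assumptions, not the benchmark's actual configuration:

```apache
# Hypothetical mod_proxy/mod_proxy_ajp balancer fronting two Alfresco nodes.
<Proxy balancer://alfresco-cluster>
    BalancerMember ajp://at1:8009 route=at1
    BalancerMember ajp://at2:8009 route=at2
</Proxy>
# Pin each user session to one repository node via the Tomcat session cookie.
ProxyPass /alfresco balancer://alfresco-cluster/alfresco stickysession=JSESSIONID|jsessionid
```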
A high-level logical architecture diagram is provided in Figure 16 Scalability benchmark logical architecture.
44 http://wiki.alfresco.com/wiki/Cluster_Configuration_V2.1.3_and_Later#Simple_repository_clustering
Figure 16 Scalability benchmark logical architecture