[IC Manage] Workspace Acceleration & Network Storage Reduction

Transcript of "Workspace Acceleration and Storage Reduction: A Comparison of Methods & Introduction to IC Manage Views"

Roger March and Shiv Sikand, IC Manage, Inc.


Digital Assets Growing at a Rapid Rate

File systems are being stressed by rapidly expanding digital assets. These growing datasets are driven by the high-capacity needs of companies designing multi-function consumer electronics and biomedical devices, and of software companies developing video games and enterprise systems.

It is a world markedly different from traditional software development. In terms of scale, it is not uncommon to see Perforce depots many terabytes in size, composed of hundreds of millions of files. A single client spec may define a workspace of many gigabytes and tens of thousands of files. An organization may have thousands of users spread across the globe.

Content is stored and modified on local drives by individual users working on workstations or laptops; however, regressions, analysis, and build farms that utilize this content will typically run on some form of networked file storage to allow large sets of machines to operate on the data seamlessly.

A 2012 independent study of 524 respondents identified the top challenges associated with network file systems, showing the clear impact of these expanding digital assets. The top problems were slow remote and/or local workspace syncs, and storage issues such as capacity not keeping up with expanding data volumes and the high cost of adding network-attached storage (NAS) devices. Respondents also cited application slowdown, with network storage bottlenecks increasing tool iteration time by 30 percent.

File System Adaptation for Expanding Digital Assets

An optimally managed file system utilizing Perforce SCM will exhibit several key characteristics, even in the face of continuously expanding digital assets.

Client workspace syncs should be rapid, to encourage continuous integration and check-ins. Network disk space should be used optimally, so that storage capacity doesn’t interfere with daily execution. Directly related to this is the need to minimize network bandwidth, to avoid creating bottlenecks that slow file access and must constantly be managed. The system must be designed to scale with a growing number of users and expanding digital assets. Further, any such infrastructure enhancements should be compatible with existing storage technologies and be storage-vendor agnostic, allowing organizations to adapt to rapidly changing infrastructures. The enhanced system must be reliable in the event of failures or errors.

Individual users must be able to maintain workspace file control and stability, without cumbersome, error-prone manual management of network cache storage and different versions. And finally, development teams should be able to build workspaces anywhere and on demand, avoiding problems and costs associated with disk space allocation.


High Demand on Network Storage

Beyond sheer size, the character of the IT environment has shifted over the years. It used to be that nearly all Perforce workspaces resided on a user’s workstation. Today’s workspaces tend to reside on NAS devices, network mounted to the machines accessing the data. There are a number of advantages to this arrangement. One is that it allows a user to be easily relocated to another machine without having to physically move the workspace. In environments where virtualization is the norm, local workspace storage is not utilized, and all files are stored on some form of networked storage. Some organizations also feel that restricting the data to the storage appliance gives them better control for security. The NAS device is usually hardware-optimized to its tasks; it provides much greater performance and reliability than is available from a commodity server exporting a network file protocol such as NFS or CIFS.

Unfortunately, the cost of storage on NAS is dramatically higher than commodity storage. Even with specialized hardware, usage tends to expand to saturate the NAS device, and the resulting need to keep adding NAS units often makes cost a barrier to scaling.

Figure 1: Current baseline network file systems over-rely on network storage

As shown in Figure 1, many current baseline network file system deployments over-rely on network storage. They duplicate file storage for every user workspace, consuming precious Tier 1 storage space. Deduplication performed to address this issue is very inefficient because the datasets change continually and rapidly. Further, because local caching is underutilized due to the high cost of solid-state storage, user file access often requires full network bandwidth, creating bottlenecks that degrade file access time.

Network Bandwidth Bottlenecks

Many users may be working on a particular project in different or changing roles. This makes it impractical or undesirable to continually tune the workspaces for the current role.

Traditional approaches address this by configuring the workspace client to cover the entire project, so the user must download a complete set of project files for every workspace. The clear drawback is that workspace syncs for large datasets can be extremely slow, due to network bandwidth bottlenecks, lock contention on the Perforce server from long-running sync operations, and limited I/O bandwidth on the target storage device.


The performance impact is most severe for remote sites because the bandwidth available to the Perforce server at the remote site is typically a fraction of that available on the local network. If a large set of changes is submitted on the local network, doing simple syncs at the remote site can take a long time; the entire change set must make its way down the wire—even if parts of the change set have nothing to do with the area the user is currently working in. The Perforce proxy architecture can mitigate this for multiple accesses to the same file by multiple users. However, for wide and deep workspaces, the proxy consistency model results in large numbers of queries to the master site, which tends to be latency limited on long haul networks.
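To see why latency, not just bandwidth, dominates remote syncs, consider a back-of-envelope model. All numbers below are hypothetical, chosen only for illustration:

```python
# Back-of-envelope model of workspace sync cost (all numbers hypothetical).

def full_sync_seconds(num_files, avg_file_bytes, bandwidth_bps, rtt_s,
                      queries_per_file=1):
    """Time to populate an entire workspace: bulk transfer cost plus
    one round trip per file for latency-limited metadata queries."""
    transfer = num_files * avg_file_bytes * 8 / bandwidth_bps
    latency = num_files * queries_per_file * rtt_s
    return transfer + latency

def on_demand_seconds(files_touched, avg_file_bytes, bandwidth_bps, rtt_s):
    """Time when only the files a task actually opens are delivered."""
    transfer = files_touched * avg_file_bytes * 8 / bandwidth_bps
    latency = files_touched * rtt_s
    return transfer + latency

# A 50,000-file workspace over a 100 Mb/s long-haul link with 150 ms RTT,
# where a typical task touches only 2% of the files:
full = full_sync_seconds(50_000, 100_000, 100e6, 0.150)
lazy = on_demand_seconds(1_000, 100_000, 100e6, 0.150)
```

In this sketch, the per-file round trips rather than the bulk transfer account for most of the full-sync time, which is the latency-limited behavior described above for long-haul networks.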

Dynamic Virtual Workspaces: A Novel Approach to Workspace Acceleration

With a dynamic virtual workspace (DVW) approach, the user is presented with a fully populated workspace in near real time, through advanced virtualization techniques. Individual workspaces are displayed rapidly irrespective of client complexity, the size of the view mappings, or the total number of workspaces, which can be a limitation in some clone technologies.

As files are accessed by the operating system on behalf of an application or user request, they are delivered on demand to fast local storage caches through a server, replica, or proxy. After the preliminary cold-cache startup cost, most applications request files sequentially, and the on-demand nature of the access yields a major acceleration of workspace syncs compared with the traditional approach of populating the entire workspace before task execution.

One element of the dynamic virtual workspace system design is that the local cache can be any block device. The simplest form is the unused disk space on a bare-metal host: typically, only a small fraction of this disk space is used by the operating system and the rest sits idle. Another choice is volatile memory, such as tmpfs, instead of persistent storage. For environments requiring even more performance, local solid-state drives can be utilized very cheaply for the cache.

The caches themselves are designed with built-in space reclamation capabilities, allowing the cache to be set up and tuned for each individual workload. The caches are kept under quota by automatically removing files via a least recently used (LRU) algorithm, and quota sizes can be individually controlled on a per-workspace basis.
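The quota-plus-LRU behavior described above can be sketched in a few lines. This is an illustrative model only, not IC Manage's implementation:

```python
import collections

class QuotaLRUCache:
    """Sketch of a per-workspace cache kept under a byte quota by
    least-recently-used eviction (illustrative model only)."""
    def __init__(self, quota_bytes):
        self.quota = quota_bytes
        self.used = 0
        self.files = collections.OrderedDict()  # path -> size, oldest first

    def touch(self, path, size):
        """Record an access; evict LRU entries if over quota."""
        if path in self.files:
            self.used -= self.files.pop(path)   # refresh recency
        self.files[path] = size
        self.used += size
        while self.used > self.quota and len(self.files) > 1:
            _old_path, old_size = self.files.popitem(last=False)
            self.used -= old_size   # a real cache would unlink _old_path here

cache = QuotaLRUCache(quota_bytes=1000)
cache.touch("a.v", 400)
cache.touch("b.v", 400)
cache.touch("a.v", 100)   # re-access shrinks and refreshes a.v
cache.touch("c.v", 600)   # pushes usage over quota: evicts b.v, the LRU entry
```

Because the quota is a constructor parameter, each workspace can get its own independently tuned cache size, matching the per-workspace quota control described above.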


Figure 2: Workspace acceleration is a major advantage of the dynamic virtual workspace method

As Figure 2 illustrates, a major advantage of the dynamic virtual workspace method is workspace acceleration: it achieves near-instant workspace syncs for both local and remote sites. Remote sites should be configured to use a local Perforce replica. Further, individual workspaces can be modified and updated dynamically with independent view mappings to align with advanced branching strategies.

Dynamic Virtual Workspaces Contrasted with Snap and Clone

One alternative to dynamic virtual workspaces is a “snap and clone” approach, where IT sets up a template, takes a snapshot, and creates clones. The clones can then be made available much faster than syncing a multitude of individual workspaces. The snap and clone method does achieve some workspace sync acceleration over traditional NAS volumes; however, it restricts the mixing and matching of constantly changing file sets, particularly when those changes involve many thousands of files. Further, it requires ongoing scripting and maintenance for client workspace customization, and it under-utilizes local caching, so remote sites can still face network bottlenecks because of long latencies between the clone workspace and the master content source.

Local Caching Approach

Many companies are looking to improve NFS performance, particularly in virtualized environments where boot storms, logout storms, and Monday-morning sync storms reduce productivity. The main solution is to add solid-state storage caches, either inline between the client and the filer for a vendor-agnostic model, or onboard the NAS device itself.

However, SCM environments such as Perforce have a different use model from the unstructured data for which NAS environments are currently optimized. In the SCM model, the repository contains the workspace files, which are replicated to users as partial projections of the depot. These files are then modified, or used to generate derived state from the source files.

A local caching approach makes optimal use of both local cache and network storage. Network storage holds only transient copies of edited files, which are removed once they are checked into Perforce. Modified files checked into Perforce are automatically moved back to the local cache.
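The move-back-on-check-in flow might look like the following sketch. The function and directory names are hypothetical, invented here for illustration:

```python
import shutil
from pathlib import Path

def on_submit(edited_paths, nas_scratch, local_cache):
    """Hypothetical post-submit step: once edited files are safely checked
    into Perforce, their contents move to the local read cache and the
    expensive NAS scratch copies are removed, reclaiming space at once."""
    for rel in edited_paths:
        src = Path(nas_scratch) / rel
        dst = Path(local_cache) / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))   # NAS space is freed immediately
```

The key property is that the expensive Tier 1 storage only ever holds files that are actively being edited; everything checked in lives on cheap local cache instead.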


Figure 3: A local caching optimization approach

An advantage of this local caching optimization approach, as Figure 3 shows, is that expensive network disk space is freed up instantly, typically reducing network disk space utilization substantially.

Intelligent File Redirection

The intelligent file redirection approach separates reads from writes, serving reads from the local cache (see Figure 4). Modified files (writes) are automatically written to NAS for safekeeping.

Figure 4: The intelligent file redirection approach

This approach takes advantage of onboard speeds for reads instead of slower network speeds, typically achieving twice the performance of network reads. Intelligent file redirection also ensures that modified files on the NAS device are removed after they are checked into the Perforce server, instantly freeing up network disk space. The redirected reads, writes, and removals are all done automatically, with no manual handling.
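The read/write split boils down to two rules: serve reads from the local cache, filling it on demand, and direct writes to the NAS. A minimal sketch of such a layer, assuming a caller-supplied `fetch` callback for depot content (this is an illustrative model, not the product's actual code):

```python
from pathlib import Path

class RedirectFS:
    """Toy model of intelligent file redirection: reads come from local
    cache (fetched on demand on a miss); writes always land on NAS."""
    def __init__(self, cache_dir, nas_dir, fetch):
        self.cache = Path(cache_dir)
        self.nas = Path(nas_dir)
        self.fetch = fetch              # pulls depot content on a cache miss

    def read(self, rel):
        local = self.cache / rel
        if not local.exists():          # cold cache: deliver on demand
            local.parent.mkdir(parents=True, exist_ok=True)
            local.write_bytes(self.fetch(rel))
        return local.read_bytes()       # warm reads never touch the network

    def write(self, rel, data):
        out = self.nas / rel            # modified data goes to NAS for safekeeping
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_bytes(data)
```

Because every repeated read is served locally, the filer sees almost pure write traffic, which is the property the next section builds on.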


Intelligent file redirection has other significant advantages when widely deployed in an enterprise. By eliminating read traffic through redirection to local cache, the filers can be optimized for sequential write performance, increasing throughput for write-intensive tasks and prolonging the useful life of the filers by reducing both network and space utilization.

Real-Time Deduplication Reclaims Space for Check-Ins

A supplemental enhancement available with the intelligent file redirection approach is to automatically purge files from write network storage once they are checked into Perforce. This instantaneous deduplication frees up network disk space.

Advanced Content Delivery to Minimize Network Bandwidth, Reduce Errors, and Increase I/O Availability

Many organizations have large tool or third-party content libraries and geographically distributed sites. In many cases, this content must be synchronized to all sites to ensure that centrally developed methodologies work seamlessly at any location. This content tends to have the characteristics of a very large canonical NFS mount point. Normally these directories live on the NAS device, consuming large amounts of space and file traffic because every machine needs to access them for tools and data. A variety of methods are used to synchronize them between sites, including block-based replication provided by most NAS vendors and file-based replication provided by tools such as rsync.

In many cases, large amounts of precious inter-site bandwidth are consumed by this replication, even though specific content pushed to a remote site may not be needed there, because the granularity of the push is too coarse. Additionally, replication creates synchronization boundaries that can be time consuming to resolve for fast-changing datasets. Server farms can also generate excessive I/O load on the filers as a series of jobs is queued and run on a large number of hosts.

A solution to this problem is to use a single read-only workspace instead of the canonical NFS mount point with replication. With this configuration, a single Perforce workspace is constructed as a read-only object. Multiple dynamic virtual workspace (DVW) instances from any number of locations can connect to the workspace sync state (the have table). A single sync of the workspace results in near-instantaneous global synchronization of the metadata. The DVW instances can either reside behind a single NFS mount point for all hosts in the farm, or be configured as individual DVWs on each host with local caching. As a result, the I/O is highly localized with on-demand granularity, and similar workloads benefit from warm caches on the execution hosts.
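The shared sync-state idea can be modeled abstractly. In the toy sketch below, the class and field names are invented for illustration and are not Perforce's or IC Manage's API; the point is that one metadata update is seen by every attached instance, while content is still fetched lazily per host:

```python
class SharedHaveTable:
    """Toy model of the shared read-only workspace state: one sync updates
    the path -> revision metadata once, and every attached DVW instance
    sees it immediately."""
    def __init__(self):
        self.have = {}                  # depot path -> synced revision

    def sync(self, changes):
        self.have.update(changes)       # metadata-only: near-instant, global

class DVWInstance:
    """Toy per-host view: resolves revisions from the shared state and
    fills its local content cache on demand."""
    def __init__(self, shared, fetch):
        self.shared = shared
        self.fetch = fetch              # pulls (path, rev) content on a miss
        self.cache = {}                 # path -> (rev, content)

    def open(self, path):
        rev = self.shared.have[path]
        if self.cache.get(path, (None,))[0] != rev:
            self.cache[path] = (rev, self.fetch(path, rev))
        return self.cache[path][1]
```

A single `sync` call on the shared table makes the new revision visible to every instance at once; each host then pays only for the content it actually opens.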

IC Manage Views: Accelerates Workspace Syncs and Slashes Network Storage

IC Manage Views works with existing storage technologies to:

• Reduce network storage by a factor of four through local caching techniques and real-time deduplication.

• Achieve near-instant syncs of fully populated workspaces through dynamic virtual workspace technology.


• Deliver two times faster file access and speed up applications through automated intelligent file redirection (see Figure 5).

Figure 5: IC Manage Views: accelerates workspace syncs and slashes network storage

In addition, IC Manage Views features:

• 100 percent compatibility with existing storage technologies; NAS agnostic.

• Scalability: savings increase with the number of users and the size of databases.

• Flexibility in building workspaces on demand; development teams can build workspaces anywhere, avoiding problems and costs associated with disk space allocation.

• Reliability: handles cache recovery in the event of failures or errors.

• Stability: designers maintain workspace file control, with no manual management of network cache storage and different versions.

Figure 6 presents representative IC Manage Views benchmark results. In this example, the workspace was 1 GB and 10,000 files.

Figure 6: IC Manage Views benchmark results for a 1 GB workspace with 10,000 files


IC Manage Views can dramatically lower costs associated with storage and increase productivity through accelerated delivery of workspace content. It achieves these advantages through dynamic virtual workspace, local caching, instant de-duplication, and intelligent file redirection technologies.

About IC Manage

IC Manage provides IC Manage Views, which accelerates Perforce client workspace syncs and drastically reduces the storage needed to keep up with expanding software data. IC Manage Views gives software teams the flexibility to build workspaces anywhere, avoiding problems and costs associated with disk space allocation. IC Manage is headquartered at 2105 South Bascom Ave., Suite 120, Campbell, CA. For more information, visit us at www.icmanage.com.

Shiv Sikand, Vice President of Engineering, IC Manage, Inc.

Shiv founded IC Manage in 2003 and has been instrumental in the company achieving technology leadership in high-performance design and IP management solutions. Prior to IC Manage, Shiv was at Matrix Semiconductor, where he worked on the world’s first 3D memory chips.

Shiv also worked at MIPS, where he led the development of the MIPS Circuit Checker (MCC). While working on the MIPS processor families at SGI, Shiv created and deployed cdsp4, the Cadence-Perforce integration, which he later open sourced. Cdsp4 provided the inspiration and architectural testing ground for IC Manage. Shiv received his BSc and MSc degrees in physics and electrical engineering from the University of Manchester Institute of Science and Technology.

Roger March, Chief Technology Officer, IC Manage, Inc.

Prior to IC Manage, Roger worked at Matrix Semiconductor, where he designed and helped build most of its CAD infrastructure. He worked mainly on the physical and process side, and also on optimizing layout for manufacturability.

Roger began his career as a circuit and logic designer in the early days of microprocessors and dynamic RAMs. While working as a designer at Data General and Zilog on long-forgotten products like the microNova, microEclipse, and Z80K, he found himself drawn to the CAD side to provide design tools that the marketplace was not yet offering. He wrote circuit, logic, and fault simulators that were used to build and verify microprocessors of the day.

Roger then joined MIPS as its first CAD engineer. There he built the infrastructure with a combination of vendor and internal tools; this system was used to build all the MIPS microprocessors as well as most of their system designs. He wrote module generators for chip physical design, placement and allocation tools for board designs, test pattern generators and coverage tools for verification, and yet another logic simulator, found to be 30 times faster than Verilog-XL in benchmarks. After MIPS was acquired by Silicon Graphics, Roger became a Principal Engineer at the company. Working in the microprocessor division, he worked on problems in logic verification and timing closure. This dragged him more deeply into the realm of design databases. He wrote several tools to help analyze and manipulate physical, logical, and parasitic extraction datasets. This included work in fault simulation, static timing verification, formal verification, physical floor planning, and physical timing optimization.