Technical Report
Optimizing Standard Cell Library Characterization with Cadence Virtuoso Liberate and NetApp Clustered Data ONTAP 8.2
Bikash Roy Choudhury, NetApp
Harsh Vardhan, Cadence Design Systems
Rajnish Gupta, Cadence Design Systems
February 2014 | TR-4270
Abstract
Cell library characterization is an integral part of the chip design and manufacturing
process. Standard cell libraries are developed and distributed either by a library
vendor who uses the process and device characteristics and electrical information
provided by the foundries, or by the foundries themselves. Large fabless
semiconductor companies may choose to design and characterize their own libraries
for business and competitive reasons as well. The characterization step provides
timing, power, and performance behavior across a wide range of voltage, temperature,
and process variations that make up the electrical behavior and specifications of the
device that is being designed.
A foundry manufactures only the physical system on a chip and does not get involved
in any end-use application design unless it is creating a functional block that is
intended for reuse and licensing by its customers, who use it as a preverified
subcircuit. During the design process, the layout is checked against the schematics of
the chip, which must be modeled according to the standard requirements of all the
common foundries.
The Cadence® Virtuoso® Liberate™ tool is used to characterize the standard cell
libraries and generate their I/O models during the design phase. Substantial
distributed computing power is required to simulate the input data from a shared
storage infrastructure and process it into a single database file, which is used later in
the place-and-route phase.
The NetApp® scale-out architecture with the NetApp clustered Data ONTAP® 8.2
operating system provides all the storage efficiencies for the Virtuoso Liberate files in
the characterization process. It also improves job completion times by up to 15% with
NFSv4.1/pNFS and with adequate storage optimization and best practices. This report
highlights some performance tuning on the NetApp storage and Linux® clients in the
compute farm to improve the efficiency of the application license costs and time to
market. These optimizations do not change the cell characterization workflow, and
they have very little impact on the existing infrastructure.
TABLE OF CONTENTS
1 Introduction ............................................................................................................................................ 4
2 Target Audience and Objectives .......................................................................................................... 5
3 Virtuoso Liberate Cell Library Characterization ................................................................................. 5
3.1 Virtuoso Liberate Cell Library Characterization Workflow—Arc Flow ............................................................. 5
3.2 Virtuoso Liberate Tool in a Clustered Data ONTAP 8.2 Environment ............................................................ 6
4 Clustered Data ONTAP 8.2 for Cell Library Characterization Workloads ........................................ 6
4.1 Performance ................................................................................................................................................... 6
4.2 High Availability and Reliability ....................................................................................................................... 7
4.3 Capacity ......................................................................................................................................................... 8
4.4 Storage Efficiency .......................................................................................................................................... 8
4.5 Agile Infrastructure ......................................................................................................................................... 8
4.6 Data Protection ............................................................................................................................................... 9
4.7 Manageability ................................................................................................................................................. 9
4.8 Cost ................................................................................................................................................................ 9
5 Cadence Virtuoso Liberate Validation with NetApp Clustered Data ONTAP 8.2 .......................... 10
5.1 Performance Validation Objectives .............................................................................................................. 10
5.2 Virtuoso Liberate Test Lab Details ............................................................................................................... 10
5.3 Test Plan ...................................................................................................................................................... 10
5.4 Virtuoso Liberate Lab Test Results .............................................................................................................. 11
5.5 Virtuoso Liberate Performance Test Observations ....................................................................................... 11
6 Best Practices for Virtuoso Liberate Tool with Clustered Data ONTAP 8.2 .................................. 12
6.1 Storage Cluster Node Architecture ............................................................................................................... 12
6.2 Storage Cluster Node Sizing and Optimization ............................................................................................ 13
6.3 File-System Optimization.............................................................................................................................. 14
6.4 Storage Network Optimization ...................................................................................................................... 15
6.5 Flash Cache Optimization ............................................................................................................................ 16
6.6 Network File System (NFSv3) Optimization ................................................................................................. 17
6.7 Parallel Network File System (pNFS) ........................................................................................................... 18
7 Other Features in Clustered Data ONTAP 8.2 for Virtuoso Liberate Workloads .......................... 24
7.1 SnapVault ..................................................................................................................................................... 24
7.2 Storage QoS ................................................................................................................................................. 25
7.3 Nondisruptive Operation (NDO) ................................................................................................................... 27
8 Compute Farm Optimization .............................................................................................................. 27
8.1 RHEL 6.5 Clients in the Compute Farm ....................................................................................................... 28
8.2 Best Practices for Compute Nodes .............................................................................................................. 28
9 Summary .............................................................................................................................................. 30
10 Conclusion ........................................................................................................................................... 31
LIST OF FIGURES
Figure 1) Cell library characterization overview. ............................................................................................................ 5
Figure 2) Cell library characterization workflow. ............................................................................................................ 5
Figure 3) Cell library characterization with clustered Data ONTAP in a data center. ..................................................... 6
Figure 4) Workload balancing for cell library characterization. ....................................................................................... 7
Figure 5) Virtuoso Liberate performance test results. .................................................................................................. 11
Figure 6) Clustered Data ONTAP logical stack layout. ................................................................................................ 13
Figure 7) pNFS implementation. .................................................................................................................................. 19
1 Introduction
Standard cells consist of preconfigured and laid-out functional block elements that provide a particular
operation (such as the output of a two-input AND function being true only if both of its inputs are true).
These cells can be simple or complex digital gates, flip-flops, and I/O cells. They are called standard
cells because they typically must conform to a standard height (although their width can vary) to simplify
the tasks of placing them on the chip and connecting them automatically with place-and-route software,
and to provide the most efficient packing density. These cells are designed for a specific function and
usually come in libraries. Each of these libraries consists of a few hundred to a few thousand
different cells.
Standard cell libraries are required for the following reasons:
Design complexity keeps increasing because of area and yield optimization. Standardizing cell libraries by foundries reduces manufacturing costs and enhances yields.
Complete custom chip design is no longer possible. Standard cell libraries expedite the chip design process through logical cell layout models.
Standardization allows efficient automated software routing and connectivity among these cells for maximum packing density and improved performance. The optimized wire lengths prevent performance slowdown caused by poor connections or routing.
The characterization step provides timing, power, and performance behavior across a wide range of
voltage, temperature, and process variations that make up the electrical behavior and specifications of
the device being designed. Standard cell library characterization is an important component in the chip
design cycle for the following reasons:
Functionality behavior; electrical characteristics extraction; and simulation of gates, flip-flops, and so on, in a chip must be fast and simple.
The variability of performance across a wide range of processes, voltage, and temperatures must be analyzed and accounted for to guarantee performance specifications.
Accurate timing and power analysis is essential to guarantee device behavior under different operating conditions.
The Cadence Virtuoso Liberate tool is designed to quickly generate the timing, noise, and power profile of
the individual gates, which allows analysis of timing and power behavior at the chip level. This requires
many input files to simulate the library layout models and generates many output files, depending on the
number of cell libraries characterized for various functions. The Virtuoso Liberate tool uses a network of
compute nodes to access files of various sizes from a shared storage infrastructure such as NetApp over
a file-based protocol such as Network File System (NFS).
The NetApp clustered file system in Data ONTAP 8.2 offers scale-up and scale-out storage architectures
that provide storage for the large and complex cell library characterizations that are required by chip
designers. It addresses the growing storage needs of these customers while efficiently handling the
different workloads that are generated during the entire cell library characterization workflow. NetApp
clustered Data ONTAP 8.2 provides the following key drivers to shorten the chip design process with a
faster time to market and improved return on investment (ROI):
Performance
High availability and reliability
Capacity
Storage efficiency
Agile infrastructure
Data protection
Manageability
Low cost
2 Target Audience and Objectives
The Virtuoso Liberate tool is one of the most popular cell characterization tools used by chip manufacturers.
This technical paper is intended for cell library engineers, storage administrators, and architects. This
paper provides the following information:
Best practices and sizing required with clustered Data ONTAP 8.2 to support the performance, capacity, availability, and manageability requirements for Virtuoso Liberate workloads
How to use NetApp’s scale-out clustered file system solution for the Virtuoso Liberate application and for validating the performance improvements during the cell library characterization process
3 Virtuoso Liberate Cell Library Characterization
Virtuoso Liberate cell library characterization tools are critical in the chip design process because they
provide library views for signal integrity, timing, and power analysis for cell layout on a chip. During this
characterization process, Simulation Program with Integrated Circuit Emphasis (SPICE) netlists that
contain connectivity information about a cell are provided as input to each of the cells in the library, and
the output load is validated and analyzed for signal integrity (noise), timing, and power. Figure 1 provides
an overview of the input and output parameters during a cell characterization process.
Figure 1) Cell library characterization overview.
3.1 Virtuoso Liberate Cell Library Characterization Workflow—Arc Flow
An arc is a path from the input to the output pin of a cell. Many cell libraries are modeled with noise,
timing, and power arcs to test specific functions in a chip design. The Virtuoso Liberate tool creates,
characterizes, and validates the noise, timing, and leakage arcs in the cell libraries. This tool works
primarily in three different phases to achieve this task. Figure 2 illustrates the various phases of the cell
characterization processes.
Figure 2) Cell library characterization workflow.
Pre-analysis phase: The master node first starts to read all the SPICE netlists and other sources of input. It then spawns the slave nodes in the compute farm and starts to submit the jobs. The slaves read the data from the volume located in the NetApp storage. During this phase, additional data structures are generated, and the results are reported to the master node.
Analysis phase: In this phase, the slave nodes start to read all the .spi, .sp, and .eldo input files
from the NetApp storage. Every slave node in the compute farm performs circuit simulation on the input files. This simulation is done locally on each of the compute nodes. During the circuit simulation, a lot of temporary files and a few persistent files are generated. All these files are written back onto the NetApp storage.
Assembly phase: In the final phase, all the data created for a cell by the slaves is read from the
NetApp storage by one of the slave nodes. A single new database file (.ldb) in the Liberty format is
created and written to the NetApp storage. Finally, all the other files on the NetApp storage are
deleted. The .ldb file is used for timing analysis throughout the phases of the chip design.
3.2 Virtuoso Liberate Tool in a Clustered Data ONTAP 8.2 Environment
It is not uncommon for tools to run on a large number of compute nodes—on the order of thousands
of cores—with job schedulers such as Load Sharing Facility (LSF) or Sun Grid Engine (SGE) in cell
library characterization environments. Apart from all the optimization that happens at the Virtuoso
Liberate application layer, it is imperative to also optimize and tune the compute nodes, the network, and
the storage layers to achieve faster job completion times. Depending on the size and type of
standard and custom cells, a large number of files is generated, requiring storage capacity along with
efficiency and performance. Figure 3 shows a typical storage integration with the different cell library
characterization workflows in a data center.
Figure 3) Cell library characterization with clustered Data ONTAP in a data center.
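As a hedged sketch, a characterization slave job might be submitted through LSF as described above; the queue name, resource string, log path, and the Liberate invocation shown here are illustrative assumptions, not a documented command line:

```shell
# Hypothetical LSF submission for one Virtuoso Liberate slave job.
# Queue, memory reservation, output path, and script name are examples only;
# actual invocations depend on the site's scheduler and Liberate setup.
bsub -q normal -n 4 -R "rusage[mem=8192]" \
     -o /mnt/proj11/logs/char_%J.out \
     liberate char_run.tcl
```

In practice, the master node issues many such submissions, one per slave, and each slave reads its SPICE netlists from the NFS-mounted NetApp volume.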
4 Clustered Data ONTAP 8.2 for Cell Library Characterization
Workloads
NetApp clustered Data ONTAP provides advanced technologies for software-defined storage that
abstracts data from the underlying hardware by virtualizing the storage infrastructure with storage virtual
machines (SVMs) to allow an efficient, scalable, nondisruptive environment. Some of these virtualization
capabilities may be similar to past NetApp vFiler® unit functionality. Others go beyond anything else
available today. Clustered Data ONTAP is built on the same trusted hardware that NetApp has been
selling for years. We bring together the different hardware platforms, connect them, and give them the
intelligence to communicate with each other in a clustered environment. The following sections detail the
key benefits that clustered Data ONTAP provides for Virtuoso Liberate workloads.
4.1 Performance
Cell or circuit design environments mostly use NFS to mount volumes from storage on compute nodes.
With NFS, scaling the number of nodes in the compute farm is very easy. With clustered Data ONTAP,
however, the storage can also scale seamlessly to provide the enhanced I/O operations per second
(IOPS), bandwidth, performance, and efficiency that are required by different chip design tools.
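The NFS mounts on the compute nodes described above can be sketched as follows; the server name, export path, and mount options are illustrative assumptions, not the validated configuration (on RHEL 6.x kernels, NFSv4.1/pNFS is typically requested with `vers=4,minorversion=1`, while newer kernels also accept `vers=4.1`):

```shell
# Hypothetical client-side mounts of a project volume exported by an SVM.
# NFSv3 mount with common EDA options:
mount -t nfs -o vers=3,rw,hard,tcp,rsize=65536,wsize=65536 \
      svm-eda:/proj11 /mnt/proj11_v3

# NFSv4.1/pNFS mount on a RHEL 6.x client:
mount -t nfs -o vers=4,minorversion=1,rw,hard,tcp \
      svm-eda:/proj11 /mnt/proj11_pnfs
```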
The following are strong requirements in chip design production scenarios to provide top-notch
performance:
Larger memory footprint
Greater number of cores for concurrent processing
Higher capacity limits
Users may require 1,000,000 IOPS from multiple volumes on the storage for different standard and
custom cell library characterization projects. In Data ONTAP operating in 7-Mode, this IOPS requirement
is constrained, with symlinks and volumes limited to a single controller or a high-availability (HA) pair.
However, with clustered Data ONTAP, the symlinks are replaced by cluster namespace junctions, which
allow all the volumes that are part of a single project to be spread across the different nodes in the cluster.
Every node in the cluster contributes to the IOPS requirement for that project.
Figure 4) Workload balancing for cell library characterization.
Figure 4 illustrates how the IOPS requirement is spread across different controllers. “Proj11” has six
volumes that are spread out on four FAS nodes in a production cluster. Each node is capable of doing
more than 250,000 IOPS from the cache. The 1,000,000-IOPS requirement for Proj11 can be achieved
by spreading the flexible volumes in the cluster namespace within an SVM. These volumes can grow and
shrink in size, and also can be moved seamlessly, without disrupting the application, to any cluster node
that is capable of providing the desired performance.
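The namespace layout described for Proj11 can be sketched with clustered Data ONTAP commands like the following; the SVM, aggregate, volume names, and sizes are illustrative assumptions:

```shell
# Hypothetical junctioning of two Proj11 volumes into one namespace,
# each placed on an aggregate owned by a different cluster node.
volume create -vserver svm_eda -volume proj11_vol1 \
    -aggregate node1_aggr1 -size 2TB -junction-path /proj11/vol1
volume create -vserver svm_eda -volume proj11_vol2 \
    -aggregate node2_aggr1 -size 2TB -junction-path /proj11/vol2
```

NFS clients mounting `/proj11` then see both volumes under a single path while their I/O is served by different controllers.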
The PCI Express (PCIe)-based NetApp Flash Cache™ intelligent caching in each controller in the cluster
setup continues to boost metadata and random and sequential read performance for electronic design
automation (EDA) workloads.
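The Proj11 sizing arithmetic above can be expressed as a small sketch; the figure of roughly 250,000 cached IOPS per node is taken from the example in the text, not a performance guarantee:

```python
import math

def nodes_required(target_iops, iops_per_node=250_000):
    """Minimum number of cluster nodes to meet an aggregate IOPS target."""
    return math.ceil(target_iops / iops_per_node)

# Four nodes cover the 1,000,000-IOPS Proj11 example.
print(nodes_required(1_000_000))
```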
4.2 High Availability and Reliability
With scale-out architectures, it is very important to have volumes that are highly available and accessible
at all times by the cell library characterization applications. Clustered Data ONTAP 8.2 provides high
availability at the following levels for all cell design workloads:
Storage controller
Network
NFS protocol
The cluster storage can be set up to fail over to the surviving partner node in the HA pair or to another
network port in the cluster, and NFS access can be redirected through a different cluster node if the
NFS clients cannot reach the desired volumes. A cell design scenario typically consists of a single large
aggregate on each controller.
NetApp RAID-DP® technology provides data resiliency against single- and double-disk failures.
Nondisruptive upgrades (NDUs) for clustered Data ONTAP versions and disk shelf firmware provide
nondisruptive operations to the chip design application. This allows clustered Data ONTAP to provide
five-nines (99.999%) reliability.
4.3 Capacity
Clustered Data ONTAP 8.2 supports larger aggregates and flexible volumes for various hardware
platforms. The number of supported flexible volumes can be higher on a single FAS controller for high-
end platforms. For further details, refer to the Clustered Data ONTAP 8.2 Release Notes at
https://library.netapp.com/ecm/ecm_get_file/ECMP1196821 and the System Configuration Guide at
https://hwu.netapp.com/Resources/generatedPDFs/8.2_Clustered_Data_ONTAP-FAS.pdf.
Flexible volumes that host different chip designs can nondisruptively move to an aggregate on a different
controller for capacity load balancing. The flexible volume can move from an aggregate on midrange
platforms to aggregates in high-end platforms to provide a higher capacity limit. This provides more
autonomy for applications and services, and dynamically responds to the shift in workloads.
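The nondisruptive volume move described above can be sketched as follows; the SVM, volume, and destination aggregate names are illustrative assumptions:

```shell
# Hypothetical nondisruptive move of a chip design volume to an
# aggregate on a higher-capacity controller, then progress check.
volume move start -vserver svm_eda -volume proj11_vol1 \
    -destination-aggregate node3_aggr1
volume move show -vserver svm_eda -volume proj11_vol1
```

The NFS clients keep the same mount path throughout, because the volume's junction in the namespace does not change.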
4.4 Storage Efficiency
Clustered Data ONTAP 8.2 provides almost all the storage efficiencies—including NetApp Snapshot™
copies, thin provisioning, space-efficient cloning, deduplication, and data compression—that Data ONTAP
operating in 7-Mode provides for all EDA tier 1 applications:
Thin provisioning. Thin provisioning makes a significant impact when provisioning storage space for volumes that are part of individual projects. Thick-provisioned volumes are guaranteed 100% of their space from the start to the finish of a project, even if the project files do not require the entire space. This leaves very little space to provision for newer projects; Project X cannot borrow space from Project Y. Therefore, thin provisioning is enabled by default for cell design volumes mounted over NFS.
As the files generated from different projects continue to be created, updated, and deleted, the free
space is managed at the aggregate level. Statistics show that at any given point in time, the
actual user data fills up about 30% to 60% of the aggregate space, from the start to the finish of a
project. Almost 33% of the unused aggregate space is available to accommodate any new chip
design projects. The NetApp OnCommand® Workflow Automation (WFA) tool provides alarms that
are triggered when the aggregates are filled to the configurable limit (normally 80%). Administrators
can then move the volumes nondisruptively to an aggregate on a different controller that has less
space utilized.
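A thin-provisioned cell design volume as described above can be sketched as follows; the SVM, aggregate, volume name, and size are illustrative assumptions:

```shell
# Hypothetical thin-provisioned volume: -space-guarantee none means
# blocks are consumed from the aggregate only as data is written,
# so unused space remains available to other projects.
volume create -vserver svm_eda -volume projx_vol1 \
    -aggregate node1_aggr1 -size 5TB -space-guarantee none \
    -junction-path /projx/vol1
```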
Space efficiency. Clustered Data ONTAP 8.2 and the NetApp WAFL® (Write Anywhere File Layout)
file system still provide a 4KB block size. Cell library characterization applications have a combination of random reads and writes along with sequential write workloads for the log files generated during the workflow. This involves many small and large file sizes. Unlike other storage vendors, we do not mirror the small files, thus improving the space efficiency of the storage while storing the design files. Also, deduplication and compression are preserved at the destination when data is moved by NetApp SnapVault® technology from the primary storage.
4.5 Agile Infrastructure
With clustered Data ONTAP 8.2, volumes and IP addresses are no longer tied to the physical
hardware. The SVM with the cluster namespace spans multiple controllers in the cluster. Storage can be
tiered with different types of disks, such as SSDs, SAS, or SATA, depending on the service-level offerings
for different chip design workloads. Other infrastructure features include:
Provisioning or scaling out. New SVMs that consist of cell library volumes can be created on the existing hardware for different applications and tools. Existing SVMs can grow seamlessly as new hardware is added to the existing cluster. SVMs can be provisioned on the fly for individual departments, companies, or applications.
Multi-tenancy. The physical clustered nodes can be used by many tenants. SVMs provide a secure logical boundary between tenants. The bottom line is that data constituents such as volumes are decoupled from the hardware plane to provide more agility to the storage infrastructure.
Unified storage. Clustered Data ONTAP 8.2 offers unified storage that natively supports NFS, CIFS, FCP, and iSCSI:
Because EDA workloads are mostly on NFS, different versions of NFS, such as NFSv3 and NFSv4.1/pNFS, can coexist and access the same file system that is exported from the storage.
Storage quality of service (QoS). Clustered Data ONTAP 8.2 introduces storage QoS, in which IOPS and bandwidth limits can be set on files, volumes, and SVMs to isolate test, development, and rogue workloads from production. Storage QoS provides the following functionalities:
Enables the consolidation of mixed workloads without affecting the performance of different chip design volumes or files in a multi-tenant environment
Isolates and throttles resource-intensive workloads to deliver consistent performance
Simplifies workload management
Nondisruptive operation (NDO). Chip design volumes and logical interface (LIF) movement within the SVM allow nondisruptive lifecycle operations that are completely transparent to the applications. NDOs can be applicable in the following scenarios:
Unplanned events:
Infrastructure resiliency against hardware and software failures
Planned events:
Capacity and performance load balancing
Software upgrades and hardware technical refreshes
These features make the infrastructure more agile and enable IT-managed data centers to provide IT as
a service.
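The storage QoS throttling described above can be sketched with clustered Data ONTAP 8.2 commands like the following; the policy-group name, SVM, volume, and the 5,000-IOPS cap are illustrative assumptions:

```shell
# Hypothetical QoS policy group that caps a dev/test volume so it
# cannot starve the production characterization workload.
qos policy-group create -policy-group pg_devtest -vserver svm_eda \
    -max-throughput 5000iops
volume modify -vserver svm_eda -volume devtest_vol \
    -qos-policy-group pg_devtest
```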
4.6 Data Protection
Clustered Data ONTAP provides a high level of data protection through file-system-consistent Snapshot
copies, NetApp SnapMirror® technology, and SnapVault. Snapshot copies and SnapVault are the most
commonly used tools for data protection in the cell library characterization space. In clustered Data
ONTAP 8.2, SnapVault performs a logical replication at the volume level that can be done within an SVM,
across SVMs, and across clusters. Because a common use case for SnapVault is remote or off-site
backup, the remote sites can use single-node and two-node switchless clusters to help EDA
customers scale with minimal cost and complexity.
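A clustered Data ONTAP 8.2 SnapVault relationship of the kind described above can be sketched as follows; the SVM and volume names are illustrative assumptions, and the destination volume is assumed to already exist as a data protection (DP) volume:

```shell
# Hypothetical SnapVault (XDP) relationship from the primary SVM to
# a backup SVM, followed by the initial baseline transfer.
snapmirror create -source-path svm_eda:proj11_vol1 \
    -destination-path svm_backup:proj11_vol1_dst -type XDP
snapmirror initialize -destination-path svm_backup:proj11_vol1_dst
```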
4.7 Manageability
Manageability becomes a lot easier with SVMs and cluster namespaces in the clustered scale-out
architecture, compared with managing different islands of storage as used to be the case with Data
ONTAP operating in 7-Mode. Clustered Data ONTAP 8.2 offers a single virtualized pool of all storage. A
single logical pool can be provisioned across many arrays.
In traditional Data ONTAP systems operating in 7-Mode, SnapMirror was used to move volumes for more
capacity, for more compute power, or for archiving purposes. This is no longer the case with clustered
Data ONTAP. Volumes can be moved nondisruptively in the namespace with clustered Data ONTAP. The
clustered storage can be set up and configured to provision storage and set policies for different types of
workloads and nondisruptive operations by using the OnCommand Unified Manager and Workflow
Automation (WFA) tools.
4.8 Cost
Clustered Data ONTAP 8.2 can provide a virtual pool of storage across different FAS platforms. A four-
node cluster can have a midrange FAS3270 with SAS disks to handle entry-level projects. Later, these
projects can be moved into a high-end FAS6290 with SSD and SAS disks, along with PCIe-based Flash
Cache, for high performance during the mid- to final stages. Finally, the project files can be moved to a
FAS3270 with SATA disks for archiving. During the entire life of the project, the chip design volumes can
move across different tiers of storage that are set up according to price and performance. This is the
unique aspect of SVM in clustered Data ONTAP: The namespace spans different tiers of storage that are
set up with respect to price and service-level objective (SLO) for different phases of the cell design
workloads.
5 Cadence Virtuoso Liberate Validation with NetApp Clustered Data
ONTAP 8.2
The various chip design houses constantly run cell library characterization on cells or circuits during the
chip design cycle based on standard guidelines developed by the foundries. With the complex chip
requirements, the cell layout is always checked with the schematics on a silicon chip. The complex cell
library characterization process leads to high-performance demands, with low latency and faster job
completion times.
The Virtuoso Liberate application performance was validated at the Cadence lab in San Jose, California.
For this performance validation, NetApp, through its partnership with Cadence, ran the tool on real
production data: a 28nm standard cell library of 385 cells from a large foundry customer. The test results
from the Cadence lab demonstrated consistent improvement when clustered Data ONTAP 8.2, the
network, and the compute nodes were optimized.
5.1 Performance Validation Objectives
The primary objectives of the Virtuoso Liberate performance validation with clustered Data ONTAP 8.2
were the following:
Validate that the Virtuoso Liberate job-completion time (wall-clock time) on clustered Data ONTAP 8.2 is on par with or better than the baseline for jobs performed on Data ONTAP 8.1.2 operating in 7-Mode over NFSv3.
Explore new technology such as pNFS for cell library characterization workloads. Validate that the Virtuoso Liberate job-completion time with clustered Data ONTAP 8.2 over pNFS is comparable to or better than that of Data ONTAP 8.1.2 operating in 7-Mode over NFSv3.
Enable reduction in the job-completion time, allowing users to move on to other cell library characterizations and make better use of their license costs. This improves the ROI of the Virtuoso Liberate tool.
5.2 Virtuoso Liberate Test Lab Details
The Cadence lab consists of three physical compute nodes. Each compute node has 32 cores and
768GB of physical memory, with a 10GbE connection to the storage through a 10GbE switch plane. The
tests were able to scale up to 96 cores. All of these compute nodes ran the Red Hat Enterprise Linux®
(RHEL) 6.5 kernel 2.6.32-431.14.1.el6.x86_64. LSF was used as the scheduler to submit the
jobs in the compute farm. Virtuoso Liberate 13.1 ISR1_e86 was used for this test.
On the storage side, we had a four-node FAS6280 cluster with 10GbE data ports. Each cluster node had
two 10GbE data ports aggregated to provide 20Gb/sec of network bandwidth, which can deliver up to
2.4GB/sec of throughput. Each of the cluster nodes had four shelves of 10,000-RPM SAS disks and a
512GB PCIe-based Flash Cache card. The cluster was running clustered Data ONTAP 8.2P4.
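The port-aggregation arithmetic above can be sketched as follows. The 0.96 efficiency factor is an assumed value chosen to match the quoted 2.4GB/sec figure, not a number from this validation:

```python
def aggregated_throughput_gbytes(num_ports: int, port_speed_gbit: float,
                                 efficiency: float = 0.96) -> float:
    """Approximate usable throughput in GB/sec for aggregated Ethernet ports.

    efficiency models protocol and framing overhead (assumed value).
    """
    raw_gbit = num_ports * port_speed_gbit   # e.g., 2 x 10GbE = 20Gb/sec raw
    return raw_gbit / 8 * efficiency         # bits -> bytes, minus overhead

# Two aggregated 10GbE ports: 20Gb/sec raw, about 2.4GB/sec usable
print(round(aggregated_throughput_gbytes(2, 10), 2))
```

The same helper can be used to estimate what adding more ports to the ifgrp would buy at a given efficiency.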
For the Data ONTAP operating in 7-Mode setup, a single FAS6280 HA pair was used with two 10GbE data
ports aggregated, four shelves of 10,000-RPM SAS disks, and a 512GB PCIe-based Flash Cache card
on each controller. This setup was running Data ONTAP 8.1.2 operating in 7-Mode.
Both the clustered and the 7-Mode storage had three RAID groups, with 66 disks on each of the
aggregates.
5.3 Test Plan
The cell libraries from a major foundry were used to generate the workload with the Virtuoso Liberate
tool on NetApp storage. The tool ran on 385 cells to stress the storage. Four
different sets of tests were performed:
Cell library characterization tests over NFSv3 with Data ONTAP 8.1.2 operating in 7-Mode. This is a typical Cadence environment that the sample customer uses in production. This test was considered to be the baseline for the rest of the tests that followed.
11 Optimizing Standard Cell Library Characterization with Cadence Virtuoso Liberate and NetApp Clustered Data ONTAP 8.2
Cell library characterization tests over NFSv3 with clustered Data ONTAP 8.2P4 without any optimization.
Cell library characterization tests over NFSv3 with clustered Data ONTAP 8.2P4 fully optimized for Red Hat clients and storage.
Cell library characterization tests over NFSv4.1/pNFS with delegations on clustered Data ONTAP 8.2 fully optimized.
5.4 Virtuoso Liberate Lab Test Results
The tests were performed in isolation from the standard production environment of Cadence. The
Virtuoso Liberate tests were performed with 23 slaves, each using four cores. A standard cell 28nm
library with 385 cells was used for the testing.
Certain procedures and conditions were observed after every test cycle:
Although host-side and server (storage)-side caching is common in any production EDA design environment, these tests were performed with no caching on the host or the storage. After every test run, the storage buffer cache was flushed, and every test was run in a different directory to avoid any host-side caching.
Cadence’s production environment already has a 20Gb/sec aggregated network, which most customer environments do not have. The baseline performance numbers were therefore obtained in an already-optimized environment in Cadence’s production setup.
Cadence’s production environment also uses aggregates with 66 disks or more, which is part of the recommendation for optimized performance. The baseline performance numbers again had the advantage of this optimized environment.
Figure 5) Virtuoso Liberate performance test results.
5.5 Virtuoso Liberate Performance Test Observations
Figure 5 indicates that tests run with Data ONTAP operating in 7-Mode performed better than tests run
with unoptimized clustered Data ONTAP, given that the network and disk subsystem were already
optimized in the Cadence environment. However, there are no further feature and functionality
improvements planned for Data ONTAP operating in 7-Mode, and it does not provide the scale-out and
agility features that clustered Data ONTAP provides, as described in section 4.
Most of the recent and future installed base among EDA and semiconductor customers runs clustered Data
ONTAP. Figure 5 illustrates that NFSv4.1/pNFS provides the best results, with up to 15% improvement
over unoptimized clustered Data ONTAP. This is a significant improvement, considering the savings it
provides on the license costs associated with the Virtuoso Liberate application. Overall,
clustered Data ONTAP with NFSv4.1/pNFS provides performance that is comparable to Data ONTAP
operating in 7-Mode. It also offers a lot of enterprise-level features and functionality, including scalability,
storage efficiency and QoS, high reliability, NDO, and data protection.
Red Hat has officially announced that NFSv4.1/pNFS is generally available in the RHEL 6.5 release.
While Cadence’s platform team is validating this new release, the Virtuoso Liberate R&D team is
supporting this release with the latest version of the Virtuoso Liberate tool. All the newer host-side
hardware comes with new device drivers and software, so NetApp highly recommends running the hosts
in the compute farm on the latest version of Linux.
Based on this performance validation, it was observed that RHEL 6.5 is a stable Linux kernel that
provided better performance with NFSv4.1/pNFS than with NFSv3.
6 Best Practices for Virtuoso Liberate Tool with Clustered Data
ONTAP 8.2
The Virtuoso Liberate tool is one of the most common tools used for cell library characterization during
chip design cycles. An increasing number of customers are deploying clustered Data ONTAP 8.2 for
storage to support the characterization phase. Scale-out clustered FAS storage has to be properly
architected to handle the Virtuoso Liberate workload. The aggregates and volumes that store the cell
libraries and inputs such as the netlists must be optimally laid out across the cluster nodes.
The best practices and recommendations in this section provide guidance to optimize clustered Data
ONTAP, the network layer, and the compute nodes for Virtuoso Liberate workloads. It is imperative to
also validate some of the key clustered Data ONTAP 8.2 features and functions to improve the overall
efficiency of the Virtuoso Liberate application.
6.1 Storage Cluster Node Architecture
NetApp highly recommends implementing the right storage platform in a clustered Data ONTAP setup,
along with adequate storage sizing and configuration to accommodate cell library characterization
workloads for standard and custom cells that have different SLOs. If the workload is performance driven
and has the highest SLO, NetApp recommends storage controllers with multiple cores and a large
memory footprint. Faster serial-attached SCSI (SAS) disks should always be used for designs that require
a faster response time.
Choosing the Right Hardware for Virtuoso Liberate Workloads in a Clustered Scale-Out Architecture
The Virtuoso Liberate cluster setup can provide different SLOs for standard and custom cells and other
dependencies that can coexist in the same or different SVMs. The choice of hardware can be different based on
the price-to-capacity (GB) and price-to-performance ratios for various SLOs:
If the Virtuoso Liberate workload requires that performance be at the highest level, NetApp strongly
recommends FAS6290 controllers with a minimum of 900GB second-generation SAS disks and at least a
6Gb/sec SAS backplane.
If the cluster setup is designed to accommodate library database (.ldb) files for archiving, NetApp
recommends a minimum of FAS3270 controllers with SATA disks.
NetApp recommends having a minimum of 1TB PCIe-based Flash Cache 2 cards on each controller.
A four- or eight-node, or larger, cluster with different types of disks (SSD, SAS, and SATA) can be configured
based on the SLOs for different workloads.
Note: NetApp highly recommends engaging with the appropriate NetApp Sales Account team to evaluate your
business requirements before architecting the cluster scale-out setup in your environment.
NetApp clustered Data ONTAP consists of a data and a network stack as part of the operating system.
Figure 6 shows these logical stacks, which are transparent to the administrator and the users. Each of
these stacks consists of different components that allow the I/O requests to communicate back and forth
to the clients while accessing the data. The figure illustrates the different components in the network stack
and the data stack.
Figure 6) Clustered Data ONTAP logical stack layout.
Data stack. This consists of WAFL, RAID, data, metadata, the memory manager, and the lock manager. The aggregates, RAID group sizes, and volumes all exist in the data layer. These components are very similar to what was always in Data ONTAP operating in 7-Mode. PCIe-based Flash Cache is also part of the data stack.
Network stack. This consists of a user space and a kernel space. The kernel space includes all the
networking (interfaces, ifgrps, and so on) and the protocols (NFS, CIFS, and so on). All of the
export rule evaluation, NIS, LDAP, and DNS lookup happens in the user space.
Clustered Data ONTAP 8.2 can help improve Virtuoso Liberate performance compared with clustered
Data ONTAP versions earlier than 8.2. Following is a list of improvements offered by version 8.2 that can
make a real difference in performance, on top of all the optimizations that occurred in releases of
clustered Data ONTAP earlier than 8.2.
Why Use Clustered Data ONTAP 8.2 for Virtuoso Liberate Workloads?
Provides high levels of network parallelism
Mitigates the impact of large file deletions
Improves write performance with CP smoothing
Enhances storage QoS for performance efficiency
Delivers SnapVault for data protection
Improves coalescing of free space after large deletions (continuous segment cleaning)
Offers multiprocessor support for NFSv4.x:
Fast path for the local data path
Zero-copy support, as in NFSv3
6.2 Storage Cluster Node Sizing and Optimization
After the right hardware is configured and clustered Data ONTAP 8.2 is installed on the cluster setup, the
following sizing efforts must be made to the RAID disk subsystem for optimal performance. All the cluster
nodes are multipathed in a storage failover configuration. In the event of hardware failure, the surviving
partner node takes over the disks from the failing node to provide continuous accessibility to the data
volumes.
How to configure clustered Data ONTAP for creating aggregates, volumes, and NFS protocol is not within
the scope of this paper. Refer to TR-4067, “Clustered Data ONTAP NFS Best Practice and
Implementation Guide,” for clustered Data ONTAP 8.2 and NFS configuration details.
6.3 File-System Optimization
After the volumes have been created in the SVM, NetApp recommends certain best-practice
configurations on the aggregate and volumes to address the following issues in a Virtuoso Liberate
environment:
Fragmentation after constant writes and deletions to the file system during the assembly phase
Fragmentation of free space for writes to complete a full stripe
The file system can be kept healthy at all times with the help of some maintenance and housekeeping
activities on the storage as it ages and grows in size, including:
Defragmenting the file system. Reallocate is a low-priority process that constantly
defragments the file system, and it can run in the background. However, NetApp recommends keeping aggregate utilization under 80%. If the aggregate runs close to 90% capacity, the following considerations apply:
Some amount of free space is required to temporarily move the data blocks and rewrite them in full, complete stripes in contiguous locations on the disk, thereby optimizing the reads that follow.
With insufficient free space in the aggregate, reallocate still runs in the background, but
defragmentation of the file system never completes.
Move the production chip design volume nondisruptively (NDO) to another controller that is part of the cluster setup for capacity balancing.
Add new shelves to the original controller to provide more space to the aggregate that is running low on space.
Run reallocate start -vserver vs1_eda_lib -path /vol/VOL06 -force true for all the volumes in that aggregate.
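The 80% and 90% thresholds above can be expressed as a small monitoring helper. The function and its messages are illustrative, not a NetApp tool:

```python
def aggregate_action(used_pct: float) -> str:
    """Illustrative fullness policy using the thresholds from the text.

    Below 80%: reallocate can keep the file system defragmented.
    80-90%: raise an alarm; plan a volume move (NDO) or add shelves.
    Above 90%: reallocate still runs, but defragmentation never completes.
    """
    if used_pct < 80:
        return "ok: reallocate keeps the file system defragmented"
    if used_pct < 90:
        return "alarm: plan a volume move (NDO) or add shelves"
    return "critical: reallocate runs, but defragmentation cannot complete"

print(aggregate_action(85))
```

Wiring such a check to an alerting system mirrors the 80%-capacity alarm recommended later in this section.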
Aggregate and RAID Group Sizing Best Practices
Disk spindles help to improve write performance by reducing CPU utilization and the time it takes to find free space when writing full stripes of data, especially as the file system ages from constant deletions and insertions. For optimal sizing:
A RAID group should have 28 SAS disks.
A minimum of 6 RAID groups should be present in a single aggregate.
Spread the volumes across all the high-end platform (FAS6290) cluster nodes that are part of a single project that has a high-performance requirement. All of these volumes are connected by the dynamic cluster namespace junctions:
Spreading out the volumes across the high-end platform prevents putting all your eggs (volumes) in one basket (aggregate). In this way, all the project volumes can meet the IOPS and bandwidth requirements from all the controller nodes that these volumes are part of and not just saturate a single controller.
Spreading out the volumes also allows moving the volumes nondisruptively in the cluster namespace for workload balancing to meet the high-performance SLO.
During the verification phase, a lot of transient data is generated that has large writes and reads accompanied by a lot of deletions. Isolating the volume that writes the transient data on a different cluster node from other volumes that are part of the project helps to prevent a single controller from being the bottleneck.
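Assuming RAID-DP (two parity disks per RAID group), the sizing guidance above works out as follows; the helper is a sketch, not sizing software:

```python
RAID_DP_PARITY_DISKS = 2   # RAID-DP reserves two parity disks per group

def data_spindles(raid_groups: int, disks_per_group: int) -> int:
    """Data (non-parity) spindles in an aggregate built from RAID-DP groups."""
    return raid_groups * (disks_per_group - RAID_DP_PARITY_DISKS)

# Recommended minimum: 6 RAID groups of 28 SAS disks each
print(data_spindles(6, 28))   # 156 data spindles out of 168 disks
```

By comparison, the lab configuration of three RAID groups totaling 66 disks per aggregate yields fewer data spindles, which is why the larger layout is the recommendation for performance-driven SLOs.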
Running reallocate after new shelves are added forces all the existing volumes to spread out across the new disk spindles. Otherwise, the new writes coming into that aggregate go only to the new disks.
Defragmenting free space. Continuous segment cleaning, which was introduced in clustered Data ONTAP 8.1.1 and further optimized in clustered Data ONTAP 8.2, helps coalesce the deleted blocks in the free pool to use for subsequent writes.
Thin provisioning. The volumes in the cluster namespace can be thin provisioned by disabling
space-guarantee. This provides flexibility to provision space for chip design or different project
volumes that can autogrow in increments of 1GB.
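Thin provisioning with 1GB autogrow increments can be pictured with a short simulation. This is illustrative only and does not mirror any ONTAP command:

```python
GIB = 1024 ** 3

def autogrow(size: int, used: int, increment: int = GIB,
             max_size: int = 100 * GIB) -> int:
    """Grow a thin-provisioned volume in fixed increments until usage fits
    or the configured maximum size is reached."""
    while used > size and size < max_size:
        size = min(size + increment, max_size)
    return size

# A 10GiB volume whose usage reaches 10.5GiB grows one 1GiB step to 11GiB
print(autogrow(10 * GIB, int(10.5 * GIB)) // GIB)
```

The point of the sketch is that space is committed only as the chip design data actually grows, rather than reserved up front.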
NetApp recommends enabling the following storage options to optimize the entire life of the file system.
File-System Optimization Best Practices
The following settings cannot be made at the admin privilege level; they are available only in diagnostic privilege mode of the cluster shell:
bumblebee::*> vol modify -vserver vs1_eda_lib -volume VOL06 -min-readahead false
(volume modify)
Volume modify successful on volume: VOL06
bumblebee::*> aggr modify -aggregate aggr1_fas6280c_svl09_1 -free-space-realloc on
bumblebee::*> reallocate start -vserver vs1_eda_lib -path /vol/VOL06 -space-optimized true -interval 3
bumblebee::*> vol modify -volume VOL06 -read-realloc space-optimized
(volume modify)
Volume modify successful on volume: VOL06
NetApp recommends always setting up an alarm that triggers as soon as the aggregate reaches 80% capacity. The critical chip design volumes that need more space can automatically use WFA or manually be moved to another aggregate on a different controller.
NetApp recommends thin provisioning the volumes. This can be done when the volumes are created, or they can be modified later. It can also be implemented by using OnCommand System Manager 3.0 from a GUI:
bumblebee::*> vol modify -vserver vs1_eda_lib -volume VOL06 -space-guarantee none
(volume modify)
Volume modify successful on volume: VOL06
Adequate sizing is required for the number of files in each directory and for path name lengths:
Longer path names lead to a higher number of NFS LOOKUP operations.
Default quotas cannot be implemented for users and groups:
Include an explicit quota entry for each user and group.
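The cost of long path names can be seen by counting components: NFS typically issues one LOOKUP per path component below the mount point. A rough illustration (component counting only, not an NFS client; the example path is invented):

```python
def lookups_below_mount(path: str) -> int:
    """Approximate NFS LOOKUP count to resolve a path, assuming one
    LOOKUP per component below the mount point and a cold client cache."""
    return len([c for c in path.strip("/").split("/") if c])

# A deeper hierarchy costs proportionally more LOOKUP round trips
print(lookups_below_mount("proj/lib28nm/cells/nand2/nand2.sp"))  # 5
```

Keeping cell library directory trees shallow therefore reduces metadata traffic to the storage.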
6.4 Storage Network Optimization
After you create the aggregates and volumes based on the recommended sizes to support the cell library
workload, you must then configure the network. At that time, the cluster, management, and data ports are
all physically connected and configured on all the cluster nodes to the cluster switches. Configuring the
network includes:
Data port aggregation. Before the LIFs and routing tables are configured for each SVM, it is very important to aggregate at least two 10GbE data ports for handling the cell library workloads.
Depending on the number of chip design and tool volumes that each controller has, NetApp recommends aggregating a larger number of data ports than required to achieve the desired SLO.
LIF failover. As mentioned in section 4.5, LIF IP addresses are no longer tied to physical network ports. They are part of the SVM. When LIF IP addresses are created, NetApp recommends configuring a failover path in case the home port goes offline. If a data port failure occurs, the LIF can fail over nondisruptively to another controller. This allows the application to continue accessing the volume even though the LIF moved to a different controller in the SVM.
Storage Network Optimization Best Practices
Aggregate at least two 10GbE data ports on each cluster node that interface with the compute farm:
bumblebee::*> network port ifgrp create -node fas6280c-svl07 -ifgrp e7e -distr-func ip -mode multimode
bumblebee::*> network port ifgrp add-port -node fas6280c-svl07 -ifgrp e7e -port e0d
bumblebee::*> network port ifgrp add-port -node fas6280c-svl07 -ifgrp e7e -port e0f
Use the following option to configure the LIF failover for any LIF configured in the SVM, clusterwide:
bumblebee::*> net int modify -vserver vs1_eda_lib -failover-group clusterwide -lif vs1_eda_lib_data3 -home-node fas6280c-svl09 -home-port e9e -address 172.31.22.172 -netmask 255.255.255.0 -routing-group d172.31.22.0/24
Always follow a ratio of 1 volume to 1 LIF. That means that every volume has its own LIF. If the
volume moves to a different controller, the LIF should move along with it.
6.5 Flash Cache Optimization
A caching tier on the storage supplements the disk subsystem in meeting the I/O requirements of the cell
library workload. Flash Cache accelerates the read workload: metadata, random reads, and sequential
reads. Random data access is otherwise a function of the disks; a higher number of disk spindles helps to
generate a greater amount of read I/O. Storage platforms with PCIe-based Flash Cache improve read performance:
Flash Cache serves additional I/O requests from the flash-based cache, while disk spindles help improve the performance of write-intensive workloads.
All the movement of the data between the base memories (DRAM), Flash Cache, and disks happens transparently to the application.
NetApp recommends enabling flexscale.lopri_blocks in Flash Cache. The Virtuoso Liberate
tool tends to read data from the storage as soon as it is written. This option allows those I/O requests to be served from Flash Cache instead of from the disks, which improves application performance. This option also allows caching of sequential data from the disks.
To cache random read workloads, flexscale.normal_data_blocks should be enabled.
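The read-after-write pattern that makes flexscale.lopri_blocks useful can be mimicked with a toy cache. This models only the behavior described above, not Data ONTAP internals:

```python
class ToyFlashCache:
    """Toy model: recently written blocks are retained so that an immediate
    read-back is served from cache instead of disk (illustrative only)."""

    def __init__(self):
        self.cache = {}
        self.disk = {}
        self.disk_reads = 0

    def write(self, block_id, data):
        self.disk[block_id] = data
        self.cache[block_id] = data    # retain on write, lopri_blocks-like

    def read(self, block_id):
        if block_id in self.cache:     # cache hit: no disk I/O
            return self.cache[block_id]
        self.disk_reads += 1
        return self.disk[block_id]

fc = ToyFlashCache()
fc.write("blk0", b"netlist")
fc.read("blk0")                        # read-after-write, served from cache
print(fc.disk_reads)                   # 0
```

In the real system, this translates into characterization reads being absorbed by the flash tier rather than the SAS spindles.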
Flash Cache Optimization Best Practices
The following settings cannot be made from the cluster shell. They can be made only at the node shell CLI of each controller. These commands allow you to get from the cluster shell to the node shell on each controller:
bumblebee::*> system node run -node fas6280c-svl09
Type 'exit' or 'Ctrl-D' to return to the CLI
fas6280c-svl09> priv set diag
Warning: These diagnostic commands are for use by NetApp
Personnel only.
fas6280c-svl09*>
Enable Flash Cache:
options flexscale.enable on
Enable caching of metadata and random read data:
options flexscale.normal_data_blocks on
Enable caching of sequential read data:
options flexscale.lopri_blocks on
Type exit in the node shell to get back to the cluster shell:
fas6280c-svl09*> exit
logout
bumblebee::*>
6.6 Network File System (NFSv3) Optimization
Almost all of the cell library characterization workload accesses the file system from the back-end storage
controllers over the Network File System version 3 (NFSv3) protocol:
NFSv3 is a stateless protocol and is geared primarily toward performance-driven workloads such as the Virtuoso Liberate environment with asynchronous writes.
Communication between the NFSv3 client and the storage happens over Remote Procedure Calls.
Red Hat Enterprise Linux (RHEL) 5.x is the most common Linux vendor–supported version that is used by most of the semiconductor companies in Virtuoso Liberate compute farm environments.
NFS runs in the kernel space of the network stack in the clustered Data ONTAP code. Minimal tuning is required for NFS running on the network stack.
As one of the benefits of clustered Data ONTAP 8.2, a fast path for the local data path is available for NFSv3.
With a large number of compute nodes accessing files from a single controller, the TCP receive window, or receive buffer, may quickly become exhausted. The storage does not accept any further TCP data over the wire until the receive buffer is freed. NetApp therefore recommends increasing the TCP receive buffer value.
NetApp recommends enabling NFS failover groups to provide another layer of protection at the protocol level.
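A common way to reason about the receive-buffer sizing mentioned above is the bandwidth-delay product. The 10Gb/sec link speed and 1ms round-trip time below are example figures, not measurements from this validation:

```python
def bdp_bytes(bandwidth_gbit: float, rtt_ms: float) -> int:
    """Bandwidth-delay product: the receive buffer needed to keep a
    link fully utilized at a given round-trip time."""
    return int(bandwidth_gbit * 1e9 / 8 * rtt_ms / 1e3)

# A 10Gb/sec path with a 1ms round-trip time needs ~1.25MB of buffer
print(bdp_bytes(10, 1.0))
```

A receive buffer smaller than the bandwidth-delay product caps throughput regardless of link speed, which is why the buffer must grow along with the network.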
NFSv3 Optimization Best Practices
The force-spinnp-readdir option enables effective readdir calls from the data stack; increasing the TCP maximum transfer size also optimizes performance:
nfs modify -vserver vs1_eda_lib -force-spinnp-readdir true -tcp-max-xfer-size 65536
The following steps configure the NFS failover groups. The example shows how the LIFs vs1_eda_lib_data3 and vs1_eda_lib_data4, which are assigned to an NFS failover group, move the NFS traffic over port e7e on node fas6280c-svl07.
bumblebee::*> network interface failover-groups create -failover-group lib_failover_group -node fas6280c-svl07 -port e7e
bumblebee::*> network interface failover-groups show -failover-group lib_failover_group -instance
Failover Group Name: lib_failover_group
Node: fas6280c-svl07
Port: e7e
1 entries were displayed.
bumblebee::*> network interface modify -vserver vs1_eda_lib -lif vs1_eda_lib_data3,vs1_eda_lib_data4 -failover-group lib_failover_group
2 entries were modified.
6.7 Parallel Network File System (pNFS)
NFSv3 has been very popular and is the de facto protocol required by most cell library
characterization applications, and it has generally met the performance needs of most cell
design applications to date. NFSv4 was never positioned for performance; it was intended
mainly as a security and reliability improvement, with features such as Kerberos, access control lists, and
delegations. However, with NFSv4.1 file delegations and pNFS, we can achieve performance along with
the security and reliability that the NFSv4.x protocol provides.
pNFS is an extension introduced in the NFSv4.1 minor version. Unlike NFSv3, NFSv4, and non-pNFS
NFSv4.1, in which metadata and data travel on a single I/O path, pNFS separates the metadata from the data. pNFS primarily
consists of two main components:
Metadata server (MDS): Handles all metadata operations, such as GETATTR, ACCESS, LOOKUP,
SETATTR, and file layout information.
Data server (DS): Stores all the inode information and the real data. The clients get a direct path to access the DS.
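The MDS/DS split can be sketched as a toy protocol: the client first asks the MDS for a layout, then reads directly from the DS that owns the file. Class names, paths, and data below are illustrative, not the pNFS wire protocol:

```python
class MetadataServer:
    """Toy MDS: answers layout requests (roughly, LAYOUTGET)."""

    def __init__(self, layouts):
        self.layouts = layouts         # path -> owning data server

    def get_layout(self, path):
        return self.layouts[path]

class DataServer:
    """Toy DS: holds the file data and serves reads directly."""

    def __init__(self, blocks):
        self.blocks = blocks

    def read(self, path):
        return self.blocks[path]

ds = DataServer({"/vol6/cell.ldb": b"timing-data"})
mds = MetadataServer({"/vol6/cell.ldb": ds})

owner = mds.get_layout("/vol6/cell.ldb")   # metadata path to the MDS
print(owner.read("/vol6/cell.ldb"))        # direct data path to the DS
```

The key property is that the data read never passes through the metadata server, which is what removes the single-controller bottleneck described below.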
Figure 7 shows how a pNFS client communicates with the MDS and the DS. The diagram on the right
also illustrates how pNFS is implemented in clustered Data ONTAP 8.2. pNFS is purely a clustered Data
ONTAP implementation. Data ONTAP operating in 7-Mode does not support pNFS. For more details on
pNFS, refer to TR-4063, “Parallel Network File System Configuration and Best Practices for Clustered
Data ONTAP 8.2.”
Figure 7) pNFS implementation.
The diagram on the left side of Figure 7 shows the generic pNFS implementation, in which a pNFS client
communicates with the metadata server to get file location information. File layout information is sent to
the client from the MDS, which hands out the location of the file in the DSs and also the information about
the network path to get to that location. The control protocol provides synchronization between the MDS
and the DS.
The diagram on the right side of Figure 7 illustrates that every node in clustered Data ONTAP is an MDS
and a DS. For any LIF IP address in the SVM that is mounted by the pNFS client, that cluster node
becomes the MDS for that client. If there is a data volume located in that cluster node or any other node
in the cluster setup, it becomes the DS. Eventually the pNFS clients can reach the data volumes that are
located in the cluster namespace through a local data path. NetApp products implement pNFS only over
files. There is no block or object implementation for pNFS available at this time.
Enable NFSv4.1/pNFS with a Delegation in Clustered Data ONTAP 8.2
Clustered Data ONTAP 8.2 supports NFSv4.1 file delegations. The following options must be enabled on the SVM to use NFSv4.1/pNFS with read and write delegations:
nfs modify -vserver vs1_eda_lib -v4.1-pnfs enabled -v4.1-read-delegation enabled -v4.1-write-delegation enabled
NFSv4.1/pNFS has a client dependency. RHEL 6.5 is now the generally available version that supports
pNFS over files. Clustered Data ONTAP 8.2 is optimized to perform better with RHEL 6.4 over NFSv3
and NFSv4.1/pNFS with file delegations enabled. Section 5 provides more information about the
performance validation that was done in the Virtuoso Liberate lab. Cadence is currently validating and
qualifying RHEL 6.5 for Virtuoso Liberate applications.
NFSv4.1/pNFS is a step in the right direction to handle the growing performance demands of cell
library characterization workloads, specifically in the assembly phase. Newer versions of the Virtuoso
Liberate tool support a 64-bit architecture. A parallel protocol such as pNFS, combined with clustered Data
ONTAP, can achieve the high concurrency and performance requirements of the Virtuoso Liberate
application and complete jobs faster than over NFSv3. The performance details are discussed in
section 5.
Virtuoso Liberate workloads always consist of large amounts of metadata. In a traditional NFSv3 setup, a
single controller gets bottlenecked because of large metadata operations. Because pNFS isolates the
metadata from the data, an innovative way to spread the metadata over all the controllers in a cluster
setup was tested by using an on-box DNS round-robin. This also allows distribution of the NFSv4.1 locks
across all the cluster nodes rather than bottlenecking a single node where the cell libraries are located.
On-Box DNS Round-Robin
Clustered Data ONTAP 8.2 provides the ability to leverage the named service on each node to service
DNS requests from clients. Clustered Data ONTAP 8.2 also issues data LIF IP addresses based on an
algorithm that calculates CPU and node throughput to provide the least utilized data LIF for proper load
balancing across the cluster for mount requests. When a mount is successful, the client continues to use
that connection until remount. This differs from round-robin DNS, because the external DNS server
services all requests and has no insight into how busy a node in the cluster is. Instead, the DNS server
simply issues an IP address based on which IP is next in the list.
Additionally, round-robin DNS issues IP addresses with a time to live (TTL). This caches the DNS request
in Microsoft® Windows® for 24 hours by default. On-box DNS issues a TTL of 0, which means that DNS is
never cached on the client and a new IP is always issued based on load.
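The difference between an external round-robin DNS and the on-box DNS can be shown with two selection functions. The per-node load figures are invented for illustration; only the selection logic reflects the description above:

```python
from itertools import cycle

# Invented load figures: data LIF address -> current node load (%)
lifs = {"172.31.22.170": 70, "172.31.22.171": 20,
        "172.31.22.172": 55, "172.31.22.173": 90}

external_dns = cycle(sorted(lifs))     # plain round-robin: next IP in the list

def round_robin_answer():
    return next(external_dns)          # no insight into node load

def on_box_dns_answer():
    return min(lifs, key=lifs.get)     # least-loaded LIF; a TTL of 0 means
                                       # every query gets a fresh answer

print(on_box_dns_answer())             # 172.31.22.171 (lowest load)
```

With a TTL of 0, each mount request re-runs the least-loaded selection, whereas a cached round-robin answer can pin many clients to one busy node.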
On-Box DNS Round-Robin Configuration
The following steps illustrate how to configure the on-box DNS round-robin on clustered Data ONTAP 8.2
and Microsoft Windows Server® 2008 R2.
Clustered Data ONTAP 8.2
The following example consists of a four-node FAS6280 cluster. The SVM vs1_eda_lib spans all
four cluster nodes. Four LIF IP addresses are configured for this SVM; each cluster node has a
LIF IP address configured on its home port:
fas6280c-svl07: 172.31.22.170
fas6280c-svl08: 172.31.22.171
fas6280c-svl09: 172.31.22.172
fas6280c-svl10: 172.31.22.173
The Windows Server 2008 R2 DNS IP address is 172.31.22.151.
Enable the LIFs to Query the DNS Server
Check whether the DNS is configured correctly on the SVM. Refer to TR-4067, “Clustered Data ONTAP NFS Best Practice and Implementation Guide,” to configure DNS on clustered Data ONTAP 8.2:
bumblebee::*> vserver services dns create -vserver vs1_eda_lib -domains eda.local.com -state enabled -timeout 2 -attempts 1 -name-servers 172.31.22.151,172.31.21.151
bumblebee::*> dns show
(vserver services dns show)
                      Name
Vserver     State     Domains                   Servers
----------- --------- ------------------------- --------------
vs1_eda_lib enabled   eda.local.com,            172.31.22.151,
                      eda-win-1.eda.local.com   172.31.21.151
Configure the LIFs to query the DNS server. This is a new feature in clustered Data ONTAP 8.2:
bumblebee::*> net int show -vserver vs1_eda_lib -fields address
(network interface show)
vserver lif address
----------- ----------------- -------------
vs1_eda_lib vs1_eda_lib_data1 172.31.22.170
vs1_eda_lib vs1_eda_lib_data2 172.31.22.171
vs1_eda_lib vs1_eda_lib_data3 172.31.22.172
vs1_eda_lib vs1_eda_lib_data4 172.31.22.173
4 entries were displayed.
bumblebee::*> net int show -vserver vs1_eda_lib -fields dns-zone,listen-for-dns-query
(network interface show)
vserver lif dns-zone listen-for-dns-query
----------- ----------------- ----------------- --------------------
vs1_eda_lib vs1_eda_lib_data1 lib.eda.local.com true
vs1_eda_lib vs1_eda_lib_data2 lib.eda.local.com true
vs1_eda_lib vs1_eda_lib_data3 lib.eda.local.com true
vs1_eda_lib vs1_eda_lib_data4 lib.eda.local.com true
4 entries were displayed.
Windows Server 2008 R2 for DNS
From the DNS Manager, create a new delegation for the host name lib. Do not append the fully
qualified domain name. The DNS server appends that automatically. Figure 8 shows a screen shot of this action.
Figure 8) Configuring the delegation on the DNS server.
Add all four IP addresses (listed in this example) one by one to the delegated zone.
Resolving lib to Different IP Addresses by Using the On-Box DNS Round-Robin
C:\Users\Administrator>nslookup lib
Server: localhost
Address: 127.0.0.1
Non-authoritative answer:
Name: lib.eda.local.com
Address: 172.31.22.171
C:\Users\Administrator>nslookup lib
Server: localhost
Address: 127.0.0.1
Non-authoritative answer:
Name: lib.eda.local.com
Address: 172.31.22.173
C:\Users\Administrator>nslookup lib
Server: localhost
Address: 127.0.0.1
Non-authoritative answer:
Name: lib.eda.local.com
Address: 172.31.22.172
C:\Users\Administrator>nslookup lib
Server: localhost
Address: 127.0.0.1
Non-authoritative answer:
Name: lib.eda.local.com
Address: 172.31.22.171
C:\Users\Administrator>nslookup lib
Server: localhost
Address: 127.0.0.1
Non-authoritative answer:
Name: lib.eda.local.com
Address: 172.31.22.173
C:\Users\Administrator>
Mounts on the Compute Nodes
Because of the on-box DNS round-robin, the compute nodes mount a different IP address each time they mount the chip design volumes by using the host name lib.
Node 1:
lib:/VOL6 on /vol6-pnfs type nfs
(rw,bg,rsize=65536,wsize=65536,hard,intr,proto=tcp,timeo=600,vers=4,minorversion=1,clientaddr=172.17.44.232,addr=172.31.22.170)
Node 2:
lib:/VOL6 on /vol6-pnfs type nfs
(rw,bg,rsize=65536,wsize=65536,hard,intr,proto=tcp,timeo=600,vers=4,minorversion=1,addr=172.31.22.173,clientaddr=172.31.22.160)
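Conceptually, the on-box DNS behaves like a rotation over the four LIF addresses, so successive mounts land on different cluster nodes. The following Python sketch is a toy model of that distribution, not the actual ONTAP implementation:

```python
import itertools

# The four data LIF addresses from this example
addrs = ["172.31.22.170", "172.31.22.171", "172.31.22.172", "172.31.22.173"]

# Toy model of round-robin resolution: each lookup returns the next address
resolver = itertools.cycle(addrs)
mounts = [next(resolver) for _ in range(6)]  # six successive mounts

# The first four mounts each land on a different LIF before the cycle repeats
```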
NFSv4.1/pNFS Best Practices
Read and write file delegations should be enabled for NFSv4.1 to promote aggressive caching.
pNFS provides data locality. The volume can be accessed over a direct path from anywhere in the cluster.
There is no requirement for a 1:1 ratio of LIFs to volumes for NFSv4.1/pNFS with delegations, compared with the recommendation in section 6.4 for NFSv3.
If a volume is moved for capacity or workload balancing, there is no requirement to move or migrate the LIF in the cluster namespace to provide local access to the volume.
NFSv4.1 is a stateful protocol, unlike NFSv3. If there is ever a requirement to migrate a LIF, I/O operations stall for about 45 seconds while the lock states are migrated to the new location.
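As a sketch of how NFSv4.1, pNFS, and delegations are enabled on the SVM (the option names below are an assumption based on the clustered Data ONTAP CLI and may vary by release; verify with "vserver nfs modify ?" before use):

```
bumblebee::*> vserver nfs modify -vserver vs1_eda_lib -v4.1 enabled -v4.1-pnfs enabled -v4.1-read-delegation enabled -v4.1-write-delegation enabled
```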
7 Other Features in Clustered Data ONTAP 8.2 for Virtuoso Liberate Workloads
7.1 SnapVault
SnapVault was introduced in clustered Data ONTAP 8.2. SnapVault performs logical
replication at the volume level, asynchronously. In addition:
SnapVault generates minimal metadata while replicating the data.
The SnapVault destination can retain a different number of Snapshot copies than the source.
Users can browse and restore single files by using ndmpcopy, as well as entire volumes.
Storage efficiency is preserved with SnapVault. The SnapVault destination preserves the deduplication and compression of the data replicated from the source.
SnapVault uses SVM and cluster peering to mirror data across different SVMs and clusters, respectively.
SnapVault SVM Peering Configuration
The following commands configure SVM peering to start the SnapVault process:
bumblebee::*> vserver peer show
There are no Vserver peer relationships.
bumblebee::*> vserver peer create -vserver vs1 -peer-vserver vs1_eda_lib -applications snapmirror
Info: 'vserver peer create' command is successful.
bumblebee::*> vserver peer show
Peer Peer
Vserver Vserver State
----------- ----------- ------------
vs1 vs1_eda_lib peered
vs1_eda_lib vs1 peered
2 entries were displayed.
bumblebee::*> vol create -vserver vs1 -volume VOL06VAULT -aggregate aggr1_fas6280c_svl07_1 -size 3t -type dp
(volume create)
[Job 16024] Job succeeded: Successful
bumblebee::*> snapmirror create -S vs1_eda_lib:VOL06 vs1:VOL06VAULT -type XDP
Operation succeeded: snapmirror create the relationship with destination vs1:VOL06VAULT.
bumblebee::*> snapmirror show
                                                              Progress
Source            Destination    Mirror        Relationship   Total     Last
Path        Type  Path           State         Status         Progress  Healthy Updated
----------- ----  -------------- ------------- -------------- --------- ------- -------
vs1_eda_lib:VOL06
            XDP   vs1:VOL06VAULT Uninitialized Idle           -         true    -
7.2 Storage QoS
Storage QoS provides another level of control, in which IOPS and bandwidth limits can be set
for workloads that are not critical or when setting up SLOs for different workloads. In EDA environments,
storage QoS plays an important role:
Rogue workloads can be isolated by setting appropriate IOPS and bandwidth limits in a separate QoS policy group for users who generate these kinds of workloads in a production environment. This can be done at the SVM, volume, or individual file level.
In an IT-managed cloud infrastructure, storage QoS helps to run multiple tenants with different service-level offerings. New tenants can be added alongside existing ones as long as the storage platform has the headroom to handle all the workload requirements. Different workloads, such as builds,
verifications, cell library characterization, and other EDA tools, can coexist on the same storage controller. Each of the individual workloads can have different performance SLOs assigned to them.
Storage QoS Configuration
A QoS policy group must be created for each SVM in the cluster. In the following example, two
QoS policy groups are created: business_critical and non_critical, with different IOPS
and bandwidth settings:
bumblebee::*> qos policy-group create -policy-group business_critical -vserver vs1_eda_lib -max-throughput 1.2GB/sec
bumblebee::*> qos policy-group create -policy-group non_critical -vserver vs1_eda_lib -max-throughput 2000IOPS
bumblebee::*> qos policy-group show
Name Vserver Class Wklds Throughput
---------------- ----------- ------------ ----- ------------
business_critical
vs1_eda_lib user-defined - 0-1.20GB/S
non_critical vs1_eda_lib user-defined - 0-2000IOPS
2 entries were displayed.
Volume CMSGE is then set with the QoS policy group non_critical:
bumblebee::*> vol modify -vserver vs1_eda_lib -volume CMSGE -qos-policy-group non_critical
(volume modify)
Volume modify successful on volume: CMSGE
The file writerandom.2g.88.log is then assigned to the non_critical QoS policy group. Note that you
cannot set a QoS policy group on a file when the volume that holds that file already has a QoS
policy group set on it. The QoS policy group on the volume must be removed before the policy can
be set on a particular file in that volume:
bumblebee::*> file modify -vserver vs1_eda_lib -volume VOL06 -file //OpenSPARCT1/Cloud_free_trial_demo/OpenSparc-T1/model_dir/farm_cpu_test/writerandom.2g.88.log -qos-policy-group non_critical
bumblebee::*> qos workload show
Workload       Wid   Policy Group Vserver     Volume LUN Qtree File
-------------- ----- ------------ ----------- ------ --- ----- -------------
CMSGE-wid12296 12296 non_critical vs1_eda_lib CMSGE  -   -     -
file-writerandom-wid11328
               11328 non_critical vs1_eda_lib VOL06  -   -     /OpenSPARCT1/Cloud_free_trial_demo/OpenSparc-T1/model_dir/farm_cpu_test/writerandom.2g.88.log
2 entries were displayed.
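The precedence rule described above (a file cannot receive its own policy group while its containing volume already has one) can be sketched as:

```python
def can_assign_file_policy(volume_policy_group):
    """A file-level QoS policy group can be assigned only when the
    containing volume has no policy group of its own."""
    return volume_policy_group is None

# While the volume holds a policy group, the file-level assignment is rejected
assert can_assign_file_policy("non_critical") is False
# After the volume's policy group is removed, the file can be assigned one
assert can_assign_file_policy(None) is True
```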
7.3 Nondisruptive Operation (NDO)
NDO completely changes the way that clustered Data ONTAP keeps data alive and available to the
application and to the users who access it. Disruptive scenarios were tested in the Cadence lab
under Virtuoso Liberate workloads to determine whether users experienced disruption at the application
layer:
When a data port was taken offline, the LIF IP address instantly failed over to another node in the cluster. This did not cause any outage for the user accessing the data under load.
The chip library volume was moved to a different cluster node under the active Virtuoso Liberate load for capacity- and workload-balancing reasons. The volume and the LIF were moved to the new location in the cluster namespace without disrupting the user’s running jobs on the chip design volume.
Nondisruptive Operation with Volume Move
In this example, the volume VOL06 is moved from an aggregate on FAS6280-svl10 to an aggregate on FAS6280-svl07 while the Virtuoso Liberate workload is in progress. There is no disruption to the application while the volume is moved on the storage.
bumblebee::*> vol move start -vserver vs1_eda_lib -volume VOL06 -destination-aggregate aggr1_fas6280c_svl07_1
(volume move start)
[Job 17268] Job is queued: Move "VOL06" in Vserver "vs1_eda_lib" to aggregate "aggr1_fas6280c_svl07_1". Use the "volume move show -vserver vs1_eda_lib -volume VOL06" command to view the status of this operation.
The job show <job_id> command can be used to check the status of the volume move:
bumblebee::*> job show 17268
Owning
Job ID Name Vserver Node State
------ -------------------- ---------- -------------- ----------
17268 Volume Move bumblebee fas6280c-svl10 Success
Description: Move "VOL06" in Vserver "vs1_eda_lib" to aggregate "aggr1_fas6280c_svl07_1"
NDO can also be performed during hardware refreshes, when all the volumes on an entire node can be evacuated to another cluster node and then moved back nondisruptively to the new controllers after the refresh.
Nondisruptive upgrades (NDUs) of clustered Data ONTAP versions and of shelf and disk firmware can also be performed without causing any outage to the application.
8 Compute Farm Optimization
The engineering compute farms in a chip design environment consist of tens of thousands of cores,
which translates to hundreds or thousands of physical compute nodes. Virtualization is usually not
deployed in these farms. Linux is the most commonly used operating system in the compute farm. Linux
clients in the compute farm provide the number of cores that are required to process the number of jobs
submitted.
For better client-side performance with clustered Data ONTAP 8.2, the cell library characterization
application and the schedulers, such as Sun Grid Engine (SGE) or Load Sharing Facility (LSF), must be
run on RHEL 5.8 and later or RHEL 6.5 and later. Cadence has validated and certified RHEL 6.5 for cell
library characterization tools.
8.1 RHEL 6.5 Clients in the Compute Farm
Why Deploy the RHEL 6.5 Kernel in the Compute Farm
RHEL 6.5 has more optimization in the TCP stack to handle NFS requests than earlier versions.
RHEL 6.4 was the first generally available release to support pNFS for files; versions earlier than
RHEL 6.4 are not qualified to run NFSv4.1/pNFS.
Clustered Data ONTAP 8.2 and RHEL 6.5 have been tested and validated by NetApp's NFS Engineering
QA team along with Red Hat NFS Engineering. Many bugs have been jointly scrubbed and fixed in the
RHEL 6.5 kernel and in clustered Data ONTAP 8.2 as well. For more details, refer to TR-3183, "Using
Red Hat Client with NetApp Storage over NFS."
The RHEL 6.5 generally available kernel does not have all the Bugzilla fixes. You must perform a yum
update, which downloads the latest errata (z-stream) kernel from Red Hat Network with fixes for most of the
known issues. After the kernel has been updated, the new kernel should show as 2.6.32-431.14.1.el6.x86_64.
How to Deploy the RHEL 6.5 Kernel in the Compute Farm
Upgrading hundreds of compute nodes to RHEL 6.5 at once is not easy. Instead, RHEL 6.5 can be added as a new
deployment in the compute farm after the release has been qualified for use in your environment. The
file system exported from clustered Data ONTAP 8.2 can be mounted on various Linux kernels in your
compute farm: all of your pre-RHEL 6.5 clients can continue mounting the file system over NFSv3, and the
same file system can also be mounted over NFSv4.1/pNFS on the RHEL 6.5 clients. This means that
NFSv3 and NFSv4.1/pNFS can coexist in the compute farm for one or more exported file systems.
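As a sketch, the same export can be mounted both ways (the host name and export path are taken from the earlier example in this report; the mount points are hypothetical, so adjust them for your environment):

```shell
# Pre-RHEL 6.5 client: NFSv3 mount of the chip design volume
mount -t nfs -o vers=3,rw,bg,hard,intr,rsize=65536,wsize=65536,proto=tcp,timeo=600 \
    lib:/VOL6 /vol6-nfsv3

# RHEL 6.5 client: the same export over NFSv4.1/pNFS
mount -t nfs -o vers=4,minorversion=1,hard,rsize=65536,wsize=65536,proto=tcp,timeo=600 \
    lib:/VOL6 /vol6-pnfs
```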
EDA tool vendors such as Cadence have qualified and provide support for RHEL 6.5. After RHEL 6.5 is
qualified by Cadence for its cell library characterization applications to support x86_64-bit kernels,
NFSv4.1/pNFS will be just another protocol like NFSv3 to be used in the Virtuoso Liberate environment.
The benefits of having both the NFSv3 and NFSv4.1/pNFS protocols coexisting in the compute farm include:
No change is required for the existing compute nodes that mount the file systems over NFSv3, so there is no disruption to the existing clients in the compute farm as more nodes on RHEL 6.4 and later are added to scale the number of jobs. The same file system can also be mounted over NFSv3 or NFSv4.1/pNFS from the new pNFS-supported clients.
Based on the performance validation documented in section 5, NFSv4.1/pNFS provides a significant improvement in job completion times. Critical chip designs can be isolated from the rest to provide faster job completion times and better SLOs.
8.2 Best Practices for Compute Nodes
Considering the high number of nodes in the compute farm, it is unrealistic to make significant changes
dynamically on each of the clients. Based on the Virtuoso Liberate workload evaluation, the following
recommendations for the Linux clients contribute significantly to improving the job completion times for
various chip design activities.
Compute Node Optimization for NFSv3 Mounts
Turn off hyperthreading on the BIOS setting of each of the Linux nodes.
Use the recommended mount options while mounting over NFSv3 on the Linux compute nodes: vers=3,rw,bg,hard,rsize=65536,wsize=65536,proto=tcp,intr,timeo=600.
Set sunrpc.tcp_slot_table_entries = 128; this increases the number of concurrent in-flight RPC requests per mount. This option
is fine for pre-RHEL 6.4 kernels that mount over NFSv3. RHEL 6.4, however, changed the TCP slot table to grow dynamically, and a flood of RPC requests from Linux clients over NFSv3 can deplete the network buffers on NetApp storage. Therefore, the following lines must be included when mounting file systems on a RHEL 6.5 kernel over NFSv3 (they are not required when mounting over NFSv4.1):
Create a new file: /etc/modprobe.d/sunrpc-local.conf
Add the following entry: options sunrpc tcp_max_slot_table_entries=128
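The two steps above can be combined into a single command (the file path and module option are taken from the text):

```shell
# Create the modprobe drop-in that caps the RPC slot table at 128
echo "options sunrpc tcp_max_slot_table_entries=128" \
    > /etc/modprobe.d/sunrpc-local.conf
```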
If the compute nodes are using 10GbE connections, the following tuning options are required. These changes do not apply to clients that use 1GbE connections:
Disable irqbalance on the nodes:
[root@ibmx3650-svl51 ~]# service irqbalance stop
Stopping irqbalance: [ OK ]
[root@ibmx3650-svl51 ~]# chkconfig irqbalance off
Set net.core.netdev_max_backlog = 300000 to avoid dropped packets on a 10GbE connection.
Compute Node Optimization for NFSv4.1/pNFS Mounts for RHEL 6.5 Clients
Turn off hyperthreading on the BIOS setting of each of the Linux nodes.
Use the recommended mount options while mounting over NFSv4.1 on the Linux compute nodes:
vers=4,rsize=65536,wsize=65536,hard,proto=tcp,timeo=600,minorversion=1
Set the ntp or time server on all the compute nodes:
[root@ibmx3650-svl50 /]# ntpdate -q 172.17.0.11
server 172.17.0.11, stratum 3, offset 40.629293, delay 0.02606
31 Jul 12:55:59 ntpdate[1567]: step time server 172.17.0.11 offset 40.629293
sec
[root@ibmx3650-svl50 /]# ntpdate 172.17.0.11
31 Jul 12:56:52 ntpdate[1568]: step time server 172.17.0.11 offset 40.629315
sec
[root@ibmx3650-svl50 /]# chkconfig ntpdate on
[root@ibmx3650-svl50 /]# service ntpd restart
Shutting down ntpd: [FAILED]
Starting ntpd: [ OK ]
Set the tuned-adm profile to latency-performance for compute-intensive workloads. The
following parameters are changed at the kernel level:
The I/O scheduler in /sys/block/<device>/queue/scheduler is set to [deadline]; the default is [cfq].
In /etc/sysconfig/cpuspeed, GOVERNOR is set to performance; by default it is unset. This uses the performance governor for P-states through cpuspeed.
In RHEL 6.5 and later, the profile requests a cpu_dma_latency value of 1.
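A sketch of applying and verifying the profile (the device name sdd follows the example above; tuned must be installed on the client):

```shell
# Apply the latency-performance profile on RHEL 6
tuned-adm profile latency-performance

# Verify the I/O scheduler and CPU governor changes described above
cat /sys/block/sdd/queue/scheduler        # [deadline] should be selected
grep -i governor /etc/sysconfig/cpuspeed  # GOVERNOR=performance
```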
If the compute nodes are using 10GbE connections, the following tuning options are required. These changes do not apply to clients that use 1GbE connections:
Disable irqbalance on the nodes:
[root@ibmx3650-svl51 ~]# service irqbalance stop
Stopping irqbalance: [ OK ]
[root@ibmx3650-svl51 ~]# chkconfig irqbalance off
Set net.core.netdev_max_backlog = 300000 to avoid dropped packets on a 10GbE connection.
9 Summary
Cell and circuit design is becoming more complicated with respect to size and yield optimization on the silicon
layers. As more and more sub-20nm chips are designed and manufactured for different consumer products,
characterization of standard cell libraries becomes critical to profile the characteristics and
behavior of the functions of a chip design across a broad range of operating conditions.
It is imperative to expedite the characterization process of the standard cell libraries to improve the overall
chip design process time. Storing, accessing, and managing all the cell libraries in a shared storage
infrastructure require low latency, high reliability, efficiency, and a single pane of manageability of the cell
library data. All the validations and best practices listed in this report clearly indicate that NetApp
clustered Data ONTAP 8.2, with the recommended storage optimizations and sizing, can accelerate the
cell library characterization process.
The main objective for integrating Cadence’s Virtuoso Liberate tool with the NetApp clustered file system
is to improve the job completion time at the application layer. Another very important factor that is
conducive to the chip design process is the fact that NetApp clustered Data ONTAP 8.2, with adequate
storage optimization and sizing, can improve the performance of workloads generated from the various
design and characterization tools that coexist in the chip design and manufacturing process. With QoS,
workloads can be tied to different SLOs for tools running on the same node or in different cluster nodes in
a scale-out architecture.
NetApp clustered Data ONTAP 8.2 allows load balancing of the cell library volume in the cluster
namespace by moving it, while under load, to different controller nodes in the cluster without any
disruption to the application.
10 Conclusion
There is a high level of complexity in the cell library characterization process where foundry process,
voltage, and temperature are validated and modeled for smaller silicon surface areas that are designed to
perform different functions. As Cadence continues to optimize the Virtuoso Liberate tool in every new release,
it is important that the compute nodes, the storage, and the protocol all contribute to reducing the overall job
completion time.
NetApp originally set out to improve the job completion time for the Virtuoso Liberate application with
clustered Data ONTAP 8.2. With all the validations and optimizations at the compute node, NFS
protocol, and storage layers, we achieved an improvement of up to 15% for cell library
characterization with NFSv4.1/pNFS. These improvements provide a huge benefit on top of what we
already provide with Data ONTAP operating in 7-Mode.
This result leads to the conclusion that sizing adequately, optimizing the storage along with regular file-
system maintenance, and choosing the right protocol can improve cell library characterization
performance significantly. This translates into two important factors that drive business in the EDA
industry:
Improved ROI with optimized license costs
Faster time to market
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer’s responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© 2014 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, Flash Cache, OnCommand, RAID-DP, SnapMirror, Snapshot, SnapVault, vFiler, and WAFL are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Cadence, the Cadence logo, and Virtuoso are registered trademarks and Liberate is a trademark of Cadence Design Systems, Inc. Linux is a registered trademark of Linus Torvalds. Microsoft, Windows, and Windows Server are registered trademarks of Microsoft Corporation. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. TR-4270-0214
Refer to the Interoperability Matrix Tool (IMT) on the NetApp Support site to validate that the exact product and feature versions described in this document are supported for your specific environment. The NetApp IMT defines the product components and versions that can be used to construct configurations that are supported by NetApp. Specific results depend on each customer's installation in accordance with published specifications.