Florin TAU’18 panel Machine Learning · Florin Dartu •Deputy director, TSMC •Responsible for...

TAU’18 panel

Machine Learning: Confluence with timing/EDA

Monterey, CA, Mar 15-16, 2018

Panel organization

• Debjit Sinha
• Qiuyang Wu

Panelists

• Arun Venkatachar (Synopsys)
• Florin Dartu (TSMC)
• Kerim Kalafala (IBM)
• Richard Phillips (NVIDIA)
• Shaan Awasthi (Intel)
• Shiva Raja (Cadence)

Cognitive ecosystem

1955-80

1990-00

2006+

*images from Google images

Why now?

• Big data
– Cheap storage
• Compute
– Cloud
– GPU
• Algorithm
– Open source

Machine learning – Endless opportunities

*images from Google images

Y = F(X)

Learn

Y = F(X)

Learn

Y = F(X), Z (delayed rewards)

Learn

Machine learning (ML) – Confluence with timing/EDA

• Timing applications
– Miscorrelation mitigation, prediction?
– Accuracy-performance tradeoffs
• Synthesis, P&R
– Better optimization
– What-if analysis
• Perfect confluence?

*images from keynote TAU’16 (A.B. Kahng)

ML in timing/EDA – Natural? Hype? Somewhere-in-between?

What we know today
• ML has value (but quality matters)
• Investment needed – Data, infrastructure, software, skills

What would we like to predict today
• ROI on ML investment for timing/EDA
• Suggestions on maximizing this ROI (we love optimization problems)

What would we like to have in a future TAU:

AlexaTime, I need (timing) closure

*images from Google images

Panelists

Arun Venkatachar
• Group Director, R&D - Synopsys’ Verification group business unit (ML and big data team lead)
• Applying ML techniques in production environments
• Developed multiple technologies in the areas of simulation, compilers and debuggers
• 20+ years of EDA experience, several papers/patents – distributed computing, simulation

Florin Dartu
• Deputy director, TSMC
• Responsible for library characterization and timing
• Prior roles – ATG/PrimeTime Synopsys, and SCL Intel

Kerim Kalafala
• Member, IBM Academy of Technology and IBM Master Inventor, STSM in IBM Systems
• Lead architect of static timing and noise analysis software tools in IBM EDA
• DAC’04 best-paper, co-authored a top-10 most cited paper in the 50 year history of DAC
• 49+ issued patents

Richard Phillips
• Director of ASIC-PD methodology at NVIDIA
• Responsible for STA signoff methodologies, timing closure workflows
• Worked across the design flow from RTL to timing closure since ’98

Shaan Awasthi
• Senior manager of Design Automation, Intel Programmable Solutions Group
• Charter to bring the latest and greatest tools, flows and methodologies to advanced nodes
• Collaborating on ML and big data technologies to augment the EDA flows
• Prior roles - Technical account manager at Ansys, design engineer at Sun Microsystems

Shiva Raja
• Senior architect at Cadence
• 22+ years of EDA experience - timing, SI, reliability, characterization and fast SPICE tools
• Interests - statistical timing, data analytics, ML applications for EDA
• Prior roles at Texas Instruments and CLK-DA

Questions for the panel

1. How do I identify if my problem is a good candidate for ML?
• Candidates in the space of timing and power (analysis, optimization)
• E.g.: Path enumeration, variability, SI?
• Which algorithm do I use?

2. Your “top 2” problems in timing/EDA for ML?
• Data availability – Do we have “big data” for your problem(s)?
• If a candidate for supervised learning, do we have “labeled” data?
• Is the data open source – Can EDA industry and academia be mutually helpful?

3. ROI of ML in timing/EDA
• Compute investment [CPU based → GPU based]
• Re-write tools from scratch with ML baked in? ML around the tool?
• Can we live with non-deterministic results? Designer adoption?

4. Need for timing/EDA tool updates for ML hardware (e.g. accelerator chip)
• Treat like “John/Jane Doe” chip?

5. Expertise development – Need?
• EDA thinking (algorithm central) vs. ML thinking (data central)
• Large fraction of CS/CE majors with ML and big data background – Is EDA industry attractive enough (or do we get the Google, Facebook leftovers?)

6. Your prediction of the next gen. timing (analysis/optimization) tool


Arun Venkatachar
Group Director, R&D
March 15, 2018

Machine Learning – Confluence with EDA

© 2018 Synopsys, Inc.

ML definition in context of EDA

• EDA has been using some form of learning for decades without using the phrase “ML”
– Statistical approaches, heuristics, prediction, etc. in our algorithms

Why ML now?

• The increased complexity of certain EDA problems is pushing the limits of traditional algorithmic/heuristic approaches
• Little/no benefit is derived from the vast volumes of data generated by tools
• Machine Learning can help users gain better insights to manage complexity growth
– ML uses statistical models derived from data to drive results
– ML-based computing techniques can be applied now to these problems

ML in context of EDA can be stated as “Statistical optimization at scale for prediction, classification, and/or estimation”

Background


Digital Intelligence: Synopsys Machine Learning Initiatives
Few case studies

Higher performance, smarter tools, smarter flows, and smarter operations

BETTER PERFORMANCE & QOR (Within the Tools)
• Formal Performance: VC Formal
• Route Prediction: IC Compiler II
• Low Power Optimization: PrimeTime
• Coverage Closure: VCS
• Optimal PPA, Improved QoR

SMARTER OPERATIONS & PRODUCTIVITY (Around the Tools)
• Development ML Apps: Bug Triaging, Check-in Analytics, Test Failure Predictor
• Regression ML Apps: Test selection, Grid Scheduling, Failure Triaging
• Release ML Apps: Scoreboard, Quality insights

Synopsys Platform


Which EDA Problems Are Good Candidates for ML? General Questions…

• Most EDA solutions are deterministic in nature
– Can probabilistic results be tolerated? How tight are the bounds required?
• Does data for analysis exist or can it be generated?
– Can data labels be generated?
• Is the learned model robust and portable?
– How tight is the dependency on design (family), process, etc.?
• How stringent are debug requirements?
– Debugging ML models is a hard problem


ML Applicability to EDA Challenges – How to decide if a problem is a good candidate

• Divide-and-conquer
– Can we estimate divisions using ML?
– Is the conquer step error tolerant?
• Incremental algorithms
– Can ML order sub-problems to maximize incrementality?
• Multiple heuristics
– Can ML rank heuristics for throughput?
• Parallelization
– Can ML partition and schedule sub-tasks?
• Sampling methods
– Can ML generate better sampling to guide optimization algorithms?


ML Applicability to R&D Productivity – Improve productivity & quality, reduce cost

• Bug & Failure Triaging
– Learn from past bugs and failures to identify duplicates, route, and reduce triage time
• Test Selection
– Learn from past behaviors and patterns to select the right tests to run
• Optimized Grid
– Improve farm utilization by optimizing jobs fired on the grid
• Exploration
– Gain insights into which variables play a part in your desired results
• Improved support
– Improve support TAT with NLP techniques


Case Study: Bug Triaging

Stack Trace Similarity Tool

Enter the fatal stack trace or the CRM STAR number:

#0 0x00007ffff4bd332d in waitpid ()
#1 0x000000000c3353eb in SNPSee_9ea8d ()
#2 0x000000000c3353eb SNPSee_9ea8db ()
#3 0x000000000c33674d in SNPSee_9ea8dbbd5e7 ()
#4 0x00007ffff4bd27db in read ()
#5 0x000000000ea33f2f in SNPSee_82c794b1 ()
#6 0x000000000ea342b4 in SNPSee_51710451 ()
#7 0x000000000ea345b5 in SNPSee_23a87142 ()
#8 0x000000000d08a015 in SNPSee_5d373cd8 ()

Search

Matching CRM STARs

STAR Number    Similarity Score
9001077065     1
9001095737     0.99
9001107968     0.96
9001114191     0.95
9001056625     0.94

9001077065
...
#0 0x00007ffff4bd332d waitpid ()
#1 0x000000000c3353eb SNPSee_9ea8d ()
#2 0x000000000c3353eb SNPSee_9ea8db
#8 0x000000000d08a015 SNPSee_5d373cd8

Work log history: …

https://navigator/SWE_NAVIGATOR/STAR/details.nhtml?starID=9001077065
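The slide does not say how the similarity score is computed. As a rough illustration only, the sketch below scores two gdb-style traces by the Jaccard overlap of their function names. The helper names (`frames`, `similarity`) and the `STAR-A`/`STAR-B` keys are hypothetical and are not taken from the Synopsys tool.

```python
import re

def frames(trace: str) -> list[str]:
    """Extract function names from frames such as '#0 0x7fff... in waitpid ()'."""
    return re.findall(r"#\d+\s+0x[0-9a-fA-F]+\s+(?:in\s+)?(\w+)", trace)

def similarity(a: str, b: str) -> float:
    """Jaccard similarity between the sets of function names in two traces."""
    fa, fb = set(frames(a)), set(frames(b))
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)

# Hypothetical query and known reports (shortened traces, made-up keys).
query = "#0 0x00007ffff4bd332d in waitpid () #1 0x000000000c3353eb in SNPSee_9ea8d ()"
known = {
    "STAR-A": "#0 0x00007ffff4bd332d in waitpid () #1 0x000000000c3353eb in SNPSee_9ea8d ()",
    "STAR-B": "#0 0x00007ffff4bd27db in read () #1 0x000000000c3353eb in SNPSee_9ea8d ()",
}

# Rank known reports from most to least similar to the query trace.
ranked = sorted(known, key=lambda k: similarity(query, known[k]), reverse=True)
```

A production tool would likely also weight frame order and depth; set overlap is just the simplest measure that yields a ranked list like the one on the slide.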






Case Study: Failure Triage

Work Flow: Regression Runs → Failure Logs → ML based Grouping Engine → Failure Clusters → Tracer works out of clusters → Assigns to failure owners
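The grouping engine itself is not described on the slide. The sketch below is a deliberately simple, rule-based stand-in for it: it masks numbers and hex addresses in each failure message and groups tests whose masked messages match. The test names and messages are made up for illustration.

```python
import re
from collections import defaultdict

def signature(message: str) -> str:
    """Mask hex addresses and numbers so messages differing only in values collapse together."""
    text = message.strip().lower()
    text = re.sub(r"0x[0-9a-f]+", "<hex>", text)
    return re.sub(r"\d+", "<n>", text)

def cluster(failures: dict[str, str]) -> dict[str, list[str]]:
    """Group test names by the signature of their failure message."""
    groups = defaultdict(list)
    for test, message in failures.items():
        groups[signature(message)].append(test)
    return dict(groups)

# Hypothetical failure logs from one regression run.
failures = {
    "test_a": "Assertion failed at cycle 1042: fifo overflow",
    "test_b": "Assertion failed at cycle 77: fifo overflow",
    "test_c": "Segfault at 0x7ffff4bd332d",
}
clusters = cluster(failures)
```

Each resulting cluster can then be handed to one owner, which is the step the work flow describes; a learned model would replace `signature` with a similarity measure that tolerates wording differences.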

Synopsys Confidential Information

Triage regression snapshots from live tracking page

Value Proposition

Real Time Test Status

Reduce cost of Computation

Predict which check-in

Gain insights into issues in the past


Case Study: Improved Support using ML/NLP

Flow: Incoming Case/Bug → ML Engine (trained on previous support tickets using NLP Engine) → Similar Case/Bug

• Natural Language Processing (NLP) engine has been trained on previous support tickets
• Accepts natural language questions in the query
• Will send auto-emails for matches (in development)

User can ask a natural language question
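One common way to match a natural language question against past tickets is TF-IDF weighting with cosine similarity. The sketch below assumes that approach; the slide does not state which NLP technique the engine uses, and the ticket texts are invented.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector per document plus the IDF table."""
    tokenized = [tokens(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vecs, idf

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def most_similar(query, tickets):
    """Return the past ticket closest to the query and its score."""
    vecs, idf = tfidf_vectors(tickets)
    q = {t: tf * idf[t] for t, tf in Counter(tokens(query)).items() if t in idf}
    scores = [cosine(q, v) for v in vecs]
    best = max(range(len(tickets)), key=scores.__getitem__)
    return tickets[best], scores[best]

# Hypothetical past tickets and an incoming question.
tickets = [
    "Simulation crashes with segmentation fault during elaboration",
    "License checkout fails on the compute farm",
    "Coverage report shows zero toggle coverage",
]
match, score = most_similar("Why does my simulation crash with a segmentation fault?", tickets)
```

Note that "crash" and "crashes" do not match here; stemming or learned embeddings would be the usual next step.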



What’s The Bottom Line?

• In general, ML may not replace algorithmic methods in the short term
• Look to ML to assist current/emerging solutions – low hanging fruit:
– Problems that depend on constraint solvers, non-linear optimization engines, many parameters
– Problems that involve many or unstable heuristics: ML helps in heuristic selection, initial solution selection, resource prediction
• ML does quite well when applied to improve productivity
– Regression test selection, resource predictions, risk assessments, etc.
• Work on collecting and curating data for the most part
– Data engineering is 60%-80% of the time; the ML effort is smaller
• Don’t be shy to try new open source tools
– Fail fast and explore freely


Data Challenge

Yes, it is a challenge in EDA…

• Big data for these problems is not the problem
– It is not about size of data, it is about quality of data
• The problem is data diversity
– Data from one design may not translate to another design
– Data from one process may not translate to other processes
– Models learned from a sparse corpus from a limited set of designs may not translate well
– Customers may need to share more data for improving model efficiency
• Invest in a big-data platform strategy
– Data collection and management is a big part of analytics and ML
– Either build your own or use one from external vendors


Machine Learning Technology Platform

• Optimized and preconfigured for verification workflows

– Across tools

– Single-source data access

– Automated native integration

• Highest performance & scalability

• Easy, transparent installation

• Automatic monitoring; fully resilient

• Real-time access

Machine Learning Technology Platform: Optimized for Advanced Workflows (diagram labels)

• High-Performance Big Data Storage & Query Layer
• Fault Tolerant Distributed Compute Layer
• Real Time Data Transfer & Communications Protocol Layer
• Machine Learning Libraries
• Global Monitoring & Management
• Physical Data Storage
• Regressions, Operations, R&D
• Common Access APIs - Native Code & Web Services
• Around the tools Apps
• Within the tools Apps
• Other Internal Applications


People Challenge

How to attract talented machine learning engineers to EDA?

• Hiring high quality talent in EDA is hard
– Applying ML in EDA requires domain knowledge since the field is very complex
• Invest in your engineers to build a talent pool
– A statistics background is a plus and a good starting point
• Hire interns and explore the talent pool
– Hire the good ones and groom them. They are not shy to try the latest tools!
• Attend lots of conferences and meetups
– You will build contacts and get connected to potential hires


Machine Learning for EDA

Kerim Kalafala on behalf of IBM Systems & EDA
Member IBM Academy of Technology, Master Inventor, STSM

Past is Prologue

• Kerim’s very simple model for ML:
1. Study the “past” (Need #1: Raw data, features thereof)
2. Automate formulation of a model (Need #2: Regularity / patterns that can be inferred)
3. Predict the “future” (Need #3: “Temporal” continuity of patterns)

I. Applied ML research opportunities
– Along with key questions

II. Preparing the battlefield
– How do we “feed the beast”?

Interesting Candidates for ML – Synthesis parameter tuning

• Can we use a reinforcement learning model?
• Model overall synthesis flow as a set of “moves”
• Each of which can be “played” through a subset of tunable parameters
• At the end of each move we have a (probabilistic) score on metrics of interest
• Can we infer a strategy which produces a better final score than any individual training run?
• To what degree is knowledge transferable?
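To make the "moves" framing concrete, here is a toy sketch under strong assumptions: three moves, three settings per move, a made-up deterministic score table (`SCORE`), and moves treated as independent, which real synthesis flows are not. It uses epsilon-greedy exploration with running-mean value estimates, a much simplified form of reinforcement learning, and then stitches the best setting per move into one strategy.

```python
import random

# Hypothetical score observed after each move for each parameter setting.
SCORE = [
    {"low": 1, "med": 3, "high": 2},
    {"low": 4, "med": 1, "high": 2},
    {"low": 2, "med": 2, "high": 5},
]

def train(episodes=200, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = [{s: 0.0 for s in move} for move in SCORE]  # value estimate per (move, setting)
    n = [{s: 0 for s in move} for move in SCORE]    # visit count per (move, setting)
    run_totals = []
    for _ in range(episodes):
        total = 0
        for m, move in enumerate(SCORE):
            settings = list(move)
            if rng.random() < epsilon:
                choice = rng.choice(settings)                      # explore
            else:
                choice = max(settings, key=lambda s: q[m][s])      # exploit
            reward = move[choice]
            n[m][choice] += 1
            q[m][choice] += (reward - q[m][choice]) / n[m][choice]  # running mean
            total += reward
        run_totals.append(total)
    # Learned strategy: the best-valued setting for every move.
    policy = [max(move, key=lambda s, m=m: q[m][s]) for m, move in enumerate(SCORE)]
    return policy, run_totals

policy, run_totals = train()
policy_score = sum(SCORE[m][s] for m, s in enumerate(policy))
```

Because the strategy combines the best setting seen for each move across all runs, its score is at least that of any single training run in this toy setting. Interacting moves and noisy scores, as in real flows, would need state-dependent values and more care.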

Interesting Candidates for ML – Outlier detection

• Given a set of measurements that we are interested in studying
• Can we use unsupervised learning techniques (e.g., clustering) to detect outliers for further analysis/follow-up?
– Negative anomaly we want to squash
– Unexpected positive result we want to replicate

1. Scalar projection of a multi-dimensional surface (correlates timing, device type, logic depth…)

2. Response surface of cross-hierarchy paths wrt budget considerations
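The slide suggests clustering. As a simpler unsupervised stand-in, the sketch below flags outliers with a modified z-score based on the median absolute deviation and separates them into negative and positive anomalies. The measurement values and the 3.5 threshold are illustrative assumptions.

```python
from statistics import median

def robust_outliers(values, threshold=3.5):
    """Return (negative, positive) outlier indices using a MAD-based modified z-score."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [], []
    negative, positive = [], []
    for i, v in enumerate(values):
        z = 0.6745 * (v - med) / mad
        if z < -threshold:
            negative.append(i)   # anomaly we want to squash
        elif z > threshold:
            positive.append(i)   # unexpected good result we want to replicate
    return negative, positive

# Hypothetical scalar measurements, e.g. slack in ps for comparable paths.
slacks_ps = [12, 15, 11, 14, 13, -40, 12, 16, 13, 60]
negative, positive = robust_outliers(slacks_ps)
```

For multi-dimensional data, as on the slide, the same idea applies to distances from cluster centers instead of a single scalar.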

Interesting Candidates for ML – Improving Correlation with Sign-Off Timing

• Given the vast amount of training data available to us
– (# Designs for which we have timing data throughout analysis) X (cell instances per design)
• Can we derive regression models that improve correlation between early analysis and sign-off timing?
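A minimal sketch of the regression idea, assuming a single feature (early-analysis slack) and an ordinary least-squares line. The data points are synthetic, and a real model would use many more features such as cell type, wiring, and logic depth.

```python
def fit_line(x, y):
    """Ordinary least squares for y ≈ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Synthetic example: sign-off slack is consistently more pessimistic than early slack.
early_slack_ps = [10.0, 20.0, 30.0, 40.0]
signoff_slack_ps = [4.0, 12.0, 20.0, 28.0]

slope, intercept = fit_line(early_slack_ps, signoff_slack_ps)

def predict_signoff(early_ps):
    """Correct an early-analysis slack toward the expected sign-off value."""
    return slope * early_ps + intercept
```

Applying `predict_signoff` to early results is one way such a model could reduce miscorrelation before the sign-off run is available.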

What do we desire from STA to enable ML?

1. Persistence of timing results in a real database (not just in text reports)
   1. Captured at various points in the flow
   2. In a manner that is traversable (since we don’t know a priori all the questions we want answered)
2. And with context
   1. Not all equal slacks are “equal”
   2. Timing needs to be understood in the context of device selection, topology, wiring, power, etc.
3. Reduction of discontinuities
   1. Variable detailed analysis thinking has led us down the road of cutoff based decision-making
   2. Small changes in slack around a cutoff value can lead to large swings in subsequent results → avoid this behavior
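As a sketch of the first two points, the example below stores timing results with their flow stage and some context in an in-memory SQLite database, then asks a question that was not planned when the data was written. The table layout, column names, and rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE timing (
           design TEXT, stage TEXT, endpoint TEXT,
           slack_ps REAL, cell_type TEXT, logic_depth INTEGER)"""
)

# Hypothetical results captured at two points in the flow, with context columns.
rows = [
    ("core", "synthesis", "u1/q", 25.0, "svt", 12),
    ("core", "signoff",   "u1/q", -5.0, "svt", 12),
    ("core", "synthesis", "u2/q", 10.0, "lvt", 8),
    ("core", "signoff",   "u2/q",  8.0, "lvt", 8),
]
conn.executemany("INSERT INTO timing VALUES (?, ?, ?, ?, ?, ?)", rows)

# Ad hoc question: which endpoints degrade most between synthesis and sign-off?
query = """
    SELECT a.endpoint, a.slack_ps, b.slack_ps, b.slack_ps - a.slack_ps AS delta
    FROM timing a
    JOIN timing b ON a.design = b.design AND a.endpoint = b.endpoint
    WHERE a.stage = 'synthesis' AND b.stage = 'signoff'
    ORDER BY delta
"""
deltas = conn.execute(query).fetchall()
```

The same table can feed training data directly, which is much harder when results exist only as text reports.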

ML for Timing EDA
A Perspective

Richard Phillips
Director, ASIC-PD Methodology

NVIDIA in AI

Investment
• Built on strong parallel processing infrastructure
• Devices developed using standard methodologies
• Flexible, programmable solutions
• Wide support for frameworks, networks
• Available on multiple cloud services

Internal CAD Development
• Local DGX cluster
• Data rich environment

ML Candidate Attributes

When should ML be applied?
• Lacking solid theoretical mathematical foundation
• Inexact
• Applying human expertise repetitively
• “Better”!/$

ML Solution Lifecycle

Traditional algorithms
• Mathematical foundation is preferred

Machine Learning
• Feature identification requires domain expertise

Deep Learning
• Requires large datasets & computation

One size does not fit all – try different algorithms
• Linear Regression, Random Forest, Convolutional Neural Networks

ML Timing Tool Candidates

Issue Prediction
• Early design analysis – Synthesized netlist to signoff
• Sensitivity analysis – IR, Aging

Optimization
• Fine grained path optimization
• Parallel trial ranking
• Convergence prediction

Timing
• Tool correlation – Optimization to STA, STA to STA

Programmable Solutions Group

Shaan Awasthi, Intel


Identifying problems suited for Machine Learning

• Transfer functions are not exactly known or easily derivable, lots of stochastic noise

• Repeated tasks, cannot be defined in a step by step procedural language

• Lots of training and properly labeled data available

• Multiple corners multi mode analysis

Flow chart:

• Start
• EDA tool
• Collect Data
• Pre-processing of Dataset
• Normalize Data
• Feature engineering
• Divide the data into dev/testing/training
• Choose architecture/hyper-parameters
• Implement the ML model
• Train the weights and bias
• Reduce Mean Squared Error (MSE)
• Input Data from current project
• Estimate the output
• End
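The steps named in the flow chart (normalize, split, train weights and bias, reduce MSE, estimate) can be sketched end to end on a toy problem. Everything below is an assumption for illustration: a single feature (fanout), a synthetic linear delay relation, and a plain linear model trained by gradient descent.

```python
import random

def normalize(xs):
    """Min-max scale a feature to the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def mse(w, b, xs, ys):
    """Mean squared error of the linear model w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, lr=0.1, epochs=5000):
    """Fit weight and bias by gradient descent on the MSE."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * gw
        b -= lr * gb
    return w, b

# Collect data (synthetic): hypothetical fanout feature and delay label.
fanout = list(range(1, 21))
delay_ps = [3.0 * f + 10.0 for f in fanout]

# Pre-process, then divide into training/dev/testing sets.
x = normalize(fanout)
idx = list(range(len(x)))
random.Random(0).shuffle(idx)
train_idx, dev_idx, test_idx = idx[:12], idx[12:16], idx[16:]

w, b = train([x[i] for i in train_idx], [delay_ps[i] for i in train_idx])
dev_mse = mse(w, b, [x[i] for i in dev_idx], [delay_ps[i] for i in dev_idx])
test_mse = mse(w, b, [x[i] for i in test_idx], [delay_ps[i] for i in test_idx])
```

The dev set is where architecture and hyper-parameter choices (here only `lr` and `epochs`) would be compared; the test set is kept aside for the final estimate.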


Top IC design problems for ML

• Tools (synthesis/Physical Design/simulation) optimization settings

• .lib generation for different PVT scenarios

• Modeling of IC components

• Node/Design migration

• Post silicon/lithography defect identification

• Guided layout optimization and fixing

• Resource prediction and planning

Tool optimization Modeling/simulation Analytics


Example Algorithms choice

Algorithms (diagram labels):
• Linear/Polynomial regression
• KNN
• DNN
• CNN
• RNN
• LSTM

Applications (diagram labels):
• Silicon design space
• Resource planning/resource prediction
• PVT/scenario reduction
• Tool optimization settings
• Waveform processing/lithography
• Standard cell modeling
• Hard IP modeling


IP protection and data sharing

• No large-scale databases like in other fields.

• Data labeling is expensive and requires work from experienced engineers (anyone can label image data)

• Models/algorithms are commoditized, data is where the value proposition is

• Infrastructure is needed to share data including anonymization and obfuscation standards

• Data protection mindset must change in order for the ML to reach a wide adoption in EDA

Data sharing diagram (labels): ML Database; EDA Vendors; Academia; Industry; Foundry; Training Data; Trained Models; Trained Model; Proprietary Training Data; Data Abstraction


Talent acquisition

• Expertise in electrical engineering as well as software

• Domain experts are needed to perform feature engineering

• Close collaboration with Academia and Industry (CAEML)

• Training programs for in-house resources

• Clear goals and problems that need to be solved

• Implement and Monitor Sourcing Strategies


Word of Caution

• Pooling, back propagation, are you really at global minima, empirical models

• Most models are empirical; physical models have value

• Each aspect of the model is not understood well, black box learning, e.g.: pooling, back propagation

• Beware of the hype, not a cure for all, manage expectations

• Is the model optimum, can it be improved further, is it stuck in local minima?

Local minima

Global minima
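The local-versus-global minimum concern is easy to reproduce on a one-dimensional example. The function below is an arbitrary choice with one local and one global minimum; it is not taken from the slides. Plain gradient descent ends in a different minimum depending only on the starting point.

```python
def f(x):
    """Toy loss with a local minimum near x = 1.13 and a global minimum near x = -1.30."""
    return x**4 - 3 * x**2 + x

def grad(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=2000):
    """Plain gradient descent from a given starting point."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

stuck = descend(2.0)    # starts on the right: settles in the local minimum
better = descend(-2.0)  # starts on the left: reaches the global minimum
```

Comparing `f(stuck)` with `f(better)` shows the gap. Restarting from several initial points is one simple, if not guaranteed, check for whether a trained model is sitting in a poor minimum.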


Next generation tools

• Work with less data

• Learn from repetition, reinforcement

• More robust to noise in the system

• Understanding of the context of the environment it’s working in, and being able to learn and adapt based on changes in that environment.

• Tools which work across domains

• Reuse the trained networks for similar tasks (transfer learning)

Machine Learning In Timing
Shiva Raja, Senior Architect
TAU Workshop
Monterey, CA
15 March, 2018

© 2018 Cadence Design Systems, Inc. All rights reserved.

Machine Learning

• What is it not?
– Deterministic computation
– Exact solution
– Single possible outcome from single instance of input

• What is it?
– Predict best case outcome
– Not exact solution
– Use knowledge
– Can use deterministic computation to build knowledge


Machine Learning

• Addition
– Not ML
• Predicting by quadratic interpolation between 4 points
– Not ML
• Predicting between 4 points
– ML (interpolation method can be adaptive / driven by knowledge)
• SPICE simulation
– Not ML
• Fast-MC SPICE simulation
– ML
• WNS calculation
– Not ML
• Identification of the set of critical paths
– Can use ML


Machine Learning – What does it cover ?

• Learning
– Neural Network, Deep Learning
– Rule based learning
• Data analytics
– Trend analysis
• Data visualization
– Intuitive plots


ML application in Timing

• Timing across multi-PVT
– Use characterized library to build knowledge
– Detect critical paths across multi-PVT, as non-characterized PVT can show worse critical paths
• Characterization data
– Keep additional useful simulation data in library
– Data analytics, visualization of characterized data


Challenges in ML

• Quality of results using ML difficult to validate

• System to “learn by example” can have lots of potential
