Florin TAU’18 panel Machine Learning · Florin Dartu •Deputy director, TSMC •Responsible for...
TAU’18 panel
Machine Learning: Confluence with timing/EDA
Monterey, CA
Mar 15-16, 2018
Panel organization
Debjit Sinha, Qiuyang Wu
Panelists
Arun Venkatachar (Synopsys)
Florin Dartu (TSMC)
Kerim Kalafala (IBM)
Richard Phillips (nVidia)
Shaan Awasthi (Intel)
Shiva Raja (Cadence)
Cognitive ecosystem
[Timeline figure: 1955-80, 1990-00, 2006+]
*images from Google images
Why now?
• Big data
– Cheap storage
• Compute
– Cloud
– GPU
• Algorithm
– Open source
Machine learning – Endless opportunities
*images from Google images
Y = F(X)
Learn
Y = F(X)
Learn
Y = F(X), Z (delayed rewards)
Learn
Machine learning (ML) – Confluence with timing/EDA
• Timing applications
– Miscorrelation mitigation, prediction?
– Accuracy-performance tradeoffs
• Synthesis, P&R
– Better optimization
– What-if analysis
• Perfect confluence?
*images from keynote TAU’16 (A.B. Kahng)
ML in timing/EDA – Natural? Hype? Somewhere-in-between?
What we know today
• ML has value (but quality matters)
• Investment needed – Data, infrastructure, software, skills
What would we like to predict today
• ROI on ML investment for timing/EDA
• Suggestions on maximizing this ROI (we love optimization problems)
What would we like to have in a future TAU:
AlexaTime, I need (timing) closure
*images from Google images
Panelists
Arun Venkatachar
• Group Director, R&D - Synopsys’ Verification group business unit (ML and big data team lead)
• Applying ML techniques in production environments
• Developed multiple technologies in the areas of simulation, compilers and debuggers
• 20+ years of EDA experience, several papers/patents – distributed computing, simulation
Florin Dartu
• Deputy director, TSMC
• Responsible for library characterization and timing
• Prior roles – ATG/PrimeTime Synopsys, and SCL Intel
Kerim Kalafala
• Member, IBM Academy of Technology and IBM Master Inventor, STSM in IBM Systems
• Lead architect of static timing and noise analysis software tools in IBM EDA
• DAC’04 best paper, co-authored a top-10 most cited paper in the 50-year history of DAC
• 49+ issued patents
Richard Phillips
• Director of ASIC-PD methodology at nVIDIA
• Responsible for STA signoff methodologies, timing closure workflows
• Worked across the design flow from RTL to timing closure since ’98
Shaan Awasthi
• Senior manager of Design Automation, Intel Programmable Solutions Group
• Charter to bring the latest and the greatest of tools, flows and methodologies to advanced nodes
• Collaborating on ML and big data technologies to augment the EDA flows
• Prior roles - Technical account manager at Ansys, design engineer at Sun Microsystems
Shiva Raja
• Senior architect at Cadence
• 22+ years of EDA experience - timing, SI, reliability, characterization and fast SPICE tools
• Interests - statistical timing, data analytics, ML applications for EDA
• Prior roles at Texas Instruments and CLK-DA
Questions for the panel
1. How do I identify if my problem is a good candidate for ML?
• Candidates in the space of timing and power (analysis, optimization)
• E.g.: Path enumeration, variability, SI?
• Which algorithm do I use?
2. Your “top 2” problems in timing/EDA for ML?
• Data availability – Do we have “big data” for your problem(s)?
• If a candidate for supervised learning, do we have “labeled” data?
• Is the data open source – Can EDA industry and academia be mutually helpful?
3. ROI of ML in timing/EDA
• Compute investment [CPU based → GPU based]
• Re-write tools from scratch with ML baked in? ML around the tool?
• Can we live with non-deterministic results? Designer adoption?
4. Need for timing/EDA tool updates for ML hardware (e.g. accelerator chip)
• Treat like “John/Jane Doe” chip?
5. Expertise development – Need?
• EDA thinking (algorithm central) vs. ML thinking (data central)
• Large fraction of CS/CE majors with ML and big data background – Is EDA industry attractive enough (or do we get the Google, Facebook leftovers?)
6. Your prediction of the next gen. timing (analysis/optimization) tool
© 2018 Synopsys, Inc. 2
Background
ML definition in context of EDA
• EDA has been using some form of learning for decades without using the phrase “ML”
– Statistical approaches, heuristics, prediction, etc. in our algorithms
Why ML now?
• The increased complexity of certain EDA problems is pushing the limits of traditional algorithmic/heuristic approaches
• Little/no benefit is derived from vast volumes of data generated by tools
• Machine learning can help users gain better insights to manage complexity growth
– ML uses statistical models derived from data to drive results
– ML-based computing techniques can be applied now to these problems
ML in the context of EDA can be stated as “Statistical optimization at scale for prediction, classification, and/or estimation”
Digital Intelligence: Synopsys Machine Learning Initiatives
Few case studies
Higher performance, smarter tools, smarter flows, and smarter operations
Within the Tools: Better performance & QoR
• Formal Performance: VC Formal
• Route Prediction: IC Compiler II
• Low Power Optimization: PrimeTime
• Coverage Closure: VCS
• Optimal PPA, improved QoR
Around the Tools: Smarter operations & productivity
• Development ML Apps: Bug Triaging, Check-in Analytics, Test Failure Predictor
• Regression ML Apps: Test Selection, Grid Scheduling, Failure Triaging
• Release ML Apps: Scoreboard, Quality Insights
Synopsys Platform
Which EDA Problems Are Good Candidates for ML? General Questions…
• Most EDA solutions are deterministic in nature
– Can probabilistic results be tolerated? How tight are the bounds required?
• Does data for analysis exist or can it be generated?
– Can data labels be generated?
• Is the learned model robust and portable?
– How tight is the dependency on design (family), process, etc.?
• How stringent are debug requirements?
– Debugging ML models is a hard problem
ML Applicability to EDA Challenges – How to decide if a problem is a good candidate
• Divide-and-conquer
– Can we estimate divisions using ML?
– Is the conquer step error tolerant?
• Incremental algorithms
– Can ML order sub-problems to maximize incrementality?
• Multiple heuristics
– Can ML rank heuristics for throughput?
• Parallelization
– Can ML partition and schedule sub-tasks?
• Sampling methods
– Can ML generate better sampling to guide optimization algorithms?
ML Applicability to R&D Productivity – Improve productivity & quality, reduce cost
• Bug & Failure Triaging
– Learn from past bugs, failures to identify duplicates, route and reduce triage time
• Test Selection
– Learn from past behaviors and patterns to select right tests to run
• Optimized Grid
– Improve farm utilization by optimizing jobs fired on grid
• Exploration
– Gain insights into what variables play a part in your desired results
• Improved support
– Improve support TAT with NLP techniques
Case Study: Bug Triaging
Stack Trace Similarity Tool
Enter the fatal stack trace or the CRM STAR number:
#0 0x00007ffff4bd332d in waitpid ()
#1 0x000000000c3353eb in SNPSee_9ea8d ()
#2 0x000000000c3353eb SNPSee_9ea8db ()
#3 0x000000000c33674d in SNPSee_9ea8dbbd5e7 ()
#4 0x00007ffff4bd27db in read ()
#5 0x000000000ea33f2f in SNPSee_82c794b1 ()
#6 0x000000000ea342b4 in SNPSee_51710451 ()
#7 0x000000000ea345b5 in SNPSee_23a87142 ()
#8 0x000000000d08a015 in SNPSee_5d373cd8 ()
[Search]
Matching CRM STARs
STAR Number    Similarity Score
9001077065     1
9001095737     0.99
9001107968     0.96
9001114191     0.95
9001056625     0.94
9001077065...
#0 0x00007ffff4bd332d waitpid ()
#1 0x000000000c3353eb SNPSee_9ea8d ()
#2 0x000000000c3353eb SNPSee_9ea8db
#8 0x000000000d08a015 SNPSee_5d373cd8
Work log history:…
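The slide does not say how the similarity score is computed. As a minimal sketch only, assuming the score is a Jaccard similarity over the function names in each trace (addresses ignored), a ranking of known traces against a query could look like the following. The names `frame_names`, `similarity` and `rank` are hypothetical and not part of any Synopsys tool.

```python
import re

# Matches gdb-style frames such as "#0 0x00007fff in waitpid ()" and captures
# the function name; the address and the word "in" are both optional.
FRAME_RE = re.compile(r"#\d+\s+(?:0x[0-9a-fA-F]+\s+)?(?:in\s+)?(\w+)")


def frame_names(trace):
    """Return the set of function names found in a stack trace string."""
    return set(FRAME_RE.findall(trace))


def similarity(trace_a, trace_b):
    """Jaccard similarity of the two frame-name sets (assumed metric)."""
    a, b = frame_names(trace_a), frame_names(trace_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rank(query, known):
    """Sort known {id: trace} entries by similarity to the query, best first."""
    scored = [(key, similarity(query, trace)) for key, trace in known.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A production tool would likely also weight frame order and depth; the set-based score here ignores both.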
Case Study: Failure Triage
Work Flow
• Regression runs
• Failure logs
• ML-based grouping engine
• Failure clusters
• Tracer works out of clusters
• Assigns to failure owners
Synopsys Confidential Information
Triage regression snapshots from live tracking page
Value Proposition
• Real-time test status
• Reduce cost of computation
• Predict which check-in
• Gain insights into issues in the past
Case Study: Improved Support using ML/NLP
[Diagram: Incoming Case/Bug; ML Engine; NLP Engine; Similar Case/Bug]
• Natural Language Processing (NLP) engine has been trained on previous support tickets
• Accepts natural language questions in the query
• Will send auto-emails for matches (in development)
What’s The Bottom Line?
• In general, ML may not replace algorithmic methods in the short term
• Look to ML to assist current/emerging solutions – low hanging fruit:
– Problems that depend on constraint solvers, non-linear optimization engines, many parameters
– Problems that involve many or unstable heuristics: ML helps in heuristic selection, initial solution selection, resource prediction
• ML does quite well when applied to improve productivity
– Regression test selection, resource predictions, risk assessments, etc.
• Work on collecting and curating data for the most part
– Data engineering is 60%-80% of the time; the ML effort is smaller
• Don’t be shy to try new open source tools
– Fail fast and explore freely
© 2018 Synopsys, Inc. 11
Data Challenge
Yes, it is a challenge in EDA…
• Big data for these problems is not the problem
– It is not about size of data, it is about quality of data
• The problem is data diversity
– Data from one design may not translate to another design
– Data from one process may not translate to other processes
– Models learned from sparse corpus from limited set of designs may not translate well
– Customers may need to share more data for improving model efficiency
• Invest in a big-data platform strategy
– Data collection and management is a big part of analytics and ML
– Either build your own or use from external vendors
Machine Learning Technology Platform
• Optimized and preconfigured for verification workflows
– Across tools
– Single-source data access
– Automated native integration
• Highest performance & scalability
• Easy, transparent installation
• Automatic monitoring; fully resilient
• Real-time access
[Diagram: Machine Learning Technology Platform, Optimized for Advanced Workflows]
• Regressions, Operations, R&D
• Around the tools Apps, Within the tools Apps, Other Internal Applications
• Common Access APIs - Native Code & Web Services
• Machine Learning Libraries
• High-Performance Big Data Storage & Query Layer
• Fault Tolerant Distributed Compute Layer
• Real Time Data Transfer & Communications Protocol Layer
• Physical Data Storage
• Global Monitoring & Management
People Challenge
How to attract talented machine learning engineers to EDA?
• Hiring high quality talent in EDA is hard
– Applying ML in EDA requires domain knowledge since the field is very complex
• Invest in your engineers to build talent pool
– Statistics background is a good plus and a good starting point
• Hire interns and explore talent pool
– Hire the good ones and groom them. They are not shy to try latest tools!
• Attend lots of conferences and meetups
– You will build contacts and get connected to potential hires
Machine Learning for EDA
Kerim Kalafala on behalf of IBM Systems & EDA
Member IBM Academy of Technology, Master Inventor, STSM
Past is Prologue
• Kerim’s very simple model for ML:
1. Study the “past”
– Need #1: Raw data, features thereof
2. Automate formulation of a model
– Need #2: Regularity / patterns that can be inferred
3. Predict the “future”
– Need #3: “Temporal” continuity of patterns
I. Applied ML research opportunities
– Along with key questions
II. Preparing the battlefield
– How do we “feed the beast”?
Interesting Candidates for ML – Synthesis parameter tuning
• Can we use a reinforcement learning model?
• Model overall synthesis flow as a set of “moves”
• Each of which can be “played” through a subset of tunable parameters
• At the end of each move we have a (probabilistic) score on metrics of interest
• Can we infer a strategy which produces a better final score than any individual training run?
• To what degree is knowledge transferable?
Interesting Candidates for ML – Outlier detection
• Given a set of measurements that we are interested in studying
• Can we use unsupervised learning techniques (e.g., clustering) to detect outliers for further analysis/follow-up?
– Negative anomaly we want to squash
– Unexpected positive result we want to replicate
1. Scalar projection of a multi-dimensional surface (correlates timing, device type, logic depth…)
2. Response surface of cross-hierarchy paths wrt budget considerations
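The slide proposes unsupervised techniques such as clustering but names no specific method. As a simpler stand-in (not clustering), the sketch below flags outliers with a modified z-score based on the median absolute deviation and labels each one as a negative or positive anomaly. The function name `mad_outliers`, the threshold of 3.5 and the sample values are illustrative assumptions.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold.

    Returns (index, value, kind) tuples, where kind is "negative" for
    anomalies below the median and "positive" for those above it.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread, so nothing can be called an outlier
    flagged = []
    for i, v in enumerate(values):
        # 0.6745 scales the MAD to match a standard deviation for normal data
        score = 0.6745 * (v - med) / mad
        if abs(score) > threshold:
            flagged.append((i, v, "negative" if score < 0 else "positive"))
    return flagged


# Hypothetical slack measurements (ns) with one bad and one unusually good value
slacks = [0.10, 0.12, 0.11, 0.09, 0.10, -0.80, 0.13, 0.95]
outliers = mad_outliers(slacks)
```

The two-sided labeling mirrors the slide's distinction between anomalies to squash and unexpected positive results to replicate.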
Interesting Candidates for ML – Improving Correlation with Sign-Off Timing
• Given the vast amount of training data available to us
– (# Designs for which we have timing data throughout analysis) × (cell instances per design)
• Can we derive regression models that improve correlation between early analysis and sign-off timing?
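To make the regression question concrete, here is a minimal sketch assuming the simplest possible model: a one-variable least-squares line that maps an early-analysis slack to a predicted sign-off slack. The function names and the idea of using slack as the only feature are assumptions for illustration; a real model would use many more features (cell type, load, topology and so on).

```python
def fit_line(early, signoff):
    """Ordinary least-squares fit of signoff = slope * early + intercept."""
    n = len(early)
    mean_x = sum(early) / n
    mean_y = sum(signoff) / n
    sxx = sum((x - mean_x) ** 2 for x in early)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(early, signoff))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, early_slack):
    """Apply the fitted line to a new early-analysis slack."""
    slope, intercept = model
    return slope * early_slack + intercept
```

Usage would be `model = fit_line(early_slacks, signoff_slacks)` on historical designs, followed by `predict(model, s)` on a new design's early timing.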
What do we desire from STA to enable ML?
1. Persistence of timing results in a real database (not just in text reports)
   1. Captured at various points in the flow
   2. In a manner that is traversable (since we don’t know a priori all the questions we want answered)
2. And with context
   1. Not all equal slacks are “equal”
   2. Timing needs to be understood in the context of device selection, topology, wiring, power, etc.
3. Reduction of discontinuities
   1. Variable detailed analysis thinking has led us down the road of cutoff based decision-making
   2. Small changes in slack around a cutoff value can lead to large swings in subsequent results → avoid this behavior
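The discontinuity point can be illustrated with a small sketch. It contrasts a hard slack cutoff with a smooth logistic weighting, which is one possible way (an assumption here, not something the slide prescribes) to keep small slack changes from flipping a decision. The function names, the cutoff of 0 and the `width` parameter are hypothetical.

```python
import math


def hard_weight(slack, cutoff=0.0):
    """Cutoff-based decision: a path is either fully critical or ignored."""
    return 1.0 if slack < cutoff else 0.0


def smooth_weight(slack, cutoff=0.0, width=0.05):
    """Logistic weighting: criticality changes gradually around the cutoff."""
    z = (slack - cutoff) / width
    z = max(-60.0, min(60.0, z))  # clamp to keep exp() well within range
    return 1.0 / (1.0 + math.exp(z))
```

Moving slack from -0.001 to +0.001 swings `hard_weight` from 1 to 0, while `smooth_weight` changes by about 0.01, so downstream results (and any model trained on them) see a continuous signal.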
NVIDIA in AI
Investment
• Built on strong parallel processing infrastructure
• Devices developed using standard methodologies
• Flexible, programmable solutions
• Wide support for frameworks, networks
• Available on multiple cloud services
Internal CAD Development
• Local DGX cluster
• Data rich environment
ML Candidate Attributes
When should ML be applied?
• Lacking solid theoretical mathematical foundation
• Inexact
• Applying human expertise repetitively
• “Better”!/$
ML Solution Lifecycle
Traditional algorithms
• Mathematical foundation is preferred
Machine Learning
• Feature identification requires domain expertise
Deep Learning
• Requires large datasets & computation
One size does not fit all – try different algorithms
• Linear Regression, Random Forest, Convolutional Neural Networks
ML Timing Tool Candidates
Issue Prediction
• Early design analysis – Synthesized netlist to signoff
• Sensitivity analysis – IR, Aging
Optimization
• Fine grained path optimization
• Parallel trial ranking
• Convergence prediction
Timing
• Tool correlation – Optimization to STA, STA to STA
Programmable Solutions Group
Identifying problems suited for Machine Learning
• Transfer functions are not exactly known or easily derivable, lots of stochastic noise
• Repeated tasks, cannot be defined in a step by step procedural language
• Lots of training and properly labeled data available
• Multi-corner, multi-mode analysis
[Flowchart boxes, in extracted order: EDA tool; Collect Data; Feature engineering; Choose architecture/hyper-parameters; Train the weights and bias; Estimate the output; Input Data from current project; Divide the data into dev/testing/training; Implement the ML model; Reduce Mean Squared Error (MSE); Pre-processing of Dataset; End; Start; Normalize Data]
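Several of the flowchart boxes (normalize the data, divide it into training/dev/testing sets, train the weights and bias, reduce MSE, estimate the output) can be sketched end to end. The example below is only an illustration under simple assumptions: a single input feature, a linear model trained by batch gradient descent, and synthetic data. All function names and hyper-parameter values are hypothetical.

```python
import random


def normalize(values):
    """Min-max scale a list of numbers into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def split(rows, train=0.6, dev=0.2, seed=0):
    """Shuffle and divide rows into training, dev and testing sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    a = int(len(rows) * train)
    b = a + int(len(rows) * dev)
    return rows[:a], rows[a:b], rows[b:]


def mse(rows, w, b):
    """Mean squared error of the model y = w * x + b on (x, y) rows."""
    return sum((w * x + b - y) ** 2 for x, y in rows) / len(rows)


def train_linear(rows, lr=0.3, epochs=2000):
    """Train weight and bias by batch gradient descent on the MSE."""
    w = b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in rows) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in rows) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic data: the target is an exact linear function of the normalized input
xs = normalize(list(range(20)))
rows = [(x, 3.0 * x + 1.0) for x in xs]
train_rows, dev_rows, test_rows = split(rows)
w, b = train_linear(train_rows)
test_error = mse(test_rows, w, b)  # estimate quality on held-out data
```

In practice the dev set would be used to choose the architecture and hyper-parameters (here, `lr` and `epochs`) before the test set is touched.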
Top IC design problems for ML
• Tools (synthesis/Physical Design/simulation) optimization settings
• .lib generation for different PVT scenarios
• Modeling of IC components
• Node/Design migration
• Post silicon/lithography defect identification
• Guided layout optimization and fixing
• Resource prediction and planning
[Categories: Tool optimization; Modeling/simulation; Analytics]
Example Algorithms choice
[Diagram relating algorithms (Linear/Polynomial regression, KNN, DNN, CNN, RNN, LSTM) to applications (silicon design space, resource planning/resource prediction, PVT/scenario reduction, tool optimization settings, waveform processing/lithography, standard cell modeling, hard IP modeling)]
IP protection and data sharing
• No large-scale databases like in other fields
• Data labeling is expensive and requires work from experienced engineers (anyone can label image data)
• Models/algorithms are commoditized; data is where the value proposition is
• Infrastructure is needed to share data, including anonymization and obfuscation standards
• Data protection mindset must change in order for ML to reach wide adoption in EDA
[Diagram: ML Database; EDA Vendors; Academia; Industry; Foundry; Training Data; Proprietary Training Data; Data Abstraction; Trained Models]
Talent acquisition
• Expertise in electrical engineering as well as software
• Domain experts are needed to perform feature engineering
• Close collaboration with Academia and Industry (CAEML)
• Training programs for in-house resources
• Clear goals and problems that need to be solved
• Implement and Monitor Sourcing Strategies
Word of Caution
• Pooling, back propagation, are you really at the global minimum, empirical models
• Most models are empirical; physical models have value
• Not every aspect of the model is well understood (black-box learning), e.g. pooling, back propagation
• Beware of the hype: not a cure for all, manage expectations
• Is the model optimum? Can it be improved further? Is it stuck in a local minimum?
[Figure labels: Local minima, Global minima]
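The local-versus-global minimum caution can be shown on a toy problem. The sketch below runs plain gradient descent on an illustrative non-convex function, f(x) = x^4 - 3x^2 + x, chosen only because it has one local and one global minimum. A single start from x = 2 settles in the local minimum; restarting from several points (a common, but not guaranteed, mitigation) finds the lower one. All names and constants are assumptions for the example.

```python
def f(x):
    """Toy non-convex loss with a local minimum near x = 1.13
    and a global minimum near x = -1.30."""
    return x**4 - 3 * x**2 + x


def grad(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def descend(x0, lr=0.01, steps=2000):
    """Plain gradient descent from a single starting point."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def multi_start(starts):
    """Run descent from each start and keep the result with the lowest loss."""
    return min((descend(x0) for x0 in starts), key=f)


single = descend(2.0)                  # stuck in the local minimum
best = multi_start([-2.0, 0.0, 2.0])   # reaches the global minimum
```

Real training losses are high-dimensional, so restarts give no guarantee; the point is only that the answer can depend on where the optimizer starts.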
Next generation tools
• Work with less data
• Learn from repetition, reinforcement
• More robust to noise in the system
• Understanding of the context of the environment it’s working in, and being able to learn and adapt based on changes in that environment.
• Tools which work across domains
• Reuse the trained networks for similar tasks (transfer learning)
© 2018 Cadence Design Systems, Inc. All rights reserved.2
Machine Learning
• What is it not?
– Deterministic computation
– Exact solution
– Single possible outcome from single instance of input
• What is it?
– Predict best case outcome (not exact solution)
– Use knowledge
– Can use deterministic computation to build knowledge
Machine Learning
• Addition
– Not ML
• Predicting by quadratic interpolation between 4 points
– Not ML
• Predicting between 4 points
– ML (interpolation method can be adaptive / driven by knowledge)
• SPICE simulation
– Not ML
• Fast-MC SPICE simulation
– ML
• WNS calculation
– Not ML
• Set of critical path identification
– Can use ML
Machine Learning – What does it cover?
• Learning
– Neural network deep learning
– Rule based learning
• Data analytics
– Trend analysis
• Data visualization
– Intuitive plots
ML application in Timing
• Timing across multi-PVT
– Use characterized library to build knowledge
– Detect critical paths across multi-PVT, as non-characterized PVT can show worse critical paths
• Characterization data
– Keep additional useful simulation data in library
– Data analytics, visualization of characterized data