NIH.AI Workshop on Hyperparameters Optimization
Intro to hyperparameter sweep techniques
George Zaki, Ph.D. [C], BIDS, Frederick National Lab for Cancer Research (FNLCR)
July 18, 2019
Model’s Parameters
• Fit during training
• The result of model fitting or training
• Optimized as part of the training process
• Examples:
– Slope and intercept in a linear model
– Weights and biases in a neural network
What are Hyperparameters?
• Parameters of your system with no straightforward method for setting their values:
– Usually set before the learning process
– Not directly estimated from the data
(Image: deepai.org)
Examples of Hyperparameters
• The depth of a decision tree
• Number of trees in a forest
• Number of hidden layers and neurons in a neural network
• Degree of regularization to prevent overfitting
• K in K-means
• Learning rate schedule in Stochastic Gradient Descent (SGD)
• Activation functions
Generalized Machine Learning Workflow
https://github.com/ECP-CANDLE/Tutorials/tree/master/2019/ECP
What is Hyperparameter Optimization?
• Models have a large number of possible configuration parameters, called hyperparameters
• Applying optimization can automate part of the design of machine learning models
• Involves two problems:
– How to set the values of the hyperparameters?
– How to manage multiple evaluations on compute resources?
Hyperparameter Optimization (tuning) = HPO
Generalized HPO Diagram
https://sigopt.com/blog/common-problems-in-hyperparameter-optimization/
Basic HPO Strategies
• Grid search (a minimal grid/random sketch follows this list):
– Generate all possible combinatorial configurations
– Example: 6 hyperparameters, each with 4 values: 4^6 = 4,096 configurations
• Random search:
– Randomly select some configurations to evaluate
• Sequential grid search:
– Sequentially adjust one hyperparameter at a time, while fixing all other hyperparameters
• Generic optimization:
– Evolutionary algorithms (simulated annealing, particle swarm, genetic algorithms)
– Bayesian optimization
– Gradient-based optimization
– Model-based optimization (mlrMBO in R)
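To make the difference concrete, here is a minimal sketch, not taken from the slides, of how grid search enumerates every combination while random search samples only a fixed budget. The hyperparameter names and value sets are purely illustrative.

```python
import itertools
import random

# Illustrative search space: every name and value set here is hypothetical.
space = {
    "learning_rate": [0.1, 0.01, 0.001, 0.0001],
    "batch_size": [16, 32, 64, 128],
    "num_layers": [2, 3, 4, 5],
}

# Grid search: the full Cartesian product of all value sets.
grid = [dict(zip(space, values)) for values in itertools.product(*space.values())]
print(f"grid search evaluates {len(grid)} configurations")  # 4 * 4 * 4 = 64

# Random search: sample a fixed budget of configurations instead.
random.seed(0)
budget = 10
random_configs = [{name: random.choice(values) for name, values in space.items()}
                  for _ in range(budget)]
print(f"random search evaluates {len(random_configs)} configurations")
```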
U-Net Hyperparameter Example: 288 possible configurations (see the sketch after this list)
[Figure: U-Net architecture, showing only 2 levels of max-pooling]
• How many layers? N_layers = {2, 3, 4, 5}
• How many convolution filters? Num_filters = {16, 32, 64}
• What is the activation function? Activation = {relu, softmax, tanh}
• Size of the convolution filter? Filter_size = {3x3, 5x5}
• Drop out some results to avoid overfitting? Drop_out = {0, 0.2, 0.4, 0.6, 0.8}
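As an illustration of how one such configuration can drive model construction, here is a minimal sketch assuming tf.keras. It builds a simple convolutional stack rather than the exact U-Net from the slide; the dictionary keys mirror the hyperparameter names above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(config, input_shape=(128, 128, 1)):
    """Build a small CNN from one hyperparameter configuration (illustrative, not the slide's U-Net)."""
    model = models.Sequential([layers.Input(shape=input_shape)])
    for _ in range(config["N_layers"]):
        model.add(layers.Conv2D(config["Num_filters"],
                                config["Filter_size"],
                                activation=config["Activation"],
                                padding="same"))
        model.add(layers.MaxPooling2D(pool_size=2))
        if config["Drop_out"] > 0:
            model.add(layers.Dropout(config["Drop_out"]))
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# One configuration drawn from the value sets on this slide.
config = {"N_layers": 3, "Num_filters": 32, "Activation": "relu",
          "Filter_size": (3, 3), "Drop_out": 0.2}
model = build_model(config)
```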
Effect of Hyperparameter Sweep on the Objective Function
[Plot: DICE value (0 to 1) versus Configuration ID for each evaluated configuration]
[Figure: overlap of the Ground Truth and Predicted masks, showing their Intersection and Union]
• Objective: the DICE coefficient, DICE = 2 |Ground Truth ∩ Predicted| / (|Ground Truth| + |Predicted|) (a small computation sketch follows)
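For concreteness, a minimal NumPy sketch of computing the DICE coefficient on binary segmentation masks; the toy arrays are illustrative.

```python
import numpy as np

def dice_coefficient(ground_truth, predicted, eps=1e-7):
    """DICE = 2 * |intersection| / (|ground_truth| + |predicted|) for binary masks."""
    ground_truth = ground_truth.astype(bool)
    predicted = predicted.astype(bool)
    intersection = np.logical_and(ground_truth, predicted).sum()
    return 2.0 * intersection / (ground_truth.sum() + predicted.sum() + eps)

# Toy 4x4 masks: DICE is 1.0 for a perfect match, 0.0 for no overlap.
gt = np.array([[0, 1, 1, 0]] * 4)
pred = np.array([[0, 1, 0, 0]] * 4)
print(f"DICE = {dice_coefficient(gt, pred):.3f}")  # 2*4 / (8+4) = 0.667
```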
Baseline Methods: Grid Search & Random Search
• Grid search:
– Embarrassingly parallel (a parallel-evaluation sketch follows this list)
– Suffers from the curse of dimensionality
• Random search:
– Embarrassingly parallel
– Does not learn from history
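Because every configuration is evaluated independently, both baselines parallelize trivially. A minimal sketch, assuming scikit-learn's ParameterGrid/ParameterSampler and a hypothetical train_and_score function standing in for a real training run:

```python
from multiprocessing import Pool
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Illustrative search space; names and values are hypothetical.
space = {"learning_rate": [0.1, 0.01, 0.001], "batch_size": [32, 64, 128]}

def train_and_score(config):
    """Placeholder for training a model with `config` and returning a validation loss."""
    return 1.0 / (config["learning_rate"] * config["batch_size"])  # dummy score

if __name__ == "__main__":
    grid_configs = list(ParameterGrid(space))                  # all 3 * 3 = 9 combinations
    random_configs = list(ParameterSampler(space, n_iter=4, random_state=0))  # random subset

    # Each configuration is independent, so they can be scored in parallel.
    with Pool(processes=4) as pool:
        grid_scores = pool.map(train_and_score, grid_configs)  # random_configs work the same way

    best = min(zip(grid_scores, grid_configs), key=lambda pair: pair[0])
    print("best (score, configuration):", best)
```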
Bayesian Optimization
1. Initially select random configurations to evaluate
2. Build a surrogate Gaussian process as an approximation of the objective function based on the evaluations seen so far (posterior distribution)
3. Select good configurations to evaluate next based on a surrogate function (acquisition function) of your real objective
4. Balance exploration versus exploitation
5. Repeat steps 2-4 until you reach your compute budget (see the sketch below)
[Figure: Gaussian process approximation of an objective function, from Brochu, Cora, and de Freitas (2010)]
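As a concrete illustration of this loop, here is a minimal sketch using scikit-optimize's gp_minimize (one of the Python packages listed later in these slides). The objective function and search dimensions are illustrative stand-ins for a real training run.

```python
from skopt import gp_minimize
from skopt.space import Real, Integer

# Illustrative stand-in for "train the model with these hyperparameters and return validation loss".
def objective(params):
    learning_rate, num_layers = params
    return (learning_rate - 0.01) ** 2 + 0.1 * abs(num_layers - 3)

dimensions = [
    Real(1e-4, 1e-1, prior="log-uniform", name="learning_rate"),
    Integer(2, 5, name="num_layers"),
]

# gp_minimize fits a Gaussian process surrogate to the evaluations seen so far and uses an
# acquisition function to pick the next configuration (balancing exploration vs. exploitation).
result = gp_minimize(objective, dimensions, n_calls=20, n_initial_points=5, random_state=0)
print("best parameters:", result.x, "best objective:", result.fun)
```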
Random versus Bayesian
https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f
HPO packages
• Python:
– Hyperopt (see the sketch after this list)
– scikit-optimize
– Spearmint
• R:
– mlrMBO
• Cloud:
– Google’s Hypertune
– Amazon’s SageMaker
• NN hyperparameter-specific optimization:
– NEAT, Optunity, …
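For reference, a minimal Hyperopt sketch; the objective and search space are illustrative, and a real objective would train and validate a model before returning its loss.

```python
from hyperopt import fmin, tpe, hp, Trials

# Illustrative search space; a real one would mirror your model's hyperparameters.
space = {
    "learning_rate": hp.loguniform("learning_rate", -9, -1),  # roughly 1e-4 to 0.37
    "num_layers": hp.choice("num_layers", [2, 3, 4, 5]),
    "drop_out": hp.uniform("drop_out", 0.0, 0.8),
}

def objective(config):
    """Stand-in for training a model and returning the validation loss to minimize."""
    return (config["learning_rate"] - 0.01) ** 2 + 0.1 * config["drop_out"]

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50, trials=trials)
print("best configuration found:", best)
```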
HPO and High Performance Computing (HPC)
• HPO requires a significant amount of compute resources
• HPC is used to manage large-scale training runs
– Hyperparameter searches: O(10^4) jobs
– Cross validation (5-fold, 10-fold, etc.)
• Each job could use 10s to 100s of nodes
• At NIH, we can use the Biowulf HPC cluster to perform these evaluations
Survey
• Please use the following link to share your thoughts about the workshop:
https://bit.ly/2JPagbe
References
• https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf
• https://cloud.google.com/blog/products/gcp/hyperparameter-tuning-cloud-machine-learning-engine-using-bayesian-optimization
• https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html
• https://roamanalytics.com/2016/09/15/optimizing-the-hyperparameter-of-which-hyperparameter-optimizer-to-use/
• https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/tune-model-hyperparameters