Adaptive Sampling for Risk-Averse Stochastic Learning tldr ...Sebastian Curi, KfirY. Levy, Stefanie...

Non-convex: SGD error on ERMConvex:

Adaptive Sampling for Risk-Averse Stochastic Learning

What is the CVaR? • In high-stake applications, we want to do well even in rare events. • Standard ERM may sacrifice large-but-rare losses for the sake of

performing well in average.• Rather than focusing on the mean, the CVaR optimizes the average of the

tail of the distribution and focuses on harder examples.

AdaCVaR: A DRO Game

Related Work and Stochastic OptimizationMost of the previous work (e.g., Fan et al. (2019)) optimize the CVaR using the variational formula of Rockafellar & Uryasev (2000).

Experimental ResultsConvex Optimization Tasks:

• AdaCVaR has lower CVaR in Regression. • AdaCVaR has highest accuracy and low CVaR in Classification.• AdaCVaR has highest accuracy and lowest CVaR with distribution shift.

ReferencesRockafellar, R. T., & Uryasev, S. (2000). Optimization of conditional value-at-risk. Journal of risk, 2, 21-42.

Shapiro, A., Dentcheva, D., & Ruszczyński, A. (2014). Lectures on stochastic programming: modeling and theory. Society for Industrial and Applied Mathematics.

Fan, Y., Lyu, S., Ying, Y., & Hu, B. (2017). Learning with average top-k loss. In Advances in neural information processing systems.

Alatur, P., Levy, K. Y., & Krause, A. (2020). Multi-player bandits: The adversarial case. Journal of Machine Learning Research, 21.

Kulesza, A., & Taskar, B. (2012). Determinantal Point Processes for Machine Learning. Foundations and Trends® in Machine Learning, 5(2–3), 123-286.

Sebastian Curi, Kfir Y. Levy, Stefanie Jegelka, Andreas Krause

Mean

VaR↵Probability ↵

CVaR↵

Loss

Frequ

ency

• AdaCVaR has highest accuracy and lowest CVaR in image recognition.• AdaCVaR performs consistently better under distribution shift.

Non-Convex Optimization Tasks:

VGG16 ResNet18

CodePaper

Unfortunately, this formula is not well suited for large-scale stochastic optimization. The variance of gradients is increased due to:• Truncating the losses to zero• Multiplying losses by

Instead of using the variational formula of Rockafellar & Uryasev (2000), we use the distirbutionally robust formulation of the CVaR (Shapiro et al. 2014).

tldr: AdaCVaR, a novel algorithm for CVaR optimization in deep learning

• Game between a learner and a sampler. Challenge: DRO set is combinatorial.• Sampler plays k.EXP3 from Alatur et al. (2020) to find the hardest

distributions for the models the learner selects, adaptively. • Learner plays SGD on the examples proposed by the sampler. • We exploit the problem structure i.e., combinatorial set with additive losses

implementing k.EXP3 with k-DPPs (Kulesza & Taskar 2012).

min✓

CVaR↵[L(✓)] = min✓,`2R

`+1

↵E [max {0,L(✓)� `}]

`

Data

Loss `+

1↵ max {0;L(✓)� `}

L(✓)

min✓

CVaR↵[L(✓)] = min✓2⇥

maxq2Q↵

Eq[L(✓)]

Model ✓

Model ✓

Model ✓

Distribution q

Distribution q

Distribution q

Sampling Prob

Data

Sampling Prob

Data

Sampling Prob

Data

Adaptive Sampling for Risk-Averse Stochastic Learning tldr ...Sebastian Curi, KfirY. Levy, Stefanie...

Documents

Transcript of Adaptive Sampling for Risk-Averse Stochastic Learning tldr ...Sebastian Curi, KfirY. Levy, Stefanie...

CURI: A Benchmark for Productive Concept Learning under ...

The TLDR on MOOCs - bersin.com1).pdf · The short story on what MOOCs really mean for L&D . ... Massive Open Online Courses Most employers ... Cohort-based + semi-synchronous + blended

Updated Brazilian s Georeferenced Soil Database An …€¦ · Database An Improvement for International Scientific Information Exchanging Marcelo Muniz Benedetti 1, Nilton Curi 2,

What Can Neural Networks Reason About?What Can Neural Networks Reason About? Keyulu Xu Jingling Liy Mozhi Zhangy Simon S. Du z Ken-ichi Kawarabayashix Stefanie Jegelka Massachusetts

Curi viquiel

Jo Curi Creative 5

Akber Depok 10 Januari 2015 : "Curi Perhatian Dengan CV Menarik!" w/ Sita Sidharta @agatogata

Masayuki Okumura.; Rafael B.Ribeiro.; Nelson Tsuno*.; Nagib Curi and Joaquim Gama Rodrigues

ugelangaraes.edu.peugelangaraes.edu.pe/index.php/files/9/Oficios/47/LISTA-POR-AULA... · curasma curi curi curi curi cusi cutti cl-n-rr cuiti damian davila dÁvila de la cruz de la

A delight to discover. Impossible to forget.news.curio.com/assets/CURI/docs/factsheets/Curio_Collection_Medi… · LondonHouse Chicago See what others are saying about Curio Collection

“Yana Curi” Report - ChevronToxicochevrontoxico.com/assets/docs/yana-curi-eng.pdf · “Yana Curi” Report The impact of oil development ... (Sistema del Oleoducto Trans-Ecuatoriano

Alcides Beretta Curi - fhuce.edu.uy BERETTA CURI... · Capitulo V. La sociedad nacional de ... un componente insoslayable de la enseñanza superior, ... para comprender y transformar

i Entire Million - chroniclingamerica.loc.govchroniclingamerica.loc.gov/lccn/sn83030272/1902-12-15/ed-1/seq-9.pdfoilier uji In tin nud lvhed upon tldr nuktii harks Many of thtM nniiltle

Eastern Kentucky University EDU CURI UG Elementary … · Eastern Kentucky University EDU CURI UG Elementary Education (P-5) ... PRAXIS exams prior to graduation, but candidates are

Focused Model-Learning and Planning for Non …zi-wang.com/pub/fmlp.pdfFocused Model-Learning and Planning for Non-Gaussian Continuous State-Action Systems Zi Wang Stefanie Jegelka

Cooperative Cuts: Graph Cuts with Submodular …people.csail.mit.edu/stefje/papers/subcutsShort.pdfCooperative Cuts: Graph Cuts with Submodular Edge Weights Stefanie Jegelka, Jeff

Curi 12012

Tldr solr-courseload

tldr - UC Berkeley School of Information...tldr INTERFACES FOR LARGE-SCALE ONLINE DISCUSSION SPACES Master’s Final Project - 2009 by Srikanth NarayanAdvised by Coye Cheshire© 2009

curi Freeze curi · curi 18-Sep-19 01:31 AM this is achieved, to a first approximation, by everyone non-violently pursuing their individual interests. Freeze 18-Sep-19 01:31 AM ah