About Data From A Machine Learning Perspective

Oriol Pujol

ABOUT DATA FROM A MACHINE LEARNING

PERSPECTIVE

TWO FLAVOURS AND ONE PRESENTATION

Share experiences in openness from the machine learning community

To propose potential solutions to the problem of sharing sensible data

A COMMUNITY SHIFT

THE ROLE OF DATA IN MACHINE LEARNING

• Machine learning is sometimes called learning from data. Thus we are talking about building/coding algorithms that learn from data.

• Data is the key component in most research, but it is obviously crucial in machine learning.

Automatic captioning

EXAMPLES OF TASKS IN MACHINE LEARNING

CURRENT MACHINE LEARNING RESEARCH FOCUS

• Improving results in terms of generalization, robustness, speed, and interpretability. • Comparison with other techniques is fundamental. • Many different data sets are needed to assess each property we are tackling in the research, e.g. we need large datasets for speed and big data.• Sometimes, their application largely depends on the domain, e.g., financial products forecasting, targeted gene identification.

AS A RESEARCHER

DISEMINATION: REPORTING PROCESS, CLAIMING AUTHORSHIP

QUALITY: COMMUNITY VALIDITY ASSESSMENT, PEER REVIEW (KPI USED BY ORGANISATIONS FOR RESEARCH QUALITY)

We pursue to contribute to advance science: reproducibility of experimentation that helps to eliminate individual biases.

FOR THIS REASON WE USE PUBLISHING

DISSEMINATIONQUALITY ASSESSMENT

FOR THIS REASON WE USE PUBLISHINGLONG PROCESS > 1 YEARNOT A GOOD DISEMINATION SOURCE, OR ATTRIBUTION PROC.

PEER QUALITY ASSESSMENT CRITICAL FEEDBACKGUARD AGAINST ERRORSMAJOR SOURCE FOR RANKING RESEARCHERS

NON-‐PUBLISHED RESEARCH IS NOT RESEARCH

FOR THIS REASON WE USE PUBLISHING

FAIRER PROCESS > 3-‐6 month

QUALITY ASSESSMENTRANKING RESEARCHERS

NON-‐PUBLISHED RESEARCH IS NOT RESEARCH

THE NIPS CASE: WHEN CONFERENCE PAPERS ARE OLD NEWS

IN MACHINE LEARNING 3-‐6 MONTHS IS TOO LONG

DECEMBER 5TH – 10TH , 2016

THE NIPS CASE: WHEN CONFERENCE PAPERS ARE OLD NEWS

24/7 PUBLISHING CYCLE

IMMEDIACY, 24/7 PUBLISHING CYCLE

IMMEDIATE ATTRIBUTION

NO QUALITY ASSESSMENTRANKING RESEARCHERSHOW TO DISTINGUISH GOOD RESEARCH FROM FRAUD

HOWEVER, IN REALITY

TOWARDS REPRODUCIBILITY: WE NEED DATA

AN EXAMPLE: PLOS ONE

FOCUS SHIFT:

METHODOLOGICAL CORRECTNESS

MACHINE LEARNING DATASET REPOSITORIES

Data Sets -‐ Raw data as a collection of similarily structured objects.Material and Methods -‐Descriptions of the computational pipeline.Learning Tasks -‐ Learning tasks defined on raw data.Challenges -‐ Collections of tasks which have a particular theme.

MACHINE LEARNING DATASET REPOSITORIES

• The U.S. Securities and Exchange Commission (SEC) 2010 proposal, financial products must include Pythoncode.

WHEN DESCRIPTION AND DATA IS NOT ENOUGH

GITHUB PROJECTS

BUT … DEVIL IS IN THE DETAILS

DOCUMENTING THE DETAILS

JUPYTER NOTEBOOKS

EXECUTABLE CODELATEXMARKDOWNHTML+CSSHYPERLINKSEMBEDDED FRAMES… and more.

PUTTING ALL TOGETHER

BOTTOM LINE

1. SCIENCE IS MORE THAN A PUBLICATION

2. AT ITS CORE LIES REPRODUCIBILITY…THIS INVOLVES DATA

3. BUT DATA IS NOT EVEN ENOUGH… CODE AND PRECISE EXPERIMENT REPRODUCIBILITY

4. DATA AND PROGRAMMING LITERACY IS NEEDED

BUT … SOMETHING ELSE ABOUT QUALITY ASSESSMENT: OPEN REVIEW CASE

THE RULES

• Submissions posted on arXiv before being submitted to the conference• Program committee assigns blind reviewers (as always) marked as designated reviewers• Anyone can ask the program chairs for permission to become an anonymous designated reviewer (open bidding).• Open commenters will have to use their real names, linked with their Google Scholar profiles.• Papers that are presented in the workshop track or are not accepted will be considered non-‐archival, and may be submitted elsewhere (modified or not), although the ICLR site will maintain the reviews, the comments, and the links to the arXiv versions

SOME POINTS ABOUT OPEN REVIEW

• Reading the answers from others is valuable

• Rejected papers have influence

•May become a popularity contest

SHARING DATA IN THE AGE OF DEEP LEARNING

Privacy budget

More privateLess private

SIFIER

Research is based on reproducibility:documentation, data, code, and details.

Machine learning methods can help in the challenge for democratizing data.

CONCLUSIONS

@oriolpujolvila

About Data From A Machine Learning Perspective

Education

Transcript of About Data From A Machine Learning Perspective

Anomaly Detection in Gamma Ray Spectra: A Machine Learning Perspective

A Machine Learning Perspective on Managing Noisy DataA Machine Learning Perspective on Managing Noisy Data ... • Human errors • Machine failures • Code bugs Data errors are everywhere.

05 history of cv a machine learning (theory) perspective on computer vision

Machine Learning: A Probabilistic Perspectivemurphyk/MLbook/pml-toc-1may12.pdf · Machine Learning A Probabilistic Perspective Kevin P. Murphy ... 1.3.3 Discovering graph structure

Machine Learning CSE546 - courses.cs.washington.edu · 䡦Machine Learning: a Probabilistic Perspective; Kevin Murphy Optional Books (free PDF): 䡦The Elements of Statistical Learning:

MACHINE REASONING: A PERSPECTIVE AND …...“From machine learning to machine reasoning”. Machine Learning Volume 94 Issue 2, 2004, Pages 133-149 Machine Learning Volume …

Applied Machine Learning at Facebook: A Datacenter ... · PDF fileApplied Machine Learning at Facebook: A Datacenter Infrastructure Perspective Kim Hazelwood, Sarah Bird, David Brooks,

A Perspective on Machine Learning Methods in Turbulence ...

Computer Security: A Machine Learning Perspective

Perspective: Energy Landscapes for Machine Learning · Perspective: Energy Landscapes for Machine Learning Andrew J. Ballard,1 Ritankar Das,1 Stefano Martiniani,1 Dhagash Mehta,2

Big Data Analytics in Bioinformatics: A Machine Learning Perspective

Understanding Machine Learning – A theory Perspectivemlss.tuebingen.mpg.de/2017/speaker_slides/Shai1.pdf · Understanding Machine Learning – A theory Perspective ... Proof All

A machine learning perspective on neural networks and learning tools Tom Schaul.

A Machine Learning Perspective on Predictive Coding with ...

Machine Learning an Algorithmic Perspective(2009)

Basic Mathematics A Machine Learning Perspective

LEADERSHIP PERSPECTIVE ML...LEADERSHIP PERSPECTIVE ML Machine Learning: Key Adoption Cybersecurity Considerations 2 3 Machine learning (ML) is an application of Artificial Intelligence

Machine Learning: A Probabilistic Perspective - …murphyk/MLbook/pml-toc-9apr12.pdf · · 2012-04-09Machine learning : a probabilistic perspective / Kevin P. Murphy. ... 4.6.4

Machine Learning - University of Pittsburgh · 3 Administration Study material • Other ML books: – K. Murphy. Machine Learning: A probabilistic perspective, MIT Press, 2012. –

Multiagent Systems: A Survey from a Machine Learning ...mmv/papers/MASsurvey.pdfMultiagent Systems: A Survey from a Machine Learning Perspective ... Additional opportunities for applying