20150522_Example_PyData_use-cases_in_astronomy_research
A research group’s use-cases for PyData tools
Samuel Harrold, Astrophysics PhD Student, UT Austin
2015-05-22 @ Continuum Analytics, Austin, TX
Motivation
● In 2011:
  ○ Research group mostly used bash scripts, awk, Fortran, IDL, IRAF.
  ○ Pipeline was tightly coupled with old computers, cameras, and camera software.
● Goals for new computers and camera:
  ○ Make pipeline loosely coupled, cross-platform.
  ○ Develop skills for non-academic job market.
● Led research group in adopting Python tools.
Summary
● Conflict of interest: engineering vs publishing papers.
● To adopt best practices from industry, science needs more tools that lower the entry barrier.
  ○ Example: It’s hard to mine your data if you don’t know how to create a database.
Outline
● Motivation
● Use-cases
● FAQ from researchers
Use of some PyData tools
● Anaconda: Environment management.
● IPython Notebooks: Sharing code by copy-paste.
● scikit-image: Detecting stars.
● pandas: Data organization.
● statsmodels, emcee: Robust statistics.
● astropy, astroML: Astronomy-specific.
Use-case: Star brightness vs time
● “Time-series photometry.”
● Objective:
  ○ Extract relative brightness of stars from images during acquisition.
https://github.com/ccd-utexas/tsphot
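A minimal sketch of the idea, assuming standard aperture photometry with fixed, known star positions. The function names, aperture radii, and the median sky annulus are illustrative assumptions, not tsphot's implementation. Dividing the target's flux by a comparison star's flux in each frame cancels changes (e.g. thin clouds) that dim both stars equally.

```python
import numpy as np


def aperture_flux(image, x0, y0, r_ap, r_in, r_out):
    """Background-subtracted flux in a circular aperture centered at (x0, y0).

    Hypothetical helper. The sky level per pixel is the median of pixels in
    the annulus r_in <= r < r_out. Pixels count as inside the aperture by
    their centers only (no sub-pixel weighting).
    """
    yy, xx = np.indices(image.shape)  # row (y) and column (x) index grids
    r = np.hypot(xx - x0, yy - y0)
    in_ap = r < r_ap
    in_ann = (r >= r_in) & (r < r_out)
    sky = np.median(image[in_ann])
    return image[in_ap].sum() - sky * in_ap.sum()


def relative_lightcurve(frames, target_xy, comp_xy, r_ap=3.0, r_in=6.0, r_out=9.0):
    """Target flux divided by comparison-star flux, one value per frame."""
    ratios = []
    for frame in frames:
        f_t = aperture_flux(frame, *target_xy, r_ap, r_in, r_out)
        f_c = aperture_flux(frame, *comp_xy, r_ap, r_in, r_out)
        ratios.append(f_t / f_c)
    ratios = np.asarray(ratios)
    # Normalize to the mean so the output is a relative brightness near 1.
    return ratios / ratios.mean()
```

Because each frame is handled independently, `aperture_flux` could be called on every new image as it comes off the camera, which is what extracting brightness during acquisition requires; the final normalization would then use a running mean instead.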
Use-case: Star brightness vs time
● Status:
  ○ Developed to be good enough for internal use, but not made robust for distribution.
  ○ Conflict of interest: engineering vs publishing papers.
Use-case: Data mining platform
● Objective:
  ○ Predict which unobserved white dwarf stars pulsate.
    ■ What stars are there? From catalogs.
    ■ Which stars are published (non)pulsators? From papers.
    ■ Which stars are unpublished (non)pulsators? From our data.
http://www.slideshare.net/SamuelHarrold/20140409-harrold-dataminingdemostellarseminar
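One way to organize the three sources with pandas, sketched under assumed inputs: each source is a DataFrame keyed by a hypothetical `star_id` column, and the published and in-house classifications are boolean `pulsator` columns. Left-joining onto the catalog keeps every known star, and `validate="one_to_one"` makes a merge fail loudly if a star appears twice in either table.

```python
import pandas as pd


def build_target_table(catalog, published, observed):
    """Combine catalog, literature, and in-house classifications per star.

    Hypothetical schema (assumed for this sketch):
      catalog:   column 'star_id' plus any catalog properties, one row per star.
      published: columns 'star_id', 'pulsator' (bool), from papers.
      observed:  columns 'star_id', 'pulsator' (bool), from our data.
    """
    table = catalog.merge(
        published.rename(columns={"pulsator": "pulsator_published"}),
        on="star_id", how="left", validate="one_to_one",
    ).merge(
        observed.rename(columns={"pulsator": "pulsator_ours"}),
        on="star_id", how="left", validate="one_to_one",
    )
    # A star is unclassified if neither papers nor our data label it.
    table["unclassified"] = (
        table["pulsator_published"].isna() & table["pulsator_ours"].isna()
    )
    return table
```

Labeled rows could then train a classifier, and rows flagged `unclassified` are the candidates whose pulsation would be predicted; that modeling step is not shown here.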
Use-case: Data mining platform
● Status:
  ○ Shut down due to under-use.
    ■ Users preferred grep + Excel rather than pandas.
    ■ Users didn’t want to maintain a MySQL database.
  ○ Conflict of interest: engineering vs publishing papers.
Use-case: Reproducible research
● Objective:
  ○ Compute the physical quantities of a binary star system from time-series photometry.
https://github.com/stharrold/Harrold_2015_SDSSJ1600; https://pypi.python.org/pypi/binstarsolver
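As a hedged illustration of the kind of calculation involved, and not binstarsolver's API or method: for an eclipsing binary on a circular, exactly edge-on orbit, the projected separation of the stars is a|sin(2πt/P)|, with t measured from mid-eclipse. The eclipse contact times read off the light curve therefore give the stellar radii as fractions of the separation a, and Kepler's third law converts the period into a once a total mass is assumed (photometry alone does not supply it). The function names and the nominal GM_sun value are assumptions of this sketch.

```python
import math

# Nominal solar gravitational parameter G*M_sun, in m^3 s^-2.
GM_SUN = 1.3271244e20


def fractional_radii(period, t14, t23):
    """Return (R1/a, R2/a) for an eclipsing binary, with R1 >= R2.

    Hypothetical helper. Assumes a circular, edge-on (i = 90 deg) orbit.
    t14: duration from first to fourth contact; t23: second to third contact.
    period, t14, and t23 must share the same time unit.
    """
    if not 0 <= t23 <= t14 < period / 2:
        raise ValueError("expected 0 <= t23 <= t14 < period / 2")
    # Contacts 1 and 4 occur at projected separation R1 + R2,
    # contacts 2 and 3 at R1 - R2; each is half a duration from mid-eclipse.
    s14 = math.sin(math.pi * t14 / period)
    s23 = math.sin(math.pi * t23 / period)
    return (s14 + s23) / 2, (s14 - s23) / 2


def semimajor_axis(period_s, total_mass_msun):
    """Kepler's third law, a = (G*M*P**2 / (4*pi**2))**(1/3), in meters."""
    gm = GM_SUN * total_mass_msun
    return (gm * period_s**2 / (4 * math.pi**2)) ** (1 / 3)
```

Multiplying each fractional radius by `semimajor_axis(...)` gives radii in meters; inclined or eccentric orbits need a full light-curve model instead of this closed form.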
Use-case: Reproducible research
● Status:
  ○ Citable code on GitHub with DOI from zenodo.org.
  ○ Distributable code published to PyPI.
  ○ Conflict of interest: engineering vs publishing papers.
FAQ from researchers
● Questions:
  ○ “Why don’t you use ___?”
  ○ “How does this help publish more papers?”
  ○ “Why should I learn another language?”
● Answers:
  ○ “Look how quickly I can do ___.”
  ○ Examples justify taking time to learn new skills.
  ○ NSF Data Management and Sharing requirements: https://www.nsf.gov/bfa/dias/policy/dmpfaqs.jsp
  ○ TIOBE code popularity index: http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
  ○ Jake VanderPlas’s blog post on data science and academia: https://jakevdp.github.io/blog/2014/08/22/hacking-academia/