2020 Projects

More info about Data+ visit: bigdata.duke.edu/data

1. Disease Emergence and Richness in Primates

A team of students led by the Nunn lab and its collaborators will investigate the ecological and behavioral factors that determine parasitism in different species of primates. Based on publicly available data and evolutionary trees, students will investigate parasitism by developing a network of primate-parasite relationships. This network will then be used to infer the ecological and behavioral characteristics that best predict parasitism. The findings are relevant to identifying emerging infectious diseases in humans, and also for conservation efforts globally. Project Leads: Jim Moody, Charles Nunn Project Manager: Marie Claire Chelini

2. When Black Stories Go Global: Analyzing the Translation of African-American Literature and Film

A team of students led by Humanities Unbounded Fellow Eva Michelle Wheeler will explore how culturally bound language in African-American literature and film is rendered for international audiences and will map where, and into which languages, these translations are occurring. Students will use a reference dataset to build and annotate a translation corpus, explore the lexical choices and translation strategies employed by translators, and conduct a macro-level analysis of the geographic and linguistic spread of these types of translations. The results of this project will bring a quantitative dimension to what has largely been a qualitative analysis and will contribute to ongoing academic conversations about language, race, and globalization. Project Lead: Eva Wheeler

3. ABOUT-US – A BOundary Update Tool for Utility Services

A team of students led by researchers from the Internet of Water project at the Nicholas Institute will develop an online tool that allows local water systems to update and verify their service boundaries while maintaining data security and functionality for state regulators. Students will have the opportunity to interact with state regulators and water system managers in North Carolina and California who will provide feedback on design and usability. This tool will improve system boundary data that are used for planning and decision-making purposes. Additionally, the tool may include functionality for basic spatial analyses such as overlaying boundaries on sociodemographic, economic, and environmental data. This would enable impact analyses, the identification of utilities and vulnerable populations affected by environmental hazards to water systems, and multi-system regional water supply projections. Project Leads: Megan Mullin, Lauren Patterson Project Manager: Kyle Onda

4. Human Activity Recognition using Physiological Data from Wearables

Human activity recognition (HAR) is a rapidly expanding field with applications ranging from biometric authentication to home-based rehabilitation for people with traumatic brain injuries. While HAR is traditionally performed using accelerometry data, a team of students led by researchers in the BIG IDEAS Lab will explore HAR with physiological data from wrist-worn wearables. Using deep learning methods, students will extract features from wearable sensor data to classify human activity. The student team will develop a reproducible machine learning model that will be integrated into the BIG IDEAS Lab Digital Biomarker Discovery Pipeline (DBDP), a code resource for researchers and clinicians developing digital biomarkers from wearable sensors and mobile health technologies. Project Lead: Jessilyn Dunn Project Manager: Brinnae Brent

5. Predictive Modeling of Mechanical Failures at Sea

A team of students will analyze sensor data from a shipping fleet to develop predictive models that prevent mechanical failures at sea and determine the optimal time for replacement. They will have the opportunity to collaborate closely with analytics professionals from Fleet Management Limited, the world’s third-largest ship management company, which manages more than 520 vessels on behalf of their owners. Faculty Sponsor: Paul Bendich Client Lead: Shah Irani, Fleet Management Limited

6. Taking electrification on the road: Exploring the impact of the Electric Farm Equipment roadshow

A team of students led by researchers in the Energy Initiative and the Energy Access Project will explore historical data on the U.S. Electric Farm Equipment (EFE) demonstration show, which ran between 1939 and 1941 and aimed to increase the use of electricity in rural areas. Students will compile data collected by the Rural Electrification Administration into a machine-readable form and then use that data to explore and visualize the EFE’s impact. If time allows, they will then compare data from the EFE and a related, smaller-scale project from 1923 (the “Red Wing Project”) with current data on appliance promotion programs in East African villages that have recently gained access to electricity. The outcomes of this analysis would offer evidence of the successes and limitations of these types of programs, and of the relevance of the historical U.S. case to countries currently facing similar challenges. Project Leads: Victoria Plutshack, Jonathon Free, Robert Fetter

7. On Being a Blue Devil: the changing profile of the Duke student body

A team of students, led by University Archivist Valerie Gillispie and Professor Don Taylor, will take a closer look at how the student body at Duke has transformed into a coeducational student body from around the world enrolled in ten different schools. Students will seek to transform digital and historical data into a dynamic visual display which allows viewers to examine changes in the student body over time in terms of three dimensions: geographic origin, gender, and school. The students will use born-digital data along with historical, paper-based data to assemble a data corpus. The goal is to demonstrate trends and changes over time in terms of where Duke students have come from, identifying statistically significant shifts and patterns that warrant further study. Project Leads: Don Taylor, Valerie Gillispie

8. Network Visualization of Foot Traffic Patterns

A team of students led by data scientists and engineers from the Office of Information Technology will work to visualize foot traffic patterns in the Bryan Center. Students will be given a large dataset of Wi-Fi data, which they will analyze to gain insight into how the Bryan Center is used over various time periods. The work will help to identify areas of the center that experience high wear and tear, particularly during high-volume events such as basketball games. Project Leads: John Haws, Mary Thompson, Eric Hope, Sean Dilda Project Manager: Hunter Klein

9. Uncovering Latinx Southern History

A team of students led by History Professor Cecilia Márquez will use census data to understand the long history of Latinxs in the U.S. South. Despite growing attention from historians and social scientists to the historical and contemporary Latinx South, there has not yet been a thorough data analysis of the historical presence of Latinxs in the region. The Data+ team will search the U.S. Federal Census, immigration records, and marriage records to determine where Latinxs lived in the U.S. South over the course of the late nineteenth and early twentieth centuries. This work will provide an invaluable dataset to help us understand the long southern history of Latinxs. Project Lead: Cecilia Márquez

10. Protecting American Investors? Financial Advice from before the New Deal to the Birth of the Internet

The promoters of modern American capitalism have long encouraged individuals, including those of modest means, to build their wealth through investments. But how have ordinary investors learned about the opportunities and risks of putting their savings to work on Wall Street? A team of students working with History professor Ed Balleisen will delve into the evolving nature of investment advice from the early twentieth century up to the start of the internet age. Creating datasets from financial advice columns in large-circulation American newspapers and magazines, they will use text mining and sentiment analysis to see how advice changed in response to the business cycle; the emergence of new types of investments, financial products, and investors; and the evolution of financial regulation. This is a chance to link data science to historical analysis of a key facet of finance capitalism. Project Lead: Ed Balleisen

11. Predicting Blindness in Duke’s Glaucoma Patient Population

A team of students led by researchers in the Duke Eye Center and Department of Statistical Science will develop statistical models to assess the risk of legal blindness in glaucoma patients using electronic health records (EHR) from Duke Health. Students will focus on identifying risk factors relevant to the local Durham County patient population and will enrich the available EHR data with detailed social and environmental data from the Durham Neighborhood Compass. A priority of the research will be to develop an app that makes the prediction model accessible, so that real-time decisions about blindness-related medical care can be made. For the greatest impact, the app will be created in close collaboration with clinicians and decision makers at Duke Health. Project Leads: Samuel Berchuck, Sayan Mukherjee, Felipe Medeiros Project Manager: Kimberly Roche

12. Forecasting campus energy usage for improved energy management

A team of students led by the Data and Analytics Practice at OIT will develop a robust model for forecasting energy usage at different facilities on campus. Students will tackle a wide range of real-world time-series challenges, from detecting and handling anomalies to benchmarking traditional statistical and modern machine learning forecasting models. Students will also gain valuable experience developing an interactive application, using the latest open-source libraries for converting Jupyter notebooks into web applications, to facilitate effective stakeholder collaboration. This work will enable several critical analyses that help Duke Facilities Management optimize its operations and significantly reduce costs. Project Leads: John Haws, Gagandeep Kaur Project Manager: Billy Carson

13. Finding Space Junk with the World’s Biggest Telescopes

A team of students led by Physics professors Dan Scolnic, Michael Troxel, and Chris Walter will build their own algorithms to use images taken as part of the Dark Energy Survey, one of the largest cosmological surveys, to learn more about all the things we find in space that we aren’t looking for. These can be anything from image artifacts, to cosmic ray hits, to satellite trails, to Elon Musk's car (see picture). Each of these has its own signature in the images, and automatic detection and identification algorithms would enable improved image processing. As surveys attempt to measure increasingly difficult and subtle features of the universe, like the imprint of dark energy and dark matter, identifying every kind of artifact will be critical. Project Leads: Dan Scolnic, Michael Troxel, Chris Walter

14. Race and Housing in Durham over the Course of the 20th Century

A team of students led by professor of Public Policy William Darity Jr. will chart the evolution of racial inequality in housing in a subset of Durham’s neighborhoods over the course of the 20th century, using census data and Durham County housing records. Students will select a sample of homes from those that appear in de-anonymized decennial censuses between 1920 and 1940, noting homeowner race and reported home value. Tenure (time since last sale), assessed home values, and occupancy will be collected from county records for the period between 1940 and 2018. The set of homes will be selected to include a range of neighborhoods that vary in racial composition, zoning designation, and credit riskiness as determined by HOLC’s residential security (redlining) maps. This approach allows the Data+ team to document racial differences in the evolution of home values, tenure, and occupancy across neighborhoods. Project Lead: William Darity Jr. Project Manager: Omer Ali

15. Mental Health and the Justice System in Durham County

Mental illness is overrepresented in the incarcerated population and is correlated with higher rates of re-arrest. In recent years, Durham County has taken many steps to break this cycle, including helping incarcerated people engage with mental health treatment resources. This team will work with collaborators at the Durham County Detention Facility, the Criminal Justice Resource Center, and the Duke Health System to determine whether recently incarcerated people in Durham are using the resources available to them, and whether outcomes are improving. This team is a combined Data+/Bass Connections project, so students will be expected to commit to the project for Summer 2020 as well as academic year 2020-2021. Project Leads: Nicole Schramm-Sapyta, Maria Tackett Project Manager: Ruth Wygle

16. Piloting an Environmental Public Health Tracking Tool for North Carolina

The Data+ student team, led by epidemiologist Mike Dolan Fliss and colleagues from the NC Division of Public Health (DPH), will build a pilot Environmental Public Health Tracking (EPHT) tool for NC. Students will analyze and combine spatial health, environmental, and point-source data from NC DPH and other partners, then co-design and prototype visual dashboards for public use. Project Leads: Mike Dolan Fliss, Kim Gaetz Project Manager: Melyssa Minto

17. American Predatory Lending and the Global Financial Crisis (Year 2)

A team of students, led by researchers in the Global Financial Markets Center at Duke Law, will carry forward the work of a 2019-20 Bass Connections team to better understand the state of the home mortgage market leading up to the financial crisis. The Data+ team will expand the scope of the analysis beyond North Carolina and begin developing a complete quantitative portrait of the mortgage market in Sun Belt states. Building on this year's work, the Data+ team would be largely responsible for creating census-tract-level visualizations of mortgage market statistics for the entire US, based on the NC version created this year. Additionally, the team would build a model to identify whether a loan is predatory. The output of this project will be displayed on a comprehensive website that is currently being constructed by the Bass Connections team. Project Lead: Lee Reiners

18. Predicting Baseball Players’ Athletic Performance Utilizing Baseline Assessments of Vision

A team of students led by researchers from the Duke Human Performance Optimization Lab (OptiLab) and the Michael W. Krzyzewski Human Performance Laboratory (K-Lab) will develop an analytic and report-generating application to test whether baseline vision and movement screening measures can predict on-field baseball performance in a cohort of nearly 300 athletes who participated in the USA Baseball Prospect Development Pipeline (PDP). Using machine learning and Bayesian hierarchical modeling, students will analyze data provided by USA Baseball to identify relationships between baseline characteristics and performance in NCAA-sanctioned and collegiate summer league games during the 2018 and 2019 seasons. The final deliverables will be a report of the findings and an analytic toolset that can be used within the PDP to give athletes direct feedback about their future performance potential immediately after testing. As such, this program will provide valuable new information about the characteristics that predict successful athletic performance in demanding situations, and could be used to develop new approaches for talent identification within and beyond baseball. Project Leads: Greg Appelbaum, Marc Richard

19. For love of greed: tracing the early history of consumer culture

A team of students led by Dr. Astrid Giugni (Duke, English and ISS) and Dr. Jessica Hines (Birmingham-Southern College, English) will address the question of how to trace concepts that slowly developed alongside changing economic and social realities. The team will track a set of related terms (such as consumer, greed, speculation, and profit) in order to begin assessing how the ethical, political, and economic language of goods consumption changed around the Protestant Reformation and the rise of the market economy. Using large databases of scanned and machine-readable Medieval and Early Modern texts, namely EEBO (ProQuest), ECCO (Gale), HathiTrust, and TEAMS (University of Rochester), the group will track and analyze pamphlets, sermons, satires, and images to understand how the ethical discourse of consumerism changed over time. Project Leads: Astrid Giugni, Jessica Hines Project Manager: Chris Huebner

20. Deep Learning for Rare Energy Infrastructures in Satellite Imagery

A team of students led by researchers in the Energy Data Analytics Lab, Electrical & Computer Engineering, and with participation from the Energy Access Project will investigate how to use synthetically-generated satellite imagery to improve the identification of energy infrastructure in satellite imagery. The detected energy infrastructure will fill outstanding data gaps in the ability to identify pathways for electrification in low-income countries. The team will build the foundation for research that can identify objects that appear relatively rarely in satellite imagery and accomplish this using very limited training examples by creating realistic synthetic 3D models of those rare objects. This would greatly scale up the applicability of computer vision techniques for energy object identification in overhead imagery. Project Lead: Kyle Bradbury

21. Linking Urban Land Use to Aquatic Metabolism Regimes

A team of students led by researchers at the Duke River Center will develop tools to link water quality and aquatic ecosystem condition to urban and other land uses by combining existing geospatial data, including land cover maps, LiDAR, and remotely sensed images, with time series of ecosystem metabolism estimates from the StreamPULSE data portal. Students will develop clustering tools for rapidly identifying land use and other gradients that minimize confounding factors, and will then compare metabolic time series along these gradients to identify connections between catchment attributes and the seasonal and stochastic components of ecosystem function. This work will help Duke researchers determine land use thresholds that protect aquatic ecosystem condition, and will also generate generalizable workflows and data infrastructure that support our open science data portal. Project Leads: Jim Heffernan, Phil Savoy

22. Computational Tools to Improve Healthy and Pleasurable Eating in Young Children

A team of students led by eating disorders expert Nancy Zucker and engineering professor Guillermo Sapiro will develop multimodal computational tools to help improve the nutritional status and food enjoyment of young children with Avoidant/Restrictive Food Intake Disorder (ARFID), children who are not eating enough food or are eating an inadequate variety of food to the degree that it impairs functioning. Students will analyze facial affect and behavior from videos of children trying new foods and will derive sensory profiles based on children’s patterns of food acceptance. These analyses will serve as the basis for personalized recommendations for parents that will suggest actionable next steps to increase their child’s food acceptance. Project Leads: Guillermo Sapiro, Nancy Zucker Project Manager: Julia Nichols

23. Applying Security Orchestration, Automation & Response (SOAR) to security threat hunting with Duke’s ITSO

Over the past several months, Duke's Information Technology Security Office (ITSO) has begun applying the MITRE ATT&CK framework as a basis for how the team collects, assesses, identifies, and responds to attacker tactics, techniques, and procedures (TTPs). As the team rolls out new processes to "hunt" for attackers, shifting its primary functions from defensive/reactive to offensive/proactive, it will need to incorporate real-time and longitudinal data analytics, along with automated responses based on those analyses. Orchestrating the various tools and analyzing their data will facilitate automated responses to attacker incursions. Given the volume of data and the speed needed to respond, machine learning techniques will be a necessary component. Project Lead: Jen Vizas

24. Predictive Churn Models for Duke Season Ticket Holders and Annual Donors

Duke season ticket holders are both strategically and financially important to Duke Athletics. One of the major challenges in retaining season ticket holders is understanding which of them are most likely to churn, i.e., not renew their tickets. A team of students, in conjunction with Duke’s Office of Information Technology and Duke Athletics, will use data from Duke’s ticketing system to build a set of models that seek to predict the profiles and timing of non-renewal among season ticket holders and annual donors. Project Leads: John Haws, Larry Cleaver Project Manager: Andrew Carr

25. AI in the Investment Office

A team of students will explore how artificial intelligence tools can be used to support the investment office at the Duke University Management Company (DUMAC). In particular, the team will investigate natural language processing and other AI methods for supporting the legal review process, investment analysis, and financial reporting. Project Lead: Robert McGrail, DUMAC Project Manager: Yi Wang

26. Data Science for Retention of College Women in Tech

A team of students will explore ways in which data science can help support the mission of Rewriting the Code, a national non-profit organization dedicated to empowering a community of college women with a passion for technology. In particular, students will perform statistical analyses of past survey data, build interactive dashboards that help visualize trends in student experience, and help design future survey questions. Project Lead: Sue Harnett Faculty Lead: Alexandra Cooper

27. Computational Approaches to the History of Cartography

A team of students will explore new ways of reading pre-modern maps and perspectival views through image tagging, annotation, and 3D modeling. Each student will build a typology of icons found in these early maps (for example, houses, churches, roads, and rivers). By extracting, modeling, and cataloging these features, the team will create a library of 2D and 3D objects that will be used to (a) identify patterns in how space and power are represented across these maps, and (b) create a model for “experiencing” these maps in 3D using the Unity game engine. This is a combined Data+/Bass Connections project that will instruct students in qualitative and quantitative mapping techniques, basic 3D modeling, and the history of cartography. Project Leads: Philip Stern, Ed Triplett Project Manager: Sam Horewood

28. Neural Network-Based Self-Adjusting Computational Processors

A team of students, led by Electrical and Computer Engineering professor Vahid Tarokh, will develop methods to improve the efficiency of information processing with adaptive decisions according to the structure of new incoming data. Students will have the opportunity to explore data-driven adaptive strategies based on neural networks and statistical learning models, investigate trade-offs between error threshold and computational complexity for various fundamental operations, and implement software prototypes. The outcome of this project can potentially speed up many systems and networks involving data sensing, acquisition, and computation. Project Leads: Yi Feng, Vahid Tarokh