Flint’s Data Story: a Government, Corporate, and University Collaboration · 2018-09-17 ·...



Flint’s Data Story: a Government, Corporate, and University Collaboration

Jared Webb
University of Michigan
Ann Arbor, MI
[email protected]

Jacob Abernethy
Georgia Tech
Atlanta, GA
[email protected]

Eric Schwartz
University of Michigan
Ann Arbor, MI
[email protected]

ABSTRACT
When residents in Flint, Michigan discovered that their water had been contaminated due to leaching of lead from aging pipes, a simple fact was hard to ascertain: how widespread was the problem? As the city turned its focus to recovery, the water crisis became an information crisis. Experts believed that lead service lines at residences were the primary source of lead in drinking water, but there were limited, outdated, and sometimes inaccurate records of which homes had dangerous pipes. In this paper, we detail our attempts to solve that information challenge, illustrating a collaboration among many parties (Flint’s city government, the Michigan state government, Google, Captricity, and the University of Michigan campuses in Flint and Ann Arbor) to combine disparate records into a unified dataset describing each home in the city. Starting in 2016, a massive digitization effort has been underway, putting to use over 100,000 work records from the past century, hand-drawn annotations on block-by-block maps covering over 50,000 addresses, and thousands of recent physical inspections of pipe material underground. Together, this data has been used to inform city officials and build statistical models that accurately predict the location of the dangerous infrastructure.

1. INTRODUCTION
The story of the Flint water crisis has its roots in the general decline of manufacturing in the Midwest. Flint was once one of the most prosperous cities in the United States, with more than 80,000 people employed just in its General Motors factories. However, broader economic changes led to a decline that lasted decades and was exacerbated by the banking crisis in 2008. By 2011, the city’s finances were in shambles due to “structural debt” [11]. In 2014 the emergency manager approved a change in the city’s water supply from the Detroit municipal system to the Flint River as a cost-saving measure. The new water had different chemical characteristics from the previous supply that were misunderstood or overlooked by water officials. The water was acidic enough to strip a layer of protective deposits off of lead pipes and leach lead and other metals into drinking water. Flint residents reported strange smells and colors in the new water [9], and were unknowingly exposed to elevated levels of lead. This continued for 2 years before a nurse noticed elevated levels of lead in children’s blood in 2016 [10] [15].

Bloomberg Data for Good Exchange Conference. 16-Sep-2018, New York City, NY, USA.

The sensational story, combined with the politics of an election year, led to tremendous attention paid to Flint as the crisis unfolded. President Obama visited the city, and so did the presidential candidates from the major parties. During this flurry of attention, the state government allocated $27M for recovery, followed by an allocation of $100M by the federal government. The road to recovery began with searching for the source of the lead in the water supply, and focus landed on the water service lines (see Figure 1), which have been identified as a primary source of lead contamination of water in the United States [13].

Figure 1: Schematic describing a residential service line. The private portion of the service line from the house to the curb box is owned by the resident. The public portion from the curb box to the water main is owned by the city. (Michigan Department of Environmental Quality)

The city and state jointly appointed a team to coordinate using the allocated funds to replace the lead service lines. This team, called the Flint Fast Action and Sustainability Program (or FAST Start), had a primary objective of removing as much hazardous infrastructure as possible, up to funding levels. This would be a relatively straightforward task except for one thing: no one knew where the dangerous pipes were.

In the 1980s, the federal government instituted a regulation governing drinking water in the United States called the Lead and Copper Rule. The Environmental Protection Agency was tasked with enforcing the regulation. Among other things, it mandated a ceiling on lead levels in drinking water and required that cities maintain an inventory of their lead infrastructure. Flint failed to maintain such records [7], and so the data indicating the location of dangerous pipes lay buried underground.

This made it costly to discover the material of even a single pipe, and so this information roadblock stood to use up valuable resources in exploration that could be better used actually replacing dangerous pipes. Our team began working with FAST Start in 2016 to provide technical, statistical, and algorithmic support in order to improve information availability and help guide decision making. We assembled data from many sources, with support from the City of Flint, the State of Michigan, Google, and Captricity, to build the largest and most detailed data set pertaining to this crisis. Using these data, we have built tools that predict which homes are most at risk of dangerous water and, more relevant to the replacement effort, predict the location of dangerous pipes with high accuracy.

In this paper we will describe how this data story unfolded. We will begin with what little was known before the crisis started, and describe each data source as it was cleaned and integrated into our database. Finally, we will briefly describe how this database has been put to use.

2. PRE-CRISIS INFORMATION
In this section we will describe what data existed before the crisis. This information was limited, and in some cases unreliable.

2.1 Parcel Data
At the beginning of our relationship with Flint, the city generously provided data on each of its more than 55,000 parcels. These records include a unique parcel identifier and information generally important for tax purposes and city logistics. For example, the name of the owner, the address, the property’s state-estimated value, acreage, and zoning are included. A full description of this data is found in our previous work [4]. Other columns, such as the parcel’s vacancy status and property age, illuminate the larger structural problems within Flint (see Figure 2). For example, the vacancy rate is amongst the highest in the nation for a city of its size, and the median estimated home value is very low compared to other cities in Michigan.

Since the recovery process is happening parcel by parcel, this information became something of a “home base” for incoming data. Whenever we came across new information, we linked it to the unique parcel identifier and incorporated it into the description for the parcel.
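As a minimal sketch of this linking step, the snippet below keeps one record per parcel, keyed by the parcel identifier, and folds each new source into it. The function name, the field names (`parcel_id`, `year_built`, `lead_ppb`), and the sample values are hypothetical placeholders, not the city's actual schema or data.

```python
from collections import defaultdict


def link_to_parcels(parcels, new_records, source_name):
    """Attach records from a new data source to the matching parcel.

    parcels      -- dict mapping parcel_id -> dict describing the parcel
    new_records  -- iterable of dicts, each carrying a 'parcel_id' key
    source_name  -- key under which the linked records are stored
    Returns the records whose parcel_id was not found, for manual review.
    """
    grouped = defaultdict(list)
    for record in new_records:
        grouped[record["parcel_id"]].append(record)

    unmatched = []
    for parcel_id, records in grouped.items():
        if parcel_id in parcels:
            parcels[parcel_id].setdefault(source_name, []).extend(records)
        else:
            unmatched.extend(records)
    return unmatched


# Hypothetical example data (not real Flint records).
parcels = {
    "P-001": {"address": "1 Example St", "year_built": 1925},
    "P-002": {"address": "2 Example St", "year_built": 1954},
}
water_tests = [
    {"parcel_id": "P-001", "lead_ppb": 3},
    {"parcel_id": "P-001", "lead_ppb": 12},
    {"parcel_id": "P-999", "lead_ppb": 0},  # no such parcel
]
unmatched = link_to_parcels(parcels, water_tests, "water_tests")
```

Returning the unmatched records, rather than silently dropping them, keeps the records that fail to link visible for later review.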

Figure 2: This shows the distribution of homes by the year that they were built. The vast majority of homes in Flint were built in the early 20th century and the post-World War 2 era, before the risks of lead plumbing were widely known and regulations were in place.

2.1.1 Hand Drawn Maps
In addition to the index cards, the city also discovered a set of hand-annotated maps which indicated each parcel and its service line material (see bottom of Figure 3). This map data was digitized by the GIS center of the University of Michigan - Flint campus, led by Dr. Martin Kauffman [8]. These records were difficult to interpret, however. Many parcels simply recorded a “?” for their service line material. Other parcels listed two materials, e.g. “Copper/Lead,” with no indication of what the multiple labels mean. As service lines were replaced at several parcels, we were able to compare these records to the truth in the ground. Currently, the evidence suggests that the multiple labels refer to the private and public portions of the service line. Specifically, a label of “Copper/Lead” indicates that the private portion (belonging to the home/business owner) is made of copper, but the public portion (belonging to the city water works) is made of lead. To further complicate matters, many parcels only have one label. In this case, the label may refer to only one side or both sides of the service line.
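The tentative reading of the map labels can be written down as a small parser. This is only a sketch of that interpretation: the function name and the returned keys (`private`, `public`, `either`) are hypothetical, and the private/public ordering is the working hypothesis described above rather than a documented convention.

```python
def parse_map_label(label):
    """Interpret a hand-drawn map label under the tentative reading above.

    Two-part labels such as "Copper/Lead" are read as private/public.
    A single label is ambiguous (one side or both), so it is kept under
    the 'either' key instead of being assigned to a side.
    "?" and empty labels are treated as unknown.
    """
    result = {"private": None, "public": None, "either": None}
    text = (label or "").strip()
    if text in ("", "?"):
        return result
    parts = [part.strip().lower() for part in text.split("/")]
    if len(parts) == 2 and all(parts):
        result["private"], result["public"] = parts
    else:
        # One label or an unexpected shape: keep it, but do not pick a side.
        result["either"] = text.lower()
    return result


print(parse_map_label("Copper/Lead"))
print(parse_map_label("Lead"))
print(parse_map_label("?"))
```

Keeping single labels separate from two-part labels avoids baking an unverified assumption into the dataset before physical inspections can confirm it.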

These records were initially seen as a boon to the replacement effort: just go to the homes that were recorded as having lead pipes and replace them. However, a large portion of the records are simply missing, and even more alarming, as pipes were actually inspected it was discovered that the records were frequently inaccurate. See Table 1.

2.2 City Service Line Records

2.2.1 City Work Order on Index Cards

As mentioned in the introduction, when the crisis began the city failed to produce any records of service line materials. Eventually, a trove of more than 100,000 index cards was found in the basement of a government building [14] (see Figure 3). The index cards recorded work done by the water department on the city’s water infrastructure going back decades, including requests to run service lines to particular homes. At first, this data was too irregular and too large for our team to deal with, and so was not incorporated into our data.

Eventually, we were able to form a pro bono collaboration with Captricity.com, which specializes in digitizing paperwork. Captricity was able to build a set of labeled data on a subset of the cards and automatically digitize the handwritten data on the remainder. Thus far, however, we have been unable to match much of the information from this data to known service line materials. Evaluating the accuracy of this data is an ongoing effort.


Figure 3: Examples of 3x5 index cards (top and middle) that recorded installation notes about water service lines for over 100 years. There were also hand-drawn maps (bottom), in water-damaged atlas books, with various markings representing the material used in the service lines (red annotated circles added by authors). When Flint’s water troubles began the city was initially unable to locate any recorded information on service lines, and the above records were eventually found in a basement.

2.3 United States Census Records
While the available parcel data described the physical reality of the parcels in Flint, there was no data about the people living there. While the demographics of Flint’s residents contributed no causal factors to the crisis, our team and the city were interested in which populations were going to be impacted the most. To that end, we gathered data from the United States Census Bureau.

Census data is sensitive and private, and so individual responses on the census are not made public until many decades after the census is taken. However, a less specific data set called the “American Community Survey” is publicly available. This is a large demographic survey run by the bureau down to the census block level (at least 30,000 square feet) (source). The information may be used by the government to predict migration, measure economic impact, or inform legislation. It is also available on the Census Bureau’s website.

We queried the Census Bureau’s database¹ and pulled records for the racial, age, and family demographics for the census blocks in the city. Since the original parcel data included the block to which each parcel belonged, we were able to link these data together.

3. PEAK CRISIS INFORMATION

3.1 Pilot Service Line Replacement
It was at the peak of the crisis that the federal and state governments began discussing funding for Flint’s recovery. Lawmakers decided that only a complete removal of lead infrastructure from the city would be acceptable to their constituents. To that end, the FAST Start team began a pilot phase to replace the service lines in a small sample of homes. In the spring of 2016, 36 homes (see Figure 4, left) were selected based on risk factors such as the presence of high lead levels and the presence of pregnant women and children. A contractor carried out the replacements and reported that of the 36 homes, 33 had hazardous material in their pipes, while 3 were safe. These were the first physical verifications of service line materials since the crisis began.

Figure 4: Maps of Flint service line replacements and inspections at different points in 2016-17: 36 homes through Mar. 2016 (pilot program; left), 762 homes through Dec. 2016 (left-center), 1,079 homes through April 2017 (right-center), and 6,506 homes through Sep. 2017 (right). Red dots indicate dangerous public service lines, while green dots indicate safe lines.

¹ https://www.census.gov/programs-surveys/acs/


Verified Service Line Materials
Record   N/A   C-C   C-G   C-L   L-C   L-G   L-L   L-O
C         19    42     0     0    10     4     2     0
C-L       16     7     0     2   134     9     2     0
G-O        1     2     1     0    24    72     2     0
L          7     2     0     0     5     3    14     1
N/A       28    17     2     1    29    48    21     2

Table 1: Service lines verified, as of November 1, 2016, vs. city records. The table is adapted from a report by the authors and the Flint FAST Start team predicting that a total of 29,100 homes would require replacement of some dangerous service line portion. The city records come from UM-Flint GIS Center data. C-C = Copper-Copper, L-C = Lead-Copper, C-G = Copper-Galvanized, L-G = Lead-Galvanized, C-L = Copper-Lead, L-L = Lead-Lead, L-O = Lead-Other.

3.2 Department of Environmental Quality Inspections

At around the same time that the pilot program was happening, the state Department of Environmental Quality (DEQ) began to gather data on the private portion of residential service lines. Unlike the public portion, which runs exclusively underground from a curb box to the main water pipes under city streets, the private service line typically runs into a home’s basement. Therefore, the private service line material can be checked by a volunteer with minimal training and no digging.

DEQ directed a team of government employees and volunteers from the local plumbers union to carry out physical inspections in more than 3,000 homes. After each inspection the workers submitted their findings to the DEQ, which in turn shared the data with our team. Since this data was created recently by careful inspectors, we consider it to be more reliable than the city records of unknown provenance. Thus, we were able to partially evaluate the accuracy of the city records. We note, however, that the comparison is not exact, since the city records do not indicate whether they report the private or public portions of the service line, while the DEQ inspections only report the private portion. However, given even the most generous interpretation, the DEQ inspections show that the city records were only somewhat correlated with the truth in the ground. See Table 1 for the confusion matrix between the city records and all known service lines as of November 2016, and note that there are substantial discrepancies.
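One way to read the confusion matrix is to ask, for each city record label, what share of physically verified lines turned out to contain lead in either portion. The sketch below uses the counts from Table 1. Excluding the verified "N/A" column from the denominator and counting any "L" in a column label as lead are simplifying assumptions made for this illustration only.

```python
# Counts transcribed from Table 1 (city record label -> verified material).
COLUMNS = ["N/A", "C-C", "C-G", "C-L", "L-C", "L-G", "L-L", "L-O"]
TABLE_1 = {
    "C":   [19, 42, 0, 0, 10, 4, 2, 0],
    "C-L": [16, 7, 0, 2, 134, 9, 2, 0],
    "G-O": [1, 2, 1, 0, 24, 72, 2, 0],
    "L":   [7, 2, 0, 0, 5, 3, 14, 1],
    "N/A": [28, 17, 2, 1, 29, 48, 21, 2],
}


def lead_share(counts):
    """Return (verified_total, share_with_lead) for one row of the table.

    Homes in the verified "N/A" column are excluded from the denominator.
    A verified line counts as containing lead if either portion is "L".
    """
    verified = {col: n for col, n in zip(COLUMNS, counts) if col != "N/A"}
    total = sum(verified.values())
    with_lead = sum(n for col, n in verified.items() if "L" in col.split("-"))
    return total, (with_lead / total if total else float("nan"))


for record, counts in TABLE_1.items():
    total, share = lead_share(counts)
    print(f"record={record:<4} verified={total:>4} lead_share={share:.2f}")
```

Under these assumptions, 16 of the 58 verified lines that the city recorded as "C" (about 28%) contained lead in at least one portion, which is one concrete form of the discrepancies noted above.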

3.3 Water Testing Data
As part of public outreach in response to the crisis, the Department of Environmental Quality also initiated a water testing program open to all residents of Flint. At various locations throughout the city a resident is able to pick up a free test kit. Residents then collect water from their own taps and submit samples to the DEQ for analysis. In an effort to increase transparency, the test results are available on the State’s website.² For each sample we are given the date the sample was submitted, the lead and copper levels, and the address of the residence. Since the program began, more than 25,000 samples have been submitted from more than 15,000 unique locations (there was no limit to the number of times a resident could make a submission).

² http://www.michigan.gov/flintwater/

This corpus of water data represents by far the largest and most representative sample available. However, these data also show that measuring lead contamination is a noisy process, with multiple tests from the same home sometimes yielding highly variable results. There are many sources of noise in this process, but the highest contributor is probably the uncontrolled nature of the experiments. The instructions from the DEQ state that a sample should be collected after the water has not moved through the pipes for at least 6 hours, with no taps opened or toilets flushed. This is a tall order for, say, a home with small children.

Incorporating these records into the parcel information also posed a significant data challenge. All parcel data is connected to a unique parcel identifier, but these test results were linked to self-reported addresses, which were in turn digitized by hand. Due to different naming conventions and human error, many addresses from test results could not be matched to the official addresses in the parcel data.

To get around this, we used the Google Maps API as a fuzzy search engine.³ We fed it addresses from both the parcel data and the water testing data. When matches were found, we were able to connect the water testing data to a parcel, and from there to the other data we had gathered. This data set eventually became the input to a model that predicted how dangerous the water was at each home in Flint (see Section 5).
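The matching described here relied on the Google Maps API, and the snippet below is not that pipeline. It is a small offline stand-in built on Python's standard `difflib` module, meant only to illustrate the idea of matching a self-reported address to an official parcel address. The normalization rules, the 0.85 similarity cutoff, and the sample addresses are all illustrative assumptions.

```python
import difflib
import re

# Illustrative abbreviations only; a real list would be much longer.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd", "drive": "dr",
                 "north": "n", "south": "s", "east": "e", "west": "w"}


def normalize(address):
    """Lowercase, strip punctuation, and abbreviate common street words."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def match_address(reported, parcel_addresses, cutoff=0.85):
    """Return the parcel_id of the closest official address, or None."""
    normalized = {normalize(addr): pid for pid, addr in parcel_addresses.items()}
    hits = difflib.get_close_matches(normalize(reported), list(normalized),
                                     n=1, cutoff=cutoff)
    return normalized[hits[0]] if hits else None


# Hypothetical parcel addresses (not real Flint records).
parcel_addresses = {"P-001": "123 N Example St", "P-002": "45 Sample Ave"}
print(match_address("123 North Exmple Street", parcel_addresses))
print(match_address("9 Unknown Blvd", parcel_addresses))
```

A conservative cutoff trades recall for precision: an address that fails to match is left unlinked rather than being attached to the wrong parcel.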

3.4 Sentinel Sites
In tandem with the residential testing program, the DEQ also ran what is called the “sentinel program,” in which around 400 homes were consistently tested over several months. These homes were chosen to be a representative sample of the city, and data from the program has been made available publicly at http://michigan.gov/flintwater.⁴ However, some have questioned the validity of the sample.

Sentinel sites were visited for water tests starting in 2016, and some homes were tested more often than others. While this data set is smaller than that of the residential testing program, we believe the data to be more reliable since the tests were conducted under more controlled circumstances.

4. RECOVERY PHASE INFORMATION

4.1 Early Replacement Activity
Over the summer of 2016, FAST Start selected 200 homes for service line replacement as part of “Phase One” of their recovery plan. These homes were selected in a similar fashion to the Pilot Phase, but were concentrated in fewer neighborhoods. Replacements were scheduled to begin August 31st.

³ Thanks to a grant and API access from Google.org.

⁴ The sentinel data omit the full addresses of the homes, but our team was able to get access to these records with help from the Michigan Governor’s office. This allowed us to link each home to the many variables describing each parcel of property.

By late September, the initial data from the Phase One replacements was arriving and forced us to re-evaluate estimates of the extent of lead infrastructure. While the city records estimated that only 40% of the homes in Phase One would have lead public lines, in fact 96% (165/171) did. It became clear that previous predictions would need to be adjusted upwards, and that probably more than 20,000 homes in Flint would need replacement service lines. This discovery took place while the federal government was debating the $100 million allocation for Flint’s recovery. In light of this information, our team wrote and published an informal report that was covered in the local media [3, 5]. This led to a more formal technical report to the Mayor, the DEQ, and the EPA in November 2016, which estimated that, given the data collected so far, between 20,600 and 37,100 homes in Flint would need new service lines.
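To make the Phase One arithmetic concrete, the sketch below computes the observed share of lead public lines (165 of 171) together with a 95% Wilson score interval. The interval is a textbook illustration added here. It is not the method behind the 20,600 to 37,100 range in the technical report, and because Phase One homes were selected on risk factors rather than at random, it should not be extrapolated to the whole city.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half


observed = 165 / 171  # share of Phase One homes found to have lead public lines
low, high = wilson_interval(165, 171)
print(f"observed={observed:.3f}, 95% interval=({low:.3f}, {high:.3f})")
```

Even the lower end of this interval sits far above the 40% implied by the city records, which is why the earlier citywide estimates had to be revised upward.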

4.2 Contractor Data Collection
Phase One of FAST Start’s recovery plan was only the first of several phases of replacements of increasing size. Thousands of homes were slated for replacement, and the collection and organization of the data generated by this effort was an important logistics challenge. Initial plans called for paper forms to be digitized later. However, given the previous hurdles caused by human error and handwritten forms, it was clear that a digital-first system connected to storage on the cloud would serve better.

Our team volunteered to facilitate this data collection, and by the fall of 2016 an app was live and being used by contractors on the ground (see Figure 5). The app is written in Python and Flask. For each replacement, contractors and officials fill in information describing the replaced pipe materials. This is then shared to a real-time database that can in turn generate up-to-the-minute maps of the replacement effort for city officials and the FAST Start team. The app remains in use and continuously updates what materials are being found.
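The paper does not describe the app's internals beyond Python and Flask, so the following is only a hypothetical sketch of what one submitted record might look like, using the fields named in the Figure 5 caption (address, parcel ID, coordinates, and public and private materials). The class name, field names, and the list of allowed materials are assumptions, and the sketch uses only the standard library rather than Flask.

```python
import json
from dataclasses import asdict, dataclass

# Assumed vocabulary for illustration; the real form's options are not given.
ALLOWED_MATERIALS = {"copper", "lead", "galvanized", "other", "unknown"}


@dataclass
class ReplacementRecord:
    parcel_id: str
    address: str
    latitude: float
    longitude: float
    public_material: str
    private_material: str

    def __post_init__(self):
        # Normalize and validate the two material fields on creation.
        for field_name in ("public_material", "private_material"):
            value = getattr(self, field_name).strip().lower()
            if value not in ALLOWED_MATERIALS:
                raise ValueError(f"{field_name}: unexpected material {value!r}")
            setattr(self, field_name, value)

    def to_json(self):
        """Serialize the record, e.g. for posting to a central database."""
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical submission (not a real Flint address or parcel).
record = ReplacementRecord("P-001", "1 Example St", 43.0, -83.7, "Lead", "copper")
print(record.to_json())
```

Validating materials at entry time reflects the motivation given above for a digital-first system: it prevents the free-form labels that made the handwritten records hard to interpret.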

Figure 5: Mobile and web app, developed by the authors, to gather data from contractors on-site replacing service lines. Clicking a home’s icon on the map brought contractors to a form, which was automatically pre-populated with the home’s address, unique parcel ID, and latitude-longitude coordinates. The contractors then provided the key information: the material of the public portion and of the private portion of the service line.

4.3 Hydrovac Inspection
By the end of the year, several hundred homes had had their service lines replaced. These homes, however, were mostly concentrated in 3 neighborhoods and cherry-picked to have a very high likelihood of having lead pipes. The broader question of where lead service lines were in the city remained.

A full-scale excavation of a single home’s service line cost the city $2,500. If the service line needed to be replaced, then the cost rose to $5,000. These expenses were prohibitive for exploration of random homes in the city. We therefore emphasized to FAST Start that a cheaper yet sound method was needed.

After consulting with experts and contractors, a solution was presented: hydro-vacuum excavation. A hydro-vacuum truck, or just hydrovac, combines a powerful vacuum with a high-pressure jet of water to quickly dig a small, straight hole in the ground (see Figure 6). The cost to use a hydrovac to inspect a home’s public and private service lines could be as low as $250, without disrupting traffic or needing a resident’s approval. However, a hydrovac is not capable of digging through cement or asphalt, leading to some (20%-25%) unsuccessful excavations.
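A rough worked comparison of the figures above follows. The sketch assumes, for illustration only, that a failed hydrovac attempt still costs the full $250 and that attempts fail independently at the stated 20% to 25% rate. Neither assumption comes from the source.

```python
EXCAVATION_COST = 2500   # full excavation of one service line, no replacement
HYDROVAC_COST = 250      # lowest quoted cost of one hydrovac inspection


def cost_per_successful_inspection(cost_per_attempt, failure_rate):
    """Expected spend per successful inspection when failures are written off."""
    return cost_per_attempt / (1 - failure_rate)


for failure_rate in (0.20, 0.25):
    cost = cost_per_successful_inspection(HYDROVAC_COST, failure_rate)
    ratio = EXCAVATION_COST / cost
    print(f"failure_rate={failure_rate:.0%}: ${cost:.2f} per success, "
          f"{ratio:.1f}x cheaper than excavation")
```

Under these assumptions a successful hydrovac inspection costs roughly $310 to $335, about one-eighth of a full excavation, so the method stays attractive for exploration even after accounting for failed digs.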

Figure 6: Hydro-vacuum (hydrovac) excavation.

The hydrovac inspection was cheap and efficient enough to work as an exploration method. Since its implementation, several hundred homes have been inspected throughout the city. The data generated by these inspections is stored centrally in a similar way to the replacements, via the same Python web app that is used to report replacement activity.

5. DATA APPLICATIONS
Though the primary focus of this paper is the data story in Flint and our collaborations across several organizations to curate all these data, in this section we will briefly describe some of the applications to which the data has been put.

5.1 Google.org Web App
In 2016, Google.org partnered with the University of Michigan in Flint and Ann Arbor, giving a grant to support work aiding Flint [12]. The result of this work was a web app, MyWater-Flint, that reported, among other things, a risk level for each home in Flint.

5.2 Water Lead Level Prediction
The Michigan Data Science Team [6] (MDST) expanded the predictions from the MyWater-Flint app with machine learning analysis and statistical work to compensate for sampling bias in the public water test program. They published their work at the 2016 Data For Good Exchange and the 2017 KDD conference in Nova Scotia, Canada [1, 4].


5.3 Lead Service Line Prediction: ActiveRemediation

The highest-impact work with this data has been the development of an active learning/machine learning hybrid model that efficiently discovers dangerous infrastructure in a city. Using the techniques we developed, our experiments show that the city could save several million dollars by avoiding unnecessary excavations (i.e., digging up a copper pipe that doesn’t need to be replaced). This research generalizes well beyond Flint, as many municipalities grapple with the presence of antiquated and potentially dangerous infrastructure. These results will be presented at the 2018 KDD conference in London [2].

6. CONCLUSION
The Flint Water Crisis is an example of how important reliable, clean, structured data can be to an applied data science problem. At the beginning of the crisis, very little was known about how many dangerous service lines there were or where they were located. Working with many partners across industry, academia, and government, we have put together a rich dataset describing the city and its service lines that continues to be an important resource for the leaders of Flint’s recovery effort.

Acknowledgements
The authors would like to thank the FAST Start team for their admirable efforts and willingness to collaborate with us. This includes Brigadier General (Ret.) Michael McDaniel, Ryan Doyle, Major Nicholas Anderson, and Kyle Baisden. Professors Lutgarde Raskin and Terese Olson, environmental engineering faculty at U-M, provided invaluable scientific support throughout. We are grateful for the work of Professor Martin Kaufman and Troy Rosencrantz at U-M Flint’s GIS Center. We would like to thank Captricity, especially their machine learning team, Michael Zamora, David Shewfelt, and Kayla Pak, for making the data accessible, and Kuang Chen for the generous support. We had support from Mark Allison and his team of U-M Flint students. Rebecca Pettengill was generous with her time and ability to help in the Flint community. We thank UM Professors Marc Zimmerman and Rebecca Cunningham for their encouraging and helpful discussions. Among the many students involved in this work, we would like to recognize the roles of Jonathan Stroud and Chengyu Dai. And this work would not have happened without the expertise and enthusiasm of the students in the Michigan Data Science Team.⁵ The authors gratefully acknowledge the financial support of the Michigan Institute for Data Science (MIDAS), U-M’s Ross School of Business, Google.org, and National Science Foundation CAREER grant IIS 1453304.

⁵ http://midas.umich.edu/mdst/

7. REFERENCES
[1] J. Abernethy, C. Anderson, C. Dai, A. Farahi, L. Nguyen, A. Rauh, E. Schwartz, W. Shen, G. Shi, J. Stroud, et al. Flint Water Crisis: Data-Driven Risk Assessment Via Residential Water Testing. arXiv preprint arXiv:1610.00580, 2016.

[2] J. Abernethy, A. Chojnacki, A. Farahi, E. Schwartz, and J. Webb. ActiveRemediation: The Search for Lead Pipes in Flint, Michigan. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’18, New York, NY, USA, 2018. ACM.

[3] S. Carmody and M. Brush. Flint might have a bigger problem with lead pipes than previously thought. http://michiganradio.org/post/flint-might-have-bigger-problem-lead-pipes-previously-thought, 2016. (Accessed Feb. 16, 2017).

[4] A. Chojnacki, C. Dai, A. Farahi, G. Shi, J. Webb, D. T. Zhang, J. Abernethy, and E. Schwartz. A Data Science Approach to Understanding Residential Water Contamination in Flint. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’17, pages 1407–1416, New York, NY, USA, 2017. ACM.

[5] M. Dolan. Far more Flint homes have lead lines than expected, report shows, 2016. (Accessed Feb. 16, 2017).

[6] A. Farahi and J. Stroud. The Michigan Data Science Team: A Data Science Education Program with Significant Social Impact. 2018.

[7] R. Fonger. Documents show Flint filed false reports about testing for lead in water. https://www.mlive.com/news/flint/index.ssf/2015/11/documents_show_city_filed_fals.html, November 2015. (Accessed July 8, 2018).

[8] R. Fonger. Flint data on lead water lines stored on 45,000 index cards. 2015. (Accessed Feb. 16, 2017).

[9] R. Fonger. Here’s how that toxic lead gets into Flint water. http://www.mlive.com/news/flint/index.ssf/2015/10/see_step_by_step_how_lead_is_g.html, 2015. (Accessed Feb. 16, 2017).

[10] M. Hanna-Attisha, J. LaChance, R. C. Sadler, and A. Champney Schnepp. Elevated blood lead levels in children associated with the Flint drinking water crisis: a spatial analysis of risk and public health response. American Journal of Public Health, 106(2):283–290, 2016.

[11] K. Longley. Emergency manager Michael Brown appointed to lead Flint through second state takeover. https://www.mlive.com/news/flint/index.ssf/2011/11/emergency_manager_michael_brow.html, November 2011. (Accessed July 8, 2018).

[12] M. Miller. Helping for the long term in Flint, Michigan. https://blog.google/outreach-initiatives/google-org/helping-for-long-term-in-flint-michigan/, May 2016. (Accessed July *, 2018).

[13] A. Sandvig, P. Kwan, G. Kirmeyer, B. Maynard, D. Mast, R. R. Trussell, S. Trussell, A. Cantor, and A. Prescott. Contribution of service line and plumbing fixtures to lead and copper rule compliance issues. Environmental Protection Agency, Water Environment Research Foundation, 2008.

[14] L. Smith. After news reports, state asks Flint for lead service line info. http://michiganradio.org/post/after-news-reports-state-asks-flint-lead-service-line-info, 2015. (Accessed Feb. 16, 2017).

[15] M. Torrice. How Lead Ended Up in Flint’s Tap Water. Chem. Eng. News, 94(7):26–29, 2016.