
Diabetes Care Editors’ Expert Forum 2018: Managing Big Data for Diabetes Research and Care
Diabetes Care 2019;42:1136–1146 | https://doi.org/10.2337/dci19-0020

Technological progress in the past half century has greatly increased our ability to collect, store, and transmit vast quantities of information, giving rise to the term “big data.” This term refers to very large data sets that can be analyzed to identify patterns, trends, and associations. In medicine, including diabetes care and research, big data come from three main sources: electronic medical records (EMRs), surveys and registries, and randomized controlled trials (RCTs). These systems have evolved in different ways, each with strengths and limitations. EMRs continuously accumulate information about patients and make it readily accessible but are limited by missing data or data that are not quality assured. Because EMRs vary in structure and management, comparisons of data between health systems may be difficult. Registries and surveys provide data that are consistently collected and representative of broad populations but are limited in scope and may be updated only intermittently. RCT databases excel in the specificity, completeness, and accuracy of their data, but rarely include a fully representative sample of the general population. Also, they are costly to build and seldom maintained after a trial’s end. To consider these issues, and the challenges and opportunities they present, the editors of Diabetes Care convened a group of experts in management of diabetes-related data on 21 June 2018, in conjunction with the American Diabetes Association’s 78th Scientific Sessions in Orlando, FL. This article summarizes the discussion and conclusions of that forum, offering a vision of benefits that might be realized from prospectively designed and unified data-management systems to support the collective needs of clinical, surveillance, and research activities related to diabetes.

Within the span of their professional careers, older physicians and investigators have experienced a revolution in the management of data. In the 1960s, we wrote chart notes and prescriptions by hand, pored over large volumes in libraries, recorded notes from these tomes on 3- by 5-inch notecards, computed means and standard deviations on mechanical calculators, and composed manuscripts for publication on typewriters. Digital technologies ushered in a paradigm change for all of these practices. Large, slow mainframe computers were developed in that decade and, concurrently, defense and academic groups established electronic communication networks. These innovations were followed in the 1970s by smaller, yet more powerful, computers and, in the 1980s, by personal computers and expanded networks. Now we have the Internet, the World Wide Web, and “cloud” storage capabilities, all of which can be accessed anywhere and at any time by individuals with smartphones and other small electronic devices. Our ability to collect, store, analyze, and transmit data has increased remarkably, giving rise to the collective term “big data”: extremely large data sets that can be analyzed to identify patterns, trends, and

1 Division of Endocrinology, Diabetes & Clinical Nutrition, Oregon Health & Science University, Portland, OR
2 Ochsner Diabetes Clinical Research Unit, Frank Riddick Diabetes Institute, Department of Endocrinology, Ochsner Medical Center, New Orleans, LA
3 McMaster University and Hamilton Health Sciences, Hamilton, Ontario, Canada
4 Centers for Disease Control and Prevention, Atlanta, GA
5 Diabetes Trial Unit, Radcliffe Department of Medicine, University of Oxford, Oxford, U.K.
6 The George Washington University Biostatistics Center, Rockville, MD
7 Center for Health Research, Kaiser Permanente Northwest, Portland, OR
8 Brigham and Women’s Hospital and Harvard Medical School, Boston, MA
9 American Diabetes Association, Arlington, VA

Corresponding author: Matthew C. Riddle, [email protected]

Received 20 March 2019 and accepted 20 March 2019

The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.

This article is featured in a podcast available at http://www.diabetesjournals.org/content/diabetes-core-update-podcasts.

© 2019 by the American Diabetes Association. Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. More information is available at http://www.diabetesjournals.org/content/license.

Matthew C. Riddle,1 Lawrence Blonde,2 Hertzel C. Gerstein,3 Edward W. Gregg,4 Rury R. Holman,5 John M. Lachin,6 Gregory A. Nichols,7 Alexander Turchin,8 and William T. Cefalu9

1136 Diabetes Care Volume 42, June 2019

DIABETES CARE EXPERT FORUM


associations. It is now, at least in principle, possible to manage huge quantities of data over decades of time and among regions globally.

These tools for data management have long been recognized as relevant to our efforts to improve health care (1), and certainly this applies to clinical care and research in the field of diabetes. Some notable examples deserve mention. Henry J. Kaiser, a prominent defense contractor, developed health systems for employees at his shipyards in the 1940s. The Kaiser systems applied business principles to health care, including early adoption of electronic medical records (EMRs) (2). The U.S. Department of Veterans Affairs created an electronic database for its geographically dispersed medical systems in the 1980s (3). Population-based medical registries have been established in the U.K. (4) and other countries. In the U.S., the National Health and Nutrition Examination Survey (NHANES) began in 1971, and data collection continues to the present time (5). Likewise, the first large, randomized trials testing interventions for diabetes were facilitated by digital data management. The UK Prospective Diabetes Study (UKPDS) was launched in 1977 (6), and enrollment in the Diabetes Control and Complications Trial (DCCT) began in 1983 (7).

Despite the strong influence of digital technology on these projects, the systems used for clinical care, epidemiologic surveillance, and interventional trials have grown and evolved in quite different ways. Their purposes and designs differ considerably, and collected data are not easily compared among systems. To consider these issues, and both the challenges and opportunities presented by them, Diabetes Care convened a group of experts in the field of diabetes digital technology on 21 June 2018. Here, we report a summary of the discussion and conclusions of that forum. The discussion was divided into the three categories of data-management systems briefly described in Table 1.

ELECTRONIC MEDICAL RECORDS: ENORMOUS POTENTIAL BUT LIMITATIONS

Ancient Egyptian physicians recorded their patients’ medical information on papyri (8), and until recently, handwritten records continued to be the norm. However, as digital technology developed, it was quickly applied to medical records. Some health systems introduced electronic data management in the 1960s, with the focus initially on scheduling and billing. Over time, electronic medical record (EMR) systems have expanded to other aspects of patient care and, in the past decade, a growing number of health care organizations have largely abandoned paper-based records.

The potential of EMRs to make patient-related information more accessible is enormous. In 1863, Florence Nightingale complained that, “In attempting to arrive at the truth, I have applied everywhere for information, but in scarcely an instance have I been able to obtain hospital records fit for any purposes of comparison” (9). Recent studies have demonstrated that use of EMRs can improve preventive health services, decrease medication errors, and facilitate population health management (10–13). In the case of diabetes, analyses of data from EMRs have been shown, in appropriate settings, to help improve success in controlling glycemia, lipids, and blood pressure and to reduce the frequency of emergency department visits and nonelective hospitalizations (14–16).

At the organizational level, review of EMR data allows for the assessment of clinical visit scheduling and reimbursement, attendance and wait times, medication prescription and dispensation, and tracking of variously defined measures of quality of care. For providers, EMRs allow immediate access to patients’ clinical histories, physical and laboratory findings, and other care-related information. Providers potentially can access clinical information independent of where they or their patients may be at a given time. The importance of this ability was illustrated by the experience after Hurricane Katrina struck New Orleans and nearby areas in 2005. Clinicians who had EMR access could provide information and advice and fill prescriptions for their patients who were widely dispersed across the country, whereas those without EMRs lost contact with patients and permanently lost their paper records to storm and flood damage. Electronic records allow many different users to access medical information simultaneously and eliminate the costs of creating and delivering hard copies of records to each clinician. Virtually instantaneous remote access by on-call clinicians, including those in emergency departments and distant institutions, can assist in timely provision of care. For the care of those with diabetes, use of EMRs facilitates tracking of relevant clinical data over time, including weight, blood pressure, A1C, lipid measurements, and medications for control of various risk factors. Because a team of providers (including physicians and advanced practice providers, diabetes educators, nutritionists, and others) is typically involved in care for people with diabetes, EMRs assist in coordinating multidisciplinary care.

There are also potential limitations to the use of EMRs. Both isolated and systematic unintended consequences have been reported (17–21). Workloads of clinical providers may be increased and their morale impaired by the need to

Table 1. Typical features of current systems for managing medical data

Feature | EMRs | Public surveys and registries | RCTs
Financial support | Health system | Government | Government, industry, or voluntary health organization
Governance | System administrators | Government employees | Academic partnership with sponsor
Population included | Enrolled in public or commercial system | National or regional | Selected for study, may be international
Time of data collection | Continuous | Periodic | Specific interval

EMRs, electronic medical records; RCTs, randomized controlled trials.


enter orders for tests, prescriptions, and consultations, tasks previously performed by other health care personnel. Because of the need to review prior encounters and enter current data in the examining room, both clinicians and patients have sometimes complained about EMRs interfering with communication during visits (22,23). Although much energy is devoted to optimizing the use of EMRs in managing the logistical aspects of care (e.g., scheduling, billing, and process-based quality assessment), medical information needed for personalized management of complex conditions such as diabetes may be less easily collected, recorded, and visualized. Whereas the consistency and accuracy of entries concerning financial or operational matters are routinely checked by specialized personnel within health systems, similar quality control is rarely attempted for clinical entries. The result is variability and inconsistency in capturing even the most crucial medical information in many cases. A notable example is the difficulty of tracking insulin doses prescribed, as well as those actually taken, especially when patients are actively self-managing their glycemic control. Another is the lack of consistency in distinguishing between type 1 (autoimmune-mediated) diabetes, type 2 diabetes, and less common forms of diabetes in EMRs.

Electronic record systems come in a bewildering variety of configurations, and they frequently evolve over time. Therefore, careful implementation procedures, including user training, are crucial to their success. Although broad principles of EMR design are well established (24,25), they are not universally followed. As a result, many systems suffer from discrepancies between software design, user needs, and clinical workflow, sometimes leading to negative perceptions of their value and reliability (Fig. 1) (26–29). Alignment of EMRs with the activities and concerns of medical providers can be improved, but in many cases this is not occurring. Business-related aspects of EMR use can also pose barriers. For example, EMR system vendors may have contractual hold-harmless clauses that limit their accountability for harm or inconvenience related to defects and malfunction. It may be unclear who is responsible for maintenance of services, and difficulties may not be reliably reported. Governmental oversight of the quality of EMR products and services is limited (30–33).

There is considerable potential for the use of data collected routinely in EMRs for epidemiological surveillance or prospectively designed medical research (34,35). Some large health systems with long-standing databases have published useful epidemiologic reports of their experience. Notable examples relevant to diabetes include early reports of clinical inertia in advancing pharmacotherapy of diabetes (36,37) and clinical features associated with hypoglycemia in clinical practice (38). However, there are limitations to such use of data collected in EMRs under current circumstances. These include missing or unreliable data, collected without consistent definitions or quality control, and uncertain generalizability when data originate from a single institution. These problems could be addressed and some attempts have been made, although the success of such efforts depends on allocation of additional resources and support by health system administrators (39–42).

SURVEYS AND REGISTRIES: MONITORING POPULATION-WIDE TRENDS

Public health surveillance for chronic diseases has also been greatly facilitated by electronic data-management systems. Surveillance can be defined as quantitative monitoring of population-level incidence (risk) and prevalence (frequency) of disease and of provision of preventive care, with attention to variations according to personal characteristics, time, and location (43,44). Periodic surveys can identify emerging risk factors, new health problems and comorbid conditions, gaps in care, and adverse events of treatment. Surveillance aims to identify subpopulations that are most at risk for a given disease or most likely to benefit from intervention. Data grouped according to specific characteristics of individuals may be described as a registry, which can be systematically updated to provide targeted surveillance of individuals sharing this characteristic.

Such information provides timely guidance for short-term decisions by policy makers, health plan administrators, clinicians, and the public. It also permits more in-depth etiological analyses, cost-effectiveness determinations, and health impact modeling, all relevant to long-term decisions. When combined with related disciplines (e.g., clinical epidemiology, health services and policy research, health economics, and program management evaluations), population surveillance forms the basis for public health strategies and resource allocation.

Population-level surveillance for diabetes is undergoing a rapid transformation due to new health-related data

Figure 1. A conceptual model of differences between how electronic medical records are designed (Designer model), functionality desired by the users (User model), and how they are actually utilized (Activity model). Reprinted from Zhang J, Walji MF. TURF: toward a unified framework of EHR usability. J Biomed Inform 2011;44:1056–1067, with permission from Elsevier (29).


sources and also computing and analytic approaches to large data sets (45). Diabetes surveillance in countries such as the U.S., Canada, Europe, Australia, Israel, and some Asian countries originated mainly from public survey– and direct registry–based systems. In some settings, it is now extending to include health system–based electronic registries linking EMR data, hospital and ambulatory services, laboratory and pharmacy data, and, most recently, various non-health-related data sources (46,47).

Surveillance Through Public Systems
Nationally representative surveys in the U.S. that include assessment of diabetes prevalence have existed for more than 50 years (Fig. 2), beginning with the National Health Survey in the 1960s. Next came the National Health Interview Survey (NHIS), the first National Health and Nutrition Examination Survey (NHANES I) in the 1970s, NHANES II in the 1970s and 1980s, NHANES III in 1988–1994, and continuous NHANES surveys from 1999 to the present (48–51). These are coordinated by the Centers for Disease Control and Prevention’s National Center for Health Statistics.

A suite of other health care surveys, including the National Ambulatory Medical Care Survey (52), Medical Expenditure Panel Surveys from health care settings (53), and the National Hospital Discharge Survey (later supplanted by the National Inpatient Sample [NIS] [54]), collects data at the level of hospitals rather than individuals. Since 1993 the Behavioral Risk Factor Surveillance System (BRFSS) has provided population-based surveys conducted at the state level (46). These surveys are complemented by registries for selected conditions such as the United States Renal Data System for end-stage renal disease (55), or for special problems and populations (e.g., the prevalence of type 1 vs. type 2 diabetes in children in the SEARCH for Diabetes in Youth study) (56). Similar evolution of surveillance has occurred in other countries as well. For example, the National Diabetes Audit in the U.K. is one of the largest annual clinical audits in the world. It integrates data from both primary and secondary care sources, with providers legally required to supply the data from their clinical practices (57).

Most of these surveys are designed to obtain repeated cross-sectional, complex samples with analytic weighting so that the estimates derived are representative of the noninstitutionalized population, including people without health insurance. NHANES is the most comprehensive survey in the U.S., consisting of a questionnaire, physical exam, and laboratory examinations every 2 years. It is the primary source for tracking total prevalence of diabetes, prediabetes, and undiagnosed diabetes, as well as selected risk factors and complications, including peripheral arterial disease, retinopathy, and chronic kidney disease (58–61). NHIS includes the single largest sample of the U.S. population and is the primary source of self-reported incidence of diagnosed diabetes. It serves as the key platform for supplemental surveys of issues ranging from health care access to preventive care (62,63). NIS is the main source of data for hospitalizations and procedures and is used to estimate and track the incidence of cardiovascular disease, stroke, and amputation (64). BRFSS has been crucial in providing state-level and, with assistance of small-area statistical modeling, county-level prevalence and incidence rates of diabetes and prevalence of obesity and physical inactivity (65). Several surveys, including NHIS and NHANES, also have linkage to the National Death Index. This is an important association that allows mortality rates to be estimated for consecutive cohorts (66). Collectively, the publicly available surveys permit researchers and policy makers to monitor a broad range of metrics such as behavioral and biochemical risk factors, preventive behaviors, receipt of preventive care, risk factor management, diabetes-related complications, disability, and mortality (Fig. 2) (43).
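
The analytic weighting mentioned above can be illustrated with a minimal sketch. The respondents and weights below are hypothetical, not taken from NHANES or any other survey, and the calculation covers only the point estimate; real complex-sample surveys also need design-based variance estimation, which is omitted here.

```python
# Hypothetical respondents: (has_diagnosed_diabetes, sampling_weight).
# Each weight is the number of people in the target population that the
# respondent is assumed to represent.
respondents = [
    (1, 1000.0),
    (0, 3000.0),
    (0, 3000.0),
    (1, 1000.0),
    (0, 2000.0),
]


def unweighted_prevalence(rows):
    """Share of respondents with the condition, ignoring the design."""
    return sum(y for y, _ in rows) / len(rows)


def weighted_prevalence(rows):
    """Weighted share: sum(w * y) / sum(w)."""
    total_weight = sum(w for _, w in rows)
    return sum(y * w for y, w in rows) / total_weight


crude = unweighted_prevalence(respondents)
weighted = weighted_prevalence(respondents)
print(f"unweighted: {crude:.2f}, weighted: {weighted:.2f}")
# prints "unweighted: 0.40, weighted: 0.20"
```

In this made-up sample, people with diabetes were oversampled, so the unweighted share overstates prevalence and the weights correct for it.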

However, these public surveys have some fundamental limitations. First, they are largely cross-sectional data sets. Apart from the mortality linkage, the lack of longitudinal data limits assessment of changes in risk and care and the ability to examine the effectiveness of treatments or the etiology of conditions in individuals. Second, the ability to examine geographic variation in risk, care, or outcomes is limited in most surveys. Thus, their utility for directly targeting interventions to areas of greatest need in regions below the national level is impaired. While BRFSS has been useful for estimation of state- and county-level prevalence of diabetes, obesity, and physical inactivity, there are limitations of its design and data collection that allow incidence rates to be tracked reliably only at the national level (46,62,63,65). Third, although they are designed and weighted to be representative of noninstitutionalized populations, steadily declining response rates are a growing threat to validity. Finally, despite improvements in the timeliness of collection and disclosure of data, periodic surveys do not always allow real-time assessment of emerging problems. Also, incorporating new elements into the surveys requires administrative review and approval, which can be a lengthy process.

Surveillance Within Health Systems
As noted earlier, integrated health systems in the U.S. and elsewhere have used EMRs and other systematically collected data for surveillance, development of

Figure 2. Overview of diabetes-related metrics monitored in the U.S. via publicly available survey data throughout the natural history of the disease. Adapted from Desai et al. (43).


registries, and evaluation of care within their populations. Direct clinical data in such systems can be linked to pharmacy and laboratory information, allowing broader assessment of processes and outcomes (67,68). This experience has set the stage for linkage of previously existing public surveys and registries to data derived from direct patient contact within private systems. This trend has been paralleled by conceptually similar population-wide registries in countries with single-payer health systems, including Sweden, Finland, Denmark, and the U.K. (47,57,69–71). Combining these registries has the advantage of allowing estimation of levels of care, risk-factor management, and rates of outcomes, taking a broader perspective than is possible within a single database. Such analyses can lead to reevaluation of medical practice methods, medication use, and the cost-effectiveness of specific interventions within each system.

Development of EMR-based registries by privately managed health systems has also provided an opportunity for large multi–health system aggregators such as IBM MarketScan Research, DARTNet, Optum, the Centers for Medicare & Medicaid Services, and others. Their databases contain information on billing claims for various services, pharmacy records, and laboratory data on large segments of the population. This information can be linked to other factors at the health plan level or to external information on geographic location and socioeconomic patterns. Thus, they can broaden the population included beyond that of individual health systems. However, these aggregating systems require substantial financial resources and can have other limitations. Although individual-level longitudinal analyses are possible with such systems, they can be complicated by the flow of individuals in and out of health plans, requiring careful distinction between cross-sections and cohorts. Aggregated health-system data also may lack routinely collected information on health behaviors and any information on the historically and geographically variable proportion of the U.S. population that is uninsured. Finally, as with databases within individual health systems, the completeness and reliability of aggregated data sets vary widely and pose significant problems of interpretation.

CLINICAL TRIAL DATABASES: OPTIMIZED COLLECTION AND ANALYSIS FOR SPECIFIC QUESTIONS

Complete and accurate quantitative data are required for success in all disciplines involved in scientific research. Clinical research can generally be classified as either observational or experimental. Observational research relies on data generated by people, clinics, institutions, health systems, or devices that are obtained, often passively, from sources such as an EMR system. Any variety of exposures, differences, or changes can be analyzed to identify relationships between the topic of investigation and various outcomes. Examples of topics for study include the uptake of a new drug, a change in health policy, an increase or decrease in access to health care providers, genetic characteristics, or increasing duration of disease or surveillance. Clinical assessments that can be related to such topics include weight or blood pressure, laboratory tests (e.g., A1C), health system utilization (e.g., emergency room visits), symptomatic events (e.g., hypoglycemia), and medical outcomes (e.g., myocardial infarction). All kinds of information collected during routine medical care might be used for observational research, and data are increasingly stored in easily accessible digital forms to facilitate their analysis.

Although observational research can identify relationships, whether any relationship is caused by the exposure or by something else linked to the exposure (i.e., a confounding variable) is more difficult to discern. Although sophisticated statistical techniques can account for potential confounding variables, they can only account for those that are both known to be possible confounders and available in the database. Because any relationship may reflect the effect of an unknown number of confounding factors, both measured and unmeasured, a causal effect suggested from observational analysis should be viewed as hypothesis-generating rather than definitive evidence. The only exception would be relationships that are extremely strong, such as the effect of smoking on the risk of lung cancer or the ability of insulin to prevent death in patients with type 1 diabetes.
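
A small worked example may help make the point about measured confounders concrete. The counts below are invented for illustration, with disease duration standing in as a hypothetical confounder; the adjustment uses the standard Mantel-Haenszel pooled risk ratio, which is one of several possible techniques and is not attributed to the authors. Note that this kind of adjustment works only because the confounder is recorded in the data set.

```python
# Hypothetical counts per stratum of a measured confounder (disease duration):
# (exposed_events, exposed_total, unexposed_events, unexposed_total)
strata = {
    "short duration": (10, 100, 40, 400),
    "long duration": (160, 400, 40, 100),
}


def crude_risk_ratio(strata):
    """Risk ratio computed after collapsing all strata together."""
    a = sum(s[0] for s in strata.values())
    n1 = sum(s[1] for s in strata.values())
    c = sum(s[2] for s in strata.values())
    n0 = sum(s[3] for s in strata.values())
    return (a / n1) / (c / n0)


def mantel_haenszel_risk_ratio(strata):
    """Pooled risk ratio: sum(a*n0/N) / sum(c*n1/N) over strata."""
    numerator = 0.0
    denominator = 0.0
    for a, n1, c, n0 in strata.values():
        total = n1 + n0
        numerator += a * n0 / total
        denominator += c * n1 / total
    return numerator / denominator


crude_rr = crude_risk_ratio(strata)
adjusted_rr = mantel_haenszel_risk_ratio(strata)
print(f"crude RR: {crude_rr:.3f}, adjusted RR: {adjusted_rr:.3f}")
```

In these invented data the exposure has no effect within either stratum (risk is 10% in both groups with short duration and 40% in both groups with long duration), yet the crude comparison suggests roughly a doubling of risk because exposed people more often have long-standing disease. An unmeasured confounder would produce the same distortion with no way to correct it.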

Observational studies are no substitute for randomized controlled trials (RCTs) in establishing efficacy (72). The RCT is the gold standard for detecting modest but clinically important effects of a treatment or intervention. Indeed, a large number of RCTs conducted in the past 25 years (73) have provided crucial insights into the management of diabetes and have identified novel life-saving therapies. In an RCT, the administration of the exposure versus the comparator is randomly determined for two or more groups, and the effect of one versus alternate exposures (comparators) is then measured. The randomization process reduces confounding by constructing treatment groups that are, on average, expected to be similar except for the extent of the exposure being studied. Thus, any difference in outcomes is attributable to the exposure and not something else, with a level of confidence that depends on the rigor with which the study is designed and conducted. Many different exposures can be tested in RCTs, including drugs, devices, monitoring procedures (e.g., continuous glucose monitoring [CGM]), treatment algorithms, and administrative policies. Whereas the unit of randomization is typically an individual, groups or clusters of individuals can also be randomly assigned, with different clusters being randomly allocated to different exposures.
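
As a minimal sketch of the two units of randomization described above, the following Python example assigns either individuals or whole clusters to an exposure or its comparator at random, and then checks the balance of a baseline covariate. The function names, arm labels, fixed seeds, and simulated ages are illustrative assumptions rather than procedures taken from any trial discussed here; real trials typically use more elaborate schemes, such as blocked or stratified allocation.

```python
import random
from statistics import mean

ARMS = ("exposure", "comparator")  # hypothetical arm labels


def randomize_individuals(participant_ids, seed=0):
    """Simple randomization: each participant is independently assigned to an arm."""
    rng = random.Random(seed)
    return {pid: rng.choice(ARMS) for pid in participant_ids}


def randomize_clusters(cluster_of, seed=0):
    """Cluster randomization: each cluster (e.g., a clinic) is assigned to an arm,
    and every participant inherits the arm of that cluster."""
    rng = random.Random(seed)
    arm_of_cluster = {c: rng.choice(ARMS) for c in sorted(set(cluster_of.values()))}
    return {pid: arm_of_cluster[c] for pid, c in cluster_of.items()}


def mean_by_arm(allocation, covariate):
    """Mean of a baseline covariate within each arm, as a check on balance."""
    return {
        arm: mean(covariate[pid] for pid, a in allocation.items() if a == arm)
        for arm in ARMS
    }


if __name__ == "__main__":
    rng = random.Random(1)
    ages = {pid: rng.gauss(60, 10) for pid in range(10_000)}  # simulated, not real data
    allocation = randomize_individuals(ages)
    print(mean_by_arm(allocation, ages))  # the two means should be close
```

With a large enough sample, the mean age in the two arms differs only by chance, which illustrates why randomized groups are expected to be similar on average for both measured and unmeasured characteristics.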

Although the methodological strengths of randomization are profound, randomization alone is not sufficient. Other requirements must be met (Fig. 3). First, a clearly formulated and ethical research question or hypothesis must be articulated as part of a carefully designed protocol. This should be reviewed by impartial experts to ensure that the question is important and that the research plan is ethical and feasible. Second, sufficient numbers of participants must be enrolled within a short enough period of time to ensure that the allocated groups are well matched and that the trial will be finished quickly enough to be relevant. Third, systems should be in place to ensure that people who are allocated to the exposure being tested actually adhere to or receive it. The lower the level of adherence, the smaller the difference between the allocated groups will be, so that a trial with low adherence may fail to detect very important effects of the exposure. Fourth, follow-up of study outcomes must be as close to 100% as possible to avoid the possibility that those who are not followed in each of the treatment groups may differ in important ways from those who are followed. Otherwise, these differences may confound the randomization and limit the confidence of conclusions about the effect of the exposure on the outcome. As illustrated in Table 2, most large-scale, randomized cardiovascular outcomes trials in diabetes have achieved follow-up rates for vital status approaching 100% (74–86). Fifth, systems need to be in place to ensure that outcomes of interest occurring during follow-up are reliably collected and analyzed. This is accomplished for very high percentages of participants in trials such as those shown in Table 2. Finally, the results should be analyzed according to the originally allocated exposure (i.e., through an intention-to-treat approach) regardless of adherence to the exposure.
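
The interplay between adherence and intention-to-treat analysis can be made concrete with a small sketch. It assumes, purely for illustration, a continuous outcome and a simple dilution model in which the full effect occurs only in participants who actually receive the exposure; the function names and record fields are hypothetical.

```python
from statistics import mean


def expected_itt_difference(true_effect, adherence, crossover=0.0):
    """Expected between-group difference under a simple dilution model.

    Assumes (hypothetically) that the full effect occurs only in people who
    actually receive the exposure: a fraction `adherence` of the exposure
    arm and a fraction `crossover` of the comparator arm.
    """
    return true_effect * (adherence - crossover)


def itt_difference(records):
    """Intention-to-treat estimate: compare mean outcomes by ALLOCATED arm,
    ignoring whether the allocated exposure was actually received."""
    exposure = [r["outcome"] for r in records if r["allocated"] == "exposure"]
    comparator = [r["outcome"] for r in records if r["allocated"] == "comparator"]
    return mean(exposure) - mean(comparator)


if __name__ == "__main__":
    # Toy records: the exposure lowers the outcome by 2 units, but only in adherers.
    records = [
        {"allocated": "exposure", "received": True, "outcome": 6.0},
        {"allocated": "exposure", "received": False, "outcome": 8.0},
        {"allocated": "comparator", "received": False, "outcome": 8.0},
        {"allocated": "comparator", "received": False, "outcome": 8.0},
    ]
    print(itt_difference(records))             # observed difference by allocation
    print(expected_itt_difference(-2.0, 0.5))  # dilution model with 50% adherence
```

In this toy example, half of the exposure arm adheres, so the difference seen by allocation is half of the true effect. This is the sense in which low adherence can hide an important effect even when the analysis itself is done correctly.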

RCTs have additional strengths. First, if two or more interventions work in very different ways, they can be tested at the same time in a large enough RCT. For example, in a 2-by-2 factorial design all participants in the study population are randomized to one intervention being tested or to its comparator and also to another intervention being tested or to its comparator. Such designs have been extremely successful and have seldom been undermined by unanticipated interactions between the therapies. Second, the database from an RCT can also be used for observational research. Indeed, any analyses done on the data that are not related to the comparison of the randomized treatment groups are essentially observational analyses through which associations can be explored. Third, substudies that rely upon collection of additional data (e.g., blood tests, images, or physiological measures) can also be built into trials to help determine the mechanism of action of the exposure (e.g., the effects of a new drug).
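
The 2-by-2 factorial allocation described above can be sketched in a few lines. The labels "A" and "B", the function names, and the fixed seed are hypothetical; the sketch shows only the allocation and the way each intervention is compared using all participants, not any particular trial's procedure.

```python
import random
from collections import Counter


def factorial_allocation(participant_ids, seed=0):
    """Randomize each participant twice, independently: once for intervention A
    versus its comparator and once for intervention B versus its comparator."""
    rng = random.Random(seed)
    return {
        pid: (rng.choice(("A", "comparator A")), rng.choice(("B", "comparator B")))
        for pid in participant_ids
    }


def cell_counts(allocation):
    """Number of participants in each of the four cells of the 2-by-2 design."""
    return Counter(allocation.values())


def margin(allocation, factor_index, level):
    """All participants allocated to `level` of one factor, pooled across the other."""
    return [pid for pid, arms in allocation.items() if arms[factor_index] == level]


if __name__ == "__main__":
    allocation = factorial_allocation(range(1000))
    print(cell_counts(allocation))
    print(len(margin(allocation, 0, "A")), len(margin(allocation, 0, "comparator A")))
```

Because every participant contributes to both comparisons, each intervention is evaluated in the full study population, provided that the two interventions do not interact.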

Figure 3: Six crucial components that ensure the robustness of a randomized controlled trial.

Table 2: Ascertainment of vital status in recent diabetes outcomes trials

Trial acronym (drug studied)                  Patients randomized (n)   Median follow-up (years)   Vital status known (%)
ORIGIN (glargine & omega-3 fatty acid) (74)   12,537                    6.2                        99.0
SAVOR-TIMI 53 (saxagliptin) (75)              16,492                    2.1                        99.1
EXAMINE (alogliptin) (76)                     5,308                     1.5                        99.5
TECOS (sitagliptin) (77)                      14,671                    3.0                        97.5
EMPA-REG OUTCOME (empagliflozin) (78)         7,020                     3.1                        99.2
ELIXA (lixisenatide) (79)                     6,068                     2.1                        99.0
LEADER (liraglutide) (80)                     9,340                     3.8                        96.8
SUSTAIN-6 (semaglutide) (81)                  3,297                     2.1                        99.6
CANVAS Program (canagliflozin) (82)           10,142                    2.4                        99.6
EXSCEL (exenatide) (83)                       14,752                    3.2                        98.8
ACE (acarbose) (84)                           6,522                     5.0                        94.4
HARMONY Outcomes (albiglutide) (85)           9,463                     1.6                        99.4
DECLARE-TIMI 58 (dapagliflozin) (86)          17,160                    4.2                        99.5

ACE, Acarbose Cardiovascular Evaluation; CANVAS, Canagliflozin Cardiovascular Assessment Study; DECLARE-TIMI 58, Dapagliflozin Effect on Cardiovascular Events; ELIXA, Evaluation of Lixisenatide in Acute Coronary Syndrome; EMPA-REG OUTCOME, BI 10773 (Empagliflozin) Cardiovascular Outcome Event Trial in Type 2 Diabetes Mellitus Patients; EXAMINE, Examination of Cardiovascular Outcomes with Alogliptin versus Standard of Care; EXSCEL, Exenatide Study of Cardiovascular Event Lowering; HARMONY Outcomes, A Long Term, Randomized, Double-blind, Placebo-Controlled Study to Determine the Effect of Albiglutide, When Added to Standard Blood Glucose Lowering Therapies, on Major Cardiovascular Events in Patients With Type 2 Diabetes Mellitus; LEADER, Liraglutide Effect and Action in Diabetes: Evaluation of Cardiovascular Outcome Results; ORIGIN, Outcome Reduction With Initial Glargine Intervention; SAVOR-TIMI 53, Saxagliptin Assessment of Vascular Outcomes Recorded in Patients with Diabetes Mellitus–Thrombolysis in Myocardial Infarction 53; TECOS, Trial Evaluating Cardiovascular Outcomes With Sitagliptin; SUSTAIN-6, Trial to Evaluate Cardiovascular and Other Long-term Outcomes With Semaglutide in Subjects With Type 2 Diabetes.

Because accuracy and completeness of data are centrally important to RCTs, procedures have been devised to ensure that the data collected are of the highest quality. Digital technology has been essential to this effort. Most RCTs have been done by specialized groups within specialized infrastructures erected to monitor and support the work of each trial (i.e., a coordinating center, a steering committee, multiple clinical sites, an event adjudication committee, and a data and safety monitoring board). Such infrastructures are complex and expensive, especially when there are many study participants who are geographically dispersed. The personnel and procedures assembled for large RCTs are capable of collecting and rapidly processing massive amounts of meticulously collected data. For example, the ORIGIN (Outcome Reduction With Initial Glargine Intervention) trial gathered data for more than 6 years concerning more than 12,500 participants in 40 countries (74). Each trial’s infrastructure is usually disbanded on completion of the trial.

Databases of well-conducted RCTs contain the highest-quality data, both to answer the primary question that prompted the study and to generate additional hypotheses from observational analyses. There are, however, some limitations to these data sets. They are usually focused on a very specific question that is addressed by measuring a specific primary outcome of interest and may not include observations or measurements that are relevant to some other questions. Furthermore, the population studied, which has been selected for being both able and willing to participate, may not be completely representative of the general population. Efforts are commonly made to analyze the data in such a way as to assess generalizability of the conclusions of the RCT, but questions often remain (87). The rapid growth of both EMRs and large public and private registries has the potential to address these problems. Specifically, such databases may facilitate enrollment of suitable study populations for randomized trials and also assist in tracking various measures of outcome (88).

OPPORTUNITIES FOR IMPROVING MEDICAL DATA MANAGEMENT

Although there are substantial variations between databases within each category, those derived from EMRs, those created from surveys and registries, and those created for RCTs or prospective observational trials generally differ in their strengths and limitations. Some of their characteristics are summarized in Table 3.

Information in EMRs deals with large groups of individual patients, includes a comprehensive range of clinical material, is collected continuously, and is intended to be stored indefinitely. However, except in the case of certain countrywide health systems that provide care for all citizens, the population included in an EMR may not be fully representative of the general population. Interpretation of observations is generally limited by lack of consistency and accuracy in data entries. One important reason for this difficulty is that data are typically entered by many individual providers with little support or monitoring by administrative personnel. Further, the structure of EMRs differs widely among health systems and, thus, pooling or comparison of data from different systems is difficult.

Registries or surveys can include data that are consistently collected and often representative of a whole population. However, surveillance data sets may not be reliable in all cases (e.g., when collected by self-report), usually focus on one or a few categories of data, and may be updated only intermittently. When repeated cross-sectional information is collected, longitudinal observation of individuals may not be possible.

Databases created specifically for large clinical trials excel in the specificity, completeness, and accuracy of the data collected. However, they rarely include a fully representative sample of the patient population and may contain a relatively limited range of observations. They are also costly to build and maintain, and often are not maintained after completion of the trial.

Table 3: Attributes of current and potential future systems for managing medical data

Attributes       EMRs   Public surveys and registries   RCTs   Retrospectively aggregated data sets   Prospectively integrated data systems
Representative   ++     +++                             +      ++                                     +++
Consistent       +      +++                             +++    ++                                     +++
Accurate         +      +                               +++    +                                      +++
Comprehensive    +++    +                               +      +                                      +++
Up-to-date       +++    +                               +++    +                                      +++

+, ++, or +++ refer to the relative strength of each attribute, with +++ denoting the strongest. EMRs, electronic medical records; RCTs, randomized controlled trials.

An Evolving Approach: Distributed Data Networks

To some degree, the limitations of these several models for collecting and managing data can be overcome by development of comprehensive, distributed, multicenter data networks. Such “hub-and-spoke” models for linking individual databases differ from traditional multicenter studies or surveillance systems (in which all data are held centrally) in that individual data are maintained at their source, with analyses conducted peripherally using centrally coordinated common data models and analytic routines. Aggregate results standardized by the common data models and pre-established covariate adjustment can then be returned to the coordinating center for final analyses. Typically, such models have been used for comparative effectiveness studies of pharmacological options, bariatric surgery, and adverse events of drugs, but less often for surveillance of variation in risk, care, or outcomes of diabetes or for postmarketing drug surveillance programs (89–91). There are also opportunities for wider integration across care providers, including linkage of clinic EMRs with pharmacies to allow better monitoring of therapy adherence, and for automated acquisition of data from personal devices such as glucose monitors, insulin pumps and pens, exercise trackers, and health and fitness apps. One early example of regional EMR linkage for assessment of diabetes was the DARTS (Diabetes Audit and Research in Tayside, Scotland) study, which linked EMRs within a Scottish community to create a diabetes registry (92). More recently, groups of clinical investigators, such as the Blood Pressure Lowering Treatment Trialists’ Collaboration and the Cholesterol Treatment Trialists’ Collaboration, have formed for the purpose of aggregating individual data from large trials (93,94). The possibility of expanding the range of data collection and analysis through such networks is obvious, but the quality of data can still be limited by inconsistencies and inaccuracies in clinical observations and data entry, and even aggregated data may not fully represent the general population.
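
The hub-and-spoke arrangement described above, in which patient-level data stay at their source and only aggregates are returned, can be sketched as follows. This is a minimal illustration under stated assumptions: the class and function names and the `a1c_percent` field are hypothetical, and a real network would add covariate adjustment, privacy safeguards, and a full common data model.

```python
from dataclasses import dataclass


@dataclass
class SiteAggregate:
    """Only these summary numbers leave a site; patient-level rows stay local."""
    site: str
    n: int
    total_a1c: float


def local_analysis(site, rows):
    """Run at each site ("spoke") against rows already in the common data model.
    The field name `a1c_percent` is a hypothetical common-data-model column."""
    values = [r["a1c_percent"] for r in rows if r.get("a1c_percent") is not None]
    return SiteAggregate(site=site, n=len(values), total_a1c=sum(values))


def pooled_mean_a1c(aggregates):
    """Run at the coordinating center ("hub") on the returned aggregates."""
    total_n = sum(a.n for a in aggregates)
    if total_n == 0:
        raise ValueError("no A1C values reported by any site")
    return sum(a.total_a1c for a in aggregates) / total_n


if __name__ == "__main__":
    site_a = local_analysis("A", [{"a1c_percent": 7.0}, {"a1c_percent": 8.0}])
    site_b = local_analysis("B", [{"a1c_percent": 6.0}])
    print(pooled_mean_a1c([site_a, site_b]))
```

The same analytic routine runs unchanged at every site because each site has already translated its records into the shared column names, which is the practical role of a common data model.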

A Potential Solution: Unified Data-Management Systems

In an ideal world, prospectively designed and unified data-management systems could support clinical, surveillance, and research activities all together in a way that circumvents many of the limitations of current systems while drawing on the strengths of each. An integrated system would, in theory, allow substantial savings in costs of design, development, operation, and maintenance, and these costs could be shared among multiple stakeholders.

Specifically, EMRs could be improved by incorporating the more stringent monitoring of data integrity, including automated validation of quantitative clinical and laboratory entries, which is typically used in trial-management systems. Population-wide surveillance of drug safety, rates of various adverse outcomes, regional differences in patterns of care, and other public health concerns could be based on improved and structured data collection during routine health care. Additionally, testing of new therapeutic agents, devices, or regimens could be embedded within existing health systems, using prospectively designed protocols for randomized or nonrandomized treatment choices and assessment of outcomes. Such an approach would facilitate enrollment of more representative and larger patient cohorts at lower cost. Patient follow-up and therapeutic adherence likely would be better if research studies were performed in a familiar usual-care setting.

Additional benefits of conducting RCTs or prospective observational studies using an EMR-based system would include the ability to follow patients passively long after the more structured initial part of the study. Long-term (even lifetime) individual follow-up could more fully capture the risks and benefits of the interventions evaluated and identify potential “legacy effects” persisting after completion of an active intervention.

Beyond the opportunities related to individual trials and data cohorts, widespread implementation of unified data-management systems would facilitate routine analysis of pooled individual patient data, allowing greater representation of the whole population than is possible with current meta-analytic techniques. Access to such rich and long-term phenotypic information might also facilitate use of biobank and genetic data to identify new biomarkers and their relationships to disease outcomes and to existing and future therapies. Such unified big-data systems could provide a unique platform for testing, validating, and refining new analytic techniques such as artificial intelligence technologies, potentially leading to new diagnostic (95) and interventional tactics.
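
As one illustration of the automated validation of quantitative clinical and laboratory entries mentioned earlier in this section, the sketch below flags entries that are missing, non-numeric, or outside a plausibility window at the time of data entry. The field names and numeric limits are assumptions chosen for the example; they are not clinical reference ranges and are not drawn from any particular trial-management system.

```python
# Illustrative plausibility limits only; these are assumptions for the sketch,
# not clinical reference ranges or limits used by any particular system.
PLAUSIBLE_RANGES = {
    "a1c_percent": (3.0, 20.0),
    "systolic_bp_mmhg": (50.0, 300.0),
    "weight_kg": (2.0, 400.0),
}


def validate_entry(field, value):
    """Return a list of problems found in one quantitative entry (empty if none)."""
    if field not in PLAUSIBLE_RANGES:
        return [f"unknown field: {field}"]
    try:
        number = float(value)
    except (TypeError, ValueError):
        return [f"{field}: not a number: {value!r}"]
    low, high = PLAUSIBLE_RANGES[field]
    if not low <= number <= high:
        return [f"{field}: {number} outside plausible range {low}-{high}"]
    return []


if __name__ == "__main__":
    print(validate_entry("a1c_percent", 7.2))   # passes
    print(validate_entry("a1c_percent", "72"))  # flagged, e.g., a dropped decimal point
```

A check of this kind does not establish that a value is correct, only that it is not obviously wrong, so it complements rather than replaces the monitoring performed by trained personnel.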

SUMMARY: MOVING TOWARD INTEGRATED SYSTEMS FOR MANAGING MEDICAL DATA

As noted above, networks that draw on varied sources are already accumulating experience with large, long-term aggregated data sets. For example, data collected over several decades in multiple population-based registries in both Norway and Sweden have been analyzed with the aim of improving regional health practices. These efforts have led to recent reports of marked and apparently continuing reduction of end-stage renal disease in type 1 diabetes during the period of surveillance (96,97). Also, a 5-year population-based intervention program in Hong Kong, based on structured EMR surveillance and decision-making by designated personnel trained in diabetes management, prospectively demonstrated large concurrent reductions of deaths, hospitalizations, and costs for patients with type 2 diabetes (98,99). Thus, movement toward integration of various kinds of health-related data is already underway.

As suggested by the examples above, the scale of the systems involved may be relevant to the success of integrated data management. Sweden, Norway, and Hong Kong all have populations in the 5- to 10-million range, and all have comprehensive publicly supervised health systems providing services to nearly all citizens. Fortunately, at present the computational power of electronic systems should not be a limiting factor in pursuing the goal of integrated data management for diabetes. However, the organizational and practical barriers to implementing integrated programs may be daunting over a larger geographic range and larger population than were demonstrated in these examples.

Several specific requirements appear to be necessary for implementing such systems in any setting. One is agreement on the definitions of key terms and goals in the management of diabetes, as for other medical conditions. Some progress has been made on this front for diabetes, as evidenced by consensus statements regarding glycemic measurements, glycemic targets, and hypoglycemia prompted by the recent development of CGM devices (100–102). Similarly, growing agreement on the properties and best uses of various glucose-lowering therapies is apparent in recent consensus statements by professional organizations (103,104). However, much remains to be done. Unified data systems will require national and international standardization of nomenclature and the incorporation of data dictionaries that can be used to encode diagnoses, procedures, drugs, and clinical outcomes. This may require building on established systems such as the Systematized Nomenclature of Medicine (SNOMED), International Classification of Diseases (ICD), and World Health Organization (WHO) classifications. Where required, mapping tools might be developed to allow translation of data sets among disparate coding systems. This is already being done by the National Library of Medicine, which maps the ICD-9 Clinical Modification, ICD-10 Clinical Modification, ICD-10 Procedure Coding System, and other classification systems to SNOMED, with the goal of establishing a universal taxonomy.
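
A mapping tool of the kind described above can be thought of as a crosswalk from codes in several source systems to a single target vocabulary. The sketch below is a toy version under stated assumptions: the three entries are examples only, the `LOCAL-EMR` system and its `DM2` code are hypothetical, and the function names are invented for illustration. Any real use would load a complete published map and verify each pairing against it.

```python
# Toy crosswalk keyed by (source system, source code). The entries are examples
# for illustration; "LOCAL-EMR"/"DM2" is a hypothetical site-specific code.
CROSSWALK = {
    ("ICD-10-CM", "E11.9"): "44054006",  # type 2 diabetes mellitus
    ("ICD-10-CM", "E10.9"): "46635009",  # type 1 diabetes mellitus
    ("LOCAL-EMR", "DM2"): "44054006",    # hypothetical local code for type 2 diabetes
}


def to_snomed(system, code):
    """Translate one coded diagnosis to a SNOMED CT concept ID, or None if unmapped."""
    return CROSSWALK.get((system, code.strip().upper()))


def translate_records(records):
    """Split records into those that could be mapped and those needing review."""
    mapped, unmapped = [], []
    for record in records:
        concept = to_snomed(record["system"], record["code"])
        if concept is None:
            unmapped.append(record)
        else:
            mapped.append({**record, "snomed": concept})
    return mapped, unmapped


if __name__ == "__main__":
    mapped, unmapped = translate_records([
        {"system": "ICD-10-CM", "code": "E11.9"},
        {"system": "LOCAL-EMR", "code": "dm2"},
        {"system": "ICD-10-CM", "code": "X00"},
    ])
    print(mapped)
    print(unmapped)
```

Returning unmapped records separately, rather than discarding them, keeps the gaps in the crosswalk visible so that they can be reviewed and the map extended.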

Additional difficulties are posed by proprietary concerns of competing health systems, hardware and software manufacturers, and data-management groups. Sharing of data and agreement on standardization of systems among businesses that are competing in the same markets may pose a significant barrier. In addition, security of protected health information must be ensured, and procedures to accomplish it must be agreed upon by various stakeholders.

However, there is precedent for resolving difficulties such as these. Standardization of electronic systems, definitions, procedures, and regulations allowed for the development of international telephone service in the last century, and more recently the mechanics of the Internet and the World Wide Web. There seems no reason to believe that greater integration of data-management systems to facilitate diabetes care, surveillance, and research cannot be attained, given the potential for simultaneously improving medical outcomes and reducing overall costs.

In summary, integrated and improved management of big data has the potential to open a brave new world for diabetes care and research. Already we see successful proof-of-concept efforts, but further progress depends on overcoming logistical, administrative, and ethical obstacles to linking currently separate data-based activities.

Acknowledgments. Writing and editing support services for this article were provided by Debbie Kendall of Kendall Editorial in Richmond, VA. The authors thank Christian S. Kohler, the American Diabetes Association’s Associate Publisher for Scholarly Journals, and his staff for their assistance, guidance, and expertise in convening the 2018 Expert Forum.

Duality of Interest. M.C.R. has received research grant support from AstraZeneca and Eli Lilly; honoraria for consulting from Adocia, AstraZeneca, DalCor, Dance, Elcelyx, Eli Lilly, GlaxoSmithKline, Sanofi, and Theracos; and honoraria for speaking at a scientific meeting from Sanofi. L.B. has received research support from Janssen Pharmaceuticals, Lexicon Pharmaceuticals, Merck, Novo Nordisk, and Sanofi; has been a speaker for Janssen Pharmaceuticals, Novo Nordisk, and Sanofi; and has been a consultant for AstraZeneca, Gilead Sciences, Janssen Pharmaceuticals, Merck, Novo Nordisk, and Sanofi. H.C.G. holds the McMaster-Sanofi Population Health Institute Chair in Diabetes Research and Care. He has received research grant support from AstraZeneca, Eli Lilly, Merck, Novo Nordisk, and Sanofi; honoraria for speaking from AstraZeneca, Boehringer Ingelheim, Eli Lilly, Novo Nordisk, and Sanofi; and consulting fees from Abbott, AstraZeneca, Boehringer Ingelheim, Eli Lilly, Merck, Novo Nordisk, Janssen, and Sanofi. R.R.H. has received research grant support from AstraZeneca, Bayer AG, and Merck Sharp & Dohme; honoraria for speaking from Bayer AG; and consulting fees from Boehringer Ingelheim, Merck Sharp & Dohme, Novartis, and Novo Nordisk. G.A.N. has received research support from Boehringer Ingelheim, Merck, and Sanofi. A.T. has received research grant support from Eli Lilly and consulting fees from Monarch Medical Technologies and has equity in Brio Systems. No other potential conflicts of interest relevant to this article were reported.

References
1. Wiederhold G. Database technology in health care. J Med Syst 1981;5:175–196
2. Faltermayer EK. Better care at less cost without miracles. Medical College of Virginia Quarterly 1970;6:111–113
3. Andrews RD, Beauchamp C. A clinical database management system for improved integration of the Veterans Affairs Hospital Information System. J Med Syst 1989;13:309–320
4. Chantler C, Clarke T, Granger R. Information technology in the English National Health Service. JAMA 2006;296:2255–2258
5. National Center for Health Statistics. National Health and Nutrition Examination Survey history [Internet], 14 November 2011. Available from www.cdc.gov/nchs/nhanes/history.htm. Accessed 15 January 2019
6. UK Prospective Diabetes Study (UKPDS) Group. Intensive blood-glucose control with sulphonylureas or insulin compared with conventional treatment and risk of complications in patients with type 2 diabetes (UKPDS 33). Lancet 1998;352:837–853
7. Nathan DM, Genuth S, Lachin J, et al.; Diabetes Control and Complications Trial Research Group. The effect of intensive treatment of diabetes on the development and progression of long-term complications in insulin-dependent diabetes mellitus. N Engl J Med 1993;329:977–986
8. Evans RS. Electronic health records: then, now, and in the future. Yearb Med Inform 2016;(Suppl. 1):S48–S61
9. Nightingale F. Notes on Hospitals. 3rd Ed. London, England, Longman, Green, Longman, Roberts, and Green, 1863
10. Kucher N, Koo S, Quiroz R, et al. Electronic alerts to prevent venous thromboembolism among hospitalized patients. N Engl J Med 2005;352:969–977
11. Bright TJ, Wong A, Dhurjati R, et al. Effect of clinical decision-support systems: a systematic review. Ann Intern Med 2012;157:29–43
12. Ammenwerth E, Schnell-Inderst P, Machan C, Siebert U. The effect of electronic prescribing on medication errors and adverse drug events: a systematic review. J Am Med Inform Assoc 2008;15:585–600
13. Dorr D, Bonner LM, Cohen AN, et al. Informatics systems to promote improved care for chronic illness: a literature review. J Am Med Inform Assoc 2007;14:156–163
14. Cebul RD, Love TE, Jain AK, Hebert CJ. Electronic health records and quality of diabetes care. N Engl J Med 2011;365:825–833
15. Reed M, Huang J, Graetz I, et al. Outpatient electronic health records and the clinical care and outcomes of patients with diabetes mellitus. Ann Intern Med 2012;157:482–489
16. Reed M, Huang J, Brand R, et al. Implementation of an outpatient electronic health record and emergency department visits, hospitalizations, and office visits among patients with diabetes. JAMA 2013;310:1060–1065
17. McDonald CJ. Computerization can create safety hazards: a bar-coding near miss. Ann Intern Med 2006;144:510–516
18. Koppel R, Metlay JP, Cohen A, et al. Role of computerized physician order entry systems in facilitating medication errors. JAMA 2005;293:1197–1203
19. Howe JL, Adams KT, Hettinger AZ, Ratwani RM. Electronic health record usability issues and potential contribution to patient harm. JAMA 2018;319:1276–1278
20. Sittig DF, Wright A, Ash J, Singh H. New unintended adverse consequences of electronic health records. Yearb Med Inform 2016;Nov. 10:7–12
21. Han YY, Carcillo JA, Venkataraman ST, et al. Unexpected increased mortality after implementation of a commercially sold computerized physician order entry system. Pediatrics 2005;116:1506–1512
22. Margalit RS, Roter D, Dunevant MA, Larson S, Reis S. Electronic medical record use and physician-patient communication: an observational study of Israeli primary care encounters. Patient Educ Couns 2006;61:134–141
23. Street RL Jr, Liu L, Farber NJ, et al. Provider interaction with the electronic health record: the effects on patient-centered communication in medical encounters. Patient Educ Couns 2014;96:315–319
24. Bates DW, Kuperman GJ, Wang S, et al. Ten commandments for effective clinical decision support: making the practice of evidence-based medicine a reality. J Am Med Inform Assoc 2003;10:523–530
25. Cresswell KM, Bates DW, Sheikh A. Ten key considerations for the successful optimization of large-scale health information technology. J Am Med Inform Assoc 2017;24:182–187
26. Kaipio J, Laaveri T, Hypponen H, et al. Usability problems do not heal by themselves: national survey on physicians’ experiences with EHRs in Finland. Int J Med Inform 2017;97:266–281
27. Topaz M, Ronquillo C, Peltonen LM, et al. Nurse informaticians report low satisfaction and multi-level concerns with electronic health records: results from an international survey. AMIA Annu Symp Proc 2017;2016:2016–2025
28. Payne TH, Corley S, Cullen TA, et al. Report of the AMIA EHR-2020 Task Force on the status and future direction of EHRs. J Am Med Inform Assoc 2015;22:1102–1110
29. Zhang J, Walji MF. TURF: toward a unified framework of EHR usability. J Biomed Inform 2011;44:1056–1067
30. Koppel R, Kreda D. Health care information technology vendors’ “hold harmless” clause: implications for patients and clinicians. JAMA 2009;301:1276–1278
31. Goodman KW, Berner ES, Dente MA, et al.; AMIA Board of Directors. Challenges in ethics, safety, best practices, and oversight regarding HIT vendors, their customers, and patients: a report of an AMIA special task force. J Am Med Inform Assoc 2011;18:77–81
32. Ratwani RM, Hodgkins M, Bates DW. Improving electronic health record usability and safety requires transparency. JAMA 2018;320:2533–2534
33. Tahir D. Doctors barred from discussing safety glitches in US-funded software. Politico, 11 September 2015 [article online]. Available from www.politico.com/story/2015/09/doctors-barred-from-discussing-safety-glitches-in-us-funded-software-213553. Accessed 17 January 2019
34. Coorevits P, Sundgren M, Klein GO, et al. Electronic health records: new opportunities for clinical research. J Intern Med 2013;274:547–560


35. U.S. Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research, Center for Biologics Evaluation and Research, Center for Devices and Radiological Health. Use of electronic health record data in clinical investigations: guidance for industry. July 2018. Available from www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM501068.pdf. Accessed 17 January 2019
36. Brown JB, Nichols GA. Slow response to loss of glycemic control in type 2 diabetes mellitus. Am J Manag Care 2003;9:213–217
37. Brown JB, Nichols GA, Perry A. The burden of treatment failure in type 2 diabetes. Diabetes Care 2004;27:1535–1540
38. Whitmer RA, Karter AJ, Yaffe K, Quesenberry CP Jr, Selby JV. Hypoglycemic episodes and risk of dementia in older patients with type 2 diabetes mellitus. JAMA 2009;301:1565–1572
39. De Moor G, Sundgren M, Kalra D, et al. Using electronic health records for clinical research: the case of the EHR4CR project. J Biomed Inform 2015;53:162–173
40. Fleurence RL, Curtis LH, Califf RM, Platt R, Selby JV, Brown JS. Launching PCORnet, a national patient-centered clinical research network. J Am Med Inform Assoc 2014;21:578–582
41. Visweswaran S, Becich MJ, D’Itri VS, et al. Accrual to Clinical Trials (ACT): a clinical and translational science award consortium network. JAMIA Open 2018;1:147–152
42. Mastellos N, Andreasson A, Huckvale K, et al. A cluster randomised controlled trial evaluating the effectiveness of eHealth-supported patient recruitment in primary care research: the TRANSFoRm study protocol. Implement Sci 2015;10:15
43. Desai J, Geiss L, Mukhtar Q, et al. Public health surveillance of diabetes in the United States. J Public Health Manag Pract 2003;(Suppl.):S44–S51
44. Ali MK, Siegel KR, Laxy M, Gregg EW. Advancing measurement of diabetes at the population level. Curr Diab Rep 2018;18:108
45. Toh S, Platt R. Is size the next big thing in epidemiology? Epidemiology 2013;24:349–351
46. Centers for Disease Control and Prevention. Diabetes home: data and statistics [Internet]. 2019. Available from www.cdc.gov/diabetes/statistics/index.htm. Accessed 12 April 2019
47. Carstensen B, Kristensen JK, Marcussen MM, Borch-Johnsen K. The National Diabetes Register. Scand J Public Health 2011;39(Suppl):58–61
48. Birkner R. Plan and initial program of the Health Examination Survey. Vital Health Stat 1 1965;3:1–43
49. National Center for Health Statistics. Plan and operation of the Health and Nutrition Examination Survey: United States, 1971-1973 [Internet]. Available from: https://www.cdc.gov/nchs/data/series/sr_01/sr01_010a.pdf. Accessed 12 April 2019
50. National Center for Health Statistics. Plan and operation of the Third National Health and Nutrition Examination Survey, 1988-94. Series 1: programs and collection procedures. Vital Health Stat 1 1994;32:1–407
51. National Center for Health Statistics. NHANES 1999–2000: Data, documentation, codebooks, SAS code [Internet]. Available from wwwn.cdc.gov/nchs/nhanes/ContinuousNhanes/Default.aspx?BeginYear=1999. Accessed 18 January 2019
52. Centers for Disease Control and Prevention. Ambulatory health care data. Available from www.cdc.gov/nchs/ahcd/index.htm. Accessed 14 March 2019
53. Agency for Healthcare Research and Quality. Medical Expenditure Panel Survey. Available from meps.ahrq.gov/mepsweb. Accessed 14 March 2019
54. Healthcare Cost and Utilization Project. Overview of the National (Nationwide) Inpatient Sample (NIS) [Internet], 2018. Available from https://www.hcup-us.ahrq.gov/nisoverview.jsp. Accessed 18 January 2019
55. Collins AJ, Foley RN, Gilbertson DT, Chen SC. United States Renal Data System public health surveillance of chronic kidney disease and end-stage renal disease. Kidney Int Suppl (2011) 2015;5:2–7
56. Dabelea D, Mayer-Davis EJ, Saydah S, et al.; SEARCH for Diabetes in Youth Study. Prevalence of type 1 and type 2 diabetes among children and adolescents from 2001 to 2009. JAMA 2014;311:1778–1786
57. NHS Digital. National Diabetes Audit. Available from digital.nhs.uk/data-and-information/clinical-audits-and-registries/national-diabetes-audit. Accessed 6 February 2019
58. Centers for Disease Control and Prevention. National Diabetes Statistics Report, 2017. Atlanta, GA, Centers for Disease Control and Prevention, U.S. Department of Health and Human Services, 2017
59. Zhang Y, Huang J, Wang P. A prediction model for the peripheral arterial disease using NHANES data. Medicine (Baltimore) 2016;95:e3454
60. Zhang X, Saaddine JB, Chou C-F, et al. Prevalence of diabetic retinopathy in the United States, 2005-2008. JAMA 2010;304:649–656
61. Murphy D, McCulloch CE, Lin F, et al.; Centers for Disease Control and Prevention Chronic Kidney Disease Surveillance Team. Trends in the prevalence of chronic kidney disease in the United States. Ann Intern Med 2016;165:473–481
62. Geiss LS, Wang J, Cheng YJ, et al. Prevalence and incidence trends for diagnosed diabetes among adults aged 20 to 79 years, United States, 1980-2012. JAMA 2014;312:1218–1226
63. Centers for Disease Control and Prevention. National Health Interview Survey: questionnaires, datasets, and related documentation [Internet]. Available from www.cdc.gov/nchs/nhis/nhis_questionnaires.htm2015. Accessed 20 April 2017
64. Gregg EW, Li Y, Wang J, et al. Changes in diabetes-related complications in the United States, 1990-2010. N Engl J Med 2014;370:1514–1523
65. Centers for Disease Control and Prevention (CDC). Estimated county-level prevalence of diabetes and obesity - United States, 2007. MMWR Morb Mortal Wkly Rep 2009;58:1259–1263
66. National Center for Health Statistics. NCHS data linked to NDI morbidity files [Internet]. Available from www.cdc.gov/nchs/data-linkage/mortality.htm. Accessed 18 January 2019
67. Engelgau MM, Geiss LS, Manninen DL, et al.; CDC Diabetes in Managed Care Work Group. Use of services by diabetes patients in managed care organizations. Development of a diabetes surveillance system. Diabetes Care 1998;21:2062–2068
68. Selby JV, Ray GT, Zhang D, Colby CJ. Excess costs of medical care for patients with diabetes in a managed care population. Diabetes Care 1997;20:1396–1402
69. Gudbjornsdottir S, Cederholm J, Nilsson PM, Eliasson B; Steering Committee of the Swedish National Diabetes Register. The National Diabetes Register in Sweden: an implementation of the St. Vincent Declaration for Quality Improvement in Diabetes Care. Diabetes Care 2003;26:1270–1276
70. Niemi M, Winell K. Diabetes in Finland: Prevalence and Variation in Quality of Care. Available from http://www.diabetes.fi/files/1105/Diabetes_in_Finland._Prevalence_and_Variation_in_Quality_of_Care.pdf. Accessed 6 February 2019
71. Newton J, Garner S. Disease Registers in England: a Report Commissioned by the Department of Health Policy Research Programme in Support of the White Paper Entitled Saving Lives: Our Healthier Nation. Oxford, U.K., Institute of Health Sciences, University of Oxford, 2002
72. Gerstein HC, McMurray J, Holman RR. Real-world studies no substitute for RCTs in establishing efficacy. Lancet 2019;393:210–211
73. Cefalu WT, Kaul S, Gerstein HC, et al. Cardiovascular outcomes trials in type 2 diabetes: where do we go from here? Reflections from a Diabetes Care Editors’ Expert Forum. Diabetes Care 2018;41:14–31
74. Gerstein HC, Bosch J, Dagenais GR, et al.; ORIGIN Trial Investigators. Basal insulin and cardiovascular and other outcomes in dysglycemia. N Engl J Med 2012;367:319–328
75. Scirica BM, Bhatt DL, Braunwald E, et al.; SAVOR-TIMI 53 Steering Committee and Investigators. Saxagliptin and cardiovascular outcomes in patients with type 2 diabetes mellitus. N Engl J Med 2013;369:1317–1326
76. White WB, Cannon CP, Heller SR, et al.; EXAMINE Investigators. Alogliptin after acute coronary syndrome in patients with type 2 diabetes. N Engl J Med 2013;369:1327–1335
77. Green JB, Bethel MA, Armstrong PW, et al.; TECOS Study Group. 
Effect of sitagliptin oncardiovascular outcomes in type 2 diabetes. NEngl J Med 2015;373:232–24278. ZinmanB,WannerC, Lachin JM,et al.; EMPA-REG OUTCOME Investigators. Empagliflozin, car-diovascular outcomes, and mortality in type 2diabetes. N Engl J Med 2015;373:2117–212879. Pfeffer MA, Claggett B, Diaz R, et al.; ELIXAInvestigators. Lixisenatide in patients with type 2diabetes and acute coronary syndrome. N Engl JMed 2015;373:2247–225780. Marso SP, Daniels GH, Brown-Frandsen K,et al.; LEADER Steering Committee; LEADER TrialInvestigators. Liraglutide and cardiovascular out-comes in type 2 diabetes. N Engl JMed2016;375:311–32281. Marso SP, Bain SC, Consoli A, et al.; SUSTAIN-6 Investigators. Semaglutide and cardiovascularoutcomes in patients with type 2 diabetes. N EnglJ Med 2016;375:1834–184482. Neal B, Perkovic V, Mahaffey KW, et al.;CANVAS Program Collaborative Group. Canagliflozinand cardiovascular and renal events in type 2diabetes. N Engl J Med 2017;377:644–65783. Holman RR, Bethel MA, Mentz RJ, et al.;EXSCEL Study Group. Effects of once-weeklyexenatide on cardiovascular outcomes in

care.diabetesjournals.org Riddle and Associates 1145

Page 11: Diabetes Care Editors’ Expert Forum 2018: Managing Big ... · Diabetes Care Editors’ Expert Forum 2018: Managing Big Data for Diabetes Research and Care ...

type 2 diabetes. N Engl J Med 2017;377:1228–123984. Holman RR, Coleman RL, Chan JCN, et al.;ACE Study Group. Effects of acarbose on cardio-vascular and diabetes outcomes in patients withcoronary heart disease and impaired glucosetolerance (ACE): a randomised, double-blind,placebo-controlled trial. Lancet Diabetes Endo-crinol 2017;5:877–88685. Hernandez AF, Green JB, Janmohamed S,et al.; Harmony Outcomes committees and in-vestigators. Albiglutide and cardiovascular out-comes in patients with type 2 diabetes andcardiovascular disease (Harmony Outcomes):a double-blind, randomised placebo-controlledtrial. Lancet 2018;392:1519–152986. Wiviott SD, Raz I, Bonaca MP, et al.;DECLARE–TIMI 58 Investigators. Dapagliflozinand cardiovascular outcomes in type 2 diabetes.N Engl J Med 2019;380:347–35787. Kosiborod M, Lam CSP, Kohsaka S, et al.;CVD-REAL Investigators and Study Group. Car-diovascular events associated with SGLT-2 in-hibitors versus other glucose-lowering drugs: theCVD-REAL 2 study. J Am Coll Cardiol 2018;71:2628–263988. Bowman L, Mafham M, Stevens W, et al.;ASCEND Study Collaborative Group. ASCEND: AStudy of Cardiovascular Events iN Diabetes:characteristics of a randomized trial of aspirinand of omega-3 fatty acid supplementation in15,480 people with diabetes. Am Heart J 2018;198:135–14489. Brown JS, Holmes JH, Shah K, Hall K, LazarusR, Platt R. Distributed health data networks:a practical and preferred approach to multi-institutional evaluations of comparative ef-fectiveness, safety, and quality of care.Med Care2010;48(Suppl.):S45–S5190. Maro JC, Platt R, Holmes JH, et al. Design of anational distributed health data network. AnnIntern Med 2009;151:341–34491. Curtis LH, Brown J, Platt R. Four health datanetworks illustrate the potential for a shared

national multipurpose big-data network. HealthAff (Millwood) 2014;33:1178–118692. Morris AD, Boyle DIR, MacAlpine R, et al.;DARTS/MEMOCollaboration.TheDiabetesAuditand Research in Tayside Scotland (DARTS) study:electronic record linkage to create a diabetesregister. BMJ 1997;315:524–52893. Ninomiya T, Perkovic V, Turnbull F, et al.;Blood Pressure Lowering Treatment Trialists’Collaboration. Blood pressure lowering and ma-jor cardiovascular events in people with andwithout chronic kidney disease: meta-analysisof randomised controlled trials. BMJ 2013;347:f568094. Kearney PM, Blackwell L, Collins R, et al.;Cholesterol Treatment Trialists’ (CTT) Collabo-rators. Efficacy of cholesterol-lowering therapyin 18,686 people with diabetes in 14 randomisedtrials of statins: a meta-analysis. Lancet 2008;371:117–12595. van der Heijden AA, AbramoffMD, VerbraakF, van HeckeMV, Liem A, Nijpels G. Validation ofautomated screening for referable diabetic ret-inopathy with the IDx-DR device in the HoornDiabetesCareSystem.ActaOphthalmol2018;96:63–6896. Gagnum V, Stene LC, Leivestad T, Joner G,Skrivarhaug T. Long-term mortality and end-stage renal disease in a type 1 diabetes pop-ulation diagnosed at age 15–29 years in Norway.Diabetes Care 2017;40:38–4597. Toppe C, Mollsten A, Waernbaum I, et al.;Swedish Childhood Diabetes Study Group andthe Swedish Renal Register. Decreasing cumu-lative incidence of end-stage renal disease inyoung patients with type 1 diabetes in Sweden:a 38-year prospective nationwide study. Diabe-tes Care 2019;42:27–3198. Wan EYF, Fung CSC, Jiao FF, et al. Five-yeareffectiveness of the multidisciplinary RiskAssessment and Management Programme–Diabetes Mellitus (RAMP-DM) on diabetes-related complications and health serviceuses: a population-based and propensity-

matched cohort study. Diabetes Care 2018;41:49–5999. Jiao FF, Fung CSC, Wan EYF, et al. Five-yearcost-effectiveness of the multidisciplinary RiskAssessment and Management Programme-Diabetes Mellitus (RAMP-DM). Diabetes Care2018;41:250–257100. Petrie JR, Peters AL, Bergenstal RM, HollRW, Fleming GA, Heinemann L. Improving theclinical value and utility of CGM systems: issuesand recommendations: a joint statement of theEuropean Association for the Study of Diabetesand the American Diabetes Association DiabetesTechnologyWorking Group. Diabetes Care 2017;40:1614–1621101. Danne T, Nimri R, Battelino T, et al. In-ternational consensus on use of continuousglucose monitoring. Diabetes Care 2017;40:1631–1640102. Agiostratidou G, Anhalt H, Ball D, et al.Standardizing clinically meaningful outcomemeasures beyond HbA1c for type 1 diabetes:a consensus report of the American Associationof Clinical Endocrinologists, the American Asso-ciation of Diabetes Educators, the AmericanDiabetes Association, the Endocrine Society,JDRF International, The Leona M. and Harry B.Helmsley Charitable Trust, the Pediatric Endo-crine Society, and the T1D Exchange. DiabetesCare 2017;40:1622–1630103. Davies MJ, D’Alessio DA, Fradkin J, et al.Management of hyperglycemia in type 2 diabe-tes, 2018: a consensus report by the AmericanDiabetes Association (ADA) and the EuropeanAssociation for the Study of Diabetes (EASD).Diabetes Care 2018;41:2669–2701104. GarberAJ, AbrahamsonMJ,Barzilay JI, et al.Consensus statement by the American Associa-tion of Clinical Endocrinologists and AmericanCollege of Endocrinology on the comprehensivetype 2 diabetes algorithmd2018 executive sum-mary. Endocr Pract 2018;24:91–120
