How to evaluate and improve the quality of mHealth behaviour change tools




How to evaluate & improve the quality of mHealth behaviour change tools?
Jeremy Wyatt DM FRCP ACMI Fellow

Leadership chair in eHealth research, University of Leeds; Clinical Adviser on New Technologies, Royal College of Physicians, London

[email protected]

Yorkshire Centre for Health Informatics

Agenda
- Why mHealth BC tools?
- Is there a quality problem?
- What does quality mean: in general? For mHealth BC tools?
- What methods might improve quality?
- How to evaluate the quality of these tools?
- Some example studies
- Conclusions

Why mHealth for behaviour change?
1. Face-to-face contacts do not scale
2. Smartphone hardware is used by > 75% of adults:
   - Cheap, convenient, fashionable
   - Inbuilt sensors / wearables allow easy measurements
   - Multiple channels: SMS, MMS, voice, video, apps
3. mHealth software enables:
   - Unobtrusive alerts to record data or take action
   - Incorporation of BCTs (e.g. BCTs present in 96% of adherence apps, median 2 BCTs)
   - Tailoring, which makes BC more effective (corrected d = 0.16; SR of 21 studies on BC websites, Lustria, J Health Comm 2013)

Why digital channels?

A broken market?

Privacy and mHealth apps
- Permissions requested: use accounts, modify USB, read phone ID, find files, full net access, view connections (a sketch of how such a permission audit might be automated follows below)
- Our study of 80 apps: an average of 4 clear privacy breaches for health apps, only 1 for medical apps
- We know that - we read the Terms & Conditions! (this one only 1,200 words, but many are much longer)
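Audits like this can be partly automated. A minimal sketch, assuming each app's decompiled AndroidManifest.xml is available locally; the manifest path and the list of "suspect" permissions are illustrative assumptions, not the criteria used in the study above:

```python
# Sketch: flag privacy-relevant Android permissions in a decompiled manifest.
# The manifest path and the "suspect" permission list are illustrative only.
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
SUSPECT = {
    "android.permission.GET_ACCOUNTS",            # "use accounts"
    "android.permission.READ_PHONE_STATE",        # "read phone ID"
    "android.permission.READ_EXTERNAL_STORAGE",   # "find files"
    "android.permission.INTERNET",                # "full net access"
    "android.permission.ACCESS_NETWORK_STATE",    # "view connections"
}

def suspect_permissions(manifest_path: str) -> list[str]:
    """Return the privacy-relevant permissions an app's manifest requests."""
    root = ET.parse(manifest_path).getroot()
    requested = {
        elem.get(ANDROID_NS + "name")
        for elem in root.iter("uses-permission")
    }
    return sorted(requested & SUSPECT)

if __name__ == "__main__":
    # hypothetical path to one app's decompiled manifest
    print(suspect_permissions("decompiled/health_app/AndroidManifest.xml"))
```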

[Image: First Folio, As You Like It. Public domain photo taken by Cowardly Lion; Folio Society edition of 1996]

With Hannah Panayiotou & Anam Noel, Leeds medical students

What is quality?
"The totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs" (ISO 9000:2005, Quality management systems - Fundamentals and vocabulary)

I.e. quality is about fitness for purpose: a good quality product complies with client requirements.

[That 30-page ISO document costs £96.50!]

Possible processes to improve quality of mHealth tools

| Method | Advantages | Disadvantages | Examples |
|---|---|---|---|
| Wisdom of the crowd | Simple user ranking | Hard for users to assess quality; click-factory bias | Current app stores, MyHealthApps |
| Users apply quality criteria | Explicit | Requires widespread dissemination; can everyone apply them? | RCP checklist |
| Classic peer-reviewed article | Rigorous (?) | Slow, resource intensive, doesn't fit app model | 47 PubMed articles |
| Physician peer review | Timely, dynamic | Not as rigorous; scalable? | iMedicalApps, MedicalAppJournal |
| Developer self-certification | Dynamic | Requires developers to understand & comply; checklist must fit apps | HON Code?, RCP checklist |
| Developer support | Resource light | Technical knowledge needed; multitude of developers | BSI PAS 277 |
| CE marking, external regulation | Credible | Slow, expensive, apps don't fit national model | NHS App Store, FDA, MHRA |

BSI publicly accessible standard (PAS) 277 on health & medical apps
- A PAS is not a full BSI standard, but a supportive framework for developers
- Steering group includes DHACA, BUPA, NHS AHSN, app developers, notified bodies, HSCIC, RCP, the NIHR mental health technology centre, a patient representative and a medical device manufacturer
- Now published on the BSI web site with free access

Regulation of medical apps by FDA, FTC
- If classified as a medical device by the FDA, a product must demonstrate efficacy, but:
- Only 100 apps have so far been classified as a medical device
- Decision to exercise enforcement discretion on most medical apps
- So, the FDA has not actually banned any apps, yet

However, the Federal Trade Commission has banned some apps with misleading claims, e.g. Acne Cure (no evidence of the claimed benefit of the iPhone screen backlight).

Sharpe, New England Center for Investigative Reporting: "Many health apps are based on flimsy science at best, and often do not work." Washington Post, November 12, 2012.

We need to think differently

| Old think | New think |
|---|---|
| Paternalism: we know & determine what is best for users | Self-determination: users decide what is best for them |
| Regulation will eliminate harmful apps after release | Prevent bad apps: help app developers understand safety & quality |
| The NHS must control apps, apply rules and safety checks | Self-regulation by the developer community; consumer choice informed by truth in labelling |
| App developers are in control | Aristotle's civil society* is in control |
| Quality is best achieved by laws and regulations | Quality is best achieved by consensus and culture change |
| The aim of apps is innovation (sometimes above other considerations) | App innovation must balance benefits and risks |
| An apps market driven by viral campaigns and unfounded claims of benefit | An apps market driven by fitness for purpose (ISO) & evidence of benefit |

* The elements that make up a democratic society, such as freedom of speech, an independent judiciary, and collaborating for the common wellbeing.

User ratings: app display rank versus adherence to evidence

Study of 47 smoking cessation apps (Abroms et al., 2013). More popular apps are of lower quality: a failed market?

What went wrong with user rankings and reviews?
Wall Street Journal, Nov. 24, 2013: "Inside a Twitter Robot Factory: Fake Activity, Often Bought for Publicity Purposes, Influences Trending Topics" by Jeff Elder. "One day earlier this month, Jim Vidmar bought 1,000 fake Twitter accounts for $58 from an online vendor in Pakistan. Mr. Vidmar programs accounts like these to 'follow' other Twitter accounts, and to rebroadcast tweets..."

http://on.wsj.com/18A2hr9

Promoting quality in the marketplace
- Quality is defined by ISO as fitness for purpose
- However, app users and purposes vary, so a simple good / bad quality mark is insufficient
- We need a checklist of optional criteria to empower users / clinicians / commissioners / app developers
- Select the criteria you need according to the complexity of the app and contextual risk (Lewis & Wyatt, JMIR 2014); see the sketch after the risk framework below
- Labelling apps will make quality explicit, adding a new Darwinian selection pressure to the apps marketplace

Risk Framework for mHealth apps

[Figure: risk framework, showing the context in which the app is used] (Lewis T, Wyatt JC. JMIR 2014)
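One way to operationalise this criteria selection is to hold the checklist as data keyed by risk tier, so an assessor pulls out only what the app's function and context demand. A minimal sketch; the tier names and criterion texts below are illustrative placeholders, not the published Lewis & Wyatt framework:

```python
# Sketch: select quality criteria by risk tier, in the spirit of scaling
# criteria with app complexity and context of use (Lewis & Wyatt, JMIR 2014).
# Tier names and criteria are illustrative, not the published framework.

# Each criterion applies at its own tier and every higher tier.
CRITERIA_BY_TIER = {
    "low":    ["States purpose, sponsor and intended user",
               "Declares how user data are stored and shared"],
    "medium": ["Cites the evidence base for its content",
               "Reports usability testing with intended users"],
    "high":   ["Validates calculations against a gold standard",
               "Documents clinical oversight of any advice given"],
}
TIER_ORDER = ["low", "medium", "high"]

def criteria_for(app_tier: str) -> list[str]:
    """All criteria for the given tier, cumulative from lower tiers."""
    selected = []
    for tier in TIER_ORDER[: TIER_ORDER.index(app_tier) + 1]:
        selected.extend(CRITERIA_BY_TIER[tier])
    return selected

# e.g. a dose calculator used unsupervised at home sits in the top tier:
print(criteria_for("high"))  # prints all six criteria
```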

Our draft quality criteria for apps, based on Donabedian 1966:
- Structure = the app development team, the evidence base, use of an appropriate behaviour change model, etc.
- Processes = app functions: usability, accuracy, etc.
- Outcomes = app impacts on user knowledge & self-efficacy, user behaviours, resource usage

Wyatt JC, Curtis K, Brown K, Michie S. Submitted to Lancet

Structure: what should a high quality app include?
- Appropriate sponsor & developer skills
- Appropriate components:
  - An appropriate underlying health promotion theory
  - A sensible choice of behaviour change techniques
  - Good user interface
  - Accurately programmed algorithms
  - Adequate security
  - Accurate knowledge from a reliable source - NICE?

Study method: independent expert review

Process: does the app work well? Suggested measures:
- App download rates [surrogate for user recognition]
- App usage rates immediately after download & later
- Time taken to enter data, receive report
- Acceptability, usability
- Accuracy of calculations, data
- Appropriateness of any advice given

Study method: lab tests and surveys

Case study of CVD risk apps: methods
- Assembled search terms: heart attack, cardiovascular disease, etc.
- Searched for & downloaded all iPhone / iPad free & paid apps that the public might use to assess personal risk
- Assembled 15 scenarios varying in risk from 1% to 98%
- Assessed the risk figure, format and advice given by each app
- Definition of error: app above 20% & gold standard below 20%, or vice versa (old NICE guidance; or 10% under the new guidance) - see the sketch after this list
- With Hannah Cullumbine & Sophie Moriarty, Leeds medical students
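This error definition translates directly into a small test harness. A minimal sketch; the scenario values below are invented for illustration (the study used 15 scenarios scored against QRisk2 as the gold standard):

```python
# Sketch: misclassification of CVD risk at a 20% treatment threshold.
THRESHOLD = 20.0  # old NICE guidance; rerun with 10.0 for the new guidance

# (gold standard risk %, app's estimated risk %) per scenario - invented values
scenarios = [(1.0, 2.5), (18.0, 24.0), (35.0, 31.0), (98.0, 90.0)]

def is_error(gold: float, app: float, threshold: float = THRESHOLD) -> bool:
    """Error = app and gold standard fall on opposite sides of the threshold."""
    return (gold >= threshold) != (app >= threshold)

errors = sum(is_error(g, a) for g, a in scenarios)
print(f"misclassified {errors}/{len(scenarios)} scenarios "
      f"({100 * errors / len(scenarios):.0f}%) at the {THRESHOLD:.0f}% threshold")
```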

Overall results
- Located 21 apps; only 19 (7 paid) gave figures
- All 19 communicated risk using percentages (cf. advice from Gigerenzer, BMJ 2004)
- One app said "see your GP" every time; none of the rest gave advice
- Some apps refused to accept key data, e.g. age > 74, diabetes

Heart Health App

Misclassification rates for the 20% threshold
- Varied from 7% to 33%
- 5 (26%) of 19 apps misclassified 25% or more of the scenarios; 3 (16%) misclassified 20-24%; so 8 (42%) misclassified at least a fifth of the scenarios
- Median error rate: free 13%, paid 27%
- Error rate for free apps significantly less than for paid apps (p = 0.026, Mann-Whitney U = 13.5); see the sketch after this list
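For reference, the free-versus-paid comparison is a standard two-sample Mann-Whitney U test. A sketch with made-up per-app error rates (the study's own per-app figures are not reproduced here):

```python
# Sketch: Mann-Whitney U test on per-app misclassification rates,
# free vs paid. The error rates below are made up for illustration.
from scipy.stats import mannwhitneyu

free_error_rates = [0.07, 0.07, 0.13, 0.13, 0.13, 0.13,
                    0.13, 0.20, 0.20, 0.20, 0.27, 0.33]  # 12 free apps
paid_error_rates = [0.20, 0.27, 0.27, 0.27, 0.33, 0.33, 0.33]  # 7 paid apps

u_stat, p_value = mannwhitneyu(free_error_rates, paid_error_rates,
                               alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.3f}")
```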

Assessing app accuracy
- Only applies to apps that give advice, calculate a risk / drug dose, etc.
- Need a gold standard for correct advice / risk [QRisk2 in our case]
- Need a representative case series, or plausible simulated cases
- Ideally, users should enter case data or their own data
- How accurate is accurate enough? Accurate enough to get used? Accurate enough to encourage the user to take action?

Intervention modelling experiments. Aim: to check an intervention before an expensive large-scale study (MRC Framework: Campbell, BMJ 2007)

What to measure:
- acceptability, usability
- accuracy of data input by users, accuracy of output
- whether users correctly interpret output
- stated impact of output on decision, self-efficacy, action
- users' emotional response to output
- user impressions & suggested improvements

Example IME: how to make prescribing alerts more acceptable to doctors?

Background: interruptive alerts annoy doctors
- Randomised IME in 24 junior doctors, each viewing 30 prescribing scenarios, with prescribing alerts presented in two different ways (see the allocation sketch below)
- Same alert text presented as a modal dialogue box (interruptive) or on the ePrescribing interface (non-interruptive)
- Funded by Connecting for Health, carried out by an Academic F2 doctor

Published as Scott G et al., JAMIA 2011
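The within-subject design implies each doctor sees both alert formats across the scenario set. One way to build a balanced allocation is sketched below; the actual randomisation procedure in Scott et al. is not detailed here, so this is illustrative only:

```python
# Sketch: allocate 30 prescribing scenarios to two alert formats for each
# of 24 doctors, balanced 15/15 and independently shuffled per doctor.
# Illustrative only - not necessarily the procedure used by Scott et al.
import random

DOCTORS, SCENARIOS = 24, 30
FORMATS = ["interruptive", "non-interruptive"]

def allocate(seed: int = 42) -> dict[int, list[str]]:
    rng = random.Random(seed)  # fixed seed makes the allocation reproducible
    allocation = {}
    for doctor in range(DOCTORS):
        formats = FORMATS * (SCENARIOS // 2)  # 15 of each format
        rng.shuffle(formats)                  # random order per doctor
        allocation[doctor] = formats          # formats[i] = format for scenario i
    return allocation

print(allocate()[0][:6])  # first six scenario allocations for doctor 0
```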

Interruptive alert

Interruptive alert in modal dialogue box

Example of an interruptive alert

Non-interruptive alert (same text). Interruptive alerts reduced errors by a factor of 10 and non-interruptive alerts by a factor of 5, but the non-interruptive alerts were more acceptable.

Example of a non-interruptive alert for the same error

Outcomes - the bottom line. What is the impact of a health promotion app on:
- User knowledge and attitudes
- Self-efficacy
- Short-term behaviour change
- Long-term behaviour change
- Individual health-related outcomes
- Population health

Study method: randomised controlled trials (e.g. the My Meal Mate app - Carter et al., JMIR 2013)

surrogate outcomes

Online RCT on Fogg's persuasive technology theory & NHS organ donation register sign-up

Persuasive features:
- URL includes https, dundee.ac.uk
- University logo
- No advertising
- References
- Address & contact details
- Privacy statement
- Articles all dated
- Site certified (W3C / Health on the Net)

Result: 900 students recruited to the RCT in 5 days; no difference in NHS organ donation register sign-up rates (38% in both groups). Nind, Sniehotta et al., 2010


Why bother with impact evaluation - can't we just predict the results? No: the real world of people and organisations is too messy / complex to predict whether a technology will work:

- Diagnostic decision support (Wyatt, MedInfo 89)
- Integrated medicines management for a children's hospital (Koppel, JAMA 2005)
- MSN Messenger for nurse triage (Eminovic, JTT 2006)


A proposed evaluation cascade for mHealth apps

| Area | Topics | Methods |
|---|---|---|
| Source | Purpose, sponsor; user, cost | Inspection |
| Safety | Data protection; usability | Inspection; HCI lab / user tests |
| Content | Based on sound evidence; proven behaviour change methods | Inspection |
| Accuracy | Calculations; advice | Scenarios with gold standard |
| Potential impact | Ease of use in the field; understanding of output | Intervention modelling experiments |
| Impact | Knowledge, attitudes, self-efficacy; health behaviours, outcomes | Within-subject experiments; field trials |

Conclusions
- The quality of mHealth tools varies too much
- User & professional reviews, developer self-certification and regulation are not enough
- To help reduce "apptimism" and strengthen other strategies, we need to agree quality criteria, evaluate apps against them, & label apps with the results
- We have the evaluation methods (e.g. rating quality of evidence, usability / accuracy studies, RCTs)
- This will support patients, health professionals and app developers in maximising the benefits of mHealth
