The Application of Action Design Science Research
to Digital Innovations in Data Science
Matthew T. Mullarkey, Ph.D. & Alan R. Hevner
Information Systems and Decision Sciences
Muma College of Business
University of South Florida
Dr. MT Mullarkey, [email protected], (c) 2017 - Citizen Data Scientist Program, 6/1/2017
Teach us Data Science…
But, where does an organization start?
Motivation
Fortune 100 Contract Manufacturer
Problem Domain 1: lots of data and data structures, a bunch of stats, SQL, and dashboard tools
Problem Domain 2: just a few Ph.D.s (Lab & Praxis scientists in mathematics, statistics, and data) w/ limited domain knowledge
Solution Domain: “Citizen” Data Scientists
Gap 1: Method to Conduct a full Data Science Project – How do we attack a DS problem when we aren’t really sure what we will find or what an innovative solution will look like?
Gap 2: Means to Evaluate ROI – How do we value what we create in DS? Is it only a fully implemented system like a typical IS/IT project?
By Data Science we mean what exactly…?
The Essence of Data Science…
Interdisciplinary, about processes and systems to extract knowledge or insights
from “interesting” data in various forms
structured & unstructured, private & public, samples & populations, “big” and “small” data
which is a continuation of some of the data analysis fields
such as statistics, data mining, and predictive analytics
and, can use processes like Knowledge Discovery in Databases (KDD) and CRISP-DM.
Data science employs techniques and theories drawn from many fields:
mathematics, statistics, information science, and computer science,
And includes signal processing, probability models, machine learning, statistical learning, data mining, databases, data engineering, pattern recognition and learning, visualization, predictive analytics, uncertainty modeling, data warehousing, data compression, computer programming, artificial intelligence, and high-performance computing.
And incorporates machine translation, speech recognition, robotics, search engines.
Data science is essential to the digital economy, and also the biological sciences, medical informatics, health care, social sciences and the humanities.
An integral part of competitive intelligence, a newly emerging field that encompasses a number of activities, such as data mining and data analysis.
The rise of Data Science in Organizations
Massive “Interesting” Data
Firms are seeking to gain greater understanding of and insights into more and more massive quantities of data collected and stored in disparate public and private databases.
Data Science Tools & Technique Maturity
Data science tools and techniques have evolved rapidly over the last five years to analyze this structured and unstructured data from exploratory and confirmatory perspectives.
Desktop Accessible – Computational Power & User Friendly Interfaces
The data science tools themselves now run on readily available CPUs with much more user-friendly interfaces that increase the use and usefulness of the tools to data scientists and domain experts alike.
Domain Knowledge Integral to DS Insights
As firms seek to maximize the utility of emerging data science technologies to organize and investigate more and more complex, massive data, they are looking for innovative approaches to diffuse knowledge of the tools and techniques of data science to a “richer” set of users within domains.
Requires a Methodical Approach
To effectively and efficiently deploy project resources to the data science search activity and the consequent build and evaluation of innovative artifacts, certain firms are finding that a Design Science approach can provide an iterative, evaluative method for the diagnosis, design, implementation, and evolution of data science artifacts.
Critical to Competitive Intelligence
Several case examples exist to illustrate the Design Science method for investigating challenging data science problems that result in innovative solutions through the creation of a business process for the use of data science digital technologies.
Moving Data Science from Lab to Practice
DS in the Lab
Tool innovation & deployment
Specialized software and hardware
Deep disciplinary knowledge
Technical user interface
Close the research gap question
Demonstrate clever investigations
Demonstrated by gain in knowledge
Limited, experimental use
DS in Practice
Method innovation & deployment
Ubiquitous software on existing hardware
Deep domain knowledge
“Civilian” user friendly interface
Close the domain specific question
Create a competitive advantage
Demonstrated Return on Investment
Widespread, predictable use
A Data Science effort Requires…
Interesting Questions informed by Domain Knowledge
Access to multiple “interesting” data sets
Computer processing power
Proven software tools
Proven (accessible) user interfaces to the tools
Time and inter-disciplinary resources to focus
Disciplined, methodical search, discovery, design, build, evaluation, and system implementation (automation)
No lack of Interesting Data
Structured Private Data in Relational Databases within the Organization
Generally clean but not always connected databases
Almost always domain specific…
HR, Finance, Manufacturing, Supply Chain, Marketing or by Business Unit
Often access is controlled by an expert (in IT or in the Domain)
Standard reporting and SQL processes are common
More and more often resides in private cloud platforms (ERP, CRM, etc.)
Unstructured Private Data in Non-SQL Databases within the Organization
Including text, images, clickstream, posts, …
Limited standard reporting, difficult to search using traditional techniques
Often needs to be transformed to be productively explored
Often resides in cloud based data structures (ex: Hadoop)
Structured and Unstructured Data in Public data repositories
Unstructured social network data streams
Semi-structured Clickstream and IP/MAC addressable data streams
Structured data provided through various agencies and NGOs (CDC, DOT, etc)
Often dependent upon APIs and/or paid access
Usually only interesting when paired in “clever” ways with Private Data
No lack of Digital Innovation in Data Science
Innovative Data Structures & Techniques
SQL & NoSQL in Relational, Distributed, Parallel DBs
Ex: Cassandra, Hadoop, Bigtable, … (see Hurst, 2010)
Techniques: MapReduce, Hive, HDFS, text search, Lucene, Solr, & Natural Language Processing
APIs & Social Media Data - Twitter API, Facebook Graph API, & LinkedIn API
Innovations in Dimensional Modeling
Analytic SQL, Aggregation, Analysis, Modeling Unstructured Data & Text Mining
Innovations in Statistical Analysis & Predictive Modeling
Excel, R, SAS, SPSS with Interfaces & Population capabilities
Linear and non-Linear Relationships Analysis & Modeling
Data Mining Process – KDD, SEMMA, CRISP-DM
Model Evaluation – 13, Lift and ROC, Cost-Sensitive Learning
Machine Learning Techniques - Decision Trees, Random Forests
Predictive Modeling - Neural Networks, Probability Models, Support Vector Machines
Pattern Discovery Methods - Market Basket Analysis, Genetic Algorithms for Pattern Discovery
Distance-based Techniques – Clustering, Nearest Neighbor Classification, Recommender Systems Algorithms
Innovations in Reporting & Visualization
Visual Studio vs. Report Builder 3.0, PowerPivot in MS Excel 2013+, Tableau
A (Design Science) method to the (Data Science) madness…
An Evidence Based Approach
Data Science has evolved to the point where its collective technologies can be relatively easily understood and deployed by any domain expert due to the “usual” evolution of a methodological approach, simplified interfaces, computing capacity, and access to data needed to move from the domain of the expert scientist to the praxis scientist.
Our analysis of prior examples of moving from research and science to problem solving and practice identified a logical evolution of technologies from the lab to the workbench (e.g., Six Sigma at Motorola & GE).
From the Theoretical to the Practical
When this occurs, an innovative technology formed through research can be said to transition from the theoretical into practical use by a wider set of engaged researchers and actual practitioners.
In so doing, the technology moves from the lab into practical applications designed to solve real, sticky, wicked problems where those problems exist.
Standard Project Approach
See a Problem – Solve a Problem
Zero defect, Eliminate Waste, Eliminate Variation
- Cultural Mindset -
The methods used to “get things done”…
CRISP-DM
KDD Process
Agile
6 Sigma DMAIC
Standard Project Approach
See a Problem – Solve a Problem
Zero defect, Eliminate Waste, Eliminate Variation
Data Science Project Approach
Gain Insight – Competitive Intelligence
Revel in the Variation
Search, Emergent, Experiment, Model
- Cultural Mindset -
Mullarkey & Hevner, 2017
Creating competitive intelligence from inferences and hidden insights in massive data doesn’t happen by accident, it happens by Design!
A (Design Science) method to the (Data Science) madness…
Our work in situ with a Fortune 100 global advanced manufacturing company used an action design research approach to co-develop a methodological approach to the conduct of big data science.
We identified, implemented, and evaluated a data science method that can now be combined with the state-of-the-art GUIs of various data science tools and existing computing capability to investigate the massive data present in most organizations, allowing those organizations to develop scientifically sound inferences from the data.
We find that this transition to the scholarly practice of the data scientist can be accelerated through the application of the design science research method in an action research approach with practitioners.
And that a combination of the engaged researcher and the scholarly practitioner generates knowledge that informs research and practice.
In Data Science it is not surprising that the majority of time is spent Diagnosing and Designing
In Data Science, we iteratively build and evaluate artifacts …
Putting a method to the DS “madness”…
Approach
Recognized: Digital Innovation & Transformation can be a Method for the Deployment of Innovative Data Science Tools
Apply Elaborated ADR approach to define Data Science Project Method
Mullarkey & Hevner, “Entering Action Design Science Research”, 2015
Mullarkey & Hevner, European Journal of Information Systems, pending (Special Edition on Design Science)
Develop an adult-learning-based curriculum to teach a DS method, tools, and their application to a real question involving “interesting” data, to gain insights that improve one or more key decisions
Iterative Stages and Steps
Within Each DS Stage, DS projects must iteratively build and evaluate artifacts…
PLAN/Formulate the Problem
BUILD an Artifact
EVALUATE the Artifact
REFLECT on the Outcomes
Identify and Communicate LEARNings
DS Projects tend to flow from DS stage to stage over time…
DIAGNOSE
DESIGN
IMPLEMENT
EVOLVE
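The stage and step structure above can be sketched as a small loop. This is only an illustration of the iteration pattern, not the authors' tooling: the names `run_stage`, `run_project`, `Iteration`, and the `artifact_ok` callback are assumptions introduced here, and the sketch moves strictly forward even though real projects also iterate backwards between stages.

```python
from dataclasses import dataclass, field

STAGES = ("DIAGNOSE", "DESIGN", "IMPLEMENT", "EVOLVE")
STEPS = ("PLAN", "BUILD", "EVALUATE", "REFLECT", "LEARN")


@dataclass
class Iteration:
    stage: str
    number: int
    notes: dict = field(default_factory=dict)  # one free-text note per step


def run_stage(stage, artifact_ok, max_iterations=4):
    """Repeat the five steps within one stage until the artifact is accepted.

    `artifact_ok` is a hypothetical callback: it receives the iteration
    number and returns True when the EVALUATE step accepts the artifact.
    """
    log = []
    for n in range(1, max_iterations + 1):
        iteration = Iteration(stage, n)
        for step in STEPS:
            iteration.notes[step] = f"{step.lower()} in {stage.lower()} iteration {n}"
        log.append(iteration)
        if artifact_ok(n):
            break
    return log


def run_project(artifact_ok):
    """Flow from stage to stage, iterating within each one."""
    return {stage: run_stage(stage, artifact_ok) for stage in STAGES}


if __name__ == "__main__":
    # Assume, for illustration only, that every stage needs two iterations.
    history = run_project(lambda n: n >= 2)
    for stage, iterations in history.items():
        print(stage, len(iterations))
```

Keeping a note for every step, including REFLECT and LEARN, means dead-end iterations still leave a record behind.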
Iteration Between ADR Data Science Stages
A methodical approach to Data Science
Mullarkey & Hevner, “Elaborating Action Design Research”, European Journal of Information Systems, pending Special Edition on Design Science Research.
Problem Formulation & Planning
Artifact Build
Evaluation
Reflection
Learning
Iteration Within each ADR Data Science Stage
Key: Artifact; Abstraction; -------------------- Build & Evaluate
Abstraction: A Data Science “Artifact”
Artifacts are:
Constructs, models, methods, instantiations, systems
Data sets, RFP guidelines, requirements definition, linkage table, cluster analysis, map BOMs, visualization of data, decision trees, recommender algorithms, scripts, queries, dashboards, reports, sentiment analysis, correlation matrix … new decision support systems, new recommender systems, new information systems, new technologies
One or more Artifacts are built and evaluated with every iteration within each DS Stage
Example 1: Blog post engagement on the company site (10,000 unique blogs)
DIAGNOSIS
Step 1: Problem Domain
• Thousands of blogs but no idea how to respond to each new blog and use it to improve sales.
• Key insight: Blogs must be coming in from vendors, customers, potential customers, employees, potential employees, analysts, etc.
• Marketing believes that “nurtured” leads with targeted content increase demand for that service by 20%
Step 2: Data
• External information: unique blogger identifying information
• Internal: text content of blog, type of blog, page views, emails sent, number of visits
• Collect, clean, merge data, build data dictionary
• Attributes include email address, country, lifecycle stage, emails opened, page views, social media clicks, and the text of the blog itself – entered in SQL & Hadoop
Step 3: Text Mining
• RapidMiner – unsupervised associative rule cluster analysis
• Multiple iterations to validate clustering model performance
• Multiple iterations to supervise the grouping of clusters into “subscribers
Note: multiple iterations occurred in each step, often with different tools.
DESIGN
Step 1: Design Communication Strategies for each Segment
• Five segments – Five strategies
• Test each message with sample of blog subscribers
• Complete A/B testing/experimentation
Step 2: Design Automated Method to Classify Next Blog into a Segment
Step 3: Design Learning Algorithm to Measure Subscriber Reaction
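The A/B testing step in the Design stage can be illustrated with a two-proportion z-test, a standard way to compare response rates between two message variants. The slides do not say which test the project used, so this is an assumed technique, and the function name and the counts in the example are made up for illustration.

```python
from math import erfc, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in response rates.

    conv_a / n_a: responders and recipients for message A (hypothetical inputs).
    conv_b / n_b: responders and recipients for message B.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("response rates of exactly 0% or 100% cannot be tested")
    z = (p_b - p_a) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal
    return z, p_value


if __name__ == "__main__":
    # Made-up counts: 100 of 1000 subscribers responded to A, 120 of 1000 to B.
    z, p = two_proportion_z_test(100, 1000, 120, 1000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the difference between the two messages is unlikely to be chance alone; with five segments, the same test would be run once per segment.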
Evidence: Multiple Iterations in Each Elaborated ADR Stage with Artifacts Created & Evaluated (Two Cohorts)
Use Case   Diagnose   Design   Build   Implement   Backwards   Artifacts Created & Evaluated
1          2          1        0       0           1           4
2          5          2        1       0           0           10
3          2          2        2       0           2           8
4          6          4        2       0           0           10
5          8          1        0       0           1           8
6          2          2        1       0           0           4
7          6          2        0       0           1           7
8          5          1        1       0           4           8
9          22         10       3       0           10          25
10         2          1        0       0           1           4
11         2          2        0       0           1           8
12         3          1        1       0           0           6
13         20         3        1       0           7           5
Total      83         32       12      0           28          107
Mean       6.38       2.46     .92     0           2.15        8.23
Principles of an Innovative approach using Elaborated ADR for Data Science
A Search Process: Iterative, Guided, Emergence
Revel in the Variation
Educate Domain Experts to use the tools
Value Innovative Artifact Creation (Build) and Evaluation
90% of the Perspiration on Diagnosing & Design prior to Implement & Evolve
There will be dead-ends: essential to capture reflection and learnings
Contributions to knowledge occur with each iteration – communicate widely
Initial Conclusions & Discussion
Elaborated Action Design Research Method is a Digital Innovation that contributes to the conduct of Data Science Projects
Principles of eADR are Congruous with the Conduct of Data Science
Citizen Data Scientists will become the principal actors in the Conduct of Data Science
Represents a Cultural Shift for most Companies (can be aided by the inherent principles in eADR)
Focus on Artifact build and evaluation:
Helps bridge the “action” gap between Cultures – things get done
Emphasizes iterative abstraction
Values the contribution that occurs long before completed instantiations
What we’ve found so far…
Sustaining and Growing the Citizen Data Science Contribution to the Corporation
Crowd Sourcing “Interesting” Data Science Questions
Monthly Reporting, Artifact Generation, and Dashboard
Repository of Artifacts
Governance and Cultural Shift
Crowd Sourcing Interesting Data Science Project Questions
Create an internal, intra-organizational social network of Citizen Data Scientists (Slack or Yammer based)
Encourage them to post their Data Science Project questions, problem domains, and solution domains on the network
Gamify participation
Crowd source data science and design science thinking about the posted Data Science Project proposals
Use this to refine the Citizen Data Scientist author’s thinking and framing
Identify any similar projects already performed
Suggest approaches, artifacts, data sets that might aid in the investigation and inquiry into the question being asked
Doesn’t rely on a few executives to approve/reject – uses the collective intelligence to encourage a wider variety of data science projects that are well designed and well performed.
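One of the steps above, identifying similar projects already performed, could be partly supported by a simple keyword-overlap check on posted questions. The sketch below uses Jaccard similarity over keyword sets; the stopword list, the 0.2 threshold, and the function names are illustrative assumptions, and nothing here depends on Slack or Yammer.

```python
import re

# A small, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "of", "to", "in", "for", "and", "or", "is", "are",
             "we", "our", "do", "does", "how", "what", "which", "can", "by", "on"}


def keywords(text):
    """Lowercase the text and keep the distinct non-stopword tokens."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_projects(question, past_projects, threshold=0.2):
    """Return (score, project) pairs at or above the threshold, best match first."""
    q = keywords(question)
    scored = [(jaccard(q, keywords(p)), p) for p in past_projects]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


if __name__ == "__main__":
    past = ["Which suppliers cause late shipments",
            "Cluster blog subscribers into segments"]
    for score, project in similar_projects("How do we cluster blog posts into segments", past):
        print(f"{score:.2f}  {project}")
```

A check like this only surfaces candidates; the network's members still judge whether an earlier project is actually relevant.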
Sustaining… monthly report by project.
Data Science Stage: ______________
Step 1: Macro Activity ________
• Iteration 1:
• Iteration 2:
• Iteration 3:
• Iteration 4:
Step 2: Macro Activity ________
• Iteration 1:
• Iteration 2:
• Iteration 3:
• Iteration 4:
Step 3: Macro Activity ________
• Iteration 1:
• Iteration 2:
• Iteration 3:
• Iteration 4:
Columns per iteration: Artifact Built | Artifact Evaluation | Artifact Repository
Sustaining…monthly leadership dashboard
Use Case   Diagnose   Design   Build   Implement   Backwards   Artifacts Created & Evaluated
1          2          1        0       0           1           4
2          5          2        1       0           0           10
3          2          2        2       0           2           8
4          6          4        2       0           0           10
5          8          1        0       0           1           8
6          2          2        1       0           0           4
7          6          2        0       0           1           7
8          5          1        1       0           4           8
9          22         10       3       0           10          25
10         2          1        0       0           1           4
11         2          2        0       0           1           8
12         3          1        1       0           0           6
13         20         3        1       0           7           5
Total      83         32       12      0           28          107
Mean       6.38       2.46     .92     0           2.15        8.23
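A dashboard of this shape could be assembled from the monthly project reports by counting logged iterations per column and then adding Total and Mean rows. The sketch below assumes each report reduces to a list of column labels, one per iteration, plus an artifact count; that input format, the function names, and the two sample use cases are hypothetical and are not the program's data.

```python
from collections import Counter
from statistics import mean

COLUMNS = ("Diagnose", "Design", "Build", "Implement", "Backwards")


def dashboard_row(iteration_log, artifacts):
    """Count iterations per column and append the artifact count."""
    counts = Counter(iteration_log)
    return [counts.get(column, 0) for column in COLUMNS] + [artifacts]


def summarize(rows):
    """Return the Total and Mean rows for a list of dashboard rows."""
    columns = list(zip(*rows))
    totals = [sum(column) for column in columns]
    means = [round(mean(column), 2) for column in columns]
    return totals, means


if __name__ == "__main__":
    # Two made-up use cases, for illustration only.
    rows = [
        dashboard_row(["Diagnose", "Diagnose", "Diagnose", "Design", "Backwards"], 6),
        dashboard_row(["Diagnose", "Design", "Design", "Build"], 3),
    ]
    totals, means = summarize(rows)
    print("Total", totals)
    print("Mean ", means)
```

Deriving the summary rows from the per-project rows keeps the leadership view consistent with the underlying monthly reports.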
Establish a Central Repository of Built and Evaluated Artifacts
Use a single Box, Dropbox, or other internal-access-only, cloud-based repository of artifacts.
Segment artifacts by the type of artifact, for example:
Data sets, RFP guidelines, requirements definition, linkage table, cluster analysis, map BOMs, visualization of data, decision trees, recommender algorithms, scripts, queries, dashboards, reports, sentiment analysis, correlation matrix … new decision support systems, new recommender systems, new information systems, new technologies
Use links where appropriate to data sets and built systems to avoid duplication of storage on the system.
Link the repository to the Citizen Data Science Social Network as a reference useable by all.
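The repository guidance above (segment by artifact type, store links rather than copies) can be sketched as a small in-memory index. The class and field names are assumptions made for illustration, and the links in the example are placeholders; a real repository would sit on Box, Dropbox, or a similar service as the slide suggests.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    name: str
    kind: str     # e.g. "cluster analysis", "dashboard", "data set"
    project: str
    link: str     # pointer to where the artifact lives, not a copy of it


class ArtifactRepository:
    """Index of built and evaluated artifacts, segmented by type."""

    def __init__(self):
        self._by_kind = defaultdict(list)
        self._links = set()

    def register(self, artifact):
        """Add an artifact unless its link is already indexed; return True if added."""
        if artifact.link in self._links:
            return False
        self._links.add(artifact.link)
        self._by_kind[artifact.kind].append(artifact)
        return True

    def by_kind(self, kind):
        """Return a copy of the artifacts registered under one type."""
        return list(self._by_kind.get(kind, []))


if __name__ == "__main__":
    repo = ArtifactRepository()
    repo.register(Artifact("Blog segments v1", "cluster analysis", "Blog engagement",
                           "placeholder://blog-segments-v1"))
    print([a.name for a in repo.by_kind("cluster analysis")])
```

Rejecting a link that is already indexed is one simple way to honor the "avoid duplication of storage" point.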
Governance and Cultural Shift
Establish multi-disciplinary, multi-national data science design council (DSCC).
One simple goal: promote the creation of competitive intelligence that benefits the company through the methodical investigation of data by design.
Members of the DSCC must be Citizen Data Science trained
Executive program or the Practitioner program
The DSCC provides oversight for:
Citizen Data Scientist training in the US, Malaysia, and China (elsewhere as needed)
Reviews the monthly reporting
Participates in the Crowdsourcing Data Scientist Social Network
Reports Quarterly to Executive Leadership
Identifies “Wins” and celebrates successes company wide.
Assures the quality of Citizen Data Science Project work through diligent application of the Design Science Method.
Conclusion
It’s Time for the Citizen Data Scientist:
The Data Science Tools have moved from the Lab to the Practice.
Citizen (Domain) Data (Variation in Interesting Data) Scientist (Methodical).
The User Interfaces make the Tools “Fungible” to Domain Experts
The Data Exists in exciting ways like never before (unstructured, populations, APIs, public)
Computing is (relatively) cheap and accessible to All.
The Method to Conduct Data Science Exists and can be taught and learned to improve project quality
Elaborated ADR Method – iterative, guided emergence, reflection, learning, to build and evaluate artifacts that make contributions to knowledge…
Citizen Data Science is a natural, complementary – but different – progression from Deming Principles, TQM (Zero Defect), JIT (Eliminate Waste), Six Sigma (Eliminate Variation).
Empower domain experts to explore, confirm, model, and instantiate systems that generate competitive intelligence.
Learn to “Revel in the Variation”.
Use intelligent tools to see what is invisible to the human eye.
Digital Innovation results from the integration of DSR and DS.
Provides the Method to conduct scientific investigation
Evidence of success through real projects with citizen data scientists.
Build a culture willing to explore and reward artifact generation to find hidden insights in “interesting” data.
Workshop I
Brainstorm an interesting DS question.
Area of Inquiry?
Key Insight?
Key Competitive Intelligence Possible?
Inter-disciplinary team to explore?
Locus and nature of data involved?
Workshop II
Reflection and Feedback on Design Science approach to Data Science
What resonates? Why?
Where are the gaps?
Will you use it and recommend it to others?