Augmented Analytics in R&D - Scientific Computing World · Augmented Analytics tools for scientists...



Augmented Analytics in R&D

WHITE PAPER

We live in a time of unprecedented innovation. Product design cycles keep getting shorter, personalized and custom products are becoming the norm, and external requirements are changing constantly. On top of this, global problems like climate change, clean energy, water, phosphorus and nitrogen flows, plastic waste and other planetary boundaries that are being transgressed [1] call for a redesign of almost everything we are familiar with today.

Research and Development has more capabilities than ever to deliver on these challenges. Current state-of-the-art laboratories are increasingly digital, with extensive automation and robotization. Hundreds of millions of dollars and euros have been invested in high-throughput capabilities that have multiplied our experimental capabilities to such an extent that the main challenge is no longer to do specific experiments but rather to keep up with the analysis of all that data and to decide which experiments to do next. Modern R&D is almost fully digital: data from the entire R&D process is recorded and stored in LIMS, ELN and many other repositories. Still, much of that data is incomplete, lacks the context needed to make it useful in the long term, and ends up in Excel spreadsheets scattered across the organization.

R&D organizations are expected to deliver on the pressure for innovation and to show significant ROI on the investments that have been made in automation and data infrastructure. R&D is required to become smarter, more cost-effective and more predictable. To deliver on this expectation, many people are looking at advanced analytics, predictive analytics, machine learning and artificial intelligence to extract the full potential of all that data. Many of these terms are used interchangeably and lumped together with other buzzwords like big data and IoT. The potential is clearly there, but many companies struggle to make the transition from being aware and reactive to an end-to-end integrated data-driven culture (see figure 1).

From our experience, there are three main challenges that need to be addressed to ascend to analytic maturity, and they are largely independent of the specific business segment a company is in. The first challenge is that, despite the huge investments in sensors, automation and digitization, many companies realize that huge amounts of data are underutilized. There are various reasons, but the main ones are that data is not treated as the asset that it should be and is still very transactional in nature, i.e. data is generated and used for a specific short-term purpose with very little regard for its long-term utility across the organization. Assigning the proper value to all data implies much more discipline in recording not just the particular data point but also all the context needed for reuse at a later time by somebody else. This has significant implications for the whole culture around data generation and storage. One needs to address the consistent naming of things, the definition of ontologies or relationships between data, the data flows and discoverability of data in the context of new projects, the addition of all metadata, etc. In essence, one needs to anticipate the future usage of the data for analytics at the moment it is being created.

The second main challenge that many companies face is limited access to great data scientists. There is a real war for talent going on in this area, and finding and retaining people with the right analytics skills is not easy (see figure 2). This problem is getting worse over time because virtually every business is now looking for people with these skill sets. It is true that many companies have made significant investments in basic training and tools for statistics and design of experiments, but the requirements for predictive analytics, machine learning and digital R&D in general go far beyond these basic competencies.

A third challenge is that even if all the data is available in an organized and structured form, and even if you have access to good data scientists, it is still the domain experts who make the final decisions and can turn insights and predictive models into meaningful action. R&D is iterative and incremental and requires a lot of specific domain expertise to make judgment calls all the time. Data scientists will not rule the world by themselves because they often lack this detailed domain expertise. Often there are significant communication barriers between data scientists and domain experts that are difficult to resolve completely. This happens because data scientists are expected to contribute to many different problems over time and have neither the opportunity nor the background to make all the necessary distinctions between correlation and causation.

In our opinion, the only scalable solution is embedded in the concept of Augmented Analytics [2], where the domain expert is placed at the center and is provided with the right tools and training to generate most of the analytics insights themselves. According to this recent Gartner report, Augmented Analytics marks the next wave of analytics disruption, where domain experts are turned into Citizen Data Scientists who have automated tools to help them prepare their data, find patterns, create predictive models, perform multi-objective optimization, and define and execute virtual experiments using predictive models. In addition, these people should be able to easily operationalize their findings and share the results with their peers and management in the form of Data Stories.

Figure 1: Road to an end-to-end data-driven culture
[Chart: impact on company performance (vertical axis) versus degree of analytic maturity (horizontal axis), with a chasm marked on the road between the stages below.]

AWARE & REACTIVE
• No systematic approach to handling or even looking at data within R&D or production
• Thinking about capturing the data
• Reactive measures in fire-fighting mode

EXPERT-LED DATA-DRIVEN EXCELLENCE
• Local improvements led by experts/champions
• Several successful proof of value implementations with value captured

END-TO-END INTEGRATED DATA-DRIVEN CULTURE
• Roll-out of data-driven improvement projects embraced by domain experts
• Predictable ROI and timing for each proof of value
• Functioning eco-system of external suppliers
• Structured integration of data-driven decision making
• Data-driven approach is a part of onboarding trainings just like safety training

Augmented Analytics platforms have to provide state-of-the-art, robust predictive analytics and machine learning tools with additional layers of functionality that automate the tasks that are now typically done by data scientists. One should not assume that data is perfect but rather that it comes from the real world. This implies that there are mixed data types and that there may be missing data, outliers, correlated data and badly scaled data. In addition, one should not expect the domain expert to be able to select the appropriate algorithm, including the various parameterizations that are required to build models that neither overfit nor underfit.
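As a rough illustration of the kind of check such a platform might automate, the following sketch profiles a small table for mixed data types, missing values, differing scales and outliers. It is not the method of any particular product: the function name `profile_column`, the sample data and the cut-off of 5 median absolute deviations are all assumptions made for the example, and correlated columns are not covered here.

```python
import statistics

def profile_column(values):
    """Summarise one column: kind, missing count, range and robust outliers."""
    present = [v for v in values if v is not None]
    report = {"missing": len(values) - len(present)}
    if not all(isinstance(v, (int, float)) for v in present):
        report["kind"] = "categorical"
        return report
    report["kind"] = "numeric"
    report["range"] = (min(present), max(present))  # reveals badly scaled columns
    med = statistics.median(present)
    mad = statistics.median(abs(v - med) for v in present)
    # Flag values far from the median, measured in median absolute deviations.
    # The cut-off of 5 is an illustrative assumption, not a recommendation.
    report["outliers"] = [v for v in present
                          if mad > 0 and abs(v - med) / mad > 5]
    return report

# Hypothetical experimental records with mixed types, gaps and one typo (600.0).
table = {
    "catalyst": ["A", "B", "A", None, "B", "A"],
    "temp_c": [60.0, 62.0, 61.0, 59.0, 600.0, 60.5],
    "pressure_pa": [101000.0, 102000.0, None, 99800.0, 100900.0, 101300.0],
}

report = {name: profile_column(column) for name, column in table.items()}
for name, summary in report.items():
    print(name, summary)
```

A median-based rule is used here because a single extreme value would inflate a mean and standard deviation and could hide itself; other robust rules would serve equally well.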

Summary

The increasing digitization of laboratories and research environments brings a multitude of new opportunities for more insightful, more productive and more efficient R&D operations. With the more traditional methods of research and experimentation, scientists rely on their expertise and intuition to navigate the vast space of possibilities and discover appropriate solutions in a laborious, resource-intensive and not always successful process. The proliferation of data and the availability of analytical tools have the potential to transform how research and experimentation are done within a wide range of industries, from the life sciences, to basic and advanced chemicals, to food and agriculture.

Digital R&D powered by Augmented Analytics is a more efficient and focused way of doing research and new product development, informed by data insights and by targeted accumulation of knowledge across the entire search space of possibilities to guide decisions. Data-driven research is based on an iterative process (see figure 3) of planning and conducting experiments, collecting and documenting results in a structured way, enriching the data and applying advanced predictive modelling to inform the next steps. Researchers gain insights into which experiments are most worth doing to increase the body of knowledge, and which ingredients and conditions have the most positive effect on the desired outcome. Embracing this approach has many benefits: it accelerates discovery, significantly shortens development cycles, allows focus and improves success rates. Pioneering scientists with deep mathematical and modelling knowledge have been applying similar approaches successfully for many years, but for most researchers this has been out of reach until now. Augmented Analytics tools for scientists and researchers have now reached sufficient maturity and availability to make this process accessible to the wider R&D community.
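The iterative process described above can be sketched in a few lines. This is a deliberately simplified illustration, not the workflow of any specific platform: `run_experiment` is a hypothetical, noise-free stand-in for a laboratory measurement, the quadratic surrogate model is an assumption chosen for brevity, and the candidate temperatures are invented.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2 via the normal equations."""
    rows = [[sum(x ** (i + j) for x in xs) for j in range(3)]
            + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [p - factor * q for p, q in zip(rows[r], rows[col])]
    return [rows[i][3] / rows[i][i] for i in range(3)]

def run_experiment(temp_c):
    """Hypothetical experiment: yield as a function of temperature (no noise)."""
    return 9.0 - (temp_c - 72.0) ** 2 / 100.0

candidates = list(range(40, 101, 5))                       # conditions we could test
measured = {t: run_experiment(t) for t in (40, 60, 100)}   # initial design

for _ in range(3):
    temps = list(measured)
    # Model: fit the surrogate on all results so far (temperature rescaled to ~[0, 1]).
    a, b, c = fit_quadratic([t / 100 for t in temps], [measured[t] for t in temps])

    def predict(t):
        x = t / 100
        return a + b * x + c * x * x

    # Plan: choose the untested condition with the best predicted outcome.
    untested = [t for t in candidates if t not in measured]
    next_temp = max(untested, key=predict)
    # Experiment: run it and add the result to the data set.
    measured[next_temp] = run_experiment(next_temp)

best_temp = max(measured, key=measured.get)
print(best_temp, measured[best_temp])
```

In practice the measurements are noisy and the search space has many dimensions, so a real loop would also balance exploiting the current best prediction against exploring poorly covered regions.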

Author: Guido F. Smits has worked in the field of computational intelligence for 31 years. Prior to joining DataStories Int. NV as chief scientific officer, Guido was a principal research scientist at the Core R&D department of Dow Chemical for 25 years. His expertise spans quantum physics, chemistry, statistical learning theory, machine learning and high-performance computing. He has spent his entire career applying AI to industrial challenges in product design and process optimization.

Figure 2: Growing gap between amounts of data to analyze and available data scientists
[Chart: data volume (vertical axis) versus time (horizontal axis), showing a widening gap between the data available and the capacity to analyse.]


Sign up for a webinar at datastories.com and try our platform for free!


— INTERACTIVE RELATIONSHIP MAPS

How is everything interconnected? Compute linear and non-linear correlations and view the results in interactive relationship maps to gain insight into the complexity of your problem.

— A MINIMUM SET OF DRIVING VARIABLES

From the thousands of variables, which ones really matter? Discover the optimal predictive model with the minimum set of variables sufficient to predict and explain the chosen target variable (KPI).

— INTERACTIVE WHAT-IF SCENARIOS

How should you change the inputs to get optimum output? Simulate the optimal values for your driving metrics to achieve the desired KPI levels.

— MODEL-BASED OUTLIERS

What are the exceptions to the rule? Explore outliers to gain valuable insight about your process and find further opportunities for improvement.
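To make the distinction between linear and non-linear correlation in the first item concrete, here is a minimal sketch that compares Pearson correlation with a rank-based (Spearman) correlation on a relationship that is perfectly monotonic but not linear. The source does not say which measures the platform uses, so the choice of Spearman, the function names and the `dose`/`response` data are assumptions for illustration only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: strength of a linear relationship."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def ranks(values):
    """Ranks 1..n of the values; assumes no ties to keep the sketch short."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: the response grows exponentially with the dose.
dose = [1, 2, 3, 4, 5, 6, 7, 8]
response = [math.exp(d) for d in dose]

linear = pearson(dose, response)      # noticeably below 1: the relation is not a line
monotonic = spearman(dose, response)  # essentially 1: the ordering is preserved exactly
print(round(linear, 3), round(monotonic, 3))
```

A rank-based measure only captures monotonic relationships; detecting non-monotonic dependence would need a different measure, such as mutual information.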


References:

[1] https://en.wikipedia.org/wiki/Planetary_boundaries
[2] Rita L. Sallam, Cindi Howson, and Carlie J. Idoine, "Augmented Analytics Is the Future of Data and Analytics", Gartner report, published 27 July 2017, highlighting the importance and benefits of Augmented Analytics, Augmented Data Preparation and Augmented Data Discovery. https://www.gartner.com/doc/3773164/augmented-analytics-future-data-analytics
[3] DataStories™ platform: datastories.com

Figure 3: Augmented analytics workflow
[Diagram: eight workflow stages, each with a short description.]

DESIGN EXPERIMENTS: Plan and collect the data by optimally varying compositions and process conditions.
DEPLOY MODELS: Robust Prediction Profilers and interactive sensitivity analysis identify future experiments.
ANALYZE MODELS: Robust ensemble-based methods help identify driving factors and build robust predictions.
MODELING: The goal of modeling in the augmented space is to identify descriptors and conditions that drive performance.
ASSEMBLE DATA SOURCES: Organize the data intelligently; think of labelling, ontologies, continuity.
VISUALIZE & RUN STATS: Explore the experimental data visually; identify visible problems, errors, data density, duplicates, etc.
CONSOLIDATE DATA: Add new experimental data to the master source; check true meaning, naming, conditions.
ADD NUMERIC DESCRIPTORS: Critically important step to augment the composition data with computable numerical descriptors.