December 2015 - TDWI Checklist Report - Seven Best Practices for Adopting DWA



TDWI CHECKLIST REPORT


Seven Best Practices for Adopting Data Warehouse Automation

By David Loshin

Sponsored by: TimeXtender

JANUARY 2016

TABLE OF CONTENTS

FOREWORD

NUMBER ONE Analyze the cycle time for satisfying business requests

NUMBER TWO Devise a value model for comparing data warehousing alternatives

NUMBER THREE Architect a hybrid environment

NUMBER FOUR Be agile in ingesting new data sources

NUMBER FIVE Train data engineers to collaborate with business users

NUMBER SIX Support business self-service

NUMBER SEVEN Rethink the role of the BI competency center

CONSIDERATIONS Developing a feasible integration plan for data warehouse automation

ABOUT OUR SPONSOR

ABOUT THE AUTHOR

ABOUT TDWI RESEARCH

ABOUT TDWI CHECKLIST REPORTS

© 2016 by TDWI (The Data Warehousing Institute™), a division of 1105 Media, Inc. All rights reserved. Reproductions in whole or in part are prohibited except by written permission. E-mail requests or feedback to [email protected]. Product and company names mentioned herein may be trademarks and/or registered trademarks of their respective companies.

555 S Renton Village Place, Ste. 700, Renton, WA 98057-3295

T 425.277.9126 F 425.687.2842 E [email protected]

tdwi.org

FOREWORD

The word agile conveys a number of meanings. Its simple definition as an adjective is “quick and well-coordinated,” but the term has taken on additional meaning in the context of system design and development, largely in terms of rapid development, increased partnership between information technology (IT) and business teams, and leveraging teamwork to achieve short-term objectives that build toward solving more complex business challenges.

Applying agile development methodologies to data warehousing promises a number of benefits: quicker development and movement into production, faster time to value, reduced start-up and overhead costs, and simplified access for business users. Yet there are prerequisites for transitioning to an agile approach to data warehousing:

• Evaluate the existing environment to identify the best opportunities for achieving the benefits of adoption

• Establish good practices for leveraging more agile technologies where possible

• Identify the right technologies to facilitate that transition

There are a number of technologies that accelerate design and development, improve cycle time in producing reports and analyses, and enhance IT-business collaboration. Some are platform oriented, such as columnar databases, in-memory computing, and Hadoop, all of which seek to improve analytical results through faster performance. Alternatively, data warehouse automation (DWA) tools blend user requirements and repeatable processes to automatically generate the components of a data warehouse environment. These tools simplify the end-to-end production of a data warehouse, encompassing the entire development life cycle: source system analysis, design, development, generation of data integration scripts, building, deployment, generation of documentation, testing, support for ongoing operations, impact analysis, and change management.

We are rapidly moving away from the monolithic, single-system enterprise data warehouse and toward a hybrid environment that uses the most appropriate technologies to address specific data challenges. That environment will encompass many components and will benefit from reduced complexity through the use of tools like DWA. This checklist discusses seven practices for determining the value proposition of adopting DWA and establishing the foundation that will ease its adoption.

NUMBER ONE

ANALYZE THE CYCLE TIME FOR SATISFYING BUSINESS REQUESTS

The success of the conventional IT management approach to data warehousing and business intelligence (BI) has opened the door for a growing population of data warehouse consumers, both within and outside of the organization. Although many of these consumers’ needs are met by existing reports or by providing access for straightforward queries, a combination of factors has created a bottleneck in rapidly addressing business-user demands. Many business analysts are becoming more sophisticated in their investigations, requiring additional consulting from their IT counterparts to develop data extracts and reports.

At the same time, though, IT budgets and staffing remain constrained. The result is that scheduling limited IT consulting resources elongates the time from when a business user requests a data product to the time when that data product is delivered (“cycle time”).

There are two main risks of long cycle times. First, the data product may be delivered after the window of opportunity for acting on its results has closed. Second, frustrated users may abandon the enterprise data warehouse and adopt their own “shadow” reporting and analytics tools and methods, bypassing any governance procedures intended to ensure enterprisewide consistency.

Long cycle times and development bottlenecks lead to missed business opportunities. Therefore, increasing agility by eliminating those bottlenecks will increase the benefits of your reporting and analytics investment.

To find where those bottlenecks hinder productivity, analyze the end-to-end process for satisfying BI and analysis requests. It is likely that the source of the logjam is in the design-develop-test loop for new reports and extracts; of course, automation tools can reduce or even eliminate the development blockage and speed time to value.

Analyzing the report development cycle time provides a benchmark for the time and resources required (in general) to satisfy business needs. This benchmark can be used as the starting point for optimizing existing processes as well as providing a metric for evaluation of how DWA tools can speed time to value.
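
To make the benchmark concrete, here is a minimal sketch (in Python) of how the cycle-time metric might be computed from a request log. The log structure, field names, and sample dates are assumptions made for illustration; they are not part of the TDWI checklist or any specific tool.

# A minimal sketch: compute cycle-time statistics from a hypothetical log of
# business requests. "requested" and "delivered" are assumed field names.
from datetime import datetime
from statistics import median

requests = [
    {"id": "R-101", "requested": "2015-09-01", "delivered": "2015-09-29"},
    {"id": "R-102", "requested": "2015-09-03", "delivered": "2015-10-21"},
    {"id": "R-103", "requested": "2015-09-10", "delivered": "2015-09-24"},
]

def cycle_time_days(item):
    """Days elapsed between the business request and delivery of the data product."""
    fmt = "%Y-%m-%d"
    start = datetime.strptime(item["requested"], fmt)
    end = datetime.strptime(item["delivered"], fmt)
    return (end - start).days

durations = [cycle_time_days(r) for r in requests]
print("median cycle time (days):", median(durations))
print("longest cycle time (days):", max(durations))

Tracking the same statistic before and after a DWA pilot gives a direct measure of whether the tool actually shortens time to value.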



NUMBER TWO

DEVISE A VALUE MODEL FOR COMPARING DATA WAREHOUSING ALTERNATIVES

Adjustments to the data warehousing tools and platform infrastructure can potentially reduce development bottlenecks and speed time to value. This decision triggers evaluating vendors with products that purport to deliver on that promise, selecting candidate alternatives, and choosing one (or more) to integrate within the enterprise.

Balance the benefits and costs of using existing data warehouse systems versus introducing newer technology. Recognize that introducing new technology into the organization requires more than just purchasing the license and installing the tool. It also requires design and development time, an integration effort, training to empower product users, and a communications plan to transition legacy users to modernized platforms. Any plan for data warehouse environment modernization must incorporate the cost and resources needed to support those tasks in order to assess the potential return on investment and to compare and contrast candidate technologies.

Develop a value model for comparing data warehousing alternatives, both against the existing platform and against each other. The key is to select the right variables for comparison that will lead to greater agility, lower costs, and better outcomes. Some variables for comparison include:

• Application development complexity. How easy is it to design, configure, and deploy a new data warehouse or add new subject areas within an existing data warehouse environment?

• Application development time. How long does it take for a data warehouse to become operational, focusing on the end-to-end process of design, development, and implementation?

• Skills requirements. What types of skills are required by the development team members, and how long does it take to acquire those skills?

• Report development turnaround time. How long is the cycle time for new reports?

• End-user ease of use. How easy is it to empower data consumers to use the developed data warehouse without IT support?

• Resource requirements. What are the necessary resources for implementation?

• Cost. What are the accumulated start-up and operational costs?

This value model will provide quantitative comparisons to guide technology selection and is likely to highlight the value of DWA tools.
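
One common way to make such comparisons concrete is a weighted scoring model. The sketch below is only an illustration under assumed weights and sample 1-to-5 ratings; the weights, ratings, and candidate names are hypothetical, not TDWI-recommended values.

# A minimal weighted-scoring sketch for the value model. Higher ratings mean a
# better outcome on that variable; weights reflect assumed business priorities.
weights = {
    "development complexity": 0.20,
    "development time": 0.20,
    "skills requirements": 0.10,
    "report turnaround": 0.20,
    "ease of use": 0.10,
    "resource requirements": 0.10,
    "cost": 0.10,
}

candidates = {
    "existing platform": {
        "development complexity": 2, "development time": 2, "skills requirements": 3,
        "report turnaround": 2, "ease of use": 3, "resource requirements": 3, "cost": 3,
    },
    "DWA-based alternative": {
        "development complexity": 4, "development time": 5, "skills requirements": 4,
        "report turnaround": 4, "ease of use": 4, "resource requirements": 4, "cost": 3,
    },
}

def score(ratings):
    # Weighted sum across the comparison variables listed above.
    return sum(weights[variable] * ratings[variable] for variable in weights)

for name, ratings in sorted(candidates.items(), key=lambda item: -score(item[1])):
    print(f"{name}: {score(ratings):.2f}")

Whatever weights an organization chooses, the point is to score every candidate, including the existing platform, against the same set of variables.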

NUMBER THREE

ARCHITECT A HYBRID ENVIRONMENT

The excitement over new technologies can sometimes overwhelm common sense. Organizations have invested significant amounts of money and staff effort in architecting, developing, and moving their existing data warehouse and BI platforms into production. It would be surprising, risky, and generally unwise for any organization to completely rip out a trusted legacy data warehousing platform and replace it with systems generated using DWA tools or to move directly to cloud-hosted data warehousing providers.

A more conservative approach would embrace a transition strategy that incrementally migrates data warehousing and BI applications from legacy platforms to more agile environments. Start by establishing an innovation lab within the existing environment in which new techniques can be piloted and evaluated for interoperability with existing systems. This allows different approaches to be considered while constraining the alternatives to ones that are compatible with the production environment.

Design a hybrid data warehousing architecture that accommodates the introduction of new technologies and emerging agile development paradigms while maintaining the production operations of established conventional systems. Employ virtualization methods to provide layers on top of existing platforms, abstracting and differentiating the functionality from its implementation.

Designing a hybrid architecture provides the flexibility to integrate different data warehousing approaches that have already been vetted. In turn, adopting the right technology mix allows the application teams to develop facades that abstract the underlying capabilities. This allows for reengineering without disrupting production systems, enables dual operations during a testing period to verify that the new approaches are trustworthy, and facilitates seamless transitions for the user community when the time comes to migrate applications.
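
The facade idea can be pictured with a small sketch. The classes and query below are invented for illustration; they stand in for whatever legacy platform and DWA-generated environment an organization actually runs, and the dual-run comparison is one simple way to build trust during a testing period.

# A minimal facade sketch: applications query one interface while a candidate
# platform runs in parallel and discrepancies are flagged. All names are hypothetical.
class LegacyWarehouse:
    def run(self, query):
        return [("2015-Q3", 1250)]   # stand-in for a real result set

class AutomatedWarehouse:
    def run(self, query):
        return [("2015-Q3", 1250)]   # stand-in for the DWA-generated environment

class WarehouseFacade:
    def __init__(self, primary, candidate=None):
        self.primary = primary       # system of record today
        self.candidate = candidate   # new platform under evaluation, if any

    def run(self, query):
        result = self.primary.run(query)
        if self.candidate is not None:          # dual operation during testing
            trial = self.candidate.run(query)
            if trial != result:
                print("discrepancy detected for:", query)
        return result

facade = WarehouseFacade(LegacyWarehouse(), AutomatedWarehouse())
print(facade.run("SELECT quarter, revenue FROM sales_summary"))

When the candidate platform proves trustworthy, it can be promoted to primary without changing the applications that call the facade.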

NUMBER FOUR

BE AGILE IN INGESTING NEW DATA SOURCES

Our world has evolved into one where new, diverse sources of massive amounts of data are emerging every day. From an analytics perspective, the impact of the explosion of data sets originating outside the typical enterprise is twofold. On the one hand, there is a growing availability of information that can inform internal subject area profiles, such as enhancing customer behavior information by analyzing streaming social media posts. On the other hand, the broad variety of both structured and unstructured data formats creates significant complexity for the data warehouse professionals who are unaccustomed to programming with Web services and APIs.

The conventional approach to data warehousing focuses on ingesting one or two data sets at a time, originating from sources inside the organization. However, as the speed, volume, and diversity of externally sourced data grow, the organization must move faster in absorbing data sources and putting those data streams to productive use.

Integrating new data sources requires that:

• The incoming data set is profiled as part of a discovery process

• Representative models are designed for data ingestion

• The incoming data model elements are mapped to existing data warehouse data elements

• Definitions are captured (or inferred) to ensure semantic consistency between new data sources and existing data models

• The data warehouse model is augmented to absorb any newly defined data elements of interest from the new data source

• Transformations are programmed to align incoming data with the corresponding data elements in the existing warehouse model

• The incoming data sets are continuously monitored to verify that the data exchange interface has not changed

Taking these steps prior to ingestion requires effective project management, but ensuring the scheduling, operations, and fidelity of production intake processes may overwhelm teams that choose to oversee those tasks manually. Data warehouse automation simplifies these processes by automating the discovery and alignment of data source metadata with existing warehouse metadata and by orchestrating data ingestion, transformation, and loading. The generated utilities are effectively self-documenting, enabling developers to understand what the generated code does.
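
As a small illustration of the discovery and monitoring steps above, the following sketch profiles a hypothetical delimited feed and fingerprints its column layout so that a later run can flag a changed data exchange interface. The file names are assumptions, and a DWA tool automates far more than this; the sketch only shows the underlying idea.

# A minimal profiling/monitoring sketch for an incoming data set.
import csv
import hashlib

def profile(path):
    """Profile a delimited file: columns, row count, fill rates, and a layout fingerprint."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames or []
        non_empty = {c: 0 for c in columns}
        rows = 0
        for row in reader:
            rows += 1
            for c in columns:
                if (row.get(c) or "").strip():
                    non_empty[c] += 1
    return {
        "columns": columns,
        "rows": rows,
        "fill_rates": {c: (non_empty[c] / rows if rows else 0.0) for c in columns},
        "fingerprint": hashlib.sha256("|".join(columns).encode()).hexdigest(),
    }

baseline = profile("social_feed_day1.csv")   # hypothetical file names
latest = profile("social_feed_day2.csv")
if latest["fingerprint"] != baseline["fingerprint"]:
    print("interface change detected: column layout differs from the baseline profile")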

NUMBER FIVE

TRAIN DATA ENGINEERS TO COLLABORATE WITH BUSINESS USERS

Adopting an agile methodology for data warehouse development suggests that the days of the IT data practitioner as the data warehouse gatekeeper are over. Business users are significantly more knowledgeable than they were during the early days of data warehousing, and they have a much lower dependence on IT staff to meet many of their more mundane needs when it comes to developing reports or performing simple queries.

Yet, as more enlightened business analysts devise sophisticated analyses, the burden on IT staff only increases beyond the typical development, operations, and maintenance of the data warehouse platforms. Adopting DWA tools to support building and managing data warehouses reduces the IT staff’s burden for design and development. This provides more time for IT staff members to focus on helping business users evaluate their specific business problems and on how reporting and analysis can solve those problems (rather than delivering reports in a virtual vacuum).

According to the Agile Alliance, the agile software development methodology emphasizes “close collaboration between the programmer team and business experts; face-to-face communication (as more efficient than written documentation); frequent delivery of new deployable business value; tight, self-organizing teams.”1 Fostering increased developer-user communication can reduce the cycle time for developing reports and more complex analyses. Such communication can also lead to increased user satisfaction.

Train your data professionals to actively engage business users, effectively solicit business requirements, and (together with the business users) translate those requirements into reports and analyses that actually meet business needs. By relying on tools that can automate data discovery, warehouse modeling, and report creation, the collaborative knowledge transfer between IT staff and their business partners will help business analysts become more self-sufficient. Business-user self-sufficiency triggers a virtuous cycle: it further reduces the IT burden, freeing those resources to work with other business users, spreading self-sufficiency, and enabling ever more sophisticated analytics.


1 “What Is Agile Software Development?” downloaded November 6, 2015 from http://www.agilealliance.org/the-alliance/what-is-agile.


NUMBER SIX

SUPPORT BUSINESS SELF-SERVICE

If becoming more agile means encouraging closer collaboration between developers and business experts, we can take that a step further in the data warehousing and BI world to enable the transfer of skills from IT professionals to business experts, making them self-sufficient. In effect, creating a more collaborative environment between the IT/data staff and the business users increases business-user independence and facilitates increased information utilization.

At the same time, empowering self-sufficient business users reduces business dependence on IT staff, reduces IT costs to support enhanced data use, and frees IT resources to focus on BI functionality, improved analytical precision, and expanded analytical services.

Self-service BI leverages intuitive user interfaces and data accessibility functionality that both guide the user in designing reports and analyses and exercise controls over what can and cannot be accessed. In addition, self-service BI relies on the availability of a semantic metadata repository that lists business terms and table column names and provides a shared glossary to ensure consistency in use when formulating new reports.

One of the key benefits of self-service BI is that, as business users become more adept at developing their own analyses, they can speed the review cycle for discovery of actionable knowledge. However, supporting this decreased cycle time requires complementary speed in data warehouse development. Agile tools such as DWA can be used to quickly implement the warehouses and marts that give end users the flexibility to configure their own reports while limiting access to data that needs additional protection.
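
The shared glossary mentioned above can be pictured as a simple mapping from governed business terms to warehouse columns, so that self-service reports resolve terms consistently. The terms, definitions, and column names in the sketch below are invented purely for illustration.

# A minimal sketch of a semantic glossary lookup for self-service report builders.
glossary = {
    "net revenue": {
        "definition": "Gross sales less returns, discounts, and allowances",
        "column": "sales_mart.fct_sales.net_revenue_amt",
    },
    "active customer": {
        "definition": "A customer with at least one purchase in the trailing 12 months",
        "column": "customer_mart.dim_customer.is_active_flag",
    },
}

def resolve(term):
    """Return the governed warehouse column for a business term, or raise if ungoverned."""
    entry = glossary.get(term.lower())
    if entry is None:
        raise KeyError(f"'{term}' is not a governed business term")
    return entry["column"]

print(resolve("Net Revenue"))   # -> sales_mart.fct_sales.net_revenue_amt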

NUMBER SEVEN

RETHINK THE ROLE OF THE BI COMPETENCY CENTER

The notion of a business intelligence competency center (BICC) was motivated by the early challenges associated with data warehouse development and deployment, specifically around data integration, population of the data warehouse, and coordination among business users to ensure data quality and consistency and to leverage the investment in data warehouse environment technologies. The original goal of the BICC was to standardize policies for data warehouse use, centralize the support of BI, and develop good practices and repeatable processes across the organization.

Practically, though, in many environments two conflicting facets hampered the success of the BICC. The first is the difficulty of organizing an IT team intended to enforce policies across business function boundaries. The second is the constraints on flexibility imposed as a way to enforce data usage policies and standards. As a result, the BICC often becomes the bottleneck as decisions about technology and architecture are subsumed within demands for report development and configuration. Essentially, strong centralized oversight reduces agility in the organization, limiting what business users gain from unfettered data exploration and discovery.

As organizations look to more agile methods that more quickly allow business users to control their data use, it may be time to reevaluate the goals of the BICC, determine where artificial barriers to knowledge have been institutionalized, and assess ways the BICC can be adapted to meet the needs of an increasingly enlightened business community.

Examine how to revise the BICC’s charter to differentiate between operational decision making associated with day-to-day tactical management of the data warehouse environment and strategic decision making related to evolving the environment over time. Adopt guiding principles that expand data warehouse utilization, such as simplifying the organizational data warehousing architecture, acquiring tools that reduce overall time to value, simplifying the repeatable processes for data warehouse instantiation, and increasing hands-on training and knowledge transfer with business partners. As business use increases, encourage business experts to take on a role within the structure of the BICC so that its management and oversight can be diffused among all stakeholders.

CONSIDERATIONS

DEVELOPING A FEASIBLE INTEGRATION PLAN FOR DATA WAREHOUSE AUTOMATION

Business analysts have acquired a more cultivated awareness of data discovery, data preparation, report formulation, and predictive analytics and are increasingly taking control of their own reports and analyses. These same data consumers are benefitting from the explosion of data sources that can be captured within the analytics environment. However, as the number of data sources increases and the data sets become more diverse, maintaining a competitive advantage hinges on the speed at which new data sources can be acquired, ingested, and prepared for discovery and analysis.

It is unlikely that any organization will completely rip out their existing data warehouses and replace them with any specific new technology. Instead, the future data warehouse environment will be an evolving hybrid environment composed of conventional data warehouse architectures and high-performance components layered on the Hadoop ecosystem, and it will perhaps include specialty appliances or software accelerators such as columnar databases and in-memory computing systems.

As this evolution proceeds, data warehouse automation tools help to maintain agility while supporting the demands of legacy customers. The items on this checklist help in preparing your organization for defining assessment criteria, evaluating the existing environment, determining where there are opportunities for agility, and adopting DWA tools as a way to accelerate the transformation of the enterprise analytics environment.

When evaluating alternatives, the tools guide the developers in source analysis, in capturing requirements, and in automatically generating the data integration, loading, and presentation components of a data warehouse. The next step is to develop an integration plan. Pilot the tools by focusing on developing a data warehouse to support a specific business function or to analyze a specific subject area (like customers or vendors). Align your development methodology with the ways developers use the tools and the ways users expect to use the resulting warehouse. Reflect on lessons learned in terms of rapid design cycles, empowering users with self-service capabilities, and ingesting new data sources. These lessons will inform repeatable processes that will guide the evolution of the future data warehouse environment.


ABOUT TDWI RESEARCH

TDWI Research provides research and advice for data professionals worldwide. TDWI Research focuses exclusively on business intelligence, data warehousing, and analytics issues and teams up with industry thought leaders and practitioners to deliver both broad and deep understanding of the business and technical challenges surrounding the deployment and use of business intelligence, data warehousing, and analytics solutions. TDWI Research offers in-depth research reports, commentary, inquiry services, and topical conferences as well as strategic planning services to user and vendor organizations.


ABOUT THE AUTHOR

David Loshin, president of Knowledge Integrity, Inc., (www.knowledge-integrity.com), is a recognized thought leader, TDWI instructor, and expert consultant in the areas of data management and business intelligence. David is a prolific writer on business intelligence best practices and the author of numerous books and papers on data management, including Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and Graph and The Practitioner’s Guide to Data Quality Improvement, with additional content provided at www.dataqualitybook.com. David is a frequent invited speaker at conferences, Web seminars, and sponsored websites and channels including TechTarget and The Bloor Group. His best-selling book Master Data Management has been endorsed by many data management industry leaders.

David can be reached at [email protected].

ABOUT TDWI CHECKLIST REPORTS

TDWI Checklist Reports provide an overview of success factors for a specific project in business intelligence, data warehousing, or a related data management discipline. Companies may use this overview to get organized before beginning a project or to identify goals and areas of improvement for current projects.

ABOUT OUR SPONSOR

timextender.com

TimeXtender is a world-leading data warehouse automation vendor dedicated to Microsoft SQL Server.

The TimeXtender software, TX DWA, revolutionizes the way a data warehouse is developed and maintained by automating all manual data warehouse processes—from design to development, operation, and maintenance to change management. TimeXtender ensures an improved and inexpensive solution that is fully documented.

The TX DWA software enables medium and large data-driven enterprises to get business intelligence done faster, more efficiently, and with less stress by providing “one truth” that improves decision-making processes and overall business performance, reduces costs, and saves valuable time. TX DWA turns a business intelligence project into a business intelligence process with a flexible solution that expands as the business evolves and grows.

TimeXtender collaborates with VAR and OEM partners across six continents, providing more than 2,600 customers in 61 countries all over the world with its advanced data warehouse automation software.

Why wait days before taking action? With TimeXtender, data is available at your fingertips in mere hours!
