Great Data By Design
Transcript of Great Data By Design
- Great Data. By Design.
- Great data isn't an accident. It happens by design. Ensuring that you have the clean, safe, connected data you need to power confident decisions and effective business processes isn't an easy task. You have to work at it.
- The challenge is that market trends are working against you as data professionals. Your jobs are getting harder.
- Market Trend #1: More Data. In More Places. Moving Faster Than Ever Before.
- The volume, velocity, and variety of data are increasing at an unprecedented pace. The amount of data generated in the world today is doubling every two years. It's the new Moore's law: 0.8 zettabytes in 2009, and a projected 35.2 zettabytes in 2020.
- And, to top it off, we are attaching RFID devices and sensors to everything. Technologies like Hadoop allow us to affordably store vast amounts of data. The power of mainframe computing now fits in the palm of our hands.
- Take jet airplanes, for example. A jet engine has up to 3,000 sensors on it, and they are constantly throwing off data. The amount of data that comes off an engine during a flight ranges from 0.5 TB to 4 TB. And we are only just beginning.
- The volume, variety, and velocity of data will only continue to increase.
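The two figures quoted for Market Trend #1 can be checked against the doubling claim. Assuming steady exponential growth between the 2009 and 2020 data points (a simplification), the implied doubling time works out as follows:

```latex
\frac{35.2\ \text{ZB}}{0.8\ \text{ZB}} = 44 = 2^{n}
\quad\Rightarrow\quad
n = \log_2 44 \approx 5.46 \text{ doublings}

T_{\text{double}} = \frac{2020 - 2009}{n} \approx \frac{11\ \text{years}}{5.46} \approx 2.0\ \text{years}
```

So the two data points are consistent with data volume doubling roughly every two years.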
- Market Trend #2: Data Is Everywhere, and Its Quality Is Questionable
- It's in all the old places and all the new ones, both on-premises and in the cloud. Data is scattered everywhere: mobile devices, social media, CRM applications, ERP applications, message queues, flat files, sensors, obscure legacy systems, databases, unstructured documents, the cloud, Hadoop clusters, and mainframes.
- It used to be that data integration projects were limited, or put at risk, by the cost and performance of CPU, memory, network, or disk. Today, that's no longer the case. Now we're limited by our ability to deal with data that is fragmented and of poor or questionable quality.
- To realize the full value of their data, organizations need to be able to integrate it across the entire enterprise, and data quality needs to be built into the process. Much as manufacturing went through a transition in the 1980s, when quality steps were baked into the manufacturing process itself, the same needs to happen with data.
- Market Trend #3: The Business Wants Self-Service
- Over the last five years, business users have become more technically savvy. Easy-to-use technology (search, social networking, apps, mobility) now plays a large role in their personal lives, helping them do things faster, easier, and better. It has empowered them, and they expect the same experience at work. The business doesn't want to wait for IT to deliver great data. They want to do it on their own.
- There are some pretty cool self-service tools that let business users visualize their data. The trouble is, they only work on a single data set at a time. When the business needs data that crosses business or data set boundaries, it still has to come back to IT.
- Or, worse yet, they come back to IT after they have done all they can with their self-service tools, only to realize that the data they are using is mission-critical and requires mission-critical processes that they can't run on a laptop.
- Self-service can only take the business so far.
- A new way of thinking is needed
- A lot of companies believe that the way to achieve competitive advantage is to focus on their core business processes: "If we're the best at what we do, we can beat the competition." And they aren't entirely wrong.
- They believe that by investing in applications to support those core business processes, they can turn the resulting efficiencies, or the improved service those efficiencies enable, into competitive advantage: "We need an application that will automate and improve our core processes, so we can beat the competition." And they aren't entirely wrong.
- The trouble is that people still think of their business-critical application as a single, monolithic thing.
- That's where they're wrong.
- The reality is that these processes, and the core applications supporting them, aren't a single monolithic thing. Any business process today is highly distributed across multiple business-critical applications.
- And the number of systems and data points that data must flow into or out of is only increasing.
- It is generally true that innovation happens at the edges of boundaries, at the intersection of different disciplines.
- As more data is created across more systems, the ability to integrate and intersect data across those boundaries becomes a critical success factor for the next generation of innovation. It is what lets the business answer questions like these: Do we have all the data we need to support our compliance constraints? Who are our most profitable customers? How can I improve collaboration between suppliers and contractors? How do I accelerate my supply chain? Can we drive efficiencies in our procurement processes? Can we create new information-based services to offer our customers?
- But integrating data is harder than most people think.
- Take the jet aircraft, for example. While the engines may be the same from plane to plane, the data coming off their 3,000 sensors is not controlled by the engine manufacturer. It's controlled by the airlines, and each airline stores those same 3,000 attributes in its own format.
- That means that when data for the same kind of engine is sent back to the manufacturer for analysis, the manufacturer first has to normalize it. What seems like an easy exercise, analyzing data from the same kind of engine, is much harder than it looks. An added challenge is that legacy data never dies and has to be pulled in as well.
- Every data project is like this. It is always harder than anyone thinks, and the number of moving parts is only increasing.
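To make the jet-engine normalization step concrete, here is a minimal Python sketch. Everything in it is hypothetical: the airline names, field names, units, and the canonical `EngineReading` schema are illustrative assumptions, not any real airline's or manufacturer's format. Each source gets its own adapter that maps its field names and units onto one canonical record:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical canonical schema the manufacturer analyzes.
@dataclass
class EngineReading:
    engine_serial: str
    exhaust_temp_c: float
    fan_speed_pct: float

def from_airline_a(raw: Dict[str, object]) -> EngineReading:
    # Hypothetical Airline A format: Celsius, short upper-case keys.
    return EngineReading(
        engine_serial=str(raw["ESN"]),
        exhaust_temp_c=float(raw["EGT_C"]),
        fan_speed_pct=float(raw["N1_PCT"]),
    )

def from_airline_b(raw: Dict[str, object]) -> EngineReading:
    # Hypothetical Airline B format: Fahrenheit, fan speed as a 0-1 fraction.
    return EngineReading(
        engine_serial=str(raw["serial"]),
        exhaust_temp_c=(float(raw["egt_f"]) - 32.0) * 5.0 / 9.0,
        fan_speed_pct=float(raw["fan_fraction"]) * 100.0,
    )

# One adapter per source format; a legacy format would simply be another entry.
ADAPTERS: Dict[str, Callable[[Dict[str, object]], EngineReading]] = {
    "airline_a": from_airline_a,
    "airline_b": from_airline_b,
}

def normalize(source: str, records: List[Dict[str, object]]) -> List[EngineReading]:
    """Convert one source's raw records into the canonical schema."""
    adapter = ADAPTERS[source]  # unknown sources raise KeyError on purpose
    return [adapter(r) for r in records]

if __name__ == "__main__":
    a = normalize("airline_a", [{"ESN": "E123", "EGT_C": 650.0, "N1_PCT": 92.5}])
    b = normalize("airline_b", [{"serial": "E456", "egt_f": 1202.0, "fan_fraction": 0.925}])
    for reading in a + b:
        print(reading)
```

In this design, pulling in another airline's format, or a legacy one, means writing one more adapter and registering it in `ADAPTERS`, without touching the analysis code that consumes `EngineReading`.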
- To overcome this challenge, you have to design great data into your business processes.
- Just like you invest in people, process, and technology for your core business processes, you have to invest in people, process, and technology to integrate the distributed data that supports those processes.
- That is because business agility now depends on data integration agility. And data integration agility depends on getting everyone involved and ensuring that the business and IT have the right tools to collaborate. In fact, we've seen that in companies where the business and IT collaborate, data integration (DI) projects are executed 5x faster than in companies where they don't.
- Considerations for Designing Great Data
- #1 Connect to All Your Data (RDBMS, flat files, XML, Hadoop, NoSQL, social media, mainframe, machine data, and more). Data integration enables you to combine data from many different, rich sources to produce new business information you couldn't get from any single source. Make sure your data integration tools can connect to any data source, current or legacy, including RDBMS, NoSQL, mainframe, text, applications, and so on, and not just the data sources you consume today. It's this universal set of connections that makes it possible to bring all that data together.
- #2 Support the Right Format and Latency (batch, real-time, near real-time; structured, unstructured, semi-structured). In the same way that data integration draws data from many different sources, it must also be able to consume multiple data types, including structured, semi-structured, and unstructured data, in batch and real-time modes. You need a tool that is flexible enough to work wi