Jumbune data analyzer
Uploaded by: prachi-gupta
Category: Technology
Transcript of Jumbune data analyzer
1
Jumbune Data Analyzer
2
Agenda
Enterprise Data Lake
Data Analyzer
Data Analysis Challenges
?
3
One Unified System: An Enterprise Data Lake
Data is extracted, transformed and loaded (ETL) from all possible sources into the Enterprise Data Lake through:
• Real-time ingestion
• Micro-batch ingestion
• Batch ingestion
A unified hub makes analysis, management and access of data easier.
An enterprise data lake enables ecosystem tools to collaboratively manage data.
It is a place to store all data in its original fidelity, with the flexibility to run a variety of enterprise workloads.
4
Key elements of an Enterprise Data Lake
• Data Quality – data values as per business KPIs
• Data Profiling – statistical assessment of data
• Data Governance – management of data
• Data Lineage – defines the data lifecycle
• Data Security – protecting data from unauthorized users
BIG DATA
5
Major challenges in Data Analysis
• Incremental imports may ingest bad data
• Analyzing anomalies in HDFS data
• Tracking data quality over time
• Tracing bad data out of billions of rows
• Displaying concise, meaningful results
6
Jumbune’s Data Analyzer
7
Gain better control over Data Analysis
• Quality
• Profile
• Control
• Analyse
Timelines
Violations
Business Rules
Anomalies
• Gives a centralized dashboard for profiling data quality to gain better control
• Leverages Jumbune’s infrastructure to provide remote profiling capabilities
• No data movement required for performing data profiling
• No specialized MapReduce or coding skills are required to validate data.
8
Offering Data Quality and Data Profiling to the Enterprise Data Lake
Data Quality Timeline
• Traces how data quality is preserved over a timeline, even in massive data offloading environments.
• Real-time data quality monitoring tracked against customizable KPIs.
Data Profiling
• Statistical assessment of data values within a data set for consistency, uniqueness and logic.
• Gauges data profiles against the business rules.
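The idea of tracking data quality on a timeline against a customizable KPI can be illustrated with a minimal Python sketch. This is not Jumbune's implementation or dashboard logic; `QualityPoint`, `breaches`, the run labels and the 99% threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QualityPoint:
    """One point on a (hypothetical) data quality timeline."""
    run_label: str          # e.g. an import or batch identifier
    total_records: int      # records checked in this run
    violating_records: int  # records that failed at least one check

    @property
    def clean_ratio(self) -> float:
        # An empty run has nothing wrong with it, so treat it as fully clean.
        if self.total_records == 0:
            return 1.0
        return 1.0 - self.violating_records / self.total_records


def breaches(timeline: List[QualityPoint], kpi_min_clean_ratio: float) -> List[str]:
    """Return the labels of runs whose clean ratio falls below the KPI."""
    return [p.run_label for p in timeline if p.clean_ratio < kpi_min_clean_ratio]


timeline = [
    QualityPoint("run-1", 1000, 5),
    QualityPoint("run-2", 1000, 40),
    QualityPoint("run-3", 0, 0),
]
print(breaches(timeline, kpi_min_clean_ratio=0.99))
```

Storing only per-run counts keeps the timeline small even when each run covers billions of rows; the KPI is applied when reading the timeline, so it can be changed without re-scanning the data.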
9
jumbune
Data Analysis Component
Data Analysis Process
HDFS/NFS
Records Analysis
Data Profiling & Quality Reports
10
Data Quality: Provides a generic way of identifying anomalies
• Validates inconsistencies in data through null checks, data type checks and regular expressions.
• Produces in-depth, record-level data violation reports that can be drilled down to line and field level.
• Lets users specify data quality requirements generically, according to their own data lake.
• Makes seemingly impossible quality checks on a big data lake possible.
• Doesn’t require data to be moved out of Hadoop to check for anomalies.
• Currently, Jumbune supports HDFS and NFS as the data lake.
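The three kinds of checks named on this slide (null checks, data type checks and regular expressions), reported at line and field level, can be sketched in plain Python as follows. This is an illustrative sketch only, not Jumbune's code or configuration format: `Violation`, `validate`, `is_int` and the sample records are assumptions, and a real deployment would run such checks where the data lives rather than on an in-memory list.

```python
import re
from typing import Callable, Dict, Iterable, List, NamedTuple, Pattern, Set


class Violation(NamedTuple):
    line: int    # 1-based line number of the record
    field: int   # 0-based field index within the record
    check: str   # "null", "type" or "regex"
    value: str   # the offending raw value


def is_int(value: str) -> bool:
    """Return True when the value parses as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def validate(
    lines: Iterable[str],
    delimiter: str,
    null_fields: Set[int],
    type_checks: Dict[int, Callable[[str], bool]],
    regex_checks: Dict[int, Pattern[str]],
) -> List[Violation]:
    """Apply null, type and regex checks to each delimited record."""
    violations: List[Violation] = []
    for line_no, raw in enumerate(lines, start=1):
        for idx, value in enumerate(raw.rstrip("\n").split(delimiter)):
            if idx in null_fields and value.strip() == "":
                violations.append(Violation(line_no, idx, "null", value))
                continue  # skip further checks on an empty value
            if idx in type_checks and not type_checks[idx](value):
                violations.append(Violation(line_no, idx, "type", value))
            if idx in regex_checks and not regex_checks[idx].fullmatch(value):
                violations.append(Violation(line_no, idx, "regex", value))
    return violations


records = ["1,alice,alice@example.com", "2,,bob@example.com", "x,carol,not-an-email"]
report = validate(
    records,
    delimiter=",",
    null_fields={1},
    type_checks={0: is_int},
    regex_checks={2: re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")},
)
for v in report:
    print(v)
```

Because each violation carries its line number and field index, a summary report can be drilled down to the exact record and column that failed.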
11
Data Profiling: Provides lake insights
Remote
Centralized
Integrate
Generic
• Statistical analysis of data values present in the enterprise data lake.
• Computes various profiles that help you become familiar with data.
• Evaluates the structure of the data set in the enterprise data lake against a set of business rules.
• Helps determine whether existing data can be used for further analytics.
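As a rough illustration of what a statistical profile of data values might contain, the following Python sketch profiles one field of a delimited data set. It is not Jumbune's profiling algorithm; `profile_column`, its parameters and the chosen statistics are assumptions made for the example.

```python
from collections import Counter
from statistics import mean
from typing import Any, Dict, Iterable


def profile_column(
    lines: Iterable[str], delimiter: str, field: int, top_n: int = 3
) -> Dict[str, Any]:
    """Compute a small profile for one field of a delimited data set."""
    values = []
    for raw in lines:
        fields = raw.rstrip("\n").split(delimiter)
        # A missing field is treated the same way as an empty one.
        values.append(fields[field].strip() if field < len(fields) else "")

    non_null = [v for v in values if v != ""]
    counts = Counter(non_null)
    profile: Dict[str, Any] = {
        "total": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "most_common": counts.most_common(top_n),
    }

    # Add numeric statistics only when every non-null value is a number.
    numeric = []
    for v in non_null:
        try:
            numeric.append(float(v))
        except ValueError:
            numeric = []
            break
    if numeric:
        profile.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
    return profile


rows = ["a,10", "b,20", "c,", "d,20"]
print(profile_column(rows, ",", field=1))
```

Null counts and distinct counts give a quick sense of completeness and uniqueness, which is the kind of familiarity with the data that the slide describes.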
Let’s provision a clean Enterprise Data Lake
Website
• http://jumbune.org
Contribute
• http://github.com/impetus-opensource/jumbune
• http://jumbune.org/jira/JUM
Social
• Follow @jumbune, use #jumbune
• Jumbune Group: http://linkd.in/1mUmcYm
Forums
• Users: [email protected]
• Dev: [email protected]
• Issues: [email protected]
Downloads
• http://jumbune.org
• https://bintray.com/jumbune/downloads/jumbune