Big Data As a service - Sethuonline.com | Sathyabama University Chennai
R. Sethuraman, M.E., (Ph.D.), Assistant Professor,
Faculty of Computing, Dept. of Computer Science Engineering,
Sathyabama University, Chennai.
An Efficient Framework for Data As A Service in Hadoop EcoSystem
www.sethuonline.com
Agenda
• Introduction
- Big Data Analytics
- Hadoop EcoSystem
- Data As a Service
• Literature Survey
• Inference from the Survey
• Problem Defined
• Key Challenges of Problem
• Proposed Methodology
• References
Introduction
Big Data Analytics:
• The process of examining large data sets containing a variety of data types to uncover hidden patterns, unknown correlations and other useful business insights.
• Sources of these large data sets include server logs, social media, mobile devices and sensors. The data is unstructured or semi-structured.
• Traditional relational databases are a poor fit for the unstructured and semi-structured data obtained from these sources.
• This creates the need to move to newer technology such as Hadoop.
• Hadoop is a framework that supports the processing of huge and diverse data sets across clustered systems.
• Hadoop does this with the support of related tools such as YARN, MapReduce, Hive…
• It serves as a central repository for all incoming streams of raw data.
• Hadoop is not a single product; it is a collection of components.
• It is popular for storing, analyzing and quickly retrieving unstructured data in a cost-effective manner.
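As an illustration of the MapReduce model named above, here is a minimal sketch in plain Python. It does not use Hadoop itself; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample log lines are assumptions made for this example only.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce over an iterable of text lines."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical server-log lines standing in for raw input data.
logs = ["error disk full", "warning disk slow", "error network down"]
counts = word_count(logs)
```

On a real cluster the map and reduce steps run in parallel on different nodes; the sketch only shows how the data flows between them.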
Hadoop EcoSystem
Data As A Service [DaaS]
• Data as a service (DaaS) is the delivery of statistical analysis tools or information obtained from large data sets in order to gain a competitive advantage for an organization.
• This is done over an immense volume of unstructured data that is updated on a regular basis.
HOW IT WORKS:
- The data obtained using web crawlers is sent into the Hadoop framework for the following processing:
* Data Storage
* Data Processing
* Data Management
Literature Survey
S.No. | Base Paper | Proposed Work | Limitations
1 | “Service-Generated Big Data and Big Data-as-a-Service”, Zibin Zheng (Dept. of Comput. Sci. & Eng., Chinese Univ. of Hong Kong, Hong Kong, China), Jieming Zhu, M.R. Lyu [2014 IEEE International Congress] | BDaaS provides APIs to access service-generated Big Data and Big Data analytics results. | Heterogeneous data is not handled in BDaaS, and security is not considered in Big Data analysis.
2 | “Towards Cloud Based Analytics As a Service for Big Data Analytics in the Cloud”, Chanchal Yadav, Shullang Wang, Manoj Kumar [IJCSN, Vol 2, Issue 3, 2014, ISSN: 2277-5420] | Proposes the conceptual architecture of CLAaaS, a platform that provides big data analytics as a service in the cloud. | Due to multi-tenancy, compromises are made at the design level; an efficient text processing algorithm is required for efficient data retrieval.
Literature Survey Contd…
S.No. | Base Paper | Proposed Work | Limitations
3 | Wei Fan, Albert Bifet, “Mining Big Data: Current Status, and Forecast to the Future”, SIGKDD Explorations, 2014, Volume 14, Issue 2 | Overview of the architecture and algorithms used for large data sets. These algorithms define the structures and methods implemented to handle Big Data. | Normalization, record linkage and quality measures need to be addressed.
4 | Priya P. Sharma, Chandrakant P. Navdeti, “Securing Big Data Hadoop: A Review of Security Issues, Threats and Solution”, IJCSIT, Vol 5(2), 2014, 2126-2131 | Big data security at the environment level, along with the security issues being dealt with today. | Security of the data sources needs to be ensured so that reliable data can be used for business insights.
Literature Survey
“Service Generated Big Data and Big Data-as-a-Service”
Zibin Zheng (Dept. of Comput. Sci. & Eng., Chinese Univ. of Hong Kong, Hong Kong, China), Jieming Zhu, M.R. Lyu [2014 IEEE International Congress]
This paper describes research on storing and processing the growing amount of service-generated Big Data, and the analysis done by BDaaS for improved analytics.
Issues Addressed:
• A single infrastructure provides functionality for storing and analyzing different types of service-generated Big Data.
• BDaaS provides APIs to access service-generated Big Data and Big Data analytics results.
• Service logs, service QoS and service relationships are exploited to enhance system performance.
Survey Contd…
“Service Generated Big Data and Big Data-as-a-Service”
Zibin Zheng (Dept. of Comput. Sci. & Eng., Chinese Univ. of Hong Kong, Hong Kong, China), Jieming Zhu, M.R. Lyu [2014 IEEE International Congress]
Issues not Addressed:
• Heterogeneous data is not considered in service-generated Big Data while enhancing the quality of service-oriented systems.
• The technology roadmap for the APIs is not synchronized.
• Security issues are not addressed with respect to service providers and BDaaS.
• Pattern matching is not applied to heterogeneous data, which makes data retrieval inefficient.
Literature Survey Contd…
“Towards Cloud Based Analytics As a Service for Big Data Analytics in the Cloud”
Chanchal Yadav, Shullang Wang, Manoj Kumar [IJCSN, Vol 2, Issue 3, 2014, ISSN: 2277-5420]
Issues Addressed:
• This paper proposes the conceptual architecture of CLAaaS, a platform that provides big data analytics as a service in the cloud.
• The platform is equipped with customizable domain-specific software tools and a workflow management system to facilitate big data processing.
– Cognos is used for statistical, business and scientific data analysis
– BigInsights is used for visualization and predictive analytics
– Weka provides graphical user interfaces
Survey Contd…
“Towards Cloud Based Analytics As a Service for Big Data Analytics in the Cloud”
Chanchal Yadav, Shullang Wang, Manoj Kumar [IJCSN, Vol 2, Issue 3, 2014, ISSN: 2277-5420]
Issues not Addressed:
• Compromises are made at the design level due to multi-tenancy.
• Separating the data of different users requires new software.
• Promotion of web collaboration and concerns for data privacy in the cloud.
• Text processing is not handled in the retrieval of heterogeneous data.
Inferences from literature survey
• Heterogeneous data is not handled in BDaaS, and security is not considered in Big Data analysis.
• Security of the data sources needs to be ensured so that reliable data can be used for business insights.
• Normalization, record linkage and quality measures need to be addressed.
• Due to multi-tenancy, compromises are made at the design level, and an efficient text processing algorithm is required for efficient data retrieval.
Problem Defined
• Data retrieval for unstructured and semi-structured data can be made more effective by using algorithms such as page ranking and C4.5.
• The normalization process can be improved by applying text processing.
• Record linkage for heterogeneous data can be done through efficient mining algorithms.
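To make the normalization and record linkage points concrete, here is a minimal sketch in Python. It is not the proposed framework's implementation. The choice of token-based Jaccard similarity, the 0.5 threshold and all function names are assumptions made for illustration.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def jaccard(a, b):
    """Token-set similarity of two strings after normalization."""
    tokens_a = set(normalize(a).split())
    tokens_b = set(normalize(b).split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def link_records(left, right, threshold=0.5):
    """Pair up records from two sources whose similarity meets the threshold."""
    return [(l, r) for l in left for r in right if jaccard(l, r) >= threshold]


# Hypothetical records from two heterogeneous sources.
source_a = ["Sathyabama University, Chennai"]
source_b = ["sathyabama university chennai", "Anna University"]
links = link_records(source_a, source_b)
```

The all-pairs comparison is quadratic in the number of records; at big-data scale a blocking or indexing step would normally come before it.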
PROPOSED FRAMEWORK
Proposed Methodology
• A new framework is proposed to achieve efficient DaaS using machine learning algorithms and text processing.
• The machine learning algorithm C4.5 helps in building decision trees.
• CART is a comparable decision-tree algorithm.
• Page ranking helps in basic graph analysis.
• The graphs are connected with each other.
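The two techniques named above can be sketched briefly in Python. This is an illustration under assumptions, not the framework's code. It takes "page ranking" to mean the PageRank power iteration, shows only the C4.5 split criterion (gain ratio) rather than a full tree builder, and uses hypothetical sample data.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(rows, labels, attribute):
    """C4.5 split criterion: information gain divided by split information."""
    total = len(rows)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict mapping each node to its outgoing links."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for node in nodes:
            # A node with no outgoing links spreads its rank over all nodes.
            targets = links.get(node) or list(nodes)
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical training rows: which attribute best separates the labels?
rows = [
    {"source": "log", "size": "big"},
    {"source": "log", "size": "small"},
    {"source": "tweet", "size": "big"},
    {"source": "tweet", "size": "small"},
]
labels = ["keep", "keep", "drop", "drop"]
best = max(["source", "size"], key=lambda a: gain_ratio(rows, labels, a))

# Hypothetical link graph: pages "a" and "b" both point to "c".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

C4.5 picks the attribute with the highest gain ratio at each node of the tree. CART typically uses Gini impurity for the same decision.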
List of References
• How Treato Analyzes Health-related Social Media Big Data with Hadoop and HBase, Cloudera Engineering [Assaf Yardeni, International Conference on Cloud, Big Data and Trust 2013, Nov 13-15, RGPV]
• Algorithm and Approaches to Handle Large Data - A Survey [Chanchal Yadav, Shullang Wang, Manoj Kumar, IJCSN, Vol 2, Issue 3, 2013, ISSN: 2277-5420]
• Managing Heterogeneous Sensor Data on a Big Data Platform: IoT Services for Data-intensive Science [Koji Zettsu, Takashi Kimata, Computer Software and Applications Conference Workshops (COMPSACW), 2014 IEEE 38th International]
• Performance and energy efficiency of big data applications in cloud environments: A Hadoop case study [Eugen Feller, Lavanya Ramakrishnan, Christine Morin, IJCSN, Volume 74, Issue 3, March 2014, Pages 2166–2179]
• Service-generated Big Data and Big Data-as-a-Service Overview [Zibin Zheng, Jieming Zhu, and Michael R. Lyu, University of Hong Kong, China, 2014 IEEE International Congress]
• Prompt Cloud is a leading web data crawling & extraction company, serving customers across the globe with valuable data to suit their business needs.