
A Comprehensive Survey on Variants And Its Extensions Of Big Data In Cloud Environment

P. Karthikeyan, Department of CSE, SMVEC, Pondicherry, India ([email protected])
D. Sathian, Department of CSE, Pondicherry University, Pondicherry, India ([email protected])
J. Amudhavel, Department of CSE, SMVEC, Pondicherry, India ([email protected])
R.S. Raghav, Department of CSE, Pondicherry University, Pondicherry, India ([email protected])
A. Abraham, Department of CSE, SMVEC, Pondicherry, India ([email protected])
P. Dhavachelvan, Department of CSE, Pondicherry University, Pondicherry, India ([email protected])

ABSTRACT
As technology advances rapidly through applications such as social networking, web analysis, bio-informatics network analysis and product analysis, a huge amount of heterogeneous data is generated across a wide range of sources. Managing this data effectively is interesting but raises many challenges in accuracy and processing. When data outgrows conventional systems, the recent and fast-growing field of big data comes into play, attracting industry, academia and government alike with the promise of processing large volumes of varied data efficiently. This paper surveys the various technologies and the different areas where big data is currently implemented with the help of a cloud environment [1] and its complete architecture [13]. It then explains the different MapReduce techniques and the frameworks employed for processing such huge data. Finally, we discuss the future of big data processing in the cloud environment and the challenges [28] faced in these areas.

Categories and Subject Descriptors
C. [Computer Systems Organization]: C.2 [Computer-Communication Networks]: C.2.4 Distributed Systems

General Terms
Big Data Analysis, Routing Mechanisms in Networking, Volume, Bio-Medical Data, Big Data Storage.

Keywords
Big data, Cloud computing, Hadoop, Security.

1. INTRODUCTION
Big data refers to data that exceeds the processing capacity of current relational database systems. Data grows fast and to huge sizes as technology advances rapidly to satisfy people's daily needs and sustain business profits. With the vast growth of social networking, both structured and unstructured data must be processed, and gaining valid information from this data requires alternative processing approaches.

The hot IT buzzword of 2012, big data has become viable as cost-effective approaches [23] have emerged to tame the volume, velocity and variability of massive data. Today's commodity hardware, cloud computing and open-source software [22] offer an easy way to process such massive data efficiently: organizations like Amazon and Microsoft rent out their services at such low cost that even small garage startups can afford online storage and processing of data.

The value of big data to a business falls into two categories: analytical use and the development of new products. Big data analytics [30] can reveal insights previously hidden because the data was too costly to process, such as peer influence among customers, uncovered by analyzing shoppers' transactions together with social and geographical data. Among the past decade's successful startups, Facebook is a milestone example of big data: it enables many users to share a variety of new, heavy-loaded data, which in turn has made such services a massive hit among online offerings. The main attractions of combining big data and the cloud environment are data security [24] and scalability as the data keeps growing day by day.

As a catch-all term, "big data" can be pretty nebulous, in the same way that the term "cloud" covers diverse technologies. Input data to big data systems could be chatter from social networks, web server logs, traffic flow sensors, satellite imagery, broadcast audio streams, banking transactions, MP3s of rock music, the content of web pages, scans of government documents, GPS trails, telemetry from automobiles, financial market data; the list goes on.

1.1 Motivation
In recent times big data has boomed in almost every area, and researchers are moving toward the field to understand what big data is and why it will matter in the coming years. The importance of big data resides in its three V's: velocity, variety and volume. Volume refers to the amount of data, variety refers to the number of types of data and velocity refers to the speed of data processing. Data comes in two formats, structured and unstructured; most unstructured data is derived from data analysis, web analysis and social networking services such as Facebook, Twitter and Gmail. As huge amounts of data are processed daily in every field, the total volume keeps increasing, and big data is the right tool to process all of it far more efficiently than current database systems. Big data can be implemented through cloud computing [8] environments, where the data resides in a secured manner.

Popular frameworks such as Apache Hadoop provide the platform for implementing such big data applications, since Hadoop ships with its own HDFS [9,10,12] file system. HDFS is very similar to an ordinary file system, except that files are replicated across multiple machines for availability and scalability.
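As a minimal illustration of this replication behaviour, the Java sketch below writes a file to HDFS with an explicit replication factor. The NameNode address, the path and the factor of three are illustrative assumptions, not details taken from this survey.

    // Minimal sketch (assumptions: a reachable HDFS NameNode at hdfs://namenode:9000
    // and the hadoop-client library on the classpath); shows how a file written to
    // HDFS is replicated across DataNodes via the replication factor.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReplicationExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:9000"); // hypothetical cluster address
            conf.set("dfs.replication", "3");                 // keep three copies of each block

            try (FileSystem fs = FileSystem.get(conf);
                 FSDataOutputStream out = fs.create(new Path("/data/sample.txt"))) {
                out.writeUTF("big data stored with block-level replication");
            }
        }
    }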

1.2 Organization

This paper is organized as follows. Section 1 gives the introduction, Section 2 reviews related work based on a survey of big data and its applications, Section 3 presents the open challenges faced by big data and Section 4 concludes the paper.

1.3 Contributions
The main advantage of using big data in the cloud is the efficient handling of very large data sets. The spatiotemporal compression based approach processes big graph data in the cloud using three main components: LEACH, spatiotemporal compression and data-driven scheduling. A cloud-based framework manages big medical data [3] in the face of ever-growing data volumes and provides cloud-based self-caring services through which patients can identify their own health problems and learn about their condition directly whenever needed. The hybrid approach [4] for scalable anonymization [7] combines two algorithms, TDS and BUG, to anonymize data efficiently as large amounts of data keep arriving. Finally, an adaptive algorithm monitors big data applications for performance analysis, workload and capacity planning and fault detection, yielding scalable and reliable real-time monitoring of big data.

2. RELATED WORKS

2.1 A spatiotemporal compression based approach
Chi Yang, Xuyun Zhang, Changmin Zhong, Chang Liu and Jian Pei [1] proposed a system for storing big graph data in the cloud and for analyzing and processing that data. The big data is first compressed with respect to its spatiotemporal features; the compressed big graph data is then grouped into clusters and the workload is distributed [2] across the edges among them to achieve significant performance gains [12]. Grouping the big graph data makes it easier to access and process. The approach includes three main processes. 1) LEACH (Low Energy Adaptive Clustering Hierarchy), a TDMA-based MAC and routing protocol for wireless sensor networks (WSNs), is used to compress the big data or big graph data; as the data is compressed, the memory storage [17] needed in the cloud is greatly reduced. 2) Spatiotemporal compression drives the clustering process based on spatiotemporal data correlation, which computes similarities over time using regression; to identify the similarities in data, temporal prediction models are developed, the clustering exploits how the data changes, and multi-attribute data can also be compressed. 3) Data-driven scheduling maps the data: two mapping techniques, node-based and edge-based mapping, are considered, but node-based mapping produces an unfair distribution of workload and edge-based mapping is not suitable for clusters that exchange data, so data-driven scheduling is adopted for data exchange and workload distribution, spreading the workload more evenly across the cloud platform.

2.2 Home diagnosis service over big medical data
Wenmin Lin, et al. [2] suggested a home diagnosis method for disease analysis. It is used as a self-caring service, meaning that users can make a preliminary diagnosis of their health status by specifying their symptoms, which is useful both for patients and for doctors diagnosing the problem. The framework provides a cloud-based self-caring service through which patients can identify their own health problems and learn more about their condition directly, reducing the time needed to decide on the right treatment. It is implemented with three building blocks. (i) HDFS (Hadoop Distributed File System) stores files across a collection of nodes in a cluster, with each stored file divided into blocks. (ii) MapReduce is a parallel data processing method: the map phase reads data from HDFS and processes independent input records, storing intermediate values on the local disks of the nodes; once mapping completes, the reduce phase starts and aggregates all intermediate data that shares the same key. (iii) Lucene supplies the indexing and searching concepts, where indexing builds index files for Lucene documents and thereby enables the searching capability.
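To make the map and reduce phases described above concrete, the following Java sketch counts how often each symptom token occurs in plain-text records. It follows the standard Hadoop word-count pattern; the record format, class names and command-line paths are illustrative assumptions rather than details of the surveyed framework.

    // Map phase: emit (symptom, 1) for every whitespace-separated token read from HDFS.
    // Reduce phase: aggregate all intermediate values that share the same key.
    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SymptomCount {
        public static class SymptomMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text symptom = new Text();
            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    symptom.set(itr.nextToken());
                    context.write(symptom, ONE); // intermediate pair kept on local disk
                }
            }
        }

        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get(); // aggregate values sharing one key
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "symptom count");
            job.setJarByClass(SymptomCount.class);
            job.setMapperClass(SymptomMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // placeholder HDFS input path
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // placeholder HDFS output path
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Such a job would typically be launched with a command like "hadoop jar symptomcount.jar SymptomCount /input /output", where the jar name and paths are again placeholders.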

2.3 External integrity verification for outsourced big data
Chang Liu, et al. [3] drew a big picture by analyzing authenticator-based data integrity verification techniques for both cloud and Internet of Things platforms, where the amount of data is increasing enormously. Two major techniques address this problem: provable data possession (PDP) and proofs of retrievability (POR). The work focuses mainly on PDP because it handles the issue while taking dynamic datasets [14] into account. The general cycle followed in a Cloud Storage Service (CSS) is: setup and data upload; authorization of a TPA (Third Party Auditor); challenge and verification of data storage; proof integration; proof verification; updated data upload; updated metadata upload; and verification of the updated data. The research provides a new way of retrieving data from the CSS by offering external integrity so that third parties can verify the data, and the problem of storing dynamic datasets with fully updated information [20] can be solved using new PDP techniques. The main difference between the two techniques is that POR takes all data blocks into account for verification, whereas PDP samples only small blocks of data, which is considered the most economical way of analyzing the data. The work uses two standard signature schemes (RSA and BLS) and one authenticated data structure, the Merkle Hash Tree (MHT). PDP is further extended to DPDP to handle datasets that change dynamically. Because the metadata needed to authenticate a particular cloud user's stored datasets grows with the amount of storage, a further variant called MR-PDP provides verification for multi-replica cloud storage. A secret key is given to the user, so the user can trust that the stored data will not be exposed unless that secret key is used. The PDP technique is also extended to the hybrid cloud [6], since datasets must be handled very efficiently as more users turn toward big data and avail themselves of cloud services for their business transactions. This research area also leads into data security [16,17,18] and data scalability in the cloud.
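The sampling idea behind PDP-style verification can be sketched as follows. This simplified Java example keeps one SHA-256 digest per block as verifier-side metadata and re-checks only a small random sample of blocks; it deliberately omits the RSA/BLS authenticators and Merkle Hash Trees discussed above, so it conveys only why challenging a few blocks is cheaper than downloading everything.

    // Illustrative sketch only, not the actual PDP construction surveyed above.
    import java.security.MessageDigest;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Random;

    public class SampledIntegrityCheck {
        // Setup: compute one SHA-256 digest per data block and keep it as local metadata.
        static List<byte[]> buildMetadata(List<byte[]> blocks) throws Exception {
            List<byte[]> digests = new ArrayList<>();
            for (byte[] block : blocks) {
                digests.add(MessageDigest.getInstance("SHA-256").digest(block));
            }
            return digests;
        }

        // Challenge/verification: re-hash only a random sample of blocks held by the server.
        static boolean verify(List<byte[]> storedBlocks, List<byte[]> digests, int sampleSize)
                throws Exception {
            Random rnd = new Random();
            for (int i = 0; i < sampleSize; i++) {
                int idx = rnd.nextInt(storedBlocks.size());
                byte[] proof = MessageDigest.getInstance("SHA-256").digest(storedBlocks.get(idx));
                if (!Arrays.equals(proof, digests.get(idx))) return false; // corruption detected
            }
            return true; // probabilistic assurance, as in PDP-style sampling
        }
    }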

2.4 A hybrid approach for scalable sub-tree anonymization using MapReduce
Xuyun Zhang, et al. [4] observed that sub-tree anonymization is widely used to anonymize data sets for privacy preservation, and that data privacy is one of the most important issues in big data applications. The two sub-tree anonymization schemes are Top-Down Specialization (TDS) and Bottom-Up Generalization (BUG). The scalability problem of sub-tree anonymization over big data in the cloud is overcome by combining TDS and BUG, and the experimental results show that this hybrid sub-tree anonymization is more efficient than existing approaches. The hybrid approach automatically selects one of the two components, TDS or BUG, by comparing the k-anonymity parameter against a workload-balancing point. Existing sub-tree anonymization approaches lack scalability when handling big data in the cloud, and either TDS or BUG used individually performs poorly for certain ranges of the k-anonymity parameter. The proposed solution therefore combines TDS and BUG for efficient sub-tree anonymization of big data, and designing both components as MapReduce jobs makes the approach highly scalable, improving both scalability and efficiency over existing methods. Six main algorithms implement the approach: Algorithm 1 is the MapReduce Bottom-Up Generalization (MRBUG) driver, which checks whether the current anonymized data satisfies the k-anonymity requirement; Algorithm 2 identifies the available generalizations; Algorithms 3 and 4 are the map and reduce functions for computing ILPG; Algorithm 5 is the data-generalization map and reduce; and Algorithm 6 is the hybrid approach itself.
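A minimal sketch of the hybrid dispatch idea is given below. The class and method names, the direction of the comparison and the placeholder MapReduce drivers are illustrative assumptions, not the surveyed algorithms; the point is only that a single comparison of the k-anonymity parameter against a balancing point routes the data set to either TDS or BUG.

    // Illustrative dispatch sketch; which side of the balancing point favours TDS versus
    // BUG is workload-dependent, so the comparison direction here is a placeholder.
    public class HybridAnonymizer {
        private final int workloadBalancingPoint; // tuning point separating the two regimes

        public HybridAnonymizer(int workloadBalancingPoint) {
            this.workloadBalancingPoint = workloadBalancingPoint;
        }

        public void anonymize(Iterable<String[]> records, int k) {
            if (k <= workloadBalancingPoint) {
                runTopDownSpecialization(records, k);
            } else {
                runBottomUpGeneralization(records, k);
            }
        }

        private void runTopDownSpecialization(Iterable<String[]> records, int k) {
            System.out.println("dispatching TDS MapReduce jobs for k=" + k); // placeholder
        }

        private void runBottomUpGeneralization(Iterable<String[]> records, int k) {
            System.out.println("dispatching BUG MapReduce jobs for k=" + k); // placeholder
        }
    }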

2.5 Adaptive, scalable and reliable monitoring of big data on clouds
Mauro Andreolini, et al. [5] proposed an adaptive algorithm for monitoring big data applications, since monitoring cloud resources is critical for tasks such as performance analysis, workload management, capacity planning and fault detection. Monitoring applications that produce big data at high sampling frequencies is difficult because of the cost of storing and managing the resulting data. The proposed approach adapts the sampling intervals and the frequency of updates to the characteristics of the data while still fulfilling the administrator's needs. The experimental evaluation shows that the adaptive algorithm monitors the system without penalizing the quality of the data with respect to a non-adaptive algorithm, and that it yields scalable and reliable real-time monitoring of big data. This adaptivity limits computational [21] and communication costs, which matters because the number of system components and the data size keep growing, so the scalability goal can never be set aside. The approach is organized into several phases: a training phase, which evaluates the best value within a range and chooses the maximum subset size used for training; a quality evaluation phase; and an adaptive monitoring phase. Together they guarantee high reliability in capturing relevant load changes, reducing resource utilization [26] and communication in big data monitoring without reducing the quality of the data.
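The adaptive-sampling idea can be sketched as follows; the parameter names and the back-off policy are illustrative assumptions rather than the surveyed algorithm, but they show how an interval can widen while the load is stable and shrink again when a relevant change appears.

    // Illustrative adaptive-sampling sketch; thresholds and back-off policy are placeholders.
    public class AdaptiveMonitor {
        private long intervalMs;
        private final long minIntervalMs;
        private final long maxIntervalMs;
        private final double changeThreshold; // relative change considered "relevant"
        private double lastSample = Double.NaN;

        public AdaptiveMonitor(long minIntervalMs, long maxIntervalMs, double changeThreshold) {
            this.minIntervalMs = minIntervalMs;
            this.maxIntervalMs = maxIntervalMs;
            this.changeThreshold = changeThreshold;
            this.intervalMs = minIntervalMs;
        }

        // Feed one measurement; returns the interval to wait before taking the next sample.
        public long onSample(double value) {
            if (!Double.isNaN(lastSample)) {
                double relativeChange =
                        Math.abs(value - lastSample) / Math.max(Math.abs(lastSample), 1e-9);
                if (relativeChange > changeThreshold) {
                    intervalMs = minIntervalMs;                          // load changed: sample densely
                } else {
                    intervalMs = Math.min(intervalMs * 2, maxIntervalMs); // stable: back off
                }
            }
            lastSample = value;
            return intervalMs;
        }
    }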

2.6 A security framework in G-Hadoop
Jiaqi Zhao, et al. [6] proposed a security [16,17] model for the G-Hadoop framework [15], a successor of the current Hadoop framework. The security model [19] incorporates several security solutions, such as public key cryptography and the SSL protocol, and concentrates on distributed systems that process multiple clusters in parallel within the G-Hadoop framework. The model protects G-Hadoop against various malicious attacks and is designed to work with multiple clusters, whereas traditional methods focus on only a single cluster at a time. It secures the communication between master and slave nodes and protects against man-in-the-middle (MITM) attacks. G-Hadoop itself is an extension of the Hadoop MapReduce framework that allows MapReduce tasks to run on multiple clusters. The current G-Hadoop prototype uses the Secure Shell (SSH) protocol to establish a secure connection between the user and the target cluster, a traditional approach that requires a login authentication for every new cluster processed inside the framework. The proposed model overcomes this by following the authentication concept of the Globus Security Infrastructure (GSI), establishing communication between the master node and a CA server; jobs are submitted to a cluster simply by logging into the master node, and no other authentication is needed. The phases of this model are user authentication, applying for and issuing a proxy credential, generating the user session and preparing the slave nodes, handshaking between the master node and the slave nodes, job execution and job termination.
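A minimal sketch of the SSL-secured channel mentioned above is given below, assuming a hypothetical TLS-enabled master node; it does not model the GSI-style proxy credentials or the master-slave handshake of the proposed framework.

    // Illustrative sketch only: host, port and the submission message are placeholders.
    import javax.net.ssl.SSLSocket;
    import javax.net.ssl.SSLSocketFactory;
    import java.io.OutputStreamWriter;
    import java.io.Writer;

    public class SecureJobSubmission {
        public static void main(String[] args) throws Exception {
            SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
            try (SSLSocket socket = (SSLSocket) factory.createSocket("master.example.org", 8443)) {
                socket.startHandshake(); // trust is established once, on contacting the master node
                try (Writer out = new OutputStreamWriter(socket.getOutputStream(), "UTF-8")) {
                    out.write("SUBMIT job.jar\n"); // placeholder job-submission message
                }
            }
        }
    }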

2.7 Putting analytics and big data in the cloud
Haluk Demirkan and Dursun Delen [7] observed that one of the major trends [29] in many organizations is the increasing use of service-oriented decision support systems (DSS). Their paper presents a list of requirements for service-oriented DSS, along with research directions and a conceptual framework for DSS in the cloud. The distinctive part of the paper is how to servitize the product-oriented DSS environment and the engineering challenges of service orientation in the cloud, giving a broader audience new knowledge about service science from a technology perspective as well as from database and design science perspectives. Most organizations care about service accuracy and quality in addition to cost and delivery time. Information is handled in stages: data management covers metadata; information management provides OLAP, dashboards and internet search over content; and operations management covers text mining, data mining, optimization and simulation. DSS in the cloud enables scale, scope and speed economies: a reduction in unit cost due to an increase in operational size (scale), a reduction in unit service cost due to an increase in the number of services developed and provided (scope), and a reduction in unit cost due to an increase in the number of services delivered through the supply or demand chain (speed).

2.8 Cloud computing with graphics processing units
Wenwu Tang and Wenpeng Feng [8] proposed parallel map projection of spatial data, presenting a framework that combines cloud computing [16] and high-performance computing. The volume and variety of spatial data keep growing, so map projection is increasingly needed, and the algorithmic complexity [25] of map projection together with the transformation of such big data represents a computational challenge. The framework is built on a layered architecture and exploits both the capabilities [9] of cloud computing and high-performance computing delivered through graphics processing units (GPUs). The experimental results show that the framework accelerates the processing of big spatial data, and the coupling [10] of cloud computing and high-performance computing is considered the most efficient solution for processing and analyzing such data. The article develops a framework for parallel map projection of vector-based big spatial data across alternative map projections, based on three main layers: GPU-enabled high-performance computing, cloud computing and Web GIS. These layers provide the leveraging components for map projection and play a key role between front-end users and the back-end high-performance computing clusters. GPU clusters provide the parallel computing power for handling big spatial data effectively, while the emergence of cloud computing services provides scalable, on-demand computing resources.
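Because every vertex is projected independently, the transformation parallelizes naturally. The Java sketch below uses CPU parallel streams as a stand-in for the GPU kernels of the surveyed framework and applies a standard spherical Web Mercator forward projection; the projection choice and array layout are illustrative assumptions.

    // Illustrative sketch: parallel streams stand in for GPU kernels; arrays hold the
    // (longitude, latitude) vertices of a big vector data set, projected independently.
    import java.util.stream.IntStream;

    public class ParallelMapProjection {
        static final double R = 6378137.0; // spherical Web Mercator radius in metres

        // Forward Web Mercator projection of lon/lat arrays (degrees) into x/y (metres).
        static void project(double[] lonDeg, double[] latDeg, double[] x, double[] y) {
            IntStream.range(0, lonDeg.length).parallel().forEach(i -> {
                x[i] = R * Math.toRadians(lonDeg[i]);
                y[i] = R * Math.log(Math.tan(Math.PI / 4 + Math.toRadians(latDeg[i]) / 2));
            });
        }
    }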


3. OPEN CHALLENGES FACED BY BIG DATA
In recent trends everything is processed in a cloud environment, and when the data exceeds the capacity of current database systems, the broad domain called big data [5] comes into the picture. Although big data is widely used across many industries and applications, the following challenges remain.

3.1 Big Data Storage and Management
The relational database systems in use today do not meet the requirements of big data. Storing such huge data also demands a convenient way of retrieving it, and the main challenge in data storage is updating dynamic content in the database. Even when the data is stored using frameworks such as HDFS [9,10,12], managing it requires an efficient way to retrieve the data whenever it is needed. These challenges must be addressed when processing big data and its datasets in a cloud environment.

3.2 Data Security in Big Data
The main challenge faced by big data is whether the stored data is actually secure. Big data systems use many security [16,17,18] algorithms and take corrective measures to address the problem. Online transactions [18] that rely on big data need particular care, since third parties are granted access to analyze the users' data. Big data platforms should therefore provide valid, trusted certificates in order to give their users adequate security.

4. CONCLUSION
In this work we presented a survey on the importance of big data and its applications in the cloud environment using the MapReduce concept from the Hadoop framework. The paper concludes that with MapReduce the data flow can be processed more efficiently than with currently available systems, so that the uptake of cloud and big data grows as data security becomes more reliable than in other environments. We mainly discussed the use of big data in the CSS cloud environment and the optimization of the MapReduce concept. As technology improves, datasets will grow and the data will become ever more complex [27] unless it is processed with big data techniques. We also expect that big data will lead to a bright future for data management and security.

5. REFERENCES
[1] Chi Yang, Xuyun Zhang, Changmin Zhong, Chang Liu, Jian Pei, Kotagiri Ramamohanarao, Jinjun Chen, "A spatiotemporal compression based approach for efficient big data processing on Cloud", Journal of Computer and System Sciences, Volume 80, Issue 8, December 2014, Pages 1563-1583, ISSN 0022-0000.
[2] Amudhavel, J., Vengattaraman, T., Basha, M.S.S., Dhavachelvan, P., "Effective Maintenance of Replica in Distributed Network Environment Using DST", International Conference on Advances in Recent Technologies in Communication and Computing (ARTCom) 2010, pp. 252-254, 16-17 Oct. 2010, doi: 10.1109/ARTCom.2010.97.
[3] Wenmin Lin, Wanchun Dou, Zuojian Zhou, Chang Liu, "A cloud-based framework for Home-diagnosis service over big medical data", Journal of Systems and Software, Volume 102, April 2015, Pages 192-206, ISSN 0164-1212.
[4] Raju, R., Amudhavel, J., Pavithra, M., Anuja, S., Abinaya, B., "A heuristic fault tolerant MapReduce framework for minimizing makespan in Hybrid Cloud Environment", International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE) 2014, pp. 1-4, 6-8 March 2014, doi: 10.1109/ICGCCEE.2014.6922462.
[5] Chang Liu, Chi Yang, Xuyun Zhang, Jinjun Chen, "External integrity verification for outsourced big data in cloud and IoT: A big picture", Future Generation Computer Systems, Available online 27 August 2014, ISSN 0167-739X.
[6] Xuyun Zhang, Chang Liu, Surya Nepal, Chi Yang, Wanchun Dou, Jinjun Chen, "A hybrid approach for scalable sub-tree anonymization over big data using MapReduce on cloud", Journal of Computer and System Sciences, Volume 80, Issue 5, August 2014, Pages 1008-1020, ISSN 0022-0000.
[7] Mauro Andreolini, Michele Colajanni, Marcello Pietri, Stefania Tosi, "Adaptive, scalable and reliable monitoring of big data on clouds", Journal of Parallel and Distributed Computing, Available online 26 August 2014, ISSN 07.
[8] Raju, R., Amudhavel, J., Kannan, N., Monisha, M., "A bio inspired Energy-Aware Multi objective Chiropteran Algorithm (EAMOCA) for hybrid cloud computing environment", International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE) 2014, pp. 1-5, 6-8 March 2014.
[9] Haluk Demirkan, Dursun Delen, "Leveraging the capabilities of service-oriented decision support systems: Putting analytics and big data in cloud", Decision Support Systems, Volume 55, Issue 1, April 2013, Pages 412-421, ISSN 0167-9236.
[10] Wenwu Tang, Wenpeng Feng, "Parallel map projection of vector-based big spatial data: Coupling cloud computing with graphics processing units", Computers, Environment and Urban Systems, Available online 12 February 2014, ISSN 0198-9715.
[11] Chao-Tung Yang, Wen-Chung Shih, Lung-Teng Chen, Cheng-Ta Kuo, Fuu-Cheng Jiang, Fang-Yie Leu, "Accessing medical image file with co-allocation HDFS in cloud", Future Generation Computer Systems, Volumes 43-44, February 2015, Pages 61-73, ISSN 0167-739X.
[12] Feng Tian, Tian Ma, Bo Dong, Qinghua Zheng, "PWLM-based automatic performance model estimation method for HDFS write and read operations", Future Generation Computer Systems, Available online 29 January 2015, ISSN 0167-739X.
[13] Demchenko, Y., de Laat, C., Membrey, P., "Defining architecture components of the Big Data Ecosystem", Collaboration Technologies and Systems (CTS), 2014 International Conference on, pp. 104-112, 19-23 May 2014.
[14] Bo Dong, Qinghua Zheng, Feng Tian, Kuo-Ming Chao, Nick Godwin, Tian Ma, Haipeng Xu, "Performance models and dynamic characteristics analysis for HDFS write and read operations: A systematic view", Journal of Systems and Software, Volume 93, July 2014, Pages 132-151, ISSN 0164-1212.
[15] Jason C. Cohen, Subrata Acharya, "Towards a trusted HDFS storage platform: Mitigating threats to Hadoop infrastructures using hardware-accelerated encryption with TPM-rooted key protection", Journal of Information Security and Applications, Volume 19, Issue 3, July 2014, Pages 224-244, ISSN 2214-2126.
[16] Raju, R., Amudhavel, J., Kannan, N., Monisha, M., "Interpretation and evaluation of various hybrid energy aware technologies in cloud computing environment - A detailed survey", International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE) 2014, pp. 1-3, 6-8 March 2014, doi: 10.1109/ICGCCEE.2014.6922432.
[17] Yifeng Luo, Siqiang Luo, Jihong Guan, Shuigeng Zhou, "A RAMCloud Storage System based on HDFS: Architecture, implementation and evaluation", Journal of Systems and Software, Volume 86, Issue 3, March 2013, Pages 744-750, ISSN 0164-1212.
[18] In-Hwan Park, Moon-Moo Kim, "Spermidine inhibits MMP-2 via modulation of histone acetyltransferase and histone deacetylase in HDFS", International Journal of Biological Macromolecules, Volume 51, Issue 5, December 2012, Pages 1003-1007, ISSN 0141-8130.
[19] Ben Brewster, Benn Kemp, Sara Galehbakhtiari, Babak Akhgar, "Chapter 8 - Cybercrime: Attack Motivations and Implications for Big Data and National Security", in Application of Big Data for National Security, Butterworth-Heinemann, 2015, Pages 108-127, ISBN 9780128019672.
[20] Alberto De la Rosa Algarín, Steven A. Demurjian, "Chapter 4 - An Approach to Facilitate Security Assurance for Information Sharing and Exchange in Big-Data Applications", in Emerging Trends in ICT Security, edited by Babak Akhgar and Hamid R. Arabnia, Morgan Kaufmann, Boston, 2014, Pages 65-83, ISBN 9780124114746.
[21] Chingfang Hsu, Bing Zeng, Maoyuan Zhang, "A novel group key transfer for big data security", Applied Mathematics and Computation, Volume 249, 15 December 2014, Pages 436-443, ISSN 0096-3003.
[22] P. Dhavachelvan, G.V. Uma, V.S.K. Venkatachalapathy (2006), "A New Approach in Development of Distributed Framework for Automated Software Testing Using Agents", International Journal on Knowledge-Based Systems, Elsevier, Vol. 19, No. 4, pp. 235-247, August 2006.
[23] John N.A. Brown, "Chapter 18 - Making Sense of the Noise: An ABC Approach to Big Data and Security", in Application of Big Data for National Security, Butterworth-Heinemann, 2015, Pages 261-273, ISBN 9780128019672.
[24] Rupert Hollin, "Chapter 2 - Drilling into the Big Data Gold Mine: Data Fusion and High-Performance Analytics for Intelligence Professionals", in Application of Big Data for National Security, Butterworth-Heinemann, 2015, Pages 14-20, ISBN 9780128019672.
[25] P. Dhavachelvan, G.V. Uma (2005), "Complexity Measures For Software Systems: Towards Multi-Agent Based Software Testing", Proceedings - 2005 International Conference on Intelligent Sensing and Information Processing, ICISIP'05, Art. no. 1529476, pp. 359-364.
[26] Wenhong Tian, Yong Zhao, "2 - Big Data Technologies and Cloud Computing", in Optimized Cloud Resource Management and Scheduling, edited by Wenhong Tian and Yong Zhao, Morgan Kaufmann, Boston, 2015, Pages 17-49, ISBN 9780128014769.
[27] P. Dhavachelvan, G.V. Uma (2004), "Reliability Enhancement in Software Testing: An Agent-Based Approach for Complex Systems", 7th ICIT 2004, Springer Verlag - Lecture Notes in Computer Science (LNCS), Vol. 3356, pp. 282-291, ISSN 0302-9743.
[28] Tao Huang, Liang Lan, Xuexian Fang, Peng An, Junxia Min, Fudi Wang, "Promises and Challenges of Big Data Computing in Health Sciences", Big Data Research, Available online 18 February 2015, ISSN 2214-5796.
[29] Marcos D. Assunção, Rodrigo N. Calheiros, Silvia Bianchi, Marco A.S. Netto, Rajkumar Buyya, "Big Data computing and clouds: Trends and future directions", Journal of Parallel and Distributed Computing, Available online 27 August 2014, ISSN 0743-7315.
[30] Nauman Sheikh, "Chapter 11 - Big Data, Hadoop, and Cloud Computing", in Implementing Analytics (MK Series on Business Intelligence), edited by Nauman Sheikh, Morgan Kaufmann, Boston, 2013, Pages 185-197, ISBN 9780124016965.