Associative Data Schemes for Cloud Computing

Page 1: Associative Data Schemes for Cloud Computing

Associative Data Schemes for Cloud Computing

Amir Basirat, PhD Candidate

[email protected]

Supervisor: Dr Asad Khan

Clayton School of IT, Monash University

STINT Workshop, Lulea, Sweden, May 2012

Page 2

Contents

1 Cloud Computing
2 Hadoop MapReduce
3 Pattern Recognition and Distributed Approach
4 Graph Neuron for Scalable Pattern Recognition
5 HGN and DHGN
6 Research Objective
7 Web-based GN
8 EdgeHGN
9 Simulation Showcase

Page 3

What is Cloud Computing?

The vision of Cloud Computing encompasses a general shift of computer processing, storage, and software delivery away from the desktop and local servers, across the network, and into the next generation of data centers hosted by large infrastructure companies.

Page 4

Big Data!

An IDC estimate put the size of the “digital universe” at 0.18 zettabytes in 2006 and forecast a tenfold growth to 1.8 zettabytes by 2011.

This flood of data is coming from many sources. Consider the following:

• The New York Stock Exchange generates about one terabyte of new trade data per day.

• Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.

• Ancestry.com, the genealogy site, stores around 2.5 petabytes of data.

• The Internet Archive stores around 2 petabytes of data, and is growing at a rate of 20 terabytes per month.

• The Large Hadron Collider near Geneva, Switzerland, will produce about 15 petabytes of data per year.

Page 5

Challenge?

Our existing capability to generate data seems to outstrip our capability to analyze it.

Page 6

Data Management in Cloud

Any data management scheme deployed in the cloud must properly address several underlying issues (Abadi, 2009), including:

• the capability to parallelise data workloads,

• security concerns arising from storing data at an untrusted host, and

• data replication functionality.

Thus the question of how to process immense data sets effectively is becoming increasingly urgent.

Page 8

Hadoop

In a nutshell, this is what Hadoop provides: “A reliable shared storage and analysis system. The storage is provided by HDFS and analysis by MapReduce.”

(Hadoop, 2011)

Page 9

Page 10

MapReduce

(Hadoop, 2011)

The MapReduce programming model requires expressing a solution as two functions, Map and Reduce:

• A map function takes a key/value pair, performs a computation, and emits a set of intermediate key/value pairs as output.

• A reduce function merges all intermediate values associated with the same intermediate key, performs some computation on them, and emits the final output.
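The two functions described above can be exercised end to end in a minimal, single-process Python sketch of the MapReduce data flow: map every input pair, group the intermediate pairs by key (the shuffle), then reduce each group in key order. `run_mapreduce`, `max_mapper`, `max_reducer` and the sample records are hypothetical names and data chosen for illustration; this simulates the programming model only and is not Hadoop's Java API.

```python
from collections import defaultdict


def run_mapreduce(inputs, mapper, reducer):
    """Simulate MapReduce in one process: map -> shuffle (group by key) -> reduce."""
    # Map phase: every input key/value pair yields zero or more intermediate pairs.
    intermediate = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in mapper(key, value):
            intermediate[ikey].append(ivalue)  # shuffle: collect values per key

    # Reduce phase: keys are processed in sorted order, as a Hadoop reducer sees them.
    output = []
    for ikey in sorted(intermediate):
        output.extend(reducer(ikey, intermediate[ikey]))
    return output


# Hypothetical example: find the maximum reading recorded for each year.
def max_mapper(record_id, line):
    year, reading = line.split(",")
    yield year, int(reading)


def max_reducer(year, readings):
    yield year, max(readings)


records = [(0, "1950,22"), (1, "1950,-11"), (2, "1949,111"), (3, "1949,78")]
print(run_mapreduce(records, max_mapper, max_reducer))
```

Run as-is, it prints `[('1949', 111), ('1950', 22)]`. Keeping the mapper and reducer as plain functions mirrors the model's contract: the driver stays the same whatever computation is plugged in.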

Page 11

Word Count in MapReduce

class MAPPER
    method MAP(docid a, doc d)
        for all term t in doc d do
            EMIT(term t, count 1)

class REDUCER
    method REDUCE(term t, counts [c1, c2, …])
        sum = 0
        for all count c in counts [c1, c2, …] do
            sum = sum + c
        EMIT(term t, count sum)

Pseudocode for the word count algorithm in MapReduce
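The pseudocode above translates almost line for line into a runnable Python sketch. Here an in-memory sort plus `itertools.groupby` stands in for Hadoop's distributed shuffle; the function names and sample documents are illustrative assumptions, not part of any Hadoop API.

```python
from itertools import groupby
from operator import itemgetter


def word_count_map(docid, doc):
    """MAP: emit (term, 1) for every whitespace-separated term in the document."""
    for term in doc.split():
        yield term, 1


def word_count_reduce(term, counts):
    """REDUCE: add up all partial counts received for one term."""
    yield term, sum(counts)


def word_count(docs):
    """Run the mapper over {docid: text}, sort and group by term, then run the reducer."""
    intermediate = [pair for docid, doc in docs.items() for pair in word_count_map(docid, doc)]
    intermediate.sort(key=itemgetter(0))  # the shuffle: identical terms become adjacent
    result = {}
    for term, group in groupby(intermediate, key=itemgetter(0)):
        for out_term, total in word_count_reduce(term, (count for _, count in group)):
            result[out_term] = total
    return result


print(word_count({"d1": "the cat sat", "d2": "the cat ran"}))
```

Run as-is, it prints `{'cat': 2, 'ran': 1, 'sat': 1, 'the': 2}`.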

Page 12

Challenges and Hurdles in MapReduce

• The Map function operates on the assumption that related data is distributed vertically, i.e. that records are uniformly distributed across the network. However, parts of related records may be stored at different physical locations.

• Intermediate records must be sorted before they are passed to the reduce function.

• Solutions must be expressed in terms of Map and Reduce functions operating on key/value pairs, which in some cases, such as multi-stage processes, may be neither possible nor natural.

• Moreover, dependence on HDFS for data storage and retrieval can create single points of failure in the MapReduce infrastructure, especially at the master nodes.
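Why intermediate records must be sorted can be seen in a small Python sketch: a streaming reducer that assumes all values for a key arrive contiguously produces split, partial results when its input is not sorted. `itertools.groupby` stands in for that streaming grouping here, and `stream_reduce` and the sample pairs are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def stream_reduce(pairs):
    """Sum the values of each run of equal consecutive keys, as a streaming reducer would."""
    return [(key, sum(v for _, v in run)) for key, run in groupby(pairs, key=itemgetter(0))]


# Hypothetical intermediate pairs as they might arrive from several mappers.
intermediate = [("cat", 1), ("dog", 1), ("cat", 1), ("dog", 1)]

unsorted_result = stream_reduce(intermediate)
sorted_result = stream_reduce(sorted(intermediate, key=itemgetter(0)))
print(unsorted_result)  # each key appears twice, each time with a partial sum
print(sorted_result)    # one complete total per key
```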

Page 13

Contents

1 Cloud Computing
2 Hadoop MapReduce
3 Distributed Pattern Recognition
4 Graph Neuron (GN)
5 Hierarchical Graph Neuron (HGN)
6 Distributed Hierarchical Graph Neuron (DHGN)
7 Edge Detecting Hierarchical Graph Neuron (EdgeHGN)
8 Simulation Showcase
9 Question Time

Existing data management schemes do not work well when data is dynamically partitioned among numerous available nodes.

Approaches to scalable data management in the cloud that offer greater portability, manageability and compatibility of applications and data are yet to be fully realised.

Page 14

Solution?

Treat data records as patterns

As a result, data storage and retrieval are performed using a distributed pattern recognition approach, implemented by integrating loosely coupled computational networks, combined with a divide-and-distribute approach that allows these networks to be distributed dynamically within the cloud.

To develop a distributed data access scheme that enables data storage and retrieval by association

Page 15

Associative Model of Data

The associative model treats data records as patterns, and hence it does not matter how the data is represented.

The associative model uses a single, common structure for all types of data.
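To make "storage and retrieval by association" concrete, here is a minimal Python sketch under loudly stated assumptions: each record is stored alongside a fixed-length bit pattern, and retrieval presents a (possibly distorted) pattern and returns the record whose stored pattern has the smallest Hamming distance to it. The class `AssociativeStore`, the `hamming` helper and the nearest-match rule are hypothetical illustrations, not the GN-based scheme of this research; the linear scan is used only for clarity.

```python
def hamming(a, b):
    """Number of positions at which two equal-length patterns differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return sum(x != y for x, y in zip(a, b))


class AssociativeStore:
    """Hypothetical associative store: records are found by their pattern, not by address."""

    def __init__(self):
        self._entries = []  # list of (pattern, record) pairs

    def store(self, pattern, record):
        self._entries.append((pattern, record))

    def recall(self, probe):
        """Return the record whose stored pattern is nearest to the probe, or None if empty."""
        if not self._entries:
            return None
        _, record = min(self._entries, key=lambda entry: hamming(entry[0], probe))
        return record


store = AssociativeStore()
store.store("1110001", "record A")
store.store("0001110", "record B")
print(store.recall("1100001"))  # one bit away from A's pattern, so "record A"
```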

Page 17

Distributed Pattern Recognition

A distributed computing approach offers seemingly unlimited scalability as the number of patterns grows, thanks to the rapid advance of network computing technology, which enables processing to be performed within the body of a network rather than relying on exhaustive single-CPU utilization.

However, existing approaches still lag behind because of the highly complex recognition algorithms they implement.

Neural networks offer a promising tool for large-scale pattern recognition. However, their implementation raises several issues, including:

• convergence problems,

• complex iterative learning procedures, and

• low scalability with regard to the training data required for optimal recognition.

Page 19

An eight-node GN in the process of storing patterns (Khan, 2002): P1 (RED), P2 (BLUE), P3 (BLACK), and P4 (GREEN).
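As a rough illustration of how a GN stores and recalls patterns through adjacency, here is a simplified Python sketch. Assumptions not taken from the slide: the eight nodes are read as two possible values (`X`, `O`) times four positions; each node keeps, for every (left, right) neighbour context it has seen, the ids of the patterns that produced it; and a pattern counts as recalled only when all activated nodes share a common id. The class `GraphNeuron` and its methods are hypothetical, and the published GN (Khan, 2002) may differ in detail.

```python
class GraphNeuron:
    """Simplified Graph Neuron sketch (illustrative assumptions, not the published algorithm).

    One node exists per (value, position) pair. Each node remembers, for every
    (left neighbour value, right neighbour value) context it has seen, the ids of
    the patterns that produced that context.
    """

    def __init__(self, values, length):
        self.length = length
        self.nodes = {(v, p): {} for v in values for p in range(length)}
        self.stored = 0  # number of patterns stored so far

    def _activations(self, pattern):
        """Yield the node activated at each position together with its adjacency context."""
        for p, v in enumerate(pattern):
            left = pattern[p - 1] if p > 0 else None
            right = pattern[p + 1] if p + 1 < self.length else None
            yield (v, p), (left, right)

    def present(self, pattern):
        """Recall the id of a stored pattern, or store it; returns (pattern_id, recalled)."""
        if len(pattern) != self.length:
            raise ValueError("pattern length does not match the GN size")
        candidates = None
        for node, context in self._activations(pattern):
            ids = self.nodes[node].get(context, set())
            candidates = set(ids) if candidates is None else candidates & ids
        if candidates:  # every activated node agrees on at least one stored pattern
            return min(candidates), True
        new_id = self.stored
        self.stored += 1
        for node, context in self._activations(pattern):
            self.nodes[node].setdefault(context, set()).add(new_id)
        return new_id, False


gn = GraphNeuron(values="XO", length=4)  # 2 values x 4 positions = 8 nodes
print(gn.present("XOXO"), gn.present("OOXX"), gn.present("XOXO"))
```

It prints `(0, False) (1, False) (0, True)`: the third presentation is recalled rather than stored again. In this sketch the final intersection of ids is done centrally for brevity; in a networked deployment each node would perform its own context lookup locally.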

Page 21

Hierarchical Graph Neuron (HGN)

HGN compositions for two-dimensional (7x5) and three-dimensional (7x5x3) pattern sizes.
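The cost of the hierarchy can be quantified with a short Python sketch of neuron counts. It assumes a one-dimensional HGN in which each layer above the base is two positions narrower than the one below it (ending in a single top position) and every position holds one GN node per possible value. Under that assumption a pattern of odd size P with v possible values needs v((P+1)/2)^2 nodes, because the sum of the first k odd numbers is k^2. The function names are hypothetical, and the 2-D (7x5) and 3-D (7x5x3) compositions in the figure are not modelled.

```python
def hgn_layer_sizes(pattern_size):
    """Positions per layer of a 1-D HGN, base layer first (each layer two narrower)."""
    if pattern_size < 1 or pattern_size % 2 == 0:
        raise ValueError("this layout assumes an odd, positive pattern size")
    return list(range(pattern_size, 0, -2))


def hgn_neuron_count(pattern_size, num_values):
    """Total GN nodes: one node per possible value at every position of every layer."""
    return num_values * sum(hgn_layer_sizes(pattern_size))


for size in (5, 7):
    print(size, hgn_layer_sizes(size), hgn_neuron_count(size, 2))
```

It prints `5 [5, 3, 1] 18` and `7 [7, 5, 3, 1] 32` for binary patterns.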

Page 22

Distributed Hierarchical Graph Neuron (DHGN)

DHGN distributed pattern recognition architecture (Muhamad Amin and Khan, 2009).

Page 24

Research Objectives

• Redesigning the data management architecture from a scalable associative computing perspective to create database-like functionality that can scale up or down dynamically over the available infrastructure without interruption or degradation

• Investigating a distributed data access scheme that enables data storage and retrieval by association, with data records treated as patterns

• Processing the database and handling the dynamic load using a distributed pattern recognition approach

• Developing an intelligent MapReduce framework that allows complex data representations to be used as keys for Map operations

• Reducing cloud storage fragmentation by implementing a divide-and-distribute approach

• Enhancing the existing cloud data management models for scalability

• Validating results and finding the asymptotic limits of the technique through a rigorously designed computer simulation environment

Page 26

Progress to Date

• Proposing a Web-based GN for Real-time Image Recognition

Page 27

Web-based GN

(a) Total number of positive and negative matches. (b) Distortion rates for each line of image (each constructed HGN).

Image distortion rates vs. rotation degrees.

Page 29

Edge Detecting Hierarchical Graph Neuron (EdgeHGN)

7-by-7 bit Binary Character A and its 7 equally-sized DHGN subnets

Reducing the number of neurons by applying a drop-fall technique
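DHGN's divide-and-distribute step for the 7-by-7 character can be sketched in Python: the 49-bit pattern is split into seven equally sized subpatterns, each handled by its own small HGN, and the neuron count of one monolithic HGN is compared with that of the subnets. Assumptions: each subnet takes one 7-bit row, the sample bitmap for "A" is made up, and the count reuses the one-dimensional HGN formula v((s+1)/2)^2 for an odd size s; all function names are hypothetical.

```python
def split_into_subnets(pattern, subnet_size):
    """Divide a flattened pattern into equally sized, contiguous DHGN subpatterns."""
    if subnet_size < 1 or len(pattern) % subnet_size:
        raise ValueError("pattern length must be a multiple of the subnet size")
    return [pattern[i:i + subnet_size] for i in range(0, len(pattern), subnet_size)]


def hgn_neurons(size, num_values):
    # Assumed 1-D HGN cost: num_values * ((size + 1) / 2) ** 2 for an odd size
    if size % 2 == 0:
        raise ValueError("this count assumes an odd pattern size")
    return num_values * ((size + 1) // 2) ** 2


def dhgn_neurons(pattern_size, subnet_size, num_values):
    """Neurons needed when the pattern is split across equally sized HGN subnets."""
    if pattern_size % subnet_size:
        raise ValueError("pattern size must be a multiple of the subnet size")
    return (pattern_size // subnet_size) * hgn_neurons(subnet_size, num_values)


# Hypothetical 7x7 binary character, flattened row by row (49 bits).
char_a = (
    "0001000"
    "0010100"
    "0100010"
    "0111110"
    "0100010"
    "0100010"
    "0100010"
)
rows = split_into_subnets(char_a, 7)
print(len(rows), rows[0])
print(hgn_neurons(49, 2), dhgn_neurons(49, 7, 2))
```

It prints `7 0001000` and `1250 224`: under these assumptions the seven 7-bit subnets need 224 binary-valued nodes in total, against 1250 for a single HGN over all 49 bits.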

Page 30

Drop Fall Scheme

• Drop-fall is often used to divide pairs of touching digits into isolated characters. The drop-fall algorithm simulates the path produced by a drop of water falling from above the character and sliding downwards along its contour under the action of gravity.

• When the drop gets stuck in a groove, it melts the character’s stroke and then continues to fall. The dividing path produced by the drop-fall algorithm depends on three aspects: a start point, movement rules, and direction.

• There are four possible directions, which generally produce four different paths for dividing touching digits: the drop can start on the left or right side and can move downwards or upwards. One of the four is likely to produce the correct result.

• Therefore, a set of drop-fall algorithms consists of four methods, each of which tries to segment a block by simulating a drop-falling process: the Descending-left, Descending-right, Ascending-left and Ascending-right algorithms.
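A single descending pass of the drop-fall idea can be sketched in Python on a binary image (1 = ink). The movement priority is an assumption, one common choice rather than the rule set used in this work: move down, else down-right, else down-left, else sideways right or left without revisiting a pixel; if every move is blocked, the drop is stuck in a groove and cuts straight down through the stroke. The name `descending_drop_fall` is hypothetical; the other three variants would change the start side and the vertical direction.

```python
def descending_drop_fall(grid, start_col):
    """Return the (row, col) path of a drop falling from the top row of a binary image.

    grid: list of lists, 1 = ink (foreground), 0 = background.
    """
    rows, cols = len(grid), len(grid[0])
    r, c = 0, start_col
    path = [(r, c)]
    visited = {(r, c)}

    def free(rr, cc):
        # A target pixel is usable if it is inside the image, background, and not yet visited.
        return 0 <= cc < cols and grid[rr][cc] == 0 and (rr, cc) not in visited

    while r < rows - 1:
        if free(r + 1, c):          # fall straight down
            r += 1
        elif free(r + 1, c + 1):    # slide down and to the right
            r, c = r + 1, c + 1
        elif free(r + 1, c - 1):    # slide down and to the left
            r, c = r + 1, c - 1
        elif free(r, c + 1):        # roll sideways along the contour
            c += 1
        elif free(r, c - 1):
            c -= 1
        else:                       # stuck in a groove: melt the stroke and cut down
            r += 1
        path.append((r, c))
        visited.add((r, c))
    return path


# Hypothetical 4x5 image of two strokes touching at row 2.
touching = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
]
print(descending_drop_fall(touching, start_col=2))
```

It prints `[(0, 2), (1, 2), (2, 2), (3, 2)]`: the drop runs down the gap, cuts through the touching pixel at row 2, and the path separates the two strokes.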

Page 31

EdgeHGN Performance

Page 33

Disclaimer

I am not proposing a computer vision scheme for image processing here.

I am not suggesting in any way that my scheme can compete with the many image processing and face recognition algorithms described in the literature.

I am doing pattern matching, and I could use any form of data representation for the purposes of my research.

Images are complex matrices of values, but people relate to images very well, which is why I found them an easy way to illustrate the effectiveness and strength of my proposed model.

Page 34

Binary Image Recognition

Fifty different individuals from the face image dataset obtained from the Face Recognition Data.

Page 35

Sobel Operator

Edge map after applying Global Binary Signature and Sobel edge detection

In simple terms, the Sobel operator calculates the gradient of the image intensity at each point, giving the direction of the largest possible increase from light to dark and the rate of change in that direction.

The result shows how "abruptly" or "smoothly" the image changes at that point, and therefore how likely it is that this part of the image represents an edge, as well as how that edge is likely to be oriented.
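The gradient computation described above can be sketched in plain Python with the standard 3x3 Sobel kernels. The kernels are applied by correlation (not flipped), which gives the same gradient magnitude as convolution; `sobel`, `edge_map` and the threshold value are illustrative assumptions, and the Global Binary Signature step from the slide is not reproduced.

```python
import math

# Standard 3x3 Sobel kernels: GX responds to horizontal change, GY to vertical change.
GX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
GY = [[-1, -2, -1],
      [0, 0, 0],
      [1, 2, 1]]


def sobel(image):
    """Return (magnitude, direction) maps for a 2-D grey-scale image; border pixels stay 0."""
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    p = image[y + j - 1][x + i - 1]
                    gx += GX[j][i] * p
                    gy += GY[j][i] * p
            mag[y][x] = math.hypot(gx, gy)   # rate of change at this pixel
            ang[y][x] = math.atan2(gy, gx)   # gradient direction in radians (y axis points down)
    return mag, ang


def edge_map(image, threshold):
    """Binary edge map: 1 where the gradient magnitude exceeds the threshold."""
    mag, _ = sobel(image)
    return [[1 if m > threshold else 0 for m in row] for row in mag]


# Hypothetical 4x4 image with a vertical step from dark (0) to light (10).
step = [[0, 0, 10, 10] for _ in range(4)]
for row in edge_map(step, threshold=20):
    print(row)
```

The two interior columns on either side of the step get a magnitude of 40 and are marked as edge pixels; the border rows and columns are left at 0 because the 3x3 window does not fit there.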

Page 36

References

Abadi, D. J. (2009). Data Management in the Cloud: Limitations and Opportunities, Bulletin of the Technical Committee on Data Engineering, pp. 3–12.

Khan, A. I. and Muhamad Amin, A. (2007). One shot associative memory method for distorted pattern recognition, AI 2007: Advances in Artificial Intelligence, Springer, Berlin/Heidelberg, pp. 705–709.

Muhamad Amin, A. and Khan, A. I. (2009). Collaborative-comparison learning for complex event detection using distributed hierarchical graph neuron (DHGN) approach in wireless sensor network, AI 2009: Advances in Artificial Intelligence, Springer, Berlin/Heidelberg, pp. 111–120.

Nasution, B. B. and Khan, A. I. (2008). A hierarchical graph neuron scheme for real-time pattern recognition, IEEE Transactions on Neural Networks 19(2): 212–229.

Shiers, J. (2009). Grid today, clouds on the horizon, Computer Physics Communications, pp. 559–563.

Welsh, M., Malan, D., Duncan, B., Fulford-Jones, T. and Moulton, S. (2004). Wireless sensor networks for emergency medical care, GE Global Conference, Harvard University and Boston University School of Medicine, Boston, MA.

Page 37

Acknowledgement

Thank You.

I would like to thank everyone who helped make this possible. First and foremost, I owe immense gratitude to my thesis supervisor, Dr Asad Khan, for his support and kind contributions.
