ISSUES THE CLOUD AND DATABASES. WHAT KIND OF DATA MANAGEMENT IS A GOOD FIT WITH THE CLOUD?...


ISSUES

THE CLOUD AND DATABASES


WHAT KIND OF DATA MANAGEMENT IS A GOOD FIT WITH THE CLOUD?

• Analytical data management: data attributes
  • Far more reads than writes, so security and privacy less of an issue
  • Tend to have far greater data needs, so there is a need for more servers
  • The size of the data set grows over time and does not stabilize, so a better fit with expanding cloud server availability
• Analytical applications often want data from multiple sources, and availability is much better in a cloud environment


MORE ON ANALYTICAL PROCESSING

• Analytical data management: system attributes
  • Shared nothing works better when access is mostly reads
  • ACID transactions do not need to be enforced as there is no need for a single, global state for all users
  • Generally, statistical results are okay even if some very secure data is not discovered
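
The shared-nothing point above can be illustrated with a small sketch. It assumes three hypothetical nodes, each owning a private partition of made-up numbers; threads stand in for separate servers. Each node computes a partial aggregate over its own data, and only those small partial results are merged, so no global state or cross-node locking is needed for a read-only query.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the private partition of one node.
PARTITIONS = [
    [12.0, 7.5, 3.0],
    [8.0, 1.5],
    [20.0, 4.0, 6.0, 2.0],
]

def local_aggregate(partition):
    """Runs on one node: return (sum, count) for that node's own data."""
    return sum(partition), len(partition)

def global_mean(partitions):
    """Scatter the read-only query to every node, then merge the partials."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_aggregate, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

print(global_mean(PARTITIONS))
```

Adding a server in this sketch only means adding another partition to the list, which is the property that makes the approach a natural fit for an expanding pool of cloud machines.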


WHAT IS NEEDED FOR NEW GENERATION OF CLOUD DBS?

• Focus on making use of broad parallelism and on a shifting/expanding set of servers
• Looser notion of fault tolerance, as there is often no need to restart a query that is interrupted or has one of its branches killed
• Need to be able to operate on data in multiple formats, encryptions, attribute domains, namespaces, schemas, database products – heterogeneity!
• Must be able to sit underneath business intelligence systems
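
One way to picture the looser notion of fault tolerance is a query that simply skips a branch whose server has disappeared and reports how much was lost. The sketch below is an illustration under assumptions, not a description of any particular product: `run_branch` and `tolerant_mean` are hypothetical names, and `None` stands in for a killed or unreachable server.

```python
def run_branch(partition):
    """Hypothetical query branch; raises if its server has gone away."""
    if partition is None:  # None stands in for a killed or unreachable server
        raise ConnectionError("branch lost")
    return sum(partition), len(partition)

def tolerant_mean(partitions):
    """Average over whichever branches answer; count the ones that were lost."""
    total, count, lost = 0.0, 0, 0
    for partition in partitions:
        try:
            branch_sum, branch_count = run_branch(partition)
        except ConnectionError:
            lost += 1  # do not restart the query, just skip this branch
            continue
        total += branch_sum
        count += branch_count
    mean = total / count if count else None
    return mean, lost

mean, lost = tolerant_mean([[1.0, 3.0], None, [5.0]])
print(mean, lost)
```

Returning the number of lost branches alongside the estimate lets the caller decide whether an approximate statistical answer is acceptable or whether the query should be rerun.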


HYBRID DATABASES: IS THIS THE ANSWER?

• Folks don’t want to learn/buy/program new data management products
• But folks do want commercial grade systems with professional support
• Would make the transition from transaction apps to analytical apps easier – like with relational data warehousing
• But would we end up with an inelegant mess?


WHAT ABOUT OBJECT DATABASES? A RETURN?

• Blending a host language with a query language makes sense when queries involve complex calculations
• It is easy to extend an o-o language with statistical procedures
• The encapsulation of o-o languages is a good match with the wide and independent distribution of data in a cloud environment
• O-O procedures could be built and deployed by distributed volunteers


MORE ON O-O DBS

• Partial results could be maintained and kept up to date, with batch updating of raw data only infrequently
• We know how to build multiple language interfaces to accommodate multiple o-o languages
• O-O databases are a good match with service-based interfaces – see diagram on page 29
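
The idea of maintaining partial results behind an object interface can be sketched as follows. `RunningStats` is a hypothetical class: it encapsulates a count, a sum, and a sum of squares, so that each infrequent batch of raw data updates the stored partial results and the statistics can be read at any time without rescanning old data.

```python
class RunningStats:
    """Encapsulates partial results so queries never rescan the raw data."""

    def __init__(self):
        self.n = 0           # number of values seen so far
        self.total = 0.0     # running sum
        self.total_sq = 0.0  # running sum of squares

    def apply_batch(self, values):
        """Fold one batch of raw values into the stored partial results."""
        for v in values:
            self.n += 1
            self.total += v
            self.total_sq += v * v

    def mean(self):
        return self.total / self.n

    def variance(self):
        """Population variance computed from the partial results alone."""
        m = self.mean()
        return self.total_sq / self.n - m * m

stats = RunningStats()
stats.apply_batch([2.0, 4.0])  # first batch load
stats.apply_batch([6.0])       # a later, infrequent batch
print(stats.mean(), stats.variance())
```

The sum-of-squares formula is chosen here for brevity; it can lose precision on large values, and a production system would more likely use a numerically stable update such as Welford's method.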


OBJECT-ORIENTED DBS: RELEVANT RESEARCH & DEV.

• Adaptive query processing and optimization in real time
• Parallel and distributed database technology
• Massively parallel systems
• Shared nothing systems
• Data management stream technology


PROBLEM: MOST BUSINESS DATA RIGHT NOW IS IN A RELATIONAL FORMAT

• We don’t have truly massively parallel and distributed query models for relational data
• We don’t have truly massively parallel and distributed data partitioning for relational data
• To perform efficient and fluid analytical processing of data in the cloud, we would need to create new links quickly, but we won’t have a focused, fixed schema as we do in standard relational systems
• Object extensions to relational systems don’t include method encapsulation, only expanded domains


MORE CLOUD ISSUES: CENTRALIZED CONTROL?

• Is the cloud trusted or anonymous?
  • Trusted, provider-specific commercial cloud solutions are much safer, centrally managed, and optimized as a single network, not as a mesh of networks
• In many environments, even trusted, centralized environments, many machines are not properly managed and are controlled by immediate users
• People don’t like their machines being co-opted, and so trust is not enough to guarantee dependability


MORE ON THE CLOUD: OTHER APPLICATIONS?

• Is analytical processing the only likely application?
  • There are many data sharing applications
  • There are many applications for selling access to bulk data
  • Data mining is a more focused form of analytical processing, but demands a very precise level of heterogeneity resolution and integration in the case of most medical and financial applications (and others)


DATA MINING

• Kinds of data (from Data Mining by Han and Kamber)
  • Relational dbs
  • Data warehouses
  • Transaction processing systems
  • Object-relational dbs
  • Time sequence and temporal dbs
  • Spatial dbs
  • Text dbs
  • Multimedia dbs
  • Legacy dbs
  • Data streams
  • The Web…


HETEROGENEITY IN DATABASES: DATA MINING IMPLICATIONS

• Note how broad the “Web” is on the previous slide
  • Includes countless hand-rolled dbs
  • Includes databases hidden by web development frameworks like Ruby on Rails
  • Includes data accessible only via specific APIs
  • Includes data accessible via XML and XPath, XQuery technology
  • Includes data stored in proprietary databases for applications like CAD, finance, animation, geography
• The heterogeneity problem will only be solved by widespread collaboration on unifying standards
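
To make the heterogeneity problem concrete, here is a minimal sketch of what resolving it by hand looks like today. Everything in it is made up for illustration: three hypothetical sources expose the same kind of record as CSV, JSON, and XML, with different field names and, in one case, different units. Each adapter maps its source onto one agreed record shape, which is the role a shared standard would otherwise play.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

LB_TO_KG = 0.45359237  # exact definition of the international pound

def from_csv(text):
    """Hypothetical source 1: CSV with columns patient_id, weight_kg."""
    rows = csv.DictReader(io.StringIO(text))
    return [{"id": r["patient_id"], "weight_kg": float(r["weight_kg"])}
            for r in rows]

def from_json(text):
    """Hypothetical source 2: JSON with numeric ids and weight in pounds."""
    return [{"id": str(r["pid"]), "weight_kg": r["weight_lb"] * LB_TO_KG}
            for r in json.loads(text)]

def from_xml(text):
    """Hypothetical source 3: XML with the id as an attribute."""
    root = ET.fromstring(text)
    return [{"id": e.get("id"), "weight_kg": float(e.findtext("kg"))}
            for e in root.findall("patient")]

records = (
    from_csv("patient_id,weight_kg\na1,70.5\n")
    + from_json('[{"pid": 7, "weight_lb": 100}]')
    + from_xml('<patients><patient id="x9"><kg>81</kg></patient></patients>')
)
print(records)
```

Every new source needs another hand-written adapter, and a mistake in one of them (a missed unit conversion, say) silently corrupts the mined results, which is why precise heterogeneity resolution matters so much for medical and financial data.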


MORE ON THE CLOUD: THE FUTURE OF TRANSACTION PROCESSING?

• Will the rigidly centralized notion of OLTP survive?
  • Corporations are adapting to the cloud incrementally and using middleware to leverage their own clouds
  • With global business comes global data processing across time zones, which is often managed in a widely distributed fashion
• There are large corporations that handle financial and retail transactions for other companies
• Are people warming to the idea of managing their personal and small business data in the cloud, including document and other services?


BUT THE CLOUD IS PROCESS-CENTRIC AND NOT DATA-CENTRIC

• Is the process vs. data centric issue about to reawaken?
  • The process folks kind of lost…
  • Data is seen more and more as a valuable resource, even if it is only “sold” indirectly
  • More of us are buying multimedia data
• There are actually 3 models: process-centric, data-centric, and encapsulated
  • Some argue that the cloud is actually an encapsulated model and that, in fact, data movement is difficult to optimize due to the dynamic nature of the network

• Object-oriented databases…?