EMTM 600 Software Development Spring 2011 Lecture Notes 3.

EMTM 600 Software Development

Spring 2011

Lecture Notes 3

EMTM 600 2011 Val Tannen

Assignments for next time

• Read about the Java EE documentation in the handout “EMTM 600 Case Study based on Anchor Machinery”. CONTINUE DESIGNING YOUR OWN ENTERPRISE APPLICATION:
  • Design the Java EE documentation for three (3) of your user stories (or two longer ones). Group your design/documentation around Façade/Session EJBs. Think carefully about granularity (fine-grain decomposition can cause inefficiency). Same groups as for Assignment 1.

• Read from chapters 10, 11, and 12 in Fowler about the patterns on pp. 143-181, pp. 195-214, pp. 278-301. Then, CONTINUE DESIGNING YOUR OWN ENTERPRISE APPLICATION:
  • Describe the ORM patterns that you would use for your domain model, 1-2 pages, same groups.

Transactions

Enterprise applications use domain models and databases to store information about their state

E.g., balances of all depositors

The occurrence of an application event that changes the state of shared domain objects and db records (data) requires the execution of a program that changes said state in a way that is consistent and correct in any context

E.g., balance must be updated when you deposit

A transaction is such a program.

Transactions

Transactions are not just ordinary programs

Additional requirements are placed on transactions (and particularly their execution environment) that go beyond the requirements placed on ordinary programs.

These are the ACID properties (explained next):

• Atomicity
• Consistency
• Isolation
• Durability

Integrity Constraints

Enterprise business rules generally limit the occurrence/effect of certain application events.
• Student cannot register for a course if current number of registrants = maximum allowed

Correspondingly, allowable data states are restricted.
• cur_reg <= max_reg

These limitations are expressed as integrity constraints.
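As a minimal sketch, the registration rule can be enforced in code along the following lines. The class `Course` and its field and method names are hypothetical, chosen only for illustration; they do not come from the case study.

```java
// Hypothetical illustration of the integrity constraint cur_reg <= max_reg.
class Course {
    private final int maxReg;   // maximum allowed registrants
    private int curReg;         // current number of registrants

    Course(int maxReg) {
        if (maxReg < 0) throw new IllegalArgumentException("maxReg must be >= 0");
        this.maxReg = maxReg;
    }

    // Returns true if the student was registered, false if the course is full.
    // Synchronized so that concurrent callers cannot push curReg past maxReg.
    synchronized boolean register() {
        if (curReg >= maxReg) {
            return false;       // the event is refused; the state stays valid
        }
        curReg++;
        return true;
    }

    synchronized int currentRegistrations() {
        return curReg;
    }
}
```

In a real enterprise application the same rule would typically also be declared in the database (for example as a CHECK constraint), since the data may be shared with other applications.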

Consistency

Transaction designer must ensure that

IF the data is in a state that satisfies all integrity constraints when execution of a transaction is started

THEN when the transaction completes:
• All integrity constraints are once again satisfied (constraints can be violated in intermediate states)
• New data state satisfies the modifications that the transaction is supposed to perform

Atomicity

A real-world event either happens or does not happen.

Student either registers or does not register.

Similarly, the system must ensure that either the transaction runs to completion (commits) or, if it does not complete, it has no effect at all (aborts).

This is not true of ordinary programs. A hardware or software failure could leave files partially updated.

Durability

The system must ensure that once a transaction commits its effect on the data state is not lost in spite of subsequent failures.

Not true of ordinary systems. For example, a media failure after a program terminates could cause the file system to be restored to a state that preceded the execution of the program.

Isolation

Deals with the execution of multiple transactions concurrently.

If the initial data state is consistent and accurately reflects the real-world state, then the serial (one after another) execution of a set of consistent transactions preserves consistency.

Beware: serial execution is inadequate from a performance perspective.

Transactions in Java EE

Both domain objects and database records can be shared between multiple clients/Web servers and even multiple applications. Hence transaction support is essential!

JDBC already has a simple level of transaction support.

In the interface java.sql.Connection we have

PreparedStatement prepareStatement(String sql)
…
void commit()
…
void rollback(Savepoint savepoint)
…
void setAutoCommit(boolean autoCommit)
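A minimal sketch of how these `java.sql.Connection` methods are commonly combined to make two updates atomic. The table `account(id, balance)`, the class name `JdbcTransferSketch`, and the method `transfer` are assumptions made for illustration; the caller is assumed to supply an open connection.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class JdbcTransferSketch {
    // Moves `amount` between two rows of a hypothetical table account(id, balance).
    // Both updates commit together, or neither takes effect.
    static void transfer(Connection conn, long fromId, long toId, BigDecimal amount)
            throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);              // group the statements into one transaction
        try (PreparedStatement debit = conn.prepareStatement(
                 "UPDATE account SET balance = balance - ? WHERE id = ?");
             PreparedStatement credit = conn.prepareStatement(
                 "UPDATE account SET balance = balance + ? WHERE id = ?")) {
            debit.setBigDecimal(1, amount);
            debit.setLong(2, fromId);
            credit.setBigDecimal(1, amount);
            credit.setLong(2, toId);
            if (debit.executeUpdate() != 1 || credit.executeUpdate() != 1) {
                throw new SQLException("account not found");
            }
            conn.commit();                      // make both updates permanent
        } catch (SQLException e) {
            conn.rollback();                    // undo any partial update
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

With auto-commit left on (the JDBC default), each statement would commit on its own, so a failure between the two updates could leave the data partially updated.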

Need 2-PC (two-phase commit)

Main reason: to combine JDBC and other operations within the same transaction.

Hence we need to manage transactions over multiple resources. That’s what JTA, the Java Transaction API, does.

Idea: have a resource manager for each resource and a central transaction manager

• Each resource manager temporarily stores the transaction’s effects
• Transaction manager polls resource managers
• If all resource managers are ready, transaction manager sends final commit order
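The idea can be illustrated with a toy, in-memory model. This is only a sketch: every type and method name below is hypothetical, it is not the JTA or XA API, and it ignores what real two-phase commit must handle (logging, timeouts, and recovery after a crash between the phases).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; a toy model, not the JTA/XA API.
interface ResourceManagerSketch {
    boolean prepare();   // phase 1: vote; true means "ready to commit"
    void commit();       // phase 2: make the tentative effects permanent
    void rollback();     // phase 2: discard the tentative effects
}

class TransactionManagerSketch {
    private final List<ResourceManagerSketch> resources = new ArrayList<>();

    void enlist(ResourceManagerSketch rm) {
        resources.add(rm);
    }

    // Returns true if every resource committed, false if every resource rolled back.
    boolean complete() {
        boolean allReady = true;
        for (ResourceManagerSketch rm : resources) {    // phase 1: poll each resource manager
            if (!rm.prepare()) {
                allReady = false;
                break;                                  // one "no" vote decides the outcome
            }
        }
        for (ResourceManagerSketch rm : resources) {    // phase 2: same decision for everyone
            if (allReady) {
                rm.commit();
            } else {
                rm.rollback();
            }
        }
        return allReady;
    }
}

// An in-memory resource used only to exercise the protocol.
class InMemoryResource implements ResourceManagerSketch {
    private final boolean votesYes;
    String state = "pending";

    InMemoryResource(boolean votesYes) {
        this.votesYes = votesYes;
    }

    public boolean prepare() { return votesYes; }
    public void commit() { state = "committed"; }
    public void rollback() { state = "rolled back"; }
}
```

The point of the two phases is that no resource makes its changes permanent until all of them have voted that they can.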

JTA

The transaction managers in Java EE application servers and even in some Web servers implement the JTA’s interfaces. But you must make sure the JDBC drivers (or other EIS “connectors” like JMS) implement the resource manager interface.

Some of the JTA interfaces are intended for application developers. For example,

javax.transaction.UserTransaction has

void begin()
void commit()
int getStatus()
void rollback()
void setRollbackOnly()
void setTransactionTimeout(int seconds)

Efficiency issues caused by transactions

How to deal with concurrent transactions? I.e., what level of ISOLATION is needed? Degrees of pessimism:

1. Can read uncommitted data (nasty…)
2. Unrepeatable reads within a transaction are possible (pretty bad…)
3. Phantom records (added during and ignored by the transaction) are possible (bad…)
4. None of the above: serializable = fully isolated (wonderful, but EXPENSIVE!!)
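In JDBC, these four degrees correspond to the isolation constants defined on `java.sql.Connection`. The helper below is a small sketch of that correspondence; the class and method names are hypothetical, while the constants are part of the standard API.

```java
import java.sql.Connection;

class IsolationLevelSketch {
    // Maps the four degrees listed above (1 = weakest, 4 = strongest)
    // to the isolation constants defined on java.sql.Connection.
    static int jdbcLevelFor(int degree) {
        switch (degree) {
            case 1: return Connection.TRANSACTION_READ_UNCOMMITTED; // uncommitted data can be read
            case 2: return Connection.TRANSACTION_READ_COMMITTED;   // unrepeatable reads still possible
            case 3: return Connection.TRANSACTION_REPEATABLE_READ;  // phantom records still possible
            case 4: return Connection.TRANSACTION_SERIALIZABLE;     // fully isolated
            default: throw new IllegalArgumentException("degree must be 1..4");
        }
    }
}
```

A level is requested with `conn.setTransactionIsolation(IsolationLevelSketch.jdbcLevelFor(4))`; whether a given level is actually supported depends on the database and driver.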

(An acceptable solution is optimistic concurrency control: don’t isolate reads, perform reads before writes, abort if changed.)
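One common way to implement optimistic concurrency control is a version counter that is checked at write time. The sketch below assumes that technique and uses an in-memory record; the class `VersionedBalance` and its methods are hypothetical names, and a real application would keep the version in a database column.

```java
// Hypothetical in-memory record with a version counter; names are illustrative only.
class VersionedBalance {
    private long balance;
    private long version;

    VersionedBalance(long initialBalance) {
        this.balance = initialBalance;
    }

    // What a transaction sees when it reads without isolating the read.
    static final class Snapshot {
        final long balance;
        final long version;

        Snapshot(long balance, long version) {
            this.balance = balance;
            this.version = version;
        }
    }

    synchronized Snapshot read() {
        return new Snapshot(balance, version);
    }

    // Writes only if nobody has changed the record since `seen` was read.
    // Returns false to signal that the caller's transaction must abort (and may retry).
    synchronized boolean writeIfUnchanged(Snapshot seen, long newBalance) {
        if (seen.version != version) {
            return false;
        }
        balance = newBalance;
        version++;
        return true;
    }
}
```

If two transactions read the same snapshot, the first write succeeds and the second is refused, so the second transaction aborts instead of silently overwriting the first one's update.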

Transactions for EJBs

Who defines the transaction boundaries and behavior?
• Client object (programmed, as for JDBC before)
• Entity-bean-managed transactions
• Container-managed transactions (automated for CMP, semi-automated for BMP); container can create transactions as needed.

Container-managed transaction attribute values:
• Required (uses existing, if not, creates new)
• RequiresNew (creates new, suspends existing)
• Supports (uses existing)
• Mandatory (uses existing, if not, error)
• NotSupported (suspends existing)
• Never (if existing, error)

Not recommended

Not recommended

Recommended

Transaction efficiency is the reason for the Remote Façade pattern (Fowler, pp. 388-400)

If we access entity EJB methods directly from controllers (servlets), each method call creates a new transaction!

By grouping accesses to a set of entity EJBs in one session EJB, we create a new transaction for each session EJB method call only.

Each such method will make multiple calls to entity EJB methods, but no new transactions are created (except with the RequiresNew transaction attribute value).

The Remote Façade pattern and the container-managed transactions work together to make the implementation more efficient.
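The effect on the number of transactions can be seen in a toy, single-threaded simulation. Everything here is hypothetical: `ToyContainer` only imitates the Required attribute (join the current transaction, or start one if none exists), and `AccountEntity` and `TransferFacade` stand in for an entity EJB and a session EJB. No real EJB API is used.

```java
// Toy stand-in for the container's "Required" attribute; all names are hypothetical.
class ToyContainer {
    private boolean inTransaction = false;
    int transactionsStarted = 0;

    // Runs `work` inside the current transaction, or starts (and ends) a new one if none is active.
    void required(Runnable work) {
        if (inTransaction) {
            work.run();
            return;
        }
        inTransaction = true;
        transactionsStarted++;
        try {
            work.run();
        } finally {
            inTransaction = false;
        }
    }
}

// Plays the role of an entity EJB: every method is transactional (Required).
class AccountEntity {
    private final ToyContainer container;
    long balance;

    AccountEntity(ToyContainer container, long balance) {
        this.container = container;
        this.balance = balance;
    }

    void withdraw(long amount) { container.required(() -> balance -= amount); }
    void deposit(long amount) { container.required(() -> balance += amount); }
}

// Plays the role of a session EJB acting as a façade.
class TransferFacade {
    private final ToyContainer container;

    TransferFacade(ToyContainer container) {
        this.container = container;
    }

    // One coarse-grain call: one transaction around both entity calls.
    void transfer(AccountEntity from, AccountEntity to, long amount) {
        container.required(() -> {
            from.withdraw(amount);
            to.deposit(amount);
        });
    }
}
```

Calling `withdraw` and `deposit` directly from controller code starts two transactions in this model; going through `TransferFacade.transfer` starts one, and the two entity calls join it.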

Summary of EJBs

MDBs play a role similar to servlets, but without the Web server support. Instead they use the EJB container’s services.

Entity EJBs implement the data in the domain model, i.e. the state of the entire application, during and between user sessions. They need persistence and concurrency.

Session EJBs implement the business rules and/or remote façade for the domain model and are called by servlets or MDBs. For the remote façade role, they provide coarse-grain access to fine grain domain objects.

Simplifications in EJB 3.0

Implementing EJB with three classes is cumbersome for developers.

In EJB 3.0, only one class is used. Hence EJBs in 3.0 are often called POJOs!

But these objects are not so plain!

Access to them benefits from container-managed transactions and security. Any additional classes needed are generated automatically by the EJB compiler.

Java 5 permits annotations. These are used in EJB 3.0 to move into the code what would otherwise go into the cumbersome XML descriptors.

Data Mapping Layer

Purpose: separates domain from data sources.

Like the MVC pattern, Data Mapping is a “super”-pattern based on the need to make the business object representation in the domain independent from the way the state of the business objects is stored.

In essence, this comes down to Object-Relational Mapping (ORM)

Three reasons for this separation:

- data sources may (will!) be shared with other apps

- data sources are relational while the domain model is OO (richer!)

- DB programming is tricky; best left to specialists.

Persistence Solutions for Business Objects

All solutions are a form of Object-Relational Mapping (ORM). Options:

1. BusObj is POJO and handles persistence itself (Active Record pattern in Fowler) and it can live in the Web container (simple webapps).

2. BusObj is Entity Bean with Container-Managed Persistence (CMP); uses vendor-specific proprietary ORMs (which may or may not follow Domain Store, p. 516 textbook).

3. BusObj is Entity Bean with Bean-Managed Persistence (BMP) and handled by itself or through a Data Access Object (DAO, see Data Access Object, p. 462 textbook).

4. BusObj is POJO and handles its persistence itself or through a separate DAO.

5. BusObj is a BMP entity bean that works as a façade for several other business objects (see Composite Entity, p. 391 textbook).

6. A group of POJO business objects may be persisted together in a joint and usually complex framework using multiple DAOs following Domain Store, p. 516 textbook. This framework may be a JDO implementation.

Enterprise Applications: Persistence Strategies

[Diagram: persistence strategies arranged by layer: domain (business) logic (rules), domain (business) objects, persistence. Elements shown: Session EJBs; CMP Entity EJBs (specific to app. server vendor); BMP Entity EJBs; Composite Entity BMP EJBs; POJO; Active Record POJO (Web container); Data Access Object; JDBC; JDO product?]

CMP: Container-Managed Persistence
BMP: Bean-Managed Persistence

Plain CMP

Easiest!

It is vendor-dependent, but all vendors of Java EE application servers implement this.

For example, WebSphere Application Server provides CMP that maps entity EJBs to a single relational table (one row per object) when the object attributes permit.

More complicated mappings are supported by the WebSphere Studio Application Developer (through wizards that allow the specification of various mappings).

But it may make scalability control very hard!

JPA

Java Persistence API (JPA) is also an open specification. In fact it was defined as part of EJB 3.0, recognizing that CMP and BMP in EJB 2.1 were too “heavyweight”. JPA helps you do persistence with POJOs. JPA 1.0 in 2006, JPA 2.0 coming out now.

JPA assumes a RDBMS backend.

Commercial implementations

IBM WebSphere, Oracle Application Server

Open Source implementations

Apache OpenJPA, Sun GlassFish Enterprise Server

DataNucleus Access Platform (available on SourceForge)

and see Hibernate

However: JPA 1.0 is a subset of JDO 2.0!! (going crazy yet?)

Hibernate

Started out as the Data Mapping layer of the open source Java EE implementation JBoss.

Later it was made compatible with JPA. It is much more popular than GlassFish or Apache OpenJPA.

Uses Java 5 annotations, obviating the need for XML configuration files

Implements most ORM design patterns, see next.

CDM: Phylogeny Inference Data

Analyzed data: trees, matrices, operational taxonomic units (OTUs), standard taxa

[Diagram labels: Matrix, OTU, Tree, StdTaxon, List, Set, taxon, provenance, authority, StdMatrix, SeqMatrix, isA]

More ORM Issues

Handling object identity:

Don’t want to create two copies of the same business object!

See Fowler’s Identity Map pattern.

Idea: have the mapper object keep a dictionary of objects that have been already loaded; sometimes called a Registry.
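A minimal sketch of that idea in plain Java. The class `IdentityMapSketch` is a hypothetical name, and the `loader` function stands in for whatever actually reads the object from the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical mapper; the loader function stands in for a database read.
class IdentityMapSketch<K, V> {
    private final Map<K, V> loaded = new HashMap<>();
    private final Function<K, V> loader;
    int loads = 0;   // number of times the loader was actually invoked

    IdentityMapSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the already-loaded object for this key, so each key is loaded at most once.
    V find(K id) {
        V existing = loaded.get(id);
        if (existing != null) {
            return existing;
        }
        V fresh = loader.apply(id);
        loads++;
        if (fresh != null) {
            loaded.put(id, fresh);
        }
        return fresh;
    }
}
```

Two lookups with the same key return the same object reference, so an update made through one reference is visible through the other. In practice such a map is usually scoped to a single session or transaction, since it is not thread-safe as written.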

Mapping non-relational features:

Using UML for the Domain Model, we get features that are not in the relational model: associations, inheritance.

Luckily, the DB people have already worried about mapping from Entity-Relationship Diagrams to relational schemas. Mapping from UML is similar. See Fowler’s patterns Single Table Inheritance, Concrete Table Inheritance and Inheritance Mappers.

Domain Model for Anchor Machinery Case Study:

Example of object-relational mapping for some Anchor Machinery objects

ServiceTransaction
- servTranId : DBid
- saleId : String
- complaintNarrative : String
- resolutionCode : String
- resolutionNarrative : String
- date : Date
- startTime : Time
- closeTime : Time
- stillActive : Boolean
- type : Enum{S,CC,R}

RepairEvent
- repEvId : DBid
- eventDate : Date
- eventCode : String
- eventNarrative : String

RepTranRepEv
- repTranId : DBid
- repEvId : DBid
- position : int

Diagram annotations: Single Table Inheritance; Ordered one-many relationship

EE good practices, summary

1. When using Session Façade, provide both a remote and local interface. Sometimes the servlet runs on the same JVM and local interfaces are more efficient (EJB 2.0)

2. Entity EJBs are meant for persistence. Put the domain logic in session EJBs.

3. Do not separate session beans and entity beans on different hosts. (Therefore, your entity beans need only local interfaces.)

4. Do not use entity beans for read-only information. It adds unnecessary EJB container overhead.

More Good Practices (re. coarse-grain, fine-grain)

Keeping this difference in mind has an essential effect on performance and scalability!

Worry about transactional granularity:
• do not expose entity bean methods directly to controller code; instead, wrap them with a session bean!
• group multiple related user stories in the same session bean!

Worry about persistence granularity:
• persist multiple related business objects by using the same entity bean with multiple dependent objects! (see Composite Entity p. 391 in textbook)
• make one entity bean correspond to multiple related records in the relational database! (with CMP this may not always work as needed)

Data Integration Concept: K2

A system for the integration of heterogeneous data with applications in bioinformatics.

User Requirements

• Capabilities: response time measured in seconds
• Constraints: data in relational DBMS, OODBMS, structured files, “boutique” databases, Web; high-level query language

Software Requirements

Logical Model

[Diagram 1: K2 receives a query and returns an answer.]

[Diagram 2: K2 + data sources. K2 receives a query and returns an answer (data); it sends subordinate queries to source 1 … source n and receives their answers.]

Software Requirements

Logical Model

[Diagram 3: internal structure of K2. Components: Translator, Optimizer, Decomposer, Integrator, Driver 1 … Driver n. Flow labels: query, internal query, subord. query, answer, internal data.]

Data Integration/Sharing/Exchange

Approach 1: Standards and Data Warehouses

Build a grand unified data model and clearinghouse for all data
• Worth aspiring to for certain data – standards help immensely!
• Probably the predominant approach in bioinformatics

Standards help but don’t fully solve problems of data exchange and integration
• Needs change, science changes – need new versions of the standard (hence no one standard)!
• Standards evolve slowly – not well-suited for cutting-edge science
• Some have different needs for their data – want a different schema
• Even if we have a standard schema, data may be of different levels of quality – how do we agree on the “standard” version of the data?

Approach II: Exchange Among Cooperating Sites

Everyone keeps their database, and uses point-to-point translators between them
• e.g., based on Web services, FTP, export files, etc.

Much more flexible – each site can control its own schema and do its own data curation

But also poses new challenges:
• Requires significant expertise at each site, to write the translators (often in both directions)
• Translators seldom work on incremental changes: “refreshes” are a major task
• Conflicting data; violated constraints; unexpected dependencies
• No tracking of changes or where data originated (provenance)
• Small changes to any schema can cause many things to break

ORCHESTRA: Data Sharing among Collaborators

Todd Green, Greg Karvounarakis, Nick Taylor, Zachary Ives, Val Tannen

Frequent need to share structured data in a collaborative fashion
• Goal: import and modify (curate) each other’s data
• Not all data is reliable and not everyone agrees on what’s right!

ORCHESTRA enables peer-to-peer data sharing that:
• Preserves local control of data, points of view
  • Sites can update data
  • Update exchange: propagation to other peers
  • Local control of what is imported
  • Who do we trust?
  • Whom do we agree to give data to?
  • Used locally to resolve conflicts
• Tracks provenance – where data came from

[Diagram: peers Alice, Bob, and Carol, each with a DB and local queries and edits, exchanging updates ∆A+/−, ∆B+/−, ∆C+/−]

Update exchange directed by schema mappings

[Diagram: peers Alice, Bob, Carol, and Dan, with local queries and edits, connected by schema mappings]

• Alice does not trust Dan’s data
• Bob does not trust Carol’s data
• Carol won’t share data with Bob
• Dan won’t share data with Carol

Schema mappings

[Diagram: schema mappings among peers Alice, Bob, Carol, and Dan]

Expressed in query language formalisms

They describe
• Alternative uses of data (union)
• Joint uses of data (join)

What Is a Schema Mapping?

A constraint specifying that data must exist in one set of tables, given data in another set of tables, e.g.:

RefSeq:Species (taxId, taxName)
Taxo:Names (taxId, taxName, 'scientific name')

One can think of it as resembling a query.

Just as SQL queries offer benefits over Java code to maintain stored data, mappings offer benefits to data translation.

What Can Schema Mappings Do?

Mappings allow us to:
• Restructure data – combine or split tables, flatten or add hierarchy
• Perform lookups, cross-references, or link traversals
• Translate synonyms or IDs
• Create special markers for unknown data (analogous to SQL’s NULL, but more powerful)

Mappings can be:
• Composed (we don’t need mappings between every pair of sites for them to share data!)
• Inverted (if we want bidirectional data sharing)
• Tested for certain kinds of correctness
• Automatically expanded to add provenance etc.

[Diagram: peers Alice, Bob, Carol, and Dan, with local queries and edits; a DATA item carries an annotation]

Provenance is data annotation
So is trust information
And so is access control information

DATA
Annotation: Comes from Alice or Carol; Don’t share with Bob!
