EMTM 600 Software Development Spring 2011 Lecture Notes 3.
EMTM 600 Software Development
Spring 2011
Lecture Notes 3
EMTM 600 2011 Val Tannen
Assignments for next time
• Read about the Java EE documentation in the handout “EMTM 600 Case Study based on Anchor Machinery”. CONTINUE DESIGNING YOUR OWN ENTERPRISE APPLICATION:
  • Design the Java EE documentation for three (3) of your user stories (or two longer ones). Group your design/documentation around Façade/Session EJBs. Think carefully about granularity (fine-grain decomposition can cause inefficiency). Same groups as for Assignment 1.
• Read from chapters 10, 11, and 12 in Fowler about the patterns on pp. 143-181, pp. 195-214, pp. 278-301. Then, CONTINUE DESIGNING YOUR OWN ENTERPRISE APPLICATION:
  • Describe the ORM patterns that you would use for your domain model, 1-2 pages, same groups.
Transactions
Enterprise applications use domain models and databases to store information about their state
E.g., balances of all depositors
The occurrence of an application event that changes the state of shared domain objects and db records (data) requires the execution of a program that changes said state in a way that is consistent and correct in any context
E.g., balance must be updated when you deposit
A transaction is such a program
Transactions
Transactions are not just ordinary programs
Additional requirements are placed on transactions (and particularly their execution environment) that go beyond the requirements placed on ordinary programs.
ACID properties (explained next):
• Atomicity
• Consistency
• Isolation
• Durability
Integrity Constraints
Enterprise business rules generally limit the occurrence/effect of certain application events.
E.g., a student cannot register for a course if the current number of registrants = maximum allowed
Correspondingly, allowable data states are restricted.
E.g., cur_reg <= max_reg
These limitations are expressed as integrity constraints.
Consistency
Transaction designer must ensure that
IF the data is in a state that satisfies all integrity constraints when execution of a transaction is started
THEN when the transaction completes:
• All integrity constraints are once again satisfied (constraints can be violated in intermediate states)
• New data state satisfies the modifications that the transaction is supposed to perform
Atomicity
A real-world event either happens or does not happen.
Student either registers or does not register.
Similarly, the system must ensure that either the transaction runs to completion (commits) or, if it does not complete, it has no effect at all (aborts).
This is not true of ordinary programs. A hardware or software failure could leave files partially updated.
Durability
The system must ensure that once a transaction commits its effect on the data state is not lost in spite of subsequent failures.
Not true of ordinary systems. For example, a media failure after a program terminates could cause the file system to be restored to a state that preceded the execution of the program.
Isolation
Deals with the execution of multiple transactions concurrently.
If the initial data state is consistent and accurately reflects the real-world state, then the serial (one after another) execution of a set of consistent transactions preserves consistency.
Beware: serial execution is inadequate from a performance perspective.
Transactions in Java EE
Both domain objects and database records can be shared between multiple clients/Web servers and even multiple applications. Hence transaction support is essential!
JDBC already has a simple level of transaction support.
In the interface java.sql.Connection we have
PreparedStatement prepareStatement(String sql)
…
void commit()
…
void rollback(Savepoint savepoint)
…
void setAutoCommit(boolean autoCommit)
…
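A minimal sketch of how these Connection methods combine into one transaction. The class name JdbcTransfer, the table "account", and its columns "id" and "balance" are assumptions made for illustration; only the java.sql calls come from the JDBC API.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper: the "account" table and its columns are assumptions.
final class JdbcTransfer {
    private JdbcTransfer() {}

    /** Moves `amount` from one account row to another as a single unit of work. */
    static void transfer(Connection conn, long fromId, long toId, long amount)
            throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);              // statements now share one transaction
        try {
            update(conn, "UPDATE account SET balance = balance - ? WHERE id = ?", amount, fromId);
            update(conn, "UPDATE account SET balance = balance + ? WHERE id = ?", amount, toId);
            conn.commit();                      // both updates take effect together
        } catch (SQLException e) {
            conn.rollback();                    // neither update takes effect
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }

    private static void update(Connection conn, String sql, long amount, long id)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, amount);
            ps.setLong(2, id);
            if (ps.executeUpdate() != 1) {
                throw new SQLException("expected exactly one row for id " + id);
            }
        }
    }
}
```

With auto-commit left on, each UPDATE would commit by itself and a failure between the two would leave the balances inconsistent.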
Need 2-PC (two-phase commit)
Main reason: to combine JDBC and other operations within the same transaction.
Hence we need to manage transactions over multiple resources. That’s what JTA, the Java Transaction API, does.
Idea: have a resource manager for each resource and a central transaction manager
• Each resource manager temporarily stores transaction effects
• Transaction manager polls resource managers
• If all resource managers are ready, transaction manager sends final commit order
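The polling idea can be sketched in plain Java. The interface ResourceManager and the classes InMemoryResource and TransactionManager below are hypothetical names for illustration; they are not the JTA interfaces, and a real implementation would also have to handle failures between the two phases.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface for illustration only; NOT a JTA interface.
interface ResourceManager {
    boolean prepare();   // phase 1: hold the effects temporarily and vote
    void commit();       // phase 2: make the held effects permanent
    void rollback();     // phase 2: discard the held effects
}

// Toy resource that records what it was asked to do.
final class InMemoryResource implements ResourceManager {
    private final boolean canCommit;
    final List<String> log = new ArrayList<String>();

    InMemoryResource(boolean canCommit) { this.canCommit = canCommit; }

    public boolean prepare() { log.add("prepare"); return canCommit; }
    public void commit()     { log.add("commit"); }
    public void rollback()   { log.add("rollback"); }
}

final class TransactionManager {
    private TransactionManager() {}

    /** Returns true if every resource voted "ready" and was told to commit. */
    static boolean runTwoPhaseCommit(List<? extends ResourceManager> resources) {
        boolean allReady = true;
        for (ResourceManager rm : resources) {      // phase 1: poll
            if (!rm.prepare()) { allReady = false; break; }
        }
        for (ResourceManager rm : resources) {      // phase 2: final order
            if (allReady) { rm.commit(); } else { rm.rollback(); }
        }
        return allReady;
    }
}
```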
JTA
The transaction managers in Java EE application servers and even in some Web servers implement the JTA’s interfaces. But you must make sure the JDBC drivers (or other EIS “connectors” like JMS) implement the resource manager interface.
Some of the JTA interfaces are intended for application developers. For example,
javax.transaction.UserTransaction has
void begin()
void commit()
int getStatus()
void rollback()
void setRollbackOnly()
void setTransactionTimeout(int seconds)
Efficiency issues caused by transactions
How to deal with concurrent transactions? I.e., what level of ISOLATION is needed? Degrees of pessimism:
1. Can read uncommitted data (nasty…)
2. Unrepeatable reads within a transaction are possible (pretty bad…)
3. Phantom records (added during and ignored by the transaction) are possible (bad…)
4. None of the above: serializable = fully isolated (wonderful, but EXPENSIVE!!)
(An acceptable solution is optimistic concurrency control: don’t isolate reads, perform reads before writes, abort if changed.)
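One common way to realize "abort if changed" is a version number per record. The sketch below is an in-memory illustration under that assumption; VersionedStore, Snapshot, and writeIfUnchanged are hypothetical names, and a database-backed version would typically compare a version column in the UPDATE's WHERE clause.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory store illustrating version-based optimistic control.
final class VersionedStore {
    static final class Snapshot {
        final long balance;
        final int version;
        Snapshot(long balance, int version) {
            this.balance = balance;
            this.version = version;
        }
    }

    private final Map<String, Snapshot> rows = new HashMap<String, Snapshot>();

    synchronized void insert(String key, long balance) {
        rows.put(key, new Snapshot(balance, 0));
    }

    /** Reads are not isolated: no lock is held after this returns. */
    synchronized Snapshot read(String key) {
        return rows.get(key);
    }

    /**
     * Writes only if the row still carries the version the caller read.
     * A false result means another writer got there first and the caller
     * must abort (and possibly retry from a fresh read).
     */
    synchronized boolean writeIfUnchanged(String key, Snapshot readEarlier, long newBalance) {
        Snapshot current = rows.get(key);
        if (current == null || current.version != readEarlier.version) {
            return false;
        }
        rows.put(key, new Snapshot(newBalance, current.version + 1));
        return true;
    }
}
```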
Transactions for EJBs
Who defines the transaction boundaries and behavior?
• Client object (programmed, as for JDBC before): not recommended
• Entity-bean-managed transactions: not recommended
• Container-managed transactions (automated for CMP, semi-automated for BMP); container can create transactions as needed: recommended
Container-managed transaction attribute values:
• Required (uses existing, if not, creates new)
• RequiresNew (creates new, suspends existing)
• Supports (uses existing)
• Mandatory (uses existing, if not, error)
• NotSupported (suspends existing)
• Never (if existing, error)
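The six container-managed attribute values can be summarized as a small decision function. This is only a model of the rules listed above: TxAttribute and TxOutcome are hypothetical names, and IllegalStateException stands in for whatever exception a real container raises.

```java
// What the callee ends up running in (hypothetical names, illustration only).
enum TxOutcome { JOIN_EXISTING, START_NEW, RUN_WITHOUT_TX }

enum TxAttribute {
    REQUIRED, REQUIRES_NEW, SUPPORTS, MANDATORY, NOT_SUPPORTED, NEVER;

    /** Decides the callee's transaction context, given whether the caller has one. */
    TxOutcome resolve(boolean callerHasTx) {
        switch (this) {
            case REQUIRED:
                return callerHasTx ? TxOutcome.JOIN_EXISTING : TxOutcome.START_NEW;
            case REQUIRES_NEW:
                // An existing transaction is suspended, not joined.
                return TxOutcome.START_NEW;
            case SUPPORTS:
                return callerHasTx ? TxOutcome.JOIN_EXISTING : TxOutcome.RUN_WITHOUT_TX;
            case MANDATORY:
                if (!callerHasTx) {
                    throw new IllegalStateException("Mandatory: caller has no transaction");
                }
                return TxOutcome.JOIN_EXISTING;
            case NOT_SUPPORTED:
                // An existing transaction is suspended for the duration of the call.
                return TxOutcome.RUN_WITHOUT_TX;
            case NEVER:
                if (callerHasTx) {
                    throw new IllegalStateException("Never: caller has a transaction");
                }
                return TxOutcome.RUN_WITHOUT_TX;
            default:
                throw new AssertionError(this);
        }
    }
}
```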
Transaction efficiency is the reason for the Remote Façade pattern (Fowler, pp. 388-400)
If we access entity EJB methods directly from controllers (servlets), each method call creates a new transaction!
By grouping accesses to a set of entity EJBs in one session EJB, we create a new transaction for each session EJB method call only.
Each such method will make multiple calls to entity EJB methods, but no new transactions are created (except with the RequiresNew transaction attribute value)
The Remote Façade pattern and the container-managed transactions work together to make the implementation more efficient.
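The effect on the number of transactions can be imitated without an EJB container. In the sketch below, SimpleContainer is a hypothetical stand-in that applies Required semantics and counts the transactions it starts; CustomerEntity and CustomerFacade are likewise invented for illustration.

```java
// Hypothetical stand-in for a container applying the Required attribute.
final class SimpleContainer {
    private boolean txActive = false;
    int transactionsStarted = 0;

    /** Joins the active transaction, or starts (and later ends) a new one. */
    void required(Runnable body) {
        if (txActive) {
            body.run();
            return;
        }
        txActive = true;
        transactionsStarted++;
        try {
            body.run();
        } finally {
            txActive = false;
        }
    }
}

// Plays the role of an entity bean: every method is transactional.
final class CustomerEntity {
    private final SimpleContainer container;
    String name = "";
    String address = "";

    CustomerEntity(SimpleContainer container) { this.container = container; }

    void setName(String newName) {
        container.required(() -> { name = newName; });
    }

    void setAddress(String newAddress) {
        container.required(() -> { address = newAddress; });
    }
}

// Plays the role of the session bean acting as the facade.
final class CustomerFacade {
    private final SimpleContainer container;

    CustomerFacade(SimpleContainer container) { this.container = container; }

    /** One coarse-grain call; the entity calls inside join its transaction. */
    void updateContact(CustomerEntity customer, String name, String address) {
        container.required(() -> {
            customer.setName(name);
            customer.setAddress(address);
        });
    }
}
```

Calling setName and setAddress directly from a controller starts two transactions in this model, while one updateContact call starts a single transaction.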
Summary of EJBs
MDBs play a role similar to servlets, but without the Web server support. Instead they use the EJB container’s services.
Entity EJBs implement the data in the domain model, i.e. the state of the entire application, during and between user sessions. They need persistence and concurrency.
Session EJBs implement the business rules and/or remote façade for the domain model and are called by servlets or MDBs. For the remote façade role, they provide coarse-grain access to fine grain domain objects.
Simplifications in EJB 3.0
Implementing EJB with three classes is cumbersome for developers.
In EJB 3.0, only one class is used. Hence EJBs in 3.0 are often called POJOs!
But these objects are not so plain!
Access to them benefits from container-managed transactions and security. Any additional classes needed are generated automatically by the EJB compiler.
Java 5 permits annotations. These are used in EJB 3.0 to move into the code what would otherwise go into the cumbersome XML descriptors.
Data Mapping Layer
Purpose: separates domain from data sources.
Like the MVC pattern, Data Mapping is a “super”-pattern based on the need to make the business object representation in the domain independent of the way the state of the business objects is stored.
In essence, this comes down to Object-Relational Mapping (ORM)
Three reasons for this separation:
- data sources may (will!) be shared with other apps
- data sources are relational while the domain model is OO (richer!)
- DB programming is tricky; best left to specialists.
Persistence Solutions for Business Objects
All solutions are a form of Object-Relational Mapping (ORM). Options:
1. BusObj is POJO and handles persistence itself (Active Record pattern in Fowler) and it can live in the Web container (simple webapps).
2. BusObj is Entity Bean with Container-Managed Persistence (CMP), which uses vendor-specific proprietary ORMs (which may or may not follow Domain Store, p. 516 textbook).
3. BusObj is Entity Bean with Bean-Managed Persistence (BMP), with persistence handled by itself or through a Data Access Object (DAO, see Data Access Object, p. 462 textbook).
4. BusObj is POJO and handles its persistence itself or through a separate DAO.
5. BusObj is a BMP entity bean that works as a façade for several other business objects (see Composite Entity, p. 391 textbook).
6. A group of POJO business objects may be persisted together in a joint and usually complex framework using multiple DAOs following Domain Store, p. 516 textbook. This framework may be a JDO implementation.
Enterprise Applications: Persistence Strategies
[Figure: layered diagram with the layers domain (business) logic (rules), domain (business) objects, and persistence. Boxes: Session EJBs; CMP Entity EJBs (specific to app. server vendor); BMP Entity EJBs; Composite Entity BMP EJBs; POJO; Active Record POJO (Web container); Data Access Object; JDBC; JDO product?]
CMP: Container-Managed Persistence
BMP: Bean-Managed Persistence
Plain CMP
Easiest!
It is vendor-dependent, but all vendors of Java EE application servers implement this.
For example, WebSphere Application Server provides CMP that maps entity EJBs to a single relational table (one row per object) when the object attributes permit.
More complicated mappings are supported by the WebSphere Studio Application Developer (through wizards that allow the specification of various mappings).
But it may make scalability control very hard!
JPA
Java Persistence API (JPA) is also an open specification. In fact it was defined as part of EJB 3.0, recognizing that CMP and BMP in EJB 2.1 were too “heavyweight”. JPA helps you do persistence with POJOs. JPA 1.0 in 2006, JPA 2.0 coming out now.
JPA assumes an RDBMS backend.
Commercial implementations:
• IBM WebSphere, Oracle Application Server
Open Source implementations:
• Apache OpenJPA, Sun GlassFish Enterprise Server
• DataNucleus Access Platform (available on SourceForge)
• and see Hibernate
However: JPA 1.0 is a subset of JDO 2.0!! (going crazy yet?)
Hibernate
Started out as the Data Mapping layer of the open source Java EE implementation JBoss.
Later it was made compatible with JPA. It is much more popular than GlassFish or Apache OpenJPA.
Uses Java 5 annotations, obviating the need for XML configuration files
Implements most ORM design patterns, see next.
CDM: Phylogeny Inference Data
Analyzed data: trees, matrices, operational taxonomic units (OTUs), standard taxa
[Figure: class diagram with Matrix, OTU, Tree, StdTaxon; labels List, Set, taxon, provenance, authority; StdMatrix and SeqMatrix attached by isA]
More ORM Issues
Handling object identity:
Don’t want to create two copies of the same business object!
See Fowler’s Identity Map pattern.
Idea: have the mapper object keep a dictionary of objects that have been already loaded; sometimes called a Registry.
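A minimal sketch of the dictionary idea, not Fowler's full pattern. Customer, CustomerMapper, and loadFromDatabase are hypothetical names; the load method fabricates an object where a real mapper would run a SELECT.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical business object.
final class Customer {
    final long id;
    final String name;
    Customer(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

final class CustomerMapper {
    // The identity map: at most one in-memory object per database identity.
    private final Map<Long, Customer> loaded = new HashMap<Long, Customer>();
    int databaseReads = 0;

    Customer find(long id) {
        Customer cached = loaded.get(id);
        if (cached != null) {
            return cached;               // same object as handed out before
        }
        Customer fresh = loadFromDatabase(id);
        loaded.put(id, fresh);
        return fresh;
    }

    // Stand-in for a real SELECT; it only counts how often it is called.
    private Customer loadFromDatabase(long id) {
        databaseReads++;
        return new Customer(id, "customer-" + id);
    }
}
```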
Mapping non-relational features:
Using UML for the Domain Model, we get features that are not in the relational model: associations, inheritance.
Luckily, the DB people have already worried about mapping from Entity-Relationship Diagrams to relational schemas. Mapping from UML is similar. See Fowler’s patterns Single Table Inheritance, Concrete Table Inheritance and Inheritance Mappers.
Domain Model for Anchor Machinery Case Study:
Example of object-relational mapping for some Anchor Machinery objects
ServiceTransaction
- servTranId : DBid
- saleId : String
- complaintNarrative : String
- resolutionCode : String
- resolutionNarrative : String
- date : Date
- startTime : Time
- closeTime : Time
- stillActive : Boolean
- type : Enum{S,CC,R}
RepairEvent
- repEvId : DBid
- eventDate : Date
- eventCode : String
- eventNarrative : String
RepTranRepEv
- repTranId : DBid
- repEvId : DBid
- position : int
Mapping notes: Single Table Inheritance; ordered one-many relationship
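A sketch of how a mapper might read these tables back into objects. Two things are assumptions, not stated in the slide: that type value "R" marks a repair transaction, and that the position column of RepTranRepEv gives the order of repair events. All class names other than the three table names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// One row of the single ServiceTransaction table; `type` is the discriminator.
class ServiceTransaction {
    final long servTranId;
    final String type;
    ServiceTransaction(long servTranId, String type) {
        this.servTranId = servTranId;
        this.type = type;
    }
}

// ASSUMPTION: rows with type "R" are repair transactions.
final class RepairTransaction extends ServiceTransaction {
    final List<Long> repairEventIds = new ArrayList<Long>();   // in position order
    RepairTransaction(long servTranId) { super(servTranId, "R"); }
}

// One row of the RepTranRepEv link table.
final class RepTranRepEvRow {
    final long repTranId;
    final long repEvId;
    final int position;
    RepTranRepEvRow(long repTranId, long repEvId, int position) {
        this.repTranId = repTranId;
        this.repEvId = repEvId;
        this.position = position;
    }
}

final class ServiceTransactionMapper {
    private ServiceTransactionMapper() {}

    /** Picks the subclass from the discriminator, then orders the linked events. */
    static ServiceTransaction fromRow(long servTranId, String type, List<RepTranRepEvRow> links) {
        if (!"R".equals(type)) {
            return new ServiceTransaction(servTranId, type);
        }
        RepairTransaction repair = new RepairTransaction(servTranId);
        List<RepTranRepEvRow> mine = new ArrayList<RepTranRepEvRow>();
        for (RepTranRepEvRow link : links) {
            if (link.repTranId == servTranId) {
                mine.add(link);
            }
        }
        mine.sort(Comparator.comparingInt((RepTranRepEvRow r) -> r.position));
        for (RepTranRepEvRow link : mine) {
            repair.repairEventIds.add(link.repEvId);
        }
        return repair;
    }
}
```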
EE good practices, summary
1. When using Session Façade, provide both a remote and local interface. Sometimes the servlet runs on the same JVM and local interfaces are more efficient (EJB 2.0)
2. Entity EJBs are meant for persistence. Put the domain logic in session EJBs.
3. Do not separate session beans and entity beans on different hosts. (Therefore, your entity beans need only local interfaces.)
4. Do not use entity beans for read-only information. It adds unnecessary EJB container overhead.
More Good Practices (re. coarse-grain, fine-grain)
Keeping this difference in mind has an essential effect on performance and scalability!
Worry about transactional granularity:
• do not expose entity bean methods directly to controller code; instead, wrap them with a session bean!
• group multiple related user stories in the same session bean!
Worry about persistence granularity:
• persist multiple related business objects by using the same entity bean with multiple dependent objects! (see Composite Entity p. 391 in textbook)
• make one entity bean correspond to multiple related records in the relational database! (with CMP this may not always work as needed)
Data Integration Concept: K2
A system for the integration of heterogeneous data with applications in bioinformatics.
User Requirements
Capabilities: response time measured in seconds
Constraints: data in relational DBMS, OODBMS, structured files, “boutique” databases, Web; high-level query language
Software Requirements
Logical Model
[Figure: 1. K2 alone: query in, answer out. 2. K2 + data sources: query in, answer (data) out; K2 sends subord. queries to source 1 … source n and receives their answers.]
Software Requirements
Logical Model
[Figure: 3. Inside K2: Translator, Optimizer, Decomposer, Integrator, Driver 1 … Driver n. Flow labels: query, internal query, subord. query, answer, internal data.]
Data Integration/Sharing/Exchange
Approach 1: Standards and Data Warehouses
Build a grand unified data model and clearinghouse for all data
• Worth aspiring to for certain data – standards help immensely!
• Probably the predominant approach in bioinformatics
Standards help but don’t fully solve problems of data exchange and integration
• Needs change, science changes – need new versions of the standard (hence no one standard)!
• Standards evolve slowly – not well-suited for cutting-edge science
• Some have different needs for their data – want a different schema
• Even if we have a standard schema, data may be of different levels of quality – how do we agree on the “standard” version of the data?
Approach II: Exchange Among Cooperating Sites
Everyone keeps their database, and uses point-to-point translators between them
• e.g., based on Web services, FTP, export files, etc.
Much more flexible – each site can control its own schema and do its own data curation
But also poses new challenges:
• Requires significant expertise at each site, to write the translators (often in both directions)
• Translators seldom work on incremental changes: “refreshes” are a major task
• Conflicting data; violated constraints; unexpected dependencies
• No tracking of changes or where data originated (provenance)
• Small changes to any schema can cause many things to break
ORCHESTRA: Data Sharing among Collaborators
Todd Green, Greg Karvounarakis, Nick Taylor, Zachary Ives, Val Tannen
Frequent need to share structured data in a collaborative fashion
• Goal: import and modify (curate) each other’s data
• Not all data is reliable and not everyone agrees on what’s right!
ORCHESTRA enables peer-to-peer data sharing that:
• Preserves local control of data, points of view
  • Sites can update data
  • Update exchange: propagation to other peers
  • Local control of what is imported
  • Who do we trust?
  • Whom do we agree to give data to?
  • Used locally to resolve conflicts
• Tracks provenance – where data came from
[Figure: Peer Alice, Peer Bob, and Peer Carol, each with a local DB receiving queries and edits, exchanging updates ∆A+/−, ∆B+/−, ∆C+/−]
Update exchange directed by schema mappings
[Figure: peers Bob, Alice, Carol, and Dan; label: Queries, edits]
Alice does not trust Dan’s data
Bob does not trust Carol’s data
Carol won’t share data with Bob
Dan won’t share data with Carol
Schema mappings
[Figure: peers Bob, Alice, Carol, and Dan]
Expressed in query language formalisms
They describe
• Alternative uses of data (union)
• Joint uses of data (join)
What Is a Schema Mapping?
A constraint specifying that data must exist in one set of tables, given data in another set of tables, e.g.:
RefSeq:Species (taxId, taxName)
Taxo:Names (taxId, taxName, 'scientific name')
One can think of it as resembling a query
Just as SQL queries offer benefits over Java code to maintain stored data, mappings offer benefits to data translation
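To see why a mapping resembles a query, the example can be run as a filter-and-project over rows. The direction is an assumption here: the sketch reads the mapping as "every Taxo:Names row whose third column is 'scientific name' requires a matching RefSeq:Species row". The Java class names and the field name nameClass are also invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Row of Taxo:Names; the field name `nameClass` is an assumption.
final class NamesRow {
    final int taxId;
    final String taxName;
    final String nameClass;
    NamesRow(int taxId, String taxName, String nameClass) {
        this.taxId = taxId;
        this.taxName = taxName;
        this.nameClass = nameClass;
    }
}

// Row of RefSeq:Species.
final class SpeciesRow {
    final int taxId;
    final String taxName;
    SpeciesRow(int taxId, String taxName) {
        this.taxId = taxId;
        this.taxName = taxName;
    }
}

final class SpeciesMapping {
    private SpeciesMapping() {}

    /**
     * ASSUMED direction: derives the Species rows that must exist,
     * given the Names rows (select on the constant, project two columns).
     */
    static List<SpeciesRow> requiredSpecies(List<NamesRow> names) {
        List<SpeciesRow> required = new ArrayList<SpeciesRow>();
        for (NamesRow n : names) {
            if ("scientific name".equals(n.nameClass)) {
                required.add(new SpeciesRow(n.taxId, n.taxName));
            }
        }
        return required;
    }
}
```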
What Can Schema Mappings Do?
Mappings allow us to:
• Restructure data – combine or split tables, flatten or add hierarchy
• Perform lookups, cross-references, or link traversals
• Translate synonyms or IDs
• Create special markers for unknown data (analogous to SQL’s NULL, but more powerful)
Mappings can be:
• Composed (we don’t need mappings between every pair of sites for them to share data!)
• Inverted (if we want bidirectional data sharing)
• Tested for certain kinds of correctness
• Automatically expanded to add provenance etc.
[Figure: peers Bob, Alice, Carol, and Dan; label: Queries, edits]
Provenance is data annotation
So is trust information
And so is access control information
DATA
Annotation: Comes from Alice or Carol; Don’t share with Bob!
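The annotation idea can be pictured as extra fields carried alongside each data item. This is only an illustration, not how ORCHESTRA represents annotations: AnnotatedTuple and its methods are hypothetical, and reading "comes from Alice or Carol" as "acceptable if at least one source is trusted" is an assumption.

```java
import java.util.Set;

// Hypothetical data item carrying provenance and access-control annotations.
final class AnnotatedTuple {
    final String data;
    final Set<String> comesFrom;        // provenance: alternative sources
    final Set<String> doNotShareWith;   // access control

    AnnotatedTuple(String data, Set<String> comesFrom, Set<String> doNotShareWith) {
        this.data = data;
        this.comesFrom = comesFrom;
        this.doNotShareWith = doNotShareWith;
    }

    /** Access control: may this tuple be propagated to the given peer? */
    boolean mayShareWith(String peer) {
        return !doNotShareWith.contains(peer);
    }

    /**
     * Trust, under the ASSUMPTION that one trusted source is enough:
     * the tuple is acceptable unless every source is distrusted.
     */
    boolean acceptableTo(Set<String> distrustedPeers) {
        for (String source : comesFrom) {
            if (!distrustedPeers.contains(source)) {
                return true;
            }
        }
        return false;
    }
}
```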