SQL Unit 21: Object-Oriented Modeling and Design with UML Michael Blaha and James Rumbaugh
1
SQL Unit 21: Object-Oriented Modeling and Design with UML
Michael Blaha and James Rumbaugh
Summary of Selections from Chapter 19 prepared by Kirk Scott
2
Chapter 19 Databases
3
• Chapter 19 in the book is specifically on the topic of developing a database to match an object-oriented design
• Not surprisingly, the example pursued in chapter 19 is based on the example of chapter 12
19.1 Introduction
• There is no need to go over this
• It is a review of db concepts
4
19.2 Abbreviated ATM Model
• The book presents an abbreviated ATM model for chapter 19
• Some things are taken out so that the amount of stuff isn’t overwhelming
• A couple of things are added so that specific db related things can be addressed which weren’t present in the original model
• The abbreviated model is shown on the following overhead
5
6
19.3 Implementing Structure--Basic
• What happens next is a discussion of how elements of the OO model are converted into database constructs, like tables
• The book notes that there are software tools out there that will do this for you automatically
• It’s worthwhile knowing how to do it by hand in case you have to and so you know what it is the tools generate
7
• 1. Implement classes = create tables
• 2. Implement associations = create tables that are related by pk, fk pairs
• 3. Implement generalizations, namely classes in superclass-subclass relationships = again, create tables that are related by pk, fk pairs
• 4. Implement identity = make sure you have suitable pk identifiers in tables
8
19.3.1 Classes
• Each class in an O-O design will generally map to a table in a relational design
• Each attribute in a class will map to a column in the table
• Strictly speaking, classes typically don’t contain primary and foreign key attributes
• The details of this will emerge gradually
• Since you already know databases, how it’s done will be no surprise
9
• Constructors and methods in a class have nothing to do with the relational table the class is mapped to
• An example is shown on the following overhead
• This pattern will be repeated for the examples that follow:
• A UML class diagram will be shown, followed by a table schema, followed by the SQL statement for creating the table
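The class-to-table mapping can be sketched in a few lines. This is only an illustration, not the book's example: the Customer class, its attributes, and the customerID column are hypothetical names, and Python's sqlite3 module with an in-memory database is assumed as a convenient stand-in for a full RDBMS.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:                      # hypothetical class, not the book's exact example
    name: str
    temp_amount: float

    def greeting(self) -> str:       # methods have no counterpart in the table
        return f"Hello, {self.name}"


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customer (
        customerID INTEGER PRIMARY KEY,  -- added for the table; not a class attribute
        name       TEXT NOT NULL,        -- one column per attribute
        tempAmount REAL
    )
""")

c = Customer("Ann", 20.0)
conn.execute("INSERT INTO Customer (name, tempAmount) VALUES (?, ?)",
             (c.name, c.temp_amount))
row = conn.execute("SELECT customerID, name, tempAmount FROM Customer").fetchone()
print(row)
```

Each attribute becomes a column, the primary key is added on the table side only, and the method is simply left behind in the program.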
10
11
19.3.2 Associations
• Remember that the term associations in UML refers to relationships between classes
• If classes map to tables, then associations will generally map to the inclusion of pk and fk fields
• There is more to it than that
• For example, when mapping a many-to-many relationship in an O-O system, it will be necessary to create the table in the middle
12
• The book lists the different types of O-O associations, with differing multiplicity (cardinality)
• It gives a verbal summary of how they are mapped.
• For some it gives a complete example
• For others, it’s limited to a verbal description
13
1. Many-to-Many Associations
• Make a table for each of the base classes
• Remember to give those tables primary keys
• Make a table in the middle
• Embed the primary keys of the base tables in the table in the middle and make its primary key the concatenation of the embedded foreign keys
• Remember to include any attributes of the association as fields in the table in the middle
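The many-to-many steps above can be sketched as follows. The table names, the role attribute, and the sample rows are assumptions made up for illustration; SQLite through Python's sqlite3 module is assumed as the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces fks only when asked
conn.executescript("""
CREATE TABLE Customer (customerID INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Account  (accountID  INTEGER PRIMARY KEY, balance REAL NOT NULL);

-- The table in the middle: embedded fks, concatenated into the pk
CREATE TABLE Customer_Account (
    customerID INTEGER NOT NULL REFERENCES Customer(customerID),
    accountID  INTEGER NOT NULL REFERENCES Account(accountID),
    role       TEXT,                        -- attribute of the association (assumed)
    PRIMARY KEY (customerID, accountID)
);

INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Account  VALUES (10, 100.0), (11, 250.0);
INSERT INTO Customer_Account VALUES
    (1, 10, 'owner'), (2, 10, 'co-signer'), (1, 11, 'owner');
""")

accounts_per_customer = conn.execute("""
    SELECT c.name, COUNT(*)
    FROM Customer c JOIN Customer_Account ca ON ca.customerID = c.customerID
    GROUP BY c.name ORDER BY c.name
""").fetchall()

# The concatenated pk rejects a second row for the same (customer, account) pair
try:
    conn.execute("INSERT INTO Customer_Account VALUES (1, 10, 'owner')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```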
14
15
2. One-to-Many Associations
• Make a table for each of the base classes
• Remember to give those tables primary keys
• Embed the primary key of the one table as a foreign key in the many table
• Sometimes in the O-O design an association arc may have a name on it
• If so, that would be a good choice for the name of the foreign key field
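A minimal sketch of the one-to-many case, assuming a hypothetical association named "owner" between Customer and Account, so the foreign key field takes that name. The tables and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Customer (customerID INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- The many table carries the pk of the one table as a fk
CREATE TABLE Account (
    accountID INTEGER PRIMARY KEY,
    balance   REAL NOT NULL,
    owner     INTEGER NOT NULL REFERENCES Customer(customerID)  -- named after the association
);

INSERT INTO Customer VALUES (1, 'Ann');
INSERT INTO Account  VALUES (10, 100.0, 1), (11, 250.0, 1);
""")

owned = conn.execute(
    "SELECT accountID FROM Account WHERE owner = ? ORDER BY accountID", (1,)
).fetchall()

# An account pointing at a nonexistent customer violates referential integrity
try:
    conn.execute("INSERT INTO Account VALUES (12, 0.0, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```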
16
17
3. One-to-One Associations
• The book states that these rarely occur
• Recall that in the database discussion, the question was whether this should be one table or two, and which way to embed
• In this context, the assumption is that there are two classes in the O-O design and there will be two tables in the database design
• The question still remains of which way to embed
18
19
4. N-ary Associations
• The book states that these also rarely occur
• At an earlier point the term ternary association was used
• N-ary and ternary refer to tables in the middle of star-like designs
• In other words, tables in the middle where there are more than two base tables being connected
• They are treated like tables in the middle, with primary keys consisting of the concatenation of embedded foreign keys
• The book doesn’t provide a separate example of this
20
Association Classes
• This boils down to the idea that in an O-O design, there may already be in essence a table in the middle
• This book refers to association classes
• Design pattern terminology might refer to this as some kind of mediator class
• The bottom line is that if it exists in the O-O design, the easiest thing to do is turn it into a table in the relational design
• Again, the book doesn’t provide a separate example of this
21
Qualified Associations
• This topic will have an example
• It is worth paying attention to because it should make clear what qualified associations really are
• Remember that from the perspective of CS 202 and CS 204, the UML notation and the concept were things that hadn’t come up before
22
• The point is that we need to be aware of this because a qualified association will translate into a relational database in a certain way
• The qualified association under consideration is diagrammed on the following overhead
23
24
• When you consider the diagram, you notice that a 1-1 relationship is shown
• In fact, the relationship between banks and accounts is 1-m
• What the diagram is telling you is that within the context of the Bank class, given an accountCode, you can identify exactly 0 or 1 accounts that match that combination
25
• You could say that there is a 1-1 relationship between (bank + accountCode) and account
• But when you translate into the relational model, you get the two base tables in a 1-m relationship
• The primary key of the one table, bank, is embedded as a foreign key in the many table, account
26
• The qualifier, accountCode, appears in the O-O model in the context of bank
• However, the accountCode is a descriptor of an account
• The accountCode becomes a field in the account table in the relational model
27
• You may recall the term “candidate key”
• This referred to a field or set of fields in a table that could have served as a primary key, but wasn’t chosen as the primary key
• The concatenation of the bankID and the accountCode is a candidate key in the resulting Account table
• The book indicates this with the abbreviation ck
• The illustration follows
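The qualified association can be sketched like this. The column names follow the slides (bankID, accountCode), but the accountID primary key, the sample rows, and the use of a UNIQUE constraint to declare the candidate key are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Bank (bankID INTEGER PRIMARY KEY, name TEXT NOT NULL);

CREATE TABLE Account (
    accountID   INTEGER PRIMARY KEY,
    bankID      INTEGER NOT NULL REFERENCES Bank(bankID),  -- 1-m: bank pk embedded as fk
    accountCode TEXT NOT NULL,                             -- the qualifier lands here
    UNIQUE (bankID, accountCode)                           -- candidate key (ck)
);

INSERT INTO Bank    VALUES (1, 'First'), (2, 'Second');
INSERT INTO Account VALUES (100, 1, 'A-1'), (101, 2, 'A-1');
""")

# Given a bank and an accountCode, at most one account matches
matches = conn.execute(
    "SELECT accountID FROM Account WHERE bankID = ? AND accountCode = ?",
    (1, "A-1"),
).fetchall()

# The same code may recur in another bank, but not within the same bank
try:
    conn.execute("INSERT INTO Account VALUES (102, 1, 'A-1')")
    duplicate_code_rejected = False
except sqlite3.IntegrityError:
    duplicate_code_rejected = True
```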
28
29
Aggregation, Composition
• Aggregation and composition are just special forms of association
• When turning them into relational models, the process is the same as for any other association
30
19.3.3 Generalizations
• The book points out that things work differently depending on whether you have single or multiple inheritance
• Since we’re working with Java, we don’t have multiple inheritance
• We have abstract classes and interfaces though
31
• The point is that there is no such thing as an instance of an abstract class or an interface
• If, in general, classes translate into tables, then instances translate into rows in tables
• A table that cannot contain a row is meaningless
• Therefore, trying to turn abstract classes or interfaces into tables is meaningless
32
• What we are concerned with is concrete classes in an inheritance hierarchy
• The basic rule of thumb still applies:
• Turn each class into a table
• What glues this all together is that a record in a subclass table will have the same primary key value as the corresponding record in the superclass table
33
• In object-oriented terms, an object inherits certain instance variables from its superclass
• In relational terms, there is a record in the superclass table and a matching record in the subclass table
• The “inherited values” are the fields that are maintained in the superclass table with the matching primary key value
34
• You may recall that in Watson’s presentation of this, animals and horses and sheep were used as illustrations.
• There was a class for each, and the horse and sheep classes were referred to as subtypes.
• You also saw something like this in the cardealership database
35
• Car and Carsale had the same primary key
• Car contained information common to all cars
• Carsale was a subtype
• It contained information about that category of cars that had sold
• The book’s example is shown on the following overhead
36
37
• There are a few things to observe about the example
• Unlike the cardealership example, where both tables had a vin field, the different kinds of accounts have different primary key names
• It’s not a bad idea to have differing, descriptive field names
• The important point is that they are all on the same domain, and there is a pk-fk relationship from the superclass to the subclass tables
38
• The book also points out that in this example there was a class, SavingsAccount, that had no instance variables
• It translated into a table that only contained a primary key field
• The book says that it’s still a good idea to keep this table
• If it’s in one design it should be in the other
• It’s possible that it will have fields added to it later
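The generalization mapping described above can be sketched as one table per class, with each subclass pk doubling as a fk to the superclass table. The column names, including protectOverdraft, and the sample values are assumptions and may not match the book's overhead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Account (accountID INTEGER PRIMARY KEY, balance REAL NOT NULL);

-- Subclass pks have their own names but share Account's key values
CREATE TABLE CheckingAccount (
    checkingAccountID INTEGER PRIMARY KEY REFERENCES Account(accountID),
    protectOverdraft  INTEGER NOT NULL
);

-- A subclass with no instance variables still gets a table: pk only
CREATE TABLE SavingsAccount (
    savingsAccountID INTEGER PRIMARY KEY REFERENCES Account(accountID)
);
""")


def add_checking(account_id, balance, protect):
    """Store one checking-account object as a superclass row plus a subclass row."""
    conn.execute("INSERT INTO Account VALUES (?, ?)", (account_id, balance))
    conn.execute("INSERT INTO CheckingAccount VALUES (?, ?)", (account_id, int(protect)))


add_checking(1, 500.0, True)
conn.execute("INSERT INTO Account VALUES (2, 75.0)")
conn.execute("INSERT INTO SavingsAccount VALUES (2)")

# "Inherited" balance comes from the superclass row with the matching key
checking = conn.execute("""
    SELECT a.accountID, a.balance, c.protectOverdraft
    FROM Account a JOIN CheckingAccount c ON c.checkingAccountID = a.accountID
""").fetchall()
savings = conn.execute("""
    SELECT a.accountID, a.balance
    FROM Account a JOIN SavingsAccount s ON s.savingsAccountID = a.accountID
""").fetchall()
```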
39
19.3.4 Identity
• In discussing this issue the book uses these two terms:
• Object identity: This means making up an arbitrary (typically numeric) field as the pk for a table
• Value based identity: This means using some combination (concatenation) of actual data fields as the pk for a table
40
• This issue has come up before in the discussion of database design
• When translating from O-O to relational, the design choice remains
• The book prefers object identity—which is consistent with what we’ve talked about before
41
• When you translate a base class into a base table, you give it an arbitrary pk field
• The pk fields of tables in the middle are then concatenated
• Recall that a table in the middle might have something like a date field that gets added to the pk
• There is no way around that
• In that case, a data value belongs in the key
• The book illustrates its preference in the following diagram
42
43
19.3.5 Summary of Basic Rules for RDBMS Implementation
44
19.4 Implementing Structure--Advanced
• This section will cover the following four topics:
• 1. Implementing foreign keys
• 2. Implementing check constraints
• 3. Implementing indexes
• 4. Considering views
45
19.4.1 Foreign Keys
• This section isn’t about creating the foreign keys
• It’s about referential integrity
• Depending on the translation from O-O to relational, there may be specific ways you want to handle ON DELETE, ON UPDATE, and so on
46
• Up to this point, this was the standard default given for how to handle this:
• ON DELETE RESTRICT
• ON UPDATE CASCADE
47
• Consider the case of generalizations
• This was the translation of superclass and subclass into table and subtype table
• The subtype table contains a fk that refers to the superclass table
48
• If the parent record is deleted, you would like the child record to be deleted
• This is accomplished by adding the following to the subtype table’s definition:
• ON DELETE CASCADE
• The book’s application of this rule to its example is shown on the following overhead by adding suitable constraints to some of the tables in the design
49
50
• The book points out that in reality, in this situation, if the child is deleted, it would also be desirable to delete the parent record
• Note that referential integrity does not support this
• This would become something that you had to implement separately
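Both directions can be sketched together. The parent-to-child direction uses ON DELETE CASCADE as described above. For the child-to-parent direction the slides only say it must be implemented separately. A trigger is one possible way to do that, and it is an assumption here, not the book's method. Table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Account (accountID INTEGER PRIMARY KEY, balance REAL NOT NULL);

CREATE TABLE CheckingAccount (
    checkingAccountID INTEGER PRIMARY KEY
        REFERENCES Account(accountID) ON DELETE CASCADE,  -- parent gone => child gone
    protectOverdraft  INTEGER NOT NULL
);

-- Assumed workaround: referential integrity cannot delete the parent
-- when the child goes, so a trigger does it instead
CREATE TRIGGER delete_parent_account AFTER DELETE ON CheckingAccount
BEGIN
    DELETE FROM Account WHERE accountID = OLD.checkingAccountID;
END;

INSERT INTO Account         VALUES (1, 500.0), (2, 75.0);
INSERT INTO CheckingAccount VALUES (1, 1), (2, 0);
""")

# Deleting the parent removes the matching subtype row via the cascade
conn.execute("DELETE FROM Account WHERE accountID = 1")
children_left = conn.execute("SELECT checkingAccountID FROM CheckingAccount").fetchall()

# Deleting the child removes the parent via the trigger
conn.execute("DELETE FROM CheckingAccount WHERE checkingAccountID = 2")
parents_left = conn.execute("SELECT accountID FROM Account").fetchall()
```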
51
• The book illustrates two more cases, based on association rather than generalization
• A customer has an address
• If you delete the customer, you would like to delete the address
• Alternatively, a customer has accounts
• You don’t want to be able to delete any customer that has accounts
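The two association cases can be sketched side by side. The tables and sample rows are hypothetical, and the syntax is SQLite's, which may differ from the system used in the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Customer (customerID INTEGER PRIMARY KEY, name TEXT NOT NULL);

CREATE TABLE Address (
    addressID  INTEGER PRIMARY KEY,
    street     TEXT NOT NULL,
    customerID INTEGER NOT NULL
        REFERENCES Customer(customerID) ON DELETE CASCADE   -- address goes with customer
);

CREATE TABLE Account (
    accountID  INTEGER PRIMARY KEY,
    customerID INTEGER NOT NULL
        REFERENCES Customer(customerID) ON DELETE RESTRICT  -- accounts block the delete
);

INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Address  VALUES (1, '1 Main St', 1), (2, '2 Oak Ave', 2);
INSERT INTO Account  VALUES (10, 2);
""")

# Ann has no accounts: the delete succeeds and her address is removed with her
conn.execute("DELETE FROM Customer WHERE customerID = 1")
addresses_left = conn.execute("SELECT addressID FROM Address").fetchall()

# Bob has an account: the delete is refused
try:
    conn.execute("DELETE FROM Customer WHERE customerID = 2")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True
```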
52
• The book’s illustration of adding these constraints to the tables is shown on the following overhead
• Note that they are using a system with different syntax
• The default is apparently “ON DELETE RESTRICT”
53
54
19.4.2 Check Constraints
• SQL has another kind of constraint which wasn’t covered in the first half of the course
• It is a way of enforcing data integrity
• If you looked through the GUI for table creation in MS Access, you would find similar capabilities
• The idea is that you can specify the set of values valid for a given field
55
• The book illustrates how this can be useful when translating generalizations (inheritance)
• In their translation, the superclass table has a field where the type of the matching subclass record is indicated
• Types could only be those of the given subclasses
• Adding such a constraint is illustrated on the following overhead
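A minimal sketch of a check constraint on a type field in the superclass table. The column name accountType and the allowed values are assumptions chosen to match the account subclasses discussed earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Account (
        accountID   INTEGER PRIMARY KEY,
        balance     REAL NOT NULL,
        accountType TEXT NOT NULL
            CHECK (accountType IN ('CheckingAccount', 'SavingsAccount'))
    )
""")

conn.execute("INSERT INTO Account VALUES (1, 500.0, 'CheckingAccount')")

# A type that is not one of the subclasses is refused by the dbms itself
try:
    conn.execute("INSERT INTO Account VALUES (2, 75.0, 'LoanAccount')")
    bad_type_rejected = False
except sqlite3.IntegrityError:
    bad_type_rejected = True

stored_types = [r[0] for r in conn.execute("SELECT accountType FROM Account")]
```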
56
57
19.4.3 Indexes
• This is a very short section with nothing new in it
• If you’re doing the translation, it’s up to you to create the indexes
• If you’re relying on software, it’s still up to you to make sure that you’ve got all the indexes you need
58
• The book repeats that you get pk indexes by default on all tables
• It also reiterates that at the very least you will want indexes on all fk fields
• Others may also be desirable
• The book’s illustration is given on the following overhead
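Creating an index on a fk field can be sketched as follows. The table and index names are hypothetical. One SQLite-specific detail: an INTEGER PRIMARY KEY column is the table's row key itself, so no separate pk index shows up in the listing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bank (bankID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Account (
    accountID INTEGER PRIMARY KEY,
    bankID    INTEGER NOT NULL REFERENCES Bank(bankID)
);

-- The fk index has to be created explicitly
CREATE INDEX idx_account_bankID ON Account(bankID);
""")

# PRAGMA index_list rows are (seq, name, unique, origin, partial)
index_names = [row[1] for row in conn.execute("PRAGMA index_list('Account')")]
```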
59
60
19.4.4 Views
• You can define a view for each subclass in a hierarchy
• The idea is that by doing a join query between the superclass and subclass tables, you can bring together both local and inherited instance variables
• The book’s illustration is shown on the following overhead
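A view for one subclass can be sketched like this. The table, column, and view names are assumptions for illustration, not the book's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account (accountID INTEGER PRIMARY KEY, balance REAL NOT NULL);
CREATE TABLE CheckingAccount (
    checkingAccountID INTEGER PRIMARY KEY REFERENCES Account(accountID),
    protectOverdraft  INTEGER NOT NULL
);

-- The view joins superclass and subclass so inherited and local fields appear together
CREATE VIEW CheckingAccountView AS
    SELECT c.checkingAccountID, a.balance, c.protectOverdraft
    FROM CheckingAccount c
    JOIN Account a ON a.accountID = c.checkingAccountID;

INSERT INTO Account         VALUES (1, 500.0), (2, 75.0);
INSERT INTO CheckingAccount VALUES (1, 1);
""")

view_rows = conn.execute("SELECT * FROM CheckingAccountView").fetchall()
```

Account 2 has no CheckingAccount row, so it does not appear in the view.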
61
62
19.4.5 Summary of Advanced Rules for RDBMS Implementation
63
19.5 Implementing Structure for the ATM Example
• The book gives table schemas and complete SQL for creating the database corresponding to the O-O ATM design
• The UML for the O-O design is repeated on the following overhead
• The table schemas are given on the overhead following the next one
• The SQL is also given for the sake of completeness, but I’m not going to read through it
64
65
66
67
68
69
19.6 Implementing Functionality
• Databases are all about structuring and storing data
• Software is about functionality
• There are general areas that can be identified where there are questions about matching up a software system with a database
70
• 1. Coupling a programming language to a database
• 2. Converting data
• 3. Encapsulation vs. query optimization
• 4. Use of SQL code
71
19.6.1 Coupling a Programming Language to a Database
• SQL is declarative
• Programming languages are procedural
• This means that there has to be some sort of crossover technique for merging the two
• The book identifies 8 possible ways of going about this
72
1. Preprocessor and Postprocessor
• The idea is to work with temporary tables/files
• For example, write a query that generates results.
• Save them
• Write a program that processes the result file
• Conversely, write a program that generates file output
• Then use database tools to apply that to the db
• This is clumsy and limited, although possibly useful in some settings
73
2. Script Files
• A database management system may support saving sequences of SQL commands in a single executable file
• This isn’t really programming, but it is an expansion of one at a time SQL commands
• This is a simple approach which may sometimes be sufficient
74
3. Embedded DBMS Commands
• In other words, embedded SQL
• Programs with embedded SQL are not necessarily easy to write or maintain
• The classic illustration of the mismatch of paradigms is having to loop in order to acquire query results
• You will be familiar with this from your project
• Embedded SQL is a common approach
• The book suggests that it’s not necessarily the best approach
75
4. Custom Application Programming Interface (API)
• In effect, this is built on top of embedded SQL, but it provides a better alternative
• Instead of embedding SQL directly in user programs, add classes/methods which have the embedded SQL in them and embody the needed functionality
• Then user programs can be built on those constructs
• ODBC and JDBC are examples of this
• In a given environment, a programmer might also develop reusable components like this
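A reusable component of this kind can be sketched as a small class that keeps the SQL inside its methods. The class name AccountStore and its methods are hypothetical. Python's sqlite3 module is assumed as the underlying database interface, in the role the slides give to ODBC and JDBC.

```python
import sqlite3


class AccountStore:
    """Hypothetical wrapper: callers use methods and never write SQL themselves."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute("""
            CREATE TABLE IF NOT EXISTS Account (
                accountID INTEGER PRIMARY KEY,
                balance   REAL NOT NULL
            )
        """)

    def open_account(self, balance=0.0):
        cur = self.conn.execute("INSERT INTO Account (balance) VALUES (?)", (balance,))
        return cur.lastrowid            # pk value assigned to the new row

    def deposit(self, account_id, amount):
        self.conn.execute(
            "UPDATE Account SET balance = balance + ? WHERE accountID = ?",
            (amount, account_id),
        )

    def balance(self, account_id):
        row = self.conn.execute(
            "SELECT balance FROM Account WHERE accountID = ?", (account_id,)
        ).fetchone()
        return None if row is None else row[0]


store = AccountStore(sqlite3.connect(":memory:"))
acct = store.open_account(100.0)
store.deposit(acct, 25.0)
print(store.balance(acct))
```

User programs then call open_account, deposit, and balance, and the embedded SQL stays in one place where it can be maintained.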
76
5. Stored Procedures
• This came up briefly in the Watson presentation on SQL
• Implementations of stored procedures can vary widely
• Roughly, the range goes from scripts to database management systems that effectively have some sort of programming language built in
• The developer can write and save dbms code on the dbms side rather than in an external program
77
6. Fourth-Generation Language (4GL)
• This is a term that refers to a GUI environment for program development
• MS Access, for example, has a visual environment for putting together reports and forms, where the data that populates them is ultimately retrieved by queries under the covers
• The book says this is good for simple applications and prototyping
• It doesn’t have the same power as a programming environment
78
7. Generic Layer
• This is a simplified interface to a database for a programming language
• It is apparently somewhat like a simplified interface for embedding commands
• Any simplification involves a trade-off
• It may be easier to use
• But it will limit access to functionality
79
8. Metadata-Driven System
• This is an advanced topic
• Applications may be structured to query the data dictionary (SYSTABLE, etc.) and then query the database
• The book gives as an example applications that learn
• In other words, data mining, etc. might be implemented using techniques like these
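The two-step pattern, data dictionary first and data second, can be sketched with SQLite, where the catalog table is sqlite_master rather than SYSTABLE. The tables and rows are made up for illustration. The program below never names a table or column in its own logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bank    (bankID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Account (accountID INTEGER PRIMARY KEY, balance REAL);
INSERT INTO Bank    VALUES (1, 'First');
INSERT INTO Account VALUES (10, 100.0), (11, 250.0);
""")

# Step 1: ask the data dictionary which tables exist
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# Step 2: use what was learned to query each table generically
summary = {}
for t in tables:
    # table_info rows are (cid, name, type, notnull, dflt_value, pk)
    columns = [r[1] for r in conn.execute(f'PRAGMA table_info("{t}")')]
    count = conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
    summary[t] = (columns, count)
```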
80
Data Interaction Techniques
81
19.6.2 Data Conversion
• Data conversion is a practical concern
• It has not been touched on before, but it is of interest whenever you are converting data from one form or system to another, regardless of whether an O-O model is involved
• This can involve transfer of data between current systems and transfer from an old system to a new one
82
• 1. Cleansing data = correcting data integrity problems
• 2. Handling missing data
• 3. Moving data = figuring out exactly how to export/import from one format to another
• 4. Merging data. Word to the wise: figure out a combined data model first; then take care of the technical details of how to combine data from different sources/formats
83
• 5. Changing data structure
• From the db design point of view, this is the most interesting point
• Different data sources may contain similar information
• However, field names and types may differ, and more importantly, designs may differ
84
• For example, one application may handle addresses using the LineItem model
• Another may have used a different model
• You need an overall model to convert both to, and then you have the problem of doing the conversion and merging
85
• The book suggests an approach based on what are called staging tables
• The idea is to convert raw source information into relational tables
• At that point you have the full power of SQL to manipulate the contents before arriving at the final, converted data set
• This is a very good idea
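The staging-table idea can be sketched as follows: raw values are loaded as-is into a staging table, and a single SQL statement cleans them on the way into the final table. The source rows, table names, and cleaning rules (trim, lowercase, treat an empty balance as 0) are all made-up assumptions.

```python
import sqlite3

# Made-up raw source data: stray spaces, mixed case, a missing value, a duplicate
raw_rows = [("  ann ", "100.50"), ("BOB", ""), ("ann", "100.50")]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StagingCustomer (rawName TEXT, rawBalance TEXT);
CREATE TABLE Customer (
    customerID INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    balance    REAL NOT NULL
);
""")

# Step 1: load the raw data into the staging table without interpretation
conn.executemany("INSERT INTO StagingCustomer VALUES (?, ?)", raw_rows)

# Step 2: cleanse, fill in missing data, and merge duplicates using plain SQL
conn.execute("""
    INSERT INTO Customer (name, balance)
    SELECT DISTINCT LOWER(TRIM(rawName)),
           CAST(COALESCE(NULLIF(rawBalance, ''), '0') AS REAL)
    FROM StagingCustomer
""")

customers = conn.execute("SELECT name, balance FROM Customer ORDER BY name").fetchall()
```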
86
19.6.3 Encapsulation vs. Query Optimization
• This topic is related to how you process your data
• The basic observation is this:
• In SQL, you can easily write a join query across many tables
• A single query is allowed to access any field of any table
87
• In a corresponding O-O implementation, the tables are classes which may have references to each other
• To process data belonging to three different tables might involve a call x.getY().getZ()
• Encapsulation says that x shouldn’t have direct access to z.
88
• The obvious problem is that calls of the form x.getY().getZ() are complex and only get worse if more tables are involved
• If you’ve had CS 304, you will recognize that such calls are not just complex
• They are bad in the sense that they will tend to violate the Law of Demeter
• In other words, you have to break encapsulation to accomplish your goals
89
19.6.4 Use of SQL Code
• There is a range of implementation choices
• Write a pure O-O front end
• This will preserve encapsulation in the code
• You will have the full power of a high level language to implement complex logic
• Considered from the point of view of querying and manipulating the db back end, performance will not be good
90
• Write a front end that essentially is a framework for executing SQL queries
• Code will not be highly O-O but performance will improve
• I am prejudiced in favor of the second option, but complex applications may require the first approach
• The book illustrates this with a query for a monthly statement of ATM transactions, as opposed to an O-O method for generating those results
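The contrast can be sketched with a monthly total computed both ways. This is not the book's statement query: the schema, the dates, and the amounts are made up, and the O-O side is reduced to two hypothetical classes.

```python
import sqlite3
from dataclasses import dataclass, field

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account (accountID INTEGER PRIMARY KEY);
CREATE TABLE AtmTransaction (
    transactionID INTEGER PRIMARY KEY,
    accountID     INTEGER NOT NULL REFERENCES Account(accountID),
    txnDate       TEXT NOT NULL,          -- ISO dates compare correctly as text
    amount        REAL NOT NULL
);
INSERT INTO Account VALUES (1), (2);
INSERT INTO AtmTransaction VALUES
    (1, 1, '2024-03-02', -40.0), (2, 1, '2024-03-15', 100.0),
    (3, 1, '2024-04-01', -20.0), (4, 2, '2024-03-09', -60.0);
""")

# Option 2: the front end hands the work to one SQL query
sql_total = conn.execute(
    "SELECT SUM(amount) FROM AtmTransaction "
    "WHERE accountID = ? AND txnDate >= ? AND txnDate < ?",
    (1, "2024-03-01", "2024-04-01"),
).fetchone()[0]


# Option 1: a pure O-O front end loads objects and computes in code
@dataclass
class Transaction:
    date: str
    amount: float


@dataclass
class Account:
    account_id: int
    transactions: list = field(default_factory=list)

    def monthly_total(self, start, end):
        return sum(t.amount for t in self.transactions if start <= t.date < end)


accounts = {}
for account_id, date, amount in conn.execute(
        "SELECT accountID, txnDate, amount FROM AtmTransaction"):
    accounts.setdefault(account_id, Account(account_id)).transactions.append(
        Transaction(date, amount))

oo_total = accounts[1].monthly_total("2024-03-01", "2024-04-01")
```

Both give the same total, but the O-O version pulls every transaction row out of the database before filtering, which is the performance cost the slides refer to.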
91
92
Object-Oriented Databases
• Object-Oriented databases can be implemented in many different ways
• Fundamentally they are based on these concepts:
• Objects are persistent (they are what is stored in the db)
• Has-a relationships are captured by references
• There is a tree-like relationship among types of objects (due to inheritance)
93
• The book identifies two basic reasons for opting for an O-O database
• 1. An O-O programmer doesn’t fully understand the relational model and wants a database back end that reflects a known programming paradigm
• This is not a sound reason
• Relational databases are the gold standard and it’s necessary to adapt to them
94
• 2. The O-O database is more suitable to the problem domain or offers special features which the relational model doesn’t offer
• If you recall, the parts, sub-parts, assembly example pushed the limits of the relational model
• In some engineering or manufacturing environments, especially, an O-O database might be useful
• This is a valid reason
95
19.8 Practical Tips
• This section of the book is just a compressed summary of the foregoing points
• One claim is worth examining:
• “Normal forms apply regardless of the development approach. However, it is unnecessary to check them if you build a sound OO model.”
96
• This is reminiscent of Watson’s claim that you don’t need the normal forms if you build a sound E-R model.
• It’s basically a tautology.
• It’s true that you don’t need to check the normal forms if by chance you have created a model that doesn’t violate them.
97
• However, at the very least, this seems to be a corollary truth:
• You will only build a sound model if you have internalized the normal forms, whether you learned them formally or not
• In any case, it is worthwhile to know the normal forms and to be able to apply them when checking a model for correctness
98
19.9 Chapter Summary
99
The End