CSE3180 Summer 2005 Lect 3 / 1 Lecture 3 This lecture will introduce More aspects of database design...
-
date post
19-Dec-2015 -
Category
Documents
-
view
216 -
download
2
Transcript of CSE3180 Summer 2005 Lect 3 / 1 Lecture 3 This lecture will introduce More aspects of database design...
CSE3180 Summer 2005 Lect 3 / 1
Lecture 3Lecture 3
This lecture will introduce
• More aspects of database design
• Modelling and transformations
• Normalisation
• Some examples of modelling
• And an excursion into the land of Non Structured Data
CSE3180 Summer 2005 Lect 3 / 2
Data Structure DiagramsData Structure Diagrams
Requirements Analysis
Conceptual Design
Implementation Design
Physical Design
CSE3180 Summer 2005 Lect 3 / 3
Conceptual DesignConceptual Design
- Create an enterprise (total) model
- Normalise user views
- Integrate user views
- Create a database
- Review
CSE3180 Summer 2005 Lect 3 / 4
Modelling User RequirementsModelling User Requirements
DATA
Entity Relationship Modelling (Semantic)
Normalisation (Minimality) - meaning the minimum number of attributes to define the Primary Key
PROCESS
Data Flow Analysis
Procedure Formation
Functional Analysis
CSE3180 Summer 2005 Lect 3 / 5
Well Structured DataWell Structured Data
Advantages
• Better integration of applications through shared access to common data.
· More adaptable systems because files and databases will follow the natural structures inherent in data rather than application oriented relationships.
· Processing should be simpler if data is held in its simplest least redundant form.
CSE3180 Summer 2005 Lect 3 / 6
Data Focussed DevelopmentData Focussed Development
1. Develop broad data model and broad function model.
2. Partition data model by function.
3. Develop detailed data model by each functional partition.
4. Develop the detailed procedure models by functional partition.
CSE3180 Summer 2005 Lect 3 / 7
Primary Key SelectionPrimary Key Selection
A data item or element that describes one entity type.
STUDENT RESULT
StudentNo Unit Code Year Semester Grade
S1 CSE4230 2003 1 N
S1 CSE4230 2003 2 HD
PRIMARY KEY or IDENTIFIER
· Some or all of the attributes describing an entity type serve to uniquely identify each entity instance.
CSE3180 Summer 2005 Lect 3 / 8
Value Sets or DomainsValue Sets or Domains
Each attribute type supports one entity type and is often based on an underlying domain or value set.
EMPLOYEE
EmpNo Emp_Gender Emp_Name Emp_Hire_Date
E1 Female Jones 20/6/1989
DEPENDENT
Empno Family_Name First_Name Gender Birthdate
E1 Jones Amelia Female 23/6/1986
CSE3180 Summer 2005 Lect 3 / 9
Synonyms or AliasesSynonyms or Aliases
· Be careful to differentiate between things and their names.
ITEM = PART = PRODUCT ???
· SYNONYM or ALIAS
The same object called different names by different people.
· This problem concerns entity types, relationship types and attribute types.
CSE3180 Summer 2005 Lect 3 / 10
HomonymsHomonyms
Different things called the same name by different people ??
EMPLOYEE START DATE
· In the Personnel area this may mean the date an employee started with the company.
· In a given department this may mean the date an employee started with the department.
· This problem also concerns entity types, relationship types and attribute types.
CSE3180 Summer 2005 Lect 3 / 11
Design and Data StructuresDesign and Data Structures
Data structures are the bricks and mortar that hold databases together.
Data structures (for the ANSI/SPARC standard) are defined in the internal model level and implemented in the physical data organisation.
Data structures are often hidden from the application programmer, since they are primarily used by the DBMS and Operating System.
Understanding data structures is important for performance reasons, to improve program design and allow easier communication with DBMS specialists.
CSE3180 Summer 2005 Lect 3 / 12
Goals of Relational DesignGoals of Relational Design
What Relations should exist and what Attributes shouldthey contain?
Avoid Redundancy if possible- minimise storage space
Avoid Anomalies
Avoid Nulls (but be careful)
Avoid Joins which produce spurious rows
CSE3180 Summer 2005 Lect 3 / 13
Some AssumptionsSome Assumptions
A group of attributes has a natural “inherent” structure
This structure is independent of the way the data is used
Normalisation (meaning : to be able to access complete, clear and correct data - to and from tables)
• introduced by E.F. Codd together with relational database theory
• originally Codd defined three normal forms
• later expanded to Boyce-Codd and fourth and fifth normal forms
CSE3180 Summer 2005 Lect 3 / 14
A Sample Data StructureA Sample Data Structure
PM_NAME BIRTH_YR YRS_SERVED DEATH_AGE STATE_BORN STATE_REP
PRIME _MINISTER
MARRIAGE
PM_NAME SPOUSE_NAME MAR_YR PM_AGE NR_CHILDREN
MINISTRY
MIN_NR PM_NAME PARTY DAY_COMM MTH_COMM YR_COMM
CSE3180 Summer 2005 Lect 3 / 15
NormalisationNormalisation
Formal measure of why one grouping of attributesmay be better than another
Each Normal Form requires that a Relation satisfiescriteria for that form and this eliminates a different kind of redundancy
Normalised Relations will remain consistent followingdatabase operations and will store each fact only once
Database operations applied to unnormalised relations may lead to anomalies
PM_NAME BIRTH_YR YRS_SERVED DEATH_AGE STATE_BORN STATE_REP
CSE3180 Summer 2005 Lect 3 / 16
Some AnomaliesSome Anomalies
Person-Id Project_budget Project Time Spent on Project
S75 32 P1 7
S75 40 P2 8
S79 32 P1 4
S79 27 P3 1
S80 40 P2 5
- 17 P4 -
Null Values are considered to be anomalies
Relation : ASSIGN
CSE3180 Summer 2005 Lect 3 / 17
AnomaliesAnomalies
Insertion Anomalyadd row ASSIGN values( ‘S85’,35,’P1’,9)- two conflicting budgets for P1 S75 32 P1 7
Deletion Anomaly S79 32 P1 4
delete row ASSIGN (‘S79’,27,’P3’,11)- removes project budget for P3 S79 27 P3 1
Person-Id Project_budget Project Time Spent on Project
S75 32 P1 7
S75 40 P2 8
S79 32 P1 4
S79 27 P3 1
S80 40 P2 5
- 17 P4 -
CSE3180 Summer 2005 Lect 3 / 18
Functional DependenciesFunctional Dependencies
- the values of one set of attributes affect the valuesof another attributes
X Y
The value of X determines the value of Y
The value of Y depends on the value of X
The simplest case is 1 attribute determines another singleattribute
AND the Primary Key value determines the values of the non key attributes in each row.
CSE3180 Summer 2005 Lect 3 / 19
Functional DependenciesFunctional Dependencies
Project Project Budget
Person-IdProject
Time Spent on Project
Functional Dependency Diagram
Project Project Budget
CSE3180 Summer 2005 Lect 3 / 20
Functional DependenciesFunctional Dependencies
• Two attributes are FUNCTIONALLY DEPENDENT if a value for ONE attribute specifies a SINGLE value for the other attribute
Grade Rate of Pay
Driver 265.00
Crane Driver 310.95
Keyboard Operator 215.55
Systems Analyst 450.00
The Rate of Pay is functionally dependent on Grade
or, Grade determines Rate of Pay
or, grade----> Rate of Pay. Grade is a DETERMINANT
CSE3180 Summer 2005 Lect 3 / 21
Functional DependenciesFunctional Dependencies
Some Terms: Primary Key Candidate Key
Foreign Key Concatenated Key
Non-Ambiguous Keys: An attribute (or attributes) of a table is the Primary Key for that table if
1. All attributes in the table are functionally dependent on the Primary Key
AND
2. Where the Primary Key is a collection of attributes (concatenated key), no subcollection of the attributes used also exhibits the functional dependence property
CSE3180 Summer 2005 Lect 3 / 22
Functional DependenciesFunctional Dependencies
Translation:
1. The primary key will identify a given row (unique)
2. No portion of the primary key can be an independent primary key
PM_NAME BIRTH_YR YRS_SERVED DEATH_AGE STATE_BORN STATE_REP
CSE3180 Summer 2005 Lect 3 / 23
Minimal Numbers of FD’sMinimal Numbers of FD’s
Employee Department
Department Location
Employee Location
Employee determines Department
Department determines Location
Employee determines Location This FD is redundant.It is known as a ‘Transitive Dependency’ and would be excluded from the design model. (E-R diagram)
Or, rationalise (minimise) the number of tables in the physical database
CSE3180 Summer 2005 Lect 3 / 24
NormalisationNormalisation
1st Normal Form
Repeating groups must not occur
Unit Student No. Result Name
CSE3180 9142717 C Wilson
9131618 D Renoir
9077184 P Gilbey
CSE2138 8967384 N Breton
8737980 P Balzac
CSE9020 9142717 P Wilson
6932475 HD Gilbey
1st Normal Form
CSE3180 Summer 2005 Lect 3 / 25
NormalisationNormalisation
Unit StudentNo Result Name
CSE3180 9142717 C Wilson
CSE3180 9131618 D Renoir
CSE3180 9077184 P Gilbey
CSE2138 8967384 N Breton
CSE2138 8737980 P Balzac
CSE9020 9142717 P Wilson
CSE9020 6932475 HD Gilbey
Corrected Table
Formally expressed as:Results(unit code, studentno,result,name
The stored data mustbe complete
CSE3180 Summer 2005 Lect 3 / 26
NormalisationNormalisation
2nd Normal Form
Designed to eliminate Update Anomalies (4 categories)
Examples
1. Update Where a Description alteration requires
total file scan and alterations made
2. Inconsistent Data Possibility of a code having more than
one description
3. Additions Condition where a row cannot be fully identified e.g. subject code only
4. Deletions If subject Code CSE9020 is deleted, location of associated data is not possible
CSE3180 Summer 2005 Lect 3 / 27
NormalisationNormalisation
3rd Normal Form
Final Stage ---> 3rd Normal Form
Watch for Determinants
Primary Keys are Determinants
Candidate Keys are Determinants
A table is in 3rd Normal Form (3NF) if
(i) It is in 2nd Normal Form and
(ii) Determinants are Candidate Keys
CSE3180 Summer 2005 Lect 3 / 28
NormalisationNormalisation
As a check, review
1. Update
2. Inconsistent Data
3. Additions
4. Deletions
for Conditions and Difficulties of access (and retrieval)
CSE3180 Summer 2005 Lect 3 / 29
Normalisation ReviewNormalisation Review
A relation is in 3NF if, and only if, the NONKEY ATTRIBUTES are
(1) Mutually Independent
(2) Fully Dependent on the Total Primary Key
Or: A relation is in 3NF if, and only if, for the life of the database, each row consists of a Primary Key Value which identifies some Entity instance, together with a set of Zero or more mutually independent attribute values which describe the Entity in some way. (Date, C.J)
Or: Each field must represent a fact about the key, the Whole Key and Nothing but the Key. (Kent, W)
CSE3180 Summer 2005 Lect 3 / 30
Normalisation - ExamplesNormalisation - Examples
1. 1NF or First Normal Form Rule: Each Row MUST CONTAIN the same number of columns
Example: Course Instructor table
class code lecturer tutor tutor tutor
C3576 Doe,J Jones,R Smith,V Ng,K K567 Nguyen,L B6745 Fabbri,M Ong,W R3289 Pratt,W Archer,V Barrat,N
The number of columns (attributes) is not consistentCreate a table of TUTORS with Class Codes
CSE3180 Summer 2005 Lect 3 / 31
Normalisation - ExamplesNormalisation - Examples
2NF: 2 criteria (1) The table must be in 1NF
(2) every attribute which is NOT part of the Primary key must be functionally dependent on the Primary Key
Building Room Seats No. of Levels
A 214 85 4
A 242A 25 4
B 213 135 6
The number of levels is dependent on Building, not Building + Room.
Create another table - ‘building table’
CSE3180 Summer 2005 Lect 3 / 32
SolutionSolution
1 table : Accommodation(building, room, seats)
Another table : Building(building, levels)
CSE3180 Summer 2005 Lect 3 / 33
Normalisation - ExamplesNormalisation - Examples
3NF : A table is in 3rd Normal Form
if (1) the table is in 2NF
and (2) every attribute which is NOT part of the Primary Key is functionally dependent ONLY on the key - meaning it is not dependent on any other non-key attribute
CSE3180 Summer 2005 Lect 3 / 34
Normalisation - ExamplesNormalisation - Examples
This table is NOT in 3NF:
ClassCodeLecturer Lecturer’s Office Phone
C3567 Doe,J 101 Bldg A 32412
K4567 Fabbri,M 312 Bldg B 31523
B7645 Nguyen,L 209 Bldg F 31518
R3289 Pratt,W 406 Bldg E 32581
ClassCode Lecturer Lecturer Lecturer’s Office Phone
C3567 Doe,J Doe,J 101 Bldg A 32412
K4567 Fabbri,M Fabbri,M 312 Bldg B 31523
B7645 Nguyen,L Nguyen,L 209 Bldg F 31518
R3289 Pratt,W Pratt,W 406 Bldg E 32581
CSE3180 Summer 2005 Lect 3 / 35
Normalisation Normalisation
Some Thoughts:• Databases should be constructed so that data occurs
only ONCE.• Redundant or repetitive data may produce
inconsistencies in the database during update
e.g. a person’s address is in 1 table and row only, and other rows in other tables have keys referring to it.
(known as referenced and referencing keys/tables)
• A quick design test - ‘If a row is deleted, will valuable data
be lost ?’
If YES, remove the valuable information and place it in another table
CSE3180 Summer 2005 Lect 3 / 36
NormalisationDiagrammatic Functional Dependencies
NormalisationDiagrammatic Functional Dependencies
suppliernumber
suppliername
status
city
partnumber
partname
color
mass
city
city assets
p.k
p.k
Notice that theseattributes are dependencies of theprimary key
This attributeIS NOT
CSE3180 Summer 2005 Lect 3 / 37
Other Normalisation FormsOther Normalisation Forms
Other forms of Normalisation: 4th Normal Form 4NF
A row should NOT contain two or more independent multivalued facts about an entity.
Facts are independent of each other when there is no direct connection between them. Generally occurs in a many to many relationship
Example: A book publishing company can have many authors producing many books
The books would (normally) be sold from many booksellers stores
CSE3180 Summer 2005 Lect 3 / 38
Other Normalisation FormsOther Normalisation Forms
5th Normal Form
A table is in 5th Normal Form when its data can no longer be decomposed into smaller tables without each row having the same Primary Key
CSE3180 Summer 2005 Lect 3 / 39
MultiValued Dependencies MVD’sMultiValued Dependencies MVD’s
Functional Dependencies cannot express all cases of associations between attributes
Consider PERSON and SKILL(s)
A PERSON may have one or many SKILLS
A SKILL may be owned by many PERSONS
PERSON --- > SKILL SKILL ----> PERSON
Neither of these expressions cover the above statements
CSE3180 Summer 2005 Lect 3 / 40
MultiValued Dependencies MVD’sMultiValued Dependencies MVD’s
An MVD defines that an ATTRIBUTE is always associated with a given set of values of another ATTRIBUTE
PERSON --->> SKILL
Interpretation:
a single value of a set of attributes determines a set of values of other attributes
AND this set of values is the same, independent of any other dependencies of the first named
attributes
PERSON ---->> skill_id, date_acquired
CSE3180 Summer 2005 Lect 3 / 41
Advanced Examples 4NF, 5NFAdvanced Examples 4NF, 5NF
• SALES_AREAS
Sales_Rep Customer Product_Class
Joe BH Store Cosmetics
Joe BH Store Haberdashery
Helen MaxiStoresCosmetics
Helen MaxiStoresClothing
Helen MaxiStoresShoes
Joe 8-Up Cosmetics
Joe 8-Up Haberdashery
Helen BH Store Cosmetics
Helen BH Store Clothing
Helen BH Store Shoes
CSE3180 Summer 2005 Lect 3 / 42
Advanced Examples 4NF, 5NFAdvanced Examples 4NF, 5NF
In this table, there are MVD’s
Sales_Rep --->> Customer
Sales_Rep --->> Product_Class
FOURTH NORMAL FORM:
A relation is in 4NF if, for any multi-values dependency
X --->> Y, there are no attributes other than those in X and Y
In the table , Sales_Rep --->> Customer, BUT
also --->> Product_Class
Solution: Decompose SALES_AREAS into:Sales_Rep(Sales_rep, Product_Class)
Areas(Sales_Rep, Customer)
CSE3180 Summer 2005 Lect 3 / 43
Advanced Examples 4NF, 5NFAdvanced Examples 4NF, 5NF
FIFTH NORMAL FORM:
Restrictions on Dependencies (also illustrates a Business Rule)
BH Store does not sell Haberdashery
MaxiStores do not sell Cosmetics
Thus the SALES_AREA cannot be decomposed as shown. It can be decomposed into
Sales_Rep(Sales_Rep, Product_Class)
Areas( Sales_Rep, Customer)
Sells(Customer,Product_Sold)
CSE3180 Summer 2005 Lect 3 / 44
Advanced Examples 4NF, 5NFAdvanced Examples 4NF, 5NF
IF not all Sales_Reps of a given Product_Class can sell that Product_Class to a store, then the resultant table, based on the original table, is
Sales_Rep Customer Product_Class
Joe BH Store Cosmetics
Helen MaxiStoresClothing
Helen MaxiStoresShoes
Joe 8-Up Cosmetics
Joe 8-Up Haberdashery
Helen BH Store Clothing
Helen BH Store Shoes
This relation cannot be decomposed and is in 5NF
CSE3180 Summer 2005 Lect 3 / 45
In Practice ---In Practice ---
1.Oracle insists that modelling to 3rd normal form is adequate for all commercial databases
2. Since the emergence of the 4th and 5th forms, database software (in the case of Oracle this is PL/SQL) is able to provide the processing necessary to accommodate special or extended control over data in database tables
3. Part of the development of a database is focussed on the presence and understanding of ‘Business Rules’
4. There will be some examples of Business Rules in a later lecture
5. You will need to develop Rules for your database as part of its development and control
CSE3180 Summer 2005 Lect 3 / 46
Stages in Data Modelling - 1Stages in Data Modelling - 1
1. Analyse User Requirements - User Views
2. Aggregate Functions - Highlight Synonyms
3. Prepare first cut of Entities
4. Determine Cardinality of Entities– Develop E-R Diagrams
5. Determine Relationships 1:1, 1:M, M:N
6. Introduce Attributes - Data Structures (use Natural Mapping)
7. Analyse Candidate Keys for Primary, Foreign and Access key functions
8. Determine success of relationships
9. Review (if necessary) the E-R Diagram
CSE3180 Summer 2005 Lect 3 / 47
Stages in Data Modelling - 2Stages in Data Modelling - 2
• Select and Test the Primary and Foreign Keys for each table
• Select Candidate keys for access
• Normalise
• Test Normalisation
CSE3180 Summer 2005 Lect 3 / 48
Stages in Data Modelling - 3Stages in Data Modelling - 3
• Review E-R Diagram and Data Structure Diagrams if necessary
• Examine Data Structures/E-R Diagrams against original views, requirements e.g. report contents
• Obtain confirmation and acceptance from User(s)
• Move to Physical Design Considerations
and, don’t forget the documentation !
CSE3180 Summer 2005 Lect 3 / 49
Data Modelling - Some GuidelinesData Modelling - Some Guidelines
• Do not prematurely combine entities into tables
• Concentrate on Access Mechanisms which can be shared among requests
• Deviate from the model in a responsible manner (i.e. get user acceptance, enthusiasm)
• Use table/view and attribute names which closely reflect the data model names (Natural Mapping)
• Do not define multiple attributes as one composite attribute in a table
CSE3180 Summer 2005 Lect 3 / 50
Database Design GuidelinesDatabase Design Guidelines
• Watch data types for each attribute, especially complex keys
• Develop an ‘architecture’ to support ‘Business Rules’ for procedures, integrity and access
• Look for developing technology trends
• Determine the reason/purpose of the proposed database
CSE3180 Summer 2005 Lect 3 / 51
Database Design GuidelinesDatabase Design Guidelines
• Determine the entities supported by the database - what are the ‘main’ collections of data --> entities
• Determine the attributes of each entity
• Determine the Entity relationships
• Determine the Entity-Relationship types 1:1, 1:M, M:N
CSE3180 Summer 2005 Lect 3 / 52
Database Design GuidelinesDatabase Design Guidelines
Carefully assess the Constraints :– Domain value sets– Data types– Calculations– Unique keys– Primary Keys– Nulls (or even better, Not Nulls)– ‘Special Processing’ requirements– In a distributed environment, are the constraints ‘local’
or ‘global’
These are examples of the existence of Business Rules
CSE3180 Summer 2005 Lect 3 / 53
Database Design GuidelinesDatabase Design Guidelines
• Determine how the relationships can be related - Primary Key/Foreign Key. Are additional attributes needed ?
• Re-assess your design - Logical, Can the objectives of (1) be achieved ?
If no, what addition/reduction of Entities or Attributes is necessary ?
• Modify your design as required, and re-evaluate
(it is an ITERATIVE Process)
CSE3180 Summer 2005 Lect 3 / 54
Some Common Design ProblemsSome Common Design Problems
• Not all attributes in a table relate to the same entity or subject. e.g. mixing passengers and aircraft details in the same table
Hint: Re-examine the Primary Key and determinant aspects
• Blank attributes in tables (no values)
Generally means they belong in another table
• Tables contains the same data-items
Can indicate that aggregation or consolidation might be necessary
CSE3180 Summer 2005 Lect 3 / 55
Some Common Design ProblemsSome Common Design Problems
• Continuous Design alterations
01. Normal during the logical design stage
Some suggestions:
Make prototypes, use sample data, determine weaknesses, oversights
Consult with the user. Get approval / acceptance
Modify, re-design
02. Unworkable at physical design stage
A clear indication of a rushed or poorly developed logical design. May be more appropriate to start again (and may also be more cost effective)
CSE3180 Summer 2005 Lect 3 / 56
And those Key Terms ?And those Key Terms ?
• A candidate key is an attribute or set of attributes whose values uniquely identify individual rows of the entity set
• One candidate key is selected to act as the unique identifier for instances of the entity. This candidate key is called the primary key. The remaining candidate keys are known as alternate keys
• A composite key is a candidate key which consists of more than one attribute
• A foreign key is a column, or set of columns, which links each referencing row in the table containing the foreign key to the row in the referenced table containing the matching candidate key value
CSE3180 Summer 2005 Lect 3 / 57
So far , we have dealt with ‘structured data’ -
All data in each table has the same form andformat
We are now going to look at some aspects ofNon Structured Data
CSE3180 Summer 2005 Lect 3 / 58
Data ManagementData Management
Now we will look at:-
the imbalance between Information sourced from structured data storage systems, mainly databases
and Information which is necessary for informed management decision making processes based on non structured data
CSE3180 Summer 2005 Lect 3 / 59
Non Structured DataNon Structured Data
Up to date, we have studied some of the methods of file organisation associated with efficient data management and retrieval as embodied in the relational data base model.
However, although industry has invested large amounts of money in relational database applications, the greater percentage of data access requirements are made of non-structured data sources (a classic one being the World Wide Web).
CSE3180 Summer 2005 Lect 3 / 60
Non-Structured InformationNon-Structured Information
• Much effort and expense has been directed at the design, capture, processing, storage and retrieval of data in a structured form.
• There are a number of database management systems (DBMS) which can store many trillions of bytes in centralised, distributed and client/server based systems
• There are extensive backup, recovery and restart procedures in place to ensure persistence and continuity of data
CSE3180 Summer 2005 Lect 3 / 61
Non-Structured InformationNon-Structured Information
The ratio of unstructured to structured information in many organisations is approximately 8 to 1
It is easy to conclude that the the most important component which drives much of the decision making in key business processes is badly neglected.
Familiar examples are the World Wide Web,
Corporate intranets
on line discussion groups
CSE3180 Summer 2005 Lect 3 / 62
Non-Structured InformationNon-Structured Information
Why is there this bias ?
Perhaps because structured information management has been synonymous with information systems design ?
The technology for unstructured data management is powerful, pervasive and well understood
CSE3180 Summer 2005 Lect 3 / 63
Non-Structured InformationNon-Structured Information
Businesses are becoming more interconnected
Each new connection is made by, and relies on, exchange of Information
One aspect of these ‘connections’ is the unpredictability of the nodes and the nodal content
This impacts on the structured information model
CSE3180 Summer 2005 Lect 3 / 64
Non-Structured InformationNon-Structured Information
The term ‘unstructured’ was invariably associated with ‘documents’.
They are a medium which we understand and use
There are other forms of unstructured data
audio
voice
images
graphical objects
These are forms of electronic documents
CSE3180 Summer 2005 Lect 3 / 65
Non-Structured InformationNon-Structured Information
These forms are ‘unstructured’ because their exact content and organisation are unpredictable
Unstructured information is any information type made up of content which does not fit a predefined, descriptive model or arrangement
It would be possible to impose (or superimpose) a structure on a document to make document selection possible, but there would be a cost
CSE3180 Summer 2005 Lect 3 / 66
Non-Structured InformationNon-Structured Information
A document may need to be tracked using
author name
title
filing date
a short abstract of the content (sound familiar ?)
The effect of this is that the ‘content’ of the document can now be accessed only through these 4 ‘keys’
Question : How could a lengthy research task be categorised by value or content using this technique ?
CSE3180 Summer 2005 Lect 3 / 67
Non-Structured InformationNon-Structured Information
That is another way of saying
the who
the when
the where
the why
and the how - realistic in today’s environment ?
So what is different with today’s decisions ?
The difference is the nature and kind of decisions made
CSE3180 Summer 2005 Lect 3 / 68
Non-Structured InformationNon-Structured Information
Today’s business is driven by increasing rates of change
We have shifted from an industrial economy to a knowledge driven economy
The need more information to support the decision-making processes, and the dynamic nature of the business environment means that the support from structured information systems is starting to be inadequate
However there is a possibility of ‘information overload’ - one of the accepted criteria of ‘knowledge work’ is a competency to manage the increasing amounts of information
CSE3180 Summer 2005 Lect 3 / 69
Non-Structured InformationNon-Structured Information
In the current environment, the ‘products’ are now our ideas
Ideas are driven by information
The term ‘Globalisation’ is appropriate
We think and we change or modify our plans - and this has led to the loss of predictability
The the ‘knowledge environment’ business success depends on the ability of knowledge workers to sift through all of the available unstructured resources and to make decisions - and faster than the competitors
The measurement of success is in degrees of innovation
CSE3180 Summer 2005 Lect 3 / 70
Non-Structured InformationNon-Structured Information
So, what are the ‘sources of information’ ?
Corporate document bases
The Internet
The Extranets
Information subscription services
Dialog with Customers, suppliers, competitors
CSE3180 Summer 2005 Lect 3 / 71
Non-Structured InformationNon-Structured Information
Are there any problems with this ?
In practice there are 2, and they are -
The 2 major problems of decision making are
1. Volume of information
2. The speed at which decisions need to be made
CSE3180 Summer 2005 Lect 3 / 72
Non-Structured InformationNon-Structured Information
Information retrievals have moved away from the filtering of unstructured data into a structured environment.
The emerging models accommodate the capture of resources (and access) which leads to a dynamic and unfiltered information repository which consists of joined but separated sources.
Web sites are sought and searched via the Internet, and possibly a corporate document repository is included in the search.
CSE3180 Summer 2005 Lect 3 / 73
Non-Structured InformationNon-Structured Information
Some of the information found will be transferred to a more structured repository - possibly a competitive analysis database.
A search in this way provides a set of search results but does not change the information sources.
The Importance of Tools:
Users don’t normally have the time nor the skills to ‘get on top of a variety and changing set of tools’ (what is your experience with the various search engines you have used on the Internet ?)
CSE3180 Summer 2005 Lect 3 / 74
Non-Structured InformationNon-Structured Information
What is needed is a retrieval tool (hardware and software) which understands how to work with different repositories
This leads to the ‘repository management system’ being able to recognise the form and requirements of different search tools or, if you like, software which recognises and communicates exactly and completely with other software
- this is ‘intelligent’ software
Another aspect is the nature of a search - this has changed from simple words or phrase retrievals to ‘context’ and dynamic analysis and categorisation
CSE3180 Summer 2005 Lect 3 / 75
A Non Structured Data QueryA Non Structured Data Query
Which of these 2 queries is likely to reveal real and intelligent in-depth information ?
1. What did our customers order last month
or
2. What is the correlation between discounts we’ve offered our best customers and predict how they will buy if we double their discounts (or perhaps restrict or cancel their customer discount cards)
CSE3180 Summer 2005 Lect 3 / 76
Non-Structured InformationNon-Structured Information
Fortunately, existing systems support many different file types in the base application - images, word-processing systems, desktop publishing systems, spreadsheets, CAD files ….
And just as fortunately, there are languages such as XML which can handle this variety of data forms and content
2 methods of access
- user defined indexes in a full functioned database (3rd party relational databases)
- a fully integrated full-text query engine
CSE3180 Summer 2005 Lect 3 / 77
What about ???What about ???
• Security
There are highly sophisticated security schemes (also mentioned in Portals)
• Revision Tracking and Control
Essential for accurate and up to date information (what revision number of MS-Word are you using ?)
• Document Check-in/Check-out
Recognition and tagging of all documents - none can be ignored or ‘go missing’
• Usage audit trails
These features are standard features
CSE3180 Summer 2005 Lect 3 / 78
Non-Structured InformationNon-Structured Information
Compound document managers
These are products which provide the facilities mentioned in the previous overheads, but they have an interesting extension
They treat documents as a collection of ‘pointers’ to various and different collections of information - this ensures that ‘new’ data or information will always be part of a search. (you recall the hypertext links - and have you used the links which are attached to my Web page ?)
Other features which ensure document integrity include roll-back, recovery, and audit trails
CSE3180 Summer 2005 Lect 3 / 79
Non-Structured InformationNon-Structured Information
There is a single focal point around which unstructured information management practice and technology have converged
No surprise - it is the Internet
This facility has enabled the widespread distribution of unstructured information from a very large base of resources
The World Wide Web is a hyper-linked, unstructured information repository with millions (?) of documents
CSE3180 Summer 2005 Lect 3 / 80
Non-Structured InformationNon-Structured Information
The Web has made it possible for specialised content to be published, as well as to advertise products
Specialised on-line information subscription services exist which provide industry-specific information (for a fee)
Stock market information services and sites
Monash University has on-line examination results services (and enrolments ?)
CSE3180 Summer 2005 Lect 3 / 81
Non-Structured InformationNon-Structured Information
Internet technology is said to be ubiquitous - that is, it can reach anywhere - if there are suitable devices of course
All current document management systems support some degree of support for Web technology
Capabilities range
from
basic document viewing using dynamic HTML to make the document ‘visible’ regardless of its source format
to
advanced document access and distribution
CSE3180 Summer 2005 Lect 3 / 82
Non-Structured InformationNon-Structured Information
For instance a Computer Aided Design drawing (perhaps the Burnley tunnel?) in a user’s browser without requiring that the remote users have CAD software installed on their machines
Or perhaps the Mitcham - Frankston ByPass construction plans and estimates
Was the construction of the Monash tennis courts on Sir John Monash Drive in this category ?
CSE3180 Summer 2005 Lect 3 / 83
Non-Structured InformationNon-Structured Information
Some advanced search engines: (form flexibility)
Verity Inc Information Server
PC Docs/Fulcrum SearchServer
Inktomi Corp Search Engine
Externalisation - knowledge management
Autonomy Inc
Semio Corp
These use a semantic or lexical analysis engine to extract meaning from the information repository
CSE3180 Summer 2005 Lect 3 / 84
Non-Structured InformationNon-Structured Information
A traditional query would look like
‘interest rates AND stock prices’ and you would receive documents which included the 2 search words
A query expressed as ‘I am interested in the effects of interest rate changes on stock prices’ does not have specific key words nor phrases.
Such an expression would be have its content meaning derived from the information base. The result would be a categorisation of topics contained in the information base.
These would then be further analysed by the user over a varying number of criteria
CSE3180 Summer 2005 Lect 3 / 85
A few thoughts on ‘Detail’A few thoughts on ‘Detail’
Retailers :
Retailing is a competitive business. Success depends on the Retailer knowing the customers.
There are 3 factors at play
1. The changing likes and dislikes of consumers.
New product preferences are the result of an aging population, changes in family structure, flexible
lifestyles
and to no small degree, enterprise bargaining and working conditions.
A successful retailer must be aware and adapt to these factors
CSE3180 Summer 2005 Lect 3 / 86
A few thoughts on ‘Detail’A few thoughts on ‘Detail’
2. The uniqueness of each customer.
Individual needs can only be accommodated by knowing individual requirements.
‘Loyalty’ programmes are based on this premise
3. The importance of managing inventory levels, controlling markdowns, maintaining margins.
Static inventory results in greater interest expense. This acts as a barrier to reinvestment of stock which is moving. Non-moving stock will (in many cases) require markdowns to liquidate the stock, which negatively impacts margins
CSE3180 Summer 2005 Lect 3 / 87
A few thoughts on ‘Detail’A few thoughts on ‘Detail’
This latter point depends on the level of success with the first 2 factors (changes, uniqueness).
Challenges to retail businesses are:
Knowing who the customer is.
Track and capture purchase history. Identify future needs by storing data at a meaningful level of detail
As an example, a back to school marketing programfor customers who purchased school materials last
year.
CSE3180 Summer 2005 Lect 3 / 88
A few thoughts on ‘Detail’A few thoughts on ‘Detail’
Understand how the customer wishes to interact.
How will information be sought about a possible purchase ? Personal visit, phone, Internet, email ..?
How will the sale be finalised - personal, phone, Internet, credit card, cash , account entry ?
What is the customer’s preferred method of interaction ?
Perhaps a ‘personal’ profile should be developed
Be able to track and evaluate the strength of the relationship with each customer
Purchase history plus other contacts - e.g. warranty service, new or updated products
CSE3180 Summer 2005 Lect 3 / 89
A few thoughts on ‘Detail’A few thoughts on ‘Detail’
Know what is ‘enough’ detail.
Costs are associated with keeping details. Summary information may however result in incorrect
projections and decisions.
Historical data (such as for air conditioners as opposed to toothpaste)
Analysis of cyclical data for air conditioners should assist in reducing risks of commitments and
distribution of such items
The revenue possibilities should outweigh the expense of maintaining air conditioner data
The question is - how to decide what data to store
CSE3180 Summer 2005 Lect 3 / 90
A few thoughts on ‘Detail’A few thoughts on ‘Detail’
Incorporate the knowledge gained of customer needs into ‘business intelligence’
This knowledge should be used to
– analyse past performance– get insight into current trends– blend this information into the business plan– develop systems which accurately reflect the customers’
needs– develop systems which interact and support the profit
model
CSE3180 Summer 2005 Lect 3 / 91
Information RetrievalInformation Retrieval
Consider the following :
When I do count the clock that tells the time,
And see the brave day sunk in hideous night,
When I behold the violet past prime,
And sable curls all silvered o'er with white:
When lofty trees I see barren of leaves,
Which erst from heat did canopy the herd
And summer's green all girded up in sheaves
CSE3180 Summer 2005 Lect 3 / 92
Information RetrievalInformation Retrieval
Borne on the bier with white and bristly beard:
Then of thy beauty do I question make
That thou amongst the wastes of time must go,
Since sweets and beauties do themselves forsake,
And die as fast as they see others grow,
And nothing against time's scythe can make defence
Save breed to brave him when he takes thee hence
Shakespeare, Sonnet No.12
CSE3180 Summer 2005 Lect 3 / 93
Information RetrievalInformation Retrieval
Or , consider this:
Now is the winter of our discontent
Made glorious summer by this sun of York
And all the clouds that loured upon our house
In the deep bosom of the ocean buried
Now are our brows bound with victorious wreaths;
Our bruised arms hung up for monuments
Our stern alarums changed to merry greetings
Our dreadful marches to delightful measuresShakespeare: Soliloquy ? ? ?
CSE3180 Summer 2005 Lect 3 / 94
Information RetrievalInformation Retrieval
Finally, another extract:
The peasants who survived the plague found themselves in many cases afflicted by fresh burdens, for with fewer people to work the land, overlords demanded a standstill in wages and a return to feudal duties in full.
But, with the shortage of labour, workers naturally expected to be valued more highly and to be given better pay and more freedom.
Europe 14/15th Century
A history of the world - Rjunstead ( perhaps Enterprise Bargaining ?)
CSE3180 Summer 2005 Lect 3 / 95
Well, that’s enough for this weekEnjoy your weekend, and I’ll see you next Monday(which should be the 29th of November,2004)