Computing for Bioinformatics Introduction to databases What is a database? Database system...


Computing for Bioinformatics

Introduction to databases

• What is a database?

• Database system components

• Data types

• DBMS architectures

• DBMS systems available

• Microsoft Access

What databases are not

• unstructured piles of data (including heaps of web pages)

• spreadsheets such as Excel tables

• text files with neatly tabulated data

• data collected for one kind of analysis only

Why are these things not databases?

Spreadsheets versus databases (1)

• A spreadsheet is typically viewed as an entire table of cells which may contain:

– numbers (data)

– text (labels)

– formulae (calculations producing results)

• A database may be structured in various ways, usually so that a small subset of the data is presented as the result of a search

Spreadsheets versus databases (2)

Spreadsheets

• Can be used immediately with little preparation (or thought)

• Data is visible

• Data entry is simple

Databases

• Require planning

• Data is hidden

• May require a program to help you enter or retrieve data

Spreadsheets versus databases (3)

Spreadsheets

• Little checking is carried out

• Tables and graphs can be produced

• Single user

Databases

• Extensive integrity checks can be arranged

• Reports can be programmed

• Searches can be made

• Can be multi-user

• Can be put on the Web with a suitable user interface program
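The point about integrity checks can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module as the database engine; the specimen table, its columns and the sample values are invented for illustration and do not come from the slides. The constraints in the table definition make the engine itself reject bad rows, something a spreadsheet cell would normally accept without complaint.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table: NOT NULL and CHECK are the "integrity checks".
conn.execute(
    """
    CREATE TABLE specimen (
        id        INTEGER PRIMARY KEY,
        species   TEXT NOT NULL,
        length_mm REAL CHECK (length_mm > 0)
    )
    """
)

def try_insert(species, length_mm):
    """Return True if the row was accepted, False if the engine rejected it."""
    try:
        conn.execute(
            "INSERT INTO specimen (species, length_mm) VALUES (?, ?)",
            (species, length_mm),
        )
        return True
    except sqlite3.IntegrityError:
        return False

accepted_valid = try_insert("Parus major", 140.0)     # satisfies both constraints
accepted_negative = try_insert("Parus major", -5.0)   # violates CHECK
accepted_missing = try_insert(None, 120.0)            # violates NOT NULL

# Only the valid row should have been stored.
row_count = conn.execute("SELECT COUNT(*) FROM specimen").fetchone()[0]
print(accepted_valid, accepted_negative, accepted_missing, row_count)
```

Other engines offer similar constraints, though the exact syntax and error types differ from one DBMS to another.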

What a database is

• Data is stored separately from any application programs which might use it

• Multiple uses of the data are envisaged

• Designed for retrieval in various anticipated and unanticipated forms

What are they used for?

Biology:

• data about species

• details of publications

Biodiversity:

• data about biological specimens

• data about areas, places, sampling sites, habitats etc. (sometimes in Geographical Information Systems (GIS))

Bioinformatics:

• results of experiments

• molecular sequences, protein structures

• gene frequencies, gene expression data, etc.

DBMS types (database internal structure)

What are the main types of database design? (The internal mechanics, not the information stored or the appearance of the database as seen by the user.)

• “Free text” - records not divided into fields

• “Flat-file” - records have fields (one table with columns like a spreadsheet), common and easy to understand, often inefficient

• Hierarchical, Network - now largely obsolete

• Relational - several tables, usually the choice of the professional (solid, boring)

• Object-oriented - for the adventurous (cutting edge)
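The difference between the flat-file and relational designs can be sketched in a few lines of plain Python. This is only an illustration under assumed data: the specimen codes and field names are invented, and real systems store tables on disk rather than in lists and dictionaries. It shows why a single wide table tends to repeat information that a relational design stores once.

```python
# Flat-file design: one table, so every specimen record repeats the
# details of its species (illustrative data).
flat_file = [
    {"specimen": "S1", "species": "Parus major", "family": "Paridae"},
    {"specimen": "S2", "species": "Parus major", "family": "Paridae"},
    {"specimen": "S3", "species": "Sitta europaea", "family": "Sittidae"},
]

# Relational design: two tables linked by a key (the species name here).
species_table = {}    # one row per species
specimen_table = []   # one row per specimen, pointing at its species
for record in flat_file:
    key = record["species"]
    species_table[key] = {"family": record["family"]}
    specimen_table.append({"specimen": record["specimen"], "species": key})

# How many places would need editing if the family "Paridae" were renamed?
edits_flat = sum(1 for r in flat_file if r["family"] == "Paridae")
edits_relational = sum(
    1 for s in species_table.values() if s["family"] == "Paridae"
)
print(edits_flat, edits_relational)
```

In the flat file the correction has to be repeated for every matching record; in the relational layout it is made once, in the species table.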

Database system components (1)

A database management system (DBMS) has the following essential components:

• Data tables (the data itself)

• Database “engine” (stores data in and retrieves data from the tables)

• User interface (for humans to enter, view and edit data)

Some commercial general-purpose DBMSs, such as Microsoft Access, make the engine and the interface appear as one (although Access can use other engines)

Stand-alone and client-server systems

• Some database systems are integrated (“stand-alone”): the engine and the interface are combined (MS Access)

– the data may also be on the same machine

• “Client-server” database systems put the data tables and the storage engine on a remote “server” computer

– the user accesses the remote database server using a local database client program

Accessing the data in the database

• A user can use a built-in user interface to search, edit, etc. (e.g. in Microsoft Access)

• A user can use a separate or even third-party general-purpose client program, especially in the case of client-server systems such as MySQL, Oracle, etc.

• Such clients often use the SQL language (pronounced either “ess-cue-ell” or “sequel”) as a (fairly) standard way to formulate search requests, data editing instructions, etc.

• Special-purpose client programs may also be written (in Perl, Java, PHP, etc.) to perform such access, using SQL “embedded” in the program
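As a rough sketch of SQL “embedded” in a program, the example below uses Python with its built-in sqlite3 module rather than the Perl, Java or PHP named above; the principle is the same. The function name find_employees and the sample rows are assumptions made for illustration, and the Employee table simply follows the usual textbook example.

```python
import sqlite3

def find_employees(conn, name_prefix):
    """Return (name, salary) rows whose name starts with name_prefix."""
    # The SQL text lives inside the program. The ? placeholder lets the
    # driver pass the user's value separately instead of pasting it into
    # the SQL string.
    cursor = conn.execute(
        "SELECT name, salary FROM Employee WHERE name LIKE ? ORDER BY name",
        (name_prefix + "%",),
    )
    return cursor.fetchall()

# Set up a small in-memory database with illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO Employee (name, salary) VALUES (?, ?)",
    [("Smith", 30000.0), ("Smithson", 32000.0), ("Jones", 28000.0)],
)

matches = find_employees(conn, "Smith")
print(matches)
```

With a client-server system such as MySQL or Oracle the program would connect through that system's own driver instead, but the pattern of sending an SQL string plus parameters stays much the same.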

Database system components (2)

A DBMS is usually also associated with:

• Database “drivers”, import & export modules, etc. (for programs to store, retrieve and alter data)

• Application programs, which use drivers to connect to the database, send SQL commands to it and do useful things (sometimes called “business logic”; may be general-purpose or specialised)

• Report writer (a specialised application program)

• Utilities (ditto, for back-ups, integrity checking, etc.)

Smallest ever guide to SQL

• Database table definition: column names, data types, indexes, etc.

• Data records may be inserted, altered or deleted

• Data retrieval is based on the idea of selecting columns and rows to obtain a subset of a larger stored table, e.g.

– SELECT name, salary FROM Employee WHERE name LIKE 'Smith%';

• Data may be retrieved from two or more tables using “joins” on linking data fields (keys)
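The four points above can be tried out in one short session. This is a minimal sketch, assuming Python's built-in sqlite3 module as the engine; the Department table, the dept_id key, the index name and all sample rows are invented for illustration, and other DBMSs differ in detail (data type names in particular).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1. Table definition: column names, data types, keys and an index.
conn.executescript(
    """
    CREATE TABLE Department (
        dept_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL
    );
    CREATE TABLE Employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        salary  REAL,
        dept_id INTEGER REFERENCES Department(dept_id)
    );
    CREATE INDEX idx_employee_name ON Employee(name);
    """
)

# 2. Records may be inserted, altered or deleted.
conn.executemany(
    "INSERT INTO Department (dept_id, title) VALUES (?, ?)",
    [(1, "Genomics"), (2, "Proteomics")],
)
conn.executemany(
    "INSERT INTO Employee (name, salary, dept_id) VALUES (?, ?, ?)",
    [("Smith", 30000.0, 1), ("Smithson", 32000.0, 2), ("Jones", 28000.0, 1)],
)
conn.execute("UPDATE Employee SET salary = 31000.0 WHERE name = 'Smith'")
conn.execute("DELETE FROM Employee WHERE name = 'Jones'")

# 3. Retrieval: choose columns (name, salary) and rows (names like Smith%).
subset = conn.execute(
    "SELECT name, salary FROM Employee WHERE name LIKE 'Smith%' ORDER BY name"
).fetchall()

# 4. Join: combine two tables on the linking key field dept_id.
joined = conn.execute(
    """
    SELECT Employee.name, Department.title
    FROM Employee
    JOIN Department ON Employee.dept_id = Department.dept_id
    ORDER BY Employee.name
    """
).fetchall()

print(subset)
print(joined)
```

The join returns each remaining employee alongside the title of their department, which is stored only once in the Department table.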