Introduction - M150 course blog


11/2/2009


Introduction

This unit aims to:

– Describe the notion of persistent data, how it is created, and how it is stored and accessed (logically and physically) on various types of storage device.
– Explain how the internet and the applications that use it work, and address some of the issues that arise from transmitting data between computer systems.
– Explain how databases facilitate the storage, access and protection of data, and how metadata is important in providing access to multimedia databases.
– Explore the issues of privacy and ownership of data and analyze some of the risks arising from storing data on computers and transmitting it across networks.


Storing and accessing data in documents

Folders

We use filing cabinets to organize our paper documents.
The cabinet has several drawers and each drawer holds a number of files.
Each file contains a number of related documents.
For quick retrieval of documents, we label each drawer with a name, put a name tag on each file and place each document in its correct file in a specific order.

Figure 2 A filing cabinet

The computer operating system works on the same principle as a filing cabinet.
– If you access your hard disk from the computer desktop, you will find a small number of documents and folders.
– The rest of the documents can be reached by inspecting the contents of the folders (by double-clicking) until you reach the lowest level of the hierarchy, in which the contents are all documents and there are no further folders to open.
– This is called a hierarchical or nested folder structure, because each folder may contain other folders.

Figure 2.3 Hierarchical or nested folders

Windows Explorer is used to inspect the contents of a disk.
– To open an Explorer window, right-click on a disk or folder icon, and select Explore.
An Explorer window has two panes:
– The right-hand pane is essentially the same as an ordinary folder window; its contents can be displayed as icons or small icons or as a list.
– The left-hand pane does not show any documents, but it shows hard disks, folders and any icon which holds other items (e.g. ‘network share’).


The folder structure can be seen as a tree lying on its side.
– The desktop is the root of the tree, and each folder is a branch.
– The leaves of the tree correspond to documents.
– Any similar hierarchical arrangement of objects is frequently called a tree structure.
Although there are two distinct folders called ‘mail’ in Figures 2.4 and 2.5, no confusion arises because each is in a different place and has a different path leading to it.
– A path contains the names of all the folders that lead to it from the root.
– It allows you to identify a folder or document unambiguously.
– One ‘mail’ folder would have the path: Ibex\Holidays\mail
– The other ‘mail’ folder has the path: Ibex\Internet stuff\mail

Operating systems come with a search function which allows you to find items you have ‘lost’.
– The Windows XP search function is called ‘Search’, and you can access it through the ‘Start’ menu.
– The ‘Search Results’ window shows full path names.
When you open a folder under a Windows operating system, the folder window shows the path name of the folder in its title bar.
Consider the Windows path name: C:\Projects\M150\Assignments\TMA02.doc
– ‘C:’ is the root, and refers to the computer’s hard disk.
– ‘Projects’ is the name of a folder at the top level of the hard disk.
– ‘Projects’ contains a folder called ‘M150’, which in turn contains an ‘Assignments’ folder.
– The document ‘TMA02.doc’ is in the ‘Assignments’ folder.
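The way a path name breaks down into root, folders and document can be sketched in a few lines of Python. This is only an illustration: it uses the standard library class PureWindowsPath, which parses the path text without looking at any real disk, and the variable names are our own.

```python
from pathlib import PureWindowsPath

# PureWindowsPath parses Windows-style paths on any operating system
# without touching the real file system.
document = PureWindowsPath(r"C:\Projects\M150\Assignments\TMA02.doc")

root = document.drive                 # the volume the path starts from
folders = list(document.parts[1:-1])  # folders between the root and the document
name = document.name                  # the document itself

# Two folders may share a name as long as their paths differ.
holidays_mail = PureWindowsPath(r"Ibex\Holidays\mail")
internet_mail = PureWindowsPath(r"Ibex\Internet stuff\mail")
same_name = holidays_mail.name == internet_mail.name
same_path = holidays_mail == internet_mail

print(root, folders, name, same_name, same_path)
```

The two ‘mail’ folders compare as different paths even though their names match, which is why no confusion arises.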


Each folder has a directory:
– A directory is a list with information about the contents of a folder.
– A folder is sometimes loosely referred to as a directory.
Part of the directory for a given folder can be displayed on screen in a number of ways to aid human identification of the contents: alphabetically, in order of last modification date, by size and by type.
The directory of a folder also lists the address or physical location on the disk of each document and subfolder in that folder. This address represents the line number of the document (or folder) in the VTOC.
– The VTOC will be discussed later in the unit.
– This address is internal to the operating system and cannot be seen in a user window.

Storage technologies

There are various data storage technologies, such as the hard disk (typically built into your computer) and removable storage media such as CDs, DVDs, Zip disks and high-capacity tape cartridges.
The number of bytes of data that can be held on a storage medium is called the capacity of the medium.
Typical document sizes are measured in kilobytes and megabytes, but media capacities are much larger than this and can be measured in gigabytes and terabytes.
Table 2.1 gives the names and sizes of these commonly used terms.


A hard disk is a storage medium comprising one or more (aluminium, ceramic or glass) plates, whose surfaces are coated with a magnetizable ferrite coating.
Data is recorded on each surface by magnetizing a series of concentric circles called tracks, as shown in Figure 2.9.
The disk surface is divided into a number of equal-sized wedge-shaped regions called sectors.
A marker identifies the first sector on each track.
Within a sector each track holds the same amount of data (usually 512 bytes).
– The bytes are packed more closely on the small inner circles than on the larger outer circles.
This is the basic unit of data handled by the disk control mechanism, and is called a block.
Irrespective of its physical location on the disk, each block of data is guaranteed to be the same size.
– This simplifies the transfer of data between a disk and the computer.
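Because every block is the same size, working out how much disk a document occupies is simple arithmetic. The sketch below assumes the usual 512-byte block quoted above; the function names and the 1300-byte example document are made up for illustration.

```python
BLOCK_SIZE = 512  # bytes per block (the "usual" figure; real disks may differ)

def blocks_needed(document_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of whole blocks a document of the given size occupies."""
    return (document_bytes + block_size - 1) // block_size  # round up

def unused_bytes(document_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Space left over in the document's final block."""
    return blocks_needed(document_bytes, block_size) * block_size - document_bytes

example_blocks = blocks_needed(1300)
example_slack = unused_bytes(1300)
print(example_blocks, example_slack)
```

A 1300-byte document does not fit in two blocks (1024 bytes), so it takes three, and part of the third block stays unused.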

The actual reading from and writing to the disk surface is performed by a read/write head, which is attached to an arm that moves to and from the centre so that it can locate any track on the disk (see Figure 2.10).
The disk is kept spinning continuously, so each sector is under the head at some time.
The head hovers close to the spinning surface, which needs to be engineered carefully to avoid physical contact between the head and the surface.
– If contact does happen, the surface coating will be damaged, destroying the magnetic pattern, and the data stored can no longer be retrieved. This is an example of a disk crash.
– Another cause of a disk crash is a particle of dust getting into the tiny gap (5 microns or less) between the head and the surface.


For each plate in a disk there are two read/write heads, one for each surface.
In a read operation the head detects a magnetised pattern and transmits it as a series of bits to the processor.
In a write operation, the head magnetises the relevant pattern of bits on to the surface.
When new data is written to a magnetic disk, the only thing that changes is the magnetic pattern recorded on the disk.
For this reason, magnetic disks can be reused repeatedly. It is only when they deteriorate physically that they can no longer be used.
The heads associated with all surfaces move in and out together; at any one time they can read from the corresponding tracks on both surfaces of every plate in the disk. This set of tracks is called a cylinder, as shown in Figure 2.11.

There are various removable storage media devices, such as:
– Zip drive: similar to a hard disk drive but uses removable disks.
– Floppy disks, which work on the same principle as a Zip drive, but with a capacity limited to 1.4MB.
– Optical disc, which may be a CD (compact disc) or DVD (digital versatile disc).
The capacity of a CD is 650MB and data is stored on only one side of it in a single spiral groove which winds round the disc 22,188 times.
The data is packed uniformly along the groove, so that outer tracks of the groove hold more data than inner ones.
– To cope with this, the disc spins more slowly when accessing data near the outer edge than when accessing data near the centre.
Conventional CD-ROMs and DVD-ROMs are sometimes called ‘write once, read many’ (WORM) media.


Conventional CDs (also called CD-ROMs) have bits of data stored as ‘pits’ in their groove.
Beams of laser light are used to burn the pits on the disc. A CD drive works by shining a low-power laser beam on the disc, which detects the presence or absence of a pit.
There are two kinds of CDs which ordinary computer users can write to:
– Recordable CDs (CD-R): a Write Once Read Many (WORM) optical medium, though the whole disc does not have to be written in the same session. It has a sensitive dye layer. Instead of burning pits on the CD, the writing process dyes the relevant parts of the groove. When read by a CD drive these dye spots are indistinguishable from pits on a conventional CD. The process is not reversible, so you can’t rewrite data on a CD-R.
– Rewritable CDs (CD-RW): use a different technology altogether. (For this reason not all CD drives can read CD-RW discs.)
– The CD-RW writer uses a laser to create two states in the recording layer. The laser can change each state to the other. This process is reversible, so you can write to a CD-RW many times. CD-RW has the read/write characteristic of magnetic disks.

DVDs (also called DVD-ROMs) work in much the same way, but the data is packed more tightly, using smaller pits, a narrower groove and less overhead for error correction.
– These factors increase the capacity of a simple DVD to 4.7GB.
DVDs can also be manufactured to use both sides of the disc, and each side can have one or two layers, yielding a theoretical maximum capacity of 19GB.
One important difference between optical CD/DVD-ROM discs and magnetic disks (fixed or removable) is the ability to rewrite to them.
– Magnetization is a reversible process, so magnetic disks allow you to rewrite data.
– The basic technology used for optical discs relies on burning pits into, or dyeing spots on, a spiral groove. This process is irreversible so you can’t rewrite data on them. (CD-RW is an exception.)
Typically a single hardware storage unit such as a hard disk, Zip disk, CD or DVD is called a volume.
A volume needs to have a physical (external) label with a title so that it can be identified easily.
A volume should also have an electronic label holding the name of the volume. It will be displayed when you search the contents of your computer.


Sensible organization of storage

Each volume contains a large number of documents, so there has to be a means of locating the one you want.
In the case of a magnetic disk three numbers are required to identify a block of data: cylinder number, surface number and sector number.
– This set of three numbers is called the address of the block.
To locate a document on the disk the operating system needs to know its address.
– Therefore, each volume has a volume table of contents or VTOC.
The VTOC (shown in Table 2.2) is a table with one line for each document to record the physical address on the volume where the document begins, and attributes such as whether the document may be modified or is read-only.
There is also a line in the table to record where the remaining free space starts.
– This lets the operating system decide where to save new documents.
A single document might occupy one or more blocks on the disk.
At the end of each block there is a marker which either indicates that this is the final block for the document or gives the address of the block that holds the next portion of the document.
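The idea of a VTOC line pointing at a first block, with each block pointing at the next, can be sketched as a small Python model. Everything here is a simplification for illustration: the addresses, the block contents and the names Address, Block, disk and vtoc are invented, and a real operating system stores this information in its own internal format.

```python
from typing import NamedTuple, Optional

class Address(NamedTuple):
    """The three numbers that identify a block on a magnetic disk."""
    cylinder: int
    surface: int
    sector: int

class Block(NamedTuple):
    data: bytes
    next_block: Optional[Address]  # None plays the role of the "final block" marker

# A pretend disk: blocks of one document scattered over different addresses.
disk = {
    Address(0, 0, 3): Block(b"Persistent ", Address(2, 1, 7)),
    Address(2, 1, 7): Block(b"data ", Address(0, 1, 1)),
    Address(0, 1, 1): Block(b"notes", None),
}

# A pretend VTOC: one line per document, recording where it begins.
vtoc = [
    {"name": "M150notes.doc", "start": Address(0, 0, 3), "read_only": False},
]

def read_document(line_number: int) -> bytes:
    """Follow the chain of blocks starting from the address in the VTOC line."""
    address = vtoc[line_number]["start"]
    contents = b""
    while address is not None:
        block = disk[address]
        contents += block.data
        address = block.next_block
    return contents

text = read_document(0)
print(text)
```

Only the starting address is held in the VTOC; the rest of the document is found by following the marker at the end of each block.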


Moving documents

What happens when you move a document ‘M150notes.doc’ from one folder (say ‘Current’) to another (say ‘Models’) on your hard disk? The simple answer is nothing.
Moving a document between folders on a disk is really an illusion because the document does not move at all!
What really happens is that the document’s physical location remains unchanged, but the directories change, as illustrated in Tables 2.3 and 2.4.
After the document ‘M150notes.doc’ has been moved, the directories are as shown in Tables 2.5 and 2.6.
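A move of this kind can be imitated with two Python dictionaries standing in for the directories. This is a toy model under stated assumptions: the VTOC line number 17 and the address (4, 0, 9) are invented, and the real tables in the unit may hold more columns.

```python
# Physical side: VTOC line number -> where the document starts on disk.
# The line number and address are made-up values for illustration.
vtoc = {17: {"name": "M150notes.doc", "start": (4, 0, 9)}}

# Logical side: each folder's directory maps a document name to a VTOC line.
directories = {
    "Current": {"M150notes.doc": 17},
    "Models": {},
}

def move(name: str, source: str, target: str) -> None:
    """Move a document by editing directory entries only."""
    directories[target][name] = directories[source].pop(name)

address_before = vtoc[17]["start"]
move("M150notes.doc", "Current", "Models")
address_after = vtoc[17]["start"]

print(directories, address_before == address_after)
```

The entry disappears from ‘Current’ and appears in ‘Models’, while the physical address recorded in the VTOC is untouched.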


What happens when you delete a document?
– The document is moved to a special folder called ‘Recycle Bin’ or ‘Trash’ from which it can be retrieved.
– The document does not go anywhere; it remains in the same physical position on the disk. It is the directory entry for the document that is removed, with a new directory entry being created in the Recycle Bin.
What you perceive when you navigate through the folders on your computer is not where the documents are located physically (this is hidden from you), but where they are located logically.
– The logical view shows the relationship of documents to each other in a hierarchical (nested) folder structure.
When you decide to ‘empty’ the Recycle Bin, documents are not moved physically either.
– The entries for the documents in the directory of the Recycle Bin are deleted. The document may therefore remain on your disk for a long time without being overwritten. However, it is inaccessible since its directory entries have disappeared and so the operating system can no longer locate the document.
– It may be possible to recover the deleted document using a disk-recovery utility (an application that can find documents without using directories).
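Deleting and emptying the bin can be modelled in the same spirit. The sketch below is an assumption-laden toy: the document name ‘essay.doc’, the address (4, 0, 9) and the helper functions are invented, and it ignores details such as free-space bookkeeping.

```python
# Physical contents of the disk, keyed by a made-up block address.
disk_blocks = {(4, 0, 9): b"draft essay"}

# Logical view: directories map document names to block addresses.
directories = {
    "Current": {"essay.doc": (4, 0, 9)},
    "Recycle Bin": {},
}

def delete(name: str, folder: str) -> None:
    """Deleting only moves the directory entry into the Recycle Bin."""
    directories["Recycle Bin"][name] = directories[folder].pop(name)

def empty_recycle_bin() -> None:
    """Emptying the bin discards directory entries, not the stored bytes."""
    directories["Recycle Bin"].clear()

def can_locate(name: str) -> bool:
    """True if some directory still has an entry for the document."""
    return any(name in entries for entries in directories.values())

delete("essay.doc", "Current")
recoverable_from_bin = can_locate("essay.doc")

empty_recycle_bin()
locatable_after_empty = can_locate("essay.doc")
bytes_still_on_disk = disk_blocks[(4, 0, 9)]

print(recoverable_from_bin, locatable_after_empty, bytes_still_on_disk)
```

After the bin is emptied no directory leads to the document, yet its bytes are still on the pretend disk, which is the situation a disk-recovery utility exploits.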

To put it all together, computers assist users who wish to recover a deleted document in several ways:
1. Moving the document to a special folder (Recycle Bin) from which it can be recovered provided the bin has not been emptied.
2. Marking the document for deletion, but not actually deleting it or its directory entry. An ‘undelete’ operation is then available until:
a) the current application exits;
b) the computer is powered down;
c) the user issues an explicit purge command.
3. Providing a disk-recovery utility.
Refer to exercise 2.4.


Other storage media:
– Magnetic tape is a storage medium which is slow and difficult to access.
The key difference between tape and most other storage media is that tape is linear. (To reach a point on the tape it is necessary to wind the tape to that point.)
The main strengths of tape are its high capacity, its reusability and its cheapness. (A single tape could hold 200GB.)
Magnetic tape is ideal for data back-up (emergency copy) and archiving (saving data that is rarely used for an indefinite period).
– Holostore is computerized storage using laser-produced holograms (a hologram is a three-dimensional image made with the aid of a laser).
Unlike discs, which are two-dimensional, a hologram is three-dimensional, opening the way to storing much higher volumes of data.
– Biological storage media, whose main idea is to represent 0s and 1s using two color states of a suitable form of synthetic DNA.
– A number of such memory units would be attached to a support substrate to form a memory cell which is capable of transferring data at high speeds since there are no moving parts.

Transmitting data

A network of computers is linked together by communications links. These links may be:
– Dedicated cable links;
– Public telephone networks;
– Radio or microwave links.
Any organization using more than one computer is likely to have a local area network (LAN) to exploit the benefits of sharing resources such as data, printers and an Internet connection.
Pocket-sized computers known as PDAs (personal digital assistants), as shown in Figure 3.1, can communicate with each other and with desktop computers using infra-red signals.
– While in communication with each other they form a small local network.
– For local networks, a new standard called Bluetooth is now available to provide a single standard to replace infra-red and high-frequency radio connections.


The forerunner of the internet was ARPANET: a network of just four computers at four universities linked together as a project of the Advanced Research Projects Agency (ARPA) in the United States.
By 1996 that figure had grown to 15 million host computers.
Just six years later, in 2002, it had multiplied ten times to 150 million hosts.
The internet comprises a huge collection of computers (called hosts) with telecommunications links between them.
The internet links together not just one type of computer but any type of computer running any operating system.
– By adopting the internet protocol each of these computers can become an internet host.
A public telephone line can be used to link to the internet with the help of a modem.
– A modem converts between analogue and digital signals. It is used with conventional telephone lines that are analogue at the point of access.
Nowadays, telephone companies are offering ADSL (Asymmetric Digital Subscriber Line).
– This is a technology which allows data to be transmitted digitally at high speed over conventional copper telephone wires.

In 1990, Tim Berners-Lee at CERN created the forerunner of the web, which today is a collection of hypertext documents distributed worldwide and linked by the internet.
The value of the web is that trillions of pages of web content are linked together via multiple hyperlinks, like a spider’s web.
The basic unit of web content is the web page, which is an HTML document. The browser accesses the page, held on a remote computer, and downloads it to your computer.
– Downloading a page means transmitting a document from a computer (web server) somewhere in the world to your computer (the client).
The speed of download of a web page is influenced by:
1. The amount of data in the page, i.e. the size of the download.
2. The speed of your modem.
3. The quality of the phone line. If the line is noisy the speed will go down to increase the accuracy of the transmission.
4. The speed of your computer.
5. The amount of traffic on the internet.


Internet addressing

A message sent across the internet must have an address, which has several levels (similar to the conventional postal system).
At the highest level, there is the top-level domain, where domain means a collection of internet hosts.
The internet has two types of top-level domain:
– Those with codes of three letters or more group users by category, as in Table 3.2.
– Those with two-letter codes are normally country-specific, as in Table 3.3.
Country code domains are usually subdivided, such as:
– ac.uk (academic community), co.uk (commercial), and gov.uk (national and local government).
Many individual domain names are available within each top-level domain.
– Within ac.uk there is open.ac.uk, the OU domain. This acts as the central address for the OU on the internet.
Once the domain name open.ac.uk has been approved by an external agency, the OU is free to allocate sub-domains and host names within this naming scheme.
All internet addresses ending with .open.ac.uk belong to this domain and traffic within this domain is handled by local computers and software.
By convention, domain names are written in lower case.


The address associated with a hyperlink is given in the form of a URI (uniform resource identifier), which specifies the service requested and the full address of the required document. Here is an example: http://mcs.open.ac.uk/mcsexternal/courses/m150.htm
– The first part of the URI (http://) identifies the protocol (HTTP) to be used when transferring the document. The protocol guarantees that the web server on the computer being addressed (called a server) understands the nature of the request.
– The next part of the address, ‘mcs.open.ac.uk’, specifies the server, i.e. the host that will supply the service.
– The host address is in two parts: ‘mcs’, which identifies a particular computer, and ‘open.ac.uk’, which identifies the domain in which ‘mcs’ is to be found.
– The rest of the address is the path within ‘mcs’ that leads to the required document. In this case the document is not at the root level of ‘mcs’.
– Instead, there is a folder ‘mcsexternal’ at root level, which contains the folder ‘courses’, which in turn contains the document ‘m150.htm’.
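The same breakdown of the example URI can be reproduced with Python's standard urllib.parse module. This is just a sketch of the parsing step; it does not contact the server, and the variable names are our own.

```python
from urllib.parse import urlsplit

uri = "http://mcs.open.ac.uk/mcsexternal/courses/m150.htm"
parts = urlsplit(uri)

protocol = parts.scheme    # the protocol to use for the transfer
host = parts.hostname      # the server that will supply the service

# Split the host address into the computer name and the domain it belongs to.
computer, _, domain = host.partition(".")

# The path lists the folders leading to the document, then the document itself.
path_steps = [step for step in parts.path.split("/") if step]

print(protocol, host, computer, domain, path_steps)
```

The last item of the path is the document; everything before it is a folder on the host.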

It is usually convenient to assign a name to each computer on a network so that users can identify it easily.
In a small network the names may be chosen arbitrarily. In a larger network it is common to use a systematic naming scheme. Examples: ‘anemone.open.ac.uk’ and ‘buttercup.open.ac.uk’.
The naming scheme for hosts is very convenient for humans (more memorable than the IP number, reducing the likelihood of error) but it is not used by the messages that travel across the internet.
Instead each host has a 4-byte number associated with it, called its IP (internet protocol) number.
The IP number carried by a message ensures that it reaches the correct destination.
How does the message discover the IP number of its destination host?
– The answer is that special directories, called domain name servers, keep this information.
– A domain name server is a directory which resolves the host names in URIs into IP numbers; it thus allows users to quote address names rather than actual numerical addresses.
The first thing that happens when a URI is executed is that the host name is sent to a domain name server to be resolved.
New hosts are being added to the internet all the time, so domain name servers need to be kept up to date.
– Within each domain there is a domain name server which knows the address of each host at the next lower level in the hierarchy within its own domain.
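A domain name server can be pictured as a lookup table from host names to IP numbers. The sketch below is a toy, not a real resolver: the table is hypothetical, and the IP numbers are taken from the 192.0.2.x range reserved for documentation, so they are not the real addresses of these hosts.

```python
import ipaddress

# Hypothetical name table. The host names come from the text; the IP numbers
# are placeholders from a range reserved for examples.
name_table = {
    "anemone.open.ac.uk": "192.0.2.10",
    "buttercup.open.ac.uk": "192.0.2.11",
}

def resolve(host_name: str) -> ipaddress.IPv4Address:
    """Look up the IP number for a host name (names are treated as lower case)."""
    return ipaddress.IPv4Address(name_table[host_name.lower()])

ip_number = resolve("anemone.open.ac.uk")
as_bytes = ip_number.packed   # the four bytes that make up the IP number
as_integer = int(ip_number)   # the same four bytes read as one number

print(ip_number, list(as_bytes), as_integer)
```

The dotted form is only a readable way of writing the four bytes; messages carry the number, not the name.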


Logical and physical names

Sometimes a website needs to be moved to another host, because its server has crashed, or because it no longer has enough storage.
In this case no one can find your web pages any more, as they are no longer at the same address, unless you maintain the old web server and redirect all the requests for web pages to the new server.
– Messy solution!
A better solution is to avoid reliance on named physical machines.
This is done by identifying the web server to the internet by a logical name (rather than a physical name).
This means that an index must be kept which associates the logical name of a server with its current physical host.
– It is the domain name server’s job to ensure that a logical name delivers the address of its true host.

Email over the internet

Typically you use a mail application, or mail client, on your computer to handle email. Email is asynchronous.
Like other internet applications, email works on all computing platforms.
– One way in which email achieves this universality is that it uses text messages comprising ASCII-coded text only.
Unlike other URIs, which identify hosts on the internet, an email address identifies a user. It looks like this: [email protected]
– The part before the @ symbol is the user name (which has to be unique).
– The part after the @ symbol is the domain name.
When you send documents or email messages, data is broken up into units of a standard size called packets and travels across the internet.
Each packet carries the address information so that it will reach its intended destination.
The packets are re-assembled into a single item on arrival.
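Breaking data into packets and re-assembling it can be sketched as follows. Several details are assumptions made for the illustration and are not stated in the text: the 8-byte packet size, the sequence number carried in each packet, and the possibility that packets arrive out of order. Real internet packets have a more elaborate layout.

```python
import random

PACKET_SIZE = 8  # bytes of data per packet; an arbitrary small value

def to_packets(message: bytes, destination: str) -> list:
    """Cut the message into fixed-size pieces, each labelled with its address."""
    return [
        {"to": destination, "seq": start // PACKET_SIZE,
         "data": message[start:start + PACKET_SIZE]}
        for start in range(0, len(message), PACKET_SIZE)
    ]

def reassemble(packets: list) -> bytes:
    """Put the pieces back in order and join them into a single item."""
    ordered = sorted(packets, key=lambda packet: packet["seq"])
    return b"".join(packet["data"] for packet in ordered)

message = b"Packets may arrive in any order."
packets = to_packets(message, "mcs.open.ac.uk")

# Simulate packets taking different routes by shuffling their arrival order.
random.Random(0).shuffle(packets)
rebuilt = reassemble(packets)

print(len(packets), rebuilt == message)
```

Because every packet carries both the destination and its position, the receiver can rebuild the original item whatever order the pieces arrive in.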


Along with the actual message (or data) content, an email also carries transmission information in a number of lines, called headers, as shown in Figure 3.3.
The purpose of the first five headers is obvious.
The Reply-to field gives the reply address of the sender, and is used by mailers to generate an automatic mail address when replying to a message.
X-Mailer: Pegasus Mail for Windows means that the message was composed using version 4.01 of the Pegasus mail client for Windows.
MIME-version: 1.0 indicates the version of the internet standard for encoding mail attachments (version 1.0 of MIME).
Content-type: text/plain specifies the nature of the data as type/subtype.
– The type ‘text/plain’ will usually be followed by ‘charset=iso-8859-1’ for English text.

Although email transmission is restricted to text, it is possible to attach documents of any kind to an email message.
– This is achieved by encoding the attached file (a sequence of bits grouped into bytes) as a series of alphabetic characters and appending them to the end of the message.
– The resulting sequence of bytes is interpreted as text characters.
– In this way an arbitrary attachment can be converted into ASCII code suitable for email transmission.
The receiving mail client needs to know that the message has an attachment and how the attachment was encoded (so that it can decode it).
– Hence, the encoding scheme must conform to a standard.
– The internet standard for encoding mail attachments is MIME (Multipurpose Internet Mail Extensions).
– Other standards are also available, but the key factor is that both sender and receiver can implement the same protocol.
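As a small illustration of turning arbitrary bytes into ASCII text and back, the sketch below uses Base64, one of the encodings that MIME defines for attachments. The attachment bytes are arbitrary values chosen for the example, and a real mail client would also add the headers that announce the encoding.

```python
import base64

# Arbitrary binary data standing in for an attached file; several of these
# byte values are not valid ASCII characters.
attachment = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF, 0x10, 0x80])

# Sender: encode the bytes as text made only of ASCII characters.
encoded_text = base64.b64encode(attachment).decode("ascii")
is_plain_ascii = encoded_text.isascii()

# Receiver: knowing the encoding scheme, recover the original bytes exactly.
decoded = base64.b64decode(encoded_text)

print(encoded_text, is_plain_ascii, decoded == attachment)
```

The round trip only works because sender and receiver agree on the encoding, which is the point of having a standard.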


The MIME standard was originally published in 1992.
Currently it not only covers mail attachments, it also allows the following extensions to the basic principle that email be ASCII-coded text:
– The use of non-ASCII character sets in email messages;
– An extensible set of formats for handling non-text parts of messages (e.g. pictures);
– Non-ASCII text in email headers.
These extensions to the basic email principle are notified by means of several additional mail headers, as seen in Figure 3.3: the headers MIME-version and Content-type (type/subtype).
Table 3.4 describes the main content types/subtypes and appropriate methods of handling.


How does data travel?

Sequences of bits are transmitted between computers through a medium.
– This could be a wire carrying electrical signals, an optical fiber carrying light signals or a wireless connection, such as an infra-red, radio or microwave link.
Communication between computers takes place in the form of serial transmission: a single channel carries a stream of bits in sequence.
– When bits arrive at a host, there is no way of telling what purpose the data serves, or even where each byte begins and ends, without the use of a protocol.
– Protocols are needed to ensure that, on arrival, the receiving computer interprets the stream of bits with its original meaning.
– If an invalid transmission is received, the receiver then sends a simple message to the transmitter indicating a failed transmission.
– The transmitter uses this information to transmit the data again.
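The detect-and-retransmit idea can be sketched with a very simple made-up protocol. The text does not say how an invalid transmission is recognised; this example assumes a one-byte checksum appended to the data, and a pretend channel that damages only the first attempt. Real protocols use stronger checks.

```python
def checksum(data: bytes) -> int:
    """A deliberately simple check value: the sum of the bytes, modulo 256."""
    return sum(data) % 256

def noisy_channel(frame: bytes, attempt: int) -> bytes:
    """Pretend link that flips one bit of the first byte on the first attempt."""
    if attempt == 1:
        return bytes([frame[0] ^ 0x01]) + frame[1:]
    return frame

def receive(frame: bytes):
    """Return the data if the check value matches, otherwise None (failure)."""
    data, received_check = frame[:-1], frame[-1]
    return data if checksum(data) == received_check else None

def send_reliably(data: bytes, max_attempts: int = 3):
    """Transmit, and transmit again whenever the receiver reports a failure."""
    frame = data + bytes([checksum(data)])
    for attempt in range(1, max_attempts + 1):
        result = receive(noisy_channel(frame, attempt))
        if result is not None:
            return result, attempt
    raise RuntimeError("transmission failed on every attempt")

delivered, attempts_used = send_reliably(b"M150")
print(delivered, attempts_used)
```

The first attempt is rejected because the check value no longer matches the damaged data, so the transmitter sends the frame a second time and that copy is accepted.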

Accessing data

A database is a collection of data stored in a computer system according to a set of rules, and organized to facilitate access involving complex searches and selection.
A database is a form of persistent data, i.e. it exists after the application that created or modified it finishes and after the computer that stores it is switched off.
The primary emphasis of databases is on making the data persistent, and structuring it so as to minimize redundancy, avoid inconsistency and maximize the usefulness of the data for the purposes of access and updating.
A query (a request to the database) is used to get specific information from the database. The response to the query ideally extracts from the database all the relevant information. So a database is part of an information system.


In practice, databases consist of many tables holding vast amounts of data, which have to be designed to provide answers to (possibly complex) queries.
A typical industrial database system will consist of the following:
– A collection of tables.
– Data (called metadata) which describes the tables; for example, what each column in a table means, and how many tables there are in the database.
– Facilities for backing up the tables, which enable the company to store the data on some safe medium like a collection of CDs.
– Facilities for ensuring security (the system might keep credit card details).
– A query facility.
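Three of these ingredients (a table, a query and metadata about the tables) can be seen in miniature with SQLite, a small database engine bundled with Python. This is only a sketch: the table, its columns and its rows are invented for the example, and an industrial system would be far larger and would add back-up and security facilities.

```python
import sqlite3

# An in-memory database, so nothing is written to disk in this sketch.
conn = sqlite3.connect(":memory:")

# A table with invented columns and rows.
conn.execute("CREATE TABLE documents (name TEXT, folder TEXT, size_kb INTEGER)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [
        ("TMA02.doc", "Assignments", 120),
        ("M150notes.doc", "Models", 45),
        ("holiday.jpg", "Holidays", 900),
    ],
)

# A query: which documents are larger than 100 KB?
large_docs = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM documents WHERE size_kb > 100 ORDER BY name"
    )
]

# Metadata: the database can describe its own tables and columns.
table_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
column_names = [row[1] for row in conn.execute("PRAGMA table_info(documents)")]

print(large_docs, table_names, column_names)
conn.close()
```

The last two statements ask the database about itself rather than about the documents, which is what distinguishes metadata from the main data.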

An object database represents an enhancement to a conventional database, and provides facilities for the storage and retrieval of additional data types apart from text, such as images, audio clips and video streams.
At a basic level, a data object (sound, image, video) may be contained within a database as ‘an object in a box’.
The box has a name and the database can access the object using this name.
An object stored in this way is called a BLOB (binary large object).
The weakness of this approach is that you cannot query the content of a BLOB.
– Thus, an image can be saved and retrieved, but you cannot query the database for all images that match a certain color.
– The BLOB has no structure to it.
Object databases expand the concept of a database just as the XML markup language adds several dimensions to HTML.
– There is a software product which can take an XML document and build a corresponding object database.


Accessing data
In order to describe anything other than the simplest of data, it is necessary to provide some form of explanatory data (i.e. metadata) about the data. An email header is an example: it describes an email message and its possible attachments.
– Metadata: additional data held in an information system and used to describe the main data.
Web pages have a primary form of metadata in the form of keywords that can be used by search engines to locate web pages on a particular topic.
Each item in the <HEAD> section of an HTML document is an example of metadata. It is not part of the content of the document; rather, it says something about the content.
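As a rough illustration of how a program might read the metadata in the <HEAD> section, the following sketch uses Python's standard html.parser module. The class name, the function name and the sample page are hypothetical; real search engines use far more elaborate processing.

```python
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collect the <title> text and named <meta> items from an HTML page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            name = attrs.get("name")
            if name:
                self.meta[name.lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_metadata(html_text):
    """Return the page title and its keyword list; the body content is ignored."""
    parser = HeadMetadataParser()
    parser.feed(html_text)
    parser.close()
    raw_keywords = parser.meta.get("keywords", "")
    keywords = [word.strip() for word in raw_keywords.split(",") if word.strip()]
    return {"title": parser.title.strip(), "keywords": keywords}


# Hypothetical sample page used only for this illustration.
SAMPLE_PAGE = """<html><head>
<title>Persistent data</title>
<meta name="keywords" content="database, metadata, storage">
</head><body><p>Page content goes here.</p></body></html>"""

if __name__ == "__main__":
    print(extract_metadata(SAMPLE_PAGE))
```

The paragraph inside the body is never reported: the function returns only data about the page, not the page content itself.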

Accessing data
Nowadays a massive amount of audio-visual information is becoming available in digital form (including audio CDs and DVDs with digital video).
– The more material there is, the less valuable it is unless a desired item can be retrieved with relative ease.
An MPEG standard, MPEG-7, provides a framework for audio-visual descriptors.
The MPEG-7 visual descriptors describe the visual features of images and video such as color, texture, shape, position, motion and face recognition, while the audio descriptors include key, mood, tempo and tempo changes.
Using MPEG-7, a metadata description of a scene could be re-assembled into English as:
– This is a scene with a barking brown dog on the left and a blue ball that falls down on the right, with the sound of passing cars in the background.
Using metadata to describe multimedia content will allow users to retrieve the following sorts of information from databases:
– Find musical pieces which match a few bars played at a keyboard.
– Find graphics and logos matching a given diagram.
– Given an extract of vocal music, locate photographs, video clips and recordings by the same artiste.
– Search a football video for clips of the goals scored.
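The idea of retrieving multimedia items through their metadata can be sketched as follows. This is not the MPEG-7 description format: the descriptor fields (`colors`, `objects`, `sounds`), the clip records and the function `find_clips` are all hypothetical and are kept deliberately simple.

```python
# Hypothetical clip records: each clip carries descriptors alongside its title.
CLIPS = [
    {
        "title": "Park scene",
        "colors": {"brown", "blue"},
        "objects": {"dog", "ball"},
        "sounds": {"barking", "traffic"},
    },
    {
        "title": "Match highlights",
        "colors": {"green"},
        "objects": {"ball", "goal"},
        "sounds": {"crowd"},
    },
    {
        "title": "Piano recital",
        "colors": {"black"},
        "objects": {"piano"},
        "sounds": {"music"},
    },
]


def find_clips(clips, **wanted):
    """Return titles of clips whose descriptors include every requested value."""
    matches = []
    for clip in clips:
        if all(
            set(values) <= clip.get(field, set())
            for field, values in wanted.items()
        ):
            matches.append(clip["title"])
    return matches


if __name__ == "__main__":
    print(find_clips(CLIPS, objects={"dog"}, sounds={"barking"}))
    print(find_clips(CLIPS, objects={"ball"}))
```

The search never inspects the video or audio data. It works only on the descriptors, which is what makes this kind of query possible where a plain BLOB would not allow it.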


Ethical, legal and security issues
Privacy means keeping some things removed from general or public knowledge.
Most Western countries now have some form of data-protection legislation.
– It aims to enable people to maintain a reasonable level of privacy.
Data-protection laws impose some restrictions upon the storage and transmission of personal information, such as:
– Anyone storing or transmitting information about you should have good and justifiable cause for doing so, and the data should be accurate.
– They should only keep the data for as long as necessary for the purpose, and they should protect the data from unauthorized access.

Ethical, legal and security issues
Ethics is defined as a set of moral principles that should guide our acts as citizens.
Here are the ten principles listed by the Computer Ethics Institute:
1. You shouldn’t use a computer to harm other people.
2. You shouldn’t interfere with other people’s computer work.
3. You shouldn’t snoop around in other people’s computer files.
4. You shouldn’t use a computer to steal.
5. You shouldn’t use a computer to bear false witness.
6. You shouldn’t copy or use proprietary software for which you have not paid.
7. You shouldn’t use other people’s computer resources without authorization or proper compensation.
8. You shouldn’t appropriate other people’s intellectual output.
9. You should think about the social consequences of the program you are writing or the system you are designing.
10. You should always use a computer in ways that ensure consideration and respect for your fellow humans.


Ethical, legal and security issues
Web pages often contain links to web pages developed by other users, for example gateways.
Example: Consider an internet newspaper site which contains links to individual stories stored at other online newspaper sites. What should the ethical position be on this?
– It could be regarded as an example of intellectual property theft.
– It could be argued that the material has not been stolen because the text has not been cut and pasted but simply linked to.

Ethical, legal and security issues
Legislation to protect data, and in particular computerized data, is desirable. However, the law itself is never sufficient.
– There are lots of intruders who do not respect the law.
Once you link your computer to the internet, you need to think about ways of making it less accessible to unwanted visitors (hackers).
Two possible solutions are:
– Allow access using a password only.
– Secure a whole network of computers from unauthorized outside access by using a firewall (a software system which controls data traffic entering and leaving the network).
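To give a feel for how a firewall controls traffic entering and leaving a network, here is a toy sketch of rule-based filtering. The `Rule` class, the rule list and the function `is_allowed` are hypothetical names invented for this illustration; real firewalls examine much more than an address and a port.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    direction: str        # "in" for incoming traffic, "out" for outgoing
    network: str          # remote addresses the rule applies to, in CIDR form
    port: Optional[int]   # None means "any port"
    allow: bool           # True to let the traffic through, False to block it


# Hypothetical policy: web traffic from anywhere, remote login only from the
# local network, and any outgoing traffic.
RULES = [
    Rule("in", "0.0.0.0/0", 443, True),
    Rule("in", "192.168.1.0/24", 22, True),
    Rule("out", "0.0.0.0/0", None, True),
]


def is_allowed(direction, remote_ip, port, rules=RULES):
    """Apply the first matching rule; block anything that matches no rule."""
    for rule in rules:
        if rule.direction != direction:
            continue
        if ip_address(remote_ip) not in ip_network(rule.network):
            continue
        if rule.port is not None and rule.port != port:
            continue
        return rule.allow
    return False


if __name__ == "__main__":
    print(is_allowed("in", "203.0.113.5", 443))
    print(is_allowed("in", "203.0.113.5", 22))
```

Blocking by default when no rule matches is a design choice in this sketch: anything that has not been explicitly permitted is kept out.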


Ethical, legal and security issues
Ownership and rights over data
The concept of data ownership is legally very unclear in most countries.
– Who actually ‘owns’ medical records (patients, doctors, pharmacists or health departments)?
– Different countries have different opinions!
Currently, information technologies, particularly those that enable copying and reuse, are at the forefront of discussions about intellectual property rights (the right to gain financially from the products one creates) and moral rights (the right to say how one’s products can be used).
– For example, you have rights over the content of letters that you write, and even the content of assignments that you prepare for this course.
– The right remains with you, the author, not with the recipient.
– So if you send a letter to someone, that person cannot legally publish the contents of the letter unless they comply with relevant laws.
Copyright laws afford some sort of protection for intellectual property (databases are subject to copyright).

Ethical, legal and security issues
The security of computer systems can be compromised by attacks from worms, viruses and Trojan horses. Even junk mail can be seen as a breach of privacy.
Junk mail consists of unsolicited email messages which are of no interest to you.
– Forwarded emails and emails sent to many addressees spread email addresses into the public domain, where they become targets for junk messages.
A worm is a program intended to subvert a whole network of computers. It transfers copies of itself to other machines on the network.
A virus is a program designed to cause specific damage to your software by attaching itself to documents (for example, it may run macros which damage your software or even your operating system, or delete large numbers of documents from your hard disk).
A Trojan horse is code which looks legitimate but attempts to do something quite different (modify documents on your hard disk, collect passwords…). Typically the name of the document will be misleading.
Anti-virus software will protect your system but needs regular updates because new viruses appear on a daily basis.


Unit Summary
In this unit you learned about:
– The various types of storage devices which are suitable for different purposes.
– The internet and its services, such as the web and email, and how data is transmitted between computers on the internet according to protocols.
– Databases and how they are used to organize large collections of data in a structured fashion to provide ease of access, backup facilities and secure access.
– Metadata and its importance in providing access to multimedia databases.
– Legal and ethical frameworks which place controls over what data may be stored, by whom, and for what purposes.
– Other measures that are needed to protect data from malicious applications such as viruses.