
Video Over an ATM

Desk Area Network

Brendan Behan

Supervised By Dr Mark F. Schulz

The University of Queensland

Department of Electrical and Computer Systems

Engineering

Undergraduate Thesis 1996


20/44 Brisbane St,

TOOWONG QLD 4066

1st November, 1996

The Dean,

Faculty of Engineering,

The University of Queensland,

ST LUCIA QLD 4072

Dear Professor Simmons,

In partial fulfilment of the requirements of the Bachelor of Engineering Degree (Honours) in

Electrical Engineering (Computer Systems), I submit for evaluation the thesis entitled

“Video Over an ATM Desk Area Network”.

Yours faithfully,

Brendan Behan.


Acknowledgments

I would like to thank the other two members of the UQ DAN project team, David Gregory

and Thillainathan (Ted) Aravinthan for their support in realising an operational Desk Area

Network. The eternal optimism of David Gregory was always inspiring at times when the

situation appeared bleak.

Thanks must also go to Dr Mark Schulz for providing the means to allow the UQ DAN team

to pursue this topic, and to Len Payne for his guidance throughout the year and the use of

his equipment.

Finally, I am indebted to Guillan Fava and Kieran Behan for their patience and willingness

to preview this document, and the suggestions that they made.

B.B.


Abstract

The use of multimedia data in computing applications has increased exponentially since its

emergence. In many cases, the increase is such that traditional computers struggle to handle the associated volume of data. This thesis attempts to show, through demonstration, that such problems can be alleviated by basing the design of the computer around a high speed network.

The Desk Area Network (DAN) is a physically small, yet fast, Asynchronous Transfer Mode (ATM) network that can be used to replace the bus of a traditional workstation. Such a replacement greatly improves the system’s ability to handle multimedia data as it removes the bottleneck associated with a bus architecture. ATM was chosen as the

network technology as it is capable of handling a variety of data types, and it allows

concurrent data transfers between devices.

This thesis outlines the design and implementation of a display node for the University of

Queensland Desk Area Network. The display node is one of three devices in the UQ DAN

project. Its purpose is to display full motion, colour video that is placed onto the network

by the UQ DAN’s camera node. A Peripheral Component Interconnect (PCI) bus interface

was constructed to allow this video signal to be displayed on a computer screen.


Table of Contents

1. Introduction
2. ATM and the DAN
2.1 Synchronous Communication
2.2 Asynchronous Communication
2.3 Asynchronous Transfer Mode
2.3.1 ATM Protocol Specifics
2.3.2 Categorisation of Service Requirements
2.4 Desk Area Network
2.4.1 The Cambridge DAN
2.4.2 The VuNet Desk Area Network
2.4.3 DAN Simplifications
3. The University of Queensland DAN
3.1 UQ DAN’s ATM Protocol
3.2 The OSI Model
3.3 The DAN Physical Layer
3.4 The DAN Data Link Layer
3.5 Operation of the UQ DAN
4. The Display Node
4.1 Display Node Requirements
4.2 Implementation Technology
4.3 Receiver Structure
4.3.1 Processing Control Cells
4.3.2 Pixel Address Calculations
4.3.3 Processing Data Cells
4.4 The Transmitter’s Structure
5. The PCI Interface
5.1 Computer Buses
5.2 The PCI Bus
5.3 The UQ DAN’s PCI Interface
5.3.1 Address Decoding
5.3.2 Configuration Registers
5.3.3 Software Support
6. Results
6.1 Display Node Implementation
6.1.1 Board Implementation
6.2 The Physical Layer’s Performance
6.3 PCI Interface Performance
6.3.1 Average Grant Latency
6.3.2 Response Time of Video Card
6.4 Receiver Performance
6.4.1 Maximum Achievable Resolution
6.5 UQ DAN’s Performance
7. Conclusions
8. References
9. Bibliography
10. Appendix A
11. Appendix B
12. Appendix C
13. Appendix D
14. Appendix E


List of Figures

Figure 1.1 A Traditional Bus System and the Desk Area Network
Figure 1.2 The Desk Area Network Developed at UQ
Figure 2.1 Division of Transmission Time as Frames and Timeslots
Figure 2.2 ATM Cell Format
Figure 2.3 The Cambridge Desk Area Network
Figure 2.4 The VuNet Switch and Operational DAN
Figure 3.1 The University of Queensland DAN
Figure 3.2 The OSI Seven Layer Model and ATM
Figure 3.3 Physical Layer Port Connections
Figure 4.1 Overview of the Display Node’s Structure
Figure 4.2 Comparison of Maximum Bus Throughputs
Figure 4.3 Flowchart of Receiver’s Operation
Figure 4.4 Format of Control Cell Information
Figure 4.5 Pixel Address Calculation
Figure 4.6 The Transmit State Machine
Figure 5.1 Configuration of Devices on the PCI Bus
Figure 5.2 The Required and Optional PCI Signals
Figure 5.3 Timing of Interface Signals in a Write Transaction
Figure 5.4 The Display Node’s Datapath
Figure 5.5 Timing of Address Decoding
Figure 5.6 Defined PCI Configuration Registers
Figure 6.1 The UQ DAN’s Display Node
Figure 6.2 The Wire-Wrapping of Pin Connections
Figure 6.3 Measured Response of the Video Card
Figure 6.4 The UQ DAN’s Operation on Demonstration Day, 1996
Figure 6.5 UQ DAN’s Interface in Patchwork Mode


Table of Nomenclature

AAL ATM Adaptation Layer

ATM Asynchronous Transfer Mode

BIOS Basic Input Output System

BISDN Broadband Integrated Services Digital Network

CLP Cell Loss Priority

CMOS Complementary Metal Oxide Semiconductor

CPU Central Processing Unit

CRC Cyclic Redundancy Check

DAN Desk Area Network

EPLD Electronically Programmable Logic Device

FIFO First-In First-Out

HEC Header Error Check

I/O Input / Output

ISA Industry Standard Architecture

ISDN Integrated Services Digital Network

Mb/s Megabits per second - 1,000,000 bits/sec

MIT Massachusetts Institute of Technology

OSI Open Systems Interconnect

PCI Peripheral Component Interconnect

PQFP Plastic Quad Flat Pack

PTI Payload Type Identifier

QoS Quality of Service

STM Synchronous Transfer Mode

UNI User-Network Interface

UQ University of Queensland

VCI Virtual Circuit Identifier

VESA Video Electronics Standards Association

VPI Virtual Path Identifier


Chapter One

1. Introduction

‘Multimedia’ is one of the latest buzz words in the computing field, and its arrival has

brought about a whole new generation of computing applications. Day to day computing is

being revolutionised by the accessibility of the internet and the emergence of multimedia

applications that are able to bring the internet to life. Video-conferencing, interactive video

and internet phone systems are all examples of multimedia applications that enable people to

communicate with distant acquaintances, or increase their knowledge through the use of

their computer. However, even though the types of data that people are processing have

changed dramatically over the past decade, the computers used to perform this processing

have remained the same.

This thesis investigates a novel computer architecture that is destined to make as big an

impact on traditional computer design as multimedia has on the computer industry in

general. The architecture in question is known as the ‘Desk Area Network’, or DAN for

short. The Desk Area Network was born out of the necessity to build a computer that was

better equipped to handle multimedia data. Conventional computers strain under the load

of having to display a full motion video signal and play the accompanying audio necessary

to take part in a video conference. The DAN, however, is able to handle both of these data

streams with ease, and still dedicate all of its processing power to other applications.

Figure 1.1 A Traditional Bus System and the Desk Area Network (a CPU, main memory, video card and network interface sharing a single bus)

The key to the DAN’s success is that it replaces the ‘bus’ architecture of a traditional

computer with a high-speed network. Figure 1.1 shows the configuration of both systems

and highlights the limitation that bus architectures incur: the system bus can

only be used by one device at a time. In the example illustrated, the CPU isn’t able to fetch

data from memory while a video image is coming from the network to the video card.

In the Desk Area Network, devices are effectively removed from within the computer and

connected directly to a network that allows simultaneous communication between multiple

devices. Removing the computer’s bus consequently removes the single biggest bottleneck

that faces the movement of multimedia data in the computer. The choice of network

technology to implement the DAN is critical to the system’s performance as not all

networks are capable of allowing simultaneous communication.

The Desk Area Network constructed uses an Asynchronous Transfer Mode (ATM)

network to connect individual devices. ATM is a high speed network protocol that is

increasing in popularity worldwide due to its acceptance as the basis of the next generation

of telecommunication networks by the International Telecommunications Union (ITU).

Networks that implement the ATM protocol receive the benefits of its ability to handle

many different forms of data very efficiently. Also, the use of ATM switching technology

to connect devices together provides the simultaneous communication between devices

sought, a feat which cannot be achieved in bus systems or on shared-medium networks like

Ethernet.

To demonstrate the concept of a Desk Area Network, an ATM network was developed that

incorporated a camera node (Gregory96) to source a form of multimedia data, an ATM

switch (Aravinthan96) to implement the ATM network, and a display node to receive the

multimedia data and display it to the users. The interconnection of these devices is

illustrated in Figure 1.2. This thesis discusses the design, implementation and resulting

performance of the DAN’s display node and the physical transmission system of the ATM

network. The display node was required to accept the digital video from the camera and

display it to the users without any CPU intervention. This goal was achieved by designing


and implementing a PCI bus interface to the PC through which the display node could write

the video data.

Thesis Structure

Some background information essential to understanding the concept of the DAN and the

design decisions made during the course of its implementation is provided in Chapter Two.

This includes basic theory on the manner in which ATM operates and a discussion of

existing DAN implementations.

The third chapter presents a detailed description of the Desk Area Network implementation

at the University of Queensland, and the simplifications that could be made due to the

nature of the DAN. The principle of layering network protocols is then discussed before

providing a detailed analysis of the design of the ATM network’s physical layer. Following

this is a brief description of the data link layer, and a discussion of the network’s operation.

Chapter Four of this thesis covers the design of the DAN’s display node. In particular, it

provides an overview of the different components involved in the display node’s operation,

and a closer examination of two vital components: the ATM cell receiver and the cell

transmitter. It also describes the interaction of these components with the computer

interface.

Figure 1.2 The Desk Area Network Developed at UQ (camera node, ATM switch and display node)

The computer interface in the UQ DAN’s display node was achieved via the Peripheral Component Interconnect (PCI) bus. The fifth chapter of this document is dedicated to a basic discussion of the operation of this bus architecture and the reasons for its selection as the computer interface for the display node. This is followed by the design choices made in implementing the PCI interface for the DAN.

The performance metrics of individual components of the display node are stated in Chapter

Six, along with a summary of the performance of the DAN as a whole. The final chapter

provides a summary of the design project and the work remaining before a ‘complete’ DAN

is realised.


Chapter Two

2. ATM and the DAN

This chapter begins with a discussion of synchronous and asynchronous communication

networks, highlighting the differences between the two and the advantages offered by an

asynchronous system. The ATM protocol is then described and its ability to handle

multimedia traffic streams noted. Finally, the concept of a Desk Area Network is explored

and existing implementations are outlined.

2.1 Synchronous Communication

The telecommunication networks that form a ubiquitous part of today’s society are based

on synchronous transmission systems, which were designed to handle voice grade data optimally. Synchronous networks take advantage of the constant bit

rate of audio signals and the large delays between transmissions, to maximise the utilisation

of a communication link. The transmission time of a connection link is divided into fixed

size frames that are composed of a fixed number of equal sized timeslots, as shown in Figure

2.1. When a user establishes a connection, that user is allocated a time slot within the frame

in which data can be transmitted. This timeslot is reserved for the user in every frame that is

transmitted, for the duration of the connection. This enables the network to determine

which connection is currently transmitting by examining the timeslot in the current frame.

Consequently, the time allocation is distributed to users in a round-robin fashion, and is

‘synchronised’ for each user to a position in the transmitted frame.

Figure 2.1 Division of Transmission Time as Frames and Timeslots (each frame contains N equal timeslots; a user’s slot recurs at the same position in every frame)


This system is extremely efficient when handling voice grade signals due to their constant

bandwidth requirements. Problems arise, however, when less traditional data transmissions

such as file transfers are also using the system. A file transfer is very ‘bursty’ in its

bandwidth requirements. When the file is ready to transfer it will transfer at a very high

rate, but it then may experience a considerable delay before it is ready to transmit the next

file block. To allow the synchronous network to cater for the burst rates of the file transfer,

the transfer may be allocated several timeslots in the frame. The problem that synchronous

networks encounter is that in the idle periods between transfers, the file transfer is still

allocated its quota of timeslots in every frame. These timeslots must then go unused as the

network is not able to allocate them to other users. Asynchronous communication systems

were devised to overcome this inefficient use of bandwidth.
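The inefficiency described above can be quantified with a small calculation. The following Python sketch is illustrative only; the function name and the assumption that all non-reserved slots are fully used are not from the thesis:

```python
def sync_utilisation(slots_per_frame, reserved, duty_cycle):
    """Fraction of timeslots carrying data when `reserved` slots are held
    by a bursty source that transmits in only `duty_cycle` of frames, and
    the remaining slots are fully used (an illustrative assumption)."""
    busy = (slots_per_frame - reserved) + reserved * duty_cycle
    return busy / slots_per_frame

# A file transfer holding 4 of 10 slots but bursting 10% of the time
# leaves 36% of the link capacity idle:
assert abs(sync_utilisation(10, 4, 0.10) - 0.64) < 1e-9
```

The reserved-but-idle slots are exactly the bandwidth that asynchronous allocation recovers.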

2.2 Asynchronous Communication

Asynchronous communication systems were developed to overcome the inadequacies of

traditional telecommunication networks. Such systems dispense with the frame structure

that inhibits transmissions in synchronous networks, and allocate timeslots to users on an ‘as

required’ basis. This approach ensures that no timeslots are wasted due to their allocation

to data sources during periods in which the data source is not prepared for transmission.

Another limitation of synchronous networks is that the maximum number of connections

that the network can support is equal to the number of timeslots in the frame. In this

situation, one timeslot is reserved for each connection and the network is saturated. With

asynchronous networks, however, it is possible to support more connections than an

equivalent synchronous network. This is accomplished through a statistical analysis of the

data streams and use of the assumption that not all bursty data sources will burst at the

same time. Additional buffering must be added to the network to allow for the contingency

when multiple sources do burst at the same time. The network utilisation is obviously much

higher in the asynchronous network, making it a far more efficient communication system.

A consequence of the system’s asynchronous nature, however, is that it is no longer

possible to use the time that data was transmitted to determine the connection to which that

data belongs. For this reason, a ‘tag’ is attached to all data sent through the network to


identify its respective connection. The network is responsible for associating the

connection’s source and destination to its tag when the connection is established.
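The tagging scheme described above can be sketched as follows. This is an illustrative Python model, not part of the UQ DAN implementation; all class and method names are invented for the example:

```python
class Network:
    """Toy asynchronous network: tags are assigned at connection setup,
    and each data unit carries its tag, so arrival time no longer
    identifies the connection."""

    def __init__(self):
        self.next_tag = 0
        self.routes = {}          # tag -> (source, destination)

    def connect(self, source, destination):
        tag = self.next_tag
        self.next_tag += 1
        self.routes[tag] = (source, destination)
        return tag

    def deliver(self, tag, data):
        # The network recovers the endpoints from the tag alone.
        source, destination = self.routes[tag]
        return destination, data

net = Network()
t1 = net.connect("camera", "display")
t2 = net.connect("disk", "memory")
# Data units from different connections may be interleaved arbitrarily:
assert net.deliver(t2, b"block0")[0] == "memory"
assert net.deliver(t1, b"frame0")[0] == "display"
```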

2.3 Asynchronous Transfer Mode

Asynchronous Transfer Mode, or ATM, is a communication protocol that incorporates the

features of asynchronous networks. The number of ATM network implementations is

increasing rapidly due to its acceptance for the Broadband Integrated Services Digital

Network (BISDN) by the international telecommunications standards body, the ITU,

previously known as CCITT. BISDN is a high bandwidth digital network that is being

established as the major network standard of the future for carrying, not only voice data,

but also computer and High Definition TV traffic.

The basic unit of information exchange in ATM is referred to as the ‘cell’. An ATM cell is

a 53 byte packet of data as shown in Figure 2.2. Of the 53 bytes, the first five bytes

comprise the tag field known as the ‘header’, and the remaining 48 bytes constitute the data

being sent in the cell, referred to as the cell’s ‘payload’. It is the small cell size that permits

ATM to function effectively with a variety of different data types. This is because a small

cell size allows the network to multiplex the data with a fine granularity. This ensures that

urgent data won’t be stalled in the network waiting for a large block of file transfer to finish

transmitting, effectively allowing a priority scheme to be established.

2.3.1 ATM Protocol Specifics

ATM is a connection oriented protocol that guarantees in-order delivery of cells. Delivery

is unreliable, however, so additional hardware or software must be employed if guaranteed delivery is required.

standardised (ATM95) by the ATM Forum and many of the fields contained in the header

remain ill defined. The format of the cell is shown in Figure 2.2. The tag field used is split

into two components: the Virtual Path Identifier (VPI) and the Virtual Circuit Identifier

(VCI). The VPI field in the header is used to determine the virtual path of a cell between

two endpoints in the network. This value is only valid between two ATM switches in the

network and is remapped by each switch before being sent on. The virtual circuit identifier


is used by the endpoints to determine the connection to which the cell belongs. Many VCIs

can be multiplexed onto the same virtual path through the network.
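The header layout of Figure 2.2 can be expressed as a pair of packing routines. The following Python sketch assumes the standard UNI field widths (GFC 4 bits, VPI 8, VCI 16, PTI 3, CLP 1, HEC 8); the function names are illustrative:

```python
def pack_header(gfc, vpi, vci, pti, clp, hec=0):
    """Pack the five header bytes of a UNI ATM cell."""
    return bytes([
        (gfc << 4) | (vpi >> 4),                  # byte 1: GFC, VPI[7:4]
        ((vpi & 0x0F) << 4) | (vci >> 12),        # byte 2: VPI[3:0], VCI[15:12]
        (vci >> 4) & 0xFF,                        # byte 3: VCI[11:4]
        ((vci & 0x0F) << 4) | (pti << 1) | clp,   # byte 4: VCI[3:0], PTI, CLP
        hec,                                      # byte 5: header error check
    ])

def unpack_header(h):
    gfc = h[0] >> 4
    vpi = ((h[0] & 0x0F) << 4) | (h[1] >> 4)
    vci = ((h[1] & 0x0F) << 12) | (h[2] << 4) | (h[3] >> 4)
    pti = (h[3] >> 1) & 0x07
    clp = h[3] & 0x01
    return gfc, vpi, vci, pti, clp, h[4]

hdr = pack_header(gfc=0, vpi=5, vci=0x20, pti=0, clp=1)
assert unpack_header(hdr) == (0, 5, 0x20, 0, 1, 0)
```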

The Generic Flow Control (GFC) bits are undefined and must be set to zero. A Cell Loss

Priority (CLP) bit is included to provide information to determine which cells should be

discarded when the network becomes congested. This bit is cleared for higher priority cells.

The fifth byte in the header is the Header Error Check (HEC) which contains an eight bit

CRC check of the header to attempt to detect bit errors in the routing information.
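The HEC calculation can be sketched as a bit-serial CRC-8. This assumes the ITU-T I.432 convention (generator x^8 + x^2 + x + 1, remainder XORed with 0x55 before transmission), which the text above does not spell out:

```python
def hec(header4):
    """CRC-8 over the first four header bytes, generator polynomial 0x07
    (x^8 + x^2 + x + 1); per ITU-T I.432 the remainder is XORed with 0x55."""
    crc = 0
    for byte in header4:
        crc ^= byte
        for _ in range(8):
            # Shift the CRC register, folding in the generator on overflow.
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc ^ 0x55

assert hec(bytes(4)) == 0x55   # all-zero header: remainder 0, coset 0x55
```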

The final three bit field is the Payload Type Identifier (PTI). Of these three bits, the first bit

is set to indicate whether the cell contains control information, or cleared when the payload

contains standard data. The second bit in this field is reserved (set to zero) and the final bit

is used for the ATM Adaptation Layer (AAL). As all data must be encapsulated into the 48

byte payload of a cell for transmission, a system must be incorporated to handle the

Segmentation and Reassembly (SAR). The ATM Adaptation Layer achieves this purpose.

Figure 2.2 ATM Cell Format (bytes 1 to 5 form the cell header, carrying the GFC, VPI, VCI, PTI, CLP and HEC fields; bytes 6 to 53 carry the 48 byte cell payload)

The significance of the AAL bit depends upon which of the many adaptation layers defined is being used. The AAL5 adaptation layer standard is regarded as being the simplest and most efficient adaptation layer to implement (Greaves94). In this standard, the AAL bit is

set to indicate that the cell received is the final cell in a block of data. This final cell

contains a count of the number of cells transmitted in the AAL5 block and a 32 bit Cyclic

Redundancy Check (CRC) of all the cells in the block. The cell count is provided to

determine whether the correct number of cells was received from the network and the CRC

is used to detect bit errors and cell reordering errors.
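A receiver-side check in the spirit of this scheme can be sketched as follows. The sketch follows the cell-count description given above (standard AAL5 carries a byte length instead) and uses Python’s zlib CRC-32 purely for illustration:

```python
import zlib

def check_block(cells, expected_count, expected_crc):
    """Simplified AAL5-style reassembly check: the final cell is assumed
    to have supplied a cell count and a 32-bit CRC over the block."""
    if len(cells) != expected_count:
        return False                          # cell loss or misinsertion
    data = b"".join(cells)
    return zlib.crc32(data) == expected_crc   # bit errors, reordering

cells = [bytes([i + 1]) * 48 for i in range(3)]   # three 48 byte payloads
crc = zlib.crc32(b"".join(cells))
assert check_block(cells, 3, crc)
assert not check_block(cells[:2], 3, crc)         # a lost cell is caught by the count
corrupt = cells[:2] + [cells[2][:-1] + b"\x00"]   # single-byte error in the last cell
assert not check_block(corrupt, 3, crc)           # caught by the CRC
```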

2.3.2 Categorisation of Service Requirements

There are typically two dimensions used to classify the service requirements of different

types of data transfers. These are the data’s sensitivity to delay and its sensitivity to loss.

Audio or video communication streams, for example, can tolerate the occasional cell being

lost while travelling through the network but cannot tolerate variable length delays in

transmission. This is because there is a low likelihood of an observer noticing the slight

deterioration in sound or video quality when a cell is lost, but a high probability of user

annoyance when sound segments are delayed by varying amounts. Such data streams are

also termed ‘jitter sensitive’. File transfers, however, are unaffected by delays through the

system but the loss of a single cell can render the received file useless.

ATM is capable of catering for all data types by providing a ‘Quality of Service’ (QoS)

parameter that can be adjusted for different transmissions. The Quality of Service

parameter is akin to a contract that is negotiated between the user and the network when a

connection is established. The user must specify details about the connection’s destination,

peak and average bandwidth requirements, and cell loss and delay requirements at this time.

If the network can provide the service quality that the user requires then the connection is

established, otherwise negotiations either continue for a lesser connection or no network

connection is established at all. The QoS guarantees that ATM can provide to many

different traffic types is one of the primary reasons why ATM is ideal for use on the Desk

Area Network.
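The negotiation described above amounts to a simple admission decision. The following toy Python sketch (the field names and single-resource model are illustrative assumptions, not the thesis’s design) shows the accept, degrade and reject outcomes:

```python
def admit(request, available_bw):
    """Toy admission control in the spirit of a QoS 'contract': accept if
    the peak bandwidth fits, otherwise counter-offer the average rate,
    otherwise reject the connection."""
    peak, average = request["peak"], request["average"]
    if peak <= available_bw:
        return ("accepted", peak)
    if average <= available_bw:
        return ("degraded", average)     # negotiate a lesser connection
    return ("rejected", 0)

assert admit({"peak": 25, "average": 8}, 100) == ("accepted", 25)
assert admit({"peak": 25, "average": 8}, 10) == ("degraded", 8)
assert admit({"peak": 25, "average": 8}, 5) == ("rejected", 0)
```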

2.4 Desk Area Network

The Desk Area Network is a multimedia network intended to replace the bus in a traditional

workstation. Traditional buses have the limitation that they can only be used by a single


device at any time. This limitation becomes far more apparent when dealing with

continuous multimedia streams such as video or audio. Such data streams require very little

or no processing but require large amounts of data transfer and, consequently, are termed ‘I/O intensive’. Attempts have been made to create ‘multimedia network interfaces’ (Blair93, Hopper90) that offload the bulk of this traffic from the workstation’s bus.

The specialised network interfaces developed take the form of autonomous peripherals that

must be connected directly to the network. The DAN architecture avoids the necessity of

additional multimedia network interfaces by removing the I/O bottleneck that the traditional

workstation bus imposes.

Implementing a Desk Area Network involves removing all components from the

workstation and connecting them directly to the ATM network. That is, the workstation’s

CPU, memory, display, storage devices, and any multimedia peripherals are all removed

from the workstation and become nodes on the DAN. The term ‘multimedia peripheral’

refers to any device, such as a camera, that can source or sink continuous multimedia data.

By connecting these devices directly to the ATM switch, different pairs of devices are able

to communicate with each other simultaneously. This means that the switch can route a

video stream directly from the camera to the display at the same time as sending a file from

the storage device to memory. It also means that I/O intensive streams that overwhelm the

traditional workstation can now be moved between endpoints on the network without any

CPU intervention. Two different DAN implementations exist to demonstrate the potential

of this architecture.

2.4.1 The Cambridge DAN

The configuration of the Cambridge Desk Area Network (Barham95) is illustrated in Figure

2.3. This figure clearly shows the manner in which different streams can pass through the

switch simultaneously. Each interface to the ATM switch was capable of supporting full

duplex data transfers at 100Mb/s. With this data rate, the DAN was able to support two

live video streams, one 48kHz audio stream, one transfer of a processed image and a

considerable amount of memory traffic. It was also recognised that, for the DAN to be a

feasible approach to implementing a multimedia workstation, the cost of the network

interface for each device must not be substantially greater than the cost of producing an


equivalent bus interface. Cambridge’s implementation of the DAN proved this to be the

case.

The DAN research group categorised the devices attached to the DAN into one of three

classes depending upon their ability to perform network functions. The three classifications

used are: dumb nodes, supervised nodes and smart nodes. Dumb nodes are only capable of

handling data and loading internal configuration registers from a well defined configuration

cell. Supervised nodes possess a certain degree of local processing power to enable them to

perform limited network operations. These nodes still require a more complete processing

node to establish connections in the network. The third class, the smart nodes, have sufficient processing power to perform the network management functions not only for themselves, but also for dumb and supervised nodes. All data and network management operations on the DAN are performed predominantly in hardware.

Figure 2.3 The Cambridge Desk Area Network

2.4.2 The VuNet Desk Area Network

VuNet is the DAN implementation by a research group at Massachusetts Institute of

Technology (MIT). Although the VuNet’s network architecture is equivalent to that of the

Cambridge DAN where multimedia devices are connected directly to an ATM network, the

implementation differs. The VuNet Desk Area Network implements as much functionality

as possible in software (Houth95). This approach was taken to allow the performance of

the network to increase with time, as the performance of future generations of workstations

increases. The ATM switch used for the VuNet project is shown in Figure 2.4, along with the DAN created by inserting multimedia devices into this switch.

Figure 2.4 The VuNet Switch and Operational DAN

Although the VuNet’s performance will increase with upgrades in processor performance,

the software approach has yielded throughputs far below design specifications. Each

network interface in this DAN implementation was capable of 500 Mb/s full duplex data

transfers, five times that of the Cambridge DAN. The sustained throughput of data

achieved during testing was reduced to 37 Mb/s due to software contention for resources

and process switching. This throughput is less than that achieved using the specialised

network interfaces of the Pandora system (Hopper90).


2.4.3 DAN Simplifications

The two DAN implementations described utilised certain simplifying assumptions to aid the

implementation. The first of these was that the Desk Area Network occupied a small

physical space. Consequently, it was feasible to transmit the network data in parallel, a

solution that is typically too expensive to implement in larger networks. The Cambridge

DAN used eight bit wide data paths and the VuNet either 32 or 64 bit data paths. Parallel

transmission of data reduces the network clocking frequency required to achieve a certain

throughput by the degree of parallelism implemented, and results in simpler and cheaper

network interfaces.

A second simplification was that all devices connected to the DAN can be trusted not to

behave to the detriment of the DAN, just as any device connected to the bus of a

workstation can be assumed to be non-hostile. This greatly reduces the amount of security

precautions required when compared to a similar Local Area Network. When this assumption is combined with one final assumption, that the network topology is static for any period of

operation, a range of services from access control and fairness policies to congestion

control and topology discovery can be ignored. The majority of these network operations

are sufficiently complex to require a microprocessor to be included on the interface.

Reductions in the complexity and production costs of the network interfaces for the devices

can then be achieved by not supporting these operations.


Chapter Three

3. The University of Queensland DAN

The Desk Area Network being implemented at the University of Queensland is

architecturally much simpler than that at either Cambridge or MIT. To demonstrate the

benefits of a Desk Area Network, a four port switch was developed along with a source and

sink of video data. A diagrammatic overview of this system is given in Figure 3.1. The

ATM Network developed was capable of 100 Mb/s full duplex communication.

Discussed first in this chapter are the deviations from ATM specifications of the UQ DAN’s

ATM protocol. The concept of layered communications protocols is then explained, as well

as how ATM fits into this picture. The UQ DAN’s physical and data link layers are then

described before the UQ DAN’s operation is explained.

[Figure: a camera node and a display node connected to ports of the ATM switch.]

Figure 3.1 The University of Queensland DAN

3.1 UQ DAN’s ATM Protocol

Using the Desk Area Network assumption of reliability, the ATM protocol used on the UQ

DAN was modified to use 52 byte cells. This was achieved by removing the Header Error

Check byte from the cell header. This feature is computationally expensive and extremely

difficult to implement in hardware. It is also unnecessary if it is assumed that no errors will be

encountered in data transmissions through the network.
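The resulting 52 byte cell can be sketched as follows. This is an illustrative Python fragment, not part of the thesis implementation; the helper name build_cell is hypothetical, and the header layout is the standard four ATM header bytes with the HEC byte omitted.

```python
def build_cell(vpi, vci, pt=0, clp=0, payload=b""):
    """Assemble a 52 byte UQ DAN cell: a UNI-style header without the
    HEC byte, followed by a 48 byte payload (zero-padded if short)."""
    if len(payload) > 48:
        raise ValueError("payload exceeds 48 bytes")
    gfc = 0
    header = bytes([
        (gfc << 4) | (vpi >> 4),                 # GFC + high VPI bits
        ((vpi & 0x0F) << 4) | (vci >> 12),       # low VPI bits + high VCI bits
        (vci >> 4) & 0xFF,                       # middle VCI bits
        ((vci & 0x0F) << 4) | (pt << 1) | clp,   # low VCI bits + PT + CLP
    ])
    return header + payload.ljust(48, b"\x00")

cell = build_cell(vpi=2, vci=1, payload=b"video")
assert len(cell) == 52                           # 4 header + 48 payload bytes
```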

A second simplification was made to ease routing. Whereas a traditional ATM network

requires dynamic connection establishment and Quality of Service negotiation, the DAN


was capable of assigning these parameters statically. The static connection requirement is

analogous to the traditional computer bus in that, if a sound card is installed when the

computer is booted, it can be assumed that the card will be present for the duration of the

computer’s operation. Similarly, any device connected to a port of the switch can be assumed to stay connected for the duration of the network’s operation. Addressing of

devices in the network was then simplified by using the number of the switch port that connects to the destination device as the VPI in the cell header. Also, every device

connected to the DAN has a very well defined bandwidth requirement. The QoS available

to a device will subsequently not change during the network’s operation since new devices

cannot be added during the DAN’s operation and the current traffic patterns are invariant.
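With static addressing, the routing rule reduces to reading the VPI out of the header. The sketch below is a hedged restatement (the function name and four-port default are assumptions, not part of the UQ DAN design documents):

```python
def route(cell, num_ports=4):
    """Return the output port for a 52 byte cell. The eight-bit VPI spans
    the low four bits of header byte 0 and the high four bits of byte 1."""
    vpi = ((cell[0] & 0x0F) << 4) | (cell[1] >> 4)
    if vpi >= num_ports:
        raise ValueError("VPI does not match any switch port")
    return vpi
```

Because the VPI is the port number, the switch needs no translation table at all; a cell addressed to port 3 is forwarded there directly.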

3.2 The OSI Model

The OSI (Open Systems Interconnection) model, developed by the International Organization for Standardization (ISO), provides a theoretical basis for the construction of a communication protocol.

It uses a seven layered system where each layer provides a slightly greater level of

abstraction than the underlying layer. The layers of this model are shown in Figure 3.2.

Also shown in this figure is the ATM protocol which does not fit cleanly into the OSI

model. The ATM protocol best fits into the level two data link layer, but as it provides

support for end-to-end connection, flow control and routing at the cell level, it also

incorporates some of the functionality of higher level OSI layers (Ebrahim92).

The ATM Adaptation Layer sits above the ATM layer and best coincides with level three of

the OSI model as it performs message reassembly from incoming cells. The fourth layer of

the OSI model, the transport layer, is the first layer in the model that guarantees reliable

delivery of information. Since the ATM protocol does not provide this functionality,

additional support (typically software) must be added at this level.


[Figure: the OSI layers, numbered 7 (Application), 6 (Presentation), 5 (Session), 4 (Transport), 3 (Network), 2 (Data Link) and 1 (Physical), shown alongside the ATM stack of AAL, ATM and Physical layers.]

Figure 3.2 The OSI Seven Layer Model and ATM

3.3 The DAN Physical Layer

The physical layer of the OSI model represents the layer at which electrical signals are

transmitted through the network and are detected as bits by the receiver. It is at this layer

that the throughput measures are quoted in network implementations. A 100 Mb/s full

duplex network was required for the UQ DAN. This was achieved using eight bit wide data

paths for each direction of data transfer. Whereas this data rate would require at least a

100 MHz serial communication link, the parallel implementation required only a 12.5 MHz

network clock frequency to achieve the same performance.

Noise immunity is a serious consideration when transmitting information at these

frequencies. Experience with high-speed SCSI devices (Schmidt95) suggests that

shielded twisted-pair cable should be used for any external connections operating at greater

than 5 MHz. Twenty-five-pair twist-and-flat cable was consequently used for the network

transmission medium to limit the cross-talk between transmitted signals. Each twisted pair

of the cable was assigned one signal line and a ground line.

A second important consideration when dealing with high frequency communications is the

transmission line effects of the cables. These effects cause signals that travel down the cable

to be reflected back if there is a discontinuity in the resistance at the cable’s termination.

Superimposition of reflected signals with the true signal can cause errors in bit detection.

To combat these effects, a standard passive terminator was added to the end of the signal

line of each pair. This terminator involves a 220Ω pull-up resistor and a 330Ω pull down


resistor. The parallel combination of these two resistances produces a terminating

resistance of 132Ω, which is close enough to the 105Ω characteristic impedance of the

twisted pair cable to minimise reflections (Horowitz89).
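The terminator figures quoted above can be checked with a few lines of arithmetic; Python is used here purely as a calculator, and the 5 V supply is an assumption consistent with the TTL logic used.

```python
def parallel(r1, r2):
    """Parallel combination of two resistances, in ohms."""
    return r1 * r2 / (r1 + r2)

r_term = parallel(220, 330)       # 132.0 ohms presented to the line
v_idle = 5.0 * 330 / (220 + 330)  # 3.0 V idle level from the divider

assert r_term == 132.0            # matches the value quoted in the text
```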

With the design of the signal path completed, only the drivers and receivers for the network interface needed to be chosen. Having brought the network frequency back to 12.5 MHz

through the use of parallel data transmission, inexpensive 74LS series devices were suitable

for this purpose. The twisted pair cables used had a capacitance of 51.5pF/m, or

approximately 52pF for the one metre lengths between devices. The 74LS240 inverting

octal line driver was tested by the manufacturer with a load of 45pF. The rated times were

therefore approximately valid due to the similar operating conditions. The receiver was

implemented with 74LS14 hex inverting Schmitt triggers. These devices have a typical

hysteresis level of 0.8V which again improves noise immunity.

At 12.5 MHz, the network clock had a cycle time of 80ns. The line drivers had a maximum

propagation delay of 18ns measured at full load, and the receivers had a maximum

propagation delay of 22ns. Adding in the expected 8ns propagation delay through the one

metre twisted-pair cable, the maximum propagation delay of the signal between devices was

calculated at 48ns. This was still less than the clock cycle time of the network but greater than the half cycle time, which prevented the network from being clocked by a single central

clock. To avoid clock skew through the network, each port was assigned a twisted-pair

over which it would transmit its own network clock. All data coming into a receiver would

then be synchronised to the received network clock with negligible clock skew. The

network clocks transmitted by all the devices were of the same frequency but differed in

phase.
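The timing budget described above can be restated numerically; the values are taken directly from this section, and the calculation shows why a per-port clock was needed.

```python
CYCLE_NS = 1e3 / 12.5            # 80 ns network clock period at 12.5 MHz

driver, receiver, cable = 18, 22, 8   # worst-case delays in ns
total = driver + receiver + cable     # 48 ns end-to-end path delay

assert total < CYCLE_NS               # fits within a full clock cycle...
assert total > CYCLE_NS / 2           # ...but not within a half cycle,
                                      # ruling out one central clock
```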

The final design of the physical layer for both device ports and switch ports is given in

Appendix A. These port designs differed for device and switch ports to prevent the

necessity of twisting the network cables that connect the transmit port of one device to the

receive port of another device. An illustration of such a connection is shown in Figure 3.3.

A parity signal was also included with each data port to provide a limited degree of error

detection. The ‘Connect’ signal was provided on each port to power an LED when a

connection is established with another port.


[Figure: the device-port and switch-port signal connections over the 25-pair cable: the eight-bit Tx_Data and Rx_Data buses on twisted pairs 1-8 and 18-25, with the Parity, Valid, Clock and Connect signals of each direction carried on the intermediate pairs (9, 10, 11, 13, 15, 16 and 17).]

Figure 3.3 Physical Layer Port Connections

3.4 The DAN Data Link Layer

The remaining signals in the 25 pair cable were assigned for use by the data link layer,

referred to as the ATMlink layer (Gregory96). The ‘Valid’ signals used by the data link

layer were asserted by a transmitting port to allow the receiver to detect incoming cells.

The transmitter held this signal asserted for the 52 byte transmission. The receiver then

used the valid signal as an enable to latch incoming data.

The ATMlink was also responsible for providing a cell level abstraction to the higher level

components of the system. Both the inputs and outputs of each port were buffered using

receive and transmit FIFOs respectively. Placing a layer of buffering between the device

and the network allowed the devices to operate independently of the network clock. The

added buffering also enabled the ATMlink to provide the cell level abstraction by holding

the deassertion of a ‘FIFO Empty’ signal until at least one complete cell was available in the

receive FIFO. The ATMlink prevented system components from reading bytes of

incomplete cells by controlling the Enable inputs of the FIFOs.
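The ATMlink’s cell-level abstraction can be sketched behaviourally as follows. The class and property names are illustrative only, not the actual hardware interface; the point is that ‘empty’ stays asserted until a whole 52 byte cell has arrived, so a reader can never see a partial cell.

```python
CELL_SIZE = 52

class ReceiveFifo:
    """Behavioural sketch of the ATMlink receive FIFO."""

    def __init__(self):
        self._buf = []

    def write(self, byte):
        # Network side: one byte latched per network clock.
        self._buf.append(byte)

    @property
    def empty(self):
        # Held asserted until at least one complete cell is buffered.
        return len(self._buf) < CELL_SIZE

    def read_cell(self):
        # Device side: whole cells only, never partial reads.
        if self.empty:
            raise RuntimeError("no complete cell available")
        cell, self._buf = self._buf[:CELL_SIZE], self._buf[CELL_SIZE:]
        return bytes(cell)
```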


3.5 Operation of the UQ DAN

The interaction of devices on the Desk Area Network depended largely on the category in

which different devices fall. Using Cambridge’s nomenclature to classify devices in the UQ

DAN, the video node was regarded as a dumb node and the display was regarded as a smart

node. The video node’s purpose in the network was to satisfy requests for video data. All

the information required by the video node to satisfy the request was contained in the request

itself. This included network considerations such as the VPI and VCI to which the video

data was to be sent as well as video parameters such as frame rate, resolution, colour,

contrast, etc. The display node achieved its ‘smart’ status by containing enough processing

power to assemble this information to configure the video node’s operation.

The fundamental unit of video data transferred across the network was the video line. The

video line was an arbitrary length row of pixels extracted from the scan-line of the video

signal. The number of pixels contained in this line depended upon the horizontal resolution

of the requested video stream. The UQ DAN used a variation of the AAL5 standard to

perform the segmentation of scan-lines into ATM cells at the video node, and the

reassembly from cells back to scan-lines at the display node. The video node prefixed the

transmission of a scan-line with a control cell. The control cell detailed whether an odd or

even field was about to be sent and the line number in that field. It also specified the

number of data cells that comprise that scan-line. The final data cell in the scan-line block

had its AAL bit asserted to indicate that it was the final cell. A CRC of the transmitted

scan-line was not included in this cell due to the non-critical nature of the data and the

reliability assumption of the DAN.
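The segmentation scheme can be sketched as follows. Only the overall structure (one control cell, then data cells with the final one flagged) follows the text; the bit positions chosen for the control and AAL flags, and the control payload layout, are assumptions for illustration.

```python
PAYLOAD = 48
CTRL_BIT, AAL_BIT = 0x04, 0x02   # assumed positions within the PT field

def segment_scan_line(line_no, even_field, pixels):
    """Return (pt_flags, payload) tuples: a control cell announcing the
    line, then the pixel data in 48 byte chunks, last chunk AAL-flagged."""
    n_cells = -(-len(pixels) // PAYLOAD)            # ceiling division
    ctrl_payload = bytes([line_no & 0xFF, line_no >> 8,
                          1 if even_field else 0, n_cells])
    cells = [(CTRL_BIT, ctrl_payload.ljust(PAYLOAD, b"\x00"))]
    for i in range(n_cells):
        chunk = pixels[i * PAYLOAD:(i + 1) * PAYLOAD]
        pt = AAL_BIT if i == n_cells - 1 else 0     # flag the final cell
        cells.append((pt, bytes(chunk).ljust(PAYLOAD, b"\x00")))
    return cells
```

A 100 pixel line, for instance, produces one control cell followed by three data cells, with no CRC trailer, consistent with the reliability assumption above.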

In summary, for video data to be sent across the network, the display node generated a

request and sent it to the switch with the VPI set to the port number of the video node.

Upon receiving the cell, the switch examined the VPI and routed the cell to the video

node’s port for transmission. The video node then received the cell and processed the

request by setting internal registers with the parameters the cell contains. The first line of

the video source was then scanned and digitised by the video node to meet the required

parameters. A control cell specifying the line number, followed by the scan-line cells, was then sent to

the switch using the return VPI provided in the request cell. The switch examined this VPI,


which was the port number of the display node, and routed the data to the display. The

display used the control cell to calculate the pixel address of the scan-line’s start point and

latched the number of data cells per line. The video node then sent the digitised scan-line to

the display node one cell at a time, which then displayed the image on the screen.


Chapter 4

4. The Display Node

The design of the display node was segmented into several independent components. These

were the PCI interface, the ATM data-link layer, a cell receiver and a cell transmitter. Of

these, all but the data-link component were implemented as finite state machines. The

relationship between these components of the display node is shown in Figure 4.1.

[Figure: the display node, containing the ATMlink, cell receiver, cell transmitter and PCI interface, positioned between the ATM network and the PCI bus.]

Figure 4.1 Overview of the Display Node’s Structure

4.1 Display Node Requirements

The DAN display node’s function was to accept the digitised images from the video node

and display them to the user. Being a ‘smart’ node, it was also responsible for configuring

other devices in the network. Two requirements of the display node’s design were that it

must be able to sink data at the network rate of 100Mb/s and that it must be able to display

the images without any CPU intervention.

Rather than attempting to implement a stand-alone display controller, the display node was

designed to use the monitor of a PC. The required data rate of the display node limited the

computer interface options to either a Peripheral Component Interconnect (PCI) or VESA

Local (VL) bus. Figure 4.2 provides a graphical illustration of the maximum throughputs

that can be achieved using different bus architectures. A PCI interface was chosen, as the


PCI bus was a truly platform and processor independent bus. This then allowed the

creation of a display node that was capable of operating on a greater number of systems.

CPU independent operation could be achieved using this bus due to its bus mastering

capability. Bus mastering involves a device, known as the ‘bus master’, claiming ownership

of the bus and reading data from or writing data to an address that it specifies.

[Figure: a bar chart comparing the maximum throughput of the ISA 8-bit, ISA 16-bit, EISA, MCA, VL and PCI bus architectures, on a scale of 0 to 140.]

Figure 4.2 Comparison of Maximum Bus Throughputs

4.2 Implementation Technology

The display node’s implementation technology was chosen before detailed design

commenced. The MAX 7000 family of Erasable Programmable Logic Devices (EPLDs) from Altera Corporation was found to be a suitable technology. The MAX 7000 family provides EPLDs with 600 to 5000 usable gates and 44 to 256 flip-flops on a

single chip. A consistent propagation delay equal to the speed grade of the device is

experienced through every logic element. MAX 7000 devices are available in speed grades

from 5ns to 20ns.

The choice of implementation technology became the single greatest restriction during the

design phase, as it limited the number of bits of storage available and the complexity of logic

expressions used. Appendix B provides more information on the use of the MAX 7000

EPLDs and the associated Max Plus II software.


4.3 Receiver Structure

A flowchart of the cell receiving state machine is shown in Figure 4.3. This state machine

remained in its idle state, Receive_Wait, until the ATMlink layer indicated that the receive

FIFO was no longer empty. This occurred when the ATMlink layer deasserted its rx_empty

signal. Care had to be taken when interfacing to the ATMlink as all of the logic elements it

contained were either asynchronous or synchronised to the network clock. To avoid timing

problems, the rx_empty signal needed to be latched to synchronise it to the display node’s

clock. The latched signal nRx_Empty_reg1, could then be used in the receive state machine

without encountering setup or hold time difficulties.

When the receive state machine left its Receive_Wait state to begin processing the incoming

cell, it first passed through the Set_Rx_Enable state. This state was introduced to satisfy

the setup time of the FIFO’s read enable signal. The additional state was only required

prior to the first read from the FIFO as an extra byte was read at this time. The Altera

devices then always had the next byte available on their inputs, on subsequent reads.

Special attention was then required at the end of the cell to ensure that a 53rd byte would

not be read. This was achieved by deasserting the FIFOs read enable signal whenever the

count of bytes read from the FIFO equalled 52.

The receive state machine behaved differently when processing a control cell than when

processing a data cell. After the first four bytes of the cell, the cell header, had been read

out of the FIFO, the receive state machine entered the Decode_Header state to determine

whether the cell was a control or data cell. This was done by inspecting the control bit of

the Payload Type Identifier in the most significant byte of the header. If this bit was set, the

cell being received was a control cell and the receive state machine moved to the

Receive_Control state. Otherwise, the state machine progressed to the Reset_Cell_Num

state to process a data cell.

1 Signal names prefixed by an ‘n’ are active low signals.


[Figure: a flowchart through the receiver states Receive_Wait, Set_Rx_Enable, Receive_Header, Decode_Header, Receive_Control, Set_Line_Addr, Dump_Cell, Reset_Cell_Num, Read_Word, Request_Write and Write_Data, with Y/N branches on the conditions nRx_Empty_reg, msbyte_loading, control_cell, cell_num = 0, address set, request granted and cell_loaded.]

Figure 4.3 Flowchart of Receiver's Operation


4.3.1 Processing Control Cells

The Receive_Control state was used to latch the information the video node sent about the

next line it would transmit. The format of the next four bytes that contained this

information is shown in Figure 4.4. When the three bytes of pertinent information had been

read into the 32 bit receive register, internal to the Altera device, the receive state machine

extracted the required information. The number of data cells in the following scan line was

latched into the four bit cell_number register. The Cell Error, Data Error and Parity Error

bits were provided by the video node for diagnostic purposes only and were disregarded by

the display node.

[Figure: bytes 5 to 8 of the control cell: an eight-bit Line Number; a flag byte containing the Cell Error, Ev/Od, Data Error, Pixel Error and Line # MSB bits plus reserved bits; a byte carrying the four-bit ‘# Data Cells / Line’ field plus reserved bits; and a final 0x00 byte.]

Figure 4.4 Format of Control Cell Information

The absolute line number of the following line was also assembled at this time from the first

two bytes of control information. The Ev/Od bit in the control information determined

whether the line number referred to the even or odd field of the image. Fields occur

because the video node transmitted frames in an interlaced format. This meant that all the

even numbered scan-lines, called the even field, are sent first and then all the odd numbered

scan-lines, called the odd field, follow.
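The latching of this control information can be expressed as a small parser. The exact bit positions within the flag byte are assumptions based on the field list in Figure 4.4; the structure (a nine-bit line number assembled from an eight-bit value plus an MSB flag, an even/odd field bit and a four-bit cell count) follows the text.

```python
def parse_control(info):
    """info: the three pertinent control bytes following the cell header."""
    line_lsb, flags, counts = info[0], info[1], info[2]
    line_msb = flags & 0x01            # assumed position of the Line # MSB
    even_field = bool(flags & 0x02)    # assumed position of the Ev/Od bit
    return {
        "line_number": (line_msb << 8) | line_lsb,
        "even_field": even_field,
        "data_cells": counts & 0x0F,   # four-bit '# data cells / line'
    }
```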

4.3.2 Pixel Address Calculations

Once the control information had been latched at the end of the Receive_Control state, the

state machine progressed to the Set_Line_Addr state. It was here that the state machine

calculated the pixel address of the start of the next line. The pixel address was broken into

three components that had to be summed together. The first of these components was the

base address of the video card in memory. To facilitate the simple operation of the


receiver’s addressing scheme, the video card was programmed to implement linear

addressing. This meant that all video memory could be accessed exactly like main memory.

When linear addressing is implemented, the base address of the video card is the address of

the first pixel.

[Figure: the pixel address formed by summing the base address, the line offset (giving the line start address) and the pixel offset within the line.]

Figure 4.5 Pixel Address Calculation

The final two components in assembling a pixel address were the offset from the base

address to the start of the current line and the offset of the pixel from the start of that scan-

line, as shown in Figure 4.5. These are the two components of the pixel address that were

set in Set_Line_Addr state. As a control cell precedes the beginning of a scan-line, the next

pixel data should be written at the beginning of the new line, and the pixel offset variable

must be reset.

Two variables were used in the setting of a line address: line_count and line_offset. The

line count variable stored the line number of the scan-line that was currently being

addressed. Setting the address of the beginning line then involved incrementing the

line_count variable and adding the length of a display line to the line_offset variable until the

line_count variable equalled the line number supplied in the control cell. When this

condition was true, the line_offset variable was guaranteed to hold the offset from the base

address to the start of the line by virtue of its synchronisation with line_count. Also, if the

line_count variable was greater than the target line number, both it and line_offset were

reset to begin the counting from zero.
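The Set_Line_Addr behaviour can be sketched in software form. One loop iteration here corresponds to one clock cycle of the hardware counter; the function signature and the use of a state dictionary are illustrative conveniences, not the actual register layout.

```python
def set_line_addr(state, target_line, base_addr, line_bytes):
    """Advance line_count and line_offset together until line_count reaches
    the target line, resetting both if the target has already been passed
    (the start of a new frame). Returns the pixel address of the line."""
    if state["line_count"] > target_line:        # target passed: start over
        state["line_count"] = state["line_offset"] = 0
    while state["line_count"] < target_line:     # one step per clock in HW
        state["line_count"] += 1
        state["line_offset"] += line_bytes
    state["pixel_offset"] = 0                    # a new line starts at pixel 0
    return base_addr + state["line_offset"] + state["pixel_offset"]
```

Keeping line_count and line_offset in lockstep is what guarantees that line_offset always holds the correct distance from the base address once the count matches.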


4.3.3 Processing Data Cells

Processing of data cells was performed differently to that of control cells. After decoding

the header to determine that the cell was indeed a data cell, the receive state machine

progressed to the Reset_Cell_Num state. This state was used to reset the cell_number

variable if the data cell received was the final data cell in the scan-line. In this case, the

AAL5 bit of the Payload Type Identifier in the cell would be set. If the data cell was not the

final cell in the scan-line, the cell_number variable was decremented instead. This variable

then effectively stored the number of data cells remaining in the current scan-line. If this

variable equalled zero when a data cell was received, the receive state machine would

discard the data cell by moving from the Decode_Header state to the Dump_Cell state.

Counting cells in this manner ensured that if the final data cell of a scan-line, and the control

cell at the beginning of the next line, are both lost through the network, then no data would

be written to the screen until the next control cell was received to adjust the counters.
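The counting guard can be summarised as a single decision function. This is an illustrative restatement, not the actual state machine logic: it returns True when the incoming data cell should be written to the screen and False when it must be dumped because the count has run out.

```python
def accept_data_cell(state, aal_bit_set):
    """state['cell_number'] holds the data cells remaining in the line."""
    if state["cell_number"] == 0:      # nothing expected: Dump_Cell path
        return False
    if aal_bit_set:                    # final data cell of the scan-line
        state["cell_number"] = 0
    else:
        state["cell_number"] -= 1
    return True
```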

The writing of data to the screen was a simple task from the receiver’s viewpoint. The

PCI interface wrote data to video memory four bytes at a time and performed this action

twice in every write transaction. Hence, the receiver read in a double word (four bytes) of

data into the 32 bit receive register and moved to the Request_Write state. The PCI

interface then issued a request to the system for permission to write data and remained in

this state until the request was granted. When granted, the PCI interface then addressed the

next pixel location in video memory to which the data was to be written, whilst the receiver

copied the contents of the receive register to another 32 bit register. The receiver was

therefore able to read in the next double word of data while the PCI interface was writing

the current double word. When the PCI interface had finished writing the data and the

receiver had finished loading the next double word, the PCI interface was prompted to

complete the transaction by writing the next four bytes. The receiver waited for the second

write operation to complete before returning to the Read_Word state to read the remaining data, or to the Receive_Wait state if the entire cell had been processed.


4.4 The Transmitter’s Structure

The transmitter had a much simpler implementation than the receiver. This state machine

was only required to write the double words of data received by the PCI interface into the

transmit FIFO. The least significant byte of the double word was written to the FIFO first

to maintain a consistent byte ordering. The flow chart of the Transmit state machine is

shown in Figure 4.6. Two paths were required from the Tran_Byte3 state as it was possible

that another double word was received during this state. If this was the case, the state machine returned to the Tran_Byte0 state to output the new double word to the transmit FIFO. Otherwise, the state machine returned to the idle state.

[Figure: the transmit state machine, idling in Transmit_Wait until dword_received is asserted, then stepping through Tran_Byte0 to Tran_Byte3 and either looping back to Tran_Byte0 on a further dword_received or returning to Transmit_Wait.]

Figure 4.6 The Transmit State Machine

The ATMlink layer counted the number of bytes written into the transmit FIFO by this state machine. When fifty-two bytes had been written, the ATMlink asserted the Valid signal to transmit the cell to the switch. The transmitter consequently assumed that the data being received had been formatted into 52 byte cells by the system software.
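The transmitter’s byte ordering can be stated in one line; this is an illustrative fragment only, with a hypothetical function name.

```python
def dword_to_fifo_bytes(dword):
    """Split a 32-bit double word into the four bytes written to the
    transmit FIFO, least significant byte first."""
    return [(dword >> (8 * i)) & 0xFF for i in range(4)]
```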


Chapter 5

5. The PCI Interface

This chapter discusses bus architectures and the place of PCI in this context. It discusses

the advantages and disadvantages of PCI buses and the motivations for using PCI in

implementing the DAN's display node. A discussion of the design of an ATM - PCI

interface follows, and the chapter is concluded with remarks about the implementation of

this type of interface.

5.1 Computer Buses

The bus of a computer is an interconnection device that allows the different components of

the computer to interact. Physically, the bus is merely a series of parallel wires to which a

limited number of connectors can attach. The earliest computer buses like the Industry

Standard Architecture, or ISA, bus were completely under the control of the computer’s

CPU. The only possible form of communication on the bus was between the CPU and

another device, and it always took the form of a processor initiated read or write.

Interrupts allowed the devices residing on the bus to signal the CPU that some specific

action was required. This prevented the CPU from continually polling the devices checking

for the appropriate conditions that indicated action was necessary. However, a large

latency is still encountered in servicing the interrupt and a substantial overhead is incurred in

having to switch to the appropriate service routine to perform the function.

A more sophisticated solution to this problem is to use Direct Memory Access (DMA). A

DMA transfer involves a device reading or writing to or from a contiguous block of

memory. The transfer between the device on the bus and system memory still must be

initialised in software. Once initialised, though, the device can request additional reads and

writes at any time, until the transfer has completed. In PC systems, up to half of the


processor cycles can be allocated to performing the DMA transfer which dramatically

increases the throughput that can be achieved by the device. Again this approach generally

requires the use of interrupts and software such as device drivers to service the interrupts

and initialise each DMA transfer.

Bus architectures that support bus mastering avoid these problems by allowing the devices

attached to the bus to initiate data transfers. All devices attached to these buses fall into

two categories: masters and slaves, also known as initiators and targets respectively. An

arbitration scheme must be implemented to support multiple bus masters. When a bus

master wishes to use the bus, it issues a request to the bus arbiter. Several bus masters

can issue requests simultaneously which then forces the bus arbiter to implement some form

of scheduling, usually on a priority basis, to avoid bus contention. After applying the

scheduling algorithm, the bus arbiter grants the request of one of the bus masters to allow

that device to use the bus. When that bus master has completed the transaction, the bus

becomes idle and the arbiter is able to process the next set of requests and grant the use of

the bus to another device.
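The grant decision described above can be modelled in a few lines of C. The fixed-priority scheme below is purely illustrative; the actual arbiter resides in the host system's chipset and its algorithm was not examined in this project:

```c
#include <assert.h>

/* Fixed-priority bus arbiter model: 'requests' is a bitmask of asserted
 * request lines, with requester 0 having the highest priority.  Returns
 * the index of the granted requester, or -1 if the bus remains idle. */
static int grant(unsigned requests)
{
    for (int i = 0; i < 32; i++)
        if (requests & (1u << i))
            return i;
    return -1;
}
```

A round-robin arbiter would differ only in rotating the starting index after each grant.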

As control of this resource is now no longer handled by the operating system, transactions

on the bus can be conducted without any CPU intervention. Any bus master is therefore

able to communicate with any slave without having to go through the processor. A bus

mastering implementation is a more expensive solution due to the extra hardware required

to perform arbitration. It does, however, promote much more efficient communication in

that the latency for a device to read or write data is reduced, as is the frequency that the

processor must be interrupted or switched to another task. These were the reasons that a

bus mastering architecture was chosen for the display node. The high bandwidth

requirement of the video data could not be met by a 16 bit ISA bus. Also, the delay

sensitive nature of the data cannot easily be satisfied in an implementation that requires software

support.


5.2 The PCI Bus

The PCI bus is one such architecture that provides bus mastering capability. It was

developed by Intel Corporation to extend the advantages offered by the VESA Local bus to

devices other than video graphics adaptors. Multiplexed address and data lines allow PCI

devices to maintain a relatively low pin count, with 49 pins required for a bus master and 47

pins required for the slave. The additional two pins required by the master are the request

and grant lines.

Every transaction on the PCI bus is a ‘burst transfer’. A burst transfer consists of an

address phase followed by one or more data phases. The PCI specification (PCISIG95)

allows one data phase to be completed on each bus cycle during a burst transfer. On a

33 MHz implementation this produces a peak throughput of 132 Mbytes per second. The

PCI specification also details 32 or 64 bit PCI implementations in frequencies ranging up to

66 MHz, which enable an absolute maximum transfer rate of 528 Mbytes/sec. The

12.5 Mbytes/sec requirement of the display node can be easily satisfied on any of these PCI

systems.
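These peak figures follow directly from the bus width and clock frequency, as the small C check below illustrates (this fragment is for exposition only and is not part of the thesis software):

```c
#include <assert.h>
#include <stdint.h>

/* Peak PCI throughput in bytes per second: one data phase per clock
 * cycle, each transferring the full width of the bus. */
static uint64_t pci_peak_bytes_per_sec(unsigned bus_bits, uint64_t clock_hz)
{
    return (bus_bits / 8) * clock_hz;
}
```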

[Block diagram: the CPU (through a PCI adaptor), memory, the display node, the video controller driving the CRT, and an ISA bridge to the ISA bus all attach to the PCI bus; the display node connects to the ATM network.]

Figure 5.1 Configuration of Devices on the PCI Bus

The display node’s PCI interface implemented both a master and a target state machine and

therefore implemented all 49 of the required signals. The master state machine was required

to write the video information received from the ATM network to the video controller and


the target state machine was required to accept data written to the display node from the

CPU. The video controller illustrated in Figure 5.1 is a PCI slave device. Consequently,

the display node’s PCI interface was able to address this slave device and write the pixel

data directly to the video controller’s memory independently of the CPU. Figure 5.1 also

illustrates that the CPU was only connected to the PCI bus through a specialised PCI

adaptor. This device was connected to the CPU’s secondary cache to control any memory

references made by the CPU to either the memory subsystem or to a device on the PCI bus.

Very few signals were required to implement the PCI protocol. The six ‘interface control’

signals shown in Figure 5.2 provide all the information necessary to control the outcome of

a bus transaction. Signals in this figure that have their names prefixed with an ‘n’ character

are active low signals.

Figure 5.2 The Required and Optional PCI Signals

To explain the use of the interface signals, an example of a write transaction on the PCI bus is given in Figure 5.3. It is assumed that in the cycles before those illustrated in Figure 5.3, the master requested use of the bus from the bus arbiter and was just granted permission.

The transaction begins in clock cycle one with the master asserting the nFRAME signal and

driving the address of the target it wishes to write to onto the bus. On the rising edge of

clock two, all target devices on the bus sample the nFRAME signal asserted and realise that

a valid address is on the bus. They latch this address and compare it to their own base

address stored in an internal register. Also in the second clock cycle, the master drives the

data it wishes to write onto the AD bus and the byte enables for the data onto the nCBE

bus. The byte enable signals indicate which bytes of data on the 32 bit AD bus contain valid

data. The master asserts the initiator ready (nIRDY) signal to indicate that that valid data is

present and that it is ready to conclude the first data phase of the transaction. The master

leaves the nFRAME signal asserted, indicating that this is not the final data phase in the

transaction.

[Timing diagram over clock cycles one to six: nFRAME, nIRDY, nTRDY and nDEVSEL are shown with the Address/Data bus carrying the address followed by Data 1 and Data 2, and the nCBE bus carrying the bus command followed by Byte Enables 1 and 2; annotations mark where the transaction begins and where each data phase completes.]

Figure 5.3 Timing of Interface Signals in a Write Transaction

In this example, the target in question realised that it was the target of the transaction by the

end of the second clock cycle. It consequently asserted its device selected (nDEVSEL)

signal on the rising edge of clock cycle three to inform the master that it has decoded the

address. It also asserted the target ready (nTRDY) signal to indicate that it is also ready to

take part in the first data phase. The data phase therefore completes on the rising edge of

clock cycle four as both the master and the target are ready, nIRDY and nTRDY both

asserted. At this point the target latched the data and byte enables, and both devices


deasserted their respective ready signals as they prepared for the next data phase. A device

typically deasserts its ready signal if it needs to insert a wait cycle into the transaction. This

could be to allow the master to fetch more data or to allow the target to respond to a full

buffer condition. If neither device in this example deasserted its ready signal, the second

data phase would be completed at the end of clock cycle four.

By the rising edge of the fifth clock cycle both devices were again ready to take part in

another data phase and so asserted their ready signals. The master drove the data it wished

to write onto the AD bus and the byte enables for the data onto the nCBE bus. The master

also deasserted the nFRAME signal at this time to indicate that this will be the final data

phase of the transaction. At the rising edge of clock cycle six, both devices sample both the

nIRDY and nTRDY signals asserted and realise that the second data phase has completed.

The target latches the data that the master was driving and ends the transaction by

deasserting the nDEVSEL signal when it deasserts its ready signal.
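The handshake just described can be abstracted as follows: a data phase completes on any rising clock edge at which both ready signals are sampled asserted. The C model below is illustrative only; the ready patterns used in the assertions loosely mirror Figure 5.3, with both parties ready in cycles three and five:

```c
#include <assert.h>

/* Return the 1-based clock cycle on which the n_phases-th data phase
 * completes.  'irdy' and 'trdy' hold per-cycle samples of the two ready
 * signals (1 = asserted); the pattern must contain enough ready cycles
 * to complete all the requested data phases. */
static int completion_cycle(const int *irdy, const int *trdy, int n_phases)
{
    int done = 0;
    for (int cycle = 0; ; cycle++) {
        if (irdy[cycle] && trdy[cycle])
            done++;
        if (done == n_phases)
            return cycle + 1;
    }
}
```

With both devices ready on every cycle, one data phase completes per clock, which is the zero-wait-state case the specification permits.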

5.3 The UQ DAN’s PCI Interface

An efficient PCI interface is one that takes advantage of burst transactions to minimise the

overhead associated with arbitration latency, the address phase and the address decoding of

the target device. This was made difficult due to the different data rates of the ATM

network and the PCI bus. A data phase on the PCI bus requires four bytes to be written at

a time. Fewer bytes can be written by invalidating the unwanted bytes via the nCBE bus but

again this is an inefficient use of the bus bandwidth.

As there was only a byte wide data path from the Altera devices to the receive FIFO, and

this FIFO was clocked using the PCI clock, data could only be sourced from the FIFO at one

quarter the rate that the PCI bus can sink the data. This made burst transfers difficult to

achieve as 32 bit registers were too expensive in terms of the available resources of the

EPLDs to implement and a certain degree of buffering was required to overcome the

differences in data rates.

The display node designed was capable of burst transactions with two data phases. This

was achieved by using the delay encountered in the address phase, address decoding and


first data phase of the transaction to load the second double word. The first double word

was moved to the transmit register, Tx_reg, while the second double word was loaded.

This could not be done until the master state machine had taken control of the PCI bus as a

processor write to the display node could occur in the meantime, overwriting the contents

of the transmit register. The datapath of the display node, shown in Figure 5.4, illustrates

the manner in which the data loaded from the receive FIFO can be moved around the

internal storage elements to allow a burst transfer.

[Datapath diagram: byte-wide Rx_Data[7..0] from the Receive FIFO is assembled into the 32 bit Rx_reg and moved to Tx_reg, then through A_D_reg onto the PCI A_D[31..0] bus; the C_BE_reg and PAR_reg registers drive CBE[3..0] and PAR, the AD[31..28] register latches the upper address lines for comparison, and Tx_reg also feeds the Transmit FIFO towards the ATM network.]

Figure 5.4 The Display Node's Datapath

The amount of time spent in prelude to the first data phase depended largely on the address

decoding speed of the target. Assuming that the video controller required one clock cycle


to decode its address, this approach would mean that the fourth byte of the new

double word would be loading when the first data phase completes. A single wait cycle

was required in this case to load the new data into the A_D_reg register so that it could be

placed on the bus. A mandatory wait cycle was therefore inserted between the master state

machine’s data phases to reduce the conditional logic required. The master was then

prepared to finish the second data phase on the next cycle. The display node is

consequently able to write two double words in three cycles giving a peak transfer rate of

88 Mbytes/sec when using a 33 MHz PCI clock. The design files for the PCI interface are

provided in Appendix E and a detailed state transition diagram of the master state machine

is provided in Appendix C.
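The 88 Mbytes/sec figure is straightforward arithmetic: eight bytes are moved every three clock cycles. As an illustrative check (this helper is not part of the design files):

```c
#include <assert.h>
#include <stdint.h>

/* Sustained transfer rate when 'bytes' are moved every 'cycles' clock
 * periods of a bus running at 'clock_hz'. */
static uint64_t burst_rate(uint64_t bytes, uint64_t cycles, uint64_t clock_hz)
{
    return bytes * clock_hz / cycles;
}
```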

5.3.1 Address Decoding

Every PCI device must have a programmable base address register to enable dynamic

relocation of that device’s address space. This is required in accordance with the ‘Plug and

Play’ goals of the PCI bus. When a valid address is placed on the bus, the device compares

that address to the contents of its base address register to determine if it is the target of the

PCI transaction. The specification allows devices up to three clock cycles to perform this

function and uses the time taken to classify devices as being either fast, medium or slow

address decoders. The timing of asserting the nDEVSEL signal for different classes of

devices is shown in Figure 5.5. If the transaction is not claimed within the three cycle

period, it is claimed by the subtractive decoder which assumes that the address is for the

ISA bus.

[Timing diagram over clock cycles one to seven showing nFRAME, nIRDY and nTRDY, with nDEVSEL asserted one cycle later for each decoder class in turn: fast, medium, slow, then the subtractive decoder's acknowledgement, and finally the no-response case.]

Figure 5.5 Timing of Address Decoding

The display node joined the majority of commercial PCI products, including the video

controller interfaced by the DAN, in being classified as a medium speed decoder. This class

of devices latches the address at the beginning of clock cycle two in Figure 5.5, when

nFRAME is sampled asserted, and is ready to assert the nDEVSEL signal by the rising

edge of clock cycle three. The number of bits that the device must decode depends on the

size of the address space it implements. That is, if the device required a 2 Gbyte address

space, it would only have to decode the upper one bit in the 32 bit address. Whereas if a

one byte address space were required, the device would be forced to decode all 32 address

bits.

To minimise the number of storage bits used by the base address register, and the logic

required to perform address comparisons, the display node was assigned an unnecessarily

large address space. The 256 Mbyte address space allocated to the display node required

only four bits of storage in the base address register and a second four bit register,

AD[31..28], to latch the upper lines of the address bus for the comparison.
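The resulting decode reduces to a four bit comparison against AD[31..28], which can be sketched as follows (the function name is introduced here for illustration only):

```c
#include <assert.h>
#include <stdint.h>

/* Decode hit test for a 256 Mbyte region: only AD[31..28] is compared
 * against the four bits stored in the base address register. */
static int address_hit(uint32_t ad, uint32_t base_hi4)
{
    return (ad >> 28) == base_hi4;
}
```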

5.3.2 Configuration Registers

Systems that incorporate a PCI bus have three address spaces. As well as the memory and

I/O address spaces implemented by the majority of workstations, the PCI bus additionally

implements a configuration space. The configuration space contains 64 double words of

data to configure devices. The first 16 of these double words are PCI specific, and are

shown in Figure 5.6, while the latter 48 are for device specific purposes. Of the 16 double

word PCI configuration header, all devices on the bus must implement at least the

mandatory configuration registers shown. These registers allow devices present on the bus

to be configured and enabled or disabled as required. It is necessary that all devices on the

bus respond to configuration accesses at all times in order for the BIOS to determine system

requirements and configure devices during startup. The system motherboard includes a

unique line, the IDSEL signal, to each device on the bus for the purpose of addressing that

device’s configuration space. A physical communication path is required as devices have no

address space assigned that they can decode during a system reset. For a detailed

description of the PCI configuration registers and configuration transactions, refer to

Shanley95.


Double Word    Byte 3 / Byte 2 / Byte 1 / Byte 0

00    Device ID | Vendor ID
01    Status Register | Command Register
02    Class Code | Revision ID
03    BIST | Header Type | Latency Timer | Cache Line Size
04    Base Address 0
05    Base Address 1
06    Base Address 2
07    Base Address 3
08    Base Address 4
09    Base Address 5
10    CardBus CIS Pointer
11    Subsystem ID | Subsystem Vendor ID
12    Expansion ROM Base Address
13    Reserved
14    Reserved
15    Max_Lat | Min_Gnt | Interrupt Pin | Interrupt Line

(The original figure shades the required configuration registers.)

Figure 5.6 Defined PCI Configuration Registers

The majority of the mandatory configuration registers shown in Figure 5.6 were able to be

hardcoded in the display node’s implementation. Hardcoded values were stored in the

EPLDs in the connecting logic between registers and therefore do not consume any bits of

storage in the devices. Only the Status and Command Registers required programmable

bits. Of the hardcoded register values, the Vendor ID and Device ID registers were the only

registers to have a non-zero value. Assigning a zero value to the Class Code register


implied that the device was developed before class codes were introduced. This was done

as the logic required to implement handling of configuration accesses could be greatly

reduced by handling as many of the registers as possible in an identical fashion, by returning

a value of zero.

The number of bits of the command and status registers implemented was also minimised to

reduce the amount of resources consumed on the Altera devices. Only two bits of the

command register and five bits of the status register were implemented. These were:

Command Register

• Bit 1 - ‘Memory Access Enable’ When set, this bit indicates that the device is to decode

memory addresses on the bus to respond to memory accesses.

• Bit 2 - ‘Master Enable’ This bit must be set for a device to act as a bus master.

Status Register

• Bits 9 & 10 - ‘Device Select Timing’ These bits encode the slowest timing of the

assertion of the nDEVSEL signal. This field is hardcoded to 01b to indicate that the

display node is a medium speed decoder.

• Bit 12 - ‘Received Target Abort’ This bit is set by the bus master whenever a bus

transaction is terminated by the target before it reaches successful completion.

• Bit 13 - ‘Received Master Abort’ This bit is set by the bus master whenever it is forced

to terminate a bus transaction because no target decoded the address presented in the bus

master’s address phase.

• Bit 15 - ‘Detected Parity Error’ This bit should be set by a device whenever it detects a

parity error on the bus.
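Expressed as C masks against the standard register layout, the implemented bits are as follows (the macro names are introduced here for illustration and do not appear in the design files):

```c
#include <assert.h>

/* Command register bits implemented by the display node. */
#define CMD_MEM_ENABLE     0x0002u  /* bit 1: Memory Access Enable */
#define CMD_MASTER_ENABLE  0x0004u  /* bit 2: Master Enable        */

/* Status register bits implemented by the display node. */
#define STAT_DEVSEL_MEDIUM 0x0200u  /* bits 10..9 = 01b: medium decoder */
#define STAT_TARGET_ABORT  0x1000u  /* bit 12: Received Target Abort    */
#define STAT_MASTER_ABORT  0x2000u  /* bit 13: Received Master Abort    */
#define STAT_PARITY_ERROR  0x8000u  /* bit 15: Detected Parity Error    */
```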

A final simplification made to reduce implementation complexity was to avoid checking

parity. The PAR signal on the PCI bus is used to generate even parity over the

address/data and byte enable buses. All PCI devices must generate parity when data is

driven onto the bus, but not all devices need check the parity when receiving data. The

display node was consequently designed only to produce parity for master writes and reads

from the configuration space. Parity errors are not checked on data written to the target.


This design choice benefits the display node’s implementation as parity checking of received

data would require either duplication of the logic intensive parity module, or at least much

more complex inputs to the module. The latter option may not be suitable if the logic is so

complex that it must be broken into several layers. In that case, the time to produce a valid

parity signal may not meet the timing requirements of the PCI specifications, again making a

second parity module necessary.
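For reference, even parity over the 36 covered lines means PAR is simply the exclusive-OR of AD[31..0] and nCBE[3..0]. The C fragment below illustrates the computation the parity module performs; the actual module was EPLD logic, not software:

```c
#include <assert.h>
#include <stdint.h>

/* PAR covers AD[31..0] and nCBE[3..0]: it is driven so that the 36 lines
 * plus PAR together carry an even number of ones, i.e. PAR is the XOR
 * of all 36 bits. */
static unsigned pci_par(uint32_t ad, uint32_t cbe)
{
    uint64_t bits = ((uint64_t)(cbe & 0xFu) << 32) | ad;
    unsigned par = 0;
    while (bits) {
        par ^= (unsigned)(bits & 1);
        bits >>= 1;
    }
    return par;
}
```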

5.3.3 Software Support

One of the primary goals of the PCI specification was to support a ‘Plug and Play’ (PnP)

ideology. A PnP system allows cards present on the bus to be automatically detected and

configured at system startup by the system BIOS. There are currently two approaches that

a PnP BIOS can take (Shanley95b). The first is to determine the requirements of all devices

on the bus and use this information to configure devices present with appropriate address

ranges and interrupt pins etc to ensure conflict free operation. The second approach

preferred by newer BIOS implementations is to configure only those devices required to

bootstrap the operating system, and to then let the operating system configure the remaining

devices. In the latter approach, the BIOS still examines every device on the bus but writes

all the information into a data structure known as the Extended System Configuration Data

(ESCD) structure. A PnP operating system, like Windows 95™ will then examine this

structure and use the information to configure devices itself. This has the advantage that

resources can be allocated to devices in a consistent manner each time the system is booted,

and that the configuration is free of the ‘bugs’ that are currently plaguing many PnP BIOSs.

As DOS is not a PnP operating system, code had to be written to access the configuration

registers of the devices on the PCI bus. Configuration accesses can be achieved either

through one of the configuration mechanisms defined in the PCI specification, or through

use of the PCI BIOS (PCISIG94). The code written to perform either of these access

methods is provided on diskette in Appendix E. The configuration routines were used to

examine the requirements of every device, and so no use was made of the ESCD when

configuring the display node. The device was merely assigned an unused region of the

system’s memory space and then had its Master Enable and Memory Enable bits in the

Command Register set. Additional code was produced to check the device’s adherence to


the PCI specification in implementing the configuration header. This code is presented on

the digital medium of Appendix E.
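As an illustration of the first access method, Configuration Mechanism #1 forms a 32 bit value for the CONFIG_ADDRESS port (0CF8h) from the bus, device, function and register numbers, and the data is then transferred through the CONFIG_DATA port (0CFCh). The encoding is sketched below; the port input/output itself is omitted as it is hardware specific:

```c
#include <assert.h>
#include <stdint.h>

/* CONFIG_ADDRESS encoding for PCI Configuration Mechanism #1: bit 31
 * enables the access, and the register number is double word aligned. */
static uint32_t cfg_address(unsigned bus, unsigned dev, unsigned fn, unsigned reg)
{
    return 0x80000000u | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xFCu);
}
```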


Chapter Six

6. Results

This chapter describes the final implementation of the DAN’s display node and the degree

of performance that was achieved by individual components and the DAN as a whole.

Where appropriate, possible areas of improvement of the display node’s design are noted.

6.1 Display Node Implementation

The final implementation of the display node was achieved using two Altera EPLDs and the

network interface devices of Appendix A on a two-layer PCB. The Altera devices

had to be chosen carefully to ensure that they were able to fit the hardware description

language design file and that they met the PCI timing and electrical specifications. Two

Altera Max 7000 EPM7192SQC160-10 devices were chosen. These devices are 160 pin

plastic quad flat pack packages that each have 192 bits of internal storage and 10ns gate

delays.

Although the individual PCI devices were PCI compliant in their timing and electrical

characteristics, the display node could not be made to meet the electrical specifications with

these devices. The specification stated that at most one load can be placed on any line of

the PCI bus. This constraint could not be met as the design had to be distributed over the

two devices due to its size, and some of the PCI bus signals were required by both devices.

The resultant double loading of bus signals had little effect on the testing environment as the

PCI bus was only loaded with the display node and the video card. If all card slots on the

bus were in use, the additional loading could cause the system’s integrity to be violated.

This problem could only have been overcome by using the larger ‘Flash’ series of Altera

devices, or the Field Programmable Gate Array (FPGA) type devices that, due to their

density, are more likely to fit the entire design onto a single device.

Results

Page 43

6.1.1 Board Implementation

The input and output pins of the Altera devices change each time the project is recompiled

after an alteration. Wire-wrapping was consequently chosen to implement pin connections.

This method allowed changes to be made relatively easily and permitted construction on a

two-layer Printed Circuit Board (PCB) of a design that would otherwise require a

multi-layer board. As the PCB was relatively free of tracks because of the wire-wrapped

connections, large power and ground planes were also able to be added to the board. These

planes help to reduce noise on the board that accompanies any high speed digital circuit.

Figure 6.1 shows the display node and Figure 6.2 illustrates the wire-wrapped

implementation of this PCI card.

Figure 6.1 The UQ DAN’s Display Node

Figure 6.2 The Wire-Wrapping of Pin Connections

The PCI signals originating from the connector were an exception to the wire-wrap

approach and were joined to the circuit by soldering one end of a wire-wrap wire to the bus


connector and wrapping the other end to the destination pin. This approach was chosen

due to the density of the PCI signals in the grid of pads formed. In retrospect, this may not

have been necessary but fewer problems were experienced in this region than on the

connection of wire-wrap pins to signal pads.

6.2 The Physical Layer’s Performance

The physical layer implemented to accomplish the network communication performed

beyond all expectations. No problems were experienced in running the network at its rated

frequency of 12.5MHz. In an attempt to determine the physical layer’s fail point, the

network clock frequency was increased to 25MHz, which was the highest frequency that

could be obtained using available clock generators. Data transmissions occurred at this

network frequency with no occurrences of bit errors. This implies that the physical layer

could be implemented for a Desk Area Network rated at 200Mb/s without modification.

6.3 PCI Interface Performance

The PCI interface of the display node performed as intended. The target

state machine had no difficulty operating at any speed up to the maximum of

33MHz. The master state machine worked without fault up to 25MHz but became

unreliable at 33MHz. When operating at this speed, the master worked but would ‘lock up’

after random periods of operation. The cause of this problem was not isolated but appeared

to be the result of a setup time violation that arose due to the shorter clock periods.

Measurements of the display node's performance were difficult to obtain due to

excessive loading experienced when logic analyser probes were placed on the pins of the

Altera devices. The Altera devices were not able to drive both the wire-wrapped

connection to an input pin as well as the additional load of the probes. Erratic behaviour of

the circuit was occasionally witnessed when measurement devices were added.


6.3.1 Average Grant Latency

The timing of the PCI interface's write cycles was investigated over many

transactions. In the transactions witnessed, the smallest latency from the assertion of the

request signal to being granted use of the bus was two clock cycles. The largest latency, in

this same period, was 32 clock cycles. Larger delays in the assertion of the grant signal can

usually be attributed to the bus being used by another device. The average delay

experienced by the PCI interface waiting for nGNT to be asserted was five clock cycles at

25 MHz, or approximately 200 ns.

6.3.2 Response Time of Video Card

When writing data to the Cirrus Logic GD54N30 video card through the PCI interface,

measurements of the timing revealed that the video card itself produced a major bottleneck

in the transmission of information. The write cycle measured using a logic analyser is

reproduced in Figure 6.3 for a bus clock frequency of 25MHz. When compared to the

timing diagram of Figure 5.3 that closely portrays the display node’s abilities, it is obvious

that much larger delays are being experienced in actual operation.

The video card inserts approximately eight wait states into each data phase, or

approximately 320ns. The measured wait state delay was relatively consistent for different

bus cycle rates. The transaction of Figure 5.3, which theoretically should take six bus cycles,

took 24 bus cycles. The entire transaction depicted in Figure 6.3 took 960ns to complete.

Preliminary software tests of the video card’s linear addressing were conducted by

continuously colouring a 640 x 480 pixel square in a tight software loop. The maximum

frame rate that was obtained during these tests was 21.35 frames per second. This relatively

lethargic frame rate was originally attributed to software inadequacies. However, this frame

rate corresponds to an average of 610ns between double word transfers. As the CPU-PCI

bridge used in the test computer was only capable of single data phase transactions, the

calculated inter-transaction delay is more likely to be the result of the delays inserted by the

video card. Hence, even with the slightly better performance offered by the display node’s

burst transactions, it does not appear that the goal of a 640 x 480 pixel video frame,

refreshed 25 times a second can be achieved with the video card used.
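The 610ns figure can be cross-checked against the measured frame rate: a 640 x 480 image at one byte per pixel is 76,800 double words per frame. The helper below is hypothetical and exists only to verify this arithmetic:

```c
#include <assert.h>

/* Frames per second when every double word transfer takes ns_per_dw
 * nanoseconds, with one byte per pixel (four pixels per double word). */
static double frame_rate(int width, int height, double ns_per_dw)
{
    double dwords = (double)width * height / 4.0;
    return 1.0e9 / (dwords * ns_per_dw);
}
```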


6.4 Receiver Performance

The level of performance that could be extracted from the ATM cell receiver was

diminished due to the video card imposed delays. To support the 100Mb/s data rate of the

network links, the receiver had to be able to process cells in an average time of 4.5µs. The

average time for the receiver to process a cell was measured to be 7µs. Control cells were

an exception and were able to be processed in less than 3µs, but these cells occur much less

frequently than the data cells that were processed very slowly.

[Timing diagram spanning 24 bus cycles (0 ns to 960 ns) showing nREQ, nGNT, nFRAME, nIRDY, nTRDY and nDEVSEL, with the wait cycles inserted into each data phase marked.]

Figure 6.3 Measured Response of the Video Card

Figure 6.3 implies that

approximately 1µs of processing time was spent on each PCI transaction. As six of these

transactions were required for each data cell, a lower limit was imposed on the receiver’s

cell processing time that was greater than the time required to meet the specifications.

One change that could be made to the receiver to slightly decrease this processing time

would be to move the second double word from the receive register (Rx_reg) to the

transmit register (Tx_reg) for the second data phase as well as the first. The additional data

movement would cause one additional clock cycle delay before the assertion of nIRDY for

the second data phase, but would allow the first double word of the next transaction to be

loaded while waiting for the second data phase to complete. This approach of further

overlapping data read and write times can reduce the cell processing time by decreasing the

delay between one transaction ending and the request for the next transaction. Using the

transmit register for the source of pixel data for both data phases has the secondary

advantage of reducing internal connections in the Altera devices, and may improve the

compiler’s ability to fit the project.

6.4.1 Maximum Achievable Resolution

Due to the properties of the raster video signal processed by the camera node, a delay of

64µs is experienced between the start of video line transmissions. Using this information

with the knowledge of the average cell processing time, the maximum horizontal resolution

that can be supported by the display node can be calculated. Assuming an average of 7µs to

process a cell, nine complete cells can be processed in the 64µs inter-line time period.

However, the first of these cells is a control cell that only requires 3µs to process. This

leaves 5µs between the end of the ninth cell and the beginning of the next line. Another five

transactions from an incomplete cell could conceivably be processed in this time, but this

feature is not implemented on the display node.

With a maximum of eight data cells being sent per line, a maximum horizontal resolution of

384 pixels can be achieved before cells begin to accumulate. Cell accumulation in the

Receive FIFO will eventually lead to cell loss and picture quality degradation when the

Receive FIFO becomes full. If the eight wait cycles per data phase were removed from the


write transactions, the Receiver would be able to process data cells in approximately 3µs

and achieve the full 640 x 480 pixel image at 25 frames per second.
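The 384 pixel limit follows from the inter-line budget: after the 3µs control cell, as many 7µs data cells as fit in the 64µs line period can be processed, at 48 pixels per data cell payload. A hypothetical check of this arithmetic:

```c
#include <assert.h>

/* Maximum horizontal resolution in pixels: the number of whole data
 * cells that fit in the inter-line period after the control cell,
 * multiplied by the 48 pixels carried in each data cell payload. */
static int max_resolution(int line_us, int ctrl_us, int data_us)
{
    return ((line_us - ctrl_us) / data_us) * 48;
}
```

With the wait cycles removed, so that data cells take roughly 3µs, the same budget comfortably covers a 640 pixel line.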

The display node was shown to work at a 640 x 480 pixel resolution by attaching a loop

back network cable to the display node. Any data transmitted by the display node is then

looped back to the display node’s receive port and is processed. Test software was then

written to utilise the loop-back configuration to send the display node an image with larger

inter-cell delays. This test procedure verified the ability of the display node to handle

images of all sizes.

6.5 UQ DAN’s Performance

The UQ DAN’s performance at the completion of the project was below the group’s initial

expectations but encouraging as a proof of concept for the Desk Area Network

architecture. Problems in implementing the ATM switch (Aravinthan96) could not be

resolved, but network performance could still be ascertained from the point to point

connection of the camera node and display node. The best full motion video image that was

able to be produced by the UQ DAN was at a resolution of 48 x 480 pixels and 25 frames

per second, which equates to a data rate of 576 kbytes/sec. This data rate was far below

the UQ DAN’s specifications but was comparable to, if not better than, that which can be

achieved on Ethernet networks or ISA buses. The horizontal resolution was limited to that

of one data cell’s payload due to problems experienced with cell delineation.

Loss of cell delineation occurs when additional bytes are present in the network FIFOs. A one byte

offset in the header information of a cell causes the receiver to completely misconstrue that

cell’s contents and leads to data cells being treated as control cells and vice versa. When a

data cell is mistakenly recognised as a control cell, line and cell numbers are set using the

random bit patterns that occur in bytes five to eight of that cell. The display node’s output

is then an incoherent mess of randomly placed control and data cells.

In the UQ DAN, these spurious bytes were occurring somewhere between the camera

node’s cell transmitter and the display node’s cell receiver. The exact cause of the extra bytes that led to the loss of cell delineation was never precisely isolated. Closer


examination found that additional bytes were appearing in the camera node’s Transmit FIFO. However, the problem only ever occurred when more than one data cell was sent per line. An image 48 pixels wide (one data cell payload) was consequently the largest video image that could be obtained in native mode.

Figure 6.4 The UQ DAN’s Operation on Demonstration Day, 1996

The UQ DAN’s operating system implemented a ‘patchwork’ display mode to overcome this problem. This involved requesting 48-pixel-wide strips of the image, one strip per frame. Each request was offset by a further 48 pixels so that the sequence of strips arriving at the display node would form a complete image when arranged correctly.

The operating system then copied the received strip from the left-hand edge of the screen, where it was placed by the display node, to its correct place on the screen. This method produced the image seen in Figure 6.4, which was used to demonstrate the UQ DAN on the Department of Electrical Engineering’s undergraduate thesis demonstration day.

A screen capture of the UQ DAN’s operating system interface using this display mode is

given in Figure 6.5. The figure clearly shows the narrow 48-pixel strip at the very left of the image that belongs towards the image’s centre. This working strip was copied to the correct position on the screen before the next strip was requested. The working strip was


then overwritten with the new request by the display node, and the software copied this new

strip to its respective position on the screen. A frame rate of approximately 3 frames per

second was achieved using this display mode at a resolution of 480 x 340 pixels. This

provided acceptable quality video for slow moving images and was comparable to many

existing video conferencing technologies.

Figure 6.5 UQ DAN's Interface in Patchwork Mode



Chapter Seven

7. Conclusions

Multimedia applications and their associated multimedia data streams are becoming a

commonplace addition to the everyday computing environment. This thesis described the

implementation of a novel computer architecture that was designed to handle these multimedia data streams with much greater efficiency. The Desk Area Network implements a multimedia workstation by replacing the workstation’s bus with a high-speed ATM network to remove the I/O bottleneck that plagues the majority of existing systems.

The DAN developed at the University of Queensland demonstrated this philosophy through

the implementation of a working system. The system developed consisted of two

multimedia devices and an ATM switch to form the 100Mb/s network. The first device, the

camera node, sourced video data which was received by the display node and presented to

the user.

This thesis detailed the implementation of the UQ DAN’s display node. This node was

responsible for receiving the ATM cells from the camera node that contained the pixel information, and for extracting that information so that the raw image data could be displayed to the user. The pixel data received by the display node was presented on the display of a standard PC by writing the information directly to that computer’s video memory. The CPU independence achieved through this approach was in keeping with the basic principle of the DAN.

A PCI interface to a computer was designed and implemented to provide the high

bandwidth, CPU independent interface required by the display node. This interface allowed

the display node to directly access the video memory of the computer’s video card via the

bus mastering ability of the PCI bus. The display node was capable of displaying 640 x 480

pixel, 256 colour images at 25 frames per second.


The display node was proven to be functionally correct, but was constrained in the level of performance it could provide. This constraint was imposed by the limited ability of the video card to accept data: it was shown that the video card used was not capable of accepting full-size video images at 25 frames per second. The constraint was never reached during operation of the UQ DAN, however, as the achievable image size was limited by problems in the network. These problems prevented images wider than 48 pixels from being displayed in full motion.

Future Work

With a fully operational camera node and display node, the implementation of a complete

Desk Area Network is very feasible. The goal of producing a complete DAN would require

the addition of an audio node to complement the existing devices. A signal processing node

could also be added to the system to enable the video or audio signals that the DAN

receives to be ‘massaged’. Video conferencing could then be achieved by duplicating the

DAN and networking the two systems together using an ATM local or wide area network.

This configuration would allow the benefits of the Desk Area Network’s distributed nature

to be fully investigated.

Many enhancements can also be made to the DAN display node. One such enhancement

would be to enable the display node to handle multiple video streams. This feature would

be especially pertinent in group video conferencing situations where more than two parties

are required to communicate concurrently. The addition of multiple data streams requires

that support for multiple windows also be added. This would considerably increase the processing power required by the display node, especially where windows overlap.


8. References

Aravinthan96  T. Aravinthan, “100Mbps ATM DAN Switch”, UQ Undergraduate Thesis, December 1996.

ATM95  ATM Forum, “ATM User Network Interface Specification v3.1”, http://www.atmforum.com, January 1995.

Barham95  P. Barham, M. Hayter, D. McAuley, and I. Pratt, “Devices on the Desk Area Network”, IEEE Journal on Selected Areas in Communications, Vol. 13 No. 4, May 1995, pp722-732.

Blair93  G. Blair, A. Campbell, G. Coulson, F. Garcia, D. Hutchinson, A. Scott, and D. Shepherd, “A Network Interface Unit to Support Continuous Media”, IEEE Journal on Selected Areas in Communications, Vol. 11 No. 2, February 1993, pp264-275.

Ebrahim92  Z. Ebrahim, “A Brief Tutorial on ATM”, March 1992, www-ipg.umds.ac.uk/~dlgh/teaching/atm-tutorial.html

Greaves94  D. Greaves, D. McAuley, L. French, and E. Hyden, “Protocol and Interface for ATM LANs”, The Blue Book, March 1994, http://www.cl.cam.ac.uk/Research/SRG/bluebook/11/protocol_and_interface/protocol_and_interface.html

Gregory96  D. Gregory, “Digital Video over an ATM Desk Area Network”, UQ Undergraduate Thesis, December 1996.

Hopper90  A. Hopper, “Pandora - An Experimental System for Multimedia Applications”, Operating Systems Review, Vol. 24 No. 2, April 1990.

Horowitz89  P. Horowitz and W. Hill, “The Art of Electronics”, 2nd Edition, Cambridge University Press, 1989.

Houth95  H. Houth, J. Adam, M. Ismert, C. Lindblad, and D. Tennenhouse, “The VuNet Desk Area Network: Architecture, Implementation, and Experience”, IEEE Journal on Selected Areas in Communications, Vol. 13 No. 4, May 1995, pp710-721.

PCISIG95  The PCI Special Interest Group, “PCI Local Bus Specification”, Revision 2.1, June 1995.

PCISIG94  The PCI Special Interest Group, “PCI BIOS Specification”, Revision 2.1, August 1994.

Schmidt95  F. Schmidt, “The SCSI Bus and IDE Interface”, Addison Wesley, 1995.

Shanley95  T. Shanley and D. Anderson, “PCI System Architecture”, Third Edition, Mindshare Inc., 1995.

Shanley95b  T. Shanley, “Plug and Play System Architecture”, Mindshare Inc., 1995.


9. Bibliography

Altera, “Data Book”, 1996.

B. Britton and E. Cook, “Design an FPGA-Based PCI Bus Interface”, Electronic Design, Vol. 43 Iss. 5, pp100-105.

B. Davie, “The Architecture and Implementation of a High-Speed Host Interface”, IEEE Journal on Selected Areas in Communications, Vol. 11 No. 2, February 1993, pp228-241.

C. Geber, “Peripheral Component Interconnect (PCI) Interface with the Quicklogic QL16x24B FPGA”, WESCON/94 Idea/Microelectronics Conference Record, pp568-573.

D. Gordon, “The Clock Generation Board and BackPlane V2.0”, ftp://cell-relay.indiana.ed/pub/cell-relay/docs/ftp.cl.cam.ac.uk/93-2

I. Leslie and D. McAuley, “EISA ATM Interface Card”, The Green Book, http://www.cl.cam.ac.uk/Research/SRG/GreenBook

A. Light, “Design a PCMCIA Add-In Card for the PCI Bus”, Electronic Design, Vol. 42 Iss. 24, pp140-146.

T. Moors and A. Cantoni, “ATM Receiver Implementation Issues”, IEEE Journal on Selected Areas in Communications, Vol. 11 No. 2, February 1993, pp254-263.

Motorola, “Fast and LS TTL Data”, Rev 5.

I. Pratt, “The DAN Frame Store”, The Green Book, http://www.cl.cam.ac.uk/Research/SRG/GreenBook

K. Ramakrishnan, “Performance Considerations in Designing Network Interfaces”, IEEE Journal on Selected Areas in Communications, Vol. 11 No. 2, February 1993, pp203-219.

W. Stevens, “Unix Network Programming”, Prentice Hall Software Series, 1990.


10. Appendix A

UQ DAN Device and Switch Port Schematics

This appendix provides the circuit schematics of the physical layer ports for UQ DAN

switch and devices. These schematics were drafted using Protel Advance Schematic 3 for Windows™ and are provided on the diskette in Appendix E.


11. Appendix B

Altera MAX 7000 Development Environment

The MAX 7000 devices used in the construction of the display node were programmed

using the proprietary software package, MAX Plus II, developed by Altera. A design file

can be entered into MAX Plus II in a variety of formats including: hardware description

language, graphical depiction of circuitry or input-output waveform specification. The

majority of the display node’s design was written in the Altera Hardware Description Language (AHDL). This language is similar to conventional programming languages and offers a great deal of flexibility.

Once the design was complete, the MAX Plus II software was used to compile the project

to minimise logic functions and find the optimal manner in which to fit the design into the

specified MAX 7000 device. The software also included a comprehensive simulation

facility in which circuit operation of the design could be tested before actual

implementation. The simulator was found to be very accurate in its timing analysis

compared to that of the implemented circuit and was an extremely useful tool. When the

design had been simulated completely, the MAX Plus II software was used to program the

devices with the design.

The MAX Plus II compiler also provides a number of options to determine the logic that is

synthesised. These options had to be tuned to fit the design files of the display node provided in Appendix E. The ‘Register Packing’ and ‘Automatic I/O Cell Registers’ options had to be turned on to achieve a good fit. The most dramatic benefits, however, were

received by using the ‘One Hot State Machine Encoding’ option. This option assigns a

unique bit to each state in every state machine. The availability of unique bits greatly

reduces the size of logic expressions, and consequently reduces the number of Shareable

Expanders required by the design. It was found, though, that once the state assignments

had been produced, they should be placed directly into the project’s design file if the pin or

logic cell assignments are to be kept.


12. Appendix C

State Transition Diagram for PCI Master

[Figure: the PCI master’s state transition diagram, showing the states Idle_M, Bus_Busy_M, Dr_Bus, M_addr, M_addr2, M_data, M_data2, Backoff_M and Turn_ar_M. Transitions are conditioned on the PCI control signals nFRAME, nIRDY, nTRDY, nSTOP, nDEVSEL, nREQ and nGNT, together with the Time_out and Data_Phase_Timeout counters; the ‘Continue’, ‘End’ and ‘Terminate’ conditions for each data phase are defined as Boolean expressions over these signals.]


13. Appendix D

Operating the Display Node

Setting up the display node for operation is a straightforward task. Simply insert the add-in

card into an unused 5V PCI slot. The card should not be connected to the Desk Area

Network before the PC is powered up: voltages applied to the inputs of an unpowered CMOS device can forward-bias its input protection structures, an effect capable of destroying the device.

When the computer is turned on, the display node can be connected to the network and the

DAN operating system can be run. This is the dangui executable on the diskette in Appendix E. Next, the desired display resolution, frame rate and other options are chosen and the start button is pressed. The camera node must also be present on the network to source the video data.


14. Appendix E

The diskette accompanying this thesis contains the display node’s design information as well as the software to operate the UQ DAN. The following is included on the disk:

• Altera Directory

→ mas_tar.tdf - the display node’s major design file

→ mas_tar.acf - the compiler options and resource assignments used in the design

→ parity.tdf - the parity generating module for the PCI interface

→ atmlink.gdf - the design file for the ATMlink layer

→ xmitter.gdf - the ATMlink layer’s cell transmitter

→ receiver.gdf - the ATMlink layer’s cell receiver

→ bytecnt.tdf - a byte counter module

→ cellcnt.tdf - a module to count cells in the FIFOs

• DAN GUI Directory

→ pcibios - code to call the PCI BIOS routines

→ pci_cnfg - module to access the PCI bus’ configuration registers

→ pcidisp - performs functions required to configure and access the display node

→ pci - routines to access devices on the PCI bus

→ crdtst - software to test the PCI conformance of a PCI device

→ Other software modules to implement the UQ DAN’s graphical user interface

• PCB Directory

→ pci.pcb - the Traxedit PCB layout file

→ pci.lib - the Traxedit library file of components used in pci.pcb