Video Over an ATM
Desk Area Network
Brendan Behan
Supervised By Dr Mark F. Schulz
The University of Queensland
Department of Electrical and Computer Systems
Engineering
Undergraduate Thesis 1996
20/44 Brisbane St,
TOOWONG QLD 4066
1st November, 1996
The Dean,
Faculty of Engineering,
The University of Queensland,
ST LUCIA QLD 4072
Dear Professor Simmons,
In partial fulfilment of the requirements of the Bachelor of Engineering Degree (Honours) in
Electrical Engineering (Computer Systems), I submit for evaluation the thesis entitled
“Video Over an ATM Desk Area Network”.
Yours faithfully,
Brendan Behan.
Acknowledgments
I would like to thank the other two members of the UQ DAN project team, David Gregory
and Thillainathan (Ted) Aravinthan for their support in realising an operational Desk Area
Network. The eternal optimism of David Gregory was always inspiring at times when the
situation appeared bleak.
Thanks must also go to Dr Mark Schulz for providing the means to allow the UQ DAN team
to pursue this topic, and to Len Payne for his guidance throughout the year and the use of
his equipment.
Finally, I am indebted to Guillan Fava and Kieran Behan for their patience and willingness
to preview this document, and the suggestions that they made.
B.B.
Abstract
The use of multimedia data in computing applications has increased exponentially since its
emergence. In many cases, the increase is such that traditional computers struggle to handle
the associated volume of data. This thesis attempts to show, through demonstration, that
such problems can be alleviated by basing the design of the computer around a high speed
network.
The Desk Area Network (DAN) is a physically small, yet fast, Asynchronous Transfer Mode
(ATM) network that can be used to replace the bus of a traditional workstation. Such a
replacement provides a great improvement in the system’s ability to handle multimedia data
as it removes the bottleneck associated with a bus architecture. ATM was chosen as the
network technology as it is capable of handling a variety of data types, and it allows
concurrent data transfers between devices.
This thesis outlines the design and implementation of a display node for the University of
Queensland Desk Area Network. The display node is one of three devices in the UQ DAN
project. Its purpose is to display full motion, colour video that is placed onto the network
by the UQ DAN’s camera node. A Peripheral Component Interconnect (PCI) bus interface
was constructed to allow this video signal to be displayed on a computer screen.
Table of Contents
1. Introduction..................................................................................................................1
2. ATM and the DAN.......................................................................................................5
2.1 Synchronous Communication....................................................................................5
2.2 Asynchronous Communication..................................................................................6
2.3 Asynchronous Transfer Mode...................................................................................7
2.3.1 ATM Protocol Specifics.....................................................................................7
2.3.2 Categorisation of Service Requirements.............................................................9
2.4 Desk Area Network..................................................................................................9
2.4.1 The Cambridge DAN.......................................................................................10
2.4.2 The VuNet Desk Area Network.......................................................................12
2.4.3 DAN Simplifications........................................................................................13
3. The University of Queensland DAN..........................................................................14
3.1 UQ DAN’s ATM Protocol......................................................................................14
3.2 The OSI Model.......................................................................................................15
3.3 The DAN Physical Layer........................................................................................16
3.4 The DAN Data Link Layer.....................................................................................18
3.5 Operation of the UQ DAN......................................................................................19
4. The Display Node.......................................................................................................21
4.1 Display Node Requirements....................................................................................21
4.2 Implementation Technology....................................................................................22
4.3 Receiver Structure..................................................................................................23
4.3.1 Processing Control Cells..................................................................................25
4.3.2 Pixel Address Calculations...............................................................................25
4.3.3 Processing Data Cells.......................................................................................27
4.4 The Transmitter’s Structure....................................................................................28
5. The PCI Interface.......................................................................................................29
5.1 Computer Buses.....................................................................................................29
5.2 The PCI Bus...........................................................................................................31
5.3 The UQ DAN’s PCI Interface.................................................................................34
5.3.1 Address Decoding............................................................................................36
5.3.2 Configuration Registers....................................................................................37
5.3.3 Software Support.............................................................................................40
6. Results.........................................................................................................................42
6.1 Display Node Implementation.................................................................................42
6.1.1 Board Implementation......................................................................................43
6.2 The Physical Layer’s Performance..........................................................................44
6.3 PCI Interface Performance......................................................................................44
6.3.1 Average Grant Latency....................................................................................45
6.3.2 Response Time of Video Card..........................................................................45
6.4 Receiver Performance.............................................................................................46
6.4.1 Maximum Achievable Resolution.....................................................................47
6.5 UQ DAN’s Performance.........................................................................................48
7. Conclusions.................................................................................................................51
8. References...................................................................................................................53
9. Bibliography...............................................................................................................55
10. Appendix A...............................................................................................................56
11. Appendix B...............................................................................................................59
12. Appendix C...............................................................................................................60
13. Appendix D...............................................................................................................61
14. Appendix E...............................................................................................................62
List of Figures
Figure 1.1 A Traditional Bus System and the Desk Area Network.....................................1
Figure 1.2 The Desk Area Network Developed at UQ.......................................................3
Figure 2.1 Division of Transmission Time as Frames and Timeslots...................................5
Figure 2.2 ATM Cell Format.............................................................................................8
Figure 2.3 The Cambridge Desk Area Network...............................................................11
Figure 2.4 The VuNet Switch and Operational DAN.......................................................12
Figure 3.1 The University of Queensland DAN................................................................14
Figure 3.2 The OSI Seven Layer Model and ATM...........................................................16
Figure 3.3 Physical Layer Port Connections.....................................................................18
Figure 4.1 Overview of the Display Node’s Structure......................................................21
Figure 4.2 Comparison of Maximum Bus Throughputs....................................................22
Figure 4.3 Flowchart of Receiver's Operation..................................................................24
Figure 4.4 Format of Control Cell Information.................................................................25
Figure 4.5 Pixel Address Calculation...............................................................................26
Figure 4.6 The Transmit State Machine...........................................................................28
Figure 5.1 Configuration of Devices on the PCI Bus........................................................31
Figure 5.2 The Required and Optional PCI Signals..........................................................32
Figure 5.3 Timing of Interface Signals in a Write Transaction..........................................33
Figure 5.4 The Display Node's Datapath..........................................................................35
Figure 5.5 Timing of Address Decoding..........................................................................36
Figure 5.6 Defined PCI Configuration Registers..............................................................38
Figure 6.1 The UQ DAN’s Display Node........................................................................43
Figure 6.2 The Wire-Wrapping of Pin Connections..........................................................43
Figure 6.3 Measured Response of the Video Card...........................................................46
Figure 6.4 The UQ DAN’s Operation on Demonstration Day, 1996.................................49
Figure 6.5 UQ DAN's Interface in Patchwork Mode........................................................50
Table of Nomenclature
AAL ATM Adaptation Layer
ATM Asynchronous Transfer Mode
BIOS Basic Input Output System
BISDN Broadband Integrated Services Digital Network
CLP Cell Loss Priority
CMOS Complementary Metal Oxide Semiconductor
CPU Central Processing Unit
CRC Cyclic Redundancy Check
DAN Desk Area Network
EPLD Electronically Programmable Logic Device
FIFO First-In First-Out
HEC Header Error Check
I/O Input / Output
ISA Industry Standard Architecture
ISDN Integrated Services Digital Network
Mb/s Megabits per second - 1,000,000 bits/sec
MIT Massachusetts Institute of Technology
OSI Open Systems Interconnect
PCI Peripheral Component Interconnect
PQFP Plastic Quad Flat Pack
PTI Payload Type Identifier
QoS Quality of Service
STM Synchronous Transfer Mode
UNI User-Network Interface
UQ University of Queensland
VCI Virtual Circuit Identifier
VESA Video Electronics Standards Association
VPI Virtual Path Identifier
Page 1
Chapter One
1. Introduction
‘Multimedia’ is one of the latest buzzwords in the computing field, and its arrival has
brought about a whole new generation of computing applications. Day-to-day computing is
being revolutionised by the accessibility of the internet and the emergence of multimedia
applications that are able to bring the internet to life. Video-conferencing, interactive video
and internet phone systems are all examples of multimedia applications that enable people to
communicate with distant acquaintances, or increase their knowledge through the use of
their computer. However, even though the types of data that people are processing have
changed dramatically over the past decade, the computers used to perform this processing
have remained the same.
This thesis investigates a novel computer architecture that is destined to make as big an
impact on traditional computer design as multimedia has on the computer industry in
general. The architecture in question is known as the ‘Desk Area Network’, or DAN for
short. The Desk Area Network was born out of the necessity to build a computer that was
better equipped to handle multimedia data. Conventional computers strain under the load
of having to display a full motion video signal and play the accompanying audio necessary
to take part in a video conference. The DAN, however, is able to handle both of these data
streams with ease, and still dedicate all of its processing power to other applications.
[Figure 1.1 A Traditional Bus System and the Desk Area Network: a CPU, Main Memory,
Video Card and Network Interface attached to a single shared bus]
The key to the DAN’s success is that it replaces the ‘bus’ architecture of a traditional
computer with a high-speed network. Figure 1.1 shows the configuration of both systems
and highlights the limitation that bus architectures incur. This is that the system bus can
only be used by one device at a time. In the example illustrated, the CPU is not able to fetch
data from memory while a video image is coming from the network to the video card.
In the Desk Area Network, devices are effectively removed from within the computer and
connected directly to a network that allows simultaneous communication between multiple
devices. Removing the computer’s bus consequently removes the single biggest bottleneck
that faces the movement of multimedia data in the computer. The choice of network
technology to implement the DAN is critical to the system’s performance as not all
networks are capable of allowing simultaneous communication.
The Desk Area Network constructed uses an Asynchronous Transfer Mode (ATM)
network to connect individual devices. ATM is a high speed network protocol that is
increasing in popularity worldwide due to its acceptance as the basis of the next generation
of telecommunication networks by the International Telecommunications Union (ITU).
Networks that implement the ATM protocol receive the benefits of its ability to handle
many different forms of data very efficiently. Also, the use of ATM switching technology
to connect devices together provides the simultaneous communication between devices
sought, a feat which cannot be achieved in bus systems or on shared media networks like
Ethernet.
To demonstrate the concept of a Desk Area Network, an ATM network was developed that
incorporated a camera node (Gregory96) to source a form of multimedia data, an ATM
switch (Aravinthan96) to implement the ATM network, and a display node to receive the
multimedia data and display it to the users. The interconnection of these devices is
illustrated in Figure 1.2. This thesis discusses the design, implementation and resulting
performance of the DAN’s display node and the physical transmission system of the ATM
network. The display node was required to accept the digital video from the camera and
display it to the users without any CPU intervention. This goal was achieved by designing
and implementing a PCI bus interface to the PC through which the display node could write
the video data.
Thesis Structure
Some background information essential to understanding the concept of the DAN and the
design decisions made during the course of its implementation is provided in Chapter Two.
This includes basic theory on the manner in which ATM operates and a discussion of
existing DAN implementations.
The third chapter presents a detailed description of the Desk Area Network implementation
at the University of Queensland, and the simplifications that could be made due to the
nature of the DAN. The principle of layering network protocols is then discussed before
providing a detailed analysis of the design of the ATM network’s physical layer. Following
this is a brief description of the data link layer, and a discussion of the network’s operation.
Chapter Four of this thesis covers the design of the DAN’s display node. In particular, it
provides an overview of the different components involved in the display node’s operation,
and a closer examination of two vital components: the ATM cell receiver and the cell
transmitter. It also describes the interaction of these components with the computer
interface.
[Figure 1.2 The Desk Area Network Developed at UQ: Camera Node, ATM Switch,
Display Node]

The computer interface in the UQ DAN’s display node was achieved via the Peripheral
Component Interconnect (PCI) bus. The fifth chapter of this document is dedicated to a
basic discussion of the operation of this bus architecture and the reasons for its selection as
the computer interface for the display node. This is followed by the design choices made in
implementing the PCI interface for the DAN.
The performance metrics of individual components of the display node are stated in Chapter
Six, along with a summary of the performance of the DAN as a whole. The final chapter
provides a summary of the design project and the work remaining before a ‘complete’ DAN
is realised.
Page 5
Chapter Two
2. ATM and the DAN
This chapter begins with a discussion of synchronous and asynchronous communication
networks, highlighting the differences between the two and the advantages offered by an
asynchronous system. The ATM protocol is then described and its ability to handle
multimedia traffic streams noted. Finally, the concept of a Desk Area Network is explored
and existing implementations are outlined.
2.1 Synchronous Communication
The telecommunication networks that form a ubiquitous part of today’s society are based
on synchronous transmission systems. These systems were designed to handle voice grade
data optimally. Synchronous networks were designed to take advantage of the constant bit
rate of audio signals and the large delays between transmissions, to maximise the utilisation
of a communication link. The transmission time of a connection link is divided into fixed
size frames, each comprising a fixed number of equal-sized timeslots, as shown in Figure
2.1. When a user establishes a connection, that user is allocated a time slot within the frame
in which data can be transmitted. This timeslot is reserved for the user in every frame that is
transmitted, for the duration of the connection. This enables the network to determine
which connection is currently transmitting by examining the timeslot in the current frame.
Consequently, the time allocation is distributed to users in a round-robin fashion, and is
‘synchronised’ for each user to a position in the transmitted frame.
[Figure 2.1 Division of Transmission Time as Frames and Timeslots: successive frames
each divided into slots 1 to N, with the synchronisation point for User 2 recurring at the
same slot position in every frame]
This system is extremely efficient when handling voice grade signals due to their constant
bandwidth requirements. Problems arise, however, when less traditional data transmissions
such as file transfers are also using the system. A file transfer is very ‘bursty’ in its
bandwidth requirements. When a file block is ready, it is transferred at a very high rate,
but a considerable delay may then pass before the next block is ready for transmission.
To allow the synchronous network to cater for the burst rates of the file transfer,
the transfer may be allocated several timeslots in the frame. The problem that synchronous
networks encounter is that in the idle periods between transfers, the file transfer is still
allocated its quota of timeslots in every frame. These timeslots must then go unused as the
network is not able to allocate them to other users. Asynchronous communication systems
were devised to overcome this inefficient use of bandwidth.
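The waste described above can be made concrete with a small numeric sketch (the traffic figures below are invented purely for illustration): a synchronous network must reserve each source’s peak slot requirement in every frame, so the file transfer’s slots travel empty between bursts.

```python
# Illustrative traffic only: slots' worth of data each source has per frame.
voice = [1, 1, 1, 1, 1, 1]        # constant bit rate: one slot every frame
file_xfer = [4, 0, 0, 4, 0, 0]    # bursty: four slots, then two idle frames

# A synchronous network must reserve the peak requirement in every frame.
reserved_per_frame = max(voice) + max(file_xfer)   # 5 slots, every frame
used = [v + f for v, f in zip(voice, file_xfer)]
wasted = sum(reserved_per_frame - u for u in used)

print(reserved_per_frame, wasted)   # 5 slots reserved; 16 slot-times wasted
```

An asynchronous network would hand those sixteen idle slot-times to other connections instead of carrying them empty.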
2.2 Asynchronous Communication
Asynchronous communication systems were developed to overcome the inadequacies of
traditional telecommunication networks. Such systems dispense with the frame structure
that inhibits transmissions in synchronous networks, and allocate timeslots to users on an ‘as
required’ basis. This approach ensures that no timeslots are wasted due to their allocation
to data sources during periods in which the data source is not prepared for transmission.
Another limitation of synchronous networks is that the maximum number of connections
that the network can support is equal to the number of timeslots in the frame. In this
situation, one timeslot is reserved for each connection and the network is saturated. With
asynchronous networks, however, it is possible to support more connections than an
equivalent synchronous network. This is accomplished through a statistical analysis of the
data streams and use of the assumption that not all bursty data sources will burst at the
same time. Additional buffering must be added to the network to allow for the contingency
when multiple sources do burst at the same time. The network utilisation is obviously much
higher in the asynchronous network, making it a far more efficient communication system.
A consequence of the system’s asynchronous nature, however, is that it is no longer
possible to use the time that data was transmitted to determine the connection to which that
data belongs. For this reason, a ‘tag’ is attached to all data sent through the network to
identify its respective connection. The network is responsible for associating the
connection’s source and destination to its tag when the connection is established.
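As a sketch of the tagging scheme just described (the connection names and tag values are invented for illustration), the receiving end uses the table built at connection establishment to demultiplex arriving data:

```python
# Hypothetical connection table built when each connection is established:
# tag -> connection name.
connections = {7: "voice-call", 21: "file-transfer"}

# Tagged data units arrive interleaved; the tag identifies the connection.
arrivals = [(21, b"block-0"), (7, b"sample-0"), (21, b"block-1")]

streams = {}
for tag, payload in arrivals:
    # Look up the owning connection and append in arrival order.
    streams.setdefault(connections[tag], []).append(payload)

print(streams)   # each connection's data recovered, in order
```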
2.3 Asynchronous Transfer Mode
Asynchronous Transfer Mode, or ATM, is a communication protocol that incorporates the
features of asynchronous networks. The number of ATM network implementations is
increasing rapidly due to its acceptance for the Broadband Integrated Services Digital
Network (BISDN) by the International Telecommunication Union (ITU), previously known
as the CCITT. BISDN is a high bandwidth digital network that is being established as the
major network standard of the future for carrying not only voice data, but also computer
and High Definition TV traffic.
The basic unit of information exchange in ATM is referred to as the ‘cell’. An ATM cell is
a 53 byte packet of data as shown in Figure 2.2. Of the 53 bytes, the first five bytes
comprise the tag field known as the ‘header’, and the remaining 48 bytes constitute the data
being sent in the cell, referred to as the cell’s ‘payload’. It is the small cell size that permits
ATM to function effectively with a variety of different data types. This is because a small
cell size allows the network to multiplex the data with a fine granularity. This ensures that
urgent data won’t be stalled in the network waiting for a large block of file transfer to finish
transmitting, effectively allowing a priority scheme to be established.
2.3.1 ATM Protocol Specifics
ATM is a connection oriented protocol that guarantees in-order delivery of cells. Delivery
is unreliable, however, so additional hardware or software is required if reliable transfer is
needed. The ATM format for the User-Network Interface (UNI) has only recently been
standardised (ATM95) by the ATM Forum, and many of the fields contained in the header
remain ill-defined. The format of the cell is shown in Figure 2.2. The tag field used is split
into two components: the Virtual Path Identifier (VPI) and the Virtual Circuit Identifier
(VCI). The VPI field in the header is used to determine the virtual path of a cell between
two endpoints in the network. This value is only valid between two ATM switches in the
network and is remapped by each switch before being sent on. The virtual circuit identifier
is used by the endpoints to determine the connection to which the cell belongs. Many VCIs
can be multiplexed onto the same virtual path through the network.
The Generic Flow Control (GFC) bits are undefined and must be set to zero. A Cell Loss
Priority (CLP) bit is included to provide information to determine which cells should be
discarded when the network becomes congested. This bit is cleared for higher priority cells.
The fifth byte in the header is the Header Error Check (HEC) which contains an eight bit
CRC check of the header to attempt to detect bit errors in the routing information.
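As an illustrative sketch (not part of the original hardware design), the standard UNI header layout can be unpacked in a few lines of Python; the field widths follow the UNI format of GFC 4 bits, VPI 8 bits, VCI 16 bits, PTI 3 bits, CLP 1 bit and HEC 8 bits:

```python
def parse_uni_header(header: bytes) -> dict:
    """Unpack the five byte ATM UNI cell header into its fields."""
    assert len(header) == 5
    gfc = header[0] >> 4                                     # 4 bits
    vpi = ((header[0] & 0x0F) << 4) | (header[1] >> 4)       # 8 bits
    vci = ((header[1] & 0x0F) << 12) | (header[2] << 4) | (header[3] >> 4)
    pti = (header[3] >> 1) & 0x07                            # 3 bits
    clp = header[3] & 0x01                                   # 1 bit
    hec = header[4]                                          # CRC-8 of bytes 0-3
    return {"GFC": gfc, "VPI": vpi, "VCI": vci,
            "PTI": pti, "CLP": clp, "HEC": hec}

print(parse_uni_header(bytes([0x12, 0x34, 0x56, 0x78, 0x9A])))
```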
The final three bit field is the Payload Type Identifier (PTI). Of these three bits, the first bit
is set when the cell contains control information and cleared when the payload contains
standard data. The second bit in this field is reserved (set to zero) and the final bit
is used for the ATM Adaptation Layer (AAL). As all data must be encapsulated into the 48
byte payload of a cell for transmission, a system must be incorporated to handle the
Segmentation and Reassembly (SAR). The ATM Adaptation Layer achieves this purpose.
The significance of the AAL bit depends upon which of the many adaptation layers defined
is being used. The AAL5 adaptation layer standard is regarded as being the simplest and
most efficient adaptation layer to implement (Greaves94). In this standard, the AAL bit is
set to indicate that the cell received is the final cell in a block of data. This final cell
contains a count of the number of cells transmitted in the AAL5 block and a 32 bit Cyclic
Redundancy Check (CRC) of all the cells in the block. The cell count is provided to
determine whether the correct number of cells was received from the network and the CRC
is used to detect bit errors and cell reordering errors.

[Figure 2.2 ATM Cell Format: a five byte cell header (GFC - Generic Flow Control,
VPI - Virtual Path Identifier, VCI - Virtual Circuit Identifier, PTI - Payload Type
Identifier, CLP - Cell Loss Priority, HEC - Header Error Check) followed by a 48 byte
cell payload]
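The count-plus-CRC check carried by the final AAL5 cell can be sketched as follows; this is an illustration only, using Python’s `zlib.crc32` as a stand-in for the 32 bit CRC (AAL5 uses the same CRC-32 polynomial, though the trailer framing details are omitted here):

```python
import zlib

def check_aal5_block(cells, expected_count, expected_crc):
    """Validate a reassembled block using the final cell's count and CRC."""
    if len(cells) != expected_count:
        return False                       # cells lost or duplicated in transit
    crc = zlib.crc32(b"".join(cells)) & 0xFFFFFFFF
    return crc == expected_crc             # bit errors or reordering change the CRC

cells = [b"a" * 48, b"b" * 48]             # two 48 byte payloads
crc = zlib.crc32(b"".join(cells)) & 0xFFFFFFFF

print(check_aal5_block(cells, 2, crc))                   # True
print(check_aal5_block(list(reversed(cells)), 2, crc))   # False: reordered
print(check_aal5_block(cells[:1], 2, crc))               # False: cell lost
```

Note how the cell count catches a lost cell while the CRC catches reordering, matching the two failure modes described above.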
2.3.2 Categorisation of Service Requirements
There are typically two dimensions used to classify the service requirements of different
types of data transfers. These are the data’s sensitivity to delay and its sensitivity to loss.
Audio or video communication streams, for example, can tolerate the occasional cell being
lost while travelling through the network but cannot tolerate variable length delays in
transmission. This is because there is a low likelihood of an observer noticing the slight
deterioration in sound or video quality when a cell is lost, but a high probability of user
annoyance when sound segments are delayed by varying amounts. Such data streams are
also termed ‘jitter sensitive’. File transfers, however, are unaffected by delays through the
system but the loss of a single cell can render the received file useless.
ATM is capable of catering for all data types by providing a ‘Quality of Service’ (QoS)
parameter that can be adjusted for different transmissions. The Quality of Service
parameter is akin to a contract that is negotiated between the user and the network when a
connection is established. The user must specify details about the connection’s destination,
peak and average bandwidth requirements, and cell loss and delay requirements at this time.
If the network can provide the service quality that the user requires then the connection is
established, otherwise negotiations either continue for a lesser connection or no network
connection is established at all. The QoS guarantees that ATM can provide to many
different traffic types are one of the primary reasons why ATM is ideal for use on the Desk
Area Network.
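The negotiation just described can be caricatured as a toy admission check; the link capacity and the peak-bandwidth-only test are illustrative assumptions (a real network also considers average rate, loss and delay requirements):

```python
# Hypothetical link capacity, for illustration only.
LINK_CAPACITY_MBPS = 100

def admit(active_peak_mbps, requested_peak_mbps):
    """Accept a new connection only if its peak bandwidth still fits."""
    if active_peak_mbps + requested_peak_mbps <= LINK_CAPACITY_MBPS:
        return "established"
    return "renegotiate or reject"

print(admit(80, 15))   # fits under capacity: established
print(admit(80, 30))   # would exceed capacity: renegotiate or reject
```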
2.4 Desk Area Network
The Desk Area Network is a multimedia network intended to replace the bus in a traditional
workstation. Traditional buses have the limitation that they can only be used by a single
device at any time. This limitation becomes far more apparent when dealing with
continuous multimedia streams such as video or audio. Such data streams require very little
or no processing but require large amounts of data transfer and, consequently, are termed
‘I/O intensive’. Attempts have been made to create ‘multimedia network interfaces’
(Blair93, Hopper90) that offload the bulk of this traffic from the workstation’s bus.
The specialised network interfaces developed take the form of autonomous peripherals that
must be connected directly to the network. The DAN architecture avoids the necessity of
additional multimedia network interfaces by removing the I/O bottleneck that the traditional
workstation bus imposes.
Implementing a Desk Area Network involves removing all components from the
workstation and connecting them directly to the ATM network. That is, the workstation’s
CPU, memory, display, storage devices, and any multimedia peripherals are all removed
from the workstation and become nodes on the DAN. The term ‘multimedia peripheral’
refers to any device, such as a camera, that can source or sink continuous multimedia data.
By connecting these devices directly to the ATM switch, different pairs of devices are able
to communicate with each other simultaneously. This means that the switch can route a
video stream directly from the camera to the display at the same time as sending a file from
the storage device to memory. It also means that I/O intensive streams that overwhelm the
traditional workstation can now be moved between endpoints on the network without any
CPU intervention. Two different DAN implementations exist to demonstrate the potential
of this architecture.
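The simultaneous-transfer property can be sketched with a toy crossbar arbiter (the device names are illustrative): in one switching cycle, any set of transfers that share no input and no output can proceed together, unlike on a bus where only one transfer can proceed at all.

```python
# Requested transfers for one switching cycle (device names illustrative).
requests = [("camera", "display"), ("storage", "memory"), ("cpu", "display")]

granted, busy_in, busy_out = [], set(), set()
for src, dst in requests:
    # Grant a transfer only if both its endpoints are still free this cycle.
    if src not in busy_in and dst not in busy_out:
        granted.append((src, dst))
        busy_in.add(src)
        busy_out.add(dst)

print(granted)   # first two proceed concurrently; 'display' is already in use
```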
2.4.1 The Cambridge DAN
The configuration of the Cambridge Desk Area Network (Barham95) is illustrated in Figure
2.3. This figure clearly shows the manner in which different streams can pass through the
switch simultaneously. Each interface to the ATM switch was capable of supporting full
duplex data transfers at 100Mb/s. With this data rate, the DAN was able to support two
live video streams, one 48kHz audio stream, one transfer of a processed image and a
considerable amount of memory traffic. It was also recognised that, for the DAN to be a
feasible approach to implementing a multimedia workstation, the cost of the network
interface for each device must not be substantially greater than the cost of producing an
equivalent bus interface. Cambridge’s implementation of the DAN proved this to be the
case.
The DAN research group categorised the devices attached to the DAN into one of three
classes depending upon their ability to perform network functions. The three classifications
used are: dumb nodes, supervised nodes and smart nodes. Dumb nodes are only capable of
handling data and loading internal configuration registers from a well defined configuration
cell. Supervised nodes possess a certain degree of local processing power to enable them to
perform limited network operations. These nodes still require a more complete processing
node to establish connections in the network. The third class, the smart nodes, have
sufficient processing power to perform the network management functions not only for
themselves, but also for dumb and supervised nodes. All data and network management
operations on the DAN are performed predominantly in hardware.

Figure 2.3 The Cambridge Desk Area Network
2.4.2 The VuNet Desk Area Network
VuNet is the DAN implementation by a research group at Massachusetts Institute of
Technology (MIT). Although the VuNet’s network architecture is equivalent to that of the
Cambridge DAN where multimedia devices are connected directly to an ATM network, the
implementation differs. The VuNet Desk Area Network implements as much functionality
as possible in software (Houth95). This approach was taken to allow the performance of
the network to increase with time, as the performance of future generations of workstations
increases. The ATM switch used for the VuNet project is shown in Figure 2.4, along with
the DAN created by inserting multimedia devices into this switch.
Figure 2.4 The VuNet Switch and Operational DAN
Although the VuNet’s performance will increase with upgrades in processor performance,
the software approach has yielded throughputs far below design specifications. Each
network interface in this DAN implementation was capable of 500 Mb/s full duplex data
transfers, five times that of the Cambridge DAN. The sustained throughput of data
achieved during testing was reduced to 37 Mb/s due to software contention for resources
and process switching. This throughput is less than that achieved using the specialised
network interfaces of the Pandora system (Hopper90).
2.4.3 DAN Simplifications
The two DAN implementations described utilised certain simplifying assumptions to aid the
implementation. The first of these was that the Desk Area Network occupied a small
physical space. Consequently, it was feasible to transmit the network data in parallel, a
solution that is typically too expensive to implement in larger networks. The Cambridge
DAN used eight bit wide data paths and the VuNet either 32 or 64 bit data paths. Parallel
transmission of data reduces the network clocking frequency required to achieve a certain
throughput by the degree of parallelism implemented, and results in simpler and cheaper
network interfaces.
A second simplification was that all devices connected to the DAN can be trusted not to
behave to the detriment of the DAN, just as any device connected to the bus of a
workstation can be assumed to be non-hostile. This greatly reduces the security
precautions required when compared to a similar Local Area Network. Combining this
assumption with a final assumption, that the network topology is static for any period of
operation, allows a range of services from access control and fairness policies to congestion
control and topology discovery to be ignored. The majority of these network operations
are sufficiently complex to require a microprocessor to be included on the interface.
Reductions in the complexity and production costs of the network interfaces for the devices
can then be achieved by not supporting these operations.
Chapter Three
3. The University of Queensland DAN
The Desk Area Network being implemented at the University of Queensland is
architecturally much simpler than that at either Cambridge or MIT. To demonstrate the
benefits of a Desk Area Network, a four port switch was developed along with a source and
sink of video data. A diagrammatic overview of this system is given in Figure 3.1. The
ATM Network developed was capable of 100 Mb/s full duplex communication.
Discussed first in this chapter are the deviations from ATM specifications of the UQ DAN’s
ATM protocol. The concept of layered communications protocols is then explained, as well
as how ATM fits into this picture. The UQ DAN’s physical and data link layers are then
described before the UQ DAN’s operation is explained.
Figure 3.1 The University of Queensland DAN (camera node and display node connected to the ATM switch)
3.1 UQ DAN’s ATM Protocol
Using the Desk Area Network assumption of reliability, the ATM protocol used on the UQ
DAN was modified to use 52 byte cells. This was achieved by removing the Header Error
Check byte from the cell header. This feature is computationally expensive and extremely
difficult to implement in hardware. It is also unnecessary if it is assumed that no errors will be
encountered in data transmissions through the network.
A second simplification was made to ease routing. Whereas a traditional ATM network
requires dynamic connection establishment and Quality of Service negotiation, the DAN
was capable of assigning these parameters statically. The static connection requirement is
analogous to the traditional computer bus in that, if a sound card is installed when the
computer is booted, it can be assumed that the card will be present for the duration of the
computer's operation. Similarly, any device connected to the port of a switch can be
assumed to stay connected for the duration of the network's operation. Addressing of
devices in the network was then simplified by using the port number on the switch that
connects to the destination device as the VPI in the cell header. Also, every device
connected to the DAN has a very well defined bandwidth requirement. The QoS available
to a device will subsequently not change during the network’s operation since new devices
cannot be added during the DAN’s operation and the current traffic patterns are invariant.
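The VPI-as-port-number scheme makes routing a table-free lookup. The sketch below illustrates the idea; it is not the thesis's hardware implementation, and the bit positions assume the standard UNI header layout (GFC:4, VPI:8, VCI:16) minus the HEC byte, which the thesis does not spell out here.

```python
def route_cell(cell: bytes, output_ports: dict) -> None:
    """Forward a 52-byte UQ DAN cell to the output port named by its VPI."""
    assert len(cell) == 52, "UQ DAN cells are 52 bytes (no HEC byte)"
    # Assumed field position: the 8-bit VPI spans the low nibble of byte 0
    # and the high nibble of byte 1, as in the UNI cell header format.
    vpi = ((cell[0] & 0x0F) << 4) | (cell[1] >> 4)
    output_ports[vpi].append(cell)

ports = {n: [] for n in range(4)}          # a four-port switch
header = bytes([0x00, 0x20, 0x00, 0x00])   # VPI = 2 under the assumed layout
route_cell(header + bytes(48), ports)      # cell lands in port 2's queue
```

Because the VPI directly names the destination port, the switch needs no connection table at all, which is exactly what the static-topology assumption buys.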
3.2 The OSI Model
The OSI (Open Systems Interconnection) model developed by the International Standards
Organisation provides a theoretical basis to the construction of a communication protocol.
It uses a seven layered system where each layer provides a slightly greater level of
abstraction than the underlying layer. The layers of this model are shown in Figure 3.2.
Also shown in this figure is the ATM protocol which does not fit cleanly into the OSI
model. The ATM protocol best fits into the level two data link layer, but as it provides
support for end-to-end connection, flow control and routing at the cell level, it also
incorporates some of the functionality of higher level OSI layers (Ebrahim92).
The ATM Adaptation Layer sits above the ATM layer and best coincides with level three of
the OSI model as it performs message reassembly from incoming cells. The fourth layer of
the OSI model, the transport layer, is the first layer in the model that guarantees reliable
delivery of information. Since the ATM protocol does not provide this functionality,
additional support (typically software) must be added at this level.
Figure 3.2 The OSI Seven Layer Model and ATM
3.3 The DAN Physical Layer
The physical layer of the OSI model represents the layer at which electrical signals are
transmitted through the network and are detected as bits by the receiver. It is at this layer
that the throughput measures are quoted in network implementations. A 100 Mb/s full
duplex network was required for the UQ DAN. This was achieved using eight bit wide data
paths for each direction of data transfer. Whereas this data rate would require at least a
100 MHz serial communication link, the parallel implementation required only a 12.5 MHz
network clock frequency to achieve the same performance.
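The arithmetic behind the 12.5 MHz figure is simply throughput divided by path width; a small sketch makes the trade-off explicit:

```python
def required_clock_mhz(throughput_mbps: float, width_bits: int) -> float:
    """Clock frequency needed to carry a given throughput over a parallel path."""
    return throughput_mbps / width_bits

uq_dan = required_clock_mhz(100, 8)   # 8-bit paths: 12.5 MHz suffices
serial = required_clock_mhz(100, 1)   # a serial link would need 100 MHz
```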
Noise immunity is a serious consideration when transmitting information at these
frequencies. Experience with high speed SCSI devices suggests that shielded twisted-pair
cable should be used for any external connections operating at greater than 5 MHz
(Schmidt95). Twenty-five pair twist-and-flat cable was consequently used for the network
transmission medium to limit the cross-talk between transmitted signals. Each twisted pair
of the cable was assigned one signal line and a ground line.
A second important consideration with dealing with high frequency communications is the
transmission line effects of the cables. These effects cause signals that travel down the cable
to be reflected back if there is a discontinuity in the resistance at the cable’s termination.
Superimposition of reflected signals with the true signal can cause errors in bit detection.
To combat these effects, a standard passive terminator was added to the end of the signal
line of each pair. This terminator involves a 220Ω pull-up resistor and a 330Ω pull down
resistor. The parallel combination of these two resistances produces a terminating
resistance of 132Ω, which is close enough to the 105Ω characteristic impedance of the
twisted pair cable to minimise reflections (Horowitz89).
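The terminator values can be checked directly: the Thevenin equivalent of the pull-up and pull-down resistors is their parallel combination, and the mismatch against the cable's characteristic impedance gives the fraction of the signal reflected.

```python
def parallel(r1: float, r2: float) -> float:
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)

r_term = parallel(220, 330)                   # 132 ohms seen by the line
# Reflection coefficient against the 105 ohm twisted-pair cable:
reflection = (r_term - 105) / (r_term + 105)  # roughly 0.11, i.e. ~11%
```

An 11% reflection superimposed on a full-swing LS-TTL signal sits comfortably inside the 0.8V Schmitt-trigger hysteresis of the receivers described below, which is why the mismatch was acceptable.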
With the design of the signal path completed only the drivers and receivers for the network
interface needed to be chosen. Having brought the network frequency back to 12.5 MHz
through the use of parallel data transmission, inexpensive 74LS series devices were suitable
for this purpose. The twisted pair cables used had a capacitance of 51.5pF/m, or
approximately 52pF for the one metre lengths between devices. The 74LS240 inverting
octal line driver was tested by the manufacturer with a load of 45pF. The rated times were
therefore approximately valid due to the similar operating conditions. The receiver was
implemented with 74LS14 hex inverting Schmitt triggers. These devices have a typical
hysteresis level of 0.8V, which again improves noise immunity.
At 12.5 MHz, the network clock had a cycle time of 80ns. The line drivers had a maximum
propagation delay of 18ns measured at full load, and the receivers had a maximum
propagation delay of 22ns. Adding in the expected 8ns propagation delay through the one
metre twisted-pair cable, the maximum propagation delay of the signal between devices was
calculated at 48ns. This was still less than the clock cycle time of the network but greater than
the half cycle time, which prevented the network from being clocked by a single central
clock. To avoid clock skew through the network, each port was assigned a twisted-pair
over which it would transmit its own network clock. All data coming into a receiver would
then be synchronised to the received network clock with negligible clock skew. The
network clocks transmitted by all the devices were of the same frequency but differed in
phase.
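The timing budget above can be restated as two inequalities, which is what forces the source-synchronous clocking scheme:

```python
CYCLE_NS = 1e3 / 12.5                         # 80 ns cycle at 12.5 MHz
delays = {"driver": 18, "cable": 8, "receiver": 22}
total = sum(delays.values())                  # 48 ns end to end

assert total < CYCLE_NS        # the signal settles within one full cycle...
assert total > CYCLE_NS / 2    # ...but not within a half cycle, so a single
                               # central clock cannot be used; each port must
                               # transmit its own clock alongside its data.
```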
The final design of the physical layer for both device ports and switch ports is given in
Appendix A. These port designs differed for device and switch ports to prevent the
necessity of twisting the network cables that connect the transmit port of one device to the
receive port of another device. An illustration of such a connection is shown in Figure 3.3.
A parity signal was also included with each data port to provide a limited degree of error
detection. The ‘Connect’ signal was provided on each port to power an LED when a
connection is established with another port.
Figure 3.3 Physical Layer Port Connections (eight Rx and eight Tx data pairs, plus parity, valid, clock and connect signals, assigned across the 25 twisted pairs)
3.4 The DAN Data Link Layer
The remaining signals in the 25 pair cable were assigned for use by the data link layer,
referred to as the ATMlink layer (Gregory96). The ‘Valid’ signals used by the data link
layer were asserted by a transmitting port to allow the receiver to detect incoming cells.
The transmitter held this signal asserted for the 52 byte transmission. The receiver then
used the valid signal as an enable to latch incoming data.
The ATMlink was also responsible for providing a cell level abstraction to the higher level
components of the system. Both the inputs and outputs of each port were buffered using
receive and transmit FIFOs respectively. Placing a layer of buffering between the device
and the network allowed the devices to operate independently of the network clock. The
added buffering also enabled the ATMlink to provide the cell level abstraction by delaying
the deassertion of a ‘FIFO Empty’ signal until at least one complete cell was available in the
receive FIFO. The ATMlink prevented system components from reading bytes of
incomplete cells by controlling the Enable inputs of the FIFOs.
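The cell-level abstraction can be summarised behaviourally: the 'empty' flag is held asserted until a whole cell is buffered, so readers never observe a partial cell. The class below is a software sketch of that contract, an assumption for illustration rather than a model of the actual FIFO hardware.

```python
class ReceiveFifo:
    """Behavioural sketch of the ATMlink receive FIFO's cell abstraction."""
    CELL_BYTES = 52

    def __init__(self):
        self._buf = []

    def push(self, byte: int) -> None:
        """Network-clock side: bytes arrive one at a time."""
        self._buf.append(byte)

    @property
    def empty(self) -> bool:
        """Deasserted only once a complete 52-byte cell is available."""
        return len(self._buf) < self.CELL_BYTES

    def read_cell(self) -> list:
        """Device-clock side: whole cells only, never partial reads."""
        assert not self.empty
        cell, self._buf = self._buf[:self.CELL_BYTES], self._buf[self.CELL_BYTES:]
        return cell
```

The buffering also decouples the two clock domains: `push` runs at the network clock and `read_cell` at the device clock, with the `empty` flag as the only synchronisation point.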
3.5 Operation of the UQ DAN
The interaction of devices on the Desk Area Network depended largely on the category in
which different devices fall. Using Cambridge’s nomenclature to classify devices in the UQ
DAN, the video node was regarded as a dumb node and the display was regarded as a smart
node. The video node’s purpose in the network was to satisfy requests for video data. All
the information required by the video node to satisfy the request was contained in the request
itself. This included network considerations such as the VPI and VCI to which the video
data was to be sent as well as video parameters such as frame rate, resolution, colour,
contrast, etc. The display node achieved its ‘smart’ status by containing enough processing
power to assemble this information to configure the video node’s operation.
The fundamental unit of video data transferred across the network was the video line. The
video line was an arbitrary length row of pixels extracted from the scan-line of the video
signal. The number of pixels contained in this line depended upon the horizontal resolution
of the requested video stream. The UQ DAN used a variation of the AAL5 standard to
perform the segmentation of scan-lines into ATM cells at the video node, and the
reassembly from cells back to scan-lines at the display node. The video node prefixed the
transmission of a scan-line with a control cell. The control cell detailed whether an odd or
even field was about to be sent and the line number in that field. It also specified the
number of data cells that comprise that scan-line. The final data cell in the scan-line block
had its AAL bit asserted to indicate that it was the final cell. A CRC of the transmitted
scan-line was not included in this cell due to the non-critical nature of the data and the
reliability assumption of the DAN.
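The segmentation scheme just described can be sketched as follows. The tuple layouts and flag names are illustrative assumptions; the thesis's actual control cell format is given later in Figure 4.4, and only the structure (one control cell, then data cells, last one carrying the AAL bit, no CRC) is taken from the text.

```python
CELL_BYTES, PAYLOAD = 52, 48

def segment_line(pixels: bytes, line_no: int, even_field: bool):
    """Split one digitised scan-line into a control cell plus data cells."""
    n_cells = -(-len(pixels) // PAYLOAD)            # ceiling division
    # Control cell: field parity, line number, and the data-cell count.
    cells = [("control", even_field, line_no, n_cells)]
    for i in range(n_cells):
        chunk = pixels[i * PAYLOAD:(i + 1) * PAYLOAD]
        aal_last = (i == n_cells - 1)               # AAL bit on final cell only
        cells.append(("data", aal_last, chunk.ljust(PAYLOAD, b"\x00")))
    return cells
```

Note there is no trailing CRC cell: consistent with the DAN reliability assumption, the final data cell's AAL bit alone marks the end of the scan-line.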
In summary, for video data to be sent across the network, the display node generated a
request and sent it to the switch with the VPI set to the port number of the video node.
Upon receiving the cell, the switch examined the VPI and routed the cell to the video
node’s port for transmission. The video node then received the cell and processed the
request by setting internal registers with the parameters the cell contained. The first line of
the video source was then scanned and digitised by the video node to meet the required
parameters. A control cell specifying the line number and the number of scan-line cells was then sent to
the switch using the return VPI provided in the request cell. The switch examined this VPI,
which was the port number of the display node, and routed the data to the display. The
display used the control cell to calculate the pixel address of the scan-line’s start point and
latched the number of data cells per line. The video node then sent the digitised scan-line to
the display node one cell at a time, which then displayed the image on the screen.
Chapter 4
4. The Display Node
The design of the display node was segmented into several independent components. These
were the PCI interface, the ATM data-link layer, a cell receiver and a cell transmitter. Of
these, all but the data-link component were implemented as finite state machines. The
relationship between these components of the display node is shown in Figure 4.1.
Figure 4.1 Overview of the Display Node’s Structure (PCI interface, ATMlink layer, cell receiver and cell transmitter between the PCI bus and the ATM network)
4.1 Display Node Requirements
The DAN display node’s function was to accept the digitised images from the video node
and display them to the user. Being a ‘smart’ node, it was also responsible for configuring
other devices in the network. Two requirements of the display node’s design were that it
must be able to sink data at the network rate of 100Mb/s and that it must be able to display
the images without any CPU intervention.
Rather than attempting to implement a stand-alone display controller, the display node was
designed to use the monitor of a PC. The required data rate of the display node limited the
computer interface options to either a Peripheral Component Interconnect (PCI) or VESA
Local (VL) bus. Figure 4.2 provides a graphical illustration of the maximum throughputs
that can be achieved using different bus architectures. A PCI interface was chosen, as the
PCI bus was a truly platform and processor independent bus. This then allowed the
creation of a display node that was capable of operating on a greater number of systems.
CPU independent operation could be achieved using this bus due to its bus mastering
capability. Bus mastering involves a device, known as the ‘bus master’, claiming ownership
of the bus and reading data from or writing data to an address that it specifies.
Figure 4.2 Comparison of Maximum Bus Throughputs (ISA 8-bit, ISA 16-bit, EISA, MCA, VL and PCI)
4.2 Implementation Technology
The display node’s implementation technology was chosen before detailed design
commenced. The MAX 7000 family of Erasable Programmable Logic Devices
(EPLDs) from Altera Corporation was found to be a suitable technology. The MAX 7000
family provides EPLDs with 600 to 5000 usable gates and 44 to 256 flip-flops on a
single chip. A consistent propagation delay equal to the speed grade of the device is
experienced through every logic element. MAX 7000 devices are available in speed grades
from 5ns to 20ns.
The choice of implementation technology became the single greatest restriction during the
design phase, as it limited the number of bits of storage available and the complexity of logic
expressions used. Appendix B provides more information on the use of the MAX 7000
EPLDs and the associated Max Plus II software.
4.3 Receiver Structure
A flowchart of the cell receiving state machine is shown in Figure 4.3. This state machine
remained in its idle state, Receive_Wait, until the ATMlink layer indicated that the receive
FIFO was no longer empty. This occurred when the ATMlink layer deasserted its rx_empty
signal. Care had to be taken when interfacing to the ATMlink as all of the logic elements it
contained were either asynchronous or synchronised to the network clock. To avoid timing
problems, the rx_empty signal needed to be latched to synchronise it to the display node’s
clock. The latched signal, nRx_Empty_reg1, could then be used in the receive state machine
without encountering setup or hold time difficulties.
When the receive state machine left its Receive_Wait state to begin processing the incoming
cell, it first had to pass through the Set_Rx_Enable state. This state was introduced to satisfy
the setup time of the FIFO’s read enable signal. The additional state was only required
prior to the first read from the FIFO, as an extra byte was read at this time; on subsequent
reads, the Altera devices always had the next byte available on their inputs.
Special attention was then required at the end of the cell to ensure that a 53rd byte would
not be read. This was achieved by deasserting the FIFO’s read enable signal whenever the
count of bytes read from the FIFO equalled 52.
The receive state machine behaved differently when processing a control cell than when
processing a data cell. After the first four bytes of the cell, the cell header, had been read
out of the FIFO, the receive state machine entered the Decode_Header state to determine
whether the cell was a control or data cell. This was done by inspecting the control bit of
the Payload Type Identifier in the most significant byte of the header. If this bit was set, the
cell being received was a control cell and the receive state machine moved to the
Receive_Control state. Otherwise, the state machine progressed to the Reset_Cell_Num
state to process a data cell.
1 Signal names prefixed by an ‘n’ are active low signals.
Figure 4.3 Flowchart of Receiver’s Operation
4.3.1 Processing Control Cells
The Receive_Control state was used to latch the information the video node sent about the
next line it would transmit. The format of the next four bytes that contained this
information is shown in Figure 4.4. When the three bytes of pertinent information had been
read into the 32 bit receive register, internal to the Altera device, the receive state machine
extracted the required information. The number of data cells in the following scan line was
latched into the four bit cell_number register. The Cell Error, Data Error and Parity Error
bits were provided by the video node for diagnostic purposes only and were disregarded by
the display node.
Figure 4.4 Format of Control Cell Information (bytes 5–8: 8-bit line number, Ev/Od bit, line number MSB, error flags, and a 4-bit count of data cells per line)
The absolute line number of the following line was also assembled at this time from the first
two bytes of control information. The Ev/Od bit in the control information determined
whether the line number referred to the even or odd field of the image. Fields occur
because the video node transmitted frames in an interlaced format. This meant that all the
even numbered scan-lines, called the even field, were sent first and then all the odd numbered
scan-lines, called the odd field, followed.
4.3.2 Pixel Address Calculations
Once the control information had been latched at the end of the Receive_Control state, the
state machine progresses to the Set_Line_Addr state. It was here that the state machine
calculated the pixel address of the start of the next line. The pixel address was broken into
three components that had to be summed together. The first of these components was the
base address of the video card in memory. To facilitate the simple operation of the
receiver’s addressing scheme, the video card was programmed to implement linear
addressing. This meant that all video memory could be accessed exactly like main memory.
When linear addressing is implemented, the base address of the video card is the address of
the first pixel.
Figure 4.5 Pixel Address Calculation (pixel address = base address + line offset + pixel offset)
The final two components in assembling a pixel address were the offset from the base
address to the start of the current line and the offset of the pixel from the start of that scan-
line, as shown in Figure 4.5. These are the two components of the pixel address that were
set in Set_Line_Addr state. As a control cell precedes the beginning of a scan-line, the next
pixel data should be written at the beginning of the new line, and the pixel offset variable
must be reset.
Two variables were used in the setting of a line address: line_count and line_offset. The
line count variable stored the line number of the scan-line that was currently being
addressed. Setting the address of the beginning line then involved incrementing the
line_count variable and adding the length of a display line to the line_offset variable until the
line_count variable equalled the line number supplied in the control cell. When this
condition was true, the line_offset variable was guaranteed to hold the offset from the base
address to the start of the line by virtue of its synchronisation with line_count. Also, if the
line_count variable was greater than the target line number, both it and line_offset were
reset to begin the counting from zero.
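The lockstep increment of line_count and line_offset can be captured in a few lines. This is a software paraphrase of the state machine's behaviour, not the hardware itself; the function signature and the reset-on-overshoot handling are assumptions consistent with the description above.

```python
def line_start_address(base: int, target_line: int, line_bytes: int,
                       line_count: int = 0, line_offset: int = 0) -> int:
    """Walk line_count and line_offset forward together until the target
    line is reached; line_offset then holds the byte offset of that
    scan-line from the video card's base address."""
    if line_count > target_line:        # counted past the target: reset both
        line_count = line_offset = 0
    while line_count < target_line:
        line_count += 1
        line_offset += line_bytes
    return base + line_offset
```

Because the two variables only ever change together, the invariant line_offset == line_count * line_bytes holds throughout, which is the synchronisation the text relies on.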
4.3.3 Processing Data Cells
Processing of data cells was performed differently to that of control cells. After decoding
the header to determine that the cell was indeed a data cell, the receive state machine
progressed to the Reset_Cell_Num state. This state was used to reset the cell_number
variable if the data cell received was the final data cell in the scan-line. In this case, the
AAL5 bit of the Payload Type Identifier in the cell would be set. If the data cell was not the
final cell in the scan-line, the cell_number variable was decremented instead. This variable
then effectively stored the number of data cells remaining in the current scan-line. If this
variable equalled zero when a data cell was received, the receive state machine would
discard the data cell by moving from the Decode_Header state to the Dump_Cell state.
Counting cells in this manner ensured that if the final data cell of a scan-line and the control
cell at the beginning of the next line were both lost in the network, no data would
be written to the screen until the next control cell was received to adjust the counters.
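The cell_number bookkeeping can be sketched as a small state holder. The method names and the "dump"/"write" return values are illustrative assumptions; the reset, decrement and discard rules follow the description above.

```python
class CellCounter:
    """Sketch of the receiver's data-cell counting and loss recovery."""

    def __init__(self):
        self.cell_number = 0        # data cells remaining in current line

    def on_control(self, cells_in_line: int) -> None:
        """A control cell reloads the count for the coming scan-line."""
        self.cell_number = cells_in_line

    def on_data(self, aal_last: bool) -> str:
        """Returns 'write' for an expected data cell, 'dump' for a stray one."""
        if self.cell_number == 0:
            return "dump"           # corresponds to the Dump_Cell state
        # Final cell (AAL bit set) zeroes the count; otherwise decrement.
        self.cell_number = 0 if aal_last else self.cell_number - 1
        return "write"
```

After a lost final cell or lost control cell, the counter either sits at zero (dumping stray data) or runs down to zero early; either way, writes resume only once the next control cell reloads it.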
The writing of data to the screen was a simple task from the receiver’s viewpoint. The
PCI interface wrote data to video memory four bytes at a time and performed this action
twice in every write transaction. Hence, the receiver read in a double word (four bytes) of
data into the 32 bit receive register and moved to the Request_Write state. The PCI
interface then issued a request to the system for permission to write data and remained in
this state until the request was granted. When granted, the PCI interface then addressed the
next pixel location in video memory to which the data was to be written, whilst the receiver
copied the contents of the receive register to another 32 bit register. The receiver was
therefore able to read in the next double word of data while the PCI interface was writing
the current double word. When the PCI interface had finished writing the data and the
receiver had finished loading the next double word, the PCI interface was prompted to
complete the transaction by writing the next four bytes. The receiver waited for the second
write operation to complete before returning to the Read_Word state to read any remaining
data, or to the Receive_Wait state if the entire cell had been processed.
4.4 The Transmitter’s Structure
The transmitter had a much simpler implementation than the receiver. This state machine
was only required to write the double words of data received by the PCI interface into the
transmit FIFO. The least significant byte of the double word was written to the FIFO first
to maintain a consistent byte ordering. The flow chart of the Transmit state machine is
shown in Figure 4.6. Two paths were required from the Tran_Byte3 state as it was possible
that another double word was received during this state. If this was the case, the state
machine returned to the Tran_Byte0 state to output the new double word to the transmit
FIFO. Otherwise, the state machine returned to the idle state.
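The least-significant-byte-first ordering of the Tran_Byte0..3 states amounts to little-endian serialisation of each 32-bit double word:

```python
def dword_to_fifo_bytes(dword: int) -> list:
    """Serialise a 32-bit double word least significant byte first,
    matching the order the Tran_Byte0..3 states write to the FIFO."""
    return [(dword >> (8 * i)) & 0xFF for i in range(4)]
```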
Figure 4.6 The Transmit State Machine
The ATMlink layer counted the number of bytes written into the transmit FIFO by this state
machine. When fifty-two bytes had been written, the ATMlink asserted the Valid signal to
transmit the cell to the switch. The transmitter consequently assumed that the data being
received had been formatted into 52 byte cells by the system software.
Chapter 5
5. The PCI Interface
This chapter discusses bus architectures and the place of PCI in this context. It discusses
the advantages and disadvantages of PCI buses and the motivations for using PCI in
implementing the DAN's display node. A discussion of the design of an ATM - PCI
interface follows, and the chapter is concluded with remarks about the implementation of
this type of interface.
5.1 Computer Buses
The bus of a computer is an interconnection device that allows the different components of
the computer to interact. Physically, the bus is merely a series of parallel wires to which a
limited number of connectors can attach. The earliest computer buses like the Industry
Standard Architecture, or ISA, bus were completely under the control of the computer’s
CPU. The only possible form of communication on the bus was between the CPU and
another device, and it always took the form of a processor initiated read or write.
Interrupts allowed the devices residing on the bus to signal the CPU that some specific
action was required. This prevented the CPU from continually polling the devices checking
for the appropriate conditions that indicated action was necessary. However, a large
latency is still encountered in servicing the interrupt and a substantial overhead is incurred in
having to switch to the appropriate service routine to perform the function.
A more sophisticated solution to this problem is to use Direct Memory Access (DMA). A
DMA transfer involves a device reading or writing to or from a contiguous block of
memory. The transfer between the device on the bus and system memory still must be
initialised in software. Once initialised, though, the device can request additional reads and
writes at any time, until the transfer has completed. In PC systems, up to half of the
processor cycles can be allocated to performing the DMA transfer which dramatically
increases the throughput that can be achieved by the device. Again this approach generally
requires the use of interrupts and software such as device drivers to service the interrupts
and initialise each DMA transfer.
Bus architectures that support bus mastering avoid these problems by allowing the devices
attached to the bus to initiate data transfers. All devices attached to these buses fall into
two categories: masters and slaves, also known as initiators and targets respectively. An
arbitration scheme must be implemented to support multiple bus masters. When a bus
master wishes to use the bus, it issues a request to the bus arbiter. Several bus masters
can issue requests simultaneously, which forces the bus arbiter to implement some form
of scheduling, usually on a priority basis, to avoid bus contention. After applying the
scheduling algorithm, the bus arbiter grants the request of one of the bus masters to allow
that device to use the bus. When that bus master has completed the transaction, the bus
becomes idle and the arbiter is able to process the next set of requests and grant the use of
the bus to another device.
As control of this resource is now no longer handled by the operating system, transactions
on the bus can be conducted without any CPU intervention. Any bus master is therefore
able to communicate with any slave without having to go through the processor. A bus
mastering implementation is a more expensive solution due to the extra hardware required
to perform arbitration. It does, however, promote much more efficient communication in
that the latency for a device to read or write data is reduced, as is the frequency with which the
processor must be interrupted or switched to another task. These were the reasons that a
bus mastering architecture was chosen for the display node: the high bandwidth requirement
of the video data could not be met by a 16 bit ISA bus, and the delay sensitive nature of
that data could not easily be satisfied by an implementation requiring software support.
5.2 The PCI Bus
The PCI bus is one such architecture that provides bus mastering capability. It was
developed by Intel Corporation to extend the advantages offered by the VESA Local bus to
devices other than video graphics adaptors. Multiplexed address and data lines allow PCI
devices to maintain a relatively low pin count, with 49 pins required for a bus master and 47
pins required for the slave. The additional two pins required by the master are the request
and grant lines.
Every transaction on the PCI bus is a ‘burst transfer’. A burst transfer consists of an
address phase followed by one or more data phases. The PCI specification (PCISIG95)
allows one data phase to be completed on each bus cycle during a burst transfer. On a
33 MHz implementation this produces a peak throughput of 132 Mbytes per second. The
PCI specification also details 32 and 64 bit PCI implementations at frequencies of up to
66 MHz which enable an absolute maximum transfer rate of 528 Mbytes/sec. The
12.5 Mbytes/sec requirement of the display node can be easily satisfied on any of these PCI
systems.
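These peak figures follow directly from completing one data phase on every bus cycle; a minimal check of the arithmetic:

```c
#include <assert.h>

/* Peak PCI throughput: one data phase per cycle, so the rate is
 * simply the bus width in bytes multiplied by the clock frequency. */
unsigned long long peak_bytes_per_sec(unsigned long long clock_hz,
                                      unsigned width_bytes)
{
    return clock_hz * width_bytes;
}
```

At 33 MHz and 32 bits this gives 132 Mbytes/sec, and at 66 MHz and 64 bits, 528 Mbytes/sec, both comfortably above the display node's 12.5 Mbytes/sec requirement.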
[Figure omitted: block diagram of the test system. The CPU attaches to the PCI bus
through a PCI adaptor that also serves the memory subsystem; the display node, the video
controller (driving a CRT) and an ISA bridge to the ISA bus sit on the PCI bus; the
display node connects to the ATM network.]

Figure 5.1 Configuration of Devices on the PCI Bus
The display node’s PCI interface implemented both a master and a target state machine,
and therefore required all 49 of the bus master signals. The master state machine was
required to write the video information received from the ATM network to the video
controller, and the target state machine was required to accept data written to the display node from the
CPU. The video controller illustrated in Figure 5.1 is a PCI slave device. Consequently,
the display node’s PCI interface was able to address this slave device and write the pixel
data directly to the video controller’s memory independently of the CPU. Figure 5.1 also
illustrates that the CPU was only connected to the PCI bus through a specialised PCI
adaptor. This device was connected to the CPU’s secondary cache to control any memory
references made by the CPU to either the memory subsystem or to a device on the PCI bus.
Very few signals were required to implement the PCI protocol. The six ‘interface control’
signals shown in Figure 5.2 provide all the information necessary to control the outcome of
a bus transaction. Signals in this figure that have their names prefixed with an ‘n’ character
are active low signals. To explain the use of the interface signals an example of a write
transaction on the PCI bus is given in Figure 5.3. It is assumed that in the cycles before
those illustrated in Figure 5.3, the master had requested the use of the bus from the bus
arbiter and had just been granted permission.

Figure 5.2 The Required and Optional PCI Signals
The transaction begins in clock cycle one with the master asserting the nFRAME signal and
driving the address of the target it wishes to write to onto the bus. On the rising edge of
clock two, all target devices on the bus sample the nFRAME signal asserted and realise that
a valid address is on the bus. They latch this address and compare it to their own base
address stored in an internal register. Also in the second clock cycle, the master drives the
data it wishes to write onto the AD bus and the byte enables for the data onto the nCBE
bus. The byte enable signals indicate which bytes of data on the 32 bit AD bus contain valid
data. The master asserts the initiator ready (nIRDY) signal to indicate that that valid data is
present and that it is ready to conclude the first data phase of the transaction. The master
leaves the nFRAME signal asserted, indicating that this is not the final data phase in the
transaction.
[Figure omitted: waveforms of CLK, nFRAME, nIRDY, nTRDY, nDEVSEL, the Address/Data bus
and the nCBE bus over clock cycles one to six, showing the address and command phases
followed by two data phases with their byte enables, and marking where the transaction
begins and where each data phase completes.]

Figure 5.3 Timing of Interface Signals in a Write Transaction
In this example, the target in question realised that it was the target of the transaction by the
end of the second clock cycle. It consequently asserted its device selected (nDEVSEL)
signal on the rising edge of clock cycle three to inform the master that it had decoded the
address. It also asserted the target ready (nTRDY) signal to indicate that it was ready to
take part in the first data phase. The data phase therefore completed on the rising edge of
clock cycle four as both the master and the target were ready, with nIRDY and nTRDY both
asserted. At this point the target latched the data and byte enables, and both devices
deasserted their respective ready signals as they prepared for the next data phase. A device
typically deasserts its ready signal if it needs to insert a wait cycle into the transaction. This
could be to allow the master to fetch more data or to allow the target to respond to a full
buffer condition. If neither device in this example had deasserted its ready signal, the second
data phase would have completed at the end of clock cycle four.
By the rising edge of the fifth clock cycle both devices were again ready to take part in
another data phase and so asserted their ready signals. The master drove the data it wished
to write onto the AD bus and the byte enables for the data onto the nCBE bus. The master
also deasserted the nFRAME signal at this time to indicate that this will be the final data
phase of the transaction. At the rising edge of clock cycle six, both devices sampled the
nIRDY and nTRDY signals asserted and realised that the second data phase had completed.
The target latched the data that the master was driving, and ended the transaction by
deasserting the nDEVSEL signal when it deasserted its ready signal.
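The target's side of this handshake can be sketched as a small state machine. The model below is a simplification for illustration, not the thesis's Altera design: signals are treated as active-high flags (the real nFRAME, nIRDY, nTRDY and nDEVSEL lines are active low), the target never inserts wait states, and address decoding completes in one cycle.

```c
#include <assert.h>

/* Simplified PCI target model. The target claims any address whose
 * upper four bits match its base register, asserts nDEVSEL and nTRDY
 * on the following cycle (a medium decoder with no wait states), and
 * latches data whenever both ready signals are asserted. */
typedef struct {
    int xfer;             /* 0 = idle, 1 = transaction in progress   */
    unsigned base4;       /* programmed base address, bits 31..28    */
    unsigned latched[8];  /* data captured on completed data phases  */
    int count;
    int devsel, trdy;
} pci_target;

/* advance the target by one rising clock edge */
void target_clock(pci_target *t, int frame, int irdy, unsigned ad)
{
    if (!t->xfer) {
        t->devsel = t->trdy = 0;
        /* address phase: nFRAME asserted with a matching address */
        if (frame && (ad >> 28) == t->base4)
            t->xfer = 1;                  /* claim on the next cycle  */
    } else {
        t->devsel = t->trdy = 1;          /* ready on every data phase */
        if (irdy)
            t->latched[t->count++] = ad;  /* data phase completes      */
        if (!frame && irdy)
            t->xfer = 0;                  /* final data phase: release */
    }
}
```

Driving this model through an address phase and two data phases, with nFRAME deasserted on the final phase, latches both double words and returns the target to idle, mirroring the transaction of Figure 5.3.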
5.3 The UQ DAN’s PCI Interface
An efficient PCI interface is one that takes advantage of burst transactions to minimise the
overhead associated with arbitration latency, the address phase and the address decoding of
the target device. This was made difficult due to the different data rates of the ATM
network and the PCI bus. A data phase on the PCI bus requires four bytes to be written at
a time. Fewer bytes can be written by invalidating the unwanted bytes via the nCBE bus but
again this is an inefficient use of the bus bandwidth.
As there was only a byte wide data path from the Altera devices to the receive FIFO, and
this FIFO was clocked using the PCI clock, data could only be sourced from the FIFO at one
quarter of the rate at which the PCI bus can sink it. This made burst transfers difficult to
achieve: 32 bit registers were too expensive to implement in terms of the available
resources of the EPLDs, yet a certain degree of buffering was required to overcome the
differences in data rates.
The display node designed was capable of burst transactions with two data phases. This
was achieved by using the delay encountered in the address phase, address decoding and
first data phase of the transaction to load the second double word. The first double word
was moved to the transmit register, Tx_reg, while the second double word was loaded.
This could not be done until the master state machine had taken control of the PCI bus as a
processor write to the display node could occur in the meantime, overwriting the contents
of the transmit register. The datapath of the display node, shown in Figure 5.4, illustrates
the manner in which the data loaded from the receive FIFO can be moved around the
internal storage elements to allow a burst transfer.
[Figure omitted: the display node’s datapath within the Altera device. Byte wide data from
the ATM network passes through the Receive FIFO (Rx_Data[7..0]) into Rx_reg[31..0] and
on to Tx_reg[31..0]; A_D_Reg[31..0], C_BE_reg and PAR_reg drive the PCI A_D[31..0],
CBE[3..0] and PAR lines through the A_D_oe, C_BE_oe and PAR_oe output enables; the
AD[31..28] register latches the upper address lines and pixel_addr supplies the target
address; the Transmit FIFO (Tx_Data[7..0]) carries data back to the ATM network.]

Figure 5.4 The Display Node's Datapath
The amount of time spent in prelude to the first data phase depended largely on the address
decoding speed of the target. Assuming that the video controller required one clock cycle
to decode its address phase, this approach would mean that the fourth byte of the new
double word would be loading when the first data phase completed. A single wait cycle
was required in this case to load the new data into the A_D_reg register so that it could be
placed on the bus. A mandatory wait cycle was therefore inserted between the master state
machine’s data phases to reduce the conditional logic required. The master was then
prepared to finish the second data phase on the next cycle. The display node is
consequently able to write two double words in three cycles giving a peak transfer rate of
88 Mbytes/sec when using a 33 MHz PCI clock. The design files for the PCI interface are
provided in Appendix E and a detailed state transition diagram of the master state machine
is provided in Appendix C.
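The quoted 88 Mbytes/sec follows from moving two double words (eight bytes) in every three clock cycles:

```c
#include <assert.h>

/* Sustained write rate for a burst of `dwords` double words
 * completed in `cycles` bus clocks. */
unsigned long long burst_rate(unsigned long long clock_hz,
                              unsigned dwords, unsigned cycles)
{
    return clock_hz * dwords * 4 / cycles;   /* 4 bytes per dword */
}
```

Two double words per three cycles at 33 MHz yields 88 Mbytes/sec, two-thirds of the single-phase-per-cycle peak of 132 Mbytes/sec.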
5.3.1 Address Decoding
Every PCI device must have a programmable base address register to enable dynamic
relocation of that device’s address space. This is required in accordance with the ‘Plug and
Play’ goals of the PCI bus. When a valid address is placed on the bus, the device compares
that address to the contents of its base address register to determine if it is the target of the
PCI transaction. The specification allows devices up to three clock cycles to perform this
function and uses the time taken to classify devices as being either fast, medium or slow
address decoders. The timing of asserting the nDEVSEL signal for different classes of
devices is shown in Figure 5.5. If the transaction is not claimed within the three cycle
period, it is claimed by the subtractive decoder which assumes that the address is for the
ISA bus.
[Figure omitted: waveforms of CLK, nFRAME, nIRDY and nTRDY over clock cycles one to
seven, marking the clock edges at which fast, medium, slow and subtractive decoders assert
nDEVSEL, and the no response case in which no device acknowledges the transaction.]

Figure 5.5 Timing of Address Decoding
The display node joined the majority of commercial PCI products, including the video
controller interfaced to the DAN, in being classified as a medium speed decoder. This class
of devices latches the address at the beginning of clock cycle two in Figure 5.5, when
nFRAME is sampled asserted, and is ready to assert the nDEVSEL signal by the rising
edge of clock cycle three. The number of bits that the device must decode depends on the
size of the address space it implements. That is, if the device required a 2 Gbyte address
space, it would only have to decode the uppermost bit of the 32 bit address, whereas if a
one byte address space were required, the device would be forced to decode all 32 address
bits.
To minimise the number of storage bits used by the base address register, and the logic
required to perform address comparisons, the display node was assigned an unnecessarily
large address space. The 256 Mbyte address space allocated to the display node required
only four bits of storage in the base address register and a second four bit register,
AD[31..28], to latch the upper lines of the address bus for the comparison.
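With a 256 Mbyte space, the decode reduces to matching AD[31..28] against the four-bit base address register. A sketch of the comparison (the function name is illustrative):

```c
#include <assert.h>

/* With a 256 Mbyte address space, only the upper four address lines
 * need to be compared against the base address register. */
int display_node_claims(unsigned addr, unsigned base_addr_reg4)
{
    return (addr >> 28) == (base_addr_reg4 & 0xFu);
}
```

A base register value of 0xD, for instance, claims every address in the range D0000000h to DFFFFFFFh.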
5.3.2 Configuration Registers
Systems that incorporate a PCI bus have three address spaces. As well as the memory and
I/O address spaces implemented by the majority of workstations, the PCI bus additionally
implements a configuration space. The configuration space contains 64 double words of
data to configure devices. The first 16 of these double words are PCI specific, and are
shown in Figure 5.6, while the latter 48 are for device specific purposes. Of the 16 double
word PCI configuration header, all devices on the bus must implement at least the
mandatory configuration registers shown. These registers allow devices present on the bus
to be configured and enabled or disabled as required. It is necessary that all devices on the
bus respond to configuration accesses at all times in order for the BIOS to determine system
requirements and configure devices during startup. The system motherboard includes a
unique line, the IDSEL signal, to each device on the bus for the purpose of addressing that
device’s configuration space. A physical communication path is required as devices have no
address space assigned that they can decode during a system reset. For a detailed
description of the PCI configuration registers and configuration transactions, refer to
Shanley95.
[Figure 5.6 reconstructed as a table; the original figure also marks which of these are the
required configuration registers.]

Double Word    Contents (byte 3 down to byte 0)
    00         Device ID | Vendor ID
    01         Status Register | Command Register
    02         Class Code | Revision ID
    03         BIST | Header Type | Latency Timer | Cache Line Size
    04         Base Address 0
    05         Base Address 1
    06         Base Address 2
    07         Base Address 3
    08         Base Address 4
    09         Base Address 5
    10         CardBus CIS Pointer
    11         Subsystem ID | Subsystem Vendor ID
    12         Expansion ROM Base Address
    13         Reserved
    14         Reserved
    15         Max_Lat | Min_Gnt | Interrupt Pin | Interrupt Line

Figure 5.6 Defined PCI Configuration Registers
The majority of the mandatory configuration registers shown in Figure 5.6 were able to be
hardcoded in the display node’s implementation. Hardcoded values were stored in the
EPLDs in the connecting logic between registers and therefore do not consume any bits of
storage in the devices. Only the Status and Command Registers required programmable
bits. Of the hardcoded register values, the Vendor ID and Device ID registers were the only
registers to have a non-zero value. Assigning a zero value to the Class Code register
implied that the device was developed before class codes were introduced. This was done
as the logic required to implement handling of configuration accesses could be greatly
reduced by handling as many of the registers as possible in an identical fashion, by returning
a value of zero.
The number of bits of the command and status registers implemented was also minimised to
reduce the amount of resources consumed on the Altera devices. Only two bits of the
command register and five bits of the status register were implemented. These were:
Command Register
• Bit 1 - ‘Memory Access Enable’ When set, this bit indicates that the device is to decode
memory addresses on the bus to respond to memory accesses.
• Bit 2 - ‘Master Enable’ This bit must be set for a device to act as a bus master.
Status Register
• Bits 9 & 10 - ‘Device Select Timing’ These bits encode the slowest timing of the
assertion of the nDEVSEL signal. This field is hardcoded to 01b to indicate that the
display node is a medium speed decoder.
• Bit 12 - ‘Received Target Abort’ This bit is set by the bus master whenever a bus
transaction is terminated by the target before it reaches successful completion.
• Bit 13 - ‘Received Master Abort’ This bit is set by the bus master whenever it is forced
to terminate a bus transaction because no target decoded the address presented in the bus
master’s address phase.
• Bit 15 - ‘Detected Parity Error’ This bit should be set by a device whenever it detects a
parity error on the bus.
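The implemented bits can be expressed as masks, and enabling the display node then amounts to setting the two command bits. The macro and function names below are illustrative, but the bit positions are those listed above.

```c
#include <assert.h>

/* Implemented PCI command register bits */
#define CMD_MEM_ENABLE     (1u << 1)   /* respond to memory accesses */
#define CMD_MASTER_ENABLE  (1u << 2)   /* allow bus mastering        */

/* Implemented PCI status register bits */
#define STAT_DEVSEL_TIMING (1u << 9)   /* 01b = medium speed decoder */
#define STAT_TARGET_ABORT  (1u << 12)
#define STAT_MASTER_ABORT  (1u << 13)
#define STAT_PARITY_ERROR  (1u << 15)

/* enable the display node for memory decoding and bus mastering */
unsigned enable_display_node(unsigned command_reg)
{
    return command_reg | CMD_MEM_ENABLE | CMD_MASTER_ENABLE;
}
```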
A final simplification made to reduce implementation complexity was to avoid checking
parity. The PAR signal on the PCI bus is used to convey even parity over the
address/data and byte enable buses. All PCI devices must generate parity when data is
driven onto the bus, but not all devices need to check the parity when receiving data. The
display node was consequently designed only to produce parity for master writes and reads
from the configuration space. Parity errors are not checked on data written to the target.
This design choice benefits the display node’s implementation as parity checking of received
data would require either duplication of the logic intensive parity module, or at least much
more complex inputs to the module. The latter option may not be suitable if the logic is so
complex that it must be broken into several layers. In that case, the time to produce a valid
parity signal may not meet the timing requirements of the PCI specifications, again making a
second parity module necessary.
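The generation itself reduces to XORing the 36 driven bits together, since even parity means PAR is the XOR of AD[31..0] and the byte enables. A sketch of the computation (the hardware performs this in parallel logic rather than the sequential folds shown):

```c
#include <assert.h>

/* PAR is driven so that AD[31..0], the four byte enables and PAR
 * together contain an even number of ones, i.e. PAR is the XOR of
 * the 36 driven bits. Folding works because parity is linear:
 * parity(x ^ y) == parity(x) ^ parity(y). */
int generate_par(unsigned ad, unsigned cbe)
{
    unsigned x = ad ^ (cbe & 0xFu);   /* fold byte enables into AD */
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return (int)(x & 1u);
}
```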
5.3.3 Software Support
One of the primary goals of the PCI specification was to support a ‘Plug and Play’ (PnP)
ideology. A PnP system allows cards present on the bus to be automatically detected and
configured at system startup by the system BIOS. There are currently two approaches that
a PnP BIOS can take (Shanley95b). The first is to determine the requirements of all devices
on the bus and use this information to configure the devices present with appropriate address
ranges, interrupt pins, etc., to ensure conflict free operation. The second approach,
preferred by newer BIOS implementations, is to configure only those devices required to
bootstrap the operating system, and to then let the operating system configure the remaining
devices. In the latter approach, the BIOS still examines every device on the bus but writes
all the information into a data structure known as the Extended System Configuration Data
(ESCD) structure. A PnP operating system, like Windows 95™ will then examine this
structure and use the information to configure devices itself. This has the advantage that
resources can be allocated to devices in a consistent manner each time the system is booted,
and that the configuration is free of the ‘bugs’ that are currently plaguing many PnP BIOSs.
As DOS is not a PnP operating system, code had to be written to access the configuration
registers of the devices on the PCI bus. Configuration accesses can be achieved either
through one of the configuration mechanisms defined in the PCI specification, or through
use of the PCI BIOS (PCISIG94). The code written to perform either of these access
methods is provided on diskette in Appendix E. The configuration routines were used to
examine the requirements of every device, and so no use was made of the ESCD when
configuring the display node. The device was merely assigned an unused region of the
system’s memory space and then had its Master Enable and Memory Enable bits in the
Command Register set. Additional code was produced to check the device’s adherence to
the PCI specification in implementing the configuration header. This code is presented on
the digital medium of Appendix E.
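Configuration Mechanism #1 addresses a device's configuration space by writing a 32-bit value to I/O port 0CF8h and then transferring data through port 0CFCh. The sketch below forms that address word; the actual port I/O is omitted since it requires running on the target machine, and this is a general illustration of the mechanism rather than the thesis's own routines.

```c
#include <assert.h>

/* CONFIG_ADDRESS word for PCI Configuration Mechanism #1:
 *   bit 31   enable
 *   23..16   bus number
 *   15..11   device number
 *   10..8    function number
 *   7..2     register number (double word aligned)
 * The value is written to I/O port 0CF8h; the selected configuration
 * register is then read or written through port 0CFCh. */
unsigned config_address(unsigned bus, unsigned dev,
                        unsigned fn, unsigned reg)
{
    return 0x80000000u | (bus << 16) | (dev << 11)
                       | (fn << 8)   | (reg & 0xFCu);
}
```

For example, register 10h of device 3 on bus 0 (Base Address 0 in Figure 5.6) is addressed with the value 80001810h.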
Chapter Six
6. Results
This chapter describes the final implementation of the DAN’s display node and the degree
of performance that was achieved by individual components and the DAN as a whole.
Where appropriate, possible areas of improvement of the display node’s design are noted.
6.1 Display Node Implementation
The final implementation of the display node was achieved using two Altera EPLDs and the
network interface devices of Appendix A on a two layer PCB. The Altera devices
had to be chosen carefully to ensure that they were able to fit the hardware description
language design file and that they met the PCI timing and electrical specifications. Two
Altera MAX 7000 EPM7192SQC160-10 devices were chosen. These devices are 160 pin
plastic quad flat pack packages that each provide 192 bits of internal storage and 10 ns gate
delays.
Although the individual PCI devices were PCI compliant in their timing and electrical
characteristics, the display node could not be made to meet the electrical specifications with
these devices. The specification stated that at most one load can be placed on any line of
the PCI bus. This constraint could not be met as the design had to be distributed over the
two devices due to its size, and some of the PCI bus signals were required by both devices.
The resultant double loading of bus signals had little effect on the testing environment as the
PCI bus was only loaded with the display node and the video card. If all card slots on the
bus were in use, the additional loading could cause the system’s integrity to be violated.
This problem could only have been overcome by using the larger ‘FLEX’ series of Altera
devices, or the Field Programmable Gate Array (FPGA) type devices that, due to their
density, are more likely to fit the entire design onto a single device.
6.1.1 Board Implementation
The input and output pins of the Altera devices change each time the project is recompiled
after an alteration. Wire-wrapping was consequently chosen to implement pin connections.
This method allowed changes to be made relatively easily and permitted construction on a
two-layer Printed Circuit Board (PCB) of a design that would otherwise require a multi-
layer board. As the PCB was relatively free of tracks because of the wire-wrapped
connections, large power and ground planes were also able to be added to the board. These
planes help to reduce noise on the board that accompanies any high speed digital circuit.
Figure 6.1 shows the display node and Figure 6.2 illustrates the wire-wrapped
implementation of this PCI card.
Figure 6.1 The UQ DAN’s Display Node
Figure 6.2 The Wire-Wrapping of Pin Connections
The PCI signals originating from the connector were an exception to the wire-wrap
approach and were joined to the circuit by soldering one end of a wire-wrap wire to the bus
connector and wrapping the other end to the destination pin. This approach was chosen
due to the density of the PCI signals in the grid of pads formed. In retrospect, this may not
have been necessary but fewer problems were experienced in this region than on the
connection of wire-wrap pins to signal pads.
6.2 The Physical Layer’s Performance
The physical layer implemented to accomplish the network communication performed
beyond all expectations. No problems were experienced in running the network at its rated
frequency of 12.5MHz. In an attempt to determine the physical layer’s fail point, the
network clock frequency was increased to 25MHz, which was the highest frequency that
could be obtained using available clock generators. Data transmissions occurred at this
network frequency with no occurrences of bit errors. This implies that the physical layer
could be implemented for a Desk Area Network rated at 200Mb/s without modification.
6.3 PCI Interface Performance
The PCI interface of the display node performed as intended. The target state machine
experienced no difficulties at any speed up to the maximum of 33MHz. The master state
machine worked without fault up to 25MHz but became
unreliable at 33MHz. When operating at this speed, the master worked but would ‘lock up’
after random periods of operation. The cause of this problem was not isolated but appeared
to be the result of a setup time violation that arose due to the shorter clock periods.
Measurements of the display node’s performance were made difficult to obtain due to
excessive loading experienced when logic analyser probes were placed on the pins of the
Altera devices. The Altera devices were not able to drive both the wire-wrapped
connection to an input pin as well as the additional load of the probes. Erratic behaviour of
the circuit was occasionally witnessed when measurement devices were added.
6.3.1 Average Grant Latency
Timing analysis of the PCI interface’s write cycles was performed over many
transactions. In the transactions witnessed, the smallest latency from the assertion of the
request signal to being granted use of the bus was two clock cycles. The largest latency, in
this same period, was 32 clock cycles. Larger delays in the assertion of the grant signal can
usually be attributed to the bus being used by another device. The average delay
experienced by the PCI interface waiting for nGNT to be asserted was five clock cycles at
25 MHz, or approximately 200 ns.
6.3.2 Response Time of Video Card
When writing data to the Cirrus Logic GD54N30 video card through the PCI interface,
measurements of the timing revealed that the video card itself produced a major bottleneck
in the transmission of information. The write cycle measured using a logic analyser is
reproduced in Figure 6.3 for a bus clock frequency of 25MHz. When compared to the
timing diagram of Figure 5.3 that closely portrays the display node’s abilities, it is obvious
that much larger delays are being experienced in actual operation.
The video card inserts approximately eight wait states into each data phase, or
approximately 320ns. This wait state delay was relatively consistent across different
bus clock rates. The transaction of Figure 5.3, which theoretically should take six bus cycles,
took 24 bus cycles. The entire transaction depicted in Figure 6.3 took 960ns to complete.
Preliminary software tests of the video card’s linear addressing were conducted by
continuously colouring a 640 x 480 pixel square in a tight software loop. The maximum
frame rate that was obtained during these tests was 21.35 frames per second. This relatively
lethargic frame rate was originally attributed to software inadequacies. However, this frame
rate corresponds to an average of 610ns between double word transfers. As the CPU-PCI
bridge used in the test computer was only capable of single data phase transactions, the
calculated inter-transaction delay is more likely to be the result of the delays inserted by the
video card. Hence, even with the slightly better performance offered by the display node’s
burst transactions, it does not appear that the goal of a 640 x 480 pixel video frame,
refreshed 25 times a second, can be achieved with the video card used.
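The 610 ns figure follows directly from the observed frame rate, assuming one byte per pixel and thus four pixels per double word:

```c
#include <assert.h>

/* Average time per double word transfer implied by a measured frame
 * rate, assuming one byte per pixel (four pixels per double word). */
double ns_per_dword(double fps, int width, int height)
{
    double dwords_per_frame = (double)width * height / 4.0;
    return 1e9 / (fps * dwords_per_frame);
}
```

At 21.35 frames per second for a 640 x 480 image this evaluates to roughly 610 ns per double word, while the 25 frames per second goal would require about 521 ns or less.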
6.4 Receiver Performance
The level of performance that could be extracted from the ATM cell receiver was
diminished by the delays imposed by the video card. To support the 100Mb/s data rate of the
network links, the receiver had to be able to process cells in an average time of 4.5µs.
average time for the receiver to process a cell was measured to be 7µs. Control cells were
an exception and were able to be processed in less than 3µs, but these cells occur much less
frequently than the data cells that were processed very slowly.

[Figure omitted: logic analyser waveforms of CLK, nREQ, nGNT, nFRAME, nIRDY, nTRDY and
nDEVSEL over bus cycles 1 to 24 (0 ns to 960 ns), showing the wait cycles inserted into
each data phase.]

Figure 6.3 Measured Response of the Video Card

Figure 6.3 implies that
approximately 1µs of processing time was spent on each PCI transaction. As six of these
transactions were required for each data cell, a lower limit was imposed on the receiver’s
cell processing time that was greater than the time required to meet the specifications.
One change that could be made to the receiver to slightly decrease this processing time
would be to move the second double word from the receive register (Rx_reg) to the
transmit register (Tx_reg) for the second data phase as well as the first. The additional data
movement would cause one additional clock cycle delay before the assertion of nIRDY for
the second data phase, but would allow the first double word of the next transaction to be
loaded while waiting for the second data phase to complete. This approach of further
overlapping data read and write times can reduce the cell processing time by decreasing the
delay between one transaction ending and the request for the next transaction. Using the
transmit register for the source of pixel data for both data phases has the secondary
advantage of reducing internal connections in the Altera devices, and may improve the
compiler’s ability to fit the project.
6.4.1 Maximum Achievable Resolution
Due to the properties of the raster video signal processed by the camera node, a delay of
64µs is experienced between the start of video line transmissions. Using this information
with the knowledge of the average cell processing time, the maximum horizontal resolution
that can be supported by the display node can be calculated. Assuming an average of 7µs to
process a cell, nine complete cells can be processed in the 64µs inter-line time period.
However, the first of these cells is a control cell that only requires 3µs to process. This
leaves 5µs between the end of the ninth cell and the beginning of the next line. Another five
transactions from an incomplete cell could conceivably be processed in this time, but this
feature is not implemented on the display node.
With a maximum of eight data cells being sent per line, a maximum horizontal resolution of
384 pixels can be achieved before cells begin to accumulate. Cell accumulation in the
Receive FIFO will eventually lead to cell loss and picture quality degradation when the
Receive FIFO becomes full. If the eight wait cycles per data phase were removed from the
write transactions, the Receiver would be able to process data cells in approximately 3µs
and achieve the full 640 x 480 pixel image at 25 frames per second.
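This line-time budget can be checked arithmetically. The figures used are those measured in the text: a 64 µs line period, a 3 µs control cell and 48 pixels per data cell payload.

```c
#include <assert.h>

/* Maximum horizontal resolution: the number of whole data cells that
 * fit in the inter-line period after the control cell, multiplied by
 * the 48 pixel payload of each data cell. */
int max_pixels_per_line(double line_us, double control_us,
                        double data_cell_us, int pixels_per_cell)
{
    int cells = (int)((line_us - control_us) / data_cell_us);
    return cells * pixels_per_cell;
}
```

With 7 µs data cells, eight cells fit after the control cell, giving the 384 pixel limit; at 3 µs per data cell the budget comfortably covers the 640 pixels of a full line.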
The display node was shown to work at a 640 x 480 pixel resolution by attaching a loop-back
network cable to the display node. Any data transmitted by the display node is then
looped back to the display node’s receive port and processed. Test software was then
written to utilise the loop-back configuration to send the display node an image with larger
inter-cell delays. This test procedure verified the ability of the display node to handle
images of all sizes.
6.5 UQ DAN’s Performance
The UQ DAN’s performance at the completion of the project was below the group’s initial
expectations but encouraging as a proof of concept for the Desk Area Network
architecture. Problems in implementing the ATM switch (Aravinthan96) could not be
resolved, but network performance could still be ascertained from the point to point
connection of the camera node and display node. The best full motion video image that was
able to be produced by the UQ DAN was at a resolution of 48 x 480 pixels and 25 frames
per second, which equates to a data rate of 576 kbytes/sec. This data rate was far below
the UQ DAN’s specifications but was comparable to, if not better than, that which can be
achieved on Ethernet networks or ISA buses. The horizontal resolution was limited to that
of one data cell’s payload due to problems experienced with cell delineation.
Loss of cell delineation occurs when additional bytes are present in the network FIFOs. A one
byte offset in the header information of a cell causes the receiver to completely misconstrue
that cell’s contents, and leads to data cells being treated as control cells and vice versa. When a
data cell is mistakenly recognised as a control cell, line and cell numbers are set using the
random bit patterns that occur in bytes five to eight of that cell. The display node’s output
is then an incoherent mess of randomly placed control and data cells.
In the UQ DAN, these spurious bytes were occurring somewhere between the camera
node’s cell transmitter and the display node’s cell receiver. The exact cause of the extra
bytes that led to the loss of cell delineation was never precisely isolated. Closer
examination found that additional bytes were occurring in the video node’s Transmit FIFO.
However, the problem only ever occurred when more than one data cell was sent per line.
A 48 pixel, or one data cell payload, wide image was consequently the largest size video
image that could be obtained in native mode.
Figure 6.4 The UQ DAN’s Operation on Demonstration Day, 1996
The UQ DAN’s operating system implemented a ‘patchwork’ display mode to overcome
this problem. This involved requesting 48 pixel wide strips of the image one frame at a
time. Each request for a strip of data was then offset by 48 pixels so that the sequence of
strips arriving at the display node would form a complete image when arranged correctly.
The operating system then copied the received strip from the left hand edge of the screen
where it was placed by the display node, to its correct place on the screen. It is this method
that was used to produce the image seen in Figure 6.4, demonstrating the UQ DAN on
the Department of Electrical Engineering’s undergraduate thesis demonstration day.
A screen capture of the UQ DAN’s operating system interface using this display mode is
given in Figure 6.5. This figure clearly shows the narrow 48 pixel strip at the very left of
the image that belongs towards the image’s centre. This working strip was copied to the
correct position on the screen before the next strip was requested. The working strip was
then overwritten with the next request by the display node, and the software copied this new
strip to its respective position on the screen. A frame rate of approximately 3 frames per
second was achieved using this display mode at a resolution of 480 x 340 pixels. This
provided acceptable quality video for slow moving images and was comparable to many
existing video conferencing technologies.
Figure 6.5 UQ DAN's Interface in Patchwork Mode
Chapter Seven
7. Conclusions
Multimedia applications and their associated multimedia data streams are becoming a
commonplace addition to the everyday computing environment. This thesis described the
implementation of a novel computer architecture that was designed to handle these
multimedia data streams with much greater efficiency. The Desk Area Network
implements a multimedia workstation by replacing the workstation’s bus with a high speed
ATM network, to remove the I/O bottleneck that plagues the majority of existing systems.
The DAN developed at the University of Queensland demonstrated this philosophy through
the implementation of a working system. The system developed consisted of two
multimedia devices and an ATM switch to form the 100Mb/s network. The first device, the
camera node, sourced video data which was received by the display node and presented to
the user.
This thesis detailed the implementation of the UQ DAN’s display node. This node was
responsible for receiving the ATM cells from the camera node that contain the pixel
information, and extracting this information to enable the raw image data to be displayed to
the users. The pixel data received by the display node was presented on the display of a
standard PC by writing the information directly to that computer’s video memory. The CPU
independence achieved through this approach was in keeping with the basic principle of the
DAN.
A PCI interface to a computer was designed and implemented to provide the high
bandwidth, CPU independent interface required by the display node. This interface allowed
the display node to directly access the video memory of the computer’s video card via the
bus mastering ability of the PCI bus. The display node was capable of displaying 640 x 480
pixel, 256 colour images at 25 frames per second.
The display node implemented proved to be functionally correct but was constrained in
the level of performance it could provide. This constraint was imposed by the limited
ability of the video card to accept data. It was shown that the video card used was not
capable of accepting full-size video images at a rate of 25 frames per second. This
constraint was never reached during operation of the UQ DAN, however, as the image size
that could be achieved was limited by problems in the network. These problems prevented
images wider than 48 pixels being displayed in true motion.
Future Work
With a fully operational camera node and display node, the implementation of a complete
Desk Area Network is very feasible. The goal of producing a complete DAN would require
the addition of an audio node to complement the existing devices. A signal processing node
could also be added to the system to enable the video or audio signals that the DAN
receives to be ‘massaged’. Video conferencing could then be achieved by duplicating the
DAN and networking the two systems together using an ATM local or wide area network.
This configuration would allow the benefits of the Desk Area Network’s distributed nature
to be fully investigated.
Many enhancements can also be made to the DAN display node. One such enhancement
would be to enable the display node to handle multiple video streams. This feature would
be especially pertinent in group video conferencing situations where more than two parties
are required to communicate concurrently. The addition of multiple data streams requires
that support for multiple windows also be added. This addition will increase the processing
power required by the display node considerably, especially for overlapping windows.
8. References
Aravinthan96 T. Aravinthan, “100Mbps ATM DAN Switch”, UQ Undergraduate Thesis,
December 1996
ATM95 ATM Forum, “ATM User Network Interface Specification v3.1”,
http://www.atmforum.com, January 1995
Barham95 P. Barham, M. Hayter, D. McAuley, and I. Pratt, “Devices on the Desk Area
Network”, IEEE Journal on Selected Areas in Communications, Vol. 13
No. 4 May 1995, pp722-732
Blair93 G. Blair, A. Campbell, G. Coulson, F. Garcia, D. Hutchinson, A. Scott and
D. Shepherd, “A Network Interface Unit to Support Continuous Media”,
IEEE Journal on Selected Areas in Communications, Vol. 11 No. 2
February 1993, pp264-275
Ebrahim92 Z. Ebrahim, “A Brief Tutorial on ATM”, March 1992,
www-ipg.umds.ac.uk/~dlgh/teaching/atm-tutorial.html
Greaves94 D. Greaves, D. McAuley, L. French, and E. Hyden, “Protocol and Interface
for ATM LANs”, March 1994, The Blue Book,
http://www.cl.cam.ac.uk/Research/SRG/bluebook/11/protocol_and_interface
/protocol_and_interface.html
Gregory96 D. Gregory, “Digital Video over an ATM Desk Area Network”, UQ
Undergraduate Thesis, December 1996
Hopper90 A. Hopper, “Pandora - An Experimental System for Multimedia
Applications”, Operating Systems Review, Vol. 24 No. 2, April 1990
Horowitz89 P. Horowitz, and W. Hill, “The Art of Electronics”, 2nd Edition, Cambridge
University Press, 1989
Houth95 H. Houth, J. Adam, M. Ismert, C. Lindblad, and D. Tennenhouse, “The
VuNet Desk Area Network: Architecture, Implementation, and
Experience”, IEEE Journal on Selected Areas in Communications, Vol. 13
No. 4 May 1995, pp710-721
PCISIG95 The PCI Special Interest Group, “PCI Local Bus Specification”, Revision
2.1, June 1995
PCISIG94 The PCI Special Interest Group, “PCI BIOS Specification”, Revision 2.1,
August 1994
Schmidt95 F. Schmidt, “The SCSI Bus and IDE Interface”, Addison Wesley, 1995
Shanley95 T. Shanley, and D. Anderson, “PCI System Architecture”, Third Edition,
Mindshare Inc., 1995
Shanley95b T. Shanley, “Plug and Play System Architecture”, Mindshare Inc., 1995
9. Bibliography
Altera Data Book, 1996
B. Britton, and E. Cook, “Design an FPGA-Based PCI Bus Interface”, Electronic Design,
Vol. 43 Iss. 5, pp100-105
B. Davie, “The Architecture and Implementation of a High-Speed Host Interface”, IEEE
Journal on Selected Areas in Communications, Vol. 11 No. 2,
February 1993, pp228-241
C. Geber, “Peripheral Component Interconnect (PCI) Interface with the Quicklogic
QL16x24B FPGA”, WESCON/94 Idea/Microelectronics Conference
Record, pp568-573
D. Gordon, “The Clock Generation Board and BackPlane V2.0”,
ftp://cell-relay.indiana.ed/pub/cell-relay/docs/ftp.cl.cam.ac.uk/93-2
I. Leslie, and D. McAuley, “EISA ATM Interface Card”, The Green Book,
http://www.cl.cam.ac.uk/Research/SRG/GreenBook
A. Light, “Design a PCMCIA Add-In Card for the PCI Bus”, Electronic Design, Vol. 42
Iss. 24, pp140-146
T. Moors, and A. Cantoni, “ATM Receiver Implementation Issues”, IEEE Journal on
Selected Areas in Communications, Vol. 11 No. 2, February 1993,
pp254-263
Motorola, “Fast and LS TTL Data”, Rev 5
I. Pratt, “The DAN Frame Store”, The Green Book,
http://www.cl.cam.ac.uk/Research/SRG/GreenBook
K. Ramakrishnan, “Performance Considerations in Designing Network Interfaces”, IEEE
Journal on Selected Areas in Communications, Vol. 11 No. 2, February
1993, pp203-219
W. Stevens, “Unix Network Programming”, Prentice Hall Software Series, 1990
10. Appendix A
UQ DAN Device and Switch Port Schematics
This appendix provides the circuit schematics of the physical layer ports for the UQ DAN
switch and devices. These schematics were drafted using Protel Advance Schematic3 for
Windows™ and are provided on the diskette in Appendix E.
11. Appendix B
Altera MAX 7000 Development Environment
The MAX 7000 devices used in the construction of the display node were programmed
using the proprietary software package, MAX Plus II, developed by Altera. A design file
can be entered into MAX Plus II in a variety of formats including: hardware description
language, graphical depiction of circuitry or input-output waveform specification. The
majority of the display node’s design was written in Altera Hardware Description Language
(AHDL). This interface is similar to conventional programming languages and offers a
great deal of flexibility.
Once the design was complete, the MAX Plus II software was used to compile the project
to minimise logic functions and find the optimal manner in which to fit the design into the
specified MAX 7000 device. The software also included a comprehensive simulation
facility in which circuit operation of the design could be tested before actual
implementation. The simulator was found to be very accurate in its timing analysis
compared to that of the implemented circuit and was an extremely useful tool. When the
design had been simulated completely, the MAX Plus II software was used to program the
devices with the design.
The MAX Plus II compiler also provides a number of options to determine the logic that is
synthesised. These options had to be tweaked to fit the design files of the display node
provided in Appendix E. The ‘Register Packing’ and ‘Automatic I/O Cell Registers’
options had to be turned on for a good fit. The most dramatic benefits, however, were
received by using the ‘One Hot State Machine Encoding’ option. This option assigns a
unique bit to each state in every state machine. The availability of unique bits greatly
reduces the size of logic expressions, and consequently reduces the number of Shareable
Expanders required by the design. It was found, though, that once the state assignments
had been produced, they should be placed directly into the project’s design file if the pin or
logic cell assignments are to be kept.
12. Appendix C
State Transition Diagram for PCI Master
[Figure: state transition diagram for the PCI master. The machine comprises the states
Idle_M, Bus_Busy_M, Dr_Bus, M_addr, M_addr2, M_data, M_data2, Backoff_M and
Turn_ar_M. Transitions between these states are governed by the PCI control signals
nFRAME, nREQ, nGNT, nIRDY, nTRDY, nSTOP and nDEVSEL, together with the
Time_out and Data_Phase_Timeout counters, which determine whether each address or
data phase continues, ends or terminates.]
13. Appendix D
Operating the Display Node
Setting up the display node for operation is a straightforward task. Simply insert the add-in
card into an unused 5V PCI slot. The card should not be connected to the Desk Area
Network before the PC is powered up. This is because voltages applied to the inputs of an
unpowered CMOS device can forward-bias its input protection diodes, an effect that is
capable of destroying the device.
When the computer is turned on, the display node can be connected to the network and the
DAN operating system can be run. This is the dangui executable on the diskette in
Appendix E. Next, the desired display resolution, frame rate, etc. are chosen and the start
button can be pressed. The camera node must also be included in the network to source the
video data.
14. Appendix E
The accompanying diskette contains the display node’s design information as well as the
software to operate the UQ DAN. The following is included on the disk:
• Altera Directory
→ mas_tar.tdf - the display node’s major design file
→ mas_tar.acf - the compiler options and resource assignments used in the design
→ parity.tdf - the parity generating module for the PCI interface
→ atmlink.gdf - the design file for the ATMlink layer
→ xmitter.gdf - the ATMlink layer’s cell transmitter
→ receiver.gdf - the ATMlink layer’s cell receiver
→ bytecnt.tdf - a byte counter module
→ cellcnt.tdf - a module to count cells in the FIFOs
• DAN GUI Directory
→ pcibios - code to call the PCI BIOS routines
→ pci_cnfg - module to access the PCI bus’ configuration registers
→ pcidisp - performs functions required to configure and access the display node
→ pci - routines to access devices on the PCI bus
→ crdtst - software to test the PCI conformance of a PCI device
→ Other software modules to implement the UQ DAN’s graphical user interface
• PCB Directory
→ pci.pcb - the Traxedit PCB layout file
→ pci.lib - the Traxedit library file of components used in pci.pcb