
Synergy Institute of Engineering and Technology, Dhenkanal

Department of Computer Science

Course: Embedded System (PECS 5408)

Class: B. Tech – 8th semester

Faculty-S. Abinash Asst. Prof, CSE

Lecture Notes

Lecture 1-

Topic- Application and characteristics of embedded systems.

Define Embedded System

An embedded system is a combination of computer hardware and software, and perhaps additional mechanical or other parts, designed to perform a specific function. A good example is the microwave oven. Almost every household has one, and tens of millions of them are used every day, but very few people realize that a processor and software are involved in the preparation of their lunch or dinner.

Differentiate between general-purpose computers and embedded systems.

An embedded system is in direct contrast to the personal computer in the family room. The personal computer, too, is made up of computer hardware and software and mechanical components (disk drives, for example). However, a personal computer is not designed to perform a specific function. Rather, it is able to do many different things. Many people use the term general-purpose computer to make this distinction clear. As shipped, a general-purpose computer is a blank slate; the manufacturer does not know what the customer will do with it. One customer may use it for a network file server, another may use it exclusively for playing games, and a third may use it to write the next great American novel.

Frequently, an embedded system is a component within some larger system. For example, modern cars and trucks contain many embedded systems. One embedded system controls the anti-lock brakes, another monitors and controls the vehicle's emissions, and a third displays information on the dashboard. In some cases, these embedded systems are connected by some sort of communications network, but that is certainly not a requirement.

If an embedded system is designed well, the existence of the processor and software could be completely unnoticed by a user of the device. Such is the case for a microwave oven, VCR, or alarm clock. In some cases, it would even be possible to build an equivalent device that does not contain the processor and software. This could be done by replacing the combination with a custom integrated circuit that performs the same functions in hardware. However, a lot of flexibility is lost when a design is hard-coded in this way. It is much easier, and cheaper, to change a few lines of software than to redesign a piece of custom hardware.

Applications of embedded systems- Consumer Electronics, Household Appliances, Office Automation, Business Equipment, Automobiles, Communications, Toys, Avionic Systems (Defense), Medical, Computer Peripherals

Examples:

1. Digital Watch

At the end of the evolutionary path that began with sundials, water clocks, and hourglasses is the digital watch. Among its many features are the presentation of the date and time (usually to the nearest second), the measurement of the length of an event to the nearest hundredth of a second, and the generation of an annoying little sound at the beginning of each hour. As it turns out, these are very simple tasks that do not require very much processing power or memory. In fact, the only reason to employ a processor at all is to support a range of models and features from a single hardware design.

The typical digital watch contains a simple, inexpensive 8-bit processor. Because such small processors cannot address very much memory, this type of processor usually contains its own on-chip ROM. And, if there are sufficient registers available, this application may not require any RAM at all. In fact, all of the electronics (processor, memory, counters and real-time clocks) are likely to be integrated in a single chip. The only other hardware elements of the watch are the inputs (buttons) and outputs (LCD and speaker).

The watch designer's goal is to create a reasonably reliable product that has an extraordinarily low production cost. If, after production, some watches are found to keep more reliable time than most, they can be sold under a brand name with a higher markup. Otherwise, a profit can still be made by selling the watch through a discount sales channel. For lower-cost versions, the stopwatch buttons or speaker could be eliminated. This would limit the functionality of the watch but might not even require any software changes. And, of course, the cost of all this development effort may be fairly high, since it will be amortized over hundreds of thousands or even millions of watch sales.

2. Video Game Player

When you pull the Nintendo 64 or Sony PlayStation out from your entertainment center, you are preparing to use an embedded system. In some cases, these machines are more powerful than the comparable generation of personal computers. Yet video game players for the home market are relatively inexpensive compared to personal computers. It is the competing requirements of high processing power and low production cost that keep video game designers awake at night (and their children well-fed).

The companies that produce video game players don't usually care how much it costs to develop the system, so long as the production costs of the resulting product are low, typically around a hundred dollars. They might even encourage their engineers to design custom processors at a development cost of hundreds of thousands of dollars each. So, although there might be a 64-bit processor inside your video game player, it is not necessarily the same type of processor that would be found in a 64-bit personal computer. In all likelihood, the processor is highly specialized for the demands of the video games it is intended to play.

Because production cost is so crucial in the home video game market, the designers also use tricks to shift the costs around. For example, one common tactic is to move as much of the memory and other peripheral electronics as possible off of the main circuit board and onto the game cartridges. This helps to reduce the cost of the game player, but increases the price of each and every game. So, while the system might have a powerful 64-bit processor, it might have only a few megabytes of memory on the main circuit board. This is just enough memory to bootstrap the machine to a state from which it can access additional memory on the game cartridge.

Lecture-2

Topic- Overview of Processors and hardware units in an embedded system

Diagram of a generic embedded system.

All embedded systems also contain some type of inputs and outputs. For example, in a microwave oven the inputs are the buttons on the front panel and a temperature probe, and the outputs are the human-readable display and the microwave radiation. It is almost always the case that the outputs of the embedded system are a function of its inputs and several other factors (elapsed time, current temperature, etc.). The inputs to the system usually take the form of sensors and probes, communication signals, or control knobs and buttons. The outputs are typically displays, communication signals, or changes to the physical world.

Typical Embedded System Design

Processor- Embedded processors can be broken into two broad categories. Ordinary microprocessors (μP) use separate integrated circuits for memory and peripherals. Microcontrollers (μC) have on-chip peripherals, thus reducing power consumption, size and cost. In contrast to the personal computer market, many different basic CPU architectures are used, since software is custom-developed for an application and is not a commodity product installed by the end user. Both Von Neumann and various degrees of Harvard architectures are used. RISC as well as non-RISC processors are found. Word lengths vary from 4 bits to 64 bits and beyond, although the most typical remain 8 and 16 bits. Most architectures come in a large number of different variants and shapes, many of which are also manufactured by several different companies.

Numerous microcontrollers have been developed for embedded systems use. General-purpose microprocessors are also used in embedded systems, but generally require more support circuitry than microcontrollers.

Lecture- 3

Topic- General purpose processors, Microcontrollers

Processors in an ES

• General purpose processor
• Microprocessor
• Microcontroller
• Embedded processor
• Digital signal processor
• Media processor
• Application specific system processor
• Multiprocessor system using GPP and ASSP
• GPP core or ASIP core integrated into either an ASIC or a VLSI circuit, or an FPGA core integrated with a processor unit in a VLSI chip

Explanation

A processor has two essential units:

• Program Flow Control Unit (CU)
• Execution Unit (EU)

Processors

The central processing unit is the most important component in an embedded system. It exists in an integrated manner along with memory and other peripherals. Depending on the type of application, processors are broadly classified into 3 major categories:

1. General Purpose Microprocessors

2. Microcontrollers

3. Digital Signal Processors

For more specific applications, customized processors can also be designed. Unless the demand is high, the design and manufacturing cost of such processors will be high. Therefore, in most applications the design is carried out using processors already available in the market. However, Field Programmable Gate Arrays (FPGAs) can be used to implement simple customized processors easily. An FPGA is a type of logic chip that can be programmed. FPGAs support thousands of gates which can be connected and disconnected, like an EPROM (Erasable Programmable Read Only Memory). They are especially popular for prototyping integrated circuit designs. Once the design is set, hardwired chips are produced for faster performance.

General Purpose Processors

A general purpose processor is designed to solve problems in a large variety of applications as diverse as communications, automotive and industrial embedded systems. These processors are generally cheap because they are manufactured in large numbers: the NRE (Non-recurring Engineering Cost: Lesson I) is spread over a large number of units. With the cost spread this way, the manufacturer can invest more in improving the VLSI design with advanced, optimized architectural features. Thus the performance, size and power consumption can be improved. In most cases, the design tools for such processors are provided by the manufacturer. Also, the supporting hardware is cheap and easily available. However, only a part of the processor capability may be needed for a specific design, and hence the overall embedded system will not be as optimized as it could have been as far as space, power and reliability are concerned.

Pentium IV is such a general purpose processor with most advanced architectural features. Compared to its overall performance the cost is also low. A general purpose processor consists of a data path and a control unit, tightly linked with the memory (Fig. 4.1).

The Data Path consists of circuitry for transforming data and storing temporary data. It contains an arithmetic-logic unit (ALU) capable of transforming data through operations such as addition, subtraction, logical AND, logical OR, inverting, shifting etc. The data path also contains registers capable of storing temporary data generated by the ALU or related operations. The internal data bus carries data within the data path, while the external data bus carries data to and from the data memory. The size of the data path indicates the bit-size of the CPU. An 8-bit data path means an 8-bit CPU such as the 8085.

The Control Unit consists of circuitry for retrieving program instructions and for moving data to, from, and through the data path according to those instructions. It has a program counter (PC) to hold the address of the next program instruction to fetch and an instruction register (IR) to hold the fetched instruction. It also has a timing unit in the form of state registers and control logic. The controller sequences through the states and generates the control signals necessary to read instructions into the IR and control the flow of data in the data path. Generally the address size is specified by the control unit, as it is responsible for communicating with the memory. For each instruction the controller typically sequences through several stages, such as fetching the instruction from memory, decoding it, fetching the operands, executing the instruction in the data path and storing the results. Each stage takes a few clock cycles.
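The fetch, decode and execute stages described above can be illustrated with a minimal Python sketch. The four opcodes, the single accumulator register and the 8-bit wrap-around are assumptions invented for this illustration; they do not correspond to the 8085 or to any real instruction set.

```python
# Hypothetical 8-bit accumulator machine; the opcodes are invented for illustration.
LOAD, ADD, STORE, HALT = 0, 1, 2, 3

def run(program, data):
    """Run `program` (a list of (opcode, operand) pairs) against `data` memory."""
    pc = 0             # program counter: address of the next instruction to fetch
    acc = 0            # a single data-path register (accumulator)
    data = list(data)  # copy so the caller's memory image is not modified
    while True:
        ir = program[pc]        # fetch: read the instruction into the instruction register
        pc += 1                 # PC now points at the following instruction
        opcode, operand = ir    # decode: split into operation and operand address
        if opcode == LOAD:      # execute
            acc = data[operand]
        elif opcode == ADD:
            acc = (acc + data[operand]) & 0xFF  # an 8-bit data path wraps at 256
        elif opcode == STORE:
            data[operand] = acc
        elif opcode == HALT:
            return data
        else:
            raise ValueError(f"unknown opcode {opcode}")

program = [(LOAD, 0), (ADD, 1), (STORE, 2), (HALT, 0)]
print(run(program, [200, 100, 0]))  # [200, 100, 44]
```

The result 44 rather than 300 shows how the width of the data path bounds the values the CPU can hold in one register.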

Microcontroller

Just as putting all the major components of a desktop PC onto one board gives a Single Board Computer (SBC), putting all the major components of a Single Board Computer onto a single chip gives a microcontroller. Because of the limitations in the VLSI design, most of the input/output functions exist in a simplified manner. Typical architecture of such a microcontroller is shown in Fig.

Digital Signal Processor (DSP)

These processors have been designed based on the modified Harvard architecture to handle real-time signals. The features of these processors are suitable for implementing signal processing algorithms. One of the common operations required in such applications is array multiplication. For example, convolution and correlation require array multiplication. This is accomplished by multiplication followed by accumulation and addition. This is generally carried out by Multiplier and Accumulator (MAC) units. Sometimes it is known as MACD, where D stands for Data move. Generally all the instructions are executed in a single cycle.

The MACD type of instructions can be executed faster by parallel implementation. This is possible by separately accessing the program and data memory in parallel. This can be accomplished by the modified architecture shown in Fig. 4.3. These DSP units generally use multiple-access and multi-ported memory units. Multiple-access memory allows more than one access in one clock period. Multi-ported memory provides multiple address ports as well as multiple data ports. This also increases the number of accesses per clock cycle.
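To make the role of the MAC operation concrete, here is a minimal Python sketch of convolution written so that every arithmetic step is one multiply-accumulate. The function names `mac` and `convolve` are chosen for this illustration; a real DSP would perform each `mac` step in hardware, typically one per cycle.

```python
def mac(acc, x, h):
    """One multiply-accumulate step: returns acc + x * h."""
    return acc + x * h

def convolve(x, h):
    """Direct-form linear convolution of sequences x and h using only MAC steps."""
    y = [0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        acc = 0                          # the accumulator is cleared for each output sample
        for k in range(len(h)):
            if 0 <= n - k < len(x):      # skip terms that fall outside the input
                acc = mac(acc, x[n - k], h[k])
        y[n] = acc
    return y

print(convolve([1, 2, 3], [1, 1]))  # [1, 3, 5, 3]
```

Each output sample needs one MAC per filter coefficient, which is why fetching a coefficient and a data sample in the same cycle matters so much for throughput.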

Lecture- 4

Topic- ARM-based Systems on a Chip (SoC)

ARM7TDMI Architecture

The ARM7TDMI is a 3-stage pipeline, 32-bit RISC processor. The processor architecture is a Von Neumann load/store architecture, which is characterized by a single data and address bus for instructions and data. The CPU has two instruction sets, the ARM and the Thumb instruction set. The ARM instruction set has 32-bit wide instructions and provides maximum performance. Thumb instructions are 16 bits wide and give maximum code density. Instructions operate on 8-, 16-, and 32-bit data types. The CPU has seven operating modes (see Operating Modes). Each operating mode has dedicated banked registers for fast exception handling. The processor has a total of 37 32-bit registers, including 6 status registers (see Registers).

The THUMB Concept

The ARM7TDMI processor employs a unique architectural strategy known as THUMB, which makes it ideally suited to high-volume applications with memory restrictions, or applications where code density is an issue. The key idea behind THUMB is that of a super-reduced instruction set. Essentially, the ARM7TDMI processor has two instruction sets:

• The standard 32-bit ARM set
• A 16-bit THUMB set

The THUMB set's 16-bit instruction length allows it to approach twice the density of standard ARM code while retaining most of the ARM's performance advantage over a traditional 16-bit processor using 16-bit registers. This is possible because THUMB code operates on the same 32-bit register set as ARM code. THUMB code is typically about 65% of the size of the equivalent ARM code, and provides up to 160% of the performance of an equivalent ARM processor connected to a 16-bit memory system.

THUMB's Advantages

THUMB instructions operate with the standard ARM register configuration, allowing excellent interoperability between ARM and THUMB states. Each 16-bit THUMB instruction has a corresponding 32-bit ARM instruction with the same effect on the processor model.

The major advantage of a 32-bit (ARM) architecture over a 16-bit architecture is its ability to manipulate 32-bit integers with single instructions, and to address a large address space efficiently. When processing 32-bit data, a 16-bit architecture will take at least two instructions to perform the same task as a single ARM instruction. However, not all the code in a program will process 32-bit data (for example, code that performs character string handling), and some instructions, like branches, do not process any data at all.

If a 16-bit architecture only has 16-bit instructions, and a 32-bit architecture only has 32-bit instructions, then overall the 16-bit architecture will have better code density, and better than one half the performance of the 32-bit architecture. Clearly 32-bit performance comes at the cost of code density. THUMB breaks this constraint by implementing a 16-bit instruction length on a 32-bit architecture, making the processing of 32-bit data efficient with a compact instruction coding. This provides far better performance than a 16-bit architecture, with better code density than a 32-bit architecture.

THUMB also has a major advantage over other 32-bit architectures with 16-bit instructions. This is the ability to switch back to full ARM code and execute at full speed. Thus critical loops for applications such as

• Fast interrupts
• DSP algorithms

can be coded using the full ARM instruction set, and linked with THUMB code. The overhead of switching from THUMB code to ARM code is folded into sub-routine entry time. Various portions of a system can be optimized for speed or for code density by switching between THUMB and ARM execution as appropriate.
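The claim that a 16-bit architecture needs at least two instructions for one 32-bit operation can be sketched in Python. The helper names `add16` and `add32_on_16bit` are hypothetical; they model a generic 16-bit adder with a carry flag, not any particular processor.

```python
MASK16 = 0xFFFF

def add16(a, b, carry_in=0):
    """A 16-bit adder: returns (16-bit sum, carry out)."""
    total = (a & MASK16) + (b & MASK16) + carry_in
    return total & MASK16, total >> 16

def add32_on_16bit(a, b):
    """Add two 32-bit values using two 16-bit add instructions."""
    lo, carry = add16(a & MASK16, b & MASK16)  # instruction 1: add the low halves
    hi, _ = add16(a >> 16, b >> 16, carry)     # instruction 2: add the high halves plus carry
    return (hi << 16) | lo                     # result wraps modulo 2**32

print(hex(add32_on_16bit(0x0001FFFF, 0x00000001)))  # 0x20000
```

On a 32-bit data path the same addition is a single instruction, which is the performance advantage THUMB keeps by operating on the full 32-bit register set.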

Block Diagram

ARM instruction Set for Embedded Systems

The ARM instruction set differs from the pure RISC definition in several ways that make the ARM instruction set suitable for embedded applications:

■ Variable cycle execution for certain instructions: Not every ARM instruction executes in a single cycle. For example, load-store-multiple instructions vary in the number of execution cycles depending upon the number of registers being transferred. The transfer can occur on sequential memory addresses, which increases performance since sequential memory accesses are often faster than random accesses. Code density is also improved since multiple register transfers are common operations at the start and end of functions.

■ Inline barrel shifter leading to more complex instructions: The inline barrel shifter is a hardware component that preprocesses one of the input registers before it is used by an instruction. This expands the capability of many instructions to improve core performance and code density.

■ Thumb 16-bit instruction set: ARM enhanced the processor core by adding a second 16-bit instruction set called Thumb that permits the ARM core to execute either 16- or 32-bit instructions. The 16-bit instructions improve code density by about 30% over 32-bit fixed-length instructions.

■ Conditional execution: An instruction is only executed when a specific condition has been satisfied. This feature improves performance and code density by reducing branch instructions.

■ Enhanced instructions: The enhanced digital signal processor (DSP) instructions were added to the standard ARM instruction set to support fast 16×16-bit multiplier operations and saturation. These instructions allow a faster-performing ARM processor in some cases to replace the traditional combination of a processor plus a DSP.

These additional features have made the ARM processor one of the most commonly used 32-bit embedded processor cores. Many of the top semiconductor companies around the world produce products based around the ARM processor.
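As an illustration of the inline barrel shifter, the Python sketch below models the effect of an ARM instruction of the form `ADD Rd, Rn, Rm, LSL #shift`, where the second operand is shifted before the addition. The function names are hypothetical, and only the logical-shift-left case is modelled.

```python
MASK32 = 0xFFFFFFFF

def lsl(value, amount):
    """Logical shift left on a 32-bit value (the barrel shifter's LSL operation)."""
    return (value << amount) & MASK32

def add_shifted(rn, rm, shift):
    """Models ADD Rd, Rn, Rm, LSL #shift: Rd = Rn + (Rm << shift), in 32 bits."""
    return (rn + lsl(rm, shift)) & MASK32

# Address of element 5 in an array of 4-byte words at base 0x1000:
# base + (index << 2), computed as one instruction instead of a shift plus an add.
print(hex(add_shifted(0x1000, 5, 2)))  # 0x1014
```

Folding the shift into the add is what saves an instruction, which helps both performance and code density.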

Lecture- 5

Topic- Application-Specific Circuits (ASICs)

ASIC

An application-specific integrated circuit (ASIC) is an integrated circuit (IC) customized for a particular use, rather than intended for general-purpose use. For example, a chip designed to run in a digital voice recorder or a high-efficiency Bitcoin miner is an ASIC. Application-specific standard products (ASSPs) are intermediate between ASICs and industry standard integrated circuits like the 7400 or the 4000 series.

As feature sizes have shrunk and design tools improved over the years, the maximum complexity (and hence functionality) possible in an ASIC has grown from 5,000 gates to over 100 million. Modern ASICs often include entire microprocessors, memory blocks including ROM, RAM, EEPROM, flash memory and other large building blocks. Such an ASIC is often termed a SoC (system-on-chip). Designers of digital ASICs often use a hardware description language (HDL), such as Verilog or VHDL, to describe the functionality of ASICs.

Field-programmable gate arrays (FPGAs) are the modern-day technology for building a breadboard or prototype from standard parts; programmable logic blocks and programmable interconnects allow the same FPGA to be used in many different applications. For smaller designs or lower production volumes, FPGAs may be more cost effective than an ASIC design even in production. The non-recurring engineering (NRE) cost of an ASIC can run into the millions of dollars.
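The trade-off between NRE cost and production volume can be worked through with a small Python sketch. All the cost figures below are hypothetical numbers chosen only to show the arithmetic; they are not taken from any vendor.

```python
def cost_per_unit(nre, unit_cost, volume):
    """Effective cost of one device once the NRE is spread over `volume` units."""
    return nre / volume + unit_cost

def break_even_volume(asic_nre, asic_unit, fpga_nre, fpga_unit):
    """Volume at which total ASIC cost equals total FPGA cost."""
    return (asic_nre - fpga_nre) / (fpga_unit - asic_unit)

# Hypothetical figures: ASIC with high NRE but cheap parts, FPGA with low NRE but costly parts.
ASIC_NRE, ASIC_UNIT = 2_000_000, 5
FPGA_NRE, FPGA_UNIT = 50_000, 30

print(cost_per_unit(ASIC_NRE, ASIC_UNIT, 10_000))  # 205.0
print(cost_per_unit(FPGA_NRE, FPGA_UNIT, 10_000))  # 35.0
print(break_even_volume(ASIC_NRE, ASIC_UNIT, FPGA_NRE, FPGA_UNIT))  # 78000.0
```

With these assumed numbers the FPGA is cheaper below 78,000 units and the ASIC is cheaper above that volume.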

Types of ASIC

Standard-cell designs

1. A team of design engineers starts with a non-formal understanding of the required functions for a new ASIC, usually derived from requirements analysis.

2. The design team constructs a description of an ASIC (application-specific integrated circuit) to achieve these goals using an HDL. This process is analogous to writing a computer program in a high-level language. This is usually called the RTL (register-transfer level) design.

3. Suitability for purpose is verified by functional verification. This may include such techniques as logic simulation, formal verification, emulation, or creating an equivalent pure software model. Each technique has advantages and disadvantages, and often several methods are used.

4. Logic synthesis transforms the RTL design into a large collection of lower-level constructs called standard cells. These constructs are taken from a standard-cell library consisting of pre-characterized collections of gates (such as 2-input NOR, 2-input NAND, inverters, etc.). The standard cells are typically specific to the planned manufacturer of the ASIC. The resulting collection of standard cells, plus the needed electrical connections between them, is called a gate-level netlist.

5. The gate-level netlist is next processed by a placement tool which places the standard cells onto a region representing the final ASIC. It attempts to find a placement of the standard cells, subject to a variety of specified constraints.

6. The routing tool takes the physical placement of the standard cells and uses the netlist to create the electrical connections between them. Since the search space is large, this process will produce a “sufficient” rather than “globally optimal” solution. The output is a file which can be used to create a set of photomasks enabling a semiconductor fabrication facility (commonly called a 'fab') to produce physical ICs.

7. Given the final layout, circuit extraction computes the parasitic resistances and capacitances. In the case of a digital circuit, this will then be further mapped into delay information, from which the circuit performance can be estimated, usually by static timing analysis. This, and other final tests such as design rule checking and power analysis (collectively called signoff), are intended to ensure that the device will function correctly over all extremes of process, voltage and temperature. When this testing is complete, the photomask information is released for chip fabrication.
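The idea behind static timing analysis in step 7 can be sketched in a few lines of Python: the latest arrival time at each gate output is the gate's own delay plus the latest arrival among its inputs. The netlist, gate names and delay values below are hypothetical, and wire (parasitic) delays are ignored for simplicity, which a real signoff tool would not do.

```python
# Hypothetical gate-level netlist: gate -> (delay in ns, names of driving gates or primary inputs).
NETLIST = {
    "g1": (0.10, ["a", "b"]),
    "g2": (0.15, ["b", "c"]),
    "g3": (0.20, ["g1", "g2"]),
    "out": (0.05, ["g3", "a"]),
}

def arrival_time(node, netlist):
    """Latest arrival time at the output of `node`; primary inputs arrive at t = 0."""
    if node not in netlist:   # anything not driven by a gate is a primary input
        return 0.0
    delay, fanin = netlist[node]
    return delay + max(arrival_time(src, netlist) for src in fanin)

# Critical path is b/c -> g2 -> g3 -> out: 0.15 + 0.20 + 0.05 ns.
print(round(arrival_time("out", NETLIST), 2))  # 0.4
```

If the computed arrival time exceeds the clock period minus the setup time of the capturing register, the path fails timing and the design has to be revised.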

Gate-array design

Gate-array design is a manufacturing method in which the diffused layers, i.e. transistors and other active devices, are predefined and wafers containing such devices are held in stock prior to metallization (in other words, unconnected). The physical design process then defines the interconnections of the final device. For most ASIC manufacturers, this consists of from two to as many as nine metal layers, each metal layer running perpendicular to the one below it. Non-recurring engineering costs are much lower, as photolithographic masks are required only for the metal layers, and production cycles are much shorter, as metallization is a comparatively quick process.

Gate-array ASICs are always a compromise, as mapping a given design onto what a manufacturer holds as a stock wafer never gives 100% utilization. Often difficulties in routing the interconnect require migration onto a larger array device, with a consequent increase in the piece part price. These difficulties are often a result of the layout software used to develop the interconnect.

Pure, logic-only gate-array design is rarely implemented by circuit designers today, having been replaced almost entirely by field-programmable devices, such as field-programmable gate arrays (FPGAs), which can be programmed by the user and thus offer minimal tooling charges (non-recurring engineering), only marginally increased piece part cost, and comparable performance. Today, gate arrays are evolving into structured ASICs that consist of a large IP core like a CPU, DSP unit, peripherals, standard interfaces, integrated memories (SRAM), and a block of reconfigurable, uncommitted logic.

This shift is largely because ASIC devices are capable of integrating such large blocks of system functionality and "system-on-a-chip" requires far more than just logic blocks.

In their frequent usages in the field, the terms "gate array" and "semi-custom" are synonymous. Process engineers more commonly use the term "semi-custom", while "gate-array" is more commonly used by logic (or gate-level) designers.

Full-custom design

By contrast, full-custom ASIC design defines all the photolithographic layers of the device. Full-custom design is used for both ASIC design and for standard product design.

The benefits of full-custom design usually include reduced area (and therefore recurring component cost), performance improvements, and also the ability to integrate analog components and other pre-designed — and thus fully verified — components, such as microprocessor cores that form a system-on-chip.

The disadvantages of full-custom design can include increased manufacturing and design time, increased non-recurring engineering costs, more complexity in the computer-aided design (CAD) system, and a much higher skill requirement on the part of the design team.

For digital-only designs, however, standard-cell libraries, together with modern CAD systems, can offer considerable performance/cost benefits with low risk. Automated layout tools are quick and easy to use and also offer the possibility to "hand-tweak" or manually optimize any performance-limiting aspect of the design.

A full-custom design is built from basic logic gates, circuits or layout created specially for that design.

Lecture- 6

Topic- Levels of hardware modeling

• Behavioral level
• Register-Transfer Level
• Gate Level
• Switch level

Behavioral or algorithmic Level

This level describes a system by concurrent algorithms (behavioral). Each algorithm itself is sequential, meaning that it consists of a set of instructions that are executed one after the other. ‘initial’ and ‘always’ blocks, ‘functions’ and ‘tasks’ are some of the elements used to define the system at this level. The intricacies of the system are not elaborated at this stage, and only the functional description of the individual blocks is prescribed. In this way the whole logic synthesis gets highly simplified and at the same time more efficient.

Register-Transfer Level

Designs using the Register-Transfer Level specify the characteristics of a circuit by operations and the transfer of data between the registers. An explicit clock is used. An RTL design specifies exact timing: operations are scheduled to occur at certain times. The modern definition of RTL code is "Any code that is synthesizable is called RTL code".

Gate Level

Within the logic level the characteristics of a system are described by logical links and their timing properties. All signals are discrete signals. They can only have definite logical values (`0', `1', `X', `Z'). The usable operations are predefined logic primitives (AND, OR, NOT etc. gates). Writing gate-level models by hand may not be a good idea in logic design. Gate-level code is instead generated by tools such as synthesis tools, in the form of netlists which are used for gate-level simulation and for the backend.

Switch Level

This is the lowest level of abstraction. A module can be implemented in terms of switches, storage nodes and the interconnections between them. However, as has been mentioned earlier, one can mix and match all the levels of abstraction in a design. The term RTL is frequently used for a Verilog description that is a combination of behavioral and dataflow constructs while being acceptable for synthesis.

Instances

A module provides a template from which one can create objects. When a module is invoked, Verilog creates a unique object from the template, each having its own name, variables, parameters and I/O interfaces. These are known as instances.
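The four signal values mentioned under Gate Level (`0', `1', `X', `Z') can be illustrated with a minimal Python sketch of the AND, OR and NOT primitives. It follows the usual Verilog convention that a high-impedance input `Z' is treated like an unknown `X' by a gate; the function names are chosen for this illustration.

```python
def and4(a, b):
    """Four-valued AND: a 0 on either input forces 0, two 1s give 1, otherwise unknown."""
    if a == "0" or b == "0":
        return "0"
    if a == "1" and b == "1":
        return "1"
    return "X"

def or4(a, b):
    """Four-valued OR: a 1 on either input forces 1, two 0s give 0, otherwise unknown."""
    if a == "1" or b == "1":
        return "1"
    if a == "0" and b == "0":
        return "0"
    return "X"

def not4(a):
    """Four-valued NOT: X and Z inputs both give an unknown output."""
    return {"0": "1", "1": "0"}.get(a, "X")

print(and4("0", "X"), and4("1", "Z"), or4("1", "X"), not4("Z"))  # 0 X 1 X
```

A controlling value (0 for AND, 1 for OR) determines the output even when the other input is unknown, which is why simulators can often resolve outputs before every signal has been initialised.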

The Design Flow

This block diagram describes a typical design flow for the description of the digital design for both ASIC and FPGA realizations.

Specification

This is the stage at which we define the important parameters of the system that has to be designed. For example, when designing a counter one has to decide its bit-size, whether it should have a synchronous reset, whether it must have an active-high enable, etc.

High Level Design

This is the stage at which one defines the various blocks in the design in the form of modules and instances. For instance, for a microprocessor a high level representation means splitting the design into blocks based on their function. In this case the various blocks are registers, ALU, Instruction Decode, Memory Interface, etc.

Micro Design/Low level design

Low level design or micro design is the phase in which the designer describes how each block is implemented. It contains details of state machines, counters, muxes, decoders and internal registers. For state machine entry you can use either Word or special tools like StateCAD. It is always a good idea to draw waveforms at the various interfaces. This is the phase where one spends a lot of time. A sample low level design is indicated in the figure below.

RTL Coding

In RTL coding, the micro design is converted into Verilog/VHDL code, using synthesizable constructs of the language. Normally the vim editor is used; conTEXT, Nedit and Emacs are other choices.

Simulation

Simulation is the process of verifying the functional characteristics of models at any level of abstraction. We use simulators to simulate the hardware models. To test whether the RTL code meets the functional requirements of the specification, we check that all the RTL blocks are functionally correct. To achieve this we need to write a testbench, which generates clk, reset and the required test vectors. A sample testbench for a counter is as shown below. Normally, we spend 60-70% of the time in verification of the design.
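As an illustration of the testbench idea (a clock, a reset and a set of test vectors applied to the design under test), here is a minimal Python sketch. It is a software model written for these notes, not a Verilog or VHDL testbench; the 4-bit width, the synchronous reset and the active-high enable are assumptions taken from the counter example in the Specification stage.

```python
class Counter:
    """Hypothetical up-counter with synchronous reset and active-high enable."""

    def __init__(self, width=4):
        self.width = width
        self.count = 0

    def clock_edge(self, reset, enable):
        """Model one rising clock edge and return the new count."""
        if reset:
            self.count = 0
        elif enable:
            self.count = (self.count + 1) % (1 << self.width)
        return self.count

def run_testbench():
    """Apply test vectors cycle by cycle and check the counter after each clock edge."""
    dut = Counter(width=4)
    # Each vector is (reset, enable, expected count after the edge).
    vectors = [(1, 0, 0), (0, 1, 1), (0, 1, 2), (0, 0, 2), (1, 1, 0)]
    vectors += [(0, 1, (i + 1) % 16) for i in range(17)]  # count up through the wrap-around
    for cycle, (reset, enable, expected) in enumerate(vectors):
        got = dut.clock_edge(reset, enable)
        assert got == expected, f"cycle {cycle}: expected {expected}, got {got}"
    return len(vectors)

print(run_testbench(), "vectors passed")  # 22 vectors passed
```

The vectors cover reset, counting, holding with enable low, reset taking priority over enable, and wrap-around from 15 to 0, which are the decisions listed for the counter at the specification stage.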
Synthesis Synthesis is the process in which a synthesis tool like design compiler takes in the RTL in Verilog or VHDL, target technology, and constrains as input and maps the RTL to target technology primitives. The synthesis tool after mapping the RTL to gates, also does the minimal amount of timing analysis to see if the mapped design is meeting the timing requirements. (Important thing to note is, synthesis tools are not aware of wire delays, they know only gate delays). After the synthesis there are a couple of things that are normally done before passing the netlist to backend (Place and Route)

• Verification: Check if the RTL to gate mapping is correct.
• Scan insertion: Insert the scan chain in the case of ASIC.

Place & Route

The gate-level netlist from the synthesis tool is imported into the place and route tool in Verilog netlist format. All the gates and flip-flops are placed, clock tree synthesis is performed and the reset is routed. After this, each block is routed. The output of the P&R tool is a GDS file, which is used by a foundry for fabricating the ASIC. Normally the P&R tool is also used to output the SDF file, which is back-annotated along with the gate-level netlist from P&R into a static timing analysis tool like PrimeTime to do timing analysis.

Post Silicon Validation

Once the chip (silicon) is back from fabrication, it needs to be put in a real environment and tested before it can be released into the market. Since the speed of simulation with RTL is very slow (number of clocks per second), there is always a possibility to find a bug

Lecture- 7

Topic- VHDL

VHDL (VHSIC Hardware Description Language) is a hardware description language used in electronic design automation to describe digital and mixed-signal systems such as field-programmable gate arrays and integrated circuits. VHDL can also be used as a general purpose parallel programming language.

VHDL is a hardware description language whose syntax is derived from Ada. Like any hardware description language, it is used for many purposes:

• For describing hardware.
• As a modeling language.
• For simulation of hardware.
• For early performance estimation of system architecture.
• For synthesis of hardware.
• For fault simulation, test and verification of designs.

The basic design element in VHDL is called an 'ENTITY'. An ENTITY represents a template for a hardware block. It describes just the outside view of a hardware module, namely its interface with other modules in terms of input and output signals. The hardware block can be the entire design, a part of it or indeed an entire "test bench". A test bench includes the circuit being designed, blocks which apply test signals to it and those which monitor its output. The inner operation of the entity is described by an ARCHITECTURE associated with it.

Lecture- 8

Topic- Sensors, A/D-D/A converters

Sensor

A sensor is a device that detects events or changes in quantities and provides a corresponding output, generally as an electrical or optical signal; for example, a thermocouple converts temperature to an output voltage. But a mercury-in-glass thermometer is also a sensor; it converts the measured temperature into expansion and contraction of a liquid which can be read on a calibrated glass tube.

Sensors are used in everyday objects such as touch-sensitive elevator buttons (tactile sensor) and lamps which dim or brighten by touching the base, besides innumerable applications of which most people are never aware. With advances in micromachinery and easy-to-use microcontroller platforms, the uses of sensors have expanded beyond the more traditional fields of temperature, pressure or flow measurement, for example into MARG sensors. Moreover, analog sensors such as potentiometers and force-sensing resistors are still widely used. Applications include manufacturing and machinery, airplanes and aerospace, cars, medicine and robotics.

A sensor's sensitivity indicates how much the sensor's output changes when the input quantity being measured changes. For instance, if the mercury in a thermometer moves 1 cm when the temperature changes by 1 °C, the sensitivity is 1 cm/°C (it is basically the slope Δy/Δx, assuming a linear characteristic). Some sensors can also have an impact on what they measure; for instance, a room temperature thermometer inserted into a hot cup of liquid cools the liquid while the liquid heats the thermometer. Sensors need to be designed to have a small effect on what is measured; making the sensor smaller often improves this and may introduce other advantages. Technological progress allows more and more sensors to be manufactured on a microscopic scale as microsensors using MEMS technology. In most cases, a microsensor reaches a significantly higher speed and sensitivity compared with macroscopic approaches.

Analog-digital converters

An analog-to-digital converter (ADC, A/D or A2D) converts an analog signal to a digital signal, and a digital-to-analog converter (DAC, D/A or D2A) does the opposite. Such conversions are necessary because, while embedded systems deal with digital values, an embedded system's surroundings typically involve many analog signals. "Analog" refers to a continuously-valued signal, such as temperature or speed represented by a voltage between 0 and 100, with infinite possible values in between. "Digital" refers to discretely-valued signals, such as integers, and in computing systems these signals are encoded in binary. By converting between analog and digital signals, we can use digital processors in an analog environment.

For example, consider the analog signal of Figure 3.1(a). The analog input voltage varies over time from 1 to 4 Volts. We sample the signal at successive time units, and encode the current voltage into a 4-bit binary number. Conversely, consider Figure 3.1(b). We want to generate an analog output voltage for the given binary numbers over time. We generate the analog signal shown. We can compute the digital values from the analog values, and vice-versa, using the following ratio:

$$\frac{e}{V_{max}} = \frac{d}{2^n - 1}$$

Vmax is the maximum voltage that the analog signal can assume, n is the number of bits available for the digital encoding, d is the present digital encoding, and e is the present analog voltage. This proportionality of the voltage and digital encoding is shown graphically in Figure 3.1(c). In our example of Figure 3.1, suppose Vmax is 7.5V. Then for e = 5V, we have the following ratio: 5/7.5 = d/15, resulting in d = 1010 (ten), as shown in Figure 3.1(c). The resolution of a DAC or ADC is defined as $V_{max}/(2^n - 1)$, representing the number of volts between successive digital encodings. The above discussion assumes a minimum voltage of 0V.

Internally, DACs possess simpler designs than ADCs. A DAC has n inputs for the digital encoding d, a Vmax analog input, and an analog output e. A fairly straightforward circuit (involving resistors and an op-amp) can be used to convert d to e.
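The ratio and the resolution formula can be checked numerically with a small sketch. The function names are illustrative, and rounding to the nearest encoding is an assumption (the text does not say how in-between voltages are quantized).

```python
def analog_to_digital(e, vmax, n):
    """Encoding d that satisfies e / vmax = d / (2**n - 1), rounded (assumed)."""
    return round(e * (2**n - 1) / vmax)


def digital_to_analog(d, vmax, n):
    """Analog voltage e that corresponds to encoding d."""
    return d * vmax / (2**n - 1)


def resolution(vmax, n):
    """Volts between successive digital encodings."""
    return vmax / (2**n - 1)


d = analog_to_digital(5.0, 7.5, 4)
print(d, format(d, "04b"))           # 10 1010
print(digital_to_analog(d, 7.5, 4))  # 5.0
print(resolution(7.5, 4))            # 0.5
```

With Vmax = 7.5 V and n = 4, each step of the encoding is worth 0.5 V, which is why e = 5 V lands exactly on encoding ten.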

ADCs, on the other hand, require designs that are more complex, for the following reason. Given a Vmax analog input and an analog input e, how does the converter know what binary value to assign in order to satisfy the above ratio? Unlike DACs, there is no simple analog circuit to compute d from e. Instead, an ADC may itself contain a DAC also connected to Vmax. The ADC "guesses" an encoding d, and then evaluates its guess by inputting d into the DAC, and comparing the generated analog output e' with the original analog input e (using an analog comparator). If the two sufficiently match, then the ADC has found a proper encoding. So now the question remains: how do we guess the correct encoding?

This problem is analogous to the common computer-programming problem of finding an item in a list. One approach is sequential search, or "counting-up" in analog-digital terminology. In this approach, we start with an encoding of 0, then 1, then 2, etc., until we find a match. Unfortunately, while simple, this approach in the worst case (for high voltage values) requires $2^n$ comparisons, so it may be quite slow.

A faster solution uses what programmers call binary search, or "successive approximation" in analog-digital terminology. We start with an encoding corresponding to half of the maximum. We then compare the resulting analog value with the original; if the resulting value is greater (less) than the original, we set the new encoding to halfway between this one and the minimum (maximum). We continue this process, dividing the possible encoding range in half at each step, until the compared voltages are equal. This technique requires at most n comparisons. However, it requires a more complex converter.

Because ADCs must guess the correct encoding, they require some time. Thus, in addition to the analog input and digital output, they include an input "start" that starts the conversion, and an output "done" to indicate that the conversion is complete.
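The two guessing strategies can be compared with a small simulation. This is a sketch under stated assumptions: the DAC and comparator are ideal, both converters return the largest encoding whose DAC output does not exceed e, and the successive-approximation routine uses the bit-by-bit form of binary search (deciding one bit per comparison, most significant bit first) rather than explicit halfway points. All function names are illustrative.

```python
def dac(d, vmax, n):
    """Ideal DAC: analog output for encoding d."""
    return d * vmax / (2**n - 1)


def adc_counting_up(e, vmax, n):
    """Sequential search: step the encoding upward until the next step overshoots."""
    d, comparisons = 0, 0
    while d < 2**n - 1:
        comparisons += 1
        if dac(d + 1, vmax, n) > e:   # comparator: would the next guess be too high?
            break
        d += 1
    return d, comparisons


def adc_successive_approx(e, vmax, n):
    """Binary search: try each bit from MSB to LSB, keep it if not too high."""
    d, comparisons = 0, 0
    for bit in reversed(range(n)):
        trial = d | (1 << bit)
        comparisons += 1
        if dac(trial, vmax, n) <= e:  # comparator: is the trial at or below e?
            d = trial
    return d, comparisons


print(adc_counting_up(5.0, 7.5, 4))        # (10, 11)
print(adc_successive_approx(5.0, 7.5, 4))  # (10, 4)
```

Both routines arrive at encoding ten for e = 5 V, but the number of comparisons in the counting-up version grows with the input voltage, while successive approximation always uses exactly n.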

Lecture- 9

Topic- Actuators, Interfacing using UART

Actuator

An actuator is a type of motor that is responsible for moving or controlling a mechanism or system.

It is operated by a source of energy, typically electric current, hydraulic fluid pressure, or pneumatic pressure, and converts that energy into motion. An actuator is the mechanism by which a control system acts upon an environment. The control system can be simple (a fixed mechanical or electronic system), software-based (e.g. a printer driver, robot control system), a human, or any other input.

Types of Actuator

Hydraulic

A hydraulic actuator consists of a cylinder or fluid motor that uses hydraulic power to facilitate mechanical operation. The mechanical motion gives an output in terms of linear, rotary or oscillatory motion. Because liquid is nearly incompressible, a hydraulic actuator can exert considerable force, but is limited in acceleration and speed.

The hydraulic cylinder consists of a hollow cylindrical tube along which a piston can slide. The term double acting is used when pressure is applied on each side of the piston. A difference in pressure between the two sides of the piston results in motion of the piston to either side. The term single acting is used when the fluid pressure is applied to just one side of the piston. The piston can move in only one direction, a spring being frequently used to give the piston a return stroke.

Pneumatic

Figure: Pneumatic rack-and-pinion actuators for valve controls of water pipes.

A pneumatic actuator converts energy formed by vacuum or compressed air at high pressure into either linear or rotary motion. Pneumatic energy is desirable for main engine controls because it can quickly respond in starting and stopping, as the power source does not need to be stored in reserve for operation.

Pneumatic actuators enable large forces to be produced from relatively small pressure changes. These forces are often used with valves to move diaphragms and so affect the flow of liquid through the valve.

Electric

An electric actuator is powered by a motor that converts electrical energy to mechanical torque. The electrical energy is used to actuate equipment such as multi-turn valves. It is one of the cleanest and most readily available forms of actuator because it does not involve oil.

Thermal or magnetic (shape memory alloys)

Actuators which can be actuated by applying thermal or magnetic energy have been used in commercial applications. They tend to be compact, lightweight, economical and with high power density. These actuators use shape memory materials (SMMs), such as shape memory alloys (SMAs) or magnetic shape-memory alloys (MSMAs).

Mechanical

A mechanical actuator functions by converting rotary motion into linear motion to execute movement. It involves gears, rails, pulleys, chains and other devices to operate. An example is a rack and pinion.

UART

A UART (Universal Asynchronous Receiver/Transmitter) receives serial data and stores it as parallel data (usually one byte), and takes parallel data and transmits it as serial data. Such serial communication is beneficial when we need to communicate bytes of data between devices separated by long distances, or when we simply have few available I/O pins. The principles of serial communication will be discussed in a later chapter. For our purpose in this section, we must be aware that we must set the transmission and reception rate, called the baud rate, which indicates the frequency at which the signal changes. Common rates include 2400, 4800, 9600, and 19.2k. We must also be aware that an extra bit, called parity, may be added to each data word to detect transmission errors: the parity bit is set high or low to indicate whether the word has an even or odd number of 1 bits. Internally, a simple UART may possess a baud-rate configuration register and two independently operating processors, one for receiving and the other for transmitting.
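A minimal sketch of how a parity bit can be computed and checked is given below. The function names and the `even` flag are illustrative; a hardware UART does this in logic rather than software.

```python
def parity_bit(word, even=True):
    """Parity bit for a data word.

    Even parity: the bit makes the total number of 1 bits (data + parity) even.
    Odd parity: the bit makes that total odd.
    """
    ones = bin(word).count("1")
    bit = ones % 2
    return bit if even else bit ^ 1


def parity_ok(word, bit, even=True):
    """Receiver-side check: does the received parity bit match the data?"""
    return parity_bit(word, even) == bit


sent = 0b10110010                    # four 1 bits, so even parity bit is 0
p = parity_bit(sent)
corrupted = sent ^ 0b00000100        # one bit flipped in transit
print(p, parity_ok(sent, p), parity_ok(corrupted, p))  # 0 True False
```

A single parity bit detects any odd number of flipped bits but misses an even number of flips, which is why it is only a basic error check.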

The transmitter may possess a register, often called a transmit buffer, that holds data to be sent. This register is a shift register, so the data can be transmitted one bit at a time by shifting at the appropriate rate. Likewise, the receiver receives data into a shift register, and then this data can be read in parallel. Note that in order to shift at the appropriate rate based on the configuration register, a UART requires a timer. To use a UART, we must configure its baud rate by writing to the configuration register, and then we must write data to the transmit register and/or read data from the receive register. Unfortunately, configuring the baud rate is usually not as simple as writing the desired rate (e.g., 4800) to a register. For example, to configure the UART of an 8051, we must use the following equation:

$$\text{Baud rate} = \frac{2^{smod}}{32} \cdot \frac{oscfreq}{12 \cdot (256 - TH1)}$$

smod corresponds to a bit in a special-function register, oscfreq is the frequency of the oscillator, and TH1 is an 8-bit rate register of a built-in timer.

Note that we could use a general-purpose processor to implement a UART completely in software. If we used a dedicated general-purpose processor, the implementation would be inefficient in terms of size. We could alternatively integrate the transmit and receive functionality with our main program. This would require creating a routine to send data serially over an I/O port, making use of a timer to control the rate. It would also require using an interrupt service routine to capture serial data coming from another I/O port whenever such data begins arriving. However, as with the timer functionality, adding send and receive functionality can detract from time for other computations.

The timer case shows the same trade-off. Knowing the number of cycles that each instruction requires, we could write a loop that executed the desired number of instructions; when this loop completes, we know that the desired time has passed. This implementation of a timer on a dedicated general-purpose processor is obviously quite inefficient in terms of size. One could alternatively incorporate the timer functionality into a main program, but the timer functionality then occupies much of the program's run time, leaving little time for other computations. Thus, the benefit of assigning timer functionality to a special-purpose processor becomes evident.
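The 8051 baud-rate equation given earlier in this section can be evaluated directly, and inverted to find the TH1 reload value for a desired rate. In this sketch the 11.0592 MHz oscillator is an assumed example value (a crystal frequency commonly paired with the 8051 because it divides evenly into standard baud rates), and the function names are illustrative.

```python
def baud_rate(oscfreq, th1, smod=0):
    """Baud rate from the equation: (2**smod / 32) * oscfreq / (12 * (256 - TH1))."""
    return (2**smod / 32) * oscfreq / (12 * (256 - th1))


def th1_for_baud(oscfreq, baud, smod=0):
    """Invert the equation to get the nearest TH1 value for a target baud rate."""
    return round(256 - (2**smod / 32) * oscfreq / (12 * baud))


OSC = 11_059_200                       # assumed 11.0592 MHz crystal

print(baud_rate(OSC, 0xFD))            # 9600.0
print(baud_rate(OSC, 0xFD, smod=1))    # 19200.0
print(hex(th1_for_baud(OSC, 4800)))    # 0xfa
```

Because TH1 must be an integer, only some oscillator frequencies yield standard baud rates exactly; for other frequencies the rounded TH1 gives a rate that is only approximately the one requested.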

Lecture- 10

Topic- USB, CAN bus

Universal Serial Bus (USB)

The motivation for the Universal Serial Bus (USB) comes from three interrelated considerations:

Connection of the PC to the telephone

It is well understood that the merging of computing and communication will be the basis for the next generation of productivity applications. The movement of machine-oriented and human-oriented data types from one location or environment to another depends on ubiquitous and cheap connectivity. Unfortunately, the computing and communication industries have evolved independently. The USB provides a ubiquitous link that can be used across a wide range of PC-to-telephone interconnects.

Ease-of-use

The lack of flexibility in reconfiguring the PC has been acknowledged as the Achilles' heel to its further deployment. The combination of user-friendly graphical interfaces and the hardware and software mechanisms associated with new-generation bus architectures has made computers less confrontational and easier to reconfigure. However, from the end user's point of view, the PC's I/O interfaces, such as serial/parallel ports, keyboard/mouse/joystick interfaces, etc., do not have the attributes of plug-and-play.

Port expansion

The addition of external peripherals continues to be constrained by port availability. The lack of a bidirectional, low-cost, low-to-mid speed peripheral bus has held back the creative proliferation of peripherals such as telephone/fax/modem adapters, answering machines, scanners, PDAs, keyboards, mice, etc. Existing interconnects are optimized for one or two point products. As each new function or capability is added to the PC, a new interface has been defined to address this need. The USB is the answer to connectivity for the PC architecture. It is a fast, bi-directional, isochronous, low-cost, dynamically attachable serial interface that is consistent with the requirements of the PC platform of today and tomorrow.

Goals for the Universal Serial Bus

The USB is specified to be an industry-standard extension to the PC architecture with a focus on Computer Telephony Integration (CTI), consumer, and productivity applications. The following criteria were applied in defining the architecture for the USB:

• Ease-of-use for PC peripheral expansion
• Low-cost solution that supports transfer rates up to 12 Mb/s
• Full support for real-time data for voice, audio, and compressed video
• Protocol flexibility for mixed-mode isochronous data transfers and asynchronous messaging
• Integration in commodity device technology
• Comprehension of various PC configurations and form factors
• Provision of a standard interface capable of quick diffusion into product
• Enablement of new classes of devices that augment the PC's capability

CONTROLLER AREA NETWORK (CAN) PROTOCOL

The Controller Area Network (CAN) protocol, developed by Robert Bosch GmbH, offers a comprehensive solution to managing communication between multiple CPUs. The CAN protocol specifies versatile message identifiers that can be mapped to specific control information categories. Communications may occur at a maximum recommended rate of 1 Mbit/sec (roughly a 40 meter bus length). The protocol has found wide acceptance in automotive in-vehicle applications as well as many non-automotive applications due to its low cost, high performance, and the availability of various CAN protocol implementations.

In-vehicle networking protocols must satisfy unique requirements not present in other networking protocols such as those found in telecommunications and data processing. These requirements include a high level of error detection, low latency times and configuration flexibility.

The CAN protocol provides four primary benefits:

1. First, a standard communications protocol simplifies and economizes the task of interfacing subsystems from various vendors onto a common network.
2. Second, the communications burden is shifted from the host CPU to an intelligent peripheral; the host CPU has more time to run its system tasks.
3. Third, as a multiplexed network, CAN greatly reduces wire harness size and eliminates point-to-point wiring.
4. Lastly, as a standard protocol, CAN has broad market appeal, which motivates semiconductor makers to develop competitively priced CAN devices.

An example of an application well-served by the CAN protocol is automotive networking, because many modules are inter-dependent. Sub-systems such as the engine, transmission, anti-lock braking, and accident avoidance systems require the exchange of particular performance and position information within a defined communications latency. The engine transmits engine speed and acceleration parameters to the transmission to allow smoother shifting. Perhaps the transmission requests the engine to reduce fuel injection before a gear change.

CAN is a CSMA/CD-A, or Carrier Sense Multiple Access by Collision Detection using Arbitration, protocol. Through a multi-master architecture, prioritized messages of length 8 bytes or less are sent on a serial bus. Error detection mechanisms, such as a 15-bit Cyclical Redundancy Check (CRC), provide a high level of data integrity.

The CAN 2.0 protocol was chosen by the SAE Truck & Bus Control and Communications Network Subcommittee of the Truck & Bus Electrical Committee to support its "Recommended Practice for Serial Control and Communications Vehicle Network" (Class C), called the SAE J1939 specification. The SAE Class C passenger car subcommittee is currently evaluating CAN, which is a candidate for its high speed networks. Products using CAN Version 2.0 are already in production. The previous CAN specification, Version 1.2, has been successfully implemented in passenger car, train and factory automation applications since 1989. CAN Version 2.0, which features an "extended frame" with a 29-bit message identifier, broadens the application base for this protocol by allowing J1850 message schemes to be mapped into the CAN message format.

The Intel 82526 was the first implementation of the CAN protocol, in production since 1989. The Intel 82527 is a follow-on to the 82526 which implements CAN Version 2.0, provides greater message handling capability and implements a more flexible interface to CPUs.
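To make the 15-bit CRC mentioned above concrete, here is a bit-serial sketch using the CAN generator polynomial x^15 + x^14 + x^10 + x^8 + x^7 + x^4 + x^3 + 1 (0x4599 with the leading term dropped). It is a simplification: a real controller computes the CRC over the de-stuffed bits from start-of-frame through the data field, whereas the bit sequence here is just an 11-bit identifier followed by one data byte, both hypothetical values. The helper `to_bits` is also illustrative.

```python
CAN_CRC15_POLY = 0x4599  # generator polynomial without its x^15 term


def can_crc15(bits):
    """Bit-serial CRC-15: shift register starts at 0, bits are fed MSB first."""
    crc = 0
    for bit in bits:
        feedback = bit ^ ((crc >> 14) & 1)
        crc = (crc << 1) & 0x7FFF
        if feedback:
            crc ^= CAN_CRC15_POLY
    return crc


def to_bits(value, width):
    """Unsigned integer to a list of bits, most significant bit first."""
    return [(value >> i) & 1 for i in reversed(range(width))]


# Hypothetical frame content: identifier 0x123 and data byte 0xA5.
frame_bits = to_bits(0x123, 11) + to_bits(0xA5, 8)
crc = can_crc15(frame_bits)

# The receiver runs the same division over data + CRC; the remainder must be 0.
received = frame_bits + to_bits(crc, 15)
print(can_crc15(received) == 0)      # True

# A single flipped bit leaves a non-zero remainder, so the error is detected.
received[5] ^= 1
print(can_crc15(received) == 0)      # False
```

The zero-remainder check works because the register starts at zero and no final inversion is applied, so appending the CRC makes the whole sequence divisible by the generator polynomial.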

Lecture- 11

Topic- SRAM and DRAM, Flash memory

Memory Types

There are many different types of memory. In this section we describe some of the more popular memory devices found in ARM-based embedded systems.

Read-only memory (ROM) is the least flexible of all memory types because it contains an image that is permanently set at production time and cannot be reprogrammed. ROMs are used in high-volume devices that require no updates or corrections. Many devices also use a ROM to hold boot code.

Table: Fetching instructions from memory.

Instruction size    8-bit memory    16-bit memory    32-bit memory
ARM 32-bit          4 cycles        2 cycles         1 cycle
Thumb 16-bit        2 cycles        1 cycle          1 cycle

Flash ROM can be written to as well as read, but it is slow to write so you shouldn't use it for holding dynamic data.

Its main use is for holding the device firmware or storing long term data that needs to be preserved after power is off. The erasing and writing of flash ROM are completely software controlled with no additional hardware circuitry required, which reduces the manufacturing costs. Flash ROM has become the most popular of the read-only memory types and is currently being used as an alternative for mass or secondary storage.

Dynamic random access memory (DRAM) is the most commonly used RAM for devices. It has the lowest cost per megabyte compared with other types of RAM. DRAM is dynamic: it needs to have its storage cells refreshed and given a new electronic charge every few milliseconds, so you need to set up a DRAM controller before using the memory.

Static random access memory (SRAM) is faster than the more traditional DRAM, but requires more silicon area. SRAM is static: the RAM does not require refreshing. The access time for SRAM is considerably shorter than the equivalent DRAM because SRAM does not require a pause between data accesses. Because of its higher cost, it is used mostly for smaller high-speed tasks, such as fast memory and caches.

Synchronous dynamic random access memory (SDRAM) is one of many subcategories of DRAM. It can run at much higher clock speeds than conventional memory. SDRAM synchronizes itself with the processor bus because it is clocked. Internally the data is fetched from memory cells, pipelined, and finally brought out on the bus in a burst. The old-style DRAM is asynchronous, so does not burst as efficiently as SDRAM.

Peripherals

Embedded systems that interact with the outside world need some form of peripheral device. A peripheral device performs input and output functions for the chip by connecting to other devices or sensors that are off-chip. Each peripheral device usually performs a single function and may reside on-chip. Peripherals range from a simple serial communication device to a more complex 802.11 wireless device.
All ARM peripherals are memory mapped: the programming interface is a set of memory-addressed registers. The address of these registers is an offset from a specific peripheral base address. Controllers are specialized peripherals that implement higher levels of functionality within an embedded system. Two important types of controllers are memory controllers and interrupt controllers.
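The base-plus-offset addressing of memory-mapped registers described above can be sketched as follows. Everything concrete here is hypothetical: the base address, the register names and offsets, and the `Bus` class, which stands in for the processor bus with a simple dictionary.

```python
UART_BASE = 0x1000_0000                                  # hypothetical peripheral base address
UART_REGS = {"DATA": 0x00, "STATUS": 0x04, "CONTROL": 0x08}  # hypothetical register offsets


def register_address(base, offset):
    """A register's address is its offset from the peripheral base address."""
    return base + offset


class Bus:
    """Toy model of a 32-bit bus: addresses map to register contents."""

    def __init__(self):
        self.mem = {}

    def write32(self, addr, value):
        self.mem[addr] = value & 0xFFFFFFFF

    def read32(self, addr):
        return self.mem.get(addr, 0)


bus = Bus()
control = register_address(UART_BASE, UART_REGS["CONTROL"])
bus.write32(control, 0x1)                                # e.g. set an "enable" bit (assumed meaning)
print(hex(control), bus.read32(control))                 # 0x10000008 1
```

On real hardware the same idea appears as ordinary load and store instructions to fixed addresses; no special I/O instructions are needed.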

Memory Controllers

Memory controllers connect different types of memory to the processor bus. On power-up a memory controller is configured in hardware to allow certain memory devices to be active. These memory devices allow the initialization code to be executed. Some memory devices must be set up by software; for example, when using DRAM, you first have to set up the memory timings and refresh rate before it can be accessed.

Lecture- 12

Topic- Real-Time Task Scheduling: Some important concepts

Real time system

Operating systems are software environments that provide a buffer between the user and the low level interfaces to the hardware within a system. They provide a constant interface and a set of utilities to enable users to utilize the system quickly and efficiently. They allow software to be moved from one system to another and therefore can make application programs hardware independent. Program debugging tools are usually included, which speeds up the testing process. Many applications do not require any operating system support at all and run directly on the hardware. Such software includes its own I/O routines, for example, to drive serial and parallel ports. However, with the addition of mass storage and the complexities of disk access and file structures, most applications immediately delegate these tasks to an operating system. The delegation decreases software development time by providing system calls to enable application software access to any of the I/O system facilities. These calls are typically made by building a parameter block, loading a specified register with its location and then executing a software interrupt instruction.

The TRAP instruction is the MC68000 family equivalent of the software interrupt and switches the processor into supervisor mode to execute the required function. It effectively provides a communication path between the application and the operating system kernel. The kernel is the heart of the operating system, which controls the hardware and deals with interrupts, memory usage, I/O systems, etc. It locates a parameter block by using an address pointer stored in a predetermined address register. It takes the commands stored in a parameter block and executes them. In doing so, it is the kernel that drives the hardware, interpreting the commands passed to it through a parameter block. After the command is completed, status information is written back into the parameter block, and the kernel passes control back to the application, which continues running in USER mode. The application will find the I/O function completed with the data and status information written into the parameter block. The application has had no direct access to the memory or hardware whatsoever.

These parameter blocks are standard throughout the operating system and are not dependent on the actual hardware performing the physical tasks. It does not matter if the system uses an MC68901 multifunction peripheral or an 8530 serial communication controller to provide the serial ports: the operating system driver software takes care of the dependencies. If the parameter blocks are general enough in their definition, data can be supplied from almost any source within the system; for example, a COPY utility could use the same blocks to get data from a serial port and copy it to a parallel port, or for copying data from one file to another. This idea of device independence and unified I/O allows software to be reused rather than rewritten. Software can be easily moved from one system to another.
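The parameter-block mechanism can be mimicked in a few lines. This is only a conceptual sketch: the field names, device names and driver functions are invented for illustration, and a plain function call stands in for the software interrupt that would switch a real processor into supervisor mode.

```python
def serial_read(block):
    """Hypothetical serial-port driver: returns up to block['length'] bytes."""
    return b"hello from serial"[: block["length"]]


def file_read(block):
    """Hypothetical file driver with the same interface as the serial driver."""
    return b"file contents"[: block["length"]]


# The kernel owns the mapping from device names to hardware-specific drivers.
DRIVERS = {"serial0": serial_read, "file0": file_read}


def kernel_call(block):
    """Stand-in for the software interrupt: interpret the block, write results back."""
    driver = DRIVERS.get(block["device"])
    if driver is None or block["command"] != "READ":
        block["status"] = "ERROR"
        return
    block["data"] = driver(block)
    block["status"] = "OK"


def read_from(device, length):
    """Application side: build a parameter block, call the kernel, return the block."""
    block = {"device": device, "command": "READ", "length": length,
             "data": None, "status": None}
    kernel_call(block)
    return block


# The same block layout works for any device, which is what makes a COPY utility generic.
for device in ("serial0", "file0"):
    result = read_from(device, 5)
    print(device, result["status"], result["data"])
```

The application never touches a driver directly; it only fills in the block and reads back the status and data, mirroring the device independence described above.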

Lecture- 13

Topic- Types of real-time tasks and their characteristics

Types of Real-Time Tasks

Based on the way real-time tasks recur over a period of time, it is possible to classify them into three main categories: periodic, sporadic, and aperiodic tasks. In the following, we discuss the important characteristics of these three major categories of real-time tasks.

Periodic Task: A periodic task is one that repeats after a certain fixed time interval. The precise time instants at which periodic tasks recur are usually demarcated by clock interrupts. For this reason, periodic tasks are sometimes referred to as clock-driven tasks. The fixed time interval after which a task repeats is called the period of the task. If Ti is a periodic task, then the time from 0 till the occurrence of the first instance of Ti (i.e. Ti(1)) is denoted by φi, and is called the phase of the task. The second instance (i.e. Ti(2)) occurs at φi + pi. The third instance (i.e. Ti(3)) occurs at φi + 2 × pi, and so on. Formally, a periodic task Ti can be represented by a 4-tuple (φi, pi, ei, di) where φi is the phase, pi is the period of the task, ei is the worst case execution time of the task, and di is the relative deadline of the task. We shall use this notation extensively in future discussions.

To illustrate the above notation for representing real-time periodic tasks, let us consider the track correction task typically found in rocket control software. Assume the following characteristics of the track correction task. The track correction task starts 2000 milliseconds after the launch of the rocket, and recurs periodically every 50 milliseconds from then on. Each instance of the task requires a processing time of 8 milliseconds and its relative deadline is 50 milliseconds. Recall that the phase of a task is defined by the occurrence time of the first instance of the task. Therefore, the phase of this task is 2000 milliseconds. This task can formally be represented as (2000 mSec, 50 mSec, 8 mSec, 50 mSec). This task is pictorially shown in Fig. 29.3. When the deadline of a task equals its period (i.e. pi = di), we can omit the fourth element of the tuple. In this case, we can represent the task as Ti = (2000 mSec, 50 mSec, 8 mSec). This would automatically mean pi = di = 50 mSec. Similarly, when φi = 0, it can be omitted when no confusion arises. So, following the same (pi, ei) ordering, Ti = (100 mSec, 20 mSec) would indicate a task with φi = 0, pi = 100 mSec, ei = 20 mSec, and di = 100 mSec. Whenever there is any scope for confusion, we shall explicitly write out the parameters, e.g. Ti = (pi = 50 mSec, ei = 8 mSec, di = 40 mSec).
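The 4-tuple notation maps naturally onto a small record type. The sketch below computes when each instance of the track correction task is released and when it must finish; the type and function names are illustrative, and all times are in milliseconds.

```python
from collections import namedtuple

# (phase, period, worst-case execution time, relative deadline), all in ms
PeriodicTask = namedtuple("PeriodicTask", "phase period exec_time deadline")


def release_time(task, k):
    """Time at which the k-th instance (k = 1, 2, ...) occurs: phase + (k - 1) * period."""
    return task.phase + (k - 1) * task.period


def absolute_deadline(task, k):
    """Latest completion time of the k-th instance: release time + relative deadline."""
    return release_time(task, k) + task.deadline


track_correction = PeriodicTask(phase=2000, period=50, exec_time=8, deadline=50)

for k in (1, 2, 3):
    print(k, release_time(track_correction, k), absolute_deadline(track_correction, k))
# 1 2000 2050
# 2 2050 2100
# 3 2100 2150
```

Because the relative deadline equals the period here, each instance must complete before the next one is released.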

A vast majority of the tasks present in a typical real-time system are periodic. The reason for this is that many activities carried out by real-time systems are periodic in nature, for example monitoring certain conditions, or polling information from sensors at regular intervals to carry out certain actions at regular intervals (such as driving some actuators). We shall consider examples of such tasks found in a typical chemical plant. In a chemical plant several temperature monitors, pressure monitors, and chemical concentration monitors periodically sample the current temperature, pressure, and chemical concentration values, which are then communicated to the plant controller. The instances of the temperature, pressure, and chemical concentration monitoring tasks normally get generated through the interrupts received from a periodic timer. These inputs are used to compute corrective actions required to maintain the chemical reaction at a certain rate. The corrective actions are then carried out through actuators.

Sporadic Task: A sporadic task is one that recurs at random instants. A sporadic task Ti can be represented by a three-tuple:

Ti = (ei, gi, di)

where ei is the worst case execution time of an instance of the task, gi denotes the minimum separation between two consecutive instances of the task, and di is the relative deadline. The minimum separation (gi) between two consecutive instances of the task implies that once an instance of a sporadic task occurs, the next instance cannot occur before gi time units have elapsed. That is, gi restricts the rate at which sporadic tasks can arise. As done for periodic tasks, we shall use the convention that the first instance of a sporadic task Ti is denoted by Ti(1) and the successive instances by Ti(2), Ti(3), etc.

Many sporadic tasks, such as emergency message arrivals, are highly critical in nature. For example, in a robot, a task that gets generated to handle an obstacle that suddenly appears is a sporadic task. In a factory, the task that handles fire conditions is a sporadic task. The time of occurrence of these tasks cannot be predicted.

The criticality of sporadic tasks varies from highly critical to moderately critical. For example, a task handling an I/O device interrupt or a DMA interrupt is moderately critical. However, a task handling the reporting of fire conditions is highly critical.

Aperiodic Task: An aperiodic task is in many ways similar to a sporadic task. An aperiodic task can arise at random instants. However, in the case of an aperiodic task, the minimum separation gi between two consecutive instances can be 0. That is, two or more instances of an aperiodic task might occur at the same time instant. Also, the deadline for an aperiodic task is expressed either as an average value or statistically. Aperiodic tasks are generally soft real-time tasks.

It is easy to see why aperiodic tasks need to be soft real-time tasks. Aperiodic tasks can recur in quick succession. It therefore becomes very difficult to meet the deadlines of all instances of an aperiodic task. When several aperiodic tasks recur in quick succession, there is a bunching of the task instances, and this might lead to a few deadline misses. As already discussed, soft real-time tasks can tolerate a few deadline misses. An example of an aperiodic task is a logging task in a distributed system. The logging task can be started by different tasks running on different nodes. The logging requests from different tasks may arrive at the logger almost at the same time, or the requests may be spaced out in time. Other examples of aperiodic tasks include operator requests, keyboard presses, mouse movements, etc. In fact, all interactive commands issued by users are handled by aperiodic tasks.

Lecture- 14

Topic- Task scheduling, Clock-Driven scheduling

Classification of Real-Time Task Scheduling Algorithms

Several schemes of classification of real-time task scheduling algorithms exist. A popular scheme classifies the real-time task scheduling algorithms based on how the scheduling points are defined. The three main types of schedulers according to this classification scheme are: clock-driven, event-driven, and hybrid.

The clock-driven schedulers are those in which the scheduling points are determined by the interrupts received from a clock. In the event-driven ones, the scheduling points are defined by certain events other than clock interrupts. The hybrid ones use both clock interrupts and event occurrences to define their scheduling points.

A few important members of each of these three broad classes of scheduling algorithms are the following:

1. Clock Driven
• Table-driven
• Cyclic

2. Event Driven
• Simple priority-based
• Rate Monotonic Analysis (RMA)
• Earliest Deadline First (EDF)

3. Hybrid
• Round-robin
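As an illustration of the clock-driven class, the following Python sketch shows the idea behind a table-driven scheduler: the schedule is computed in advance and stored in a table, and the scheduler simply replays that table on every clock tick. The table contents, task names, and function name are hypothetical and chosen only for the example.

```python
# Hypothetical precomputed schedule for one major cycle:
# each entry is (task name, number of clock ticks the task runs).
SCHEDULE_TABLE = [("T1", 2), ("T2", 3), ("T1", 2), ("T3", 1)]


def run_table_driven(table, total_ticks):
    """Return the task that runs at each clock tick, repeating the table."""
    if sum(duration for _, duration in table) <= 0:
        raise ValueError("schedule table must cover at least one tick")
    timeline = []
    while len(timeline) < total_ticks:
        for name, duration in table:
            timeline.extend([name] * duration)
    return timeline[:total_ticks]
```

With this table the major cycle is 8 ticks long, so a run of 10 ticks plays the table once and then starts over with T1. The scheduler makes no decisions at run time, which is why this class can only handle tasks whose arrival pattern is known in advance.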

Scheduling algorithms

Rate monotonic

Rate monotonic scheduling (RMS) is an approach that is used to assign task priority for a preemptive system in such a way that the correct execution can be guaranteed. It assumes that task priorities are fixed for a given set of tasks and are not dynamically changed during execution. It assumes that there are sufficient task priority levels for the task set and that the task set models periodic events only. This means that an interrupt that is generated by a serial port peripheral is modeled as an event that occurs on a periodic rate determined by the data rate, for example. Asynchronous events such as a user pressing a key are handled differently as will be explained later.

The key policy within RMS is that tasks with shorter periods are given the highest priority within the system. This means that the more frequently executing tasks can pre-empt the slower periodic tasks so that they can meet their deadlines. The advantage this gives the system designer is that it is easier to theoretically specify the system so that the tasks will meet their deadlines without overloading the processor. This requires detailed knowledge about each task and the time it takes to execute. The execution time and the periodicity of each task can be used to calculate the processor loading.

Deadline monotonic scheduling

Deadline monotonic scheduling (DMS) is another task priority policy that uses the nearest deadline as the criterion for assigning task priority. Given a set of tasks, the one with the nearest deadline is given the highest priority. This means that the scheduler or designer must now know when these deadlines are to take place. Tracking and, in fact, getting this information in the first place can be difficult, and this is often the reason why deadline scheduling is a second choice compared to RMS.

Priority guidelines

With a system that has a large number of tasks or one that has a small number of priority levels, the general rule is to assign tasks with a similar period to the same level. In most cases, this does not affect the ability to schedule correctly. If a task has a large context, i.e. it has more registers and data to be stored compared to other tasks, it is worth raising its priority to reduce the context switch overhead. This may prevent the system from scheduling properly but can be a worthwhile experiment.
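The RMS policy and the processor loading calculation mentioned for it can be sketched as follows. The task set and function names are hypothetical. The schedulability check uses the classical Liu and Layland utilization bound n(2^(1/n) - 1), which is not stated in these notes and is brought in here as an assumption; it is a sufficient test only, so a task set that fails it may still be schedulable.

```python
def rm_priorities(tasks):
    """tasks maps name -> (execution_time, period).
    Returns task names ordered from highest to lowest RMS priority
    (shortest period first)."""
    return sorted(tasks, key=lambda name: tasks[name][1])


def utilization(tasks):
    """Processor loading: the sum of execution_time / period over all tasks."""
    return sum(e / p for e, p in tasks.values())


def liu_layland_bound(n):
    """Sufficient utilization bound for n periodic tasks under RMS."""
    return n * (2 ** (1 / n) - 1)


def passes_rm_bound(tasks):
    """True if the loading is within the sufficient bound."""
    return utilization(tasks) <= liu_layland_bound(len(tasks))


# Hypothetical task set: name -> (execution time, period), same time unit.
example = {"T1": (20, 100), "T2": (30, 150), "T3": (60, 200)}
```

For the example set the loading is 0.2 + 0.2 + 0.3 = 0.7, which is below the three-task bound of roughly 0.78, so the set passes the test, with T1 receiving the highest priority and T3 the lowest.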

Lecture- 15

Topic- Earliest Deadline First (EDF) scheduling

Important examples of event-driven schedulers are Earliest Deadline First (EDF) and Rate Monotonic Analysis (RMA). Event-driven schedulers are more sophisticated than clock-driven schedulers and are usually more proficient and flexible. They are more proficient because they can feasibly schedule some task sets which clock-driven schedulers cannot. They are more flexible because they can feasibly schedule sporadic and aperiodic tasks in addition to periodic tasks, whereas clock-driven schedulers can satisfactorily handle only periodic tasks. Event-driven scheduling of real-time tasks in a uniprocessor environment was a subject of intense research during the early 1970s, leading to the publication of a large number of research results. Of these, the following two popular algorithms capture the essence of all those results: Earliest Deadline First (EDF) and Rate Monotonic Analysis (RMA). If we understand these two schedulers well, we would get a good grip on real-time task scheduling on uniprocessors. Several variations of these two basic algorithms exist.
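The core rule of EDF is that, at every scheduling point, the ready task instance with the earliest absolute deadline runs next. The sketch below simulates this rule in discrete time. It rests on several simplifying assumptions that are not taken from these notes: tasks are periodic, the deadline of each instance is the end of its period, time advances in unit ticks, and an instance that misses its deadline is dropped. A miss whose deadline coincides with the end of the simulated horizon is not reported. All names are hypothetical.

```python
def edf_schedule(tasks, horizon):
    """tasks maps name -> (execution_time, period), in whole ticks.
    Returns (timeline, missed): the task run at each tick (None when idle)
    and a list of (name, deadline) pairs for instances that missed."""
    ready = []      # pending instances as [absolute_deadline, name, remaining]
    timeline = []
    missed = []
    for t in range(horizon):
        # Drop instances whose deadline has passed with work remaining.
        for job in [j for j in ready if j[0] <= t]:
            missed.append((job[1], job[0]))
            ready.remove(job)
        # Release a new instance of each task at the start of its period.
        for name, (e, p) in tasks.items():
            if t % p == 0:
                ready.append([t + p, name, e])
        if not ready:
            timeline.append(None)
            continue
        # EDF rule: run the instance with the earliest absolute deadline.
        job = min(ready, key=lambda j: (j[0], j[1]))
        job[2] -= 1
        timeline.append(job[1])
        if job[2] == 0:
            ready.remove(job)
    return timeline, missed
```

Because the choice is recomputed at every tick, a newly released instance with an earlier deadline pre-empts the running one. Calling edf_schedule({"T1": (1, 4), "T2": (2, 6)}, 12) yields a timeline with no missed deadlines, while an overloaded set such as {"A": (2, 2), "B": (1, 3)} produces at least one miss.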

Another classification of real-time task scheduling algorithms can be made based upon the type of task acceptance test that a scheduler carries out before it takes up a task for scheduling. The acceptance test is used to decide whether a newly arrived task would at all be taken up for scheduling or be rejected. Based on the task acceptance test used, there are two broad categories of task schedulers:

• Planning-based
• Best effort

In planning-based schedulers, when a task arrives the scheduler first determines whether the task can meet its deadline if it is taken up for execution. If the task can meet its deadline and does not cause other already scheduled tasks to miss their respective deadlines, then the task is accepted for scheduling. Otherwise, it is rejected. In best effort schedulers, no acceptance test is applied. All tasks that arrive are taken up for scheduling, and a best effort is made to meet their deadlines. However, no guarantee is given as to whether a task's deadline will be met.
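The acceptance test of a planning-based scheduler can be sketched as follows. The notes do not say which test is used, so this example assumes one simple possibility: periodic tasks scheduled by EDF on a single processor, with deadlines equal to periods, where a new task is accepted only if the total utilization stays at or below 1. The class and method names are hypothetical.

```python
class PlanningBasedScheduler:
    """Admits a task only if all accepted tasks can still meet their deadlines.
    Assumed test: total utilization (sum of e / p) must not exceed 1."""

    def __init__(self):
        self.accepted = {}  # name -> (execution_time, period)

    def try_accept(self, name, execution_time, period):
        """Return True and record the task if it passes the acceptance test."""
        load = sum(e / p for e, p in self.accepted.values())
        if load + execution_time / period > 1.0:
            return False  # rejected: would overload the processor
        self.accepted[name] = (execution_time, period)
        return True
```

A best effort scheduler, by contrast, would skip the check and record every arriving task, accepting that some deadlines may then be missed.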

A third type of classification of real-time task scheduling algorithms is based on the target platform on which the tasks are to be run. The different classes of scheduling algorithms according to this scheme are:

• Uniprocessor
• Multiprocessor
• Distributed

Uniprocessor scheduling algorithms are possibly the simplest of the three classes of algorithms. In contrast to uniprocessor algorithms, in multiprocessor and distributed scheduling algorithms a decision first has to be made regarding which task needs to run on which processor, and then these tasks are scheduled. In contrast to multiprocessors, the processors in a distributed system do not possess shared memory. Also in contrast to multiprocessors, there is no global up-to-date state information available in distributed systems. This makes uniprocessor scheduling algorithms, which assume that central state information about all tasks and processors exists, unsuitable for use in distributed systems. Further, in distributed systems the communication among tasks is through message passing. Communication through message passing is costly. This means that a scheduling algorithm should not incur too much communication overhead. Therefore, only carefully designed distributed algorithms are normally considered suitable for use in a distributed system. In the following sections, we study the different classes of schedulers in more detail.
