Two Phase Non-overlapping Clock Driver

ECEN 4303 Digital VLSI Design

Clocked Systems

A synchronous design style with a system clock is used in the vast majority of digital systems. Asynchronous designs without a system clock must be carefully designed to guarantee correct operation even though there can be considerable variation in the delay of circuit elements. Clocked systems, on the other hand, can be designed more easily by meeting simple constraints on the minimum and maximum delay. Specialized circuit elements are used with the clock signal.

Fig. 7.1, p. 384 Latches and Flip-Flops

Clocked Circuit Elements

In the following, the circuit implementations of latches and flip-flops are grouped according to the number of global clock lines needed. The trend in modern design has been to use implementations with the minimum number of global clock lines to save area and to limit the engineering effort needed to ensure proper distribution of the clock signal to all parts of the chip.

True Single Phase Clock.

Static implementations: Full CMOS latches

[Figure: full CMOS D latches with inputs D and clk and output Q; one level high sensitive, one level low sensitive]

Can synthesize as 2 complex CMOS gates plus inverter => no. of FETs = 2x6 + 2 = 14

Clocked Systems September 5, 2009 page 1 of 9


Full CMOS gate implementation of master-slave D-FF

[Figure: full CMOS master-slave D flip-flops with inputs D and clk and output Q; one rising edge sensitive, one falling edge sensitive]

Can synthesize as 4 complex CMOS gates plus inverter => no. of FETs = 4x6 + 2 = 26

Differential Pair master-slave D-FF: Fig. 7.29a, p. 413 is faster than full CMOS, but has about the same number of transistors if the fast asynchronous slave latch is used in Fig. 7.29b.

MUX Pass Transistor implementation is not practical for single phase clocking.

[Figure: MUX pass transistor latch with inputs clk and D and output Q; annotation: reduced noise margin here]

As we saw earlier, both the nFET and the pFET reduce the noise margin, which is not tolerable in modern processes.

Dynamic implementations:

Dynamic registers and latches depend on charge storage on internal nodes in the circuit for memory storage. If the charge leaks off between clock pulses, the memory state is lost. This puts a minimum frequency requirement on dynamic registers and latches. The increased leakage in deep sub-micron processes has reduced storage times to a µsec or less, which means that the minimum operating frequency must be 1 MHz or higher. Since current clock rates are in excess of 1 GHz, this is not a major problem at present. However, since leakage is anticipated to be worse in the future, dynamic techniques could become impractical.
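The relation between storage time and minimum clock frequency can be checked with a few lines of Python. This is only a sketch: it assumes a dynamic node must be rewritten at least once per storage time, and the function name is made up for illustration.

```python
def min_clock_frequency_hz(storage_time_s: float) -> float:
    """Lowest clock frequency that rewrites a dynamic node before it decays.

    Assumption: the node must be refreshed at least once per storage time,
    so the clock period may not exceed storage_time_s.
    """
    if storage_time_s <= 0:
        raise ValueError("storage time must be positive")
    return 1.0 / storage_time_s


f_min = min_clock_frequency_hz(1e-6)  # 1 usec storage time quoted in the text
headroom = 1e9 / f_min                # how far a 1 GHz clock sits above f_min
print(f"f_min = {f_min / 1e6:.1f} MHz, headroom at 1 GHz = {headroom:.0f}x")
```

A shorter storage time raises the minimum frequency in direct proportion, which is why worsening leakage erodes the headroom.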



non-full swing pass transistor Fig. 7.17a, p. 404. The output is usually buffered as shown below since the output can be floating.

[Figure: buffered non-full swing pass transistor latches with input D, clock clk, and output Q; level high sensitive latch and level low sensitive latch]

The necessity of using the pFET for the level low sensitive latch makes this implementation unattractive, but it does have the minimum transistor count. Only 3 transistors are required for a latch and 6 for a flip-flop. Also, power consumption is a problem unless some of the techniques discussed previously are used to make it full swing.

True Single Phase Clock (TSPC) Fig. 7.30 p. 414. These circuits are slower than other alternatives since the latch and flip-flop implementations require an extra level of logic. Also, the output usually needs buffering since it is a dynamic node.

Pulsed Latches:

Pulsed latches are difficult to get to work correctly and, for this reason, have not been used traditionally. With the emphasis on single global clock lines, pulsed latches have become more popular recently.

Any latch can be used with clock pulses as in the bottom of Fig. 7.2, p. 385. Signals are prevented from going through the latches except for the short period of time when the pulse is high. The trick is to keep the pulses short enough so that signals cannot propagate through more than one latch at a time. On the other hand, the pulse must be long enough to allow propagation of the signals through the latch.

Note that latches take the place of flip-flops. Flip-flops are not needed. Only level high sensitive latches are needed which makes a non-full swing implementation more attractive since pMOS pass FETs are not required.

[Figure: pulsed latches clocked by φp with combinational logic between them]

Special care must be taken when driving the pulsed clock, φp, over long distances. Recall that the wires act like RC transmission lines, which not only delay but also spread out any edges. Over long clock lines, the edges of the clock pulse gradually merge and the whole pulse can disappear.

[Figure: clk waveform]

One solution to this problem is to have a normal global clock with a square wave where the edges are much farther apart and do not grow together so easily. Then, each pulsed latch includes a local pulse generator. It is relatively straightforward to design the local pulse generator so that it produces pulses of the desired length triggered by an edge on the global clock. Pulse generators Fig. 7.22, p. 408; combined pulse generator and latch Fig. 7.23 (since Q is a dynamic floating node, do not use it without buffering it first).
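One common way to build such a local pulse generator is to AND the clock with a delayed, inverted copy of itself. The behavioral sketch below models that idea on sampled waveforms. Whether it matches the circuit in Fig. 7.22 is not confirmed here, and the function name and sample-based delay are assumptions made for illustration.

```python
def pulse_generator(clk, delay):
    """Return a pulse train that is high for `delay` samples after each rising clk edge.

    clk   -- list of 0/1 samples of the global square-wave clock
    delay -- delay of the inverting chain in samples (sets the pulse width)
    """
    pulses = []
    for i, level in enumerate(clk):
        # Delayed copy of clk; the clock is assumed low before the first sample.
        delayed = clk[i - delay] if i >= delay else 0
        # High only while clk has risen and the delayed, inverted copy
        # has not yet fallen.
        pulses.append(level & (1 - delayed))
    return pulses


clk = ([0] * 8 + [1] * 8) * 3          # square wave with a period of 16 samples
phi_p = pulse_generator(clk, delay=2)  # pulses 2 samples wide
print("".join(map(str, clk)))
print("".join(map(str, phi_p)))
```

The pulse width depends only on the local delay, not on the shape of the global clock edges, which is the point of generating the pulse next to the latch.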

Pseudo Single Phase Clock

Use of both clk and its complement clk̅ as control signals allows the use of transmission gates to make full swing logic, which reduces power consumption and improves noise margin. To avoid two global clock lines, clk̅ is usually generated from clk with a local inverter as shown in Fig. 7.20, p. 406.

Static implementations: MUX transmission gate implementation for latches Fig. 7.17e,f,g,h p. 404; registers Fig. 7.19b p. 405

[Figure: MUX transmission gate latches with input D, output Q, and clocks clk and clk̅; level high sensitive latch and level low sensitive latch]

Can synthesize as 2 transmission gates plus 2 inverters => no. of FETs = 2x2 + 2x2 = 8

Note that this is a static implementation since charge storage is not used.

Clocked CMOS (C2MOS) Fig. 7.18, p. 405 is just a combined inverter and transmission gate. These gates can also be used in static latches, Fig. 7.17g,h p. 404.



“jamb” latch Fig. 7.17i p. 404 uses a “weak” feedback inverter fabricated with high impedance FETs (small width, possibly longer than minimum length).

Dynamic implementations: latches Fig. 7.17c,d p. 404, registers Fig. 7.19a

Latches synthesized as 1 transmission gate plus 1 inverter => no. of FETs = 2 + 2 = 4
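The transistor counts quoted so far can be collected in one place for comparison. The short Python sketch below only restates the arithmetic from these notes; the dictionary keys are descriptive labels chosen here, not names from the textbook.

```python
# FET counts as quoted in the notes, written as (gates x FETs per gate) terms.
FET_COUNTS = {
    "full CMOS latch": 2 * 6 + 2,                      # 2 complex gates + inverter
    "full CMOS master-slave D-FF": 4 * 6 + 2,          # 4 complex gates + inverter
    "static transmission gate latch": 2 * 2 + 2 * 2,   # 2 TGs + 2 inverters
    "dynamic transmission gate latch": 2 + 2,          # 1 TG + 1 inverter
    "non-full swing pass transistor latch": 3,         # count stated directly in the notes
}

# Print from smallest to largest so the area trade-off is easy to see.
for name, count in sorted(FET_COUNTS.items(), key=lambda item: item[1]):
    print(f"{count:2d} FETs  {name}")
```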

True Two Phase Clock

Two non-overlapping clock signals can be used to ensure that no signals can propagate through more than one latch at a time (middle of Fig. 7.2, p. 385). t_nonoverlap is kept small compared with the clock period to avoid wasting time. It is very difficult to keep the two clock signals precisely aligned when t_nonoverlap is short. To avoid two global clock lines, the two clock signals can be generated from a single global clock. The local two phase clocks can be shared between several nearby latches and/or registers.
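Generating the two phases locally from a single clock can be modeled behaviorally as follows. This is a sketch, not a circuit from the notes: it assumes an ideal delay element of `gap` samples, and it forms φ1 where clk and its delayed copy are both high and φ2 where both are low. Real drivers often use cross-coupled gates instead. The function and parameter names are made up for illustration.

```python
def two_phase_clocks(clk, gap):
    """Derive non-overlapping phases phi1 and phi2 from one sampled clock.

    clk -- list of 0/1 samples of the global clock
    gap -- non-overlap time in samples (delay of the assumed delay element)
    """
    phi1, phi2 = [], []
    for i, level in enumerate(clk):
        # Delayed copy of clk; before the first sample the clock is
        # assumed to sit at its initial level.
        delayed = clk[i - gap] if i >= gap else clk[0]
        phi1.append(level & delayed)              # high only when both are high
        phi2.append((1 - level) & (1 - delayed))  # high only when both are low
    return phi1, phi2


clk = ([1] * 10 + [0] * 10) * 2
phi1, phi2 = two_phase_clocks(clk, gap=2)
for name, wave in (("clk ", clk), ("phi1", phi1), ("phi2", phi2)):
    print(name, "".join(map(str, wave)))
```

In the printed waveforms, each clk edge is followed by `gap` samples in which both phases are low. That interval plays the role of t_nonoverlap.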

[Figure: timing diagram of clk, φ1, and φ2]

Static Implementations:

[Figure: static two phase latches with input D, output Q, and clocks φ1 and φ2; one is labeled level high φ1 sensitive latch]


Use different clock phases on alternate stages as shown in middle of Fig. 7.2, p. 385.

Dynamic Implementations:

[Figure: dynamic two phase latches, each with input D and output Q; one clocked by φ1, the other by φ2]




Pseudo Two Phase Clocks

Use of both φ1, φ2 and their complements φ̄1, φ̄2 as control signals allows the use of transmission gates to make full swing logic, which reduces power consumption and improves noise margin. To avoid four global clock lines, the four clock signals can be generated from a single global clock. The increased complexity necessary to properly synchronize 4 clock lines makes it necessary to share the local clock signals over several latches and registers.

[Figure: timing diagram of clk, φ1, φ2, φ̄2, and φ̄1]

Static Implementations: Fig. 7.21, p. 407

Dynamic Implementations:

[Figure: dynamic pseudo two phase latches, each with input D and output Q; clocks φ1 and φ2]

Register and Latch Timing with Two Phase Clocks

When using the two phase clocks, care must be taken to get the correct timing. φ1, φ2, φ̄1, and φ̄2 are all distinctly different functions of time and must be connected correctly to the transmission gates. φ1 and φ2 are non-overlapping high and therefore should be used to control the nFETs in the transmission gates. φ̄1 and φ̄2 are non-overlapping low and therefore should be used to control the pFETs in the transmission gates.
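A quick check on sampled waveforms shows why the four signals cannot be interchanged. The waveforms below are hypothetical, one clock period with two one-sample non-overlap gaps. The sketch confirms that the complement of φ1 differs from φ2 exactly in those gaps.

```python
# Hypothetical samples over one clock period; samples 3 and 7 are the
# non-overlap gaps where both phases are low.
phi1 = [1, 1, 1, 0, 0, 0, 0, 0]
phi2 = [0, 0, 0, 0, 1, 1, 1, 0]
phi1_bar = [1 - v for v in phi1]
phi2_bar = [1 - v for v in phi2]

# phi1 and phi2 are never high together (safe for the nFETs) ...
assert not any(a and b for a, b in zip(phi1, phi2))
# ... and their complements are never low together (safe for the pFETs).
assert not any(a == 0 and b == 0 for a, b in zip(phi1_bar, phi2_bar))

# The complement of phi1 is not phi2: the two differ during the gaps.
differs = [i for i, (a, b) in enumerate(zip(phi1_bar, phi2)) if a != b]
print("phi1_bar differs from phi2 at samples", differs)
```

Driving a pFET with φ2 in place of φ̄1 would therefore leave it turned on during the gaps, when both transmission gates are supposed to be off.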



[Figure: timing diagrams of φ1, φ2, φ̄1, and φ̄2 versus t; transmission gate registers with input D and output Q, labeled positive edge φ1 sensitive register and positive edge φ2 sensitive register]



It is also necessary to specify which clock signal is used when classifying latches as high or low level sensitive and when classifying registers as positive or negative edge sensitive. Note that a positive edge φ1 sensitive register has a different behavior than a positive edge φ2 sensitive register and the two should not be mixed.

Timing Constraints

Fig. 7.4, p. 387, Table 7.1, p. 386 Timing Diagrams

Note that latches have setup and hold requirements relative to the clock edge that turns them off, not the clock edge that turns them on.

We will examine the timing constraints necessary to ensure correct operation of the three clocking schemes in Fig. 7.2, p. 385. The two phase clocking scheme is the traditional way of using latches. The pulse latch scheme has become more popular as a way to reduce the number of clock lines that must be distributed across the chip.

Flip-Flop Timing Constraints. The maximum delay of a logic block between two flip-flops is constrained as shown in fig. 7.5, and eq 7.1, 7.2, p. 388.

The minimum delay of a logic block between two flip-flops is constrained as shown in fig. 7.9, and eq 7.7, p. 393. Flip-flops are often designed so that the right side of the inequality is negative, meaning that the flip-flops can be connected directly together without any intervening combinational logic.
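The flip-flop constraints are not reproduced in these notes. The sketch below assumes the commonly used forms t_pd <= T_c - (t_pcq + t_setup) for maximum delay and t_cd >= t_hold - t_ccq for minimum delay. The symbol names and the picosecond values are assumptions chosen for illustration.

```python
def ff_max_logic_delay(t_c, t_pcq, t_setup):
    """Upper bound on logic propagation delay between two flip-flops."""
    return t_c - (t_pcq + t_setup)


def ff_min_logic_delay(t_hold, t_ccq):
    """Lower bound on logic contamination delay between two flip-flops.

    A negative result means the flip-flops may be connected back to back.
    """
    return t_hold - t_ccq


# Hypothetical numbers in picoseconds.
print(ff_max_logic_delay(t_c=1000, t_pcq=80, t_setup=50))  # time left for logic
print(ff_min_logic_delay(t_hold=20, t_ccq=35))             # negative: no logic needed
```

With these example values the clock-to-Q contamination delay exceeds the hold time, so the minimum-delay bound is negative and direct connection is safe.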

Two Phase Latch Timing Constraints. A flip-flop can be regarded as being composed of two series latches clocked with complementary signals (fig. 7.3, p. 386). There is no real need to put combinational logic only between latch pairs; combinational logic can be put between all latches (fig. 7.7, p. 391) with the timing requirements in eq. 7.4.

Note that the constraint on the maximum logic delay in eq. 7.4 does not replace the requirement for D1 to stabilize a setup time before the falling edge of φ1 and D2 to stabilize a setup time before the falling edge of φ2. In fig. 7.7, the D inputs stabilize near the beginning of the clock phase, but the circuit will still work correctly if the D inputs are delayed less than half of a clock period (fig. 7.12, p. 397).

The combinational logic in the first half of the clock period “borrows” time from the combinational logic in the second half of the clock period. The φ2 latch can be “moved” in time to anywhere within the φ2 clock phase. Borrowing can always take place across the internal half cycle boundary. Borrowing is also possible across the clock period boundary if there are no loops in the circuit (for example, pipelines). We will make use of this technique next semester to design high performance pipelines.

Fig. 7.13, p. 397 and eq. 7.10, p. 396 maximum borrowing.
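The two phase latch bounds can be evaluated the same way. The sketch assumes the commonly used forms t_pd1 + t_pd2 <= T_c - 2 t_pdq for the total logic delay per cycle and t_borrow <= T_c/2 - (t_setup + t_nonoverlap) for maximum borrowing. The equations themselves are not reproduced in these notes, so the symbol names and numbers below are illustrative assumptions.

```python
def latch_max_logic_delay(t_c, t_pdq):
    """Upper bound on the summed logic delay of both half cycles."""
    return t_c - 2 * t_pdq


def max_time_borrow(t_c, t_setup, t_nonoverlap):
    """Most time one half cycle can borrow from the next."""
    return t_c / 2 - (t_setup + t_nonoverlap)


# Hypothetical numbers in picoseconds.
total = latch_max_logic_delay(t_c=1000, t_pdq=60)
borrow = max_time_borrow(t_c=1000, t_setup=50, t_nonoverlap=30)
print(f"logic budget per cycle: {total} ps, max borrow: {borrow} ps")
```

Note that only the sum of the two half-cycle delays is bounded by the first function. How the budget is split between the halves is limited by the borrowing bound.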



The minimum delay of a logic block between two latches is constrained as shown in fig. 7.10, p. 395 and eq 7.8, p. 394. Increasing t_nonoverlap forces the right side of the inequality negative, meaning that the latches can be connected directly together without any intervening combinational logic.
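The effect of t_nonoverlap on the minimum-delay bound can be seen numerically. The sketch assumes the commonly used form t_cd >= t_hold - t_ccq - t_nonoverlap. The symbol names and picosecond values are assumptions for illustration.

```python
def latch_min_logic_delay(t_hold, t_ccq, t_nonoverlap):
    """Lower bound on logic contamination delay between two-phase latches."""
    return t_hold - t_ccq - t_nonoverlap


# Hypothetical latch with t_hold = 40 ps and t_ccq = 25 ps.
bounds = {gap: latch_min_logic_delay(40, 25, gap) for gap in (0, 10, 30)}
for gap, bound in bounds.items():
    status = "direct connection is safe" if bound <= 0 else "needs minimum logic delay"
    print(f"t_nonoverlap = {gap:2d} ps -> t_cd >= {bound:3d} ps ({status})")
```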

Pulsed Latch Timing Constraints. The maximum delay of a logic block between two pulsed latches is constrained as shown in fig. 7.8, and eq 7.5, 7.6, p. 392. No borrowing is possible if the pulse width is short.

The minimum delay of a logic block between two pulsed latches is constrained as shown in fig. 7.11, p. 395, and eq 7.9, p. 394.
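For pulsed latches the pulse width t_pw appears in both bounds. The sketch assumes the commonly used forms t_pd <= T_c - max(t_pdq, t_pcq + t_setup - t_pw) and t_cd >= t_hold - t_ccq + t_pw. The symbol names and numbers are illustrative assumptions, since the equations are only cited in these notes.

```python
def pulsed_max_logic_delay(t_c, t_pdq, t_pcq, t_setup, t_pw):
    """Upper bound on logic propagation delay between two pulsed latches."""
    return t_c - max(t_pdq, t_pcq + t_setup - t_pw)


def pulsed_min_logic_delay(t_hold, t_ccq, t_pw):
    """Lower bound on logic contamination delay between two pulsed latches."""
    return t_hold - t_ccq + t_pw


# Hypothetical numbers in picoseconds, for a narrow and a wide pulse.
for t_pw in (40, 100):
    t_max = pulsed_max_logic_delay(1000, t_pdq=60, t_pcq=80, t_setup=50, t_pw=t_pw)
    t_min = pulsed_min_logic_delay(t_hold=20, t_ccq=35, t_pw=t_pw)
    print(f"t_pw = {t_pw:3d} ps: {t_min} ps <= logic delay <= {t_max} ps")
```

Widening the pulse relaxes the maximum-delay bound but tightens the minimum-delay bound, which is the trade-off behind keeping the pulse short but not too short.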

