DesignCon 2017
PCIe Gen4 Standards Margin Assisted Outer Layer Equalization for Cross Lane Optimization in a 16GT/s PCIe Link
Mohammad S. Mobin, Broadcom Ltd
[md.mobin@broadcom.com, 610.712.5829]
Haitao Xia, Broadcom Ltd
[haitao.xia@broadcom.com, 408.433.4103]
Aravind Nayak, Broadcom Ltd
[aravind.nayak@broadcom.com, 610.712.2767]
Gene Saghi, Broadcom Ltd
[gene.saghi@broadcom.com, 719.533.7110]
Christopher Abel, Broadcom Ltd
[christopher.abel@broadcom.com, 408.433.4072]
Lane Smith, Broadcom Ltd
[lane.smith@broadcom.com, 610.712.2066]
Jun Yao, Broadcom Ltd
[[email protected], 408.433.8915]
Abstract
PCIe Gen4, operating at 16 GT/s, targets channels with 28+ dB of insertion loss at the
Nyquist frequency without resorting to any forward error correction (FEC), unlike other
contemporary standards [1-3] for storage and networking applications. Anticipating
challenges in system bring-up, the PCIe standard introduces a Lane Margining feature
that gives a downstream port (host) access to the internal vertical and horizontal
EYE margin of the connected upstream port (device) SerDes/re-timers. The standard gives
no explicit definition of how to exploit this information, which keeps the door open to
innovations in system-level optimization of the SerDes transceiver. This paper introduces
an outer layer equalization scheme that manages SerDes inner layer equalization to
optimize overall system-level aggregate performance.
Author(s) Biography
Mohammad S. Mobin earned his PhD in electrical engineering from Southern Methodist
University. He also holds MSEE and BSEE from University of South Alabama and
Bangladesh University of Engineering and Technology. For the last ten-plus years, M. S. Mobin
has been involved with SerDes architecture definition, system modeling, and simulation. He is
deeply involved with channel equalization and timing recovery techniques. Currently he
is a distinguished engineer at Broadcom Ltd. He has 100 US patents granted in his name and
has published various papers in IEEE Transactions on Biomedical Engineering and at other
conferences. He represents Broadcom Ltd in the PCIe EWG Standards Committee.
Haitao (Tony) Xia is Director of R&D at Broadcom Ltd, leading the research and
development of advanced read channel and Serdes architectures for data storage systems.
Dr. Xia served as Chairman of IEEE Data Storage Technical Committee and President of
Chinese American Information Storage Society (CAISS) in the past. Before his work at
Avago/LSI, Dr. Xia worked at a Silicon Valley start-up, Linked-A-Media Devices, on
signal processing and coding in the area of magnetic recording channels and non-volatile
memories. Dr. Xia has published more than 20 articles in peer-reviewed
journals/conferences, and has more than 100 US patents granted in his name. Dr. Xia is an
IEEE Senior Member.
Aravind Nayak is a principal engineer with Broadcom Ltd in Allentown, PA. He holds
PhD (2004) and MS (2000) degrees in electrical engineering from the Georgia Institute of
Technology, Atlanta, GA, and a B. Tech (1999) degree in electrical engineering from the
Indian Institute of Technology, Madras, India. His research interests include signal
processing for the magnetic recording read channel and SerDes applications.
Gene Saghi is a principal engineer with Broadcom Ltd. He earned a PhD from Purdue
University, a MEng degree from Cornell University, and a BS degree from Wichita State
University; all in electrical engineering. He has over 30 years of engineering experience
ranging from board-level design to ASIC design to teaching and research in electrical
engineering at the university level. Currently, he is a hardware architect working on IO
controllers and RAID-on-Chip controllers. He represents Broadcom Ltd on the PCI
Express Protocol Working Group committee.
Christopher J. Abel is a Director of Engineering at Broadcom Ltd,
responsible for analog and mixed-signal design for SerDes IP in the Data
Controller Division. He has focused on analog and mixed-signal IC design
for more than 20 years, and on SerDes design for the last 15 years. He
holds more than 20 US patents in the areas of analog design, data converters,
and SerDes. Chris received the Ph.D. degree in electrical engineering from The
Ohio State University, Columbus, OH, USA, in 1995.
Lane A. Smith is a Director of Engineering at Broadcom Ltd, responsible for storage
SerDes design and SAS/SATA Protocol design. He has over 25 years of engineering
experience ranging from design to management of several generations of modem and
Fibre Channel, SAS, SATA, PCIE SerDes designs. He has over 100 US patents granted
in his name in the area of modem, audio codec and SerDes design.
Jun Yao is currently a senior architect engineer with Broadcom Ltd in San Jose.
Before joining Avago, he worked as a postdoctoral researcher at Carnegie Mellon
University, PA. He obtained his PhD degree (2013) in electrical and electronic
engineering from Nanyang Technological University, Singapore, and a bachelor's degree
(2008) from Harbin Institute of Technology, China. His research interests include signal
processing, equalization, phase-locked loop, detection and decoding algorithms for the
hard disk drive (HDD) read channel and high-speed Serializer/Deserializer (SerDes)
communications.
Introduction
This paper introduces the concept of outer loop equalization for PCIe cross-lane
transmitter/receiver (transceiver) optimization using the Lane Margin capability
introduced in the PCIe Gen4 specification [1]. Classical PCIe transceiver lane
optimization, referred to here as inner loop equalization, is done on a per-lane basis,
without paying heed to the condition of the neighboring lanes. The neighboring lane may
have excess operating margin and act as an aggressor against its neighbor. On the other
hand, a neighboring lane could be a victim and could benefit if the excess operating
margin of its aggressor neighboring lane could be reduced. The goal of the introduction
of outer loop equalization is to provide a means to holistically and robustly optimize a
system across all lanes, rather than being limited to individual lane optimization. We
present a brief introduction to PCIe Gen4 lane margin, a best-mode-usage model of the
scope of lane margin hardware/software capabilities, an application scenario of lane
margin for outer loop equalization, and potential risks associated with lane margin
capability and their mitigations. An overview of the sources of stress in a PCIe Gen4
system is presented. We discuss options for handling crosstalk (XTLK) from a mitigation
point of view rather than an equalization point of view [13-25]. We define the roles and boundaries of
inner and outer equalization loops and elaborate how they complement each other
towards the common goal of system level optimization. We demonstrate the expected
performance improvement that can be achieved using outer loop equalization.
PCIe Gen4 Lane Margin and an Application Model
Building systems centered on PCI Express (PCIe) Gen 3 that are reliable and can be
manufactured in high volume has proven to be difficult. PCIe Gen 4 systems will pose
even bigger challenges for several reasons:
• Channels are being pushed to operating limits by frequency doubling
• Aggressive channel loss specifications without introducing any error correction
scheme.
• A multitude of platforms and devices must be produced in high volumes and each
varies differently over the range of process, voltage, and temperature (PVT).
• While re-timers help, they currently lack controllability and observability.
• Experience has shown that determining link health in production systems should
be done while running actual traffic.
To address these challenges, PCI Express Gen 4 specifies and requires non-destructive
lane margining that takes place while the link is in L0. The key usage cases for lane
margining are:
• ASIC/board/system design
o Assessing ASIC/board/system signal integrity under operating condition
o Managing risk and cost trade-offs during development
• Manufacturing and system integration
o Maintaining process and component control with true operating EYE
margin feedback
o Catching subtle hardware defects during manufacturing
o Testing assembled and configured systems
• Add-in card/module qualification
o Testing of independently developed systems and modules after system
integration
o Ensuring electrical interoperability of an integrated system
• Problem diagnosis in the field
o Determining whether signal integrity is a root cause or not
o Remotely assessing signal integrity in systems displaying a problem
On the flip side, implementing Lane Margining places great responsibility on
hardware and software vendors. It creates a vulnerable entry point for viruses that could
potentially shut down an entire PCIe ecosystem by putting PCIe systems into margin
mode at a given time. Hardware and software protection through BIOS and timers needs
to be in place to combat such risks.
PCIe Lane Margining allows the determination of operating margin at every receiver
(Rx(A), Rx(B), Rx(C), Rx(D), Rx(E), and Rx(F)) from Downstream Port to Upstream
Port and back as shown in Figure 1. The margin information includes both voltage and
time, in either direction from the current receiver operating position. Software controls
and obtains status information about a specific receiver by way of the Lane Margin
Control and Status register that corresponds to the port associated with the receiver. Retimers
do not contain the infrastructure to respond to configuration packets; so instead, control is
conveyed to a Retimer using Control SKP Ordered Sets in the downstream direction. The
Retimer returns status and error information using Control SKP Ordered Sets in the
upstream direction [1, 4-7].
Figure 1: Overview of PCIe Gen4 Lane Margin scheme
Control of Lane Margining takes the form of commands that direct the receiver to move
the sampling point a specified number of steps in time left or right, or a specified number
of steps in voltage up or down. Each receiver reports its capabilities in response to
software queries. These capabilities include Maximum Voltage Offset, Maximum Timing
Offset, Number of Voltage Steps, Number of Timing Steps, Timing Sampling Rate,
Voltage Sampling Rate, Maximum Lanes (maximum number of lanes that can be
margined at the same time), Independent Error Sampler, actual data samplers (indicates if
margining will produce errors in the data stream or not), and so on. Figure 2 shows the
allowed ranges for the Maximum Timing Offset and the Maximum Voltage Offset.
[Figure 2 axis labels: Max Timing Offset from ±0.2 UI to ±0.5 UI; Max Voltage Offset from ±50 mV to ±500 mV]
Figure 2: PCIe Gen4 two-dimensional Lane Margin in voltage and horizontal direction
The PCIe Gen4 Base Specification makes allowances for receivers that contain an
independent data sampler in addition to the actual data sampler, or receivers that contain
only the actual data sampler. When an independent data sampler is present, errors are
detected and reported by the SerDes. In the absence of an independent data sampler,
errors are detected in the Link by counting the number of detected parity errors and the
number of entries into the LTSSM Recovery state. While the specification allows
margining in terms of moving the data sample location, the actual margining method is
implementation specific. For example, timing/voltage margining can be achieved by
injecting an appropriate amount of stress/jitter to the data sample keeping it at its fixed
location or by adjusting the data sampler or an independent sampler phase and voltage
offset.
Start
1. Start margining by sending the 'Step Margin to Timing Offset' command, which is
written into the control fields of the 'Lane Margin Control and Status' register.
2. Read the status fields of the 'Lane Margin Control and Status' register.
If Margin Status == 11b (NAK) and step count valid, fail receiver margining.
Else if Margin Status == 00b (Too many errors), fail receiver margining.
Else if Margin Status == 01b (Setup), read again every 1 msec.
If after 200 msec Margin Status == 01b, fail receiver margining.
Else if Margin Status == 10b (In progress), go to Step 3.
3. Wait the desired amount of time for margining to happen while sampling the status
fields of the 'Lane Margin Control and Status' register periodically.
If Margin Status == 00b (Too many errors), fail receiver margining.
Else if time reached and status field still 10b, go to Step 4.
4. If more margining to do, broadcast 'No Command'.
5. If more margining to do, go to Step 1.
6. Broadcast 'Go to Normal Settings', 'No Command', 'Clear Error Log', 'No Command'
commands.
7. Margining failed: if the previous step result was successful, the previous margining
step is the receiver margin.
Done
[Flow diagram edge labels: No response, Fail]
Figure 3:PCIe Gen4 Lane Margin usage model flow diagram
The full-blown margining process for a receiver would include timing margining in both
directions and voltage margining in both directions as shown in Figure 2. It should be
noted that support for voltage margining is optional. Figure 3 shows an example of a
typical flow diagram for the Lane Margining process in one direction for timing
margining. Each time through the flow, the timing offset is increased. Prior to this
process, software will set an Error Count Limit. During the Lane Margining process, if
the Error Count Limit is reached, Lane Margining is halted and the receiver is returned to
its pre-margining settings. The margin reported by software is the setting previous to the
setting that failed.
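The timing margining flow of Figure 3 can be sketched in software as follows. This is only a minimal illustration under stated assumptions: `FakeReceiver`, its methods, and the poll counts are hypothetical stand-ins for the register accesses, not an interface defined by the specification, and any status other than In progress during the dwell is treated as a failure for simplicity. The two-bit status encodings follow the flow above.

```python
from enum import Enum


class MarginStatus(Enum):
    TOO_MANY_ERRORS = 0b00
    SETUP = 0b01
    IN_PROGRESS = 0b10
    NAK = 0b11


class FakeReceiver:
    """Hypothetical receiver model: the error limit is hit beyond `good_steps`."""

    def __init__(self, good_steps, supported_steps):
        self.good_steps = good_steps
        self.supported_steps = supported_steps
        self.offset = 0

    def step_margin_to_timing_offset(self, steps):
        self.offset = steps

    def read_status(self):
        if self.offset > self.supported_steps:
            return MarginStatus.NAK
        if self.offset > self.good_steps:
            return MarginStatus.TOO_MANY_ERRORS
        return MarginStatus.IN_PROGRESS

    def go_to_normal_settings(self):
        self.offset = 0


def timing_margin(rx, max_steps, setup_polls=200, dwell_polls=3):
    """Return the last timing step that passed, or 0 if the first step failed."""
    last_good = 0
    for steps in range(1, max_steps + 1):
        rx.step_margin_to_timing_offset(steps)           # Step 1
        status = rx.read_status()                        # Step 2
        polls = 0
        while status is MarginStatus.SETUP and polls < setup_polls:
            status = rx.read_status()                    # re-read while in Setup
            polls += 1
        if status is not MarginStatus.IN_PROGRESS:
            break                                        # NAK, errors, or Setup timeout
        # Step 3: dwell at this offset and keep sampling the status.
        if any(rx.read_status() is not MarginStatus.IN_PROGRESS
               for _ in range(dwell_polls)):
            break
        last_good = steps                                # Steps 4-5: try the next offset
    rx.go_to_normal_settings()                           # Step 6
    return last_good
```

The value returned is the setting previous to the one that failed, which matches the margin that software reports in the flow described above.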
An example MAC-PHY interface is presented in Figure 4. The PHY does all of the
physical measurement of the EYE. The MAC does the protocol level communication
encapsulation and de-encapsulation. Usually, the command and status interface between
the MAC and PHY is implemented as defined in the Intel PIPE specification [8]. An
example decoded command and status interface to the PHY is shown in Figure 4 and is
detailed in the PCIe Gen4 specification.
Figure 4: An example device side MAC-PHY signal interface with detailed Logical Sub-
Block-SerDes interface
PCIe Gen4 System Stress Sources and Their
Mitigation Strategy
A very simple PCIe system is used to identify the stress sources in a system as shown in
Figure 5.
Figure 5: XTLK and coupling in the PCIe section of a system
The dominant sources of system-level impairment, coupling, and reflection in a PCIe Gen4
system can be identified as [9-11]:
(a) XTLK sources in a Gen4 system: device packages, connectors, trace run
length and separation, transmitter amplitude, rise/fall time, transmitter
de-emphasis, and board isolation between transmitter and receiver layers on opposite
sides
(b) Reflection sources in a Gen4 system: cable/trace junctions, vias and via stubs,
connectors, PCB imperfections, roughness, and termination
(c) Increased insertion loss in PCIe Gen4, which reduces the dB difference between
the Nyquist insertion loss and the base XTLK floor, making a PCIe system
susceptible to XTLK-induced errors
(d) Uncompensable insertion loss deviation due to periodic/aperiodic nulls and
resonance
(e) Random noise, pulse-width jitter, and periodic jitter from various sources in the
system
A flexible, reconfigurable PCIe system may have 8, 16, 32, or some intermediate number of
lanes to support applications ranging from high-end graphics to the low end. A PCIe controller
groups lanes into sets of 1xN1, 1xN2, etc. to support multiple simultaneously operating devices.
Such bifurcation of lanes into multiple groups creates interfaces for
simultaneously operating devices. Each application knows only about its
own lanes. Only the host has global visibility of all lanes, and it can initiate a XTLK
mitigation scheme using the outer layer equalization introduced in this paper.
Due to the physical layout of the lanes, from an edge connector to the device end or the host
end, some lanes travel a longer electrical distance than others. Unless the electrical
distance is adjusted with wider and thinner traces, the loss seen by one lane will
differ from that of another. The longer lanes have higher insertion loss than
the shorter lanes, as shown in Figure 6, making shorter lanes (carrying an unattenuated,
high-energy signal) dominant aggressors relative to longer lanes (carrying an attenuated,
weak signal).
Figure 6: Example XTLK between lanes at dense routing from edge connector to ASIC in
an AIC
Usually, at a given host/device end, the egress and ingress lanes are on opposite sides of
the board to reduce coupling between transmitter and receiver lanes. But lane-to-lane
interaction along the run length in the Add-in Card (AIC) or motherboard, at the
connector, or at the package junction is unavoidable. A low-loss lane carrying a relatively
higher signal swing can be a high-impact aggressor to its higher-loss neighboring lane,
which carries a relatively lower signal swing.
The far end crosstalk (FEXT) from the link partner transmitters to the local receivers
travels along the physical traces and continues to couple with other lanes, but its high
frequency content attenuates at the same rate as the channel loss along the path. As a
result, the high frequency impact of FEXT is diminished at the receiver input pin. On the
other hand, the near end crosstalk (NEXT) from the local transmitter to the local receiver
behaves differently. In a good design the transmitters and the receivers are on
opposite sides of the board, resulting in good isolation and a lower average XTLK floor.
But the spectral energy disparity between low and high frequency content is smaller.
Any significant presence of high frequency NEXT due to poor package isolation between
lanes impacts the already attenuated signal spectrum around the Nyquist frequency, as shown
in Figure 7.
There is a need for proper handling of the NEXT spectrum for system level optimization.
The power sum of the NEXT and FEXT is a function of the transmitter launch amplitude,
pre/post cursor de-emphasis, and slew rate of the transmitter signal. These
transmitter parameters are good candidates for adjustment by the proposed outer layer equalization.
Depending on the slew rate, the signal spectrum can extend well beyond the Nyquist
frequency and make a system vulnerable to NEXT (with a higher energy floor beyond the
Nyquist frequency) and, to a lesser degree, to FEXT (with a lower energy floor beyond the
Nyquist frequency).
Figure 7: NEXT and FEXT example for an eight Lane system
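As a rough illustration of why the transmitter launch amplitude is a useful knob, the power sum of several crosstalk contributors can be computed from their individual levels in dB. The sketch below assumes a linear, passive coupling path, so that scaling an aggressor's launch amplitude scales its coupled voltage by the same ratio. The function names and the example levels are hypothetical and are not taken from the simulations in this paper.

```python
import math


def power_sum_db(levels_db):
    """Power sum of crosstalk contributors, each given as a level in dB."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


def scale_aggressor(levels_db, index, amplitude_ratio):
    """Return new levels after scaling one aggressor's launch amplitude.

    Assumes linear coupling: a voltage ratio r changes the level by 20*log10(r) dB.
    """
    scaled = list(levels_db)
    scaled[index] += 20.0 * math.log10(amplitude_ratio)
    return scaled


if __name__ == "__main__":
    aggressors = [-40.0, -40.0]                  # hypothetical per-aggressor levels
    print(round(power_sum_db(aggressors), 2))
    # Reduce the second aggressor's launch from 1300 mV to 800 mV.
    reduced = scale_aggressor(aggressors, 1, 800.0 / 1300.0)
    print(round(power_sum_db(reduced), 2))
```

Because the contributors add in power, lowering the amplitude of the dominant aggressor gives the largest reduction in the aggregate crosstalk floor.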
In most high speed receivers, a continuous time linear equalizer (CTLE) provides
high pass filtering that flattens the spectrum of the signal seen at the data samplers,
undoing the low pass effect of the channel. This boosts the spectrum of
the desired signal around Nyquist, but it also boosts the spectrum of the NEXT
(whose high frequency content is relatively less attenuated) around the Nyquist frequency.
The receiver design needs configurability in the CTLE. In a low XTLK system, with
XTLK detected by suitable algorithms, the receiver can open up the CTLE high frequency boost
range so that sufficient equalization is done by a wide-bandwidth CTLE, which preserves the CTLE
group delay behavior and aids CTLE adaptation. This strategy is CTLE
heavy and light in decision feedback equalizer (DFE) utilization. On the other hand, if excess
XTLK is detected, the CTLE high frequency boost needs to be limited to minimize NEXT spectrum
amplification around the Nyquist frequency, and the bandwidth needs to be reduced so that the
CTLE does not amplify the out-of-band NEXT spectrum. This strategy is light in CTLE
contribution and heavy in DFE utilization, and it avoids amplifying undesired XTLK. This
configuration distorts the CTLE group delay behavior around the Nyquist frequency, so
proper steps need to be taken to preserve the desired adaptation algorithm behavior. There
are various techniques to sense the XTLK level in a system; explanation of those algorithms
is beyond the scope of this paper.
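The trade-off between CTLE boost and crosstalk amplification can be illustrated with a generic one-zero, two-pole CTLE model. This is a textbook-style sketch, not the receiver used in this paper; the pole and zero frequencies and the DC gain are hypothetical values chosen only to show the trend. The Nyquist frequency for 16 GT/s NRZ signaling is 8 GHz.

```python
import math

NYQUIST_HZ = 8e9  # Nyquist frequency for 16 GT/s NRZ signaling


def ctle_gain_db(f_hz, dc_gain_db, fz_hz, fp1_hz, fp2_hz):
    """Magnitude in dB of H(s) = A * (1 + s/wz) / ((1 + s/wp1) * (1 + s/wp2))."""
    s = 2j * math.pi * f_hz
    wz, wp1, wp2 = (2.0 * math.pi * f for f in (fz_hz, fp1_hz, fp2_hz))
    h = (1 + s / wz) / ((1 + s / wp1) * (1 + s / wp2))
    return dc_gain_db + 20.0 * math.log10(abs(h))


def nyquist_boost_db(fz_hz, fp1_hz, fp2_hz):
    """High frequency boost: gain at Nyquist relative to the DC gain."""
    return ctle_gain_db(NYQUIST_HZ, 0.0, fz_hz, fp1_hz, fp2_hz)


if __name__ == "__main__":
    # Hypothetical settings: a lower zero frequency gives more boost (CTLE heavy);
    # a higher zero frequency limits the boost (DFE heavy).
    for fz in (1e9, 2e9, 4e9):
        print(fz, round(nyquist_boost_db(fz, 8e9, 16e9), 2))
```

Whatever boost the CTLE applies at Nyquist is applied equally to the desired signal and to any NEXT present at the receiver input, which is why the boost is limited when excess XTLK is detected.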
Introduction to Inner and Outer Loop
Equalization for System Level Optimization
Classical lane-by-lane SerDes transceiver optimization is system agnostic. A SerDes
optimizes the far end partner transmitter and its own local receiver on a per-lane basis. It
pays no attention to its neighboring lanes. Such lane-by-lane SerDes transceiver
equalization is called inner loop equalization, as shown in Figure 8. A system agnostic
inner equalization may result in excess margin for lanes with shorter traces while lanes
with longer traces may be performance limited, all within a single Link. In such a
scenario, the weakest lane in a PCIe link becomes a single point of failure.
Figure 8: Classical inner equalization loop does not optimize cross lane performance
System agnostic lane-by-lane optimization is acceptable for lower data rate applications.
However, at PCIe Gen4 data rates, with spec-limit insertion loss approaching 30 dB and
without significant XTLK floor reduction, the PCIe Gen4 inner loop equalization
approach will be challenged to meet the desired system target BER performance. Unlike
other standards, PCIe Gen4 does not have any Forward Error Correction (FEC)
protection. Fortunately, PCIe Gen4 standardized a "Lane Margin" feature that allows a host
to detect the operating EYE margin of the repeaters or an end device in the normal L0
operating state [1]. The standardization of lane margin opens the door for many
innovative system level optimization methodologies. In this paper we address one
application of the PCIe Gen4 Lane Margin feature: trading EYE margin between high
margin lanes and margin starved lanes by adjusting the transmitter amplitude, slew
rate, and pre/post cursor of aggressor lanes, as shown in Figure 9, using an outer loop
equalization scheme as shown in Figure 10.
Figure 9: Cross lane Transmitter control using outer loop equalization
The outer loop equalization helps XTLK-sensitive lanes with longer traces by adjusting
the TX amplitude, or the slew rate, or the TX pre/post emphasis of lanes with shorter
traces appropriately. This in turn reduces the XTLK floor in the system and helps the
operating EYE margin of more stressed lanes as shown in Figure 10.
Figure 10: Lane margin assisted outer loop equalization
Before the outer loop equalization, the long trace EYE margin was low and the short
trace EYE margin was excessive. The outer loop equalization detects the current state of the
lane operating EYE margin using the PCIe standardized scheme. The host instructs the short
trace Lane(s) to increase the rise/fall time on both ends of the lane to reduce high
frequency content in the signal spectrum beyond the Nyquist frequency. Then the host
instructs the short trace Lane(s) to reduce the transmitter amplitude on both ends, using
PCIe defined vendor specific messaging understood by both sides, until the long trace Lane(s)
operating margin improves while the short trace Lane(s) still maintain a healthy operating
margin. As a last resort, the short trace de-emphasis can also be adjusted to reduce the overall
system XTLK floor by reducing the transmitter output signal energy in the system and, at
the same time, allowing its link partner receiver to avoid applying excess CTLE high frequency
boost.
The flow diagram of the inner and outer equalization loops is presented in Figure 11.
Initially the system XTLK level is sensed using known algorithms on a per-Lane basis. In
a high XTLK environment, the Lane optimization is configured for a DFE heavy
optimization scheme. In a low XTLK environment, the Lane optimization is configured
for a CTLE heavy optimization scheme. Using conventional equalization methods, each
lane is optimized using back channel adaptation by a receiver at each end of the link
in conjunction with its link partner transmitter [12]. This level of equalization is
agnostic to overall system performance.
Figure 11: Inner and outer equalization loop sequencing
After the initial lane-by-lane equalization, host directed cross-Lane optimization is performed
using the PCIe Gen4 margin scheme. The host identifies excess margin lanes and margin
starved lanes. Then, host controlled outer equalization directs host and device side
transmitters to adjust the transmitter launch amplitude, boost, and slew rate such that
excess margin lanes give up some margin and margin starved lanes gain reasonable
operating margin. The idea is to adjust the TX amplitude, boost, and slew rate to
minimize the overall system XTLK contribution from high margin lanes, so that
margin starved lanes gain sufficient operating margin from the reduced system impairment
floor obtained through the margin-assisted outer equalization loop.
A qualitative view of such an iterative outer loop equalization scheme is presented in Figure
12, using the optimization flow shown in Figure 11. The classical inner loop equalization
scheme reconfigures the receiver at each Lane and performs Lane based inner loop
equalization to optimize each lane. The link then transitions to the L0 normal operating state
in the LTSSM, and the host performs Lane by Lane margining to determine the EYE margin of each
lane. If all lanes have good operating margin, the outer equalization loop ends. If both low
margin and high margin lanes are detected, the host instructs the high margin lane to reduce
its amplitude on both sides of the Lane. This reduces the system XTLK floor. The host then
performs PCIe Lane margining on all Lanes again. The Lane margin and transmitter adjustment
process repeats until the stressed Lane EYE margin becomes acceptable without degrading the
EYE margin of the good Lane below the acceptable margin threshold. If balanced system
performance is not reached, the outer loop equalization cycles through transmitter slew rate
and de-emphasis adjustment as well. The order of transmitter parameter control is
implementation specific or specific to a system need. Ideally one would adjust the
slew rate first, then the amplitude, and then the de-emphasis of the
transmitter. Figure 12 illustrates the stages of EYE balancing through the outer loop
equalization.
Figure 12: Long and short channel operating EYE margin balancing with outer
equalization loop
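The iterative balancing described above can be sketched with a toy model. Everything in this sketch is an assumption made for illustration: margins are modeled as a per-lane gain times the lane's own amplitude minus a coupling penalty proportional to the other lanes' amplitudes, only the amplitude knob is exercised, and `measure_margins` stands in for real lane margining. It is not the simulation model used in this paper.

```python
def measure_margins(amps_mv, gains, coupling):
    """Toy stand-in for lane margining: own-signal term minus crosstalk penalty."""
    total = sum(amps_mv)
    return [g * a - coupling * (total - a) for a, g in zip(amps_mv, gains)]


def outer_loop(amps_mv, gains, coupling, target, step_mv=50.0,
               min_amp_mv=400.0, max_iter=50):
    """Lower the amplitude of the highest margin lane until every lane meets target.

    Returns (amplitudes, margins, balanced).
    """
    amps = list(amps_mv)
    for _ in range(max_iter):
        margins = measure_margins(amps, gains, coupling)
        if min(margins) >= target:
            return amps, margins, True           # all lanes healthy: loop ends
        donor = max(range(len(amps)), key=lambda i: margins[i])
        trial = list(amps)
        trial[donor] = max(min_amp_mv, amps[donor] - step_mv)
        trial_margins = measure_margins(trial, gains, coupling)
        # Stop if the knob is exhausted or the donor would fall below target.
        if trial == amps or trial_margins[donor] < target:
            return amps, margins, False
        amps = trial
    return amps, measure_margins(amps, gains, coupling), False


if __name__ == "__main__":
    # Lane 0 is a long (lossy) lane, lane 1 a short lane; values are hypothetical.
    amps, margins, ok = outer_loop([1300.0, 1300.0], gains=[0.05, 0.20],
                                   coupling=0.03, target=30.0)
    print(amps, margins, ok)
```

In this toy case only the short lane's amplitude is lowered, and the loop stops as soon as the long lane reaches the target. A fuller sketch would fall back to slew rate and de-emphasis when amplitude alone does not balance the link.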
Outer loop equalization implementation model
using Lane Margin
The Lane Margining commands and responses introduced in the PCI Express Gen4 Base
Specification include a vendor-defined command and response that can be used to control
outer-loop equalization. The relevant portion of the Margining Commands and
Corresponding Responses Table presented in the PCI Express Gen4 Base Specification is
shown in Table 1.
Table 1: Vendor-defined Margin Command as presented in the PCIe Gen4 Base Specification
Command:
  Margin Command = Vendor Defined
  Margin Type [2:0] = 101b
  Valid Receiver Number(s) [2:0] = 001b through 101b
  Margin Payload [7:0] = Vendor Defined
Response:
  Margin Type [2:0] = 101b
  Margin Payload [7:0] = Vendor Defined
In the above table, the Valid Receiver Number field is interpreted as shown below. Refer
back to Figure 1 to see the relative locations of the Transmitters and Receivers within a
link that optionally includes Retimers. The value of Cmd[2:0] from Table 2 determines
whether the ultimate target of a command is a receiver or a transmitter.
Encoding Receiver Transmitter
001b Rx(A) Tx(B)
010b Rx(B) Tx(C)
011b Rx(C) Tx(D)
100b Rx(D) Tx(E)
101b Rx(E) Tx(F)
For outer-loop equalization, the vendor-defined entry is defined as shown in Table 2.
Table 2: Vendor-defined margin command for outer loop equalization
Command Payload[7:5] = Cmd[2:0]
  111b = Tx Amplitude
  110b = Tx Slew Rate
  101b = Pre Emphasis
  100b = Post Emphasis
  011b, 010b = Reserved
  001b = Perform Rx Adaptation
  000b = No Command
Response Payload[7:5] = Status[2:0]
  011b = NAK
  010b = In Progress
  001b = Setup
  000b = Idle/Finished
Command Payload[4] = Increase
  Specifies whether to increase or decrease the selected attribute.
  When Cmd[2:0] is 100b through 111b: 0b = Decrease, 1b = Increase.
  Otherwise, set to 0b.
Response Payload[4] = MaxValue
  1b = Maximum value (in positive or negative direction) reached
  0b = Maximum value not reached
Command Payload[3:0] = Amt[3:0]
  Specifies the amount of increase/decrease for Tx Amplitude, Tx Slew Rate,
  Pre Emphasis, or Post Emphasis. Otherwise set to 0000b.
Response Payload[3:0] = Amt[3:0]
  When Cmd[2:0] is 100b through 111b, Response Payload[3:0] reflects
  Command Payload[3:0]. Otherwise, Response Payload[3:0] = 0000b.
• For commands: Tx Amplitude, Tx Slew Rate, Pre Emphasis, and Post Emphasis,
the target of the command is a transmitter
• For command Rx Adaptation, the target of the command is a receiver
• When the amount of the specified increase or decrease takes the transmitter
beyond its maximum supported value, the Transmitter goes to its maximum value
and reports that it has reached its maximum value in Response Payload[4].
• As with lane margining described in the PCIe Gen4 Base Specification, the Host
controls outer loop equalization of its own transmitters and receivers using PCI
Configuration TLPs to write and read its Lane Margining at the Receiver
capability registers.
• As with lane margining described in the PCIe Gen4 Base Specification, the Host
controls outer loop equalization of the upstream port in the downstream
component using PCI Configuration TLPs to write and read the downstream
component Lane Margining at the Receiver capability registers.
• As with lane margining described in the PCIe Gen4 Base Specification, the Host
controls outer loop equalization of Retimer transmitters and receivers using
Control SKP Ordered Sets.
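The payload layout of Table 2 can be expressed as a small encoder and decoder. The bit positions and encodings follow Table 2; the function and dictionary names are hypothetical, and reporting Status values above 011b as "undefined" is an assumption, since Table 2 does not list them.

```python
CMD = {
    "no_command": 0b000,
    "rx_adaptation": 0b001,
    "post_emphasis": 0b100,
    "pre_emphasis": 0b101,
    "tx_slew_rate": 0b110,
    "tx_amplitude": 0b111,
}

STATUS = {0b000: "idle_finished", 0b001: "setup", 0b010: "in_progress", 0b011: "nak"}


def encode_command(cmd, increase=False, amount=0):
    """Pack Cmd[2:0], Increase, and Amt[3:0] into the 8-bit command payload."""
    code = CMD[cmd]
    if not 0 <= amount <= 0xF:
        raise ValueError("amount must fit in 4 bits")
    if code < 0b100:
        # Table 2: Increase and Amt are zero unless Cmd is 100b through 111b.
        increase, amount = False, 0
    return (code << 5) | (int(increase) << 4) | amount


def decode_response(payload):
    """Unpack (status, max_value_reached, amount) from the 8-bit response payload."""
    status = STATUS.get((payload >> 5) & 0b111, "undefined")
    return status, bool((payload >> 4) & 1), payload & 0xF


if __name__ == "__main__":
    # Decrease Tx Amplitude by 3 units.
    print(format(encode_command("tx_amplitude", increase=False, amount=3), "08b"))
    print(decode_response(0b01010011))
```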
Outer loop equalization proceeds as follows:
1. System software determines which transmitter/receiver pairs within a link should
be adjusted.
2. System software sends commands that target the first set of transmitters (all at
the same address, but on different lanes) to increase/decrease Tx Amplitude, Tx
Slew Rate, Tx Pre Emphasis, and Tx Post Emphasis as needed.
3. System software polls the status associated with the commands until all targeted
transmitters return a NAK (indicating an error was encountered) or Idle/Finished
status.
4. System software then sends commands that target the receiver associated with
the targeted transmitters. The receivers are commanded to perform Rx
Adaptation.
5. System software polls the status associated with the commands until all targeted
receivers return a NAK or Idle/Finished status.
6. Steps 2 through 5 are repeated until system software is finished making
adjustments.
7. Then, system software performs lane margining to determine if there is adequate
margin (refer to Figure 11). If the margin is now adequate, the process is
complete. Otherwise, these steps can be repeated with remaining transmitter
parameters.
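Steps 2 through 5 above can be sketched as a command-and-poll sequence. `FakePort` is a hypothetical stand-in for the configuration register or Control SKP Ordered Set access path, and the target labels "TX" and "RX" are placeholders for the receiver number encodings. Only the ordering of the steps is taken from the procedure above.

```python
class FakePort:
    """Hypothetical access path: each command finishes after a fixed number of polls."""

    def __init__(self, polls_to_finish=2):
        self.polls_to_finish = polls_to_finish
        self.pending = {}
        self.log = []

    def send(self, lane, target, command):
        self.pending[(lane, target)] = self.polls_to_finish
        self.log.append((lane, target, command))

    def status(self, lane, target):
        left = self.pending.get((lane, target), 0)
        if left > 0:
            self.pending[(lane, target)] = left - 1
            return "in_progress"
        return "idle_finished"


def wait_all(port, lanes, target, max_polls=100):
    """Poll until every lane reports NAK or Idle/Finished; return the final statuses."""
    done = {}
    for _ in range(max_polls):
        for lane in lanes:
            if lane not in done:
                state = port.status(lane, target)
                if state in ("nak", "idle_finished"):
                    done[lane] = state
        if len(done) == len(lanes):
            return done
    raise TimeoutError("command did not complete")


def adjust_and_readapt(port, lanes, tx_target, rx_target, command):
    for lane in lanes:
        port.send(lane, tx_target, command)             # Step 2: adjust transmitters
    tx_result = wait_all(port, lanes, tx_target)        # Step 3: poll transmitters
    for lane in lanes:
        port.send(lane, rx_target, "rx_adaptation")     # Step 4: re-adapt receivers
    rx_result = wait_all(port, lanes, rx_target)        # Step 5: poll receivers
    return tx_result, rx_result
```

System software would call `adjust_and_readapt` once per transmitter parameter (Step 6) and then run lane margining (Step 7) to decide whether further adjustment is needed.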
Simulation Results
A simulation model of the inner and outer equalization study is presented in Figure 13.
Figure 13: Inner and outer EYE equalization simulation model
Simulation is conducted with long, medium, and short channels using a typical
active CTLE. The receiver is optimized stand-alone in the pre-back-channel operation
phase. Then, during the back-channel operation phase, transmitter figure of merit (FOM)
optimization is performed by a grid search of the pre-cursors and post-cursors.
Long-channel simulations were done with the transmitter launch set to 1300 mV and 800 mV.
The vertical margin and horizontal margin contour plots are shown in Figure 14 and
Figure 15. It is evident that the 800 mV launch has significantly lower voltage margin
than the 1300 mV launch, indicating that a higher transmitter launch voltage is desirable for
a long channel. Beyond a certain TX launch amplitude, compression starts to set in because of
reduced signal headroom in the supply-limited analog front end.
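A grid search over transmitter pre-cursor and post-cursor settings can be sketched as follows. This is only an illustration under stated assumptions: the FOM used in the simulations is not specified here, so the sketch substitutes a simple peak-distortion eye opening (main cursor minus the sum of the residual ISI magnitudes), models the transmitter as a three-tap FIR with unit total tap magnitude, and uses a made-up channel pulse response.

```python
def convolve(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def eye_fom(pulse, pre, post):
    """Assumed FOM: main cursor minus the sum of residual ISI magnitudes."""
    taps = [-pre, 1.0 - pre - post, -post]       # 3-tap TX FIR, |taps| sum to 1
    equalized = convolve(pulse, taps)
    main = max(equalized)
    return main - (sum(abs(v) for v in equalized) - main)


def grid_search(pulse, steps=13, max_cursor=0.3):
    """Return (best_fom, pre, post) over a uniform grid of cursor settings."""
    best = None
    for i in range(steps):
        for j in range(steps):
            pre = max_cursor * i / (steps - 1)
            post = max_cursor * j / (steps - 1)
            fom = eye_fom(pulse, pre, post)
            if best is None or fom > best[0]:
                best = (fom, pre, post)
    return best


if __name__ == "__main__":
    pulse = [0.05, 0.5, 0.3, 0.1]                # hypothetical sampled pulse response
    print(grid_search(pulse))
```

An exhaustive grid is affordable here because the search space is only two-dimensional. A real implementation would restrict the grid to the coefficient combinations the transmitter actually supports.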
Short-channel simulations were done with the transmitter launch set to 1300mV and 800mV.
The vertical and horizontal margin contours are shown in Figure 16 and Figure 17. They
show that even after the transmitter amplitude is reduced to 800mV, the operating margin
is far better than the long-channel operating EYE margin.
EYE height margin beyond a certain threshold does not buy any additional jitter tolerance.
The transmitter launch amplitude can therefore be reduced for a short channel to lower the
system operating power. The lower launch amplitude on the short channel in turn reduces
the system XTLK floor, which benefits the operating margin of a stressed long channel.
Thus the application of an outer-equalization loop has the potential to offer improved
long-channel performance and to reduce overall system operating power by lowering the
transmitter amplitude in shorter reach lanes.
Long Channel Simulations
Figure 14: Long channel horizontal (%UI) and voltage (peak mV) margin as a function of
CP1/CM1 for optimized receiver settings with high TX amplitude
Figure 15: Long channel horizontal (%UI) and voltage (peak mV) margin as a function of
CP1/CM1 for optimized receiver settings with low TX amplitude
Short Channel Simulation
Figure 16: Short channel horizontal (%UI) and voltage (peak mV) margin as a function of
CP1/CM1 for fixed optimized receiver settings with high TX amplitude
Figure 17: Short channel horizontal (%UI) and voltage (peak mV) margin as a function of
CP1/CM1 for fixed optimized receiver settings with low TX amplitude
A crosstalk sensitivity study is performed to evaluate the impact of transmitter amplitude,
transmitter slew rate, and transmitter pre- and post-cursor on EYE height and width
margin at a victim receiver. This study uses the one NEXT channel and the one FEXT
channel closest to the victim receiver, as shown in Figure 18.
Figure 18: Worst case NEXT and FEXT frequency response used in the simulation
Simulations were done with the PCIe reference receiver to quantify the XTLK impact at a
victim receiver due to transmitter amplitude, slew rate, and de-emphasis at the
aggressors. This study helps identify the priority for adjusting the transmitter
amplitude, slew rate, and pre/post emphasis in an XTLK mitigation scheme that uses an
outer equalization layer exploiting the Lane Margining feature of PCIe Gen4 transceivers.
The results of the study are presented next.
XTLK Path Transmitter Amplitude Sensitivity Study
The XTLK impact due to transmitter launch amplitude is determined by exciting the
selected NEXT and FEXT channels with a varying transmitter amplitude and a near-perfect
rise time (a sharp rise time is used to excite the XTLK spectrum beyond the Nyquist
frequency). The contour plot of the calculated RMS XTLK in mV is presented in Figure 19.
It shows that the RMS XTLK at the victim receiver data decision latch input increases
with increasing amplitude at the aggressor transmitter. It also shows that the NEXT
XTLK impact is slightly higher than the FEXT XTLK impact, as one would expect. In this
S-parameter example a NEXT aggressor transmitter launching at 825mV creates 2.1mV
RMS XTLK at the victim receiver, while a FEXT aggressor transmitter must launch at
1060mV to create the same 2.1mV RMS XTLK.
Figure 19: XTLK path transmitter amplitude impact on victim receiver
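The two operating points quoted above can be turned into rough coupling ratios. The sketch below assumes that RMS XTLK scales linearly with aggressor launch amplitude, which holds for a linear coupling path but is an assumption of this example rather than a result read from Figure 19. The function names are hypothetical.

```python
def coupling_ratio(xtlk_rms_mv, aggressor_amplitude_mv):
    """RMS XTLK at the victim per mV of aggressor launch amplitude."""
    return xtlk_rms_mv / aggressor_amplitude_mv


def scaled_xtlk_mv(ratio, aggressor_amplitude_mv):
    """Estimated RMS XTLK at another launch amplitude (linear assumption)."""
    return ratio * aggressor_amplitude_mv


next_ratio = coupling_ratio(2.1, 825.0)    # NEXT point quoted in the text
fext_ratio = coupling_ratio(2.1, 1060.0)   # FEXT point quoted in the text
next_vs_fext = next_ratio / fext_ratio     # equals 1060/825

# Estimated NEXT contribution if the aggressor launch is dropped to 800 mV.
next_at_800 = scaled_xtlk_mv(next_ratio, 800.0)
```

Under this assumption the NEXT path couples about 1.28 times as strongly as the FEXT path in this example, so an outer loop would gain more per mV by lowering a NEXT aggressor than a FEXT aggressor.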
XTLK Path Slew Rate Sensitivity Study
The XTLK impact due to transmitter rise/fall time is determined by exciting the selected
NEXT and FEXT S-parameters while sweeping the 20%-80% rise/fall time, with the
transmitter amplitude set to 1000mV and pre and post emphasis set to 0dB. The contour plot
of the calculated RMS XTLK in mV is presented in Figure 20. A sharp rise/fall time creates
high-frequency spectral content in the transmitted signal beyond the Nyquist frequency.
The contour plot clearly demonstrates increasing RMS XTLK with decreasing rise/fall time.
In the case of FEXT, the high-frequency spectral content is attenuated by the insertion
loss, which grows with channel length. In the case of NEXT from neighboring aggressors,
the high-frequency spectral energy is not attenuated because of the close proximity of
the aggressor channels. The contour plot shows that a 0.15UI rise/fall time on the NEXT
path injects 2.7mV RMS XTLK into the victim receiver, while on the FEXT path a 0.45UI
rise/fall time injects the same amount of RMS XTLK. As a result, NEXT path
rise/fall time equalization through outer equalization has higher priority than FEXT
path rise/fall time equalization.
Figure 20: XTLK path transmitter slew rate impact on victim receiver
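To relate the rise/fall times quoted above to spectral content, the sketch below converts a 20%-80% rise time in UI into the -3 dB bandwidth of an equivalent single-pole edge. The single-pole edge shape is an assumption made for this example; the paper does not state the edge model used in simulation. For a single pole, the 20%-80% rise time is tau * ln(4) and the bandwidth is 1 / (2 * pi * tau).

```python
import math

BIT_RATE_GTPS = 16.0
UI_PS = 1e3 / BIT_RATE_GTPS        # 62.5 ps unit interval at 16 GT/s
NYQUIST_GHZ = BIT_RATE_GTPS / 2.0  # 8 GHz


def edge_bandwidth_ghz(rise_time_ui):
    """-3 dB bandwidth of a single-pole edge with this 20%-80% rise time."""
    rise_time_ps = rise_time_ui * UI_PS
    tau_ps = rise_time_ps / math.log(4.0)   # t(20%-80%) = tau * ln(0.8 / 0.2)
    return 1e3 / (2.0 * math.pi * tau_ps)   # 1/(2*pi*tau), ps converted to GHz


fast_edge_ghz = edge_bandwidth_ghz(0.15)  # rise/fall time quoted for NEXT
slow_edge_ghz = edge_bandwidth_ghz(0.45)  # rise/fall time quoted for FEXT
```

With this model the 0.15UI edge has a bandwidth well above the 8 GHz Nyquist frequency, while the 0.45UI edge sits just below it, which illustrates why slowing the edges removes energy from the out-of-band region where crosstalk coupling is strongest.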
XTLK Path Transmitter Pre/Post De-Emphasis Sensitivity Study
The XTLK impact of aggressor-channel transmitter de-emphasis due to pre-cursor or
post-cursor equalization is presented in Figure 21. In general, applying transmitter
de-emphasis reduces the average signal amplitude out of the transmitter, and
that helps contain the system XTLK floor as long as the rise/fall time of the
Nyquist signal is not very aggressive. For all practical purposes one would reduce the
rise/fall time to obtain effective equalization out of the transmitter, but care must be
taken not to be too aggressive with the rise/fall time in a system where XTLK sensitivity
is a concern because of isolation issues in a package, at the connectors, or on the PCB
itself.
Figure 21: XTLK path transmitter pre/post emphasis impact on victim receiver
Now we demonstrate the impact of the RMS XTLK (in mV) on equalized signal EYE height
(noise margin) and width (jitter margin) using the PCIe Gen4 reference CTLE. In this
study the FEXT and NEXT XTLK is generated with a 1000mV aggressor amplitude, a 0.32UI
rise/fall time, and 0dB pre and 0dB post emphasis. We scale the NEXT and FEXT XTLK
between 0 and 5mV RMS and determine the EYE margin using the PCIe reference CTLE and the
tap-limited PCIe 2-tap DFE. We also sweep the reference CTLE peak frequency between 3GHz
and 10GHz to demonstrate the effect of out-of-band NEXT/FEXT on EYE margin. In a
practical receiver the CTLE peaking frequency will vary over process-voltage-temperature
(PVT) corners.
In Figure 22 and Figure 23 we present simulation results with 800mV and 1300mV
transmitter amplitude to evaluate the impact of the RMS XTLK floor on EYE margin. We also
show the impact of the CTLE peaking frequency position on EYE margin. The left-hand plots
show that the EYE height decreases as the RMS XTLK increases. They also show that the
EYE height decreases as the peaking frequency increases, because the high-frequency XTLK
is not suppressed in this passive CTLE. Plotted differently, the right-hand plots
show that the optimal CTLE peaking frequency is around 3GHz-5GHz. As the CTLE
peaking frequency increases, more XTLK comes in band with the signal and the EYE height
decreases, as indicated in Figure 22.
Figure 22: Effect of RMS XTLK and CTLE peaking frequency on EYE height at 800mV
transmitter amplitude
Figure 23: Effect of RMS XTLK and CTLE peaking frequency on EYE height at 1300mV
transmitter amplitude
Figure 24: NEXT has higher frequency content and has larger performance degradation
compared to FEXT
In Figure 24, the NEXT and FEXT terms are separated to test performance
sensitivity to these individual crosstalk components. NEXT has a relatively greater amount
of high-frequency content than FEXT, as shown in Figure 18. In Figure 24, A1 (A2)
denotes the eye height degradation due to FEXT (NEXT) at a 3mV RMS level with a CTLE
peak frequency of 10GHz, and B denotes the extra loss with NEXT vs. FEXT. As the CTLE
peak frequency increases beyond the optimal setting, the signal mis-equalization component
increases equally for both the FEXT and NEXT cases, whereas the crosstalk-related
component increases faster with NEXT. As a result, it would be better to reduce the NEXT
component seen by the weakest link in the system by adjusting the launch amplitude or
rise/fall time in the neighboring crosstalk contributors. Performance can also be improved
by better optimizing the CTLE peak frequency in the victim path receiver.
In this study we have presented a quantitative evaluation of the impact of the transmitter
amplitude, slew rate, and pre/post emphasis on the RMS XTLK floor in a system
consisting of a plurality of PCIe lanes. Through simulation we have demonstrated the
impact of the RMS XTLK floor on EYE margin at the victim receiver. We also showed
the impact of the CTLE peaking frequency on victim receiver EYE margin in the presence of
XTLK. The outer equalization layer optimizes the overall system performance by
controlling the transmitter signal amplitude, slew rate, and pre and post de-emphasis such
that lanes with excess margin can operate at reduced transmitter amplitude and increased
rise/fall time, reducing the system XTLK noise floor. It also allows the transmitter
amplitude and slew rate to be increased for the most stressed lanes if needed.
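The lane-by-lane policy described above can be sketched as a simple classification of measured margins. The thresholds, lane identifiers, and action labels below are hypothetical; a real outer loop would derive the thresholds from the system margin target and map the actions onto the available transmitter controls.

```python
def plan_adjustments(margins_mv, low_mv, high_mv):
    """Map each lane's measured EYE margin to an outer-loop action.

    margins_mv: dict of lane id -> measured voltage margin in mV
    low_mv:     below this, the lane is treated as stressed (assumed threshold)
    high_mv:    above this, the lane has excess margin (assumed threshold)
    """
    plan = {}
    for lane, margin in margins_mv.items():
        if margin < low_mv:
            # Stressed lane: more launch amplitude and faster edges if needed.
            plan[lane] = "increase_amplitude_or_slew"
        elif margin > high_mv:
            # Excess margin: lower amplitude, slower edges, less XTLK injected.
            plan[lane] = "reduce_amplitude_and_slew"
        else:
            plan[lane] = "hold"
    return plan


example_plan = plan_adjustments({0: 12.0, 1: 25.0, 2: 60.0},
                                low_mv=15.0, high_mv=40.0)
```

Because reducing an aggressor changes the margin of its neighbors, such a plan would be applied in small steps with lane margining repeated after each step.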
Conclusion
PCIe Gen4 introduces lane margining as a required feature that gives a downstream port
access to the SerDes internal operating EYE margin. The implications of this feature are
far reaching and are expected to enable system-level equalization schemes for managing
XTLK, system operating margin tuning for high-volume manufacturing, and field
diagnosis and tuning of marginal systems through outer-layer equalization, on site or
remotely. A new wave of innovation is enabled in which outer loop system optimization
guides inner loop SerDes optimization. In this study we explored the protocol,
algorithm, and performance aspects of outer loop equalization using the Lane
Margining scheme.
References
[1] PCIe Gen4 standards draft 0.7, May 2016, in approval state by the working group
[2] SAS4 standard sas4r06, Working draft American National Standard, Project T10/BSR
INCITS 534, revision 06, 11 May 2016
[3] IEEE Standard for Ethernet, IEEE Computer Society, IEEE Std 802.3™, 2015
[4] PCIe Base: Margining capabilities, 19 May, 2016. Working Draft, Revision 4.0, 0.7 Maturity
Level.
[5] RX Margining Extended Capability, 14 January, 2016, Working Draft, Base 4.0 – 0.65
Maturity Level.
[6] Alternate RX Margin Proposal, Gerry Talbot, Dan Froelich, Debendra Das Sharma, PCI-SIG,
October 13, 2015.
[7] RX Margining during L0 in PCIe 4.0, PCI-SIG, February, 24, 2016.
[8] PHY Interface for the PCI Express, SATA, and USB 3.1 Architectures, version 4.3 (With
ongoing definition for PCIe Gen4 Margin Interface, unpublished as of November, 2016)
[9] Mohammad Mobin, et al, “SerDes Steady State Adaptation Challenges in Existing
SAS/SATA and Emerging PCIe Gen4/SAS4 Application and their Solutions with Pattern
Discriminator Constrained Adaptation”, DesignCon 2015.
[10] Mohammad Mobin, et al, “On the validity of lumped jitter approximation in the statistical
analysis of SerDes”, DesignCon 2013.
[11] Mohammad Mobin, et al, “Comparative Evaluation of 16 GT/s PCIe Gen-4 and 22.5 GT/s
SAS-4 Standards Evolution and their Impact on Future Systems and SerDes”, DesignCon 2016.
[12] Mohammad Mobin, et al, “TX back channel adaptation algorithm and protocol emulation
with application to PCIe, SAS, FC, and 10GBASE-KR”, DesignCon 2012
[13] Deutsch, Alina. "Electrical characteristics of interconnections for high-performance
systems." Proceedings of the IEEE 86.2 (1998): 315-357.
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=659489
[14] Buckwalter, James F., and Ali Hajimiri. "Cancellation of crosstalk-induced jitter." IEEE
journal of solid-state circuits 41.3 (2006): 621-632.
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1599531
[15] Pelard, Cattalen, et al. "Realization of multigigabit channel equalization and crosstalk
cancellation integrated circuits." IEEE journal of solid-state circuits 39.10 (2004): 1659-1670.
http://www.bioee.ee.columbia.edu/courses/upload/Bibliography/pelard_jssc_2004.pdf
[16] Sham, Kin-Joe, et al. "FEXT crosstalk cancellation for high-speed serial link design." IEEE
Custom Integrated Circuits Conference 2006. IEEE, 2006.
[17] Nazari, Meisam Honarvar, and Azita Emami-Neyestanak. "A 15-Gb/s 0.5-mW/Gbps two-tap
DFE receiver with far-end crosstalk cancellation." IEEE Journal of Solid-State Circuits 47.10
(2012): 2420-2432. http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6248710
[18] Lu, Jian-Hao, and Shen-Iuan Liu. "A merged CMOS digital near-end crosstalk canceller and
analog equalizer for multi-lane serial-link receivers." IEEE Journal of Solid-State Circuits 45.2
(2010): 433-446. http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5405140
[19] Jung, Hae-Kang, et al. "A 4 Gb/s 3-bit parallel transmitter with the crosstalk-induced jitter
compensation using TX data timing control." IEEE Journal of Solid-State Circuits 44.11 (2009):
2891-2900. http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5308594
[20] Lee, Seon-Kyoo, et al. "A 5 Gb/s single-ended parallel receiver with adaptive
crosstalk-induced jitter cancellation." IEEE Journal of Solid-State Circuits 48.9 (2013): 2118-2127.
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6548094
[21] Oh, Taehyoun, and Ramesh Harjani. "A 6-Gb/s MIMO crosstalk cancellation scheme for
high-speed I/Os." IEEE Journal of Solid-State Circuits 46.8 (2011): 1843-1856.
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5871283
[22] Oh, Taehyoun, and Ramesh Harjani. "A 12-Gb/s multichannel I/O using MIMO crosstalk
cancellation and signal reutilization in 65-nm CMOS." IEEE Journal of Solid-State Circuits 48.6
(2013): 1383-1397. http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6492120
[23] Hur, Youngsik, et al. "Equalization and near-end crosstalk (NEXT) noise cancellation for
20-Gb/s 4-PAM backplane serial I/O interconnections." IEEE transactions on microwave theory
and techniques 53.1 (2005): 246-255.
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1381695
[24] San K. Chhay, et al. “Crosstalk Mitigation in Dense Microstrip Wiring Using Stubby Lines”,
Intel Corporation, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6703506
[25] Jimmy Hsu, et al. “Broad-side Crosstalk Mitigation in Dual-Stripline Designs” Intel
Corporation,
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6249107