
# DDR3/4 Fly-by vs. T-topology Routing

### by Barry Olney

IN-CIRCUIT DESIGN PTY LTD AUSTRALIA

JEDEC introduced fly-by topology in the DDR3 specification for the differential clock, address, command and control signals. The advantage of fly-by topology is that it supports higher-frequency operation and reduces the quantity and length of stubs, which consequently improves signal integrity and timing on heavily loaded signals. Fly-by topology also reduces simultaneous switching noise (SSN) by deliberately causing flight-time skew between the address group and the point-to-point signals of the data groups. To account for this skew, the DDR3/4 controller supports write leveling. The controller must add the write leveling delays to each byte lane to maintain the strobe-to-clock requirement at the SDRAMs.

Figure 1: Double T-topology for clock/address/command/control routing.

T-topology can be challenging to route, particularly double T-topology with four back-to-back SDRAMs as in Figure 1, but it can be advantageous when using multi-die packages. The fly-by topology used in Figure 3 is much easier to route but does not work well with high-capacitance loads, such as LPDDR3 DDP (double die package) and QDP (quad die package) devices. IC fabricators arrange dies in parallel (as in Figure 2) to increase package density, which can also increase input capacitance by up to four times. With such loads, excessive ring-back is often present in the first few nodes of the daisy chain.

This is the reason why the T-topology was developed. However, if you are supporting only SDP (single die package) devices, then fly-by is the most straightforward approach, although both fly-by and double T-topologies should work fine in that case. If you are using a DDP device, then double T-topology works better than fly-by because it delivers a better system margin.

Figure 2: SDP and multi-die DDP and QDP memory devices.

During a write cycle using the fly-by topology, data strobe groups are launched at separate intervals to coincide with the clock arriving at the memory components on the SODIMM or PCB, and must meet the timing parameter between the memory clock and DQS, defined as tDQSS, of $\pm$0.25 tCK. The PCB design process can be simplified by using the leveling feature of DDR3/4. The fly-by, daisy chain topology increases the complexity of the controller design needed to achieve leveling but, fortunately, greatly improves performance and eases board layout for DDR3/4 designs.

It is not that you have to use write leveling because it is a feature of DDR3 and DDR4, but rather that you have to use write leveling in order to allow fly-by routing. There is also no reason not to use write leveling training with a T-topology in order to optimize the write strobe-to-clock timing. This lets you compensate for slight differences in CA timing and avoids hard-coding the strobe-to-clock skews that you would otherwise have to manage manually.
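The relationship between fly-by clock skew and write leveling can be illustrated with a little arithmetic. The following is a minimal sketch, not a controller algorithm: the device names and all flight times are hypothetical, the clock period assumes DDR3-1066 (1.875 ns), and the sketch assumes the controller can only add delay to the strobe.

```python
# Sketch only: all flight times below are hypothetical, not measured values.

T_CK_PS = 1875.0                 # DDR3-1066 clock period (533 MHz clock)
TDQSS_LIMIT_PS = 0.25 * T_CK_PS  # tDQSS window of +/-0.25 tCK

# Clock arrival grows along the fly-by chain; strobes are point-to-point.
clock_arrival_ps = {"U1": 450.0, "U2": 700.0, "U3": 950.0, "U4": 1200.0}
strobe_arrival_ps = {"U1": 380.0, "U2": 385.0, "U3": 378.0, "U4": 382.0}


def write_leveling_delays(clock_ps, strobe_ps):
    """Return the strobe delay (ps) per device that aligns DQS with the clock."""
    delays = {}
    for device, clock_time in clock_ps.items():
        delay = clock_time - strobe_ps[device]
        if delay < 0:
            # Assumption in this sketch: the controller can only add delay.
            raise ValueError(f"{device}: strobe arrives after the clock")
        delays[device] = delay
    return delays


def devices_outside_tdqss(delays, limit_ps=TDQSS_LIMIT_PS):
    """List devices whose unleveled clock-to-strobe skew exceeds the limit."""
    return [device for device, skew in delays.items() if abs(skew) > limit_ps]


delays = write_leveling_delays(clock_arrival_ps, strobe_arrival_ps)
violators = devices_outside_tdqss(delays)
for device, delay in delays.items():
    print(f"{device}: add {delay:.0f} ps of strobe delay")
print("Outside tDQSS without leveling:", violators)
```

With these made-up numbers, the last two devices in the chain would fall outside the tDQSS window if no leveling delay were applied, which is the situation write leveling is there to correct.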

Figure 3: Fly-by topology for clock/address/command routing.

The PCB Design Magazine • April 2016

Fly-by topology is similar to daisy chain or multi-drop topology, but has very short stubs to each memory device in the chain to reduce reflections. The double T-topology was used for DDR2 and had a downside in that the impedance discontinuities, due to branching along the traces, caused obvious margin losses. T-topology also tends to have overshoots, while the signals in a fly-by topology are terminated and therefore do not reach the full swing of the voltage rails. Also, the length of the stubs has an effect on the maximum bandwidth of the transmission line. If you are employing high-frequency DDR4, then the bandwidth of the channel needs to be as high as possible. With conventional T-topology, the trace stub is lengthened with an increase in the number of memory device loads. In some cases, there can be as many as eight memory devices connected to the processor. The resonant frequency, and hence the bandwidth, is inversely proportional to the stub length.

$$f_o = \frac{c}{4 \times \text{stub length} \times \sqrt{E_r}}$$

where $f_o$ is the resonant frequency, $c$ is the speed of light and $E_r$ is the dielectric constant.
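To get a feel for the numbers, the stub resonance formula can be evaluated directly. This is a minimal sketch: the stub lengths and the dielectric constant of 4.0 are illustrative assumptions, not values from the article.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in a vacuum
INCH_TO_M = 0.0254


def stub_resonant_frequency_hz(stub_length_m, dielectric_constant):
    """Quarter-wave resonant frequency of an open stub, per the formula above."""
    return C_M_PER_S / (4.0 * stub_length_m * math.sqrt(dielectric_constant))


# Hypothetical stub lengths in inches and an assumed dielectric constant.
ASSUMED_ER = 4.0
for length_in in (0.1, 0.25, 0.5, 1.0):
    f_hz = stub_resonant_frequency_hz(length_in * INCH_TO_M, ASSUMED_ER)
    print(f"{length_in:4.2f} in stub: {f_hz / 1e9:6.2f} GHz")
```

Under these assumptions a 0.5 inch stub resonates at roughly 2.95 GHz, and doubling the stub length halves the resonant frequency.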

The clock traces should be routed to a longer delay than the strobe traces per byte lane. This is necessary because:

1. The write leveling is capable of adjusting the clock to write data strobe alignment over a wide range, assuming the clock trace has a longer delay than the strobe traces.

2. The read leveling is capable of adjusting the read data eye to read the data strobe over a wide range. The adjustment is per byte, so board skew between the data and data mask signals should be minimized.

3. There is no automatic training for aligning command/address signals to the clock, but a fixed offset is programmable in the processor and can be used if necessary. Skew between the clock and address/control signals should be minimized.

Designing a memory interface is all about timing closure. Each signal's timing needs to be compared to the related clock or strobe signal in such a way that the data can be captured on both the rising and falling edge of the strobe—hence the term double data rate (DDR). The increase of data rates, to 4266MT/s for DDR4, has made the timing margin associated with each rising and falling edge much tighter. Even though a direct successor is not currently planned, sources speculate that the 5<sup>th</sup> generation DDR5 will use a serial interface to eliminate the issues associated with parallel busses. Serial busses are easier to scale up and have fewer connections, making PCB design less demanding.

It seems that every datasheet or reference standard you read on DDR design quotes different allowances for the timing budget. At a basic level, the differential clock is the reference signal for the address/control and command

| Parameter                                  | Setup (ps) | Hold (ps) |
|--------------------------------------------|------------|-----------|
| Open window from simulations               | 456        | 631       |
| SDRAM setup and hold times from datasheets | 25         | 100       |
| Slew rate derating if >1V/ns               | 2.3        | 2.8       |
| Timing offset with respect to Vref CA      | 13         | 11        |
| SDRAM derating                             | 88         | 50        |
| Crosstalk                                  | 47         | 42        |
| Controller error - skew                    | 200        | 200       |
| Clock error - jitter                       | 30         | 30        |
| PCB routing tolerance                      | 10         | 10        |
| Margin                                     | 41         | 185       |

Table 1: Example of the overall DDR3-1066 timing budget allowances and resulting margin.
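The margin row of Table 1 is simply the open window minus the sum of the allowances. The sketch below reproduces that subtraction with the values from the table; the variable and function names are illustrative only.

```python
# Values in picoseconds, taken from Table 1 as (setup, hold).
OPEN_WINDOW_PS = (456.0, 631.0)

ALLOWANCES_PS = {
    "SDRAM setup and hold times from datasheets": (25.0, 100.0),
    "Slew rate derating if >1V/ns": (2.3, 2.8),
    "Timing offset with respect to Vref CA": (13.0, 11.0),
    "SDRAM derating": (88.0, 50.0),
    "Crosstalk": (47.0, 42.0),
    "Controller error - skew": (200.0, 200.0),
    "Clock error - jitter": (30.0, 30.0),
    "PCB routing tolerance": (10.0, 10.0),
}


def timing_margin_ps(open_window, allowances):
    """Subtract every allowance from the open window for setup and hold."""
    setup_total = sum(setup for setup, _ in allowances.values())
    hold_total = sum(hold for _, hold in allowances.values())
    return open_window[0] - setup_total, open_window[1] - hold_total


setup_margin, hold_margin = timing_margin_ps(OPEN_WINDOW_PS, ALLOWANCES_PS)
print(f"Setup margin: {setup_margin:.1f} ps")
print(f"Hold margin:  {hold_margin:.1f} ps")
```

The results round to the 41ps setup and 185ps hold margins listed in the table. Note that the PCB routing tolerance is the only entry in the list that the board designer controls.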


| Signal Layer | Layer Inductance (nH/m) | Layer Capacitance (pF/m) | Layer Propagation Velocity (m/s) | Trace Length (inches) | Trace Inductance (nH) | Trace Capacitance (pF) | Flight Time (ps) |
|--------------|-------------------------|--------------------------|----------------------------------|-----------------------|-----------------------|------------------------|------------------|
| 1            | 221                     | 132                      | 1.85e+8                          | 2.3000                | 12.91                 | 7.71                   | 315.78           |
| 3            | 265                     | 160                      | 1.63e+8                          | 2.3000                | 15.48                 | 9.35                   | 358.40           |
| 4            | 260                     | 172                      | 1.58e+8                          | 2.3000                | 15.19                 | 10.05                  | 369.75           |
| 7            | 260                     | 172                      | 1.58e+8                          | 2.3000                | 15.19                 | 10.05                  | 369.75           |
| 8            | 265                     | 160                      | 1.63e+8                          | 2.3000                | 15.48                 | 9.35                   | 358.40           |
| 10           | 221                     | 132                      | 1.85e+8                          | 2.3000                | 12.91                 | 7.71                   | 315.78           |
| Total        |                         |                          |                                  | 13.8000               | 87.16                 | 54.22                  | 2087.86          |

Figure 4: Relative signal propagation for each signal layer on a 10-layer DDR3 stackup.

signals, whereas the differential strobe is the clock for the data and data mask signals. The timing budgets for the data byte lanes and the address group need to be determined and must be spread across the processor package, PCB interconnect and the SDRAM packages. The portion of the timing budget consumed by the controller IC and SDRAM devices is fixed and cannot be influenced by the PCB designer. The amount of timing budget remaining after subtracting these fixed portions is all that is left for the board interconnect, which is not much!

For a DDR3-1066 SDRAM, for instance, the JEDEC JESD79-3E DDR3 standard specifies 25ps for setup and 100ps for hold time, as in Table 1. Ideally, one should use a simulation tool, such as HyperLynx, to measure the setup and hold times to ensure they are within the timing budget. However, if you do not have access to an analysis tool, work to a routing tolerance of 10ps delay, and you can be assured that you are within the margin, allowing for any derating. That is, provided the transmission lines are matched to 40/80 ohms single-ended/differential impedance, the correct drive currents are being used and the waveforms are not distorted. Let's face it, it is not that difficult to route each signal to the exact propagation delay given that you have access to each layer's flight time. It is also worth noting that the margin limits can be increased if the memory interface is not operating at the maximum frequency and/or if a fast memory device is used.

Now let's consider a typical 10-layer DDR3 stackup as in Figure 4. There are six routing layers and all the DDR3 signals are routed to 40/80 ohms single-ended/differential impedance and matched to 2.3 inches in length. In my 2014 column <u>Matched Length does not equal Matched Delay</u>, I cited the difference between the propagation delay of the layers on a PCB. The most dramatic is that of microstrip (outer layers) compared to stripline (inner layers). In this case, the delta between layers 1 and 4 is a massive 54ps, way outside the setup margin.

Even stripline layers 3 and 4 differ by 11ps, although they are routed to the same length. This is due to the variance in the dielectric constant of each layer, which changes the velocity of signal propagation. The difference is graphically displayed in the ICD Stackup Planner's new Relative Signal Propagation dialog. Even the 11ps stripline variance is more than enough to offset the timing, particularly when using high-speed DDR3 and DDR4 devices, regardless of the routing topology.
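The layer-to-layer deltas quoted above follow directly from the propagation velocities reported for Figure 4. The sketch below recomputes the flight times for the 2.3 inch traces and then, as an illustration of matching delay rather than length, works out the length each layer would need to equal the slowest layer's delay. The helper names are illustrative and not part of any tool.

```python
INCH_TO_M = 0.0254
TRACE_LENGTH_IN = 2.3  # matched length used in the article's example

# Propagation velocity (m/s) per signal layer, as reported for Figure 4.
VELOCITY_M_PER_S = {1: 1.85e8, 3: 1.63e8, 4: 1.58e8,
                    7: 1.58e8, 8: 1.63e8, 10: 1.85e8}


def flight_time_ps(length_in, velocity_m_per_s):
    """Flight time in picoseconds for a trace of the given length."""
    return length_in * INCH_TO_M / velocity_m_per_s * 1e12


def length_for_delay_in(delay_ps, velocity_m_per_s):
    """Trace length in inches that produces the requested delay."""
    return delay_ps * 1e-12 * velocity_m_per_s / INCH_TO_M


flight_times = {layer: flight_time_ps(TRACE_LENGTH_IN, v)
                for layer, v in VELOCITY_M_PER_S.items()}
target_ps = max(flight_times.values())  # match everything to the slowest layer

for layer, t_ps in flight_times.items():
    matched_in = length_for_delay_in(target_ps, VELOCITY_M_PER_S[layer])
    print(f"Layer {layer:2d}: {t_ps:6.2f} ps at {TRACE_LENGTH_IN} in, "
          f"needs {matched_in:.3f} in for {target_ps:.2f} ps")

print(f"Layer 1 vs 4 delta: {flight_times[4] - flight_times[1]:.1f} ps")
print(f"Layer 3 vs 4 delta: {flight_times[4] - flight_times[3]:.1f} ps")
```

With these velocities, a microstrip trace on layer 1 would have to be routed noticeably longer than 2.3 inches to arrive at the same time as a stripline trace on layer 4.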

In conclusion, fly-by topology supports higher frequency operation, reduces simultaneous switching noise, and reduces the quantity and length of stubs, which consequently improves signal integrity and timing. And, most importantly from a PCB designer's point of view, it eases routing of memory devices dramatically. However, no matter what topology is implemented, one should pay strict attention to the signal propagation on each layer, ensuring that the total flight time of the critical signals matches, regardless of length.

#### **Points to Remember**

- Fly-by topology supports higher frequency operation, reduces simultaneous switching noise, reduces the quantity and length of stubs and consequently improves signal integrity and timing.
- The controller must add the write leveling delays to each byte lane to maintain the strobe to clock requirement at the SDRAMs.
- T-topology can be challenging to route, but it can be advantageous when using multi-die packages with high-capacitance loads, whereas fly-by topology eases routing of DDR3/4 devices.
- The double T-topology was used for DDR2 and had a downside in that the impedance discontinuities, due to branching along the traces, caused obvious margin losses.
- With conventional T-topology, the trace stub is lengthened with an increase in number of memory device loads.
- The clock traces should be routed to a longer delay than the strobe traces per byte lane.

- Designing a memory interface is all about timing closure.
- The portion of the timing budget consumed by the controller IC and SDRAM devices is fixed and cannot be influenced by the PCB designer.
- If you work to a 10ps routing tolerance, then you can be assured you are within the margin, allowing for any derating.
- The margin limits can be increased if the memory interface is not operating at the maximum frequency and/or if a fast memory device is used.
- Matched length does not equal matched delay. The most dramatic difference is that between microstrip (outer layers) and stripline (inner layers).

#### References

1. Barry Olney's Beyond Design columns: <u>PCB Design Techniques for DDR, DDR2 & DDR3 Parts 1 & 2</u>; <u>Matched Length does not equal Matched Delay</u>

2. JEDEC Specifications JESD 79F, JESD79-2E, JESD79-3F & JESD79-4

3. <u>The Xilinx Zynq-7000 PCB Design Guide</u>

4. EDN article by Chang Fei Yee: <u>DDR4 memory interface; Solving PCB design challenges</u>

5. Micron's <u>TN-41-08: Design Guide for Two DDR3-1066 UDIMM Systems Introduction</u>

6. The SI List, available at <u>Freelists.org</u>

7. The ICD Stackup and PDN Planner software can be downloaded from <u>www.icd.com.au</u>



**Barry Olney** is managing director of In-Circuit Design Pty Ltd (ICD), Australia. This PCB design service bureau specializes in board-level simulation, and has developed the ICD Stackup Planner and ICD PDN Planner software.