

Chip Scale Review July • August • 2019


an SoC with a D2W approach would not only increase memory bandwidth, but also improve performance. For example, having an SoC bonded with high-density DBI pads to a memory device would allow for more memory bandwidth than utilizing an HBM. This is achieved by having the potential for orders-of-magnitude more interconnects between the die. In addition, assuming that this was done without the need for a standardized memory interface, such as HBMx, low-power double-data rate (LPDDRx), double-data rate (DDRx), etc., the area, power and latency that such an interface spends serializing data could be eliminated.
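To make the bandwidth argument concrete, a rough sketch follows. The wire counts and per-pin data rates are illustrative assumptions, not figures from the article: an HBM2-class interface is taken as 1,024 data wires at about 2 Gb/s per pin, while a hypothetical fine-pitch direct-bond interface is given 100x the wires, each run slower since short, unterminated wires need no SerDes.

```python
# Illustrative comparison (assumed round numbers, not article data):
# aggregate bandwidth of a standard HBM2-class bus vs. a hypothetical
# fine-pitch direct-bond (DBI-style) die-to-die connection.

def peak_bandwidth_gbps(num_wires: int, gbps_per_wire: float) -> float:
    """Aggregate peak bandwidth in Gb/s for a parallel interface."""
    return num_wires * gbps_per_wire

# HBM2-class interface: 1,024 data wires at ~2 Gb/s per pin.
hbm_bw = peak_bandwidth_gbps(num_wires=1_024, gbps_per_wire=2.0)

# Hypothetical direct-bond interface: 100x the wires, each clocked
# slower because the short bonds need no serializing PHY.
dbi_bw = peak_bandwidth_gbps(num_wires=102_400, gbps_per_wire=1.0)

print(f"HBM2-class: {hbm_bw / 8 / 1000:.2f} TB/s")  # ~0.26 TB/s
print(f"DBI-style:  {dbi_bw / 8 / 1000:.2f} TB/s")  # ~12.8 TB/s
```

Even with each wire running at half the per-pin rate, the sheer wire count of the bonded interface dominates, which is the "orders-of-magnitude more interconnects" effect described above.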

Now, with DBI Ultra availability, the need to match the size of the memory die and the SoC is no longer a constraint. Assuming the interconnect pitch between die can be made sufficiently fine, such as a 1-3µm pitch, these interconnects should be considered no different than interconnects within a die. For example, if an SoC device is interfacing with a memory die, the interface between the die would be the same type of bus one would use if it were connecting two blocks on the same die.

If we consider an SoC with various blocks on the die as shown in Figure 7, where each block is 1mm by 1mm in size, we can see that the block in the center would be able to use all four edges to communicate with neighboring blocks. If we also assume that this die is in 10nm process technology with 4 signal routing layers (middle layers in the metal stack), of which two are north-south oriented routes and two are east-west oriented routes, then we can route approximately 100,000 signal nets along this block's edges. Alternatively, if we were to consider a 3D approach with a 1μm pitch, then 1,000,000 interconnects would be achievable in this same area. Based upon this, we can conclude that more interconnects are possible across a dense D2D interface than between blocks within the same SoC. Furthermore, by not having to push the nets within these blocks to the periphery of the block, routing is reduced. This may simplify timing closure at the block level. Less routing not only means lower latency, but also lower power dissipated in routes. With shorter routes, fewer repeaters are needed as well, which drives down power and area further. The connection between the die should simply be considered a trace, no different than a connection within a die, but shorter.
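The 100,000 versus 1,000,000 comparison above can be checked with simple arithmetic. The block size, layer count and 1µm bond pitch come from the text; the ~80nm metal track pitch is an assumption chosen to reproduce the article's edge-routing figure for a 10nm-class middle-metal stack.

```python
# Back-of-the-envelope check of the routing numbers in the text.
# Block size, layer count and bond pitch are from the article;
# the 80nm track pitch is an assumed 10nm-node middle-metal value.

block_edge_nm = 1_000_000       # 1mm x 1mm block
track_pitch_nm = 80             # assumed metal track pitch
layers_per_direction = 2        # two N-S and two E-W signal layers

# 2D: nets escape across the four block edges on the crossing layers.
tracks_per_edge = (block_edge_nm // track_pitch_nm) * layers_per_direction
edge_nets = 4 * tracks_per_edge
print(f"2D edge-escape nets: {edge_nets:,}")   # 100,000

# 3D: an area array of face-to-face bonds at 1um pitch over the block.
bond_pitch_nm = 1_000
area_nets = (block_edge_nm // bond_pitch_nm) ** 2
print(f"3D area-array nets:  {area_nets:,}")   # 1,000,000
```

Note that the 3D figure scales with block area while the 2D figure scales only with perimeter, so the gap widens further for larger blocks.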

Having the potential for lower latency, less area, a heterogeneous process mix, lower power and higher performance simultaneously comes with one significant requirement: the design must be a true 3D design. Much of what is currently called 3D is just a stack of 2D designs. Merely having through-silicon vias (TSVs) does not make for a 3D design. Instead, the key to a true 3D design is having signals that can cross a D2D boundary with an electrical load similar to what would be seen in simply moving between blocks within a die. Many of the benefits of this 3D architecture are lost when an interface that serializes the signals, like an LPDDR, is inserted between these die. Planning a design that spans more than one die, potentially more than one technology process and process design kit (PDK), and designing it concurrently allows for a sizable benefit that will exceed the area and performance savings of moving to a new node, at a considerably lower investment as well. A 15-20% performance benefit is often expected from advancing to the next node [9], but at staggering cost. There are also growing limitations on wiring because the wires do not scale with the nodes, thereby reducing the area benefit of moving to a new node. With a true 3D design, the wiring burden is reduced because signals do not need as much lateral movement, especially if the critical high-density connections are well aligned vertically. Furthermore, the lateral routing capacity is extended because routing layers are available on both sides of the DBI or DBI Ultra interface.

If we consider a typical application processor architecture as shown in Figure 8, we see that there is a need

Figure 6: a) 20-high die stack; b) cross section of a die stack; and c) DBI Ultra HBM representation.

Figure 7: Block-to-block communication within an SoC.

Figure 8: Cache hierarchy in an SoC.