NOC AND BUS ARCHITECTURE: A COMPARISON

RAJEEV KAMAL

Dept. of Electronics and Communication Engineering, ITM University, Gurgaon, Haryana, India

[email protected]†

NEERAJ YADAV

Dept. of Electronics and Communication Engineering, ITM University, Gurgaon, Haryana, India

[email protected]

Abstract:

Network-on-chip (NoC) designs promise considerable advantages over the traditional bus-based architecture. As continued scaling under Moore’s law enables ever greater transistor densities, and as design complexity, power limitations and application convergence increase, networks have started to replace busses even in much smaller systems. This paper summarizes the advantages of the NoC and the limitations of the traditional bus-based architecture. We present a detailed comparison of the area, power, scalability and performance of traditional busses and NoCs.

Keywords: Networks on Chips, Systems on Chips.

1. Introduction

Systems-on-Chip (SoCs) consist of a large number of computing and storage cores that are interconnected by means of single or multiple layers of shared buses. Buses such as ARM’s AMBA bus [1] and IBM’s CoreConnect [2] are commonly used communication mechanisms in SoCs. They support a modular design approach that uses standard interfaces and allows for IP re-use [3], but the bus is often the performance bottleneck in a large system. The advantages of the shared bus architecture are simple topology, low area cost, and extensibility. However, such an approach has several shortcomings that will limit its use in future SoCs, such as non-scalability, unpredictable wire delay and power consumption, and its complication of the design process. In SoC bus architecture there are conflicting tradeoffs between compatibility requirements, driven by IP block reuse strategies, and the necessary bus evolutions driven by technology changes. In many cases, introducing new features has required many changes in the bus implementation and, more importantly, in the bus interfaces (for example, the evolution from AMBA ASB to AHB 2.0, then AMBA AHB-Lite, then AMBA AXI), with major impacts on IP reusability and new IP design.

With advances in technology, ICs will have billions of transistors, with feature sizes around 50nm and clock frequencies around 10GHz [4]. In search of a proven solution to these scalability concerns, researchers turned to wide area networks for inspiration. Networks-on-chip (NoCs) were the outcome [5], [6], for the following reasons: energy efficiency and reliability, scalability of bandwidth, reusability, and distributed routing decisions. In this paradigm the cores are connected through on-chip routers and send data to each other through packet-switched communication. The on-chip communication infrastructure is reusable across systems.

2. NoC Architecture

The basic traditional bus architecture is shown in Figure 1. The interconnections are implemented as dedicated point-to-point connections, with one wire dedicated to each signal. For large designs in particular, this has several limitations from a physical design viewpoint. The wires occupy much of the area of the chip, and in nanometer CMOS technology, interconnects dominate both performance and dynamic power dissipation, as signal propagation in wires across the chip requires multiple clock cycles.

Figure 1: Traditional synchronous bus[1]

Moreover, busses do not decouple the activities generally classified as transaction, transport and physical layer behaviors. Also, the design and verification times for complex systems continue to grow. As a result, networks have started to replace busses.

A NoC architecture can be described by its topology and by its strategies for routing, flow control, switching, arbitration and buffering. Arbitration is responsible for scheduling the use of channels and buffers by messages. Switching is the mechanism that takes data from an input channel of a router and places it on an output channel. The two major problems faced by the switch architecture are head-of-line blocking and deadlock [7,8]. The proposed CTCNoC architecture and its switching algorithm aim to reduce the effect of these problems. In this architecture a small central cache is embedded into every switch. A head packet that is blocked in any input buffer can be moved into the cache, so that the packets queued behind it can be forwarded without delay. This improves the throughput and average latency of the system.

Figure 2: The Proposed Architecture[2]

3. The Proposed Switching Algorithm

The arbitrator collects the information from the neighboring switches. If any of the resources is available, the arbitrator will check the cache and lookup tables for the input buffers (in the order of cache, East, North, West and South buffers; obviously cache has the top priority in the sequence) and forward to the output port. The switching algorithm is described as follows:

Input outputport_av[1:4]; cache[0:7]; bufferE[0:7]; bufferN[0:7]; bufferW[0:7]; bufferS[0:7]; addr[1:4];

do {
    for (i=1; i<=4; i++)
        if (outputport_av[i]==1)
        begin
            for (j=0; j<=7; j++)                  // check cache
                if (cache[j]==addr[i])
                begin
                    forward cache[j] packet to output port i;
                    break;
                end
            if (bufferE[0]==addr[i])              // check bufferE
            begin
                forward the packet to output port i;
                shift the bufferE;
                break;
            end
            else if (bufferE[1]==addr[i] or bufferE[2]==addr[i])
            begin
                forward the packet to output port i;
                shift the bufferE;
                break;
            end
            else if (bufferN[0]==addr[i])         // check bufferN
                ...
            else if (bufferW[0]==addr[i])         // check bufferW
                ...
            else if (bufferS[0]==addr[i])         // check bufferS
                ...
        end
}
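The priority order of the pseudocode can be illustrated with a short software model. The following Python sketch is not the hardware implementation; it assumes that each packet carries the index of the output port it needs (`dest`), and that the elided bufferN, bufferW and bufferS branches follow the same head-then-lookahead pattern shown for bufferE. The names `Packet`, `LOOKAHEAD` and `arbitrate` are hypothetical.

```python
from collections import deque, namedtuple

# Hypothetical packet model: dest is the output port (1..4) the packet needs.
Packet = namedtuple("Packet", ["dest", "payload"])

# Head entry plus the two entries behind it, mirroring the bufferE[0..2] checks.
LOOKAHEAD = 3


def take_from_cache(cache, port):
    """Remove and return the first cached packet destined for port, if any."""
    for j, pkt in enumerate(cache):
        if pkt.dest == port:
            return cache.pop(j)
    return None


def take_from_buffers(buffers, port):
    """Scan E, N, W, S in order; check the head first, then two lookahead slots.

    Assumption: N, W and S are handled like E (the source elides them).
    """
    for name in ("E", "N", "W", "S"):
        buf = buffers[name]
        for k in range(min(LOOKAHEAD, len(buf))):
            if buf[k].dest == port:
                pkt = buf[k]
                del buf[k]  # "shift" the buffer past the forwarded packet
                return pkt
    return None


def arbitrate(port_available, cache, buffers):
    """One arbitration pass; returns a list of (port, packet) decisions."""
    forwarded = []
    for port in (1, 2, 3, 4):
        if not port_available.get(port, False):
            continue
        pkt = take_from_cache(cache, port)  # cache has top priority
        if pkt is None:
            pkt = take_from_buffers(buffers, port)
        if pkt is not None:
            forwarded.append((port, pkt))
    return forwarded


# Example: the East head packet needs busy port 1, the one behind it needs port 2.
east = deque([Packet(1, "a"), Packet(2, "b")])
buffers = {"E": east, "N": deque(), "W": deque(), "S": deque()}
cache = [Packet(3, "c")]
decisions = arbitrate({1: False, 2: True, 3: True, 4: False}, cache, buffers)
print(decisions)
```

In this example the packet behind the blocked East head is forwarded to port 2 and the cached packet to port 3, while the blocked head stays in its buffer. This is the bypass behavior that the lookahead checks are meant to provide.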

4. Comparison with Traditional Busses

4.1. Power

In the NoC architecture, several power-management techniques are available that are difficult to implement with traditional busses. The NoC can be divided into sub-networks. If a sub-network is not needed by a specific application, it can be powered off independently. This reduces static power, which is consumed to a greater extent in a bus architecture.

4.1.1 Static Power

It is well known that static power consumption is roughly proportional to silicon area. Since the area requirement is lower for the NoC than for traditional busses (Section 4.2), static power consumption should also be lower by roughly the same factor.
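The argument can be written as a simple proportionality. This is a rough sketch that assumes both interconnects are built in the same process, so that the same constant $k$ (a hypothetical symbol, not from the measurements) applies to both; the factor of one half is the approximate gate-count ratio stated in Section 4.2.

```latex
P_{\text{static}} \approx k \cdot A
\quad\Longrightarrow\quad
\frac{P_{\text{static}}^{\text{NoC}}}{P_{\text{static}}^{\text{bus}}}
\approx \frac{A_{\text{NoC}}}{A_{\text{bus}}}
\approx \frac{1}{2}
```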

4.1.2 Dynamic Power

4.2. Area

Traditional busses have been perceived as very area efficient because of their shared nature. However, the introduction of pipelining and buffering adds up to 250K gates. Adding multiplexers, arbiters, address decoders, and all the state information necessary to track transaction retries within each bus brings the total gate count for a system above 400K gates. Although the NoC implementation also uses switches, its gate count is approximately half of that required by the bus, so its area consumption is lower.

4.3. Throughput

The communication bandwidth requirement of a module dictates to a large extent the type of interconnection required to achieve the overall system throughput specification. In comparing the bus and the NoC, we assumed all busses to be 4 bytes wide. At a 250 MHz operating frequency, the aggregate throughput of the entire SoC with 9 clusters is therefore 250 MHz × 4 bytes × 9 = 9 GB/s, assuming one concurrent transfer per cluster. The NoC has a potential 10x throughput advantage over the bus-based approach. The actual ratio may be lower if multi-layered busses are used at the cluster level, because multi-layered busses are similar to crossbars and the added complexity could limit the target frequency.
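The bus figure above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name is hypothetical, and the NoC value simply applies the quoted 10x potential factor rather than deriving it.

```python
def aggregate_throughput_gb_per_s(freq_mhz, width_bytes, clusters):
    """Peak aggregate throughput in GB/s, one transfer per cluster per cycle."""
    bytes_per_second = freq_mhz * 1_000_000 * width_bytes * clusters
    return bytes_per_second / 1e9


bus_gb_per_s = aggregate_throughput_gb_per_s(250, 4, 9)
noc_potential_gb_per_s = 10 * bus_gb_per_s  # quoted potential advantage, not derived

print(bus_gb_per_s)            # 9.0
print(noc_potential_gb_per_s)  # 90.0
```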

4.4. Latency

The communication latency is caused by the following two factors: (1) the access time to the bus, which is the time until the bus is granted; (2) the latency introduced by the bus to transfer the data. A bus master, such as a processor, is able to initiate read, write and control transactions. A slave, such as memory, responds to requests from the master. An SoC design would typically have more than one master and numerous slaves.
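The two latency components can be illustrated with a deliberately simple model. The sketch below assumes a single shared bus, a fixed transfer time per transaction, and masters that are granted the bus strictly in order; these assumptions and the function name are illustrative and are not taken from the measurements in this paper.

```python
def bus_latencies(num_pending_masters, transfer_cycles):
    """Latency in cycles seen by each pending master on one shared bus.

    Each master waits for all earlier grants to finish (access time),
    then spends transfer_cycles moving its own data (transfer time).
    """
    latencies = []
    for position in range(num_pending_masters):
        access_time = position * transfer_cycles
        latencies.append(access_time + transfer_cycles)
    return latencies


print(bus_latencies(3, 4))  # [4, 8, 12]
```

Under these assumptions the access time grows linearly with the number of masters contending for the bus, which is why arbitration delay matters once a design has several masters.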

4.5. Maximum Frequency Estimation

One of the major differences between a bus and a NoC from a physical implementation perspective is that the NoC uses a point-to-point, Globally Asynchronous Locally Synchronous (GALS) approach, while the bus is synchronous and multipoint. As a result, NoC implementations can sustain far higher clock frequencies. In [9], Arteris estimated the maximum frequency for the bus-based transport to be 250MHz, while in the NoC case, point-to-point links and GALS techniques greatly simplify the timing convergence problem at the SoC level, so that an operating frequency of 800MHz is achieved.

5. Conclusions

The results obtained for the various criteria show that, for designs of the complexity level used in this comparison, the NoC approach has a clear advantage over traditional busses for nearly all criteria, most notably power and system throughput.

6. Acknowledgments

The authors gratefully acknowledge the helpful suggestions made by the reviewers.

7. References

[1] ARM: ‘AMBA specification’, Technical report, ARM, Revision 2.0, 1999

[2] IBM: ‘CoreConnect bus architecture’, Technical report, IBM Corporation, 1999

[3] Salminen, E., Lahtinen, V., Kuusilinna, K., and Hamalainen, T.: ‘Overview of bus-based system-on-chip interconnections’. Proc. IEEE Int. Symp. on Circuits and Systems, 26–29 May 2002, vol. 2, pp. II372–II375

[4] International Technology Roadmap for Semiconductors

[5] L. Benini and G. De Micheli, “Networks on chips: A new SoC paradigm,” Computer, vol. 35, no. 1, pp. 70–78, Jan. 2002.

[6] W. J. Dally and B. Towles, “Route packets, not wires: On-chip interconnection networks,” in Proc. 38th Des. Autom. Conf., Jun. 2001, pp. 684–689.

[8] A. Clouard et al., “Using Transaction-Level Models in a SoC Design Flow,” in SystemC: Methodologies and Applications, edited by W. Muller, W. Rosenstiel, J. Ruf, Kluwer Academic Publishers, 2003, pp. 29–63.

[9] ARTERIS white paper, “A comparison of Network-on-