Milen Nikolov: On-chip Networks

Routing for On-Chip networks

Fig. 1: Latency vs. Offered Bandwidth for our Optimal Combined Routes: Theoretical and numerical analysis predicts the combined routes should be able to acehive the same throughput as the specialized routes. Here, the ROMM, O1TURN and DOR algorithms are general purpose routing algorithms for on-chip networks. Figures 1a and 1b correspond to two different categories of traffic patterns. The specialized routes were specifically designed to handle optimally each traffic matrix in each category. Instead our combined routes are determined solving a convex optimization problem over the classes of traffic patterns; their performance is remarkably close to the optimal performance given by the specialized routes (essentially the two data sets overlap).

Modern computer chips comprise a collection of cores residing and interconnected on a single chip. These cores require data communication and rely on routing algorithms to determine the paths of flits (flow control digits). Most routing algorithms proposed for on-chip networks are general-purpose algorithms, since they are designed to perform well, over a wide-range of applications. They are either completely oblivious to the application's traffic pattern or they dynamically adapt to an application's traffic pattern through indirect local information about the network's global performance. If the application's traffic pattern is known statically, then application-aware routing can potentially achieve better performance compared to general-purpose algorithms. In on-chip networks, application-aware routes are usually determined by either solving a mixed-integer linear program to generate single-path routes, or by solving the optimal-routing multicommodity flow problem following earlier work done for general networks. We call this the optimal specialized routing problem since the goal is to find the optimal routes for a specific traffic pattern.

Such application-aware routing cannot efficiently handle dynamic changes in the traffic pattern, and this is a serious limitation for modern workloads which often include many application phases. Each application phase exhibits significantly different behavior than other phases.

Suppose we have an uncertainty set of traffic patterns (one per application phase) each of which can occur with a given probability. The traffic patterns in the uncertainty set and the corresponding probabilities of occurrence are usually obtained through static analysis or by profiling the application of interest.

We define the combined optimal routing problem as finding a single set of routes to be used across all application phases that results in the same performance or close to the same performance as if we used specialized routes for each application phase. We formulate the problem as a convex optimization problem, and we use theoretical analysis, numerical analysis, and simulation results to illustrate that this formulation produces optimal solutions when possible and produces nearly-optimal solutions when the optimal solutions are infeasible. Fig. 1 illustrates our on-chip routing simulation results using the Darsim simulator. Please refer to our paper (publication 7) for more details. Code and data available upon request.