Description of Simulation Infrastructure

The Journal of Instruction-Level Parallelism
The 2nd JILP Workshop on Computer Architecture Competitions (JWAC-2):
Championship Branch Prediction
Sponsored by: Intel, JILP,
in conjunction with:
ISCA-38 http://isca2011.umaine.edu/

Description of the Simulation Infrastructure

The provided evaluation framework includes a set of traces, and a driver that reads traces and simulates the behavior of a branch predictor. The framework models a simple out-of-order core with the following basic parameters:

o A 256-entry reorder buffer and three schedulers: an integer scheduler with 64 entries, and FP and load/store schedulers with 32 entries each.

o The processor has a 14-stage, 4-wide pipeline, except in the execution stage, where it has a 12-wide execution scheduler (6 int, 4 FP, and 2 load/store).

o The memory model is a 2-level cache hierarchy consisting of split L1 instruction and data caches and an L2 last-level cache. All caches use 64-byte lines. The L1 instruction cache is a 32 KB, 8-way set-associative cache. The L1 data cache is also 32 KB and 8-way set-associative. The L2 cache is a 4 MB, 8-way set-associative cache.
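As a quick check on the cache parameters above, the number of sets in each cache follows from capacity / (line size x associativity). The sketch below is illustrative only: the `CacheConfig` class and its field names are assumptions made for this example and are not part of the provided framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    """Hypothetical container for one cache level (not a framework type)."""
    name: str
    size_bytes: int
    ways: int
    line_bytes: int = 64  # all caches use 64-byte lines

    @property
    def num_sets(self) -> int:
        # sets = capacity / (line size * associativity)
        return self.size_bytes // (self.line_bytes * self.ways)


KB = 1024
MB = 1024 * KB

CACHES = [
    CacheConfig("L1I", 32 * KB, 8),
    CacheConfig("L1D", 32 * KB, 8),
    CacheConfig("L2", 4 * MB, 8),
]

for cache in CACHES:
    print(f"{cache.name}: {cache.num_sets} sets")
```

Under these parameters each L1 cache has 64 sets and the L2 cache has 8192 sets.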

Traces

The trace set includes 40 traces, classified into 5 categories: CLIENT, INT (Integer), MM (Multimedia), SERVER and WS (Workstation). Traces are approximately 50 million micro-ops long and include both user and system activity. The traces include both value and timing information for each micro-op, taken from a detailed out-of-order timing simulator. The timing simulator is configured with perfect branch prediction, so there are no wrong-path micro-ops in the traces. To achieve maximum transparency, all traces are provided to contestants, and the same traces will be used to rank the contestants and crown the champion.

Driver

The driver reads a trace and calls the branch predictor through a standard interface. The predictor can decide when and what predictions to provide to the driver. The driver records whether each prediction was correct and when it was provided, and then calculates a misprediction penalty for each branch. The misprediction penalty is the number of cycles that the fetch unit spent on the wrong path. At the end of the run, the driver reports two final scores for the predictor, one for conditional branches and one for indirect branches, expressed in Misprediction Penalty per Kilo Instructions (MPPKI). The framework includes an example predictor to help guide contestants.
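The scoring described above can be sketched as follows. This is a minimal illustration, assuming MPPKI is the total number of wrong-path cycles divided by the instruction count in thousands. The function name, the record layout, and the sample numbers are all hypothetical, and the driver's actual accounting may differ in detail.

```python
from typing import Iterable, Tuple

# Each record is (branch_kind, penalty_cycles); both the layout and the
# kind labels are assumptions for this sketch, not the driver's format.
BranchRecord = Tuple[str, int]


def mppki(records: Iterable[BranchRecord], kind: str, instructions: int) -> float:
    """Misprediction penalty cycles per 1000 instructions for one branch kind."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    total_penalty = sum(cycles for k, cycles in records if k == kind)
    return 1000.0 * total_penalty / instructions


# Made-up sample: correctly predicted branches carry a penalty of 0 cycles.
sample = [
    ("conditional", 0),
    ("conditional", 20),
    ("indirect", 35),
    ("conditional", 0),
    ("indirect", 0),
]
instructions_in_run = 10_000  # hypothetical instruction count

cond_score = mppki(sample, "conditional", instructions_in_run)
ind_score = mppki(sample, "indirect", instructions_in_run)
print(cond_score, ind_score)
```

With this made-up sample, the conditional score is 2.0 and the indirect score is 3.5. Keeping the two branch kinds separate mirrors the two final scores the driver reports.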

The driver provides the predictor with both static and dynamic information about the instructions and micro-ops in the trace. A static instruction includes one or more static micro-ops. For each micro-op, the following static information is passed to the predictor: the instruction's program counter, the micro-op's type (BR_CONDITIONAL, BR_INDIRECT, BR_CALL, LOAD, STORE, etc.), and its register source and destination specifiers. Dynamic information (results, load and store addresses, and branch outcomes) is made available at different pipeline stages.
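To make the split between static and dynamic information concrete, the sketch below models a micro-op record and a very small bimodal (2-bit saturating counter) conditional-branch predictor that consumes it. It is not the framework's interface or its example predictor: `UopType` mirrors only the type names listed above, while `StaticUop`, `BimodalPredictor`, and their methods and fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Tuple


class UopType(Enum):
    # Only the types named in the text; the real list is longer ("etc.").
    BR_CONDITIONAL = auto()
    BR_INDIRECT = auto()
    BR_CALL = auto()
    LOAD = auto()
    STORE = auto()


@dataclass(frozen=True)
class StaticUop:
    """Static information, known as soon as the micro-op is seen."""
    pc: int
    uop_type: UopType
    src_regs: Tuple[int, ...] = ()
    dst_regs: Tuple[int, ...] = ()


class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by low PC bits."""

    def __init__(self, log2_entries: int = 12) -> None:
        self.mask = (1 << log2_entries) - 1
        self.table = [2] * (1 << log2_entries)  # start weakly taken

    def predict(self, uop: StaticUop) -> bool:
        # Uses static information only: the program counter.
        return self.table[uop.pc & self.mask] >= 2

    def update(self, uop: StaticUop, taken: bool) -> None:
        # Uses dynamic information: the resolved branch outcome.
        i = uop.pc & self.mask
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


branch = StaticUop(pc=0x400123, uop_type=UopType.BR_CONDITIONAL)
predictor = BimodalPredictor()
outcomes = [False, False, False, True]  # made-up dynamic outcomes
predictions = []
for taken in outcomes:
    predictions.append(predictor.predict(branch))
    predictor.update(branch, taken)
print(predictions)
```

In this sketch the prediction is made from static information alone, and the table is trained later, once the dynamic branch outcome is available. In the real framework the point at which each piece of dynamic information becomes visible depends on the pipeline stage.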

The organizers believe that this framework allows the implementation of most published predictor algorithms while leaving some room for innovation. We cannot provide the training and reference traces needed for profile-based predictors, and the traces do not contain wrong-path information, which unfortunately excludes some predictors.