Data access delay is a major bottleneck in utilizing current high-end computing (HEC) machines. Prefetching, where data is fetched before CPU demands for it, has been considered as an effective solution to masking data access delay. However, current client-initiated prefetching strategies, where a computing processor initiates prefetching instructions, have many limitations. They do not work well for applications with complex, non-contiguous data access patterns. While technology advances continue to increase the gap between computing and data access performance, trading computing power for reducing data access delay has become a natural choice. In this paper, we present a server-based data-push approach and discuss its associated implementation mechanisms. In the server-push architecture, a dedicated server called Data Push Server (DPS) initiates and proactively pushes data closer to the client in time. Issues, such as what data to fetch, when to fetch, and how to push are studied. The SimpleScalar simulator is modified with a dedicated prefetching engine that pushes data for another processor to test DPS based prefetching. Simulation results show that L1 Cache miss rate can be reduced by up to 97% (71% on average) over a superscalar processor for SPEC CPU2000 benchmarks that have high cache miss rates.
- Cache memory
- Performance measurement
- Simulation of multiple-processor system