As a four-person team, we implemented four hardware prefetchers from research papers in C++ for Texas A&M University's 2024 Electrical Engineering Capstone.

The prefetcher I implemented, Timing-SKID (T-SKID) [6], achieved up to 91% accuracy, reduced Row Hammer vulnerability by up to 90%, and delivered up to 55% Instructions Per Cycle (IPC) improvement. The team won an award at the Texas A&M Capstone.

Background

  • Enhance Texas A&M's prefetcher survey by adding more prefetchers to the list
  • Collect data and compare it against the original research papers to validate our implementations

Prefetching

Prefetching is a technique that loads data into the cache before it’s needed. By predicting future data requirements, prefetching improves program execution speed.
Figure 1: Prefetchers explained [1]


Row Hammer

Row Hammer is a memory vulnerability where frequent access to a memory row causes bit flips in adjacent rows. This issue is worsened by prefetchers, as repeated access to specific rows increases the risk of data corruption.
Figure 2: Cache layout [2]

Traces

A trace is a collection of data access patterns. Different prefetchers exploit different patterns, which may range from completely random to highly predictable.
Figure 3: Texas A&M Trace Speedup Survey [3]


T-SKID: Timing-SKID Prefetcher

The T-SKID prefetcher, which I implemented, is designed to delay prefetching until the optimal moment. This strategic delay prevents unnecessary cache misses and improves cache efficiency.


Design

  • Implemented core data structures:
    • Target Table (TT)
    • Address Prediction Table (APT)
    • Inflight Prefetch Table (IPT)
    • Recent Request Program Counter Queue (RRPCQ)
  • Followed the research paper’s methodology for learning addresses, timing, and issuing prefetches.

Figure 4: T-SKID Issuing a prefetch (a), and learning timing (b) [6]


T-SKID Individual Results

When running traces from the T-SKID research paper [6], the following results were observed:

  • Comparison to Research Paper: Results varied due to differences in software versions, but the general pattern of improvement over the baseline matched.
  • IPC Speedup: Significant improvement on patterned traces, up to about 22%. Randomized traces like MCF showed limited gains, as expected.
  • Accuracy: High accuracy on workloads with discernible patterns, up to 91%.
  • Hammer Rate: Row activations were managed effectively, in line with expected behavior. The graph is in Figure 9.

Figure 5: IPC Speedup in single-core configuration
Figure 6: T-SKID Accuracy


Team Contributions

Beyond T-SKID, the team implemented and validated additional prefetchers:

  • LSTM-based Prefetcher (Nathaniel Bush): A neural network to predict memory access offsets.
  • Tag Correlating Prefetcher (TCP) (John Iler): Utilized tag sequences to predict cache accesses.
  • Managed Irregular Stream Buffer (MISB) (Garvit Dhingra): A structure-based temporal approach using custom caches.

Integrated Results

The team combined results to analyze:

  • Speedup: Measurable IPC improvements over the benchmarks; the goal is an IPC higher than the baseline.
  • Accuracy: The ratio of useful to useless prefetches, which shows how aggressive each prefetcher was.
  • Hammer Rate: Row activations compared to a no-prefetching baseline; the goal is to keep this number as low as possible.

Figure 7: Integrated Results for IPC Speed Up
Figure 8: Integrated Results for Accuracy

Figure 9: Integrated Results for Hammer Rate


Conclusion

My prefetcher achieved the highest IPC and the lowest hammer rate of the prefetchers we compared, while retaining strong accuracy.

Furthermore, we were able to achieve our original goals:
1. Improved the sponsor’s Row Hammer survey.
2. Successfully validated prefetcher implementations with research papers.
3. Achieved performance speedups and analyzed row activation trends.
4. Developed prefetchers that provide a foundation for further research.


References