HCIP: Hybrid Short Long History Table-based Cache Instruction Prefetcher


Swapnita Srivastava
P.K. Singh

Abstract

Instruction cache misses have become a performance bottleneck in modern applications, and numerous prefetchers have been developed to hide memory latency. Today's client and server workloads have large instruction working sets. These working sets are typically small enough to fit in the Last Level Cache (LLC), but the Level 1 Instruction (L1-I) cache suffers a high miss rate, which frequently starves the processor front-end of instructions. Instruction prefetching is a latency-hiding technique that moves instructions from the LLC into the L1-I cache before they are needed. Prefetching instructions into the L1-I cache is therefore a fundamental approach to designing a high-performance cache architecture. Accuracy and coverage are the most important parameters to consider when developing an efficient and effective prefetcher. This paper proposes a novel Hybrid Short Long History Table-based Cache Instruction Prefetcher (HCIP) for the L1-I cache. HCIP uses a hybrid configuration of two history-based prefetcher tables: the Long History Table (LST) and the Short History Table (SHT). The PRE+PC table used in HCIP holds the transitive closure of the control flow graph. Compared with PIPS and NOPREF, HCIP achieves a maximum coverage of 67% on the majority of the given benchmarks.
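As a rough illustration of the two parameters named above, the sketch below computes coverage and accuracy under their commonly used definitions: coverage as the fraction of baseline misses removed by prefetching, and accuracy as the fraction of issued prefetches that turn out to be useful. The function name `prefetch_metrics` and the counts in the usage example are hypothetical and are not taken from the paper's evaluation; the paper may define or measure these quantities differently.

```python
def prefetch_metrics(baseline_misses, useful_prefetches, issued_prefetches):
    """Return (coverage, accuracy) as fractions in [0, 1].

    Assumed definitions (common in the prefetching literature, not
    confirmed as the paper's own):
      coverage = useful prefetches / misses without any prefetcher
      accuracy = useful prefetches / prefetches issued
    """
    # Guard against division by zero when a run has no misses or prefetches.
    coverage = useful_prefetches / baseline_misses if baseline_misses else 0.0
    accuracy = useful_prefetches / issued_prefetches if issued_prefetches else 0.0
    return coverage, accuracy


# Hypothetical counts, chosen only to show what a coverage of 0.67 means.
coverage, accuracy = prefetch_metrics(
    baseline_misses=1000,    # L1-I misses with no prefetcher
    useful_prefetches=670,   # prefetched lines that were later demanded
    issued_prefetches=900,   # all prefetches sent to the LLC
)
print(f"coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

With these made-up counts, 670 of 1000 baseline misses are eliminated, while 230 of the 900 issued prefetches bring in lines that are never used.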


How to Cite
Srivastava, S., & Singh, P. K. (2022). HCIP: Hybrid Short Long History Table-based Cache Instruction Prefetcher. International Journal of Next-Generation Computing, 13(3). https://doi.org/10.47164/ijngc.v13i3.758
