Extending Lifetime Reliability Model for Multi-Threaded Architectures


Harini Sriraman
Pattabiraman Venkatasubbu

Abstract

As processor technology nodes scale down, the hardware reliability of the processor under aging emerges as a significant design constraint. It is currently estimated that aging servers in a data center environment ideally need to be replaced every three years, which incurs a huge cost. Under this scenario, it is essential to explore methods to delay the aging of processor cores. In this paper, we analytically establish the relation between the lifetime of a processor core and its multi-threaded workload. This paper answers the question 'how much multi-threading for how much reduction in aging?'. To answer it, we propose an analytical model that extends the relation between multi-threading and the failure rate of a processor core. To measure the delay in aging of the processor, we use the Aging Factor derived in our analytical model. We analyze the aging factor of the processor core at the granularity of structural units for different applications. Based on the proposed analytical model, a software tool, AgeEstimate, is designed that takes power and temperature values for a single-threaded environment as input and estimates the aging factor for a multi-threaded environment. The results obtained from AgeEstimate for an ALPHA-based out-of-order processor core are analyzed in this paper. With multi-threading, the instantaneous Mean Time to Fail (MTTF) for a workload can reduce by up to 3%. Considering a base MTTF of 1 billion hours, the improvement due to multi-threading will be around 3 × 10^7 hours.
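
The closing figure follows from simple arithmetic, assuming the 3% change applies directly to the stated base MTTF. The symbols here are illustrative and not taken from the paper: $\mathrm{MTTF}_{\text{base}}$ is the base value and $\Delta\mathrm{MTTF}$ is the change attributed to multi-threading.

```latex
\[
\Delta\mathrm{MTTF}
  = 0.03 \times \mathrm{MTTF}_{\text{base}}
  = 0.03 \times 10^{9}\ \text{hours}
  = 3 \times 10^{7}\ \text{hours}
\]
```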


How to Cite
Harini Sriraman, & Pattabiraman Venkatasubbu. (2018). Extending Lifetime Reliability Model for Multi-Threaded Architectures. International Journal of Next-Generation Computing, 9(1), 51–65. https://doi.org/10.47164/ijngc.v9i1.140
