To effectively design a computer system for the worst case power consumption scenario, system architects often use hand-crafted maximum power consuming benchmarks at the assembly language level. A comparison with an exhaustive exploration of a large power design space demonstrates that StressMaker is very effective in automatically generating stressmarks in a limited amount of time.
Leveraging this abstract workload modeling approach, we propose StressMaker, a framework that uses machine learning for the automated generation of stressmarks. This paper demonstrates that with a suitable choice of only 40 hardware-independent program characteristics related to the instruction mix, instruction-level parallelism, control flow behavior, and memory access patterns, it is possible to generate a synthetic benchmark whose performance relates to that of general-purpose and commercial applications. A synthetic program that can be tuned to produce a variety of benchmark characteristics would significantly help in addressing this problem by enabling the automatic exploration of the large temperature and power design space.
However, manually developing and tuning so called stressmarks is extremely tedious and time-consuming while requiring an intimate understanding of the processor. Typical benchmark suites used in performance evaluation do not stress the processor to its limit though, and current practice in industry is to develop artificial benchmarks that are specifically written to generate maximum processor (component) activity. Overall, we can identify extreme behaviors in microprocessor workloads with a three orders of magnitude speedup and an error of a few percent on average.Įstimating the maximum power and thermal characteristics of a processor is essential for designing its power delivery system, packaging, cooling, and power/thermal management schemes. We identify a wide range of extreme behaviors, such as max energy, max power, max CPI, max branch misprediction rate, and max cache miss rate stress patterns. Second, we find that threshold clustering is a better alternative than k-means clustering, which is typically used in representative sampling, for finding stress patterns. First, although representative sampling is slightly less effective in characterizing average behavior than statistical sampling, it is substantially more effective in finding stress patterns. For doing so, we borrow from sampled simulation theory and we provide two key insights. This paper closes the gap between these two extremes by studying techniques for the automated identification of stress patterns (worst-case or extreme application behaviors) in typical workloads. Overall, we can identify extreme energy and power behaviors in microprocessor workloads with a three orders of magnitude speedup with an error of a few percent on average. This paper closes the gap between these two extremes by studying techniques for the automated identification of stress patterns (worst-case application behaviors) in typical workloads. These workloads range from benchmarks that represent typical behavior up to hand-tuned stress benchmarks (so called stressmarks) that stress the microprocessor to its extreme power consumption. Understanding the power characteristics of a microprocessor under design requires a careful study using a variety of workloads. Power consumption has emerged as a key design concern across the entire computing range, from low-end embedded systems to high-end supercomputers.