Enhancing GPU Energy Efficiency with an Asymmetric Streaming Multiprocessor Architecture

Supachai Thongsuk, Prabhas Chongstitvatana


Abstract

Graphics Processing Units (GPUs) significantly enhance computational performance through parallel processing but often suffer from energy inefficiency due to resource underutilization, particularly in memory-bound workloads. Conventional GPUs are typically designed with symmetric streaming multiprocessors operating under a unified frequency domain, which limits their ability to adapt to diverse workload requirements. To address this limitation, this paper proposes an Asymmetric Streaming Multiprocessor (ASM) architecture that partitions streaming multiprocessors into high-frequency and low-frequency clusters. A neural-network-based classifier analyzes static Parallel Thread Execution (PTX) code to predict the most suitable cluster for each application at compile time. This approach eliminates runtime profiling overhead and enables efficient workload-aware mapping. Experimental evaluations on standard benchmark applications demonstrate that ASM reduces execution time by 49%, lowers power consumption by 39%, and improves energy efficiency by 124% compared with conventional Dynamic Voltage and Frequency Scaling. For comparison, prior work by SSAGA achieved about a 20% improvement in energy efficiency by customizing streaming multiprocessors for different voltage–frequency domains, and achieved further gains with workload-aware scheduling and power gating. These findings indicate that the proposed ASM architecture constitutes a practical and scalable approach to enhancing GPU performance and energy efficiency.

Keywords: asymmetric streaming multiprocessor, GPU energy efficiency, static source code analysis, machine learning