Tournament Warp Scheduling: A Dynamic Policy Selector Exploiting Sub-cores in GPUs
Jiyong Jeong
Sungbin Jang
Seokin Hong
Warp scheduling policies have been extensively studied to improve GPU performance, and each policy is optimized for different workload characteristics. However, current GPUs typically implement only a single scheduling policy in hardware, either fixed or adjustable through minor parameter tuning, which limits performance across diverse applications. We propose \emph{Tournament Warp Scheduling}, a lightweight hardware mechanism that dynamically selects the most suitable warp scheduling policy at runtime. Our design exploits the sub-core structure of modern GPUs by initially assigning a different warp scheduling policy to each of an SM’s four warp schedulers. After a short evaluation period, all schedulers within an SM adopt the best-performing policy. Our approach improves performance by up to 10.6% compared to a static scheduler, while comparing policies in parallel and adding only minimal wiring overhead among schedulers. While our evaluation focuses on four conventional policies (LRR, GTO, SWL, TLS), the tournament mechanism can be flexibly extended to support a broader range of existing or future warp scheduling strategies.
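The selection step described above can be illustrated with a minimal software sketch. It is not the hardware design itself: the per-scheduler issue count used as the comparison metric, the names `SubCoreScheduler` and `run_tournament`, and the sample numbers are all assumptions introduced for illustration, since the abstract does not specify how "best-performing" is measured.

```python
from dataclasses import dataclass

# The four policies named in the abstract, one per sub-core warp scheduler.
POLICIES = ("LRR", "GTO", "SWL", "TLS")


@dataclass
class SubCoreScheduler:
    policy: str
    issued: int = 0  # hypothetical metric: instructions issued in the window


def run_tournament(issue_counts):
    """Pick the winning policy for one SM after an evaluation period.

    issue_counts holds one (assumed) instruction-issue count per sub-core,
    in the same order as POLICIES.
    """
    if len(issue_counts) != len(POLICIES):
        raise ValueError("expected one count per sub-core scheduler")

    # Tournament phase: each sub-core runs a different policy.
    schedulers = [SubCoreScheduler(policy) for policy in POLICIES]
    for scheduler, count in zip(schedulers, issue_counts):
        scheduler.issued = count

    # Selection phase: compare the counters and pick the largest.
    # Ties resolve to the policy listed first in POLICIES.
    winner = max(schedulers, key=lambda s: s.issued).policy

    # Adoption phase: every scheduler in the SM switches to the winner.
    for scheduler in schedulers:
        scheduler.policy = winner
        scheduler.issued = 0
    return winner, schedulers


# Made-up counts for illustration only; GTO has the largest count here.
winner, schedulers = run_tournament([120, 340, 200, 310])
```

With these illustrative counts, `winner` is `"GTO"` and all four schedulers end up running GTO. In hardware, the comparison would be performed on counters exchanged among the sub-cores rather than on a Python list.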
Keywords