Leveraging Chiplet-Locality for Efficient Memory Mapping in Multi-Chip Module GPUs
Junhyeok Park
Sungbin Jang
Osang Kwon
Yongho Lee
Seokin Hong
While the multi-chip module (MCM) design allows GPUs to scale compute and memory capabilities through multi-chip integration, it also introduces memory system non-uniformity, particularly when a thread accesses resources in remote chiplets. This work specifically investigates the impact of page size in memory mapping on this non-uniformity. While large pages are advantageous in reducing address translation overhead by covering larger memory regions per page, they also enforce coarse-grained data placement, potentially leading to data misallocation across chiplets. In contrast, small pages allow for finer-grained placement, increasing the likelihood of mapping data to the chiplet most likely to access it.
This paper introduces CLAP, a scheme that determines the proper page size—specifically, the level of page contiguity—for each application. CLAP observes that GPU applications exhibit a distinct memory mapping pattern in which specific groups of pages are primarily accessed by the same chiplet, a property referred to as chiplet-locality. Notably, the group size tends to remain consistent within each data structure. Leveraging this insight, CLAP predicts which groups of pages exhibit chiplet-locality and pre-organizes them within the chiplet predicted to access them, forming a region that functions as a large page. Therefore, CLAP achieves the benefits of large pages without compromising memory locality. Our evaluation shows that CLAP improves performance by up to 19.2% over previous paging schemes, including an 11.9% improvement over an ideal NUMA-aware paging scheme.