ScaleUPC: A UPC Compiler for Multi-Core Systems

Proceedings of the Third Conference on Partitioned Global Address Space Programming Models, pages 11:1--11:8. New York, NY, USA, ACM, 2009.
DOI: 10.1145/1809961.1809976

Abstract

Since multi-core computers began to dominate the market, enormous efforts have been spent on developing parallel programming languages and/or their compilers to target this architecture. Although Unified Parallel C (UPC), a parallel extension to ANSI C, was originally designed for large-scale parallel computers and cluster environments, its partitioned global address space programming model makes it a natural choice for a single multi-core machine, where the main memory is physically shared. This paper builds a case for UPC as a feasible language for multi-core programming by providing an optimizing compiler, called ScaleUPC, which outperforms other UPC compilers targeting SMPs.

As the communication cost for remote accesses is removed because all accesses are physically local in a multi-core, we find that the overhead of pointer arithmetic on shared data accesses becomes a prominent bottleneck. The reason is that directly mapping the UPC logical memory layout to physical memory, as used in most of the existing UPC compilers, incurs prohibitive address calculation overhead. This paper presents an alternative memory layout, which effectively eliminates the overhead without sacrificing the UPC memory semantics. Our research also reveals that the compiler for multi-core systems needs to pay special attention to the memory system. We demonstrate how the compiler can enforce static process/thread binding to improve cache performance.
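To make the address-calculation bottleneck concrete: in UPC, a declaration such as shared [B] double A[N] distributes the array block-cyclically across THREADS threads, so a straightforward translation of every access A[i] must recompute which thread owns the element and where it sits in that thread's partition. The C sketch below illustrates the divide/modulo chain such a per-access calculation entails; the helper names and the constants THREADS and B are assumptions for illustration, not ScaleUPC code.

    /* Sketch of the per-element arithmetic implied by UPC's block-cyclic
     * layout for "shared [B] double A[N]" across THREADS threads.
     * Helper names are illustrative, not part of any UPC compiler's API. */
    #include <stdio.h>

    #define THREADS 4   /* assumed static thread count */
    #define B       8   /* assumed block size of the shared array */

    /* Thread that owns element i (its affinity). */
    static int upc_owner(long i) {
        return (int)((i / B) % THREADS);
    }

    /* Offset of element i inside the owner's local partition. */
    static long upc_local_offset(long i) {
        long block = i / ((long)B * THREADS);  /* full distribution rounds before i */
        long phase = i % B;                    /* position inside its block */
        return block * B + phase;
    }

    int main(void) {
        for (long i = 0; i < 20; i++)
            printf("A[%ld] -> thread %d, local offset %ld\n",
                   i, upc_owner(i), upc_local_offset(i));
        return 0;
    }

On a multi-core machine, where every partition lives in the same physical memory, this arithmetic buys nothing and is pure overhead on each shared access, which is what motivates the alternative memory layout the abstract describes.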
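The abstract's other point, static process/thread binding, amounts to pinning each UPC thread to a fixed core so that its working set stays in that core's cache instead of migrating with the thread. The following minimal Linux sketch uses pthread_setaffinity_np to show the general technique; it is a generic illustration, not the ScaleUPC runtime.

    /* Minimal sketch of static thread-to-core binding on Linux.
     * Compile with: cc -pthread pin.c */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    #define NTHREADS 4   /* assumed number of worker threads/cores */

    static void *worker(void *arg) {
        int id = (int)(long)arg;
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(id, &set);   /* bind thread id to core id */
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        printf("thread %d pinned to core %d\n", id, id);
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        return 0;
    }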
