Abstract
In this paper we introduce the Concurrent Collections programming
model, which builds on past work on TStreams [8]. In this model,
programs are written in terms of high-level application-specific
operations. These operations are partially ordered only according to
their semantic constraints. These partial orderings correspond to data
flow and control flow.
This approach supports an important separation of concerns. There are
two roles involved in implementing a parallel program. One is the role of
a domain expert, the developer whose interest and expertise is in the
application domain, such as finance, genomics, or numerical analysis. The
other is the tuning expert, whose interest and expertise is in performance,
including performance on a particular platform. These may be distinct
individuals or the same individual at different stages in application
development. The tuning expert may in fact be software (such as a static
or dynamic optimizing compiler). The Concurrent Collections programming
model separates the work of the domain expert (the expression of
the semantics of the computation) from the work of the tuning expert
(selection and mapping of actual parallelism to a specific architecture).
This separation simplifies the task of the domain expert. Writing in this
language does not require any reasoning about parallelism or any
understanding of the target architecture. The domain expert is concerned
only with his or her area of expertise (the semantics of the application).
This separation also simplifies the work of the tuning expert. The tuning
expert is given the maximum possible freedom to map the computation
onto the target architecture and is not required to have any
understanding of the domain (as is often the case for compilers).
We describe two implementations of the Concurrent Collections
programming model. One is Intel® Concurrent Collections for C/C++,
based on Intel® Threading Building Blocks. The other is an X10-based
implementation from the Habanero project at Rice University. We compare
the implementations by showing the results achieved on multi-core SMP
machines when executing the same Concurrent Collections application,
Cholesky factorization, in both these approaches.