© 1996 by British Computer Society
A Scalable Shared Queue on a Distributed Memory Machine
1 Scalable Systems and Algorithms Group, School of Computer Studies, The University, Leeds LS2 9JT, UK, 2 School of Computer Studies, The University, Leeds LS2 9JT, UK
The emergence of low latency, high throughput routers means that network locality issues no longer dominate the performance of parallel algorithms. One of the key performance issues is now the even distribution of work across the machine, as the problem size and number of processors increase. This paper describes the implementation of a highly scalable shared queue, supporting the concurrent insertion and deletion of elements. The main characteristics of the queue are that there is no fixed limit on the number of outstanding requests and the performance scales linearly with the number of processors (subject to increasing network latencies). The queue is implemented using a general-purpose computational model, called the WPRAM. The model includes a shared address space which uses weak coherency semantics. The implementation makes extensive use of pairwise synchronization and concurrent atomic operations to achieve scalable performance. The WPRAM is targeted at the class of distributed memory machines which use a scalable interconnection network.
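The abstract names two ingredients, concurrent atomic operations and pairwise synchronization, without giving the algorithm itself. As a rough illustration only, the sketch below shows one common way those two ideas can combine into an unbounded queue: each request takes a ticket with an atomic fetch-and-add, and the enqueuer and dequeuer holding the same ticket then synchronize with each other alone. Everything here is an assumption made for illustration: the names `FetchAndAdd`, `TicketQueue` and `run_demo` are hypothetical, Python threads stand in for processors, a lock emulates the atomic operation, and none of it should be read as the WPRAM implementation described in the paper.

```python
import threading


class FetchAndAdd:
    """Hypothetical stand-in for an atomic fetch-and-add cell (emulated with a lock)."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_add(self, amount=1):
        # Return the old value and add `amount`, as one indivisible step.
        with self._lock:
            old = self._value
            self._value += amount
            return old


class _Slot:
    """Meeting point for exactly one enqueuer and one dequeuer."""

    def __init__(self):
        self.ready = threading.Event()  # pairwise synchronization flag
        self.item = None


class TicketQueue:
    """Illustrative unbounded FIFO queue; not the paper's algorithm."""

    def __init__(self):
        self._tail = FetchAndAdd()  # next enqueue ticket
        self._head = FetchAndAdd()  # next dequeue ticket
        self._slots = {}            # ticket -> _Slot, created on demand
        self._slots_lock = threading.Lock()

    def _slot(self, ticket):
        # Whichever side arrives first creates the slot; the other finds it.
        with self._slots_lock:
            return self._slots.setdefault(ticket, _Slot())

    def enqueue(self, item):
        slot = self._slot(self._tail.fetch_add())
        slot.item = item
        slot.ready.set()  # wake only the dequeuer holding the same ticket

    def dequeue(self):
        ticket = self._head.fetch_add()
        slot = self._slot(ticket)
        slot.ready.wait()  # blocks until the matching enqueue has happened
        with self._slots_lock:
            del self._slots[ticket]  # slot is no longer needed
        return slot.item


def run_demo(workers=4, items_each=50):
    """Run `workers` producers and `workers` consumers; return the sorted items."""
    queue = TicketQueue()
    results = []
    results_lock = threading.Lock()

    def produce(base):
        for i in range(items_each):
            queue.enqueue(base + i)

    def consume():
        for _ in range(items_each):
            value = queue.dequeue()
            with results_lock:
                results.append(value)

    # Consumers are started first, so some dequeue requests are outstanding
    # before any item has been inserted.
    threads = [threading.Thread(target=consume) for _ in range(workers)]
    threads += [
        threading.Thread(target=produce, args=(w * items_each,))
        for w in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)
```

Because slots are created on demand, nothing in this sketch bounds the number of waiting dequeue requests or buffered items. The two shared counters are a single point of contention here; how the paper avoids that kind of bottleneck on a distributed memory machine is not something the abstract specifies.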
Received August 14, 1995; revised July 23, 1996.