Thomas E. Anderson, David E. Culler, David A. Patterson, and the NOW team. A Case for Networks of Workstations: NOW. In Principles of Distributed Computing, August 1994
N. Adiga, G. Almasi et al. An Overview of the BlueGene/L Supercomputer. Proc. SC 2002.
Per Stenstrom. A Survey of Cache Coherence Schemes for Multiprocessors. IEEE Computer, pages 12-24, June 1990.
John Hennessy and David Patterson. Computer Architecture: A Quantitative Approach (2nd Edition). Chapter 8 (Multiprocessors). Morgan Kaufmann.
John B. Carter, John K. Bennett, and Willy Zwaenepoel. Implementation and Performance of Munin. SOSP 1991, p. 152-164.
Honghui Lu, Alan Cox, R. Rajamony, Willy Zwaenepoel, and Sandhya Dwarkadas. Compiler and Software Distributed Shared Memory Support for Irregular Applications. Proceedings of the Sixth PPOPP, Las Vegas, NV, June 1997.
E. Pinheiro et al. S-DSM for Heterogeneous Machine Architectures. Second Workshop on Software Distributed Shared Memory, May 2000.
Honghui Lu et al. Contention elimination by replication of sequential sections in distributed shared memory programs. In PPOPP, 2001.
A. Itzkovitz and A. Schuster. MultiView and Millipage - Fine-Grain Sharing in Page-Based DSMs. In OSDI, 1999.
Todd C. Mowry, Monica S. Lam, and Anoop Gupta. Design and Evaluation of a Compiler Algorithm for Prefetching. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 62-73, October 1992.
David Padua and Michael J. Wolfe. Advanced Compiler Optimizations for Supercomputers. Communications of the ACM, 29(12), December 1986.
Ken Kennedy and Kathryn McKinley. Optimizing for Parallelism and Data Locality. International Conference on Supercomputing, 1992.
Mike Voss and Rudi Eigenmann. High-level adaptive program optimization with ADAPT. In PPOPP, 2001
Alexandru Salcianu and Martin Rinard. Pointer and escape analysis for multithreaded programs. In PPOPP, 2001
Lawrence Rauchwerger and David Padua. The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization. Proceedings of the SIGPLAN'95 Conference on Programming Language Design and Implementation, June 1995, La Jolla, CA, pp. 218-232.
Raja Das, Mustafa Uysal, Joel Saltz, Yuan-Shin Hwang. Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures. Journal of Parallel and Distributed Computing, 1993.
Ernie Chan, William Gropp, Rajeev Thakur, and Robert van de Geijn. Collective Communication on Architectures that Support Simultaneous Communication over Multiple Links. PPOPP 2006.
Chao Huang, Gengbin Zheng, Sameer Kumar, and Laxmikant Kale. Performance Evaluation of Adaptive MPI. PPOPP 2006.
Sotiris Ioannidis and Sandhya Dwarkadas. Compiler and Run-Time Support for Adaptive Load Balancing in Software Distributed Shared Memory Systems. Fourth Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers, May 1998
Donald G. Morris III and David Lowenthal. Accurate Data Redistribution Cost Estimation in Software Distributed Shared Memory Systems. PPOPP 2001.
Ken Kennedy and Uli Kremer. Automatic Data Layout for Distributed Memory Machines. ACM Transactions on Programming Languages and Systems (TOPLAS), 20 (4), ACM Press, 1998.
Jaydeep Marathe and Frank Mueller. Hardware Profile-guided Automatic Page Placement for ccNUMA Systems. PPOPP 2006.
Dimitrios S. Nikolopoulos, Theodore S. Papatheodorou, Constantine D. Polychronopoulos, Jesus Labarta, and Eduard Ayguade. A Case for User-Level Dynamic Page Migration. ICS '00
Gosia Wrzesinska, Jason Maassen, and Henri Bal. Self-adaptive Applications on the Grid. PPOPP 2007.
T. Heath, B. Diniz, E. V. Carrera, W. Meira Jr., and R. Bianchini. Energy Conservation in Heterogeneous Server Clusters.
Robert Springer, Barry Rountree, David Lowenthal, and Vince Freeh. Minimizing Execution Time in MPI Programs on an Energy-Constrained, Power-Scalable Cluster. 11th ACM Symposium on Principles and Practice of Parallel Programming (PPOPP), March 2006
Cezary Dubnicki, Liviu Iftode, Edward W. Felten, and Kai Li. Software Support for Virtual Memory-Mapped Communication. Intl. Parallel Processing Symposium, April 1996.
Vincent W. Freeh, David K. Lowenthal, and Gregory R. Andrews. Distributed Filaments: Efficient Fine-Grain Parallelism on a Cluster of Workstations. First Symposium on Operating Systems Design and Implementation, p. 201-212, Monterey, CA, November 14-17, 1994
A. Radulescu et al. CPR: Mixed Task and Data Scheduling for Distributed Systems. In IPDPS, May 2001.
T. Gross, D. O'Hallaron, and J. Subhlok. Task Parallelism in a High Performance Fortran Framework. IEEE Parallel and Distributed Technology: Systems and Applications, 1994.
R. Bordawekar et al. A Model and Compilation Strategy for Out-of-core Data Parallel Programs. In PPOPP '95.
Sadaf Alam, Pratul Agarwal, Al Geist, and Jeffrey Vetter. Performance characterization of bio-molecular simulations using molecular dynamics. PPOPP 2006.
Jeff Vetter and Michael McCracken. Statistical scalability analysis of communication operations in distributed applications. In PPOPP, 2001
J.S. Vetter and P. Worley. Asserting Performance Expectations. Proc. SC 2002.
J.S. Vetter. Dynamic Statistical Profiling of Communication Activity in Distributed Applications. Proc. SIGMETRICS: Joint International Conference on Measurement and Modeling of Computer Systems
J.S. Vetter. Performance Analysis of Distributed Applications using Automatic Classification of Communication Inefficiencies. Proc. ACM Int'l Conf. Supercomputing
Michael Scott and William Scherer. Scalable queue-based spin locks with timeout. In PPOPP, 2001