MIMD
All materials posted here are for personal use only.
Material will be added incrementally throughout the Spring 2022 semester.
Shared Memory Programming
From very low level to very high level....
-
MIMDPI.tgz
-
A little collection of different MIMD code to compute Pi
-
OpenMP (aka, OMP)
-
Here are my OpenMP overview
slides, as presented in class. OMP pragmas are understood
by recent GCC releases (GOMP is built in), but must be enabled
by giving -fopenmp on the gcc command line, with no
other special options; my Pi computation example for OMP is mppi.c. Normally,
environment variables (such as OMP_NUM_THREADS) are used to control things like how many
threads to create.
-
Mutex (exclusive lock) vs. Semaphore (signaling mechanism)
-
Don't yet have a great reference for this, but they're everywhere.
Basic Mutex operations are lock(m) and unlock(m),
with many implementations.
Basic Semaphore operations are classically called P and V (wait and signal).
The simplest counting semaphore would be something like
void p(semaphore *s) { while (*s <= 0) ; --*s; }
and void v(semaphore *s) { ++*s; } -- keeping in mind that a real
implementation must make the test and decrement atomic.
-
Futexes
-
Many short, yet still confusing, descriptions of Futexes are
available; here's probably the best early overview (PDF). The
catch is that various Linux kernels have different
futex() implementations with 4, 5, or 6 arguments.
-
Barrier synchronization
-
There are various atomic counter algorithms; alternatively, here is the GPU
SyncBlocks algorithm from my Magic Algorithms page.
-
That's basically the same as used in The Aggregate Function API: It's Not Just For PAPERS Anymore
-
Direct use of System V shared memory
-
My System V shared memory version of the Pi computation is shmpi.c -- note that
this version uses raw assembly code to implement a lock, which
has far less overhead than using the System V OS calls (unless
you're counting on the OS to schedule based on who's waiting for
what)
-
POSIX Threads
-
POSIX Threads (pthreads) is now a standard library included in
most C/C++ compilation environments, linked via -lpthread
under Linux GCC; my Pi computation example for pthreads is pthreadspi.c.
-
UPC (unified parallel C)
-
UPC (Unified Parallel C) is an
extension of the C language, and hence requires a special
compiler. There are several UPC compilers; the fork of GCC
called GUPC must be installed as described at the project
homepage (on my systems, it is installed at
/usr/local/gupc/bin/gupc). My Pi computation example
for UPC is upcpi.upc; compilation
is straightforward, but the executable produced processes some
command line arguments as UPC controls, for example, -n
is used to specify the number of processes to create.
Basic MIMD Architecture & Concepts
A little about historically how this has evolved...
-
Fetch-&-Add in the NYU Ultracomputer
-
A. Gottlieb,
R. Grishman,
C.P. Kruskal,
K.P. McAuliffe,
L. Rudolph, and
M. Snir, "The NYU Ultracomputer -- Designing an MIMD Shared Memory Parallel
Computer," in IEEE Transactions on Computers, vol. C-32, no. 2, pp. 175-189,
Feb. 1983. doi: 10.1109/TC.1983.1676201 (URL,
local copy)
-
"An Overview of the NYU Ultracomputer Project (1986)"
(PDF) is a better, but more obscure, reference
-
Explanation of the "Hot Spot" problem for RP3
-
G. F. Pfister and V. A. Norton, "'Hot spot' contention
and combining in multistage interconnection networks," in IEEE Transactions on
Computers, vol. C-34, no. 10, pp. 943-948, Oct. 1985.
(URL, local copy)
-
Memory consistency models
-
"Shared Memory Consistency Models: A Tutorial"
(PDF) -- Sarita Adve has done quite a few versions of this
sort of description
-
Modern atomic memory access instructions
-
AMD64 atomic instructions
-
Futexes
-
Many short, yet still confusing, descriptions of Futexes are
available; here's probably the best early overview (PDF). The
catch is that various Linux kernels have different
futex() implementations with 4, 5, or 6 arguments.
-
Transactional memory
-
Transactional Memory has been a hot idea for quite a while.
Intel's Haswell processors incorporate a hardware implementation
described in chapter 8 of this
PDF (locally, PDF); but there were (and still are) problems.
-
Wikipedia has a nice summary of software support for transactional memory.
-
There is a version of software transactional memory implemented in GCC.
-
Replicated/Distributed Shared Memory
-
A very odd one is implemented in AFAPI as Replicated Shared Memory
-
The best known is TreadMarks, out of Rice University
-
One of the latest is DEX: Scaling Applications Beyond Machine Boundaries, which is part of
Popcorn Linux
Distributed Memory Programming
-
One-page MPI reference card
-
This one-page reference card I wrote isn't everything you need to know about MPI,
but it'll do for most things....
-
MPICH (MPI over CHameleon)
-
One of the earliest complete MPI implementations, MPICH was layered on top of another library,
and hence had some performance issues. The latest versions are highly tuned and no longer
suffer significant layering costs.
-
OpenMPI
-
This is one of many MPI implementations. It grew out of LAM/MPI, which was more efficient than MPICH
at the time, and OpenMPI arguably still is.
GPU and Multi-Core Computing