References: Performance & Supercomputers

This semester the material on performance and supercomputers is merged -- after all, supercomputing is all about performance. The new slides are here.

From performance analysis, the key ideas are really the tabular breakdown of expected instruction execution counts, CPIs, and clock period; the concepts of real, user, and system time; the different types of benchmarks; and Amdahl's law. Here are a few other interesting things about performance:
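To make the first and last of those ideas concrete, here is a minimal sketch (with made-up instruction counts, CPIs, and clock rate; these numbers are illustrative, not from any real machine) of the tabular CPU-time computation and Amdahl's law:

```python
# Hypothetical instruction mix: {class: (dynamic count, CPI)}.
mix = {
    "ALU":        (50_000_000, 1),
    "load/store": (30_000_000, 2),
    "branch":     (20_000_000, 3),
}
clock_period = 0.5e-9  # 2 GHz clock, i.e., 0.5 ns per cycle

# CPU time = (sum over classes of count * CPI) * clock period
total_cycles = sum(count * cpi for count, cpi in mix.values())
cpu_time = total_cycles * clock_period
print(f"{total_cycles} cycles -> {cpu_time * 1e3:.1f} ms")

# Amdahl's law: overall speedup when a fraction f of the work
# is accelerated by a factor s (the rest runs unchanged).
def amdahl(f, s):
    return 1.0 / ((1.0 - f) + f / s)

# Even with 90% of the work parallelized across 10 units,
# the serial 10% caps the speedup well below 10x.
print(f"speedup: {amdahl(0.9, 10):.2f}x")
```

Note how the serial fraction dominates: as s grows without bound, the speedup limit is 1/(1-f), which is why Amdahl's law matters so much for the supercomputing material below.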

Of course, supercomputing might be fundamentally about larger-scale use of parallel processing, but parallel processing is really the key to high performance in any modern computer system. The amount of parallel processing used inside a typical cell phone exceeds that used in most supercomputers less than three decades ago. Looking at supercomputers gives a glimpse of what's coming to more mundane systems sooner than you'd expect.

The textbook places emphasis on shared memory multiprocessors (SMP stuff) and cache coherence issues. We covered a bit of that here, and coherence basics in memory systems, but it's a small piece of the whole pie because these systems really don't scale very large. That said, AMD is pushing 256 cores.

More generally, you should be aware of SIMD (including GPUs and the not-so-scalable SWAR/vector models) and MIMD, and also the terms Cluster, Farm, Warehouse Scale Computer, Grid, and Cloud. In the discussion of interconnection networks, you should understand Latency, Bandwidth, and Bisection Bandwidth; network topologies including Direct Connections (the book calls these "fully connected"), Toroidal Hyper-Meshes (e.g., Rings, Hypercubes), Trees, Fat Trees, and Flat Neighborhood Networks (FNNs); and the roles of Hubs, Switches, and Routers. The concept of quantum computing as a form of parallel processing without using parallel hardware was also very briefly introduced.
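To see why bisection bandwidth separates these topologies, here is a small sketch comparing the worst-case bisection link counts for a ring, a hypercube, and a fully connected (direct connection) network, using the standard textbook formulas; the 100 Gb/s per-link figure is a hypothetical stand-in, not a real network spec:

```python
# Bisection bandwidth = (links cut by the worst-case halving of the
# machine) * per-link bandwidth.  Per-link bandwidth is hypothetical.
LINK_GBPS = 100

def bisection_links(topology, n):
    """Links crossing the worst-case bisection of an n-node network."""
    if topology == "ring":
        # Cutting a ring into two halves severs exactly 2 links.
        return 2
    if topology == "hypercube":
        # For n = 2^d nodes, n/2 links cross any dimension-aligned cut.
        return n // 2
    if topology == "fully_connected":
        # Each of the n/2 nodes in one half links to all n/2 in the other.
        return (n // 2) ** 2
    raise ValueError(f"unknown topology: {topology}")

for topo in ("ring", "hypercube", "fully_connected"):
    links = bisection_links(topo, 64)
    print(f"{topo:16s} 64 nodes: {links:5d} links = {links * LINK_GBPS} Gb/s")
```

For 64 nodes this gives 2, 32, and 1024 bisection links respectively: the ring's bisection bandwidth is constant no matter how big the machine grows, which is exactly the scaling problem richer topologies like fat trees and FNNs are designed to fix.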

You will find a lot of information about high-end parallel processing at aggregate.org. Professor Dietz and the University of Kentucky have long been leaders in this field, so Dietz has written quite a few documents that explain all aspects of this technology. One good, but very old, overview is the Linux Documentation Project's Parallel Processing HOWTO; a particularly good overview of network topologies appears in this paper describing FNNs.

A quick summary of what things look like in Spring 2024:

One last note: Tesla's Full Self Driving Chip is a great example of supercomputing moving into mass-market devices.


CPE380 Computer Organization and Design.