Grading PAPERS

A computer engineering professor's research into "aggregate function communications" has some eye-popping potential.

The Engineers Behind PAPERS: Professor Hank Dietz (standing, right) and his current graduate students (clockwise) Soohong Kim, Tim Mattox, and Ray Hoare.

Zooming in for a close-up view of Purdue's Ross-Ade Stadium.

Mention "grading papers" to most college faculty, and they'll respond with a frown. They usually consider it the drudge work tainting their overall labor of love: pushing forward the frontiers of knowledge and passing it on to future generations.

Not so with Hank Dietz, associate professor of electrical and computer engineering at Purdue. For him, PAPERS - of the uppercase kind - is a reason for smiles and excitement. Dietz's PAPERS translates to "Purdue's Adapter for Parallel Execution and Rapid Synchronization," and so far it's earning very high marks.

PAPERS is custom hardware, developed by Dietz and a dozen or so graduate students, that allows a cluster of unmodified PCs or workstations to function together as a unit capable of the kinds of processing normally reserved for expensive parallel-processing supercomputers.

"Thus far," says Dietz, "I would give our work an A-plus in demonstrating the use of clustered PCs to make a parallel supercomputer, and an A in demonstrating our new model of computation."

A new model of computation - that's what PAPERS is really all about.

"We did not start out to build a piece of hardware," Dietz says. "In about 1987 we developed compiler technology that made it easy to automatically parallelize programs. So you could take an ordinary program - one that didn't explicitly say to do things simultaneously - and enable it to do things in parallel."

Then came an inevitable bump in the road that led to PAPERS. As Dietz explains, "The code the compiler would generate had certain constructs that most parallel supercomputers do not implement efficiently. In particular, they couldn't handle something called barrier synchronization [see sidebar]. So we decided to build our own hardware that implemented what we wanted."

Big technology, small size: Component parts for PAPERS are no larger than a toaster but combine for powerful computation.

Physically, PAPERS looks humble enough. It's a box no bigger than a toaster, containing a not-too-scary number of component chips. Its simplest version plugs into the parallel port of a computer or workstation - the same place where you normally hook up a printer.

Since 1994, PAPERS has evolved through 15 generations of prototypes and has proved to be remarkably utilitarian.

"We were surprised to find that not only did the ideas work, but also this simple implementation was actually useful," says Dietz. "We really had no expectations for tying together PCs this way. It wasn't going to be anything more than an academic exercise."

Part of the reason for PAPERS' success is its speed. Although the parallel printer port interface is not glamorous, the latency (waiting period) is about as low as you can get.

"It takes about 1 microsecond for the processor in a PC to access the parallel port, whereas things like SCSI controllers and Ethernet cards tend to take at least tens of microseconds," says Dietz.

In the world of computing, of course, microseconds are eternities.

PAPERS in action can be thrilling to watch, too. At least one application of this new technology represents some of the most visually stunning research being performed in Purdue engineering.

PAPERS In Action: A composite color image of the Jovian moon Io, under real-time analysis by a viewer.

When you walk into the Parallel Processing Laboratory, you come face to face with a "four-by-four video wall" composed of sixteen 20-inch color monitors driven by a cluster of Pentium II PCs donated to the project by Intel. This set-up demonstrates, on a small scale, one of the potential uses of PAPERS hardware. For example, Dietz can:

As you might expect, these exercises are more than fun and games. They illustrate the way PAPERS enhances the computing power of the machines it links together, creating speed and display resolution you can't get on a single computer.

According to Dietz, two things distinguish the PAPERS video walls: "One, to increase resolution, we simply add more machines, and two, as we add those machines, we're also adding compute power to drive each of the display units. That's not been possible before."
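The scaling Dietz describes can be pictured with a small sketch. It is only an illustration, not the lab's software: the function names, the row-by-row assignment of machines to monitors, and the 640-by-480 per-monitor resolution are all assumptions made for the example (the article does not give the monitors' resolution).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """Pixel rectangle of the full image that one machine draws."""
    x: int
    y: int
    width: int
    height: int


def wall_resolution(cols, rows, tile_width, tile_height):
    """Total resolution of the wall: it grows with every machine added."""
    return cols * tile_width, rows * tile_height


def tile_for_machine(index, cols, rows, tile_width, tile_height):
    """Map a machine number to its tile, assuming row-by-row numbering."""
    if not 0 <= index < cols * rows:
        raise ValueError("machine index outside the wall")
    row, col = divmod(index, cols)
    return Tile(col * tile_width, row * tile_height, tile_width, tile_height)


if __name__ == "__main__":
    # Hypothetical 4-by-4 wall of 640x480 monitors.
    print(wall_resolution(4, 4, 640, 480))
    print(tile_for_machine(5, 4, 4, 640, 480))
```

Because each machine computes only its own tile, adding a row or column of monitors adds both pixels and the processors needed to fill them.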

Video-wall technology may be coming soon to a theater near you. The motion-picture industry is especially interested, because audiences crave higher-quality pictures, and current projection technology is bumping up against its limits.

Dietz says a movie theater's conventional projection system could be inexpensively replaced with a video-wall setup that would offer high-quality digital video on a single seamless screen with rear projection cells behind it.

"You wouldn't see any of the defects you are used to seeing in standard film projection," Dietz says. "It wouldn't matter if the film has been playing for weeks. It would still be a pristine copy, without dust, dirt, and scratches - essentially a first-generation master, compared to what you now see, which is often a tenth-generation copy. Jitter would not be a problem, and there is no flicker using LCD projectors."

What excites Dietz most about the cinematic application of PAPERS technology is its potential for interactivity.

"You could have a much more interactive version of a movie like Clue," he suggests, "with the audience voting as the show progresses and the computer responding in real time by projecting different video. In general, you could weave throughout an entire film different alternative video sequences, with branching decisions made in real time. There even can be enough compute power available to add new elements to the video, such as animated characters customized to the audience.

"You could also have directional audio cues, motion control, and other special effects - like smoke in the theater - coordinated to projected images. Right now you can coordinate those things only by a prescripted arrangement. But using the new technology, all these things could be coordinated in real time by monitoring and responding to the audience."

Up to now, PAPERS research has proceeded along two lines. One has aimed to make the technology easy for other people to replicate, and the other focuses on building more sophisticated versions for specialized research purposes.

In the near future, Dietz hopes to demonstrate that aggregate function communication is as fundamental as message-passing or shared memory.

"We haven't yet explored the full range of what can be done using this model of computing," he says. "Now we are attempting to put more sophisticated aggregate functions into hardware, to stretch the model and see how far it will go. It's not clear that there is an end in sight."

Sidebar: Reducing the Barriers

A major limit on the effectiveness of parallel computing - and a major breakthrough of PAPERS - revolves around an operation known as barrier synchronization.

In parallel computing, multiple processors simultaneously work on different portions of a single problem. To yield correct results, the actions of the individual processors must be coordinated.

Barrier synchronization ensures that no processor gets too far ahead of the others. An extension of barrier synchronization, aggregate functions allow a single network operation to simultaneously sample data from all processors. This gives every processor easy access to global properties of the computation relative to the problem being solved, despite the fact that the computation is spread across multiple processors.
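For readers who want to see the idea in code, here is a minimal software sketch. It uses ordinary Python threads on one machine to stand in for the processors of a cluster, so it imitates the behavior only; it says nothing about how the PAPERS hardware or the AFAPI library works internally. The worker count and the choice of a sum as the global property are assumptions made for the example.

```python
import threading

NUM_WORKERS = 4  # hypothetical number of processors
barrier = threading.Barrier(NUM_WORKERS)
local_values = [0] * NUM_WORKERS   # one slot per worker
global_sums = [0] * NUM_WORKERS    # what each worker sees after the aggregate


def worker(rank):
    # Each worker computes its own piece of the problem.
    local_values[rank] = (rank + 1) * 10
    # Barrier: no worker continues until all have published a value.
    barrier.wait()
    # Aggregate step: every worker obtains the same global property.
    global_sums[rank] = sum(local_values)
    # Second barrier: nobody starts a new phase while others still read.
    barrier.wait()


def run():
    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(global_sums)


if __name__ == "__main__":
    print(run())  # [100, 100, 100, 100]
```

Without the first barrier, a fast worker could compute the sum before a slow one had written its value, and the workers would disagree about the global result.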

"Ethernet or other networks typically require at least hundreds of microseconds to send a simple message from one processor to another," Dietz says. Operations like barrier synchronization or aggregate functions must be accomplished using many of these slow one-to-one communications.

PAPERS hardware not only reduces time per network operation, but also reduces the number of network operations needed by supporting many-to-many aggregate function communications.
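A rough back-of-the-envelope sketch shows how the two savings compound. It rests on stated assumptions: 100 microseconds per one-to-one message (the low end of "at least hundreds of microseconds"), a barrier built in software by passing messages up and back down a binary tree (a common approach, assumed here rather than taken from the article), and about 3 microseconds for one PAPERS aggregate operation, the figure Dietz cites for the hardware, treated here as constant regardless of cluster size.

```python
import math

POINT_TO_POINT_US = 100.0   # assumed cost of one network message
PAPERS_AGGREGATE_US = 3.0   # Dietz's figure for one aggregate operation


def tree_barrier_us(num_processors, per_message_us=POINT_TO_POINT_US):
    """Estimated barrier time using one-to-one messages in a binary tree.

    Signals are combined up the tree, then the release is sent back down,
    so the number of sequential message rounds is 2 * ceil(log2(N)).
    """
    if num_processors < 2:
        return 0.0
    rounds = 2 * math.ceil(math.log2(num_processors))
    return rounds * per_message_us


def estimated_speedup(num_processors):
    """Ratio of the message-based estimate to one aggregate operation."""
    return tree_barrier_us(num_processors) / PAPERS_AGGREGATE_US


if __name__ == "__main__":
    for n in (4, 16):
        print(n, tree_barrier_us(n), round(estimated_speedup(n)))
```

Under these assumptions a 16-processor barrier costs about 800 microseconds over a conventional network, several hundred times the hardware figure. The exact ratio depends entirely on the assumed numbers.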

"With our hardware, it takes only about three microseconds to have every processor communicate, not just one processor to another but all processors to all other processors," Dietz explains. "That difference is significant enough to qualitatively change how we can use a cluster of PCs for supercomputing."

For Additional Information

Hardware designs for the simpler versions of PAPERS are available online, as are several versions of the support software called AFAPI (Aggregate Function Application Program Interface). Dozens of institutions worldwide are using PAPERS and the AFAPI libraries.

The original article was written by Sig Kriebel for the Summer 1998 issue of Purdue Engineering Extrapolations, which was distributed free of charge to more than 65,000 Purdue alumni and friends. For further information about Extrapolations and/or use of this article, please contact:

Schools of Engineering
Purdue University
1280 Engineering Administration Bldg.
West Lafayette, IN  47907-1280
