Assignment 2: Nearly Perfect Hash Functions

The goal of this OpenACC project is to modify nphf.c so that it can obtain some speedup by using a GPU.

Nearly Perfect Hash Functions

What is nphf.c? It's a program that searches for a nearly perfect hash function. What's that? Well, a perfect hash function implements a 1:1 and onto mapping -- in other words, a bijection. Think of it this way: a bijective hash function maps every domain value to a unique range value using a hash table which has precisely as many entries as there are input (domain) values. Now in truth, we don't really benefit from the mapping being 1:1 (injective); in fact, if two domain values map to the same range value, perhaps we can use a smaller hash table by having them share an entry. We also don't care too much if the hash table has a small number of unused entries -- that is, if the mapping isn't onto (surjective). What we do care about is having very few collisions: pairs of domain values that should map to different range values but end up hashing to the same bucket.

Perfect hash functions are thus not quite the target. What we really want is a type of nearly perfect hash function that is not only conflict-free (or at least has very few collisions), but also very cheap to compute. There are quite a few tools for creating PHFs, usually from strings. For example, GPerf is quite flexible. However, here we want really cheap NPHFs. Using nphf.c, we'll find them by heavily-pruned exhaustive search.

This is all explained fairly well in Coding Multiway Branches Using Customized Hash Functions, which is about how a similar approach was implemented on a MasPar MP1 SIMD supercomputer.

NPHF

NPHF is a simple C program that is driven by a table of hash function forms to search:

n               0       0
n*n             0       0
n*n*n           0       0
n<<i            1       logd
(n>>i)^n        1       logd
(n^i)           mind    maxd
(n>>i)+n        1       logd
(-n)^i          mind    maxd
n-(n<i)         mind    maxd
n>>(n&i)        1       logd
n>>i            1       logd
n-(n>i)         mind    maxd
(-n)>>i         1       logd
i-n             mind    maxd
n^(n>i)         mind    maxd
(n*i)^n         2       maxd
(n>>i)*n        1       logd
n+(n<i)         mind    maxd
n%i             tabsize maxd
(n*i)*n         2       maxd
ones(n|i)^n     1       maxd
(n+i)^n         mind    maxd
(n^i)%n         mind    maxd
n^(n<i)         mind    maxd
-(n^i)          mind    maxd
((~n)%i)        tabsize maxd
ones(n^i)+n     mind    maxd
ones(n>>i)^n    0       logd
ones(n>>i)-n    0       logd
(~n)%i          tabsize maxd
(n/i)^n         2       maxd

This table is in a text file called forms. Each line is an entry consisting of the pattern for the hash function, the minimum i value to try, and the maximum i value to try. For example, n<<i means to take the domain value, n, and shift it left by i positions, where i is a constant between 1 and logd (the log of the difference between the minimum and maximum domain values). To ensure that the result from n<<i is in the range of valid hash table index values, the actual formula used is (n<<i)%tablesize. However, if the table size is a power of 2, that simplifies to (n<<i)&(tablesize-1).

Of course, those forms are not directly C code to do the search. The program that converts them into C code is mkforms.sh, which is actually a simple Awk script. Running that script produces a C code file called forms.h:

./mkforms.sh forms.h

The forms.h file is included in the middle of the driver code in nphf.c. There is also a set of #defines needed to specify how costly each operation in a hash computation is; those are included from costs.h. The costs basically specify how expensive each operation is so the minimum-cost solution can be selected. The measure of expense is just an integer, so you can use whatever scale you want. Note that TABCOST(X) is supposed to specify the cost associated with using a table with X entries, so the cost there is not only execution time for the masking or modulus to make values fit in the table, but should also include some cost to discourage having a larger table. In a similar way, COLLIDE is supposed to be a pseudo-cost that biases in favor of conflict-free hash functions, but a low value will allow a cheap hash function with some collisions to be selected over an expensive collision-free one.

There is a Makefile, but to build nphf, simply compile the C program:

cc nphf.c -o nphf

The program is run simply by feeding it a list of pairs of numbers on stdin. The first example from the MasPar MP1 paper is test0:

110 0
300 1
1200 2
9600 3

That can be processed by executing ./nphf <test0 to produce output like:

110 -> 0 *
300 -> 1 *
1200 -> 2 *
9600 -> 3 *
Cost 2067 (2 collisions): (n)&3
Cost 1047 (1 collisions): ((n>>1)^n)&3
Cost 23 (0 collisions): ((n>>3)^n)&3
Cost 21 (0 collisions): (n>>7)&3
19009 hashes evaluated

In addition to the pre-made test cases, there is also a shell script, mktest.sh, that can make random test mappings. It takes three command line arguments: the number of pairs in the domain, the maximum domain value (they'll be between 0 and that), and the maximum range value (they'll be between 0 and that).

Functional Requirements

Your task is simply to get some speedup by editing and recompiling this code to use OpenACC. If you are a graduate student, you will be expected to give a short presentation in the final exam timeslot on what you did to improve performance.

Start by grabbing NPHF.tgz. To use OpenACC, you'll have to add some OpenACC #pragma and compile with a command line that looks something like:

gcc nphf.c -fopenacc -foffload=nvptx-none -O3 -foffload="-O3" -o accnphf

The compile line shown above basically enables OpenACC pragma processing and says code should be generated to use a generic NVIDIA GPU as the accelerator hardware. Why call the executable program accnphf? Because I want your makefile to still be able to build nphf from the same source file(s) so we can directly compare outputs and run times with/without the GPU acceleration. The regular and accelerated versions might pick different hash functions, but that should only happen if both have the same cost.

Note that you probably also have to create a modified version of mkforms.sh to include OpenACC pragma in the C code it generates. I'd suggest having the OpenACC version of forms.h be called accforms.h. I'd recommend using a compile command line option like -DUSEGPU, combined with #ifdef USEGPU in your C code, to enable/disable parts of your C code for the normal or accelerated targets.

Due Dates

This project is due before the final exam timeslot. That means before 10:30AM on Tuesday, December 11, 2018.

Submission Procedure

For each project, you will be submitting a tarball (i.e., a file with the name ending in .tar or .tgz) that contains all things relevant to your work on the project. Minimally, each project tarball includes the source code for the project and a semi-formal "implementors notes" document as a PDF named notes.pdf. Your implementors notes for this project must include your observations about the execution time improvement using the GPU via OpenACC pragma; your observations can be stated in paragraph form, as a table, or as a graph of an appropriate type. For this particular project, place everything in the NPHF directory and submit a tarball of that directory.

Submit your tarball below. The file can be either an ordinary .tar file created using tar cvf file.tar yourprojectfiles or a compressed .tgz file created using tar zcvf file.tgz yourprojectfiles. Be careful about using * as a shorthand in listing yourprojectfiles on the command line, because if the output tar file is listed in the expansion, the result can be an infinite file (which is not ok).
