Gr8BOnd Reference Material

There's this guy named John Gustafson. He's done a lot of things in the high-performance computing world, and for much of that time, he's been unhappy about floating-point arithmetic. Really unhappy. So unhappy that in 2015 he wrote a book about it: The End of Error: Unum Computing. Unums, "Universal Numbers," as he defines them, come in three types. It's the third type that seems most appealing, and it's called a Posit.

Posits are rather like floating-point values, but they have another field: the regime. The regime is a variable-length run of bits, so it determines how many of the remaining bits are left for the exponent and mantissa. This little trick allows representable values to be spaced closer together where it matters most and further apart where it doesn't. There is also the lovely concept of precisely representable numbers having values "between" them, which really isn't how floating-point is viewed. He tends to point people at his paper, Beating Floating Point at its Own Game: Posit Arithmetic, for the details, but he's also got a nice set of slides on this, Beyond Floating Point: Next-Generation Computer Arithmetic, and a C++ library implementation, bfp - Beyond Floating Point.
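To make the regime idea concrete, here is a minimal Python sketch of decoding a posit bit pattern. It is not taken from the paper or the bfp library; the function name decode_posit is made up for illustration, and the number of exponent bits (es) is left as a parameter because the text above does not fix it (the default of 0 is an assumption).

```python
from fractions import Fraction

def decode_posit(bits, nbits=8, es=0):
    """Decode an nbits-wide posit bit pattern into an exact Fraction.

    es (number of exponent bits) is an assumption, not something this
    document specifies. Returns None for NaR, the one non-real pattern.
    """
    mask = (1 << nbits) - 1
    bits &= mask
    if bits == 0:
        return Fraction(0)
    if bits == 1 << (nbits - 1):
        return None  # NaR: sign bit set, everything else zero
    negative = bool(bits >> (nbits - 1))
    if negative:
        bits = (-bits) & mask  # negative posits are stored as two's complement
    # Everything after the sign bit, as a string of '0'/'1' characters.
    body = format(bits, f"0{nbits}b")[1:]
    # Regime: a run of identical bits, ended by the opposite bit (or the end).
    first = body[0]
    run = len(body) - len(body.lstrip(first))
    k = run - 1 if first == "1" else -run
    rest = body[run + 1:]  # skip the bit that terminated the run, if any
    # Exponent: up to es bits; bits pushed off the end count as zeros.
    exp_bits = rest[:es].ljust(es, "0")
    exponent = int(exp_bits, 2) if es else 0
    # Whatever is left is the fraction (mantissa) with a hidden leading 1.
    frac_bits = rest[es:]
    if frac_bits:
        fraction = Fraction(int(frac_bits, 2), 1 << len(frac_bits))
    else:
        fraction = Fraction(0)
    scale = k * (1 << es) + exponent
    value = (1 + fraction) * Fraction(2) ** scale
    return -value if negative else value
```

With es=0, decode_posit(0x40) is 1 and decode_posit(0x41) is 1 + 1/32, while decode_posit(0x7E) and decode_posit(0x7F) are 32 and 64. A short regime near 1 leaves five fraction bits; a long regime at the extremes leaves none, which is exactly the spacing trade-off described above.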

So, the processor this semester is Gr8BOnd, pronounced "Great Beyond." It is a journey into beyond-floating-point arithmetic, with heavy emphasis on 8-bit Posits. I know, you're thinking: what could 8-bit floats be good for? Well, honestly, nothing. However, 8-bit Posits tend to behave surprisingly similarly to 16-bit floats for many applications -- especially implementing neural networks, which is something everybody wants to do these days.... Hey, there's even a song recorded by R.E.M. about the Gr8BOnd (yeah, that's not really about this, but I couldn't resist).

Gr8BOnd Overview

Instruction set design is hard. Prof. Dietz has designed dozens of instruction sets in the three decades he's been a professor, and it still isn't easy for him to get things right. Thus, rather than giving you complete freedom to design your own instruction set, we're going to walk through the design logic for a reasonably well-crafted one that he built specifically for Spring 2020 CPE480. However, this design is not complete -- each team of students must devise its own encoding of the instructions and its own implementation. This document only covers the design principles, assembly language, and functionality.

So, what the heck is Gr8BOnd? Well, it's a little processor designed to support lots of 8-bit Posit arithmetic in a form that should be useful for implementing neural networks. Basically, it's a 16-bit machine implementing 8-bit Posit SWAR (SIMD Within A Register) in which each 16-bit word can be either a single value or a vector of two 8-bit fields. You're probably thinking two elements do not make a very long word, and you're right, but this is CPE480 and we're trying to keep things simple. Data memory (the .data segment) is organized as 65536 16-bit words with word addresses -- not byte addresses. Similarly, each Gr8BOnd instruction takes one 16-bit word. Instructions are kept in a word-addressed .text segment that can hold 65536 instructions.
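To see what "a single value or a vector of two 8-bit fields" means in practice, here is a small Python sketch contrasting addi with addii on the same 16-bit words. The function names mirror the mnemonics, but the code is only an illustration; addii_swar uses a well-known masking trick to get both field sums out of one 16-bit add, and is not claimed to be how any Gr8BOnd implementation does it.

```python
MASK16 = 0xFFFF

def addi(d, s):
    """One 16-bit integer add: a carry out of bit 7 flows into bit 8."""
    return (d + s) & MASK16

def addii(d, s):
    """Two independent 8-bit adds: each field wraps around on its own."""
    hi = ((d >> 8) + (s >> 8)) & 0xFF
    lo = ((d & 0xFF) + (s & 0xFF)) & 0xFF
    return (hi << 8) | lo

def addii_swar(d, s):
    """Same result as addii, using a single 16-bit adder.

    The top bit of each field is masked off so no carry can cross the
    field boundary, then those top bits are patched back in with XOR.
    """
    partial = (d & 0x7F7F) + (s & 0x7F7F)
    return (partial ^ ((d ^ s) & 0x8080)) & MASK16

d, s = 0x01FF, 0x0101
print(hex(addi(d, s)))        # 0x300: the low-byte carry bumps the high byte
print(hex(addii(d, s)))       # 0x200: the low field wraps to 0x00 by itself
print(hex(addii_swar(d, s)))  # 0x200
```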

This instruction set is complete enough to implement whole programs, but it is missing privileged instructions, etc.

Changes for the last project are shown in RED, and a few integer conversions I forgot are shown in GREEN. Basically, we are replacing 16-bit Posits with 16-bit floats. However, that change causes a bit of a ripple... so we're adding a few instructions: f2pp, pp2f, and negf. The type conversion instructions are a little odd in that you would be converting between one float and two posits, so they instead merely work on one value, the low 8-bit posit field, which is replicated into the high 8-bit posit field for f2pp. I've also added a dup instruction, which was an omission that became obvious as I was writing the C compiler.

The Gr8BOnd Instruction Set

Enough introduction; let's charge into the Gr8BOnd!

Instruction   Description                             Functionality
addi $d, $s   Add 16-bit integers                     $d[15:0] += $s[15:0]
addii $d, $s  Add 8-bit integers                      $d[15:8] += $s[15:8]; $d[7:0] += $s[7:0]
addf $d, $s   Add 16-bit floats                       $d[15:0] += $s[15:0]
addpp $d, $s  Add 8-bit posits                        $d[15:8] += $s[15:8]; $d[7:0] += $s[7:0]
and $d, $s    bitwise AND 16-bit                      $d[15:0] &= $s[15:0]
anyi $d       bitwise ANY reduction, 16-bit integer   $d[15:0] = ($d[15:0] ? -1 : 0)
anyii $d      bitwise ANY reduction, 8-bit integers   $d[15:8] = ($d[15:8] ? -1 : 0); $d[7:0] = ($d[7:0] ? -1 : 0)
bnz $c, addr  Branch if Non-Zero                      if ($c[15:0] != 0) PC += (addr-PC)
bz $c, addr   Branch if Zero                          if ($c[15:0] == 0) PC += (addr-PC)
ci $d, c16    Constant 16-bit                         $d[15:0] = c16 // by shortest sequence of instructions
ci8 $d, c8    Constant 8-bit sign extended to 16-bit  $d[15:0] = ((c8 & 0x80) ? 0xff00 : 0) | (c8 & 0xff)
cii $d, c8    Constant 8-bit duplicated to 16-bit     $d[15:8] = c8; $d[7:0] = c8
cup $d, c8    Constant 8-bit to upper 8 bits          $d[15:8] = c8
dup $d, $s    Duplicate                               $d[15:0] = $s[15:0]
f2i $d        16-bit float to integer                 $d[15:0] = (int16)$d[15:0]
f2pp $d       16-bit float to 8-bit posits            $d[15:0] = {2{((posit8)$d[15:0])}}
i2f $d        16-bit integer to float                 $d[15:0] = (float16)$d[15:0]
ii2pp $d      8-bit integers to posits                $d[15:8] = (posit8)$d[15:8]; $d[7:0] = (posit8)$d[7:0]
invf $d       Reciprocal 16-bit float                 $d[15:0] = 1 / $d[15:0]
invpp $d      Reciprocal 8-bit posits                 $d[15:8] = 1 / $d[15:8]; $d[7:0] = 1 / $d[7:0]
jmp addr      Jump to 16-bit address                  PC = addr // by shortest sequence of instructions
jnz $d, addr  Jump Non-Zero to 16-bit address         if ($d != 0) PC = addr // by shortest sequence of instructions
jz $d, addr   Jump Zero to 16-bit address             if ($d == 0) PC = addr // by shortest sequence of instructions
jr $a         Jump Register                           PC = $a[15:0]
ld $d, $s     LoaD                                    $d[15:0] = memory[$s[15:0]]
muli $d, $s   Multiply 16-bit integers                $d[15:0] *= $s[15:0]
mulii $d, $s  Multiply 8-bit integers                 $d[15:8] *= $s[15:8]; $d[7:0] *= $s[7:0]
mulf $d, $s   Multiply 16-bit floats                  $d[15:0] *= $s[15:0]
mulpp $d, $s  Multiply 8-bit posits                   $d[15:8] *= $s[15:8]; $d[7:0] *= $s[7:0]
negf $d       Negate 16-bit float                     $d[15:0] = -$d[15:0]
negi $d       Negate 16-bit integer                   $d[15:0] = -$d[15:0]
negii $d      Negate 8-bit integers                   $d[15:8] = -$d[15:8]; $d[7:0] = -$d[7:0]
not $d        bitwise NOT 16-bit                      $d[15:0] = ~$d[15:0]
or $d, $s     bitwise OR 16-bit                       $d[15:0] |= $s[15:0]
pp2f $d       (low) 8-bit posit to 16-bit float       $d[15:0] = (float16)$d[7:0]
pp2ii $d      8-bit posits to integers                $d[15:8] = (int8)$d[15:8]; $d[7:0] = (int8)$d[7:0]
shi $d, $s    shift 16-bit                            $d[15:0] = (($s[15:0] > 0) ? ($d[15:0] << $s[15:0]) : ($d[15:0] >> -$s[15:0]))
shii $d, $s   shift 8-bit fields                      $d[15:8] = (($s[15:8] > 0) ? ($d[15:8] << $s[15:8]) : ($d[15:8] >> -$s[15:8]));
                                                      $d[7:0] = (($s[7:0] > 0) ? ($d[7:0] << $s[7:0]) : ($d[7:0] >> -$s[7:0]))
slti $d, $s   Set Less Than 16-bit integers           $d[15:0] = $d[15:0] < $s[15:0]
sltii $d, $s  Set Less Than 8-bit integers            $d[15:8] = $d[15:8] < $s[15:8]; $d[7:0] = $d[7:0] < $s[7:0]
st $d, $s     STore                                   memory[$s[15:0]] = $d[15:0]
trap          Trap to OS                              this does a bunch of things...
xor $d, $s    bitwise XOR 16-bit                      $d[15:0] ^= $s[15:0]
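The ci row says the constant is loaded "by shortest sequence of instructions." One plausible way an assembler could do that with ci8, cii, and cup is sketched below in Python. The helper expand_ci and its selection rules are assumptions made for illustration; only the three register-transfer functions follow the table.

```python
def ci8(c8):
    """ci8: 8-bit constant sign-extended to 16 bits."""
    c8 &= 0xFF
    return (0xFF00 if c8 & 0x80 else 0) | c8

def cii(c8):
    """cii: 8-bit constant copied into both 8-bit fields."""
    c8 &= 0xFF
    return (c8 << 8) | c8

def cup(d, c8):
    """cup: replace the upper 8 bits of d, keeping the lower 8 bits."""
    return ((c8 & 0xFF) << 8) | (d & 0xFF)

def expand_ci(reg, c16):
    """Hypothetical expansion of 'ci reg, c16' into real instructions."""
    c16 &= 0xFFFF
    hi, lo = c16 >> 8, c16 & 0xFF
    if ci8(lo) == c16:           # value survives sign extension: 1 instruction
        return [f"ci8 {reg}, {lo:#04x}"]
    if hi == lo:                 # both bytes equal: 1 instruction
        return [f"cii {reg}, {lo:#04x}"]
    # General case: load the low byte, then overwrite the high byte.
    return [f"ci8 {reg}, {lo:#04x}", f"cup {reg}, {hi:#04x}"]

print(expand_ci("$r0", 0xFFFE))  # ['ci8 $r0, 0xfe']
print(expand_ci("$r0", 0x4242))  # ['cii $r0, 0x42']
print(expand_ci("$r0", 0x1234))  # ['ci8 $r0, 0x34', 'cup $r0, 0x12']
```

The two-instruction case works because cup leaves $d[7:0] alone, so whatever sign extension ci8 put in the upper byte is simply overwritten.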

A few details about the above:

Determining how to encode the above instructions as bit patterns is a key part of your project. Some instructions specify one register while others specify two; there are even some instructions with a register and an 8-bit immediate value. For example, just encoding $d and c8 for the ci8 instruction requires 12 bits (4 bits to pick one of 16 registers plus 8 bits of immediate)... but the remaining 4 bits could only decode 16 instruction types, and there are a lot more than that -- 39 in all! Don't worry; you can still figure out an encoding scheme. In fact, there are lots of different, perfectly reasonable ways to encode this instruction set.
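One common way out of the "only 16 opcodes" problem is to let a few values of the 4-bit opcode field act as escapes, so that instructions without an immediate can reuse those 8 bits for a second register or a longer function code. The Python sketch below shows the idea. Every field position, opcode number, and table entry in it is hypothetical (including treating bz and bnz as carrying an 8-bit field), and it is only one of many reasonable schemes, not the expected answer.

```python
# Hypothetical layouts, all 16 bits wide:
#   immediate form:    [15:12]=opcode, [11:8]=$d, [7:0]=imm8
#   two-register form: [15:12]=escape, [11:8]=$d, [7:4]=$s, [3:0]=function
#   one-register form: [15:12]=0xF,    [11:8]=$d, [7:0]=function
IMM_OPS = {"ci8": 0x0, "cii": 0x1, "cup": 0x2, "bz": 0x3, "bnz": 0x4}
TWO_REG_OPS = {"addi": (0xE, 0x0), "addii": (0xE, 0x1), "xor": (0xE, 0x2)}
ONE_REG_OPS = {"negi": 0x00, "not": 0x01, "jr": 0x02}

def encode(op, d=0, s=0, imm8=0):
    """Pack one instruction into a 16-bit word using the layouts above."""
    if not (0 <= d < 16 and 0 <= s < 16):
        raise ValueError("register numbers must be 0..15")
    if op in IMM_OPS:
        return (IMM_OPS[op] << 12) | (d << 8) | (imm8 & 0xFF)
    if op in TWO_REG_OPS:
        escape, function = TWO_REG_OPS[op]
        return (escape << 12) | (d << 8) | (s << 4) | function
    if op in ONE_REG_OPS:
        return (0xF << 12) | (d << 8) | ONE_REG_OPS[op]
    raise ValueError(f"no encoding defined for {op!r}")

print(f"{encode('ci8', d=3, imm8=0x7F):04x}")  # 037f
print(f"{encode('addi', d=1, s=2):04x}")       # e120
print(f"{encode('not', d=5):04x}")             # f501
```

In this sketch each two-register escape value provides 16 function slots and the one-register escape provides 256, so a handful of top-level opcodes covers far more than 16 instruction types.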

By now you're probably thinking that Gr8BOnd is a pretty strange little instruction set. Well, it sort-of is. However, it's not as strange as you might think in that virtually every modern processor has SWAR support... and most are substantially more complex than this. However, that is not all the parallelism here. Remember how each instruction is only 16 bits long while data words are 32 bits wide? Well, later on you'll see that this instruction set is designed to make it easy to fetch two instructions each clock cycle for superscalar processing... but that's a topic for later in the course. :-)

The Gr8BOnd Registers

The Gr8BOnd instruction set only directly refers to 16 registers (although there might be more accessed by renaming). Just as in MIPS, each register has a number and a symbolic name. Those names should all be predefined using AIK's .const construct. In reality, none of the registers is special in the hardware, but we have names and suggested uses for all 16:

.const { r0, r1, r2, r3, r4, r5, r6, r7,
         r8, r9, r10, at, rv, ra, fp, sp }

The registers $r0 through $r10 would normally be used by the compiler or assembly-language programmer as general-purpose registers. The $at register is reserved for use as an assembler temporary in implementing pseudo-instructions. The $rv and $ra registers are used for the return value and return address in function calls. As you probably expected, $fp and $sp are intended to serve respectively as the frame pointer and stack pointer.

Well, that's about it.


CPE480 Advanced Computer Architecture.