A programmable processor — RISC-V#

Learning goals#

  • Apply features of the language to describe fundamental components of a programmable processor: register file, program counter, ALU, instruction memory

Introductory problem#

You are an FPGA engineer working in the research department of a tech company that builds data loggers for passenger trains. These loggers can be used in legal investigations, so the data inside must be signed. Most off-the-shelf embedded processors do not meet the data signing speed requirements, so you decide to accelerate data signing using a custom circuit on the FPGA. Moreover, you don’t want to buy an extra CPU chip, so you decide to implement both the programmable processor and the data signing accelerator on a single FPGA chip.

Instead of using available open-source RISC-V designs, you decide to implement your own design that can be reused in future products. After browsing the RISC-V instruction set manual, you feel overwhelmed by the number of instructions and decide to implement a very simple 32-bit RISC-V processor that can only add numbers. “This should give me a soft start,” you think. You choose the ADDI instruction, which adds an immediate value to a given source register.

You ask your AI assistant to come up with a simple program that uses ADDI and you get the following program:

addi t0, t0, -1
stop:
    j stop
ADDI

ADDI adds the sign-extended 12-bit immediate to register rs1. Arithmetic overflow is ignored and the result is simply the low XLEN bits of the result.

XLEN

We use the term XLEN to refer to the width of an integer register in bits (either 32 or 64)

“Of course I need an endless loop to finish execution — the processor always executes unless the clock stops!”, you remember. J is a jump to an address.

To test your program you search for a RISC-V simulator and stumble upon Ripes RISC-V simulator. You see that your program is translated to the following:

0:        fff28293        addi x5 x5 -1
0004 <stop>:
4:        0000006f        jal x0 0 <stop>

0 and 4 are the addresses of the two instructions you have provided. fff28293 and 0000006f are the machine-code encodings of the instructions, written in hexadecimal.

You notice that t0 was converted to x5. According to the assembler manual t0 stands for temporary register 0 and x5 is used as t0.
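To double-check the encoding by hand, a small software model can split the word fff28293 into its fields. The following is a minimal Python sketch, not part of the course code: the helper names sign_extend and decode_i_type are made up for illustration, and the field positions are taken from the I-type layout (imm[11:0], rs1, funct3, rd, opcode).

```python
# Reference model only: helper names are hypothetical, not part of the course code.
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def decode_i_type(word: int) -> dict:
    """Split a 32-bit I-type instruction word into its fields."""
    return {
        "opcode": word & 0x7F,          # bits 6:0
        "rd": (word >> 7) & 0x1F,       # bits 11:7
        "funct3": (word >> 12) & 0x7,   # bits 14:12
        "rs1": (word >> 15) & 0x1F,     # bits 19:15
        "imm": sign_extend(word >> 20, 12),  # bits 31:20, sign-extended
    }


fields = decode_i_type(0xFFF28293)
print(fields)  # rd and rs1 are both 5 (x5, alias t0), imm is -1
```

The decoded rd and rs1 values of 5 match the x5 shown by the simulator, and the immediate comes out as -1.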

Moreover you notice that j was converted to jal with x0 and 0. But how are x0 and 0 used?

JAL

The jump and link (JAL) instruction uses the J-type format, where the J-immediate encodes a signed offset in multiples of 2 bytes. The offset is sign-extended and added to the address of the jump instruction to form the jump target address. … JAL stores the address of the instruction following the jump (pc+4) into register rd.

Plain unconditional jumps (assembler pseudoinstruction J) are encoded as a JAL with rd=x0.

Pay attention to the three requirements marked in bold font and also to the following:

x0 is always zero

Register x0 is hardwired with all bits equal to 0.

So j stop simply jumps to PC + 0, even though the label stop is at address 4, because jumps are relative to the address of the current instruction, i.e., to the PC.
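The relative jump can be checked with a small software model. The following Python sketch (helper names are hypothetical) reassembles the scattered J-immediate bits and computes the jump target. The word 0xffdff06f is an extra example that does not appear in the program above; it is assumed here to be the encoding of a jump back by 4 bytes (jal x0, -4).

```python
# Reference model only: helper names are hypothetical, not part of the course code.
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def j_immediate(word: int) -> int:
    """Reassemble the J-type offset: imm[20|10:1|11|19:12] sits in bits 31:12."""
    imm20 = (word >> 31) & 0x1
    imm10_1 = (word >> 21) & 0x3FF
    imm11 = (word >> 20) & 0x1
    imm19_12 = (word >> 12) & 0xFF
    raw = (imm20 << 20) | (imm19_12 << 12) | (imm11 << 11) | (imm10_1 << 1)
    return sign_extend(raw, 21)  # bit 0 is always zero: offsets are multiples of 2


def jal_target(pc: int, word: int) -> int:
    """The offset is added to the address of the jump instruction itself."""
    return (pc + j_immediate(word)) & 0xFFFFFFFF


print(jal_target(4, 0x0000006F))  # jump at address 4 with offset 0 stays at 4
print(jal_target(4, 0xFFDFF06F))  # offset -4 would go back to address 0
```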

Imagine you want to decode an instruction. How will you differentiate between different instructions? Probably using a case statement. But you need the actual bits that make up each instruction: the opcodes. Manually coding dozens of opcodes is error-prone. Fortunately, the RISC-V Opcodes repository provides a tool that can generate a package with the constants for the opcodes. In the following you see an excerpt from this package:

Listing 26 code/riscv-single-cycle-only-addi-jal/riscv_instr.sv#
/* Automatically generated by parse_opcodes */
package riscv_instr;
  localparam [31:0] ADD                = 32'b0000000??????????000?????0110011;
  localparam [31:0] ADDI               = 32'b?????????????????000?????0010011;

You see that the constants include the symbol ?. This symbol acts as a don’t-care in a casez statement.

You find the generated file here: riscv_instr.sv. The file was generated using:

git clone https://github.com/riscv/riscv-opcodes
cd riscv-opcodes
make inst.sverilog EXTENSIONS="rv_i rv32_i"
mv inst.sverilog riscv_instr.sv

You finally start coding.

Requirements#

Complete the following module that can execute the simple addi program above:

Listing 27 code/riscv-single-cycle-only-addi-jal/mp.sv#
// verilator lint_off WIDTHEXPAND
import mp_pkg::*;
import riscv_instr::*;

module mp #(
    string MEM_INIT_HEX_FILE = "",  // Only used if non-empty
    int unsigned MEM_DEPTH_LG2 = 10  // In bytes
) (
    input clk,
    rst
);
  byte mem[2**MEM_DEPTH_LG2];
  word rf[RegfileSize], rfn[RegfileSize];  // Register file
  word pc, pcn;  // Program counter
  word inst;  // Current instruction

  assign inst = {mem[pc+3], mem[pc+2], mem[pc+1], mem[pc]};

  initial
    if (MEM_INIT_HEX_FILE.len != 0) begin
      $readmemh(MEM_INIT_HEX_FILE, mem);
      $display("Initialized memory using %s.", MEM_INIT_HEX_FILE);
    end

  always_ff @(posedge clk, posedge rst) begin
    pc <= rst ? 0 : pcn;
    rf <= rst ? '{default: '0} : rfn;
  end

  // Variables used in instruction parsing
  i_inst_t i_inst;
  j_inst_t j_inst;
  always_comb begin

Use the following package that defines structures and functions that help with instruction decoding:

Listing 28 code/riscv-single-cycle-only-addi-jal/mp_pkg.sv#
package mp_pkg;
  localparam int unsigned XLen = 32, RegfileSize = 32;
  typedef logic [XLen-1 : 0] word;

  // Instruction fields
  typedef logic [$clog2(RegfileSize)-1:0] reg_id_t;
  typedef logic [6:0] opcode_t;
  typedef logic [2:0] funct3_t;
  // `opcode`, `funct*` are only used for decoding, so they are not used as
  // inputs for the ALU. So these fields are named `notused*` in the
  // following instruction definitions.
  // Note that fields not used by the ALU have grey color in the ISA manual.

  // All field names chosen according to the ISA manual.
  typedef struct packed {
    logic [11:0] imm11_0;
    reg_id_t rs1;
    funct3_t notused1;
    reg_id_t rd;
    opcode_t notused2;
  } i_inst_t;  // Immediate

  typedef struct packed {
    logic [20:20] imm20;
    logic [10:1] imm10_1;
    logic [11:11] imm11;
    logic [19:12] imm19_12;
    reg_id_t rd;
    opcode_t notused;
  } j_inst_t;  // Jump
  // J instruction is a variant of U

  // verilator lint_off UNUSEDSIGNAL
  function automatic word assemble_j_imm(j_inst_t i);
    // verilator lint_on UNUSEDSIGNAL
    return {{12{i.imm20}}, i.imm19_12, i.imm11, i.imm10_1, 1'b0};
  endfunction
endpackage

For testing use the following testbench. Pay attention to the tested hierarchical names dut.* and choose the names of your variables accordingly.

Listing 29 code/riscv-single-cycle-only-addi-jal/tb.sv#
import util::*;
module tb;
  logic clk = 0, rst;  // Initialize clk so that it also toggles in 4-state simulators
  mp #("program.mem") dut (.*);

  always #1 clk = !clk;

  initial begin
    rst = 1;
    @(posedge clk);
    rst = 0;

    @(dut.pc == 4);
    // x5 must be -1
    assert (dut.rf[5] == -1);

    repeat (3) begin
      @(posedge clk);
      // We stay in the same address due to the Jump
      assert (dut.pc == 4);
    end

    $finish;
  end

  // x0 must be hardwired to zero
  always @(posedge clk) assert (dut.rf[0] == 0);

  initial dump_and_timeout;
endmodule

The testbench uses program.mem to initialize the instruction memory:

Listing 30 code/riscv-single-cycle-only-addi-jal/program.mem#
93
82
f2
ff

6f
00
00
00
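The byte order in program.mem follows from the little-endian layout that the module assumes when it assembles inst from mem[pc] to mem[pc+3]. As a rough sketch, the file contents can be derived from the instruction words with a few lines of Python (the function name words_to_mem_lines is made up for illustration):

```python
# Sketch: turn 32-bit instruction words into one-byte-per-line hex text,
# least significant byte first, as expected by a byte-wide $readmemh memory.
def words_to_mem_lines(words):
    lines = []
    for word in words:
        for byte in word.to_bytes(4, "little"):
            lines.append(f"{byte:02x}")
    return lines


program = [0xFFF28293, 0x0000006F]  # addi x5, x5, -1 and jal x0, 0
print("\n".join(words_to_mem_lines(program)))
```

The printed lines are the eight byte values of the listing above; the blank line in the listing only separates the two instructions visually.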

Warning

Do not try to compile program.mem with Verilator, i.e., do not include program.mem as an input file to verilator. Otherwise you will get the following error:

%Error: ...:1:1: syntax error, unexpected INTEGER NUMBER

Implementation on the board#

Connect:

  1. rst to the push buttons

  2. Least significant bits of rf[5] to the LEDs.

After the reset is released, x5 holds -1, i.e., all bits of rf[5] are one, so all connected LEDs should switch on.

Optional: Display the (1) current opcode and (2) program counter on the seven segment display. Use the left-most slide switch to switch between them.

Tasks#

Read sections 7.1 and 7.2 of the book Digital Design and Computer Architecture - RISC-V Edition, 2021, which should be freely available for reading on Google Books. Alternatively you can refer to the lecture slides that accompany the book.

Mini-lecture#

Microarchitecture#

  • microarchitecture: Concrete arrangement of the processor components like register file, control logic, ALU.

  • a particular architecture can have multiple microarchitectures.

  • architectural state

    • register file

    • data memory

  • instruction set

Design#

  • this week’s goal: a processor with the following instructions

    • I-type: addi

    • jump: jal

  • data-path: 32-bit

  • control-unit

  • Two elements: state + combinational

  • state

    • program counter

    • instruction memory

    • register file

    • data memory

  • example microarchitectures

    • single-cycle

    • multi-cycle

    • pipelined

Performance analysis#

execution_time = instruction_count * cycles_per_instruction * time_per_cycle
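As a quick worked example of this formula, the following Python sketch compares two hypothetical microarchitectures. All numbers are invented for illustration only: a single-cycle design with a long clock period against a multi-cycle design with a shorter period but more cycles per instruction.

```python
# Worked example with made-up numbers; times are in nanoseconds.
def execution_time_ns(instruction_count, cycles_per_instruction, time_per_cycle_ns):
    return instruction_count * cycles_per_instruction * time_per_cycle_ns


single_cycle = execution_time_ns(1000, 1, 20)  # 1 cycle per instruction, 20 ns period
multi_cycle = execution_time_ns(1000, 4, 6)    # 4 cycles per instruction, 6 ns period

print(single_cycle)  # 20000
print(multi_cycle)   # 24000
```

With these assumed numbers the faster clock of the multi-cycle design does not compensate for its higher cycle count, which is why all three factors have to be considered together.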

Helpful resources for implementing a RISC-V processor#

Further language constructs#

Hierarchical names#

Typically we exchange data through inputs and outputs. Imagine you are testing your design and would like to access an internal variable. Using hierarchical names you can access every entity in your design, both in synthesis and in simulation.

Hierarchical names consist of instance names separated by periods, where an instance name can be an array element. The instance name $root refers to the top of the instantiated design and is used to unambiguously gain access to the top of the design.

$root.mymodule.u1 // absolute name 
u1.struct1.field1 // u1 shall be visible locally or above, including globally 
adder1[5].sum 

The complete path name to any object shall start at a top-level (root) module. This path name can be used from any level in the hierarchy or from a parallel hierarchy.

We should not use hierarchical names for ordinary design. If a signal must be exchanged, then we should declare it as input / output. On the other hand, routing an internal signal to an LED on the board for testing the design is acceptable.

Compiler directive define#

A text macro substitution facility has been provided so that meaningful names can be used to represent commonly used pieces of text. For example, in the situation where a constant number is repetitively used throughout a description, a text macro would be useful in that only one place in the source description would need to be altered if the value of the constant needed to be changed.

Macros are compiler directives that can be used for shortening repetitive code. Assume you have the function sign_extend(data, to_number_of_bits) and you would like to use it as follows:

 reg_filen[inst.dest] = sign_extend(inst.imm11_0, 32) + reg_file[inst.src1] - sign_extend(inst.imm11_0, 32);

You know that you will always extend to 32 bits. To avoid the unnecessary argument 32, you would like to create a specialized version of this function. The most convenient solution would be to declare the following macro:

`define extend(imm) sign_extend(imm, 32)

Then you can use it as follows:

 reg_filen[inst.dest] = `extend(inst.imm11_0) + reg_file[inst.src1] - `extend(inst.imm11_0);

Defining macros like this is very convenient; however, it is basically text substitution, so the compiler cannot help with programming mistakes like using an argument of the wrong type. Moreover, macros are visible in the whole compilation unit, which may cause naming problems in large projects. We should favor language constructs whenever possible to avoid programming errors.

Alternatives are:

  1. function

  2. let

function has a lengthy syntax and requires typed arguments and return value – useful for more cautious coding. let in turn does not require any typing and is the better alternative for convenient text substitutions.

let#

A let declaration defines a template expression (a let body), customized by its ports. A let construct may be instantiated in other expressions.

let declarations can be used for customization and can replace the text macros in many cases. The let construct is safer because it has a local scope, while the scope of compiler directives is global within the compilation unit. Including let declarations in packages … is a natural way to implement a well-structured customization for the design code.

Assume you want to write a short function that assembles a word from non-neighboring struct components:

typedef struct packed {
    logic [20:20] imm20;
    logic [10:1] imm10_1;
    logic [11:11] imm11;
    logic [19:12] imm19_12;
    ...
} opcode_t;
opcode_t opcode;

let assemble_j_immediate(opcode) =
   {{12{opcode.imm20}}, opcode.imm19_12, opcode.imm11, opcode.imm10_1, 1'b0};

If you want to be more cautious, you can also provide the argument type:

let assemble_j_immediate(opcode_t opcode) =
...

Warning

As of 2024-04, Verilator does not support typed ports for let.

However, let does not return a value; it only substitutes an expression. So you should use a function if you want to have a typed return value. The superpower of let is that you can define it between statements, in contrast to a function.

Another intended use of let is to provide shortcuts for identifiers or subexpressions. For example:

task write_value;
  input logic [31:0] addr;
  input logic [31:0] value;
  ...
endtask 
...
let addr = top.block1.unit1.base + top.block1.unit2.displ;
...
write_value(addr, 0);

casez, casex#

Case statement with do-not-cares

Two other types of case statements are provided to allow handling of do-not-care conditions in the case comparisons. One of these treats high-impedance values (z) as do-not-cares, and the other treats both high-impedance and unknown (x) values as do-not-cares. …

The syntax of literal numbers allows the use of the question mark (?) in place of z in these case statements. This provides a convenient format for specification of do-not-care bits in case statements.

For example:

casez (instruction)
    8'b1???????: add(instruction);
    8'b01??????: jump(instruction);
    8'b00??????: other(instruction);
endcase
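To see how the ? wildcards in the generated opcode constants select an instruction, here is a minimal Python model of the pattern matching. It only models ? in the case items, not z or x values in the case expression, and the function name casez_match is made up for illustration. The two patterns are copied from the riscv_instr excerpt earlier in this chapter.

```python
# Minimal model of casez item matching: '?' in the pattern matches any bit.
ADD = "0000000??????????000?????0110011"
ADDI = "?????????????????000?????0010011"


def casez_match(pattern: str, word: int) -> bool:
    width = len(pattern)
    bits = format(word & ((1 << width) - 1), f"0{width}b")
    # Each pattern character must be a wildcard or equal to the word's bit.
    return all(p in ("?", b) for p, b in zip(pattern, bits))


print(casez_match(ADDI, 0xFFF28293))  # True: opcode and funct3 bits agree
print(casez_match(ADD, 0xFFF28293))   # False: the opcode bits differ
```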

Assigning (default) values to a data structure#

Scalars or packed arrays can be initialized by assigning a value, e.g., x = 0 or y = '1. In the following we see how we can initialize a structure.

module mod1;
  typedef struct {
    int x; int y;
  } st;

  st s1;
  int k = 1;
  initial begin
    #1 s1 = '{1, 2+k}; // by position
    #1 $display( s1.x, s1.y);
    #1 s1 = '{x:2, y:3+k}; // by name
    #1 $display( s1.x, s1.y);
    #1 $finish;
  end
endmodule

It can sometimes be useful to set structure members to a value without having to keep track of how many members there are or what the names are. This can be done with the default keyword:

initial s1 = '{default:2}; // sets x and y to 2

Expression bit lengths#

Hardware design involves operating on values of various bit lengths, e.g., adding a 4-bit and a 5-bit number. SystemVerilog has rules that define the length of an expression based on the context, and this can save code. Let us analyze the following module, which generates a pulse at a given clock cycle, and the corresponding testbench:

Listing 31 code/expression-bit-length-problem/m.sv#
module m #(
    int unsigned PULSE_AT = 100
) (
    input clk,
    output logic pulse
);
  logic [$clog2(PULSE_AT)-1:0] cntr;
  always_ff @(clk)++cntr;
  assign pulse = cntr == PULSE_AT;
endmodule
Listing 32 code/expression-bit-length-problem/tb.sv#
module tb #(
    int unsigned PULSE_AT = 100
);
  logic clk = 0, pulse;
  m #(PULSE_AT) m_i (.*);
  assign #1 clk = !clk;
  initial begin
    @(pulse);
    assert (m_i.cntr == PULSE_AT);
    @(m_i.cntr == 0);
    $finish;
  end
endmodule

Above we compare the 32-bit value PULSE_AT with the $clog2(PULSE_AT)-bit counter cntr. How would you compare values of different widths?

For the comparison we should either truncate or expand the number of bits on one side. To not lose any information, it makes sense to expand the side that has fewer bits, cntr in our case. We don’t want to change the value during the expansion, so we would extend the unsigned cntr with zeroes.

Indeed SystemVerilog determines the bit size according to \(\max[\mathrm{L}(i), \mathrm{L}(j)]\) — the maximum of the lengths of left and right operand. So cntr is expanded to 32 bits by automatic type conversion:

Automatic type conversions from a smaller number of bits to a larger number of bits involve zero extensions if unsigned or sign extensions if signed. Automatic type conversions from a larger number of bits to a smaller number of bits involve truncations of the most significant bits (MSBs). …
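The conversion rule quoted above can be modeled in a few lines. The following Python sketch is an illustration only (the function name convert_width is hypothetical): it zero-extends unsigned values, sign-extends signed values, and truncates the most significant bits when the target is narrower.

```python
# Model of automatic width conversion on raw bit patterns.
def convert_width(value: int, from_bits: int, to_bits: int, signed: bool) -> int:
    value &= (1 << from_bits) - 1
    if to_bits <= from_bits:
        # Truncation: keep only the low to_bits bits, dropping the MSBs.
        return value & ((1 << to_bits) - 1)
    if signed and (value >> (from_bits - 1)) & 1:
        # Sign extension: fill the new upper bits with ones.
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value  # zero extension needs no change of the bit pattern


print(format(convert_width(0b111, 3, 8, signed=False), "08b"))  # 00000111
print(format(convert_width(0b11, 2, 8, signed=True), "08b"))    # 11111111
print(convert_width(200, 32, 7, signed=False))                  # 72
```

The last call shows why truncation can silently change a value: 200 does not fit into 7 bits, so only the low bits survive.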

Let us test our thoughts by compiling above design:

%Warning-WIDTHEXPAND: m.sv:9:23: Operator EQ expects 32 bits on the LHS, but LHS's VARREF 'cntr' generates 7 bits.
                               : ... note: In instance 'tb.m_i'
    9 |   assign pulse = cntr == PULSE_AT;
      |                       ^~
                      ... For warning description see https://verilator.org/warn/WIDTHEXPAND?v=5.025
                      ... Use "/* verilator lint_off WIDTHEXPAND */" and lint_on around source to disable this message.
%Warning-WIDTHEXPAND: tb.sv:9:22: Operator EQ expects 32 bits on the LHS, but LHS's VARXREF 'cntr' generates 7 bits.
                                : ... note: In instance 'tb'
    9 |     assert (m_i.cntr == PULSE_AT);
      |                      ^~
make[1]: *** [makefile:34: obj_dir/sim] Error 1

😮 What is the problem with Verilator, even though our code is syntactically correct?

Sometimes languages are too flexible for engineers, and this flexibility may lead to mistakes. In this case the Verilator maintainers chose to warn the user and stop the compilation. The reason is that SystemVerilog is also used for chip design, and this industry tends to be very conservative due to the high cost of a single mistake.

There are two solutions in this case:

  1. Explicitly providing the width, e.g., by truncating the right operand: $bits(cntr)'(PULSE_AT).

  2. Acknowledging the lint warning by using the verilator lint_off WIDTHEXPAND and verilator lint_on WIDTHEXPAND pragmas

    1. around the problematic line or

    2. directly disabling at the beginning of the file.

I prefer disabling WIDTHEXPAND (and WIDTHTRUNC), as manually setting the width may lead to lengthy and less readable code. For example imagine a line which adds four inputs with overflow:

assign o = $bits(o)'(i1) + $bits(o)'(i2) + $bits(o)'(i3) + $bits(o)'(i4);

and compare it to:

// verilator lint_off WIDTHEXPAND
assign o = i1 + i2 + i3 + i4;

We are not designing chips where fixing minor mistakes later in manufacturing stage can cost millions of dollars. I believe automatic bit widths and type conversions lead to more understandable code in FPGA circuit design.

When working with Verilator, I recommend disabling the WIDTHEXPAND and WIDTHTRUNC warnings by compiling your code with the -Wno-WIDTH argument or by disabling them at the beginning of individual files.

Let us apply our approach to our previous design. If you only want to deactivate temporarily to be more cautious:

Listing 33 code/expression-bit-length-problem-solved-single-line/m.sv#
module m #(
    int unsigned PULSE_AT = 100
) (
    input clk,
    output logic pulse
);
  logic [$clog2(PULSE_AT)-1:0] cntr;
  always_ff @(clk)++cntr;
  // verilator lint_off WIDTHEXPAND
  assign pulse = cntr == PULSE_AT;
  // verilator lint_on WIDTHEXPAND
endmodule
Listing 34 code/expression-bit-length-problem-solved-single-line/tb.sv#
module tb #(
    int unsigned PULSE_AT = 100
);
  logic clk = 0, pulse;
  m #(PULSE_AT) m_i (.*);
  assign #1 clk = !clk;
  initial begin
    @(pulse);
    // verilator lint_off WIDTHEXPAND
    assert (m_i.cntr == PULSE_AT);
    // verilator lint_on WIDTHEXPAND
    @(m_i.cntr == 0);
    $finish;
  end
endmodule

Output:

🟩 Simulation start
- tb.sv:13: Verilog $finish

Deactivating altogether at the beginning of the file:

Listing 35 code/expression-bit-length-problem-solved/m.sv#
// verilator lint_off WIDTHEXPAND
module m #(
    int unsigned PULSE_AT = 100
) (
    input clk,
    output logic pulse
);
  logic [$clog2(PULSE_AT)-1:0] cntr;
  always_ff @(clk)++cntr;
  assign pulse = cntr == PULSE_AT;
endmodule
Listing 36 code/expression-bit-length-problem-solved/tb.sv#
// verilator lint_off WIDTHEXPAND
module tb #(
    int unsigned PULSE_AT = 100
);
  logic clk = 0, pulse;
  m #(PULSE_AT) m_i (.*);
  assign #1 clk = !clk;
  initial begin
    @(pulse);
    assert (m_i.cntr == PULSE_AT);
    @(m_i.cntr == 0);
    $finish;
  end
endmodule

Output:

🟩 Simulation start
- tb.sv:13: Verilog $finish

Deactivating for all files:

verilator -Wno-WIDTH ...

The same principle also applies when we assign a narrow array to a wider one and vice versa:

Listing 37 code/verilator-lint-warning-width-expand/m.sv#
module m;
  byte wide_array;
  logic [2:0] narrow_array = 'b111;

  initial begin
    wide_array = narrow_array;
    $displayb(wide_array);
    $finish;
  end
endmodule
%Warning-WIDTHEXPAND: m.sv:6:16: Operator ASSIGN expects 8 bits on the Assign RHS, but Assign RHS's VARREF 'narrow_array' generates 3 bits.
                               : ... note: In instance 'm'
    6 |     wide_array = narrow_array;
      |                ^
                      ... For warning description see https://verilator.org/warn/WIDTHEXPAND?v=5.025
                      ... Use "/* verilator lint_off WIDTHEXPAND */" and lint_on around source to disable this message.
make[1]: *** [makefile:34: obj_dir/sim] Error 1

Fixed version:

Listing 38 code/verilator-lint-warning-width-expand-fixed/m.sv#
module m;
  byte wide_array;
  logic [2:0] narrow_array = 'b111;

  initial begin
    // verilator lint_off WIDTHEXPAND
    wide_array = narrow_array;
    // verilator lint_on WIDTHEXPAND
    $displayb(wide_array);

    // More cautious alternative
    wide_array = $bits(wide_array)'(narrow_array);
    // But lint_off approach preferred
    $displayb(wide_array);
    $finish;
  end
endmodule
🟩 Simulation start
00000111
00000111
- m.sv:15: Verilog $finish

Rules for expression bit lengths#

… A self-determined expression is one where the bit length of the expression is solely determined by the expression itself—for example, an expression representing a delay value.

A context-determined expression is one where the bit length of the expression is determined by the bit length of the expression and by the fact that it is part of another expression. For example, the bit size of the right-hand expression of an assignment depends on itself and the size of the left-hand side.

Table 1 Bit lengths resulting from self-determined expressions#

Expression                                      Bit length                                     Comments
Unsized constant number                         At least 32 bits
Sized constant number                           As given
i op j, where op is: + - * / % & | ^ ^~ ~^      \(\max(\mathrm{L}(i), \mathrm{L}(j))\)
op i, where op is: + - ~                        \(\mathrm{L}(i)\)
i op j, where op is: === !== == != > >= < <=    1 bit                                          Operands are sized to \(\max(\mathrm{L}(i), \mathrm{L}(j))\)
i op j, where op is: && || -> <->               1 bit                                          All operands are self-determined
op i, where op is: & ~& | ~| ^ ~^ ^~ !          1 bit                                          All operands are self-determined
i op j, where op is: >> << ** >>> <<<           \(\mathrm{L}(i)\)                              j is self-determined
i ? j : k                                       \(\max(\mathrm{L}(j), \mathrm{L}(k))\)         i is self-determined
{i,...,j}                                       \(\mathrm{L}(i)+...+\mathrm{L}(j)\)            All operands are self-determined
{i{j,...,k}}                                    \(i \times (\mathrm{L}(j)+...+\mathrm{L}(k))\)  All operands are self-determined

Note that the length of the expression a + b on its own is self-determined. However, in the assignment c = a + b, the length of a + b is determined by the context: it also depends on the length of c. Exercise 27 contains an example.

Exercise 27

Guess each expression bit size and the result of each expression:

Listing 39 code/expression-bit-length-examples/m.sv#
module m;
  logic [4:0] i = 1;
  `define E(expr) $write(i++); $write($bits(expr)); $write(" ➡️  "); $display(expr)
  logic [1:0] w2 = 2;
  logic [3:0] w4 = 14;
  logic [4:0] w5;

  // verilator lint_off WIDTHEXPAND
  // verilator lint_off WIDTHTRUNC
  initial begin
    `E(-1);
    `E(3'(-1));
    `E(w2 * w2);
    $display("🟠");

    `E(w2 * w4);
    `E(w2 * w4 + 0);
    `E(~w4);
    $display("🟡");

    `E(w2 >= w4);
    `E(w2 && w4);
    `E(1 | 0);
    $display("🟢");

    `E(^w4);
    `E(^w4 + 1);
    `E(w4 ** w2);
    $display("🔵");

    `E(w2 >> 1);  // >> is logical shift
    `E(w2 >>> 1);  // >>> is arithmetic shift
    `E($signed(w2) >> 1);
    `E($signed(w2) >>> 1);
    $display("🟣");

    `E(w2 ? w4 : w2);
    `E({w2, w4});
    `E({2{w2, 1'b1}});
    $display("🟤");

    `E(w4 + w2);
    `E((w4 + w2) >> 1);
    `E((w4 + w2 + 0) >> 1);
    w5 = w4 + w2;
    `E(w5);
    $display("⚫");

    $finish;
  end
endmodule

Signedness#

A binary add operation does not have to differentiate between signed and unsigned numbers: processors typically use two’s complement, in which adding two bit patterns gives the correct result bits whether the operands are interpreted as signed or as unsigned.

However there are other operations like comparison, e.g., less-than, which may give different results based on the signedness of the operands. For example take the binary numbers 00 and 10:

  1. If both are unsigned, then 0 < 2.

  2. If both are signed: 10 corresponds to -2 (invert the bits to get 01, then add 1 to get the magnitude 2), so 0 > -2.
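The two cases can be reproduced with a small Python model. This is only an illustration of the two interpretations of the same bit patterns; the helper name as_signed is made up.

```python
# The same 2-bit patterns compared under both interpretations.
def as_signed(value: int, bits: int) -> int:
    """Interpret a raw bit pattern as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


a, b = 0b00, 0b10

print(a < b)                              # True:  0 < 2 when unsigned
print(as_signed(a, 2) < as_signed(b, 2))  # False: 0 > -2 when signed
```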

Packed logic arrays are unsigned by default:

… Vectors of reg, logic, and bit types shall be treated as unsigned quantities, unless declared to be signed or connected to a port that is declared to be signed (see 23.2.2.1 and 23.3.3.8).

To change signedness we can use:

  1. logic signed

  2. casting: signed'(x), unsigned'(x)

  3. system functions: $signed(x), $unsigned(x)

LRM references

The signedness can also be changed.

signed'(x)

… In addition to the cast operator, the $signed and $unsigned system functions are available …

Example which demonstrates the difference:

module m;
  logic [2:0] a = 'b000, b = 'b110;
  logic signed [2:0] s;

  logic signed [3:0] s4;
  logic [3:0] u4;

  initial begin
    $display(a < b);  // logic is unsigned as default
    $display(signed'(a) < signed'(b));
    $display;

    // verilator lint_off WIDTHTRUNC
    // verilator lint_off WIDTHEXPAND
    a = $unsigned(-1);
    $displayb(a);
    a = $unsigned(-2'sd1);
    $displayb(a);
    $display;

    s = $signed(3'b100);
    $display(s);
    $display;

    a = 1;
    b = 3'b110;
    s = a + b;  // unsigned addition
    $display(s);
    s = signed'(a) + signed'(b);  // signed addition
    $display(s);
    s = a + signed'(b);  // unsigned addition
    $display(s);
    $display;
    // Addition works the same for both signed and unsigned numbers on
    // hardware.

    s4 = a + b;
    $display(s4);
    s4 = signed'(a) + signed'(b);
    $display(s4);
    s4 = a + signed'(b);
    $display(s4);
    $display;
    // The results can be different due to sign extension

    u4 = a + b;  // unsigned addition
    $display(u4);
    u4 = signed'(a) + signed'(b);  // signed addition
    $display(u4);
    u4 = a + signed'(b);  // unsigned addition
    $display(u4);
    // Same as previous, but -1 is interpreted as 15 instead.

    $finish;
  end
endmodule

Output:

🟩 Simulation start
1
0

111
011

-4

-1
-1
-1

 7
-1
 7

 7
15
 7
- m.sv:54: Verilog $finish

Expression evaluation rules#

The following are the rules for determining the resulting type of an expression:

  • Expression type depends only on the operands. It does not depend on the left-hand side (if any).

  • The sign and size of any self-determined operand are determined by the operand itself and independent of the remainder of the expression.

  • For non-self-determined operands, the following rules apply:

    • If any operand is unsigned, the result is unsigned, regardless of the operator.

    • If all operands are signed, the result will be signed, regardless of operator, except when specified otherwise.

Automatic sign extension#

Remember that if we extend the width of a signed array, the sign bit is extended. We can leverage this feature by declaring a variable as signed or casting to signed as follows:

module m;
  byte wide_signed = 0;  // byte is signed as default
  logic [7:0] wide_unsigned = 0;
  logic [1:0] narrow_unsigned = 'b11;
  logic signed [1:0] narrow_signed = 'b11;

  function automatic byte sign_extend_to_8bits(logic [1:0] arr);
    return {{8 - 2{arr[1]}}, arr};
  endfunction

  initial begin
    $displayb(sign_extend_to_8bits(narrow_signed));
    $display;

    // Instead of manual sign extension, we can simply use automatic width
    // expansion:
    $displayb(8'(narrow_unsigned));
    $displayb(8'(narrow_signed));
    $displayb(8'(signed'(narrow_unsigned)));
    $display;

    // verilator lint_off WIDTHEXPAND
    wide_signed += narrow_signed;
    wide_unsigned += narrow_signed;
    // verilator lint_on WIDTHEXPAND
    $displayb(wide_signed);
    $displayb(wide_unsigned);

    $finish;
  end
endmodule
🟩 Simulation start
11111111

00000011
11111111
11111111

11111111
00000011
- m.sv:29: Verilog $finish

In wide_unsigned += narrow_signed no sign extension takes place, because if any operand is unsigned, the result is unsigned.

We have a non-self-determined expression, so the expression is treated as unsigned.

So make sure that all operands are signed if you want to keep the signedness of the signed operands. Otherwise the signed operands will be converted to unsigned and no sign extension will occur.

Adding two variables with different widths also happens in RISC-V design, e.g., we add a 12-bit immediate to a wider register.
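For the RISC-V case, the expected result of such an addition can be modeled in software. The following Python sketch assumes XLEN = 32 and uses made-up helper names; it sign-extends the 12-bit immediate and keeps only the low 32 bits of the sum, so overflow is ignored as in the ADDI description.

```python
# Reference model of the ADDI data path for XLEN = 32 (names are hypothetical).
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def addi(rs1_value: int, imm12: int) -> int:
    """Add the sign-extended 12-bit immediate and wrap to 32 bits."""
    return (rs1_value + sign_extend(imm12, 12)) & MASK32


print(hex(addi(0, 0xFFF)))       # 0xffffffff: 0 + (-1)
print(hex(addi(0xFFFFFFFF, 1)))  # 0x0: the overflow is simply dropped
```

Without the sign extension, the first call would give 0xfff instead, which is the same kind of mistake as the unsigned wide_unsigned += narrow_signed case above.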

Command line input#

Until now a testbench did not have any port, so we could not input any data through the port. Then we introduced $readmem system task to input data to our simulation and to synthesis.

Assume we are implementing a programmable processor. One of its obvious inputs would be the program. Can $readmem help us to provide different programs?


Alternatively we can use:

An alternative to reading a file to obtain information for use in the simulation is specifying information with the command to invoke the simulator. …

  • $test$plusargs(string)

  • $value$plusargs(user_string, variable)

$value$plusargs#
Listing 40 code/value-plusargs/m.sv#
module m;
  byte mem[2];

  string program_file;
  initial begin
    if ($value$plusargs("program=%s", program_file)) begin
      $readmemh(program_file, mem);
      $display("Programmed");
    end

    $display("%p", mem);
    $finish;
  end
endmodule
Listing 41 code/value-plusargs/program.mem#
1
2
$ obj_dir/sim +verilator+quiet
'{'h0, 'h0} 
- m.sv:12: Verilog $finish

With the plus-argument:

$ obj_dir/sim +verilator+quiet +program=program.mem
Programmed
'{'h1, 'h2} 
- m.sv:12: Verilog $finish

$test$plusargs#

$test$plusargs is useful for providing a boolean value to the simulation:

Listing 42 code/test-plusargs/m.sv#
module m;
  initial begin
    $display("Standard tests begin");
    #1 assert (42 == 42);
    // ...
    #1 $display("Standard tests end");

    if (!$test$plusargs("optional")) $finish;
    #1 $display("Optional tests begin");
    // ...
    $finish;
  end
endmodule
$ obj_dir/sim +verilator+quiet
Standard tests begin
Standard tests end
- m.sv:8: Verilog $finish

With the plus-argument:

$ obj_dir/sim +verilator+quiet +optional
Standard tests begin
Standard tests end
Optional tests begin
- m.sv:11: Verilog $finish

The Verible linter recommends using only $value$plusargs; however, I still find $test$plusargs useful, because $value$plusargs always requires a variable to store the value.

Solution for the introductory problem#

Listing 43 code/riscv-single-cycle-only-addi-jal/mp.sv#
// verilator lint_off WIDTHEXPAND
import mp_pkg::*;
import riscv_instr::*;

module mp #(
    string MEM_INIT_HEX_FILE = "",  // Only used if non-empty
    int unsigned MEM_DEPTH_LG2 = 10  // In bytes
) (
    input clk,
    rst
);
  byte mem[2**MEM_DEPTH_LG2];
  word rf[RegfileSize], rfn[RegfileSize];  // Register file
  word pc, pcn;  // Program counter
  word inst;  // Current instruction

  assign inst = {mem[pc+3], mem[pc+2], mem[pc+1], mem[pc]};

  initial
    if (MEM_INIT_HEX_FILE.len != 0) begin
      $readmemh(MEM_INIT_HEX_FILE, mem);
      $display("Initialized memory using %s.", MEM_INIT_HEX_FILE);
    end

  always_ff @(posedge clk, posedge rst) begin
    pc <= rst ? 0 : pcn;
    rf <= rst ? '{default: '0} : rfn;
  end

  // Variables used in instruction parsing
  i_inst_t i_inst;
  j_inst_t j_inst;
  always_comb begin
    rfn = rf;
    pcn = pc + 4;  // Increment as default

    unique casez (inst)
      ADDI: begin
        i_inst = i_inst_t'(inst);
        rfn[i_inst.rd] = signed'(rf[i_inst.rs1]) + signed'(i_inst.imm11_0);
      end

      JAL: begin
        j_inst = j_inst_t'(inst);
        pcn = pc + assemble_j_imm(j_inst);
        rfn[j_inst.rd] = pc + 4;
      end
    endcase

    // Hardwire rf[0] to zero
    rfn[0] = 0;
  end
endmodule
Listing 44 code/riscv-single-cycle-only-addi-jal-boolean/mp_boolean.sv#
module mp_boolean (
    input clk,
    input [3:0] btn,
    output [15:0] led
);
  logic rst;
  mp #("program.mem") mp_i (.*);
  assign led = mp_i.rf[5][$bits(led)-1:0];
  assign rst = |btn;
endmodule
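To cross-check the behavior of mp.sv without a simulator, a small software model can help. The following Python sketch executes the two-instruction program from the introductory problem. It is a simplified model, not a translation of the listing: `step`, `sign_extend` and `j_imm` are hypothetical names, and `j_imm` assumes that `assemble_j_imm` from mp_pkg (not shown here) reassembles the J-type immediate as defined in the RISC-V specification.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def j_imm(inst):
    """Reassemble the scrambled J-type immediate (imm[20|10:1|11|19:12])."""
    imm = (((inst >> 31) & 0x1) << 20
           | ((inst >> 12) & 0xFF) << 12
           | ((inst >> 20) & 0x1) << 11
           | ((inst >> 21) & 0x3FF) << 1)
    return sign_extend(imm, 21)


def step(pc, rf, inst):
    """Execute one ADDI or JAL instruction and return the next pc."""
    opcode = inst & 0x7F
    rd = (inst >> 7) & 0x1F
    next_pc = (pc + 4) & MASK32
    if opcode == 0x13 and (inst >> 12) & 0x7 == 0:  # ADDI
        rs1 = (inst >> 15) & 0x1F
        rf[rd] = (rf[rs1] + sign_extend(inst >> 20, 12)) & MASK32
    elif opcode == 0x6F:  # JAL
        rf[rd] = next_pc
        next_pc = (pc + j_imm(inst)) & MASK32
    else:
        raise ValueError(f"unsupported instruction {inst:#010x}")
    rf[0] = 0  # x0 is hardwired to zero
    return next_pc


rf = [0] * 32
pc = 0
program = {0: 0xFFF28293, 4: 0x0000006F}  # addi x5, x5, -1; jal x0, 0
for _ in range(3):
    pc = step(pc, rf, program[pc])
print(hex(rf[5]), pc)  # 0xffffffff 4
```

After the first step x5 holds all ones, and the jump then keeps the program counter at address 4.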


Homework#

Exercise 28

Implement the integer register-immediate instructions SLTI, SLTIU, ANDI, ORI, XORI described in the section "Integer Register-Immediate Instructions" of the RISC-V instruction set manual.

Hint

Using logic signed instead of logic for bit arrays may save you some coding.

For testing your design you can use the following code:

Listing 45 code/riscv-single-cycle-inst-i/program.S#
addi x1, zero, -1  # -1 corresponds to all ones
addi x2, zero, 1

slti x10, x1, -2
slti x11, x1, 0

sltiu x12, x2, 0
sltiu x13, x2, -1  # -1 corresponds to all ones even though the instruction is unsigned

andi x14, x1, -1
andi x15, x1, 0

ori x16, x1, -1
ori x17, x1, 0

xori x18, x1, -1
xori x19, x1, 0

stop:
    j stop

You can simulate the code on the simulator to understand what it does.
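The expected register values can also be worked out with a few lines of Python. The sketch below models the semantics of the register-immediate instructions used in the test program. `exec_i`, `sext12` and `to_signed` are hypothetical helper names, and the model assumes XLEN = 32.

```python
MASK32 = 0xFFFFFFFF


def sext12(imm):
    """Sign-extend a 12-bit immediate to a Python int."""
    imm &= 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm


def to_signed(x):
    """Reinterpret a 32-bit pattern as a signed value."""
    return x - (1 << 32) if x & 0x80000000 else x


def exec_i(op, rs1_val, imm):
    """Result of one register-immediate instruction as a 32-bit pattern."""
    imm32 = sext12(imm) & MASK32  # the immediate is always sign-extended
    if op == "addi":
        return (rs1_val + imm32) & MASK32
    if op == "slti":
        return int(to_signed(rs1_val) < to_signed(imm32))
    if op == "sltiu":
        return int(rs1_val < imm32)  # unsigned compare of the extended value
    if op == "andi":
        return rs1_val & imm32
    if op == "ori":
        return rs1_val | imm32
    if op == "xori":
        return rs1_val ^ imm32
    raise ValueError(op)


x1 = exec_i("addi", 0, -1)  # all ones
x2 = exec_i("addi", 0, 1)
results = {
    10: exec_i("slti", x1, -2), 11: exec_i("slti", x1, 0),
    12: exec_i("sltiu", x2, 0), 13: exec_i("sltiu", x2, -1),
    14: exec_i("andi", x1, -1), 15: exec_i("andi", x1, 0),
    16: exec_i("ori", x1, -1), 17: exec_i("ori", x1, 0),
    18: exec_i("xori", x1, -1), 19: exec_i("xori", x1, 0),
}
for reg, val in results.items():
    print(f"x{reg} = {to_signed(val)}")
```

Note that `sltiu x13, x2, -1` yields 1: the immediate is sign-extended to all ones first and only then compared as an unsigned number.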

The code must be translated to machine code before it can be loaded to the instruction memory:

riscv64-unknown-elf-as program.S \
	-o program.o
riscv64-unknown-elf-objcopy program.o \
	-O verilog \
	program.mem
rm program.o

*-as is the assembler, and *-objcopy converts the object file *.o to another format, in our case the Verilog memory format. The resulting program.mem can then be used for programming the processor:

Listing 46 code/riscv-single-cycle-inst-i/program.mem#
@00000000
93 00 F0 FF 13 01 10 00 13 A5 E0 FF 93 A5 00 00
13 36 01 00 93 36 F1 FF 13 F7 F0 FF 93 F7 00 00
13 E8 F0 FF 93 E8 00 00 13 C9 F0 FF 93 C9 00 00
6F 00 00 00
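The file lists individual bytes, and the processor assembles four of them into one little-endian instruction word. As a sanity check, the following Python sketch parses the listing above and reconstructs the instruction words. `parse_verilog_mem` and `words` are hypothetical helpers, and the sketch assumes a contiguous image whose length is a multiple of four bytes.

```python
PROGRAM_MEM = """\
@00000000
93 00 F0 FF 13 01 10 00 13 A5 E0 FF 93 A5 00 00
13 36 01 00 93 36 F1 FF 13 F7 F0 FF 93 F7 00 00
13 E8 F0 FF 93 E8 00 00 13 C9 F0 FF 93 C9 00 00
6F 00 00 00
"""


def parse_verilog_mem(text):
    """Parse the memory file into {byte_address: byte_value}."""
    mem = {}
    addr = 0
    for token in text.split():
        if token.startswith("@"):
            addr = int(token[1:], 16)  # new load address
        else:
            mem[addr] = int(token, 16)
            addr += 1
    return mem


def words(mem):
    """Assemble little-endian 32-bit words, like the `inst` fetch in mp.sv."""
    return [
        mem[a] | mem[a + 1] << 8 | mem[a + 2] << 16 | mem[a + 3] << 24
        for a in range(min(mem), max(mem), 4)
    ]


image = parse_verilog_mem(PROGRAM_MEM)
insts = words(image)
print(len(image), "bytes,", len(insts), "instructions")  # 52 bytes, 13 instructions
print(f"{insts[0]:08x} {insts[-1]:08x}")  # fff00093 0000006f
```

The first word is the encoding of `addi x1, zero, -1` and the last one is the jump to `stop`.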

Note

If you want to translate assembly code to a Verilog memory file yourself, install riscv64-unknown-elf-binutils. It contains *-as and *-objcopy.

Use the following testbench for verifying your implementation:

Listing 47 code/riscv-single-cycle-inst-i/tb.sv#
import util::*;
module tb;
  logic clk, rst;
  mp #("program.mem") dut (.*);

  initial clk = 0;  // Without an initial value, clk stays x in 4-state simulators
  always #1 clk = !clk;

  initial begin
    rst = 1;
    @(posedge clk);
    rst = 0;

    @(dut.pc >> 2 == 1);
    // x1 must be -1
    assert (dut.rf[1] == -1);

    @(dut.pc >> 2 == 2);
    assert (dut.rf[2] == 1);


    @(dut.pc >> 2 == 3);
    assert (dut.rf[10] == 0);

    @(dut.pc >> 2 == 4);
    assert (dut.rf[11] == 1);


    @(dut.pc >> 2 == 5);
    assert (dut.rf[12] == 0);

    @(dut.pc >> 2 == 6);
    assert (dut.rf[13] == 1);


    @(dut.pc >> 2 == 7);
    assert (dut.rf[14] == -1);

    @(dut.pc >> 2 == 8);
    assert (dut.rf[15] == 0);


    @(dut.pc >> 2 == 9);
    assert (dut.rf[16] == -1);

    @(dut.pc >> 2 == 10);
    assert (dut.rf[17] == -1);


    @(dut.pc >> 2 == 11);
    assert (dut.rf[18] == 0);

    @(dut.pc >> 2 == 12);
    assert (dut.rf[19] == -1);

    repeat (3) begin
      @(posedge clk);
      // We stay in the same address due to the Jump
      assert (dut.pc >> 2 == 12);
    end

    $finish;
  end

  // x0 must be hardwired to zero
  always @(posedge clk) assert (dut.rf[0] == 0);

  initial dump_and_timeout;
endmodule

Implementation on the board: Our processor cannot read any data other than the instructions, and in our current implementation we cannot even change the instructions. For testing, compare the least significant bits of the registers x10 to x19 with the values in the assertions in tb.sv.

  • Choose the smallest memory size which can store your program. If it is not large enough, then part of your program cannot be executed. If it is too large, then the synthesis may take longer and your design may malfunction due to timing issues.

  • Connect the LSB of the registers 10 to 19 to the LEDs 0 to 9, respectively.
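The smallest memory size follows directly from the program length, because the parameter MEM_DEPTH_LG2 is the base-2 logarithm of the memory size in bytes. A short Python sketch, with `min_depth_lg2` as a hypothetical helper name:

```python
def min_depth_lg2(program_bytes):
    """Smallest n with 2**n >= program_bytes."""
    if program_bytes < 1:
        raise ValueError("program must contain at least one byte")
    return (program_bytes - 1).bit_length()


# The test program of this exercise occupies 52 bytes (13 instructions).
print(min_depth_lg2(52))  # 6, i.e., a 64-byte memory
```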

Compare the assertions in tb.sv to the LEDs.

RISC-V testing tools — Installation and execution#

Our goal is to implement all RISC-V instructions. Creating thorough tests for all RISC-V instructions manually, however, is tedious. Fortunately there are open-source RISC-V tests and we will use them for the next exercises.

The test suite contains unit tests for each instruction. Each unit test is based on a test program that is executed both on your implementation and on a reference implementation. Each test program writes its results to a memory address range, which is then written to a file. The two files, one from your design under test (DUT) and one from the reference, are then compared for equality.

This is automated by the tool RISCOF, which stands for RISC-V Compatibility Framework. Under the hood, RISCOF uses the RISC-V architecture test suite, which consists of assembly programs for testing each instruction, and the RISC-V Sail model as the reference implementation. Sail is a language for describing instruction sets.

Files for RISCOF configuration#

We will create the folder riscof/ on the same level as your processor project directory, e.g., riscv-single-cycle/, so that you can use RISCOF for processor implementations other than riscv-single-cycle. Let us assume that your processor implementation riscv-single-cycle is under the folder code/. riscof.tar.xz contains the RISCOF-related files; download it and unpack it:

cd code/
tar xf riscof.tar.xz  # should create the directory `riscof` with RISCOF-related files

Installation#

There are two ways to install the tools:

  1. The easiest way is to use the Docker image that contains all the software used in this course.

  2. Manual installation

Docker image#

Mount your code folder to the container and execute the tests:

cd code
docker run --rm -it \
  --mount type=bind,src=.,dst=/root/code \
  registry.gitlab.com/goekce/sphinx-fpga

This will give you a prompt in the container. Then you can proceed with Execution.

Hint

To exit the container, type exit or press Ctrl+D.

Manual installation#

The following is based on RISCOF Quickstart.

  1. Install RISCOF (under riscof/):

    cd riscof
    python -m venv venv
    source ./venv/bin/activate  # Prepends `./venv/bin` to your PATH
    pip install riscof
    
  2. Install the RISC-V GNU toolchain. We need the bare-metal (not to be run on an OS) tools riscv64-unknown-elf-binutils and riscv64-unknown-elf-gcc.

  3. Create the reference simulation model riscv_sim_RV32:

    First install the OCaml package manager opam and the theorem prover z3. Then in the riscof folder:

    opam init --shell-setup --yes  # Initializes opam repository under ~/.opam
    opam install sail --yes  # For compiling the sail-riscv model
    eval $(opam env)  # Sets variables for the current environment
    git clone https://github.com/riscv/sail-riscv
    ARCH=RV32 make -C sail-riscv c_emulator/riscv_sim_RV32  # Compiles the model
    opam clean  # Deletes cached opam packages
    

    riscv_sim_RV32 will be available under riscof/sail-riscv/c_emulator/riscv_sim_RV32.

Execution#

  1. Change to your processor directory and create there riscof_config.ini.

    Listing 48 code/riscv-single-cycle/riscof_config.ini#
    [RISCOF]
    ReferencePlugin=sail_cSim
    ReferencePluginPath=../riscof/sail_cSim
    DUTPlugin=mp
    DUTPluginPath=../riscof/mp
    
    [mp]
    pluginpath=../riscof/mp
    ispec=../riscof/mp/mp_isa.yaml
    pspec=../riscof/mp/mp_platform.yaml
    sim_binary=obj_dir/sim
    
    [sail_cSim]
    pluginpath=../riscof/sail_cSim
    # If `riscv_sim_RV*` not in PATH, then set the directory of `riscv_sim_RV*` here:
    #sim_binary_path=
    

    Set sim_binary_path to sail-riscv/c_emulator.

  2. Place the following convenience script, which starts a test run, in your implementation directory.

    Listing 49 code/riscv-single-cycle/run-tests.sh#
    riscof run \
        --config=riscof_config.ini \
        --suite=../riscof/riscv-arch-test/riscv-test-suite \
        --env=../riscof/riscv-arch-test/riscv-test-suite/env
    

    Make it executable and use it to test your implementation:

    chmod +x run-tests.sh
    

    This script executes riscof that we installed in a previous step.

  3. Before you run the tests, make sure to generate your simulation binary obj_dir/sim.

  4. Run the tests in the implementation folder using:

    ./run-tests.sh
    

    After you run run-tests.sh, you should see lines similar to:

    INFO | ****** RISCOF: RISC-V Architectural Test Framework ... *******
    ...
    INFO | Following 39 tests have been run :
    ...
    ... rv32i_m/I/src/add-01.S : ... : ...
    ...
    

Exercise 29

Implement all the remaining instructions for a single-cycle processor.

I recommend implementing the following instruction types one by one:

  • rest of the integer register-immediate instructions

  • integer register-register operations

  • control transfer instructions

  • load and store instructions

You don’t have to implement:

  • performance features like return-address prediction stack in unconditional jumps

  • exceptions

  • memory ordering instructions

  • environment call and breakpoints

  • hint instructions

To avoid rebuilding the simulation binary each time we load a new program, we introduce the following block in mp.sv. The preprocessor macro SYNTHESIS is defined only during synthesis, not in simulation:

Listing 50 code/riscv-single-cycle/mp.sv#
  initial begin
    // Read program and data memfile from command line in simulation only.
    // Synthesis cannot initialize memory using $value$plusargs (presumably).
`ifndef SYNTHESIS
    string mem_file;
    if ($value$plusargs("mem-file=%s", mem_file)) begin
      $readmemh(mem_file, mem);
      $display("Initialized memory using %s.", mem_file);
    end
`else
    if (MEM_INIT_HEX_FILE.len) begin
      $readmemh(MEM_INIT_HEX_FILE, mem);
      $display("Initialized memory using %s.", MEM_INIT_HEX_FILE);
    end
`endif
  end

Use the following testbench for your implementation.

Listing 51 code/riscv-single-cycle/tb.sv#
import util::*;
module tb;
  logic clk, rst;
  mp #(
      .MEM_INIT_HEX_FILE("program.mem"),
      .MEM_DEPTH_LG2(24)
  ) dut (
      .*
  );

  initial clk = 0;  // Without an initial value, clk stays x in 4-state simulators
  always #1 clk = !clk;

  initial dump_and_timeout(10_000_000);  // TODO set a value dependent on the length of each test
  int tohost_symbol_addr, test_signature_begin_addr, test_signature_end_addr;
  initial begin
    $value$plusargs("tohost-symbol-addr=%h", tohost_symbol_addr);
    $display("Address for communicating with the host (tohost): 0x%h", tohost_symbol_addr);

    $value$plusargs("test-signature-begin-addr=%h", test_signature_begin_addr);
    $display("Test signature begin addr: 0x%h", test_signature_begin_addr);

    $value$plusargs("test-signature-end-addr=%h", test_signature_end_addr);
    $display("Test signature end addr: 0x%h", test_signature_end_addr);

    rst = 1;
    @(posedge clk);
    #1 rst = 0;
    @(dut.mem[tohost_symbol_addr] == 1);
    $display("RV_MODEL_HALT message received.");
    dump_signature;
    $finish;
  end

  function static void dump_signature;
    string  filename = "DUT-mp.signature";
    integer fd = $fopen(filename);
    for (int unsigned addr = test_signature_begin_addr; addr < test_signature_end_addr; addr += 4)
      // Signature file has a word on each line
      $fwriteh(
          fd, "%h\n", {dut.mem[addr+3], dut.mem[addr+2], dut.mem[addr+1], dut.mem[addr]}
      );
  endfunction
endmodule
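The signature file written by `dump_signature` contains one 32-bit word per line, assembled from four bytes in little-endian order. The following Python sketch models this format, assuming that `%h` prints a 32-bit value as eight lowercase hex digits. `signature_lines` is a hypothetical name and the memory content is made up for illustration.

```python
def signature_lines(mem, begin, end):
    """One little-endian 32-bit word per line, as 8 lowercase hex digits."""
    lines = []
    for addr in range(begin, end, 4):
        word = int.from_bytes(mem[addr:addr + 4], "little")
        lines.append(f"{word:08x}")
    return lines


mem = bytearray(16)
mem[4:8] = (0x1004).to_bytes(4, "little")  # hypothetical result word
print(signature_lines(mem, 0, 12))  # ['00000000', '00001004', '00000000']
```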

Note

Very likely some tests will not pass. Even if you implemented some instructions correctly, they may still not pass because many instructions use other instructions in the tests. For example a mistake in load/store instructions will most likely lead to failed tests in other instructions, because the test framework compares the results saved by a store instruction.

Hint

Advice on fixing errors in your implementation:

  1. Read the specification for the instruction again and compare it with your implementation.

  2. Pay attention to signedness and sign extension

  3. Check if your memory is large enough. The 24-bit address width (MEM_DEPTH_LG2 = 24, i.e., 16 MiB) specified in tb.sv above is sufficient.

  4. Look at the differences between the output of your implementation and that of the reference. This output is also called the signature.

    All the files created by the test are under riscof_work/. For example the following outputs the difference between the signatures for ADD:

    diff riscof_work/rv32i_m/I/src/add-01.S/{dut,ref}/*.signature
    

    If you cannot guess which instruction the faulty output belongs to, you have to look into the test program. To get a human-readable program listing, you can use:

    riscv64-unknown-elf-objdump --source riscof_work/rv32i_m/I/src/add-01.S/dut/my.elf > /tmp/my.lst
    

    Let us analyze the following excerpt from the listing:

    inst_587:
     // rs2_val == 4096, rs1_val == 4
     // opcode: add ; op1:x10; op2:x11; dest:x12; op1val:0x4;  op2val:0x1000
     TEST_RR_OP(add, x12, x10, x11, 0x1004, 0x4, 0x1000, x1, 168, x2)
         328c:       00400513                li      a0,4
         3290:       000015b7                lui     a1,0x1
         3294:       00b50633                add     a2,a0,a1
         3298:       0ac0a423                sw      a2,168(ra)
    

    This is the test labeled inst_587, which tests the addition of 4 and 4096 (0x1000). It loads 4 into x10 (a0) and 4096 into x11 (a1), adds them, and writes the result into x12 (a2). The result is then stored into the signature area.

    Using these traces and by locating the instruction in the waveform you may be able to understand your implementation error.
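When a signature is long, the first differing word is usually the most informative one. The following Python sketch locates it. It works on lists of lines, such as those read from the two *.signature files mentioned above. `first_mismatch` is a hypothetical helper, and the example values are made up.

```python
from itertools import zip_longest


def first_mismatch(dut_lines, ref_lines):
    """Return (word_index, byte_offset, dut_word, ref_word) or None."""
    pairs = zip_longest(dut_lines, ref_lines, fillvalue="")
    for index, (dut, ref) in enumerate(pairs):
        dut, ref = dut.strip().lower(), ref.strip().lower()
        if dut != ref:
            # Each line holds one 32-bit word, so the byte offset is index * 4
            return index, index * 4, dut, ref
    return None


dut = ["00001004\n", "00000000\n", "deadbeef\n"]
ref = ["00001004\n", "00000001\n", "deadbeef\n"]
print(first_mismatch(dut, ref))  # (1, 4, '00000000', '00000001')
```

The byte offset is relative to the begin of the signature area, which can help to find the corresponding store instruction in the program listing.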

Hint

Persevering through implementation errors and fixing them is part of the learning process. If you think you tried enough or you are just curious, you may browse the mistakes that I made during my implementation.

  1. SLL uses only the lowest 5 bits of rs2 as the shift amount, not all 32 bits. Shifting, for example, \(2^{32}-1\) times does not make any sense. 🤦

  2. SLTIU first sign-extends the immediate and then does an unsigned compare. If you rely on the automatic width expansion in SystemVerilog, this means that we first treat the immediate as signed and then convert it to unsigned for the comparison. 🙂

  3. I had duplicate definitions of some instructions. These lead to warnings in Verilator, and may eventually stop the simulation. 😐

  4. I decoded some instructions as j_inst, however used i_inst for the actual operation. j and i look almost identical. The problem is that I declared all the variables for instruction type casting (e.g., i_inst, j_inst, b_inst, etc) before the case statement, because it is not possible to declare a variable inside a case statement. However this leads to other variables having all zeroes (at least in Verilator). I had to persevere through this bug 😣.

    In hindsight, I believe it would have been better to find a way without declaring all the instruction decode variables all at once.
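The first mistake in the list above is easy to reproduce in a small model. The following Python sketch shows SLL with the shift amount masked to five bits, assuming XLEN = 32. `sll` is a hypothetical helper name.

```python
MASK32 = 0xFFFFFFFF


def sll(rs1_val, rs2_val):
    """Shift left logical: only the low 5 bits of rs2 form the shift amount."""
    shamt = rs2_val & 0x1F
    return (rs1_val << shamt) & MASK32


print(hex(sll(1, 4)))           # 0x10
print(hex(sll(1, 32)))          # 0x1, because 32 & 31 is 0
print(hex(sll(1, 0xFFFFFFFF)))  # 0x80000000, because the shift amount is 31
```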

Board implementation:

  • Use the same instructions from Exercise.

Warning

Synthesis of the design may take up to 3.5 GB of RAM. Parallel processes like the file indexer of your desktop environment (for accelerating file search) may exacerbate the resource consumption, because synthesis creates about 100 additional files, which wakes up the indexer.

I recommend saving your files before starting synthesis. If you have access to a remote computer, running synthesis there is also an option.

Compare the state of the LEDs with the result of Exercise. Very likely you will see a difference — the results will be wrong due to timing errors. We will solve this problem in the next chapter.