Solutions to exercises
Solution to Exercise 3
Synthesis is the process of translating behavioral code into a network of simple digital logic components such as gates and flip-flops. This network is called a netlist.
Solution to Exercise 4
Implementation comes after synthesis. The netlist is the output of the synthesis step. It must be mapped to the logic primitives available on the target FPGA, then placed and routed. Mapping, placement and routing together are called implementation.
Solution to Exercise 5
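Gate-level modeling does not require synthesis, because the description is already a netlist of gates. It is the lowest level of modeling that is typically used in SV (SV also has switch-level primitives such as nmos and pmos, but these are not relevant for FPGA design).
A netlist is similar to assembly code, i.e., instructions. Assembly maps almost one-to-one to machine instructions, which a processor executes without any further translation. Unlike instructions, however, a netlist still needs to be mapped and routed.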
Some reasons to use gate-level modeling:
We have a gate-level description of a logic circuit, e.g., an obfuscated circuit.
We want to debug the circuit at the lowest level.
We want to optimize the circuit at the lowest level.
Otherwise, we should not write code at this level.
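To make the difference between the modeling levels concrete, here is a minimal sketch of a 2:1 multiplexer. It is described once with SystemVerilog's built-in gate primitives (not, and, or) and once behaviorally. The module names mux2_gate and mux2_behav are hypothetical.

```systemverilog
// Gate-level model: built-in primitives, output terminal listed first.
module mux2_gate (
    input  logic a, b, sel,
    output wire  o
);
  wire sel_n, a_sel, b_sel;
  not g0 (sel_n, sel);       // sel_n = ~sel
  and g1 (a_sel, a, sel);    // a passes when sel = 1
  and g2 (b_sel, b, sel_n);  // b passes when sel = 0
  or  g3 (o, a_sel, b_sel);  // merge both paths
endmodule

// Behavioral model of the same function: synthesis derives the gates.
module mux2_behav (
    input  logic a, b, sel,
    output logic o
);
  assign o = sel ? a : b;
endmodule
```

Both modules describe the same function. We normally write the behavioral one-liner and let the synthesis tool produce something similar to the gate-level version.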
Solution to Exercise 1
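SV is used to describe and simulate digital systems. It is synthesized into logic gates, not compiled into machine instructions for a processor.
SV code typically describes concurrent hardware. Processes such as always blocks and continuous assignments run in parallel, and there is no specific order in which they are executed. Statements inside a single begin ... end block, however, still execute sequentially.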
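The following minimal sketch shows what "no specific order" means in practice. The two continuous assignments are written in the "wrong" textual order, but the result does not change, because each assignment is re-evaluated whenever its right-hand side changes. The module and signal names are hypothetical.

```systemverilog
module order_independent (
    input  logic a, b, c,
    output logic y
);
  logic t;
  // y uses t before t is assigned in the source text. This is fine:
  // both continuous assignments run concurrently, and y is re-evaluated
  // whenever t or c changes.
  assign y = t & c;
  assign t = a | b;
endmodule
```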
Solution to Exercise 2
Verilog could no longer cater to the needs of digital designs, which have been getting continuously more complex.
SystemVerilog extends Verilog with additional modeling and verification features.
In 2009, the Verilog and SystemVerilog standards were merged into a single language, SystemVerilog (IEEE 1800-2009).
Solution to Exercise 6
It can store 0, 1, X (don't care or unknown) or Z (high-impedance or floating).
Solution to Exercise 7
These values are typically relevant in simulation:
Assigning X to a signal means that we don't care whether it is 0 or 1. The tool can use this information for optimizations.
X is also the initial value of 4-state variables in simulation. During simulation, an X shows that a signal has not been properly initialized.
X is also used if two drivers drive the same signal with different values at the same time (there are also different drive strengths, but we keep it simple here).
Z is used if a signal is not driven at all. This is useful for modeling buses (e.g., CAN) where multiple participants can drive the same signal.
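The following minimal sketch shows these values on a shared bus. Two tri-state drivers release the net by driving Z. The net becomes X when they drive conflicting values. The module name shared_bus is hypothetical. The X and Z values are only visible in a 4-state simulator such as Icarus Verilog. Verilator is a 2-state simulator and does not model X.

```systemverilog
// Two tri-state drivers share one net. Each releases the bus by driving Z.
module shared_bus (
    input  logic en_a, a,
    input  logic en_b, b,
    output wire  bus
);
  assign bus = en_a ? a : 1'bz;  // driver A
  assign bus = en_b ? b : 1'bz;  // driver B
endmodule
```

If no driver is enabled, bus is Z. If exactly one driver is enabled, bus follows that driver. If both are enabled and drive different values, bus is X.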
Solution to Exercise 8
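It is not 4-state: int is a 2-state type, so it can only hold 0 and 1, and X or Z bits are converted to 0. Using 4-state logic may ease the verification of circuits in simulation, because unknown values stay visible.
Therefore, we should use int only for for-loop variables.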
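A minimal sketch of the 2-state conversion, assuming a 4-state simulator: the same 4-bit value is copied into an int and into an integer. The module name state_conv is hypothetical. Explicit zero-padding is used only to avoid width warnings.

```systemverilog
module state_conv (
    input  logic [3:0] in4,         // may carry X or Z bits
    output int         as_int,      // 2-state: X/Z bits become 0
    output integer     as_integer   // 4-state: X/Z bits are kept
);
  always_comb as_int     = {28'b0, in4};
  always_comb as_integer = {28'b0, in4};
endmodule
```

For in4 = 4'b1x0z, as_int becomes 8 (binary 1000), while the low bits of as_integer still read 1x0z.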
Solution to Exercise 9
variable is one of the two data object groups in SystemVerilog, besides nets.
wire is a kind of net, i.e., a net type (SV17-6.5 Nets and variables). Examples of other net types are uwire, tri and wor (SV17-6.6 Net types).
A net (consequently also wire) cannot be assigned procedurally. It is driven by continuous assignments, or overridden using force.
var is for data storage. It can be assigned to using procedural assignments, including procedural continuous assignments. Like a wire, a var can also be written to using a continuous assignment. A var keeps the last value that has been written to it.
continuous assignment: assigns values to nets or variables (SV17-10.2 Assignment statements - Overview).
procedural assignment: assigns values to variables (ditto).
procedural continuous assignment: a procedural statement, used inside procedural blocks (e.g., always, initial), that keeps driving a net or variable even after the statement has executed. These are assign, deassign, force and release (SV17-10.6 Procedural continuous assignments).
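The following minimal sketch contrasts a net that is driven continuously with a variable that is written procedurally. The module name net_vs_var is hypothetical. The testbench uses force and release on the variable. After release, the variable keeps the forced value until the next procedural write.

```systemverilog
module net_vs_var (
    input  logic clk, d,
    output wire  n,   // net: driven only by a continuous assignment
    output logic q    // variable: written procedurally, keeps its value
);
  assign n = ~d;                     // continuous assignment to a net
  always_ff @(posedge clk) q <= d;   // procedural assignment to a variable
endmodule
```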
Solution to Exercise 10
logic can be used for both a net and a variable: logic is a data type, whereas nets and variables are data object groups.
If we write logic x or var x, this implicitly means var logic x (SV17-6.8 Variable declarations).
If we want a net with a logic data type, then we can write wire x, which implies wire logic x, because logic is the default data type.
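The declaration forms above can be written side by side. In this minimal sketch (the module name decl_equiv is hypothetical), each implicit declaration is paired with its explicit equivalent. The variables are written procedurally and the nets with continuous assignments.

```systemverilog
module decl_equiv (
    input  logic a,
    output logic y_var,
    output wire  y_net
);
  logic      v1;  // variable: same as 'var logic v1'
  var logic  v2;  // explicit form of a logic variable
  wire       w1;  // net: same as 'wire logic w1'
  wire logic w2;  // explicit form of a logic net

  always_comb begin  // variables: procedural assignments
    v1 = a;
    v2 = v1;
  end
  assign w1    = v2;  // nets: continuous assignments only
  assign w2    = w1;
  assign y_var = v2;
  assign y_net = w2;
endmodule
```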
Solution to Exercise 11
logic is the most basic data type for modeling signals. When we declare a new signal as logic, this signal is a variable by default (instead of a net). Unlike nets, variables cannot have multiple (simultaneous) drivers.
Typically, we do not have multiple drivers on a signal anyway (e.g., inside the FPGA). We could also use var instead of logic to declare such signals if we wanted to shorten our code. However, most developers (seem to) know logic rather than var.
Solution to Exercise 12
These are primitives that are included in SystemVerilog as built-in modules.
Solution to Exercise 13
m m_i(.a(x), .b(y), .c(z)) or m m_i(.c(z), .b(y), .a(x)): named assignment. a, b and c are the module ports, and x, y and z are the signals that we connect to the ports. The order does not matter.
m m_i(x, y, z): positional assignment. The signals are connected in the order in which the ports are declared.
m m_i(.a, .b, .c): automatic assignment (implicit named port connection). The ports a, b and c are connected to the signals a, b and c.
m m_i(.*): automatic assignment (wildcard port connection) to the signals a, b and c, if m has the ports a, b and c.
m m_i(.*, .c(x)): automatic assignment with the exception .c(x).
m m_i(.a, .b, .c(x)): automatic assignment with the exception .c(x).
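All connection styles can be combined in one design. In this minimal sketch, the example module m computes a ^ b (an arbitrary choice). The wrapper connection_styles, which is hypothetical, instantiates m once per style. Every instance should therefore produce the same output.

```systemverilog
module m (
    input  logic a, b,
    output logic c
);
  assign c = a ^ b;  // any function works, XOR is just an example
endmodule

module connection_styles (
    input  logic       a, b,
    output wire        c,  // driven by the wildcard instance
    output wire  [3:0] z   // outputs of the other instances
);
  logic x, y;
  assign x = a;
  assign y = b;
  m m_named   (.a(x), .b(y), .c(z[0]));  // named: order does not matter
  m m_pos     (x, y, z[1]);              // positional: follows port order
  m m_dotname (.a, .b, .c(z[2]));        // .a is short for .a(a)
  m m_wild    (.*);                      // connects a, b and c by name
  m m_mixed   (.*, .c(z[3]));            // wildcard, but c is overridden
endmodule
```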
Solution to Exercise 14
assign o = sel ? a : b;
Solution to Exercise 15
assign x = ~y;
Solution to Exercise 16
assign x = a && b ? 2 : 3;
Solution to Exercise 17
assign light = ^switches;
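The reduction operator ^ XORs all bits of its operand. The result is 1 exactly when an odd number of bits is 1. This minimal sketch wraps the one-liner in a parameterized module. The name parity_light and the parameter N are hypothetical.

```systemverilog
module parity_light #(
    parameter int N = 4  // number of switches
) (
    input  logic [N-1:0] switches,
    output logic         light
);
  // Reduction XOR: 1 if an odd number of switches is on.
  assign light = ^switches;
endmodule
```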
Solution to Exercise 18
assign f = {a, {3{a, b, c}}, b};
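Assuming a, b and c are 1-bit signals, f must be 1 + 3 * 3 + 1 = 11 bits wide. This minimal sketch (the module name concat_demo is hypothetical) makes the width explicit.

```systemverilog
module concat_demo (
    input  logic        a, b, c,
    output logic [10:0] f  // 1 + 3*3 + 1 = 11 bits
);
  assign f = {a, {3{a, b, c}}, b};
endmodule
```

For a = 1, b = 0 and c = 1, f is 11'b1_101_101_101_0: a first, then three copies of {a, b, c}, then b.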
Solution to Exercise 20
The signal toggles every time unit, so its period is 2 time units and its frequency is \(f = \frac{1}{2 \cdot t_\mathrm{unit}}\). The default time unit is tool-specific (here 1 ps), so \(f = \frac{1}{2\,\mathrm{ps}} = 500\,\mathrm{GHz}\).
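Because the default time unit depends on the tool, it can be safer to set it explicitly. In the following minimal sketch, the `timescale 1ns / 1ps directive and the module name clk_gen are illustrative choices. With a 1 ns unit, the same #1 toggle gives a 2 ns period, i.e., 500 MHz.

```systemverilog
`timescale 1ns / 1ps  // time unit 1 ns, precision 1 ps

module clk_gen (
    output logic clk
);
  initial clk = 0;
  always #1 clk = !clk;  // toggles every 1 ns -> period 2 ns -> 500 MHz
endmodule
```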
Solution to Exercise 22
repo:code/seven-segment-controller
package seven_segment_controller_pkg;
localparam int unsigned
DigitCountLg2 = 2,
DigitCount = 2 ** DigitCountLg2,
SegmentCount = 7,
MaxReprNumber = 10 ** DigitCount - 1,
BcdCount = $clog2(
10
);
typedef logic [BcdCount-1 : 0] bcd_t;
// Single BCD digit
typedef logic [$bits(bcd_t) * DigitCount - 1 : 0] bcds_t;
// Many BCD digits
typedef logic [$clog2(MaxReprNumber + 1) - 1 : 0] number_t; // +1 so that MaxReprNumber itself always fits
typedef logic [SegmentCount-1 : 0] segment_t;
segment_t bcd2seg[10] = '{
0: 'b0111111,
1: 'b0000110,
2: 'b1011011,
3: 'b1001111,
4: 'b1100110,
5: 'b1101101,
6: 'b1111101,
7: 'b0000111,
8: 'b1111111,
9: 'b1101111
}; // gfedcba
// Converts a BCD digit to its seven-segment representation.
//    a
//  f   b
//    g
//  e   c
//    d
function automatic bcds_t bin2bcd(number_t bin);
// Converts a binary number to its BCD representation.
// Uses double-dabble algorithm (add 3 if greater than 4, then shift).
// https://en.wikipedia.org/wiki/Double_dabble
bcds_t bcds = '{default: 0};
for (int i = 0; i < $bits(bin); ++i) begin
// Check every BCD digit if > 4
for (int j = 0; j < DigitCount; ++j) begin
// $displayh(" i: ", i, " j: ", j,
// " bcds: ", bcds[BcdCount*(j+1)-1 -:BcdCount]);
// For debugging
if (bcds[BcdCount*(j+1)-1-:BcdCount] > 4) bcds[BcdCount*(j+1)-1-:BcdCount] += 3;
end
// Shift
bcds = {bcds[$left(bcds)-1 : 0], bin[$left(bin)-i]};
end
return bcds;
endfunction
endpackage
// Seven segment controller for unsigned integers
// - Uses only seven segments without dot or colon.
// - Seven segment signals are shared among digits.
module seven_segment_controller
import seven_segment_controller_pkg::*;
(
input clk,
rst,
input number_t number,
output logic [DigitCount - 1 : 0] digit, // Selects currently active digit
output segment_t segment // Currently active segments
);
logic[DigitCountLg2 - 1 : 0] digit_addr;
// Digit address stores the current digit id which is being illuminated. For
// example 1 would be the second least significant BCD digit. In every clock
// cycle the digit address is incremented which in turn determines which
// segments to illuminate dependent on the input `number`.
always_ff @(posedge clk, posedge rst) begin
digit_addr <= rst ? 0 : digit_addr + 1;
digit <=
rst ? 1 : {digit[$left(digit) -1 : 0],
digit[$left(digit)]
};
// `digit` is initialized with 1 and continuously shifted to the left.
// If `digit` would be created combinationally using `digit_addr`, then
// the more significant digits (e.g., 3 and 4) will be faded out, because
// the more significant digit signals will be less active compared to the
// less significant signals. The address logic requires more time to
// propagate to the more significant bits.
end
assign segment = bcd2seg[
{ bin2bcd(number) }
[BcdCount * digit_addr +: BcdCount]
];
endmodule
import seven_segment_controller_pkg::*;
module seven_segment_counter_test_boolean #(
int unsigned CLKDIV_FOR_1MS = 100e6 / 1000, // 1 ms
int unsigned MAX_NUMBER = 9999
) (
input clk,
input [3:0] btn,
output [3:0] D0_AN,
D1_AN,
output [7:0] D0_SEG,
D1_SEG
);
logic clk_1ms, rst;
clkdiv #(CLKDIV_FOR_1MS) clkdiv_1ms (
.i(clk),
.o(clk_1ms)
);
logic [$clog2(MAX_NUMBER+1)-1:0] cntr;
logic [DigitCount -1 : 0] digit0, digit1;
logic [SegmentCount - 1 : 0] segment0, segment1;
// Counter
always_ff @(posedge clk_1ms, posedge rst) cntr <= rst ? 0 : cntr + 1;
seven_segment_controller ssc0 (
.clk(clk_1ms),
.number(cntr),
.segment(segment0),
.digit(digit0),
.*
);
seven_segment_controller ssc1 (
.clk(clk_1ms),
.number(cntr),
.segment(segment1),
.digit(digit1),
.*
);
assign D0_AN = ~digit0;
assign D1_AN = ~digit1;
// Invert because digit signals are connected to PMOS transistors, which are active-low.
assign D0_SEG[6:0] = ~segment0;
assign D1_SEG[6:0] = ~segment1;
// Segment signals are also active-low
assign rst = |btn;
endmodule
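To follow the double-dabble steps of bin2bcd on concrete numbers, here is a minimal stand-alone sketch for an 8-bit input and three BCD digits. The names bcd_sketch_pkg and bin8_to_bcd are hypothetical and not part of seven_segment_controller_pkg.

```systemverilog
package bcd_sketch_pkg;
  // Double dabble for an 8-bit input (0..255) and three BCD digits.
  function automatic logic [11:0] bin8_to_bcd(input logic [7:0] bin);
    logic [11:0] bcd;
    bcd = '0;
    for (int i = 7; i >= 0; i--) begin
      // 1) Add 3 to every BCD digit that is greater than 4 ...
      for (int d = 0; d < 3; d++)
        if (bcd[4*d+:4] > 4'd4) bcd[4*d+:4] = bcd[4*d+:4] + 4'd3;
      // 2) ... then shift in the next input bit, MSB first.
      bcd = {bcd[10:0], bin[i]};
    end
    return bcd;
  endfunction
endpackage
```

For example, 12 (binary 00001100) becomes 12'h012, i.e., the BCD digits 0, 1 and 2.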
Solution to Exercise 25
The difference is that we output the data asynchronously:
always_ff @(posedge clk) begin
if (wen) mem[din_addr] <= din;
end
assign dout = mem[dout_addr];
endmodule
The synthesis log shows that the memory is now inferred as distributed RAM:
sed -n '/Final Mapping Report/,/Finished/p' prj/prj.runs/synth_1/runme.log
Distributed RAM: Final Mapping Report
+-----------------+---------------+-----------+----------------------+-------------+
|Module Name | RTL Object | Inference | Size (Depth x Width) | Primitives |
+-----------------+---------------+-----------+----------------------+-------------+
|ram_boolean_test | ram_i/mem_reg | Implied | 16 x 4 | RAM32M x 1 |
+-----------------+---------------+-----------+----------------------+-------------+
---------------------------------------------------------------------------------
Finished ROM, RAM, DSP, Shift Register and Retiming Reporting
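The snippet above shows only the body of the RAM. The following is a minimal self-contained sketch of the asynchronous-read variant. The port names match the signals that sound_player connects with .* (clk, wen, din, din_addr, dout, dout_addr), and the parameter names match DEPTH_LG2 and WORD_SIZE. The module name ram_async is hypothetical, and the INIT_HEX_FILE parameter is omitted.

```systemverilog
module ram_async #(
    parameter int unsigned DEPTH_LG2 = 4,
    parameter int unsigned WORD_SIZE = 4
) (
    input  logic                 clk,
    input  logic                 wen,
    input  logic [WORD_SIZE-1:0] din,
    input  logic [DEPTH_LG2-1:0] din_addr,
    input  logic [DEPTH_LG2-1:0] dout_addr,
    output logic [WORD_SIZE-1:0] dout
);
  logic [WORD_SIZE-1:0] mem[2**DEPTH_LG2];

  always_ff @(posedge clk) begin
    if (wen) mem[din_addr] <= din;  // synchronous write
  end

  // Asynchronous read: dout follows dout_addr without a clock edge.
  // A synchronous read would be: always_ff @(posedge clk) dout <= mem[dout_addr];
  assign dout = mem[dout_addr];
endmodule
```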
Solution to Exercise 26
module sound_player #(
string ROM_INIT_HEX_FILE = "sound.mem",
int unsigned ROM_DEPTH_LG2 = 4
) (
input clk,
rst,
output o
);
typedef enum {
INIT_ROM_ADDR,
SET_DIVIDER_VALUE,
SET_DURATION,
WAIT
} st_t;
st_t st, stn;
integer unsigned divide_by, divide_byn, duration, durationn;
logic wen = 0;
logic [$bits(divide_by)-1 : 0] din = 0, dout;
logic [ROM_DEPTH_LG2-1 : 0] din_addr = 0, dout_addr, dout_addrn;
type (dout) wait_cntr, wait_cntrn;
clkdiv clkdiv (
.i(clk),
.o(o),
.divide_by(divide_by)
);
ram #(
.DEPTH_LG2(ROM_DEPTH_LG2),
.WORD_SIZE($bits(divide_by)),
.INIT_HEX_FILE(ROM_INIT_HEX_FILE)
) rom (
.*
);
always_ff @(posedge clk, posedge rst) begin
st <= rst ? st.first : stn;
dout_addr <= rst ? 0 : dout_addrn;
divide_by <= rst ? 0 : divide_byn;
duration <= rst ? 0 : durationn;
wait_cntr <= rst ? 0 : wait_cntrn;
end
always_comb begin
stn = st;
divide_byn = divide_by;
durationn = duration;
dout_addrn = dout_addr;
wait_cntrn = wait_cntr; // Default assignment, otherwise always_comb infers a latch
unique case (st)
INIT_ROM_ADDR: begin
// Note that dout_addr is 0 in this cycle and RAM requires one cycle
// to respond, so the data that we receive in next cycle will be mem[0]
dout_addrn = dout_addr + 1;
stn = st.next;
end
SET_DIVIDER_VALUE: begin
divide_byn = dout;
dout_addrn = dout_addr + 1; // For reading the next divider value
stn = st.next;
end
SET_DURATION: begin
wait_cntrn = dout;
if (dout == 0) stn = INIT_ROM_ADDR; // If duration zero, then skip
else stn = st.next;
end
WAIT: begin
wait_cntrn = wait_cntr - 1;
if (wait_cntr == 1) begin
stn = INIT_ROM_ADDR;
end
end
endcase
end
endmodule
module tb #(
string ROM_INIT_HEX_FILE = "test_data.mem",
int unsigned ROM_DEPTH_LG2 = 3
);
logic clk = 1, rst, o;
always #1 clk = !clk;
sound_player #(
.ROM_INIT_HEX_FILE,
.ROM_DEPTH_LG2
) dut (
.*
);
//FIXME Somehow dut.WAIT does not work
// probably related to https://github.com/verilator/verilator/issues/5229
typedef enum {
INIT_ROM_ADDR,
SET_DIVIDER_VALUE,
SET_DURATION,
WAIT
} st_t;
//FIXME
typedef type (dut.dout_addr) addr_t;
task static verify_period_and_duration(addr_t clkdiv_addr);
fork
// Create two parallel processes that wait for changes in the output
// (`o`) and state (`dut.st`). Waiting for changes in a single process
// would be less readable.
begin
time start_time, period;
let clkdiv = dut.rom.mem[clkdiv_addr];
if (clkdiv != 0) begin // Skip if div zero
// Period should be (2 * clkdiv_data_from_the_rom)
@(posedge o) start_time = $time;
@(posedge o) period = $time - start_time;
assert (period == 2 * time'(clkdiv))
else $error("period = %d != 2 * (mem = %d)", period, clkdiv);
end
end
begin
time start_time, time_passed;
let duration = dut.rom.mem[clkdiv_addr+1];
if (duration != 0) begin // Skip if duration zero
// Wait state duration must be (2 * duration_data_from_the_rom)
@(dut.st == WAIT) start_time = $time;
@(dut.st) time_passed = $time - start_time;
assert (time_passed == 2 * time'(dut.rom.mem[clkdiv_addr+1]))
else $error("time_passed = %d != 2 * (duration = %d)", time_passed, duration);
end
end
join
$display("[%4d] ✔️ clkdiv_addr %d", $time, clkdiv_addr);
endtask
initial begin
// Iterate over even addresses
for (int unsigned addr = 0; addr < 2 ** (ROM_DEPTH_LG2 - 1); ++addr)
verify_period_and_duration(addr_t'(addr * 2));
// Dut should start over from the beginning of the ROM
for (int unsigned addr = 0; addr < 2 ** (ROM_DEPTH_LG2 - 1); ++addr)
verify_period_and_duration(addr_t'(addr * 2));
$finish;
end
import util::dump_and_timeout;
initial dump_and_timeout(1000);
endmodule
module sound_player_boolean #(
string ROM_INIT_HEX_FILE = "sound.mem",
int unsigned ROM_DEPTH_LG2 = 6
) (
input clk,
output left_audio_out,
right_audio_out,
input [3:0] btn,
input [15:0] sw
);
logic mono_out;
sound_player #(
.ROM_INIT_HEX_FILE(ROM_INIT_HEX_FILE),
.ROM_DEPTH_LG2(ROM_DEPTH_LG2)
) sound_player (
.clk(clk),
.rst(|btn),
.o (mono_out)
);
assign left_audio_out = sw[15] & mono_out;
assign right_audio_out = sw[15] & mono_out;
endmodule
Solution to Exercise 27
module m;
logic [4:0] i = 1;
`define E(expr) $write(i++); $write($bits(expr)); $write(" ➡️ "); $display(expr)
logic [1:0] w2 = 2;
logic [3:0] w4 = 14;
logic [4:0] w5;
// verilator lint_off WIDTHEXPAND
// verilator lint_off WIDTHTRUNC
initial begin
`E(-1);
`E(3'(-1));
`E(w2 * w2);
$display("🟠");
`E(w2 * w4);
`E(w2 * w4 + 0);
`E(~w4);
$display("🟡");
`E(w2 >= w4);
`E(w2 && w4);
`E(1 | 0);
$display("🟢");
`E(^w4);
`E(^w4 + 1);
`E(w4 ** w2);
$display("🔵");
`E(w2 >> 1); // >> is logical shift
`E(w2 >>> 1); // >>> is arithmetic shift
`E($signed(w2) >> 1);
`E($signed(w2) >>> 1);
$display("🟣");
`E(w2 ? w4 : w2);
`E({w2, w4});
`E({2{w2, 1'b1}});
$display("🟤");
`E(w4 + w2);
`E((w4 + w2) >> 1);
`E((w4 + w2 + 0) >> 1);
w5 = w4 + w2;
`E(w5);
$display("⚫");
$finish;
end
endmodule
🟩 Simulation start
1 32 ➡️ -1
2 3 ➡️ -1
3 2 ➡️ 0
🟠
4 4 ➡️ 12
5 32 ➡️ 28
6 4 ➡️ 1
🟡
7 1 ➡️ 0
8 1 ➡️ 1
9 32 ➡️ 1
🟢
10 1 ➡️ 1
11 32 ➡️ 2
12 4 ➡️ 4
🔵
13 2 ➡️ 1
14 2 ➡️ 1
15 2 ➡️ 1
16 2 ➡️ -1
🟣
17 4 ➡️ 14
18 6 ➡️ 46
19 6 ➡️ 45
🟤
20 4 ➡️ 0
21 4 ➡️ 0
22 32 ➡️ 8
23 5 ➡️ 16
⚫
- m.sv:49: Verilog $finish
Solution to Exercise 28
repo:code/riscv-single-cycle-inst-i repo:code/riscv-single-cycle-inst-i-boolean
Most of the instructions (1) work with signed arithmetic and (2) sign-extend their immediates. To avoid many $signed calls, we can declare the immediates and the register file as logic signed:
// verilator lint_off WIDTHEXPAND
import mp_pkg::*;
import riscv_instr::*;
module mp #(
string MEM_INIT_HEX_FILE = "", // Only used if non-empty
int unsigned MEM_DEPTH_LG2 = 10 // In bytes
) (
input clk,
rst
);
byte mem[2**MEM_DEPTH_LG2];
word_signed rf[RegfileSize], rfn[RegfileSize]; // Register file
word_signed pc, pcn; // Program counter (signed, so that negative jump offsets are sign-extended)
word inst; // Current instruction
assign inst = {mem[pc+3], mem[pc+2], mem[pc+1], mem[pc]};
initial
if (MEM_INIT_HEX_FILE.len != 0) begin
$readmemh(MEM_INIT_HEX_FILE, mem);
$display("Initialized memory using %s.", MEM_INIT_HEX_FILE);
end
always_ff @(posedge clk, posedge rst) begin
pc <= rst ? 0 : pcn;
rf <= rst ? '{default: '0} : rfn;
end
// Variables used in instruction parsing
i_inst_t i_inst;
j_inst_t j_inst;
always_comb begin
rfn = rf;
pcn = pc + 4; // Increment as default
unique casez (inst)
// Integer register-immediate instructions
ADDI: begin
i_inst = i_inst_t'(inst);
rfn[i_inst.rd] = rf[i_inst.rs1] + i_inst.imm11_0;
end
SLTI: begin
i_inst = i_inst_t'(inst);
rfn[i_inst.rd] = rf[i_inst.rs1] < i_inst.imm11_0;
end
SLTIU: begin
i_inst = i_inst_t'(inst);
rfn[i_inst.rd] = $unsigned(rf[i_inst.rs1]) < $unsigned(32'(i_inst.imm11_0));
// imm11_0 is sign extended before unsigned comparison
end
ANDI: begin
i_inst = i_inst_t'(inst);
rfn[i_inst.rd] = rf[i_inst.rs1] & i_inst.imm11_0;
end
ORI: begin
i_inst = i_inst_t'(inst);
rfn[i_inst.rd] = rf[i_inst.rs1] | i_inst.imm11_0;
end
XORI: begin
i_inst = i_inst_t'(inst);
rfn[i_inst.rd] = rf[i_inst.rs1] ^ i_inst.imm11_0;
end
JAL: begin
j_inst = j_inst_t'(inst);
pcn = pc + assemble_j_imm(j_inst);
rfn[j_inst.rd] = pc + 4;
end
endcase
// Hardwire rf[0] to zero
rfn[0] = 0;
end
endmodule
module mp_boolean (
input clk,
input [3:0] btn,
output logic [15:0] led
);
logic rst;
mp #(
.MEM_INIT_HEX_FILE("program.mem"),
.MEM_DEPTH_LG2(4)
) mp_i (
.*
);
always_comb begin
led = 0;
for (int reg_id = 10; reg_id <= 19; ++reg_id) led[reg_id-10] = mp_i.rf[reg_id][0];
end
assign rst = |btn;
endmodule
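Declaring operands as signed matters because of how SystemVerilog extends operands. If any operand of an expression is unsigned, the whole expression is unsigned, and a narrower signed operand is zero-extended instead of sign-extended. This minimal sketch (the module name sign_ext_demo is hypothetical) adds the immediate -4 to 100 in both ways.

```systemverilog
// verilator lint_off WIDTHEXPAND
module sign_ext_demo (
    input  logic signed [11:0] imm,     // e.g., an I-type immediate
    input  logic        [31:0] base_u,  // unsigned 32-bit operand
    input  logic signed [31:0] base_s,  // signed 32-bit operand
    output logic        [31:0] sum_u,
    output logic        [31:0] sum_s
);
  // base_u is unsigned, so the expression is unsigned and imm is ZERO-extended.
  assign sum_u = base_u + imm;
  // All operands are signed, so imm is SIGN-extended before the addition.
  assign sum_s = base_s + imm;
endmodule
```

With imm = -4 (12'hFFC) and a base of 100, sum_s is 96, but sum_u is 100 + 4092 = 4192.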
Solution to Exercise 29
package mp_pkg;
localparam int unsigned XLen = 32, RegfileSize = 32;
typedef logic [XLen-1 : 0] word;
typedef logic signed [XLen-1 : 0] word_signed;
typedef logic signed [31:2] pc_t; // Assume instructions are 4byte-aligned
// Instruction fields
typedef logic [$clog2(RegfileSize)-1:0] reg_id_t;
typedef logic [6:0] opcode_t;
typedef logic [2:0] funct3_t;
// `opcode`, `funct*` are only used for decoding, so they are not used as
// inputs for the ALU. So these fields are also named `unused*` in the
// following instruction definitions.
// Note that fields not used by the ALU have grey color in the ISA manual.
// All field names chosen according to the ISA manual
typedef struct packed {
logic signed [11:0] imm11_0;
reg_id_t rs1;
funct3_t unused1;
reg_id_t rd;
opcode_t unused2;
} i_inst_t; // Immediate
let shamt(imm11_0) = $unsigned(
imm11_0[4:0]
); // Immediate shift instructions only use 5 bits of the immediate
typedef struct packed {
logic signed [31:12] imm31_12;
reg_id_t rd;
opcode_t unused2;
} u_inst_t; // Upper-immediate
typedef struct packed {
logic [20:20] imm20;
logic [10:1] imm10_1;
logic [11:11] imm11;
logic [19:12] imm19_12;
reg_id_t rd;
opcode_t unused;
} j_inst_t; // Jump
// J instruction is a variant of U
// verilator lint_off UNUSEDSIGNAL
function automatic logic signed [20:0] assemble_j_imm(j_inst_t i);
// verilator lint_on UNUSEDSIGNAL
return $signed({i.imm20, i.imm19_12, i.imm11, i.imm10_1, 1'b0});
endfunction
typedef struct packed {
logic [6:0] unused1;
reg_id_t rs2, rs1;
funct3_t unused2;
reg_id_t rd;
opcode_t unused3;
} r_inst_t; // Register-register
typedef struct packed {
logic imm12;
logic [10:5] imm10_5;
reg_id_t rs2, rs1;
funct3_t unused1;
logic [4:1] imm4_1;
logic [11:11] imm11;
opcode_t unused2;
} b_inst_t; // Branch
// B instruction is a variant of S
// verilator lint_off UNUSEDSIGNAL
function automatic logic signed [12:0] assemble_b_imm(b_inst_t i);
// verilator lint_on UNUSEDSIGNAL
return $signed({i.imm12, i.imm11, i.imm10_5, i.imm4_1, 1'b0});
endfunction
typedef struct packed {
logic [11:5] imm11_5;
reg_id_t rs2, rs1;
funct3_t unused1;
logic [4:0] imm4_0;
opcode_t unused2;
} s_inst_t; // Store
// verilator lint_off UNUSEDSIGNAL
function automatic logic signed [11:0] assemble_s_imm(s_inst_t i);
// verilator lint_on UNUSEDSIGNAL
return $signed({i.imm11_5, i.imm4_0});
endfunction
endpackage
// verilator lint_off WIDTHEXPAND
// verilator lint_off WIDTHTRUNC
import mp_pkg::*;
import riscv_instr::*;
module mp #(
string MEM_INIT_HEX_FILE = "", // Only used if non-empty
int unsigned MEM_DEPTH_LG2 = 10 // In bytes
) (
input clk,
rst
);
byte mem[2**MEM_DEPTH_LG2];
word_signed mem_waddr, mem_wdata;
logic mem_wen;
word_signed rf[RegfileSize], rfn[RegfileSize]; // Register file
word_signed pc, pcn; // Program counter
word inst; // Current instruction
// assign inst = word'({<< byte{mem[pc+:4]}});
// Unpacked array slicing not supported in Verilator 2024-05-22
assign inst = {mem[pc+3], mem[pc+2], mem[pc+1], mem[pc]};
initial begin
// Read program and data memfile from command line in simulation only.
// Synthesis cannot initialize memory using $value$plusargs (presumably).
`ifndef SYNTHESIS
string mem_file;
if ($value$plusargs("mem-file=%s", mem_file)) begin
$readmemh(mem_file, mem);
$display("Initialized memory using %s.", mem_file);
end
`else
if (MEM_INIT_HEX_FILE.len) begin
$readmemh(MEM_INIT_HEX_FILE, mem);
$display("Initialized memory using %s.", MEM_INIT_HEX_FILE);
end
`endif
end
// [valueplusargs-block-end]
always_ff @(posedge clk, posedge rst) begin
pc <= rst ? 0 : pcn;
rf <= rst ? '{default: '0} : rfn;
if (rst);
else
unique casez (inst)
SW: {mem[mem_waddr+3], mem[mem_waddr+2], mem[mem_waddr+1], mem[mem_waddr]} <= mem_wdata;
SH: {mem[mem_waddr+1], mem[mem_waddr]} <= mem_wdata[15:0];
SB: mem[mem_waddr] <= mem_wdata[7:0];
default: ;
endcase
end
// Variables used in instruction parsing
i_inst_t i_inst;
u_inst_t u_inst;
j_inst_t j_inst;
r_inst_t r_inst;
b_inst_t b_inst;
s_inst_t s_inst;
always_comb begin
rfn = rf;
pcn = pc + 4; // Increment as default
mem_waddr = 0;
mem_wdata = 0;
mem_wen = 0;
unique case (inst) inside
// Integer register-immediate instructions
ADDI: begin
i_inst = i_inst_t'(inst);
rfn[i_inst.rd] = rf[i_inst.rs1] + i_inst.imm11_0;
end
SLTI: begin
i_inst = i_inst_t'(inst);
rfn[i_inst.rd] = rf[i_inst.rs1] < i_inst.imm11_0;
end
SLTIU: begin
i_inst = i_inst_t'(inst);
rfn[i_inst.rd] = $unsigned(rf[i_inst.rs1]) < $unsigned(32'(i_inst.imm11_0));
// imm11_0 is sign extended before unsigned comparison
end
ANDI: begin
i_inst = i_inst_t'(inst);
rfn[i_inst.rd] = rf[i_inst.rs1] & i_inst.imm11_0;
end
ORI: begin
i_inst = i_inst_t'(inst);
rfn[i_inst.rd] = rf[i_inst.rs1] | i_inst.imm11_0;
end
XORI: begin
i_inst = i_inst_t'(inst);
rfn[i_inst.rd] = rf[i_inst.rs1] ^ i_inst.imm11_0;
end
SLLI: begin
i_inst = i_inst_t'(inst);
rfn[i_inst.rd] = rf[i_inst.rs1] << $unsigned(i_inst.imm11_0[4:0]);
// All shift instructions use the last 5 bits for the shift amount
end
SRLI: begin
i_inst = i_inst_t'(inst);
rfn[i_inst.rd] = rf[i_inst.rs1] >> $unsigned(i_inst.imm11_0[4:0]);
end
SRAI: begin
i_inst = i_inst_t'(inst);
rfn[i_inst.rd] = rf[i_inst.rs1] >>> $unsigned(i_inst.imm11_0[4:0]);
end
LUI: begin
u_inst = u_inst_t'(inst);
rfn[u_inst.rd] = {u_inst.imm31_12, 12'b0};
end
AUIPC: begin
u_inst = u_inst_t'(inst);
rfn[u_inst.rd] = pc + {u_inst.imm31_12, 12'b0};
end
// Integer register-register operations
ADD: begin
r_inst = r_inst_t'(inst);
rfn[r_inst.rd] = rf[r_inst.rs1] + rf[r_inst.rs2];
end
SLT: begin
r_inst = r_inst_t'(inst);
rfn[r_inst.rd] = rf[r_inst.rs1] < rf[r_inst.rs2];
end
SLTU: begin
r_inst = r_inst_t'(inst);
rfn[r_inst.rd] = $unsigned(rf[r_inst.rs1]) < $unsigned(rf[r_inst.rs2]);
end
AND: begin
r_inst = r_inst_t'(inst);
rfn[r_inst.rd] = rf[r_inst.rs1] & rf[r_inst.rs2];
end
OR: begin
r_inst = r_inst_t'(inst);
rfn[r_inst.rd] = rf[r_inst.rs1] | rf[r_inst.rs2];
end
XOR: begin
r_inst = r_inst_t'(inst);
rfn[r_inst.rd] = rf[r_inst.rs1] ^ rf[r_inst.rs2];
end
SLL: begin
r_inst = r_inst_t'(inst);
rfn[r_inst.rd] = rf[r_inst.rs1] << rf[r_inst.rs2][4:0];
end
SRL: begin
r_inst = r_inst_t'(inst);
rfn[r_inst.rd] = rf[r_inst.rs1] >> rf[r_inst.rs2][4:0];
end
SUB: begin
r_inst = r_inst_t'(inst);
rfn[r_inst.rd] = rf[r_inst.rs1] - rf[r_inst.rs2];
end
SRA: begin
r_inst = r_inst_t'(inst);
rfn[r_inst.rd] = rf[r_inst.rs1] >>> rf[r_inst.rs2][4:0];
end
// Control transfer instructions
JAL: begin
j_inst = j_inst_t'(inst);
pcn = pc + assemble_j_imm(j_inst);
rfn[j_inst.rd] = pc + 4;
end
JALR: begin
i_inst = i_inst_t'(inst);
pcn = rf[i_inst.rs1] + i_inst.imm11_0;
pcn[0] = 0; // LSB bit must be zero
rfn[i_inst.rd] = pc + 4;
end
BEQ: begin
b_inst = b_inst_t'(inst);
pcn = rf[b_inst.rs1] == rf[b_inst.rs2] ? pc + assemble_b_imm(inst) : pc + 4;
end
BNE: begin
b_inst = b_inst_t'(inst);
pcn = rf[b_inst.rs1] != rf[b_inst.rs2] ? pc + assemble_b_imm(inst) : pc + 4;
end
BLT: begin
b_inst = b_inst_t'(inst);
pcn = rf[b_inst.rs1] < rf[b_inst.rs2] ? pc + assemble_b_imm(inst) : pc + 4;
end
BLTU: begin
b_inst = b_inst_t'(inst);
pcn = $unsigned(rf[b_inst.rs1]) < $unsigned(rf[b_inst.rs2]) ? pc + assemble_b_imm(inst) :
pc + 4;
end
BGE: begin
b_inst = b_inst_t'(inst);
pcn = rf[b_inst.rs1] >= rf[b_inst.rs2] ? pc + assemble_b_imm(inst) : pc + 4;
end
BGEU: begin
b_inst = b_inst_t'(inst);
pcn = $unsigned(rf[b_inst.rs1]) >= $unsigned(rf[b_inst.rs2]) ? pc + assemble_b_imm(inst) :
pc + 4;
end
// Load and store instructions
`define LADDR rf[i_inst.rs1] + i_inst.imm11_0
LW: begin
i_inst = i_inst_t'(inst);
rfn[i_inst.rd] = {mem[`LADDR+3], mem[`LADDR+2], mem[`LADDR+1], mem[`LADDR]};
end
LH: begin
i_inst = i_inst_t'(inst);
rfn[i_inst.rd] = $signed({mem[`LADDR+1], mem[`LADDR]});
end
LHU: begin
i_inst = i_inst_t'(inst);
rfn[i_inst.rd] = {mem[`LADDR+1], mem[`LADDR]};
end
LB: begin
i_inst = i_inst_t'(inst);
rfn[i_inst.rd] = mem[`LADDR];
end
LBU: begin
i_inst = i_inst_t'(inst);
rfn[i_inst.rd] = $unsigned(mem[`LADDR]);
end
SW, SH, SB: begin
s_inst = s_inst_t'(inst);
mem_wen = 1;
mem_wdata = rf[s_inst.rs2];
mem_waddr = rf[s_inst.rs1] + assemble_s_imm(s_inst);
end
// Memory-ordering instructions
FENCE: ;
endcase
// Hardwire rf[0] to zero
rfn[0] = 0;
end
endmodule
module mp_boolean (
input clk,
input [3:0] btn,
output logic [15:0] led
);
logic rst;
mp #(
.MEM_INIT_HEX_FILE("program.mem"),
.MEM_DEPTH_LG2(4)
) mp_i (
.*
);
always_comb begin
led = 0;
for (int reg_id = 10; reg_id <= 19; ++reg_id) led[reg_id-10] = mp_i.rf[reg_id][0];
end
assign rst = |btn;
endmodule
Solution to Exercise 30
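Block RAM does not support asynchronous reads: its read data appears only one clock cycle after the address is applied. The single-cycle microprocessor needs the instruction and the load data within the same cycle, so we have to use distributed RAM, which supports asynchronous reads.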
// RAM for the single-cycle microprocessor
// - Two ports:
//   1. Asynchronous read and synchronous write for data
//   2. Asynchronous read for instructions
// - `XLen` wide
// - Byte enable signals for byte-wide store & load
// - No support for unaligned addresses. LSBits removed.
import mp_pkg::*;
module mp_ram #(
    string INIT_HEX_FILE = "",  // Only used if non-empty
    int unsigned DEPTH_LG2 = 5  // In log2(word)
) (
    input clk,
    // Bus target
    word_signed daddr,  // Data address (Read & write)
    wdata,  // Written data
    output word_signed rdata,  // Read data
    input wsize_t wsize,  // Write size: ZERO (no write), BYTE, HWORD or WORD
    input word_signed iaddr,  // Instruction address (Only read)
    output word_signed inst  // Instruction (read)
);
  word mem[2**DEPTH_LG2];
  always_ff @(posedge clk)
    unique case (wsize)
      ZERO: ;  // Do not write
      BYTE: mem[to_word_addr(daddr)][8*daddr[1:0]+:8] <= wdata[8*daddr[1:0]+:8];
      HWORD: mem[to_word_addr(daddr)][16*daddr[1]+:16] <= wdata[16*daddr[1]+:16];
      WORD: mem[to_word_addr(daddr)] <= wdata;
    endcase
  assign rdata = mem[to_word_addr(daddr)];
  assign inst  = mem[to_word_addr(iaddr)];
  initial begin
    // Read program and data memfile from command line in simulation only.
    // Synthesis cannot initialize memory using $value$plusargs (presumably).
`ifndef SYNTHESIS
    string mem_file;
    if ($value$plusargs("mem-file=%s", mem_file)) begin
      $readmemh(mem_file, mem);
      $display("Initialized memory using %s.", mem_file);
    end
`else
    if (INIT_HEX_FILE.len) begin
      $readmemh(INIT_HEX_FILE, mem);
      $display("Initialized memory using %s.", INIT_HEX_FILE);
    end
`endif
  end
endmodule
In the synthesis report you should see a distributed RAM primitive like:
Report Cell Usage:
+------+---------+------+
|      |Cell     |Count |
+------+---------+------+
|1     |BUFG     |     1|
|2     |LUT4     |     4|
|3     |LUT5     |    32|
|4     |RAM32X1D |    32|
|5     |IBUF     |    47|
|6     |OBUF     |    64|
+------+---------+------+
RAM32X1D is a 32-deep, 1-bit-wide RAM. The D stands for dual-port: it allows us to access instructions and data at the same time. We need 32 of them because the data width is 32 bits.
In the utilization report, you find the primitives that the design element RAM32X1D is based on:
+----------+------+---------------------+
| Ref Name | Used | Functional Category |
+----------+------+---------------------+
| RAMD32   |   64 |  Distributed Memory |
| OBUF     |   64 |                  IO |
| IBUF     |   47 |                  IO |
| LUT5     |   32 |                 LUT |
| LUT4     |    4 |                 LUT |
| BUFG     |    1 |               Clock |
+----------+------+---------------------+
RAM32X1D requires two RAMD32 to carry out two read operations in parallel, which is why the 32 RAM32X1D cells map to 64 RAMD32 primitives.
Solution to Exercise 31
repo:code/riscv-single-cycle-ram, repo:code/riscv-single-cycle-ram-boolean
Modifications:
```diff
--- /builds/fpga-lab/fpga-programming/code/riscv-single-cycle/mp.sv
+++ /builds/fpga-lab/fpga-programming/code/riscv-single-cycle-ram/mp.sv
@@ -4,52 +4,36 @@
 import riscv_instr::*;
 
 module mp #(
-    string MEM_INIT_HEX_FILE = "",  // Only used if non-empty
-    int unsigned MEM_DEPTH_LG2 = 10  // In bytes
+    string RAM_INIT_HEX_FILE = "",  // Only used if non-empty
+    int unsigned RAM_DEPTH_LG2 = 10  // In words
 ) (
     input clk, rst
 );
-  byte mem[2**MEM_DEPTH_LG2];
-  word_signed mem_waddr, mem_wdata;
-  logic mem_wen;
   word_signed rf[RegfileSize], rfn[RegfileSize];  // Register file
   word_signed pc, pcn;  // Program counter
   word inst;  // Current instruction
 
-  // assign inst = word'({<< byte{mem[pc+:4]}});
-  // Unpacked array slicing not supported in Verilator 2024-05-22
-  assign inst = {mem[pc+3], mem[pc+2], mem[pc+1], mem[pc]};
-
-  initial begin
-    // Read program and data memfile from command line in simulation only.
-    // Synthesis cannot initialize memory using $value$plusargs (presumably).
-`ifndef SYNTHESIS
-    string mem_file;
-    if ($value$plusargs("mem-file=%s", mem_file)) begin
-      $readmemh(mem_file, mem);
-      $display("Initialized memory using %s.", mem_file);
-    end
-`else
-    if (MEM_INIT_HEX_FILE.len) begin
-      $readmemh(MEM_INIT_HEX_FILE, mem);
-      $display("Initialized memory using %s.", MEM_INIT_HEX_FILE);
-    end
-`endif
-  end
-  // [valueplusargs-block-end]
+  // verilator lint_off UNOPTFLAT
+  word_signed ram_daddr, ram_wdata, ram_rdata;
+  // verilator lint_on UNOPTFLAT
+  wsize_t ram_wsize;
+  mp_ram #(
+      .INIT_HEX_FILE(RAM_INIT_HEX_FILE),
+      .DEPTH_LG2(RAM_DEPTH_LG2)
+  ) ram (
+      .clk,
+      .daddr(ram_daddr),
+      .wdata(ram_wdata),
+      .rdata(ram_rdata),
+      .wsize(ram_wsize),
+      .iaddr(pc),
+      .inst (inst)
+  );
 
   always_ff @(posedge clk, posedge rst) begin
     pc <= rst ? 0 : pcn;
     rf <= rst ? '{default: '0} : rfn;
-    if (rst);
-    else
-      unique casez (inst)
-        SW: {mem[mem_waddr+3], mem[mem_waddr+2], mem[mem_waddr+1], mem[mem_waddr]} <= mem_wdata;
-        SH: {mem[mem_waddr+1], mem[mem_waddr]} <= mem_wdata[15:0];
-        SB: mem[mem_waddr] <= mem_wdata[7:0];
-        default: ;
-      endcase
   end
 
   // Variables used in instruction parsing
@@ -63,9 +47,9 @@
     rfn = rf;
     pcn = pc + 4;  // Increment as default
-    mem_waddr = 0;
-    mem_wdata = 0;
-    mem_wen = 0;
+    ram_daddr = 0;
+    ram_wdata = 0;
+    ram_wsize = ZERO;
 
     unique case (inst) inside
       // Integer register-immediate instructions
@@ -198,33 +182,33 @@
       end
 
       // Load and store instructions
-      `define LADDR rf[i_inst.rs1] + i_inst.imm11_0
-      LW: begin
-        i_inst = i_inst_t'(inst);
-        rfn[i_inst.rd] = {mem[`LADDR+3], mem[`LADDR+2], mem[`LADDR+1], mem[`LADDR]};
-      end
-      LH: begin
-        i_inst = i_inst_t'(inst);
-        rfn[i_inst.rd] = $signed({mem[`LADDR+1], mem[`LADDR]});
-      end
-      LHU: begin
-        i_inst = i_inst_t'(inst);
-        rfn[i_inst.rd] = {mem[`LADDR+1], mem[`LADDR]};
-      end
-      LB: begin
-        i_inst = i_inst_t'(inst);
-        rfn[i_inst.rd] = mem[`LADDR];
-      end
-      LBU: begin
-        i_inst = i_inst_t'(inst);
-        rfn[i_inst.rd] = $unsigned(mem[`LADDR]);
+      LW, LH, LHU, LB, LBU: begin
+        i_inst = i_inst_t'(inst);
+        ram_daddr = rf[i_inst.rs1] + i_inst.imm11_0;
+        unique casez (inst)
+          LW:  rfn[i_inst.rd] = ram_rdata;
+          LH:  rfn[i_inst.rd] = $signed(ram_rdata[ram_daddr[1]*16+:16]);
+          LHU: rfn[i_inst.rd] = ram_rdata[ram_daddr[1]*16+:16];
+          LB:  rfn[i_inst.rd] = $signed(ram_rdata[ram_daddr[1:0]*8+:8]);
+          LBU: rfn[i_inst.rd] = ram_rdata[ram_daddr[1:0]*8+:8];
+        endcase
       end
       SW, SH, SB: begin
         s_inst = s_inst_t'(inst);
-        mem_wen = 1;
-        mem_wdata = rf[s_inst.rs2];
-        mem_waddr = rf[s_inst.rs1] + assemble_s_imm(s_inst);
+        ram_wdata = rf[s_inst.rs2];
+        ram_daddr = rf[s_inst.rs1] + assemble_s_imm(s_inst);
+        unique casez (inst)
+          SW: ram_wsize = WORD;
+          SH: begin
+            ram_wsize = HWORD;
+            ram_wdata <<= ram_daddr[1] * 16;
+          end
+          SB: begin
+            ram_wsize = BYTE;
+            ram_wdata <<= ram_daddr[1:0] * 8;
+          end
+        endcase
       end
 
       // Memory-ordering instructions
```
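With this change, the core shifts store data into its byte lane(s) (`ram_wdata <<= ...`) and picks load data out of the whole word it reads (`ram_rdata[ram_daddr[1:0]*8+:8]`). The `mp_ram` module itself is not shown here, so the following is only a sketch of how a word-organized RAM with byte-lane writes could look. The names `byte_lane_pkg`, `byte_lane_ram`, `wsize_e`, `NONE`/`BYTE`/`HWORD`/`WORD`, `byte_enables` and `load_extract` are hypothetical and may differ from the course's `mp_ram` and `wsize_t`; the instruction port (`iaddr`/`inst`) is omitted. The read is asynchronous, because the single-cycle core uses `ram_rdata` in the same cycle in which it computes `ram_daddr`.

```systemverilog
// Hypothetical sketch of a word-organized data RAM with byte-lane writes.
package byte_lane_pkg;
  typedef enum logic [1:0] {NONE, BYTE, HWORD, WORD} wsize_e;

  // Byte enables for a store of size `wsize` at byte offset `addr` within a word.
  function automatic logic [3:0] byte_enables(wsize_e wsize, logic [1:0] addr);
    case (wsize)
      BYTE:    return 4'b0001 << addr;
      HWORD:   return 4'b0011 << {addr[1], 1'b0};  // Halfwords are 2-byte aligned
      WORD:    return 4'b1111;
      default: return 4'b0000;  // NONE: no write
    endcase
  endfunction

  // Extract and sign-/zero-extend a load result from the whole word `rdata`,
  // using the same lane selection as the LH/LHU/LB/LBU branches of the core.
  function automatic logic [31:0] load_extract(logic [31:0] rdata, wsize_e size,
                                               logic [1:0] addr, logic is_signed);
    logic [15:0] h;
    logic [7:0]  b;
    h = rdata[{addr[1], 4'b0}+:16];  // Offset 0 or 16
    b = rdata[{addr, 3'b0}+:8];      // Offset 0, 8, 16 or 24
    case (size)
      BYTE:    return is_signed ? {{24{b[7]}}, b} : {24'b0, b};
      HWORD:   return is_signed ? {{16{h[15]}}, h} : {16'b0, h};
      default: return rdata;
    endcase
  endfunction
endpackage

module byte_lane_ram #(
    parameter int unsigned DEPTH_LG2 = 2  // In 32-bit words, must be >= 1
) (
    input  logic                  clk,
    input  logic           [31:0] daddr,  // Byte address
    input  logic           [31:0] wdata,  // Already shifted into its lane(s) by the core
    input  byte_lane_pkg::wsize_e wsize,
    output logic           [31:0] rdata   // Whole word containing daddr
);
  import byte_lane_pkg::*;

  logic [31:0] words[2**DEPTH_LG2];
  logic [DEPTH_LG2-1:0] widx;
  logic [3:0] be;

  assign widx  = daddr[DEPTH_LG2+1:2];  // Drop the 2-bit byte offset
  assign be    = byte_enables(wsize, daddr[1:0]);
  assign rdata = words[widx];           // Asynchronous read

  // Only the enabled lanes are written; the other bytes of the word are kept.
  always_ff @(posedge clk) begin
    if (be[0]) words[widx][7:0]   <= wdata[7:0];
    if (be[1]) words[widx][15:8]  <= wdata[15:8];
    if (be[2]) words[widx][23:16] <= wdata[23:16];
    if (be[3]) words[widx][31:24] <= wdata[31:24];
  end
endmodule
```

Because the RAM writes each lane in place, SB and SH data must already sit in the lane selected by the low address bits, which is why the core shifts `ram_wdata` before handing it over.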
```diff
--- /builds/fpga-lab/fpga-programming/code/riscv-single-cycle-boolean/mp_boolean.sv
+++ /builds/fpga-lab/fpga-programming/code/riscv-single-cycle-ram-boolean/mp_boolean.sv
@@ -5,8 +5,8 @@
 );
   logic rst;
   mp #(
-      .MEM_INIT_HEX_FILE("program.mem"),
-      .MEM_DEPTH_LG2(4)
+      .RAM_INIT_HEX_FILE("program.mem"),
+      .RAM_DEPTH_LG2(2)
   ) mp_i (
       .*
   );
```
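The depth parameter drops from 4 to 2 because the parameter comments in `mp.sv` change its unit from bytes to words. Assuming 32-bit (4-byte) words, as the 16-bit halves and 8-bit lanes of `ram_rdata` in the load code imply, both settings describe the same memory size:

\[
2^{\mathtt{MEM\_DEPTH\_LG2}} = 2^{4} = 16\ \text{bytes},
\qquad
2^{\mathtt{RAM\_DEPTH\_LG2}} \cdot 4\ \text{bytes} = 2^{2} \cdot 4\ \text{bytes} = 16\ \text{bytes}.
\]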
```text
1. Slice Logic
--------------

+-------------------------+-------+-------+------------+-----------+-------+
|        Site Type        |  Used | Fixed | Prohibited | Available | Util% |
+-------------------------+-------+-------+------------+-----------+-------+
| Slice LUTs              | 16815 |     0 |          0 |     32600 | 51.58 |
|   LUT as Logic          | 16815 |     0 |          0 |     32600 | 51.58 |
|   LUT as Memory         |     0 |     0 |          0 |      9600 |  0.00 |
| Slice Registers         |  1535 |     0 |          0 |     65200 |  2.35 |
|   Register as Flip Flop |  1535 |     0 |          0 |     65200 |  2.35 |
|   Register as Latch     |     0 |     0 |          0 |     65200 |  0.00 |
| F7 Muxes                |  1093 |     0 |          0 |     16300 |  6.71 |
| F8 Muxes                |   154 |     0 |          0 |      8150 |  1.89 |
+-------------------------+-------+-------+------------+-----------+-------+
* Warning! LUT value is adjusted to account for LUT combining.
```
```text
1. Slice Logic
--------------

+----------------------------+-------+-------+------------+-----------+-------+
|          Site Type         |  Used | Fixed | Prohibited | Available | Util% |
+----------------------------+-------+-------+------------+-----------+-------+
| Slice LUTs                 | 14225 |     0 |          0 |     32600 | 43.63 |
|   LUT as Logic             | 14194 |     0 |          0 |     32600 | 43.54 |
|   LUT as Memory            |    31 |     0 |          0 |      9600 |  0.32 |
|     LUT as Distributed RAM |    31 |     0 |            |           |       |
|     LUT as Shift Register  |     0 |     0 |            |           |       |
| Slice Registers            |  1023 |     0 |          0 |     65200 |  1.57 |
|   Register as Flip Flop    |  1023 |     0 |          0 |     65200 |  1.57 |
|   Register as Latch        |     0 |     0 |          0 |     65200 |  0.00 |
| F7 Muxes                   |   279 |     0 |          0 |     16300 |  1.71 |
| F8 Muxes                   |     3 |     0 |          0 |      8150 |  0.04 |
+----------------------------+-------+-------+------------+-----------+-------+
* Warning! LUT value is adjusted to account for LUT combining.
```
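The slice utilization has decreased from the first to the second report: Slice LUTs drop from 16815 to 14225 (about 15% fewer), slice registers from 1535 to 1023 (about a third fewer), and F7/F8 muxes from 1093/154 to 279/3. In the RAM version, 31 LUTs are used as distributed RAM, whereas the previous version used no LUTs as memory.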
Solution to Exercise 32
The phase-frequency detector compares its two input signals: the reference input and the feedback signal. If the feedback signal has a lower frequency than the reference, the detector raises the VCO input voltage until both input frequencies are equal again.
Division not active: the feedback signal is \(F_\mathrm{o}\) itself, so in the locked state \(F_\mathrm{i} = F_\mathrm{o}\) and the voltage at the VCO input does not rise.
Division active: \(F_\mathrm{o}\) is divided by \(N\) before it reaches the detector. Starting from \(F_\mathrm{o} = F_\mathrm{i}\), the detector inputs are no longer equal (\(F_\mathrm{i} \neq \frac{F_\mathrm{i}}{N}\)), so the voltage at the VCO input rises to make up for this difference.
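The detector keeps raising the voltage until \(F_\mathrm{i} \overset!= \frac{F_\mathrm{o}}{N} = k \frac{F_\mathrm{i}}{N}\), where \(k\) is the multiplication factor with \(F_\mathrm{o} = k F_\mathrm{i}\). This holds for \(k = N\), so \(F_\mathrm{o} = N F_\mathrm{i}\).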
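The lock condition \(k = N\) can also be checked with a very idealized, discrete-time frequency model. The sketch assumes that the detector and loop filter together act like an integrator on the frequency error \(F_\mathrm{i} - F_\mathrm{o}/N\) and that the VCO output frequency is proportional to its input voltage. Real PLLs also track phase, which this model ignores. The package name `pll_model_pkg`, the function `settle` and all parameter values are hypothetical.

```systemverilog
// Idealized frequency-domain model of the PLL loop (hypothetical sketch):
//   - detector + loop filter integrate the frequency error F_i - F_o/N,
//   - the VCO output frequency is proportional to its input voltage.
// The loop converges if 0 < gain * k_vco / n_div < 2.
package pll_model_pkg;
  // Returns the VCO output frequency F_o after `steps` loop iterations.
  function automatic real settle(real f_in,   // Reference frequency F_i
                                 int  n_div,  // Feedback divider N (1 = no division)
                                 real k_vco,  // VCO gain: frequency per volt
                                 real gain,   // Loop gain per iteration
                                 int  steps);
    real v;      // VCO input (control) voltage
    real f_out;  // VCO output frequency F_o
    v = 0.0;
    for (int i = 0; i < steps; i++) begin
      f_out = k_vco * v;
      // A divided feedback frequency below F_i gives a positive error and raises v.
      v += gain * (f_in - f_out / n_div);
    end
    return k_vco * v;
  endfunction
endpackage
```

For example, `settle(100.0, 4, 10.0, 0.1, 200)` settles at approximately 400.0, i.e., \(F_\mathrm{o} = 4 F_\mathrm{i}\), while `n_div = 1` reproduces \(F_\mathrm{o} = F_\mathrm{i}\).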
Solution to Exercise 38
For exchanging large amounts of data, the microprocessor and the FPGA use the shared RAM. The high-performance (HP) ports are connected to the memory through a dedicated interconnect called the Programmable Logic to Memory Interconnect, whereas the general-purpose (GP) ports go through the Central Interconnect. The Central Interconnect also routes data from other peripherals, so data exchange through the general-purpose ports competes with this traffic and is slowed down by the peripherals.
Solution to Exercise 37
Examples:
f1: a memory-mapped bus interface for x, y and the return value, or simply a module with two 32-bit inputs, one 32-bit output and a valid signal.
f2: two memory-mapped interfaces like AMBA AXI that can read from and write to two RAMs in parallel.
f3's input and f4's output can be similar to f2's interfaces. However, f3's output and f4's input can use a FIFO-based interface, as these data are only exchanged between f3 and f4 and do not need to be part of a processor bus.
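A FIFO-based link between f3 and f4 could look like the following minimal sketch: a synchronous FIFO with valid/ready handshakes on both sides. The module name `stream_fifo`, its parameters and the valid/ready signal names are assumptions for illustration and are not part of the exercise.

```systemverilog
// Hypothetical sketch of a FIFO between f3 (producer) and f4 (consumer).
module stream_fifo #(
    parameter int unsigned WIDTH     = 32,
    parameter int unsigned DEPTH_LG2 = 2   // Must be >= 1; holds 2**DEPTH_LG2 words
) (
    input  logic             clk,
    input  logic             rst,
    // Producer side (e.g., f3's output)
    input  logic [WIDTH-1:0] in_data,
    input  logic             in_valid,
    output logic             in_ready,
    // Consumer side (e.g., f4's input)
    output logic [WIDTH-1:0] out_data,
    output logic             out_valid,
    input  logic             out_ready
);
  logic [WIDTH-1:0] buffer[2**DEPTH_LG2];
  // Pointers carry one extra bit so that "full" and "empty" can be told apart.
  logic [DEPTH_LG2:0] wr_ptr, rd_ptr;
  logic push, pop;

  assign out_valid = (wr_ptr != rd_ptr);                                      // Not empty
  assign in_ready  = (wr_ptr != {~rd_ptr[DEPTH_LG2], rd_ptr[DEPTH_LG2-1:0]});  // Not full
  assign out_data  = buffer[rd_ptr[DEPTH_LG2-1:0]];

  // A word is transferred on a side only when valid and ready are both high.
  assign push = in_valid && in_ready;
  assign pop  = out_valid && out_ready;

  always_ff @(posedge clk)
    if (push) buffer[wr_ptr[DEPTH_LG2-1:0]] <= in_data;

  always_ff @(posedge clk, posedge rst)
    if (rst) begin
      wr_ptr <= '0;
      rd_ptr <= '0;
    end else begin
      if (push) wr_ptr <= wr_ptr + 1;
      if (pop) rd_ptr <= rd_ptr + 1;
    end
endmodule
```

Neither side needs bus addresses: f3 only waits for `in_ready` and f4 only for `out_valid`, and the FIFO absorbs short differences in their data rates.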
Solution to Exercise 39
The solution connects HP2 to gmem1. HP1 could be used too; however, I remember reading that HP0 and HP1 share some resources. TODO