Memory#
Learning goals#
Apply features of the language to describe memory
Introductory problem#
You are designing a game on an FPGA that uses various sound effects. The sounds are stored as a sequence of bytes or, more generally, words of arbitrary size. You want to use a random-access memory (RAM) as resource-efficient storage.
Design a memory module with the following requirements:
Must have the following interface:
module ram #(
int unsigned DEPTH_LG2 = 10,
int unsigned WORD_SIZE = 8
) (
input clk,
input [WORD_SIZE-1 : 0] din, // Data in
input [DEPTH_LG2-1 : 0] din_addr, // Address
input wen, // Write enable
output logic [WORD_SIZE-1 : 0] dout, // Data out
input [DEPTH_LG2-1 : 0] dout_addr // Address
);
The testbench must write values to the RAM that correspond to the least significant bits of the address and then check the written values using the read operation.
Implementation on the board#
Use a depth of 16 and word size of 4
Connect the:
least significant slide switches to din_addr
slide switches next to din_addr to din
most significant slide switches to dout_addr
most significant LEDs to dout
a button to wen
Check the logs to see whether the synthesizer inferred a RAM from your description. If your description does not mimic the behavior of an on-chip RAM primitive, it may be synthesized by other means. For example, in your synthesis logs you should see:
| Module Name      | RTL Object  | Inference | Size (Depth x Width) | Primitives |
+------------------+-------------+-----------+----------------------+------------|
| ram_boolean_test | dut/mem_reg | Implied   | 16 x 4               | RAM32M x 1 |
And:
+------+-------+------+
|      |Cell   |Count |
+------+-------+------+
...
|*     |RAM32M |     1|
RAM32M is a memory primitive available on 7 Series AMD FPGAs.
Testing on the board#
The RAM will likely contain zeroes in the beginning. Write at least one value to an address as follows:
Set an address using din_addr and non-zero data using din (otherwise you won’t notice whether you wrote any data or not).
Activate wen for at least one clock cycle.
For reading, set the address you wrote to using dout_addr.
The LEDs should show the data that you wrote. Note that this data is only available at the address that you wrote to and not at other addresses.
Tasks#
Read section 7 of SV Guide.
Quiz#
TODO
Mini-lecture#
Memory infrastructure on FPGAs#
FF
Block RAM
high capacity but requires one clock cycle (i.e., synchronous) to read/write
distributed RAM using LUTs, also called SelectRAM
low capacity but immediate read (i.e., asynchronous) capability, one clock write
Ultra RAM, high-bandwidth memory (HBM)
Case study:
The synthesizer infers Block RAM or distributed RAM based on:
Memory size: For example, our introductory example infers a distributed RAM. If we increase DEPTH_LG2 from 4 to 8, then we increase the size from 16*4 bits to 256*4 bits.
Number of read/write ports: For example, distributed RAM on 7 Series FPGAs supports quad port:
Distributed RAM configurations include:
…
Quad port
One port for synchronous writes and asynchronous reads
Three ports for asynchronous reads
However Block RAM supports only up to two ports:
… the two ports are symmetrical and totally independent, …
Read behavior: For example, it is not possible to read the data on Block RAMs asynchronously, i.e., in the same clock cycle. Distributed RAM supports this.
Note that distributed RAM can also be inferred even if we read the memory output synchronously. The synthesizer does this by adding flip-flops to the memory data outputs. This happens if the memory size is under a threshold. This is why the synthesizer infers a distributed RAM instead of a Block RAM in our introductory example.
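If the automatic choice is not the one you want, Vivado offers the ram_style synthesis attribute to request a specific implementation. The following is a minimal sketch and not part of the original material: the module name ram_style_sketch is hypothetical, the body mirrors the introductory interface, and whether the tool honors the request should still be checked in the synthesis log.

```systemverilog
// Sketch: same behavior as the introductory RAM, but with a request for Block RAM.
// The module name is hypothetical; ram_style is a Vivado synthesis attribute.
module ram_style_sketch #(
    int unsigned DEPTH_LG2 = 4,
    int unsigned WORD_SIZE = 4
) (
    input                          clk,
    input        [WORD_SIZE-1 : 0] din,       // Data in
    input        [DEPTH_LG2-1 : 0] din_addr,  // Write address
    input                          wen,       // Write enable
    output logic [WORD_SIZE-1 : 0] dout,      // Data out
    input        [DEPTH_LG2-1 : 0] dout_addr  // Read address
);
  // "block" requests Block RAM; "distributed" would request LUT-based RAM.
  (* ram_style = "block" *) logic [WORD_SIZE-1 : 0] mem[2**DEPTH_LG2];

  always_ff @(posedge clk) begin
    if (wen) mem[din_addr] <= din;
    dout <= mem[dout_addr];  // Synchronous read, which Block RAM requires
  end
endmodule
```

Simulators ignore the attribute, so the simulation behavior stays identical to the version without it.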
Ready-Valid Handshake#
a fundamental concept for communication between circuits
4.3 Ready-valid handshake in Common Design Patterns/Practices
Memory inference in synthesis#
Packed vs unpacked arrays#
packed: logic [7:0] mem
the logic bits are very closely related, e.g., the bits of a byte
unpacked: logic mem[8]
the logic bits are independent, e.g., a bit-addressable RAM
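The difference becomes visible when the arrays are used. The following is a small simulation-only sketch; the module and signal names are made up for illustration. It also shows the usual RAM declaration, which combines both kinds: an unpacked array of packed words.

```systemverilog
// Sketch: packed vs. unpacked arrays (all names are hypothetical).
module packed_vs_unpacked;
  logic [7:0] byte_packed;        // Packed: a single 8-bit vector
  logic       bits_unpacked [8];  // Unpacked: 8 separate 1-bit elements
  logic [7:0] mem [16];           // Typical RAM: unpacked array of packed words

  initial begin
    // A packed array can be assigned and used in arithmetic as a whole.
    byte_packed = 8'hA5;
    byte_packed = byte_packed + 1;

    // The elements of an unpacked array are accessed one by one.
    foreach (bits_unpacked[i]) bits_unpacked[i] = byte_packed[i];

    // The unpacked index selects the word, the packed index selects a bit in it.
    mem[3] = byte_packed;
    $display("byte_packed=%h mem[3][0]=%b bits_unpacked[1]=%b",
             byte_packed, mem[3][0], bits_unpacked[1]);
  end
endmodule
```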
Tracing large memory arrays in Verilator#
Verilator traces arrays up to a specific depth, so you may wonder why the waveform does not include some arrays. To include these, use the following argument:
verilator --trace-max-array 100 ...
# Traces up to a depth of 100
Initializing memory#
Manual
readmemh()
$readmemb(filename, memory_name [, start_addr[, finish_addr]]);
$readmemh(...
b and h stand for binary and hexadecimal
Vivado can synthesize $readmem*
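The two approaches can be compared in a short simulation-only sketch. The module name, the array names and the file name init_sketch.mem are assumptions for illustration; the file is written by the sketch itself so that it runs without any additional input.

```systemverilog
// Sketch: manual initialization vs. $readmemh with an address range.
// Module, array and file names are hypothetical.
module init_sketch;
  logic [3:0] mem_manual [16];
  logic [3:0] mem_file   [16];
  int fd;

  initial begin
    // Manual: every word gets the least significant bits of its address.
    foreach (mem_manual[i]) mem_manual[i] = 4'(i);

    // Create a small hex file with one word per line.
    fd = $fopen("init_sketch.mem", "w");
    $fdisplay(fd, "a");
    $fdisplay(fd, "b");
    $fdisplay(fd, "c");
    $fclose(fd);

    // Load the three words into addresses 4 to 6.
    // All other addresses keep their previous (uninitialized) value.
    $readmemh("init_sketch.mem", mem_file, 4, 6);

    $display("mem_manual[5]=%h mem_file[4]=%h mem_file[6]=%h",
             mem_manual[5], mem_file[4], mem_file[6]);
  end
endmodule
```

Without start_addr and finish_addr, the words are loaded starting from the lowest address of the array.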
Example#
module ram #(
...
string INIT_HEX_FILE = "init.mem"
)
...
initial $readmemh(INIT_HEX_FILE, mem);
endmodule
1
2
3
4
5
6
7
8
9
a
b
c
d
e
f
0
module ram_boolean_test #(
int unsigned DEPTH_LG2 = 4,
int unsigned WORD_SIZE = 4
) (
input clk,
input [15:0] sw,
output [15:0] led,
input [3:0] btn
);
logic wen;
logic [WORD_SIZE-1 : 0] din, dout;
logic [DEPTH_LG2-1 : 0] din_addr, dout_addr;
ram #(
.DEPTH_LG2(DEPTH_LG2),
.WORD_SIZE(WORD_SIZE),
.INIT_HEX_FILE("test_data.mem")
) ram_i (
.*
);
assign din_addr = sw[0+:4], din = sw[4+:4], dout_addr = sw[15-:4], led[15-:4] = dout, wen = |btn;
endmodule
You have to add the initialization file to your project. Otherwise it won’t be visible to Vivado. If you use the extension .mem, then Vivado will automatically recognize the file as a memory initialization file.
Solution for the introductory problem#
module ram #(
int unsigned DEPTH_LG2 = 10,
int unsigned WORD_SIZE = 8
) (
input clk,
input [WORD_SIZE-1 : 0] din, // Data in
input [DEPTH_LG2-1 : 0] din_addr, // Address
input wen, // Write enable
output logic [WORD_SIZE-1 : 0] dout, // Data out
input [DEPTH_LG2-1 : 0] dout_addr // Address
);
type (din) mem[2**DEPTH_LG2];
always_ff @(posedge clk) begin
if (wen) mem[din_addr] <= din;
dout <= mem[dout_addr];
end
endmodule
We drive and check the signals @(negedge clk). Let us discuss the reason using the following example: If we activated wen right after @(posedge clk), then the RAM would update the memory immediately. However, this behavior is not realistic. In reality, the signals are typically driven between two rising edges and must be stable before the setup time window of a flip-flop. We get a similar behavior by driving the signals after @(negedge clk).
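To check the output in the same clock cycle after driving a combinational signal, a basic assert won’t work. An immediate assert checks the output at the moment the statement executes, i.e., before the combinational logic has propagated the new value. A deferred assertion (assert final) checks the settled value at the end of the time step instead. The ASSERT_FINAL macro further below wraps this.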
module tb #(
int unsigned DEPTH_LG2 = 10,
int unsigned WORD_SIZE = 8
);
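logic clk = 0;  // Must be initialized, otherwise !clk stays X in 4-state simulators
logic wen;
logic [WORD_SIZE-1 : 0] din, dout;
logic [DEPTH_LG2-1 : 0] din_addr, dout_addr;
// Pass the testbench parameters on to the design under test
ram #(.DEPTH_LG2(DEPTH_LG2), .WORD_SIZE(WORD_SIZE)) dut (.*);
type (din) test_word;
always #1 clk = !clk;
initial begin
test_word = '1;
wen = 0;
// Drive and check on the falling edge to stay away from the sampling edge
@(negedge clk);
din = test_word;
din_addr = 1;
wen = 1;
@(negedge clk);  // The word was written on the rising edge in between
wen = 0;
dout_addr = 1;
@(negedge clk);  // The word was read on the rising edge in between
assert (test_word == dout);
@(negedge clk) $finish;
end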
import util::dump_and_timeout;
initial dump_and_timeout;
endmodule
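The testbench above exercises a single address. The requirement from the introductory problem, writing the least significant bits of each address and reading them back, could be covered along the following lines. This is a sketch under assumptions: the module name tb_all_addresses is made up, the parameters are reduced to 4 to keep the simulation short, and the ram module from the solution above is expected to be compiled together with it.

```systemverilog
// Sketch: write every address, then read every address back.
// Assumes the ram module from the solution is compiled alongside.
module tb_all_addresses #(
    int unsigned DEPTH_LG2 = 4,
    int unsigned WORD_SIZE = 4
);
  logic clk = 0;
  logic wen = 0;
  logic [WORD_SIZE-1 : 0] din, dout;
  logic [DEPTH_LG2-1 : 0] din_addr, dout_addr;

  ram #(.DEPTH_LG2(DEPTH_LG2), .WORD_SIZE(WORD_SIZE)) dut (.*);

  always #1 clk = !clk;

  initial begin
    // Write phase: one word per clock cycle, driven on the falling edge.
    for (int unsigned a = 0; a < 2 ** DEPTH_LG2; a++) begin
      @(negedge clk);
      wen      = 1;
      din_addr = DEPTH_LG2'(a);
      din      = WORD_SIZE'(a);  // Least significant bits of the address
    end
    @(negedge clk);  // The last word was written on the rising edge in between
    wen = 0;

    // Read phase: set the address, let one rising edge pass, then check.
    for (int unsigned a = 0; a < 2 ** DEPTH_LG2; a++) begin
      dout_addr = DEPTH_LG2'(a);
      @(negedge clk);
      assert (dout == WORD_SIZE'(a))
      else $error("Address %0d: expected %0h, got %0h", a, WORD_SIZE'(a), dout);
    end
    $finish;
  end
endmodule
```

The size casts truncate the loop counter to the address and word widths, so the written value is the least significant part of the address even when WORD_SIZE is smaller than DEPTH_LG2.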
module ram_boolean_test #(
int unsigned DEPTH_LG2 = 4,
int unsigned WORD_SIZE = 4
) (
input clk,
input [15:0] sw,
output [15:0] led,
input [3:0] btn
);
logic wen;
logic [WORD_SIZE-1 : 0] din, dout;
logic [DEPTH_LG2-1 : 0] din_addr, dout_addr;
ram #(
.DEPTH_LG2(DEPTH_LG2),
.WORD_SIZE(WORD_SIZE)
) ram_i (
.*
);
assign din_addr = sw[0+:4], din = sw[4+:4], dout_addr = sw[15-:4], led[15-:4] = dout, wen = |btn;
endmodule
In every Verilator testbench we want to dump the signals and have a maximum simulation duration. We introduce the following task for this purpose:
package util;
task static dump_and_timeout(time t = 100, string dumpfile = "signals.fst");
/* Creates the signal dump for a waveform viewer and stops the
simulation after the timeout.
To be called in an `initial` block in a testbench.
*/
begin
$dumpfile(dumpfile);
$dumpvars;
#t $fatal(1, "*** Timeout ***");  // First argument is the finish number, then the message
$finish;
end
endtask
task automatic clk_and_dump_and_timeout(ref logic clk, input time period = 2, time timeout = 100,
string dumpfile = "signals.fst");
/* Warning: if period has a unit, e.g., 2ps, and another file sets
* a larger precision, e.g., ns, then period may be converted to 0, because
* time datatype does not support floats */
fork
forever #(period / 2) clk = ~clk;
dump_and_timeout(timeout, dumpfile);
join
endtask
endpackage
`ifdef VERILATOR
// _verilator does not support assert final
// https://github.com/verilator/verilator/issues/5081
// Use #1 delay instead.
`define ASSERT_FINAL(arg) #1 assert (arg)
`else
`define ASSERT_FINAL(arg) assert final (arg)
`endif
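A possible use of the macro is sketched below. The module name and the adder are hypothetical and only serve as a combinational circuit to check; the macro definition is repeated so that the sketch compiles on its own.

```systemverilog
// Sketch: checking a combinational output right after driving its inputs.
// The macro is repeated from above; module and signal names are hypothetical.
`ifdef VERILATOR
`define ASSERT_FINAL(arg) #1 assert (arg)
`else
`define ASSERT_FINAL(arg) assert final (arg)
`endif

module assert_final_sketch;
  logic [3:0] a, b;
  logic [4:0] sum;
  assign sum = a + b;  // Combinational logic under test

  initial begin
    a = 4'd3;
    b = 4'd4;
    // An immediate `assert (sum == 7)` at this point could still see the old
    // value of sum, because the continuous assignment has not been re-evaluated.
    `ASSERT_FINAL(sum == 5'd7);
    #1 $finish;
  end
endmodule
```

In the Verilator branch the check advances the simulation time by one unit, so any stimulus that follows is shifted accordingly.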
Homework#
Describe a RAM which can read data in the same clock cycle but writes after a single clock cycle, with the same interface as in the introductory problem.
Verify the RAM by copying your testbench from the introductory problem and tweaking it.
Implement the RAM on the board similar to the introductory problem.
Which primitive was used for your description? Provide the synthesizer log snippet that proves your answer.
Implement a sound player. Requirements:
Has the following interface:
module sound_player #(
string ROM_INIT_HEX_FILE = "sound.mem",
int unsigned ROM_DEPTH_LG2 = 4
) (
input clk, rst,
output o
);
The sound data is stored in a RAM which will be used read-only. The ROM is initialized using the bitstream (e.g., $readmemh).
The sound data in the memory has the following structure:
CLK_DIV: Clock divisor value for creating the sound frequency for the current note.
CLK_CYCLES: Duration of the note played using CLK_DIV in clock cycles.
CLK_DIV1
CLK_CYCLES1
CLK_DIV2
CLK_CYCLES2
...
An example file follows:
2 // Creates a signal with a frequency of MAIN_CLK_FREQ/2
5 // ... for 5 clock cycles.
3 // MAIN_CLK_FREQ/3
7 // ... for 7 clock cycles.
0 // Sets the output to low
10 // ... 10 clock cycles.
0 //
0 // Skips.
Note that the above values won’t create any audible sound if MAIN_CLK_FREQ is in the MHz range. After you have verified your module with these values, implement your module on the FPGA board and test it with audible frequencies. For example, sound.mem contains the Imperial March, which was produced using notes_to_soundmem.py.
If CLK_DIV is zero or one, then the output o is low or high, respectively.
If CLK_CYCLES is zero, then the circuit skips immediately to the next note.
Use the dynamic clock divider and RAM that we implemented before.
Board implementation:
Use the left-most slide switch for muting. The sound output is active when the slide switch is high.
Use the push buttons for reset.
Optional ideas:
Toggle an on-board LED each time a new note is encountered.
Configurable playback speed
Playback speed display using seven segment display
Configurable volume
Warning
Use a speaker with a volume knob if you did not implement configurable volume. The sudden full volume sound may be dangerous.