High-level synthesis#
Learning goals#
Integrate logic that was implemented through high-level synthesis into a system-on-a-chip
Understand how an accelerator interacts with a microprocessor on a system-on-a-chip
Introductory problem#
After implementing a processor, you want to focus on your accelerator. As you remember, you want to accelerate data signing on the FPGA. Implementing a data signing module in SystemVerilog can be very time-consuming, so you opt for writing your algorithms in a high-level language and synthesizing them afterwards.
As a starter, you come up with a simple calculation that multiplies two vectors component-wise and accumulates the products:
#include "macc.hpp"
// Multiply and accumulate
int macc(vec_t &xs, vec_t &ys) {
  int sum = 0;
  for (auto i = 0; i < ARRAY_SIZE; ++i)
    sum += xs[i] * ys[i];
  return sum;
}
#include <array>
const auto ARRAY_SIZE = 1 << 8;
// Must be reasonably high. Otherwise Vitis does not infer an M-AXI interface
using vec_t = std::array<int, ARRAY_SIZE>;
int macc(vec_t &xs, vec_t &ys);
The corresponding testbench:
#include "macc.hpp"
#include <cstdlib> // exit, EXIT_FAILURE
#include <iostream>
int main() {
  vec_t xs, ys;
  unsigned int i = 0;
  for (auto &x : xs)
    x = ++i;
  for (auto &y : ys)
    y = 1;
  auto sum = macc(xs, ys);
  std::cout << "result: " << sum << std::endl;
  std::cout << "expect: " << (ARRAY_SIZE + 1) * ARRAY_SIZE / 2 << std::endl;
  // Assertion
  if (sum != (ARRAY_SIZE + 1) * ARRAY_SIZE / 2)
    exit(EXIT_FAILURE);
}
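The expected value in the testbench is the closed form of 1 + 2 + ... + n with n = ARRAY_SIZE, because every element of ys is 1. As a quick cross-check outside of Vitis, the same calculation could be sketched in Python. The helper name `macc_reference` is hypothetical and not part of the course sources.

```python
ARRAY_SIZE = 1 << 8  # same size as in macc.hpp


def macc_reference(xs, ys):
    """Multiply two vectors component-wise and accumulate the products."""
    return sum(x * y for x, y in zip(xs, ys))


xs = list(range(1, ARRAY_SIZE + 1))  # 1, 2, ..., ARRAY_SIZE as in the testbench
ys = [1] * ARRAY_SIZE

result = macc_reference(xs, ys)
expect = (ARRAY_SIZE + 1) * ARRAY_SIZE // 2  # closed form used by the testbench
print("result:", result)
print("expect:", expect)
```

Both lines should print 32896 for ARRAY_SIZE = 256, since 257 * 256 / 2 = 32896.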
How would you integrate this functionality into a system-on-a-chip?
Requirements#
Use Vitis, which is a high-level synthesis tool by AMD. In this chapter we will use the PYNQ-Z2 platform to gain hands-on experience with another kind of reconfigurable chip: a system-on-a-chip that combines a hard processor with an FPGA. The Jupyter-notebook-based interface to the FPGA will ease the interaction with our accelerator.
Integrate the accelerator and carry out a calculation.
Tasks#
Read chapter 22 of the Zynq MPSoC book up to and including section 22.2.
Quiz#
Mini-lecture#
High-level synthesis#
Examples using other languages:
Accelerator design language based on Python
XLS: Accelerated HW Synthesis based on Rust
Interfaces#
In high-level programming, a function encapsulates an algorithm with defined inputs and outputs. These inputs and outputs are declared as arguments and return value, respectively. However, a function can also return a value through reference or pointer arguments.
You have seen many kinds of interfaces so far, e.g., the memory interface for RAM and AMBA APB. Imagine you want to convert a C++ function to an HDL module. How would you synthesize the following function interfaces?
int f1(int x, int y);
void f2(int *input, int *output);
int* f3(int *input);
int* f4(int *input);
// to be used as f4(f3)
The circuit generated through HLS can have a plethora of interfaces, e.g.,
one or more ports that are as wide as the corresponding arguments of the synthesized function
a memory-mapped interface with address, data and enable signals, e.g., an AMBA AXI-based bus
a streaming or FIFO-based interface
a bus-based configuration interface to start/stop the circuit or check its status (e.g., idle, ready).
Tools#
Vitis Unified Software Platform#
Vitis Unified Software Platform caters for all software development aspects for FPGAs.
Vitis Unified IDE is an integrated development environment that acts as the frontend tool for the Vitis Unified Software Platform. Vitis Unified IDE can for example create an HDL implementation of a software-based algorithm, which can in turn be packaged as:
an IP that can be used in Vivado
an executable in .xo format that can be executed on an FPGA accelerator card running Xilinx Runtime (XRT).
Vitis Unified IDE uses the following tools in the background:
Vitis HLS including v++
Vivado for the circuit to bitstream flow
Let us refer to the Vitis Unified IDE as Vitis in the following sections.
PYNQ#
Python productivity for ZYNQ
As a framework#
a software framework which combines Python + hardware acceleration
a support structure comprising joined parts, arrangement of support beams that represent a building’s general shape
software providing generic functionality that can be selectively changed by additional code. This enables application-specific software
PYNQ can be seen as a starting point for ideas
Technically#
brings Linux, Jupyter notebooks and Python together
software components:
generic Python APIs for the accelerator
Linux drivers
hardware components:
hardware libraries (overlays), e.g., audio, video processing etc
Image processing example#
Let us try the example on page 560 of the MPSoC book together:
Log in to Jupyter on a PYNQ board.
Open a terminal:
  On the right-hand side, click on New🔽
  Click on Open a terminal
Run the following commands:
  pip3 install pynq-helloworld --no-build-isolation
  pynq get-notebooks pynq-helloworld -p /home/xilinx/jupyter_notebooks
Close the terminal.
Open the notebook pynq-world/resizer_pl.ipynb and note the processing time at the end.
Open the notebook pynq-world/resizer_ps.ipynb and compare the processing time.
Key observations:
24 ms to 7 ms processing time
3x speed-up
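The speed-up follows directly from the two processing times. A minimal sketch of the arithmetic, assuming that 24 ms belongs to the software (PS) notebook and 7 ms to the hardware (PL) notebook; your own measurements will differ slightly:

```python
t_ps_ms = 24.0  # assumed: processing time noted in resizer_ps.ipynb
t_pl_ms = 7.0   # assumed: processing time noted in resizer_pl.ipynb

speedup = t_ps_ms / t_pl_ms
print(f"speed-up: {speedup:.1f}x")
```

With these numbers the ratio is about 3.4, which rounds to the 3x quoted above.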
Hardware architecture for the image resizer#
Resize IP
DMA
processor system (PS) <-AXI-> programmable logic (PL)
ARM processor <-AXI-GeneralPurpose-> Resize IP
Memory controller <-AXI-HighPerf-> DMA
data width converter between DMA and Resize IP based on AXI-stream datawidth converter
32 to 24 bit converter (RGB)
Look at the architectural overview of the Zynq-7000 SoC. What could be the reason for naming the ports general-purpose and high-performance?
Links#
Solution to the introductory problem#
First, we synthesize our C++ code using Vitis, and then we use Vivado to implement our design on the board.
Vitis#
Project creation#
To create the project, follow the steps below. If a setting is not mentioned, leave it at its default.
Start Vitis
If you don’t have an existing workspace, create one. I recommend setting your code folder as the workspace.
On the Welcome tab, click on HLS Development -> Create Component -> Create Empty HLS component:
Name and location: I recommend:
  creating a folder for the sources, e.g., macc, and then using it as location
  using simply hls as component name
Configuration File: leave at the defaults
Source Files:
  DESIGN FILES: add macc.cpp and macc.hpp
  select macc as top function
  TEST BENCH FILES: add main.cpp
Hardware: xc7z020clg400-1
Settings:
  clock: 50MHz
  flow_target: Vitis Kernel Flow Target
Summary: Click on Finish.
Executing the flow#
Click on C SIMULATION -> ▶Run. Do not activate Code Analyzer. The process should output:
result: 32896
expect: 32896
Run C SYNTHESIS. Notable output regarding the interface:
INFO: [RTGEN 206-500] Setting interface mode on port 'macc/gmem' to 'm_axi'.
INFO: [RTGEN 206-500] Setting interface mode on port 'macc/xs' to 's_axilite & ap_none'.
INFO: [RTGEN 206-500] Setting interface mode on port 'macc/ys' to 's_axilite & ap_none'.
INFO: [RTGEN 206-500] Setting interface mode on function 'macc' to 's_axilite & ap_ctrl_chain'.
INFO: [HLS 200-1030] Apply Unified Pipeline Control on module 'macc' pipeline 'VITIS_LOOP_6_1' pipeline type 'loop pipeline'
INFO: [RTGEN 206-100] Bundling port 'return', 'xs' and 'ys' to AXI-Lite port control.
We see here the three aspects of an interface that we have seen before.
The same info can also be seen in REPORTS -> Synthesis.
Schedule Viewer shows a time analysis of the generated circuit.
Kernel Guidance shows performance improvement recommendations.
C/RTL COSIMULATION allows a more accurate cycle-based analysis.
PACKAGE creates a package that can be imported in a Vivado project. The package is under hls/impl/ip/*.zip.
IMPLEMENTATION places and routes the circuit using out-of-context (OOC) synthesis. Out-of-context implies that the environment is not synthesized, e.g., the processor that will use the circuit. OOC in the context of Vitis is useful to get area and timing estimates for the circuit.
Integration of the IP in Vivado#
We want to connect the circuit to the hard ARM processor on the Zynq FPGA.
Start Vivado
Create a new project with Project location macc-single-channel-pynq-z2 under your code folder. You can use prj as your Project name.
  Project Type: RTL Project
  Add Sources: none
  Add Constraints: none
  Default Part: xc7z020clg400-1
  Click on Finish
In the new project:
  PROJECT MANAGER -> ⚙ Settings
  Project Settings -> IP -> Repository -> IP Repositories
  Add a new repository with the path from the packaging step of the previous Vitis flow (ending with hls/impl/ip).
  Close settings
  IP Integrator -> Create Block Design -> Design name -> use the name macc
  Diagram -> add IP using ➕ -> search for macc and add it
  Add ZYNQ7 Processing System
  Configure the processing system using double-click:
    In Zynq Block Design click on High Performance AXI 32b/64b ...
    Activate S AXI HP0 Interface.
    Click OK.
  Run Connection Automation -> All Automation -> OK
  Run Block Automation connects the fixed IO like DDR. (Connecting them does not make a difference, however. Probably they are connected by the tool automatically.)
  In the project settings -> General -> Top module name -> write macc
  Generate Bitstream
Using a script for HLS and bitstream generation#
Instead of going through the whole GUI flow, you can use the files under repo:code/macc-single-channel-pynq-z2 and run make.
Using the overlay in PYNQ#
Log in to the Jupyter environment on PYNQ. The default password is xilinx.
Upload your overlay and handoff file.
Create a new notebook.
Run the following code:
import numpy as np
from pynq import Overlay, allocate

ARRAY_SIZE = 2**8
overlay = Overlay("macc.bit")

# Allocate memory for data exchange between microprocessor and macc
xs = allocate(ARRAY_SIZE, np.int32)
ys = allocate(ARRAY_SIZE, np.int32)

# Convenience variable for the registers
regs = overlay.macc.register_map

# Initialize memory addresses for the input buffers `xs` and `ys`. Vitis HLS
# generates 64 bit addresses as default, `*_1` and `*_2` are for the least and
# most significant bits of the address. PYNQ-Z2 is based on a Zynq 7000
# series FPGA which includes a Cortex-A9 MPCore processor, which in turn has a
# 32bit data path.
regs.xs_1 = xs.device_address
regs.ys_1 = ys.device_address

# Fill input data
for i in range(ARRAY_SIZE):
    xs[i] = i + 1
    ys[i] = 1

# Wait until idle before starting the IP
while not regs.CTRL.AP_IDLE:
    pass
regs.CTRL.AP_START = 1

# Wait until done
while not regs.CTRL.AP_DONE:
    pass

display(regs)
assert regs.ap_return.ap_return == sum(xs)  # Assuming ys[:] == 1
The result of the calculation is available in regs.ap_return.
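The code comment above mentions that a 64 bit address is spread over two 32 bit registers, `*_1` for the lower and `*_2` for the upper half. The notebook code only writes the lower half. A small sketch of the general split follows; the helper name `split_address` and the example address are made up for illustration and are not part of PYNQ.

```python
def split_address(address):
    """Split a 64 bit address into its (low, high) 32 bit words."""
    low = address & 0xFFFFFFFF
    high = (address >> 32) & 0xFFFFFFFF
    return low, high


# Example address, chosen arbitrarily; it fits into 32 bits
low, high = split_address(0x16800000)
print(hex(low), hex(high))
```

On the board, low would correspond to the value written to regs.xs_1 and high to regs.xs_2. Since the Zynq-7000 uses 32 bit addresses, the upper word should stay 0, which is presumably why writing only `*_1` is enough here.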
Warning
The IP gets stuck at AP_START=1 if the calculation is restarted by setting AP_START=1 again.
Workaround: If you want to repeat the calculation, reconfigure the FPGA by executing Overlay(...).
I could not fix this issue. It could be related to PYNQ, because the PYNQ version I used (3.0.1) is meant for Vivado version 2022.1.
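Because the busy-wait loops in the notebook code never return when the IP is stuck, a polling helper with a timeout may make such a situation easier to notice. The following is only a sketch: `wait_until` is a hypothetical helper, not a PYNQ function, and the status bit is replaced by a stand-in so that the snippet runs without a board.

```python
import time


def wait_until(condition, timeout_s=1.0, poll_interval_s=0.001):
    """Poll condition() until it is truthy. Return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while not condition():
        if time.monotonic() > deadline:
            return False
        time.sleep(poll_interval_s)
    return True


# Stand-in for a status bit that becomes 1 on the third poll
polls = iter([0, 0, 1])
finished = wait_until(lambda: next(polls))

# Stand-in for a status bit that never becomes 1
stuck = wait_until(lambda: 0, timeout_s=0.01)
print(finished, stuck)
```

On the board, the condition could be something like `lambda: regs.CTRL.AP_DONE`, so that the notebook reports a timeout instead of hanging.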
Homework#
In the introductory problem we used the same AXI port for the inputs xs and ys. The following HLS configuration instead uses
two different AXI ports for parallel loading of xs and ys
32 bit addressing instead of 64 bit to save unnecessary address bits, as PYNQ-Z2 does not require 64 bit addressing.
part=xc7z020clg400-1
[hls]
clock=50MHz
flow_target=vitis
package.output.syn=true
# https://docs.amd.com/r/en-US/ug1399-vitis-hls/Interface-Configuration
syn.interface.m_axi_auto_max_ports=1
# Creates a separate AXI port for each argument
syn.interface.m_axi_addr64=0
# Disables default 64bit address width. Results in 32 bit addresses
syn.file=../macc/macc.cpp
syn.file=../macc/macc.hpp
tb.file=../macc/main.cpp
syn.top=macc
Synthesize and package it using the config above
Create a Vivado project and connect it to the microprocessor by introducing a second high-performance port.
Check if the IP works using the Python code we used before. Note that you have to replace {x,y}s_1 with {x,y}s, because we synthesize using 32 bit addresses this time.