Software Reference Manual: Open FPGA Stack
1.0 Introduction
1.1 Audience
The information presented in this document is intended to be used by software developers looking to increase their knowledge of the OPAE SDK user-space software stack and the kernel-space linux-dfl drivers. This information is intended as a starting point, with links to where users can deep dive on specific topics.
1.2 Terminology
Term | Abbreviation | Description |
---|---|---|
Open FPGA Stack | OFS | A modular collection of hardware platform components, open source software, and broad ecosystem support that provides a standard and scalable model for AFU and software developers to optimize and reuse their designs. |
Accelerator Functional Unit | AFU | Hardware Accelerator implemented in FPGA logic which offloads a computational operation for an application from the CPU to improve performance. Note: An AFU region is the part of the design where an AFU may reside. This AFU may or may not be a partial reconfiguration region. |
Board Management Controller | BMC | Supports features such as power sequence management and board monitoring through on-board sensors. |
FPGA Interface Manager | FIM | Provides platform management, functionality, clocks, resets and standard interfaces to host and AFUs. The FIM resides in the static region of the FPGA and contains the FPGA Management Engine (FME) and I/O ring. |
Platform Interface Manager | PIM | An interface manager that comprises two components: a configurable platform specific interface for board developers and a collection of shims that AFU developers can use to handle clock crossing, response sorting, buffering and different protocols. |
Intel Virtualization Technology for Directed I/O | Intel VT-d | Extension of the VT-x and VT-I processor virtualization technologies which adds new support for I/O device virtualization. |
Single-Root Input-Output Virtualization | SR-IOV | Allows the isolation of PCI Express resources for manageability and performance. |
Host Exerciser Module | HEM | Host exercisers are used to exercise and characterize the various host-FPGA interactions, including Memory Mapped Input/Output (MMIO), data transfer from host to FPGA, PR, host to FPGA memory, etc. |
Device Feature List | DFL | A concept inherited from OFS. The DFL drivers provide support for FPGA devices that are designed to support the Device Feature List. The DFL, which is implemented in RTL, consists of a self-describing data structure in PCI BAR space that allows the DFL driver to automatically load the drivers required for a given FPGA configuration. |
Best Known Configuration | BKC | The exact hardware configuration Intel has optimized and validated the solution against. |
Open Programmable Acceleration Engine | OPAE | The OPAE SDK is a software framework for managing and accessing programmable accelerators (FPGAs). It consists of a collection of libraries and tools to facilitate the development of software applications and accelerators. The OPAE SDK resides exclusively in user-space. |
Memory Mapped Input/Output | MMIO | Users may map and access both control registers and system memory buffers with accelerators. |
FPGA Management Engine | FME | Performs reconfiguration and other infrastructure functions. Each FPGA device only has one FME. |
Input/Output Control | IOCTL | System calls used to manipulate underlying device parameters of special files. |
Virtual Function Input/Output | VFIO | An IOMMU/device agnostic framework for exposing direct device access to userspace. |
Configuration and Status Register | CSR | Communication with the AFU is achieved by reading/writing CSRs and reading/writing shared memory buffers. |
Port | N/A | Represents the interface between the static FPGA fabric and a PR region containing an AFU. |
Advanced Error Reporting | AER | The PCIe AER driver is the extended PCI Express error reporting capability providing more robust error reporting. |
2.0 OPAE Software Development Kit (SDK)
The OPAE C library is a lightweight user-space library that provides abstraction for FPGA resources in a compute environment. Built on top of the OPAE Intel® FPGA driver stack that supports Intel® FPGA platforms, the library abstracts away hardware specific and OS specific details and exposes the underlying FPGA resources as a set of features accessible from within software programs running on the host. The OPAE source code is available on the OPAE SDK repository, under the opae-sdk tag.
These features include the acceleration logic configured on the device, as well as functions to manage and reconfigure the device. The library enables user applications to transparently and seamlessly leverage FPGA-based acceleration.
Most of the information related to OPAE can be found on the official OFS Site page. The following is a summary of the information present on this web page:
- Configuration options present in the OPAE SDK build and installation flow
- The steps required to build a sample OPAE application
- An explanation of the basic application flow
- A reference for the C, C++, and Python APIs
- An explanation of the OPAE Linux Device Driver Architecture
- Definitions for the various user-facing OPAE SDK tools
The remaining sections on OPAE in this document are unique and build on basic principles explained in opae.github.io.
Table: Additional Websites and Links
Document | Link |
---|---|
OPAE SDK on github | OPAE SDK repository |
OPAE Documents | OFS Site |
pybind11 | https://pybind11.readthedocs.io/en/stable/ |
CLI11 | https://github.com/CLIUtils/CLI11 |
spdlog | https://github.com/gabime/spdlog |
2.0 OPAE C API
2.1 libopae-c
2.1.1 Device Abstraction
The OPAE C API relies on two base abstractions concerning how the FIM and accelerator are presented to and manipulated by the user. The FIM is concerned with management functionality. Access to the FIM and its interfaces is typically restricted to privileged (root) users. The accelerator contains the user-defined logic in its reconfigurable region. Most OPAE end-user applications are concerned with querying and opening the accelerator device, then interacting with the AFU via MMIO and shared memory.
2.1.1.1 Device types
The C enum fpga_objtype defines two variants. The FPGA_DEVICE variant corresponds to the FIM portion of the device, and the FPGA_ACCELERATOR variant refers to the accelerator, also known as the AFU.
An FPGA_DEVICE refers loosely to the sysfs tree rooted at the dfl-fme.X directory, for example /sys/class/fpga_region/region0/dfl-fme.0, and its associated device file /dev/dfl-fme.0.
An FPGA_ACCELERATOR refers loosely to the sysfs tree rooted at the dfl-port.X directory, for example /sys/class/fpga_region/region0/dfl-port.0, and its associated device file /dev/dfl-port.0.
The number X in dfl-fme.X and dfl-port.X refers to a numeric ID that is assigned by the DFL device driver to uniquely identify an instance of the FIM/Accelerator. Systems with multiple FPGA acceleration devices will have multiple dfl-fme.X’s and matching dfl-port.X’s.
2.1.1.2 Tokens and Handles
An fpga_token is an opaque data structure that uniquely represents an FPGA_DEVICE or an FPGA_ACCELERATOR. Tokens convey existence, but not ownership. Tokens are retrieved via the OPAE enumeration process described below using the fpgaEnumerate() call.
An fpga_handle is an opaque data structure that corresponds to an opened device instance, whether FPGA_DEVICE or FPGA_ACCELERATOR. A handle is obtained from a token via the fpgaOpen() call. A handle conveys that the /dev/dfl-fme.X or /dev/dfl-port.X device file has been opened and is ready for interaction via its IOCTL interface.
2.1.2 Enumeration
Enumeration is the process by which an OPAE application becomes aware of the existence of FPGA_DEVICE and FPGA_ACCELERATOR instances. Refer to the signature of the fpgaEnumerate() call:
fpga_result fpgaEnumerate(const fpga_properties *filters,
                          uint32_t num_filters,
                          fpga_token *tokens,
                          uint32_t max_tokens,
                          uint32_t *num_matches);
Figure 1 fpgaEnumerate()
The typical enumeration flow involves an initial call to fpgaEnumerate() to discover the number of available tokens.
Figure 2 Discovering Number of Tokens
Once the number of available tokens is known, the application can allocate the correct amount of space to hold the tokens:
fpga_token *tokens;
uint32_t num_tokens = num_matches;
tokens = (fpga_token *)calloc(num_tokens, sizeof(fpga_token));
fpgaEnumerate(NULL, 0, tokens, num_tokens, &num_matches);
Figure 3 Enumerating All Tokens
Note that parameters filters and num_filters were not used in the preceding example, as they were NULL and 0. When no filtering criteria are provided, fpgaEnumerate() returns all tokens that can be enumerated.
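The two-call flow described above (one call to learn the count, a second call to fetch the tokens) can be sketched without FPGA hardware. In this minimal sketch, demo_enumerate() is a hypothetical stand-in for fpgaEnumerate() that reports three fake tokens. The names demo_token, demo_enumerate(), and demo_collect_tokens() are illustrative only and are not part of the OPAE API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an opaque token; not the OPAE fpga_token. */
typedef void *demo_token;

/* Three fake devices so the sketch runs without hardware. */
static int demo_devices[3] = { 10, 11, 12 };

/*
 * Hypothetical stand-in for fpgaEnumerate(): reports the total number of
 * matches in *num_matches and copies at most max_tokens tokens.
 * Returns 0 on success, -1 on invalid arguments.
 */
static int demo_enumerate(demo_token *tokens, uint32_t max_tokens,
                          uint32_t *num_matches)
{
    const uint32_t total = 3;
    uint32_t i;

    if (num_matches == NULL || (tokens == NULL && max_tokens > 0))
        return -1;

    for (i = 0; i < total && i < max_tokens; ++i)
        tokens[i] = &demo_devices[i];

    *num_matches = total;
    return 0;
}

/*
 * Two-call pattern: query the count, allocate, then fetch the tokens.
 * On success the caller owns *out and must free() it.
 */
static int demo_collect_tokens(demo_token **out, uint32_t *count)
{
    uint32_t num_matches = 0;
    demo_token *tokens;

    assert(out != NULL && count != NULL);
    *out = NULL;
    *count = 0;

    /* First call: no output array, only discover how many tokens exist. */
    if (demo_enumerate(NULL, 0, &num_matches) != 0)
        return -1;
    if (num_matches == 0)
        return 0;

    tokens = calloc(num_matches, sizeof(*tokens));
    if (tokens == NULL)
        return -1;

    /* Second call: the array is now large enough for every match. */
    if (demo_enumerate(tokens, num_matches, &num_matches) != 0) {
        free(tokens);
        return -1;
    }

    *out = tokens;
    *count = num_matches;
    return 0;
}
```

In a real application the two demo_enumerate() calls would be fpgaEnumerate() calls, and each returned token would eventually be released with fpgaDestroyToken().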
2.1.2.1 fpga_properties and Filtering
An fpga_properties is an opaque data structure used to retrieve all of the properties concerning an FPGA_DEVICE or FPGA_ACCELERATOR. These properties can be included in the filters parameter to fpgaEnumerate() to select tokens by specific criteria.
2.1.2.1.1 Common Properties
Table 3 lists the set of properties that are common to FPGA_DEVICE and FPGA_ACCELERATOR:
Property | Description |
---|---|
fpga_guid guid; | FPGA_DEVICE: PR Interface ID. FPGA_ACCELERATOR: AFU ID. |
fpga_token parent; | FPGA_DEVICE: always NULL. FPGA_ACCELERATOR: the token of the corresponding FPGA_DEVICE, if any; otherwise, NULL. |
fpga_objtype objtype; | FPGA_DEVICE or FPGA_ACCELERATOR |
uint16_t segment; | The segment portion of the PCIe address: ssss:bb:dd.f |
uint8_t bus; | The bus portion of the PCIe address: ssss:bb:dd.f |
uint8_t device; | The device portion of the PCIe address: ssss:bb:dd.f |
uint8_t function; | The function portion of the PCIe address: ssss:bb:dd.f |
uint64_t object_id; | A unique 64-bit value that identifies this token on the system. |
uint16_t vendor_id; | The PCIe Vendor ID |
uint16_t device_id; | The PCIe Device ID |
uint32_t num_errors; | The number of error sysfs nodes available for this token. |
fpga_interface interface; | An identifier for the underlying plugin-based access method. |
Table 3 Common Properties
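The segment, bus, device, and function properties together form the ssss:bb:dd.f address shown in Table 3. The following sketch shows one way to convert between the textual form and the four integer fields. The struct demo_pcie_addr and both helper functions are hypothetical names introduced for illustration; they are not OPAE types or calls.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative container for the four address properties; not an OPAE type. */
struct demo_pcie_addr {
    uint16_t segment;
    uint8_t bus;
    uint8_t device;
    uint8_t function;
};

/* Parse "ssss:bb:dd.f" (hexadecimal fields). Returns 0 on success. */
static int demo_parse_pcie_addr(const char *text, struct demo_pcie_addr *out)
{
    unsigned int s, b, d, f;
    char trailing;

    assert(text != NULL && out != NULL);

    /* The extra %c conversion detects characters after the function digit. */
    if (sscanf(text, "%4x:%2x:%2x.%1x%c", &s, &b, &d, &f, &trailing) != 4)
        return -1;

    /* PCIe device numbers use 5 bits and function numbers use 3 bits. */
    if (d > 0x1f || f > 0x7)
        return -1;

    out->segment = (uint16_t)s;
    out->bus = (uint8_t)b;
    out->device = (uint8_t)d;
    out->function = (uint8_t)f;
    return 0;
}

/* Format the address back into "ssss:bb:dd.f". Returns 0 on success. */
static int demo_format_pcie_addr(const struct demo_pcie_addr *addr,
                                 char *buf, size_t len)
{
    int n;

    assert(addr != NULL && buf != NULL);

    n = snprintf(buf, len, "%04x:%02x:%02x.%x",
                 (unsigned int)addr->segment, (unsigned int)addr->bus,
                 (unsigned int)addr->device, (unsigned int)addr->function);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}
```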
2.1.2.1.2 FPGA_DEVICE Properties
Table 4 lists the set of properties that are specific to FPGA_DEVICE token types.
Property | Description |
---|---|
uint64_t bbs_id; | FIM-specific Blue Bitstream ID |
fpga_version bbs_version; | BBS version |
Table 4 FPGA_DEVICE Properties
2.1.2.1.3 FPGA_ACCELERATOR Properties
Table 5 lists the set of properties that are specific to FPGA_ACCELERATOR token types.
Property | Description |
---|---|
fpga_accelerator_state state; | Whether the Accelerator is currently open |
uint32_t num_mmio; | The number of MMIO regions available |
uint32_t num_interrupts; | The number of interrupts available |
Table 5 FPGA_ACCELERATOR Properties
The following example uses fpga_properties to enumerate a specific AFU:
#define NLB0_AFU "D8424DC4-A4A3-C413-F89E-433683F9040B"
fpga_properties filter = NULL;
fpga_guid afu_id;
fpgaGetProperties(NULL, &filter); // NULL: a new empty properties
fpgaPropertiesSetObjectType(filter, FPGA_ACCELERATOR);
uuid_parse(NLB0_AFU, afu_id);
fpgaPropertiesSetGuid(filter, afu_id);
fpgaEnumerate(&filter, 1, tokens, num_tokens, &num_matches);
Figure 4 Filtering During Enumeration
Note that fpga_properties and fpga_token objects are allocated resources that must be freed by their respective API calls, i.e., fpgaDestroyProperties() and fpgaDestroyToken().
2.1.3 Access
Once a token is discovered and returned to the caller by fpgaEnumerate(), the token can be converted into a handle by fpgaOpen(). Upon a successful call to fpgaOpen(), the associated /dev/dfl-fme.X (FPGA_DEVICE) or /dev/dfl-port.X (FPGA_ACCELERATOR) is opened and ready for use. Having acquired an fpga_handle, the application can then use the handle with any of the OPAE APIs that require an fpga_handle as an input parameter.
Like tokens and properties, handles are allocated resources. When a handle is no longer needed, it should be closed and released by calling fpgaClose().
2.1.4 Events
Event registration in OPAE is a two-step process. First, the type of event must be identified. The following fpga_event_type variants are defined:
Event | Description |
---|---|
FPGA_EVENT_INTERRUPT | AFU interrupt |
FPGA_EVENT_ERROR | Infrastructure error event (FME/Port Error) |
Table 6 FPGA Event Types
Once the desired event type is known, an fpga_event_handle is created via fpgaCreateEventHandle(). Once the event handle is available, the event notification is registered using fpgaRegisterEvent(). In the example below, note the use of the flags field for passing the desired IRQ vector when the event type is FPGA_EVENT_INTERRUPT. With the event registered, the application can then use fpgaGetOSObjectFromEventHandle() to obtain a file descriptor for use with the poll() system call. When the interrupt occurs, the file descriptor will be set to the signaled state by the DFL driver.
fpga_event_handle event_handle = NULL;
int fd = -1;
fpgaCreateEventHandle(&event_handle);
fpgaRegisterEvent(fpga_handle, FPGA_EVENT_INTERRUPT,
                  event_handle, irq_vector);
fpgaGetOSObjectFromEventHandle(event_handle, &fd);
Figure 5 Creating and Registering Events
When an event notification is no longer needed, it should be released by calling fpgaUnregisterEvent(). Like device handles, event handles are allocated resources that must be freed when no longer used. To free an event handle, use the fpgaDestroyEventHandle() call.
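The waiting step that follows fpgaGetOSObjectFromEventHandle() can be sketched with poll() alone. This sketch assumes the returned descriptor behaves like a Linux eventfd, as the ioctl descriptions in section 2.2 suggest. The helpers demo_wait_for_event() and demo_signal_event() are hypothetical; the second one only stands in for the driver so that the sketch can be exercised without hardware.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable.
 * Returns 1 if the event was signaled, 0 on timeout, -1 on error.
 */
static int demo_wait_for_event(int fd, int timeout_ms)
{
    struct pollfd pfd;
    uint64_t count;
    int res;

    assert(fd >= 0);

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    res = poll(&pfd, 1, timeout_ms);
    if (res < 0)
        return -1;
    if (res == 0)
        return 0;
    if (!(pfd.revents & POLLIN))
        return -1;

    /* Reading the 8-byte counter returns the eventfd to the unsignaled state. */
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return 1;
}

/* Stand-in for the driver: signal the event by adding 1 to the counter. */
static int demo_signal_event(int fd)
{
    uint64_t one = 1;

    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}
```

An application would pass the fd obtained in Figure 5 to a function like demo_wait_for_event(), typically from a dedicated thread or an event loop.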
2.1.5 MMIO and Shared Memory
Communication with the AFU is achieved via reading and writing CSRs and by reading and writing to AFU/host shared memory buffers. An AFU’s CSRs are memory-mapped into the application process address space by way of the fpgaMapMMIO() call.
uint32_t mmio_num = 0;
fpgaMapMMIO(fpga_handle, mmio_num, NULL);
fpgaWriteMMIO64(fpga_handle, mmio_num, MY_CSR, 0xa);
Figure 6 Mapping and Accessing CSRs
The second parameter, mmio_num, is the zero-based index identifying the desired MMIO region. The maximum number of MMIO regions for a particular handle is found by accessing the num_mmio property. Refer to the fpgaPropertiesGetNumMMIO() call.
Once the AFU CSRs are mapped into the process address space, the application can use the fpgaReadMMIOXX() and fpgaWriteMMIOXX() family of functions, e.g., fpgaReadMMIO64() and fpgaWriteMMIO64(). When an MMIO region is no longer needed, it should be unmapped from the process address space using the fpgaUnmapMMIO() call.
Shared memory buffers are allocated by way of the fpgaPrepareBuffer() call.
fpga_result fpgaPrepareBuffer(fpga_handle handle,
                              uint64_t len,
                              void **buf_addr,
                              uint64_t *wsid,
                              int flags);
Figure 7 fpgaPrepareBuffer()
Three buffer lengths are supported by this allocation method:
Length | Description |
---|---|
4096 (4KiB) | No memory configuration needed. |
2097152 (2MiB) | Requires 2MiB huge pages to be allocated. |
1073741824 (1GiB) | Requires 1GiB huge pages to be allocated. |
Table 7 fpgaPrepareBuffer() Lengths
echo 8 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
echo 2 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
Figure 8 Configuring Huge Pages
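When an application knows how many bytes it needs, it can pick the smallest length from Table 7 that fits. The sketch below is one possible helper for that choice; demo_pick_buffer_len() and demo_supported_lens are hypothetical names, and the three values are taken directly from Table 7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The three lengths from Table 7, in ascending order. */
static const uint64_t demo_supported_lens[] = {
    UINT64_C(4096),        /* 4KiB: no memory configuration needed */
    UINT64_C(2097152),     /* 2MiB: requires 2MiB huge pages */
    UINT64_C(1073741824)   /* 1GiB: requires 1GiB huge pages */
};

/*
 * Return the smallest supported length that holds 'needed' bytes,
 * or 0 if 'needed' is zero or larger than the largest supported length.
 */
static uint64_t demo_pick_buffer_len(uint64_t needed)
{
    const size_t n = sizeof(demo_supported_lens) / sizeof(demo_supported_lens[0]);
    size_t i;

    if (needed == 0)
        return 0;

    for (i = 0; i < n; ++i) {
        /* The search relies on the table being sorted in ascending order. */
        assert(i == 0 || demo_supported_lens[i - 1] < demo_supported_lens[i]);
        if (needed <= demo_supported_lens[i])
            return demo_supported_lens[i];
    }
    return 0;
}
```

A result of 2097152 or 1073741824 implies that the matching huge pages must already be configured, as shown in Figure 8.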
The buf_addr parameter to fpgaPrepareBuffer() is a pointer to a void * that receives the user virtual base address of the newly created buffer. The wsid parameter is a pointer to a uint64_t that receives a unique workspace ID for the buffer allocation. This workspace ID is used in subsequent calls to fpgaGetIOAddress(), which queries the IO base address of the buffer, and to fpgaReleaseBuffer(), which should be called when the buffer is no longer needed. The IO base address can be programmed into the AFU by means of the AFU CSR space. For example, the following code snippet from the hello_fpga sample programs a shared buffer’s IO base address into an AFU CSR in MMIO region 0:
#define LOG2_CL 6
#define CACHELINE_ALIGNED_ADDR(p) ((p) >> LOG2_CL)
fpgaGetIOAddress(accelerator_handle, input_wsid, &iova);
fpgaWriteMMIO64(accelerator_handle, 0, nlb_base_addr + CSR_SRC_ADDR,
CACHELINE_ALIGNED_ADDR(iova));
Figure 9 Programming Shared Memory
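The CACHELINE_ALIGNED_ADDR macro in Figure 9 shifts the IO address right by LOG2_CL bits, so the CSR receives a cache-line index rather than a byte address. The sketch below restates that arithmetic as functions, assuming the 64-byte cache line implied by LOG2_CL being 6. The DEMO_ names and both functions are illustrative and do not come from the hello_fpga sample.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_LOG2_CL 6                               /* 2^6 = 64-byte cache line */
#define DEMO_CL_SIZE (UINT64_C(1) << DEMO_LOG2_CL)

/* Returns nonzero when addr sits on a 64-byte boundary. */
static int demo_is_cacheline_aligned(uint64_t addr)
{
    return (addr & (DEMO_CL_SIZE - 1)) == 0;
}

/* Convert a byte address into a cache-line index (addr / 64). */
static uint64_t demo_cacheline_index(uint64_t addr)
{
    /* An unaligned address would silently lose its low 6 bits. */
    assert(demo_is_cacheline_aligned(addr));
    return addr >> DEMO_LOG2_CL;
}
```

For example, an IO address of 0x1000 corresponds to cache-line index 0x40.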
If applications need to map a shared buffer that has been allocated by some other means than fpgaPrepareBuffer(), then the flags parameter can be set to FPGA_BUF_PREALLOCATED. This causes fpgaPrepareBuffer() to skip the allocation portion of the call and to only memory map the given buf_addr into the application process address space.
Buffers can also be allocated and mapped as read-only by specifying FPGA_BUF_READ_ONLY.
2.1.6 Management
The management feature in OPAE concerns re-programming the programmable region of the Port, which is done with the fpgaReconfigureSlot() call. To program the Port bitstream, pass a handle to the FPGA_DEVICE associated with the desired Port in the fme_handle parameter. The slot parameter identifies which Port to program in the case of multi-port implementations. Most designs will only pass zero as the slot parameter. The bitstream parameter is a buffer that contains the entire bitstream contents, including the JSON bitstream header information. The bitstream_len parameter gives the length of bitstream in bytes.
fpgaReconfigureSlot() first checks whether the FPGA_ACCELERATOR corresponding to the FPGA_DEVICE in fme_handle is open. If it is open, then the programming request is aborted with an error code. The application may pass FPGA_RECONF_FORCE in the flags parameter in order to avoid this open check and forcefully program the bitstream.
fpga_result fpgaReconfigureSlot(fpga_handle fme_handle,
                                uint32_t slot,
                                const uint8_t *bitstream,
                                size_t bitstream_len,
                                int flags);
Figure 10 fpgaReconfigureSlot()
2.1.7 Errors
The OPAE errors API provides a means to query and clear both FPGA_DEVICE and FPGA_ACCELERATOR errors. Each FPGA device exports a collection of error registers via the DFL drivers’ sysfs tree, for both the FME and the Port. Each register is typically an unsigned 64-bit mask of the current errors, where each bit or some collection of bits specifies an error type. An error is signaled if its bit or collection of bits is non-zero. Note that the 32-bit error index may vary from one process execution to the next. Applications should use fpgaGetErrorInfo() and examine the error name returned in the struct fpga_error_info to identify the desired 64-bit error mask.
Figure 11 struct fpga_error_info
Each 64-bit mask of errors is assigned a unique 32-bit integer index and a unique name. Given an fpga_token and an error index, fpgaGetErrorInfo() retrieves the struct fpga_error_info corresponding to the error.
fpga_result fpgaGetErrorInfo(fpga_token token,
                             uint32_t error_num,
                             struct fpga_error_info *error_info);
Figure 12 fpgaGetErrorInfo()
fpgaReadError() provides access to the raw 64-bit error mask, given the unique error index. fpgaClearError() clears the errors for a particular index. fpgaClearAllErrors() clears all the errors for the given fpga_token.
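Once a raw 64-bit error mask has been read, an application typically tests individual bits or bit fields within it. The sketch below shows that decoding step in isolation. The functions demo_error_field() and demo_error_signaled() are hypothetical helpers, and any bit positions passed to them are examples only; the real layout of each error register is platform specific.

```c
#include <assert.h>
#include <stdint.h>

/* Extract 'width' bits starting at bit 'lsb' from a 64-bit error mask. */
static uint64_t demo_error_field(uint64_t mask, unsigned lsb, unsigned width)
{
    uint64_t field_mask;

    assert(width >= 1 && lsb < 64 && width <= 64 - lsb);

    /* Shifting a 64-bit value by 64 is undefined, so handle it separately. */
    field_mask = (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
    return (mask >> lsb) & field_mask;
}

/* An error is signaled if its bit or collection of bits is non-zero. */
static int demo_error_signaled(uint64_t mask, unsigned lsb, unsigned width)
{
    return demo_error_field(mask, lsb, width) != 0;
}
```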
2.1.8 Metrics
The OPAE metrics API refers to a group of functions and data structures that allow querying the various device metrics from the Board Management Controller component of the FPGA device. A metric is described by an instance of struct fpga_metric_info.
typedef struct fpga_metric_info {
    uint64_t metric_num;
    fpga_guid metric_guid;
    char qualifier_name[FPGA_METRIC_STR_SIZE];
    char group_name[FPGA_METRIC_STR_SIZE];
    char metric_name[FPGA_METRIC_STR_SIZE];
    char metric_units[FPGA_METRIC_STR_SIZE];
    enum fpga_metric_datatype metric_datatype;
    enum fpga_metric_type metric_type;
} fpga_metric_info;
Figure 13 fpga_metric_info
The group_name field holds a string describing the broad categorization of the metric. Some sample values for group_name are “thermal_mgmt” and “power_mgmt”. The metric_name field contains the metric’s name. The number and names of metrics may vary from one FPGA platform to the next. The qualifier_name field is a concatenation of group_name and metric_name, with a colon character in between. The metric_units field contains the string name of the unit of measurement for the specific metric. Some examples for metric_units are “Volts”, “Amps”, and “Celsius”.
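The relationship between qualifier_name, group_name, and metric_name can be sketched with snprintf(). In this sketch DEMO_METRIC_STR_SIZE is an assumed buffer size (the real value of FPGA_METRIC_STR_SIZE should be taken from the SDK headers), and demo_make_qualifier() is a hypothetical helper, not an OPAE call.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed size for illustration; see FPGA_METRIC_STR_SIZE in the SDK headers. */
#define DEMO_METRIC_STR_SIZE 256

/*
 * Build "group_name:metric_name" in dst.
 * Returns 0 on success, -1 if the result does not fit in dst_len bytes.
 */
static int demo_make_qualifier(char *dst, size_t dst_len,
                               const char *group_name,
                               const char *metric_name)
{
    int n;

    assert(dst != NULL && group_name != NULL && metric_name != NULL);

    n = snprintf(dst, dst_len, "%s:%s", group_name, metric_name);
    return (n >= 0 && (size_t)n < dst_len) ? 0 : -1;
}
```

For a group_name of "power_mgmt" and an invented metric_name of "demo_voltage", the resulting qualifier is "power_mgmt:demo_voltage".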
The metric_datatype field uniquely identifies the underlying C data type for the metric’s value:
enum fpga_metric_datatype {
    FPGA_METRIC_DATATYPE_INT,
    FPGA_METRIC_DATATYPE_FLOAT,
    FPGA_METRIC_DATATYPE_DOUBLE,
    FPGA_METRIC_DATATYPE_BOOL,
    FPGA_METRIC_DATATYPE_UNKNOWN
};
Figure 14 enum fpga_metric_datatype
The metric_type field classifies the metric into a broad category. This information is redundant with the group_name field.
enum fpga_metric_type {
    FPGA_METRIC_TYPE_POWER,
    FPGA_METRIC_TYPE_THERMAL,
    FPGA_METRIC_TYPE_PERFORMANCE_CTR,
    FPGA_METRIC_TYPE_AFU,
    FPGA_METRIC_TYPE_UNKNOWN
};
Figure 15 enum fpga_metric_type
In order to enumerate the information for each of the metrics available from the FPGA device, determine the number of metrics using fpgaGetNumMetrics().
Figure 16 Determining Number of Metrics
This call retrieves the number of available metrics for the FPGA_DEVICE that is opened behind the handle parameter to the call. Refer to 2.1.3 Access for information about the fpgaOpen() call. When the number of available metrics is known, allocate a buffer large enough to hold that many fpga_metric_info data structures, and call fpgaGetMetricsInfo() to populate the entries:
fpga_metric_info *metric_info;
uint64_t metric_infos = num_metrics;
metric_info = calloc(num_metrics, sizeof(fpga_metric_info));
fpgaGetMetricsInfo(handle, metric_info, &metric_infos);
Figure 17 Querying Metrics Info
The fpga_metric structure is the representation of a metric’s value:
Figure 18 struct fpga_metric
The metric_num field matches the metric_num field of the fpga_metric_info structure. value contains the metric value, which is encoded in the C data type identified by the metric_datatype field of fpga_metric_info. Finally, the isvalid field denotes whether the metric value is valid.
There are two methods of obtaining a metric’s value, given the information in the fpga_metric_info structure:
2.1.8.1 Querying Metric Values by Index
fpgaGetMetricsByIndex() retrieves a metric value using the metric_num field of the metric info:
uint64_t metric_num = metric_info[0].metric_num;
fpga_metric metric0;
fpgaGetMetricsByIndex(handle, &metric_num, 1, &metric0);
Figure 19 Retrieve Metric by Index
This call allows retrieving one or more metric values, each identified by their unique metric_num. The second and fourth parameters allow passing arrays so that multiple values can be fetched in a single call.
2.1.8.2 Querying Metric Values by Name
fpgaGetMetricsByName() retrieves a metric value using the metric_name field of the metric info:
char *metric_name = metric_info[1].metric_name;
fpga_metric metric1;
fpgaGetMetricsByName(handle, &metric_name, 1, &metric1);
This call also allows retrieving one or more metric values, each identified by their unique metric_name. The second and fourth parameters allow passing arrays so that multiple values can be fetched in a single call.
The fpgaGetMetricsThresholdInfo() call is provided for legacy implementations only. It should be considered deprecated for current and future FPGA designs.
2.1.9 SysObject
When the hardware access method in use is the DFL drivers (see 2.3.2 libxfpga Plugin), the sysfs tree rooted at the struct _fpga_token’s sysfspath member is accessible via the OPAE SDK SysObject API. The SysObject API provides an abstraction to search, traverse, read, and write sysfs entities. These sysfs entities may take the form of directories, which are referred to as containers, or files, which are referred to as attributes. Figure 20 enum fpga_sysobject_type shows the API’s means of distinguishing between the two types.
Figure 20 enum fpga_sysobject_type
The SysObject API introduces another opaque structure type, fpga_object. An fpga_object can be queried from an fpga_token or an fpga_handle by way of the fpgaTokenGetObject() and fpgaHandleGetObject() APIs.
fpga_result fpgaTokenGetObject(fpga_token token, const char *name,
                               fpga_object *object, int flags);
fpga_result fpgaHandleGetObject(fpga_handle handle, const char *name,
                                fpga_object *object, int flags);
Figure 21 fpgaTokenGetObject() / fpgaHandleGetObject()
The remainder of the SysObject API is broken into two categories of calls, depending on the fpga_object’s type. The type of an fpga_object is learned via fpgaObjectGetType().
Figure 22 fpgaObjectGetType()
When an fpga_object is no longer needed, it should be freed via fpgaDestroyObject().
2.1.9.1 FPGA_OBJECT_CONTAINER APIs
For directory sysfs entities, passing a value of FPGA_OBJECT_RECURSE_ONE or FPGA_OBJECT_RECURSE_ALL in the flags parameter to fpgaTokenGetObject() or fpgaHandleGetObject() causes these two APIs to treat the target object as either a single-layer or multi-layer directory structure, making its child entities available for query via fpgaObjectGetObject() and fpgaObjectGetObjectAt().
fpga_result fpgaObjectGetObject(fpga_object parent, const char *name,
                                fpga_object *object, int flags);
fpga_result fpgaObjectGetObjectAt(fpga_object parent, size_t idx,
                                  fpga_object *object);
Figure 23 fpgaObjectGetObject() / fpgaObjectGetObjectAt()
Any child object resulting from fpgaObjectGetObject() or fpgaObjectGetObjectAt() must be freed via fpgaDestroyObject() when it is no longer needed.
2.1.9.2 FPGA_OBJECT_ATTRIBUTE APIs
Attribute sysfs entities may be queried for their size and read from or written to. In order to determine the size of an attribute’s data, use fpgaObjectGetSize().
Figure 24 fpgaObjectGetSize()
Attributes containing arbitrary string data can be read with fpgaObjectRead().
Figure 25 fpgaObjectRead()
If an attribute contains an unsigned integer value, its value can be read with fpgaObjectRead64() and written with fpgaObjectWrite64().
fpga_result fpgaObjectRead64(fpga_object obj,
                             uint64_t *value, int flags);
fpga_result fpgaObjectWrite64(fpga_object obj,
                              uint64_t value, int flags);
Figure 26 fpgaObjectRead64() / fpgaObjectWrite64()
2.1.10 Utilities
The fpga_result enumeration defines a set of error codes used throughout OPAE. In order to convert an fpga_result error code into a printable string, the application can use the fpgaErrStr() call.
2.2 DFL Driver IOCTL Interfaces
The DFL drivers export an IOCTL interface which the libxfpga.so plugin consumes in order to query and configure aspects of the FME and Port. These interfaces are used only internally by the SDK; they are not customer-facing. The description here is provided for completeness only.
2.2.1 Port Reset
The DFL_FPGA_PORT_RESET ioctl is used by the fpgaReset() call in order to perform a Port reset. The fpga_handle passed to fpgaReset() must be a valid open handle to an FPGA_ACCELERATOR. The ioctl requires no input/output parameters.
2.2.2 Port Information
The DFL_FPGA_PORT_GET_INFO ioctl is used to query properties of the Port, notably the number of associated MMIO regions. The ioctl requires a pointer to a struct dfl_fpga_port_info.
2.2.3 MMIO Region Information
The DFL_FPGA_PORT_GET_REGION_INFO ioctl is used to query the details of an MMIO region. The ioctl requires a pointer to a struct dfl_fpga_port_region_info. The index field of the struct is populated by the caller, and the padding, size, and offset values are populated by the DFL driver.
2.2.4 Shared Memory Mapping and Unmapping
The DFL_FPGA_PORT_DMA_MAP ioctl is used to map a memory buffer into the application’s process address space. The ioctl requires a pointer to a struct dfl_fpga_port_dma_map.
The DFL_FPGA_PORT_DMA_UNMAP ioctl is used to unmap a memory buffer from the application’s process address space. The ioctl requires a pointer to a struct dfl_fpga_port_dma_unmap.
These ioctls provide the underpinnings of the fpgaPrepareBuffer() and fpgaReleaseBuffer() calls.
2.2.5 Number of Port Error IRQs
The DFL_FPGA_PORT_ERR_GET_IRQ_NUM ioctl is used to query the number of Port error interrupt vectors available. The ioctl requires a pointer to a uint32_t that receives the Port error interrupt count.
2.2.6 Port Error Interrupt Configuration
The DFL_FPGA_PORT_ERR_SET_IRQ ioctl is used to configure one or more file descriptors for the Port Error interrupt. The ioctl requires a pointer to a struct dfl_fpga_irq_set. The values stored in the evtfds field of this struct should be populated with the event file descriptors for the interrupt, as returned by the Linux eventfd() system call.
2.2.7 Number of AFU Interrupts
The DFL_FPGA_PORT_UINT_GET_IRQ_NUM ioctl is used to query the number of AFU interrupt vectors available. The ioctl requires a pointer to a uint32_t that receives the AFU interrupt count.
2.2.8 User AFU Interrupt Configuration
The DFL_FPGA_PORT_UINT_SET_IRQ ioctl is used to configure one or more file descriptors for the AFU interrupt. The ioctl requires a pointer to a struct dfl_fpga_irq_set. The values stored in the evtfds field of this struct should be populated with the event file descriptors for the interrupt, as returned by the Linux eventfd() system call.
2.2.9 Partial Reconfiguration
The DFL_FPGA_FME_PORT_PR ioctl is used to update the logic stored in the Port’s programmable region. This ioctl must be issued on the device file descriptor corresponding to the FPGA_DEVICE (/dev/dfl-fme.X). The ioctl requires a pointer to a struct dfl_fpga_fme_port_pr with each of the fields populated.
2.2.10 Number of FME Error IRQs
The DFL_FPGA_FME_ERR_GET_IRQ_NUM ioctl is used to query the number of FME error interrupt vectors available. The ioctl requires a pointer to a uint32_t that receives the FME error interrupt count.
2.2.11 FME Error Interrupt Configuration
The DFL_FPGA_FME_ERR_SET_IRQ ioctl is used to configure one or more file descriptors for the FME Error interrupt. The ioctl requires a pointer to a struct dfl_fpga_irq_set. The values stored in the evtfds field of this struct should be populated with the event file descriptors for the interrupt, as returned by the Linux eventfd() system call.
2.3 Plugin Manager
The OPAE Plugin Manager refers to initialization code in libopae-c that examines an FPGA device’s PCIe Vendor and Device ID and makes an association between a particular FPGA device and its access method. OPAE currently supports three device access methods:
Device Access Method | Plugin Module |
---|---|
Device Feature List drivers | libxfpga.so |
Virtual Function I/O | libopae-v.so |
AFU Simulation Environment | libase.so |
Table 9 Plugin Device Access Methods
The Plugin Manager allows code that is written to a specific API signature to access FPGA hardware via different mechanisms. In other words, the end user codes to the OPAE API; and the OPAE API, based on configuration data, routes the hardware access to the device via different means.
As an example, consider an API configuration that accesses FPGA device_A via the Device Feature List drivers and that accesses FPGA device_B via VFIO. The application is coded against the OPAE API.
As part of its initialization process, the application enumerates and discovers an fpga_token corresponding to device_A. That fpga_token is opened and its MMIO region 0 is mapped via a call to fpgaMapMMIO().
The API configuration for device_A is such that the fpga_handle corresponding to device_A routes its hardware access calls through libxfpga.so. The call to fpgaMapMMIO() is redirected to libxfpga.so’s implementation of the MMIO mapping function, xfpga_fpgaMapMMIO(). As a result, the call to xfpga_fpgaMapMMIO() uses its AFU file descriptor to communicate with the DFL driver to map the MMIO region.
Subsequently, the application enumerates and discovers an fpga_token corresponding to device_B. That fpga_token is opened and its MMIO region 0 is mapped via a call to fpgaMapMMIO().
The API configuration for device_B is such that the fpga_handle corresponding to device_B routes its hardware access calls through libopae-v.so. The call to fpgaMapMMIO() is redirected to libopae-v.so’s implementation of the MMIO mapping function, vfio_fpgaMapMMIO(). As a result, the call to vfio_fpgaMapMMIO() uses the MMIO mapping performed by libopaevfio.so during initialization of the VFIO session.
2.3.1 Plugin Model
The OPAE SDK plugin model is facilitated by its use of opaque C structures for fpga_token and fpga_handle. Both types are declared as void *, which allows the parameters to the OPAE SDK functions to take different forms, depending on the layer of the SDK being used.
At the topmost layer, for example when calling fpgaEnumerate(), the output fpga_token parameter array is actually an array of pointers to opae_wrapped_token structs.
typedef struct _opae_wrapped_token {
    uint32_t magic;
    fpga_token opae_token;
    uint32_t ref_count;
    struct _opae_wrapped_token *prev;
    struct _opae_wrapped_token *next;
    opae_api_adapter_table *adapter_table;
} opae_wrapped_token;
Figure 27 opae_wrapped_token
An opae_wrapped_token, as the name suggests, is a thin wrapper around the lower-layer token which is stored in struct member opae_token. The adapter_table struct member is a pointer to a plugin-specific adapter table. The adapter table provides a mapping between the top-layer opae_wrapped_token and its underlying plugin-specific API entry points, which are called using the opae_token struct member (the lower-level token).
typedef struct _opae_api_adapter_table {
    struct _opae_api_adapter_table *next;
    opae_plugin plugin;
    ...
    fpga_result (*fpgaEnumerate)(const fpga_properties *filters,
                                 uint32_t num_filters,
                                 fpga_token *tokens,
                                 uint32_t max_tokens,
                                 uint32_t *num_matches);
    ...
    int (*initialize)(void);
    int (*finalize)(void);
} opae_api_adapter_table;
Figure 28 opae_api_adapter_table
When libopae-c loads, the plugin manager uses the plugin configuration data to open and configure a session to each of the required plugin libraries. During this configuration process, each plugin is passed an empty adapter table struct. The purpose of the plugin configuration is to populate this adapter table struct with each of the plugin-specific API entry points.
When the top-level fpgaEnumerate() is called, each adapter table’s plugin-specific fpgaEnumerate() struct member is called, and the output fpga_tokens are collected. At this point, these fpga_tokens are the lower-level token structure types. Before the top-level fpgaEnumerate() returns, these plugin-specific tokens are wrapped inside opae_wrapped_token structures, along with a pointer to the respective adapter table.
After enumeration is complete, the application goes on to call other top-level OPAE SDK functions with the wrapped tokens. Each top-level entry point which accepts an fpga_token knows that it is actually being passed an opae_wrapped_token. With this knowledge, the entry point peeks inside the wrapped token and calls through to the plugin-specific API entry point using the adapter table, passing the lower-level opae_token struct member.
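The wrap-then-dispatch pattern described above can be sketched in a few lines of Python. This is a simplified illustration, not the libopae-c implementation: the class names AdapterTable and WrappedToken, the helper functions, and the dictionary of entry points are all hypothetical stand-ins for the C structures, and the sketch dispatches fpgaMapMMIO directly on a token rather than on a handle to keep it short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AdapterTable:
    """Stand-in for opae_api_adapter_table: plugin name plus entry points."""
    plugin: str
    entry_points: Dict[str, Callable]


@dataclass
class WrappedToken:
    """Stand-in for opae_wrapped_token."""
    opae_token: object           # lower-level, plugin-specific token
    adapter_table: AdapterTable  # where calls for this token are routed


def make_adapter(plugin: str, devices: List[str]) -> AdapterTable:
    # Each hypothetical plugin enumerates its own devices and tags MMIO maps.
    return AdapterTable(plugin, {
        "fpgaEnumerate": lambda: list(devices),
        "fpgaMapMMIO": lambda tok: f"{plugin} mapped {tok}",
    })


def enumerate_all(adapters: List[AdapterTable]) -> List[WrappedToken]:
    """Top-level enumerate: call every plugin, then wrap each result."""
    wrapped = []
    for table in adapters:
        for low_level in table.entry_points["fpgaEnumerate"]():
            wrapped.append(WrappedToken(low_level, table))
    return wrapped


def map_mmio(token: WrappedToken) -> str:
    """Top-level entry point: unwrap and dispatch through the adapter table."""
    return token.adapter_table.entry_points["fpgaMapMMIO"](token.opae_token)


adapters = [make_adapter("xfpga", ["device_A"]),
            make_adapter("vfio", ["device_B"])]
tokens = enumerate_all(adapters)
for tok in tokens:
    print(map_mmio(tok))
```

The caller only ever sees WrappedToken objects and top-level functions; which plugin services a call is decided entirely by the adapter table stored in the wrapper.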
2.3.2 libxfpga Plugin
2.3.1 Plugin Model introduced the concept of an opae_wrapped_token and a corresponding plugin-specific token structure. libxfpga.so is the plugin library that implements the DFL driver hardware access method. Its plugin-specific token data structure is struct _fpga_token.
struct _fpga_token {
    fpga_token_header hdr;
    uint32_t device_instance;
    uint32_t subdev_instance;
    char sysfspath[SYSFS_PATH_MAX];
    char devpath[DEV_PATH_MAX];
    struct error_list *errors;
};
Figure 29 struct _fpga_token
A struct _fpga_token corresponding to the Port will have sysfspath and devpath members that contain strings like the following example paths:
Figure 30 libxfpga Port Token
Likewise, a struct _fpga_token corresponding to the FME will have sysfspath and devpath members that contain strings like the following example paths:
Figure 31 libxfpga FME Token
When a call to the top-level fpgaOpen() is made, the lower-level token is unwrapped and passed to xfpga_fpgaOpen(). In turn, xfpga_fpgaOpen() opens the character device file identified by the devpath member of the struct _fpga_token. It then allocates and initializes an instance of libxfpga.so’s plugin-specific handle data structure, struct _fpga_handle.
struct _fpga_handle {
    pthread_mutex_t lock;
    uint64_t magic;
    fpga_token token;
    int fddev;
    int fdfpgad;
    uint32_t num_irqs;
    uint32_t irq_set;
    struct wsid_tracker *wsid_root;
    struct wsid_tracker *mmio_root;
    void *umsg_virt;
    uint64_t umsg_size;
    uint64_t *umsg_iova;
    bool metric_enum_status;
    fpga_metric_vector fpga_enum_metric_vector;
    void *bmc_handle;
    struct _fpga_bmc_metric *_bmc_metric_cache_value;
    uint64_t num_bmc_metric;
    uint32_t flags;
};
Figure 32 struct _fpga_handle
2.3.3 libopae-v Plugin
libopae-v.so is the plugin library that implements the VFIO hardware access method. Its plugin-specific token data structure is vfio_token.
#define USER_MMIO_MAX 8
typedef struct _vfio_token {
    fpga_token_header hdr;
    fpga_guid compat_id;
    pci_device_t *device;
    uint32_t region;
    uint32_t offset;
    uint32_t mmio_size;
    uint32_t pr_control;
    uint32_t user_mmio_count;
    uint32_t user_mmio[USER_MMIO_MAX];
    uint64_t bitstream_id;
    uint64_t bitstream_mdata;
    uint8_t num_ports;
    struct _vfio_token *parent;
    struct _vfio_token *next;
    vfio_ops ops;
} vfio_token;
Figure 33 vfio_token
When a call to the top-level fpgaOpen() is made, the lower-level token is unwrapped and passed to vfio_fpgaOpen(). In turn, vfio_fpgaOpen() opens the VFIO device matching the device address found in the input vfio_token. It then allocates and initializes an instance of libopae-v.so’s plugin-specific handle data structure, vfio_handle.
typedef struct _vfio_handle {
    uint32_t magic;
    struct _vfio_token *token;
    vfio_pair_t *vfio_pair;
    volatile uint8_t *mmio_base;
    size_t mmio_size;
    pthread_mutex_t lock;
#define OPAE_FLAG_HAS_AVX512 (1u << 0)
    uint32_t flags;
} vfio_handle;
Figure 34 vfio_handle
2.3.3.1 Supporting Libraries
2.3.3.1.1 libopaevfio
libopaevfio.so is OPAE’s implementation of the Linux kernel’s Virtual Function I/O interface. This VFIO interface presents a generic means of configuring and accessing PCIe endpoints from a user-space process via a supporting Linux kernel device driver, vfio-pci.
libopaevfio.so provides APIs for opening/closing a VFIO device instance, for mapping/unmapping MMIO spaces, for allocating/freeing DMA buffers, and for configuring interrupts for the device.
2.3.3.1.2 libopaemem
Each DMA buffer allocation made by libopaevfio.so’s opae_vfio_buffer_allocate() and opae_vfio_buffer_allocate_ex() APIs requires a backing I/O Virtual Address range. These address ranges are discovered at VFIO device open time by way of the VFIO_IOMMU_GET_INFO ioctl.
Each range specifies a large contiguous block of I/O Virtual Address space. The typical DMA buffer allocation size is significantly less than one of these IOVA blocks, so the division of each block into allocatable segments must be managed so that multiple DMA buffer allocations can be made from a single block. In other words, the IOVA blocks must be memory-managed in order to make efficient use of them.
libopaemem.so provides a generic means of managing a large memory space, consisting of individual large memory blocks of contiguous address space. When a DMA buffer allocation is requested, libopaevfio.so uses this generic memory manager to carve out a small chunk of contiguous IOVA address space in order for the DMA mapping to be made. The IOVA space corresponding to the allocation is marked as allocated, and the rest of the large block remains as allocatable space within the memory manager. Subsequent de-allocation returns a chunk of IOVA space to the free state, coalescing contiguous chunks as they are freed. The allocations and de-allocations of the IOVA space can occur in any order with respect to each other. libopaemem.so tracks both the allocated and free block space, carving out small chunks from the large IOVA blocks on allocations, and coalescing small chunks back into larger ones on frees.
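The carve-and-coalesce behavior described above can be sketched as a small first-fit range allocator. This is only an illustration of the technique under simplifying assumptions: the class RangeAllocator, its method names, and the address values are hypothetical and do not reflect libopaemem.so's actual API or data structures.

```python
class RangeAllocator:
    """First-fit allocator over contiguous address ranges (illustrative only)."""

    def __init__(self):
        self.free = []        # sorted list of (start, size) free chunks
        self.allocated = {}   # start -> size of each live allocation

    def add_range(self, start, size):
        """Register one large block of allocatable address space."""
        self.free.append((start, size))
        self.free.sort()

    def allocate(self, size):
        """Carve `size` bytes from the first free chunk that is large enough."""
        for i, (start, chunk) in enumerate(self.free):
            if chunk >= size:
                if chunk == size:
                    del self.free[i]
                else:
                    self.free[i] = (start + size, chunk - size)
                self.allocated[start] = size
                return start
        return None  # no chunk can satisfy the request

    def release(self, start):
        """Return an allocation to the free list and merge adjacent chunks."""
        size = self.allocated.pop(start)
        self.free.append((start, size))
        self.free.sort()
        merged = [self.free[0]]
        for s, n in self.free[1:]:
            last_s, last_n = merged[-1]
            if last_s + last_n == s:          # chunks touch: coalesce
                merged[-1] = (last_s, last_n + n)
            else:
                merged.append((s, n))
        self.free = merged


mm = RangeAllocator()
mm.add_range(0x100000, 0x10000)      # one 64 KiB block (made-up values)
a = mm.allocate(0x1000)
b = mm.allocate(0x2000)
mm.release(a)                        # frees may occur in any order
mm.release(b)
print([(hex(s), hex(n)) for s, n in mm.free])
```

After both releases, the free list again holds the single original block, which is the coalescing property the text describes.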
2.3.3.2 Configuring PCIe Virtual Functions
Before an AFU can be accessed with VFIO, the FPGA Physical Function must be configured to enable its Virtual Functions. Then, each VF must be bound to the vfio-pci Linux kernel driver.
As of the Arrow Creek program, the FPGA hardware allows multiple AFUs to co-exist by placing each AFU in its own PCIe Virtual Function. Upon system startup, no PCIe VFs exist. The pci_device command can be used to enable the VFs and their AFUs. First, use the lspci command to examine the current device topology:
# lspci | grep cel
b1:00.0 Processing accelerators: Intel Corporation Device bcce (rev 01)
b1:00.1 Processing accelerators: Intel Corporation Device bcce
b1:00.2 Processing accelerators: Intel Corporation Device bcce
b1:00.3 Processing accelerators: Red Hat, Inc. Virtio network device
b1:00.4 Processing accelerators: Intel Corporation Device bcce
Figure 35 lspci Device Topology
In this example, VFs are controlled by PF0 (b1:00.0), the first entry in Figure 35 lspci Device Topology. In the figure, the Intel PFs are shown as having the Arrow Creek PF PCIe device ID of bcce.
Now, use the pci_device command to enable three VFs for PF0:
# pci_device 0000:b1:00.0 vf 3
# lspci | grep cel
b1:00.0 Processing accelerators: Intel Corporation Device bcce (rev 01)
b1:00.1 Processing accelerators: Intel Corporation Device bcce
b1:00.2 Processing accelerators: Intel Corporation Device bcce
b1:00.3 Processing accelerators: Red Hat, Inc. Virtio network device
b1:00.4 Processing accelerators: Intel Corporation Device bcce
b1:00.5 Processing accelerators: Intel Corporation Device bccf
b1:00.6 Processing accelerators: Intel Corporation Device bccf
b1:00.7 Processing accelerators: Intel Corporation Device bccf
Figure 36 Enable Virtual Functions
Figure 36 Enable Virtual Functions shows that three VFs were created. Each VF is shown as having the Arrow Creek VF PCIe device ID of bccf.
Now, each Virtual Function must be bound to the vfio-pci Linux kernel driver so that it can be accessed via VFIO:
# opaevfio -i -u myuser -g mygroup 0000:b1:00.5
Binding (0x8086,0xbccf) at 0000:b1:00.5 to vfio-pci
iommu group for (0x8086,0xbccf) at 0000:b1:00.5 is 318
Figure 37 Bind VF's to vfio-pci
Here, myuser and mygroup identify the unprivileged user/group that requires access to the device. The opaevfio command changes the ownership of the device per the values given. Repeat the command for each remaining VF (0000:b1:00.6 and 0000:b1:00.7 in this example).
Once the VFs are bound to vfio-pci, the OPAE SDK will find and enumerate them with libopae-v.so:
# fpgainfo port
//****** PORT ******//
Object Id : 0xEF00000
PCIe s:b:d.f : 0000:B1:00.0
Device Id : 0xBCCE
Socket Id : 0x00
//****** PORT ******//
Object Id : 0xE0B1000000000000
PCIe s:b:d.f : 0000:B1:00.7
Device Id : 0xBCCF
Socket Id : 0x01
Accelerator GUID : 4dadea34-2c78-48cb-a3dc-5b831f5cecbb
//****** PORT ******//
Object Id : 0xC0B1000000000000
PCIe s:b:d.f : 0000:B1:00.6
Device Id : 0xBCCF
Socket Id : 0x01
Accelerator GUID : 823c334c-98bf-11ea-bb37-0242ac130002
//****** PORT ******//
Object Id : 0xA0B1000000000000
PCIe s:b:d.f : 0000:B1:00.5
Device Id : 0xBCCF
Socket Id : 0x01
Accelerator GUID : 8568ab4e-6ba5-4616-bb65-2a578330a8eb
Figure 38 List VF's with fpgainfo
When the VFs are no longer needed, they can be unbound from the vfio-pci driver:
# opaevfio -r 0000:b1:00.5
Releasing (0x8086,0xbccf) at 0000:b1:00.5 from vfio-pci
# opaevfio -r 0000:b1:00.6
Releasing (0x8086,0xbccf) at 0000:b1:00.6 from vfio-pci
# opaevfio -r 0000:b1:00.7
Releasing (0x8086,0xbccf) at 0000:b1:00.7 from vfio-pci
Figure 39 Unbind VF's from vfio-pci
Finally, the VFs can be disabled:
# pci_device 0000:b1:00.0 vf 0
# lspci | grep cel
b1:00.0 Processing accelerators: Intel Corporation Device bcce (rev 01)
b1:00.1 Processing accelerators: Intel Corporation Device bcce
b1:00.2 Processing accelerators: Intel Corporation Device bcce
b1:00.3 Processing accelerators: Red Hat, Inc. Virtio network device
b1:00.4 Processing accelerators: Intel Corporation Device bcce
Figure 40 Disable Virtual Functions
2.4 Application Flow
A typical OPAE application that interacts with an AFU via MMIO and shared memory will have a flow similar to the one described in this section.
2.4.1 Create Filter Criteria
Refer to 2.1.2 Enumeration. When enumerating AFUs, if no filtering criteria are specified, then fpgaEnumerate() returns an fpga_token for each AFU that is present in the system. To limit the enumeration search to a specific AFU, create an fpga_properties object and set its guid to that of the desired AFU:
#define MY_AFU_GUID "57fa0b03-ab4f-4b02-b4eb-d3fe1ec18518"
fpga_properties filter = NULL;
fpga_guid guid;
fpgaGetProperties(NULL, &filter);
uuid_parse(MY_AFU_GUID, guid);
fpgaPropertiesSetGUID(filter, guid);
Figure 41 Flow: Create Filter Criteria
2.4.2 Enumerate the AFU
With the filtering criteria in place, enumerate to obtain an fpga_token for the AFU:
fpga_token afu_token = NULL;
uint32_t num_matches = 0;
fpgaEnumerate(&filter, 1, &afu_token, 1, &num_matches);
Figure 42 Flow: Enumerate the AFU
2.4.3 Open the AFU
After finding an fpga_token for the AFU using fpgaEnumerate(), the token must be opened with fpgaOpen() to establish a session with the AFU. The process of opening an fpga_token creates an fpga_handle:
Figure 43 Flow: Open the AFU
2.4.4 Map MMIO Region
To access the MMIO region of the AFU and program its CSRs, the region must first be mapped into the application’s process address space. This is accomplished using fpgaMapMMIO():
Figure 44 Flow: Map MMIO Region
2.4.5 Allocate DMA Buffers
If the AFU is DMA-capable, shared memory buffers can be allocated and mapped into the process address space and the IOMMU with fpgaPrepareBuffer(). Refer to Figure 8 Configuring Huge Pages for instructions on configuring 2MiB and 1GiB huge pages.
#define BUF_SIZE (2 * 1024 * 1024)
volatile uint64_t *src_ptr = NULL;
uint64_t src_wsid = 0;
volatile uint64_t *dest_ptr = NULL;
uint64_t dest_wsid = 0;
fpgaPrepareBuffer(afu_handle, BUF_SIZE, (void **)&src_ptr,
                  &src_wsid, 0);
fpgaPrepareBuffer(afu_handle, BUF_SIZE, (void **)&dest_ptr,
                  &dest_wsid, 0);
/* Cast away volatile: memset() takes a plain void pointer. */
memset((void *)src_ptr, 0xaf, BUF_SIZE);
memset((void *)dest_ptr, 0xbe, BUF_SIZE);
Figure 45 Flow: Allocate DMA Buffers
2.4.6 Make AFU Aware of DMA Buffers
The process by which locations of shared memory buffers and their sizes are made known to the AFU is entirely AFU-specific. This example shows the method used by the Native Loopback AFU. Each buffer I/O virtual address is cacheline-aligned and programmed into a unique AFU CSR; then the buffer size in lines is programmed into a length CSR:
#define CSR_SRC_ADDR 0x000A // AFU-specific
#define CSR_DEST_ADDR 0x000B // AFU-specific
#define CSR_NUM_LINES 0x000C // AFU-specific
uint64_t src_iova = 0;
uint64_t dest_iova = 0;
fpgaGetIOAddress(afu_handle, src_wsid, &src_iova);
fpgaGetIOAddress(afu_handle, dest_wsid, &dest_iova);
fpgaWriteMMIO64(afu_handle, 0, CSR_SRC_ADDR, src_iova >> 6);
fpgaWriteMMIO64(afu_handle, 0, CSR_DEST_ADDR, dest_iova >> 6);
fpgaWriteMMIO32(afu_handle, 0, CSR_NUM_LINES, BUF_SIZE / 64);
Figure 46 Flow: Make AFU Aware of DMA Buffers
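The arithmetic behind the CSR values in the listing above can be checked with a short calculation. The sketch below uses Python only as a calculator; the I/O address is a made-up value, and the 64-byte cache line size is inferred from the shift by 6 and the division by 64 in the listing.

```python
CL_BYTES = 64                      # cache line size implied by the >> 6 shift
BUF_SIZE = 2 * 1024 * 1024         # 2 MiB, as in the allocation example

src_iova = 0x4020_0000             # made-up, cache-line-aligned I/O address

assert src_iova % CL_BYTES == 0    # low 6 bits are zero, so the shift loses nothing
src_line_addr = src_iova >> 6      # value written to CSR_SRC_ADDR
num_lines = BUF_SIZE // CL_BYTES   # value written to CSR_NUM_LINES

print(hex(src_line_addr), num_lines)
```

Because the address is aligned to a cache line, shifting right by 6 and back left by 6 recovers the original address, so the AFU can reconstruct the full byte address from the line address.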
2.4.7 Initiate an Acceleration Task
With the shared buffer configuration complete, the AFU can be told to initiate the acceleration task. This process is AFU-specific. The Native Loopback AFU starts the acceleration task by writing a value to its control CSR:
Figure 47 Initiate an Acceleration Task
2.4.8 Wait for Task Completion
Once the acceleration task is initiated, the application may poll the AFU for a completion status. This process is AFU-specific. The AFU may provide a status CSR for the application to poll; or the AFU may communicate status to the application by means of a result code written to a shared buffer.
2.4.9 Free DMA Buffers
When the acceleration task completes and the AFU is quiesced such that there are no outstanding memory transactions targeted for the shared memory, the DMA buffers can be unmapped and freed using fpgaReleaseBuffer():
Figure 48 Flow: Free DMA Buffers
2.4.10 Unmap MMIO Region
The MMIO regions should also be unmapped using fpgaUnmapMMIO():
Figure 49 Flow: Unmap MMIO Region
2.4.11 Close the AFU
The AFU handle should be closed via fpgaClose() to release its resources:
2.4.12 Release the Tokens and Properties
The fpga_tokens returned by fpgaEnumerate() should be destroyed using the fpgaDestroyToken() API. The fpga_properties objects should be destroyed using the fpgaDestroyProperties() API:
Figure 51 Flow: Release the Tokens and Properties
3.0 OPAE C++ API
The OPAE C++ API refers to a C++ layer that sits on top of the OPAE C API, providing object-oriented implementations of the main OPAE C API abstractions: properties, tokens, handles, DMA buffers, etc. Like the OPAE C API, the C++ API headers contain Doxygen markup for each of the provided classes.
3.1 libopae-cxx-core
The implementation files for the C++ API are compiled into libopae-cxx-core.so. A convenience header, core.h, provides a quick means of including each of the C++ API headers. Each of the types comprising the C++ API is located within the opae::fpga::types C++ namespace.
3.1.1 Properties
Class properties provides the C++ implementation of the fpga_properties type and its associated APIs.
Figure 52 C++ Create New Empty Properties
Class properties provides member variables for each fpga_properties item that can be manipulated with fpgaPropertiesGet…() and fpgaPropertiesSet…(). For example, to set the AFU ID in a properties instance and to set that instance’s type to FPGA_ACCELERATOR:
#define MY_AFU_ID "8ad74241-d13b-48eb-b428-7986dcbcab14"
filter->guid.parse(MY_AFU_ID);
filter->type = FPGA_ACCELERATOR;
Figure 53 C++ Properties Set GUID and Type
3.1.2 Tokens
Class token provides the C++ implementation of the fpga_token type and its associated APIs. Class token also provides the enumerate() static member function:
std::vector<token::ptr_t> tokens = token::enumerate({filter});
if (tokens.size() < 1) {
// flag error and return
}
token::ptr_t tok = tokens[0];
Figure 54 C++ Enumeration
3.1.3 Handles
Class handle provides the C++ implementation of the fpga_handle type and its associated APIs. The handle class provides member functions for opening and closing a token, for reading and writing to MMIO space, and for reconfiguring the FPGA’s Programmable Region.
Figure 55 C++ Opening a Handle
3.1.4 Shared Memory
The shared_buffer class provides member functions for allocating and releasing DMA buffers, for querying buffer attributes, and for reading and writing buffers.
#define BUF_SIZE (2 * 1024 * 1024)
shared_buffer::ptr_t input = shared_buffer::allocate(accel, BUF_SIZE);
shared_buffer::ptr_t output = shared_buffer::allocate(accel, BUF_SIZE);
std::fill_n(input->c_type(), BUF_SIZE, 0xaf);
std::fill_n(output->c_type(), BUF_SIZE, 0xbe);
Figure 56 C++ Allocate and Init Buffers
Once DMA buffers have been allocated, their IO addresses are programmed into AFU-specific CSRs to enable the DMA. Here, the IO address of each buffer is aligned to the nearest cache line before programming it into the AFU CSR space. The number of cache lines is then programmed into the appropriate AFU CSR.
#define CSR_SRC_ADDR 0x000A // AFU-specific
#define CSR_DEST_ADDR 0x000B // AFU-specific
#define CSR_NUM_LINES 0x000C // AFU-specific
#define LOG2_CL 6
accel->write_csr64(CSR_SRC_ADDR, input->io_address() >> LOG2_CL);
accel->write_csr64(CSR_DEST_ADDR, output->io_address() >> LOG2_CL);
accel->write_csr32(CSR_NUM_LINES, BUF_SIZE / 64);
Figure 57 C++ Make the AFU Aware of DMA Buffers
3.1.5 Events
The event class provides member functions for event registration. In order to register an event, provide the handle::ptr_t for the desired device, along with the event type and optional flags.
int vect = 2;
event::ptr_t evt = event::register_event(accel,
                                         FPGA_EVENT_INTERRUPT,
                                         vect);
int evt_fd = evt->os_object();  // evt is a smart pointer, so use ->
Figure 58 C++ Event Registration
3.1.6 Errors
Class error provides a means of querying the device errors given a token::ptr_t. The token and integer ID provided to the error::get() static member function uniquely identify one of the 64-bit error masks associated with the token.
Figure 59 C++ Query Device Errors
3.1.7 SysObject
Class sysobject is the C++ implementation of the OPAE SysObject API. sysobject provides a means of creating class instances via its two sysobject::get() static member functions. A third non-static sysobject::get() enables creating a sysobject instance given a parent sysobject instance. The read64() and write64() member functions allow reading and writing the sysobject’s value as a 64-bit unsigned integer. The bytes() member functions allow reading a sysobject’s value as a raw byte stream.
4.0 OPAE Python API
The OPAE Python API refers to a Python layer that sits on top of the OPAE C++ API, providing Python implementations of the OPAE C++ API abstractions: properties, tokens, handles, DMA buffers, etc.
4.1 _opae
The Python API is coded as a pybind11 project, which allows C++ code to directly interface with Python internals. Each C++ API concept is encoded into a Python equivalent. The functionality exists as a Python extension module, compiled into _opae.so.
4.1.1 Enumeration
Enumeration is somewhat simplified as compared to the OPAE C/C++ APIs. The fpga.enumerate() function accepts keyword arguments for each of the property names that are defined in the C++ API. As an example, to enumerate for an FPGA_ACCELERATOR by its GUID:
from opae import fpga
MY_ACCEL = "d573b29e-176f-4cb7-b810-efbf7be34cc9"
tokens = fpga.enumerate(type=fpga.ACCELERATOR, guid=MY_ACCEL)
assert tokens, "No accelerator matches {}".format(MY_ACCEL)
Figure 60 Python Enumeration
The return value from the fpga.enumerate() function is a list of all the token objects matching the search criteria.
4.1.2 Properties
Querying properties for a token or handle is also a bit different in the Python API. In order to query properties for one of these objects, pass the object to the fpga.properties() constructor. The return value is a properties object with each of the property names defined as instance attributes.
Figure 61 Python Get Token Properties
Properties objects may also be created by invoking the fpga.properties() constructor, passing the same keyword arguments as those to fpga.enumerate(). Properties objects created in this way are also useful for enumeration purposes:
Figure 62 Python Properties Constructor
4.1.3 Tokens
Tokens overload both the __getitem__ and __getattr__ methods to enable the SysObject API. Both of the following are valid forms of accessing the ‘errors/first_error’ sysfs attribute, given a token object:
tok = tokens[0]
ferr = tok['errors/first_error']
print(f'first error: 0x{ferr.read64():0x}')
print('0x{:0x}'.format(tok.errors.first_error.read64()))
Figure 63 Python Tokens and SysObject API
Tokens also implement a find() method, which accepts a glob expression in order to search sysfs. The following example finds the “id” sysfs entry in the given token’s sysfs tree.
Figure 64 Python Token Find
4.1.4 Handles
Tokens are converted to handles by way of the fpga.open() function. The flags (second) parameter to fpga.open() may be zero or fpga.OPEN_SHARED.
Figure 65 Python Open Handle
Like token objects, handle objects overload __getitem__ and __getattr__ methods to enable the SysObject API. handle also provides a find() method similar to token’s find().
err = handle['errors/errors']
print(f'errors: 0x{err.read64():0x}')
print('first error: 0x{:0x}'.format(
handle.errors.first_error.read64()))
my_id = handle.find('i?')
print(f'{my_id.read64()}')
Figure 66 Python Handles and SysObject API
Partial reconfiguration is provided by class handle’s reconfigure() method. The first parameter, slot, will be zero in most designs. The second parameter is an opened file descriptor to the file containing the GBS image. The third parameter, flags, defaults to zero.
Figure 67 Python Partial Reconfiguration
Device reset is accomplished by means of handle’s reset() method, which takes no parameters.
Finally for handles, CSR space reads are accomplished via read_csr32() and read_csr64(). Both methods accept the register offset as the first parameter and an optional csr_space index, which defaults to zero, as the second parameter. CSR space writes are accomplished via write_csr32() and write_csr64(). Both methods accept the register offset as the first parameter, the value to write as the second, and an optional csr_space index, which defaults to zero, as the third.
print('0x{:0x}'.format(handle.read_csr32(0x000a)))
print('0x{:0x}'.format(handle.read_csr64(0x000c)))
handle.write_csr32(0x000b, 0xdecafbad, 2)
handle.write_csr64(0x000e, 0xc0cac01adecafbad, 2)
Figure 68 Python Read/Write CSR
4.1.5 Shared Memory
The fpga.allocate_shared_buffer() function provides access to the OPAE memory allocator. The allocation sizes and required huge page configurations are the same as those noted in 2.1.5 MMIO and Shared Memory.
The fpga.allocate_shared_buffer() function returns an object instance of type shared_buffer. The shared_buffer class implements methods size(), wsid(), and io_address(), which return the buffer size in bytes, the unique workspace ID, and the IO address respectively:
buf = fpga.allocate_shared_buffer(handle, 4096)
print(f'size: {buf.size()}')
print(f'wsid: 0x{buf.wsid():0x}')
print(f'io_address: 0x{buf.io_address():0x}')
Figure 69 Python Allocate Shared Memory
The shared_buffer class implements a fill() method which takes an integer parameter which is applied to each byte of the buffer (similar to C standard library’s memset()). The compare() method compares the contents of the first size bytes of one buffer to another. The value returned from compare() is the same as the C standard library’s memcmp(). The copy() method copies the first size bytes of the calling buffer into the argument buffer.
b0 = fpga.allocate_shared_buffer(handle, 4096)
b1 = fpga.allocate_shared_buffer(handle, 4096)
b0.fill(0xa5)
b1.fill(0xa5)
print(f'compare: {b0.compare(b1, 4096)}')
b1.fill(0xa0)
b0.copy(b1, 4096)
print(f'compare: {b0.compare(b1, 4096)}')
Figure 70 Python Buffer Fill, Copy, Compare
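The fill, compare, and copy semantics described above mirror memset(), memcmp(), and a bounded copy. The following sketch reproduces them on ordinary bytearray objects so that it runs without FPGA hardware; the free functions fill(), compare(), and copy() are hypothetical stand-ins for the shared_buffer methods, not part of the OPAE API.

```python
def fill(buf, value):
    """Set every byte of buf to value, like memset()."""
    buf[:] = bytes([value]) * len(buf)


def compare(a, b, size):
    """Return <0, 0, or >0 for the first size bytes, like memcmp()."""
    x, y = bytes(a[:size]), bytes(b[:size])
    return (x > y) - (x < y)


def copy(src, dst, size):
    """Copy the first size bytes of src into dst."""
    dst[:size] = src[:size]


b0, b1 = bytearray(4096), bytearray(4096)
fill(b0, 0xA5)
fill(b1, 0xA0)
before = compare(b0, b1, 4096)   # buffers differ
copy(b0, b1, 4096)               # b0's contents now also live in b1
after = compare(b0, b1, 4096)    # buffers match
print(before, after)
```

As with memcmp(), only the sign of a nonzero result is meaningful; this sketch happens to return -1, 0, or 1.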
shared_buffer’s read32() and read64() methods read a 32- or 64-bit value from the given offset. The write32() and write64() methods write a 32- or 64-bit value to the given offset.
print(f'value at 0: 0x{b0.read32(0):0x}')
print(f'value at 4: 0x{b0.read64(4):0x}')
b0.write32(0xabadbeef, 0)
b0.write64(0xdecafbadabadbeef, 4)
print(f'value at 0: 0x{b0.read32(0):0x}')
print(f'value at 4: 0x{b0.read64(4):0x}')
Figure 71 Python Buffer Read and Write
The shared_buffer class provides three polling methods: poll(), poll32(), and poll64(). Each method takes an offset as its first parameter. The second parameter is a value and the third is a mask. The value and mask parameters are 8 bits wide for poll(), 32 bits wide for poll32(), and 64 bits wide for poll64(). The fourth and last parameter is a timeout value which defaults to 1000 microseconds.
Each polling method reads the n-bit wide item at offset and applies the mask to it with a bitwise AND. The masked value is then compared to the second parameter, value. If the two are equal, the method returns true immediately. Otherwise, the method loops, repeating the same comparison without sleeping. If the time elapsed since the beginning of the call becomes greater than or equal to the timeout value, the method times out and returns false.
Figure 72 Python Buffer Poll
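The polling semantics described above can be sketched in plain Python. This is an illustration of the mask, compare, and timeout logic only: poll_masked and its read_item callback are hypothetical names, whereas the real shared_buffer methods take a buffer offset as their first parameter and read the buffer themselves.

```python
import time


def poll_masked(read_item, value, mask, timeout_usec=1000):
    """Spin until (read_item() & mask) == value or the timeout elapses."""
    start = time.monotonic()
    while True:
        if (read_item() & mask) == value:
            return True
        elapsed_usec = (time.monotonic() - start) * 1_000_000
        if elapsed_usec >= timeout_usec:
            return False


status = bytearray([0x81])                     # stand-in for one byte of a buffer
done = poll_masked(lambda: status[0], value=0x01, mask=0x0F)
stuck = poll_masked(lambda: status[0], value=0x02, mask=0x0F)
print(done, stuck)
```

The first call succeeds on its first read because 0x81 masked with 0x0F equals 0x01; the second never matches and returns once the default timeout has elapsed.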
The shared_buffer split() method allows creating two or more buffer objects from one larger buffer object. The return value is a list of shared_buffer instances whose sizes match the arguments given to split().
b1, b2 = b1.split(2048, 2048)
print(f'b1 io_address: 0x{b1.io_address():0x}')
print(f'b2 io_address: 0x{b2.io_address():0x}')
Figure 73 Python Splitting Buffer
Finally, the shared_buffer class implements the Python buffer protocol to support memoryview objects. The Python buffer protocol allows access to an object’s underlying memory without copying that memory. As a brief example:
import struct
mv = memoryview(b1)
assert mv
assert mv[0] == 0xbe
b1[15] = int(65536)
assert struct.unpack('<L', bytearray(b1[15:19]))[0] == 65536
Figure 74 Python memoryview
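The zero-copy property of the buffer protocol can be demonstrated without FPGA hardware by using a bytearray in place of a shared_buffer. This is a sketch under that substitution; only standard-library calls are used.

```python
import struct

backing = bytearray(32)            # stand-in for a shared buffer's memory
mv = memoryview(backing)           # no copy: mv aliases `backing`

mv[0] = 0xBE                       # a write through the view...
assert backing[0] == 0xBE          # ...is visible in the underlying object

# Pack a 32-bit little-endian value directly into bytes 16..19.
struct.pack_into('<L', mv, 16, 65536)
value = struct.unpack_from('<L', backing, 16)[0]
print(value)
```

Because the view and the bytearray share the same memory, data written through one is immediately readable through the other, which is the behavior the buffer protocol provides for shared_buffer objects as well.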
4.1.6 Events
Given a handle and an event type, the fpga.register_event() function returns an object of type event. The event class implements one method, os_object(), which returns the underlying file descriptor that can be used to poll for the event:
import select
evt = fpga.register_event(handle, fpga.EVENT_ERROR)
os_object = evt.os_object()
received_event = False
epoll = select.epoll()
epoll.register(os_object, select.EPOLLIN)
for fileno, ev in epoll.poll(1):
    if fileno == os_object:
        received_event = True
print(f'received: {received_event}')
Figure 75 Python Events
In addition to fpga.EVENT_ERROR, the event types fpga.EVENT_INTERRUPT and fpga.EVENT_POWER_THERMAL are also supported.
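Waiting on an event file descriptor can be tried out without hardware by substituting a pipe for the descriptor that os_object() returns. The sketch below makes that substitution and uses select.select() rather than epoll so that it runs on any Unix-like system; the readiness-based waiting pattern is the same.

```python
import os
import select

# A pipe stands in for the descriptor returned by os_object().
read_fd, write_fd = os.pipe()

# Nothing has been written yet, so a zero-timeout wait reports no event.
ready, _, _ = select.select([read_fd], [], [], 0)
before = read_fd in ready

os.write(write_fd, b"\x01")        # simulate the event firing

ready, _, _ = select.select([read_fd], [], [], 1.0)
after = read_fd in ready

os.close(read_fd)
os.close(write_fd)
print(before, after)
```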
4.1.7 Errors
Given a token, the fpga.errors() function returns a list of objects of type error. Each error instance represents a 64-bit mask of error values. The error bit masks are platform-dependent. Each error instance has two attributes, name and can_clear, and one method, read_value(), which returns the 64-bit error mask.
for e in fpga.errors(tok):
    print(f'name: "{e.name}"')
    print(f'can_clear: {e.can_clear}')
    print(f'value: {e.read_value()}')
Figure 76 Python Get Errors
4.1.8 SysObject
The Python API’s SysObject implementation is introduced in 4.1.3 Tokens and 4.1.4 Handles. When the index operator (__getitem__) or attribute reference (__getattr__) is used and the referenced string or attribute name corresponds to a sysfs entry in the sysfs path of either a token or a handle, then an object of type sysobject is returned.
The size() method returns the length of the sysfs entry in bytes. Note that a typical sysfs entry is terminated with a ‘\n’ followed by the ‘\0’ NUL terminator. The bytes() method returns the sysfs entry’s value as a string.
afu_id = tok['afu_id']
assert afu_id
print(f'size: {afu_id.size()} bytes: {afu_id.bytes().rstrip()}')
Figure 77 Python sysobject as Bytes
The sysobject read64() and write64() methods provide a means to read and write a sysfs entry’s value as an unsigned 64-bit integer. The sysobject class itself also implements the __getitem__ and __getattr__ methods so that a sysobject of type FPGA_OBJECT_CONTAINER can retrieve sysobject instances for child sysfs entries.
Figure 78 Python sysobject Container
5.0 Management Interfaces - opae.admin
While the OPAE SDK C, C++, and Python APIs focus on presenting the AFU and all its related functionality to the end user, there is also a need for maintenance functionality to aid in configuring the platform and performing secure firmware updates for the FPGA device and its components. opae.admin is a Python framework which provides abstractions for performing these types of maintenance tasks on FPGA devices. opae.admin provides Python classes which model the FPGA and the sysfs interfaces provided by the DFL drivers.
5.1 sysfs
opae.admin’s sysfs module provides abstractions for interacting with sysfs nodes, which comprise the base entity abstraction of opae.admin.
5.1.1 sysfs_node
A sysfs_node is an object that tracks a unique path within a sysfs directory tree. sysfs_node provides methods for finding and constructing other sysfs_node objects, based on the root path of the parent sysfs_node object. sysfs_node also provides a mechanism to read and write sysfs file contents. sysfs_node serves as the base class for many of the sysfs module’s other classes.
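The idea of a node that tracks a path, constructs child nodes, and reads and writes file contents can be sketched as follows. This is not opae.admin's sysfs_node class: SysfsNode, its method names, and its value property are hypothetical, and a temporary directory stands in for /sys so the example runs anywhere.

```python
import tempfile
from pathlib import Path


class SysfsNode:
    """Hypothetical stand-in for a sysfs_node: tracks one path in a tree."""

    def __init__(self, path):
        self.path = Path(path)

    def node(self, *parts):
        """Construct a child node relative to this node's path."""
        return SysfsNode(self.path.joinpath(*parts))

    def find(self, pattern):
        """Return child nodes whose relative path matches a glob pattern."""
        return [SysfsNode(p) for p in sorted(self.path.glob(pattern))]

    @property
    def value(self):
        """Read the file contents, dropping the trailing newline."""
        return self.path.read_text().rstrip("\n")

    @value.setter
    def value(self, text):
        self.path.write_text(f"{text}\n")


# Build a throwaway directory tree in place of /sys.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "errors").mkdir()
    (Path(root) / "errors" / "first_error").write_text("0x0\n")

    top = SysfsNode(root)
    first_error = top.node("errors", "first_error")
    first_error.value = "0x10"
    found = [n.path.name for n in top.find("errors/*")]
    result = int(first_error.value, 16)

print(found, result)
```

Rooting every lookup at the parent's path is what lets derived classes specialize the starting point while reusing the same find, read, and write logic.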
5.1.2 pci_node
A pci_node is a sysfs_node that is rooted at /sys/bus/pci/devices. Each pci_node has a unique PCIe address corresponding to the PCIe device it represents. Methods are provided for finding the pci_node’s children, for determining the PCIe device tree rooted at the node, for manipulating the node’s PCIe address, for determining the vendor and device IDs, and for removing, unbinding, and rescanning the device.
5.1.3 sysfs_driver
A sysfs_driver is a sysfs_node that provides a method for unbinding a sysfs_device object.
5.1.4 sysfs_device
A sysfs_device is a sysfs_node that is located under /sys/class or /sys/bus. sysfs_device provides the basis for opae.admin’s FPGA enumeration capability.
5.1.5 pcie_device
A pcie_device is a sysfs_device that is rooted at /sys/bus/pci/devices.
5.2 fpga
opae.admin’s fpga module provides classes which abstract an FPGA and its components.
5.2.1 region
A region is a sysfs_node that has an associated Linux character device, rooted at /dev. Methods for opening the region’s character device file and for interacting with the character device via its IOCTL interface are provided.
5.2.2 fme
An fme is a region that represents an FPGA device’s FME component. An fme provides accessors for the PR interface ID, the various bus paths that may exist under an FME, and the BMC firmware revision information.
5.2.3 port
A port is a region that represents an FPGA device’s Port component. A port provides an accessor for the Port AFU ID.
5.2.4 fpga_base
An fpga_base is a sysfs_device that provides accessors for the FPGA device’s FME, for the FPGA device’s Port, and for the secure update sysfs controls. fpga_base provides routines for enabling and disabling AER and for performing device RSU.
5.2.5 fpga
An fpga (derived from fpga_base) is the basis for representing the FPGA device in opae.admin. Utilities such as fpgasupdate rely on fpga’s enum classmethod to enumerate all of the FPGA devices in the system. In order for a device to enumerate via this mechanism, it must be bound to the dfl-pci driver at the time of enumeration.
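The binding requirement can be checked from sysfs, where a bound PCIe device directory contains a driver symbolic link that points at its driver. The sketch below relies on that standard sysfs layout; the helper names bound_driver and enumerable are hypothetical and are not part of opae.admin.

```python
import os


def bound_driver(pci_address, devices_root="/sys/bus/pci/devices"):
    """Return the name of the driver bound to a PCIe device, or None."""
    link = os.path.join(devices_root, pci_address, "driver")
    if not os.path.islink(link):
        return None  # no driver is currently bound
    # The final component of the link target is the driver name.
    return os.path.basename(os.readlink(link))


def enumerable(pci_address, devices_root="/sys/bus/pci/devices"):
    """True when the device is bound to dfl-pci, as enumeration requires."""
    return bound_driver(pci_address, devices_root) == "dfl-pci"
```

A device that reports no driver, or a driver other than dfl-pci, would need to be re-bound before utilities such as fpgasupdate could find it.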
5.3 opae.admin Utilities
Several utilities are written on top of opae.admin’s class abstractions. The following sections highlight some of the most commonly-used utilities.
5.3.1 fpgasupdate
fpgasupdate, or FPGA Secure Update, is used to apply firmware updates to the components of the FPGA. As the name implies, these updates target a secure FPGA device, one that has the ability to implement a secure root of trust. The command-line interface to fpgasupdate was designed to be as simple as possible for the end user. The command simply takes a path to the firmware update file to be applied and the PCIe address of the targeted FPGA device.
Figure 79 fpgasupdate Interface
fpgasupdate can apply a variety of firmware image updates.
Image | Description |
---|---|
Programmable Region Image | .gbs or Green BitStream |
SR Root Key Hash | Static Region RKH |
PR Root Key Hash | Programmable Region RKH |
FPGA Firmware Image | Static Region Device Firmware |
PR Authentication Certificate | Programmable Region Auth Cert |
BMC Firmware Image | Board Management Controller Firmware |
SR Thermal Image | Static Region Thermal Sensor Thresholds |
PR Thermal Image | Programmable Region Thermal Sensor Thresholds |
CSK Cancelation | Code Signing Key Cancelation Request |
SDM Image | Secure Device Manager Firmware |
Table 10 fpgasupdate Image Types
5.3.2 pci_device
pci_device is a utility that provides a convenient interface to some of the Linux Kernel’s standard PCIe device capabilities.
5.3.2.1 pci_device aer subcommand
The aer dump subcommand displays the Correctable, Fatal, and NonFatal device errors.
Figure 80 pci_device aer dump
The aer mask subcommand displays, masks, or unmasks errors using the syntax of the setpci command.
# pci_device 0000:b2:00.0 aer mask show
0x00010000 0x000031c1
# pci_device 0000:b2:00.0 aer mask all
# pci_device 0000:b2:00.0 aer mask off
# pci_device 0000:b2:00.0 aer mask 0x01010101 0x10101010
Figure 81 pci_device aer mask
The aer clear subcommand clears the current errors.
Figure 82 pci_device aer clear
5.3.2.2 pci_device unbind subcommand
The unbind subcommand unbinds the target device from the currently-bound device driver.
Figure 83 pci_device unbind
To re-bind the device to a driver, e.g., dfl-pci, use the following commands:
Figure 84 Re-binding a Driver
5.3.2.3 pci_device rescan subcommand
The rescan subcommand triggers a PCIe bus rescan of all PCIe devices.
Figure 85 pci_device rescan
5.3.2.4 pci_device remove subcommand
The remove subcommand removes the target device from Linux kernel management.
Figure 86 pci_device remove
Note: a reboot may be required in order to re-establish the Linux kernel management for the device.
5.3.2.5 pci_device topology subcommand
The topology subcommand shows a tab-delimited depiction of the target device as it exists in the PCIe device tree in the Linux kernel.
# pci_device 0000:b2:00.0 topology
[pci_address(0000:3a:00.0), pci_id(0x8086, 0x2030)] (pcieport)
[pci_address(0000:3b:00.0), pci_id(0x10b5, 0x8747)] (pcieport)
[pci_address(0000:3c:09.0), pci_id(0x10b5, 0x8747)] (pcieport)
[pci_address(0000:b2:00.0), pci_id(0x8086, 0x0b30)] (dfl-pci)
[pci_address(0000:3c:11.0), pci_id(0x10b5, 0x8747)] (pcieport)
[pci_address(0000:43:00.0), pci_id(0x8086, 0x0b32)] (no driver)
[pci_address(0000:3c:08.0), pci_id(0x10b5, 0x8747)] (pcieport)
[pci_address(0000:3d:00.1), pci_id(0x8086, 0x0d58)] (i40e)
[pci_address(0000:3d:00.0), pci_id(0x8086, 0x0d58)] (i40e)
[pci_address(0000:3c:10.0), pci_id(0x10b5, 0x8747)] (pcieport)
[pci_address(0000:41:00.0), pci_id(0x8086, 0x0d58)] (i40e)
[pci_address(0000:41:00.1), pci_id(0x8086, 0x0d58)] (i40e)
Figure 87 pci_device topology
The green output indicates the target device. The other endpoint devices are shown in blue.
5.3.2.6 pci_device vf subcommand
The vf subcommand allows setting the value of the sriov_numvfs sysfs node of the target device. This is useful in scenarios where device functionality is presented in the form of one or more PCIe Virtual Functions.
Figure 88 pci_device vf
5.3.3 rsu
rsu is a utility that performs Remote System Update. rsu is used subsequent to programming a firmware update or other supported file type with fpgasupdate, in order to reset the targeted FPGA entity so that a newly-loaded firmware image becomes active.
5.3.3.1 rsu bmc subcommand
The bmc subcommand causes a Board Management Controller reset. This command is used to apply a previous fpgasupdate of a BMC firmware image. The --page argument selects the desired boot image. Valid values for --page are ‘user’ and ‘factory’.
Figure 89 rsu bmc
5.3.3.2 rsu retimer subcommand
The retimer subcommand causes a Parkvale reset (specific to Vista Creek). This command is used to apply a previous fpgasupdate of a BMC firmware image (the Parkvale firmware is contained within the BMC firmware image). The retimer subcommand causes only the Parkvale to reset.
Figure 90 rsu retimer
5.3.3.3 rsu fpga subcommand
The fpga subcommand causes a reconfiguration of the FPGA Static Region. This command is used to apply a previous fpgasupdate of the Static Region image. The --page argument selects the desired boot image. Valid values for --page are ‘user1’, ‘user2’, and ‘factory’.
# rsu fpga --page user1 0000:b2:00.0
# rsu fpga --page user2 0000:b2:00.0
# rsu fpga --page factory 0000:b2:00.0
Figure 91 rsu fpga
5.3.3.4 rsu sdm subcommand
The sdm subcommand causes a reset of the Secure Device Manager. This command is used to apply a previous fpgasupdate of the SDM image.
Figure 92 rsu sdm
5.3.3.5 rsu fpgadefault subcommand
The fpgadefault subcommand displays the default FPGA boot sequence and selects the image to boot on the next reset of the FPGA. When given without additional parameters, the fpgadefault subcommand displays the default FPGA boot sequence:
Figure 93 rsu Displaying FPGA Boot Sequence
The parameters to the fpgadefault subcommand are --page and --fallback. The --page parameter accepts ‘user1’, ‘user2’, or ‘factory’, specifying the desired page to boot the FPGA from on the next reset. Note that this subcommand does not actually cause the reset to occur. Please refer to rsu fpga subcommand for an example of resetting the FPGA using the rsu command.
# rsu fpgadefault --page user1 0000:b2:00.0
# rsu fpgadefault --page user2 0000:b2:00.0
# rsu fpgadefault --page factory 0000:b2:00.0
Figure 94 rsu Select FPGA Boot Image
The --fallback parameter accepts a comma-separated list of the keywords ‘user1’, ‘user2’, and ‘factory’. These keywords, in conjunction with the --page value are used to determine a fallback boot sequence for the FPGA. The fallback boot sequence is used to determine which FPGA image to load in the case of a boot failure. For example, given the following command, the FPGA would attempt to boot in the order ‘factory’, ‘user1’, ‘user2’. That is to say, if the ‘factory’ image failed to boot, then the ‘user1’ image would be tried. Failing to boot ‘user1’, the ‘user2’ image would be tried.
Figure 95 rsu Select FPGA Boot Image and Fallback
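The way --page and --fallback combine into a boot order can be sketched as below. This is an illustration of the rule described above, assuming the --page value is tried first and the fallback keywords follow in the order given. The function name boot_sequence is hypothetical, and the duplicate check is an assumption rather than documented rsu behavior.

```python
VALID_PAGES = ("user1", "user2", "factory")


def boot_sequence(page, fallback=""):
    """Return the boot order implied by a --page value and a --fallback list."""
    order = [page] + [k.strip() for k in fallback.split(",") if k.strip()]
    for keyword in order:
        if keyword not in VALID_PAGES:
            raise ValueError("unknown page keyword: " + keyword)
    # Assumption: each page appears at most once in a sequence.
    if len(set(order)) != len(order):
        raise ValueError("a page may appear only once in the sequence")
    return order


print(boot_sequence("factory", "user1,user2"))
```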
6.0 Sample Applications
6.1 afu-test Framework
afu-test refers to a test-writing framework that exists as a set of C++ classes written on top of the OPAE C++ bindings. The first class, afu, serves as the base class for the test application abstraction. Class afu provides integration with CLI11, a C++11 command-line parsing framework, and with spdlog, a C++ logging library. The second class, command, represents a unique test sequence that is called by the afu object. Instances of the command class implement the test-specific workload.
class afu {
public:
  afu(const char *name,
      const char *afu_id = nullptr,
      const char *log_level = nullptr);
  int open_handle(const char *afu_id);
  int main(int argc, char *argv[]);
  virtual int run(CLI::App *app, command::ptr_t test);
  template<class T>
  CLI::App *register_command();
};
Figure 96 C++ class afu
The afu class constructor initializes the CLI11 command parser with some general, application-wide parameters.
Option | Description |
---|---|
-g,--guid | Accelerator AFU ID. |
-p,--pci-address | Address of the accelerator device. |
-l,--log-level | Requested spdlog output level. |
-s,--shared | Open the AFU in shared mode? |
-t,--timeout | Application timeout in milliseconds. |
Figure 97 class afu Application Parameters
The register_command() member function adds a test command instance to the afu object. Each test command that an afu object is capable of executing is registered during the test’s startup code. For instance, here is the hssi application’s use of register_command():
hssi_afu app;
int main(int argc, char *argv[])
{
  app.register_command<hssi_10g_cmd>();
  app.register_command<hssi_100g_cmd>();
  app.register_command<hssi_pkt_filt_10g_cmd>();
  app.register_command<hssi_pkt_filt_100g_cmd>();
  …
  app.main(argc, argv);
}
Figure 98 hssi's app.register_command()
Next, the afu instance’s main() member function is called. main() initializes the spdlog instance, searches its database of registered commands to find the command matching the test requested from the command prompt, uses the open_handle() member function to enumerate for the requested AFU ID, and calls its run() member function, passing the CLI::App and the test command variables. The run() member function initializes a test timeout mechanism, then calls the command parameter’s run() to invoke the test-specific logic.
With all the boilerplate of application startup, configuration, and running handled by the afu class, the test-specific command class is left to implement only a minimum number of member functions:
class command {
public:
  virtual const char *name() const = 0;
  virtual const char *description() const = 0;
  virtual int run(afu *afu, CLI::App *app) = 0;
  virtual void add_options(CLI::App *app) { }
  virtual const char *afu_id() const { return nullptr; }
};
Figure 99 class command
The name() member function gives the unique command name. Some examples of names from the hssi app are hssi_10g, hssi_100g, pkt_filt_10g, and pkt_filt_100g. The description() member function gives a brief description that is included in the command-specific help output. add_options() adds command-specific command-line options. afu_id() gives the AFU ID for the command, in string form. Finally, run() implements the command-specific test functionality.
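The register-then-dispatch pattern described above can be shown with a small Python analog built on argparse. This is not the afu-test API, which is C++. The classes App, Command, and EchoCommand are hypothetical, and the timeout default is an arbitrary placeholder.

```python
import argparse


class Command:
    """Python analog of the command interface; not the C++ class itself."""

    name = None
    description = ""

    def add_options(self, parser):
        """Add command-specific options; the default adds none."""

    def run(self, app, args):
        raise NotImplementedError


class App:
    """Analog of class afu: registers commands and dispatches to one."""

    def __init__(self, prog):
        self.parser = argparse.ArgumentParser(prog=prog)
        # Application-wide option; the default value is a placeholder.
        self.parser.add_argument("-t", "--timeout", type=int, default=60000)
        self.subparsers = self.parser.add_subparsers(dest="command")
        self.commands = {}

    def register_command(self, cls):
        cmd = cls()
        sub = self.subparsers.add_parser(cmd.name, help=cmd.description)
        cmd.add_options(sub)
        self.commands[cmd.name] = cmd

    def main(self, argv):
        args = self.parser.parse_args(argv)
        if args.command is None:
            self.parser.print_help()
            return 1
        # Look up the requested command and run its test logic.
        return self.commands[args.command].run(self, args)


class EchoCommand(Command):
    """Hypothetical command used only for this illustration."""

    name = "echo"
    description = "Return the value passed with --value."

    def add_options(self, parser):
        parser.add_argument("--value", type=int, default=0)

    def run(self, app, args):
        return args.value


app = App("sketch")
app.register_command(EchoCommand)
print(app.main(["echo", "--value", "7"]))
```

Keeping option parsing and dispatch in the application class means each command supplies only its name, description, options, and run logic, which mirrors the division of labor between afu and command.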
6.2 afu-test Based Samples
6.2.1 dummy_afu
The dummy_afu application is an afu-test based application that implements three commands: mmio, ddr, and lpbk.
Target | Description |
---|---|
# dummy_afu mmio | Targets special scratchpad area implemented by the AFU. |
# dummy_afu ddr | Execute dummy_afu-specific DDR test. |
# dummy_afu lpbk | Execute a simple loopback test. |
6.2.2 host_exerciser
See the host_exerciser markdown document.
6.2.3 hssi
See the hssi markdown document.
7.0 Other Utilities
7.1 opae.io
See the opae.io markdown document.
7.2 bitstreaminfo
The bitstreaminfo command prints diagnostic information about firmware image files that have been passed through the PACSign utility. PACSign prepends secure block 0 and secure block 1 data headers to the images that it processes. These headers contain signature hashes and other metadata that are consumed by the BMC firmware during a secure update.
To run bitstreaminfo, pass the path to the desired firmware image file:
Figure 100 Running bitstreaminfo
7.3 fpgareg
The fpgareg command prints the register spaces for the following FPGA device components:
Command | Description |
---|---|
# fpgareg 0000:b1:00.0 pcie | Walks and prints the DFL for the device. |
# fpgareg 0000:b1:00.0 bmc | Prints the BMC registers for the device. |
# fpgareg 0000:b1:00.0 hssi | Prints the HSSI registers for the device. |
# fpgareg 0000:b1:00.0 acc | Prints the AFU register spaces. |
Figure 101 fpgareg Commands
Note that fpgareg is only available on Arrow Creek ADP and later platforms. It will not work with prior platforms, e.g., N3000.
7.4 opaevfio
See the opaevfio markdown document.
8.0 Building OPAE
The OPAE SDK uses the cmake build and configuration system, version >= 3.10. The basic steps required to build the SDK from source are:
Install prerequisite packages.
$ git clone https://github.com/OFS/opae-sdk.git
$ cd opae-sdk
$ mkdir build
$ cd build
$ cmake .. <cmake options>
$ make
8.1 Installing Prerequisite Packages
The OPAE SDK is intended to build and run on modern Linux distributions. The SDK contains a set of system configuration scripts to aid the system configuration process.
Script | Target Operating System |
---|---|
centos.sh | CentOS 8.x |
fedora.sh | Fedora 33/34 |
ubuntu.sh | Ubuntu 20.04 LTS |
Table 11 System Configuration Scripts
8.2 Cloning the SDK repository
8.3 CMake Options
Option | Description | Values | Default |
---|---|---|---|
-DCMAKE_BUILD_TYPE | Configure debugging info | Debug, Release, Coverage, RelWithDebInfo | RelWithDebInfo |
-DCMAKE_INSTALL_PREFIX | Root install path | | /usr/local |
-DOPAE_BUILD_SPHINX_DOC | Enable/Disable docs | ON/OFF | OFF |
-DOPAE_BUILD_TESTS | Enable/Disable unit tests | ON/OFF | OFF |
-DOPAE_ENABLE_MOCK | Enable/Disable mock driver for unit tests | ON/OFF | OFF |
-DOPAE_INSTALL_RPATH | Enable/Disable rpath for install | ON/OFF | OFF |
-DOPAE_VERSION_LOCAL | Local version string | ||
-DOPAE_PRESERVE_REPOS | Preserve local changes to external repos? | ON/OFF | OFF |
-DOPAE_BUILD_LIBOPAE_CXX | Enable C++ bindings | ON/OFF | ON |
-DOPAE_WITH_PYBIND11 | Enable pybind11 | ON/OFF | ON |
-DOPAE_BUILD_PYTHON_DIST | Enable Python bindings | ON/OFF | OFF |
-DOPAE_BUILD_LIBOPAEVFIO | Build libopaevfio.so | ON/OFF | ON |
-DOPAE_BUILD_PLUGIN_VFIO | Build libopae-v.so | ON/OFF | ON |
-DOPAE_BUILD_LIBOPAEUIO | Build libopaeuio.so | ON/OFF | ON |
-DOPAE_BUILD_LIBOFS | Build OFS Copy Engine | ON/OFF | ON |
-DOPAE_BUILD_SAMPLES | Build Samples | ON/OFF | ON |
-DOPAE_BUILD_LEGACY | Build legacy repo | ON/OFF | OFF |
-DOPAE_LEGACY_TAG | Specify legacy build tag | master | |
-DOPAE_WITH_CLI11 | Enable apps which use CLI11 | ON/OFF | ON |
-DOPAE_WITH_SPDLOG | Enable apps which use spdlog | ON/OFF | ON |
-DOPAE_WITH_LIBEDIT | Enable apps which use libedit | ON/OFF | ON |
-DOPAE_WITH_HWLOC | Enable apps which use hwloc | ON/OFF | ON |
-DOPAE_WITH_TBB | Enable apps which use Thread Building Blocks | ON/OFF | ON |
-DOPAE_MINIMAL_BUILD | Enable/Disable minimal build. When set to ON, disable CLI11, spdlog, libedit, hwloc, tbb | ON/OFF | OFF |
Table 12 CMake Options
8.4 Building OPAE for Debug
8.5 Creating RPMs
To ease the RPM creation process, the OPAE SDK provides a simple RPM creation script. The parameters to the RPM create script are fedora or rhel, depending on which distribution is targeted. For rhel, the build flag -DOPAE_MINIMAL_BUILD is set to ON, omitting the binaries which have dependencies on external components that RHEL does not include in its base repositories.
In order to create RPMs for Fedora, run the create script on a system loaded with all the Fedora build prerequisites. If prerequisites are missing, the create script will complain until they are resolved.
In order to create RPMs for RHEL, run the create script on a system loaded with all the RHEL build prerequisites. If prerequisites are missing, the create script will complain until they are resolved.
Figure 102 RPM Creation
After running the create script, the RPM files will be located in the packaging/opae/rpm directory.
8.5.1 Updating the RPM Version Information
The RPMs will be versioned according to the information found in the file packaging/opae/version. Edit this file to update the version information, then re-run the create script to create the RPMs.
9.0 Debugging OPAE
9.1 Enabling Debug Logging
The OPAE SDK has a built-in debug logging facility. To enable it, set the cmake flag -DCMAKE_BUILD_TYPE=Debug and then use the following environment variables:
| Variable| Description|
| ----- | ----- |
|LIBOPAE_LOG=1| Enable debug logging output. When not set, only critical error messages are displayed.|
|LIBOPAE_LOGFILE=file.log| Capture debug log output to file.log. When not set, the logging appears on stdout and stderr. The file must appear in a relative path or it can be rooted at /tmp.|
Table 13 Logging Environment Variables
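One way to apply these variables from a script is to build the environment for a child process. The sketch below is an illustration only: the helper names logfile_allowed and debug_environment are hypothetical, and the path check is a simple reading of the rule in Table 13, not the SDK's own validation.

```python
import os


def logfile_allowed(path):
    """Apply the stated rule: a relative path, or a path rooted at /tmp."""
    if not os.path.isabs(path):
        return True
    # Normalize first so that "/tmp/../etc/x" is not accepted.
    return os.path.normpath(path).startswith("/tmp/")


def debug_environment(logfile=None, base=None):
    """Return a copy of the environment with OPAE debug logging enabled."""
    env = dict(os.environ if base is None else base)
    env["LIBOPAE_LOG"] = "1"
    if logfile is not None:
        if not logfile_allowed(logfile):
            raise ValueError("log file must be relative or under /tmp")
        env["LIBOPAE_LOGFILE"] = logfile
    return env
```

The returned mapping can be passed as the env argument of subprocess.run() when launching an application built against a Debug build of the SDK.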
9.2 GDB
To enable gdb-based debugging, the cmake configuration step must specify a value for -DCMAKE_BUILD_TYPE of either Debug or RelWithDebInfo so that debug symbols are included in the output binaries. The OPAE SDK makes use of dynamically-loaded library modules. When debugging with gdb, the best practice is to remove all OPAE SDK libraries from the system installation paths to ensure that library modules are only loaded from the local build tree:
Figure 103 Debugging with GDB
10.0 Adding New Device Support
As of OPAE 2.2.0 the SDK has transitioned to a single configuration file model. The libraries, plugins, and applications obtain their runtime configuration during startup by examining a single JSON configuration file. In doing so, the original configuration file formats for libopae-c and fpgad have been deprecated in favor of the respective sections in the new configuration file.
10.1 Configuration File Search Order
By default the OPAE SDK will install its configuration file to /etc/opae/opae.cfg.
Figure 104 Default Configuration File
The SDK searches for the configuration file during startup by employing the following search algorithm:
First, the environment variable LIBOPAE_CFGFILE is examined. If it is set to a path that represents a valid path to a configuration file, then that configuration file path is used, and the search is complete.
Next, the HOME environment variable is examined. If its value is valid, then it is prepended to the following set of relative paths. If HOME is not set, then the search continues with the value of the current user’s home path as determined by getpwuid(). The home path, if any, determined by getpwuid() is prepended to the following set of relative paths. Searching completes successfully if any of these home-relative search paths is valid.
Figure 105 HOME Relative Search Paths
Finally, the configuration file search continues with the following system-wide paths. If any of these paths is found to contain a configuration file, then searching completes successfully.
Figure 106 System Search Paths
If the search exhausts all of the possible configuration file locations without finding a configuration file, then an internal default configuration is used. This internal default configuration matches that of the opae.cfg file shipped with the OPAE SDK.
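The search order above can be sketched as a single function. This is a minimal sketch, not the SDK's implementation: the function name find_config is hypothetical, and the home-relative and system-wide candidate paths are passed in as parameters because this sketch does not assume their values.

```python
import os
import pwd


def find_config(home_relative, system_paths, environ=None):
    """Return the first configuration file found, or None for the default."""
    environ = os.environ if environ is None else environ

    # 1. An explicit override through LIBOPAE_CFGFILE ends the search.
    override = environ.get("LIBOPAE_CFGFILE")
    if override and os.path.isfile(override):
        return override

    # 2. Home-relative candidates, using HOME or else the passwd database.
    home = environ.get("HOME")
    if not home:
        try:
            home = pwd.getpwuid(os.getuid()).pw_dir
        except KeyError:
            home = None
    if home:
        for rel in home_relative:
            # Strip any leading slash so the join stays under home.
            candidate = os.path.join(home, rel.lstrip("/"))
            if os.path.isfile(candidate):
                return candidate

    # 3. System-wide candidates.
    for candidate in system_paths:
        if os.path.isfile(candidate):
            return candidate

    # 4. Nothing found: the caller falls back to the internal default.
    return None
```

Returning None rather than raising an error reflects the final step of the search, where the internal default configuration is used.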
10.2 Configuration File Format
The OPAE SDK configuration file is stored in JSON formatted text. The file has two main sections: “configs” and “configurations”. The “configs” section is an array of strings. Each value in the “configs” array is a key into the data stored in the “configurations” section. If a key is present in “configs”, then that key is searched for and processed in “configurations”. If the key is not found in “configs”, then that section of “configurations” will not be processed, irrespective of whether it exists in “configurations”.
Figure 107 Keyed Configurations
Each keyed section in “configurations” has four top-level entries: “enabled”, “platform”, “devices”, “opae”.
{
  "configurations": {
    "c6100": {
      "enabled": true,
      "platform": "Intel Acceleration Development Platform C6100",
      "devices": [
        { "name": "c6100_pf", "id": [ ... ] },
        { "name": "c6100_vf", "id": [ ... ] }
      ],
      "opae": {
        ...
      }
    }
  }
}
Figure 108 Configurations Format
The “enabled” key holds a Boolean value. If the value is false or if the “enabled” key is omitted, then that configuration is skipped when parsing the file. The “platform” key holds a string that identifies the current configuration item as a product family. The “devices” key contains the device descriptions.
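The interaction between the “configs” array and the “enabled” key can be sketched as below. The configuration text is a hypothetical example: “c6100” follows the figure above, while “other” and “missing” are invented to show keys that are not processed. Silently skipping a key that is listed but not defined is an assumption of this sketch.

```python
import json

# Hypothetical configuration text for illustration only.
TEXT = """
{
  "configs": ["c6100", "missing"],
  "configurations": {
    "c6100": {"enabled": true, "platform": "example", "devices": [], "opae": {}},
    "other": {"enabled": true, "platform": "example", "devices": [], "opae": {}}
  }
}
"""


def active_configurations(cfg):
    """Yield (key, section) pairs that would be processed."""
    sections = cfg.get("configurations", {})
    for key in cfg.get("configs", []):
        section = sections.get(key)
        if section is None:
            continue  # listed in "configs" but not defined (assumed skipped)
        if section.get("enabled", False) is not True:
            continue  # a false or omitted "enabled" skips the section
        yield key, section


print([key for key, _ in active_configurations(json.loads(TEXT))])
```

Here “other” is defined but never processed because it is absent from “configs”, matching the rule stated in the format description.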
“devices” is an array of objects that contain a “name” and an “id” key. The “name” is a shorthand descriptor for a device PF or VF. The value of “name” appears elsewhere in the current “configurations” section in order to uniquely identify the device. “id” is an array of four strings, corresponding to the PCIe Vendor ID, Device ID, Subsystem Vendor ID, and Subsystem Device ID of the device. The entries corresponding to Vendor ID and Device ID must contain valid 16-bit hex integers. The entries corresponding to Subsystem Vendor ID and Subsystem Device ID may be 16-bit hex integers or the special wildcard string “*”, which indicates a don’t care condition.
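Matching a device against an “id” array, including the wildcard entries, can be sketched as follows. The function name id_matches is hypothetical, and the IDs in the usage line are taken from the pci_device topology listing earlier in this document purely as sample values.

```python
def id_matches(id_entry, vendor, device, subsystem_vendor, subsystem_device):
    """Compare a four-string "id" array against a device's PCIe IDs."""
    actual = (vendor, device, subsystem_vendor, subsystem_device)
    for position, (pattern, value) in enumerate(zip(id_entry, actual)):
        if pattern == "*":
            if position < 2:
                raise ValueError("Vendor ID and Device ID cannot be wildcards")
            continue  # don't-care entry
        if int(pattern, 16) != value:
            return False
    return True


print(id_matches(["0x8086", "0x0b30", "*", "*"], 0x8086, 0x0b30, 0x1, 0x2))
```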
The remaining sections in this chapter outline the format of the “opae” configurations key.
“plugin”: libopae-c and libopae-v
The “plugin” key in the “opae” section of a configuration is an array of OPAE SDK plugin configuration data. Each item in the array matches one or more PF or VF devices to a plugin library module.
{
  "configurations": {
    "c6100": {
      ...
      "opae": {
        "plugin": [
          {
            "enabled": true,
            "module": "libxfpga.so",
            "devices": [ "c6100_pf" ],
            "configuration": {}
          },
          {
            "enabled": true,
            "module": "libopae-v.so",
            "devices": [ "c6100_pf", "c6100_vf" ],
            "configuration": {}
          }
        ]
      }
    }
  }
}
Figure 109 "opae" / "plugin" key
If the “enabled” key is false or if it is omitted, then that “plugin” array entry is skipped, and parsing continues. The “module” key is a string that identifies the desired plugin module library for the entry. The “devices” array lists one or more PF/VF identifiers. Each array value must be a string, and it must match a device that is described in the “configurations” “devices” section. The “configuration” key of the “plugin” section specifies a unique plugin-specific configuration. Currently, libopae-c and libopae-v use no plugin-specific config, so these keys are left empty.
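The matching of device names to plugin modules can be sketched as below, using the entries from Figure 109 written as a Python dictionary. The function name modules_for_device is hypothetical, and the sketch only illustrates the selection rule; it does not load any library.

```python
def modules_for_device(section, device_name):
    """Return plugin modules whose "devices" list names the given device."""
    modules = []
    for entry in section.get("opae", {}).get("plugin", []):
        if entry.get("enabled", False) is not True:
            continue  # a false or omitted "enabled" skips the entry
        if device_name in entry.get("devices", []):
            modules.append(entry["module"])
    return modules


# The example entries from Figure 109, as a Python dictionary.
c6100 = {
    "opae": {
        "plugin": [
            {"enabled": True, "module": "libxfpga.so",
             "devices": ["c6100_pf"], "configuration": {}},
            {"enabled": True, "module": "libopae-v.so",
             "devices": ["c6100_pf", "c6100_vf"], "configuration": {}},
        ]
    }
}

print(modules_for_device(c6100, "c6100_pf"))
print(modules_for_device(c6100, "c6100_vf"))
```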
“fpgainfo”: fpgainfo application
The “fpgainfo” key in the “opae” section of a configuration is an array of fpgainfo plugin configuration data. Each item in the array matches one or more PF or VF devices to an fpgainfo plugin library module.
{
  "configurations": {
    "c6100": {
      ...
      "opae": {
        "fpgainfo": [
          {
            "enabled": true,
            "module": "libboard_c6100.so",
            "devices": [
              { "device": "c6100_pf", "feature_id": "0x12" },
              { "device": "c6100_vf", "feature_id": "0x12" }
            ]
          }
        ]
      }
    }
  }
}
Figure 110 "opae" / "fpgainfo" key
If the “enabled” key is false or if it is omitted, then that “fpgainfo” array entry is skipped, and parsing continues. The “module” key is a string that identifies the desired fpgainfo module library for the entry. Each “devices” array entry gives a PF/VF identifier in its “device” key and a DFL feature ID in its “feature_id” key.
“fpgad”: fpgad daemon process
The “fpgad” key in the “opae” section of a configuration is an array of fpgad plugin configuration data. Each item in the array matches one or more PF or VF devices to an fpgad plugin library module.
{
  "configurations": {
    "c6100": {
      ...
      "opae": {
        "fpgad": [
          {
            "enabled": true,
            "module": "libfpgad-vc.so",
            "devices": [ "c6100_pf" ],
            "configuration": {
              ...
            }
          }
        ]
      }
    }
  }
}
Figure 111 "opae" / "fpgad" key
If the “enabled” key is false or if it is omitted, then that “fpgad” array entry is skipped, and parsing continues. The “module” key is a string that identifies the desired fpgad plugin module library for the entry. The “devices” array lists one or more PF/VF identifiers. Each array value must be a string, and it must match a device that is described in the “configurations” “devices” section. The “configuration” key of the “fpgad” section specifies a unique plugin-specific configuration.
“rsu”: rsu script
The “rsu” key in the “opae” section of a configuration is an array of rsu script configuration data. Each item in the array matches one or more PF devices to an rsu configuration.
{
  "configurations": {
    "c6100": {
      ...
      "opae": {
        "rsu": [
          {
            "enabled": true,
            "devices": [ "c6100_pf" ],
            "fpga_default_sequences": "common_rsu_sequences"
          }
        ]
      }
    }
  },
  "common_rsu_sequences": [
    ...
  ]
}
Figure 112 "opae" / "rsu" key
If the “enabled” key is false or if it is omitted, then that “rsu” array entry is skipped, and parsing continues. When disabled, the device(s) mentioned in that array entry will not be available for the rsu command. The “devices” array lists one or more PF identifiers. Each array value must be a string, and it must match a device that is described in the “configurations” “devices” section. The “fpga_default_sequences” key of the “rsu” section specifies a JSON key. The configuration searches for that JSON key at the global level of the configuration file, and when found applies its value as the valid set of fpga boot sequences that can be used with the rsu fpgadefault subcommand.
“fpgareg”: fpgareg script
The “fpgareg” key in the “opae” section of a configuration is an array of fpgareg script configuration data. Each item in the array matches one or more PF/VF devices to an fpgareg configuration.
{
  "configurations": {
    "c6100": {
      ...
      "opae": {
        "fpgareg": [
          {
            "enabled": true,
            "devices": [ "c6100_pf", "c6100_vf" ]
          }
        ]
      }
    }
  }
}
Figure 113 "opae" / "fpgareg" key
If the “enabled” key is false or if it is omitted, then that “fpgareg” array entry is skipped, and parsing continues. When disabled, the device(s) mentioned in that array entry will not be available for the fpgareg command. The “devices” array lists one or more PF/VF identifiers. Each array value must be a string, and it must match a device that is described in the “configurations” “devices” section.
The “opae.io” key in the “opae” section of a configuration is an array of opae.io configuration data. Each item in the array matches one or more PF/VF devices to an opae.io platform string.
{
  "configurations": {
    "c6100": {
      ...
      "opae": {
        "opae.io": [
          {
            "enabled": true,
            "devices": [ "c6100_pf", "c6100_vf" ]
          }
        ]
      }
    }
  }
}
Figure 114 "opae" / "opae.io" key
If the “enabled” key is false or if it is omitted, then that “opae.io” array entry is skipped, and parsing continues. When disabled, the device(s) mentioned in that array entry will continue to be available for the opae.io command. The device(s) platform string will not be shown in the opae.io ls command. The “devices” array lists one or more PF/VF identifiers. Each array value must be a string, and it must match a device that is described in the “configurations” “devices” section.
Libxfpga – Updating the Metrics API
Edit libraries/plugins/xfpga/sysfs.c. Find the definition of the opae_id_to_hw_type() function. Update the function to add the new vendor/device ID to hw_type mapping.
This mapping is used by the SDK’s metrics API to determine the method of accessing the board sensor information and is very specific to the underlying BMC implementation. It may be necessary to add a new hw_type value and to update the logic in libraries/plugins/xfpga/metrics.
11.0 DFL Linux Kernel Drivers
OFS DFL driver software provides the bottom-most API to FPGA platforms. Libraries such as OPAE and frameworks such as DPDK are consumers of the APIs provided by OFS. Applications may be built on top of these frameworks and libraries. The OFS software does not cover any out-of-band management interfaces. OFS driver software is designed to be extendable, flexible, and provide for bare-metal and virtualized functionality.
The OFS driver software can be found in the OFS repository - linux-dfl, under the linux-dfl specific category. This repository has an associated OFS repository - linux-dfl - wiki page that includes the following information:
- A description of the three available branch archetypes
- Configuration tweaks required while building the kernel
- A functional description of the available DFL framework
- Descriptions for all currently available driver modules that support FPGA DFL board solutions
- Steps to create a new DFL driver
- Steps to port a DFL driver patch
Notices & Disclaimers
Intel® technologies may require enabled hardware, software or service activation. No product or component can be absolutely secure. Performance varies by use, configuration and other factors. Your costs and results may vary. You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that you may publish an unmodified copy. You may create software implementations based on this document and in compliance with the foregoing that are intended to execute on the Intel product(s) referenced in this document. No rights are granted to create modifications or derivatives of this document. The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. You are responsible for safety of the overall system, including compliance with applicable safety-related requirements or standards. © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.