
SiFive Blog


January 30, 2020

Part 3: High-Bandwidth Accelerator Access to Memory: Enabling Optimized Data Transfers with RISC-V

This is the third in a series of blogs about domain-specific accelerators (DSAs), which are becoming increasingly common in system-on-chip (SoC) designs. Part #1 addressed the challenges of transferring data between DSAs and the core complex, and showed how RISC-V offers a unique opportunity to optimize fine-grain communication between them and improve core-DSA interaction performance. Part #2 addressed the challenges of point-to-point ordering between cores and DSA memory, and showed how RISC-V can likewise be used to optimize high-bandwidth communication between cores and DSAs. This third installment focuses on the challenges of transferring data between DSAs and memories such as DDR, LPDDR or HBM, and explains how RISC-V-based SoCs can use an alternative approach that writes the data directly to memory.

To recap, a DSA provides higher performance per watt by optimizing the specialized function it implements. Examples of DSAs include compression/decompression units, random number generators and network packet processors. A DSA is typically connected to the core complex using a standard IO interconnect, such as an AXI bus (Figure 1).

Figure: High-Bandwidth Accelerator

SoCs based on RISC-V offer a unique opportunity to optimize high-bandwidth data transfers between a DSA and memory. DSAs often need to move their data into external memory such as DDR, LPDDR or HBM, and this is usually done with a DMA (Direct Memory Access) engine.
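To make the DMA step concrete, here is a minimal Python sketch of how a DMA engine might split one transfer into bus bursts. The `DmaDescriptor` fields, the 64-byte default burst size and `split_into_bursts` are hypothetical names chosen for illustration, not a SiFive interface; the only protocol rule encoded is AXI's requirement that a burst must not cross a 4 KB address boundary.

```python
from dataclasses import dataclass

AXI_BOUNDARY = 4096  # AXI bursts must not cross a 4 KB address boundary


@dataclass
class DmaDescriptor:
    """Hypothetical descriptor: copy `length` bytes from `src` to `dst`."""
    src: int     # source address, e.g. a DSA output buffer
    dst: int     # destination address in DDR/LPDDR/HBM
    length: int  # number of bytes to move


def split_into_bursts(desc: DmaDescriptor, burst_bytes: int = 64):
    """Return a list of (src, dst, size) tuples, one per bus burst."""
    if burst_bytes <= 0:
        raise ValueError("burst_bytes must be positive")
    bursts = []
    offset = 0
    while offset < desc.length:
        src = desc.src + offset
        dst = desc.dst + offset
        # Largest burst that fits the request and crosses neither
        # the source nor the destination 4 KB boundary.
        size = min(burst_bytes,
                   desc.length - offset,
                   AXI_BOUNDARY - src % AXI_BOUNDARY,
                   AXI_BOUNDARY - dst % AXI_BOUNDARY)
        bursts.append((src, dst, size))
        offset += size
    return bursts
```

For example, a 150-byte copy with 64-byte bursts becomes bursts of 64, 64 and 22 bytes. Whether each burst then allocates in the Last-Level Cache is decided by the attributes the engine attaches to it, not by the splitting itself.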

The difficulty with the traditional approach (Figure 1) is that such transfers often allocate the data in the Last-Level Cache first. This can significantly slow down accesses, particularly when the volume of transferred data exceeds the capacity of the Last-Level Cache: each newly allocated line must then evict an existing one, so the incoming stream displaces data the cores are still using.
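The slowdown can be illustrated with a toy model. The sketch below assumes a fully associative Last-Level Cache with LRU replacement (real LLCs are set-associative and may use other policies) and counts how many lines of a core's working set survive a DMA stream of never-reused lines. `LruCache` and `core_hits_after_dma` are hypothetical helpers, and the line counts are illustrative, not measurements.

```python
from collections import OrderedDict


class LruCache:
    """Toy fully associative LLC with LRU replacement."""

    def __init__(self, num_lines: int):
        self.num_lines = num_lines
        self.lines = OrderedDict()  # cached line addresses, least recently used first

    def access(self, line: int, allocate: bool = True) -> bool:
        """Touch one cache line and return True on a hit.

        On a miss the line is allocated (evicting the LRU line if the cache
        is full) only when `allocate` is True; otherwise it goes to memory.
        """
        if line in self.lines:
            self.lines.move_to_end(line)
            return True
        if allocate:
            if len(self.lines) >= self.num_lines:
                self.lines.popitem(last=False)  # evict the least recently used line
            self.lines[line] = None
        return False


def core_hits_after_dma(llc_lines: int, core_lines: int, dma_lines: int,
                        allocate: bool) -> int:
    """Count how many of a core's cached lines still hit after a DMA stream."""
    llc = LruCache(llc_lines)
    core_set = range(core_lines)  # hypothetical core working set: lines 0..core_lines-1
    for line in core_set:
        llc.access(line)          # warm the LLC with the core's data
    for i in range(dma_lines):
        # DMA writes to fresh lines that are never reused afterwards
        llc.access(1_000_000 + i, allocate=allocate)
    return sum(llc.access(line) for line in core_set)
```

With a 1024-line cache and a 256-line working set, a 512-line stream fits alongside the core's data and all 256 lines still hit afterwards. A 4096-line stream that allocates leaves zero hits, while the same stream without allocation leaves all 256 intact.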

Figure 2 shows the alternative available to RISC-V-based SoCs: writing the data directly to memory and bypassing the Last-Level Cache. One way to achieve this is to mark the data being written as uncached. Alternatively, the DMA engine can give the Last-Level Cache a hint not to allocate the data but to write it directly to memory. In this case the data is still marked as cacheable, so any other cached copy of it within the processor complex must be invalidated.
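The two bypass options can be sketched the same way. This toy Python model (hypothetical `SoCModel` and `WritePolicy` names; one core-private cache, a shared LLC and DRAM, each represented as a dictionary) contrasts the traditional allocating write, an uncached write, and a cacheable write carrying a no-allocate hint, which must invalidate other cached copies. How a real DMA engine signals the hint is implementation-specific; on an AXI interconnect it could plausibly travel in the AWCACHE allocate attributes, but that is our assumption rather than something stated in this post.

```python
from enum import Enum


class WritePolicy(Enum):
    ALLOCATE = "cacheable, allocate in the LLC"       # traditional path (Figure 1)
    UNCACHED = "uncached, write straight to memory"
    NO_ALLOCATE_HINT = "cacheable, no-allocate hint"  # bypass the LLC, stay coherent


class SoCModel:
    """Toy memory hierarchy: one core-private cache, a shared LLC and DRAM."""

    def __init__(self):
        self.core_cache = {}  # addr -> value
        self.llc = {}
        self.dram = {}

    def dma_write(self, addr, value, policy):
        if policy is WritePolicy.ALLOCATE:
            self.llc[addr] = value           # data lands in the LLC first
            self.core_cache.pop(addr, None)  # coherence: drop the core's stale copy
        elif policy is WritePolicy.UNCACHED:
            # Assumption: software never caches an uncached region,
            # so there are no copies to invalidate.
            self.dram[addr] = value
        else:
            # Still cacheable, so every other cached copy is invalidated
            # and the write goes directly to memory.
            self.llc.pop(addr, None)
            self.core_cache.pop(addr, None)
            self.dram[addr] = value

    def core_read(self, addr):
        """Return the value from the nearest level that holds addr, or None."""
        for level in (self.core_cache, self.llc, self.dram):
            if addr in level:
                return level[addr]
        return None
```

Note the asymmetry: the uncached path has nothing to invalidate only because the model assumes software never caches that region, while the no-allocate path keeps the data cacheable and therefore pays for invalidations in exchange for not polluting the Last-Level Cache.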

For more details about SiFive's standard cores, or to customize and build domain-specific RISC-V cores, please visit sifive.com/risc-v-core-ip.



Shubu Mukherjee
Chief SoC Architect, SiFive
