SiFive Blog

January 21, 2020

Part 2: High-Bandwidth Core Access to Accelerators: Enabling Optimized Data Transfers with RISC-V

This is the second post in a series about domain-specific accelerators (DSAs), which are becoming increasingly common in SoCs. Part 1 addressed the challenges of transferring data between DSAs and the core complex, and showed how RISC-V offers a unique opportunity to optimize fine-grain communication between cores and DSAs and so improve the performance of their interaction.

To recap, a DSA provides higher performance per watt by optimizing the specialized function it implements. Examples of DSAs include compression/decompression units, random number generators and network packet processors. A DSA is typically connected to the core complex using a standard IO interconnect, such as an AXI bus (Figure 1).

[Figure 1: AXI Bus]

RISC-V offers a unique opportunity to optimize high-bandwidth communication between cores and DSAs. Cores often issue fine-grain load and store instructions in the IO space to access DSA memory. The problem, however, is that these loads and stores to DSA memory might have side effects. For example, a load from a specific DSA memory address might trigger a network message as a side effect of the load. Because of such side effects, loads and stores from a core to an IO device are typically required to be observed by the IO device in the order the core issued them. This is also known as point-to-point ordering.
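To see why the order of such accesses matters, here is a minimal sketch in C of a load with a side effect. It is a software model only, not a real DSA interface: the `dsa_model` type, the two register names, and the FIFO behavior are all hypothetical and chosen purely for illustration. Reading the DATA register pops an entry, so a STATUS read returns a different value depending on whether it is observed before or after the DATA read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DSA model: a small FIFO behind two "registers". */
enum { DSA_REG_STATUS = 0, DSA_REG_DATA = 1 };
enum { DSA_FIFO_DEPTH = 4 };

typedef struct {
    uint32_t fifo[DSA_FIFO_DEPTH];
    size_t head;   /* index of the next entry to pop */
    size_t count;  /* entries still queued */
} dsa_model;

/* Load from a device register.
 * Reading STATUS only reports how many entries are queued.
 * Reading DATA pops the FIFO, which is the side effect. */
static uint32_t dsa_load(dsa_model *d, int reg)
{
    assert(reg == DSA_REG_STATUS || reg == DSA_REG_DATA);
    if (reg == DSA_REG_STATUS)
        return (uint32_t)d->count;
    if (d->count == 0)
        return 0;  /* an empty FIFO reads as zero in this model */
    uint32_t value = d->fifo[d->head];
    d->head = (d->head + 1) % DSA_FIFO_DEPTH;
    d->count--;
    return value;
}
```

With four entries queued, reading STATUS and then DATA reports 4 queued entries, while reading DATA and then STATUS reports 3. The same two loads produce different results in different orders, which is why the device must observe them in the order the core issued them.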

A naive way to implement such point-to-point ordering is to issue a load to a DSA and wait for the result to return to the core before issuing the next one (Figure 2). This is highly inefficient because successive loads or stores to DSA memory cannot be issued back-to-back in a pipelined fashion. A RISC-V implementation would typically pipeline such IO loads with help from the interconnect between the core and the DSA (Figure 3). For example, if a mesh topology uses a fixed path (e.g., X-Y routing) from the core to the DSA (perhaps via the IO bridge), then the interconnect can guarantee the ordering and thereby allow very high-bandwidth access to DSA memory.
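The gap between the two schemes can be estimated with a back-of-the-envelope model. The sketch below is an assumption-laden simplification, not a description of any SiFive core: it assumes a fixed round-trip latency per load, a fixed issue interval for the pipelined case, and an interconnect that preserves order. The function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Naive scheme: each load waits for the previous response before it
 * issues, so n loads cost n full round trips. */
static uint64_t serialized_cycles(uint64_t n, uint64_t round_trip)
{
    return n * round_trip;
}

/* Pipelined scheme: a new load issues every issue_interval cycles and
 * the interconnect keeps the loads in order. The first response arrives
 * after one round trip; each later response follows one issue interval
 * behind the previous one. */
static uint64_t pipelined_cycles(uint64_t n, uint64_t round_trip,
                                 uint64_t issue_interval)
{
    assert(issue_interval > 0);
    if (n == 0)
        return 0;
    return round_trip + (n - 1) * issue_interval;
}
```

With illustrative numbers (not measurements) of 8 loads, a 100-cycle round trip, and one load issued per cycle, the serialized scheme takes 800 cycles and the pipelined scheme takes 107.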

The RISC-V architecture itself offers two other modes of optional IO ordering. First, RISC-V offers a very conservative IO ordering mode, which can be selectively used to guarantee strong ordering when necessary. Second, RISC-V offers a high-bandwidth relaxed ordering mode where IO loads and stores can be reordered. This mode would typically be used for DSA memory that does not have side effects.
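The post does not show how software restores ordering when a DSA region uses the relaxed mode. One mechanism the base ISA provides is the FENCE instruction, whose predecessor and successor sets can name device input (I) and device output (O) accesses. The sketch below is a hedged illustration of that idea, not SiFive's driver code: the `dsa_regs` layout, the `dsa_submit` function, and the doorbell convention are hypothetical. On non-RISC-V targets the macro falls back to a compiler barrier so the sketch still builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Order earlier device-output (O) accesses before later ones.
 * On RISC-V this emits FENCE O,O. Elsewhere it is only a compiler
 * barrier (GCC/Clang syntax), which keeps the example buildable. */
#if defined(__riscv)
#define io_write_fence() __asm__ volatile ("fence o, o" ::: "memory")
#else
#define io_write_fence() __asm__ volatile ("" ::: "memory")
#endif

/* Hypothetical DSA register block, assumed to be mapped in an IO
 * region with relaxed ordering. */
typedef struct {
    volatile uint32_t operand;   /* plain data, no side effect */
    volatile uint32_t doorbell;  /* writing 1 starts the operation */
} dsa_regs;

/* Write the operand, then ring the doorbell. The fence keeps the
 * doorbell write from being observed before the operand write. */
static void dsa_submit(dsa_regs *regs, uint32_t operand)
{
    assert(regs != NULL);
    regs->operand = operand;
    io_write_fence();
    regs->doorbell = 1;
}
```

The design choice here is to pay for ordering only at the one point where a side effect depends on it, and to leave all other accesses to the region free to be reordered.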

For more details about SiFive's standard cores, or to customize and build domain-specific RISC-V cores, visit sifive.com/risc-v-core-ip

