SiFive Blog

February 24, 2020

Part 4: High-Performance Interconnect for Accelerators: Enabling Optimized Data Transfers with RISC-V

This is the fourth in a series of blog posts about domain-specific accelerators (DSAs), which are becoming increasingly common in systems-on-chip (SoCs). Parts 1, 2 and 3 addressed key challenges such as data transfers between DSAs and the core complex, point-to-point ordering between cores and DSA memory, and data transfers between DSAs and memories. This fourth instalment focuses on the frequent interaction with and amongst cores that many DSAs require, and on how the TileLink specification can be used to build interconnection networks.

To recap, a DSA provides higher performance per watt by optimizing the specialized function it implements. Examples of DSAs include compression/decompression units, random number generators and network packet processors. A DSA is typically connected to the core complex using a standard IO interconnect, such as an AXI bus (Figure 1).

[Figure 1: AXI bus]

SoCs based on RISC-V offer a unique opportunity to optimize data transfers between cores and DSAs. Many high-performance DSAs require frequent interaction with and amongst cores, and standard interconnects are often limited in how fast they can transfer data. A custom memory interconnect can instead be designed to the TileLink specification [1], a free and open standard for building interconnection networks.

Designing one’s own memory interconnection network offers several advantages to a DSA (Figure 2):

  • The DSA can connect to the memory interconnect to reduce latency of interaction with cores by directly participating in the memory coherence protocol.
  • The interconnect channel width can be optimized to the data transfer rates required by the DSA. For example, one could envision extremely wide, 1024-bit channels. The interconnect channels can also be run at a higher frequency than a standard interconnect might allow.
  • The Last-Level Cache (LLC) can have bigger cache block sizes than the core caches. For example, core caches typically have 64-byte blocks, whereas the LLC could be designed for 128-byte or 256-byte cache blocks. The LLC can also support special prefetch mechanisms optimized for the DSA.
  • The LLC and interconnect can offer different levels of QoS (Quality of Service). These QoS levels can be used, for example, by the LLC controller, to offer lower latency and higher bandwidth to DSAs in the presence of cross-traffic from different applications.
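
To make the channel-width and block-size points above concrete, here is a minimal Python sketch of the arithmetic involved. It assumes an idealized channel that moves one beat per clock cycle with no protocol overhead. The function names, the 128-bit comparison width, and the 1000 MHz clock are illustrative assumptions, not figures from the TileLink specification or from any SiFive product.

```python
def peak_bandwidth_gb_per_s(width_bits: int, freq_mhz: float) -> float:
    """Peak bandwidth in GB/s (10^9 bytes/s) of a channel that moves
    one beat of `width_bits` per cycle at `freq_mhz`. Idealized: no
    stalls or protocol overhead are modeled."""
    bytes_per_beat = width_bits // 8
    return bytes_per_beat * freq_mhz * 1e6 / 1e9


def beats_per_block(block_bytes: int, width_bits: int) -> int:
    """Number of beats needed to move one cache block over the channel
    (ceiling division, so a block narrower than the channel takes 1 beat)."""
    bytes_per_beat = width_bits // 8
    return -(-block_bytes // bytes_per_beat)


if __name__ == "__main__":
    freq_mhz = 1000.0  # assumed clock, for illustration only
    for width_bits in (128, 1024):  # 128-bit is an assumed comparison point
        bandwidth = peak_bandwidth_gb_per_s(width_bits, freq_mhz)
        print(f"{width_bits}-bit channel at {freq_mhz:.0f} MHz: {bandwidth:.1f} GB/s peak")
        for block_bytes in (64, 128, 256):  # block sizes mentioned in the text
            beats = beats_per_block(block_bytes, width_bits)
            print(f"  {block_bytes}-byte block: {beats} beat(s)")
```

Under these assumptions, widening the channel raises peak bandwidth in proportion to its width and lets a larger LLC block move in fewer beats. Real throughput depends on arbitration, cross-traffic, and the coherence protocol, none of which this sketch models.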

[1] SiFive TileLink Specification, version 1.8.1 (PDF)

For more details about SiFive’s standard cores, or to customize and build domain-specific RISC-V cores, please visit sifive.com/risc-v-core-ip
