SiFive’s new XM Series offers a highly scalable and efficient AI compute engine. By integrating scalar, vector, and matrix engines, the XM Series makes very efficient use of memory bandwidth. The XM Series also continues SiFive’s legacy of delivering extremely high performance per watt for compute-intensive applications. To shorten development time, SiFive is also open sourcing the SiFive Kernel Library.
SiFive Intelligence
XM Series
XM Series Key Features
SiFive Matrix Engine
- Fat Outer Product design
- Tightly integrated with 4 X-Cores
- Deep fusion with vector units
4 X-Cores per cluster
- Each with dual vector units
- Executes all other layers, e.g. activation functions
- New exponential acceleration instructions
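To see why dedicated exponential instructions matter for activation layers, consider softmax: the per-element `exp()` dominates its runtime. A minimal NumPy sketch (illustrative only; it does not model the XM instructions themselves):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax; the exp() call is the hot loop
    that exponential-acceleration instructions target."""
    shifted = x - np.max(x)   # subtract max for numerical stability
    e = np.exp(shifted)       # one exp per element
    return e / np.sum(e)

probs = softmax(np.array([1.0, 2.0, 3.0]))
```

Accelerating that inner `exp()` speeds up softmax, GELU, sigmoid, and similar activation layers that otherwise fall to the scalar or vector units.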
New matrix instructions
- Fetched by scalar unit
- Source data comes from vector registers
- Destination is the matrix accumulator
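The dataflow above — operands sourced from vector registers, results accumulated into a matrix accumulator — is the outer-product formulation of matrix multiply. A minimal NumPy model (tile sizes are illustrative, not XM parameters):

```python
import numpy as np

# Model of an outer-product matrix unit: each step consumes two
# vector-register operands (a column of A, a row of B) and applies a
# rank-1 update to the matrix accumulator tile.
M, N, K = 8, 8, 16
A = np.random.rand(M, K).astype(np.float32)
B = np.random.rand(K, N).astype(np.float32)

acc = np.zeros((M, N), dtype=np.float32)   # matrix accumulator tile
for k in range(K):
    acc += np.outer(A[:, k], B[k, :])      # rank-1 (outer-product) update

assert np.allclose(acc, A @ B, atol=1e-4)  # matches a full matmul
```

The "fat" outer-product design amortizes each vector-register read across an entire rank-1 update, which is what lets the matrix engine sustain high utilization from modest register-file bandwidth.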
1 Cluster = 16 TOPS (INT8), 8 TFLOPS (BF16) per GHz
1 TB/s sustained bandwidth per XM Series cluster
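The headline figures combine into a simple back-of-the-envelope roofline. A hedged sketch (the 2 GHz clock is a hypothetical example, not a stated XM frequency):

```python
# Per-cluster peak throughput scales linearly with clock frequency:
# 16 TOPS (INT8) and 8 TFLOPS (BF16) per GHz, per the figures above.
def cluster_peak_tops(freq_ghz: float, tops_per_ghz: float = 16.0) -> float:
    return freq_ghz * tops_per_ghz

peak_int8 = cluster_peak_tops(2.0)        # 32 INT8 TOPS at a 2 GHz example clock
peak_bf16 = cluster_peak_tops(2.0, 8.0)   # 16 BF16 TFLOPS at the same clock

# With 1 TB/s sustained bandwidth, the arithmetic intensity a kernel
# needs to stay compute-bound at that clock:
intensity = (peak_int8 * 1e12) / 1e12     # 32 INT8 ops per byte
```

Kernels below that operations-per-byte threshold are bandwidth-bound, which is why the dedicated high-bandwidth SRAM port for model data matters as much as the raw TOPS.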
XM clusters connect to memory in 2 ways:
- CHI port for coherent memory access
- High bandwidth port connected to SRAM for model data
Host CPU can be RISC-V, x86, or Arm (or not present)
System can scale across multiple dies using CHI
SiFive Intelligence Family
- 512-bit vector length, single vector ALU, VCIX (1024-bit)
- 1024-bit vector length, single/dual vector ALU, VCIX (2048-bit)
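What the two vector lengths mean in practice is how many elements each vector register holds. A quick arithmetic sketch:

```python
# Elements that fit in one vector register at a given VLEN (in bits).
def elements_per_register(vlen_bits: int, element_bits: int) -> int:
    return vlen_bits // element_bits

# 512-bit VLEN: 64 INT8 or 32 BF16 elements per register.
# 1024-bit VLEN: 128 INT8 or 64 BF16 elements per register.
e512_int8 = elements_per_register(512, 8)     # 64
e1024_bf16 = elements_per_register(1024, 16)  # 64
```

Doubling VLEN doubles the work per vector instruction, which is how the wider configuration raises throughput without raising instruction-issue rate.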