

University of Southern California

# USC Viterbi School of Engineering

Ming Hsieh Department of Electrical Engineering

# **A Highly Parallel FPGA Implementation of Sparse Neural Network Training**

Sourya Dey, Diandian Chen, Zongyang Li, Souvik Kundu, Kuan-Wen Huang, Keith Chugg, Peter Beerel

Hardware Accelerated Learning group, USC

## Motivation & Introduction

- Neural networks are too big to be trained on-chip
- Cloud resources are costly

Our solution: pre-defined sparsity

- Reduces the number of edges and is hardware friendly
- Fixed in-degree and out-degree for each node

Train neural networks on FPGAs
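To make the fixed in-degree and out-degree property concrete, the following is a minimal sketch of one way to build such a connection pattern for a single junction. The function name `fixed_degree_pattern` and the round-robin assignment are illustrative assumptions, not the pattern used in this work; the sketch only shows that fixing the out-degree of every left node can also fix the in-degree of every right node.

```python
from collections import Counter


def fixed_degree_pattern(n_left, n_right, out_degree):
    """Return (edges, in_degree) for one sparse junction.

    Hypothetical helper: edges are assigned round-robin, which is only one
    of many possible patterns with fixed in-degree and out-degree.
    """
    total_edges = n_left * out_degree
    if total_edges % n_right != 0:
        raise ValueError("n_left * out_degree must be divisible by n_right")
    if out_degree > n_right:
        raise ValueError("out_degree cannot exceed n_right")
    in_degree = total_edges // n_right
    # Edge k leaves left node k // out_degree and enters right node k % n_right.
    edges = [(k // out_degree, k % n_right) for k in range(total_edges)]
    return edges, in_degree


if __name__ == "__main__":
    edges, in_degree = fixed_degree_pattern(n_left=8, n_right=4, out_degree=2)
    left_counts = Counter(left for left, _ in edges)
    right_counts = Counter(right for _, right in edges)
    print("in-degree:", in_degree)
    print("out-degrees:", sorted(set(left_counts.values())))
    print("in-degrees:", sorted(set(right_counts.values())))
```

Because every left node contributes exactly `out_degree` consecutive edge indices and every right node receives the same number of them, no node is connected more densely than any other.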



## Methodology

3 operations:

- Feedforward (FF)
- Backpropagate (BP)
- Update (UP)

All three use the weighted junction edges.

- Process *z* edges in 1 clock cycle
- 1 **block cycle** = total clock cycles to process all edges in any junction
- Ideal throughput = (block cycle)<sup>-1</sup>
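The relationship between *z*, the block cycle, and ideal throughput can be sketched as below. The function names, the value of *z*, and the clock frequency in the example are hypothetical; the sketch also assumes an idealized count with no pipeline or control overhead.

```python
import math


def block_cycle_clocks(num_edges, z):
    """Clock cycles needed to process all edges of a junction, z per cycle."""
    if z <= 0:
        raise ValueError("z must be positive")
    return math.ceil(num_edges / z)


def ideal_throughput(block_clocks, clock_hz):
    """Ideal throughput in inputs per second: the inverse of the block cycle."""
    block_cycle_seconds = block_clocks / clock_hz
    return 1.0 / block_cycle_seconds


if __name__ == "__main__":
    # Hypothetical values chosen for illustration only.
    clocks = block_cycle_clocks(num_edges=4096, z=64)
    print("block cycle (clock cycles):", clocks)
    print("ideal throughput (inputs/s):", ideal_throughput(clocks, clock_hz=10e6))
```

Increasing *z* shortens the block cycle proportionally, which is why the degree of parallelism directly sets the ideal throughput.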

## Hardware Acceleration – Parallelism and Pipelining



accessed at most once in a cycle



## FPGA Implementation – MNIST

**Training and Inference on Artix-7**

| Junction Number | 1    | 2  |
|-----------------|------|----|
| Left Neurons    | 1024 | 64 |
| Right Neurons   | 64   | 32 |
| Out-degree      | 4    | 16 |

| Parameter             | Value   |
|-----------------------|---------|
| Overall Density       | 7.576%  |
| Fixed Point Bit Width | 12      |
| Clock Frequency       | 15 MHz  |
| Block Cycle Duration  | 2.27 µs |
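As a quick consistency check, the overall density can be recomputed from the junction sizes and out-degrees in the table. The sketch assumes that density means the number of edges present divided by the number of edges in the equivalent fully connected network; the variable names are illustrative.

```python
# (left neurons, right neurons, out-degree) for each junction, from the table.
junctions = [(1024, 64, 4), (64, 32, 16)]

# Each left neuron contributes out_degree edges in the sparse network.
present_edges = sum(n_left * out_degree for n_left, _, out_degree in junctions)
# A fully connected junction has one edge per (left, right) pair.
full_edges = sum(n_left * n_right for n_left, n_right, _ in junctions)
density = present_edges / full_edges

# Block cycle duration expressed in clock cycles at the stated frequency.
clock_hz = 15e6
block_cycle_seconds = 2.27e-6
block_clocks = block_cycle_seconds * clock_hz

print("edges present:", present_edges)        # 5120
print("fully connected edges:", full_edges)   # 67584
print(f"overall density: {100 * density:.3f}%")  # 7.576%
print("block cycle in clock cycles (approx.):", round(block_clocks))
```

The recomputed density matches the 7.576% reported in the table.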



Contact Information: souryade@usc.edu, chugg@usc.edu, pabeerel@usc.edu

This work is partly supported by the National Science Foundation, USA, Grant #1763747.

Ming Hsieh Institute

96.5%