Research field: Artificial Intelligence
Conference: HEART '23: Proceedings of the 13th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies
When implementing neural networks on FPGAs, existing methods for resource optimisation are closely tied to the design and performance of the neural network itself. We wish to independently control the individual Processing Elements (PEs) responsible for processing neural network data. We introduce a new framework that more stringently defines neural networks as a series of successive layers. By isolating layers, we can create stand-alone compute pools to process each layer, with an initial focus on activation functions. A pool with P PEs serves the N neurons of that layer, with the entire compute pool, not individual PEs, charged with performing the activation function. This means that the number of PEs in a layer, their implementation, their functionality, and the range of inputs they serve can all be configured and specialised independently of the higher-level neural network. We can now tailor a neural network’s implementation to specific FPGA devices, adding PEs to make use of all the heterogeneous processing elements present on the FPGA. In addition to customising the resource footprint of a neural network, this greater range of control over each PE’s functionality allows performance optimisations arising from the distribution of the input data itself. More PEs can be added to the compute pool to serve more common inputs, or removed for less frequently used inputs to free up resources. We manage inter-layer data flow to support non-deterministic processing times, a key requirement for decoupling the design of neural networks from that of the underlying PEs. We demonstrate our framework by showing (1) that large PEs can be efficiently segmented into many smaller PEs, (2) that a latency reduction of 1.47x or greater is achievable for activation function layers in existing neural networks, and (3) that PEs serving more than 50% of all possible function inputs see sharply diminishing returns on performance as they scale to the more traditional 100% support.
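The compute-pool idea can be illustrated with a small software model. This is a minimal sketch and not the authors' implementation: the names `PE`, `ComputePool`, and `activate`, the choice of ReLU, and the least-loaded dispatch rule are all assumptions made for illustration. It shows a pool of P = 3 PEs serving one activation layer, where each PE covers only part of the input range and the more common non-negative range is given two PEs.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PE:
    """Hypothetical processing element serving inputs in [lo, hi)."""
    lo: float
    hi: float
    fn: Callable[[float], float]
    served: int = 0  # how many inputs this PE has processed so far

    def accepts(self, x: float) -> bool:
        return self.lo <= x < self.hi


class ComputePool:
    """Hypothetical pool: the pool as a whole, not any single PE,
    is responsible for applying the activation function."""

    def __init__(self, pes: List[PE]):
        self.pes = pes

    def activate(self, xs: List[float]) -> List[float]:
        out = []
        for x in xs:
            # Only PEs whose input range covers x may serve it.
            candidates = [pe for pe in self.pes if pe.accepts(x)]
            if not candidates:
                raise ValueError(f"no PE in the pool serves input {x}")
            # Assumed dispatch rule: pick the least-loaded candidate.
            pe = min(candidates, key=lambda p: p.served)
            pe.served += 1
            out.append(pe.fn(x))
        return out


# ReLU split across partial-range PEs: one cheap PE for negative inputs,
# two PEs for the (assumed more common) non-negative inputs.
pool = ComputePool([
    PE(-math.inf, 0.0, lambda x: 0.0),
    PE(0.0, math.inf, lambda x: x),
    PE(0.0, math.inf, lambda x: x),
])

outputs = pool.activate([-1.0, 2.0, 3.0, 0.5])
loads = [pe.served for pe in pool.pes]
```

In this model, changing the number of PEs or the ranges they cover changes only the pool, while the layer's input and output stay the same, which mirrors the separation between network design and PE design described above. Timing, inter-layer buffering, and FPGA resource costs are not modelled.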
| Publication year | 2023 |
|---|---|
| Citations | 1 |
| Country of publication | United Kingdom |
| Site | ACM |