The 8-bit floating-point (FP8) data format has been increasingly adopted in neural network (NN) computations due to its superior dynamic range compared to the traditional 8-bit integer (INT8) format. Nevertheless, the heavy reliance on multiplication in NN workloads leads to considerable energy consumption, even with FP8, particularly in FPGA-based deployments. To this end, this paper presents FP8ApproxLib, an FPGA-based approximate multiplier library for FP8. First, we conduct a bit-level analysis of the prior approximation method and introduce improvements that reduce the resulting computational error. Based on this analysis, we implement a fine-grained optimized design on mainstream FPGAs (Altera and AMD) using primitives and templates combined with physical layout constraints. Moreover, an automated tool is developed to support user configuration and generate HDL code. We then evaluate the accuracy and hardware efficiency of the FP8 approximate multipliers. The results show that our proposed method achieves an average error reduction of 53.15% (36.74%∼72.82%) compared to the previous FP8 approximation method. Moreover, compared to prior 8-bit approximate multipliers, our FP8 designs exhibit the lowest resource utilization. Finally, we integrate the design into the inference phase of three representative NN models (CNN, LLM, and Diffusion), demonstrating its excellent power efficiency. This is the first FP8 approximate multiplier design with architecture-aware fine-grained optimization and deployment for modern FPGA platforms, and it can serve as a benchmark for future designs and comparisons of FPGA-based low-precision floating-point approximate multipliers. The code for this work is available in our GitLab∗.
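As background for the bit-level approximation the abstract refers to, the sketch below illustrates the classic Mitchell-style logarithmic trick of approximating a floating-point multiply by integer-adding the operands' bit patterns, shown here for the FP8 E4M3 layout (1 sign, 4 exponent, 3 mantissa bits, bias 7). This is a minimal illustrative stand-in for this *kind* of prior approximation method, not FP8ApproxLib's actual design; all function names are hypothetical.

```python
BIAS, MANT = 7, 3  # E4M3: bias 7, 3 mantissa bits

def fp8_to_float(bits: int) -> float:
    """Decode an FP8 E4M3 bit pattern to a Python float (NaN ignored for brevity)."""
    sign = -1.0 if bits & 0x80 else 1.0
    exp = (bits >> MANT) & 0xF
    mant = bits & 0x7
    if exp == 0:  # subnormal
        return sign * (mant / 8) * 2.0 ** (1 - BIAS)
    return sign * (1 + mant / 8) * 2.0 ** (exp - BIAS)

def approx_mul(a: int, b: int) -> int:
    """Mitchell-style approximate FP8 multiply: add the exponent+mantissa
    fields as plain integers and subtract the bias; the sign bit is exact (XOR).
    The result is only exact when at least one mantissa is zero."""
    sign = (a ^ b) & 0x80
    mag = (a & 0x7F) + (b & 0x7F) - (BIAS << MANT)
    mag = max(0, min(mag, 0x7F))  # clamp to the representable magnitude range
    return sign | mag

# 1.5 * 2.0 -> 3.0 exactly (one mantissa is zero): 0x3C * 0x40 -> 0x44
# 1.5 * 1.5 -> approximated as 2.0 instead of 2.25: 0x3C * 0x3C -> 0x40
```

The 1.5 × 1.5 case shows the approximation error this family of multipliers trades for hardware cost: the carry-free addition of mantissa fields underestimates the true product, which is exactly the error behavior a bit-level analysis would target.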
Chen, R, Lyu, Y, Bao, H, Tang, S, Li, J, Zhu, Y, Ling, M & da Silva, B 2026, 'FP8ApproxLib: An FPGA-based approximate multiplier library for 8-bit floating point', Journal of Systems Architecture, vol. 173, 103686. https://doi.org/10.1016/j.sysarc.2026.103686
Chen, R., Lyu, Y., Bao, H., Tang, S., Li, J., Zhu, Y., Ling, M., & da Silva, B. (2026). FP8ApproxLib: An FPGA-based approximate multiplier library for 8-bit floating point. Journal of Systems Architecture, 173, Article 103686. https://doi.org/10.1016/j.sysarc.2026.103686
@article{4124fc95a82f4362b304f33e63eac120,
title = "FP8ApproxLib: An FPGA-based approximate multiplier library for 8-bit floating point",
abstract = "The 8-bit floating-point (FP8) data format has been increasingly adopted in neural network (NN) computations due to its superior dynamic range compared to the traditional 8-bit integer (INT8) format. Nevertheless, the heavy reliance on multiplication in NN workloads leads to considerable energy consumption, even with FP8, particularly in FPGA-based deployments. To this end, this paper presents FP8ApproxLib, an FPGA-based approximate multiplier library for FP8. First, we conduct a bit-level analysis of the prior approximation method and introduce improvements that reduce the resulting computational error. Based on this analysis, we implement a fine-grained optimized design on mainstream FPGAs (Altera and AMD) using primitives and templates combined with physical layout constraints. Moreover, an automated tool is developed to support user configuration and generate HDL code. We then evaluate the accuracy and hardware efficiency of the FP8 approximate multipliers. The results show that our proposed method achieves an average error reduction of 53.15% (36.74%∼72.82%) compared to the previous FP8 approximation method. Moreover, compared to prior 8-bit approximate multipliers, our FP8 designs exhibit the lowest resource utilization. Finally, we integrate the design into the inference phase of three representative NN models (CNN, LLM, and Diffusion), demonstrating its excellent power efficiency. This is the first FP8 approximate multiplier design with architecture-aware fine-grained optimization and deployment for modern FPGA platforms, and it can serve as a benchmark for future designs and comparisons of FPGA-based low-precision floating-point approximate multipliers. The code for this work is available in our GitLab∗.",
author = "Ruiqi Chen and Yangxintong Lyu and Han Bao and Shidi Tang and Jindong Li and Yanxiang Zhu and Ming Ling and {da Silva}, Bruno",
note = "Publisher Copyright: {\textcopyright} 2026",
year = "2026",
month = apr,
doi = "10.1016/j.sysarc.2026.103686",
language = "English",
volume = "173",
pages = "103686",
journal = "Journal of Systems Architecture",
issn = "1383-7621",
publisher = "Elsevier",
}