Publication Details
Overview
Shidi Tang, Pengwei Zheng, Ruiqi Chen, Yuxuan Lv, Bruno da Silva Gomes, Ming Ling
 

Chapter in Book / Report / Conference proceeding

Abstract 

Diffusion Transformer (DiT) models have shown superior generative capabilities in image and video synthesis, yet their high computational cost during inference remains a critical bottleneck. Temporal differential computation offers a promising complement to low-bit quantization by exploiting the temporal similarity of activations across diffusion time steps. However, applying this technique to DiT's Attention layers introduces substantial memory and computation overheads.

In this paper, we present Diff-DiT, the first FPGA accelerator designed for low-bit DiT inference with differential computation. To overcome the unique challenges of DiT quantization and hardware acceleration, we propose: (1) an approximated differential attention (ADA) method that selectively approximates attention computations across time steps using a significance score, enabling low-bit on-chip execution while minimizing memory overhead; (2) an optimal cross-cast data-access pattern with flexible data reuse to maximize computational intensity during matrix multiplications; and (3) a half-condition splitting (HCS) dataflow optimization with fine-grained pipelining to reduce computation and memory-access latency.

Extensive experiments show that Diff-DiT outperforms an NVIDIA V100 GPU by 1.39× in end-to-end throughput and 5.60× in energy efficiency. Compared with state-of-the-art diffusion model accelerators, Diff-DiT also achieves 2.81× and 2.77× improvements in throughput and energy efficiency, respectively.
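The core idea behind temporal differential computation can be illustrated with a minimal sketch: because activations change little between adjacent diffusion time steps, only the deltas that exceed a significance threshold need to be recomputed, and the rest can be reused from the previous step. The function below is an illustrative assumption on our part (the threshold rule, names, and shapes are not the paper's ADA formulation), not the accelerator's actual method:

```python
import numpy as np

def differential_step(prev_act, curr_act, threshold=0.05):
    """Illustrative temporal differential computation.

    Keeps only activation deltas whose magnitude exceeds a
    significance threshold (relative to the mean activation scale),
    so most of a time step reuses the previous step's results.
    NOTE: this is a hypothetical sketch, not the paper's ADA method.
    """
    delta = curr_act - prev_act
    # Significance mask: large deltas are recomputed, small ones skipped.
    scale = np.abs(prev_act).mean()
    mask = np.abs(delta) > threshold * scale
    sparse_delta = np.where(mask, delta, 0.0)
    reconstructed = prev_act + sparse_delta
    return reconstructed, mask.mean()  # fraction actually recomputed

rng = np.random.default_rng(0)
a_prev = rng.standard_normal((4, 8)).astype(np.float32)
# Adjacent diffusion steps produce highly similar activations.
a_curr = a_prev + 0.01 * rng.standard_normal((4, 8)).astype(np.float32)
recon, frac = differential_step(a_prev, a_curr)
```

Because the skipped deltas are small by construction, the reconstruction error stays bounded by the significance threshold while only a small fraction of entries is recomputed, which is what makes low-bit on-chip execution of the deltas attractive.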
