A High-Speed CRC-32 Implementation On FPGA
School of Information Science&Technology, Northwest University,
Xi’an, Shaanxi, China
[email protected]
Xi’an Microelectronics Technology Institute,
Xi’an, Shaanxi, China
[email protected]
Yanyan Li
School of Information Science&Technology, Northwest University,
Xi’an, Shaanxi, China
[email protected]
Abstract—Cyclic Redundancy Check (CRC) is widely used for transmission error detection in various communication interfaces. As transmission rates increase, accelerating CRC with lower resource consumption becomes important for high-speed interfaces. This paper analyzes and implements a typical CRC algorithm (Stride-x) and designs a padding-zero strategy to support input data whose length is any multiple of a byte. Experiments are conducted to validate the proposed algorithm on Xilinx FPGA platforms. When the stride is 1, the proposed algorithm outperforms a typical parallel CRC algorithm in throughput and resource consumption for various input bus widths (32/128/256 bits).
Keywords-CRC; Lookup table; FPGA
... compared to a typical parallel CRC algorithm [14]. The proposed implementations can be used as IP in the design of high-speed interfaces.
II. ARCHITECTURE DESIGN
The basic principle of CRC is modulo-2 arithmetic: long division is performed on the input data using a given polynomial, and the remainder of the division is the CRC result. The CRC-k algorithm generally refers to a k-bit CRC polynomial that has a specified value. This article mainly uses the idea of Stride-x to reduce logic latency, save resources, improve bandwidth, and support input data whose length is any multiple of a byte.
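The modulo-2 long division described above can be illustrated with a minimal bit-serial sketch. It is not the paper's hardware design. It assumes the common CRC-32 generator polynomial 0x04C11DB7, MSB-first bit order, an all-1s initial remainder, and no final inversion; the paper does not state these conventions, and the function name `crc_long_division` is hypothetical.

```python
K = 32                    # CRC width k (assumed: CRC-32)
POLY = 0x04C11DB7         # assumed generator polynomial, x^32 term implicit
MASK = (1 << K) - 1


def crc_long_division(data: bytes, init: int = MASK) -> int:
    """Bit-serial modulo-2 long division, MSB first, one input bit per step."""
    rem = init
    for byte in data:
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            # The divisor is subtracted (XORed) whenever the bit leaving the
            # remainder register differs from the incoming data bit.
            feedback = ((rem >> (K - 1)) & 1) ^ bit
            rem = (rem << 1) & MASK
            if feedback:
                rem ^= POLY
    return rem


if __name__ == "__main__":
    print(hex(crc_long_division(b"123456789")))
```

This parameter set is commonly catalogued as CRC-32/MPEG-2. The Ethernet variant of CRC-32 additionally reflects the bit order and inverts the final remainder, so its output differs from this sketch.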
Authorized licensed use limited to: BMS College of Engineering. Downloaded on October 18,2024 at 09:05:17 UTC from IEEE Xplore. Restrictions apply.
In Figure 1, part 1 is the top module. T (k bits) is the CRC result of the current stage; C (k bits) carries T into the next stage as an input (C is initialized to all 1s at the first stage); I (n bits) is the other input of each stage; F (1 bit) indicates whether the current stage is the final stage; O (k bits) is the final CRC result after all stages end.
Part 2 consists of the padding-zero module and the Stride-x module. The padding-zero module comprises a pre-XOR module and a post-XOR module. The pre-XOR module divides C into two parts: the higher v bits are left-padded with zeros to get H (n bits), and the lower (k-v) bits are right-padded with zeros to get L (k bits). The dividend G (n bits) is the XOR of H and I. In each stage, the Stride-x module calculates the CRC remainder R (k bits) of G. The post-XOR module then XORs L and R to get the CRC result T. The result T of the final stage is the output O.
III. EXPERIMENTAL RESULTS
We conduct experiments for CRC-32, which is a common CRC polynomial. The experiments mainly study the impact of the stride value (x) and the input data width (n) on throughput and resource consumption.
Several works have focused on the implementation of CRC on FPGA platforms [10] [11] [12] in recent years. This paper implements the Stride-x algorithm on the Xilinx Virtex, Kintex, and Zynq series. Synthesis is performed using Vivado 2021. The clock frequency is 100 MHz. We obtain the resource consumption, maximum clock frequency, and transmission throughput.
First, we conduct experiments for the Stride-x module on Virtex (xc7vx690t) with different values of x and n. The results are shown in Figure 2 and Figure 3. The LUT consumption and throughput increase linearly with n and vary slightly with x. When x=1, the implementation achieves the maximum throughput and minimum LUT consumption. The same conclusion applies to Kintex and Zynq.
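To make the role of the bus width n concrete, the Section II data path (pre-XOR, remainder module, post-XOR) can be modelled in software. The following is a rough sketch under stated assumptions, not the authors' RTL. It assumes the polynomial 0x04C11DB7, MSB-first bit order, and no final inversion. The function `remainder` is a plain bit-serial stand-in for the Stride-x module, whose internals are not modelled here. A `valid_bits` argument replaces the F flag to mark a shorter final word, and the alignment of H against the data word is an assumption that may differ from Figure 1. All function names are hypothetical.

```python
K = 32                    # CRC width k (assumed: CRC-32)
POLY = 0x04C11DB7         # assumed generator polynomial, MSB-first form
MASK = (1 << K) - 1


def remainder(g: int, nbits: int) -> int:
    """Stand-in for the Stride-x module: (g * x^K) mod POLY, bit-serial."""
    r = 0
    for i in reversed(range(nbits)):
        feedback = ((r >> (K - 1)) & 1) ^ ((g >> i) & 1)
        r = (r << 1) & MASK
        if feedback:
            r ^= POLY
    return r


def stage(c: int, data: int, valid_bits: int) -> int:
    """One stage: pre-XOR, remainder, post-XOR. Returns T."""
    v = min(valid_bits, K)                   # state bits folded into the dividend
    h = (c >> (K - v)) << (valid_bits - v)   # H: high v bits of C, aligned to the data MSB
    low = (c << v) & MASK                    # L: low K-v bits of C, right-padded with zeros
    g = h ^ data                             # pre-XOR: dividend G
    r = remainder(g, valid_bits)             # remainder R of G
    return low ^ r                           # post-XOR: T


def crc_stream(message: bytes, n_bits: int = 128) -> int:
    """Top module: feed n-bit words; the last word may hold fewer valid bytes."""
    c = MASK                                 # C is all 1s at the first stage
    step = n_bits // 8
    for off in range(0, len(message), step):
        chunk = message[off:off + step]
        c = stage(c, int.from_bytes(chunk, "big"), 8 * len(chunk))
    return c                                 # O


def crc_bit_serial(message: bytes) -> int:
    """Independent bit-serial reference used to cross-check crc_stream."""
    c = MASK
    for byte in message:
        c ^= byte << (K - 8)
        for _ in range(8):
            c = ((c << 1) ^ POLY) & MASK if c & (1 << (K - 1)) else (c << 1) & MASK
    return c
```

In this model the result does not depend on the chosen bus width: `crc_stream(msg, 32)`, `crc_stream(msg, 128)`, and `crc_stream(msg, 256)` should all agree with `crc_bit_serial(msg)`, including when the message length is not a multiple of the word size.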
Table 2. The performance under different bus widths on Zynq
Second, we conduct experiments to compare Stride-1 with a typical parallel CRC algorithm [14]. The results are shown in Table 1 and Table 2. When the input data width is 32, 128, or 256 bits, Stride-1 achieves a higher throughput than the parallel CRC algorithm. After adding padding-zero, the critical path of the Stride-x algorithm becomes longer, resulting in a slight decrease in throughput.
IV. CONCLUSIONS
This paper analyzes the Stride-x method, designs the logic modules, and adds the padding-zero mechanism. In the experiments for CRC-32 on FPGA, the Stride-x algorithm scales well across different bus widths. When x=1, the LUT resource utilization is the lowest and the throughput is the highest. In addition, the FPGA performance of the Stride-1 algorithm is better than that of the typical parallel CRC algorithm.
REFERENCES
[1] W. W. Peterson and D. T. Brown, "Cyclic Codes for Error Detection," Proceedings of the IRE, vol. 49, no. 1, pp. 228-235, Jan. 1961, doi: 10.1109/JRPROC.1961.287814.
[2] T.-B. Pei and C. Zukowski, "High-speed parallel CRC circuits in VLSI," in IEEE Transactions on Communications, vol. 40, no. 4, pp. 653-657, April 1992, doi: 10.1109/26.141415.
[3] N. N. Qaqos, "Optimized FPGA Implementation of the CRC Using Parallel Pipelining Architecture," 2019 International Conference on Advanced Science and Engineering (ICOASE), 2019, pp. 46-51, doi: 10.1109/ICOASE.2019.8723800.
[4] C. E. Kennedy and M. Mozaffari-Kermani, "Generalized parallel CRC computation on FPGA," 2015 IEEE 28th Canadian Conference on Electrical and Computer Engineering (CCECE), 2015, pp. 107-113, doi: 10.1109/CCECE.2015.7129169.
[5] M. Walma, "Pipelined Cyclic Redundancy Check (CRC) Calculation," 2007 16th International Conference on Computer Communications and Networks, Honolulu, HI, USA, 2007, pp. 365-370, doi: 10.1109/ICCCN.2007.4317846.
[6] D. V. Sarwate, "Computation of cyclic redundancy checks via table look-up," Communications of the ACM, vol. 31, no. 8, pp. 1008-1013, 1988, doi: 10.1145/63030.63037.
[7] Y. Huo, X. Li, W. Wang and D. Liu, "High performance table-based architecture for parallel CRC calculation," The 21st IEEE International Workshop on Local and Metropolitan Area Networks, Beijing, China, 2015, pp. 1-6, doi: 10.1109/LANMAN.2015.7114717.
[8] M. E. Kounavis and F. L. Berry, "Novel Table Lookup-Based Algorithms for High-Performance CRC Generation," in IEEE Transactions on Computers, vol. 57, no. 11, pp. 1550-1560, Nov. 2008, doi: 10.1109/TC.2008.85.
[9] X. Dong and Y. He, "CRC Algorithm for Embedded System Based on Table Lookup Method," Microprocessors and Microsystems, vol. 74, Art. no. 103049, 2020, doi: 10.1016/j.micpro.2020.103049.
[10] Q. Clark Shen, J. C. Vega and P. Chow, "Parallel CRC On An FPGA At Terabit Speeds," 2022 International Conference on Field-Programmable Technology (ICFPT), Hong Kong, 2022, pp. 1-6, doi: 10.1109/ICFPT56656.2022.9974233.
[11] J. Mitra and T. K. Nayak, "Reconfigurable Concurrent VLSI (FPGA) Design Architecture of CRC-32 for High-Speed Data Communication," 2015 IEEE International Symposium on Nanoelectronic and Information Systems, Indore, India, 2015, pp. 112-117, doi: 10.1109/iNIS.2015.66.
[12] J. Cabal, L. Kekely and J. Kořenek, "High-Speed Computation of CRC Codes for FPGAs," 2018 International Conference on Field-Programmable Technology (FPT), Naha, Japan, 2018, pp. 234-237, doi: 10.1109/FPT.2018.00042.
[13] H. Liu, Z. Qiu, W. Pan, J. Li, L. Zheng and Y. Gao, "Low-Cost and Programmable CRC Implementation Based on FPGA," in IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 68, no. 1, pp. 211-215, Jan. 2021, doi: 10.1109/TCSII.2020.3008932.
[14] E. Stavinov, "A Practical Parallel CRC Generation Method." Accessed: 2023-12-21. Available: https://www.researchgate.net/publication/265523608_A_Practical_Parallel_CRC_Generation_Method