High Throughput and Fully Pipelined FPGA Implementation of AES-192 Algorithm
Abstract— AES (Advanced Encryption Standard) is one of the most common and secure symmetric-key cryptographic algorithms. AES has received considerable attention from researchers in recent years due to its broad spectrum of applications such as communication, military, network, electronic banking, Internet of Things (IoT), etc. AES can be implemented in software or in hardware. Hardware implementations can be based on Field Programmable Gate Arrays (FPGAs). Using hardware, a higher data rate can be achieved for fast applications such as routers than with a software implementation. In this paper we present an FPGA implementation of AES-192. We employ loop-unrolling, full pipelining, and sub-pipelining techniques, together with other efficient methods for the most complex parts of AES-192 such as Mix-columns and S-boxes. Our AES-192 implementation on a Xilinx Defense-Grade Virtex-7 (XQ7VX330T-RF1157) FPGA achieves a throughput of 54.52 Gbps and a maximum operational frequency of 425.996 MHz.

Keywords— High-throughput, AES-192, fully pipelining, sub-pipelining, FPGA

I. INTRODUCTION
Cryptography is the art and science of encryption and decryption. Encryption converts clear, understandable information (i.e., plain text, audio, or video) into a ciphered form which is unclear and not understandable. Decryption converts the ciphered form back into the original information [1]. Cryptographic algorithms are classified into symmetric and asymmetric. In symmetric systems, encryption and decryption use the same key. In asymmetric systems, encryption and decryption use different keys which are related by a certain function. AES is a popular symmetric-key cryptographic algorithm. Recently, AES has earned considerable attention from researchers due to its wide range of applications in communication, military, network, electronic banking, Internet of Things (IoT), etc. AES can be implemented in either hardware or software. Hardware can be implemented using reconfigurable devices such as Field-Programmable Gate Arrays (FPGAs), which meet high performance requirements.
Hardware implementations can also use loop-unrolling [2-11] and partial rolling [12] techniques in order to increase the throughput-to-area ratio and to decrease the area cost. Moreover, to increase the running frequency and throughput, pipelining and sub-pipelining techniques can be applied. This paper aims to introduce an FPGA implementation of AES-192. To do that, we apply loop-unrolling, full pipelining, and sub-pipelining techniques. Other efficient methods are also utilized for the most complex parts of AES-192, such as Mix-columns and S-boxes. Such an implementation is very fast and thus can be utilized in high-speed networks. The rest of this paper is organized as follows: Section II presents an overview of the AES algorithm. Section III presents the proposed high-throughput implementation of the AES-192 algorithm. Section IV presents results and comparison. Section V highlights some conclusions, and at the end of the paper a list of the references used is given.

II. OVERVIEW OF ADVANCED ENCRYPTION STANDARD
The National Institute of Standards and Technology (NIST) published the Advanced Encryption Standard (AES) in 2001 [2]. It was released to replace the Data Encryption Standard (DES) [13-14]. The overall structure of the AES encryption process is shown in Fig. 1. The plaintext message is divided into blocks, each of which is 16 bytes (128 bits) long. The key length is 16, 24, or 32 bytes (128, 192, or 256 bits). Thus, the algorithm is referred to as AES-128, AES-192, or AES-256, based on the key length [15]. In this paper we present an implementation of AES-192. AES-192 uses a longer key and more rounds than AES-128 (12 instead of 10), which makes it more secure than AES-128.
All operations in AES (i.e., addition, multiplication, and division) are performed on 8-bit bytes over the finite field $\mathrm{GF}(2^8)$ with the irreducible polynomial given in Eq. 1 [15-16]:

$$m(X) = X^8 + X^4 + X^3 + X + 1 \qquad (1)$$

Thus, each processed block is 16 bytes, which are arranged as a 4 × 4 square matrix. This matrix is copied into the state array, which is modified at each stage of encryption or decryption. After the final stage, the resulting state array is copied into the output matrix [15-16].
From Fig. 1, we can deal with AES in four main parts [16]:
1) Adding an initial key
2) The middle rounds (rounds 1 to 11 in the case of AES-192)
3) The final round
4) The key expansion unit
In the first part, an initial key is added to the plaintext block using the Add Round Key operation. In the second part, each middle round contains four transformations.
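The finite-field arithmetic of Eq. 1 and the 4 × 4 state array described in this section can be sketched in Python as follows. This is only an illustrative software model, not the hardware design of this paper. The helper names (`gf_add`, `xtime`, `gf_mul`, `bytes_to_state`, `state_to_bytes`) are hypothetical, and the column-by-column filling of the state array is an assumption taken from the AES standard [21] rather than from the text above.

```python
# Illustrative model of GF(2^8) arithmetic with the AES polynomial of Eq. 1.
AES_MODULUS = 0x11B  # X^8 + X^4 + X^3 + X + 1


def gf_add(a: int, b: int) -> int:
    """Addition in GF(2^8) is a bitwise XOR."""
    return a ^ b


def xtime(a: int) -> int:
    """Multiply a field element by X (i.e. by 02) and reduce modulo m(X)."""
    a <<= 1
    if a & 0x100:          # an X^8 term appeared, so subtract (XOR) m(X)
        a ^= AES_MODULUS
    return a & 0xFF


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def bytes_to_state(block: bytes) -> list:
    """Copy a 16-byte block into a 4 x 4 state array.

    Assumption: bytes fill the state column by column, as in the standard.
    """
    if len(block) != 16:
        raise ValueError("an AES block is exactly 16 bytes")
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_to_bytes(state: list) -> bytes:
    """Copy the state array back into a 16-byte output block."""
    return bytes(state[r][c] for c in range(4) for r in range(4))
```

For example, `gf_mul(0x57, 0x83)` reproduces the multiplication example of the AES standard, and `state_to_bytes(bytes_to_state(block))` returns the original block.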
Fig. 1. The structure of AES-192

A. Substitute Bytes Transformation (S-Box)
The S-box is an invertible byte substitution transformation. It replaces each byte of the state array independently by a corresponding byte value using a look-up table with a fixed size of 256 bytes [16]. The S-box offers non-linearity and confusion based on the multiplicative inverse and an affine transformation, as shown in Eq. 2 and Eq. 3 [16], [18].

$$S(X) = \mathrm{AffineTransform}(X^{-1}) \qquad (2)$$

$$AT = \begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 1
\end{pmatrix} \times \begin{pmatrix} i_7 \\ i_6 \\ i_5 \\ i_4 \\ i_3 \\ i_2 \\ i_1 \\ i_0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{pmatrix} \qquad (3)$$

where AT refers to the affine transform and $i = X^{-1}$.

B. Shift Rows Transformation
This transformation cyclically left-shifts each row of the state array by a different offset, where the offset depends on the row number. The ith row is circularly left shifted by (i-1) bytes. For instance, the second row is circularly left shifted by one byte and the fourth row is circularly left shifted by three bytes [15], [19].

The state array is added to the round key array over $\mathrm{GF}(2^8)$. As mentioned before, this addition in $\mathrm{GF}(2^8)$ is performed using a bitwise XOR operation [16].

E. Key expansion
Each round has its own key, which is generated from the original key using key expansion. A total of Nb(Nr + 1) words are generated by the key expansion: the algorithm requires an initial set of Nb words, and each of the Nr rounds requires Nb words of key data. The resulting key schedule consists of a linear array of 4-byte words, denoted $[w_i]$, with i in the range $0 \le i < Nb(Nr + 1)$ [21].

KeyExpansion (byte key[4*Nk], word w[Nb*(Nr+1)], Nk)
begin
    word temp
    i = 0
    while (i < Nk)
        w[i] = word(key[4*i], key[4*i+1], key[4*i+2], key[4*i+3])
        i = i+1
    end while
    i = Nk
    while (i < Nb * (Nr+1))
        temp = w[i-1]
        if (i mod Nk = 0)
            temp = SubWord(RotWord(temp)) xor Rcon[i/Nk]
        else if (Nk > 6 and i mod Nk = 4)
            temp = SubWord(temp)
        end if
        w[i] = w[i-Nk] xor temp
        i = i+1
    end while
end
Fig. 2. Pseudo Code for Key Expansion [21].
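As a software illustration of Eq. 2, Eq. 3, and the key expansion of Fig. 2, the following Python sketch builds the S-box from the multiplicative inverse and the affine transform and then expands a 192-bit key. It is a minimal reference model under stated assumptions, not the pipelined hardware described in this paper. All function names are hypothetical, the parameter values Nk = 6, Nb = 4, Nr = 12 are the AES-192 values from the standard [21], and the round constant is assumed to be generated by repeated multiplication by 02.

```python
# Illustrative model: S-box (Eq. 2 and Eq. 3) and key expansion (Fig. 2).

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo X^8 + X^4 + X^3 + X + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse as a^254; 0 is mapped to 0 by convention."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result if a else 0


def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF


def affine(i: int) -> int:
    """Eq. 3 written as rotations: each matrix row is a rotated copy."""
    return i ^ rotl8(i, 1) ^ rotl8(i, 2) ^ rotl8(i, 3) ^ rotl8(i, 4) ^ 0x63


SBOX = [affine(gf_inv(x)) for x in range(256)]  # Eq. 2


def sub_word(w: int) -> int:
    """Apply the S-box to each of the four bytes of a 32-bit word."""
    return ((SBOX[(w >> 24) & 0xFF] << 24) | (SBOX[(w >> 16) & 0xFF] << 16)
            | (SBOX[(w >> 8) & 0xFF] << 8) | SBOX[w & 0xFF])


def rot_word(w: int) -> int:
    """Cyclic left rotation of a 32-bit word by one byte."""
    return ((w << 8) | (w >> 24)) & 0xFFFFFFFF


def key_expansion(key: bytes, nk: int = 6, nb: int = 4, nr: int = 12) -> list:
    """Return the Nb*(Nr+1) words of the key schedule (defaults: AES-192)."""
    if len(key) != 4 * nk:
        raise ValueError("key length must be 4*Nk bytes")
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(nk)]
    rcon = 0x01  # assumed: Rcon[1] = 01, doubled in GF(2^8) each time
    for i in range(nk, nb * (nr + 1)):
        temp = w[i - 1]
        if i % nk == 0:
            temp = sub_word(rot_word(temp)) ^ (rcon << 24)
            rcon = gf_mul(rcon, 0x02)
        elif nk > 6 and i % nk == 4:
            temp = sub_word(temp)
        w.append(w[i - nk] ^ temp)
    return w
```

With the default arguments, `key_expansion` takes a 24-byte key and returns 52 words, i.e. 13 round keys of four words each.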
orized licensed use limited to: AMRITA VISHWA VIDYAPEETHAM AMRITA SCHOOL OF ENGINEERING. Downloaded on January 06,2025 at 03:36:31 UTC from IEEE Xplore. Restrictions ap
The pseudo code of the input key expansion is given in Fig. 2 [21]. For AES-192, Nk is 6, Nb is 4, and Nr is 12. SubWord() is a function that takes a four-byte input word, applies the S-box (discussed earlier) to each byte, and produces an output word. RotWord() is a function that performs a cyclic rotation by one byte, and Rcon[i] is the round constant word array.

III. PROPOSED HIGH-THROUGHPUT IMPLEMENTATION FOR AES-192 ALGORITHM
Our implementation is based on previous work for AES-128 [16]. In this paper, we modify this work by using loop-unrolling to remove all loops in the AES algorithm, which leads us to modify the critical path. We also employ full and sub-pipelining techniques to increase the operational frequency. Fig. 3 shows the general block diagram of the loop-unrolled and pipelined AES-192. Fig. 4 shows a general sub-pipelined round of AES-192 [16],[22].

$$\begin{aligned}
&(a_7 X^8 + a_6 X^7 + a_5 X^6 + a_4 X^5 + a_3 X^4 + a_2 X^3 + a_1 X^2 + a_0 X) \\
&\quad = a_7 \times (X^8 + X^4 + X^3 + X + 1) + a_6 X^7 + a_5 X^6 + a_4 X^5 + (a_3 + a_7) X^4 \\
&\qquad + (a_2 + a_7) X^3 + a_1 X^2 + (a_0 + a_7) X + a_7 \qquad (9)
\end{aligned}$$

By substituting Eq. 8 in Eq. 7 and then simplifying, Eq. 9 is obtained, and it provides an efficient implementation of multiplication by 02.

B. Efficient S-box method using logic optimization based on truth table
Implementation of the S-box based on the composite field approach has high hardware complexity. Thus, an efficient pipelined S-box implementation is used [16],[23]. It utilizes combinational logic to solve the unbreakable delay incurred by look-up tables. It also reduces the critical path delay caused by composite field arithmetic. The S-box
transformation has a 16 × 16 byte table. Thus, its truth table contains 256 rows. This truth table provides an 8-bit output, so it is very difficult to simplify such a big and complex table. The solution is to divide the truth table of the S-box into 16 sub-truth tables based on the 4 least significant (or most significant) bits of the main truth table. These 4 bits will be the input of sixteen module logic functions (M1, M2, M3, …, M16), as shown in Fig. 5.
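The truth-table split described above can be modelled in software to check that sixteen 4-input modules reproduce the full S-box. The Python sketch below is only a behavioural illustration under assumptions that are not stated in the text: the low nibble is assumed to drive all sixteen modules, the high nibble is assumed to select one module output through a 16-to-1 multiplexer, and each module is represented by a 16-entry table standing in for its minimized combinational logic. The names `build_sbox`, `make_module`, `MODULES`, and `sbox_split` are hypothetical.

```python
# Behavioural model of an S-box split into sixteen 4-input modules.

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo X^8 + X^4 + X^3 + X + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox() -> list:
    """Reference S-box: multiplicative inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        # Brute-force inverse search; 0 has no inverse and maps to 0.
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for n in range(1, 5):
            s ^= ((inv << n) | (inv >> (8 - n))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def make_module(k: int):
    """Module for sub-truth table k: 4 input bits in, 8 output bits out."""
    table = tuple(SBOX[(k << 4) | lo] for lo in range(16))

    def module(lo_nibble: int) -> int:
        return table[lo_nibble]

    return module


MODULES = [make_module(k) for k in range(16)]  # sixteen modules


def sbox_split(x: int) -> int:
    """Evaluate all sixteen modules on one nibble, then select with the other."""
    hi, lo = x >> 4, x & 0xF
    outputs = [m(lo) for m in MODULES]  # in hardware these run in parallel
    return outputs[hi]                  # 16-to-1 multiplexer
```

Checking `sbox_split(x) == SBOX[x]` for every byte value confirms that the decomposition is functionally equivalent to the single 256-row table.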
IV. RESULTS AND COMPARISON
A. Simulation Results
B. Implementation Results
Fig. 10. Power analysis of AES-192 algorithm at maximum freq.
Table I gives the device utilization summary of the AES-192 implementation.
TABLE I. Device utilization summary of AES-192.
TABLE II. Comparison with previous work.

V. CONCLUSIONS
In this paper we presented an FPGA implementation of AES-192 using loop-unrolling, full pipelining, and sub-pipelining techniques, together with other efficient methods for the most complex parts of AES-192 such as Mix-columns and the S-box. Our implementation of AES-192 on a Xilinx Defense-Grade Virtex-7 (XQ7VX330T-RF1157) FPGA achieves a throughput of 54.52 Gbps and a maximum operational frequency of 425.996 MHz.
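The reported throughput can be related to the reported clock frequency with a short calculation. The sketch below assumes that the fully pipelined design accepts one 128-bit block per clock cycle once the pipeline is full. That assumption is not stated explicitly in the text, but it is consistent with the two reported figures. The function name `throughput_gbps` is hypothetical.

```python
# Back-of-the-envelope check of throughput versus clock frequency.
BLOCK_BITS = 128          # AES block size in bits
F_MAX_HZ = 425.996e6      # reported maximum operational frequency


def throughput_gbps(block_bits: int, f_hz: float,
                    blocks_per_cycle: float = 1.0) -> float:
    """Throughput in Gbps, assuming a constant number of blocks per cycle."""
    return block_bits * f_hz * blocks_per_cycle / 1e9


if __name__ == "__main__":
    print(f"{throughput_gbps(BLOCK_BITS, F_MAX_HZ):.3f} Gbps")
```

Under this assumption the result is about 54.53 Gbps, which agrees with the reported 54.52 Gbps up to rounding.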
Resources and Execution Time Reduction, International Journal of
Research in Electrical & Electronics Engineering, 2013, pp. 08-12.
[14] Cheng Wang and Howard M. Heys, Using a Pipelined S-Box in
Compact AES Hardware Implementations, Proceedings of the 8th IEEE
International NEWCAS Conference, pp. 101-104.
[15] William Stallings, Cryptography and Network Security Principles and
Practices, Sixth Edition, Pearson Education, Inc., 2013, pp. 129-173.
[16] Abolfazl Soltani, Saeed Sharifian, An ultra-high throughput and fully
pipelined implementation of AES algorithm on FPGA, Microprocess.
Microsyst.39, 2015, pp. 480-493.
[17] Liting Yu, Dongrong Zhang, Liang Wu, Shuguo Xie, Donglin Su,
Xiaoxiao Wang, AES Design Improvements Towards Information
Security Considering Scan Attack, 17th IEEE International Conference
On Trust, Security And Privacy In Computing And Communications/
12th IEEE International Conference On Big Data Science And
Engineering, 2018, pp. 322-326.
[18] Julia Juremi, Ramlan Mahmod, Salasiah Sulaiman, A Proposal for
Improving AES S-box with Rotation and KeyDependent, International
Conference on Cyber Security, Cyber Warfare and Digital Forensic
(CyberSec), 2012, pp. 38-42.
[19] Soufiane Oukili, Seddik Bri, A.V. Senthil Kumar, High speed efficient
FPGA implementation of pipelined AES S-Box, 4th IEEE International
Colloquium on Information Science and Technology (CiSt), 2016, pp.
901-905.
[20] Rizky Riyaldhi, Rojali, Aditya Kurniawan, Improvement of Advanced
Encryption Standard with Shift Row and S.Box Modification Mapping
in Mix Column, 2nd International Conference on Computer Science and
Computational Intelligence, 2017, pp. 401-407.
[21] Rijmen and J. Daemen, Advanced encryption standard. Federal
Information Processing Standards Publications, National Institute of
Standards and Technology, 2001, pp. 19-22.
[22] Anuroop K.B, Neema M, Fully Pipelined-Loop unrolled AES with
Enhanced Key Expansion, IEEE International Conference On Recent
Trends In Electronics Information Communication Technology, 2016,
pp. 988-992.
[23] Nabihah Ahmad, Rezaul Hasan, Warsuzarina Mat Jubadi, Design of AES
S-box using combinational logic optimization, IEEE Symposium on
Industrial Electronics & Applications (ISIEA), 2010, pp. 696-699.
[24] N. S. Sai Srinivas, Md. Akramuddin, FPGA Based Hardware
Implementation of AES Rijndael Algorithm for Encryption and
Decryption, International Conference on Electrical, Electronics, and
Optimization Techniques (ICEEOT), 2016, pp. 1769-1776.