Design and Implementation of Gray-Coded Bit-Plane Based Reconfigurable Motion Estimation Architecture Using Binary Content Addressable Memory for Video Encoder
IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, VOL. 68, NO. 1, FEBRUARY 2022
Manuscript received June 21, 2021; revised October 21, 2021 and November 24, 2021; accepted December 29, 2021. Date of publication January 3, 2022; date of current version April 8, 2022. (Corresponding author: Rangababu Peesapati.)
The authors are with the Department of Electronics and Communication Engineering, National Institute of Technology Meghalaya, Shillong 793003, India (e-mail: [email protected]; [email protected]).
Digital Object Identifier 10.1109/TCE.2021.3139944
1558-4127 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

Consumer electronics (CE) devices pose major challenges for multimedia applications in terms of data storage, data handling, bandwidth requirement and quality. Various optimized software and hardware video encoders and decoders (codecs) have been developed to satisfy the needs of applications such as virtual reality and video conferencing [1]. Standardization bodies such as the International Organization for Standardization (ISO) and the International Telecommunication Union (ITU) developed the MPEG-4 and H.265 compression standards. Compared to the preceding standards, the latest standard, H.265, compresses video into a more compact space and provides 59% better coding efficiency [2] by incorporating larger block sizes and various partitioning modes. Both H.264 and H.265 follow a block-based hybrid video coding technique, as shown in Fig. 1. A picture is partitioned into coding tree blocks (CTBs) of equal size. These CTBs are further sub-partitioned into coding blocks (CBs) of various sizes. A CB is again subdivided into prediction blocks (PBs). PBs represent a picture region that contains prediction information. Each node of a transform quad-tree structure represents a transform block (TB). In H.265, the maximum size of the basic encoding block is 64 × 64 compared to 16 × 16 in H.264, i.e., 16 times as many pixels as in H.264.

Fig. 1. Partitioning of a CTB into CB and TB.

Motion estimation (ME) is a crucial component of the video coding process, accounting for 80% of the total computational load [3]. To achieve video compression in CE devices, optimized strategies for computing ME are essential. CE devices often face several bottlenecks, such as limited computing power, limited bandwidth and the need for low power consumption. The algorithms of a video encoder therefore need to be fine-tuned to a specific computing platform, either by software optimization or by implementation as special-purpose hardware architectures such as hardware accelerators or co-processors. A computationally effective architecture for video algorithms, evolved from the needs of CE devices mentioned above, is essential. The developed architectures need to be implemented on field programmable gate array (FPGA) platforms before being commercialized as application specific integrated circuit (ASIC) chips. An FPGA contains configurable logic blocks (CLBs), which can be programmed to realize a function. Owing to their large logic gate resources, FPGAs can implement complex digital computations. The standard computing procedure is to accelerate
the algorithms of an application by designing architectures for its computationally intensive algorithms, employing pipelined and parallel processing approaches, and thereafter implementing these on an FPGA. Recently, CE devices have seen huge growth in the multimedia industry. Recording devices such as action cameras and mirrorless cameras, as well as online streaming platforms, support high-resolution video such as 4K and 8K UHD. This involves processing a huge number of pixels, which increases the computational complexity. Therefore, hardware solutions that reduce the computational complexity, and hence enhance the performance of ME hardware in terms of power, area and coding efficiency, are important. Content addressable memories (CAMs) are mostly used in networking systems for routing tables, where huge amounts of data need to be processed at high speed. Since ME involves extensive search operations, there is scope for implementing ME hardware using binary content addressable memory (BCAM). A BCAM is a high-speed search engine implemented in hardware that searches pre-loaded data by input data, instead of by address, in one clock cycle. This provides a fast search operation. The resulting hardware will reduce the ME computation time to a great extent. So, with the motivation of reducing the computational complexity of ME hardware design for real-time CE applications, this work investigates a BCAM-based ME architecture in conjunction with a gray-coded bit-plane based technique. The major contributions of this paper are summarized as follows.
• The proposed work presents a hardware-oriented gray-coded bit-plane based ME technique that reduces the computational complexity of the hardware encoder.
• The developed hardware architecture is based on a BCAM engine that increases the ME performance by utilizing on-chip memory operation.
The rest of the paper is organized as follows. Section II discusses the related works. Bit-plane based ME and the proposed technique are presented in Section III. The proposed ME hardware architecture using BCAM is described in Section IV. Section V discusses the results and analysis, followed by the conclusion of the work in Section VI.

II. RELATED WORKS

Several works have been published with the objective of accelerating the ME process with various hardware architectures. Zheng et al. [4] proposed a hardware-efficient block matching algorithm (BMA) for variable block size ME (VBSME). A small ME hardware is effectively used for different search strategies and early termination to cover a predictive search window and to improve the speed of the search operation. Thang and Nam Dinh [5] presented a two-dimensional matrix array integer motion estimation (IME) architecture using the full search ME (FSME) algorithm for H.265. This architecture processes 4K resolution video at 30 fps with latency as low as 1219 clock cycles. Ito et al. [6] used an adaptive search range selection algorithm to reduce the computational complexity of hierarchical FSME. The algorithm modifies the search range and predicts the best point within the search area using the motion vector (MV) distribution of the neighboring blocks. Further, it minimizes the redundant search area with the MV distribution. Luo et al. [7] proposed a low-delay parallel motion estimation design based on a graphics processing unit (GPU) for fast HEVC encoder optimization. The novelty of the proposed ME is a three-layer hierarchical parallel structure. It uses a coding tree unit (CTU) layer with a novel indexing table in the prediction unit (PU) layer, which is used to realize an efficient sum of absolute differences (SAD) derivation to accelerate the ME scheme. A compact descriptor is designed to avoid redundant branches in the MV search. Similarly, Purnachand et al. [8] proposed a test zone search (TZS) based ME algorithm. 8-point diamond and 8-point square patterns are implemented, which outperform the fast ME algorithm implemented in the H.265 reference software. The work achieved an average reduction of 53.1% in computational complexity with negligible loss in rate distortion (RD) performance compared to the TZS of H.265. Jia et al. [9] proposed an ME architecture with a novel SAD calculation scheme that reduces computational complexity by 50%. The hardware architecture supports 8K UHD @ 30 fps at a maximum operating frequency of 290 MHz. Kim et al. [10] proposed an IME hardware architecture that reduces computational complexity by 72.25%. The proposed architecture supports real-time encoding of 8K UHD @ 30 fps at a maximum operating frequency of 500 MHz. Gogoi and Peesapati [11] proposed a hybrid search pattern ME algorithm and its hardware architecture. The architecture processes a CTB in parallel and requires 59.5 clock cycles. The architecture supports 8K UHD @ 78 fps with a maximum frequency of 162 MHz. Singh and Ahamed [12] proposed a low-power hardware architecture for a modified hexagonal grid search algorithm. The architecture uses a small memory of 8.192 kB with a power consumption of 151.76 mW. It supports 4K UHD @ 30 fps at an operating frequency of 250 MHz.

In recent years, some studies on ME using low bit-depth techniques have provided better results in picture quality without consuming much hardware. A low bit-depth representation represents each pixel with fewer bits than the full 8 bits. In addition, to minimize the matching computation, exclusive-OR (EX-OR) operations on Boolean arrays are used instead of SAD computation. Several works have used the one-bit transform (1BT) and two-bit transform (2BT) approaches, in which frames are filtered by a multi-band-pass filter and the resulting binary frames are obtained by comparing the filtered and original frames. The 2BT approach was found to be better than 1BT owing to its use of local mean and standard deviation to improve the estimation accuracy [13]. Yavuz et al. [14] developed a selective gray-coding-based ME approach and proposed a related hardware architecture that obtains a single bit-plane by choosing certain bits of gray-coded pixels. This approach reduces binarization cost and increases ME performance considerably. Celebi et al. [15] proposed a one-dimensional filtering based ME with low complexity. Two bit-planes were constructed for the matching operation, and the proposed method has low hardware complexity. Aggarwal and Khare [16] proposed a ternary content addressable memory (TCAM) based ME using one prediction unit. The search strategy is used to reduce don’t care bits from
GOGOI AND PEESAPATI: DESIGN AND IMPLEMENTATION OF GRAY-CODED BIT-PLANE 87
The nth order gray-coded bit-plane $g_n^t(x, y)$ consists of all the $g_n$ bits of the frame. For a frame of 8-bit pixels, there are eight one-bit gray-coded bit-planes from $g_0$ to $g_7$, where $g_0$ is the least significant and $g_7$ is the most significant gray-coded bit-plane.
ME based on gray-coded bit-plane matching is determined by a correlation measure ($C_k$) between the current block (CB) located in the current frame (CF) and reference block

IV. PROPOSED HARDWARE ARCHITECTURE

In this section, a reconfigurable ME architecture based on a BCAM engine using the proposed gray-coded bit-plane matching method is presented. The proposed work uses a motion analyzer module to categorize the motion of a video sequence. Based on the motion analyzer value, different algorithms can be applied to different video sequences. Chandran et al. [19] proposed a motion analyzer to
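As a minimal illustration of the gray-coded bit-planes $g_0$ to $g_7$ used in Sections III and IV, the sketch below derives them from 8-bit pixels. It assumes the standard binary-reflected Gray code (g = b XOR (b >> 1)); the exact equations used by the authors are not part of this excerpt, and the names `to_gray` and `gray_bit_plane` are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # rows of 8-bit pixel values (0..255)


def to_gray(pixel: int) -> int:
    """Convert an 8-bit binary value to binary-reflected Gray code.

    Assumption: the standard mapping g7 = b7, g_n = b_n XOR b_(n+1).
    """
    if not 0 <= pixel <= 255:
        raise ValueError("pixel must be an 8-bit value")
    return pixel ^ (pixel >> 1)


def gray_bit_plane(frame: Frame, n: int) -> Frame:
    """Return the n-th gray-coded bit-plane (n = 0 is least significant)."""
    if not 0 <= n <= 7:
        raise ValueError("n must be in 0..7")
    return [[(to_gray(p) >> n) & 1 for p in row] for row in frame]


# Neighbouring intensities 127 and 128 differ in all eight binary bits
# but in only one Gray-code bit.
frame = [[127, 128]]
planes = {n: gray_bit_plane(frame, n) for n in range(8)}
print(planes[7], planes[6])  # most significant plane and the next one
```

Adjacent intensity levels differ in exactly one Gray-code bit, so a small brightness change between frames disturbs fewer bit-planes than it would with plain binary coding.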
Fig. 7. Conceptual schematic of algorithm decision through motion analyzer.

for n ← 255 to 0 by 2 do
    gc1cf[n] ← g7_cf
    gc1rf[n] ← g7_rf
    gc1cf[n − 1] ← g6_cf
    gc1rf[n − 1] ← g6_rf
    gc2cf[n] ← g5_cf
    gc2rf[n] ← g5_rf
    gc2cf[n − 1] ← g4_cf
    gc2rf[n − 1] ← g4_rf
end for
X1 = gc1cf ⊕ gc1rf
Y1 = gc2cf ⊕ gc2rf
Z1 = X1 Y1
Final = X1 + Y1 + Z1
return Final
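The mixing and EX-OR steps of the listing above can be sketched in software as follows. This is one possible reading, not the authors' implementation: it assumes each plane is a 16 × 16 block flattened to 256 bits, that odd positions of a mixed plane are taken from the upper plane and even positions from the lower plane, and it stops at X1 and Y1 because the operator that forms Z1 is not recoverable from this excerpt. The names `mix_planes` and `xor_maps` are hypothetical.

```python
from typing import List, Sequence, Tuple

BLOCK_BITS = 256  # one 16 x 16 block flattened in raster order (assumption)


def mix_planes(upper: Sequence[int], lower: Sequence[int]) -> List[int]:
    """Build one mixed plane: odd positions from `upper`, even from `lower`."""
    if len(upper) != BLOCK_BITS or len(lower) != BLOCK_BITS:
        raise ValueError("each plane must hold 256 bits")
    mixed = [0] * BLOCK_BITS
    for n in range(BLOCK_BITS - 1, 0, -2):  # n = 255, 253, ..., 1
        mixed[n] = upper[n]
        mixed[n - 1] = lower[n - 1]
    return mixed


def xor_maps(
    cf: Sequence[Sequence[int]], rf: Sequence[Sequence[int]]
) -> Tuple[List[int], List[int]]:
    """cf and rf each hold the planes (g7, g6, g5, g4); return (X1, Y1)."""
    g7_cf, g6_cf, g5_cf, g4_cf = cf
    g7_rf, g6_rf, g5_rf, g4_rf = rf
    gc1cf, gc1rf = mix_planes(g7_cf, g6_cf), mix_planes(g7_rf, g6_rf)
    gc2cf, gc2rf = mix_planes(g5_cf, g4_cf), mix_planes(g5_rf, g4_rf)
    x1 = [a ^ b for a, b in zip(gc1cf, gc1rf)]  # mismatches in the g76 plane
    y1 = [a ^ b for a, b in zip(gc2cf, gc2rf)]  # mismatches in the g54 plane
    return x1, y1


zeros, ones = [0] * BLOCK_BITS, [1] * BLOCK_BITS
cf = (ones, zeros, ones, zeros)
rf_g7 = list(ones)
rf_g7[255] = 0  # flip one sampled bit of the reference g7 plane
rf = (rf_g7, zeros, ones, zeros)
x1, y1 = xor_maps(cf, rf)
print(sum(x1), sum(y1))  # number of mismatching positions per mixed plane
```

Under this reading, each mixed plane carries half of the positions of two bit-planes, so four gray-coded planes are matched at the cost of two 256-bit comparisons.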
Fig. 8. BCAM search engine unit.
design, eight 16 × 16 BCAM cells have been used to store eight 16 × 16 reference blocks. Reference block data are uploaded sequentially, which takes 8 clock cycles.

B. Overall System Architecture

The top-level hardware architecture of the system is shown in Fig. 6. It consists of seven modules in total: i) motion analyzer, ii) RF and CF data register, iii) BP register array, iv) binarization unit, v) controller, vi) BCAM engine and vii) MV calculation unit. Initially, both the CF and RF are applied to the motion analyzer. The controller decides whether a full search or a fast search algorithm needs to be applied based on the motion value from the motion analyzer, i.e., based on the motion analyzer value, the hardware architecture works as a reconfigurable system for the search operation. The conceptual diagram of this operation is shown in Fig. 7. This module requires 94 clock cycles. The corresponding gray-coded bit-planes are uploaded to the bit-plane register array from the CF and RF data register. It consists of 8 bit-plane registers holding the 4 gray-coded bit-planes of the CF and of the RF. This unit requires a total memory of 256 bytes and takes 5 clock cycles to fill all the bit-plane register arrays. The hardware implementation of the binarization unit is explained through Algorithm 1. This module consists of two sub-modules: the CF bit-plane mixer and the RF bit-plane mixer. It produces two resultant bit-planes for the CF and RF, respectively, and takes 4 clock cycles. The resultant bit-planes are stored in four 256 × 256 registers (g76_cf, g54_cf, g76_rf, g54_rf). Then the CB and RB of size 16 × 16 are uploaded to the four 16 × 16 registers. The BCAM search engine unit is shown in Fig. 8. The architecture consists of eight 16 × 16 cells. The RBs of the g76 and g54 planes are stored in the cells through the look-up data ports lu_data_76 and lu_data_54 in a sequential manner. After 4 clock cycles it raises a completion flag. The CB is searched through the word data ports w_data_54 and w_data_76
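To make the search-by-content idea concrete, the following behavioural sketch models a binary CAM with eight cells, each holding one 16 × 16 one-bit block packed into a 256-bit word. It is a software analogy only: the class `BinaryCAM` and the helpers `pack_block` and `one_hot_block` are hypothetical, the loop stands in for the parallel comparison that hardware performs in one clock cycle, and how the proposed engine derives a motion vector from the match result is not covered by this excerpt.

```python
from typing import List, Optional

BLOCK = 16  # side length of a 16 x 16 one-bit block


def pack_block(block: List[List[int]]) -> int:
    """Pack a 16 x 16 block of bits into one 256-bit integer (row-major)."""
    word = 0
    for row in block:
        for bit in row:
            word = (word << 1) | (bit & 1)
    return word


class BinaryCAM:
    """Exact-match CAM model: write by address, search by content."""

    def __init__(self, num_cells: int = 8) -> None:
        self.cells: List[Optional[int]] = [None] * num_cells

    def write(self, address: int, word: int) -> None:
        self.cells[address] = word

    def search(self, key: int) -> List[int]:
        # Hardware compares the key with every cell at once; the loop models that.
        return [addr for addr, word in enumerate(self.cells) if word == key]


def one_hot_block(k: int) -> List[List[int]]:
    """Demo data: a block whose only set bit is at raster position k."""
    return [[1 if r * BLOCK + c == k else 0 for c in range(BLOCK)]
            for r in range(BLOCK)]


cam = BinaryCAM(num_cells=8)
for address in range(8):  # load eight reference blocks sequentially
    cam.write(address, pack_block(one_hot_block(address)))

hits = cam.search(pack_block(one_hot_block(3)))      # content stored at address 3
misses = cam.search(pack_block(one_hot_block(200)))  # content not stored
print(hits, misses)
```

The search returns addresses rather than data, which is the inverse of an ordinary memory read and is why a CAM suits block matching, where the question is where a given pattern occurs.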
TABLE I
PSNR COMPARISON WITH OTHER LOW-COMPLEXITY ME TECHNIQUES

TABLE II
BD-PSNR AND BD-BR CALCULATION

TABLE III
FPGA DEVICE UTILIZATION REPORT

TABLE IV
ASIC AREA COST UTILIZATION REPORT

TABLE V
ASIC POWER UTILIZATION REPORT
Fig. 9. Rate distortion curve for (a) foreman_352 × 288, (b) container_352 × 288, (c) coastguard_352 × 288, (d) flower_352 × 288 video sequences with
QP 22, 24, 27 and 32.
TABLE VI
COMPARISON RESULT FOR THE PROPOSED ARCHITECTURE
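The throughput figures quoted in this section (248.8 Mpixels/s for 4K at 30 fps and 1.9 Gpixels/s for 8K at 53.71 fps) can be cross-checked from resolution and frame rate, since throughput in pixels per second is width × height × frame rate. The sketch below is only a sanity check under assumed frame dimensions: the excerpt does not state which pixel dimensions were used, so 3840 × 2160 for 4K and both 8192 × 4320 and 7680 × 4320 for 8K are assumptions, and the function name `throughput` is hypothetical.

```python
def throughput(width: int, height: int, fps: float) -> float:
    """Pixels processed per second for a given frame size and frame rate."""
    return width * height * fps


# 4K UHD at 30 fps, as quoted for Singh and Ahamed [12] (assumed 3840 x 2160).
singh_4k = throughput(3840, 2160, 30)

# 8K at 53.71 fps for the proposed design, under two assumed frame sizes.
proposed_8k_8192 = throughput(8192, 4320, 53.71)
proposed_8k_7680 = throughput(7680, 4320, 53.71)

print(f"4K @ 30 fps:             {singh_4k / 1e6:.1f} Mpixels/s")
print(f"8K (8192 wide) @ 53.71:  {proposed_8k_8192 / 1e9:.2f} Gpixels/s")
print(f"8K (7680 wide) @ 53.71:  {proposed_8k_7680 / 1e9:.2f} Gpixels/s")
```

Under these assumptions the 4K value reproduces the quoted 248.8 Mpixels/s, and the quoted 1.9 Gpixels/s is consistent with an 8192 × 4320 frame rather than 7680 × 4320.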
it can be observed that, for the proposed technique, there is a negligible increase in BD-BR of 0.1021% with a negligible BD-PSNR loss of 0.893 dB compared to the traditional SAD technique in H.264.

The proposed BCAM-based ME architecture was written in Verilog HDL, and the performance analysis was carried out on both FPGA and ASIC platforms. The design uses FPGA resources of 10835 look-up tables (LUTs) and 8356 flip-flops (FFs). The design was synthesized using FPGA EDA tools. Table III shows the FPGA device utilization report. The proposed design was also synthesized using 90 nm process technology on the ASIC platform. The maximum frequency of the design is 155 MHz with a power consumption of 78.017 mW. The design requires a total area of 152.78 K in terms of NAND2XL gate equivalents. Tables IV and V show the ASIC area cost utilization report and power utilization report, respectively. The hardware design works for both H.264 and HEVC. However, to observe the coding efficiency of the proposed technique, the BD-BR and BD-PSNR analysis was carried out using an H.264 encoder. Table VI compares the proposed work with various state-of-the-art ME architectures. The comparison was carried out in terms of bit-depth, technology, maximum operating frequency, on-chip memory, search range, throughput, clock cycles per CTU, gate count, power consumption, BD-BR increase, supported PUs and supported resolution. As the state-of-the-art works were carried out on different process technologies, the parameters of the designs are normalized to 65 nm technology for a fair comparison. The proposed design supports 8K video at 53.71 fps, which is higher compared to [10]–[13]. It uses dedicated on-chip memory of 33 kB for on-chip memory computation. The proposed bit-plane technique shows better coding performance, with a negligible degradation of 0.1021%, compared to the state-of-the-art works. The architecture proposed by Singh and Ahamed [12] has a throughput of 248.8 Mpixels/s, which is lower than the 1.9 Gpixels/s of the proposed work. Also, in terms of gate count and power consumption, the proposed design has a 10.6% lower gate count and a 51.6% reduction in power consumption compared to Singh and Ahamed [12]. In Gogoi and Peesapati [11], the architecture occupies a larger area of 2784.4 K with a power consumption of 463.4 mW, compared to 152.78 K and 78.01 mW in the proposed architecture. Also, the power consumed by the design is 48.87 mW at 100 MHz, which is lower than that of
Celebi et al. [13]. Similarly, the proposed design shows better performance in terms of gate count, power consumption and clock cycles spent per CTU compared to Kim et al. [10].

VI. CONCLUSION

In this paper, we have proposed a gray-coded bit-plane based ME technique and its hardware implementation using BCAM for faster on-chip memory computation. The proposed technique utilizes the four most significant bit-planes of a frame and provides performance similar to the state-of-the-art low-bit-depth ME techniques. The novel BCAM-based ME hardware provides performance similar to other hardware architectures of the same category but with faster computation speed. The hardware architecture supports 8K @ 53.71 fps at a maximum operating frequency of 155 MHz with a power consumption of 78.01 mW using 90 nm process technology.

REFERENCES

[1] I. E. Richardson, The H.264 Advanced Video Compression Standard, 2nd ed. Hoboken, NJ, USA: Wiley, 2011.
[2] Y. Ye, Y. He, and X. Xiu, “Manipulating ultra-high definition video traffic,” IEEE MultiMedia, vol. 22, no. 3, pp. 73–81, Jul.–Sep. 2015.
[3] Z. Chen, J. Xu, Y. He, and J. Zheng, “Fast integer-pel and fractional-pel motion estimation for H.264/AVC,” J. Vis. Commun. Image Represent., vol. 17, no. 2, pp. 264–290, Apr. 2006.
[4] J. Zheng, C. Lu, J. Guo, D. Chen, and D. Guo, “A hardware-efficient block matching algorithm and its hardware design for variable block size motion estimation in ultra-high-definition video encoding,” ACM Trans. Design Autom. Electron. Syst., vol. 24, no. 2, p. 15, Mar. 2019.
[5] N. V. Thang and V. Nam Dinh, “High throughput and low cost memory architecture for full search integer motion estimation in HEVC,” in Proc. Int. Conf. Adv. Technol. Commun. (ATC), 2018, pp. 174–178.
[6] Y. Ito, T. Song, W. Shi, T. Katayama, and T. Shimamoto, “Hardware-oriented low complexity motion estimation for HEVC,” in Proc. IEEE Int. Conf. Consum. Electron. (ICCE), Las Vegas, NV, USA, 2018, pp. 1–5.
[7] F. Luo, S. Wang, S. Wang, X. Zhang, S. Ma, and W. Gao, “GPU based hierarchical motion estimation for high efficiency video coding,” IEEE Trans. Multimedia, vol. 21, no. 4, pp. 851–862, Apr. 2019.
[8] N. Purnachand, L. N. Alves, and A. Navarro, “Improvements to TZ search motion estimation algorithm for multiview video coding,” in Proc. 19th Int. Conf. Syst. Signals Image Process. (IWSSIP), Vienna, Austria, 2012, pp. 388–391.
[9] L. Jia, C. Tsui, O. C. Au, and K. Jia, “A low-power motion estimation architecture for HEVC based on a new sum of absolute difference computation,” IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 1, pp. 243–255, Jan. 2018.
[10] T. S. Kim, C. E. Rhee, and H.-J. Lee, “Fast hardware-based IME with idle cycle and computational redundancy reduction,” IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 6, pp. 1732–1744, Jun. 2020.
[11] S. Gogoi and R. Peesapati, “A hybrid hardware oriented motion estimation algorithm for HEVC/H.265,” J. Real-Time Image Process., vol. 18, pp. 953–966, Jan. 2021.
[12] K. Singh and S. R. Ahamed, “Low power motion estimation algorithm and architecture of HEVC/H.265 for consumer applications,” IEEE Trans. Consum. Electron., vol. 64, no. 3, pp. 267–275, Aug. 2018.
[13] A. T. Celebi, S. Yavuz, A. Celebi, and O. Urhan, “Selective gray-coded bit-plane-based two-bit transform and its efficient hardware architecture for low-complexity motion estimation,” IEEE Trans. Consum. Electron., vol. 64, no. 3, pp. 259–266, Aug. 2018.
[14] S. Yavuz, A. Celebi, M. Aslam, and O. Urhan, “Selective gray-coded bit-plane based low-complexity motion estimation and its hardware architecture,” IEEE Trans. Consum. Electron., vol. 62, no. 1, pp. 76–84, Feb. 2016.
[15] A. T. Celebi, S. Yavuz, A. Celebi, and O. Urhan, “One-dimensional filtering based two-bit transform and its efficient hardware architecture for fast motion estimation,” IEEE Trans. Consum. Electron., vol. 63, no. 4, pp. 377–385, Nov. 2017.
[16] G. Aggarwal and R. Khare, “Tertiary content addressable memory based motion estimator,” U.S. Patent 7 873 107, Jan. 18, 2011.
[17] P. Ghosh and R. Peesapati, “Design and implementation of ternary content addressable memory (TCAM) based hierarchical motion estimation for video processing,” in Proc. Int. Symp. VLSI Design Test, Singapore, 2017, pp. 557–569.
[18] S. Erturk, “Locally refined gray-coded bit-plane matching for block motion estimation,” in Proc. 3rd Int. Symp. Image Signal Process. Anal., Rome, Italy, 2003, pp. 128–133.
[19] K. R. S. Chandran and P. V. Chandramani, “Hardware-software co-design framework for sum of absolute difference based block matching in motion estimation,” Microprocess. Microsyst., vol. 74, Apr. 2020, Art. no. 103012.
[20] K. Pagiamtzis and A. Sheikholeslami, “Content-addressable memory (CAM) circuits and architectures: A tutorial and survey,” IEEE J. Solid-State Circuits, vol. 41, no. 3, pp. 712–727, Mar. 2006.
[21] Z. Ullah, “LH-CAM: Logic-based higher performance binary CAM architecture on FPGA,” IEEE Embedded Syst. Lett., vol. 9, no. 2, pp. 29–32, Jun. 2017.
[22] H. Mahmood, Z. Ullah, O. Mujahid, I. Ullah, and A. Hafeez, “Beyond the limits of typical strategies: Resources efficient FPGA-based TCAM,” IEEE Embedded Syst. Lett., vol. 11, no. 3, pp. 89–92, Sep. 2019.
[23] A. Erturk and S. Erturk, “Two-bit transform for binary block motion estimation,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 7, pp. 938–946, Jul. 2005.
[24] A. Celebi, O. Akbulut, O. Urhan, and S. Erturk, “Truncated gray-coded bit-plane matching based motion estimation and its hardware architecture,” IEEE Trans. Consum. Electron., vol. 55, no. 3, pp. 1530–1536, Aug. 2009.
[25] S.-Y. Jou, S.-J. Chang, and T.-S. Chang, “Fast motion estimation algorithm and design for real time QFHD high efficiency video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 25, no. 9, pp. 1533–1544, Sep. 2015.
[26] G. He, D. Zhou, Y. Li, Z. Chen, T. Zhang, and S. Goto, “High-throughput power-efficient VLSI architecture of fractional motion estimation for ultra-HD HEVC video encoding,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 12, pp. 3138–3142, Dec. 2015.
[27] A. Al Muhit. “H.264 Codec (Encoder/Decoder) (MATLAB).” (2009). [Online]. Available: https://sites.google.com/site/almuhit/codes (Accessed: Oct. 2, 2021).
[28] G. Bjontegaard, Calculation of Average PSNR Differences Between RD Curves, document ITU-T Q6/SG16 VCEG-M33, Int. Telecommun. Union, Geneva, Switzerland, Apr. 2001.

Sushanta Gogoi received the M.Tech. degree in VLSI systems from the Electronics and Communication Engineering Department, National Institute of Technology Nagaland, Dimapur, India, in 2016. He is currently pursuing the Ph.D. degree with the Department of Electronics and Communication Engineering, National Institute of Technology Meghalaya, Shillong, India. His current research interests include high performance video architectures like H.264 and HEVC video codecs.

Rangababu Peesapati (Senior Member, IEEE) received the Ph.D. degree from the University of Hyderabad, Hyderabad, in 2014. He has been an Assistant Professor with the Department of Electronics and Communication Engineering, National Institute of Technology Meghalaya, Shillong, since 2014. His primary research interests are the design of FPGA-based reconfigurable systems for multimedia, signal processing, and evolutionary computing.