NB IoT NPSS NSSS Acquisition ETHgroup PDF

This document proposes a maximum likelihood (ML) detector for timing acquisition in narrowband IoT (NB-IoT) devices. Timing acquisition must be performed efficiently to minimize RF transceiver power consumption during frequent wake-up periods. The ML detector achieves an average detection latency that is half that of an auto-correlation detector and can reduce the required energy per timing acquisition by up to 34%. The ML detector is based on computing cross-correlation metrics in the frequency domain using the overlap-save method. This provides low latency detection at the cost of higher computational complexity compared to auto-correlation detectors.

Maximum-Likelihood Detection for Energy-Efficient Timing Acquisition in NB-IoT

Harald Kröll†, Matthias Korb†, Benjamin Weber‡, Samuel Willi‡ and Qiuting Huang‡
† ACP AG, Zurich, Switzerland
‡ Integrated Systems Laboratory, ETH Zurich, Switzerland
{kroell,mkorb}@newacp.ch, {weberbe,huang}@iis.ee.ethz.ch

arXiv:1608.02427v2 [cs.NI] 7 Oct 2016

Abstract: Initial timing acquisition in narrow-band IoT (NB-IoT) devices is done by detecting a periodically transmitted known sequence. The detection has to be done at the lowest possible latency, because the RF-transceiver, which dominates the downlink power consumption of an NB-IoT modem, has to be turned on throughout this time. Auto-correlation detectors show low computational complexity from a signal processing point of view at the price of a higher detection latency. In contrast, a maximum likelihood cross-correlation detector achieves low latency at a higher complexity, as shown in this paper. We present a hardware implementation of the maximum likelihood cross-correlation detection. The detector achieves an average detection latency which is a factor of two below that of an auto-correlation method and is able to reduce the required energy per timing acquisition by up to 34%.

I. INTRODUCTION

Various estimates predict tens of billions of devices connected to the Internet in 2020 in what is called the Internet of Things (IoT). IoT does not only take place in our homes or in areas which are covered by WiFi and other low-range networks, but also in remote places which are only covered by cellular or satellite networks. Cellular network coverage is almost ubiquitous and does not depend on proprietary end-user infrastructure.

To realize an IoT in which the requirements for low-power, low-cost, and extended-coverage IoT devices are met, the 3GPP consortium agreed on an LTE-Release-13 extension called Narrow Band (NB)-IoT or LTE Cat-NB1 [1]. On the downlink and uplink side, NB-IoT mainly reuses LTE technology. However, cell search and timing acquisition procedures have undergone major adaptations to fit into the narrow 200 kHz bandwidth and to meet coverage extension requirements.

The energy efficiency of an NB-IoT device, preferably implemented as a system-on-chip, is of great importance to achieve the years of battery life targeted by emerging cellular IoT standards. Besides the power amplifier for the uplink, which holds the lion's share of the overall power consumption, it is well known that the downlink baseband signal processing consumes only a fraction of the RF-transceiver power in receive mode [2]. This is because RF-transceivers are dominated by analog integrated circuits, whose power consumption does not scale as well with the CMOS technology feature size as that of digital integrated baseband circuits. Therefore, NB-IoT has undergone various simplifications to allow energy-efficient implementations. The significant bandwidth reduction to 200 kHz was the main simplification of NB-IoT compared to the minimal bandwidth requirement of 1.4 MHz in LTE. However, the RF-transceiver power consumption depends on the carrier frequency and on sensitivity requirements rather than on bandwidth. While the adjacent channel leakage ratio was reduced by 5 dB compared to 1.4 MHz LTE [4], the maximum carrier frequency is only slightly reduced from 2.6 to 2.2 GHz. Thus, the RF-transceiver still dominates the downlink power consumption. However, the power consumption of digital baseband processing scales well with bandwidth, which is useful for NB-IoT timing acquisition.

Besides data decoding, timing acquisition is the most complex baseband task along the downlink path [3]. Energy-efficient timing acquisition is important because timing acquisition has to be done frequently, mainly for two reasons. Firstly, NB-IoT is designed for the exchange of short messages, thus devices are in deep sleep mode most of the time and wake up, e.g., every hour for a short period of time to receive and transmit a few hundred bytes. To ensure years of battery life, circuits providing accurate timing are turned off during deep sleep mode, which requires timing acquisition after every wake-up. Timing acquisition therefore takes a relatively large share of the short reception interval, which requires an energy-efficient implementation. Secondly, NB-IoT is likely to be used on vehicles and drones, where devices are prone to timing synchronization loss due to their relatively high mobility and the absence of handover capability in NB-IoT.

For timing acquisition, a periodically transmitted, a priori known Narrowband Primary Synchronization Sequence (NPSS) has to be detected [5]. The latency of a successful timing acquisition (NPSS detection) is the relevant performance metric, because it determines how long the RF-transceiver, which consumes the major part of the power, has to be turned on to receive data. Therefore, using low-complexity NPSS detectors which achieve suboptimal performance can be disadvantageous for the overall downlink energy efficiency.

Contributions: We present a maximum-likelihood (ML) NPSS detector which achieves an average timing acquisition latency of 140 ms (in-band deployment, TU1.2 channel, SNR = -12.6 dB). Our ML detector is based on cross-correlation metrics which are computed in frequency domain via the overlap-save method. The detector has high computational complexity but allows reducing the required energy by up to 34% per timing acquisition for state-of-the-art RF-transceivers.
II. TIMING ACQUISITION IN NB-IOT

The first step after power-on (or after a wake-up from a sleep cycle) of an NB-IoT device is the detection of an NB-IoT capable base-station. In case such a base-station exists, the receiver does not know which OFDM symbol of the frame is currently transmitted. On top of that, the frequency relation between the base-station and the local receiver clock is also unknown. In NB-IoT as well as in other LTE device categories, the detection of a suitable base-station and the estimation of the timing and frequency offset is based on two periodically transmitted sequences: the NPSS and the Narrowband Secondary Synchronization Sequence (NSSS). While the NPSS is transmitted repeatedly every sub-frame of length 10 ms, the NSSS is repeated in every second sub-frame as shown in Fig. 1. For NB-IoT the transmitted NPSS is identical in every sub-frame for all base-stations. In contrast, the NSSS depends on the base-station's cell ID and is scrambled with a frame-dependent sequence code.

[Fig. 1. NPSS and NSSS resource mapping onto an NB-IoT frame. Labels in the figure: NPSS; NSSS x SC0 to NSSS x SC3; 10 ms frame; 80 ms block; 12 OFDM tones; 11 OFDM symbols.]

The NPSS is used to verify the existence of an NB-IoT capable base-station. Additionally, it enables the estimation of the frequency offset and timing offset with respect to the sub-frame boundary. The NSSS is then used to detect the frame boundary and cell ID.

The NPSS is defined in frequency domain as a Zadoff-Chu sequence of length 11 for each sub-carrier index n given by

$$S[n] = e^{-j 5\pi n(n+1)/11}\, c[l], \quad n = 0, \dots, 10 \tag{1}$$

where c[l] is an element of the code cover vector

$$\mathbf{c} = [1, 1, 1, 1, -1, -1, 1, 1, 1, -1, 1],$$

with l being the symbol index in a sub-frame. This sequence is mapped to 11 subsequent OFDM symbols, each consisting of 12 OFDM sub-carriers holding one copy of the NPSS. After zero-padding each of the 11 copies to 128 symbols, time-domain conversion, and cyclic-prefix insertion of either length 9 or 10, the NPSS results in 1,508 time domain samples. With a sub-frame length of 10 ms and a sampling rate of 1.92 MHz, 19,200 samples need to be captured in order to get exactly one copy of the NPSS. As the sub-frame boundary is unknown, the NPSS can start at any of the 19,200 samples. One task of the receiver is to estimate the beginning of the NPSS to acquire sub-frame boundary timing information. In addition, an NB-IoT device has a random frequency offset because the crystal oscillator on the device is not yet tuned after power-on or after wake-up from a sleep cycle. This heavily affects the detection complexity because the device also needs to analyze various frequency-offset candidates within a specified boundary. To reduce the complexity, it is possible to perform a coarse frequency and timing offset estimation on a down-sampled version of the received signal. For example, in [6] the coarse estimation is done via auto-correlations at a sampling frequency of 240 kHz. Then, one sub-frame consists of only 2,400 samples.

III. ML TIMING ACQUISITION WITH CORRELATIONS

There are two main algorithms to perform timing acquisition, namely auto-correlation and cross-correlation. While auto-correlation is the only option if the transmitted, periodic sequence is unknown, for NPSS detection both algorithms can be applied as the transmitted sequence is known to the receiver. Auto-correlation approaches are in general more hardware efficient than cross-correlation approaches. However, since the auto-correlation algorithm does not exploit the fact that the transmitted sequence is known, its performance is sub-optimal. In fact, cross-correlation detectors are ML detectors [7]. This is the reason why many applications like radar systems or GPS receivers use a cross-correlation for signal detection [8]. In this paper we focus on low latency rather than low complexity. Thus, the ML detector [7] (Page 244), which projects the received signal vector onto each of the Nf possible frequency candidates, is a viable option for NPSS detection.

The NPSS ML detector correlation metrics are given by

$$C(\mathbf{r} \mid \theta, f_o) = \sum_{k=\theta}^{\theta+189} r[k]\, s^{*}[k]\, e^{-j 2\pi f_o k / f_s}, \tag{2}$$

where the received signal vector

$$\mathbf{r} = [\, r[0] \;\; r[1] \;\; \dots \;\; r[189-1] \,]^{T}$$

has a sampling rate of fs = 240 kHz and s[k] for k = 0 ... 188 is the time domain NPSS sequence given in Eq. (1) at 240 kHz. The ML function C(r0 | θ = 0, fo) for a distortion-free received signal vector r0 over the frequency offset fo is plotted in Fig. 2. The ML frequency- and timing-offset estimates f̂o and t̂o can then be calculated according to

$$(\hat{f}_o, \hat{t}_o) = \arg\max_{f_o,\, t_o} \left\{ C(\mathbf{r} \mid f_o, t_o) \right\}.$$

[Fig. 2. ML function over normalized frequency offset. Plot title: NPSS ML correlation function; x-axis: normalized fo, from -2 to 2.]
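As an illustration of the search in Eq. (2), the following is a minimal brute-force sketch, not the implementation described in the paper. It assumes that the maximization is taken over the magnitude of the correlation metric, and it indexes the reference sequence relative to the timing hypothesis θ, which changes only a constant phase. The function name `ml_search`, the random placeholder sequence standing in for the time-domain NPSS, and the candidate grid in the demo are all hypothetical.

```python
import numpy as np

FS = 240e3   # sampling rate f_s from the text
N_P = 189    # NPSS length at 240 kHz from the text


def ml_search(r, s, fo_candidates, fs=FS):
    """Brute-force search over timing offset theta and frequency offset fo.

    Evaluates |sum_k r[theta + k] * conj(s[k]) * exp(-j*2*pi*fo*k/fs)| for
    k = 0..len(s)-1. The index k is relative to theta; compared with an
    absolute index this changes only a constant phase, not the magnitude.
    """
    r = np.asarray(r, dtype=complex)
    s = np.asarray(s, dtype=complex)
    fo_candidates = np.asarray(fo_candidates, dtype=float)
    k = np.arange(len(s))
    # One row per frequency candidate: conj(s[k]) * exp(-j*2*pi*fo*k/fs)
    refs = np.conj(s) * np.exp(-2j * np.pi * np.outer(fo_candidates, k) / fs)
    n_theta = len(r) - len(s) + 1
    metric = np.empty((n_theta, len(fo_candidates)))
    for theta in range(n_theta):
        metric[theta] = np.abs(refs @ r[theta:theta + len(s)])
    theta_hat, i_hat = np.unravel_index(np.argmax(metric), metric.shape)
    return fo_candidates[i_hat], int(theta_hat)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for the 189-sample time-domain NPSS.
    s = np.exp(2j * np.pi * rng.random(N_P))
    candidates = np.arange(-15, 16) * FS / 1024   # assumed candidate grid
    r = 0.01 * (rng.standard_normal(2400) + 1j * rng.standard_normal(2400))
    k = np.arange(N_P)
    r[700:700 + N_P] += s * np.exp(2j * np.pi * candidates[20] * k / FS)
    print(ml_search(r, s, candidates))
```

The inner loop evaluates every timing hypothesis of one 2,400-sample sub-frame against every frequency candidate, which is the cost that motivates a block-wise frequency-domain evaluation.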
includes the digitalN --1lik vKeyconsumption
baseband
en coleads of 80 mW for
m processing in receive
characteristics cell-search. mode is203
Thus assumed, whi
are shown inthatFig. Table I.shown 30 percent ofNPSS inthe ncan nlfrequency
be implemented Naturally,
efficiently the
andin powerVLSI consumption
and thus has ofneed the
aeffort
very autoa
con which
ahcross- pn=
and mW assum noNmethod zn
we[5] ·itt
a ycl nCand nsFig. length
tperiodic of [9].the nNPSS of
time this domain method
tifsequences
is is
tocoffset that samples. aimpracticable
cross-correlation
fadditions tthatmethod in
tmultiplications dependstime domain
can on be its is
estimated implementat average
to 135.0
I ato
ed 2urez502 namely rseBlock ethe epwell mterms gthe N parison,
aaccelerator 189 we
tfor assume method ethe depends low-complexity fon its auto-correlatio
implementation Natura
(VLSI) Iby a VLSI implementation aV lower energy oresult aisub-frame
}radar for every hypothesis which cfor defines cHence,the

m of NPSS hardware acceleratorfor every N frequency


Esystems

Auto-correlation approaches
as5. diagram of NPSS hardware
NP ols, croE nAthe area is
Nis occupied NB-IoT
by isdevices.nthe FFT and IFFT units. O

Fig.
N
sthe ethe

dliedbyasthe
in But, since auto-correlation algorithm does Naturally, the power as cons
narch].
is a viable )option for icNPSS detection. oo than NPSS osequence (189 samples).
2 like with the reduced when using an overlap-save (OLS) method [8]. This of not
-12.6 dB. many
The
tfdifferent frequency offset candidates performed,

cts the received signal vector


applications 5t 0onot exploit or
h GPS
the cross-correlation.
fact receivers transmitted ause sequence volutionis ofknown, iisNbut can also be applied cross-correlation. oThe
ebecomputational
iHowever the computational complexity can significantly

various frequency offset options within


as

optimal. In fact, cross-correlation


complexity. Thus, the MLas
5. oBlock diagram hardware accelerator
cCQUISITION

T
cupied
aracteristics
,

tection [10]. In this paper stream

gorithms to perform timing acquisition


requires operations. In every sub-frame we receive

t the transmitted sequence


,isKey characteristics are shown sin Table h I. 30 percent of can be fimplemented efficiently in VLSI and thu

N BASED
PSS detection.

ient than cross-correlation approaches


auto-correlation TWhile auto- toffset

ristics
Itoespecially

in case of by
established for discrete convolutions its overall computational per tim
m tishpoint-wise tlength tdepends

ystems or GPS receivers use a cross- volution but it can also be applied to cross-correlation. The

andom frequency offset because the


m parison, we assume

dates with different frequency


dethat ithe low-complexity =auto-corre

ce the auto-correlation algorithm


c sBASED e
N

hown

e device is not tuned after power-


s [6]. This is the reason, why many This method is well established especially for discrete con-
in a dCQUISITION destimation ifaccuracy. Larger FFT o sizes with smaller grid
a voltage ranging from 1.2 and 3.7 V . In a power ofmethod depends on

Block diagram of NPSS

the detection complexity because the spond to sub-frame boundaries, have to be evaluated
istde- Tc
III. Ais
ei Fiis o
sition, .performance
namely 5..the
auto-correlation y Cand cross-correlation. i While .samples
revery efrequency hypothesis which defines the

well.
b er h s the oma imumzMPLEMENTATION e - S
ORRELATION BASED stream
In IMING total
s divided
e into
cross-correlations overlapping
a-range of
asequence length
S of
are required. method N 135.4 The performance
MOPS, odiscrete
respectively
on its in terms leading
implementation. timing
to an acquisition
overall To latency
computation
make

option
eDWARE hFig. cdiagram
l r
N ·fact of frequency ,length offsets
N =the detector shall support.
in the following. r carea g occupied
c by the FFT and IFFT units. One ready low power consumption.

and cross-correlation. While auto- requires Np · Nf operations. In parison,


l 6 6 w l =Fig. 5. c Block e diagram of p NPSS hardware d(OLS)accelerator t
e mincludes ow the digital baseband processing a for cell-search. Th dnot replaced c by a multiplication in frequency domain. true? the
exploit the that the transmitted known, However the
p l T s e e parison, complexity
we assumecan be significantly
that the low-co
f In every we receive 2,400 and the Choosing anVLSI FFT size ofthat the
is we
e
LmW Key
correlation f ofor signal
characteristics
its
the only detection w
option [10].
isifare
.
In
sub-optimal.
y this
shown
Block o paper
transmitted, In wein
fact,
III. focus
e Table d
of on
cross-correlation NPSS I. method s 30
e percent
hardware
2,400 efficient
l reduced
samples inof
accelerator l andwhen
L
of
the
u computational
using can
method anNr sbe
is
of well
the complexity
implemented
overlap-and-save
h established
NPSS in one
time especially
parison, efficiently
method [8].
fori we in
assume convolutions and
that thus
the Hence,
N has
1,its024
low-comp a
overallver
as consider owina srange
p from 50 to 250 mWe for the RX-power p
N
. F w y s method dp
ORRELATION but it can also IMING
be applied cross-correlations. lies actually below 10x of the low-com
b

and ifare
rrelation NPSS detector
m
sy80 ems ti h frame i d x e r i n
VmW
10
Key
|
area g
voltage
isFioccupied
characteristics
w assumed, ranging
is There occupied
i s ,
are two i g
d main
are by i n
algorithms
shown
d
the a
rrmand
an
FFT
d
its as
toConsidering
in w
perform
performance
e
Table n
illustrated
and timing IFFT
Fig.
ifm isNI. Fin whichthe
facquisition
m 5.30
=
sub-optimal. units.
Block
31 t i
percent p
lNodepends
right
different can
Fig. fi
vfrequires
part
One
diagram
In l y
Hence,
fact, be of
frequency
5. done
Figure
resultBlock
VSberto.periodic
of ofthe
the h
cross-correlation NPSSnumber 3. in
candidates
ready
diagram
In pcana
The frequency
everyof
hardware ings
de- number
be
range
aig·total
low
of
sub-frame
cross-correlations NPSS
improve of
accelerator
implemented
reduced
of L domain.
power
frequency
we
when shown
complexity
hardware
receive
in i
usingt in The
offsets
consumption.
time-domain
frequency Nu
an Figure
of i the
=270.5 main
accelerator
efficiently 2,400
soverlap-and-save
detector
4
offset
n method
gfor
MOPS.samplesbenefit
shall
in-band Thus,
estimation
in (OLS) depends
support.
andthe
VLSI BW: thesettings
method in-band
computational
and on
Choosing
accuracy
[8]. thusitsused
is imple
mention
aneffort
but
has fo
FFT

5. Block
a e lof low-latency r rather
auto-correlation tectors than are complexity.
ML e isn the
detectors Thus, y[6].the
only ML
This detector
option
is the [6]
reason, theof why the l[6].
sequences
transmitted,
many
e eperform
This to bemethod T
cross-correlated is well is
established very
hthe long,especially
p M
while the x
for discrete con- iNthat parison, welow-complex
assume
consumption shown the
i following. s auto-correlation
u ecross-correlation. qofreceive
domain
length is
of the NPSS samples.
in time In domaintotal is cross- samples. additions and multiplications can be estimate
e
tim he m ion f lotted ing
r to sequence (
o
is unknown,
heapplied
in case
Ftasreceived
alike of thew
t
NPSS
c and
ncharacteristics oreo
detection
p
both
i ndetectors
u By uone
applying N =
Additionally,
imethod OLS 189
the but
fm
NPSS the it can N
generationalso
cross-correlation, NN
parison,
be ·= applied
N 189
of ethe
lthe towe input cassume
cross-correlations.
NPSS ism reference
acquisition. thecompared low-complexity
signals in lies actually
tothat differeauto
belo
f time {area by the FFT p There IFFT units.
a
arecross-correlations
two main One
algorithms
isresult
to
n ready timing o low
acquisition power
x Hence,
consumption. the number offirst cross-correlations in time-domain
p p s f

consumption
p e 80
n in receive hmode l
C 244),
is which
cannamely t uthe Key Key etransmitted
aer approaches
characteristics etheIninsthis
overlapping
74,400
sgtoapaper
are s are samples
shown
While m ashown
e auto-
in tneed in
iGPS
Table isto Table
on the
I.
performed I.
NPSS
30 30
operations.
percent percent
length
inevery In32and
millisec-ofeevery of
is
the the
p
sub-frame
here iscan
sub-frame for acan
we the
be the be i
implemented
ML implemented
time, =remove
detector itdiscrete
and
efficientlyefficiently
just
to themention in in
the
VLS SN V
sispd
i l u o tectors are ML
sbetheaisntradar of this c This Pthe
o
.31into
N reason,
is ·itN
c length
that why
a0 of
many
cross-correlation h NPSS
This method in time well domain
inestablished
time t especially
parison,
domain samples.
for we
is assume
average additions
con-
parison, and
ratio the mu o w
ntdetection
(Page which projects signal vector onto each other short. This is the case our example for which 189
o
t algorithms applications
obe a radar
othe systems
dofifare
GPS
sequence receivers is use
hfocus
correlations cross- are volution
required.
=tsequence
but By can
choosing also be applied
trequires for to cross-correlation.
example The

the FFT
t ea result 1189 m fgoperations.
th consider i f rdevices.
q
eoption
p

we a range from ioffsets, In total


treplaced cross-correlations of length are required. 135.4 MOPS, respectively leading to an overa
of 50 6.dymW to 250consumption
mW for the RX-pow
x hNPSS N = acquisition.
pcorrelation vAuto-correlation t eFFT
l-frN eedsfrequency nthe By applying
longer OLS delay to the·itand NPSS cross-correlation,
require In more the
memory. input Short FFT siz
· p

ram of IFFT
cross-correlation NPSS e nisi t[10]. dimpractical N
aO N
of detector area isNoccupied byunknown, isthe and IFFT units. One ready low gpower consumption.

T IMING diagram
T ort 1 s p ord sequence e eis for namely
Key
iauto-correlation stream
teboth
characteristics and
ycross-correlation.
dividedalgorithms isctare oofoverlapping can
shown
While auto-
sin
sequences Table cointo of I. length Nncan
i30 percent ywhich The of every
Nisthe
performance sub-frame
can ininwe
be
terms receive
implemented
ofSNR timing acqui
f
on in a of the ycorrelation laasignal ,· computational N
tof
the thRF-transceiver tlas
y shown in Figure

in
Key rocharacteristics shown in Table I. 30 percent of the can be implemented efficiently

IFFT
the only
h
option transmitted, periodic
l
s
rsnumber
f
fothe
2,400 samples ytotal and the length p of pthe NPSS N in
iand time VLSI and thus
and the I a single-path- 0 rg mt o known tof theh receiver. pdetection
n frequency domain l189 gets fsimplified: Different frequency offsets
applications
chosen like 188. systems
Afterwards, or N
M receivers
block-wise use aterms
cross-
cross-correlation volution but deployment also ifbe applied has to
the cross-correlation.
most demanding The As
requireme can
respm
ist fzo- i acc attheiedigital
possible
of 80
candidates for enunknown,
area with different
is t
occupied ond,
n o e
which
general
by issthe
a
the
we
results
FFT received
r in
and t
on total for
sample
emethod
iIFFT NB-IoT efficient
units. 76,800In
(2,400
h in
One samples)
N of
cross-correlations
result NN is -=total
cross-correlations
much
ready timing
longer
m acomplexity
of acquisition
xNterms
low length
one
power [9] ps roughly
are
consumption.
i required. 10x higher. 135.4 MOPS,

the NPSS
bfor
o e l
t e n i

the shown
r ethan ncase Considering
n different frequency candidates total of complexity of 270.5 MOPS. Thus, the computa
(VLSI)s
o n area
o is r occupied
,dthbyasreceived
fi o
correlation by
oKey the h Ois FFT
characteristics Key
only , and IFFT5
characteristics
.e N ifare
by = units.
a shown
t point-wise
transmitted, Fig.
gOne in
stream 5.
are result
Table
periodic i Block
is multiplication
divided
shown I. ready
sdiagram
w 30 percent
in
2,400 t low
of
overlapping
Table
samples in NPSS
l eof power
the
frequency
I. hardware
sequences
30
the consumption.
can
percent
length domain. be
of accelerator
implemented
length
the
of
NPSS true?
the
N iniftime Thethe
can efficient
plot
performanc
be ims

transmitted
o as illustrated in the
e right part of Figure o 3. The number of shown in Figure 4 for in-band BW: in-ban
RE e f MPLEMENTATION
d
, includes baseband t t sequence
t is a in
aiw
the
h sequence NPSS ntand detection
ea2each
both
c0isprelate domain is uneed w samples.
that In
fthe peak-to-average cross-
npower y ratio ispower not reliable obM
e ast reshown in Figure 6. Thus
processing for cell-search.
l ldomain =ptowe
h c e Soccupied
correlation for signal detection [10]. In this paper we focus on method is efficient · in· of computational complexity one
a i N =
risN N N
cy m icomplexity.
is a viable option
be applied NPSS
racas
0the detection. transmitted than
2is the NPSS sequencedOne (189 samples).
coknown to the receiver.
o low-latency rather
t occupied complexity.
s1the with Thus, the the ML
different detector frequency [6] of offset
the sequences candidates to be is performed,
cross-correlated of
very -12.6
long, dB. while The the different cdiagram curves relate to different thresho
,
p onto an eNB-IoT o
t a
are moret -1 hardware efficient than cross-correlation
nHowever,
approaches
ioisdoes
tearea
the length computational
189 that
by
need complexity
to
thebe
FFT
performed
ocyclic can
Consideringoon be
andevery significantly
IFFT
millisecond,
punits.
different
which AsOne
Fig. 5. will
frequencyresult
Block seecandidates in
ready the following,
a ·total
low
of of
NPSS the ML
complexity
hardware detector
consump of 270
toacc do

Table
et shion area ns be
-0.5 is
ilfprojects
ming aframe.
y C FFT IFFT
P74,400 c7caseunits. result ready low power
esequences consumption.
p

n
N =the 31 p s f
e algorithms
s fse hinav[9]. m,areduced
rand lillustrated
o e t. the linevector to shifts e[8]. which ccan
pm be easily implemented onin
-2 f o -1.5 0.5 1.5
ofisd
hlm
tchyenormalized dodwith m latency
ll end me(VLSI)
cross-correlations to be performed every
)= happlied sequence
isignal unknown, in of Lfor the NPSS detection
as lThe both in right omillisec-
part of samples.
Figuresub-frame In
3. total
The the p ML
number Nsdetector
· of f cross-showncompared in Figure the p
of c theIRF-transceiver ehit-detection
N= 189 N N
at ffs iowhich n histhe eN

the FFT
theaauto-correlation correlations sthe are required. By choosing for example
5. For elements
the FFT was and the IFFT aurasingle-path-
can the
dtransmitted fiausing
overlapping
sequence samples depends othe the NPSS length and is here tNPSS for the first time, remove itactual
and justthema

transmitted,
fsince
ofshown rehn
low-latency
oarea eis
rather
occupied than
u
fwhich
isby the Thus,
FFT ML detector
IFFT tunits.
[6] of tthe
One result be ascross-correlated
ready low powervery long, while
consumption. the
32
t Additionally, the generation of the NPSS reference signals in to different SNR
V. tH lfatthe ihardware foverlapping nlength
(Page 244), onto other one is short. This is case in our f example for which
adix-2 g r to ge s o L fa
chosen ffi , t̂ o
r c as
ARDWARE
s,Auto-correlation
But,
r esequence algorithm
which
dm pcan when
M area
be
f is impracticable
done
MPLEMENTATION s
e in
an occupied M
asfrequency
overlap-save NB-IoT
]signalin a by
domain. (OLS) devices.
74,400 method sFFTmain
cross-correlations
the andrThis
obenefit
average IFFT settings
-isnot need
powerneed units.
to used
onbelonger
many laOne
for
performed
is sub-frames
ofcomputed oinresult
every as ready
based
=millisec-
onlythe low-complexity
over the
sub-framelow
afor
10xshort powe
of peak-
detect

ACQUISITION
nefact e r
u i
sul to 250 mW for theincludes the digital approaches are general more correlations are required. By choosing for example
e itknownnthe e t w p t i s
oAuto-correlation algorithms
u v
m 4 can be ond, y n
applied e the
is188.results transmitted
impractical
9 e
total for sequence
NB-IoT
number a of devices.f
samples
76,800 189 w depends
cross-correlations the timing caseacquisition
t [9]
32
and isthe
is roughly here higher.
the first
d e n. a range from
t
pe mW
e to receiver. approaches in general f n R

units.
are In
n r p - (Page 244),
i . which projects
chosen the
to h be hardware.
received [ Afterwards, h vector onto
the each
block-wise other one
o
cross-correlation short. This is the
deployment our
whichf example
has for which
most takes
demanding 620 S
i
e s i i n
ssu consider
of possible candidates different frequency t
offsets, the received sample sequence (2,400 samples) is much
ep awe
r s

and74,400
i f̂
50 m not r exploit s
RX-power
the a arecomoreahardwarethat the transmitted p r e is known,
d o However the
Key
computational t
characteristicscomplexity cane be
are l
significantly ashown o
in ofTable I. 30 percent ofof the c
N e
l er o
d sarer ebut of this 5 method is
o.applied
that a cross-correlation in time domain is average ratio of e all cross N correlations BW: is this rea
f cross-correlation NPSS detector
p
sam n ord ndar les.
y i
a was dis
M
e
(
10 t i
nt msuframe,
e d a
nits performance l e
plo2,400 - l
w onisisatime-domain
efficient Nsthan
viable f
is option
o s u
dfor n
efficient
a pfact,
lelcross-correlation asamples
NPSS m .
ti lcross-correlation
peethe
than known
of e method
aeauto-correlation
sdetection. c
the 3 to e
1tiitmfthardware
cross-correlation
N
is
emoption
the
possible
prrby
aapproaches.
well x
w
caireduced
receiver.
e
ldalso
established
frequency
enNPSS
approaches
candidates
with f
Auto-correlation
However, thewhen
itibe
with
different theespecially
n domain
omultiplication
But,
length
thandifferent the
approaches
computationaln NPSS
189
ian
since
frequency
for
that
rainm
frequency
discrete
ond,
gets
need e
chosen
the
sequenceoffsetto which
incomplexity
general
naof
be
offsets,
convolutions
simplified:
to lperformed
(189be
candidates
is188. impractical
results
can
samples.
thKey
the ecomputational
samples). be
every
Afterwards,
received
Hence,
inDifferent total
significantlyfor number
millisecond,
An
is characteristics
sample
performed,
its
NB-IoT
the
FFT t i
itoplotoverall
frequency
scomplexity
sequence
ioffset
which
block-wiseof
devices.
m
size
76,800
As computational
(2,400
-12.6 we of
189
Soffsets
will
cross-correlation
are
edB. 1,024
samples)
The shown is
effort
cross-correlations
see in the
results
t different
much
different
per
As
inwhich
timingtiming
following, can
deployment
longer
curves in relate
Table
acquisiti
acquisition
be the
a correspo
grid see
which
I.toThe ML
30sesd
dif

detection
rchitecture with radix-2 N
elements chosen t 3 0 r
ere e bo tectors sMaLmare ti asdetectors
sub-optimal.
the rins[6]. ald In
sBut, inissince b areplaced
more
e n can
e
de-
t o
reduced
point-wise
be
efficient swhen
than to using cross-correlations.
e is
cross-correlation overlap-and-save
fshifts iothe
frequency
However,
approaches (OLS) domain.
the length method 189 lies[8].
true?
that uThis
the
actually
need bei(189 snot
belowsuggests
performed can h10x be that
every that ntheof
emillisecond,
significantly the low-complexity
of As
curves we will timi

I.
ces o o tunits.
r the[Todo, writethe about aFFT
d n iML h[9]. w . h -[9]. m ito area
isissusing
e impracticable -
occupied for t NB-IoT
by f devices.
the FFT q and IFFT One result ready l

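However, the computational complexity can be significantly reduced when using an overlap-save (OLS) method [8]. This method is well established especially for discrete convolutions, but it can also be applied to cross-correlations. The method is efficient in terms of computational complexity if one of the sequences to be cross-correlated is very long, while the other one is short. This is the case in our example, for which the received sample sequence (2,400 samples) is much longer than the NPSS sequence (189 samples). By applying OLS to the NPSS cross-correlation, the input stream is divided into overlapping sequences of length N as illustrated in the right part of Figure 3. The number of overlapping samples depends on the NPSS length and is here chosen to be 188. Afterwards, the block-wise cross-correlation with the Nf = 31 different frequency offset candidates is performed, which can be done in frequency domain. The main benefit of this method is that a cross-correlation in time domain is replaced by a point-wise multiplication in frequency domain. Additionally, the generation of the NPSS reference signals in frequency domain gets simplified: different frequency offsets relate to cyclic shifts, which can be easily implemented in hardware.

IV. COMPLEXITY AND PERFORMANCE

As the proposed cross-correlation-based algorithm is an ML detector, its complexity is expected to be significantly higher than that of the low-complexity auto-correlation method. On average, for each received block of size N, a single N-point FFT, Nf point-wise complex multiplications of a vector of length N, and Nf IFFT operations need to be performed.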
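The block-wise processing counted here (one N-point FFT per block, followed by Nf point-wise multiplications and Nf IFFTs) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the recursive radix-2 FFT, and the restriction of the frequency-offset candidates to integer FFT-bin shifts of the reference spectrum are choices made for this example.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 (I)FFT without scaling; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t            # radix-2 butterfly, upper output
        out[k + n // 2] = even[k] - t   # radix-2 butterfly, lower output
    return out


def ifft(x):
    """Inverse FFT with the 1/N scaling applied once at the top level."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]


def ols_cross_correlation(rx, ref, n_fft=1024, bin_shifts=(0,)):
    """Overlap-save cross-correlation of rx with frequency-shifted copies of ref.

    Returns out[i][m] = sum_k rx[m + k] * conj(ref_i[k]), where ref_i is ref
    shifted in frequency by bin_shifts[i] FFT bins (hypothetical candidate grid).
    """
    overlap = len(ref) - 1                 # samples shared by consecutive blocks
    step = n_fft - overlap                 # new samples (valid lags) per block
    n_lags = len(rx) - len(ref) + 1        # number of timing hypotheses
    ref_spec = fft(list(ref) + [0j] * (n_fft - len(ref)))
    # A frequency offset of b bins is a cyclic shift of the reference spectrum.
    cands = [[ref_spec[(n - b) % n_fft].conjugate() for n in range(n_fft)]
             for b in bin_shifts]
    out = [[0j] * n_lags for _ in bin_shifts]
    for start in range(0, n_lags, step):
        block = list(rx[start:start + n_fft])
        block += [0j] * (n_fft - len(block))   # zero-pad the final block
        spec = fft(block)                       # one FFT per block
        n_valid = min(step, n_lags - start)
        for i, cand in enumerate(cands):        # Nf multiplications and IFFTs
            corr = ifft([a * b for a, b in zip(spec, cand)])
            out[i][start:start + n_valid] = corr[:n_valid]  # drop wrapped lags
    return out
```

With a 189-sample reference the overlap evaluates to 188 samples, matching the value chosen in the text. Only the first N - 188 outputs of each IFFT are kept, because the remaining lags wrap around the block boundary.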
Fig. 3. Signal processing scheme of the NPSS detector, which includes fractional-frequency offset estimation. The right sub-figure shows a correlation computation with the overlap-save method.
This choice of Nf = 31 candidates allows to cover a frequency offset range of 31 · 4 · 234 Hz ≈ 29.0 kHz.
However, the ML algorithm does not need as many sub-frames as the low-complexity auto-correlation algorithm. Hence, its overall computational effort per timing acquisition actually lies below 10x that of the low-complexity timing acquisition [9].

The performance in terms of timing acquisition latency is shown in Figure 4 for in-band deployment, which has the most demanding SNR requirement of -12.6 dB. The different curves relate to different threshold settings used for hit-detection based on the actual peak-to-average ratio of all cross-correlations. As can be seen from Figure 4, the OLS detector achieves a latency of 480 ms, whereas the auto-correlation detector of [9] takes 620 ms to achieve a 90% hit rate.

V. HARDWARE (VLSI) IMPLEMENTATION

A block diagram of the cross-correlation NPSS detector is shown in Figure 5. The main computational elements are the FFT and IFFT blocks with a required throughput of 2.87/10 ms = 287.1/s and 8,900/s FFT and IFFT computations, or 1.5 and 45.6 million radix-2 operations per second, respectively. Even for the more demanding IFFT it is possible […]
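The throughput figures for the FFT and IFFT blocks can be cross-checked with a short calculation. The sketch below is an illustration under assumptions that are not spelled out at this point in the text: an FFT size of N = 1,024, 2,400 received samples per 10 ms, an overlap of 188 samples, Nf = 31 candidates, and (N/2) · log2(N) radix-2 butterflies per transform. The function and key names are hypothetical. Under these assumptions the IFFT rate follows as Nf times the FFT rate, about 8,900/s, which is the rate consistent with the quoted 45.6 million radix-2 operations per second.

```python
import math


def ols_throughput(n_fft=1024, ref_len=189, samples_per_period=2400,
                   period_s=0.010, n_freq=31):
    """Estimate FFT/IFFT rates and radix-2 operation counts for the OLS detector.

    All default values are assumptions chosen to match the figures in the text.
    """
    overlap = ref_len - 1                        # 188 overlapping samples
    new_samples = n_fft - overlap                # fresh samples consumed per block
    blocks_per_period = samples_per_period / new_samples
    fft_per_s = blocks_per_period / period_s     # one FFT per block
    ifft_per_s = n_freq * fft_per_s              # one IFFT per frequency candidate
    butterflies = (n_fft // 2) * int(math.log2(n_fft))  # radix-2 ops per transform
    return {
        "blocks_per_period": blocks_per_period,
        "fft_per_s": fft_per_s,
        "ifft_per_s": ifft_per_s,
        "fft_radix2_mops": fft_per_s * butterflies / 1e6,
        "ifft_radix2_mops": ifft_per_s * butterflies / 1e6,
    }


if __name__ == "__main__":
    for name, value in ols_throughput().items():
        print(f"{name}: {value:.2f}")
```

Because the IFFT workload scales with the number of frequency candidates while the FFT workload does not, the IFFT dominates the operation count by roughly a factor of Nf.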


For both the FFT and the IFFT, a single-path delay feedback architecture with radix-2 elements was chosen due to the available time resources. Key characteristics are shown in Table I. 30 percent of the area is occupied by the FFT and IFFT units.

Fig. 5. Block diagram of the NPSS hardware accelerator.

Naturally, the power consumption of the auto-correlation method depends on its implementation. To make a fair comparison, we assume that the low-complexity auto-correlation detector can be implemented efficiently in VLSI and thus has a very low power consumption.

Fig. 6. Energy per NPSS detection over RX-power of the RF-transceiver (x-axis: RX-power of RF-transceiver [mW]; y-axis: energy per NPSS detection [mWs]; curves: auto-corr. and cross-corr.).
the for NB-IoT
the The
devices.
support. FFT minimum right part of Fig. 3. The number
and IFFT units. One result ready low power consum of overlapping samples

RX-power of RF-transceiver [mW]


and IFFT units.
mitted sequence is known, However the computational complexity can be significantly

efficiently in VLSI and thus has a very


frequency grid spacing is defined by the 240 kHz sampling depends on the NPSS length and is chosen to be NO = 188.
fact, cross-correlation de- reduced when using an overlap-and-save (OLS) methodAfterwards, [8]. the block-wise cross-correlation with the different

in
rate and the FFT and IFFT size which trades off computational
is the reason, why many This method is well established especially for discrete frequency-offset
con-

Table
candidates is performed, which can be done
GPS receivers complexity,
use a cross- memory
volution requirements,
but it can also be andapplied
processing delay for The
to cross-correlation.
in frequency domain. The main benefit of this method is that
. In this paper estimation
we focus onaccuracy. Larger FFT sizes ofwith smaller grid spac- if one

150
method is efficient in terms computational complexity

I.
Thus, the MLings improve
detector [6] offrequency offset
the sequences estimation
to be accuracy
cross-correlated butlong,
is very a cross-correlation
havewhile the in time domain is replaced by a point-

30
One result200ready
a longer
ved signal vector delay
onto each andonerequire
other is short.more
This memory.
is the case An FFT
in our size of
example wise
for which multiplication in frequency domain. Additionally, the

percent of the
1,024offsets,
different frequency results the
in areceived
grid spacing of 234 Hz
sample sequence which
(2,400 is sufficient
samples) generation
is much longer of the NPSS reference signals in frequency domain
on. than the NPSS sequence (189 samples).
for NPSS detection and was therefore chosen in this work. In gets simplified: Different frequency offsets relate to cyclic
addition, the width of the correlation peak of Fig. 2 allows to shifts which can be easily implemented in hardware.
take every fourth grid point only, while still covering 93% of
the peak amplitude. Nf trades off the minimal observed height IV. C OMPLEXITY AND P ERFORMANCE
of a correlation peak against computational complexity and
low 250
can be implemented efficiently in VLSI and thus has
As the proposed cross-correlation-based algorithm is an ML
memory size. It is a design parameter, which can be chosen detector, the complexity is expected to be significantly higher
power consumption.
to match the accuracy of the underlying crystal oscillator. compared to the low-complexity auto-correlation method. On
Choosing Nf = 31 leads to a frequency-offset range of average for each received block of size N − NO a single N -
31 · 4 · 234 Hz which allows to compensate ±14.5 kHz. point FFT, Nf point-wise complex multiplications of a vector
In every 10 ms frame we receive Ns = 2,400 samples and of length N , and Nf N -point IFFT operations need to be
the length of the NPSS in time domain is Np = 189 samples. performed. Choosing an FFT size of N = 1, 024 the number
In total Ns · Nf cross-correlations of length Np are required as of real additions and multiplications can be estimated to 135.0
shown in the left part of Fig. 3. Considering Nf = 31 different and 135.4 MOPS, respectively leading to an overall compu-
frequency candidates a total of 74,400 cross-correlations need tational complexity of 270.5 MOPS. Thus, the computational
to be performed every 10 ms, which is impractical for NB-IoT effort per sub-frame of the ML detector is roughly 10x higher
devices. than the auto-correlation timing acquisition [6].
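The overlap-save computation described at the end of Section III, whose per-block operation count is given above, can be sketched in a few lines of NumPy. This is only an illustrative floating-point model under stated assumptions, not the authors' hardware implementation: the function name `ols_cross_correlate` and its parameters are hypothetical, the reference is a random stand-in rather than the real NPSS, and non-coherent combining and peak detection are left out. Frequency-offset candidates are expressed as integer FFT-bin shifts `m`, so that each candidate only requires a cyclic shift of the reference spectrum.

```python
import numpy as np

def ols_cross_correlate(r, p, n_fft=1024, shifts=(0,)):
    """Cross-correlate a long stream r with a short reference p via overlap-save.

    shifts holds frequency-offset candidates in units of FFT bins (assumption:
    integer bins, i.e. multiples of fs / n_fft). Returns an array of shape
    (len(shifts), len(r) - len(p) + 1) with one correlation per candidate.
    """
    n_p = len(p)
    n_o = n_p - 1                    # overlap between blocks (188 for Np = 189)
    step = n_fft - n_o               # new input samples consumed per block
    n_out = len(r) - n_p + 1         # number of valid correlation lags
    # conj() of the zero-padded reference spectrum turns the point-wise
    # product into a (circular) cross-correlation instead of a convolution.
    p_f = np.conj(np.fft.fft(p, n_fft))
    out = np.zeros((len(shifts), n_out), dtype=complex)
    for start in range(0, n_out, step):
        block = r[start:start + n_fft]
        block = np.pad(block, (0, n_fft - len(block)))   # zero-pad last block
        r_f = np.fft.fft(block)                          # one FFT per block
        valid = min(step, n_out - start)
        for i, m in enumerate(shifts):
            # A frequency offset of m bins is a cyclic shift of the spectrum.
            y = np.fft.ifft(r_f * np.roll(p_f, m))       # one IFFT per candidate
            out[i, start:start + valid] = y[:valid]      # keep wrap-free lags only
    return out
```

With N = 1,024 and a 189-sample reference, each block contributes 836 wrap-free lags, so a 2,400-sample frame needs three blocks in this sketch. Passing `shifts=range(-60, 61, 4)` would mimic 31 candidates on every fourth grid point; that particular centering is an assumption, not something the text specifies.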
TABLE I
NPSS DETECTOR CHARACTERISTICS IN TWO CMOS TECHNOLOGIES

CMOS technology          SMIC 130 nm    GF 28 nm
Synthesized cell area    3.34 mm²       0.22 mm²
Voltage                  1.2 V          1.0 V
kGE                      735            600
Est. PML                 38 mW          2.5 mW

[Figure: CDF (0 to 1) over the timing-acquisition latency tML [ms] (0 to 1,000 ms) for SNR = -12.6, -13.6, -14.6, and -15.6 dB, with the annotations "50% [5] @ -12.6 dB SNR" and "90% [5] @ -12.6 dB SNR".]

Fig. 4. Timing acquisition latency of cross-correlation NPSS detector.

The performance in terms of timing-acquisition latency is shown in Fig. 4 for in-band deployment, which has the most demanding SNR requirement of -12.6 dB and beyond. For the simulations the TU1.2 channel model was used, and the threshold was set to achieve a false-alarm rate of 1%. To achieve a 90% hit rate, the OLS detector needs a latency of 400 ms, whereas the auto-correlation detector of [6] takes 620 ms. The average detection latency is 140 ms, which is roughly a factor of two below the value of [6].

V. HARDWARE IMPLEMENTATION

A block diagram of the cross-correlation NPSS detector is shown in Fig. 5. The main computational elements are the FFT and IFFT blocks, with a required throughput of 2.87/10 ms = 287.1/s FFT and 31 · 287.1/s ≈ 8,900/s IFFT computations, or 1.5 and 45.6 million radix-2 operations per second, respectively. Even for the more demanding IFFT it is possible to reuse a single radix-2 instance for all IFFT operations when assuming typical VLSI clock frequencies. So, for the FFT as well as for the IFFT block, a single radix-2 in-place architecture is sufficient.

The FFT is designed to include a RAM holding 1,360 complex samples, which is larger than N. The reason for this is two-fold. Firstly, the FFT operates on 1,024 complex words, but the unaltered 188 overlap samples need to be stored for the next FFT computation as well. Secondly, during the FFT operation further inputs r[k] need to be stored in the memory. Furthermore, a single-port RAM has been chosen, which minimizes the storage area. The resulting memory-bandwidth bottleneck, which limits the throughput to one radix-2 operation every 4 clock cycles, is tolerable due to the very low throughput requirements of the FFT.

In contrast, such an architecture would not be sufficient to meet the throughput requirement of the IFFT. Here, the memory bandwidth has to be 4× higher to support a throughput of one radix-2 operation every cycle. Thus, the memory in the IFFT block is split into four banks, each still being a single-port RAM to minimize storage area. Memory access conflicts are avoided by assuring that every two subsequent radix-2 operations do not access the same register banks. After processing the FFT, the correlations in frequency domain, and the Nf = 31 IFFTs for each received block of length N, the results are non-coherently combined with previous correlation results. The size of the memory holding the intermediate, non-coherently combined correlation results is reduced by down-sampling the correlation results by a factor of 2, as proposed in [6]. After the processing of a sub-frame, a peak detection is used to decide whether the NPSS sequence was found. Rather than using a simple peak-to-average ratio, an analysis of the four largest correlation results is considered, which improves the detection probability when combining correlation results of multiple sub-frames. Also, the existence of side-peaks (Fig. 2) requires a more sophisticated peak detection, as a simple peak-to-average ratio would lead to many false detections.
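The throughput figures quoted at the beginning of this section can be reproduced with a few lines of arithmetic. The sketch below assumes the usual count of (N/2)·log2(N) radix-2 butterflies per N-point transform and the 62 MHz clock targeted in the synthesis experiments; the variable names are illustrative only.

```python
import math

FS_HZ = 240_000        # sampling rate
N_FFT = 1024           # FFT / IFFT size N
N_OVERLAP = 188        # overlap N_O between consecutive blocks
N_F = 31               # number of frequency-offset candidates
F_CLK_HZ = 62e6        # assumed clock frequency (synthesis target)

# Each block contributes N - N_O new samples, which sets the FFT rate.
new_samples_per_block = N_FFT - N_OVERLAP
ffts_per_second = FS_HZ / new_samples_per_block
iffts_per_second = N_F * ffts_per_second           # one IFFT per candidate

# Radix-2 butterflies in one N-point transform: (N / 2) * log2(N).
butterflies_per_transform = (N_FFT // 2) * int(math.log2(N_FFT))
fft_radix2_per_second = ffts_per_second * butterflies_per_transform
ifft_radix2_per_second = iffts_per_second * butterflies_per_transform

# Capacity of one radix-2 unit: single-port RAM (1 op / 4 cycles) for the FFT,
# four-bank RAM (1 op / cycle) for the IFFT.
fft_capacity = F_CLK_HZ / 4
ifft_capacity = F_CLK_HZ

print(f"FFTs/s:  {ffts_per_second:.1f}, radix-2 ops/s: {fft_radix2_per_second / 1e6:.2f} M")
print(f"IFFTs/s: {iffts_per_second:.1f}, radix-2 ops/s: {ifft_radix2_per_second / 1e6:.2f} M")
print(f"FFT fits:  {fft_radix2_per_second <= fft_capacity}")
print(f"IFFT fits: {ifft_radix2_per_second <= ifft_capacity}")
```

Under these assumptions the FFT uses only a small fraction of the single-port capacity, while the IFFT load of about 45.6 million radix-2 operations per second would exceed one operation every four cycles at 62 MHz, which is consistent with the four-bank memory split described above.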
We implemented the detector in VHDL and performed synthesis experiments in SMIC130 and GF28 CMOS technology, targeting a clock frequency of 62 MHz. The key characteristics of the detector are given in Table I.

With Nf = 31, a correlation RAM with 334 kbit is required. This is the largest memory in the design and occupies 54% of the entire area. However, since this memory is only used for NPSS detection, it can be easily shared with other building blocks. The implementation also includes the fine frequency- and timing-offset estimation as proposed in [6].

The power consumption of the detector was estimated by using Cadence® tools from the post-synthesis netlist and the value change dump file to 38 mW (1.2 V, TT, 25C) and 2.5 mW (1.0 V, TT, 25C) for the 130- and 28-nm technology, respectively.

[Figure: block diagram with the blocks 1024 FFT (RAM 1,360x44, RADIX-2), PSS LUT with shift-related address conversion, two RAMs 1,024x44, RAM 256x54, RADIX-2, control unit (start, address), correlation result RAM 37,200x9, and peak detection (hit).]

Fig. 5. Block diagram of NPSS detector.
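The size of the correlation result RAM can be cross-checked from the quantities given in the text. The sketch below assumes that one combined correlation value is stored per sample position and frequency candidate after down-sampling by 2, and takes the 9-bit word width from the RAM label in Fig. 5; the variable names are illustrative.

```python
N_S = 2400          # samples per 10 ms frame
N_F = 31            # frequency-offset candidates
DOWNSAMPLING = 2    # correlation results are down-sampled by this factor
WORD_BITS = 9       # word width, read from the "37,200x9" RAM label in Fig. 5

entries = N_S * N_F // DOWNSAMPLING
total_bits = entries * WORD_BITS

print(f"{entries} entries x {WORD_BITS} bit = {total_bits / 1000:.1f} kbit")
```

This gives 37,200 entries and roughly 334 kbit, matching the figures quoted for the correlation RAM.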
VI. ENERGY EFFICIENCY

The energy of timing acquisition is given by the power of the detector and the RF-transceiver in receive mode times the latency t. Given the energy of the ML approach and the auto-correlation (AC) approach for a certain RF-transceiver power PRF, we compute the savings according to

\[ \Delta E\,[\%] = 100 \left( 1 - \frac{(P_{RF} + P_{ML})\, t_{ML}}{(P_{RF} + P_{AC})\, t_{AC}} \right). \]

For the AC timing acquisition we account for a power of PAC = PML/10, because the arithmetic load is about 10× below the arithmetic load of the ML approach. However, it shall be noted that this factor depends on the implementation. In Fig. 6 the energy saving ∆E [%] per timing acquisition is plotted over the power consumption of the RF-transceiver PRF [W] for the latency of the ML detector (tML = 400 ms) and the AC detector (tAC = 620 ms) in [6].

[Figure: ∆E [%] (0 to 40) over PRF [W] (0 to 0.25) with curves for "2.5 mW, 28 nm" and "38 mW, 130 nm", a marked "NB-IoT RF-transceiver region of interest", and vertical lines for reported RF-transceivers.]

Fig. 6. Energy savings per timing acquisition for the ML detector with 400 ms latency over the auto-correlation detector with 620 ms latency, for different power consumption values.

Even though the AC detectors show a lower power consumption for NPSS detection (due to their reduced number of additions and multiplications), they do not improve overall energy efficiency because of their higher latency. The dotted line shows the maximum possible savings of 35.5%.

The power consumption of RF-transceivers depends on multiple factors whose analysis lies beyond the scope of this paper; therefore, we consider a broad range of values for the RF-transceiver power consumption. The grey rectangle in Fig. 6 indicates the region of interest for NB-IoT dedicated RF-transceivers, which lies below the power consumption of conventional LTE and GSM transceivers due to the simplifications made in NB-IoT. The vertical lines in Fig. 6 indicate the power consumption of two reported state-of-the-art conventional LTE and GSM RF-transceivers [11], [12].

VII. CONCLUSION

The fact that the RF-transceiver dominates downlink power consumption in an NB-IoT device creates design space for dedicated hardware implementations which can execute exhaustive baseband algorithms. Following this guideline, we have shown that the computationally complex ML approach for NB-IoT timing acquisition can lead to significant energy savings in NB-IoT devices. The savings were achieved by the low latency of our detector, which, due to algorithmic transforms based on the OLS method and a dedicated VLSI implementation, shows a relatively low power consumption. We were able to reduce the energy required for a single NPSS detection by 34% for 28 nm CMOS technology and by 9% up to 21% even in a rather mature 130 nm CMOS technology. Future research will address area reductions, especially by sharing memory resources with other hardware building blocks.

REFERENCES

[1] Y.-P. E. Wang et al., "A Primer on 3GPP Narrowband Internet of Things (NB-IoT)," arXiv preprint arXiv:1606.04171, 2016.
[2] L. Zhong, Power Consumption by Wireless Communication, Lecture ELEC518, 2011, http://www.ruf.rice.edu/~mobile/elec518/lectures/3-wireless.pdf
[3] Intel Corporation, R1-156524: On device complexity for NB-IoT, 6.2.6.1, 3GPP TSG RAN WG1 Meeting Nr. 83, Anaheim, USA, 16-20 Nov., 2016.
[4] 3GPP TS 36.101 V14.0.0, Evolved Universal Terrestrial Radio Access (E-UTRA); User Equipment (UE) radio transmission and reception, July 2016.
[5] 3GPP TS 36.211 V13.2.0, Evolved Universal Terrestrial Radio Access (E-UTRA); Physical channels and modulation, June 2016.
[6] Qualcomm Inc., R1-161981: NB-PSS and NB-SSS Design, 2.2.5, 3GPP TSG RAN WG1 NB-IoT Ad-Hoc Meeting, Sophia Antipolis, France, 22-24 March, 2016.
[7] J. G. Proakis, "Digital Communications," McGraw-Hill series in electrical and computer engineering: communications and signal processing, 2001.
[8] D. Akopian, "Fast FFT based GPS satellite acquisition methods," IEE Proceedings-Radar, Sonar and Navigation, vol. 152, no. 4, pp. 277-286, 2005.
[9] H. J. Nussbaumer, Fast Fourier transform and convolution algorithms, vol. 2, Springer Science & Business Media, 2012.
[10] A. Y. Wang and C. G. Sodini, "On the energy efficiency of wireless transceivers," 2006 IEEE International Conference on Communications, vol. 8, IEEE, 2006.
[11] A. Mirzaie, A. Yazdi, Z. Zhou, E. Chang, P. Suri, and H. Darabi, "A 65 nm CMOS quad-band SAW-less receiver for GSM/GPRS/EDGE," in Symp. VLSI Circuits, 2010, pp. 179-180.
[12] L. Sundström et al., "A receiver for LTE Rel-11 and beyond supporting non-contiguous carrier aggregation," 2013 IEEE International Solid-State Circuits Conference, San Francisco, 2013, pp. 336-337.
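As a numerical check of the savings formula in Section VI, the following sketch evaluates ∆E for the two estimated detector powers. The latencies (400 ms and 620 ms) and the factor PAC = PML/10 are taken from the text; the function names and the RF-transceiver powers of 80, 100, and 150 mW are assumptions chosen here for illustration and are not operating points stated in the paper.

```python
def energy_saving_percent(p_rf, p_ml, t_ml=0.400, t_ac=0.620, ac_power_ratio=10.0):
    """Energy saving of the ML detector over the AC detector in percent.

    p_rf and p_ml are in watts, t_ml and t_ac in seconds.
    The AC detector power is modelled as p_ml / ac_power_ratio.
    """
    p_ac = p_ml / ac_power_ratio
    return 100.0 * (1.0 - (p_rf + p_ml) * t_ml / ((p_rf + p_ac) * t_ac))

def max_saving_percent(t_ml=0.400, t_ac=0.620):
    """Limit of the saving when the RF-transceiver power dominates."""
    return 100.0 * (1.0 - t_ml / t_ac)

for label, p_ml in (("28 nm", 2.5e-3), ("130 nm", 38e-3)):
    for p_rf in (0.08, 0.10, 0.15):   # illustrative RF powers in watts
        saving = energy_saving_percent(p_rf, p_ml)
        print(f"{label}, PRF = {p_rf * 1e3:.0f} mW: {saving:.1f} %")

print(f"upper bound: {max_saving_percent():.1f} %")
```

At the assumed 100 mW the 28 nm design saves about 34%, and the 130 nm design moves from about 9% at 80 mW to about 21% at 150 mW, which is in line with the values quoted in the conclusion. The upper bound of about 35.5% depends only on the latency ratio.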
