NB IoT NPSS NSSS Acquisition ETHgroup PDF
{kroell,mkorb}@newacp.ch {weberbe,huang}@iis.ee.ethz.ch
Abstract—Initial timing acquisition in narrow-band IoT (NB-IoT) devices is done by detecting a periodically transmitted known sequence. The detection has to be done at lowest possible …

The bandwidth reduction to 200 kHz was the main simplification of NB-IoT compared to the minimal bandwidth requirement of …
arXiv:1608.02427v2 [cs.NI] 7 Oct 2016
[Figure: axis "tones"; annotations "11 OFDM symbols" and "80 ms block"]
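There are two main algorithms for timing acquisition, namely auto-correlation and cross-correlation. Auto-correlation approaches are in general more hardware efficient than cross-correlation approaches, but their performance in terms of timing acquisition latency is sub-optimal. The auto-correlation algorithm does not exploit the fact that the transmitted sequence is known to the receiver, which makes cross-correlation a viable option for NPSS detection. In fact, cross-correlation detectors are ML detectors [6]. This is the reason why many applications like radar systems or GPS receivers use a cross-correlation for signal detection [10]. In this paper we focus on low latency rather than low complexity.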
III. CROSS-CORRELATION BASED TIMING ACQUISITION

The ML detector [4] (Page 244) projects the received signal vector onto each of the possible candidates with different frequency offsets. Thus, the ML detector [6] is the only option for low-latency timing acquisition. However, its computational complexity is high, especially if one of the sequences to be cross-correlated is very long while the other one is short. This is the case in our example: the received sample sequence (2,400 samples) is much longer than the NPSS sequence (189 samples). Hence, a straightforward implementation of the ML detector is impracticable for NB-IoT devices.
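The following minimal brute-force sketch is not the authors' implementation. It illustrates why a direct time-domain search is expensive. For every timing hypothesis and every frequency-offset candidate, the known reference is rotated by the candidate offset and correlated sample by sample against the received stream. The function name, the squared-magnitude metric and the phase convention are assumptions made for illustration.

```python
import cmath


def direct_ml_search(rx, ref, freq_offsets_hz, fs_hz):
    """Brute-force time-domain cross-correlation over all timing and
    frequency-offset hypotheses (illustrative sketch, not the paper's design).

    Returns (best_metric, best_timing, best_offset_hz, complex_macs).
    """
    L = len(ref)
    best_metric, best_timing, best_offset = -1.0, None, None
    macs = 0
    for f in freq_offsets_hz:
        # Reference rotated by the hypothesised carrier frequency offset f.
        ref_f = [ref[k] * cmath.exp(2j * cmath.pi * f * k / fs_hz) for k in range(L)]
        for theta in range(len(rx) - L + 1):  # every timing hypothesis
            acc = 0j
            for k in range(L):
                acc += rx[theta + k] * ref_f[k].conjugate()
                macs += 1  # one complex multiply-accumulate
            metric = abs(acc) ** 2
            if metric > best_metric:
                best_metric, best_timing, best_offset = metric, theta, f
    return best_metric, best_timing, best_offset, macs
```

The MAC counter grows as (timing hypotheses) × (frequency candidates) × L. This product explodes when a long received sequence is searched for a 189-sample reference.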
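However, the computational complexity can be significantly reduced by using an overlap-save (OLS) method [8]. This method is well established, especially for discrete convolutions, but it can also be applied to cross-correlation. When OLS is applied to the NPSS cross-correlation, the received sample stream is divided into overlapping sequences of length N, as illustrated in the right part of Figure 3. The number of overlapping samples depends on the NPSS length (189 samples) and is here chosen to be 188. Afterwards, the cross-correlation is computed block-wise: for each received block of size N, a single N-point FFT and N_f N-point IFFT operations need to be performed, where N_f is the number of frequency offset candidates.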
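The OLS procedure described above can be sketched in pure Python as follows. This is an illustrative software model, not the paper's hardware. The recursive radix-2 FFT, the helper names (`fft`, `ifft`, `ols_xcorr`) and the zero-padding of the final block are assumptions. Each block of N samples overlaps the previous one by L - 1 samples (188 for L = 189). Only the first N - L + 1 lags of each circular correlation are free of wrap-around, so only those lags are kept.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def ifft(X):
    """Inverse FFT via conjugation: ifft(X) = conj(fft(conj(X))) / N."""
    n = len(X)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in X])]


def ols_xcorr(rx, ref, n_fft):
    """Overlap-save cross-correlation c[t] = sum_k rx[t + k] * conj(ref[k])
    for t = 0 .. len(rx) - len(ref)."""
    L = len(ref)
    step = n_fft - (L - 1)  # new samples per block; overlap is L - 1
    if step <= 0 or n_fft & (n_fft - 1):
        raise ValueError("n_fft must be a power of two larger than len(ref) - 1")
    # Reference spectrum is computed once, zero-padded to the FFT size.
    ref_spec_conj = [v.conjugate() for v in fft(list(ref) + [0j] * (n_fft - L))]
    n_out = len(rx) - L + 1
    out, start = [], 0
    while len(out) < n_out:
        block = list(rx[start:start + n_fft])
        block += [0j] * (n_fft - len(block))  # zero-pad the final block
        # Point-wise multiplication in frequency domain replaces the time-domain sum.
        circ = ifft([a * b for a, b in zip(fft(block), ref_spec_conj)])
        out.extend(circ[:step])  # lags 0 .. N - L contain no wrap-around
        start += step
    return out[:n_out]
```

The result matches a direct time-domain correlation to floating-point precision. Each output block, however, costs one FFT, N point-wise products and one IFFT, rather than (N - L + 1) × L multiply-accumulates.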
The main benefit of this method is that a cross-correlation in time domain is replaced by a point-wise multiplication in frequency domain. Additionally, the generation of the NPSS reference signals with different frequency offsets is simplified in frequency domain: different frequency offsets relate to cyclic shifts, which can be easily implemented in hardware.
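The cyclic-shift property mentioned above can be checked numerically. Multiplying the reference by exp(j2πmt/N), which is a frequency offset of m DFT bins or m·fs/N Hz, cyclically shifts its spectrum by m bins. The reference spectrum therefore has to be computed only once. The helper names below are hypothetical, and the plain DFT is used only for the check.

```python
import cmath


def dft(x):
    """Plain O(N^2) DFT, sufficient for a small check."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def offset_reference_spectrum(R, m):
    """Spectrum of the reference after a frequency offset of m DFT bins
    (m * fs / N Hz): a cyclic shift of R by m, no new transform required."""
    n = len(R)
    return [R[(k - m) % n] for k in range(n)]
```

Assume, for illustration only, N = 1,024 and a 240 kHz sample rate. One bin then corresponds to 240,000 / 1,024 ≈ 234 Hz, so the offset grid is set by the FFT size.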
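IV. COMPLEXITY AND PERFORMANCE

The length of the NPSS in time domain is 189 samples, and the received sample sequence comprises 2,400 samples in total. With N_f = 31 frequency offset candidates on a grid of 234 Hz, which is sufficient for NPSS detection, a total of 2,400 · 31 = 74,400 cross-correlations in time domain are required. With the OLS method, the received stream is processed in blocks at a rate of 2.87/10 ms = 287.1/s. For each block, a single N-point FFT and N_f N-point IFFT operations are performed. Considering the different frequency offset candidates results in a total complexity of 270.5 MOPS. Thus, the computational complexity is roughly 10x higher than that of the low-complexity auto-correlation method [9].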
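The block-rate arithmetic can be reproduced with a small budget calculator. It assumes a 240 kHz sample rate (2,400 samples per 10 ms) and an FFT size of N = 1,024. Both values are assumptions of this sketch, chosen because they yield 2.87 blocks per 10 ms (287.1/s) and a 234 Hz grid. The function and key names are hypothetical.

```python
def ols_budget(fs_hz, n_fft, ref_len, n_freq):
    """Throughput figures of an OLS cross-correlation detector (sketch)."""
    overlap = ref_len - 1                 # samples shared by consecutive blocks
    step = n_fft - overlap                # new samples consumed per block
    blocks_per_s = fs_hz / step
    return {
        "overlap": overlap,
        "blocks_per_s": blocks_per_s,                 # = N-point FFTs per second
        "blocks_per_10ms": blocks_per_s / 100.0,
        "iffts_per_s": blocks_per_s * n_freq,         # one N-point IFFT per candidate
        "bin_spacing_hz": fs_hz / n_fft,              # offset step of one cyclic shift
        # Direct approach: one L-tap correlation per timing and frequency hypothesis.
        "time_domain_xcorrs_per_10ms": round(fs_hz / 100.0) * n_freq,
    }


budget = ols_budget(240e3, 1024, 189, 31)
```

With these inputs the calculator gives an overlap of 188 samples and about 287.1 blocks/s (2.87 per 10 ms). It also gives a 234.375 Hz grid and 74,400 time-domain cross-correlations per 10 ms. The operation counts inside each FFT depend on the chosen architecture and are not modelled here.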
Figure 4 shows the timing acquisition performance for the in-band deployment at an SNR of -12.6 dB. As can be seen from Figure 4, the OLS detector achieves a latency of 480 ms, whereas the auto-correlation detector of [9] takes 620 ms to achieve a 90% hit rate. The different curves relate to different threshold settings used for hit detection, based on the actual peak-to-average ratio of all cross-correlations. [BW: is this really true? The plot suggests that the different curves correspond to different SNRs.]
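The text states only that hit detection uses thresholds based on the peak-to-average ratio of all cross-correlations. The sketch below is a generic peak-to-average-ratio (PAR) test. The exact rule, the function name and the threshold values are assumptions, not the paper's settings.

```python
def detect_hit(metrics, par_threshold):
    """Peak-to-average-ratio (PAR) hit test over a set of correlation metrics.

    Returns (index_of_peak or None, measured PAR). Illustrative rule only.
    """
    avg = sum(metrics) / len(metrics)
    if avg <= 0:
        return None, 0.0          # no energy: never declare a hit
    peak_idx = max(range(len(metrics)), key=lambda i: metrics[i])
    par = metrics[peak_idx] / avg
    return (peak_idx if par >= par_threshold else None), par
```

Raising `par_threshold` lowers the false-alarm probability but requires more accumulated evidence before a hit is declared. A family of threshold settings therefore traces out different latency and hit-rate curves.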
V. HARDWARE IMPLEMENTATION

A block diagram of the cross-correlation NPSS detector is shown in Figure 5. For the FFT and the IFFT, a single-path-delay feedback architecture with radix-2 elements was chosen due to the available time resources [Todo, write about FFT]. Key characteristics are shown in Table I. 30 percent of the area is occupied by the FFT and IFFT units.

Fig. 5. Block diagram of NPSS hardware accelerator.
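The following software model of the detector dataflow is a sketch under stated assumptions, not a description of the actual accelerator. Each received block goes through one FFT, which all frequency candidates share. Each candidate then needs only a cyclic shift of the stored reference spectrum, a point-wise multiplication and one IFFT, with the best metric tracked across all hypotheses. Names, the metric and the search order are hypothetical.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT (len(x) a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out


def ifft(X):
    """Inverse FFT via conjugation."""
    return [v.conjugate() / len(X) for v in fft([v.conjugate() for v in X])]


def npss_search(rx, ref, n_fft, bin_shifts):
    """Joint timing / frequency-offset search in an OLS dataflow.
    Returns (best_metric, best_timing, best_bin_shift)."""
    L = len(ref)
    step = n_fft - (L - 1)                            # overlap of L - 1 samples
    ref_spec = fft(list(ref) + [0j] * (n_fft - L))    # stored once
    last = len(rx) - L                                # last valid timing hypothesis
    best = (-1.0, None, None)
    for start in range(0, last + 1, step):
        block = list(rx[start:start + n_fft])
        block += [0j] * (n_fft - len(block))
        X = fft(block)                                # shared by all candidates
        for m in bin_shifts:
            # Frequency-offset candidate m = cyclic shift of the reference spectrum.
            shifted = [ref_spec[(k - m) % n_fft] for k in range(n_fft)]
            corr = ifft([a * b.conjugate() for a, b in zip(X, shifted)])
            for t in range(min(step, last - start + 1)):  # alias-free lags only
                metric = abs(corr[t]) ** 2
                if metric > best[0]:
                    best = (metric, start + t, m)
    return best
```

In this model the per-block cost is one FFT plus one IFFT per frequency candidate. This is consistent with the FFT and IFFT units taking a significant share of the area.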
FFT[8].and
136
our example
n n f ehigher M a
of length
u to be o every o
of
a detector, is n a 5. V. TheH l
main n
computational elements
arethe
d N
f r e tT
be implemented
i i C OMPLEXITY e
convolutions
N
i r
power
e
inmaittime
n t g c AND ERFORMANCE
w ) ARDWARE r c
significantly
assume
e samples d e l r - s a n e s e o
sho and I
n i H s p i P o e 100 n
a vise an S IA block r with ti required th Fthroughput
80
anadctonsi af bthy
te eos is pthe l e uceadextended pp thodtheit c128-point a ap te FosIFFT s Ncaofcompared 0cr 60 s ru iffer bond, , ethe
the
fo which
ow ow esh lprefix e
).s top the corNB-IoT o
f = 32 for example
s cross-correlation-based CsOMPLEXITY
number
lin
g 1508 e IV. m P ERFORMANCE T V. seHcARD
correlation
shown
pHor meddcyclic 4e0the elements bthe n iblock uta iofaredevices.
can be significantly
ANDand
l in f r k=✓ low-complexity tauto-correlation method. On
is FFT MLV L IFFT -blocks
millisecond,in
ch chicross-corr.
e
single N-
n,we redbe s mSe tobut ovceiern⇤t rt boe c itsheth -w(2i,saverage dthe cross-correlation
o FtrF lsesifor each which impractical sfor
required.
inAsreceived proposed algorithm diagram
IAF block
millisec-
aNnthe
iohn aroispttiicsieed ans proa ion atime aithe . oftosizedis N 8be0 significantly p890.0/s
complexity
( =is 287.1/s r diagra ssi
depends
and
c c a m e
of
p e -POINT aproposed
IFFT single N- o and FFT and IFFT comp
low
can
parison,
method
h s n As 4 cross-correlation-based is anu5.ML
we assume
and is
tu domain w NPSS
ee on ap relat algroerd s hkond de-als Th OuLtionintiso effihtceps s. Tohi 60uloenc0ck 9dspoint sequence given in Eq. 1. t m m i 2.87/10ms
r d p
long,of
i o a o
Ns NPSS
e
ondefines
c T 40 t do omtoacomplex t cbe45.6 cmillion reradix-2
power
h iad TFFT, s be pFigur
Fig.by6.cross-
r p t e Nfcompared Adetector, ofHowever, isthe R computational complexity aoperations signifi
This IFFT
s n
FF
time-domain
h 1.5eto n
e a h point-wise n multiplications acomplexity
vector tations Aexpected or and significantly higher shown pers
in secon
c, cinuebadysampling
y t y i o i
ocnt and
l e i T
consumption.
c g h
the
n b m a i
· of
r et iomaximum-likelihood
i l n the low-complexity c auto-correlation method. On t Evenmafor the
the FFT and 80 IFFT blocks with a required
we
d IFFTlsoperations need W
ms as elati srate co tof ehnodrt =
eThe nca ny in -1 d od rqiugenfunction q 8 foryNT Fcross-corr. t IFFT t possib
N if one
(c1a| ✓ofin= en toof6be2size
auto-corr.
T iit isand
performed
n vode1,200x hse-1 e . 0, 40ftNiom,)average 0performed. RaDrespectively. erathe
Npower
C(r
for which
abe f auto-correlation th = s Fauto-corr.
whileshown
a io nc mrreltatit mapplsy- zivim
length and aeach ereceived tcompared reducedto N the when
low-complexity using an
e Noverlap-save more
method. demanding(OLS)
On method
FFT FFT [7] IFF
its
s - elget ltez
be
the implementation.
c f for s oN-
sowponcaptured eth theesesignal dies ps r nsce0etr[1] ignNfffpoint-wise st lablock woriN1.5
window
f cross- consumption.
iaepdpli uto-coin
rr order msp, q[r[0] ma.the en1]] N-POINT
-POINT NAO single
IFFT and
oN-pmillion
an
efficiently
in upoint h2.87/10ms /287.1/s FFradix-2 operati
890.0/s
r sequeacodistortion
receive
cro oto y a rofree s d received vector efr over . . r[137 frequency e s multiplications H m ofTsize
which
r b u h n h s e r
a d u f o n q s average
k for each
method . received
is well r a block
.
established s N-POINT
-POINT
especially O.0 aand
IFFTsingle 2 for I
2.87/10ms
discrete convo = 28
timing
sub-frame of the ML detector compared to theparison,
complexity
135.4
additions
Choosing
n c - w y s i t e 20 n as o d i o e c e FFT, o i n complex
t a V of
g a vector
5 tationsk 0 x -
45.6 g
implemented
a - s B i n f l o w t a c 9 i i
the
the
and coarse-timing offset estimation with the overlap-save method.
d sy function f y
on
d
Table
ncofy 240 renlength
hardwareof accelerator
h s , o re2. This dbei performed.
cross-corr.
depends
o o
mcus tr[a6te] d in epar sampling a 100 point FFT, point-wise
rebe applied
complex
blorespectively.
multiplications of a vector tations or and
power consumption
t a plotted dNf IFFT operations need 8 ad foranthe
As we will
N 1.5
n
aso s ufszeoreais
-1
hFigure iev-1 evaluated in for
weitsassume
atm fetz50 inand of te150 it to and to
60 rEven
f
efes[k] igu operations
has 200 250
ive the ffic 19.200 th ransamples. ac he r ML t e folluaccording sr nhg so 8htoe8.reAc hereNquP reque s-cor tion 20 r50 qu emken= 0 . . . 136of islengthbut N , and can
ock nf F150
N IFFT also T auto-corr. need to cross-correlations.
onbe performed. em respectively. ByEven ap
RX-power
ece of r fashion e F i
MOPS, respectively
ny l
units.
s l d
thatandin
e e t f v e t o
c epaic the f f s a S f r l b i I F / s i l Fig. 5. Block diagram
as many
h . s the i re
mate
dw Bthe u that t timal of e mof40length
implementation.
an
hi PS is pa ML vreoˆ n at a S ⇤m[k] ho overlapping a
I. 30
n fsteo NPSS e sequence given inifEq. f (1). y 100 into sequences N as illustrated
detection
T i f f o u t h i l RX-power s ofTRF-transceiver 4 5 [mW] h
the
. p o nf d s = t
and for
] G h o (f ,eot̂ ) =darg emax {C(r 50 f . : D ea 150 200 250
ary 9 fact bIn
n [ timing.
o addition, .
[6] s or an
t the h s| efo , too)} L is FF ms
FFT
ect cooys o e
on its implementation.
u -
multiplications
In nM
of RF-transceiver
b i ion d be
see percent
One
he s s ctorsbecause 0]. hus, nal v qucenh th th an fo ,tood i nt-w Fig. t 6. ifieEnergy per NPSS C E detection er over the RX-power 1.5 Even
VLSI
em [1the RX-power aof RF-transceiver 10[mW]
it t nce ioffset
quency Key characteristics are sh l s h
low-complexity
t a n
sub-frames
T g e i c i r p N i g / r
Cross-Correlation
s i h e im a i 7
.8 ns o detection
size
e sy tion ity. ed s nt f r t o A
thatofthe
w ich c
in the following,
det arafter me a p e tgoen hypothesis, y h . Onper 2NPSS .
thm Energy
efficiently
a tuned s RFig.
M 6. ely over 20RX-power
s not
orm ML e rad dete mple receiv iffeHereby
c power- x re differentwh thistimebyoffset ts hich which corre- ori cantl hodCross-Correlation tio cof ivfrequency-offset candidate
per NPSSofdetection
FO
over RX-power
resultas ready
hit e - a t
area is occupied by the FFT t h g w R l g N t 50 1
To make
on e
rcomplexity ik gna because l o hethe h spond d f d , n Fig. 6. P E Energy a per fi NPSS t edetection povere RX-power
and
a l c to sub-frame boundaries,
e l y have i to s be evaluated by cross- i e l s
N
n dom insh the correlation D
a s si g n n
si vec . t
uency ti offset o r ojec within
options tes etecprocessing correlatingscheme the received reofp the itio samples A N window
n-b o be elatio O aand coarse-timing
of theaauto-correlation
can
ica ion f rathe Fig. pr 3.dida Signal cyNPSS c
cli detector which ed
onNaturally, the power To consumption fairofcom-
the auto-correlati
=
Y
dd ML I Tincludes iofractional-frequency a offset estimation. The right sub-figure
of the
[mW]
low-complexity
d
thus
n t t r of form
To
t y
the low-complexity
h S A e a r
l a y c
i shows n
method depends
a correlation S with the known its implementation. NPSS. make u aauto-correlation
In addition c correlations X
E are rperformed e l d o N s
1,
c
be
rS re hardware h c
accelerator P computation with the overlap-save
q o method. t e c n r
frequency-offset
n L r -
t e w l e N
Naturally, the power consumption r e of the t P o e c o i o e
RX-power
, t p
024
a 4) ossi b r f te rehypothesis c xp t Fig. 6. En
l
method depends on its implementation. To make a fair com. O Mfo which u ze N
make
w- 24diagram fo for every Nf frequencylaoffset ss- s defines athe ica be
estimated
Key
the ML
olock p tion of NPSS hardware accelerator a . C shallcrosupport. i e exity of si ltipl d to
has
parison, method we assume thaton theitslow-complexity re dwthe
auto-correlation
overall computational
auto-correlation
IMING ge A CQUISITION
fair com-
depends implementation. To make a fair com- V
the
p range of frequency offsets detector
is occupied
P a N f o r I d t y l u
( the ofleNPSS hardware accelerator
agram ha ose plexin i
mp block lex m s ne
e
b parison, we assume that and
the low-complexity auto-correlati
low power
can
method
ro,p which
characteristics
of viatiming -cotime-domain
over
a
Hence, the offset numberhypotheses of cross-correlations
a
number
perform acquisition
low-complexity
n Table I. 30 percent of the Hereby can be implemented
different
parison, we time efficiently
assume that in
the VLSI
low-complexity thus has a very
auto-correlation p
t m wcorre- pHowever, o n the computational complexity can be significantly
To make Fig.
a o d i
detector
Naturally, thea ve
po c e m t
to
fair
e
very
o
candidate
and
s
Naturally,
o
i While auto- requires N · N operations. Intheverye sub-frame l eivwe receive co era
stics are spond
-correlation.
shown in NTable I.p samples
30f percent s of
or, the
th thebycan c isbe implemented
when using efficiently
an overlap-savein(OLS)
VLSI and[9].thus
This has
20
to sub-frame boundaries, have to Abe evaluated cross- e reduced
op method
RX-power
135.0
reNPSS
be implemented
effortdepends
FFT units. One result ready low power consumption.
oof the
detector. consumption.
50
w inFTtime
re showncorrelating
the in Tablethe
transmitted, periodic I. received
30 percent
= 2,400 of the
and the e can
t
length d be
t
ea implemented
h t - efficiently in VLSI andforthus has a very
com-
c
c
method depends on
of
thus hasthea very
oNin NPSS
auto-correlation
s
samples in det correlation
the aretotal window IF method is wellaccelerator
established especially discrete convolutions
by the FFT and bothIFFT units. One
is NpFig. 5. result
Block ready
pdiagram for N pf ·plow
of sN· fNpower
f cross- consumption.
hardware
does
the NPSS detection domain samples. In
real
and
= 189
per
we assume that the low-complexity auto-correlation
m
by the FFT
FFT and sequence
IFFT units.
with the known One result
NPSS. ready o low
By cchoosing a g e powerN
, f = a32 n d consumption.
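The following is a brute-force sketch of the search in the ML estimator. It shows why direct evaluation is expensive: every (f, t) pair costs N_p complex multiply-accumulates. The sketch assumes the common frequency-compensated form C(r, f, t) = Σ_k r[t+k] s*[k] e^{−j2πfk/f_s}, which the extracted text does not spell out. The function name `ml_detect`, the toy sizes and the QPSK stand-in reference are hypothetical.

```python
import cmath
import random


def ml_detect(r, s, freqs, fs):
    """Exhaustive ML search: return (f_hat, t_hat) maximising |C(r, f, t)|.

    Assumed correlation metric (illustrative, not taken from the paper):
        C(r, f, t) = sum_k r[t + k] * conj(s[k]) * exp(-j 2 pi f k / fs)
    """
    n_p = len(s)
    best_mag, f_hat, t_hat = -1.0, None, None
    for f in freqs:
        # Conjugated reference that removes the hypothesised frequency offset.
        ref = [s[k].conjugate() * cmath.exp(-2j * cmath.pi * f * k / fs)
               for k in range(n_p)]
        # Every time offset with full overlap: len(r) - n_p + 1 correlations
        # of length n_p per frequency hypothesis.
        for t in range(len(r) - n_p + 1):
            mag = abs(sum(r[t + k] * ref[k] for k in range(n_p)))
            if mag > best_mag:
                best_mag, f_hat, t_hat = mag, f, t
    return f_hat, t_hat


# Toy example with hypothetical sizes, far smaller than N_s = 2,400 and N_p = 189.
FS = 240e3
N_P = 16
rng = random.Random(1)
# Unit-magnitude QPSK symbols as a stand-in for the NPSS reference.
s = [cmath.exp(1j * cmath.pi / 4 * (2 * rng.randrange(4) + 1)) for _ in range(N_P)]
freqs = [m * FS / N_P for m in range(-2, 3)]   # candidates spaced fs/N_P apart
t0, f0 = 20, freqs[3]
r = [0j] * 64
for k in range(N_P):
    n = t0 + k
    # Delayed copy of s with frequency offset f0 applied.
    r[n] = s[k] * cmath.exp(2j * cmath.pi * f0 * n / FS)

f_hat, t_hat = ml_detect(r, s, freqs, FS)
```

With N_s time offsets and N_f frequency candidates, the two nested loops perform N_s · N_f correlations of length N_p. This is the direct cost that the block-wise frequency-domain approach avoids.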
In every 10 ms frame we receive N_s = 2,400 samples, and the length of the NPSS in time domain is N_p = 189 samples. In total, N_s · N_f cross-correlations of length N_p are required, as shown in the left part of Fig. 3. Considering N_f = 31 different frequency candidates, a total of 74,400 cross-correlations need to be performed every 10 ms, which is impractical for NB-IoT devices.

However, the computational complexity can be significantly reduced when using an overlap-save (OLS) method [9]. This method is well established, especially for discrete convolutions, but it can also be applied to cross-correlations. By applying OLS to the NPSS cross-correlation, the input stream is divided into overlapping sequences of length N, as illustrated in the right part of Fig. 3, where N_O denotes the number of overlapping samples. The method is efficient in terms of computational complexity if one of the sequences to be cross-correlated is very long while the other is short. This is the case in our example, where the received sample sequence (2,400 samples) is much longer than the NPSS sequence (189 samples). The main benefit of this method is that a cross-correlation in time domain is replaced by a point-wise multiplication in frequency domain. Additionally, the generation of the NPSS reference signals in frequency domain gets simplified: different frequency offsets relate to cyclic shifts, which can be easily implemented in hardware.

The FFT and IFFT size N trades off computational complexity, memory requirements, processing delay and frequency-offset estimation accuracy. Larger FFT sizes with smaller grid spacings improve the frequency-offset estimation accuracy but have a longer delay and require more memory. An FFT size of N = 1,024 results in a grid spacing of 234 Hz, which is sufficient for NPSS detection and was therefore chosen in this work. In addition, the width of the correlation peak in Fig. 2 allows taking only every fourth grid point while still covering 93% of the peak amplitude. N_f trades off the minimal observed height of a correlation peak against computational complexity and memory size. It is a design parameter, which can be chosen to match the accuracy of the underlying crystal oscillator. Choosing N_f = 31 leads to a frequency-offset range of 31 · 4 · 234 Hz, which allows compensating ±14.5 kHz.

IV. COMPLEXITY AND PERFORMANCE

As the proposed cross-correlation-based algorithm is an ML detector, its complexity is expected to be significantly higher than that of the low-complexity auto-correlation method. On average, for each received block of size N − N_O, a single N-point FFT, N_f point-wise complex multiplications of a vector of length N, and N_f N-point IFFT operations need to be performed. Choosing an FFT size of N = 1,024, the number of real additions and multiplications can be estimated at 135.0 and 135.4 MOPS, respectively, leading to an overall computational complexity of 270.5 MOPS. Thus, the computational effort per sub-frame of the ML detector is roughly 10x higher than that of the auto-correlation timing acquisition [6].
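The per-block work list above can be turned into rates with a few lines of arithmetic. This is a sketch under stated assumptions. The sampling rate of 240 kHz is implied by 2,400 samples per 10 ms. The overlap N_O = N_p − 1 is an assumption (the usual choice for a length-N_p correlation), not a value stated in this section. The sketch counts transforms and point-wise products only. It does not attempt to reproduce the 135.0/135.4 MOPS figures, which depend on FFT operation counts not given here.

```python
# Throughput of the block-wise (overlap-save) detector, under stated assumptions.
FS = 240_000        # Hz: implied by N_s = 2,400 samples per 10 ms frame
N_S = 2_400         # samples per 10 ms frame
N_FFT = 1_024       # FFT/IFFT size N
N_P = 189           # NPSS length in samples
N_F = 31            # frequency-offset candidates
N_O = N_P - 1       # assumption: overlap of N_p - 1 samples per block

new_samples = N_FFT - N_O                 # block advance N - N_O
blocks_per_s = FS / new_samples           # forward FFTs per second
iffts_per_s = N_F * blocks_per_s          # one IFFT per candidate and block
cmul_per_s = N_F * N_FFT * blocks_per_s   # point-wise complex multiplications

# Direct time-domain evaluation, for comparison: N_s * N_f correlations of
# N_p complex multiply-accumulates every 10 ms (100 frames per second).
direct_cmac_per_s = N_S * N_F * N_P * 100

print(f"{new_samples} new samples/block, {blocks_per_s:.2f} blocks/s")
print(f"{iffts_per_s:.0f} IFFTs/s, {cmul_per_s / 1e6:.2f} M point-wise cmul/s")
print(f"direct: {direct_cmac_per_s / 1e9:.2f} G complex MAC/s")
```

Under these assumptions, each block advances by 836 samples. That gives about 287 forward FFTs and roughly 8,900 IFFTs per second, compared with about 1.4 billion complex multiply-accumulates per second for the direct time-domain evaluation.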
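Key characteristics are shown in Table I. 30 percent of the area is occupied by the FFT and IFFT units.

Fig. 5. Block diagram of NPSS hardware accelerator.

TABLE I
NPSS DETECTOR CHARACTERISTICS IN TWO CMOS TECHNOLOGIES.
est. P_ML | 38 mW | 2.5 mW

Naturally, the power consumption of the auto-correlation method depends on its implementation. To make a fair comparison, we assume that the low-complexity auto-correlation method can be implemented efficiently in VLSI and thus has a very low power consumption.

Fig. 6. Energy per NPSS detection over the RX-power of the RF-transceiver (x-axis: RX-power of RF-transceiver [mW]).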
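Fig. 6 plots energy per NPSS detection against the RX-power of the RF-transceiver. One simple way to read such a plot is sketched below. It assumes the energy model E = (RX power + detector power) × time until detection, which is a modelling assumption rather than the paper's stated method. Under this model, a faster but more power-hungry detector wins once the RF front-end dominates the power budget. Only the 2.5 mW estimate comes from Table I. The latencies, the 0.1 mW auto-correlation power and the function names are hypothetical.

```python
def energy_mj(p_rx_mw, p_det_mw, latency_s):
    """Energy per detection in mJ: (receiver + detector power) x time to detect."""
    return (p_rx_mw + p_det_mw) * latency_s


def crossover_rx_mw(p_fast_mw, t_fast_s, p_slow_mw, t_slow_s):
    """RX power above which the faster but hungrier detector uses less energy.

    Solves (P + p_fast) * t_fast = (P + p_slow) * t_slow for P, with t_fast < t_slow.
    """
    return (p_fast_mw * t_fast_s - p_slow_mw * t_slow_s) / (t_slow_s - t_fast_s)


# Hypothetical inputs: only the 2.5 mW detector power appears in Table I;
# both latencies and the 0.1 mW auto-correlation power are illustrative.
P_ML, T_ML = 2.5, 0.5      # mW, s
P_AC, T_AC = 0.1, 1.0      # mW, s

p_cross = crossover_rx_mw(P_ML, T_ML, P_AC, T_AC)   # mW
e_ml = energy_mj(50.0, P_ML, T_ML)                   # mJ at 50 mW RX power
e_ac = energy_mj(50.0, P_AC, T_AC)
```

With these made-up numbers, the crossover lies at about 2.3 mW of RX power. For typical RF-transceiver powers above that, the shorter detection time outweighs the higher detector power.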
Using an ML detector for NPSS detection involves higher complexity than the low-complexity auto-correlation method. The ML detector performs better because it exploits the known NPSS sequence. This gives higher detection accuracy and lower latency at the cost of more computational operations (cross-correlations). The auto-correlation method, in contrast, is simpler and consumes less power, but its performance is suboptimal because it does not exploit knowledge of the transmitted sequence. The choice between the two methods is therefore a trade-off between detection performance (ML detector) and simplicity and energy efficiency (auto-correlation).
In VLSI implementations, auto-correlation is more hardware efficient and consumes less power than cross-correlation. The auto-correlation method can be implemented with low complexity and therefore very low power consumption, which is advantageous for energy-constrained devices such as NB-IoT terminals. Cross-correlation methods, although more power-intensive, offer better detection performance because they exploit the known transmitted sequence.
Implementing cross-correlation in VLSI improves hardware efficiency through specialized architectures, such as single-path delay feedback structures with radix-2 elements. This reduces power consumption and area, which is critical for compact, energy-constrained devices like NB-IoT terminals. By handling the high-throughput FFT and IFFT operations efficiently, a VLSI implementation can meet the computational demands while keeping power consumption low.
The Fast Fourier Transform (FFT) plays a central role in the cross-correlation NPSS detector because it allows the correlations to be computed efficiently in the frequency domain. For each received block, a single N-point FFT is performed, followed by N_f point-wise complex multiplications and N_f IFFT operations. This greatly reduces the computational complexity compared with direct time-domain correlation, while still covering all frequency-offset candidates and timing offsets.
The number of frequency-offset candidates (N_f) directly determines the computational effort of cross-correlation-based NPSS detection. More candidates require more cross-correlations and thus substantially higher complexity. With N_f = 31, a direct implementation needs 74,400 cross-correlations every 10 ms, which is impractical for NB-IoT devices. N_f therefore has to balance the covered frequency-offset range and detection accuracy against computational feasibility.
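The relationship between N_f, the covered frequency range and the number of direct cross-correlations is plain arithmetic. The sketch below makes it explicit. It assumes a sampling rate of 240 kHz (implied by 2,400 samples per 10 ms), which makes the exact FFT bin spacing 240,000 / 1,024 = 234.375 Hz; the text rounds this to 234 Hz. The helper names are hypothetical.

```python
FS = 240_000     # Hz, implied by N_s = 2,400 samples per 10 ms frame
N_FFT = 1_024    # FFT size N
N_S = 2_400      # samples per 10 ms frame
STEP = 4         # only every fourth FFT grid point is used as a candidate

grid_hz = FS / N_FFT     # spacing between FFT bins


def offset_range_hz(n_f):
    """Total span covered by n_f candidates spaced STEP bins apart."""
    return n_f * STEP * grid_hz


def cross_correlations_per_frame(n_f):
    """Direct evaluation: one correlation per received sample and candidate."""
    return N_S * n_f


for n_f in (31, 32):
    span = offset_range_hz(n_f)
    print(f"N_f={n_f}: +/-{span / 2 / 1e3:.2f} kHz, "
          f"{cross_correlations_per_frame(n_f):,} correlations per 10 ms")
```

N_f = 31 covers about ±14.5 kHz at 74,400 correlations per 10 ms frame. Each additional candidate adds one grid step of 4 bins and another 2,400 correlations per frame.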
In NPSS detection, block-wise cross-correlation with different frequency-offset candidates divides the input stream into overlapping sequences. Each block is cross-correlated with the NPSS reference through point-wise multiplication in the frequency domain, using FFT and IFFT transformations. Different frequency offsets correspond to cyclic shifts of the reference spectrum, which simplifies the hardware implementation. In this way, the time-domain cross-correlation is replaced by less demanding frequency-domain operations.
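The statement that frequency offsets become cyclic shifts follows from the DFT shift property. Multiplying a length-N sequence by e^{j2πmn/N} cyclically shifts its DFT by m bins. The small numerical check below uses a naive DFT with N = 8 and m = 3; the test signal and helper names are hypothetical. The property is exact only for integer-bin offsets, which holds when the candidates lie on the FFT grid (for example, every fourth grid point).

```python
import cmath


def dft(x):
    """Naive O(N^2) DFT: X[b] = sum_k x[k] * exp(-j 2 pi b k / N)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * b * k / n) for k in range(n))
            for b in range(n)]


def freq_shift_time(x, m):
    """Apply a frequency offset of m DFT bins in the time domain."""
    n = len(x)
    return [x[k] * cmath.exp(2j * cmath.pi * m * k / n) for k in range(n)]


def cyclic_shift(spectrum, m):
    """Cyclically shift a spectrum by m bins: Y[b] = X[(b - m) mod N]."""
    n = len(spectrum)
    return [spectrum[(b - m) % n] for b in range(n)]


x = [complex(k % 3, -(k % 2)) for k in range(8)]
via_time_domain = dft(freq_shift_time(x, 3))     # offset applied before the DFT
via_cyclic_shift = cyclic_shift(dft(x), 3)       # offset applied as a bin shift
max_err = max(abs(a - b) for a, b in zip(via_time_domain, via_cyclic_shift))
```

In hardware terms, this means a stored reference spectrum can serve every frequency candidate through address offsets, instead of requiring a separately computed reference per candidate.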
Cross-correlation is favored in applications like radar systems or GPS receivers because it acts as a maximum-likelihood (ML) detector that exploits the known transmitted sequence, leading to optimal detection performance. Auto-correlation is more hardware efficient, but it does not use knowledge of the transmitted sequence and therefore results in suboptimal detection. Where detection accuracy is critical, the better performance of cross-correlation outweighs its increased complexity.
Implementing an ML detector for NPSS detection involves substantial computational requirements. For each received block of N − N_O new samples, the detector needs a single N-point FFT, N_f point-wise complex multiplications of length-N vectors, and N_f N-point IFFT operations. With N = 1,024, this amounts to an estimated 135.0 and 135.4 million operations per second (MOPS) of real additions and real multiplications, respectively, or 270.5 MOPS in total. The computational effort per sub-frame is therefore roughly 10 times higher than that of the low-complexity auto-correlation method.
The overlap-save (OLS) method reduces the complexity of performing the cross-correlations. In NPSS detection, OLS divides the input stream into overlapping sequences, so that a time-domain cross-correlation is replaced by a point-wise multiplication in the frequency domain. The method is efficient when one sequence is much longer than the other, as with the 2,400 received samples per frame and the 189-sample NPSS. In addition, it simplifies the generation of the reference signals in the frequency domain, because different frequency offsets become cyclic shifts that are easy to implement in hardware.
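A minimal software sketch of OLS cross-correlation follows, checked against direct time-domain correlation. It assumes an overlap of N_O = N_p − 1 samples and uses a recursive radix-2 FFT for readability. It is not a model of the paper's hardware FFT/IFFT units, and the function names and test sizes are hypothetical. Conjugating the reference spectrum turns circular convolution into circular correlation. For each block, only the first N − N_p + 1 outputs are free of wrap-around, so exactly those are kept.

```python
import cmath
import random


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)]
            + [even[k] - tw[k] for k in range(n // 2)])


def ifft(spec):
    """Inverse FFT via the identity IDFT(X) = conj(DFT(conj(X))) / N."""
    n = len(spec)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in spec])]


def xcorr_direct(r, s):
    """c[t] = sum_k r[t + k] * conj(s[k]) for every lag with full overlap."""
    n_p = len(s)
    return [sum(r[t + k] * s[k].conjugate() for k in range(n_p))
            for t in range(len(r) - n_p + 1)]


def xcorr_ols(r, s, n_fft):
    """Overlap-save cross-correlation of a long stream r with a short reference s."""
    n_p = len(s)
    if n_fft < n_p:
        raise ValueError("FFT size must be at least the reference length")
    n_o = n_p - 1                 # overlapping samples between consecutive blocks
    step = n_fft - n_o            # new samples (and valid outputs) per block
    # Conjugated reference spectrum: circular convolution becomes correlation.
    ref_spec = [v.conjugate() for v in fft(list(s) + [0j] * (n_fft - n_p))]
    n_out = len(r) - n_p + 1
    out, start = [], 0
    while len(out) < n_out:
        block = list(r[start:start + n_fft])
        block += [0j] * (n_fft - len(block))   # zero-pad the final block
        y = ifft([a * b for a, b in zip(fft(block), ref_spec)])
        out.extend(y[:step])      # lags 0..step-1 do not wrap around the block
        start += step
    return out[:n_out]


rng = random.Random(7)
r = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
s = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10)]
ols = xcorr_ols(r, s, 16)
err = max(abs(a - b) for a, b in zip(ols, xcorr_direct(r, s)))
```

To cover N_f frequency candidates, the forward FFT of each block would be computed once and multiplied with N_f cyclically shifted reference spectra, each followed by its own IFFT. This matches the per-block work list of one FFT, N_f point-wise products and N_f IFFTs given in Section IV.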
Design choices in NB-IoT between auto-correlation and cross-correlation largely depend on the balance between power consumption and detection performance. Auto-correlation, with its low complexity and power efficiency, suits scenarios where energy saving is critical. For applications that demand high detection accuracy even at low SNR, cross-correlation is favored despite its higher power consumption and complexity. The trade-off between these methods thus determines whether efficiency or performance takes priority in the design.