Finite Word Length Effects
(Number representation in register)
Sugumar D
Finite Word Length Effects
Digital signal processors have a finite data-bus width.
When the word length of a result of a mathematical operation
exceeds the bus width, the excess bits must be discarded.
This discarding is a source of serious errors.
We now discuss the attributes that cause such errors.
First we discuss
Number representation and
Quantization error.
1. Number representation in registers
The Binary Number System
In conventional digital computers, integers are
represented as binary numbers of fixed length n:
an ordered sequence x_{n-1} x_{n-2} ... x_1 x_0
of binary digits.
Each digit x_i (bit) is 0 or 1.
The above sequence represents the integer value
X = x_{n-1}·2^{n-1} + ... + x_1·2 + x_0.
Upper-case letters represent numerical values or
sequences of digits.
Lower-case letters, usually indexed, represent
individual digits.
Radix of a Number System
The weight of the digit x_i is the i-th power of 2,
so its contribution is x_i · 2^i.
2 is the radix of the binary number system.
Binary numbers are radix-2 numbers; the allowed digits are 0, 1.
Decimal numbers are radix-10 numbers; the allowed digits are 0, 1, 2, ..., 9.
The radix is indicated by a decimal subscript.
Example:
(101)_10 - decimal value 101
(101)_2 - decimal value 5
Range of Representations
Operands and results are stored in registers of
fixed length n, so only a finite number of distinct
values can be represented within an arithmetic unit.
X_min, X_max - smallest and largest
representable values.
[X_min, X_max] - range of the representable
numbers.
A result larger than X_max or smaller than X_min
is incorrectly represented.
The arithmetic unit should indicate that the
generated result is in error - an overflow indication.
Signed-magnitude Representation
Uses the high-order bit to indicate the sign
0 for positive
1 for negative
remaining low-order bits indicate the magnitude of the
value
Signed-magnitude representation of +41 and -41:
+41 = 0 0101001   (sign 0; magnitude 32 + 8 + 1 = 41)
-41 = 1 0101001   (sign 1; magnitude 32 + 8 + 1 = 41)
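As a quick illustration, the +41/-41 encoding above can be reproduced in a few lines of Python (the function name and the 8-bit width are illustrative choices, not part of the original):

```python
def to_sign_magnitude(x, n=8):
    """Encode integer x as an n-bit sign-magnitude string:
    one sign bit followed by n-1 magnitude bits."""
    assert -(2**(n - 1) - 1) <= x <= 2**(n - 1) - 1, "magnitude out of range"
    sign = '1' if x < 0 else '0'
    return sign + format(abs(x), f'0{n - 1}b')

print(to_sign_magnitude(41))   # 00101001
print(to_sign_magnitude(-41))  # 10101001
```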
Disadvantage of the Signed-Magnitude
Representation
An operation may depend on the signs of the operands.
Example - adding a positive number X and a negative
number -Y:
X + (-Y)
If Y > X, the final result is -(Y - X).
The calculation must:
switch the order of the operands,
perform subtraction rather than addition,
attach the minus sign.
A sequence of decisions must be made, costing
excess control logic and execution time.
This is avoided in the complement representation
methods
Complement Representations of
Negative Numbers
Two alternatives -
Radix complement (called two's complement in the
binary system)
Diminished-radix complement (called one's complement
in the binary system)
In both complement methods - positive numbers
represented as in the signed-magnitude method
Advantage of Complement Representation
No decisions made before executing addition or
subtraction
No need to interchange the order of the two
operands
Ones Complement
Ones complement replaced signed magnitude
because the signed-magnitude arithmetic circuitry
was too complicated.
Negative numbers are represented in ones-complement
form by complementing each bit - even the sign bit
is reversed.
+41 = 0 0 1 0 1 0 0 1
-41 = 1 1 0 1 0 1 1 0   (each 1 is replaced with a 0, each 0 is replaced with a 1)
Twos Complement
The twos complement form of a negative integer
is created by adding one to the ones complement
representation.
+41             = 0 0 1 0 1 0 0 1
ones complement = 1 1 0 1 0 1 1 0
-41 = 1 1 0 1 0 1 1 0 + 1 = 1 1 0 1 0 1 1 1
Twos-complement representation has a single
representation of zero (unlike ones complement, there is no separate -0).
The sign is represented by the most significant
bit.
The notation for positive integers is identical to
their signed-magnitude representations.
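A sketch of both complement encodings in Python (helper names and the register width n are illustrative):

```python
def ones_complement(x, n=8):
    """One's-complement bit string: for negative x, flip every bit of |x|."""
    v = x if x >= 0 else ~(-x) & (2**n - 1)
    return format(v, f'0{n}b')

def twos_complement(x, n=8):
    """Two's-complement bit string: one's complement plus one,
    which Python's masking does in a single step."""
    return format(x & (2**n - 1), f'0{n}b')

print(ones_complement(-41))  # 11010110
print(twos_complement(-41))  # 11010111
```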
Representation of Mixed Numbers
A sequence of n digits in a register - not
necessarily representing an integer
Can represent a mixed number with a fractional
part and an integral part
The n digits are partitioned into two - k in the
integral part and m in the fractional part (k+m=n)
The value of an n-tuple with a radix point between
the k most significant digits and the m least
significant digits is
X = Σ_{i=-m}^{k-1} x_i · 2^i
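The mixed-number sum Σ x_i · 2^i can be evaluated directly; this small helper (an illustrative sketch, not from the original) takes the digit string plus the k/m split:

```python
def fixed_point_value(bits, k, m):
    """Value of a k+m bit string with the radix point after the k-th bit,
    i.e. the sum of x_i * 2**i for i from k-1 down to -m."""
    assert len(bits) == k + m
    return sum(int(b) * 2.0**(k - 1 - i) for i, b in enumerate(bits))

print(fixed_point_value('10111', 3, 2))  # 5.75, i.e. 101.11 in binary
```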
Fractional Binary Numbers
Representation: b_k b_{k-1} ... b_2 b_1 b_0 . b_{-1} b_{-2} b_{-3} ... b_{-j}
Bit weights: ..., 2^i, ..., 4, 2, 1 for the integer bits; 1/2, 1/4, 1/8, ... for the fraction bits.
Bits to the right of the binary point represent fractional (negative) powers of 2.
The string represents the rational number Σ_{i=-j}^{k} b_i · 2^i.
Fractional Binary Number Examples
Value    Representation
5 3/4    101.11_2
2 7/8    10.111_2
63/64    0.111111_2
Observations:
Divide by 2 by shifting right.
Numbers of the form 0.111111..._2 are just below 1.0; use the notation 1.0 - ε.
Limitation:
Can only exactly represent numbers of the form x/2^k.
Other numbers have repeating bit representations:
Value    Representation
1/3      0.0101010101[01]..._2
1/5      0.001100110011[0011]..._2
1/10     0.0001100110011[0011]..._2
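The x/2^k limitation can be checked mechanically: a rational is exactly representable in binary if and only if its denominator (in lowest terms) is a power of two. A small sketch using Python's Fraction:

```python
from fractions import Fraction

def exactly_representable(q):
    """True iff Fraction q equals m / 2**k, i.e. its reduced
    denominator is a power of two."""
    d = q.denominator
    return d & (d - 1) == 0

print(exactly_representable(Fraction(3, 4)))   # True  (0.11 in binary)
print(exactly_representable(Fraction(1, 10)))  # False (repeating)
print(0.1 + 0.2 == 0.3)                        # False: both sides are rounded
```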
Fixed Point Representations
Radix point not stored in register - understood to
be in a fixed position between the k most
significant digits and the m least significant digits
These are called fixed-point representations
One bit is used for the sign and the remaining bits
for the magnitude.
Clearly there is a restriction to the numbers which
can be represented.
With 7 bits reserved for the magnitude, the
largest and smallest numbers represented are +127
and -127:
+127 = 0 1111111   (sign bit 0, +ve number)
-127 = 1 1111111   (sign bit 1, -ve number)
Fixed Point Representations
Things to note:
1. Fixed-point numbers are represented exactly.
2. Arithmetic between fixed-point numbers is also
exact, provided the answer is within range.
3. Division is also exact if interpreted as
producing an integer and discarding any
remainder.
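Points 2 and 3 can be seen with a toy Q4.3 fixed-point format (the scale factor and names are illustrative): sums stay exact, while products must be rescaled and lose the discarded bits:

```python
SCALE = 2**3                      # Q4.3: 3 fraction bits

def q(x):
    """Quantize a real value to the nearest Q4.3 level (stored as an int)."""
    return round(x * SCALE)

a, b = q(1.625), q(2.25)          # raw integers 13 and 18
print((a + b) / SCALE)            # 3.875 -- addition is exact
print((a * b) // SCALE / SCALE)   # 3.625 -- true product 3.65625, remainder discarded
```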
Floating Point
A floating-point representation consists of:
A sign bit s
An exponent e
A mantissa/fraction M or F
In floating-point representation, numbers are represented by a sign
bit s, an integer exponent e, and a positive integer mantissa M:
(-1)^s × M × B^e, or (-1)^s × F × B^e
Example layout: S (1 bit) | exponent e (3 bits) | fraction M or F (4 bits)
e - the exponent.
B - the base, usually 2 or 16.
E - the bias: a fixed integer, machine dependent.
If the mantissa is assumed to be of the form 1.xxxxx (thus, one bit of the
mantissa is implied as 1),
this is called a normalized representation.
8-bit floating point format
sign (1 bit) | exponent (3 bits) | significand (4 bits) | number (base 2) | number (base 10)
0 | 001 | 1001 | 1.001 × 2^1  | 2.25
0 | 011 | 1100 | 1.1 × 2^3    | 12.0
0 | 111 | 1110 | 1.11 × 2^7   | 224.0
0 | 001 | 1110 | 1.11 × 2^-1  | 0.875
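The table rows can be decoded with a short helper, assuming the 3-bit exponent is read as an unsigned integer and the 4-bit significand field includes the leading integer bit (1.xxx); the slide does not fully spell out its bias convention, so this is only a sketch:

```python
def decode8(sign, exp_bits, sig_bits):
    """Decode the toy 8-bit format: (-1)**sign * 1.xxx * 2**e, where the
    4-bit significand holds the integer bit plus 3 fraction bits."""
    e = int(exp_bits, 2)
    m = int(sig_bits, 2) / 2**3
    return (-1)**sign * m * 2**e

print(decode8(0, '001', '1001'))  # 2.25
print(decode8(0, '011', '1100'))  # 12.0
print(decode8(0, '111', '1110'))  # 224.0
```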
Distribution of Floating Point Numbers
(3-bit mantissa 1.xx; 2-bit exponent field, e ∈ {-1, 0, 1})
e = -1: 1.00 × 2^-1 = 1/2, 1.01 × 2^-1 = 5/8, 1.10 × 2^-1 = 3/4, 1.11 × 2^-1 = 7/8
e = 0:  1.00 × 2^0 = 1,    1.01 × 2^0 = 5/4,  1.10 × 2^0 = 3/2,  1.11 × 2^0 = 7/4
e = 1:  1.00 × 2^1 = 2,    1.01 × 2^1 = 5/2,  1.10 × 2^1 = 3,    1.11 × 2^1 = 7/2
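Enumerating this toy format shows why floating-point values are denser near zero: the spacing doubles every time the exponent increases by one. A minimal sketch:

```python
from fractions import Fraction

# 3-bit mantissa 1.xx (two fraction bits), exponent e in {-1, 0, 1}
values = sorted(Fraction(4 + m, 4) * Fraction(2)**e
                for e in (-1, 0, 1) for m in range(4))
print(values)  # 12 values from 1/2 up to 7/2
# spacing within one exponent range is 2**-2 * 2**e, doubling with e
```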
Father of the Floating Point Standard
Prof. Kahan - IEEE Standard 754 for Binary Floating-Point Arithmetic.
1989 ACM Turing Award winner!
www.cs.berkeley.edu/~wkahan/ieee754status/754story.html
IEEE Floating Point
Defined by IEEE Std 754-1985.
Developed in response to divergence of representations:
established in 1985 as a uniform standard for floating-point
arithmetic; before that, many idiosyncratic formats caused
portability issues for scientific code.
Supported by all major CPUs; now almost universally adopted.
Two representations:
Single precision (32-bit)
Double precision (64-bit)
Driven by numerical concerns:
Nice standards for rounding, overflow, underflow.
Hard to make go fast: numerical analysts predominated over
hardware types in defining the standard.
IEEE 754 Floating-Point Format
Layout: S | Exponent | Fraction
Exponent: single 8 bits, double 11 bits. Fraction: single 23 bits, double 52 bits.
x = (-1)^S × (1 + Fraction) × 2^(Exponent - Bias)
Single precision: bit 31 - sign; bits 30-23 - biased exponent;
bits 22-0 - normalized mantissa (the leading 1 bit is implicit):
x = (-1)^S × 1.F × 2^(E - 127)
S: sign bit (0 for non-negative, 1 for negative).
Normalized significand: 1.0 ≤ |significand| < 2.0.
It always has a leading pre-binary-point 1 bit, so there is no need
to represent it explicitly (hidden bit); the significand is the
Fraction with the "1." restored.
Exponent: excess representation: stored exponent = actual exponent + Bias.
This ensures the stored exponent is unsigned.
Single: Bias = 127; Double: Bias = 1023.
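The single-precision fields can be pulled apart with Python's struct module, which exposes the raw 32-bit pattern of a float (variable names are illustrative):

```python
import struct

def decode_single(x):
    """Return (sign, biased exponent, 23-bit fraction) of x as an IEEE single."""
    (bits,) = struct.unpack('>I', struct.pack('>f', x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

s, e, f = decode_single(-0.75)
print(s, e - 127, 1 + f / 2**23)   # 1 -1 1.5  ->  -0.75 = -1.5 * 2**-1
```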
2. Quantization error in number representation
Quantization
1. Fixed-point: truncation
To truncate a fixed-point number from (β+1) bits to (b+1) bits, we simply
discard the least significant (β-b) bits. The truncation error is denoted by
ε_t = Q(X) - X
Here Q(X) is the truncated version of the number X. For a positive X, the
error is equal to zero if all bits being discarded are zeros and is largest if all
discarded bits are ones:
-(2^-b - 2^-β) ≤ ε_t ≤ 0
Quantization
For a negative X, the truncation error will be different for three different
formats:
1) Sign-magnitude:    0 ≤ ε_t ≤ (2^-b - 2^-β)
2) Ones-complement:   0 ≤ ε_t ≤ (2^-b - 2^-β)
3) Twos-complement:   -(2^-b - 2^-β) ≤ ε_t ≤ 0
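Two's-complement truncation of a binary word behaves like rounding toward minus infinity, so the error is never positive regardless of sign; a quick check (b and the sample values are arbitrary):

```python
import math

def truncate(x, b):
    """Truncate x to b fraction bits, two's-complement style (floor)."""
    return math.floor(x * 2**b) / 2**b

for x in (0.7, -0.7):
    err = truncate(x, 4) - x
    print(f"x = {x:+.2f}: error = {err:+.6f}")
    assert -2**-4 < err <= 0       # error always in (-2**-b, 0]
```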
Quantization
2. Fixed-point: rounding
In the case of rounding, the number is quantized to the nearest quantization
level. The rounding error does not depend on the format used to represent
negative numbers:
-(1/2)(2^-b - 2^-β) ≤ ε_r ≤ (1/2)(2^-b - 2^-β)
In practice, β >> b; therefore, 2^-β ≈ 0 in all the expressions considered.
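Rounding to b fraction bits keeps the error within half a quantization step for either sign, which is what makes it preferable to truncation; a minimal check:

```python
def round_to_b(x, b):
    """Round x to the nearest multiple of 2**-b."""
    return round(x * 2**b) / 2**b

for x in (0.7, -0.7, 0.1):
    err = round_to_b(x, 4) - x
    assert abs(err) <= 2**-4 / 2   # |error| <= (1/2) * 2**-b
```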
Quantization Noise
Quantization mechanisms (fixed point):
Rounding: the error is uniformly distributed between -2^-b/2 and 2^-b/2,
with probability density 2^b; the distribution is the same for all sign formats.
Truncation, twos complement (and all positive numbers): the error lies
between -2^-b and 0, with probability density 2^b.
Truncation, sign magnitude and ones complement: the error lies between
-2^-b and 2^-b, with probability density 2^b/2.
(The slide's input/output staircase plots and error probability-density sketches are not reproduced here.)
Quantization Noise
Quantization mechanisms (floating point): quantization acts on the
mantissa, so the relative error is considered.
Rounding: the relative error is uniformly distributed between -2^-b and
2^-b, with probability density 2^b/2.
Truncation, twos complement (and all positive numbers): the relative error
lies between -2·2^-b and 0, with probability density 2^b/2.
Truncation, sign magnitude and ones complement: the relative error lies
between -2·2^-b and 2·2^-b, with probability density 2^b/4.
(Input/output and error probability-density plots omitted.)
Quantization
3. Floating-point
Consider a floating-point representation of a number, X = 2^E · M.
Its quantized version is
Q(X) = 2^E · Q(M)
Quantization is carried out on the mantissa only in the case of floating-point
numbers. Therefore, it is more reasonable to consider the relative error:
ε = (Q(X) - X) / X = (Q(M) - M) / M
In practice, a rounding quantizer can be modeled as follows:
Q(X) = 2^-B · round(X · 2^B)
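This rounding-quantizer model is a one-liner; increasing B shrinks the error below 2^-B/2 (the sample value is arbitrary):

```python
def quantize(x, B):
    """Model of a rounding quantizer: Q(x) = 2**-B * round(x * 2**B)."""
    return 2**-B * round(x * 2**B)

x = 0.123456
for B in (4, 8, 16):
    print(B, quantize(x, B), abs(quantize(x, B) - x))
    assert abs(quantize(x, B) - x) <= 2**-B / 2
```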