CEF352 Lect2
4. Machine epsilon
Illustration: the numbers 0.7e-2, $0.7 \times 10^{-2}$ and 0.007 all denote the same value.
Example (in decimal): $5.4 \times 10^{-5}$, $1.25 \times 10^{-5}$, $0.125 \times 10^{-4}$, $0.0125 \times 10^{-2}$
The first two numbers are normalized, while the last two are not.
Scientific notation is said to be normalized when the mantissa has no leading zeros (in the examples above, a single nonzero digit before the decimal point).
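Python's `e` format specifier prints a number in normalized decimal scientific notation, which gives a quick way to normalize the examples above. This is only an illustration; note that $0.125 \times 10^{-4}$ normalizes to $1.25 \times 10^{-5}$, while $0.0125 \times 10^{-2}$ normalizes to $1.25 \times 10^{-4}$.

```python
# The "e" presentation type prints normalized scientific notation:
# one nonzero digit before the point, then the power of 10.
for x in (0.7e-2, 0.7 * 10**-2, 0.007):
    print(format(x, ".1e"))        # 7.0e-03 for all three

print(format(0.125e-4, ".2e"))     # 1.25e-05
print(format(0.0125e-2, ".2e"))    # 1.25e-04
```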
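Example of a normalized number (in binary): $1.01 \times 2^{-5}$, or more generally any number of the form
$1.m_1 m_2 \ldots \times 2^{p_1 p_2 \ldots}$, where the $m_i$ and $p_i$ are binary digits.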
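Python's `float.hex()` prints a double in exactly this normalized binary form, with the fraction bits grouped into hexadecimal digits. A small sketch for $1.01_2 \times 2^{-5}$: since $1.01_2 = 1.25$, the value is $1.25 / 32$.

```python
# 1.01 (binary) = 1 + 1/4 = 1.25, so 1.01_2 x 2^-5 = 1.25 / 32
x = 1.25 * 2**-5
print(x)          # 0.0390625
print(x.hex())    # 0x1.4000000000000p-5 : hex digit 4 = 0100, i.e. 1.0100..._2 x 2^-5
```

The reverse direction is `float.fromhex("0x1.4p-5")`, which returns the same value.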
Chapter 1: Floating-point arithmetic (with IEEE 754 specifications)
A floating-point format can only represent a finite set of numbers (those that can be written according to the specifications of the format).
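One way to see this finiteness is to look at two neighbouring representable numbers. The sketch below uses Python floats, which are IEEE 754 double precision on common platforms, and `math.nextafter`, which requires Python 3.9 or later.

```python
import math

# 1.0 and the next double above it are neighbours: no representable
# number lies strictly between them.
x = 1.0
y = math.nextafter(x, 2.0)
print(y - x)             # 2.220446049250313e-16, i.e. 2**-52
mid = (x + y) / 2        # the exact midpoint is not representable
print(x < mid < y)       # False: the result is rounded onto one of the neighbours
```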
Hence, $s = 0$, $e = 127_{10} = 0111\,1111_2$, $0.f = 0.100\ldots_2 \Rightarrow f = 1000\,0000\,0000\,0000\,0000\,000$
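Fields like these can be checked by packing a number into IEEE 754 single precision with Python's standard `struct` module and splitting the 32 bits. The helper name `float32_fields` is hypothetical. With $s = 0$, $e = 127$ and $f = 1000\ldots0$, the value is $(1.1)_2 \times 2^{127-127} = 1.5$, so packing 1.5 should reproduce the fields above.

```python
import struct

def float32_fields(x):
    """Round x to IEEE 754 single precision and return its (s, e, f) fields as bit strings."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # raw 32-bit pattern, big-endian
    b = format(bits, "032b")
    return b[0], b[1:9], b[9:]   # sign (1 bit), biased exponent (8 bits), fraction (23 bits)

s, e, f = float32_fields(1.5)
print(s, e, f)   # 0 01111111 followed by a 1 and 22 zeros
```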
Similarly, in the binary system, only rational numbers whose denominator (in lowest terms) is a power of 2 have a terminating expansion; all others repeat forever.
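The rule can be tested with exact rational arithmetic. This is a minimal sketch using Python's `fractions.Fraction`, which always stores a fraction in lowest terms; the helper names `terminates_in_binary` and `binary_digits` are hypothetical.

```python
from fractions import Fraction

def terminates_in_binary(q):
    """True if q has a finite binary expansion: its reduced denominator is a power of 2."""
    d = q.denominator            # Fraction keeps numerator/denominator in lowest terms
    return d & (d - 1) == 0      # a power of two has exactly one bit set

def binary_digits(q, n):
    """First n binary digits after the point of |q|, assuming 0 <= |q| < 1."""
    q = abs(q)
    digits = []
    for _ in range(n):
        q *= 2                   # shift one binary place to the left
        bit = int(q >= 1)        # the integer part is the next digit
        digits.append(str(bit))
        q -= bit
    return "0." + "".join(digits)

print(terminates_in_binary(Fraction(3, 8)))   # True  (0.011 in binary)
print(terminates_in_binary(Fraction(1, 3)))   # False
print(binary_digits(Fraction(1, 3), 10))      # 0.0101010101
```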
Example:
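$-1/3 = -(0.010101\ldots)_2 = (-1)^1 \times (1.010101\ldots)_2 \times 2^{-2} = (-1)^1 \times (1 + 0.010101\ldots_2) \times 2^{125-127}$, so $s = 1$ and the biased exponent is $e = 125$.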
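With a biased exponent, comparing exponents becomes straightforward: $1.0 \times 2^{-1} < 1.0 \times 2^{+1}$, and the stored biased exponents compare the same way, $0111\,1110 < 1000\,0000$ (126 < 128). The difficulty is solved by the biased exponent!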
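A short sketch of why the bias helps. In 8-bit two's complement, an exponent of $-1$ would be stored as $1111\,1111$ and $+1$ as $0000\,0001$, so reading the bits as unsigned integers would put $2^{-1}$ above $2^{+1}$. With the bias of 127 the stored exponents are 126 and 128, and positive floats order the same way as their raw bit patterns. The helper `float32_bits` is hypothetical and relies on Python's standard `struct` module.

```python
import struct

def float32_bits(x):
    """Raw IEEE 754 single-precision bit pattern of x as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

for x in (0.5, 2.0):                       # 1.0 x 2^-1 and 1.0 x 2^+1
    e = (float32_bits(x) >> 23) & 0xFF     # extract the 8-bit biased exponent field
    print(x, format(e, "08b"), e - 127)    # 0.5 01111110 -1, then 2.0 10000000 1

# Biased exponents make the unsigned bit patterns of positive floats ordered by value.
print(float32_bits(0.5) < float32_bits(2.0))   # True
```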
Convention: general range of the mantissa: $b^{-1} \le M < b^{0}$
Decimal: $10^{-1} \le M < 10^{0} \iff 0.1 \le M < 1 \Rightarrow \min(M) = 0.1,\ \max(M) = 0.99\ldots9$
Binary: $2^{-1} \le M < 2^{0} \iff 0.5_{10} \le M < 1_{10} \Rightarrow \min(M) = 0.5_{10},\ \max(M) = 0.99\ldots9_{10}$ (that is, $0.11\ldots1_2$)
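Python's `math.frexp` happens to use this same binary convention, so it can serve as a quick check. Note that IEEE 754 stores the mantissa as $1.f$ with $1 \le M < 2$ instead; that is the same value with $M$ doubled and the exponent decreased by 1.

```python
import math

# math.frexp(x) returns (M, e) with x == M * 2**e and 0.5 <= |M| < 1 (for x != 0),
# i.e. the b**-1 <= M < b**0 convention with b = 2.
for x in (1.0, 6.0, 0.1):
    M, e = math.frexp(x)
    print(x, M, e)                      # e.g. 6.0 -> M = 0.75, e = 3
    assert 0.5 <= M < 1 and M * 2**e == x
```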
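| Case $b = 2$ | Max mantissa | Min mantissa | Max exponent | Min exponent | Range of $b^e$ |
| --- | --- | --- | --- | --- | --- |
| Binary | $0.111\ldots1$ | $0.10\ldots0$ | $2^7 - 1$ | $-2^7$ | $[2^{-128}, 2^{127}]$ |
| In decimal | $0.999\ldots9$ | $0.5$ | $127$ | $-128$ | $[2.9 \cdot 10^{-39}, 1.7 \cdot 10^{38}]$ |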
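The decimal row of the table can be recomputed directly. The exponent range $-2^7 \ldots 2^7 - 1$ matches an 8-bit two's-complement exponent; for comparison, IEEE 754 single precision uses a biased exponent whose normal range is $[-126, 127]$.

```python
# Exponent range from the table: -2**7 .. 2**7 - 1
# (unary minus binds more loosely than **, so -2**7 == -128)
e_min, e_max = -2**7, 2**7 - 1
print(e_min, e_max)                               # -128 127
print(f"{2.0**e_min:.2g} {2.0**e_max:.2g}")       # 2.9e-39 1.7e+38
```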
3. Keeping only 3 digits in the fractional part: $m = 10.212$, and thus $r = 10.212 \times 10^{6}$
Example 2 (binary): Multiply the following two normalized binary numbers: $1.000 \times 2^{-1}$ and $-1.110 \times 2^{-2}$.
1. Sign: minus (the operands have opposite signs).
2. Add the exponents: $e = -1 + (-2) = -3$
3. Multiply the mantissas: $m = 1.000 \times 1.110 = 1.110000$

         1.000
       × 1.110
   -----------
          0000
         1000
        1000
     + 1000
   -----------
       1110000   ==>  1.110000

4. Keeping only 3 digits in the fractional part: $m = 1.110$, and thus $r = -1.110 \times 2^{-3}$
The result is already normalized!
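The four steps of Example 2 can be sketched in Python. This is a minimal sketch under assumptions not taken from the lecture: each operand is a hypothetical triple (sign bit, mantissa stored as an integer scaled by $2^3$, exponent), so $1.110$ becomes `0b1110`, and step 4 simply truncates the extra bits rather than applying IEEE 754 round-to-nearest. The names `fp_multiply` and `to_str` are illustrative.

```python
FRAC_BITS = 3   # digits kept after the binary point, as in the example

def to_str(sign, m, e):
    """Format a sign bit, scaled integer mantissa and exponent as +/-1.fff x 2^e."""
    bits = f"{m:0{FRAC_BITS + 1}b}"
    return f"{'-' if sign else ''}{bits[0]}.{bits[1:]} x 2^{e}"

def fp_multiply(a, b):
    """Multiply two normalized binary numbers given as (sign, m, e), where m is the
    mantissa scaled by 2**FRAC_BITS (1.000 -> 0b1000, 1.110 -> 0b1110)."""
    (sa, ma, ea), (sb, mb, eb) = a, b
    sign = sa ^ sb                  # 1. negative if the signs differ
    e = ea + eb                     # 2. add the exponents
    m = ma * mb                     # 3. multiply mantissas (2*FRAC_BITS fraction bits)
    m >>= FRAC_BITS                 # 4. keep FRAC_BITS fraction bits (truncation)
    if m >= 2 << FRAC_BITS:         # mantissa >= 2: renormalize
        m >>= 1
        e += 1
    return sign, m, e

r = fp_multiply((0, 0b1000, -1), (1, 0b1110, -2))   # 1.000 x 2^-1 times -1.110 x 2^-2
print(to_str(*r))                                   # -1.110 x 2^-3
```

Here the product $0b1000 \times 0b1110 = 0b1110000$ has six fraction bits, matching the long multiplication above before truncation.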
2. Rewrite the smaller number using the exponent of the larger number: $-1.110 \times 2^{-2} = -(1.110 \times 2^{-1}) \times 2^{-1} = -0.111 \times 2^{-1}$
3. Add the mantissas: $m = 1.000 + (-0.111) = 0.001 \Rightarrow r = 0.001 \times 2^{-1}$
4. Normalize the result: $r = 1.000 \times 2^{-4}$, with $-4 \in [-126, 127]$
5. No overflow/underflow and no rounding required: $r = 1.000 \times 2^{-4}$
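The addition steps can be sketched the same way. This is a minimal sketch under stated assumptions: each operand is a hypothetical pair (signed mantissa scaled by $2^3$, exponent), bits shifted out during alignment are simply dropped, and the range check uses the single-precision normal exponent range $[-126, 127]$ from step 4. The name `fp_add` is illustrative.

```python
FRAC_BITS = 3             # bits kept after the binary point
E_MIN, E_MAX = -126, 127  # single-precision normal exponent range

def fp_add(a, b):
    """Add two normalized binary numbers given as (signed mantissa scaled by
    2**FRAC_BITS, exponent); e.g. -1.110 x 2^-2 is (-0b1110, -2)."""
    # Order the operands so that (ma, ea) has the larger exponent.
    (ma, ea), (mb, eb) = sorted([a, b], key=lambda t: t[1], reverse=True)
    # Rewrite the smaller operand with the larger exponent: shift its magnitude right.
    shift = ea - eb
    mb = (abs(mb) >> shift) * (1 if mb >= 0 else -1)
    m, e = ma + mb, ea              # add the mantissas
    if m == 0:
        return 0, 0                 # exact zero, nothing to normalize
    sign, mag = (1 if m > 0 else -1), abs(m)
    while mag >= 2 << FRAC_BITS:    # |mantissa| >= 2: shift right
        mag >>= 1
        e += 1
    while mag < 1 << FRAC_BITS:     # |mantissa| < 1: shift left
        mag <<= 1
        e -= 1
    if not E_MIN <= e <= E_MAX:
        raise ArithmeticError(f"exponent {e} out of range (overflow/underflow)")
    return sign * mag, e

m, e = fp_add((0b1000, -1), (-0b1110, -2))   # 1.000 x 2^-1 + (-1.110 x 2^-2)
print(m, e)                                  # 8 -4, i.e. 1.000 x 2^-4
print(m / 2**FRAC_BITS * 2.0**e)             # 0.0625
```

Shifting `0b1110` right by one gives `0b0111` (that is, $0.111$), and $0b1000 - 0b0111 = 0b0001$ (that is, $0.001$), reproducing steps 2 and 3 before the loop normalizes the result.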
Exercise 1: Find the sign, mantissa and biased exponent, and write the single-precision representation of the decimal numbers -1.5, 0.2 and 4.
Exercise 2: Find the sign, mantissa and biased exponent, and write the single-precision representation of the binary numbers -0.1 and 0.00101.
Exercise 3: Write the normalized binary notation of the numbers 1.5 and -0.6375, then add them.
Exercise 4: Write the normalized binary notation of the numbers 12.0 and -0.2375, then multiply them.