Ch. 2 Floating Point Numbers
Representation
Comp Sci 251 -- Floating point
Floating point numbers
Binary representation of fractional numbers
IEEE 754 standard
Binary → Decimal conversion
23.47 = 2×10^1 + 3×10^0 + 4×10^-1 + 7×10^-2   (the "." is the decimal point)
10.01_two = 1×2^1 + 0×2^0 + 0×2^-1 + 1×2^-2   (the "." is the binary point)
= 1×2 + 0×1 + 0×0.5 + 1×0.25
= 2 + 0.25 = 2.25
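The positional expansion above can be sketched in a few lines of Python. The function name `binary_to_decimal` is illustrative only and is not part of the course material; exact fractions are used so no rounding enters the picture.

```python
from fractions import Fraction

def binary_to_decimal(bits: str) -> Fraction:
    """Evaluate a binary numeral such as '10.01' as an exact fraction."""
    whole, _, frac = bits.partition(".")
    value = Fraction(0)
    # Digits left of the binary point have weights 2^0, 2^1, ...
    for power, digit in enumerate(reversed(whole)):
        value += int(digit) * Fraction(2) ** power
    # Digits right of the binary point have weights 2^-1, 2^-2, ...
    for power, digit in enumerate(frac, start=1):
        value += int(digit) * Fraction(1, 2 ** power)
    return value

print(float(binary_to_decimal("10.01")))  # 2.25
```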
Decimal → Binary conversion
Write number as sum of powers of 2
0.8125 = 0.5 + 0.25 + 0.0625
= 2^-1 + 2^-2 + 2^-4
= 0.1101_two
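Algorithm: Repeatedly multiply the fraction by two until the fraction becomes zero. The integer part of each product is the next binary digit.
0.8125 × 2 = 1.625  → 1
0.625 × 2 = 1.25    → 1
0.25 × 2 = 0.5      → 0
0.5 × 2 = 1.0       → 1
Reading the digits top to bottom: 0.1101_two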
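A minimal sketch of the multiply-by-two algorithm, assuming the input is a fraction in the range 0 <= x < 1. The name `fraction_to_binary` and the `max_bits` cutoff are assumptions added here so the loop also stops for fractions whose binary expansion never ends.

```python
from fractions import Fraction

def fraction_to_binary(x, max_bits=32):
    """Return the binary digits after the point for 0 <= x < 1."""
    frac = Fraction(x)
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        bit = int(frac >= 1)   # integer part of the product is the next bit
        bits.append(str(bit))
        frac -= bit            # keep only the fractional part
    return "0." + "".join(bits)

print(fraction_to_binary("0.8125"))  # 0.1101
```

Passing the decimal as a string lets `Fraction` hold it exactly, so the digits produced match the hand calculation.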
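Beware
Finite decimal digits do not guarantee finite binary digits
Example:
0.1_ten: 0.1 → 0.2 → 0.4 → 0.8 → 1.6 → 1.2 → 0.4 → 0.8 → 1.6 → 1.2 → 0.4 → ...
(each step doubles the fractional part of the previous value)
0.1_ten = 0.00011001100110011..._two
= 0.0(0011)_two, with the group 0011 repeating forever (infinite repeating binary)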
The more bits used, the closer the binary representation gets to 0.1_ten, but it never reaches it exactly
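To make the convergence concrete, the sketch below truncates 0.1 to a given number of binary digits and reports the remaining error. The helper `truncate_binary` is a hypothetical name introduced for this illustration.

```python
from fractions import Fraction

def truncate_binary(x, nbits):
    """Keep only the first nbits binary digits after the point (0 <= x < 1)."""
    scaled = Fraction(x) * 2 ** nbits
    return Fraction(scaled.numerator // scaled.denominator, 2 ** nbits)

tenth = Fraction(1, 10)
for nbits in (4, 8, 16, 24):
    approx = truncate_binary(tenth, nbits)
    # The error shrinks as nbits grows but never becomes zero.
    print(nbits, float(approx), float(tenth - approx))
```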
Scientific notation
Decimal:
-123,000,000,000,000 = -1.23 × 10^14
0.000 000 000 000 000 123 = +1.23 × 10^-16
Binary:
110 1100 0000 0000 = 1.1011 × 2^14
-0.0000 0000 0000 0001 1011 = -1.1011 × 2^-16
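The binary examples can be checked with Python's `math.frexp`, which returns a mantissa in [0.5, 1) and an exponent. The wrapper `normalize` below is an illustrative helper, not part of the course material, and assumes a nonzero input; it rescales the result so one digit sits left of the binary point.

```python
import math

def normalize(x: float):
    """Return (sign, significand, exponent) with 1 <= significand < 2."""
    m, e = math.frexp(abs(x))      # abs(x) == m * 2**e with 0.5 <= m < 1
    sign = "-" if x < 0 else "+"
    return sign, m * 2, e - 1      # rescale so one digit sits left of the point

# 1.6875 is 1.1011 in binary (1 + 1/2 + 1/8 + 1/16).
print(normalize(0b110110000000000))   # ('+', 1.6875, 14)
print(normalize(-0b11011 / 2 ** 20))  # ('-', 1.6875, -16)
```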
Floating point representation
Three pieces: sign, exponent, significand
Format: | sign | exponent | significand |
Fixed-size representation (32-bit, 64-bit)
1 sign bit
more exponent bits → greater range
more significand bits → greater accuracy
IEEE 754 floating point standards
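Single precision (32-bit) format
Fields: S (1 bit) | E (8 bits) | F (23 bits)
Normalized rule: number represented is
(-1)^S × 1.F × 2^(E-127), E ≠ 00...0 and E ≠ 11...1
Example: +101101.101 = +1.01101101 × 2^5, so E = 5 + 127 = 132 = 10000100_two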
0 1000 0100 0110 1101 0000 0000 0000 000
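One way to confirm the bit pattern above is to let Python's `struct` module produce the 32-bit encoding and split it into the three fields. The function name `float32_fields` is an assumption made for this sketch.

```python
import struct

def float32_fields(x: float):
    """Split x, stored as an IEEE 754 single, into (S, E, F) bit strings."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))  # raw 32-bit pattern
    bits = f"{word:032b}"
    return bits[0], bits[1:9], bits[9:]

s, e, f = float32_fields(0b101101101 / 8)   # 101101.101 in binary = 45.625
print(s, e, f)   # 0 10000100 01101101000000000000000
```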
Features of IEEE 754 format
Sign:
1 → negative, 0 → non-negative
Significand:
Normalized number: always a 1 left of binary point
(except when E is 0 or 255)
Do not waste a bit on this 1 → "hidden 1"
Exponent:
Not two's-complement representation
Unsigned interpretation minus bias (bias = 127 for single precision)
Example: 0.75
0.75_ten = 0.11_two = 1.1 × 2^-1
1.1 = 1.F → F = 1 (followed by 22 zeros)
E - 127 = -1 → E = 127 - 1 = 126 = 01111110_two
S = 0
0 01111110 10000000000000000000000 = 0x3F400000
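As a cross-check of this example, the sketch below applies the normalized rule to the 32-bit pattern by hand and also asks `struct` for the encoding of 0.75. The function `decode_normalized_single` is a hypothetical helper that covers only the normalized case.

```python
import struct

def decode_normalized_single(word: int) -> float:
    """Apply (-1)^S x 1.F x 2^(E-127) to a 32-bit pattern (normalized case only)."""
    s = word >> 31
    e = (word >> 23) & 0xFF
    f = word & 0x7FFFFF
    assert 0 < e < 255, "rule applies only to normalized numbers"
    return (-1) ** s * (1 + f / 2 ** 23) * 2.0 ** (e - 127)

print(decode_normalized_single(0x3F400000))   # 0.75
print(struct.pack(">f", 0.75).hex())          # 3f400000
```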
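Example: 0.1_ten - Check float.a
0.1_ten = 0.0(0011)_two, with 0011 repeating
= 1.1(0011)_two × 2^-4 = 1.F × 2^(E-127)
F = 1 0011 0011 0011 ... (repeating)
-4 = E - 127
E = 127 - 4 = 123 = 01111011_two
0 01111011 10011001100110011001100 110011...
(the fraction repeats forever; only the first 23 bits of F fit in single precision)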
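The stored pattern can be inspected with a short Python sketch. Note one assumption worth stating: `struct.pack` rounds to the nearest single rather than truncating, so the last fraction bit comes out as 1 instead of the 0 that plain truncation of the repeating pattern would give.

```python
import struct

raw = struct.pack(">f", 0.1)                 # round 0.1 to the nearest single
word = int.from_bytes(raw, "big")
bits = f"{word:032b}"
print(bits[0], bits[1:9], bits[9:])          # 0 01111011 10011001100110011001101
print(hex(word))                             # 0x3dcccccd
stored = struct.unpack(">f", raw)[0]
print(stored == 0.1)                         # False: the stored value is only close to 0.1
```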
IEEE Double precision standard
Fields: S (1 bit) | E (11 bits) | F (52 bits)
E is not 00...0 (decimal 0) or 11...1 (decimal 2047)
Normalized rule: number represented is
(-1)^S × 1.F × 2^(E-1023)
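A small sketch of the double-precision layout, assuming (as is the case on common platforms) that a Python `float` is an IEEE 754 double. The name `float64_fields` is illustrative.

```python
import struct

def float64_fields(x: float):
    """Split a Python float (an IEEE 754 double) into (S, E, F) integers."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))
    s = word >> 63
    e = (word >> 52) & 0x7FF          # 11 exponent bits
    f = word & ((1 << 52) - 1)        # 52 fraction bits
    return s, e, f

s, e, f = float64_fields(0.75)
print(s, e, f == 1 << 51)             # 0 1022 True
# Rebuild the value from the fields with the normalized rule.
value = (-1) ** s * (1 + f / 2 ** 52) * 2.0 ** (e - 1023)
print(value)                          # 0.75
```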
Special-case numbers
Problem: hidden 1 prevents representation of 0
Solution: make exceptions to the rule
Bit patterns reserved for unusual numbers:
E = 00...0
E = 11...1
Special-case numbers
Zeroes:
+0: S = 0, E = 00...0, F = 00...0
-0: S = 1, E = 00...0, F = 00...0
Infinities:
+∞: S = 0, E = 11...1, F = 00...0
-∞: S = 1, E = 11...1, F = 00...0
Denormalized numbers
No hidden 1
Allows numbers very close to 0
E = 00...0 → Different interpretation applies
Denormalization rule: number represented is
(-1)^S × 0.F × 2^-126 (single-precision)
(-1)^S × 0.F × 2^-1022 (double-precision)
Note: zeroes follow this rule
Not a Number (NaN): E = 11...1; F ≠ 00...0
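The denormalization rule can be tried out on the smallest positive single-precision pattern. The helper `decode_denormalized_single` is a hypothetical name used only for this sketch, and the pattern 0x7FC00000 is just one of many NaN encodings.

```python
import math
import struct

def decode_denormalized_single(word: int) -> float:
    """Apply (-1)^S x 0.F x 2^-126 to a 32-bit pattern whose E field is all 0s."""
    s = word >> 31
    e = (word >> 23) & 0xFF
    f = word & 0x7FFFFF
    assert e == 0, "rule applies only when E is all 0s"
    return (-1) ** s * (f / 2 ** 23) * 2.0 ** -126

smallest = decode_denormalized_single(0x00000001)
print(smallest == 2.0 ** -149)                                        # True
print(smallest == struct.unpack(">f", bytes.fromhex("00000001"))[0])  # True
print(decode_denormalized_single(0x00000000))                         # 0.0
# E all 1s with a nonzero F decodes to NaN.
print(math.isnan(struct.unpack(">f", bytes.fromhex("7fc00000"))[0]))  # True
```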
IEEE 754 summary
E = 00...0, F = 00...0 → 0
E = 00...0, F ≠ 00...0 → denormalized
00...0 < E < 11...1 → normalized
E = 11...1, F = 00...0 → infinities
E = 11...1, F ≠ 00...0 → NaN
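The summary table maps directly onto a small classifier. This is a sketch for single precision only; the name `classify_single` and the sample patterns are chosen here for illustration.

```python
def classify_single(word: int) -> str:
    """Name the IEEE 754 single-precision category of a 32-bit pattern."""
    e = (word >> 23) & 0xFF      # 8 exponent bits
    f = word & 0x7FFFFF          # 23 fraction bits
    if e == 0:
        return "zero" if f == 0 else "denormalized"
    if e == 0xFF:
        return "infinity" if f == 0 else "NaN"
    return "normalized"

samples = (0x00000000, 0x80000000, 0x00000001, 0x3F400000,
           0x7F800000, 0xFF800000, 0x7FC00000)
for word in samples:
    print(f"{word:#010x} {classify_single(word)}")
```

The sign bit is ignored on purpose: each category exists in both a positive and a negative form.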