
Ch. 2 Floating Point Numbers: Representation

This document discusses floating point number representation and the IEEE 754 standard. It covers: 1) Floating point numbers are represented in binary format using the IEEE 754 standard, which specifies the layout of the sign bit, exponent field, and significand for both single and double precision numbers. 2) Special values like zero, infinity, and Not a Number (NaN) are represented using reserved exponent and significand bit patterns. 3) Normalized numbers have a hidden leading 1 in the significand, while denormalized numbers with very small values have no leading 1 and a special exponent interpretation.

Uploaded by

Mayank Singh

Ch. 2 Floating Point Numbers Representation

Comp Sci 251 -- Floating point

Floating point numbers

Binary representation of fractional numbers
IEEE 754 standard


Binary → Decimal conversion

23.47 = 2×10^1 + 3×10^0 + 4×10^-1 + 7×10^-2
(the "." is the decimal point)
10.01two = 1×2^1 + 0×2^0 + 0×2^-1 + 1×2^-2
(the "." is the binary point)
= 1×2 + 0×1 + 0×(1/2) + 1×(1/4)
= 2 + 0.25 = 2.25
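
The positional expansion above can be sketched in a few lines of Python. This is only an illustration: the function name `binary_to_decimal` is made up for this example, and it assumes the input is a well-formed string of 0s and 1s with at most one binary point.

```python
def binary_to_decimal(text):
    """Evaluate a binary numeral such as '10.01' digit by digit."""
    whole, _, frac = text.partition(".")
    value = 0.0
    # Digits left of the binary point carry weights 2^0, 2^1, ...
    for power, digit in enumerate(reversed(whole)):
        value += int(digit) * 2.0 ** power
    # Digits right of the binary point carry weights 2^-1, 2^-2, ...
    for power, digit in enumerate(frac, start=1):
        value += int(digit) * 2.0 ** -power
    return value

print(binary_to_decimal("10.01"))  # 2.25
```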

Decimal → Binary conversion

Write number as sum of powers of 2
0.8125 = 0.5 + 0.25 + 0.0625
= 2^-1 + 2^-2 + 2^-4
= 0.1101two
Algorithm: Repeatedly multiply fraction by two until fraction becomes zero.
0.8125 → 1.625
0.625 → 1.25
0.25 → 0.5
0.5 → 1.0
The integer parts, read top to bottom, are the binary digits: 1, 1, 0, 1.
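
A minimal Python sketch of the repeated-doubling algorithm follows. The name `fraction_to_binary` and the `max_bits` cap are assumptions added for this example; the cap is there because some fractions never reach zero.

```python
def fraction_to_binary(fraction, max_bits=32):
    """Convert a fraction in [0, 1) to binary digits by repeated doubling."""
    bits = []
    while fraction != 0 and len(bits) < max_bits:
        fraction *= 2
        bit = int(fraction)      # integer part is the next binary digit
        bits.append(str(bit))
        fraction -= bit          # keep only the fractional part
    return "0." + "".join(bits)

print(fraction_to_binary(0.8125))  # 0.1101
```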


Beware

Finite decimal digits do not imply finite binary digits
Example:
0.1ten → 0.2 → 0.4 → 0.8 → 1.6 → 1.2 → 0.4 → 0.8 → 1.6 → 1.2 → 0.4 → ...
0.1ten = 0.00011001100110011...two
= 0.0 0011 0011 ...two (infinite repeating binary; the block 0011 repeats)
The more bits used, the closer the binary representation gets to 0.1ten
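
To see the approximation improve with more bits, here is a small Python sketch using exact rational arithmetic. The helper `truncated_binary` is a name invented for this example; it simply cuts the binary expansion off after a given number of digits.

```python
from fractions import Fraction

def truncated_binary(value, bits):
    """Keep only the first `bits` binary digits after the binary point."""
    scale = 2 ** bits
    return Fraction(int(value * scale), scale)

tenth = Fraction(1, 10)
for bits in (4, 8, 16, 24):
    approx = truncated_binary(tenth, bits)
    # The error shrinks as bits are added but never reaches zero.
    print(bits, float(tenth - approx))
```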

Scientific notation

Decimal:
-123,000,000,000,000 → -1.23 × 10^14
0.000 000 000 000 000 123 → +1.23 × 10^-16
Binary:
110 1100 0000 0000 → 1.1011 × 2^14
-0.0000 0000 0000 0001 1011 → -1.1011 × 2^-16
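
As a rough check of the binary examples, the following Python sketch normalizes a value so that exactly one 1 sits left of the binary point. The function name `normalize` is an assumption for this example; it relies on the standard `math.frexp`, which returns a mantissa in [0.5, 1), so the result is shifted by one position.

```python
import math

def normalize(value):
    """Return (m, e) with value == m * 2**e and 1 <= |m| < 2 (value != 0)."""
    m, e = math.frexp(value)   # frexp yields 0.5 <= |m| < 1
    return m * 2, e - 1

print(normalize(0b110110000000000))  # (1.6875, 14); 1.6875 is 1.1011two
print(normalize(-27 / 2 ** 20))      # (-1.6875, -16)
```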

Floating point representation

Three pieces: sign, exponent, significand
Format: | sign | exponent | significand |
Fixed-size representation (32-bit, 64-bit)
1 sign bit
more exponent bits → greater range
more significand bits → greater accuracy


IEEE 754 floating point standards

Single precision (32-bit) format:
| S (1 bit) | E (8 bits) | F (23 bits) |
Normalized rule: number represented is
(-1)^S × 1.F × 2^(E-127), E ≠ (00...0 or 11...1)
Example: +101101.101 → +1.01101101 × 2^5
0 1000 0100 0110 1101 0000 0000 0000 000
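
The normalized rule can be applied by hand to the example bit pattern. The Python sketch below does this; `decode_single` is a hypothetical helper written for this example and it handles only the normalized case.

```python
def decode_single(bits):
    """Apply (-1)^S x 1.F x 2^(E-127) to a 32-bit pattern (normalized only)."""
    bits = bits.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected 32 bits")
    s = int(bits[0], 2)        # 1 sign bit
    e = int(bits[1:9], 2)      # 8 exponent bits
    f = int(bits[9:], 2)       # 23 fraction bits
    if e in (0, 255):
        raise ValueError("not a normalized number")
    significand = 1 + f / 2 ** 23      # restore the hidden 1
    return (-1) ** s * significand * 2.0 ** (e - 127)

print(decode_single("0 1000 0100 0110 1101 0000 0000 0000 000"))  # 45.625
```

The result 45.625 is 101101.101two, matching the example.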


Features of IEEE 754 format

Sign: 1 → negative, 0 → non-negative
Significand:
Normalized number: always a 1 left of binary point (except when E is 0 or 255)
Do not waste a bit on this 1 → "hidden 1"
Exponent:
Not two's-complement representation
Unsigned interpretation minus bias

Example: 0.75

0.75ten = 0.11two = 1.1 × 2^-1
1.1 = 1.F → F = 1
E - 127 = -1 → E = 127 - 1 = 126 = 01111110two
S = 0
00111111010000000000000000000000 = 0x3F400000
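
The 0.75 example can be cross-checked with Python's standard `struct` module, which packs a value into the IEEE 754 single-precision layout. The helper name `single_fields` is an assumption for this sketch.

```python
import struct

def single_fields(value):
    """Return (S, E, F, hex word) of value stored as an IEEE 754 single."""
    (word,) = struct.unpack(">I", struct.pack(">f", value))
    s = word >> 31             # 1 sign bit
    e = (word >> 23) & 0xFF    # 8 exponent bits, still biased
    f = word & 0x7FFFFF        # 23 fraction bits
    return s, e, f, f"0x{word:08X}"

print(single_fields(0.75))  # (0, 126, 4194304, '0x3F400000')
```

Here F = 4194304 = 2^22, which is the 23-bit pattern 100...0.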

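Example 0.1ten - Check float.a

0.1ten = 0.00011...two (the block 0011 repeats)
= 1.10011...two × 2^-4 = 1.F × 2^(E-127)
F = 10011... (the block 1001 repeats)
-4 = E - 127
E = 127 - 4 = 123 = 01111011two
0 01111011 10011001100110011001100 110011...
(the fraction continues past the 23 bits that a single-precision word can hold)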
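
A short Python sketch shows what is actually stored for 0.1 in single precision. It assumes the standard `struct` module, which rounds to the nearest representable value when packing, so the last fraction bit comes out as 1 rather than the 0 that plain truncation would give.

```python
import struct

(word,) = struct.unpack(">I", struct.pack(">f", 0.1))
print(f"{word:032b}")   # sign, 8 exponent bits, 23 fraction bits
print(f"0x{word:08X}")

# Converting the stored word back shows it is close to, but not exactly, 0.1.
(stored,) = struct.unpack(">f", struct.pack(">I", word))
print(stored == 0.1)    # False
```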


IEEE Double precision standard

| S (1 bit) | E (11 bits) | F (52 bits) |
E: not 00...0 (decimal 0) or 11...1 (decimal 2047)
Normalized rule: number represented is
(-1)^S × 1.F × 2^(E-1023)
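
The double-precision rule can be exercised the same way. In this sketch the function `decode_double` is a made-up name; it splits a Python float (which is a 64-bit double on common platforms) into its fields and rebuilds the value from the normalized rule.

```python
import struct

def decode_double(value):
    """Split a double into S, E, F and rebuild it as (-1)^S x 1.F x 2^(E-1023)."""
    (word,) = struct.unpack(">Q", struct.pack(">d", value))
    s = word >> 63
    e = (word >> 52) & 0x7FF           # 11 exponent bits
    f = word & ((1 << 52) - 1)         # 52 fraction bits
    if e in (0, 2047):
        raise ValueError("not a normalized number")
    return (-1) ** s * (1 + f / 2 ** 52) * 2.0 ** (e - 1023)

print(decode_double(-45.625))  # -45.625
```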

Special-case numbers

Problem: hidden 1 prevents representation of 0
Solution: make exceptions to the rule
Bit patterns reserved for unusual numbers:
E = 00...0
E = 11...1

Special-case numbers

Zeroes:
+0: S = 0, E = 00...0, F = 00...0
-0: S = 1, E = 00...0, F = 00...0

Infinities:
+∞: S = 0, E = 11...1, F = 00...0
-∞: S = 1, E = 11...1, F = 00...0
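
These reserved patterns can be checked in single precision with a small Python sketch. The helper `single_from_word` is a name chosen for this example; it reinterprets a 32-bit integer as a single-precision value using the standard `struct` module.

```python
import struct

def single_from_word(word):
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    (value,) = struct.unpack(">f", struct.pack(">I", word))
    return value

print(single_from_word(0x00000000))  # 0.0   (S=0, E=00...0, F=00...0)
print(single_from_word(0x80000000))  # -0.0  (S=1, E=00...0, F=00...0)
print(single_from_word(0x7F800000))  # inf   (S=0, E=11...1, F=00...0)
print(single_from_word(0xFF800000))  # -inf  (S=1, E=11...1, F=00...0)
```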

Denormalized numbers

No hidden 1
Allows numbers very close to 0
E = 00...0 → different interpretation applies
Denormalization rule: number represented is
(-1)^S × 0.F × 2^-126 (single-precision)
(-1)^S × 0.F × 2^-1022 (double-precision)
Note: zeroes follow this rule

Not a Number (NaN): E = 11...1; F != 00...0
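
A quick Python sketch of both cases in single precision follows. The helper `single_from_word` is a hypothetical name, and the word 0x7FC00000 is just one of many NaN patterns (any nonzero F with E = 11...1 would do).

```python
import math
import struct

def single_from_word(word):
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    (value,) = struct.unpack(">f", struct.pack(">I", word))
    return value

# Smallest denormalized single: E = 00...0, F = 00...01, so 0.F = 2^-23.
smallest = single_from_word(0x00000001)
print(smallest == 2.0 ** -149)                   # True: 2^-23 x 2^-126

# E = 11...1 with a nonzero F is Not a Number.
print(math.isnan(single_from_word(0x7FC00000)))  # True
```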



IEEE 754 summary

E = 00...0, F = 00...0 → 0
E = 00...0, F ≠ 00...0 → denormalized
00...0 < E < 11...1 → normalized
E = 11...1:
F = 00...0 → infinities
F ≠ 00...0 → NaN
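
The summary translates almost directly into a classifier. The Python sketch below is for single precision only; `classify_single` is a name made up for this example, and it assumes the input fits in single precision (Python's `struct.pack` raises `OverflowError` for finite values that are too large).

```python
import struct

def classify_single(value):
    """Classify a value by the E and F fields of its single-precision form."""
    (word,) = struct.unpack(">I", struct.pack(">f", value))
    e = (word >> 23) & 0xFF
    f = word & 0x7FFFFF
    if e == 0:
        return "zero" if f == 0 else "denormalized"
    if e == 0xFF:
        return "infinity" if f == 0 else "NaN"
    return "normalized"

for sample in (0.0, 1e-40, 0.75, float("inf"), float("nan")):
    print(sample, classify_single(sample))
```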
