Introduction to IT
Lecture 3:
Representation of real numbers; FP2 codes; IEEE 754 standard.
Sign-Magnitude Representation
This code is used to represent signed integers, both positive and negative. In
the sign-magnitude representation, all bits of the number except the most
significant one have the same meaning as in the integer binary code
representation. The most significant bit is the sign bit: 0 indicates a positive
number, 1 indicates a negative number.
Bit layout: b(n-1) b(n-2) b(n-3) ... b2 b1 b0
b(n-1) is the sign bit; b(n-2) ... b0 hold the number magnitude.
Sign bit   Number magnitude
1          011
0          011
Examples:
1101(SM) = -5(10)
0101(SM) = +5(10)
00110110(SM) = (-1)^0 * (0*2^6 + 1*2^5 + 1*2^4 + 0*2^3 + 1*2^2 + 1*2^1 + 0*2^0) = 54(10)
10110111(SM) = (-1)^1 * (2^5 + 2^4 + 2^2 + 2^1 + 2^0) = -(32 + 16 + 4 + 2 + 1) = -55(10)
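The decoding rule above can be sketched in a few lines of Python. The function name `sm_to_int` is a hypothetical helper introduced only for illustration; it assumes the input is a string of at least two binary digits.

```python
def sm_to_int(bits: str) -> int:
    """Decode a sign-magnitude bit string such as '1101' into an integer."""
    sign = -1 if bits[0] == "1" else 1   # the most significant bit is the sign
    magnitude = int(bits[1:], 2)         # the remaining bits are plain binary
    return sign * magnitude


print(sm_to_int("1101"))      # -5
print(sm_to_int("00110110"))  # 54
print(sm_to_int("10110111"))  # -55
```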
Although simple, the SM representation complicates arithmetic operations. In
particular, the sign bit has to be dealt with separately from the magnitude bits.
Example: addition of +18 (010010) and -19 (110011) using SM representation.
The signs are different, so the result carries the sign of the operand with the larger
magnitude, here -19.
The smaller 5-bit magnitude is subtracted from the larger one: 10011 - 10010 = 00001.
With the sign bit restored, the result is 100001(SM) = -1(10).
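The separate handling of sign and magnitude can be made explicit in code. The following is a minimal sketch, assuming both operands have the same width and that the result fits in that width (overflow is not handled); `sm_add` is a hypothetical name.

```python
def sm_add(a: str, b: str) -> str:
    """Add two sign-magnitude bit strings of equal width (no overflow check)."""
    width = len(a)
    sign_a, mag_a = a[0], int(a[1:], 2)
    sign_b, mag_b = b[0], int(b[1:], 2)
    if sign_a == sign_b:
        # Same sign: add the magnitudes and keep the common sign.
        sign, mag = sign_a, mag_a + mag_b
    elif mag_a >= mag_b:
        # Different signs: subtract the smaller magnitude from the larger
        # and take the sign of the operand with the larger magnitude.
        sign, mag = sign_a, mag_a - mag_b
    else:
        sign, mag = sign_b, mag_b - mag_a
    if mag == 0:
        sign = "0"  # store zero as +0 (an assumption made for this sketch)
    return sign + format(mag, f"0{width - 1}b")


print(sm_add("010010", "110011"))  # 100001, that is -1
```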
The range of SM numbers:
4 bits: from -7 = 1111(SM) to +7 = 0111(SM)
8 bits: from -127 = 1 111 1111(SM) to +127 = 0 111 1111(SM)
Radix Complement
A positive number is represented the same way as in SM.
A negative number is represented using the b’s complement (for base-b numbers); for
binary numbers this is the 2’s complement.
The most significant bit has a weight of -2^(n-1), where n is the number of bits in the
number notation.
2’s complement of (-19):
1 - write +19 in binary: 010011
2 - complement (negate) each digit: 101100
3 - add "1" to the least significant bit: 101100 + 1 = 101101
2’s complement of (+18):
It is positive, so it is the same as in SM: 010010
Addition of these numbers:
  101101
+ 010010
= 111111
This is the 2’s complement representation of (-1):
1*(-2^5) + 1*2^4 + 1*2^3 + 1*2^2 + 1*2^1 + 1*2^0 = -32 + 16 + 8 + 4 + 2 + 1 = -1(10)
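As an illustration, the three encoding steps and the weighted-sum decoding can be sketched as below. The names `twos_complement` and `from_twos_complement` are hypothetical; the sketch assumes n >= 2 and that the value lies in the range -2^(n-1) to 2^(n-1) - 1.

```python
def twos_complement(value: int, n: int) -> str:
    """Encode value as an n-bit 2's complement bit string."""
    mask = (1 << n) - 1
    if value >= 0:
        return format(value, f"0{n}b")      # same as sign-magnitude
    inverted = (-value) ^ mask              # step 2: complement every bit
    return format((inverted + 1) & mask, f"0{n}b")  # step 3: add 1


def from_twos_complement(bits: str) -> int:
    """Decode by giving the most significant bit the weight -2^(n-1)."""
    n = len(bits)
    return -int(bits[0]) * 2 ** (n - 1) + int(bits[1:], 2)


print(twos_complement(-19, 6))         # 101101
print(twos_complement(18, 6))          # 010010
print(from_twos_complement("111111"))  # -1
```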
Advantage:
No special treatment is needed for the sign of the numbers.
A carry coming out of the most significant bit while performing arithmetic
operations is ignored without affecting the correctness of the result, provided the
true result fits in the available n bits.
Adding (-19) = 101101 and (+26) = 011010:
  101101
+ 011010
= 000111 with a carry bit (1), which is ignored
Result: 000111 = +7(10)
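Discarding the carry amounts to keeping only the lowest n bits of the sum. A minimal sketch, assuming equal-width operands (`add_twos_complement` is a hypothetical name, and overflow of the true result is not detected):

```python
def add_twos_complement(a: str, b: str):
    """Add two n-bit 2's complement strings; return (n-bit sum, discarded carry)."""
    n = len(a)
    total = int(a, 2) + int(b, 2)    # ordinary unsigned binary addition
    carry = total >> n               # carry out of the most significant bit
    result = total & ((1 << n) - 1)  # keep only the lowest n bits
    return format(result, f"0{n}b"), carry


print(add_twos_complement("101101", "011010"))  # ('000111', 1)
print(add_twos_complement("101101", "010010"))  # ('111111', 0)
```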
Diminished Radix Complement
A negative number is obtained by complementing each digit; no "1" is added to the
least significant bit afterwards.
(-19): 101100
(+18): 010010
The most significant bit has a weight of -(2^(n-1) - 1) = -2^(n-1) + 1, where n is the
number of bits in the number notation.
Addition result: 111110 = 1*(-2^5 + 1) + 1*2^4 + 1*2^3 + 1*2^2 + 1*2^1 + 0*2^0
= -31 + 16 + 8 + 4 + 2 = -1(10)
Disadvantage:
A correction is needed whenever a carry is obtained from the most significant bit
while performing arithmetic operations: the carry must be added back to the least
significant bit (end-around carry).
Adding -3 (111100) to +18 (010010):
Result: (1) 001110; adding the carry to the least significant bit gives 001111,
which is +15, the correct result.
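The correction step can be sketched as follows, assuming equal-width operands in the diminished radix (1's) complement code; `add_ones_complement` is a hypothetical name.

```python
def add_ones_complement(a: str, b: str) -> str:
    """Add two n-bit 1's complement strings using the end-around carry."""
    n = len(a)
    mask = (1 << n) - 1
    total = int(a, 2) + int(b, 2)
    if total > mask:                # a carry came out of the most significant bit
        total = (total & mask) + 1  # add it back to the least significant bit
    return format(total, f"0{n}b")


print(add_ones_complement("111100", "010010"))  # 001111, that is +15
print(add_ones_complement("101100", "010010"))  # 111110, that is -1
```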
Fixed-point numbering system
A fixed position of the radix point separates the integer part from the
fractional part of a number.
Weight           10^3  10^2  10^1  10^0  ,  10^-1  10^-2  10^-3  10^-4  10^-5
Digits           2     5     6     8     ,  8      3      9      5      4
Position number  3     2     1     0        -1     -2     -3     -4     -5
                 Integer part               Fractional part
Conversion of a fractional number from binary to decimal:
- multiply the digits of the number by powers of two. In the fractional part,
the multipliers are the negative powers of 2.
- calculating the value of a fixed-point number according to this basic
procedure requires summing fractions. There is, however, a
simplification: treat the fractional part as an integer and multiply it by
the weight of the last digit of the fixed-point notation.
101011,01101(2)
101011 = 1*2^5 + 0*2^4 + 1*2^3 + 0*2^2 + 1*2^1 + 1*2^0 = 32 + 8 + 2 + 1 = 43
01101 = 1*2^3 + 1*2^2 + 1*2^0 = 8 + 4 + 1 = 13
13 * 2^-5 = 13 * 1/32 = 13/32
101011,01101(2) = 43 + 13/32 = 43,40625(10)
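The simplification (read the fractional digits as an integer, then scale by the weight of the last digit) can be sketched with exact rational arithmetic. `fixed_to_fraction` is a hypothetical helper; it assumes the comma is used as the radix point, as in the slides.

```python
from fractions import Fraction


def fixed_to_fraction(text: str) -> Fraction:
    """Convert a binary fixed-point string such as '101011,01101' exactly."""
    integer_bits, _, fraction_bits = text.partition(",")
    integer_part = int(integer_bits, 2) if integer_bits else 0
    if not fraction_bits:
        return Fraction(integer_part)
    # Treat the fractional digits as an integer and multiply by the weight
    # of the last digit, 2^-k, where k is the number of fractional digits.
    return integer_part + Fraction(int(fraction_bits, 2), 2 ** len(fraction_bits))


value = fixed_to_fraction("101011,01101")
print(value, float(value))  # 1389/32 43.40625
```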
Conversion of a fractional number from decimal to binary
When converting a fractional number from decimal to binary, we repeatedly multiply the
fractional part by 2, retaining the integer part obtained at each step as the next
required digit, until the fractional part becomes zero.
The solution is a number made up of its integer and fractional parts.
It should be noted, however, that the fractional part conversion may not
terminate after a finite number of repeated multiplications. Therefore the process
may have to be terminated after a number of steps, thus leading to some
acceptable approximation.
4,1875(10)
Integer part: 4(10) = 100(2)
Fractional part:
Number   Multiplication   Result   Digit
0,1875   0,1875 * 2       0,375    0
0,375    0,375 * 2        0,75     0
0,75     0,75 * 2         1,5      1
0,5      0,5 * 2          1,0      1
0        (fractional part is zero, stop)
Result: 4,1875(10) = 100,0011(2)
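7,575(10)
Integer part: 7(10) = 111(2)
Fractional part:
Number   Multiplication   Result   Digit
0,575    0,575 * 2        1,150    1
0,150    0,150 * 2        0,300    0
0,300    0,300 * 2        0,600    0
0,600    0,600 * 2        1,200    1
0,200    0,200 * 2        0,400    0
0,400    0,400 * 2        0,800    0
0,800    0,800 * 2        1,600    1
0,600    0,600 * 2        1,200    1
...
Result: 7,575(10) = 111,10010011...(2). The fractional part 0,6 recurs, so the
conversion never terminates.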
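The repeated-multiplication procedure, including the cut-off for non-terminating fractions, can be sketched as below. `fraction_to_binary` and its `max_digits` parameter are hypothetical; the sketch uses `decimal.Decimal` so that inputs such as 0.575 are handled exactly, and it expects a period as the decimal separator.

```python
from decimal import Decimal


def fraction_to_binary(fraction: str, max_digits: int = 8) -> str:
    """Return up to max_digits binary digits of a decimal fraction in [0, 1)."""
    value = Decimal(fraction)
    digits = []
    while value != 0 and len(digits) < max_digits:
        value *= 2
        digit = int(value)         # the integer part is the next binary digit
        digits.append(str(digit))
        value -= digit             # continue with the fractional part only
    return "".join(digits)


print(fraction_to_binary("0.1875"))  # 0011
print(fraction_to_binary("0.575"))   # 10010011
```

Stopping after `max_digits` steps corresponds to accepting an approximation when the fraction does not terminate.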
Floating Point Numbers
Floating-point representation expresses real numbers in scientific notation.
Scientific notation represents numbers as a base number and an exponent. For
example, 123.456 could be represented as 1.23456 × 10^2.
Floating-point solves a number of representation problems. Fixed-point has a fixed
window of representation, which limits it from representing very large or very
small numbers. Also, fixed-point is prone to a loss of precision when two large
numbers are divided.
Floating-point, on the other hand, employs a sort of "sliding window" of precision
appropriate to the scale of the number. This allows it to represent numbers from
1,000,000,000,000 to 0.0000000000000001 with ease.
IEEE floating point numbers have three basic components:
- the sign,
- the exponent, and
- the mantissa. The mantissa is composed of the fraction and an implicit leading
digit. The exponent base (2) is implicit and need not be stored.
Layout (field width in bits, bit positions in brackets):
                   Sign     Exponent     Fraction     Bias
Single Precision   1 [31]   8 [30-23]    23 [22-00]   127
Double Precision   1 [63]   11 [62-52]   52 [51-00]   1023
The Sign Bit
The sign bit: 0 denotes a positive number; 1 denotes a negative number. Flipping the
value of this bit flips the sign of the number.
The Exponent
The exponent field needs to represent both positive and negative exponents. To do
this, a bias is added to the actual exponent in order to get the stored exponent. For
IEEE single-precision floats, this value is 127. Thus, an exponent of zero means that
127 is stored in the exponent field. A stored value of 200 indicates an exponent of
(200-127), or 73. Exponents of -127 (stored field all 0s) and +128 (stored field all 1s)
are reserved for special numbers: zero and denormalized values in the first case,
infinities and NaN in the second.
For double precision, the exponent field is 11 bits, and has a bias of 1023.
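To see the bias in practice, the three single-precision fields can be pulled out of a value with Python's standard `struct` module. This is only a sketch; `float32_fields` is a hypothetical helper, and the input is rounded to single precision when it is packed.

```python
import struct


def float32_fields(x: float):
    """Return (sign, stored exponent, fraction) of x as an IEEE 754 single."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))  # 32-bit pattern
    sign = word >> 31                     # bit 31
    stored_exponent = (word >> 23) & 0xFF  # bits 30-23, includes the bias
    fraction = word & 0x7FFFFF            # bits 22-0
    return sign, stored_exponent, fraction


sign, stored, fraction = float32_fields(1.0)
print(sign, stored, stored - 127, fraction)  # 0 127 0 0

sign, stored, fraction = float32_fields(2.0 ** 73)
print(stored, stored - 127)                  # 200 73
```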
The Mantissa
The mantissa, also known as the significand, represents the precision bits of
the number. It is composed of an implicit leading bit and the fraction bits.
To find out the value of the implicit leading bit, consider that any number can
be expressed in scientific notation in many different ways. For example, the
number five can be represented as any of these:
5.00 × 10^0
0.05 × 10^2
5000 × 10^-3
In order to maximize the quantity of representable numbers, floating-point
numbers are typically stored in normalized form. This basically puts the radix
point after the first non-zero digit. In normalized form, five is represented as
5.0 × 10^0.
A nice little optimization is available to us in base two, since the only possible
non-zero digit is 1. Thus, we can just assume a leading digit of 1, and don't
need to represent it explicitly. As a result, the mantissa has effectively 24 bits
of resolution, by way of 23 fraction bits.
The Conversion Procedure (Dec to FP)
The rules for converting a decimal number into floating point are as follows:
1. Convert the absolute value of the number to binary, perhaps with a fractional part
after the binary point. This can be done by converting the integral and fractional
parts separately. The integral part is converted with the techniques examined
previously. The fractional part can be converted by multiplication. This is basically
the inverse of the division method: we repeatedly multiply by 2, and harvest each
one bit as it appears left of the binary point.
2. Append × 2^0 to the end of the binary number (which does not change its value).
3. Normalize the number. Move the binary point so that it is one bit from the left.
Adjust the exponent of two so that the value does not change.
4. Place the mantissa into the mantissa field of the number. Omit the leading one,
and fill with zeros on the right.
5. Add the bias to the exponent of two, and place it in the exponent field. For IEEE
32-bit, the bias is 127.
6. Set the sign bit, 1 for negative, 0 for positive, according to the sign of the original
number.
Example 1
Convert -1313.3125 to IEEE 32-bit floating point format.
a. The integral part is 1313(10) = 10100100001(2). The fractional part:
0.3125 × 2 = 0.625   0   Generate 0 and continue.
0.625 × 2 = 1.25     1   Generate 1 and continue with the rest.
0.25 × 2 = 0.5       0   Generate 0 and continue.
0.5 × 2 = 1.0        1   Generate 1 and nothing remains.
b. So 1313.3125(10) = 10100100001.0101(2).
c. Normalize: 10100100001.0101(2) = 1.01001000010101(2) × 2^10.
d. Mantissa is 01001000010101000000000, exponent is 10 + 127 = 137 =
10001001(2), sign bit is 1.
So -1313.3125 is 1 10001001 01001000010101000000000 = c4a42a00(16)
Example 2
Convert 0.1015625 to IEEE 32-bit floating point format.
a. Converting:
0.1015625 × 2 = 0.203125   0   Generate 0 and continue.
0.203125 × 2 = 0.40625     0   Generate 0 and continue.
0.40625 × 2 = 0.8125       0   Generate 0 and continue.
0.8125 × 2 = 1.625         1   Generate 1 and continue with the rest.
0.625 × 2 = 1.25           1   Generate 1 and continue with the rest.
0.25 × 2 = 0.5             0   Generate 0 and continue.
0.5 × 2 = 1.0              1   Generate 1 and nothing remains.
b. So 0.1015625(10) = 0.0001101(2).
c. Normalize: 0.0001101(2) = 1.101(2) × 2^-4.
d. Mantissa is 10100000000000000000000, exponent is -4 + 127 = 123 =
01111011(2), sign bit is 0.
So 0.1015625 is 0 01111011 10100000000000000000000 = 3dd00000(16)
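The conversion procedure used in Examples 1 and 2 can be sketched in Python and cross-checked against the standard `struct` module. `to_float32_hex` is a hypothetical helper; it assumes a non-zero input that is exactly representable as a normalized single-precision number, so no rounding is implemented.

```python
import struct
from fractions import Fraction


def to_float32_hex(text: str) -> str:
    """Encode a decimal string as IEEE 754 single precision (exact values only)."""
    value = Fraction(text)
    sign = 1 if value < 0 else 0
    value = abs(value)
    exponent = 0
    while value >= 2:          # normalize: move the binary point to the left
        value /= 2
        exponent += 1
    while value < 1:           # normalize: move the binary point to the right
        value *= 2
        exponent -= 1
    fraction = (value - 1) * 2 ** 23   # drop the leading one, keep 23 bits
    if fraction.denominator != 1:
        raise ValueError("value needs rounding, which this sketch omits")
    word = (sign << 31) | ((exponent + 127) << 23) | int(fraction)
    return format(word, "08x")


print(to_float32_hex("-1313.3125"))            # c4a42a00
print(to_float32_hex("0.1015625"))             # 3dd00000
print(struct.pack(">f", -1313.3125).hex())     # c4a42a00, library cross-check
```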
The Conversion Procedure (FP to Dec)
The rules for converting a floating point number into decimal are simply the reverse of the
decimal to floating point conversion:
1. If the original number is in hex, convert it to binary.
2. Separate into the sign, exponent, and mantissa fields.
3. Extract the mantissa from the mantissa field, and restore the leading one. You may
also omit the trailing zeros.
4. Extract the exponent from the exponent field, and subtract the bias to recover the
actual exponent of two. As before, the bias is 127 for the 32-bit format.
5. De-normalize the number: move the binary point so the exponent is 0, and the value
of the number remains unchanged.
6. Convert the binary value to decimal. This is done just as with binary integers, but the
place values right of the binary point are fractions.
7. Set the sign of the decimal number according to the sign bit of the original floating
point number: make it negative for 1; leave positive for 0.
If the binary exponent is very large or small, you can convert the mantissa directly to
decimal without de-normalizing. Then use a calculator to raise two to the exponent,
and perform the multiplication. This will give an approximate answer, but is sufficient
in most cases.
Example 3
Convert the 32-bit floating point number 44361000 (in hex) to decimal.
1. Convert and separate: 44361000(16) = 0 10001000 01101100001000000000000 (2)
2. Exponent: 10001000(2) = 136(10); 136 − 127 = 9.
3. Denormalize: 1.01101100001(2) × 2^9 = 1011011000.01(2).
4. Convert:
Exponents     2^9  2^8  2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0  .  2^-1  2^-2
Place Values  512  256  128  64   32   16   8    4    2    1    .  0.5   0.25
Bits          1    0    1    1    0    1    1    0    0    0    .  0     1
Value 512 + 128 + 64 + 16 + 8 + 0.25 = 728.25
5. Sign: positive
Result: 44361000 is 728.25.
Example 4
Convert the 32-bit floating point number be580000 (in hex) to decimal.
1. Convert and separate: be580000(16) = 1 01111100 10110000000000000000000 (2)
2. Exponent: 01111100(2) = 124(10); 124 − 127 = -3.
3. Denormalize: 1.1011(2) × 2^-3 = 0.0011011(2).
4. Convert:
Exponents     2^0  .  2^-1  2^-2  2^-3   2^-4    2^-5     2^-6      2^-7
Place Values  1    .  0.5   0.25  0.125  0.0625  0.03125  0.015625  0.0078125
Bits          0    .  0     0     1      1       0        1         1
Value 0.125 + 0.0625 + 0.015625 + 0.0078125 = 0.2109375
5. Sign: negative
Result: be580000 is -0.2109375.
Example 5
Convert the 32-bit floating point number 76650000 (in hex) to decimal.
1. Convert and separate: 76650000(16) = 0 11101100 11001010000000000000000 (2)
2. Exponent: 11101100(2) = 236(10); 236 − 127 = 109.
3. Since the exponent is far from zero, convert the original (normalized) mantissa:
Exponents     2^0  .  2^-1  2^-2  2^-3   2^-4    2^-5     2^-6      2^-7
Place Values  1    .  0.5   0.25  0.125  0.0625  0.03125  0.015625  0.0078125
Bits          1    .  1     1     0      0       1        0         1
Value 1 + 0.5 + 0.25 + 0.03125 + 0.0078125 = 1.7890625
4. Use a calculator to find 1.7890625 × 2^109. You should get something like
1.16116794981 × 10^33.
5. Sign: positive
Result: 76650000 is about 1.16116794981 × 10^33.
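Examples 3 to 5 can be reproduced with a short decoder that follows the same steps. `decode_float32` is a hypothetical helper; it assumes a normalized number, so the reserved exponent fields (all 0s and all 1s) are not handled.

```python
import struct


def decode_float32(hex_word: str) -> float:
    """Decode a normalized IEEE 754 single given as 8 hex digits."""
    word = int(hex_word, 16)
    sign = -1 if word >> 31 else 1               # bit 31
    exponent = ((word >> 23) & 0xFF) - 127       # remove the bias
    mantissa = 1 + (word & 0x7FFFFF) / 2 ** 23   # restore the leading one
    return sign * mantissa * 2.0 ** exponent


print(decode_float32("44361000"))  # 728.25
print(decode_float32("be580000"))  # -0.2109375
# Cross-check against the library decoding of the same bit pattern.
print(decode_float32("76650000") == struct.unpack(">f", bytes.fromhex("76650000"))[0])  # True
```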
Putting it All Together
So, to sum up:
The sign bit is 0 for positive, 1 for negative.
The exponent's base is two.
The exponent field contains 127 plus the true exponent for
single-precision, or 1023 plus the true exponent for double
precision.
The mantissa is typically assumed to have the form 1.f, where
f is the field of fraction bits; the leading 1 is implicit and is not stored.