Lecture 4

Uploaded by

jam khan
Copyright
© All Rights Reserved
Representing and Manipulating Information
Floating-Point Number Representation
 A floating-point number (or real number) can represent a very large value (e.g., 1.23×10^88)
 or a very small value (e.g., 1.23×10^-88).
 It can also represent a very large negative number (e.g., -1.23×10^88)
 and a negative number very close to zero (e.g., -1.23×10^-88), as well as zero itself, as illustrated:
Floating-Point Number Representation
 A floating-point number is typically expressed in scientific notation,
 with a fraction (F) and an exponent (E) of a certain radix (r), in the form F×r^E.
 Decimal numbers use a radix of 10 (F×10^E),
 while binary numbers use a radix of 2 (F×2^E).
 The representation of a floating-point number is not unique.
 For example, the number 55.66 can be represented as 5.566×10^1, 0.5566×10^2, 0.05566×10^3, and so on.
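The non-uniqueness can be checked with a minimal sketch. It uses Python's `decimal` module so that the comparison is exact; the variable names are illustrative only.

```python
from decimal import Decimal

# Three ways of writing the same value with different fractions and exponents.
forms = [Decimal("5.566E1"), Decimal("0.5566E2"), Decimal("0.05566E3")]

# Decimal compares by numeric value, so every form should equal 55.66.
all_equal = all(f == Decimal("55.66") for f in forms)
print(all_equal)  # True
```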
Floating-Point Number Representation
 The fractional part can be normalized.
 In the normalized form, there is exactly one non-zero digit before the radix point.
 For example, the decimal number 123.4567 is normalized as 1.234567×10^2;
 the binary number 1010.1011B is normalized as 1.0101011B×2^3.
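As a rough sketch of binary normalization, the helper below (a hypothetical name, `normalize_binary`) uses `math.frexp` from the Python standard library and shifts its result by one bit to obtain the 1.xxx form used on this slide.

```python
import math

def normalize_binary(x):
    """Return (m, e) with x == m * 2**e and 1 <= |m| < 2 (x must be non-zero)."""
    m, e = math.frexp(x)   # frexp returns 0.5 <= |m| < 1
    return m * 2, e - 1    # shift by one bit to get a leading 1

# 1010.1011B is the 8-bit integer 10101011B divided by 2^4, i.e. 10.6875.
value = int("10101011", 2) / 2**4
m, e = normalize_binary(value)
print(m, e)  # 1.3359375 3
```

Here 1.3359375 is the decimal value of 1.0101011B, so the result matches the normalized form 1.0101011B×2^3.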
Floating-Point Number Representation
 It is important to note that floating-point numbers suffer from loss of precision
 when represented with a fixed number of bits (e.g., 32-bit or 64-bit).
 This is because there are infinitely many real numbers (even within a small range such as 0.0 to 0.1).
 On the other hand, an n-bit binary pattern can represent at most 2^n distinct numbers.
 Hence, not all real numbers can be represented.
 The nearest approximation is used instead, resulting in a loss of accuracy.
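The loss of precision can be observed directly. The sketch below assumes Python's built-in `float` (a 64-bit double on common platforms) and uses `fractions.Fraction` to expose the exact value that is actually stored for 0.1.

```python
from fractions import Fraction

# 0.1 has no finite binary expansion, so only the nearest double is stored.
stored = Fraction(0.1)             # exact value of the double nearest to 0.1
error = stored - Fraction(1, 10)   # non-zero approximation error

print(0.1 + 0.2 == 0.3)  # False
print(error != 0)        # True
```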
Floating-Point Number Representation
 Floating-point arithmetic is generally slower than integer arithmetic, particularly on hardware without a floating-point unit.
 It can be sped up with a dedicated floating-point co-processor.
 Hence, use integers if your application does not require floating-point numbers.
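One illustration of this advice, under the assumption of a hypothetical application that adds up ten payments of 0.10 currency units: storing the amounts as integer cents keeps the total exact, while repeated floating-point addition accumulates rounding error.

```python
# Hypothetical example: ten payments of 0.10 currency units each.
float_total = 0.0
cent_total = 0
for _ in range(10):
    float_total += 0.1   # each addition rounds to the nearest double
    cent_total += 10     # integer cents, always exact

print(float_total == 1.0)  # False
print(cent_total == 100)   # True
```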
Floating-Point Number Representation
 In computers, floating-point numbers are represented in scientific notation with a fraction (F) and an exponent (E) of radix 2, in the form F×2^E.
 Both E and F can be positive as well as negative.
 Modern computers adopt the IEEE 754 standard for representing floating-point numbers.
 Its two most commonly used representation schemes are 32-bit single-precision and 64-bit double-precision.
IEEE-754 32-bit Single-Precision Floating-Point Numbers
 In the 32-bit single-precision floating-point representation:
 The most significant bit is the sign bit (S),
 with 0 for positive numbers and 1 for negative numbers.
 The following 8 bits represent the exponent (E).
 The remaining 23 bits represent the fraction (F).
Normalized Form
 Let's illustrate with an example. Suppose that the 32-bit pattern is
 1 1000 0001 011 0000 0000 0000 0000 0000, with:
 S=1
 E = 1000 0001
 F = 011 0000 0000 0000 0000 0000
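The split into the three fields can be sketched with shifts and masks. The variable names below are illustrative; the bit string is the example pattern from this slide.

```python
# Example pattern: sign bit, 8 exponent bits, 23 fraction bits.
bits = "1" + "10000001" + "011" + "0" * 20
word = int(bits, 2)

sign = word >> 31                # most significant bit
exponent = (word >> 23) & 0xFF   # next 8 bits
fraction = word & 0x7FFFFF       # remaining 23 bits

print(sign, bin(exponent), hex(fraction))  # 1 0b10000001 0x300000
```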
Normalized Form
 In the normalized form, the actual fraction is normalized with an implicit leading 1, in the form 1.F.
 In this example, the actual fraction is 1.011 0000 0000 0000 0000 0000 = 1 + 1×2^-2 + 1×2^-3 = 1.375D.
 The sign bit represents the sign of the number,
 with S=0 for a positive and S=1 for a negative number.
 In this example with S=1, this is a negative number, i.e., -1.375D.
Normalized Form
 The exponent field is interpreted as representing a signed integer in biased form.
 That is, the exponent value is E = e − Bias,
 where e is the unsigned number having the k-bit representation e_(k−1) . . . e_1 e_0
 and Bias is a bias value equal to 2^(k−1) − 1 (127 for single precision, with k = 8).
 For normalized single-precision numbers (1 ≤ e ≤ 254), this yields exponents ranging from −126 to +127.
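The bias and the resulting exponent range follow directly from the field width k. A minimal sketch, with a hypothetical helper name `exponent_range`:

```python
def exponent_range(k):
    """Return (bias, smallest E, largest E) for normalized numbers with a k-bit exponent field."""
    bias = 2 ** (k - 1) - 1
    e_min, e_max = 1, 2 ** k - 2   # all-zeros and all-ones fields are reserved
    return bias, e_min - bias, e_max - bias

print(exponent_range(8))   # (127, -126, 127)
print(exponent_range(11))  # (1023, -1022, 1023)
```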
Normalized Form
 Why is the exponent value for denormalized numbers set to 1 − Bias rather than simply −Bias?
 Because the smallest normalized exponent is also 1 − Bias,
 this choice provides a smooth transition from denormalized to normalized values.
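The smooth transition can be checked numerically for single precision. The sketch below relies on Python's `struct` module to reinterpret 32-bit patterns; `bits_to_float32` is a helper name introduced here, and the `-Bias` variant is a hypothetical alternative computed only for comparison.

```python
import struct

def bits_to_float32(word):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

largest_denorm = bits_to_float32(0x007FFFFF)   # exponent field 0, fraction all ones
smallest_norm = bits_to_float32(0x00800000)    # exponent field 1, fraction all zeros
smallest_denorm = bits_to_float32(0x00000001)  # one step above zero

# With E = 1 - Bias, the gap at the boundary equals the ordinary denormalized step.
gap = smallest_norm - largest_denorm
print(gap == smallest_denorm)  # True

# Hypothetical alternative: if denormalized numbers used E = -Bias = -127,
# the largest of them would leave a far bigger gap below the smallest normalized value.
hypothetical_largest = (1 - 2.0**-23) * 2.0**-127
print(smallest_norm - hypothetical_largest > gap)  # True
```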
Normalized Form
 In this example, E = e − 127 = 129 − 127 = 2D.
 Hence, the number represented is -1.375×2^2 = -5.5D.
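The whole calculation for the example pattern can be sketched as follows. `decode_normalized32` is a hypothetical helper that handles only the normalized case; the result is cross-checked against Python's `struct` module.

```python
import struct

def decode_normalized32(word):
    """Decode a normalized single-precision pattern as (-1)^S x 1.F x 2^(e-127)."""
    s = word >> 31
    e = (word >> 23) & 0xFF
    f = word & 0x7FFFFF
    if not 1 <= e <= 254:
        raise ValueError("not a normalized pattern")
    return (-1) ** s * (1 + f / 2**23) * 2.0 ** (e - 127)

word = 0xC0B00000   # 1 1000 0001 011 0000 0000 0000 0000 0000
value = decode_normalized32(word)
reference = struct.unpack(">f", struct.pack(">I", word))[0]
print(value, reference)  # -5.5 -5.5
```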
De-Normalized Form
 The normalized form has a serious problem: with an implicit leading 1 for the fraction,
 it cannot represent the number zero.
 When the exponent field is all zeros, the represented number is in denormalized form.
 In this case, the exponent value is E = 1 − Bias.
 The actual fraction is 0.F, the value of the fraction field without an implied leading 1.
De-Normalized Form
 Denormalized numbers serve two purposes.
 First, they provide a way to represent the numeric value 0,
 since a normalized number always has an actual fraction 1.F ≥ 1, and hence cannot represent 0.
 In fact, the floating-point representation of +0.0 has a bit pattern of all zeros: the sign bit is 0,
 the exponent field is all zeros (indicating a denormalized value), and the fraction field is all zeros, giving an actual fraction of 0.
De-Normalized Form
 When the sign bit is 1 but the other fields are all zeros, we get the value −0.0.
 A second function of denormalized numbers is to represent numbers that are very close to 0.0.
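The two zeros can be inspected with a short sketch, assuming Python's `struct` module to obtain the single-precision bit patterns.

```python
import math
import struct

pos_zero = struct.pack(">f", 0.0)    # all 32 bits are zero
neg_zero = struct.pack(">f", -0.0)   # only the sign bit is set

print(pos_zero.hex(), neg_zero.hex())  # 00000000 80000000
print(0.0 == -0.0)                     # True: the two zeros compare equal
print(math.copysign(1.0, -0.0))        # -1.0: the sign bit is still observable
```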
De-Normalized Form
 We can also represent very small positive and negative numbers in denormalized form, with an exponent field of E=0.
 For example, let S=1, E=0, and F=011 0000 0000 0000 0000 0000.
 The actual fraction is 0.011 = 1×2^-2 + 1×2^-3 = 0.375D.
 Since S=1, it is a negative number.
 With an exponent field of E=0, the actual exponent is 1 − 127 = -126.
 Hence the number is -0.375×2^-126 ≈ -4.4×10^-39,
 which is an extremely small negative number (close to zero).
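The denormalized example can be reproduced with a sketch like the one below. `decode_denormalized32` is a hypothetical helper for the exponent-field-zero case only, checked against `struct`.

```python
import struct

def decode_denormalized32(word):
    """Decode a single-precision pattern with exponent field 0 as (-1)^S x 0.F x 2^-126."""
    s = word >> 31
    e = (word >> 23) & 0xFF
    f = word & 0x7FFFFF
    if e != 0:
        raise ValueError("not a denormalized pattern")
    sign = -1.0 if s else 1.0
    return sign * (f / 2**23) * 2.0 ** -126

word = 0x80300000   # S=1, E=0000 0000, F=011 0000 0000 0000 0000 0000
value = decode_denormalized32(word)
reference = struct.unpack(">f", struct.pack(">I", word))[0]
print(value == reference)  # True
print(f"{value:.1e}")      # -4.4e-39
```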
Special Values
 A final category of values occurs when the exponent field is all ones.
 When the fraction field is all zeros, the resulting values represent infinity,
 either +∞ when S = 0 or −∞ when S = 1.
 Infinity can represent results that overflow, as when we multiply two very
large numbers, or when we divide by zero.
 When the fraction field is nonzero, the resulting value is called a NaN,
short for “not a number.”
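A small sketch of the special values, again assuming `struct` for reinterpreting bit patterns; `bits_to_float32` is a helper name introduced here. Note that Python raises `ZeroDivisionError` for float division by zero instead of returning infinity, so overflow is demonstrated with a multiplication.

```python
import math
import struct

def bits_to_float32(word):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

pos_inf = bits_to_float32(0x7F800000)   # S=0, exponent all ones, fraction zero
neg_inf = bits_to_float32(0xFF800000)   # S=1, exponent all ones, fraction zero
nan = bits_to_float32(0x7FC00000)       # exponent all ones, fraction non-zero

print(pos_inf, neg_inf)   # inf -inf
print(math.isnan(nan))    # True
print(nan == nan)         # False: a NaN is not equal to anything, itself included

overflow = 1e308 * 10     # product of two large doubles overflows
print(overflow)           # inf
```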
IEEE-754 64-bit Double-Precision Floating-Point Numbers
 The representation scheme for 64-bit double-precision is similar to that for 32-bit single-precision:
 The most significant bit is the sign bit (S), with 0 for positive numbers and 1 for negative numbers.
 The following 11 bits represent the exponent (E).
 The remaining 52 bits represent the fraction (F).
IEEE-754 64-bit Double-Precision Floating-Point Numbers
 The value (N) is calculated from the exponent field E as follows:
 Normalized form: For 1 ≤ E ≤ 2046, N = (-1)^S × 1.F × 2^(E-1023).
 Denormalized form: For E = 0, N = (-1)^S × 0.F × 2^(-1022).
 For E = 2047, N represents special values: ±INF (infinity) when F = 0, and NaN (not a number) otherwise.
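The three cases above can be put together in one sketch. `decode_double` and `double_bits` are hypothetical helper names; the decoder is compared with Python's own `float`, which is assumed to be an IEEE 754 double as it is on common platforms.

```python
import math
import struct

def decode_double(word):
    """Decode a 64-bit pattern using the normalized, denormalized and special-value rules."""
    s = word >> 63
    e = (word >> 52) & 0x7FF
    f = word & ((1 << 52) - 1)
    sign = -1.0 if s else 1.0
    if e == 0x7FF:                      # E = 2047: infinity or NaN
        return sign * math.inf if f == 0 else math.nan
    if e == 0:                          # denormalized: 0.F x 2^-1022
        return sign * (f / 2**52) * 2.0 ** -1022
    return sign * (1 + f / 2**52) * 2.0 ** (e - 1023)   # normalized: 1.F x 2^(E-1023)

def double_bits(x):
    """Return the 64-bit pattern of a Python float as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

for x in (-5.5, 0.1, 1e-310, 1.7976931348623157e308):
    print(x, decode_double(double_bits(x)) == x)   # each line ends in True
```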
