
Floating Point Numbers: Scientific Notation

The document discusses scientific notation and how it can be used to represent large and small numbers compactly using a mantissa and exponent. It then describes how floating point numbers are represented in computers using the IEEE 754 standard, which specifies the use of a sign bit, exponent field, and mantissa for float and double data types in Java. Round-off errors can occur when converting decimal numbers like 1/3 to their binary floating point representation.

Uploaded by

clarence
Copyright
© All Rights Reserved

Floating Point Numbers

Scientific Notation

Scientific notation lets us write very large and very small numbers compactly:
Avogadro's Number = A = 6.023 x 10^23 = 602,300,000,000,000,000,000,000 = M x B^E
Planck's constant = 6.626068 x 10^-34 = .0000000000000000000000000000000006626068 = M x B^E

where:
M = Mantissa
B = Base
E = Exponent

Notice that the representation isn't unique. For example:

A = 60.23 x 10^22

When we require exactly one nonzero digit to the left of the decimal point, this is called normalized scientific notation.

In general, any number can be written as a sum of powers of 10, where negative exponents are allowed:
6.023 = 6 x 10^0 + 0 x 10^-1 + 2 x 10^-2 + 3 x 10^-3
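
As a rough illustration of normalization, the following sketch splits a nonzero number into a mantissa M with one digit before the decimal point and a base 10 exponent E. The class and method names are made up for this example, and the logarithm-based approach is only an assumption about how one might do it; it can be off by one for inputs extremely close to an exact power of 10.

```java
class SciNotation {
    // Returns {M, E} with x = M x 10^E and 1 <= |M| < 10, for nonzero finite x.
    // Hypothetical helper: relies on Math.log10, so it is approximate near powers of 10.
    static double[] normalize(double x) {
        int exponent = (int) Math.floor(Math.log10(Math.abs(x)));
        double mantissa = x / Math.pow(10, exponent);
        return new double[] { mantissa, exponent };
    }

    public static void main(String[] args) {
        double[] a = normalize(60.23e22);      // the non-normalized form of A
        System.out.println(a[0] + " x 10^" + (int) a[1]);

        double[] h = normalize(6.626068e-34);  // Planck's constant
        System.out.println(h[0] + " x 10^" + (int) h[1]);
    }
}
```

Because the mantissa is itself computed in binary floating point, the printed digits may differ from 6.023 in the last few places.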

Base 2 Scientific Notation

Base 2 scientific notation follows the same pattern. Note that appending a 0 (shifting the digits left) multiplies by 2, and removing a trailing 0 (shifting right) divides by 2.

Example
[42]_2 = 101010.0000 = 1.0101 x 2^5
[21]_2 = 1.0101 x 2^4
[10.5]_2 = 1.0101 x 2^3
[5.25]_2 = 1.0101 x 2^2
[2.625]_2 = 1.0101 x 2^1
[1.3125]_2 = 1.0101 x 2^0
[0.65625]_2 = 1.0101 x 2^-1
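
The table can be reproduced in Java with a small sketch. Math.getExponent and Math.scalb are standard library methods; the class and helper names are invented for this example, and the helpers assume a normal (not zero, subnormal, infinite, or NaN) input.

```java
class BinaryNormalize {
    // Power-of-two exponent E such that 1 <= |x| / 2^E < 2.
    static int exponentOf(float x) {
        return Math.getExponent(x);
    }

    // Significand M in [1, 2), so that x = M x 2^E.
    static float significandOf(float x) {
        return Math.scalb(x, -Math.getExponent(x));
    }

    public static void main(String[] args) {
        float x = 42f;
        for (int i = 0; i < 7; i++) {
            System.out.println(x + " = " + significandOf(x) + " x 2^" + exponentOf(x));
            x /= 2; // dividing by 2 only lowers the exponent by one
        }
    }
}
```

Every row reports the same significand, 1.3125, which is 1.0101 in binary; only the exponent changes.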

Example
32 = 1.00000 x 2^5
16 = 1.00000 x 2^4
8 = 1.00000 x 2^3
4 = 1.00000 x 2^2
2 = 1.00000 x 2^1
1 = 1.00000 x 2^0
.5 = 1.00000 x 2^-1
.25 = 1.00000 x 2^-2

IEEE Standards

The Java virtual machine has two floating-point types: float and double.

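Java floats are represented using the 32-bit IEEE 754-1985 floating-point standard. From the most significant bit to the least significant, the 32 bits are:

sign, Exponent, Mantissa

Where:
sign = 1 bit = 0 (positive) or 1 (negative)
Exponent = 8-bit biased integer = actual exponent + 127
Mantissa = 23 bits, read as the binary fraction digits that follow the implicit leading "1."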

So the conversion formula is:

[F]_2 = (-1)^sign x 1.Mantissa x 2^(Exponent - 127)
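
To make the formula concrete, here is a minimal sketch that pulls the three fields out of a float with Float.floatToIntBits and then applies the formula to rebuild the value. The class and method names are hypothetical, and rebuild only handles normalized values (it ignores zero, subnormals, infinities, and NaN).

```java
class FloatDecode {
    static int signOf(float f)     { return Float.floatToIntBits(f) >>> 31; }
    static int exponentOf(float f) { return (Float.floatToIntBits(f) >>> 23) & 0xFF; }
    static int mantissaOf(float f) { return Float.floatToIntBits(f) & 0x7FFFFF; }

    // Applies (-1)^sign x 1.Mantissa x 2^(Exponent - 127) to the raw fields.
    static double rebuild(int sign, int exponent, int mantissa) {
        double significand = 1.0 + mantissa / (double) (1 << 23); // 1.Mantissa
        double magnitude = Math.scalb(significand, exponent - 127);
        return sign == 0 ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        float f = -42.0f;
        int s = signOf(f), e = exponentOf(f), m = mantissaOf(f);
        System.out.println("sign=" + s + " Exponent=" + e + " Mantissa=" + m);
        System.out.println("rebuilt value = " + rebuild(s, e, m));
    }
}
```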

Java doubles are a 64-bit version of this pattern: 1 sign bit, an 11-bit Exponent biased by 1023, and a 52-bit Mantissa.
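
The same field extraction works for doubles, assuming the IEEE 754 double layout of 1 sign bit, 11 exponent bits with a bias of 1023, and 52 mantissa bits. The sketch below uses Double.doubleToLongBits; the class and method names are invented for illustration.

```java
class DoubleFields {
    static int signField(double d) {
        return (int) (Double.doubleToLongBits(d) >>> 63);
    }

    // 11-bit biased exponent = actual exponent + 1023
    static int exponentField(double d) {
        return (int) ((Double.doubleToLongBits(d) >>> 52) & 0x7FF);
    }

    // Low 52 bits: the fraction digits after the implicit leading 1
    static long mantissaField(double d) {
        return Double.doubleToLongBits(d) & 0xFFFFFFFFFFFFFL;
    }

    public static void main(String[] args) {
        double d = -42.0;
        System.out.println("sign=" + signField(d)
                + " Exponent=" + exponentField(d)
                + " Mantissa=0x" + Long.toHexString(mantissaField(d)));
    }
}
```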

Examples
[.25]_2 = (-1)^0 x 1.00000000000000000000000 x 2^(125 - 127) =
0,01111101,00000000000000000000000
[-42.0]_2 = (-1)^1 x 1.01010000000000000000000 x 2^(132 - 127) =
1,10000100,01010000000000000000000
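
These bit patterns can be checked with a short sketch that prints a float in the same sign,Exponent,Mantissa layout. The class and method names are hypothetical; the only library calls are Float.floatToIntBits and Integer.toBinaryString.

```java
class FloatBitString {
    // Formats the 32 bits of f as "s,eeeeeeee,mmmmmmmmmmmmmmmmmmmmmmm".
    static String format(float f) {
        String raw = Integer.toBinaryString(Float.floatToIntBits(f));
        // toBinaryString drops leading zeros, so pad back out to 32 digits
        String bits = String.format("%32s", raw).replace(' ', '0');
        return bits.substring(0, 1) + "," + bits.substring(1, 9) + "," + bits.substring(9);
    }

    public static void main(String[] args) {
        System.out.println(format(0.25f));
        System.out.println(format(-42.0f));
    }
}
```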

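There are a number of special cases. For example:

[0.0]_2 = 0,00000000,00000000000000000000000
Float.POSITIVE_INFINITY = 0,11111111,00000000000000000000000
Float.NEGATIVE_INFINITY = 1,11111111,00000000000000000000000
Float.NaN = an Exponent of all 1s with a nonzero Mantissa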
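
A quick way to inspect the special values is to print their raw bits in hexadecimal. This is only a sketch with an invented class name. Note that Float.floatToIntBits collapses every NaN to one canonical bit pattern, so it shows a single representative of the many possible NaN encodings.

```java
class FloatSpecials {
    // Raw bits of f as an 8-digit hexadecimal string
    static String hex(float f) {
        return String.format("0x%08X", Float.floatToIntBits(f));
    }

    public static void main(String[] args) {
        System.out.println("0.0f       " + hex(0.0f));
        System.out.println("-0.0f      " + hex(-0.0f));
        System.out.println("+Infinity  " + hex(Float.POSITIVE_INFINITY));
        System.out.println("-Infinity  " + hex(Float.NEGATIVE_INFINITY));
        System.out.println("NaN        " + hex(Float.NaN));

        // NaN compares unequal to everything, including itself
        float nan = Float.NaN;
        System.out.println("NaN == NaN ? " + (nan == nan));
    }
}
```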
 

Here's a nice conversion tool:

http://www.h-schmidt.net/FloatApplet/IEEE754.html

Round-Off Errors

[1/3]_2 = 1/4 + [1/3 - 1/4]_2 = 1/4 + [1/12]_2 = 1/4 + 1/16 + [1/12 - 1/16]_2 = 1/4 + 1/16 + [1/48]_2 = 1/4 + 1/16 + 1/64 + ...
= 0.010101010101...
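
The expansion can be generated mechanically: double the remainder, emit a 1 whenever it reaches the denominator, and repeat. The sketch below does this with exact integer arithmetic, so no rounding is involved. The class and method names are made up for the example, and it assumes a fraction between 0 and 1.

```java
class BinaryExpansion {
    // First `digits` binary digits of numerator/denominator, where 0 <= n/d < 1.
    static String expand(long numerator, long denominator, int digits) {
        StringBuilder sb = new StringBuilder("0.");
        long remainder = numerator;
        for (int i = 0; i < digits; i++) {
            remainder *= 2;                  // shift one binary place left
            if (remainder >= denominator) {  // this place holds a 1
                sb.append('1');
                remainder -= denominator;
            } else {
                sb.append('0');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand(1, 3, 24)); // repeats 01 forever
        System.out.println(expand(1, 4, 8));  // terminates: 0.01000000
    }
}
```

For 1/3 the remainder cycles between 1 and 2 and never reaches 0, which is why the digits never terminate.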

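This base 2 expansion goes on forever, so it has to be rounded to fit into the 23 Mantissa bits. Rounding to the nearest float rounds the last bit up:

(-1)^0 x 1.01010101010101010101011 x 2^(125 - 127) =
0,01111101,01010101010101010101011 = approximately 0.33333334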
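
What Java actually stores for 1/3 can be inspected directly. The sketch below (class name invented for this example) prints the raw bits of 1.0f / 3.0f and its exact decimal value via BigDecimal, whose double constructor converts the binary value without further rounding.

```java
import java.math.BigDecimal;

class OneThird {
    // Raw IEEE 754 bits of the float closest to 1/3
    static int bits() {
        return Float.floatToIntBits(1.0f / 3.0f);
    }

    public static void main(String[] args) {
        float third = 1.0f / 3.0f;
        System.out.println("bits  = 0x" + Integer.toHexString(bits()));
        System.out.println("float = " + third);
        // Widening to double is exact, so this shows the stored value digit for digit
        System.out.println("exact = " + new BigDecimal((double) third));
    }
}
```

The stored value is slightly larger than 1/3, because round-to-nearest moves the last Mantissa bit up.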

These round-off errors can accumulate in a lengthy calculation.
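
A simple way to see accumulation is to add 0.1 repeatedly, since 0.1 has no exact binary representation either. This is an illustrative sketch with invented names, not a benchmark; the size of the drift depends on the type and the number of additions.

```java
class Accumulate {
    // Adds 0.1f to a float accumulator n times
    static float sumFloat(int n) {
        float sum = 0f;
        for (int i = 0; i < n; i++) {
            sum += 0.1f; // each addition is rounded to the nearest float
        }
        return sum;
    }

    // Same loop with double precision
    static double sumDouble(int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += 0.1;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("float,  1,000,000 additions: " + sumFloat(1_000_000));
        System.out.println("double, 1,000,000 additions: " + sumDouble(1_000_000));
        System.out.println("double, 10 additions == 1.0 ? " + (sumDouble(10) == 1.0));
    }
}
```

The float total drifts visibly away from 100000, while the double total stays much closer; neither is exact.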

Arithmetic

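Multiplying and dividing floats isn't too bad: the mantissas are multiplied or divided and the exponents are added or subtracted. Adding and subtracting can be hard, because the operands must first be shifted to a common exponent, and low-order bits of the smaller operand can be lost along the way.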
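
One visible consequence, sketched below under the assumption that the difficulty comes from aligning exponents before adding, is that a small operand can vanish entirely and that the order of additions changes the result. The class and method names are hypothetical.

```java
class FloatAddition {
    static float leftToRight(float a, float b, float c) {
        return (a + b) + c;
    }

    static float rightToLeft(float a, float b, float c) {
        return a + (b + c);
    }

    public static void main(String[] args) {
        // 2^24 + 1 needs 25 significant bits, so the 1 is rounded away
        float big = 16777216f;
        System.out.println("2^24 + 1 == 2^24 ? " + (big + 1f == big));

        // Float addition is not associative
        System.out.println("(1e8 + -1e8) + 1 = " + leftToRight(1e8f, -1e8f, 1f));
        System.out.println("1e8 + (-1e8 + 1) = " + rightToLeft(1e8f, -1e8f, 1f));
    }
}
```

In the second grouping the 1 is absorbed into -1e8 before the large terms cancel, so it never reaches the result.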

 
 
