Monash University, School of Computer Science and Software Engineering
CSE1303 Computer Science
Semester 2, 2003
Part B
Lecture B04 notes: Floating point
In this lecture
- Real numbers
- Alternative representation
- Rational numbers
- Fixed point
- Floating point
- Approximation of scientific notation
- Sign
- Signed magnitude
- 0 = positive, 1 = negative
- Mantissa (significand)
- Fixed point
- Implicit leading 1
- Exponent
- Normalized representation
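As an illustration of the sign, exponent and mantissa fields, here is a minimal C sketch that splits a float into its three parts. It assumes that float uses the 32-bit IEEE 754 single-precision layout (1 sign bit, 8 exponent bits stored with a bias of 127, 23 mantissa bits), which these notes do not state explicitly. The names float_fields and split_float are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fields of a float, ASSUMING the 32-bit IEEE 754 single-precision layout. */
struct float_fields {
    uint32_t sign;      /* 1 bit: 0 = positive, 1 = negative */
    uint32_t exponent;  /* 8 bits, stored with a bias of 127 (assumption) */
    uint32_t mantissa;  /* 23 bits; the leading 1 is implicit and not stored */
};

/* Hypothetical helper: split a float into its raw fields. */
struct float_fields split_float(float x)
{
    uint32_t bits;
    struct float_fields f;

    assert(sizeof bits == sizeof x);   /* check the 32-bit layout assumption */
    memcpy(&bits, &x, sizeof bits);    /* copy the raw bit pattern */

    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For example, -6.5 is -1.101 (binary) times 2 to the power 2, so under these assumptions split_float(-6.5f) gives sign 1, exponent 129 (2 plus the bias) and mantissa 0x500000 (the bits 101 followed by zeros).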
- C support for floating point
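- float (typically IEEE 754 single precision: 8-bit exponent, 23-bit mantissa)
- double (typically IEEE 754 double precision: 11-bit exponent, 52-bit mantissa)
- long double (platform-dependent; on x86 extended precision: 15-bit exponent, 64-bit mantissa with an explicit leading 1)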
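The field widths above can be checked on a given compiler with the standard header float.h. The sketch below is only an illustration: the function names are made up, and it assumes a binary floating-point format. Note that FLT_MANT_DIG and DBL_MANT_DIG count the implicit leading 1, so the stored mantissa is one bit shorter.

```c
#include <assert.h>
#include <float.h>

/* Stored mantissa bits of float: significand digits minus the implicit 1. */
int float_mantissa_bits(void)
{
    assert(FLT_RADIX == 2);   /* the digit counts are bits only in radix 2 */
    return FLT_MANT_DIG - 1;
}

/* Stored mantissa bits of double, counted the same way. */
int double_mantissa_bits(void)
{
    assert(FLT_RADIX == 2);
    return DBL_MANT_DIG - 1;
}

/* long double varies between platforms, so report the full significand
   width as the compiler gives it, without assuming an implicit bit. */
int long_double_significand_bits(void)
{
    return LDBL_MANT_DIG;
}
```

On a platform that uses IEEE 754 single and double precision, the first two functions return 23 and 52. The long double result depends on the compiler and processor.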
- Limitations of floating point
- Exponent size is fixed: limits range
- Mantissa size is fixed: limits precision
- Addition is not associative: (a + b) + c may differ from a + (b + c)
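A small C sketch of the associativity problem follows. The function names sum_left and sum_right are made up for this example. The intermediate result is stored in a volatile float so that it is rounded to float precision even if the compiler evaluates expressions in a wider format.

```c
/* Computes (a + b) + c, rounding the intermediate sum to float. */
float sum_left(float a, float b, float c)
{
    volatile float t = a + b;
    return t + c;
}

/* Computes a + (b + c), rounding the intermediate sum to float. */
float sum_right(float a, float b, float c)
{
    volatile float t = b + c;
    return a + t;
}
```

With a = 1e20f, b = -1e20f and c = 1.0f, sum_left gives 1 because a and b cancel first. sum_right gives 0 because the fixed-size mantissa cannot hold both -1e20 and the added 1, so the 1 is lost before a is added.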
- Comparing floating-point numbers
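- Compare sign (a positive number is greater than a negative one)
- Then compare exponent (larger exponent means larger magnitude)
- Then compare mantissa (larger mantissa means larger magnitude)
- For two negative numbers, the larger magnitude is the smaller number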
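The comparison steps can be sketched in C as follows. This is an illustration only: it assumes the 32-bit IEEE 754 single-precision layout, ignores NaN, and the name compare_floats is made up. It returns -1, 0 or 1 when a is less than, equal to or greater than b.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: compare two floats field by field.
   ASSUMES IEEE 754 single precision; NaN is not handled. */
int compare_floats(float a, float b)
{
    uint32_t ua, ub;
    assert(sizeof ua == sizeof a);          /* check the layout assumption */
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);

    uint32_t sign_a = ua >> 31, sign_b = ub >> 31;
    uint32_t exp_a = (ua >> 23) & 0xFFu, exp_b = (ub >> 23) & 0xFFu;
    uint32_t man_a = ua & 0x7FFFFFu, man_b = ub & 0x7FFFFFu;

    /* Special case: +0 and -0 differ in sign but are equal. */
    if ((ua & 0x7FFFFFFFu) == 0 && (ub & 0x7FFFFFFFu) == 0)
        return 0;

    /* Step 1: compare sign. */
    if (sign_a != sign_b)
        return sign_a ? -1 : 1;

    /* Step 2: compare exponent. Step 3: compare mantissa. */
    int order;
    if (exp_a != exp_b)
        order = exp_a < exp_b ? -1 : 1;
    else
        order = (man_a > man_b) - (man_a < man_b);

    /* For negative numbers the larger magnitude is the smaller value. */
    return sign_a ? -order : order;
}
```

For example, compare_floats(1.5f, 2.5f) is decided by the exponent, while compare_floats(1.25f, 1.5f) has equal exponents and is decided by the mantissa.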
- Multiplying floating-point numbers
- Sign (exclusive OR)
- Exponent (add)
- Mantissa (multiply)
- Renormalize result
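The four multiplication steps can be sketched in C as below. This is a simplified illustration, not how hardware or a library does it: it assumes the 32-bit IEEE 754 single-precision layout with an exponent bias of 127, accepts only normalized, finite, nonzero operands whose product stays in range, and truncates instead of rounding. The name multiply_floats is made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: multiply two normalized floats field by field.
   ASSUMES IEEE 754 single precision; no zero, infinity, NaN, overflow
   or underflow handling; the result is truncated, not rounded. */
float multiply_floats(float a, float b)
{
    uint32_t ua, ub;
    assert(sizeof ua == sizeof a);          /* check the layout assumption */
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);

    /* Sign: exclusive OR of the two sign bits. */
    uint32_t sign = (ua ^ ub) & 0x80000000u;

    /* Exponent: add the stored exponents and remove one copy of the bias. */
    int32_t exponent = (int32_t)((ua >> 23) & 0xFFu)
                     + (int32_t)((ub >> 23) & 0xFFu) - 127;

    /* Mantissa: restore the implicit leading 1, then multiply.
       Two 24-bit values give a product of up to 48 bits. */
    uint64_t ma = (ua & 0x7FFFFFu) | 0x800000u;
    uint64_t mb = (ub & 0x7FFFFFu) | 0x800000u;
    uint64_t product = ma * mb;

    /* Renormalize: the product of two values in [1, 2) lies in [1, 4),
       so at most one right shift is needed. */
    if (product & (1ULL << 47)) {
        product >>= 1;
        exponent += 1;
    }

    /* Keep the top 24 bits and drop the implicit leading 1 again. */
    uint32_t mantissa = (uint32_t)(product >> 23) & 0x7FFFFFu;

    uint32_t bits = sign | ((uint32_t)exponent << 23) | mantissa;
    float result;
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

For example, 1.5 times 1.5 multiplies the mantissas to 2.25, which is renormalized to 1.125 with the exponent increased by one.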
- Adding floating-point numbers
- Examine signs, possibly switch to subtraction
- Sign: same as that of the number with the larger magnitude
- Exponent: same as that of the larger number
- Mantissa: temporarily denormalize the smaller number to align the exponents, then add
- Renormalize result
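The addition steps can be sketched in C as below. As with any such sketch, this is a simplified illustration: it assumes the 32-bit IEEE 754 single-precision layout, accepts only normalized, finite, nonzero operands whose sum stays in range, and drops the bits shifted out during alignment instead of rounding. The name add_floats is made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: add two normalized floats field by field.
   ASSUMES IEEE 754 single precision; no zero, infinity, NaN, overflow
   or underflow handling on input; bits shifted out are dropped. */
float add_floats(float a, float b)
{
    uint32_t ua, ub;
    assert(sizeof ua == sizeof a);          /* check the layout assumption */
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);

    /* Make ua the operand with the larger magnitude. */
    if ((ua & 0x7FFFFFFFu) < (ub & 0x7FFFFFFFu)) {
        uint32_t tmp = ua;
        ua = ub;
        ub = tmp;
    }

    /* Sign and exponent: taken from the larger number. */
    uint32_t sign = ua & 0x80000000u;
    int32_t exponent = (int32_t)((ua >> 23) & 0xFFu);
    int32_t shift = exponent - (int32_t)((ub >> 23) & 0xFFu);

    /* Mantissas with the implicit leading 1 restored. */
    uint32_t ma = (ua & 0x7FFFFFu) | 0x800000u;
    uint32_t mb = (ub & 0x7FFFFFu) | 0x800000u;

    /* Temporarily denormalize the smaller number to align the exponents. */
    mb = shift < 24 ? mb >> shift : 0;

    /* Examine signs: same signs add, different signs subtract. */
    uint32_t sum = ((ua ^ ub) & 0x80000000u) ? ma - mb : ma + mb;
    if (sum == 0)
        return 0.0f;                        /* exact cancellation */

    /* Renormalize so the leading 1 is back at bit 23. */
    while (sum >= 0x1000000u) { sum >>= 1; exponent += 1; }
    while (sum <  0x800000u)  { sum <<= 1; exponent -= 1; }

    uint32_t bits = sign | ((uint32_t)exponent << 23) | (sum & 0x7FFFFFu);
    float result;
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

For example, in 5.0 + (-3.0) the signs differ, so the aligned mantissas are subtracted and the result is shifted left once to restore the leading 1, giving 2.0.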
Last modified 2002-12-04