The REAL signature
signature REAL
structure Real :> REAL
  where type real = real
structure LargeReal :> REAL
structure Real<N> :> REAL (* OPTIONAL *)
The REAL signature specifies structures that implement floating-point numbers. The semantics of floating-point numbers should follow the IEEE standard 754-1985 [CITE] and the ANSI/IEEE standard 854-1987 [CITE]. In addition, implementations of the REAL signature are required to use non-trapping semantics. Additional aspects of the design of the REAL and MATH signatures were guided by the Floating-Point C Extensions [CITE] developed by the X3J11 ANSI committee and the lecture notes [CITE] by W. Kahan on the IEEE standard 754.
Although there can be many representations for NaN values, the Library models them as a single value and currently provides no explicit way to distinguish among them, ignoring the sign bit. Thus, in the descriptions below and in the Math structure, we just refer to the NaN value.
type real
structure Math : MATH
where type real = real
val radix : int
val precision : int
val maxFinite : real
val minPos : real
val minNormalPos : real
val posInf : real
val negInf : real
val + : real * real -> real
val - : real * real -> real
val * : real * real -> real
val / : real * real -> real
val rem : real * real -> real
val *+ : real * real * real -> real
val *- : real * real * real -> real
val ~ : real -> real
val abs : real -> real
val min : real * real -> real
val max : real * real -> real
val sign : real -> int
val signBit : real -> bool
val sameSign : real * real -> bool
val copySign : real * real -> real
val compare : real * real -> order
val compareReal : real * real -> IEEEReal.real_order
val < : real * real -> bool
val <= : real * real -> bool
val > : real * real -> bool
val >= : real * real -> bool
val == : real * real -> bool
val != : real * real -> bool
val ?= : real * real -> bool
val unordered : real * real -> bool
val isFinite : real -> bool
val isNan : real -> bool
val isNormal : real -> bool
val class : real -> IEEEReal.float_class
val toManExp : real -> {man : real, exp : int}
val fromManExp : {man : real, exp : int} -> real
val split : real -> {whole : real, frac : real}
val realMod : real -> real
val nextAfter : real * real -> real
val checkFloat : real -> real
val realFloor : real -> real
val realCeil : real -> real
val realTrunc : real -> real
val realRound : real -> real
val floor : real -> int
val ceil : real -> int
val trunc : real -> int
val round : real -> int
val toInt : IEEEReal.rounding_mode -> real -> int
val toLargeInt : IEEEReal.rounding_mode
-> real -> LargeInt.int
val fromInt : int -> real
val fromLargeInt : LargeInt.int -> real
val toLarge : real -> LargeReal.real
val fromLarge : IEEEReal.rounding_mode
-> LargeReal.real -> real
val fmt : StringCvt.realfmt -> real -> string
val toString : real -> string
val scan : (char, 'a) StringCvt.reader
-> (real, 'a) StringCvt.reader
val fromString : string -> real option
val toDecimal : real -> IEEEReal.decimal_approx
val fromDecimal : IEEEReal.decimal_approx -> real option
type real
Note that real is not an equality type.
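val radix : int
The base of the representation, e.g., 2 or 10 for IEEE floating point.
val precision : int
The number of digits, each between 0 and radix - 1, in the mantissa. Note that the precision includes the implicit (or hidden) bit used in the IEEE representation (e.g., the value of Real64.precision is 53).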
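val maxFinite : real
val minPos : real
val minNormalPos : real
The maximum finite number, the minimum non-zero positive number, and the minimum non-zero normalized number, respectively.
val posInf : real
val negInf : real
Positive and negative infinity values.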
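r1 + r2
r1 - r2
These denote the sum and difference of r1 and r2. If one argument is finite and the other infinite, the result is infinite with the correct sign. We also have infinity + infinity = infinity and (-infinity) + (-infinity) = -infinity, whereas the sum of two infinities of opposite sign is NaN.
r1 * r2
This denotes the product of r1 and r2. The product of zero and an infinity produces NaN. Otherwise, if one argument is infinite, the result is infinite with the correct sign.
r1 / r2
This denotes the quotient of r1 and r2. We have 0 / 0 = NaN and +-infinity / +-infinity = NaN. Dividing a finite, non-zero number by a zero, or an infinity by a finite number produces an infinity with the correct sign. (Note that zeros are signed.) A finite number divided by an infinity is 0 with the correct sign.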
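As an illustration of the non-trapping arithmetic described above, the following sketch evaluates a few special cases. It assumes the default Real structure with IEEE double semantics; the value names are made up for this example.

```sml
(* None of these expressions raises an exception. *)
val inf = Real.posInf

val sumInf  = inf + inf      (* +infinity *)
val diffNaN = inf - inf      (* NaN *)
val prodNaN = 0.0 * inf      (* NaN *)
val quotNaN = 0.0 / 0.0      (* NaN *)
val posQuot = 1.0 / 0.0      (* +infinity *)
val negQuot = 1.0 / ~0.0     (* -infinity, because zeros are signed *)
val negZero = ~1.0 / inf     (* a zero with its sign bit set *)
```

Because real is not an equality type, results like these are best inspected with Real.==, Real.isNan, and Real.signBit rather than with =.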
rem (x, y)
returns the remainder x - n*y, where n = trunc (x / y). The result has the same sign as x and has absolute value less than the absolute value of y.
If x is an infinity or y is 0, rem returns NaN. If y is an infinity, rem returns x.
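A small sketch of rem on ordinary and special arguments, assuming the default Real structure; the operands are chosen to be exactly representable so the results are exact.

```sml
val r1 = Real.rem (7.5, 2.0)           (* 7.5 - 3 * 2.0 = 1.5 *)
val r2 = Real.rem (~7.5, 2.0)          (* ~1.5: the sign follows x *)
val r3 = Real.rem (5.0, Real.posInf)   (* 5.0: y is an infinity *)
val r4 = Real.rem (Real.posInf, 2.0)   (* NaN: x is an infinity *)
val r5 = Real.rem (5.0, 0.0)           (* NaN: y is zero *)
```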
*+ (a, b, c)
*- (a, b, c)
These return a*b + c and a*b - c, respectively. Their behaviors on infinities follow from the behaviors derived from addition, subtraction, and multiplication.
The precise semantics of these operations depend on the language implementation and the underlying hardware. Specifically, certain architectures provide these operations as a single instruction, possibly using a single rounding operation. Thus, the use of these operations may be faster than performing the individual arithmetic operations sequentially, but may also cause different rounding behavior.
~ r
This produces the negation of r. ~ (+-infinity) = -+infinity.
abs r
This returns the absolute value |r| of r.
abs (+-0.0) = +0.0
abs (+-infinity) = +infinity
abs (+-NaN) = +NaN
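val min : real * real -> real
val max : real * real -> real
These return the smaller (respectively, larger) of the arguments. If exactly one argument is NaN, they return the other argument. If both arguments are NaN, they return NaN.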
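sign r
returns ~1 if r is negative, 0 if r is zero, or 1 if r is positive. An infinity returns its sign; a zero returns 0 regardless of its sign. It raises Domain on NaN.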
signBit r
returns true if and only if the sign of r (infinities, zeros, and NaN, included) is negative.
sameSign (r1, r2)
returns true if and only if signBit r1 equals signBit r2.
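The difference between sign, which ignores the sign of a zero, and signBit, which does not, can be seen in a short sketch. It assumes the default Real structure; the value names are illustrative only.

```sml
val negZero = ~0.0
val nan = 0.0 / 0.0

val signOfNegZero = Real.sign negZero              (* 0: the sign of a zero is ignored *)
val bitOfNegZero  = Real.signBit negZero           (* true: the sign bit is set *)
val zerosAgree    = Real.sameSign (negZero, 0.0)   (* false: the sign bits differ *)
val signOfNegInf  = Real.sign Real.negInf          (* ~1 *)

(* sign raises Domain on NaN; the option wrapper records that outcome. *)
val signOfNaN = (SOME (Real.sign nan)) handle Domain => NONE
```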
copySign (x, y)
returns x with the sign of y, even if y is NaN.
val compare : real * real -> order
val compareReal : real * real -> IEEEReal.real_order
The function compare returns LESS, EQUAL, or GREATER according to whether its first argument is less than, equal to, or greater than the second. It raises IEEEReal.Unordered on unordered arguments.
The function compareReal behaves similarly except that the values it returns have the extended type IEEEReal.real_order and it returns IEEEReal.UNORDERED on unordered arguments.
Implementation note:
Implementations should try to optimize use of compare, since it is necessary for catching NaNs.
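The two comparison functions differ only in how they report unordered arguments. A minimal sketch, assuming the default Real structure (the value names are made up for this example):

```sml
val nan = 0.0 / 0.0

val ordered  = Real.compare (1.0, 2.0)        (* LESS *)
val extended = Real.compareReal (1.0, nan)    (* IEEEReal.UNORDERED *)

(* compare signals a NaN argument by raising IEEEReal.Unordered. *)
val raised = (Real.compare (1.0, nan); false) handle IEEEReal.Unordered => true
```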
val < : real * real -> bool
val <= : real * real -> bool
val > : real * real -> bool
val >= : real * real -> bool
These return true if the corresponding relation holds between the two reals.
Note that these operators return false on unordered arguments, i.e., if either argument is NaN, so that the usual reversal of comparison under negation does not hold, e.g., a < b is not the same as not (a >= b).
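The failure of the usual reversal under negation can be checked directly. This sketch assumes the default Real structure; a and nan are illustrative names.

```sml
val nan = 0.0 / 0.0
val a = 1.0

val less         = a < nan           (* false: the arguments are unordered *)
val notGreaterEq = not (a >= nan)    (* true: a >= nan is also false *)
```

Code that negates a comparison to obtain its complement therefore changes meaning whenever a NaN can reach it.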
== (x, y)
!= (x, y)
The first returns true if and only if neither y nor x is NaN, and y and x are equal, ignoring signs on zeros. This is equivalent to the IEEE = operator.
The second function != is equivalent to not o op == and the IEEE ?<> operator.
val ?= : real * real -> bool
This returns true if either argument is NaN or if the arguments are bitwise equal, ignoring signs on zeros. It is equivalent to the IEEE ?= operator.
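The three equality-like predicates behave differently on NaN and on signed zeros. A short sketch, assuming the default Real structure and using the qualified names in prefix form:

```sml
val nan = 0.0 / 0.0

val zerosEqual  = Real.== (0.0, ~0.0)   (* true: signs on zeros are ignored *)
val nanEqual    = Real.== (nan, nan)    (* false: NaN is never == to anything *)
val nanNotEqual = Real.!= (nan, nan)    (* true: the negation of == *)
val nanMaybeEq  = Real.?= (nan, 1.0)    (* true: one argument is NaN *)
val plainMaybe  = Real.?= (1.0, 2.0)    (* false: ordered and unequal *)
```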
unordered (x, y)
returns true if x and y are unordered, i.e., at least one of x and y is NaN.
isFinite x
returns true if x is neither NaN nor an infinity.
isNan x
returns true if x is NaN.
isNormal x
returns true if x is normal, i.e., neither zero, subnormal, infinite nor NaN.
class x
returns the IEEEReal.float_class to which x belongs.
toManExp r
returns {man, exp}, where man and exp are the mantissa and exponent of r, respectively. Specifically, we have the relation
r = man * radix^exp
where 1.0 <= man * radix < radix. This function is comparable to frexp in the C library.
If r is +-0, man is +-0 and exp is +0. If r is +-infinity, man is +-infinity and exp is unspecified. If r is NaN, man is NaN and exp is unspecified.
fromManExp {man, exp}
returns man * radix^exp. This function is comparable to ldexp in the C library. Note that, even if man is a non-zero, finite real value, the result of fromManExp can be zero or infinity because of underflows and overflows.
If man is +-0, the result is +-0. If man is +-infinity, the result is +-infinity. If man is NaN, the result is NaN.
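The round trip between toManExp and fromManExp, and the underflow and overflow cases, can be sketched as follows. The example assumes that Real is an IEEE double with radix 2, so that 2^2000 overflows and 2^~2000 underflows; the value names are illustrative.

```sml
val r = 6.5
val {man, exp} = Real.toManExp r
val back = Real.fromManExp {man = man, exp = exp}    (* reconstructs r *)

(* A finite, non-zero mantissa can still produce a zero or an infinity. *)
val tiny = Real.fromManExp {man = 1.0, exp = ~2000}  (* underflows to 0.0 *)
val huge = Real.fromManExp {man = 1.0, exp = 2000}   (* overflows to +infinity *)
```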
split r
realMod r
The former returns {whole, frac}, where frac and whole are the fractional and integral parts of r, respectively. Specifically, whole is integral, |frac| < 1.0, whole and frac have the same sign as r, and r = whole + frac. This function is comparable to modf in the C library.
If r is +-infinity, whole is +-infinity and frac is +-0. If r is NaN, both whole and frac are NaN.
realMod is equivalent to #frac o split.
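A brief sketch of split and realMod on a negative argument, assuming the default Real structure; the argument is exactly representable, so both parts are exact.

```sml
val {whole, frac} = Real.split ~3.25    (* whole = ~3.0, frac = ~0.25 *)
val fracOnly = Real.realMod ~3.25       (* ~0.25, the same as frac *)
```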
nextAfter (r, t)
returns the next representable real after r in the direction of t. Thus, if t is less than r, nextAfter returns the largest representable floating-point number less than r. If r = t then it returns r. If either argument is NaN, this returns NaN. If r is +-infinity, it returns +-infinity.
checkFloat x
raises Overflow if x is an infinity, and raises Div if x is NaN. Otherwise, it returns its argument.
This can be used to synthesize trapping arithmetic from the non-trapping operations given here. Note, however, that infinities can be converted to NaNs by some operations, so that if accurate exceptions are required, checks must be done after each operation.
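One way to synthesize a trapping operation, as suggested above, is to wrap a single arithmetic step in checkFloat. The helper checkedDiv is a hypothetical name introduced for this sketch, not part of the Library.

```sml
(* A division that raises instead of returning an infinity or NaN. *)
fun checkedDiv (x, y) = Real.checkFloat (x / y)

val ok = checkedDiv (1.0, 4.0)                                         (* 0.25 *)
val overflowed = (checkedDiv (1.0, 0.0); false) handle Overflow => true
val divided    = (checkedDiv (0.0, 0.0); false) handle Div => true
```

Checking after every operation in this way keeps the exception tied to the step that produced the non-finite value.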
realFloor r
realCeil r
realTrunc r
realRound r
These functions convert real values to integer-valued reals. realFloor produces floor(r), the largest integer not larger than r. realCeil produces ceil(r), the smallest integer not less than r. realTrunc rounds r towards zero, and realRound rounds to the integer-valued real value that is nearest to r. If r is NaN or an infinity, these functions return r.
floor r
ceil r
trunc r
round r
These functions convert reals to integers. floor produces floor(r), the largest int not larger than r. ceil produces ceil(r), the smallest int not less than r. trunc rounds r towards zero. round yields the integer nearest to r. In the case of a tie, it rounds to the nearest even integer. They raise Overflow if the resulting value cannot be represented as an int, for example, on infinity. They raise Domain on NaN arguments.
These are respectively equivalent to:
toInt IEEEReal.TO_NEGINF r
toInt IEEEReal.TO_POSINF r
toInt IEEEReal.TO_ZERO r
toInt IEEEReal.TO_NEAREST r
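The four conversions and their exceptional cases can be compared side by side. This sketch assumes the default Real and Int structures; the value names are made up for the example.

```sml
val fl = Real.floor ~2.5    (* ~3 *)
val ce = Real.ceil ~2.5     (* ~2 *)
val tr = Real.trunc ~2.5    (* ~2 *)

(* Ties go to the nearest even integer. *)
val rd1 = Real.round 2.5    (* 2 *)
val rd2 = Real.round 3.5    (* 4 *)

(* round corresponds to toInt with the TO_NEAREST rounding mode. *)
val viaToInt = Real.toInt IEEEReal.TO_NEAREST 2.5

val infOverflows = (Real.floor Real.posInf; false) handle Overflow => true
val nanIsDomain  = (Real.round (0.0 / 0.0); false) handle Domain => true
```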
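toInt mode x
toLargeInt mode x
These functions convert the argument x to an integral type using the specified rounding mode. They raise Overflow if the result is not representable, in particular, if x is an infinity. They raise Domain if the input real is NaN.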
fromInt i
fromLargeInt i
These functions convert the integer i to a real value. If the absolute value of i is larger than maxFinite, then the appropriate infinity is returned. If i cannot be exactly represented as a real value, then the current rounding mode is used to determine the resulting value. The top-level function real is an alias for Real.fromInt.
toLarge r
fromLarge r
These convert between values of type real and type LargeReal.real. If r is too small or too large to be represented as a real, fromLarge will convert it to a zero or an infinity.
fmt spec r
toString r
These functions convert reals into strings. The conversion provided by the function fmt is parameterized by spec, which has the following forms and interpretations.
SCI arg
Scientific notation:
[~]?[0-9].[0-9]+?E[0-9]+
where there is always one digit before the decimal point, nonzero if the number is nonzero. arg specifies the number of digits to appear after the decimal point, with 6 the default if arg is NONE. If arg is SOME(0), no fractional digits and no decimal point are printed.
FIX arg
Fixed-point notation:
[~]?[0-9]+.[0-9]+?
arg specifies the number of digits to appear after the decimal point, with 6 the default if arg is NONE. If arg is SOME(0), no fractional digits and no decimal point are printed.
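GEN arg
Adaptive notation: either scientific or fixed-point notation is used, depending on the value converted. arg specifies the maximum number of significant digits, with a default value used if arg is NONE.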
EXACT
Exact decimal notation: refer to IEEEReal.toString for a complete description of this format.
In all cases, positive and negative infinities are converted to "inf" and "~inf", respectively, and NaN values are converted to the string "nan".
Refer to StringCvt.realfmt for more details concerning these formats, especially the adaptive format GEN.
fmt raises Size if spec specifies an invalid precision, e.g., a negative number of digits in SCI (SOME i) or FIX (SOME i). The exception should be raised when fmt spec is evaluated.
The fmt function allows the user precise control as to the form of the resulting string. Note, therefore, that it is possible for fmt to produce a result that is not a valid SML string representation of a real value.
The value returned by toString is equivalent to:
(fmt (StringCvt.GEN NONE) r)
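A few formatting calls illustrate the points above. The sketch assumes the default Real structure; only results that the text fixes unambiguously are annotated, since the exact layout of the other formats is left to StringCvt.realfmt.

```sml
val fixed = Real.fmt (StringCvt.FIX (SOME 2)) 3.14159    (* "3.14" *)

(* toString agrees with the adaptive format at its default precision. *)
val general   = Real.toString 1234.5
val sameAsGen = (general = Real.fmt (StringCvt.GEN NONE) 1234.5)

val infText = Real.toString Real.posInf     (* "inf" *)
val nanText = Real.toString (0.0 / 0.0)     (* "nan" *)

(* A negative precision is invalid and raises Size. *)
val sizeRaised =
  (Real.fmt (StringCvt.FIX (SOME ~1)) 1.0; false) handle Size => true
```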
scan getc strm
fromString s
These functions scan a real value from a character source. The first version reads from strm using reader getc, ignoring initial whitespace. It returns SOME(r, rest) if successful, where r is the scanned real value and rest is the unused portion of the character stream strm. Values of too large a magnitude are represented as infinities; values of too small a magnitude are represented as zeros.
The second version returns SOME(r) if a real value can be scanned from a prefix of s, ignoring any initial whitespace; otherwise, it returns NONE. This function is equivalent to StringCvt.scanString scan.
The functions accept real numbers with the following format:
[+~-]?([0-9]+.[0-9]+? | .[0-9]+)(e | E)[+~-]?[0-9]+?
It also accepts the following string representations of non-finite values:
[+~-]?(inf | infinity | nan)
where the alphabetic characters are case-insensitive.
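The prefix-scanning behavior of fromString, and its relation to scan, can be sketched as follows. The input strings are made up for this example, and the default Real structure is assumed.

```sml
(* Leading whitespace is skipped and scanning stops at the first
   character that cannot extend the number. *)
val parsed   = Real.fromString "  3.5e2xyz"   (* SOME 350.0 *)
val negative = Real.fromString "~1.25"        (* SOME ~1.25 *)
val rejected = Real.fromString "abc"          (* NONE *)

(* The same result obtained through the scan function. *)
val viaScan = StringCvt.scanString Real.scan "  3.5e2xyz"
```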
toDecimal r
fromDecimal d
These convert between real values and decimal approximations. Decimal approximations are to be converted using the IEEEReal.TO_NEAREST rounding mode. toDecimal should produce only as many digits as are necessary for fromDecimal to convert back to the same number. In particular, for any normal or subnormal real value r, we have the bit-wise equality:
fromDecimal (toDecimal r) = r.
For toDecimal, when r is not normal or subnormal, the exp field is set to 0 and the digits field is the empty list. In all cases, the sign and class fields capture the sign and class of r.
For fromDecimal, if class is ZERO or INF, the resulting real is the appropriate signed zero or infinity. If class is NAN, a signed NaN is generated. If class is NORMAL or SUBNORMAL, the sign, digits and exp fields are used to produce a real number whose value is
s * 0.d(1)d(2)...d(n) * 10^exp
where digits = [d(1), d(2), ..., d(n)] and where s is -1 if sign is true and 1 otherwise. Note that the conversion itself should ignore the class field, so that the resulting value might have class NORMAL, SUBNORMAL, ZERO, or INF. For example, if digits is empty or a list of all 0's, the result should be a signed zero. More generally, very large or small magnitudes are converted to infinities or zeros.
If the argument to fromDecimal does not have a valid format, i.e., if the digits field contains integers outside the range [0,9], it returns NONE.
Implementation note:
Algorithms for accurately and efficiently converting between binary and decimal real representations are readily available, e.g., see the technical report by Gay [CITE].
See also: IEEEReal, MATH, StringCvt
If LargeReal is not the same as Real, then there must be a structure Real<N> equal to LargeReal.
The sign of a zero is ignored in all comparisons.
Unless specified otherwise, any operation involving NaN will return NaN.
Note that, if x is real, ~x is equivalent to ~(x), that is, it is identical to x but with its sign bit flipped. In particular, the literal ~0.0 is just 0.0 with its sign bit set. On the other hand, this might not be the same as 0.0-0.0, in which rounding modes come into play.
Except for the *+ and *- functions, arithmetic should be done in the exact precision specified by the precision value. In particular, arithmetic must not be done in some extended precision and then rounded.
The relation between the comparison predicates defined here and those defined by IEEE, ANSI C, and FORTRAN is specified in the following table.
SML | IEEE | C | FORTRAN |
---|---|---|---|
== | = | == | .EQ. |
!= | ?<> | != | .NE. |
< | < | < | .LT. |
<= | <= | <= | .LE. |
> | > | > | .GT. |
>= | >= | >= | .GE. |
?= | ?= | !islessgreater | .UE. |
not o ?= | <> | islessgreater | .LG. |
unordered | ? | isunordered | unordered |
not o unordered | <=> | !isunordered | .LEG. |
not o op < | ?>= | ! < | .UGE. |
not o op <= | ?> | ! <= | .UG. |
not o op > | ?<= | ! > | .ULE. |
not o op >= | ?< | ! >= | .UL. |
Implementation note:
Implementations may choose to provide a debugging mode, in which NaNs and infinities are detected when they are generated.
Rationale:
The specification of the default signature and structure for non-integer arithmetic, particularly concerning exceptional conditions, was the source of much debate, given the desire of supporting efficient floating-point modules. If we permit implementations to differ on whether or not, for example, to raise Div on division by zero, the user really would not have a standard to program against. Portable code would require adopting the more conservative position of explicitly handling exceptions. A second alternative was to specify that functions in the Real structure must raise exceptions, but that implementations so desiring could provide additional structures matching REAL with explicit floating-point semantics. This was rejected because it meant that the default real type would not be the same as a defined floating-point real type. This would give a second-class status to the latter, while providing the default real with worse performance and involving additional implementation complexity for little benefit.
Deciding if real should be an equality type, and if so, what should equality mean, was also problematic. IEEE specifies that the sign of zeros be ignored in comparisons, and that equality evaluate to false if either argument is NaN. These constraints are disturbing to the SML programmer. The former implies that 0 = ~0 is true while r/0 = r/~0 is false. The latter implies such anomalies as r = r is false, or that, for a ref cell rr, we could have rr = rr but not have !rr = !rr. We accepted the unsigned comparison of zeros, but felt that the reflexive property of equality, structural equality, and the equivalence of <> and not o = ought to be preserved. Additional complications led to the decision to not have real be an equality type.
The type, signature, and structure identifiers real, REAL, and Real, although misnomers in light of the floating-point-specific nature of the modules, were retained for historical reasons.
Generated April 12, 2004
Last Modified May 25, 2000
Comments to John Reppy.
This document may be distributed freely over the internet as long as the copyright notice and license terms below are prominently displayed within every machine-readable copy.
Copyright © 2004 AT&T and Lucent Technologies. All rights reserved.
Permission is granted for internet users to make one paper copy for their
own personal use. Further hardcopy reproduction is strictly prohibited.
Permission to distribute the HTML document electronically on any medium
other than the internet must be requested from the copyright holders by
contacting the editors.
Printed versions of the SML Basis Manual are available from Cambridge University Press. To order, please visit www.cup.org (North America) or www.cup.cam.ac.uk (outside North America).