package onnx
- Alphabetic
- Public
- Protected
Type Members
- final class Float16 extends AnyVal
Float16 represents 16-bit floating-point values.
Float16 represents 16-bit floating-point values.
This type does not actually support arithmetic directly. The expected use case is to convert to Float to perform any actual arithmetic, then convert back to a Float16 if needed.
Binary representation:
sign (1 bit) | | exponent (5 bits) | | | | mantissa (10 bits) | | | x xxxxx xxxxxxxxxx
Value interpretation (in order of precedence, with _ wild):
0 00000 0000000000 (positive) zero 1 00000 0000000000 negative zero _ 00000 subnormal number _ 11111 0000000000 +/- infinity _ 11111 not-a-number _ _ normal number
For non-zero exponents, the mantissa has an implied leading 1 bit, so 10 bits of data provide 11 bits of precision for normal numbers.