# Questions tagged [floating-point]

Floating-point numbers are approximations of real numbers that can represent larger ranges than integers while using the same amount of memory, at the cost of lower precision. If your question is about small arithmetic errors (e.g. why does 0.1 + 0.2 equal 0.30000000000000004?) or decimal conversion errors, please read the "info" page linked below before posting.
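
For readers who have not met this kind of small arithmetic error before, here is a minimal Python sketch of it, assuming IEEE 754 double precision (which CPython's `float` uses on common platforms). The variable name `total` is only illustrative.

```python
import math

# 0.1 and 0.2 have no exact binary representation, so their
# sum is the double nearest to the true result, not exactly 0.3.
total = 0.1 + 0.2

print(total == 0.3)              # False
print(repr(total))               # 0.30000000000000004

# Comparing with a tolerance is the usual workaround.
print(math.isclose(total, 0.3))  # True
```

The same effect shows up in any language that uses binary floating point; only the number of digits printed by default differs.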

1,636 questions with no upvoted or accepted answers
610 views

### Problem with Vulkan floating-point behavior

I am trying to implement the paper "Extended-Precision Floating-Point Numbers for GPU Computation" by Andrew Thall (Alma College) in a GLSL Vulkan compute shader. I need this because some of my devices don't ...
596 views

### Are .NET Decimal type computations deterministic?

I have two questions regarding .NET's decimal data type determinism: Are decimal type computations cross-platform deterministic? Or in other words, will math operations on decimal type produce ...
774 views

### How to avoid NumPy type conversions?

Is it possible to avoid, or emit warnings for, automatic NumPy type conversions from integer and 32-bit float arrays to 64-bit float arrays? My use case for this is that I'm developing a large analysis ...
1k views

### Representing a float or a binary as a 32-bit signed integer in R

I've been given a task to write an API for the AR.Drone 2.0 in R. I know it's probably not the wisest choice of language as there are good validated APIs written in Python and JS, but I took the ...
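
As a rough illustration of what "representing a float as a 32-bit signed integer" usually means in this context (reinterpreting the IEEE 754 single-precision bit pattern, not rounding the value), here is a minimal sketch. It is written in Python rather than R, and the function name `float_bits_as_int32` is hypothetical.

```python
import struct

def float_bits_as_int32(value: float) -> int:
    """Reinterpret the single-precision bit pattern of value as a signed int32."""
    packed = struct.pack("<f", value)      # 4 bytes, IEEE 754 binary32
    return struct.unpack("<i", packed)[0]  # same 4 bytes read as signed int

print(float_bits_as_int32(0.05))   # 1028443341
print(float_bits_as_int32(-0.8))   # -1085485875
```

The byte order only has to match between the `pack` and `unpack` calls; the resulting integer is the same either way.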
37 views

### How to parse floating point infinity from std::istream

I have written a superdumb serialization library for a project that I am working on. I just got bitten by floating point infinity, which I illustrate with the sample program below. I expect the ...
123 views

### C (MIPS) - How to tell the compiler to load single-precision float immediates with GPRs?

Recently I have been trying to write some utilities for the N64 with gcc and have run into some problems with its optimization strategy. Please consider the following example: // cctest.c extern struct { float x; ...
116 views

### Probable bug in MSVC with compile-time NaN comparison

My colleague was doing some basic experiments with NaN and was puzzled by the behavior on Visual Studio that did not match his expectations. After discussion, it seems that he uncovered a probable ...
146 views

### How to catch floating point errors early (right at where they occur)?

When developing floating-point-heavy code, it is very useful to enable FPU exceptions, so that an operation that results in a NaN/inf can be caught immediately. For example, on Linux, I can enable this ...
365 views

### Metal SIMD Min and Max operations fail for floats

Question in short: Why am I getting undefined behavior from simd_min and simd_max functions in Metal 2.1 with floats? Update: Seems this only occurs on the Radeon Pro 560X GPU, but not on the Intel ...
104 views

### Rationale for range restriction of IEEE-754 compound function

The IEEE Std 754-2008 lists in Table 9.1 the recommended function compound(x,n) = (1+x)^n, with real x, integer n (where ^ is the power operator). The domain is specified as x in [-1, +infinity] and ...
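
To make the formula concrete, here is a minimal Python sketch of compound(x, n) = (1+x)^n for the interior of the domain only. It assumes x > -1 and deliberately leaves out the endpoint x = -1 and the other special cases the standard defines. Routing the computation through `math.log1p` is one common way to keep accuracy when x is tiny; it is an illustrative choice, not something the standard prescribes.

```python
import math

def compound(x: float, n: int) -> float:
    """Sketch of (1 + x)**n for x > -1; special cases are not handled."""
    if not x > -1.0:  # also rejects NaN
        raise ValueError("this sketch only covers x > -1")
    if n == 0:
        return 1.0
    # log1p(x) avoids the rounding error of forming 1 + x for tiny x.
    return math.exp(n * math.log1p(x))

print(compound(0.0, 10))  # 1.0
```

For large results `math.exp` raises `OverflowError` instead of returning infinity, which is another way this sketch differs from a conforming implementation.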
112 views

### Is there a way to force numpy.set_printoptions to show the exact float value?

Following question 59674518, is there a way for numpy.set_printoptions to ensure the EXACT float value is displayed, without displaying trailing zeros, and without knowing the value a priori? I have ...
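
For comparison, outside of NumPy's print options, the standard library can already show the exact value stored in a Python `float`. The sketch below uses `decimal.Decimal` instead of `numpy.set_printoptions`, so it does not answer the NumPy question directly; the helper name `exact_decimal_string` is hypothetical.

```python
from decimal import Decimal

def exact_decimal_string(value: float) -> str:
    """Return the exact decimal expansion of a binary double, without exponent."""
    # Decimal(float) converts exactly; the 'f' format avoids scientific notation.
    return format(Decimal(value), "f")

print(exact_decimal_string(0.1))
# 0.1000000000000000055511151231257827021181583404541015625
print(exact_decimal_string(0.5))
# 0.5
```

Because the conversion is exact, no precision has to be chosen in advance and no padding zeros are added.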
34 views

### fpclassify(): what are examples of other implementation-defined categories?

N2479 C17..C2x working draft — February 5, 2020 ISO/IEC 9899:202x (E) (emphasis added): The fpclassify macro classifies its argument value as NaN, infinite, normal, subnormal, zero, or into another ...
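
As a point of reference for the five standard categories named in the quote, here is a minimal Python sketch of the same classification for IEEE 754 doubles. It is not the C macro, the function name `classify` is hypothetical, and implementation-defined extra categories are not modeled.

```python
import math
import sys

def classify(x: float) -> str:
    """Classify a double into the five standard fpclassify categories."""
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "infinite"
    if x == 0.0:  # true for both +0.0 and -0.0
        return "zero"
    if abs(x) < sys.float_info.min:  # below the smallest normal double
        return "subnormal"
    return "normal"

print(classify(5e-324))  # subnormal
print(classify(-0.0))    # zero
```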
115 views

### Is there a bug in controlled rounding using `exp`?

I'm observing incorrect (IMO) rounding behaviour on some platforms as follows: Calculate the value of log(2) under rounding modes FE_DOWNWARD and FE_UPWARD (see <fenv.h>). In all cases I've ...
384 views

### How to preserve raster dataType in raster processing?

When doing raster math, for example raster1-raster2, the datatype of the output raster is 'FLT4S', even if the datatype of both raster1 and raster2 is 'INT2S'. How can I force the output to be 'INT2S'...