Questions tagged [floating-point]

Floating-point numbers are approximations of real numbers that can represent a much larger range than integers stored in the same amount of memory, at the cost of lower precision. If your question is about small arithmetic errors (e.g. why does 0.2 + 0.1 equal 0.30000000000000004?) or decimal conversion errors, please read the "info" page linked below before posting.
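
A minimal sketch of the kind of small arithmetic error described above, assuming Python, where `float` is an IEEE 754 double on common platforms. The variable names are illustrative only.

```python
import math
from decimal import Decimal

# Neither 0.1 nor 0.2 has an exact binary representation,
# so their sum is only the nearest representable double.
total = 0.2 + 0.1

print(repr(total))        # shortest decimal string that round-trips to the same double
print(total == 0.3)       # exact comparison fails
print(Decimal(total))     # the exact value actually stored in the double

# Comparing with a tolerance is usually what is wanted instead of ==.
is_close = math.isclose(total, 0.3, rel_tol=1e-9)
print(is_close)
```

The exact comparison fails because the sum and the literal `0.3` round to different doubles; `math.isclose` compares them within a relative tolerance instead.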

1,636 questions with no upvoted or accepted answers
11 votes
0 answers
610 views

Problem with Vulkan floating-point behavior

I am trying to implement the paper "Extended-Precision Floating-Point Numbers for GPU Computation" by Andrew Thall (Alma College) in a GLSL Vulkan compute shader. I need this because some of my devices don't ...
9 votes
1 answer
596 views

Are .NET Decimal type computations deterministic?

I have two questions regarding .NET's decimal data type determinism: Are decimal type computations cross-platform deterministic? Or in other words, will math operations on decimal type produce ...
7 votes
1 answer
774 views

How to avoid NumPy type conversions?

Is it possible to avoid, or emit warnings for, automatic NumPy type conversions from integer and 32-bit float arrays to 64-bit float arrays? My use case for this is that I'm developing a large analysis ...
6 votes
0 answers
1k views

Representing a float or a binary as a 32-bit signed integer in R

I've been given a task to write an API for the AR.Drone 2.0 in R. I know it's probably not the wisest choice of language as there are good validated APIs written in Python and JS, but I took the ...
5 votes
0 answers
37 views

How to parse floating point infinity from std::istream

I have written a superdumb serialization library for a project that I am working on. I just got bitten by floating point infinity, which I illustrate with the sample program below. I expect the ...
5 votes
0 answers
123 views

C (MIPS) - How to tell the compiler to load single-precision float immediates with GPRs?

Recently I have been trying to write some utilities for the N64 with GCC and have run into problems with its optimization strategy. Please consider the following example: // cctest.c extern struct { float x; ...
5 votes
0 answers
116 views

Probable bug in MSVC with compile-time NaN comparison

My colleague was doing some basic experiments with NaN and was puzzled by the behavior on Visual Studio that did not match his expectations. After discussion, it seems that he uncovered a probable ...
5 votes
0 answers
146 views

How to catch floating-point errors early (right where they occur)?

When developing floating-point heavy code, it is very useful to enable FPU exceptions. When an operation results in a NaN/inf, we could catch it immediately. For example, on Linux, I can enable this ...
5 votes
0 answers
365 views

Metal SIMD Min and Max operations fail for floats

Question in short: why am I getting undefined behavior from the simd_min and simd_max functions in Metal 2.1 with floats? Update: It seems this only occurs on the Radeon Pro 560X GPU, but not on the Intel ...
5 votes
0 answers
104 views

Rationale for range restriction of IEEE-754 compound function

The IEEE Std 754-2008 lists in Table 9.1 the recommended function compound(x,n) = (1+x)^n, with real x, integer n (where ^ is the power operator). The domain is specified as x in [-1, +infinity] and ...
4 votes
0 answers
112 views

Is there a way to force numpy.set_printoptions to show the exact float value?

Following question 59674518, is there a way for numpy.set_printoptions to ensure the EXACT float value is displayed, without displaying trailing zeros, and without knowing the value a priori? I have ...
4 votes
0 answers
34 views

fpclassify(): what are examples of other implementation-defined categories?

N2479 C17..C2x working draft — February 5, 2020 ISO/IEC 9899:202x (E) (emphasis added): The fpclassify macro classifies its argument value as NaN, infinite, normal, subnormal, zero, or into another ...
4 votes
3 answers
115 views

Is there a bug in controlled rounding using `exp`?

I'm observing incorrect (IMO) rounding behaviour on some platforms as follows: Calculate the value of log(2) under the rounding modes FE_DOWNWARD and FE_UPWARD (see <fenv.h>). In all cases I've ...
4 votes
3 answers
384 views

How to preserve raster dataType in raster processing?

When doing raster math, for example raster1-raster2, the datatype of the output raster is 'FLT4S', even if the datatype of both raster1 and raster2 is 'INT2S'. How can I force the output to be 'INT2S'...
4 votes
0 answers
357 views

Convert a list of floating-point numbers to bytearray and back in Python

I am trying to convert a list of floating-point numbers to a bytearray and convert it back to the original list. My list looks like this: [-0.055999, -0.054000, -0.049, -0.040999, -0.037000] I am trying to ...
