# Questions tagged [floating-point]

Floating point numbers are approximations of real numbers that can represent larger ranges than integers but use the same amount of memory, at the cost of lower precision. If your question is about small arithmetic errors (e.g. why does 0.2 + 0.1 equal 0.300000001?) or decimal conversion errors, please read the "info" page linked below before posting.

1,636 questions with no upvoted or accepted answers
610views

### problem with vulkan floating point behavior

I am trying to implement the paper "Extended-Precision Floating-Point Numbers for GPU Computation" by Andrew Thall (Alma College) in a GLSL Vulkan compute shader. I need this because some of my devices don't ...
• 111
596views

### Are .NET Decimal type computations deterministic?

I have two questions regarding .NET's decimal data type determinism: Are decimal type computations cross-platform deterministic? Or in other words, will math operations on decimal type produce ...
• 460
774views

### How to avoid Numpy type conversions?

Is it possible to avoid or emit warnings for automatic Numpy type conversions from integer and 32 bit float arrays to 64 bit float arrays? My use case for this is that I'm developing a large analysis ...
• 2,420
1kviews

### Representing a float or a binary as a 32 bit signed integer in R

I've been given a task to write an API for the AR.Drone 2.0 in R. I know it's probably not the wisest choice of language as there are good validated APIs written in Python and JS, but I took the ...
• 355
37views

### How to parse floating point infinity from std::istream

I have written a superdumb serialization library for a project that I am working on. I just got bitten by floating point infinity, which I illustrate with the sample program below. I expect the ...
• 345
123views

### C (MIPS) - How to tell the compiler to load single-precision float immediates with GPRs?

Recently, I have been trying to write some utilities for the N64 with gcc and have run into problems with its optimization strategy. Please consider the following example: // cctest.c extern struct { float x; ...
• 81
116views

### Probable bug in MSVC with compile-time NaN comparison

My colleague was doing some basic experiments with NaN and was puzzled by the behavior on Visual Studio that did not match his expectations. After discussion, it seems that he uncovered a probable ...
• 5,477
146views

### How to catch floating point errors early (right at where they occur)?

When developing floating-point heavy code, it is very useful to enable FPU exceptions. When an operation results in a NaN/inf, we could catch it immediately. For example, on Linux, I can enable this ...
• 27.4k
365views

### Metal SIMD Min and Max operations fail for floats

Question in short: Why am I getting undefined behavior from simd_min and simd_max functions in Metal 2.1 with floats? Update: Seems this only occurs on the Radeon Pro 560X GPU, but not on the Intel ...
• 133
104views

### Rationale for range restriction of IEEE-754 compound function

The IEEE Std 754-2008 lists in Table 9.1 the recommended function compound(x,n) = (1+x)^n, with real x, integer n (where ^ is the power operator). The domain is specified as x in [-1, +infinity] and ...
• 1,111
112views

### Is there a way to force numpy.set_printoptions to show the exact float value?

Following question 59674518, is there a way for numpy.set_printoptions to ensure the EXACT float value is displayed, without displaying trailing zeros, and without knowing the value a priori? I have ...
• 571
34views

### fpclassify(): what are examples of other implementation-defined categories?

N2479 C17..C2x working draft — February 5, 2020 ISO/IEC 9899:202x (E) (emphasis added): The fpclassify macro classifies its argument value as NaN, infinite, normal, subnormal, zero, or into another ...
• 4,132
115views

### Is there a bug in controlled rounding using `exp`?

I'm observing incorrect (IMO) rounding behaviour on some platforms as follows: Calculate the value of log(2) under rounding modes to FE_DOWNWARD and FE_UPWARD (see <fenv.h>). In all cases I've ...
384views

### How to preserve raster dataType in raster processing?

When doing raster math, for example raster1-raster2, the datatype of the output raster is 'FLT4S', even if the datatype of both raster1 and raster2 is 'INT2S'. How can I force the output to be 'INT2S'...
• 41
357views

### Convert List of Floating point to bytearray and back in Python

I am trying to convert a list of floating-point numbers to a bytearray and convert it back to the original list. My list looks like this: [-0.055999, -0.054000, -0.049, -0.040999, -0.037000] I am trying to ...
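A minimal sketch of one way to do this round trip, assuming the standard-library `struct` module and little-endian 8-byte doubles (format `d`). The helper names `floats_to_bytes` and `bytes_to_floats` are hypothetical and not taken from the question. Packing with the 4-byte `f` format instead would round each value to binary32, so the values read back would generally differ from the originals.

```python
import struct

def floats_to_bytes(values):
    """Pack a sequence of Python floats as little-endian 8-byte doubles."""
    return bytearray(struct.pack(f"<{len(values)}d", *values))

def bytes_to_floats(buf):
    """Unpack a buffer produced by floats_to_bytes back into a list of floats."""
    count = len(buf) // 8  # each double occupies 8 bytes
    return list(struct.unpack(f"<{count}d", buf))

original = [-0.055999, -0.054000, -0.049, -0.040999, -0.037000]
encoded = floats_to_bytes(original)
decoded = bytes_to_floats(encoded)

# Python floats are binary64, so packing as "d" loses nothing.
print(decoded == original)
```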
110views

### add3 instruction for a+b+c with one single rounding

Background It is well known that the exact product of two floating point numbers is not always a floating point number, but the error exact(a*b) - float(a*b) is. Some codes for exact multiplication ...
• 44.8k
501views

### Why does complex floating-point division underflow weirdly with NumPy?

Consider this code: import numpy numpy.seterr(under='warn') x1 = 1 + 1j / (1 << 533) x2 = 1 - 1j / (1 << 533) y1 = x1 * 1.1 y2 = x2 * 1.1 z1 = x1 / 1.1 z2 = x2 / 1.1 print(numpy.divide(1, ...
• 196k
149views

### Two different kinds of floating-point overflow in Python

I am experimenting with calculating (1e308)**2 and (1e308)*2 in Python. I expect that either both yield overflow, or both yield inf. However, (1e308)**2 manifests an overflow exception while (1e308)*...
• 8,673
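The difference described in this question can be reproduced with a short sketch, assuming CPython: float multiplication follows IEEE 754 and returns `inf`, while the float power operator raises `OverflowError`. The helper name `square_or_error` is hypothetical.

```python
import math

def square_or_error(x: float) -> str:
    """Return repr(x ** 2), or a description of the exception if the power overflows."""
    try:
        return repr(x ** 2)
    except OverflowError as exc:
        return f"OverflowError: {exc}"

product = 1e308 * 2          # multiplication overflows silently to infinity
print(product, math.isinf(product))
print(square_or_error(1e308))  # the power operator reports the overflow instead
print(square_or_error(3.0))    # an ordinary value for comparison
```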
211views

### Any insights on this Microsoft C 5.1 floating point and DOSBox weirdness?

This is a fantastically strange bug that has been tweaking my noodle for the better part of a day; it took me some time to boil it down to this. The setup: Microsoft C 5.10 (~1988) DOSBox 0.74 ...
• 16.9k
384views

### C# Change FPU rounding mode

I'm attempting to write an interval arithmetic library in C# .NET, but in order to do this accurately I need to be able to control the rounding mode of floating point operations. After a bit of ...
• 639
2kviews

### pragma STDC FENV_ACCESS ON is not supported

I tried to slightly modify the example from the article: #include <iostream> #include <cfenv> #pragma STDC FENV_ACCESS ON int main() { std::feclearexcept(FE_ALL_EXCEPT); //int r ...
• 14.6k
1kviews

### How to correctly pass a float from C# to C++ (dll)

I'm getting huge differences when I pass a float from C# to C++. I'm passing a dynamic float which changes over time. With a debugger I get this: c++ lonVel -0.036019072 float c# lonVel -0....
187views

### Create a program that returns the smallest cube which exceeds a non-negative integer n

So I'm trying to create a program which generates the smallest cube greater than an integer n. def first_cube_above(n): #Return the smallest cube which exceeds the non-negative integer n. ...
• 41
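One possible approach, sketched here under the assumption that `n` is a non-negative Python `int`, avoids floating point entirely: a float cube root such as `n ** (1/3)` can land just below the true root for perfect cubes, so this version uses an integer binary search. It reuses the function name from the excerpt, but the body is illustrative and is not the asker's code.

```python
def first_cube_above(n: int) -> int:
    """Return the smallest perfect cube strictly greater than the non-negative integer n."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Find an upper bound hi with hi**3 > n by doubling.
    hi = 1
    while hi ** 3 <= n:
        hi *= 2
    lo = hi // 2  # invariant: lo**3 <= n < hi**3
    # Narrow the bracket until hi is the smallest integer whose cube exceeds n.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid
    return hi ** 3

for n in (0, 7, 8, 26, 27):
    print(n, first_cube_above(n))
```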
81views

### wrong result on addition of numbers larger than epsilon using numpy.float128

Epsilon is supposed to be the smallest number that you can add to one, yet I'm getting 1 instead of 1+epsilon when I perform the addition and print the result. I've implemented a getEpsilon function. I ...
• 441
111views

### Arithmetic operations on floating point numbers giving unexpected results

I know that with binary representation it is not possible to exactly represent a floating-point number (and I also understand why 0.1 + 0.2 == 0.3 is false). Now here is where I got stuck while I ...
• 311
105views

### FLT_HAS_SUBNORM is 0: does execution of fpclassify() with manually constructed subnormal lead to UB or lead to WDB returning FP_SUBNORMAL?

In case of FLT_HAS_SUBNORM == 0 (or any XXX_HAS_SUBNORM == 0 in general) does execution of fpclassify macro with manually constructed subnormal (constructed using type punning via union, using memcpy,...
• 4,132
48views

### In Python, are there hidden rules that control how the precision of decimal numbers is displayed?

For Python, do read this link: https://docs.python.org/3/tutorial/floatingpoint.html, "Floating Point Arithmetic: Issues and Limitations" I do understand that there is a mismatch (tiny ...
• 41
120views

### Denormalized floating point numbers: which operations trigger expensive special cases?

Denormalized floating point numbers require expensive special handling in some operations (additions, multiplications). While this is well-known, it seems to me that there are also many comparably ...
• 457
544views

### Find smallest integer that satisfies floating point inequality equation

I am looking for a fast algorithm that finds the smallest integer N that will satisfy the following inequality where s, q, u, and p are float numbers (using the IEEE-754 binary32 format): s > q + ...
400views

### Efficiently represent 16777217 as a float

Browsing job advertisements, I saw the following question: Do you understand what it takes to efficiently represent 16,777,217 as a float? [Siemens] I don't understand the question. I know that ...
• 22.9k
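For background on why this number is special: 16,777,217 is 2^24 + 1, and IEEE 754 binary32 has a 24-bit significand, so it is the first positive integer that a single-precision float cannot hold exactly. The sketch below illustrates this using the standard-library `struct` module to round through binary32; the helper name `to_binary32` is hypothetical, and the sketch does not address what the job advertisement meant by "efficiently".

```python
import struct

def to_binary32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

for n in (16777215, 16777216, 16777217, 16777218):
    stored = to_binary32(float(n))
    # Compare the integer we asked for with what binary32 actually stores.
    print(n, int(stored), int(stored) == n)
```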
146views

### What guarantees does System.Numerics.Vectors provide about size and bit order?

I have implemented a vector-based c# approximation of Log. It includes unsafe code. It's been working fine in a number of environments, but on a recent deployment has fallen over. The implementation ...
• 101
151views

### Correctly rounding a trigonometric function for single-precision

I want a correctly rounded (round to nearest ties to even) single-precision trigonometric function (0.5 ulp error). I can use either the CORDIC algorithm or one of the polynomial approximation ...
• 440
107views

### Is there a way to disable denormals in numpy? (Enabling ftz and daz flags)

I'm trying to perform a few calculations on floating point numbers that are close to the float32 min. I want the numbers to be flushed to zero when they drop below the float32 minimum instead of ...
155views

### Floating point [in]accuracy of C program, when running on the same machine, changed over last two weeks

The following C code was compiled today on two systems with Microsoft's compiler (installed with Visual Studio 2017 Community), both of which had modern 64-bit Intel processors and were running ...
• 1,606
62views

### Dealing with floating point inaccuracy in very small numbers efficiently

The program I am working with takes OpenStreetMap data to render a map. The data consists of 4 coordinates that make up the bounds of the data. I am drawing lines that sometimes exceed these bounds ...
85views

### R not working properly with big numbers because of default options?

Seems like R coerces big numbers and cannot compare them effectively: x = 123412415124231251233213 x == 123412415124231251233214 [1] TRUE x == 123412415124231251233217 [1] TRUE Any idea why (maybe a ...
• 408
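The same effect can be shown outside R, on the assumption that R stores these literals as IEEE 754 binary64 doubles. The sketch below uses Python, whose `float` is also binary64, and `math.ulp` (Python 3.9 or later) to measure the gap between adjacent doubles at this magnitude. Integers that differ by less than that gap can collapse onto the same stored value.

```python
import math

x = float(123412415124231251233213)
y = float(123412415124231251233214)
z = float(123412415124231251233217)

spacing = math.ulp(x)   # distance from x to the next larger double
print(spacing)          # far larger than the differences of 1 and 4 above
print(x == y, x == z)   # all three literals round to the same double
print(int(x))           # the integer value that is actually stored
```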
700views

### Unexpected result with kotlin contentEquals on DoubleArray

I have a need to compare two DoubleArrays in order to determine if they have the same values in the same order. To do so I have used the contentEquals extension function, however, it treats 0 and -0 ...
• 7,117
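As background on signed zeros, here is a small sketch in Python (not Kotlin): under IEEE 754, 0.0 and -0.0 compare equal numerically but have different bit patterns, so any comparison based on the raw encoding tells them apart. The helper name `bits` is hypothetical, and the sketch makes no claim about how `contentEquals` is implemented.

```python
import math
import struct

pos_zero, neg_zero = 0.0, -0.0

def bits(x: float) -> str:
    """Hex dump of the big-endian binary64 encoding of x."""
    return struct.pack(">d", x).hex()

print(pos_zero == neg_zero)            # numeric comparison treats them as equal
print(bits(pos_zero), bits(neg_zero))  # the encodings differ in the sign bit
print(math.copysign(1.0, neg_zero))    # the sign can still be recovered
```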
2kviews

### How to format float to 4 decimal places within json dumps?

Have difficulty in converting pf_stats output to have 4 decimal places: import json import numpy as np def run_simulation(H, P, B, C, mu, sigma, T, L): x = normal(mu, sigma, (L, T)) pf_all ...
• 31
1kviews

### Float16 (HalfTensor) in pytorch + cuda

Can I set torch.HalfTensor as default and use it with CUDA? I can't even create usual Conv2D: In [1]: import torch In [2]: torch.__version__ Out[2]: '0.2.0_3' In [3]: from torch import nn In [4]: ...
• 718
431views

### Lack of precision of the toFixed method in javascript

I ran some tests of the Number.prototype.toFixed method in the Chrome (v60.0.3112.101) console and found something that puzzled me. Why does 1.15.toFixed(1) return "1.1" and not "1.2"? Why does 1.05.toFixed(1) return "1....
• 1,199
169views

Is there a built-in function in Haskell that rounds a real floating-point number to the nearest whole number, without changing the type of said number? sameTypeRound f == fromIntegral (round f)
• 1,551
484views

### Getting FloatingPointError instead of ZeroDivisionError when dividing by zero

I'm running a very time-consuming post-processor in Python and have encountered a FloatingPointError where I was expecting a ZeroDivisionError. My code captured the possibility of a ZeroDivisionError ...
312views

### Is it safe to cast Math.Round result to float?

A colleague has written some code along these lines: var roundedNumber = (float) Math.Round(someFloat, 2); Console.WriteLine(roundedNumber); I have an uncertainty about this code - is the number ...
• 124k
1kviews

### How to preserve float precision in CSV to JSON conversion (via pandas.read_csv)?

NB: My question is not a duplicate of Format floats with standard json module. In fact, Mark Dickinson provided a good answer to my question in one of his comments, and this answer is all about ...
• 30.1k
699views

### Python pyvttbl ANOVA error

I am trying to perform ANOVA with pyvttbl over my dataset but I get a strange error. Here is my code: import pyvttbl df = pyvttbl.DataFrame() df.read_tbl("ANOVA_MWE_input.csv") print df print type(...
• 261
124views

### Javascript wrongfully changes the result of a simple multiplication. How can I fix it?

function roundUp(num, precision) { return Math.ceil(num * precision) / precision; } var num = 0.07; var precision = 100; console.log(roundUp(num, precision)); When the arguments to the ...
• 1,154
569views

### Why does 0.1 + 0.3 = 0.4 in JavaScript and Python?

I know why 0.1 + 0.2 !== 0.3, because 0.1 cannot be represented exactly in a binary floating point representation, but why is 0.1 + 0.3 === 0.4 in JavaScript? I think 0.1 and 0.3 both cannot be represented ...
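A short sketch of what is going on, in Python and assuming binary64 floats (which JavaScript numbers also are): `fractions.Fraction` exposes the exact rational value stored for each literal. In both cases the exact sum of the stored operands differs from the stored value of the expected result; what differs is whether rounding the sum to the nearest double happens to land on that stored value. The helper name `exact` is hypothetical.

```python
from fractions import Fraction

def exact(x: float) -> Fraction:
    """Return the exact rational value stored in a binary64 float."""
    return Fraction(x)

for a, b, c in ((0.1, 0.2, 0.3), (0.1, 0.3, 0.4)):
    exact_sum = exact(a) + exact(b)  # sum of the stored values, with no rounding
    rounded = a + b                  # float addition rounds the sum to a double
    print(a, b, c, "rounded sum equals c:", rounded == c,
          "| exact sum equals stored c:", exact_sum == exact(c))
```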
5kviews

### How to solve...ValueError: cannot convert float NaN to integer

I'm running quite complex code, so I won't bother with details; I've had it working before, but now I'm getting this error. Particle is a 3D tuple filled with 0 or 255, and I am using the scipy ...
115views

### Behaviour of floating point precision for division

While working with various floating point number solutions I have been logging the values to compare the outputs. e.g. console.log(3 * 0.1) //0.30000000000000004 console.log(3 * 0.2) //0....
• 1,441