
On my Intel x86_64 machine, this C++ code generates different sequences on Clang vs GCC:

#include <iostream>

namespace {

template<typename Out>
constexpr auto caster{[](auto x) constexpr {
        return static_cast<Out>(x);
}};

}  // namespace

auto main() -> int {
        constexpr auto fl{caster<double>};

        constexpr double ellipse_b_start{1.0};
        constexpr double ellipse_b_end{150.0};
        constexpr long ellipse_b_count{12347};

        constexpr double ellipse_b_step{(ellipse_b_end - ellipse_b_start) /
                                        fl(ellipse_b_count)};

        std::ios::sync_with_stdio(false);
        std::cout << std::hexfloat;

        for (long i{0}; i < ellipse_b_count; i++) {
                auto ellipse_b{ellipse_b_start + ellipse_b_step * fl(i)};

                std::cout << ellipse_b << '\n';
        }
}

Addition and multiplication are each well-defined by IEEE 754, so I expected my sequence to likewise be a mathematical constant, identical on every compiler.

Traditionally the Intel x87 extended precision floating-point registers would be blamed for this. But this is a modern Intel x86_64 CPU, so presumably AVX or SSE are used for floating-point instead of x87?

My questions

  1. What is the reason for the different behavior between GCC and Clang?
  2. How can I get the exact same sequence of numbers on both compilers? Generating the numbers should also remain fast.
  3. Is this a manifestation of a bug in Clang?
  4. Is this a manifestation of a bug in GCC?

-ffp-contract=off

Eric Postpischil proposed this compiler option as a solution. It may well fix this example, but it is problematic for my complete code (the above is just an excerpt): the option applies to the entire compilation unit, which is undesirable for performance and other reasons.
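If the whole-unit scope is the objection, one workaround is to isolate the sequence generator in its own translation unit and disable contraction only there. A hedged sketch of such a build (the file names are hypothetical; I have not verified how GCC 11's LTO interacts with per-TU -ffp-contract, so the conservative choice below keeps -flto off the sequence TU):

```shell
# sequence.cpp holds only the input-sequence generator; the rest of the
# program lives in main.cpp. Only sequence.cpp is compiled with contraction
# disabled, so the hot numerical code keeps full optimization.
g++ -std=c++20 -O3 -march=native -ffp-contract=off -c sequence.cpp -o sequence.o
g++ -std=c++20 -O3 -march=native -flto -c main.cpp -o main.o
g++ -O3 -flto main.o sequence.o -o program
```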

Additional information

GCC is version 11.1.0.

Clang is 12.0.1.

Both GCC and Clang compile my code with these options:

-std=c++20 -pedantic -g -march=native -flto -O3 -fno-exceptions

The CPU is i5-8300H.

I can also provide the binaries if someone wants to take a look.

Context

The motivation for the code was comparing several different implementations of an analytical function, where the sequence in question provides inputs on which the different implementations are to be compared. This is why I want the sequences to be predictable even across compilers. I basically want to be able to consider the sequence of inputs as fixed/written in stone.

Examples of differing parts of the sequence

GCC:

...
0x1.59973622ca91bp+0
0x1.5cae14b13b7c3p+0
0x1.5fc4f33fac66cp+0
0x1.62dbd1ce1d515p+0
0x1.65f2b05c8e3bdp+0
...

Clang:

...
0x1.59973622ca91bp+0
0x1.5cae14b13b7c4p+0
0x1.5fc4f33fac66cp+0
0x1.62dbd1ce1d515p+0
0x1.65f2b05c8e3bep+0
...

Clang's sequence and GCC's sequence do tend to synchronize; there are never many inconsistent points in a row.

Ghidra decompilation for Clang

int main(void)

{
        undefined auVar1 [16];
        basic_ostream *pbVar2;
        long lVar3;
        long in_FS_OFFSET;
        undefined in_XMM1 [16];
        char local_21;
        long local_20;
        
        local_20 = *(long *)(in_FS_OFFSET + 0x28);
        lVar3 = 0;
        std::ios_base::sync_with_stdio(false);
        *(uint *)(_ITM_deregisterTMCloneTable + *(long *)(std::cout + -0x18)) =
             *(uint *)(_ITM_deregisterTMCloneTable + *(long *)(std::cout + -0x18)) | 0x104;
        do {
                auVar1 = vcvtsi2sd_avx(in_XMM1,lVar3);
                auVar1 = vmulsd_avx(auVar1,ZEXT816(0x3f88b6f473875453));
                auVar1 = vaddsd_avx(auVar1,ZEXT816(0x3ff0000000000000));
                pbVar2 = std::basic_ostream<char,std::char_traits<char>>::_M_insert_double_
                                   (SUB168(auVar1,0));
                local_21 = '\n';
                std::__ostream_insert_char_std__char_traits_char__(pbVar2,&local_21,1);
                lVar3 = lVar3 + 1;
        } while (lVar3 != 0x303b);
        if (*(long *)(in_FS_OFFSET + 0x28) == local_20) {
                return 0;
        }
                    /* WARNING: Subroutine does not return */
        __stack_chk_fail();
}

Ghidra decompilation for GCC

undefined8 main(void)

{
        undefined auVar1 [16];
        basic_ostream *pbVar2;
        long lVar3;
        long in_FS_OFFSET;
        undefined in_YMM1 [32];
        char local_21;
        long local_20;
        
        lVar3 = 0;
        local_20 = *(long *)(in_FS_OFFSET + 0x28);
        std::ios_base::sync_with_stdio(false);
        *(uint *)(_ITM_deregisterTMCloneTable + *(long *)(std::cout + -0x18)) =
             *(uint *)(_ITM_deregisterTMCloneTable + *(long *)(std::cout + -0x18)) | 0x104;
        do {
                auVar1 = vxorpd_avx(SUB3216(in_YMM1,0),SUB3216(in_YMM1,0));
                in_YMM1 = ZEXT1632(auVar1);
                auVar1 = vcvtsi2sd_avx(auVar1,lVar3);
                lVar3 = lVar3 + 1;
                auVar1 = vfmadd132sd_fma(auVar1,ZEXT816(0x3ff0000000000000),
                                         ZEXT816(0x3f88b6f473875453));
                pbVar2 = std::basic_ostream<char,std::char_traits<char>>::_M_insert_double_
                                   (SUB168(auVar1,0));
                local_21 = '\n';
                std::__ostream_insert_char_std__char_traits_char__(pbVar2,&local_21,1);
        } while (lVar3 != 0x303b);
        if (local_20 == *(long *)(in_FS_OFFSET + 0x28)) {
                return 0;
        }
                    /* WARNING: Subroutine does not return */
        __stack_chk_fail();
}

Notice how GCC emits a fused multiply-add instruction (vfmadd132sd), while Clang emits a separate multiply (vmulsd) and add (vaddsd). The FMA rounds only once, so its result can differ in the last bit from the separately rounded multiply-add; I guess that is the reason for the differences. But is there a nice way to prevent the differences in the sequence's terms?

I previously said that I would accept an inline assembly solution, but now that I think about it, I actually want a cross-platform solution. If there is no better way, I'll just try -ffp-contract=off.

