Questions tagged [fma]
Fused Multiply Add or Multiply-Accumulate
73 questions
47 votes
2 answers
36k views
How to use Fused Multiply-Add (FMA) instructions with SSE/AVX
I have learned that some Intel/AMD CPUs can do simultaneous multiply and add with SSE/AVX: FLOPS per cycle for sandy-bridge and haswell SSE2/AVX/AVX2.
I'd like to know how to do this best in code and I ...
47 votes
1 answer
8k views
Obtaining peak bandwidth on Haswell in the L1 cache: only getting 62%
I'm attempting to obtain full bandwidth in the L1 cache for the following function on Intel processors:
float triad(float *x, float *y, float *z, const int n) {
float k = 3.14159f;
for(int i=0;...
36 votes
2 answers
3k views
Significant FMA performance anomaly experienced in the Intel Broadwell processor
Code1:
vzeroall
mov rcx, 1000000
startLabel1:
vfmadd231ps ymm0, ymm0, ymm0
vfmadd231ps ymm1, ymm1, ymm1
vfmadd231ps ymm2, ymm2, ymm2
vfmadd231ps ymm3, ymm3, ymm3
...
24 votes
2 answers
16k views
FMA3 in GCC: how to enable
I have an i5-4250U which has AVX2 and FMA3. I am testing some dense matrix multiplication code which I wrote, using GCC 4.8.1 on Linux. Below is a list of three different ways I compile.
SSE2: gcc ...
18 votes
1 answer
5k views
AVX2: Computing dot product of 512 float arrays
I will preface this by saying that I am a complete beginner at SIMD intrinsics.
Essentially, I have a CPU which supports AVX2 intrinsics (Intel(R) Core(TM) i5-7500T CPU @ 2.70GHz). I would like ...
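As a scalar point of reference for the question above, the sketch below computes a dot product with eight independent partial sums, mirroring the eight float lanes of one AVX2 register. The function name `dot8` and the accumulator count are illustrative assumptions, not the asker's code. It uses no intrinsics, and whether the compiler turns each multiply-add into an FMA instruction depends on the target flags (for example `-mfma` in GCC).

```c
#include <stddef.h>

/* Hypothetical helper: scalar dot product with 8 independent partial sums.
   Each acc[k] plays the role of one float lane of a 256-bit register. */
float dot8(const float *a, const float *b, size_t n) {
    float acc[8] = {0};
    size_t i = 0;

    /* Main loop: process 8 elements per iteration, one per accumulator. */
    for (; i + 8 <= n; i += 8)
        for (size_t k = 0; k < 8; k++)
            acc[k] += a[i + k] * b[i + k];

    /* Horizontal reduction of the 8 partial sums. */
    float sum = 0.0f;
    for (size_t k = 0; k < 8; k++)
        sum += acc[k];

    /* Tail: leftover elements when n is not a multiple of 8. */
    for (; i < n; i++)
        sum += a[i] * b[i];

    return sum;
}
```

Because the partial sums are added in a different order than a simple left-to-right loop, the result can differ from it in the last bits for general inputs.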
16 votes
6 answers
4k views
Which algorithms benefit most from fused multiply add?
fma(a,b,c) is equivalent to a*b+c except it doesn't round the intermediate result.
Could you give me some examples of algorithms that non-trivially benefit from avoiding this rounding?
It's not obvious, ...
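To make the effect of the skipped rounding concrete, here is a minimal sketch in single precision. It does not call the library `fma`. Instead it relies on the fact that the product of two floats is exact in double, so `fused_sketch` rounds the final result only once in the case shown. Both function names are hypothetical, and `fused_sketch` is an assumption-laden stand-in: it can differ from a true `fmaf` in rare cases because the sum is rounded to double and then to float.

```c
/* Two roundings: the product is rounded to float before the add.
   The volatile store keeps the compiler from contracting this into an FMA. */
float mul_then_add(float a, float b, float c) {
    volatile float p = a * b;   /* rounded to float here */
    return p + c;               /* rounded again here */
}

/* Single-rounding sketch: float * float is exact in double (24 + 24 bits
   fit in 53), so the product carries no rounding error into the add.
   Not a bit-exact fmaf in general (double rounding on the final sum). */
float fused_sketch(float a, float b, float c) {
    return (float)((double)a * (double)b + (double)c);
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product is 1 + 2^-11 + 2^-24. Rounding it to float drops the 2^-24 term, so `mul_then_add` returns 0, while `fused_sketch` returns 2^-24, which is exactly the rounding error of the float product. Recovering that error term is the basis of error-free product transformations used in double-double arithmetic and compensated dot products.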
15 votes
4 answers
10k views
How to get data out of AVX registers?
Using MSVC 2013 and AVX 1, I've got 8 floats in a register:
__m256 foo = _mm256_fmadd_ps(a,b,c);
Now I want to call inline void print(float) {...} for all 8 floats. It looks like the Intel AVX ...
15 votes
2 answers
2k views
Fused multiply add and default rounding modes
With GCC 5.3 the following code compiled with -O3 -mfma
float mul_add(float a, float b, float c) {
return a*b + c;
}
produces the following assembly
vfmadd132ss %xmm1, %xmm2, %xmm0
ret
I ...
14 votes
3 answers
2k views
Can I use the AVX FMA units to do bit-exact 52 bit integer multiplications?
AVX2 doesn't have any integer multiplications with sources larger than 32-bit. It does offer 32 x 32 -> 32 multiplies, as well as 32 x 32 -> 64 multiplies, but nothing with 64-bit sources.
Let's say ...
14 votes
2 answers
6k views
Why does the FMA _mm256_fmadd_pd() intrinsic have 3 asm mnemonics, "vfmadd132pd", "231" and "213"?
Could someone explain to me why there are 3 variants of the fused multiply-accumulate instruction (vfmadd132pd, vfmadd231pd and vfmadd213pd), while there is only one C intrinsic, _mm256_fmadd_pd?
To ...
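One way to read the three mnemonics, going by the operand order given in Intel's instruction reference (Intel syntax, where operand 1 is also the destination): the first two digits name the operands that are multiplied and the third names the operand that is added. The plain-C model below only illustrates that ordering. The function names are hypothetical, it works on scalars rather than packed vectors, and the arithmetic is written as an ordinary multiply and add rather than a fused operation.

```c
/* Scalar model of the operand ordering only (not of the fused rounding).
   op1 is the destination register and also a source; op3 is the operand
   that may come from memory. Each function returns the new value of op1. */

/* vfmadd132: op1 = op1 * op3 + op2 */
double fmadd132_model(double op1, double op2, double op3) {
    return op1 * op3 + op2;
}

/* vfmadd213: op1 = op2 * op1 + op3 */
double fmadd213_model(double op1, double op2, double op3) {
    return op2 * op1 + op3;
}

/* vfmadd231: op1 = op2 * op3 + op1 */
double fmadd231_model(double op1, double op2, double op3) {
    return op2 * op3 + op1;
}
```

Since the destination always overwrites one of the inputs, the three forms let the compiler pick which input is destroyed and which one may be a memory operand, so a single intrinsic can map to any of them.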
10 votes
1 answer
796 views
Do FMA (fused multiply-add) instructions always produce the same result as a mul then add instruction?
I have this assembly (AT&T syntax):
mulsd %xmm0, %xmm1
addsd %xmm1, %xmm2
I want to replace it with:
vfmadd231sd %xmm0, %xmm1, %xmm2
Will this transformation always leave equivalent state ...
10 votes
3 answers
1k views
Optimize for fast multiplication but slow addition: FMA and doubledouble
When I first got a Haswell processor I tried implementing FMA to determine the Mandelbrot set. The main algorithm is this:
intn = 0;
for(int32_t i=0; i<maxiter; i++) {
floatn x2 = square(x), ...
9 votes
2 answers
10k views
Preventing GCC from automatically using AVX and FMA instructions when compiled with -mavx and -mfma
How can I disable auto-vectorization with AVX and FMA instructions? I would still prefer the compiler to employ SSE and SSE2 automatically, but not FMA and AVX.
My code that uses AVX checks for its ...
9 votes
2 answers
3k views
Automatically generate FMA instructions in MSVC
MSVC has supported AVX/AVX2 instructions for years now, and according to this MSDN blog post, it can automatically generate fused-multiply-add (FMA) instructions.
Yet neither of the following functions ...
8 votes
2 answers
11k views
How do I know if I can compile with FMA instruction sets?
I have seen questions about how to use the FMA instruction set, but before I start using it, I'd first like to know if I can (does my processor support it). I found a post saying that I needed ...
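A compile-time check can complement looking at the CPU's feature list. The sketch below reports whether the current build has FMA code generation enabled, assuming GCC or Clang, which define `__FMA__` when `-mfma` (or a `-march` value that implies it) is in effect. `FP_FAST_FMA` is the standard C99 macro from `<math.h>` that an implementation may define when `fma()` is about as fast as a separate multiply and add. The function names are hypothetical, and other compilers may not define `__FMA__` at all.

```c
#include <math.h>

/* Returns 1 if this translation unit was compiled with FMA code
   generation enabled (GCC/Clang define __FMA__ in that case), else 0. */
int built_with_fma(void) {
#if defined(__FMA__)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the C library advertises a fast fma() for double
   via the standard FP_FAST_FMA macro, else 0. */
int has_fast_fma_hint(void) {
#if defined(FP_FAST_FMA)
    return 1;
#else
    return 0;
#endif
}
```

Both checks describe the build, not the machine: a binary built with FMA enabled will still fault on a CPU that lacks the instructions, so a runtime feature check is a separate step.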