I am working on a personal project involving linear algebra. The first step is to replicate (translate) an R script (64-bit build of R) in Python, and the problem described here reproduces the issue I am facing.
Based on some internet searching, the problem seems to be related to how floating-point numbers behave at high precision.
Here are the two pieces of code:
R code:
library(base)
# load the data, subset it, and convert the data frame to a matrix
df <- read.csv('hansen99.csv')
arr <- data.matrix(df[c(4:20)])
# the cross product gives a square matrix, which is then inverted
A <- t(arr) %*% arr
A_inv <- solve(A)
# print with 100 decimal places
sprintf("%.100f",A_inv[1,1])
# '0.0023559636604639148764472889041599046322517096996307373046875000000000000000000000000000000000000000'
Python code:
import numpy as np
import numpy.linalg as npla
import pandas as pd
# load the data, subset it, and convert the data frame to an array
df = pd.read_csv('hansen99.csv')
arr = df.iloc[:, 3:20].values  # columns 4 to 20, matching the R subsetting df[c(4:20)]
# the cross product gives a square matrix, which is then inverted
A = np.transpose(arr) @ arr
A_inv = npla.inv(A)
# print with 100 decimal places
"{:.100f}".format(A_inv[0,0])
# '0.0023559636604639157438090268925634518382139503955841064453125000000000000000000000000000000000000000'
A_inv is therefore the inverse of the matrix A built from the csv file. If I take the first element of this inverse and print it with 100 decimal places, I obtain:
R : '0.0023559636604639*1*48764472889041599046322517096996307373046875000000000000000000000000000000000000000'
Python : '0.0023559636604639*1*57438090268925634518382139503955841064453125000000000000000000000000000000000000000'
We can see that after the 17th decimal place (the digit between the stars) the sequences of digits displayed by R and Python differ; the sketch below quantifies how far apart the two values actually are.
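For reference, here is a minimal Python sketch (the two values are simply the non-zero parts of the strings printed above, pasted back in as float literals) showing that the R and Python results differ by only a couple of ULPs of a 64-bit float:

import math

# Non-zero parts of the two printed values, pasted back in as float literals;
# each literal parses to exactly the double that produced it.
r_val = 0.0023559636604639148764472889041599046322517096996307373046875
py_val = 0.0023559636604639157438090268925634518382139503955841064453125

diff = abs(py_val - r_val)
ulp = math.ulp(r_val)   # spacing between adjacent doubles at this magnitude
print(diff)             # ~8.7e-19
print(ulp)              # ~4.3e-19
print(diff / ulp)       # ~2, i.e. the two results are about 2 ULPs apart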
I've done some research, and this seems to be related to the precision of the floating-point types involved.
In Python, I can query the properties of the float64 type like this (I don't know how to get the equivalent in R):
np.finfo(np.float64)
# finfo(resolution=1e-15, min=-1.7976931348623157e+308, max=1.7976931348623157e+308, dtype=float64)
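For context, the relevant figure here is the machine epsilon (about 2.2e-16, i.e. roughly 15-17 significant decimal digits) rather than the displayed resolution field; a small NumPy sketch to show it (in R the counterpart would be the .Machine list that the comment at the end of this post points to):

import numpy as np

info = np.finfo(np.float64)
print(info.eps)    # 2.220446049250313e-16, spacing of doubles just above 1.0
print(info.nmant)  # 52 mantissa bits, i.e. roughly 15-17 significant decimal digits
# Two doubles can agree to ~16 significant digits and still be distinct numbers,
# which is exactly the scale of the R/Python discrepancy shown above.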
Notes:
This difference does not seem to come from a difference between the respective functions of the two languages: I called the R functions from Python and the Python functions from R to make sure (a sketch of the Python-side check is shown below).
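As an illustration of that cross-check, here is a sketch of calling R's solve() on a NumPy matrix from Python; it assumes the rpy2 package is installed, which is my assumption and not part of the original setup:

import numpy as np
import rpy2.robjects as ro
from rpy2.robjects import numpy2ri
from rpy2.robjects.conversion import localconverter

# Stand-in matrix with the same structure as A above (symmetric, positive definite).
rng = np.random.default_rng(0)
X = rng.random((50, 5))
A = X.T @ X

# Convert the NumPy array to R, call R's solve(), and convert the result back.
with localconverter(ro.default_converter + numpy2ri.converter):
    A_inv_r = np.asarray(ro.r["solve"](A))

A_inv_py = np.linalg.inv(A)
print(np.max(np.abs(A_inv_r - A_inv_py)))  # tiny, but not necessarily exactly 0.0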
Data:
If you are interested in the data, it is available at https://github.com/ceteuf/PSTR/tree/main/data/hansen99.csv
Questions:
1) What do you think is the reason for this difference? (I would welcome clarification, and possibly good resources to learn more.)
2) Is it possible, by changing a configuration or something similar, to reproduce exactly the same result and behaviour as R?
2b) Or am I completely wrong? In that case, do you have any leads?
Comment: In R, look at .Machine (for example .Machine$double.xmax); in Python, look at the decimal module. Something like "%.100f" is just pointless anyway: IEEE floats don't have that much precision in either R or Python.
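To illustrate that last point in Python: decimal.Decimal(x) shows the exact value stored in the binary64 double, and that exact expansion is all that "%.100f" can ever reveal (the remaining places are just zero padding):

from decimal import Decimal

# The Python result from above, as a float literal.
x = 0.0023559636604639157438090268925634518382139503955841064453125
print(Decimal(x))  # exact decimal expansion of the stored binary64 value
print(len(str(Decimal(x)).partition(".")[2]))  # number of meaningful decimals, well under 100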