-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/implementation of mimca #163
base: dev
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comments:
- typing in the functions (not only in docstring)
- I do not understand the structure of the 3 files in imputations/mimca folder and utils. There seems to be some redundancies
- some tests do not pass
- remove comments
- create Class
from tqdm import tqdm | ||
|
||
|
||
def moy_p(V, weights): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not hesitate to give more explicit name for functions
missing_indices = rng.choice(total_values, n_missing, replace=False) | ||
row_indices = missing_indices // n_cols | ||
col_indices = missing_indices % n_cols | ||
for i in range(n_missing): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use a vectorization to avoid loop/
Suggestion :
...
missing_indices = rng.choice(total_values, n_missing, replace=False)
row_indices, col_indices = np.unravel_index(missing_indices, (n_rows, n_cols))
data.values[row_indices, col_indices] = np.nan
return data
return df_reconstructed | ||
|
||
def imputeMCA( | ||
don, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename "don" ? Not very an explicit name...
dict | ||
Dictionary containing: | ||
- "tab_disj": Disjunctive coded table after imputation. | ||
- "completeObs": Complete dataset with missing values imputed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename completeObs ? usually, variable names in python do not contain capital letter, but _ if multiple "words"
- "completeObs": Complete dataset with missing values imputed. | ||
|
||
""" | ||
don = pd.DataFrame(don) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why pd.DataFrame(don) if don is already a DataFrame (specified as such in docstring) ?
if ( | ||
not pd.api.types.is_numeric_dtype(don[col]) | ||
or don[col].dtype == "bool" | ||
): # noqa: E501 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need of E501
- "completeObs": Complete dataset with missing values imputed. | ||
|
||
""" | ||
# Ensure the data is a DataFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: since it is expected a DataFrame in input, check the type of "don". If not dataframe, report in a log and convert it.
Z = Z.subtract(Z_mean, axis=1) | ||
Zscale = Z.multiply(np.sqrt(M), axis=1) | ||
|
||
print("Centered and scaled data (Zscale):") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not fan of print.
Use logging instead
@@ -0,0 +1,665 @@ | |||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add general documentation for the file.
|
||
""" | ||
if verbose: | ||
print(f"{print_msg}...", end="", flush=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logging instead of print
else: | ||
row_w = np.array(row_w, dtype=float) | ||
row_w /= row_w.sum() | ||
ncp = int(min(ncp, X.shape[0] - 1, X.shape[1])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why shape[0] - 1 ?
I think your tests do not passed because of that.
No description provided.