Adds a semi-supervised (specifically a combination of supervised and weakly-supervised data) version of weak algorithms #268
chunks = cons.chunks(num_chunks=20)
rca_semisupervised.fit(X[:n], y[:n],
                       X[n:], chunks)
It would probably be worth adding more tests around what rca_semisupervised looks like after fitting.
Just a quick reminder: "solves" is not one of the keywords that GitHub recognizes to automatically close issues ;-)
I think this creates a major API problem due to the fact that
Furthermore, this strong supervision + weak supervision is not a major use-case in practice. So indeed the overhead induced by introducing new classes, having to test and document them etc., is probably too large compared to the benefits. I would favor a solution based on helper functions which combine pairs/quadruplets/chunks provided by the user with those generated from labeled data so that users can then easily fit
Note: as pointed out by @hansen7 on #233, semi-supervised is probably not the right term to describe this. This is more a combination of supervised and weakly supervised.
Of course I am happy to hear whether @terrytangyuan @perimosocordiae @wdevazelhes have a different opinion.
I agree. In this case API compatibility is more important, especially now that we are in scikit-learn-contrib. We can start with the helper function, and if it becomes popular with users we can then reconsider this.
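To make the helper-function idea discussed above a bit more concrete, here is a rough sketch for the pairs case. It is not part of this PR or of the library: the names pairs_from_labels and combine_pairs are hypothetical, pairs are assumed to be given as index pairs into one shared data array, and the labels follow the +1 (similar) / -1 (dissimilar) convention.

```python
from itertools import combinations

import numpy as np


def pairs_from_labels(y):
    """Hypothetical helper: build every index pair (i, j) with i < j from
    class labels, marked +1 if both points share a label and -1 otherwise."""
    y = np.asarray(y)
    # reshape(-1, 2) keeps a valid (0, 2) shape when there are no pairs
    pairs = np.array(list(combinations(range(len(y)), 2)), dtype=int)
    pairs = pairs.reshape(-1, 2)
    y_pairs = np.where(y[pairs[:, 0]] == y[pairs[:, 1]], 1, -1)
    return pairs, y_pairs


def combine_pairs(pairs_a, y_a, pairs_b, y_b):
    """Hypothetical helper: stack two sets of index pairs and their labels."""
    pairs_a, pairs_b = np.asarray(pairs_a), np.asarray(pairs_b)
    y_a, y_b = np.asarray(y_a), np.asarray(y_b)
    if len(pairs_a) != len(y_a) or len(pairs_b) != len(y_b):
        raise ValueError("each set of pairs needs one label per pair")
    return np.vstack([pairs_a, pairs_b]), np.concatenate([y_a, y_b])


# Points 0..2 are labeled; points 3 and 4 are unlabeled, but the user
# knows that they are similar.
generated_pairs, generated_y = pairs_from_labels([0, 0, 1])
all_pairs, all_y = combine_pairs(generated_pairs, generated_y,
                                 np.array([[3, 4]]), np.array([1]))
```

The combined arrays could then presumably be passed to an existing pairs-based learner, so no new estimator class is needed. Enumerating all pairs is only reasonable for small labeled sets; a real helper would likely sample them instead.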
Closes #233
For now I only wrote what I believe to be expected for #233 for the RCA algorithm.
It is a simple modification of the supervised version of RCA. The test is very basic as well.
It simply concatenates the weakly supervised information provided by the user with the weakly supervised information generated from the labeled (strongly supervised) data.
It is convenient but increases the volume of the code and documentation.
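For reference, the concatenation described above can be sketched as follows for chunks. This is a simplified illustration, not the exact code of this PR: combine_chunks is a hypothetical name, and it assumes the chunk convention RCA uses, where each point gets an integer chunk id and -1 means the point belongs to no chunk.

```python
import numpy as np


def combine_chunks(chunks_labeled, chunks_unlabeled):
    """Hypothetical helper: concatenate two chunk-assignment arrays.

    Chunk ids of the second array are shifted so that they do not collide
    with the ids of the first one; -1 (no chunk) is left untouched.
    """
    first = np.asarray(chunks_labeled, dtype=int)
    second = np.array(chunks_unlabeled, dtype=int)  # np.array makes a copy
    # Next free chunk id after the ones already used in `first`
    offset = first.max() + 1 if first.size and first.max() >= 0 else 0
    second[second >= 0] += offset
    return np.concatenate([first, second])


# Chunks generated from the labeled points (ids 0 and 1), followed by
# user-provided chunks for the unlabeled points (also numbered from 0).
combined = combine_chunks([0, 0, 1, -1], [0, 1, -1, 1])
```

The result is indexed like the vertical stack of the labeled and unlabeled data, so it could be passed to the weakly supervised fit together with that stacked array.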
There is a random_state parameter passed to the fit function in RCA; it is marked as deprecated and increases the number of tests needed for the semi-supervised algorithms. I will check whether random_state is present in other algorithms, to understand its relevance.
I will do the other algorithms and write better tests if we agree on this structure.