[MRG+1] Threshold for pairs learners #168
Changes from 21 commits
676ab86
cc1c3e6
f95c456
9ffe8f7
3354fb1
12cb5f1
dd8113e
1c8cd29
d12729a
dc9e21d
402729f
aaac3de
e5b1e47
a0cb3ca
8d5fc50
0f14b25
a6458a2
fada5cc
32a4889
5cf71b9
c2bc693
e96ee00
3ed3430
69c6945
bc39392
facc546
f0ca65e
a6ec283
49fbbd7
960b174
c91acf7
a742186
9ec1ead
986fed3
3f5d6d1
7b5e4dd
a3ec02c
ccc66eb
6dff15b
719d018
551d161
594c485
14713c6
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -148,8 +148,44 @@ tuples you're working with (pairs, triplets...). See the docstring of the | |
`score` method of the estimator you use. | ||
|
||
|
||
Learning on pairs | ||
================= | ||
|
||
Some metric learning algorithms learn on pairs of samples. In this case, one | ||
should provide the algorithm with ``n_samples`` pairs of points, with a | ||
corresponding target containing ``n_samples`` values being either +1 or -1. | ||
These values indicate whether the given pairs are similar points or | ||
dissimilar points. | ||
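To make the expected input layout concrete, here is a minimal sketch in plain Python. The variable names `pairs` and `y`, the point values, and the labels are assumptions chosen for illustration only; they do not come from the documentation being changed.

```python
# Each entry holds one pair of points; every point here has two features.
pairs = [
    [[1.2, 7.5], [1.3, 1.5]],
    [[6.4, 2.6], [6.2, 9.7]],
    [[1.3, 4.5], [3.2, 4.6]],
    [[6.2, 5.5], [5.4, 5.4]],
]

# One label per pair: +1 for a similar pair, -1 for a dissimilar pair.
# (These labels are made up for the example.)
y = [1, -1, -1, 1]

n_samples = len(pairs)

# Sanity checks on the layout described above.
labels_match = len(y) == n_samples
pairs_well_formed = all(len(pair) == 2 for pair in pairs)
labels_valid = set(y) <= {1, -1}
```

The target has exactly one entry per pair, so its length equals the number of pairs rather than the number of individual points.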
|
||
|
||
.. _calibration: | ||
|
||
Thresholding | ||
------------ | ||
In order to predict whether a new pair represents similar or dissimilar | ||
samples, we need to set a distance threshold, so that points closer (in the | ||
learned space) than this threshold are predicted as similar, and points further | ||
away are predicted as dissimilar. Several methods are possible for this | ||
thresholding. | ||
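The decision rule described above can be sketched as follows. This is an illustration only: the helper names `euclidean` and `predict_pairs` are hypothetical, and the plain Euclidean distance stands in for the distance in the learned space.

```python
import math


def euclidean(a, b):
    """Euclidean distance, used here as a stand-in for the learned metric."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_pairs(pairs, threshold, distance=euclidean):
    """Return +1 for pairs closer than `threshold`, and -1 otherwise."""
    return [1 if distance(a, b) < threshold else -1 for a, b in pairs]


pairs = [
    [[0.0, 0.0], [0.0, 1.0]],  # distance 1.0
    [[0.0, 0.0], [3.0, 4.0]],  # distance 5.0
]
predictions = predict_pairs(pairs, threshold=2.0)
```

With a threshold of 2.0, the first pair is predicted similar and the second dissimilar.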
|
||
- **default**: Unless explicitly stated in the `fit` method documentation | ||
of the estimator, the threshold is set with the method | ||
`set_default_threshold` on the training set. | ||
|
||
- **manual**: calling `set_threshold`, the user can | ||
manually set the threshold to a particular value. | ||
|
||
- **calibrated**: calling `calibrate_threshold`, the user can | ||
Review comment: same here
Reply: done |
||
calibrate the threshold to achieve a particular score on a validation set, | ||
the score being among the classical scores for classification (accuracy, f1 | ||
score...). | ||
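To illustrate the idea behind calibration, the sketch below picks the threshold that maximizes accuracy on a small validation set. It is not the implementation of `calibrate_threshold`; the function `calibrate_threshold_sketch`, its candidate-selection strategy, and the toy distances are assumptions made for this example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def calibrate_threshold_sketch(distances, y_valid):
    """Pick the candidate threshold with the best validation accuracy."""
    sorted_d = sorted(set(distances))
    # Candidates: one below the smallest distance, the midpoints between
    # consecutive distances, and one above the largest distance.
    candidates = [sorted_d[0] - 1.0]
    candidates += [(lo + hi) / 2 for lo, hi in zip(sorted_d, sorted_d[1:])]
    candidates.append(sorted_d[-1] + 1.0)

    best_threshold, best_score = None, -1.0
    for threshold in candidates:
        y_pred = [1 if d < threshold else -1 for d in distances]
        score = accuracy(y_valid, y_pred)
        if score > best_score:  # ties keep the smallest threshold
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Distances of four validation pairs (in the learned space) and their labels.
distances = [0.5, 1.0, 2.5, 4.0]
y_valid = [1, 1, -1, -1]
threshold, score = calibrate_threshold_sketch(distances, y_valid)
```

Only thresholds lying between observed distances can change the predictions, which is why the candidates are limited to those midpoints plus the two extremes. Other scores (for example an F1 score) could replace `accuracy` in the same loop.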
|
||
|
||
See also: `sklearn.calibration`. | ||
Review comment: later (when fixing #173) this could be a good place for short note on the use of CalibratedClassifierCV
Reply: I agree |
||
|
||
|
||
Algorithms | ||
================== | ||
========== | ||
|
||
ITML | ||
---- | ||
|
@@ -192,39 +228,6 @@ programming. | |
.. [2] Adapted from Matlab code at http://www.cs.utexas.edu/users/pjain/ | ||
itml/ | ||
|
||
|
||
LSML | ||
---- | ||
|
||
`LSML`: Metric Learning from Relative Comparisons by Minimizing Squared | ||
Residual | ||
|
||
.. topic:: Example Code: | ||
|
||
:: | ||
|
||
from metric_learn import LSML | ||
|
||
quadruplets = [[[1.2, 7.5], [1.3, 1.5], [6.4, 2.6], [6.2, 9.7]], | ||
[[1.3, 4.5], [3.2, 4.6], [6.2, 5.5], [5.4, 5.4]], | ||
[[3.2, 7.5], [3.3, 1.5], [8.4, 2.6], [8.2, 9.7]], | ||
[[3.3, 4.5], [5.2, 4.6], [8.2, 5.5], [7.4, 5.4]]] | ||
|
||
# we want to make closer points where the first feature is close, and | ||
# further if the second feature is close | ||
|
||
lsml = LSML() | ||
lsml.fit(quadruplets) | ||
|
||
.. topic:: References: | ||
|
||
.. [1] Liu et al. | ||
"Metric Learning from Relative Comparisons by Minimizing Squared | ||
Residual". ICDM 2012. http://www.cs.ucla.edu/~weiwang/paper/ICDM12.pdf | ||
|
||
.. [2] Adapted from https://gist.github.com/kcarnold/5439917 | ||
|
||
|
||
SDML | ||
---- | ||
|
||
|
@@ -343,3 +346,46 @@ method. However, it is one of the earliest and a still often cited technique. | |
-with-side-information.pdf>`_ Xing, Jordan, Russell, Ng. | ||
.. [2] Adapted from Matlab code `here <http://www.cs.cmu | ||
.edu/%7Eepxing/papers/Old_papers/code_Metric_online.tar.gz>`_. | ||
|
||
Learning on quadruplets | ||
======================= | ||
|
||
A type of information even weaker than pairs is information about relative | ||
comparisons between pairs. The user should provide the algorithm with | ||
quadruplets of points, where the first two points are closer than the last | ||
two points. No target vector (``y``) is needed, since the supervision is | ||
already in the order in which points are given in the quadruplet. | ||
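The ordering convention can be checked with a short sketch. The helper names `euclidean` and `satisfies_constraint` are hypothetical, and the Euclidean distance is used only as a placeholder for whatever metric is being evaluated.

```python
import math


def euclidean(a, b):
    """Euclidean distance, a placeholder for the metric under evaluation."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def satisfies_constraint(quadruplet, distance=euclidean):
    """True if the first two points are closer than the last two points."""
    a, b, c, d = quadruplet
    return distance(a, b) < distance(c, d)


# First pair at distance 1.0, second pair at distance 5.0: constraint holds.
ordered = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0], [3.0, 4.0]]
# Same points with the two pairs swapped: constraint is violated.
swapped = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0], [0.0, 1.0]]

ordered_ok = satisfies_constraint(ordered)
swapped_ok = satisfies_constraint(swapped)
```

Because the supervision is carried by the order of the points, swapping the two pairs of a quadruplet reverses its meaning.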
|
||
Algorithms | ||
========== | ||
|
||
LSML | ||
---- | ||
|
||
`LSML`: Metric Learning from Relative Comparisons by Minimizing Squared | ||
Residual | ||
|
||
.. topic:: Example Code: | ||
|
||
:: | ||
|
||
from metric_learn import LSML | ||
|
||
quadruplets = [[[1.2, 7.5], [1.3, 1.5], [6.4, 2.6], [6.2, 9.7]], | ||
[[1.3, 4.5], [3.2, 4.6], [6.2, 5.5], [5.4, 5.4]], | ||
[[3.2, 7.5], [3.3, 1.5], [8.4, 2.6], [8.2, 9.7]], | ||
[[3.3, 4.5], [5.2, 4.6], [8.2, 5.5], [7.4, 5.4]]] | ||
|
||
# we want points whose first feature is close to end up closer, and | ||
# points whose second feature is close to end up further apart | ||
|
||
lsml = LSML() | ||
lsml.fit(quadruplets) | ||
|
||
.. topic:: References: | ||
|
||
.. [1] Liu et al. | ||
"Metric Learning from Relative Comparisons by Minimizing Squared | ||
Residual". ICDM 2012. http://www.cs.ucla.edu/~weiwang/paper/ICDM12.pdf | ||
|
||
.. [2] Adapted from https://gist.github.com/kcarnold/5439917 |
Review comment: avoid "the user can", simply say "calling set threshold will set the threshold to a particular value"
Reply: Agreed, done