Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Own pairwise distances #9

Open
mlajtos opened this issue May 23, 2015 · 3 comments
Open

Own pairwise distances #9

mlajtos opened this issue May 23, 2015 · 3 comments

Comments

@mlajtos
Copy link

mlajtos commented May 23, 2015

Would it be possible to add function which computes Barnes-Hut SNE embedings from a precomputed pairwise distance matrix?

@lvdmaaten
Copy link
Collaborator

When you say "pairwise distance matrix", do you mean a full NxN matrix? That does not make much sense to me: the whole point of Barnes-Hut SNE is that it is O(N log N) in computation and O(N) in memory, whereas a full pairwise distance matrix uses O(N^2) memory.

If you mean a sparse adjacency matrix that is O(|E|) with E the set of edges in the graph: yes, that would be possible and relatively straightforward to achieve.

@mlajtos
Copy link
Author

mlajtos commented May 25, 2015

Thank you for your reply.

Yes, I have a sparse non-symetric NxN matrix (N~=10E4) where each element represents similarity (scaled to <0, 1>) between all the pairs. So plugging this matrix would therefore skip all SNE magic and only performed embedding as force-directed layout algorithms do?

I tried to use this NxN matrix directly into the BH SNE, but performing PCA on this matrix is taking and an eternity and awful amount of memory. I was able to do so only with N~=5000 within reasonable time. Disabling the PCA in tsne.lua won't help – there was an error on line 107: local data_char = ffi.new('char[?]', nchars) as it could not determine the size of the array or it was too big.

Any hints how to modify source code (or possibly data) to get something out of it?

@lvdmaaten
Copy link
Collaborator

https://github.com/clementfarabet/manifold/pull/4/files#diff-febf6147a17c3712ede775e407101a27R114

This is where the similarity matrix P is constructed based on the input NxD matrix X. The similarity matrix is stored in row-compressed sparse format: row_P is a N+1 int array containing the row indices, col_P is a int NxK array containing the column indices, and val_P a double NxK array containing the similarity values. The call to computeGaussianPerplexitySparse() here should thus be replaced by a call to a function that copies the data from your sparse similarity matrix into this row-compressed sparse format.

This would imply making some changes in how the function run() is called by the FFI wrapper: you would need to adapt it to reflect the changes in your input format. Disable the PCA step altogether, but let run() and work directly on your sparse similarity matrix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants