LDA2Vec doesn't work at all; does anyone have the correct code for python 3? #84
Comments
It is quite broken, even on python 2. I spun up a virtualenv and spent an hour trying to wrestle the latest spacy API into the code. The problems for me are in preprocess.py: I've updated spacy to nlp = spacy.load('en') and also converted the document attribute arrays from 32-bit to 64-bit integers, because the 32-bit ones were overflowing. But it is still producing negative values in the matrix, which fail the assertion. I can't tell if another hour will solve it, so I'm going to carry on improving my LDA, NMF and LSA topic models instead. |
Hey, thanks for responding and confirming it. The nlp = spacy.load('en') call shouldn't work, since that's deprecated and was changed to nlp = spacy.load('en_core_web_sm'). But there are so many other problems, I'm not sure if it's worth trying to fix everything. |
If you use np.uint64 as dtype, it works. Preprocess becomes:
|
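A rough illustration of why the dtype matters in the two comments above, assuming the token IDs are 64-bit unsigned hashes (which is what spaCy v2 and later use for strings). The hash values below are made up for the example, and this is not the preprocess code from the repo, only a sketch of the wrap-around:

```python
import numpy as np

# Made-up 64-bit IDs standing in for the hashes a tokenizer might return.
# Two of the three are >= 2**63, so they do not fit in a signed 64-bit integer.
hashes = [15871226213465876041, 3, 2**63 + 5]

as_uint64 = np.array(hashes, dtype=np.uint64)

# Casting to signed types wraps large values around instead of raising.
as_int64 = as_uint64.astype(np.int64)  # values >= 2**63 become negative
as_int32 = as_uint64.astype(np.int32)  # only the low 32 bits are kept

print("uint64:", as_uint64)
print("int64: ", as_int64)
print("int32: ", as_int32)

# A non-negativity check like the one mentioned above only passes for uint64.
print("negatives in int64:", int((as_int64 < 0).sum()))
```

If the assertion in the original code checks for non-negative IDs, this would explain why int64 still fails while np.uint64 passes.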
I can't even successfully execute "python setup.py install". A lot of errors occur in C++ code: #86 |
Here's a port to tensorflow that allegedly works with python 3: lda2vec-tf. Here's also a port to pytorch: lda2vec-pytorch (NB: the pytorch readme says "Warning: I, personally, believe that it is quite hard to make lda2vec algorithm work."). Not very encouraging, which is kind of disappointing. |
Hello Greg, |
I haven't actually done anything with it! I was hoping someone else had. ^_^ |
ok :) thank you for your answer |
I also have my own tensorflow implementation up, adapted from the one @MChrys linked to. Again, it works, but it is very finicky. |
Hello all, I was struggling to set this up and to run some of the functions with Python 3.7. I got it installed, but I am facing a lot of issues. I could visualize the 20newsgroup data because I have the generated file available. I am trying to create the file in .npz format, but no luck yet. Question to Chris: do you have a working (latest) version that we can try out? I am also facing a lot of issues with the Cupy install. Can we run without GPU functionality? Thank you! Ahmed Khan |
try my fork: https://github.com/whcjimmy/lda2vec. I've tested the twenty_newsgroups example. |
I will try yours, thank you so much Jimmy.
Just two questions:
1) In the doc in your GitHub, it says the word vector for 'German' is -0.6. Any idea how to get that number? Also, on the RHS, the document vector is -0.7. How do I get that one as well?
2) I was getting these errors when running the preprocess file under the example dir:
File "C:\Users\Administrator\Anaconda3\lib\site-packages\lda2vec\corpus.py", line 159, in finalize
    self.specials_to_compact = {s: self.loose_to_compact[i] for s, i in self.specials.items()}
File "C:\Users\Administrator\Anaconda3\lib\site-packages\lda2vec\corpus.py", line 159, in <dictcomp>
    self.specials_to_compact = {s: self.loose_to_compact[i] for s, i in self.specials.items()}
KeyError: -1
Did you get similar errors as well?
Thanks,
AK
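For anyone trying to debug the KeyError above, here is a minimal sketch of how that dict comprehension fails. The two dictionaries and their contents are hypothetical stand-ins for self.specials and self.loose_to_compact, not values taken from corpus.py; the point is only that any special index missing from the mapping raises KeyError with that index:

```python
# Hypothetical stand-ins for the two dictionaries named in the traceback.
specials = {"out_of_vocabulary": -1, "skip": -2}
loose_to_compact = {-2: 0, 101: 1, 205: 2}  # note: no entry for -1

# Diagnostic: list every special token whose index is absent from the mapping.
missing = {name: idx for name, idx in specials.items()
           if idx not in loose_to_compact}
print("specials missing from loose_to_compact:", missing)

# The same comprehension shape as in the traceback; it raises on the first
# special index that has no entry in the mapping.
try:
    specials_to_compact = {s: loose_to_compact[i] for s, i in specials.items()}
    error_key = None
except KeyError as exc:
    error_key = exc.args[0]

print("KeyError key:", error_key)
```

Printing the missing entries before the comprehension runs may help show whether the -1 index was ever added to the mapping during preprocessing.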
|
My doc follows this repo and I didn't change any details, but I can try to answer your questions. 1) It doesn't mean "German" is -0.6. The whole 1 * 5 word vector is used to represent the word "German". Maybe the word vector comes from a pre-trained word2vec model, GoogleNews-vectors-negative300.bin, but I am not that sure. 2) I didn't get this error. However, in corpus.py you can see that the only key less than 0 is "-2", which means special tokens (line 140). Hope these answers help you! |
When using the file 'preprocess.py', the resulting vocab is bad? |
Tried this out, doesn't work |
Basically I have tried everything out in porting it to python 3, and I'm not even able to get the preprocess functions working. Saw this issue and tried out everything here too. Going to use gensim LDA. |
Hello from 2021 |
LDA2Vec doesn't seem to work at all at this current stage. Gensim code is outdated, the general code runs on Python 2.7, and people seem to be having problems with Chainer and other stuff.
I tried to revise the code to Python 3, but I'm hitting walls here and there, especially since I don't know how exactly every function is working. Did anyone solve these general issues? Did it actually work for anyone recently?