LDA_and_SOMs

Topic modelling (Latent Dirichlet Allocation) and Self-Organizing Maps (in Python and R)

Here is the code for a "fun" exercise using LDA and SOMs. For this exercise I used the 20newsgroups dataset, which can be downloaded from here.

The "flow" of the repo is the following (one could start from 4):

1. LDA_perplexity_demo.ipynb: a demo explaining the experiments I ran using different pre-processing for the documents that are later passed to the LDA model. The companion python script is lda_perplexity.py, which can be run as:

python lda_perplexity.py --n_topics N

where N is the number of topics. I ran the script with 10, 20 and 50 topics, and the results are stored in the directory data_processed/ as pickle files.
2. LDA_coherence_demo.ipynb: here I simply explore the CoherenceModel class implemented in gensim. The companion script is lda_coherence.py.
3. LDA_gensim_vs_sklearn_demo.ipynb: a comparison between the LDA implementations in sklearn and gensim. The companion script is lda_gensim_vs_sklearn.py.
4. lda_build_model.py: if you are interested in quickly building a model and moving on to SOMs, skip 1, 2 and 3 and simply run:

python lda_build_model.py

Just make sure you have downloaded the 20newsgroups dataset and that the directory names in the script are consistent with those in your working directory. The results are also included in data_processed, so the topic_SOMs notebooks will run right after cloning this repo, provided that the necessary packages are installed. Speaking of which, ipdb will only run inside jupyter if its version is 0.8.1 or lower.

The companion LDA_build_model.ipynb shows the pyLDAvis plot for that final model.
5. topic_SOMs.ipynb and topic_SOMs_R.ipynb: using self-organizing maps to visualize the results of the topic modelling. I also show how one could build a "semantic fingerprint" based on the articles someone has read.
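For readers new to SOMs, the idea behind the topic_SOMs notebooks can be sketched in a few lines of plain Python. This is a minimal illustration and not the code used in the notebooks: the function names (`train_som`, `best_matching_unit`, `fingerprint`), the grid size, the decay schedule and the toy document-topic vectors are all assumptions made for the example. The "fingerprint" here is simply a count of how many read articles land on each map cell, which is one plausible reading of the term.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid cell whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda cell: sum((w - v) ** 2 for w, v in zip(weights[cell], x)),
    )


def train_som(vectors, rows=4, cols=4, n_iter=500, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM on a list of equal-length vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per grid cell, initialised uniformly in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for t in range(n_iter):
        # Learning rate and neighbourhood radius shrink linearly over time.
        frac = 1.0 - t / n_iter
        lr = lr0 * frac
        sigma = max(sigma0 * frac, 0.5)
        x = rng.choice(vectors)
        bmu = best_matching_unit(weights, x)
        for cell, w in weights.items():
            # Gaussian neighbourhood measured on the grid, not in data space.
            grid_d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
            h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
            for i in range(dim):
                w[i] += lr * h * (x[i] - w[i])
    return weights


def fingerprint(weights, read_vectors):
    """Count how many of the read documents map to each grid cell."""
    counts = {cell: 0 for cell in weights}
    for x in read_vectors:
        counts[best_matching_unit(weights, x)] += 1
    return counts


# Toy document-topic vectors (3 topics); real ones would come from the LDA model.
docs = [
    [0.90, 0.05, 0.05], [0.80, 0.15, 0.05],
    [0.05, 0.90, 0.05], [0.10, 0.80, 0.10],
    [0.05, 0.05, 0.90], [0.10, 0.10, 0.80],
]
som = train_som(docs)

# Hypothetical reader who has read documents 0, 1 and 4.
fp = fingerprint(som, [docs[0], docs[1], docs[4]])
for r in range(4):
    print(" ".join(str(fp[(r, c)]) for c in range(4)))
```

The printed grid is the fingerprint: cells with non-zero counts mark the regions of the map, and therefore the mixtures of topics, that the reader tends to visit.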
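As a reminder of what the perplexity experiments in item 1 compare, here is a small sketch of the textbook definition: the exponential of the negative average log-likelihood per token, with p(word | doc) obtained by mixing the topic-word distributions with the document's topic weights. It is not what lda_perplexity.py does internally; the `perplexity` function and its argument names are made up for the example, and as far as I understand gensim and sklearn report an estimate based on a variational bound rather than this exact quantity.

```python
import math


def perplexity(doc_word_counts, doc_topic, topic_word):
    """Plug-in perplexity of a topic model on a set of documents.

    doc_word_counts: one dict {word_id: count} per document
    doc_topic:       one list of topic weights per document (sums to 1)
    topic_word:      one list of word probabilities per topic (sums to 1)
    """
    log_lik = 0.0
    n_tokens = 0
    for counts, theta in zip(doc_word_counts, doc_topic):
        for word_id, n in counts.items():
            # p(word | doc) = sum over topics of p(topic | doc) * p(word | topic)
            p = sum(theta[k] * topic_word[k][word_id] for k in range(len(theta)))
            log_lik += n * math.log(p)
            n_tokens += n
    return math.exp(-log_lik / n_tokens)


# A model that is uniform over a 4-word vocabulary is "as confused as" 4 choices.
uniform = perplexity(
    [{0: 2, 1: 1, 3: 1}],
    [[0.5, 0.5]],
    [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]],
)

# A model that concentrates mass on the word the document uses scores lower.
sharp = perplexity(
    [{0: 4}],
    [[1.0, 0.0]],
    [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]],
)
print(uniform, sharp)
```

Lower is better, which is why the script is run for several values of N and the resulting numbers are compared.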

And that's it. Nothing major.

jrzaurin/Topic_Modelling_and_Self_Organizing_Maps