Calculating CTC CPMI #3

Open
vantesy opened this issue Jun 2, 2024 · 0 comments
vantesy commented Jun 2, 2024

Hi, I recently came across your measures during a literature review for my master's thesis and tried to apply them to the 20 Newsgroups dataset with the code you provided. I preprocessed everything following the steps from the example notebook and passed in the topics (taken from a BERTopic model) in the format defined in the example, as a list of lists of topic words.

After training the CPMI tree on a Colab GPU for quite a long time, I got a CTC CPMI score of over 274.01 for my topics (I had nearly 90 topics, and the CPMI tree was computed from 86,716 segments). I tried again with only a small percentage of the documents (resulting in 29,910 segments), which gave a CTC CPMI score of 95.62.

In your paper the CTC CPMI lies below zero for the BERTopic model, and in the original CPMI paper the score is also no higher than 20. I looked through the code but found no fault, so I wonder whether these results make sense or whether I need to apply another averaging step afterwards?
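To make the question concrete, here is a minimal sketch of the kind of averaging step meant above. It is not the repository's code: `pair_score`, `summed_topic_score`, and `averaged_topic_score` are hypothetical names, and `pair_score` stands in for whatever pairwise CPMI lookup the trained tree provides. It only illustrates how a score summed over all word pairs and topics grows with the number of topics, while a score averaged per pair and per topic does not.

```python
from itertools import combinations
from statistics import mean


def summed_topic_score(topics, pair_score):
    """Sum a pairwise score over every word pair of every topic.

    `pair_score` is a hypothetical callable (word_a, word_b) -> float.
    """
    return sum(
        pair_score(a, b)
        for words in topics
        for a, b in combinations(words, 2)
    )


def averaged_topic_score(topics, pair_score):
    """Average the pairwise score within each topic, then across topics."""
    per_topic = []
    for words in topics:
        pairs = list(combinations(words, 2))
        if not pairs:  # skip topics with fewer than two words
            continue
        per_topic.append(mean(pair_score(a, b) for a, b in pairs))
    return mean(per_topic) if per_topic else 0.0


if __name__ == "__main__":
    # Toy input in the same shape as the example notebook: a list of lists of words.
    toy_topics = [["space", "nasa", "orbit"], ["car", "engine", "dealer"]]

    def constant_score(a, b):
        # Placeholder score, NOT a real CPMI value.
        return 1.0

    print(summed_topic_score(toy_topics, constant_score))
    print(averaged_topic_score(toy_topics, constant_score))
```

If the reported value behaves like the summed variant, a total in the hundreds for roughly 90 topics would be expected, and normalizing by the number of pairs and topics would be the missing step.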

Thank you for your answer in advance.
