
Clustering articles using different algorithms and in different dimensions


skourta/articulus_divisio


articulus_divisio

Tandem Approaches

Several text datasets are clustered using various algorithms, including:

  • k-means
  • Agglomerative Clustering
    • Ward
    • Complete
    • Single
    • Average
  • HDBSCAN
  • Spectral Clustering
  • Gaussian Mixtures
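The algorithms listed above map closely onto estimators in scikit-learn. The sketch below assumes scikit-learn (the README does not name the library the notebooks use) and runs each algorithm on synthetic blobs standing in for document embeddings. `make_models` is a hypothetical helper and all parameter values are illustrative, not the notebooks' settings. `HDBSCAN` is only included when the installed scikit-learn provides it (version 1.3 or later).

```python
import numpy as np
from sklearn import cluster, mixture
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score


def make_models(k, seed=0):
    """Build one estimator per algorithm (hypothetical helper, illustrative parameters)."""
    models = {
        "kmeans": cluster.KMeans(n_clusters=k, n_init=10, random_state=seed),
        # A k-nearest-neighbour affinity graph avoids tuning the RBF `gamma`
        # to the scale of the embedding space.
        "spectral": cluster.SpectralClustering(
            n_clusters=k, affinity="nearest_neighbors", random_state=seed
        ),
        "gaussian_mixture": mixture.GaussianMixture(n_components=k, random_state=seed),
    }
    # The four agglomerative variants differ only in their linkage criterion.
    for linkage in ("ward", "complete", "single", "average"):
        models[f"agglomerative_{linkage}"] = cluster.AgglomerativeClustering(
            n_clusters=k, linkage=linkage
        )
    # HDBSCAN infers the number of clusters itself and labels outliers as -1.
    if hasattr(cluster, "HDBSCAN"):
        models["hdbscan"] = cluster.HDBSCAN(min_cluster_size=10)
    return models


# Synthetic stand-in for document embeddings: 300 points, 4 groups, 20 dimensions.
X, y_true = make_blobs(n_samples=300, centers=4, n_features=20, random_state=0)
predictions = {name: model.fit_predict(X) for name, model in make_models(4).items()}
for name, labels in predictions.items():
    print(f"{name:>24}: ARI = {adjusted_rand_score(y_true, labels):.3f}")
```

Every estimator is driven through the same `fit_predict` call, which keeps the comparison loop independent of the algorithm.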

The clustering is based on several word representations, including:

  • Word2Vec
  • GloVe
  • BERT
  • RoBERTa
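Word2Vec and GloVe give one static vector per word, while BERT and RoBERTa give one contextual vector per token, so in both cases the vectors must be pooled into a single vector per document before clustering. The README does not say how the notebooks pool them; the sketch below assumes plain averaging. It uses a hypothetical toy vocabulary in place of real pretrained vectors, and a padding-aware mean over token embeddings shaped `(batch, seq_len, dim)`, the layout a transformer encoder typically returns.

```python
import numpy as np


def average_word_vectors(tokens, vectors, dim):
    """Mean of the static vectors of in-vocabulary tokens (zeros if none are known)."""
    known = [vectors[t] for t in tokens if t in vectors]
    return np.mean(known, axis=0) if known else np.zeros(dim)


def masked_mean_pool(token_embeddings, attention_mask):
    """Mean of contextual token embeddings of shape (batch, seq_len, dim),
    ignoring padding positions where attention_mask == 0."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


# Hypothetical 3-dimensional "pretrained" vectors, for illustration only.
toy_vectors = {
    "stock": np.array([1.0, 0.0, 0.0]),
    "market": np.array([0.0, 1.0, 0.0]),
    "goal": np.array([0.0, 0.0, 1.0]),
}
# "rallied" is out of vocabulary and is skipped.
doc_vec = average_word_vectors(["stock", "market", "rallied"], toy_vectors, dim=3)
print(doc_vec)  # [0.5 0.5 0. ]

# One sequence of three tokens, the last of which is padding.
emb = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(masked_mean_pool(emb, mask))  # [[2. 3.]]
```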

These representations are projected into spaces of different dimensions using dimensionality reduction techniques, including:

  • PCA
  • t-SNE
  • UMAP
  • Simple Autoencoder
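In a tandem approach the two steps are independent: the embeddings are reduced first, and a clustering algorithm then runs on the reduced coordinates. A minimal sketch, assuming scikit-learn's `PCA` and `TSNE` and synthetic blobs in place of real embeddings; `tandem_cluster` is a hypothetical helper and the component counts are illustrative. UMAP is not part of scikit-learn (it comes from the separate `umap-learn` package), and the autoencoder needs a deep-learning framework, so neither is shown; any reducer exposing `fit_transform` should plug into the same helper.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import adjusted_rand_score


def tandem_cluster(X, reducer, k, seed=0):
    """Reduce X with `reducer`, then run k-means on the reduced coordinates.
    `reducer` is any object exposing fit_transform (hypothetical helper)."""
    Z = reducer.fit_transform(X)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Z)
    return Z, labels


# Stand-in for 300 document embeddings of dimension 100 in 4 groups.
X, y_true = make_blobs(n_samples=300, centers=4, n_features=100, random_state=1)

reducers = {
    "pca_10d": PCA(n_components=10, random_state=0),
    "tsne_2d": TSNE(n_components=2, init="pca", random_state=0),
}
results = {name: tandem_cluster(X, r, k=4) for name, r in reducers.items()}
for name, (Z, labels) in results.items():
    print(f"{name}: reduced shape {Z.shape}, ARI = {adjusted_rand_score(y_true, labels):.3f}")
```

Because the reducer never sees the clustering objective, the quality of the partition depends entirely on whether the reduced space happens to preserve the cluster structure.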

First Submission:

Labeled Data (Classic4 and BBC)

Open In Colab

Second Submission:

Labeled Data (Classic4 and BBC)

Open In Colab

Unlabeled Data

Articles1

Open In Colab

Articles2

Open In Colab
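With labeled data (Classic4, BBC) a partition can be compared against the reference categories, whereas unlabeled collections (Articles1, Articles2) only allow internal criteria. The README does not state which scores the notebooks report. As an assumption, the sketch below uses scikit-learn's normalized mutual information and adjusted Rand index for the labeled case and the silhouette coefficient for the unlabeled case, on small hand-made data.

```python
import numpy as np
from sklearn.metrics import (
    adjusted_rand_score,
    normalized_mutual_info_score,
    silhouette_score,
)

# Labeled case: compare predicted clusters with the reference categories.
# Both scores depend only on the grouping, not on how cluster ids are numbered.
y_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([2, 2, 2, 0, 0, 0, 1, 1, 1])  # same grouping, different ids
print("NMI:", normalized_mutual_info_score(y_true, y_pred))  # a perfect match scores 1
print("ARI:", adjusted_rand_score(y_true, y_pred))

# Unlabeled case: score cohesion and separation from the embeddings alone.
rng = np.random.default_rng(0)
embeddings = np.vstack([rng.normal(loc=c, scale=0.1, size=(20, 5)) for c in (0.0, 3.0)])
labels = np.repeat([0, 1], 20)
print("silhouette:", silhouette_score(embeddings, labels))  # near 1 for tight, separated groups
```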

Simultaneous Dimensionality Reduction and Clustering

Several text datasets are clustered using various algorithms, including:

  • Reduced k-means and Factorial k-means
  • Deep Clustering Network (DCN)
  • Deep k-means (DKM)
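Unlike the tandem pipeline, these methods fit the projection and the partition jointly. As an illustration, the sketch below implements Reduced k-means as an alternating procedure written from the standard formulation, not taken from the notebooks. For a fixed partition, the projection A (a p x q matrix with orthonormal columns) is given by the top q eigenvectors of the between-cluster scatter matrix X^T P_U X, where P_U projects onto the cluster indicator columns. For a fixed A, points are reassigned by k-means on XA. Factorial k-means differs only in the projection step, taking instead the eigenvectors with the smallest eigenvalues of the within-cluster scatter X^T (I - P_U) X. The function names, the farthest-first seeding, and the synthetic data are assumptions of this sketch. DCN and DKM learn the projection with a neural network (an autoencoder) instead and are not sketched here.

```python
import numpy as np


def _farthest_point_seeds(Y, k):
    """Deterministic seeding: start from row 0, then repeatedly add the row
    farthest from the seeds chosen so far (farthest-first traversal)."""
    idx = [0]
    d = ((Y - Y[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        idx.append(int(d.argmax()))
        d = np.minimum(d, ((Y - Y[idx[-1]]) ** 2).sum(axis=1))
    return Y[idx]


def _kmeans(Y, centers, n_iter=100):
    """Lloyd's k-means on the rows of Y, warm-started from `centers`."""
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def reduced_kmeans(X, k, q, n_iter=50):
    """Reduced k-means: alternate between the q-dimensional projection A
    (orthonormal columns) and the partition. Returns (labels, A)."""
    X = X - X.mean(axis=0)
    labels = _kmeans(X, _farthest_point_seeds(X, k))  # initial partition in full space
    for _ in range(n_iter):
        onehot = np.eye(k)[labels]                    # n x k cluster indicator U
        sizes = np.maximum(onehot.sum(axis=0), 1)     # guard against empty clusters
        means = (onehot.T @ X) / sizes[:, None]       # k x p cluster means
        between = means.T @ (sizes[:, None] * means)  # equals X^T P_U X
        _, eigvecs = np.linalg.eigh(between)          # eigenvalues in ascending order
        A = eigvecs[:, -q:]                           # top-q directions
        new_labels = _kmeans(X @ A, means @ A)        # reassign in the subspace
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, A


# Synthetic example: 3 groups of 50 points in 10 dimensions, reduced to q = 2.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 10))
X = np.vstack([c + rng.normal(size=(50, 10)) for c in centers])
y_true = np.repeat([0, 1, 2], 50)
labels, A = reduced_kmeans(X, k=3, q=2)
```

Since the between-cluster scatter of k centered clusters has rank at most k - 1, choosing q <= k - 1 keeps every retained direction informative.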

The clustering is based on several word representations, including:

  • Word2Vec
  • GloVe
  • BERT
  • RoBERTa

First Submission:

Labeled Data (Classic4 and BBC)

Open In Colab

Second Submission:

Labeled Data (Classic4 and BBC)

Open In Colab

Unlabeled Data (Articles1 and Articles2)

Open In Colab
