To visualize and demonstrate the diversity of our datasets:
- Pick a text embedding model
- Compute embeddings for our datasets and for the datasets used in the literature
- Reduce the dimensionality of the embeddings, then plot the clusters in each dataset to show its diversity.
Aim: the final plot of our own data should look similar to the plots of the datasets used in the current literature.
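
The steps above could be sketched roughly as follows. This is a minimal illustration, not a chosen implementation. The hashed bag-of-words `embed` function is only a stand-in for whichever text embedding model gets picked. PCA is assumed as the dimensionality reduction, since the notes do not name one; t-SNE or UMAP would slot into the same place. The names `project_datasets`, `spread`, and `plot_projection` are hypothetical helpers, and `spread` (mean distance to the centroid) is just one crude way to put a number on diversity next to the plot.

```python
import hashlib
import numpy as np


def embed(texts, dim=64):
    """Stand-in embedding (assumption): hashed bag-of-words, L2-normalised.

    Replace this with the real text embedding model once one is picked.
    """
    out = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for token in text.lower().split():
            # md5 gives a hash that is stable across runs, unlike hash().
            h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
            out[i, h % dim] += 1.0
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norms, 1e-12)


def pca_2d(x):
    """Project the rows of x onto their top two principal components."""
    centered = x - x.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def project_datasets(datasets):
    """Embed every dataset and reduce them with one shared PCA.

    `datasets` maps a dataset name to a list of texts. Fitting a single
    projection on all datasets keeps their 2-D coordinates comparable.
    """
    names = list(datasets)
    embeddings = [embed(datasets[name]) for name in names]
    points = pca_2d(np.vstack(embeddings))
    result, start = {}, 0
    for name, emb in zip(names, embeddings):
        result[name] = points[start:start + len(emb)]
        start += len(emb)
    return result


def spread(points):
    """Crude diversity score: mean distance of points to their centroid."""
    return float(np.linalg.norm(points - points.mean(axis=0), axis=1).mean())


def plot_projection(projected, path="diversity.png"):
    """Scatter-plot each dataset in the shared 2-D space (needs matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for name, pts in projected.items():
        ax.scatter(pts[:, 0], pts[:, 1], label=name, alpha=0.6)
    ax.legend()
    fig.savefig(path)
```

Calling `project_datasets({"ours": our_texts, "literature": lit_texts})` and passing the result to `plot_projection` would give one overlaid scatter plot. Comparing `spread` for the two datasets gives a rough numeric check of whether our data covers a similar region. Cluster labels (for example from k-means) could be added as colours if the clusters themselves need to be shown.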