Skip to content

Printing Clusters (Top terms & titles) #12

@s2hewitt

Description

@s2hewitt

I've followed all the steps down to the final one where you print the top terms per cluster, together with the film titles.
I'm using a slightly different dataset (blog titles and blog post content) but in essence my data is the same as yours, although my data is already in a dataframe, so where you call on 'synopses', I call df.Content. The one step I couldn't do was the one where you grouped the rank by clusters as obviously this doesn't apply to me. I want ten clusters from my data.

Here, you create a dictionary:

films = { 'title': titles, 'rank': ranks, 'synopsis': synopses, 'cluster': clusters, 'genre': genres }
frame = pd.DataFrame(films, index = [clusters] , columns = ['rank', 'title', 'cluster', 'genre'])

But as I already have a dataframe, I reindex-ed using clusters. The problem is, only the first ten blog post titles are being used, as this screenshot shows:

image

As this is my first attempt at kMeans (although I've been experimenting with my data for three weeks) I'm not yet clever enough to work out what's going wrong. Any ideas? Thanks in advance!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions