Linked: “Heavy Metal and Natural Language Processing – Part 1”

via Degenerate State:

As you can see, Five Finger Death Punch have the highest number of swear words in their lyrics, and Pig Destroyer have the most complex wordplay. It also suggests that bands that swear more seem to use more complex words.

While this is an interesting way to represent the bands, it is limited in what it captures. In what follows, I’m going to explore more general ways we can look at natural language, focusing on the “bag-of-words” model.

A “bag-of-words” model is one where we only care about the frequencies with which each word appears in the text of a document. In other words, we throw away all information about the relative ordering of words. This approach obviously loses some information about the document being analysed. For example, the phrases “dog bites man” and “man bites dog” would end up with the same representation, despite referring to very different events. However, we do capture some information about the phrases, namely that both of them refer to a “dog” and to a “man”. This suggests that in looking at word frequencies we are capturing some information about the “topic” of the documents.
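To make the “dog bites man” example concrete, here is a minimal sketch of a bag-of-words representation in Python. It is not the code from the linked article: the helper name `bag_of_words` is hypothetical, and the sketch assumes that lowercasing and splitting on whitespace is good enough tokenisation for illustration.

```python
from collections import Counter


def bag_of_words(text):
    """Return word frequencies for `text`, discarding word order.

    Assumption: tokens are lowercased and separated by whitespace.
    """
    return Counter(text.lower().split())


first = bag_of_words("dog bites man")
second = bag_of_words("man bites dog")

# Word order is thrown away, so both phrases get the same representation.
print(first == second)

# The counts still record that each phrase mentions a "dog" and a "man".
print(first["dog"], first["man"])
```

Real lyrics would need more careful tokenisation (punctuation, contractions and so on), but the idea is the same: each document is reduced to a table of word counts.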

Most Metal = Burn

Least Metal = Particularly

Great.