15 March 2021

Sentiment Analysis of MTG Flavor Text

The fantasy card game Magic: The Gathering (often simply called Magic), with its decades of history and thousands of cards, is a treasure trove of data that curious players can explore even when a global pandemic makes face-to-face tabletop gaming much more difficult.

Last summer I posted a tongue-in-cheek "periodic table" of the game's fantasy races and classes (known as "creature subtypes"), something I'm keen to update in a proper blog post. Today I'm interested in "flavor text": the italicised text at the bottom of a card, which has no rules meaning but acts as a bridge between the card's mechanics and its story.

VADER (Valence Aware Dictionary and sEntiment Reasoner, described here) is one "off-the-shelf" model for analysing the valence (positive or negative) and intensity of the mood expressed in a piece of text. Its "compound" score runs from -1 (most negative) to +1 (most positive). Here are a few examples of how VADER evaluates sentences:

Sentence                              VADER compound sentiment
This is a really great blog post!      0.6893
This is a great blog post!             0.6588
This is not a good blog post          -0.3412
This is a terrible blog post!         -0.5255
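To make the table less of a black box, here is a toy re-implementation of the handful of VADER rules these four sentences exercise: summing word valences, a booster word ("really"), negation ("not"), exclamation marks, and a final normalisation that maps the sum x to x / sqrt(x^2 + 15). It is a simplified sketch, not the real model, which has many more rules and a much larger lexicon; in practice you would call `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` from the `vaderSentiment` Python package. The constants follow VADER's source as I understand it, and the three word valences are assumptions, picked because they reproduce the scores above.

```python
import math

# ASSUMED valences for the only sentiment-bearing words in the table above.
# They reproduce the table's scores but are not copied from VADER's lexicon file.
LEXICON = {"good": 1.9, "great": 3.1, "terrible": -2.1}

BOOSTERS = {"really": 0.293}  # intensifier: adds to the magnitude of the next word
NEGATIONS = {"not"}           # negator: flips a word that follows within three words
NEGATION_SCALAR = -0.74
EXCLAMATION_BOOST = 0.292     # extra intensity per "!" (counted up to four)
ALPHA = 15                    # normalisation constant


def normalize(total: float, alpha: float = ALPHA) -> float:
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return total / math.sqrt(total * total + alpha)


def toy_compound(sentence: str) -> float:
    """Toy VADER-style compound score: a simplified illustration, not the real model."""
    words = [w.strip("!?.,").lower() for w in sentence.split()]
    total = 0.0
    for i, word in enumerate(words):
        valence = LEXICON.get(word)
        if valence is None:
            continue
        # A booster immediately before the word pushes it further from zero.
        if i > 0 and words[i - 1] in BOOSTERS:
            boost = BOOSTERS[words[i - 1]]
            valence += boost if valence > 0 else -boost
        # A negation among the three preceding words flips and dampens it.
        if any(w in NEGATIONS for w in words[max(0, i - 3):i]):
            valence *= NEGATION_SCALAR
        total += valence
    # Exclamation marks amplify whichever direction the sentence already leans.
    emphasis = min(sentence.count("!"), 4) * EXCLAMATION_BOOST
    if total > 0:
        total += emphasis
    elif total < 0:
        total -= emphasis
    return round(normalize(total), 4)


for s in ["This is a really great blog post!", "This is a great blog post!",
          "This is not a good blog post", "This is a terrible blog post!"]:
    print(f"{s:<36}{toy_compound(s):+.4f}")
```

Running it should reproduce the four scores in the table. Because it looks only one word back for boosters and ignores capitalisation, "but", emoji and so on, it will disagree with the real model on most other text.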

While VADER is designed primarily for social media comments, it can easily be applied to snippets of fantasy lore. What are the trends in the tone expressed by Magic cards? Which cards come out most positive and most negative?

To state the obvious, no analysis could ever encapsulate a text's whole emotional impact in one number. Because of this, and because a slightly different methodology could produce very different results, everything here should be taken with a pinch of salt. I should also emphasise that, while everything written here is correct at the time of writing (March 2021, between the releases of Kaldheim and Strixhaven), Magic releases new cards at a rate of hundreds per year, so this will likely be out of date pretty quickly.

With those caveats in mind, the rest of this post will cover a few results which I found interesting, looking first at cards individually, then at groups of cards and finally at long-term trends.


Highest and lowest

A natural starting point is to grade the VADER sentiment of flavor text on each of the game's cards and see which cards end up at the top and bottom of the table.

These are the five cards with the lowest overall scores. Restless Bones is the lowest, at -0.9792. A common theme among the five is death, which VADER's lexicon rates as strongly negative. Angel of Renewal appears to be an oddity: the analysis picks up the negative sentiment behind "fear", "failure" and "death", but isn't able to detect that in the Angel's flavor text these concepts are triumphantly rebuked.

The five cards with the highest sentiment scores have little in common, in contrast with the bottom five. Soaring Show-Off's score, 0.9545, is the highest. Borderland Minotaur's flavor text, poignantly describing a Pyrrhic victory, may be another example where this analysis is too simplistic to pick up on every nuance.
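The ranking itself is simple. Below is a minimal sketch, assuming each card is a dict with hypothetical `name` and `flavor_text` keys (the same field names Scryfall's card objects happen to use) and that `score` is any function from text to a compound score, such as `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` from the `vaderSentiment` package. Skipping reprints that repeat the same flavor text is my own assumption about how duplicates should be handled.

```python
from __future__ import annotations

from typing import Callable, Iterable


def rank_flavor_text(
    cards: Iterable[dict],
    score: Callable[[str], float],
    n: int = 5,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return (lowest n, highest n) cards by the sentiment of their flavor text."""
    seen = set()
    scored = []
    for card in cards:
        text = card.get("flavor_text")
        if not text:
            continue  # skip cards with no flavor text
        key = (card["name"], text)
        if key in seen:
            continue  # count a reprint with identical flavor text only once
        seen.add(key)
        scored.append((card["name"], score(text)))
    scored.sort(key=lambda pair: pair[1])  # most negative first
    lowest = scored[:n]
    highest = scored[::-1][:n]  # most positive first
    return lowest, highest
```

Both lists come back ordered from the extreme inwards, so the first entry of each is the single most negative or most positive card.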


Grouping by color and subtype

Individual card scores don't always match our intuitions, but we can still get interesting results by aggregating cards into groups and comparing the average sentiment score within each group.

Grouping the cards by "color" we find that white cards have the highest average sentiment (0.033) and black cards have the lowest (-0.103). Looking at creature subtypes, Orc (-0.253) and Zombie (-0.141) are among the lowest, while Druid (0.108), Dog (0.089) and Angel (0.031) are among the highest.
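A minimal sketch of the grouping step, assuming each card already has a score and that hypothetical `colors` and `subtypes` fields list its colors and creature subtypes. Counting a multicolored card towards each of its colors is my assumption; the post doesn't say how multicolored or colorless cards were handled (with this sketch, a card with an empty `colors` list simply drops out).

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean
from typing import Callable, Iterable


def average_by_group(
    scored_cards: Iterable[tuple[dict, float]],
    groups_of: Callable[[dict], list[str]],
) -> dict[str, tuple[float, int]]:
    """Map each group to (average sentiment, number of cards).

    A card contributes its score to every group it belongs to, so a
    white-black card counts towards both white and black.
    """
    scores = defaultdict(list)
    for card, score in scored_cards:
        for group in groups_of(card):
            scores[group].append(score)
    return {group: (mean(vals), len(vals)) for group, vals in scores.items()}


# Hypothetical usage, assuming each card dict has "colors" and "subtypes" lists:
# by_color = average_by_group(scored, lambda card: card["colors"])
# by_subtype = average_by_group(scored, lambda card: card["subtypes"])
```

Returning the count next to each average makes it easy to spot groups that rest on only a handful of cards.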

It's worth noting that a full ranking of creature subtypes would be distorted by some very small sample sizes. For example, "Starfish" has the lowest average sentiment of any creature subtype, but it isn't worth reading much into this given that only one Starfish has ever been printed with flavor text.
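Two simple guards against tiny groups, sketched under my own assumptions (neither is necessarily what was used for the numbers in this post): drop any group with fewer than some minimum number of cards, or shrink each group's average towards the overall average so that a single card can't drag its group to the top or bottom of the table. The cut-off of 20 and the prior weight of 10 are arbitrary illustrative defaults.

```python
from __future__ import annotations


def drop_small_groups(
    stats: dict[str, tuple[float, int]], min_count: int = 20
) -> dict[str, tuple[float, int]]:
    """Keep only groups with at least min_count cards."""
    return {g: (m, n) for g, (m, n) in stats.items() if n >= min_count}


def shrunk_means(
    stats: dict[str, tuple[float, int]], overall_mean: float, prior_weight: float = 10
) -> dict[str, float]:
    """Pull each group's average towards overall_mean, more strongly for small groups.

    Equivalent to adding prior_weight imaginary cards that each score overall_mean:
    (n * group_mean + prior_weight * overall_mean) / (n + prior_weight).
    """
    return {
        g: (n * m + prior_weight * overall_mean) / (n + prior_weight)
        for g, (m, n) in stats.items()
    }
```

Here `stats` maps a group to (average, count). With shrinkage, a one-card group ends up close to the overall mean, while a large group keeps nearly its raw average.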


Sets over time

Is there anything we can learn by looking at long-term trends? Magic's cards are released in "sets", each with its own setting, story and tone. What do we see when we look at each set and calculate the average sentiment of its cards?

(Graph: average flavor text sentiment of each set, in order of release.)

There are many details that could be picked out, but here are a few specific observations:

  • For a long portion of Magic's history, two or three sets would be grouped into a "block" sharing one setting. (This graph uses red and green coloring to group sets in the same block.) Often the average sentiment will decrease over the course of a block, as the story's crisis escalates, but we also see cases where the trend is less clear or goes in the other direction.
  • One of the sharpest contrasts between adjacent blocks is between Lorwyn and Shadowmoor. These two blocks depict a "Jekyll-and-Hyde" world which alternates between light and dark.
  • The set with by far the highest average sentiment is Kaladesh, which was consciously designed with an optimistic tone in mind.
  • Two other sets with high average scores are New Phyrexia and Amonkhet. These are both sets which deliberately create dissonance with their flavor text: New Phyrexia pairs idealism in its flavor text with metallic horror in its artwork, while Amonkhet was designed so that upbeat story components are "out of sync" with the sinister game mechanics.
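The observation that sentiment often falls over the course of a block can be checked crudely by comparing each block's first and last set. A minimal sketch with hypothetical inputs: a dict of per-set averages and a dict listing each block's sets in release order. It ignores the middle sets entirely, so a more careful check might fit a slope within each block instead.

```python
from __future__ import annotations


def block_directions(
    set_averages: dict[str, float], blocks: dict[str, list[str]]
) -> dict[str, str]:
    """Label each block "falling", "rising" or "flat" by comparing its first and last set.

    set_averages maps a set name to its average flavor text sentiment;
    blocks maps a block name to its set names in release order.
    """
    directions = {}
    for block, sets in blocks.items():
        change = set_averages[sets[-1]] - set_averages[sets[0]]
        if change < 0:
            directions[block] = "falling"
        elif change > 0:
            directions[block] = "rising"
        else:
            directions[block] = "flat"
    return directions
```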

While there is a lot of variation in average sentiment between nearby sets (and certainly a lot of variation in flavor text sentiment within each set), there is a slight upward trend over the course of the game's history. Some might interpret this as a sign that the game is "dumbing down" or trying to appeal to a younger audience, but I think a good case can be made that the opposite is true: as the game's fanbase has grown and matured, players have become more comfortable acknowledging that fantasy battles between wizards don't always have to be doom and gloom.
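One way to put a number on a "slight upward trend" (my assumption; the post doesn't say how the trend was judged) is the slope of an ordinary least-squares line through each set's average sentiment plotted against its release date. A minimal standard-library sketch follows; giving release dates as decimal years is a hypothetical input format.

```python
from __future__ import annotations


def least_squares_slope(points: list[tuple[float, float]]) -> float:
    """Slope of the ordinary least-squares line through (x, y) points.

    Here x is a set's release date as a decimal year and y is the set's
    average flavor text sentiment, so the result is the change per year.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two sets to fit a trend")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    if variance == 0:
        raise ValueError("all sets share the same release date")
    return covariance / variance
```

A positive slope means later sets lean more positive on average. The slope alone says nothing about how noisy the set averages are, so it is best read alongside the spread between neighbouring sets.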