Team Blackada

Does your beer taste like

No? Ours definitely don't.

What does this even mean?

Who doesn’t love a good cold brew on a warm summer’s day? Who doesn’t crave the sound of the can crackling open, and the spike of fizz on the tongue? Who doesn’t long for the perfect pour of a pint into an oversized stein with just enough head? But when you stand at the bar, staring at the choice of beers like a deer in headlights, can you really describe the differences between the options? 🩌 đŸ˜”â€đŸ’«

Navigating the world of beers can be a daunting task for non-experts. Beer aficionados often describe brews as having nuanced flavors such as ‘grassy notes’ and ‘biscuity/crackery malt’, with hints of ‘hay’. But do these descriptions reflect the actual tasting experience? In a hunt for answers we turned to the ultimate source of trust - data. Packed with 2,4M beer reviews, fancy natural language processing techniques, statistical tests and many more goodies in our ADA bag 🎒, we set out to settle the debate around the beer jargon once and for all. After reading this you will know if it is worth diving into the depths of the palate and become a true beer expert. And psst
 you want to sound like a beer expert even without knowing the jargon? We got your back! đŸ«‚ In the end, we reveal Layman’s guide to the beer lexicon. Pack your bags, we’re going on an ADAventure! đŸ»

Where Did We Find This Treasure Trove of Beer Reviews?


Well, we didn’t quite find it. We were lucky enough to have access to a massive dataset of beer reviews, generously provided by the ADA team. This dataset, originally sourced from the esteemed BeerAdvocate website, holds a whopping 2.4M user reviews, covering a staggering 126 thousand unique beers! Talk about a diverse selection to quench any thirst for data.

To make things more manageable (and because we love a good categorization), we’ve grouped the original 104 beer substyles into 11 broader style categories. 🎹 Also, we disregarded any reviews with less than 50 words as they were mostly gibberish. Later on our journey we will be computing consensus between reviewers. As this is only meaningful for a lot of people, we also removed all reviews of beers that had less than 50 reviews overall - sorry to all you really niche beers out there! đŸș After all this processing magic, we were still left with 1.65M reviews in our bag.

Untangling the Taste Buds: Our Three-Step Recipe


We have lots of reviews, but how can we tell if the words in the reviews actually relate to the taste of the beer? Well, we can’t - for that we would need information about the chemical composition of the beers, and unfortunately, BeerAdvocate reviewers don’t conduct PhD level molecular analysis for their review. 🧑‍🔬

But we can do the next best thing: we can see if people generally agree on the words they use to describe a beer. đŸ€

This wisdom-of-the-crowd approach leads us to the following three-step distilation process:

  1. 😋 Flavour Extraction: We extract the most important words from each review
  2. 🔱 Numerical Fermentation: We embed these words in a vector space
  3. đŸ•”ïžâ€â™‚ïž Similarity Synthesis: We compute a consensus score between reviews

Let’s dive into the details of each step.

1. 😋 Flavor Extraction

Not all words in a review are related to its taste. For example, in this review

Clear, dark amber with a fluffy tan head that leaves some decent lace on the glass. Sweet caramel malt and marshmallow with a touch of wintergreen in the aroma

biggred1 reviewing St. Patrick's Best on 19th of May 2010

we care about adjectives like ‘clear, dark amber’, ‘fluffy tan’ or ‘sweet caramel’. We might also care about non-adjective based descriptors like ‘marshmallow’ or ‘touch of wintergreen’. We tested extracting only adjectives, removing all stopwords (including custom beer-based stopwords) or keeping the reviews as is. When later conducting interpretation analysis on the reviews, we decided that simple stopword removal was a good compromise, as few irrelevant terms appeared, while retaining non-adjective descriptors. This is what the same review looks like after our flavour extraction:

clear dark amber fluffy tan leave decent lace glass sweet caramel marshmallow touch wintergreen

As we can see, most words relate to the taste - but our method is not 100% foolproof. We tried to make our custom stopword list as comprehensive as possible, but we will never catch all non-taste descriptors. For example, in this review we unfortunately retain ‘leave’, ‘glass’, and ‘touch’ - however, the vast majority of words remaining carry meaning. And so we carry on. 🎒

2. đŸ”ąïž Numerical Fermentation

Unfortunately, our computers can neither taste nor read - at least not without some help. To analyse the extracted flavours, we therefore had to convert them into a numerical representation. In NLP this is called embedding - we prefer numerical fermentation. đŸș

TF-IDF example

Three examples of numerical fermentation (embedding) using TF-IDF. We show the embedding process, as well as how to retrieve important words from the embedding.

We considered various techniques: from simply counting the number of times a word appears in a review to the latest advances in NLP using Transformer-based models like BERT and SBERT. Even though these models have proven to provide better sentence representations, the vector features they produce are not interpretable. In the end, we decided to use TF-IDF - a model that counts occurrences of words but also takes the frequency of the word in the corpus into account. The intuition behind this is that words that appear in many reviews are less relevant to the analysis, and we want to give them less weight. We can also hope that some of the non-taste-related words from our flavour extraction step are penalised because they appear across many reviews. Crucially, TF-IDF allows us to retrieve the descriptors that are most important for each review - vital for our later guide to the beer lexicon. 📖

3. đŸ•”ïžâ€â™‚ïž Similarity Synthesis

Now that all are reviews have been formulated numerically, we can measure how close they are to each other to see if they agree. We define the average of these distances as the consensus of the language used in the reviews. We tried a range of metrics such as cosine similarity, correlation and the KL and JS divergences. We found that the cosine similarity gave the best results. This is likely due to the fact that it ignores the magnitude of the vector and only looks at the angle - which in TFIDF terms means looking at which words are used rather than how common a word is. As a bonus, it was also the most computationally efficient method đŸ’». We can now use this consensus metric to compare the language used in different groups of reviews. What on earth will we find
?

Unveiling the Hidden Patterns


Steps 1 and 2 yielded some fancy vectors representing beer reviews, but do they actually tell us anything useful? How can we make sense of a whopping 431,208 dimensions?

Visualising the Beer Landscape

We pulled a neat trick called Truncated Singular Value Decomposition (TSVD) to shrink those dimensions down to a cozy 2D space. This technique is like a cousin of PCA, but it’s specially designed for handling sparse matrices.

When we plotted the reviews in this 2D space, something magical đŸȘ„ happened: they started forming distinct clusters! We focused on three popular styles — India Pale Ales (IPAs), Stouts and Porters, and Pale Lagers and Pilsners — and found that they consistently arranged themselves in a vertical pattern, with IPAs on top, Stouts and Porters in the middle, and Pale Lagers and Pilsners at the bottom. This visual fingerprint tells us that the vectors aren’t just random numbers—they capture real, meaningful differences in how people describe different beer styles.

TSVD clusters of embeddings across styles, substyles and beers. We can clearly see groupings forming over these groupings indicating that the embeddings are meaningful.

TSVD clusters of embeddings across styles, substyles and beers. We can clearly see groupings forming over these groupings indicating that the embeddings are meaningful.

Putting a Number on Beer Talk

On to step 3! We wanted to compare these languages differences numerically. However, we faced a slight challenge. Calculating cosine similarity between all 2.5 million reviews would require a mind-boggling 10TB of memory! To overcome this, we took a representative sample of 10,000 reviews and calculated the average cosine similarity within different groups (all reviews, styles, substyles, and individual beers).

distribution-of-cosine-similarity

Boxplot and histogram/kernel density plot showing the distributions of pairwise cosine similarity across groupings of style/substyle/beer name. Once again, we see clear differences in mean and variance across groups further illustrating the meaning of the embeddings.

Beer-Proofing Our Findings

As any good beer aficionado (or data scientist) knows, you can’t just trust your eyes alone 👀. We needed to make sure these differences weren’t just a fluke. So, we performed a statistical test called ANOVA (type II), which is like a t-test but for comparing multiple groups at once.

The results? Drumroll, please
 đŸ„ The p-value came back as a miniscule 3.040729e-07! That’s science-speak for we can trust these results.

To double-check, we shuffled the reviews like a deck of cards and re-ran the test. This time, the p-value came back as 0.9 — basically, no significant differences. This confirmed that our initial findings weren’t just a lucky draw. 🃏

We then used Tukey’s HSD test to pinpoint exactly which groups were significantly different from each other. It turned out that the language used for different styles and substyles was indeed distinct, as well as between the styles and individual beers. The other pairs of groupings we tested didn’t show as much variation.

With all this evidence, we can confidently say (while still leaving room for a bit of scientific caution) that beer lovers do indeed use different language to describe different styles and beers. Hooray!

But how is any of this useful?

A Layman’s Guide to the Beer Lexicon


All of this analysis showing that beer descriptors are meaningful is all well and good, but how should you go about describing a beer if you want to truly sound like an expert? Which words should you use for which kinds of beers? How will you look in front of your beer-fanatic friends if you take a misstep in the minefield â˜ąïž of beer descriptors? In this section, we take the TF-IDF embbeddings and conduct further analysis in order to find the largest differences between descriptors used for different kinds of beers.

The Most Important Words

Since TF-IDF preserves each input word as a feature, we can interpret its value as the significance of the word to the review. Furthermore, given that we have removed stopwords, and removed beer-specific stopwords (like ‘beer’ or ‘brew’), the words left behind should be descriptors that are meaningful in some way. Below is a figure showing the top 10 word for describing all beers - that is, the average TFIDF embeddings across all beers.

Top 10 words for all beers

Top 10 words for all beers

The results here are unsurprising. We have fairly basic descriptors such as ‘nice’ and ‘good’. What if we instead looked at a specific style of beer? Below are charts for 2 styles - Amber and Dark Lagers, and Wheat beers, respectively.

Top 10 words for Amber and Dark Lagers

Top 10 words for Amber and Dark Lagers

Here we see slightly more interesting examples. On the Wheat beers chart we have many fruity words appearing, where with Amber and Dark Lagers we see that they are sweet and caramel-y, with the notable inclusion of ‘oktoberfest’ coming in at No. 10! đŸ‡©đŸ‡Ș

Top 10 words for Wheat Beers

Top 10 words for Wheat Beers

Which words when?

It might be more interesting to see which words are used differently for a given beer style. If we take the difference between a group’s mean embbedding and the mean of the remainder of the beers, we can find such descriptors. This way, we can identify the words which are most important to that kind of beer - or put differently - the words which are most different in usage for a particular kind of beer.

First up we have Pale Ales. Very borinigly, the most important word is pale, followed by hop. But now with the inclusion of the difference, we can see that we should not describe pale ales as dark or chocolate-y.

Top 10 differences in word usage for Pale Ales

Top 10 differences in word usage for Pale Ales

Next up, Stouts and Porters. Clearly, this is where all the dark and chocolate-y beers can be found. Or we can even compare them to ‘roasted coffee’, with hints of ‘vanilla’ and ‘bourbon’, and a sprinkling of ‘citrus’ - we’re starting to sound like beer critics already!

Top 10 differences in word usage for Stouts and Porters

Top 10 differences in word usage for Stouts and Porters

In our dataset, we also have a number of specialty beers. Here we see some less conventionial flavours popping up in the form of raspberry, smoke, cherry and blueberry. What are these brewers up to?!

Top 10 differences in word usage for Specialty Beers

Top 10 differences in word usage for Specialty Beers

The final example we present to you is for hybrid & experimental beers. Firstly, the top 10 differences are all positive - truly some unique beers here! We’ve got some seemingly contrasting words coming out - something which one might expect of hyrbid beers. Both ‘sour’ and ‘tart’ are present, but also ‘cinnamon’ and ‘warmer’. Perhaps the best way to sum it up would be ‘funk’


Top 10 differences in word usage for Hyrbid and Experimental Beers

Top 10 differences in word usage for Hyrbid and Experimental Beers

So, what have we learned?


Turns out, navigating the beer aisle isn’t as daunting as it seems. Our analysis revealed a shared language among beer styles, and we have provided some ideas for how to sound like a beer pro.

Beyond the fun, this shared language offers real potential:

So, next time you raise a glass, remember that you’re part of a global beer language choir. Cheers to understanding, discovery, and endless brews đŸ»!

P.S Read about some of our limitations and ethical considerations.


Made with ❀ on Friday Afternoons