Does your beer taste like
No? Ours definitely don't.
What does this even mean?
Who doesn't love a good cold brew on a warm summer's day? Who doesn't crave the sound of the can crackling open, and the spike of fizz on the tongue? Who doesn't long for the perfect pour of a pint into an oversized stein with just enough head? But when you stand at the bar, staring at the choice of beers like a deer in headlights, can you really describe the differences between the options? 🦌 😵‍💫
Navigating the world of beers can be a daunting task for non-experts. Beer aficionados often describe brews as having nuanced flavors such as "grassy notes" and "biscuity/crackery malt", with hints of "hay". But do these descriptions reflect the actual tasting experience? In a hunt for answers we turned to the ultimate source of trust - data. Packed with 2.4M beer reviews, fancy natural language processing techniques, statistical tests and many more goodies in our ADA bag, we set out to settle the debate around the beer jargon once and for all. After reading this you will know if it is worth diving into the depths of the palate and becoming a true beer expert. And psst... you want to sound like a beer expert even without knowing the jargon? We got your back! In the end, we reveal the Layman's guide to the beer lexicon. Pack your bags, we're going on an ADAventure! 🍻
Well, we didn't quite find it. We were lucky enough to have access to a massive dataset of beer reviews, generously provided by the ADA team. This dataset, originally sourced from the esteemed BeerAdvocate website, holds a whopping 2.4M user reviews, covering a staggering 126 thousand unique beers! Talk about a diverse selection to quench any thirst for data.
To make things more manageable (and because we love a good categorization), we've grouped the original 104 beer substyles into 11 broader style categories. 🎨 Also, we disregarded any reviews with fewer than 50 words as they were mostly gibberish. Later on our journey we will be computing consensus between reviewers. As this is only meaningful when many people have reviewed the same beer, we also removed all reviews of beers that had fewer than 50 reviews overall - sorry to all you really niche beers out there! 🍺 After all this processing magic, we were still left with 1.65M reviews in our bag.
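The two filtering rules above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the dictionary keys `beer_id` and `text` are hypothetical, and we assume the per-beer count is taken after short reviews have been dropped, which the text does not spell out.

```python
from collections import Counter

MIN_WORDS = 50     # minimum review length (threshold from the text)
MIN_REVIEWS = 50   # minimum number of reviews per beer (threshold from the text)

def filter_reviews(reviews, min_words=MIN_WORDS, min_reviews=MIN_REVIEWS):
    """Keep long-enough reviews of sufficiently reviewed beers.

    `reviews` is a list of dicts with the hypothetical keys
    'beer_id' and 'text'.
    """
    # Rule 1: drop reviews with fewer than `min_words` words.
    long_enough = [r for r in reviews if len(r["text"].split()) >= min_words]
    # Rule 2: drop beers with fewer than `min_reviews` remaining reviews
    # (assumption: counted after rule 1 has been applied).
    counts = Counter(r["beer_id"] for r in long_enough)
    return [r for r in long_enough if counts[r["beer_id"]] >= min_reviews]
```

Calling `filter_reviews(reviews)` with the default thresholds mirrors the numbers quoted above; smaller thresholds are handy for trying it out on a toy list.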
We have lots of reviews, but how can we tell if the words in the reviews actually relate to the taste of the beer? Well, we can't - for that we would need information about the chemical composition of the beers, and unfortunately, BeerAdvocate reviewers don't conduct PhD-level molecular analysis for their reviews. 🧑‍🔬
But we can do the next best thing: we can see if people generally agree on the words they use to describe a beer.
This wisdom-of-the-crowd approach leads us to the following three-step distillation process:
Let's dive into the details of each step.
Not all words in a review are related to its taste. For example, in this review
Clear, dark amber with a fluffy tan head that leaves some decent lace on the glass. Sweet caramel malt and marshmallow with a touch of wintergreen in the aroma
biggred1 reviewing St. Patrick's Best on 19th of May 2010
we care about adjectives like "clear, dark amber", "fluffy tan" or "sweet caramel". We might also care about non-adjective based descriptors like "marshmallow" or "touch of wintergreen". We tested extracting only adjectives, removing all stopwords (including custom beer-based stopwords) or keeping the reviews as is. When later conducting interpretation analysis on the reviews, we decided that simple stopword removal was a good compromise, as few irrelevant terms appeared, while retaining non-adjective descriptors. This is what the same review looks like after our flavour extraction:
clear dark amber fluffy tan leave decent lace glass sweet caramel marshmallow touch wintergreen
As we can see, most words relate to the taste - but our method is not 100% foolproof. We tried to make our custom stopword list as comprehensive as possible, but we will never catch all non-taste descriptors. For example, in this review we unfortunately retain "leave", "glass", and "touch" - however, the vast majority of words remaining carry meaning. And so we carry on.
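For the curious, here is a rough sketch of what stopword-based flavour extraction can look like. Both stopword sets below are tiny hypothetical stand-ins (our real lists are much longer), and the sketch skips lemmatisation, so it keeps "leaves" where the example above shows "leave".

```python
import re

# Tiny illustrative stopword sets (hypothetical; real lists are far longer).
GENERIC_STOPWORDS = {"a", "with", "that", "some", "on", "the", "and", "of", "in"}
BEER_STOPWORDS = {"beer", "brew", "head", "malt", "aroma"}

def extract_flavour(text, stopwords=GENERIC_STOPWORDS | BEER_STOPWORDS):
    """Lowercase the review, split it into words and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stopwords]

review = (
    "Clear, dark amber with a fluffy tan head that leaves some decent lace "
    "on the glass. Sweet caramel malt and marshmallow with a touch of "
    "wintergreen in the aroma"
)
print(" ".join(extract_flavour(review)))
```

Swapping in a proper tokenizer and lemmatiser from an NLP library would be the natural next step; the filtering logic stays the same.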
Unfortunately, our computers can neither taste nor read - at least not without some help. To analyse the extracted flavours, we therefore had to convert them into a numerical representation. In NLP this is called embedding - we prefer numerical fermentation. 🍺
Three examples of numerical fermentation (embedding) using TF-IDF. We show the embedding process, as well as how to retrieve important words from the embedding.
We considered various techniques: from simply counting the number of times a word appears in a review to the latest advances in NLP using Transformer-based models like BERT and SBERT. Even though these models have proven to provide better sentence representations, the vector features they produce are not interpretable. In the end, we decided to use TF-IDF - a model that counts occurrences of words but also takes the frequency of the word in the corpus into account. The intuition behind this is that words that appear in many reviews are less relevant to the analysis, and we want to give them less weight. We can also hope that some of the non-taste-related words from our flavour extraction step are penalised because they appear across many reviews. Crucially, TF-IDF allows us to retrieve the descriptors that are most important for each review - vital for our later guide to the beer lexicon.
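To make the intuition concrete, here is a minimal TF-IDF sketch in plain Python. It uses the textbook weighting (term frequency times the log of inverse document frequency); library implementations such as scikit-learn's `TfidfVectorizer` add smoothing and normalisation, so their numbers will differ. The function names are ours, chosen for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Turn tokenised reviews into one {word: weight} dict per review."""
    n_docs = len(docs)
    # Document frequency: in how many reviews does each word appear?
    doc_freq = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # term frequency * inverse document frequency
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return vectors

def top_words(vector, k=3):
    """Return the k words with the highest TF-IDF weight in one review."""
    ranked = sorted(vector.items(), key=lambda item: -item[1])
    return [word for word, _ in ranked[:k]]

docs = [
    ["sweet", "caramel", "nice"],
    ["citrus", "hop", "nice"],
    ["roasted", "coffee", "nice"],
]
vectors = tfidf(docs)
```

In this toy corpus "nice" appears in every review, so its weight drops to zero, while the style-specific words keep a positive weight. That is exactly the penalty on ubiquitous words described above.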
Now that all our reviews have been formulated numerically, we can measure how close they are to each other to see if they agree. We define the average of these pairwise similarities as the consensus of the language used in the reviews. We tried a range of metrics such as cosine similarity, correlation and the KL and JS divergences. We found that the cosine similarity gave the best results. This is likely due to the fact that it ignores the magnitude of the vector and only looks at the angle - which in TF-IDF terms means looking at which words are used rather than how common a word is. As a bonus, it was also the most computationally efficient method 💻. We can now use this consensus metric to compare the language used in different groups of reviews. What on earth will we find...?
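As a rough sketch of the consensus metric, the snippet below averages the cosine similarity over all pairs of reviews in a group. It assumes each review is a sparse `{word: weight}` dictionary; the helper names are illustrative, and a real pipeline would use vectorised matrix operations instead of Python loops.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse {word: weight} vectors."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention: an empty review agrees with nobody
    return dot / (norm_u * norm_v)

def consensus(vectors):
    """Average pairwise cosine similarity within a group of reviews."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two reviews to measure consensus")
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)

a = {"sweet": 1.0, "caramel": 1.0}
b = {"sweet": 2.0, "caramel": 2.0}   # same words, twice the magnitude
c = {"citrus": 1.0}                  # no words in common with a or b
```

Here `a` and `b` point in the same direction, so their similarity is 1 despite the different magnitudes, while `c` shares no words with either and scores 0.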
Steps 1 and 2 yielded some fancy vectors representing beer reviews, but do they actually tell us anything useful? How can we make sense of a whopping 431,208 dimensions?
We pulled a neat trick called Truncated Singular Value Decomposition (TSVD) to shrink those dimensions down to a cozy 2D space. This technique is like a cousin of PCA, but it's specially designed for handling sparse matrices.
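Here is a small sketch of the idea using NumPy on a dense toy matrix. It keeps only the two largest singular values and, unlike PCA, does not centre the data first. At the scale of our real embeddings a sparse solver would be needed (for example scikit-learn's `TruncatedSVD`); the function name and toy data below are illustrative only.

```python
import numpy as np

def truncated_svd_2d(X, n_components=2):
    """Project the rows of X onto the top singular directions (no centring)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Coordinates of each row in the reduced space: U_k * S_k
    return U[:, :n_components] * s[:n_components]

# Toy "embedding" matrix: 3 reviews, 4 words.
X = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])
coords = truncated_svd_2d(X)
print(coords.shape)
```

Each review ends up as a pair of coordinates that can be dropped straight into a scatter plot.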
When we plotted the reviews in this 2D space, something magical 🪄 happened: they started forming distinct clusters! We focused on three popular styles - India Pale Ales (IPAs), Stouts and Porters, and Pale Lagers and Pilsners - and found that they consistently arranged themselves in a vertical pattern, with IPAs on top, Stouts and Porters in the middle, and Pale Lagers and Pilsners at the bottom. This visual fingerprint tells us that the vectors aren't just random numbers: they capture real, meaningful differences in how people describe different beer styles.
TSVD clusters of embeddings across styles, substyles and beers. We can clearly see clusters forming at each of these levels, indicating that the embeddings are meaningful.
On to step 3! We wanted to compare these language differences numerically. However, we faced a slight challenge. Calculating cosine similarity between all 2.5 million reviews would require a mind-boggling 10TB of memory! To overcome this, we took a representative sample of 10,000 reviews and calculated the average cosine similarity within different groups (all reviews, styles, substyles, and individual beers).
Boxplot and histogram/kernel density plot showing the distributions of pairwise cosine similarity across groupings of style/substyle/beer name. Once again, we see clear differences in mean and variance across groups, further illustrating the meaning of the embeddings.
As any good beer aficionado (or data scientist) knows, you can't just trust your eyes alone. We needed to make sure these differences weren't just a fluke. So, we performed a statistical test called ANOVA (type II), which is like a t-test but for comparing multiple groups at once.
The results? Drumroll, please... 🥁 The p-value came back as a minuscule 3.040729e-07! That's science-speak for "we can trust these results".
To double-check, we shuffled the reviews like a deck of cards and re-ran the test. This time, the p-value came back as 0.9 - basically, no significant differences. This confirmed that our initial findings weren't just a lucky draw.
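The shuffling idea can be sketched in plain Python. Note that this is a swapped-in simplification, not the type II ANOVA we actually ran: it computes a one-way F statistic and then estimates a p-value by shuffling the group labels many times (a permutation test). The function names and toy similarity values are made up for illustration.

```python
import random

def f_statistic(groups):
    """One-way ANOVA F statistic: between-group over within-group variance."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    k, n = len(groups), len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def permutation_p_value(groups, n_permutations=1000, seed=0):
    """Share of label shuffles whose F is at least as large as the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # break any link between value and group
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)

# Made-up cosine similarities for three groupings.
groups = [
    [0.10, 0.12, 0.11, 0.13],
    [0.30, 0.32, 0.31, 0.29],
    [0.50, 0.52, 0.49, 0.51],
]
p_value = permutation_p_value(groups)
```

When the groups really differ, almost no shuffle produces an F statistic as large as the observed one, so the estimated p-value is tiny. When the labels carry no information, the observed F looks like any other shuffle and the p-value is large.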
We then used Tukey's HSD test to pinpoint exactly which groups were significantly different from each other. It turned out that the language used was indeed distinct between styles and substyles, as well as between styles and individual beers. The other pairs of groupings we tested didn't show as much variation.
With all this evidence, we can confidently say (while still leaving room for a bit of scientific caution) that beer lovers do indeed use different language to describe different styles and beers. Hooray!
But how is any of this useful?
All of this analysis showing that beer descriptors are meaningful is all well and good, but how should you go about describing a beer if you want to truly sound like an expert? Which words should you use for which kinds of beers? How will you look in front of your beer-fanatic friends if you take a misstep in the minefield ☢️ of beer descriptors? In this section, we take the TF-IDF embeddings and conduct further analysis in order to find the largest differences between descriptors used for different kinds of beers.
Since TF-IDF preserves each input word as a feature, we can interpret its value as the significance of the word to the review. Furthermore, given that we have removed stopwords, and removed beer-specific stopwords (like "beer" or "brew"), the words left behind should be descriptors that are meaningful in some way. Below is a figure showing the top 10 words for describing all beers - that is, the average TF-IDF embeddings across all beers.
Top 10 words for all beers
The results here are unsurprising. We have fairly basic descriptors such as "nice" and "good". What if we instead looked at a specific style of beer? Below are charts for 2 styles - Amber and Dark Lagers, and Wheat beers, respectively.
Top 10 words for Amber and Dark Lagers
Here we see slightly more interesting examples. On the Wheat beers chart we have many fruity words appearing, whereas with Amber and Dark Lagers we see that they are sweet and caramel-y, with the notable inclusion of "oktoberfest" coming in at No. 10! 🇩🇪
Top 10 words for Wheat Beers
It might be more interesting to see which words are used differently for a given beer style. If we take the difference between a group's mean embedding and the mean of the remainder of the beers, we can find such descriptors. This way, we can identify the words which are most important to that kind of beer - or put differently - the words which are most different in usage for a particular kind of beer.
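A minimal sketch of this difference-of-means idea, assuming each review is a sparse `{word: weight}` TF-IDF dictionary and both groups are non-empty. Ranking by absolute difference is our assumption here, chosen so that words used noticeably less often (negative differences) can show up too; the toy weights are made up.

```python
from collections import defaultdict

def mean_vector(vectors):
    """Average a list of sparse {word: weight} vectors word by word."""
    totals = defaultdict(float)
    for vec in vectors:
        for word, weight in vec.items():
            totals[word] += weight
    return {word: total / len(vectors) for word, total in totals.items()}

def top_differences(group, rest, k=10):
    """Words whose mean weight differs most between a group and the rest."""
    group_mean, rest_mean = mean_vector(group), mean_vector(rest)
    words = set(group_mean) | set(rest_mean)
    diffs = {w: group_mean.get(w, 0.0) - rest_mean.get(w, 0.0) for w in words}
    # Largest absolute difference first; the sign says "more" or "less" used.
    return sorted(diffs.items(), key=lambda item: -abs(item[1]))[:k]

# Made-up TF-IDF weights for illustration.
stouts = [{"dark": 0.8, "chocolate": 0.6}, {"dark": 0.6, "coffee": 0.4}]
others = [{"citrus": 0.9, "dark": 0.1}, {"hop": 0.5}]
ranking = top_differences(stouts, others, k=3)
```

A positive difference marks a word that is characteristic of the group, and a negative one marks a word to avoid when describing it.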
First up we have Pale Ales. Very boringly, the most important word is pale, followed by hop. But now with the inclusion of the difference, we can see that we should not describe pale ales as dark or chocolate-y.
Top 10 differences in word usage for Pale Ales
Next up, Stouts and Porters. Clearly, this is where all the dark and chocolate-y beers can be found. Or we can even compare them to "roasted coffee", with hints of "vanilla" and "bourbon", and a sprinkling of "citrus" - we're starting to sound like beer critics already!
Top 10 differences in word usage for Stouts and Porters
In our dataset, we also have a number of specialty beers. Here we see some less conventional flavours popping up in the form of raspberry, smoke, cherry and blueberry. What are these brewers up to?!
Top 10 differences in word usage for Specialty Beers
The final example we present to you is for hybrid & experimental beers. Firstly, the top 10 differences are all positive - truly some unique beers here! We've got some seemingly contrasting words coming out - something which one might expect of hybrid beers. Both "sour" and "tart" are present, but also "cinnamon" and "warmer". Perhaps the best way to sum it up would be "funk"...
Top 10 differences in word usage for Hybrid and Experimental Beers
Turns out, navigating the beer aisle isn't as daunting as it seems. Our analysis revealed a shared language among beer styles, and we have provided some ideas for how to sound like a beer pro.
Beyond the fun, this shared language offers real potential:
So, next time you raise a glass, remember that you're part of a global beer language choir. Cheers to understanding, discovery, and endless brews 🍻!
P.S. Read about some of our limitations and ethical considerations.
Made with ❤️ on Friday Afternoons