Hope you all had a spooky Halloween! Guess what I was for Halloween? Sick, lol!

Well, you can't escape the germs in San Francisco forever.

Speaking of escaping, I really tried not to write about BERT like everyone else. Then I wanted to learn more about the technology behind it... and fell down the rabbit hole. Ugh, fine.

It's certainly less exciting from an SEO perspective but more so from a technology one! Not because BERT couldn't have an impact on SEO but because, so far, we haven't seen much of one.


Taking BERT apart

The goal of this post is not to describe how to optimize for BERT because you can’t apply tactics to a machine’s understanding of language. You can only create outstanding content and products. Google tries to emulate what humans like and deem helpful.

With that out of the way, my goal for this post is to dive deeper into some of the technology behind BERT to shape our understanding of Google’s capabilities. Mind you, I don’t understand it all the way down to 0. My math is just not that good, but maybe it doesn’t have to be.

Let’s cut this cake.

Optimus Prime is a Muppet
For most people, Bert is a puppet from Sesame Street. In NLP, BERT stands for Bidirectional Encoder Representations from Transformers, and it's the hottest Google toy. In a nutshell, BERT is a new machine learning model that helps Google understand context better.

Context is a big challenge for machines, though easy for humans. BERT solves that problem (better) with bidirectional Transformers - the special sauce. We're not speaking about cars that turn into robots, but about models that are able to understand context much, much better than anything else out there, e.g. recurrent neural nets.

Here’s how I understand the way Transformers work:
  1. Read a sentence and create word embeddings (mapping each word to a vector)
  2. Give each word a weight (with self-attention layers)
  3. Mask some (~15% of) words
  4. Guess each masked word from the context before and after it
  5. Run that iteration a couple of times
  6. Arrive at a solid understanding of the relationships between all words
  7. Build an understanding of the relationship between sentences

Transformers can also more reliably predict the next sentence. It’s crazy.
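
Steps 1-4 above can be sketched in a few lines of (very toy) Python. The numbers here are random, not anything BERT actually learns, and real self-attention uses separate learned query/key/value projections, but the plumbing is the same idea:

```python
# Toy sketch of steps 1-4: embeddings -> self-attention weights -> masking.
# All vectors are random placeholders; real BERT learns them from text.
import numpy as np

rng = np.random.default_rng(0)
sentence = ["the", "duck", "crossed", "the", "street"]

# Step 1: word embeddings - map each word to a (here random) 4-dim vector
vocab = {w: rng.normal(size=4) for w in set(sentence)}
X = np.stack([vocab[w] for w in sentence])           # shape (5, 4)

# Step 2: self-attention - each word scores every other word,
# and a softmax turns the scores into weights that sum to 1 per row
scores = X @ X.T / np.sqrt(X.shape[1])
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
contextual = weights @ X                             # context-mixed embeddings

# Step 3: mask ~15% of the words; the model's training job (step 4)
# is to reconstruct them from the unmasked context on both sides
n_mask = max(1, int(0.15 * len(sentence)))
masked_positions = rng.choice(len(sentence), size=n_mask, replace=False)

print(weights.round(2))      # each row is one word's attention over the others
print(masked_positions)      # which position(s) the model must guess
```

In the real thing, this attention-then-predict loop is stacked many layers deep, which is the "run that iteration a couple of times" part.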

Step 4 is really important. That's the bi-directional aspect of Transformers, which is groundbreaking. Before that, NLP models only read in one direction. Guessing missing or upcoming words is called language modeling, and BERT gets it done in a very resource-friendly way.

BERT is much better at understanding the “it” that’s so easy to get for humans but is a pain in the butt for machines (think: “The duck tried crossing the street but it was too wide” <- the duck or the street?).
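
Here's a made-up toy (not BERT, just an illustration) of why reading in both directions helps with that "it". Hand-made co-occurrence counts stand in for learned knowledge: "wide" goes with "street", not "duck".

```python
# Toy illustration: resolving "it" in "The duck tried crossing the
# street but it was too wide" using left-only vs. both-sides context.
# The counts are invented for the example, not learned from data.
cooccur = {
    ("duck", "crossing"): 3, ("street", "crossing"): 2,
    ("duck", "wide"): 0,     ("street", "wide"): 5,
}

def score(candidate, context_words):
    """Sum how strongly a candidate referent co-occurs with the context."""
    return sum(cooccur.get((candidate, w), 0) for w in context_words)

left_only = ["crossing"]            # all a left-to-right model has seen
both_sides = ["crossing", "wide"]   # what a bidirectional model sees

# With only left context, "duck" looks like the better referent...
assert score("duck", left_only) > score("street", left_only)
# ...but the right context ("too wide") flips the answer to "street".
assert score("street", both_sides) > score("duck", both_sides)
```

Crude, but that's the intuition: the disambiguating clue often sits to the *right* of the ambiguous word, and one-directional models never get to use it.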

Why is BERT so cool?
BERT is pretty much a technological breakthrough.

Let's list some facts:
  1. BERT can understand context better than any other current NLP system out there.
  2. BERT is state of the art for question-answering systems (very important for search engines). I know it's a steep thesis, but I could see BERT playing a role in evaluating how close to consensus a site is, e.g. for fake news or medical sites. We've seen the drama around such sites, and I think the recent resurgence could have something to do with BERT.
  3. BERT is very resource-friendly. This is important for Google because running an NLP system at such a scale means tremendous cost. Saving money means more profit.
  4. BERT rocks named entity recognition and paraphrasing, which is important for Google's entity and knowledge graph.
  5. BERT understands the relationship between consecutive sentences (which previous NLP models didn't).

Where BERT could be applied
BERT is an open-source NLP technology that could be applied to so many tasks, it's too much to list. But in terms of search and SEO, I could see a couple of cases.

I already mentioned question-answering systems (like People Also Ask boxes). Google mentioned featured snippets and organic search. I could see it also being applied to image alt text, backlink anchor text, and videos.

Lastly, BERT would fit well into Google's goal to customize knowledge graph integration based on context, as I wrote about in A New Google.


5 things you should check out

Digital Olympus "Email Outreach: The Ultimate Guide"
I really like the design of this article but also the tactics included here. Worth checking out!

Orbit Media "Content Promotion Statistics To Help You Be a Better Blogger [New Research]"
Interesting blogger statistics to give you a feel for how you're doing ;-).

NY Times "Google, in Rare Stumble, Posts 23% Decline in Profit"
Google stock and profit drops are never good news for SEOs. Google will make that gap up and often at the expense of organic results.

Databox "24 Examples of High-Performing Pillar Pages To Draw Inspiration From"
Some really good pillar page examples in here!

Forbes "What A Prison Gang Leader Taught Ben Horowitz About Running A Business"
Love the storytelling in this Ben Horowitz piece!

I'd appreciate it if you could support me by forwarding this email to your friends. THX!

Want to change how you receive these emails?
You can update your preferences or unsubscribe from this list.