AI, Fake AI, and Google
Welcome to Marketing BS, where I share a weekly article dismantling a little piece of the Marketing-Industrial Complex — and sometimes I offer simple ideas that actually work.
If you enjoy this article, I invite you to subscribe to Marketing BS — the weekly newsletters feature bonus content, including follow-ups from the previous week, commentary on topical marketing news, and information about unlisted career opportunities.
Thanks for reading and keep it simple,
On November 15, the Wall Street Journal published the results of their exhaustive investigation into Google’s processes for developing search algorithms. At the core of the exposé lies one question: does Google ever manually modify the search algorithms? The WSJ juxtaposed Google’s public statements alongside their own findings:
The company states in a Google blog, “We do not use human curation to collect or arrange the results on a page.” It says it can’t divulge details about how the algorithms work because the company is involved in a long-running and high-stakes battle with those who want to profit by gaming the system.
But that message often clashes with what happens behind the scenes. Over time, Google has increasingly re-engineered and interfered with search results to a far greater degree than the company and its executives have acknowledged, a Wall Street Journal investigation has found. [Emphasis mine]
The four WSJ journalists listed on the byline (Kirsten Grind, Sam Schechner, Robert McMillan and John West) claim to have interviewed more than 100 current and former Google employees. Moreover, the reporters conducted their own testing to identify six types of situations where Google does, in fact, make manual changes to the search algorithms. Please excuse the long quote, but I think it is worth listing the description for all of their examples:
Google made algorithmic changes to its search results that favor big businesses over smaller ones, and in at least one case made changes on behalf of a major advertiser, eBay Inc., contrary to its public position that it never takes that type of action. The company also boosts some major websites, such as Amazon.com Inc. and Facebook Inc., according to people familiar with the matter.
Google engineers regularly make behind-the-scenes adjustments to other information the company is increasingly layering on top of its basic search results. These features include auto-complete suggestions, boxes called “knowledge panels” and “featured snippets,” and news results, which aren’t subject to the same company policies limiting what engineers can remove or change.
Despite publicly denying doing so, Google keeps blacklists to remove certain sites or prevent others from surfacing in certain types of results. These moves are separate from those that block sites as required by U.S. or foreign law, such as those featuring child abuse or with copyright infringement, and from changes designed to demote spam sites, which attempt to game the system to appear higher in results.
In auto-complete, the feature that predicts search terms as the user types a query, Google’s engineers have created algorithms and blacklists to weed out more-incendiary suggestions for controversial subjects, such as abortion or immigration, in effect filtering out inflammatory results on high-profile topics.
Google employees and executives, including co-founders Larry Page and Sergey Brin, have disagreed on how much to intervene on search results and to what extent. Employees can push for revisions in specific search results, including on topics such as vaccinations and autism.
To evaluate its search results, Google employs thousands of low-paid contractors whose purpose the company says is to assess the quality of the algorithms’ rankings. Even so, contractors said Google gave feedback to these workers to convey what it considered to be the correct ranking of results, and they revised their assessments accordingly, according to contractors interviewed by the Journal. The contractors’ collective evaluations are then used to adjust algorithms. [Emphasis mine]
There are a lot of ideas to unpack in the six claims, so let’s get started.
A Brief History of Google SEO
In the early days of the internet, most people looking for information visited websites like Yahoo and AltaVista. By today’s standards, most of the pioneering search engines were not very good at providing quality results. In large part, the ineffectiveness of these search engines stemmed from the poor design of their algorithms. When Yahoo and AltaVista both tried to build a directory of the internet, they assigned priority to websites based on how many times a specific word was used on each page. Suppose, for instance, that you entered “how to draw a horse” into a search engine. The best website for delivering that information is not necessarily the website that includes the words “draw” and “horse” the highest number of times.
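To make the weakness concrete, here is a minimal Python sketch of ranking by keyword count. The page names and text are invented for illustration, and the real engines of that era were surely more elaborate; the sketch only shows that counting word occurrences rewards repetition rather than usefulness.

```python
import re
from collections import Counter


def keyword_score(query: str, page_text: str) -> int:
    """Count how many times the query's words appear in the page text."""
    words = Counter(re.findall(r"[a-z']+", page_text.lower()))
    return sum(words[term] for term in re.findall(r"[a-z']+", query.lower()))


def rank_by_keywords(query: str, pages: dict[str, str]) -> list[str]:
    """Return page names sorted from highest to lowest keyword count."""
    return sorted(pages, key=lambda name: keyword_score(query, pages[name]), reverse=True)


# Hypothetical pages: a useful tutorial and a keyword-stuffed page.
pages = {
    "tutorial": "Start with two circles for the body, then draw the legs and head of the horse.",
    "stuffed": "draw horse draw horse draw horse how to draw a horse draw horse",
}

print(rank_by_keywords("how to draw a horse", pages))
```

In this toy example the keyword-stuffed page outranks the tutorial, even though only the tutorial would help someone draw a horse.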
Instead of focusing on “quantity of words,” Google co-founders Sergey Brin and Lawrence Page developed an algorithm based on “quality of links.” More specifically, Google’s algorithm evaluated how many links a page had pointing to it, and — more importantly — how “authoritative” those links were. How did Google’s algorithm decide which pages were the most authoritative? Simple: just look at how many links were pointing to THAT page.
Their concept was innovative, although the math itself was not actually that complicated. Consider this statement from the research paper that described their process:
...a PageRank [Google’s term for measuring the importance of websites] for 26 million web pages can be computed in a few hours on a medium size workstation.
Back in 2011, Google described the logic behind their algorithm in plain language:
PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.
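The idea of "links weighted by the authority of the linking page" can be sketched in a few lines of Python. This is a simplified illustration, not Google's production code: the four-page web is hypothetical, and the damping factor of 0.85 is an assumption taken from the value commonly cited for PageRank rather than from anything in this article.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Estimate PageRank by repeatedly redistributing rank along links.

    `links` maps each page to the pages it links to.
    """
    pages = set(links) | {target for targets in links.values() for target in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank across all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "hub", which links back to "a".
web = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

Page "a" ends up far ahead of "b" and "c" even though all three have the same number of outbound links: its single inbound link comes from the most authoritative page in the graph.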
Brin and Page’s system produced significantly better results than rival search engines. In a short period of time, Google exemplified venture capitalist Peter Thiel’s mantra that new tech companies needed to be an “order of magnitude better than [their] nearest substitute.”
As I noted earlier, Google’s algorithm was not particularly complex. Companies could reverse engineer Google’s algorithm to ensure their website appeared on top of the search results. Comedian W.C. Fields once said, “A thing worth having is a thing worth cheating for.” Placement at the top of Google’s sort order was unquestionably a thing worth having; as such, cheating the algorithm became an obsession for many companies.
In a recent post, I described some tricks — known as Black Hat search engine optimization — to game Google’s algorithm. The short version: Black Hat specialists built thousands of webpages that would link to each other, with the ultimate goal of pushing the primary domain to the top of Google’s search results.
The creation of these “link farms” required a copious amount of work, but the process demanded very little technical know-how about search engine algorithms. Plus, the results paid for themselves — appearing at the top of Google’s sort order reliably boosted a company’s sales.
Before long, anyone who wanted to rank highly on Google needed to employ these manipulative techniques — they were the only way you could compete in a zero-sum game. Companies that rejected Black Hat SEO strategies were like the Tour de France cyclists in the 1990s who refused to use performance-enhancing drugs. Sure, you could stand tall with your morals, but you would never reach the podium.
Google realized that companies were gaming the system, so they took decisive action. For starters, the company stopped publicly posting the algorithms (which had been freely available as part of their culture of transparency), and details of all future changes were kept secret. To thwart companies’ attempts to reverse engineer new versions of their algorithms, Google also randomized the timing for indexing websites and updating search results. In short, Google constructed a comprehensive strategy to reduce Black Hat SEOs’ ability to cheat.
Penguins and Pandas
In 2011 and 2012, Google deployed two substantial changes to their search algorithm, codenamed Panda and Penguin (both black-and-white animals, which tech lore suggests was a nod to Black Hat and White Hat SEO tactics). Panda focused on lowering the ranking of low-quality websites, and Penguin targeted “link spam.”
Clever SEO specialists, incentivized by the rewards for pushing companies’ websites to the top of Google search results, managed to find some loopholes.
Let’s take a look at the challenge of designing and implementing an effective search algorithm. In the Panda update, Google tried to assess how people interacted with websites: were they spending time clicking various subpages, or did they just visit a website and quickly leave? Generally speaking, many people would support the idea that lengthy visits to a website demonstrate higher levels of “engagement” than instances where a visitor arrives at a website and then quickly departs. As such, Google increased the weighting for websites that exhibited a high level of “engagement.”
Despite Google’s best intentions, some unintended consequences became quickly apparent. People frequently consult search engines to check simple facts — Colorado’s time zone, for instance. In those cases, the optimal website design would allow a user to visit, collect the desired information, and leave. But the Panda update punished those sites for their “low engagement,” sending them further down Google’s search results.
On the flip side, some crafty SEOs learned to exploit the algorithm’s preference for websites that kept users on the site for a longer period of time, clicking through various subpages. Their solution? “Slideshows” — a website design that forced users to click through multiple pages to read all of the content. Although slideshows provided a terrible user experience, Google’s algorithm determined that the length of time and multiple clicking signified great “engagement,” thereby boosting the website’s rank in Google sort order.
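Both failure modes fall out of even a very simple engagement metric. The Python sketch below is purely illustrative: the `Visit` fields, the weights, and the traffic numbers are all assumptions, and nothing here reflects how Google actually measured engagement.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    seconds_on_site: float
    pages_viewed: int


def engagement_score(visits: list[Visit], seconds_weight: float = 1.0, pages_weight: float = 30.0) -> float:
    """Average a naive per-visit score: time on site plus a bonus per page viewed.

    The weights are invented for illustration only.
    """
    if not visits:
        return 0.0
    total = sum(seconds_weight * v.seconds_on_site + pages_weight * v.pages_viewed for v in visits)
    return total / len(visits)


# Hypothetical traffic for two sites.
time_zone_site = [Visit(seconds_on_site=8, pages_viewed=1)] * 3    # answer found, user leaves
slideshow_site = [Visit(seconds_on_site=90, pages_viewed=12)] * 3  # same content split over 12 pages

print("time zone site:", engagement_score(time_zone_site))
print("slideshow site:", engagement_score(slideshow_site))
```

The site that answers the question instantly scores far lower than the slideshow, which is the opposite of what a user would want.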
When Google discovered the slideshow trick, they tweaked the algorithm to better understand the concept of “engagement.” But the cat-and-mouse game between Google and Google optimizers never stops. With each iteration of the algorithm, Google attempts to not only improve user experience, but also deter any devious tactics.
According to Google, here are recommendations for optimizing your website’s position on their search results:
You should build a website to benefit your users, and any optimization should be geared toward making the user experience better. One of those users is a search engine, which helps other users discover your content. Search Engine Optimization is about helping search engines understand and present content.
Overall, Google emphasizes two general concepts:
Make your site great for your users.
Make it easy for Google to understand your content.
In essence, Google encourages website developers to spend time creating quality content, not reverse engineering algorithms — even though some sneaky tactics could get your website moved up to a higher spot on their search results.
This idea brings us back to the WSJ report.
Putting a Thumb on the Scale
Suppose a person checks Google to search for a hotel. Which website should they be directed to visit: Expedia.com or Hotels-for-you.com?
If a person wants to read international news, which website should Google highlight: nytimes.com or InternationalNews.com?
For most people, the first option in each example would provide far more value than the second option. So how does Google ensure that their algorithm will prioritize high-quality websites like Expedia.com and nytimes.com, instead of low-quality sites such as Hotels-for-you.com and InternationalNews.com (both of which are squatting on domains)?
Yes, Google could evaluate the quantity and quality of incoming links, but we know how likely Black Hat SEOs are to cheat the system. What factors can an algorithm consider to rank Expedia and the NYT over hotels-for-you and InternationalNews? Furthermore, how can an algorithm feature foundational elements that would be too expensive for the smaller companies to try to duplicate?
Back to the first point from the WSJ report:
Google made algorithmic changes to its search results that favor big businesses over smaller ones.
Of course Google favored big businesses! In the vast majority of search queries, customers are looking for information about large corporations, rather than fledgling businesses.
To align search results with consumer expectations, Google could manually add adjustments for size of company from public reports. But the algorithm could easily handle a complex method for determining the size of a business, based on information from a myriad of internet sources:
Direct type-in traffic (Google can harness data from their Chrome browser)
Brand searches on Google
Number of pages on the website that generate significant traffic
Media coverage from significant publications
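One way the four signals above could be folded into a single "brand size" number is sketched below in Python. Everything in it is an assumption for illustration: the field names, the weights, the log scaling, and the traffic figures are hypothetical, and Google's actual approach is not public.

```python
import math
from dataclasses import dataclass


@dataclass
class BrandSignals:
    direct_visits: int        # direct type-in traffic
    brand_searches: int       # searches for the brand name itself
    high_traffic_pages: int   # pages on the site that draw significant traffic
    media_mentions: int       # coverage in significant publications


# Hypothetical weights; nothing here reflects Google's real formula.
WEIGHTS = {
    "direct_visits": 0.3,
    "brand_searches": 0.3,
    "high_traffic_pages": 0.2,
    "media_mentions": 0.2,
}


def brand_score(signals: BrandSignals) -> float:
    """Combine the four signals on a log scale so no single huge number dominates."""
    return sum(weight * math.log1p(getattr(signals, name)) for name, weight in WEIGHTS.items())


# Made-up numbers for an established travel brand and a domain squatter.
big_brand = BrandSignals(direct_visits=2_000_000, brand_searches=5_000_000,
                         high_traffic_pages=40_000, media_mentions=1_200)
squatter = BrandSignals(direct_visits=300, brand_searches=50,
                        high_traffic_pages=3, media_mentions=0)

print(f"big brand: {brand_score(big_brand):.2f}")
print(f"squatter:  {brand_score(squatter):.2f}")
```

The useful property of signals like these is that they are expensive to fake: a link farm can be built in a weekend, but millions of people typing a brand name into a browser cannot.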
By considering all of those factors, the algorithm would elevate the search ranking for big companies. This approach would significantly impair Black Hat SEOs’ ability to skew results. The most effective way to “win” this part of the algorithm is straightforward: BUILD A BIG BRAND. I learned this lesson firsthand in 2011, when A Place for Mom launched television ads. We could see our unbranded SEO results improve with every week that our commercials appeared on TV. Our TV spots led to more brand searches and direct type-in traffic, which led to Google thinking of us as a “big business,” which led to even more unbranded SEO traffic. The most surprising thing about the WSJ’s article is their belief that Google’s favouring of big business is a scoop. This concept was hardly news a decade ago, at least for those people paying attention.
Let’s briefly review the other five claims from the WSJ article:
1. Engineers have discretion on what appears in auto-complete suggestions and knowledge boxes.
2. Google blacklists some sites and content.
3. Auto-complete blacklists some controversial subjects.
4. Google employees and executives debate and disagree on when to adjust the algorithms.
5. Google uses contractors to test their algorithms to make sure the “right” results are appearing at the top.
Claim 1: Auto-complete and knowledge boxes are “newer” features on Google, so it makes sense that the engineers building those algorithms possess a degree of discretion over how, when, and what appears in those results.
Claims 2 and 3: We know that Google blacklists information in (at least) two ways:
Manually editing the algorithm to stop a specific site from appearing.
Re-designing the algorithm to exclude a specific site.
Generally, Google takes these actions to deal with websites that are clearly not providing value for users — even though they have managed to find a way to manipulate the algorithm into ranking their site highly.
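The difference between the two approaches is easier to see side by side. The Python sketch below is a hypothetical illustration: the `.test` domains, the `farm_link_share` signal, and the threshold are all invented, and the source does not describe how Google implements either mechanism.

```python
from dataclasses import dataclass


@dataclass
class Result:
    domain: str
    score: float
    farm_link_share: float  # hypothetical signal: share of inbound links from known link farms


# Approach 1: a hand-maintained list of specific domains to suppress (made-up entry).
MANUAL_BLOCKLIST = {"spam-example.test"}


def apply_manual_blocklist(results: list[Result]) -> list[Result]:
    """Drop any result whose domain someone added to the list by hand."""
    return [r for r in results if r.domain not in MANUAL_BLOCKLIST]


def apply_redesigned_rule(results: list[Result], max_farm_link_share: float = 0.5) -> list[Result]:
    """Approach 2: change the algorithm so that sites with a given property drop out.

    No site is named; the offending site is excluded because of what it does.
    """
    return [r for r in results if r.farm_link_share <= max_farm_link_share]


results = [
    Result("useful-example.test", score=0.90, farm_link_share=0.05),
    Result("spam-example.test", score=0.95, farm_link_share=0.10),
    Result("farmed-example.test", score=0.92, farm_link_share=0.80),
]

print([r.domain for r in apply_manual_blocklist(results)])
print([r.domain for r in apply_redesigned_rule(results)])
```

The manual list removes exactly the site someone named, while the redesigned rule also catches any future site that behaves the same way. In both cases a person decided what should not appear.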
Claim 4: This comment simply acknowledges that Google is not a dictatorship and that they, like every big company, conduct a lot of meetings to discuss how they should run the most important part of their business.
Claim 5: This revelation about the contracting of people to test algorithms strikes at the heart of AI. As I discussed in a previous post, Facebook invested in sophisticated AI to identify and remove harmful content, but they also hired more than 35,000 human beings to check for problems that the algorithms might have overlooked. These misfires are sent to the engineers who adjust the algorithms to automatically catch similar issues in the future. Google appears to be following a similar strategy: use human beings to search for various things, and, when the results are “not good,” submit that data to a team that will adjust the algorithms.
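A rough Python sketch of that feedback loop: human raters pick the result they consider best for each query, and engineers compare how often each version of the algorithm agrees with them. The metric name `top1_agreement`, the queries, and the `.test` domains are assumptions made for illustration; the article does not describe Google's actual evaluation method.

```python
def top1_agreement(rankings: dict[str, list[str]], rater_picks: dict[str, str]) -> float:
    """Share of rated queries where the algorithm's first result matches the raters' pick."""
    rated = [query for query in rater_picks if rankings.get(query)]
    if not rated:
        return 0.0
    hits = sum(1 for query in rated if rankings[query][0] == rater_picks[query])
    return hits / len(rated)


# Hypothetical rater judgments: the result raters consider best for each query.
rater_picks = {
    "book a hotel": "expedia.com",
    "international news": "nytimes.com",
    "colorado time zone": "timezone-example.test",
    "how to draw a horse": "drawing-example.test",
}

# Made-up output from the current algorithm and from a candidate revision.
current = {
    "book a hotel": ["hotels-for-you.com", "expedia.com"],
    "international news": ["nytimes.com", "internationalnews.com"],
    "colorado time zone": ["slideshow-example.test", "timezone-example.test"],
    "how to draw a horse": ["drawing-example.test", "stuffed-example.test"],
}
candidate = {
    "book a hotel": ["expedia.com", "hotels-for-you.com"],
    "international news": ["nytimes.com", "internationalnews.com"],
    "colorado time zone": ["timezone-example.test", "slideshow-example.test"],
    "how to draw a horse": ["stuffed-example.test", "drawing-example.test"],
}

print("current:  ", top1_agreement(current, rater_picks))
print("candidate:", top1_agreement(candidate, rater_picks))
```

The raters never edit a single search result here. Their judgments only decide which version of the algorithm ships, which is still a human choice about what counts as a "good" result.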
Essentially, the WSJ article seems to argue that a “pure” algorithm is somehow better — ethically, at least — than one that is manually adjusted by Google employees.
What do you think — should Google stick to letting the algorithm determine search results, or should Google manually modify the results to produce certain outcomes?
My two cents? I don’t think there’s a difference. A search engine algorithm is not some divine entity — it’s a program created by humans. Whether or not Google employees manually tweak the algorithm’s settings does not change the fact that humans are ultimately responsible for the ranking of websites in Google’s search results. This fact will make some people uncomfortable, and I understand why people feel that consolidating so much power with one company is disconcerting. That said, the WSJ’s belief that algorithms that have not been debated and edited by human beings are somehow “better” than the modified algorithms clearly misses the point.
Perhaps the next WSJ exposé should investigate people’s fundamental lack of knowledge about how tech companies function.
Keep it simple,
Edward Nevraumont is a Senior Advisor with Warburg Pincus. The former CMO of General Assembly and A Place for Mom, Edward previously worked at Expedia and McKinsey & Company. For more information, including details about his latest book, check out Marketing BS.