were over the moon about it. They had been meticulously observing the data from the switch and waiting with collectively bated breath to see if it would work or flop. By May 2012, just a few short months after the new AI integration, the data showed that average watch time was four times what it had been the previous May. Collective sigh of relief.
The YouTube AI has changed over time to create a feed personalized to each viewer. Its Homepage is no longer channel dominant but filled with a mix of videos chosen directly from individual viewing patterns and behaviors. It now suggests, with uncanny accuracy, what a viewer might want to watch. This is a huge change from its earlier surface‐level recommendations. You're no longer dipping from the site (if you don't know what dipping is, ask a Gen Z kid) because the videos are just another version of the one you just watched. Instead, you're sticking around to click on the video that you've never seen before but are definitely drawn to. It's as if YouTube hired a tailor to come in and take your measurements so he could build you an outfit you didn't even know you wanted. Who doesn't love the feel of something that fits like a glove? And that also doesn't look exactly like every other outfit you own?
Diving Deep into the Deep Learning Machine
To explain further, let's rewind and reexamine the data. After the turn of the first decade in the twenty‐first century, YouTube came face‐to‐face with some hard truths. First, their users were watching videos from a bunch of other platforms instead of coming to the site directly. YouTube viewership was up, but only because people were watching YouTube videos that had been shared to big platforms like Facebook and Twitter. This made it impossible for YouTube to gather data about their consumers and to retain and monetize them.
Another tough truth was that YouTube had different operating programs for different devices and applications, so they needed to collect the pieces and reboot an operating system in one place, directly from the source. Shockingly, at the time, YouTube didn't even have a dialed‐in system for analyzing mobile usage, which was an embarrassing realization because a huge percentage of viewership was mobile. Its digitally ancient mobile development was painfully slow, and something needed to be done about it, stat.
Enter InnerTube in 2012: an interdepartmental program at YouTube HQ created to revamp algorithms and development from the top down. InnerTube reset the system and observed its reboot in one place to ensure everything came together correctly and quickly. It was imperative that changes could be made quickly and tested before being applied across the board. If a new change didn't work, they needed to pull it promptly without it crippling the whole shebang. Then they would tweak and try again.
Another vital piece of the reboot was utilizing deep learning machines. Google's AI had undergone several phases of development and usage, and it was getting better and better. Google's deep learning AI was now capable of using gigantic neural networks that got really good at things like recommendation and search. Deep learning goes beyond basic machine learning in that it's loosely modeled on the networks of neurons in the human brain, which lets it pick up on nonlinear patterns that simpler models miss.
The input data for deep learning machines on YouTube came from the behavior of its users. The system monitored not only “positive” viewer behavior, like which videos they liked and kept watching, but also “negative” behavior, like which videos they skipped or even removed from their custom Homepage or “Up next” recommendations. Monitoring both the positive and negative behavior of its users is vital to the algorithm's accuracy. This neural network has gotten so good that it can even predict what to do with new or unfamiliar videos based on current user behavior. Saying, “It has a mind of its own,” is not much of a stretch. The AI actually doesn't observe the total Internet behavior of a user; it only watches what happens on YouTube. This matters because it's what maintains its pinpoint accuracy in recommendations.
How?
Let's say you went to google.com and typed “steakhouses in Los Angeles” in the search bar. Does that mean the next time you go to youtube.com you want it to recommend videos on how to grill a perfect steak? Or that you want to take a video tour of LA? Probably not. But if you search, “How to grill the perfect rare steak,” directly on YouTube's search bar and click on the first recommended video, the suggested videos that pop up next might be, “World's strongest man—full day of eating,” then, “How to clean a cast iron skillet.” These secondary videos don't have anything to do with steak, but do you see how that viewer would be a likely candidate to continue clicking? That's a deep learning machine that knows what it's doing. And YouTube and its ecosystem are direct beneficiaries, because when viewers watch more, everyone makes more money and gets more brand exposure.
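If you like to see ideas in code, here is a tiny sketch of the principle: tally positive and negative signals from on‐platform behavior only, then rank candidate videos by that tally. To be clear, this is not how YouTube's deep learning system works. The real thing is a neural network, and this is a simple hand‐weighted score. The signal names, weights, topics, and function names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical weights for on-platform signals (illustration only,
# not values YouTube has published).
SIGNAL_WEIGHTS = {
    "watched": 1.0,
    "liked": 2.0,
    "skipped": -1.0,
    "removed": -3.0,
}


def topic_scores(events):
    """Sum the signed weight of every (topic, signal) event, per topic."""
    scores = defaultdict(float)
    for topic, signal in events:
        scores[topic] += SIGNAL_WEIGHTS[signal]
    return dict(scores)


def rank_candidates(candidates, scores):
    """Sort candidate videos by their topic's score, highest first.

    Topics the viewer has never interacted with score 0.
    """
    return sorted(
        candidates,
        key=lambda video: scores.get(video["topic"], 0.0),
        reverse=True,
    )


# Only behavior on the platform itself goes in; a web search for
# "steakhouses in Los Angeles" never appears in this list.
events = [
    ("cooking", "watched"),
    ("cooking", "liked"),
    ("fitness", "watched"),
    ("travel", "skipped"),
    ("travel", "removed"),
]

candidates = [
    {"title": "Video tour of LA", "topic": "travel"},
    {"title": "How to clean a cast iron skillet", "topic": "cooking"},
    {"title": "World's strongest man: full day of eating", "topic": "fitness"},
]

scores = topic_scores(events)
ranked = rank_candidates(candidates, scores)
for video in ranked:
    print(video["title"])
```

The skillet video lands first and the LA tour lands last, because the negative signals pull the travel score below zero. Drop the “skipped” and “removed” events and the ranking loses that information, which is the point of watching both kinds of behavior.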
A Machine at Work … and It's Working
YouTube recommends hundreds of millions of videos to users every single day, in dozens of different languages, in every corner of the world. Their suggestions account for 75% of the time people spend on the site.
In 2012, daily watch time averaged out at about a hundred million hours. In 2019, that average sits at a mind‐blowing one billion hours a day. One billion hours of video content being collectively consumed by viewers on one website every single day! Over this seven‐year span and thousands if not tens of thousands of tweaks and triggers, the deep learning AI has gotten really good at recommending videos to keep viewers watching longer. It has become an expert digital gardener who knows which product to harvest for each customer based on the videos they've been “feeding” on. You can be a YouTube master gardener, too, when you arm yourself with the right tools. Just hang on to your shovel, because we are still breaking ground.
4 The Algorithm Breakdown
You just learned a lot about the history of the systems that have run YouTube since its inception, and you know that those systems have become quite good at what they do. But what does that mean literally? When you go to the website, what do the systems look like as you navigate? To really grasp these foundational concepts, let's clarify what is actually happening when a site visitor shows up.
As soon as visitors arrive at youtube.com, they are being followed. It's like when you were a kid and went to your friend's house to play and their pesky kid brother just wouldn't leave you alone, but think of it this way: instead of being pesky, the brother quietly observes your behavior and accommodates your every whim. You want a snack, so he runs to the kitchen and returns with an apple. You say, “no thanks.” So he takes the apple back and returns with a bag of Cheetos. You eat the Cheetos. Then you have a conversation about Han Solo, so he runs to the living room and plays The Empire Strikes Back for you. The next time you go to their house, as soon as you walk through the door he hands you a cookie and turns on Return of the Jedi. His prediction about what you might want to eat or watch is based on the last time you came over, and it's probably spot on. Oh, and also, you're probably going to want to go to their house more often with this kind of treatment. They know what you like. (Unless he recommends The Last Jedi or Solo, in which case you'll just go to the Zuckerbergs' next time because those movies stink.)
Let's say that in place of Cheetos, you wanted carrot sticks, and in place of Star Wars, you watched The Office reruns. The next time you showed up, li'l bro would offer broccoli and Parks and Recreation. The concept works no matter your preferences.
These examples help explain YouTube's goals:
Predict what the viewer will watch.
Maximize the viewer's long‐term engagement and satisfaction.
How they do it is broken into two parts: Gathering and Using Data, and Algorithms with an “S.”
Part 1: Gathering and Using Data
YouTube collects 80 billion data points from user