Hulu is an online video service that offers a selection of hit
TV shows, clips, movies and more on the free, ad-supported
Hulu.com service, and the subscription service Hulu Plus. One
of the top video streaming sites in the U.S., the service today
has over four million subscribers and approximately 30 million
unique viewers per month.
CHALLENGE
Inability to Scale MySQL and Memcached
In 2012, Hulu’s subscriber base passed the two-million mark,
and the back-end systems that tracked viewer history started
to break down. When a video is played, the system records
information from the player to keep track of both the video and
the viewing position or timeframe. When the video application
is closed, the stored information allows the user to resume
the video where they left off. The system also provides
recommendations for what videos to watch next based on
user history.
Originally designed as a Python application, Hulu’s viewing history
tracking system relied on Memcached for reads on top of a
sharded MySQL database for writes. When the Hulu engineering
team started to see that MySQL couldn’t handle the volume
of writes, the only way to scale was to add more shards. Reads
were done in Memcached to preserve I/O on the database, but
Memcached could not be replicated. As a result, each user’s history
was served from a single shard in a single datacenter.
After repeated failures at peak times, and with the root causes
understood, the core Hulu engineering team began designing
a solution with four overarching requirements:
1. Faster reads and writes
2. The capacity to scale to 10,000 queries/second with low latency
3. Replication of data across datacenters
4. High availability of cached data with no single point of failure
AT-A-GLANCE
Challenges
• MySQL overwhelmed by the volume of writes
• Memcached could not be replicated across datacenters to distribute load
• High query latency and degrading performance
• No high-availability strategy
Solution
• Redis
Key Benefits
• Accelerated writing and retrieval of information, with an 800% performance improvement for queries
• Replication across datacenters
• Capacity to handle at least 10,000 queries per second with low latency
• Open management APIs allow for high availability
• Ability to use data structures for flexible and efficient queries
CASE STUDY
Hulu
LEADING VIDEO COMPANY SCALES TO SERVE 4 BILLION VIDEOS
WITH 800% QUERY PERFORMANCE IMPROVEMENT
OVERVIEW
“We chose Redis because it was simple to
set up, had great documentation, offered
replication, and allowed us to use data structures.
Data structures are extremely powerful and
allow us to architect solutions to many use
cases very efficiently.”
—	Andres Rangel, Senior Software Engineer, Hulu
pivotal.io
Pivotal is a registered trademark or trademark of Pivotal Software, Inc. in the United States and other countries. All other trademarks used herein are the property of their respective owners. © Copyright 2014 Pivotal Software, Inc. All rights reserved. Published in the USA. PVTL-CS-343-12/13
At Pivotal our mission is to enable customers to build a new class of applications, leveraging big and fast data, and do all of this with the power of cloud independence.
Uniting selected technology, people and programs from EMC and VMware, the following products and services are now part of Pivotal: Greenplum, Cloud Foundry, Spring,
GemFire and other products from the VMware vFabric Suite, Cetas and Pivotal Labs.
Pivotal 3495 Deer Creek Road Palo Alto, CA 94304 pivotal.io
SOLUTION
The Path to 3 Million Subscribers
After looking at a variety of NoSQL alternatives like MongoDB,
Riak, and LevelDB, Hulu selected Redis. Describing the process,
Andres Rangel, Senior Software Engineer, stated, “We chose
Redis for several key reasons – it was simple to set up, had great
documentation, offered replication, and allowed us to use data
structures. Data structures are extremely powerful and allow
us to architect solutions to many use cases very efficiently. For
example, depending on the operation, we have the need to query
either a specific video a user watched, or all of them. With Redis,
this was easy using hashes.”
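A minimal sketch of the pattern in the quote, assuming one hash per user whose fields are video IDs and whose values are resume positions in seconds. The key format `history:{user_id}`, the field names, and the unit are hypothetical, not Hulu's actual schema. A small in-memory class stands in for Redis so the example runs without a server; the comments name the matching redis-py calls.

```python
from typing import Dict, Optional


class HashStore:
    """In-memory stand-in for the Redis hash commands used below.

    With redis-py, values come back as bytes unless the client is
    created with decode_responses=True.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}

    def hset(self, key: str, field: str, value: str) -> None:
        # redis-py equivalent: r.hset(key, field, value)
        self._data.setdefault(key, {})[field] = value

    def hget(self, key: str, field: str) -> Optional[str]:
        # redis-py equivalent: r.hget(key, field)
        return self._data.get(key, {}).get(field)

    def hgetall(self, key: str) -> Dict[str, str]:
        # redis-py equivalent: r.hgetall(key)
        return dict(self._data.get(key, {}))


def history_key(user_id: int) -> str:
    # Hypothetical key naming: one hash per user.
    return f"history:{user_id}"


store = HashStore()
# Field = video ID, value = resume position in seconds (hypothetical layout).
store.hset(history_key(42), "video:1001", "754")
store.hset(history_key(42), "video:2002", "31")

print(store.hget(history_key(42), "video:1001"))  # 754
print(store.hgetall(history_key(42)))  # {'video:1001': '754', 'video:2002': '31'}
```

With this layout, `HGET` answers the single-video case and `HGETALL` the all-videos case from the same key, so neither query needs a secondary index.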
To meet all four requirements, a few minor areas needed
additional development. First, the Hulu team looked
at how the data was sharded. They were able to easily shard on
user_id. “We scale Redis by sharding the data, and the intelligence
about shards is in the application logic,” noted Rangel. Second,
Redis did not yet include Sentinel, its monitoring and automatic
failover system. Because Redis has an open API, the
Hulu team was able to build their own Sentinel-like mechanism to
support high availability.
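Hulu's own failover code is not described, so the following is only a minimal sketch of what a Sentinel-like monitor for one shard might do, assuming that a fixed number of consecutive failed health checks triggers promotion of the first healthy replica. `Node`, `ShardMonitor`, the node names, and the threshold of three are hypothetical; `ping()` simulates a Redis `PING` so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    healthy: bool = True

    def ping(self) -> bool:
        # Simulated health check; a real monitor would send PING and expect PONG.
        return self.healthy


@dataclass
class ShardMonitor:
    master: Node
    replicas: List[Node]
    max_failures: int = 3  # hypothetical threshold of consecutive misses
    failures: int = 0
    log: List[str] = field(default_factory=list)

    def check(self) -> None:
        """Run one health check and fail over after repeated misses."""
        if self.master.ping():
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self._fail_over()

    def _fail_over(self) -> None:
        candidates = [r for r in self.replicas if r.ping()]
        if not candidates:
            self.log.append("no healthy replica; shard unavailable")
            return
        new_master = candidates[0]
        # On a real server: REPLICAOF NO ONE (SLAVEOF NO ONE before Redis 5.0).
        self.replicas.remove(new_master)
        old_master = self.master
        self.master = new_master
        self.failures = 0
        self.log.append(f"promoted {new_master.name}; {old_master.name} is down")
        # Remaining replicas would be repointed with REPLICAOF <host> <port>,
        # and the application's shard map updated to send writes here.


m = ShardMonitor(Node("shard7-west"), [Node("shard7-west-r1"), Node("shard7-east-r1")])
m.master.healthy = False
for _ in range(3):
    m.check()
print(m.master.name)  # shard7-west-r1
```

A single observer like this can misfire during a network partition; Redis Sentinel addresses that by requiring a quorum of Sentinel processes to agree that a master is down before failing over.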
BUSINESS BENEFITS
Redis provided the following benefits to Hulu:
Open Ended Scaling for Reads and Writes
“Since Redis supports replication, it became possible to
reorganize the data map so writes and reads could be easily
separated, load-balanced and scaled across datacenters,” said
Rangel. Reads are routinely balanced across Redis shards. Each
shard is replicated to a set of slaves in each datacenter. Each user
lives on exactly one shard, and newly added users are distributed
evenly across the shards. The architecture is
highly repeatable and provides Hulu with a linear scalability path.
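The application-side sharding and read/write split described above can be sketched as follows. This assumes users are hashed onto 64 shards (the shard count Rangel cites below) with CRC32, that each shard has one master taking writes, and that each datacenter has a replica serving local reads. The datacenter names, host names, and the choice of CRC32 are hypothetical; the case study says only that data is sharded on user_id and that shard logic lives in the application.

```python
import zlib
from typing import Dict, NamedTuple

DATACENTERS = ("west", "east")  # hypothetical names for the two sites
NUM_SHARDS = 64


class Shard(NamedTuple):
    master: str               # endpoint that accepts writes
    replicas: Dict[str, str]  # datacenter -> local read-only replica


def build_topology(num_shards: int = NUM_SHARDS) -> Dict[int, Shard]:
    # Hypothetical host names: one master per shard plus one replica per datacenter.
    return {
        i: Shard(
            master=f"redis-{i:02d}-master.west:6379",
            replicas={dc: f"redis-{i:02d}-replica.{dc}:6379" for dc in DATACENTERS},
        )
        for i in range(num_shards)
    }


def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    # CRC32 is stable across processes, unlike Python's randomized str hash().
    return zlib.crc32(str(user_id).encode("utf-8")) % num_shards


def route(topology: Dict[int, Shard], user_id: int, op: str, local_dc: str) -> str:
    """Send writes to the shard's master and reads to the local replica."""
    shard = topology[shard_for(user_id, len(topology))]
    if op == "write":
        return shard.master
    if op == "read":
        return shard.replicas[local_dc]
    raise ValueError(f"unknown operation: {op!r}")


topology = build_topology()
print(route(topology, 42, "write", local_dc="east"))  # master, wherever it lives
print(route(topology, 42, "read", local_dc="east"))   # replica in the caller's datacenter
```

Because the mapping depends only on the user ID, every application server computes the same shard without a lookup service. A plain modulus does mean that changing the shard count remaps most users, which is one reason to pre-shard into a generous number of instances up front.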
800% Performance Improvement for Queries
With queries running on dedicated, load-balanced slaves in
regional datacenters, rather than all being served from the West
Coast, the new architecture was expected to improve speed and
performance. According to Rangel, “For performance
considerations, we decided to pre-shard the system into 64
instances. We replicate the master shard to a slave in the same
datacenter and to a slave in the second datacenter. This way,
applications in the other datacenter read locally from the Redis
slave and achieve greater performance. The result was that 75% of
the latency in reads from the east coast was reduced from 120 ms
to less than 15 ms, and 90% went from 300 ms to around 25 ms.”
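As a quick check on the figures in the quote, the ratios below treat "less than 15 ms" as 15 ms and "around 25 ms" as 25 ms, so they are approximate. The first figure works out to an 8x reduction, which is consistent with the 800% headline, although the case study does not say how that headline was calculated.

```python
# Before/after read latencies from the quote, in milliseconds.
# "Less than 15" and "around 25" are treated as exactly 15 and 25.
quoted_ms = {
    "75% figure": (120, 15),
    "90% figure": (300, 25),
}

factors = {label: before / after for label, (before, after) in quoted_ms.items()}

for label, factor in factors.items():
    before, after = quoted_ms[label]
    print(f"{label}: {before} ms -> {after} ms, roughly {factor:.0f}x lower latency")
# 75% figure: 120 ms -> 15 ms, roughly 8x lower latency
# 90% figure: 300 ms -> 25 ms, roughly 12x lower latency
```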
Greater Performance with Data Durability
To build data durability into their system, Hulu decided to use
Apache Cassandra as the persistent data store where all writes
are made. As data is ingested, it is written from Cassandra to
Redis. As Rangel describes the solution, “The first time a request
comes for a user, the system will create a job to load all the
videos for this user into Redis. Once this is done, the system will
update a flag. The next time a request comes in for this user, the
flag is set. Then, the system returns whatever it has from Redis
without hitting Cassandra. This way, access to the database is
greatly reduced, and we aren’t required to have every record in
Redis. Because Redis queries are faster than Cassandra by a huge
margin, we achieve low-latency reads for active users by
having their data in Redis. This means we can leave Cassandra for
batch reports where the latency is not important.”
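The read path Rangel describes is a lazy, per-user cache warm-up. Below is a minimal sketch that uses plain dictionaries in place of Cassandra and Redis, a set in place of the per-user flag, and a queue drained by a worker in place of the load job. It also assumes two details the case study leaves open: the first request for a user is answered from Cassandra while the job is pending, and writes for already-loaded users are applied to Redis as well as Cassandra. `HistoryService` and its methods are hypothetical names.

```python
from collections import deque
from typing import Deque, Dict, Set


class HistoryService:
    """Cassandra is the system of record; Redis holds only loaded users."""

    def __init__(self, cassandra: Dict[int, Dict[str, int]]) -> None:
        self.cassandra = cassandra                  # user_id -> {video_id: position}
        self.redis: Dict[int, Dict[str, int]] = {}  # loaded users only
        self.loaded: Set[int] = set()               # the per-user "loaded" flag
        self.jobs: Deque[int] = deque()             # pending load jobs
        self.cassandra_reads = 0

    def get_history(self, user_id: int) -> Dict[str, int]:
        if user_id in self.loaded:
            # Flag is set: answer from Redis without touching Cassandra.
            return dict(self.redis.get(user_id, {}))
        # Flag not set: queue one load job and, as an assumption,
        # serve this request from Cassandra while the job is pending.
        if user_id not in self.jobs:
            self.jobs.append(user_id)
        self.cassandra_reads += 1
        return dict(self.cassandra.get(user_id, {}))

    def run_jobs(self) -> None:
        """Worker: copy each queued user's full history into Redis, then set the flag."""
        while self.jobs:
            user_id = self.jobs.popleft()
            self.redis[user_id] = dict(self.cassandra.get(user_id, {}))
            self.loaded.add(user_id)

    def record_position(self, user_id: int, video_id: str, position: int) -> None:
        # All writes go to Cassandra; loaded users are also updated in Redis
        # so their cached history stays current (assumed write path).
        self.cassandra.setdefault(user_id, {})[video_id] = position
        if user_id in self.loaded:
            self.redis.setdefault(user_id, {})[video_id] = position


svc = HistoryService({42: {"video:1001": 754}})
svc.get_history(42)  # flag not set: served from Cassandra, load job queued
svc.run_jobs()       # worker copies the history into Redis and sets the flag
svc.get_history(42)  # flag set: served from Redis
print(svc.cassandra_reads)  # 1
```

Only users who have made a request occupy Redis memory, which is the property the quote highlights: Redis does not need to hold every record.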
CONCLUSION
As Hulu pursues a superior experience for users, content owners,
and advertisers in the future, they are confident in the long-term
scalability of their back-end systems. The features in Redis will
continue to provide a high-performance data tier as Hulu’s user
base grows.
LEARN MORE
To learn more about our products, services and solutions, visit us
at pivotal.io.