Facebook processes more than 500 TB of data daily
Facebook manages one heckuva lot of data, a company official told reporters today.
Jay Parikh, Facebook's vice president of infrastructure engineering, went down a list of data stats to break down the massive amount of data the social network processes each day.
Most of the site's data, he said, is stored in a single "cluster" that takes up more than 100 petabytes of disk space. Parikh claimed that Facebook's cluster is larger than any comparable cluster at other companies.
In addition to scanning 105 terabytes of data every 30 minutes -- a process Facebook's products team often uses to gauge how products are doing -- the company manages millions of photos and logs billions of likes to make sure its site is tailored to its users.
Here's a breakdown of how much data flows through the Facebook machine each day:
- 2.7 billion likes made daily on and off of the Facebook site
- 300 million photos uploaded
- 70,000 queries executed by people and automated systems
- 500+ terabytes of new data "ingested"
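To put the figures above in per-second terms, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 terabyte = 10^12 bytes) and a perfectly even load across the day, neither of which Parikh stated; the variable and function names are illustrative only.

```python
# Back-of-the-envelope rates from the daily figures quoted in the article.
# Assumptions (not from the source): decimal terabytes, uniform load all day.
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12

daily = {
    "likes": 2.7e9,                        # likes per day
    "photos": 300e6,                       # photo uploads per day
    "queries": 70_000,                     # queries per day
    "bytes_ingested": 500 * BYTES_PER_TB,  # "500+" TB, taken as a lower bound
}


def per_second(daily_total):
    """Average rate per second for a quantity measured per day."""
    return daily_total / SECONDS_PER_DAY


rates = {name: per_second(total) for name, total in daily.items()}

# The 105 TB scan is quoted per 30 minutes, not per day.
scan_bytes_per_second = 105 * BYTES_PER_TB / (30 * 60)

if __name__ == "__main__":
    for name, rate in rates.items():
        print(f"{name}: {rate:,.2f} per second")
    print(f"scan: {scan_bytes_per_second / 1e9:.1f} GB per second")
```

Under those assumptions, 2.7 billion likes a day averages out to 31,250 likes per second, the ingest rate is at least about 5.8 gigabytes per second, and the 30-minute scan works out to roughly 58.3 gigabytes per second.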
Since Facebook uses this data to build its user experience, it wants teams from across the company -- whether they sell ads or build functions -- to be able to access any of the data as needed. Parikh said this keeps the creation and improvement of Facebook features as fast as possible.
A function like friend recommendations, for example, needs constant data updates, so that when you add a new friend, you see those connections immediately, Parikh said.
These nearly real-time efforts apply to most functions throughout the site because people won't use the site if the personalized experience is poor, or slow, he said.
"We can't afford for your photo to be uploaded and stored next week," Parikh said.
Instead of partitioning the data -- essentially dividing it up and storing the pieces separately based on some criterion -- as most companies do to make data more manageable, Facebook keeps it in one place for easy access.
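For readers unfamiliar with the term, the toy sketch below shows what partitioning means in general. It is not Facebook's system: the records, field names, and functions are all hypothetical, and the point is only that a question cutting across partitions has to visit each one, while a single store can be scanned directly.

```python
from collections import defaultdict

# Hypothetical records; the fields and values are invented for illustration.
records = [
    {"user": "alice", "region": "us", "likes": 3},
    {"user": "bob", "region": "eu", "likes": 5},
    {"user": "carol", "region": "us", "likes": 2},
    {"user": "dave", "region": "asia", "likes": 7},
]


def partition_by(rows, key):
    """Split rows into separate lists according to the value of one field."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return dict(parts)


def total_likes(rows):
    """Sum the 'likes' field over a list of rows."""
    return sum(row["likes"] for row in rows)


# Partitioned layout: one list per region.
by_region = partition_by(records, "region")

# A site-wide question must combine results from every partition...
total_from_partitions = sum(total_likes(rows) for rows in by_region.values())

# ...whereas a single store answers it with one pass.
total_from_single_store = total_likes(records)
```

Both totals agree; the difference is in how many separate places the query has to reach.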
That means an engineer who wants to identify stats or trends in a function, like how quickly people respond to messages, can easily get the data, write some code, and get results.
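As a rough idea of what such a one-off analysis might look like, here is a minimal sketch that computes a median reply time. The log format, the sample values, and the function names are all invented for illustration; the article does not describe Facebook's actual tools or data layout.

```python
from statistics import median

# Hypothetical message log: (sent_at, replied_at) timestamps in seconds.
# These values are made up purely to exercise the code.
message_log = [
    (0, 40),
    (100, 130),
    (200, 500),
    (300, 320),
    (400, 1000),
]


def response_times(log):
    """Seconds between each message being sent and its reply."""
    return [replied_at - sent_at for sent_at, replied_at in log]


def median_response_seconds(log):
    """Median reply delay, which is less skewed by a few very slow replies."""
    return median(response_times(log))


if __name__ == "__main__":
    print(median_response_seconds(message_log))
```

The median is used here rather than the mean because a handful of very late replies would otherwise dominate the result; that is a design choice of this sketch, not something Parikh described.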
When pressed by reporters, Parikh said Facebook has a zero-tolerance policy when it comes to any abuse from this broad access. Additionally, all access is logged and monitored heavily, he said.
If you want to see Parikh's short presentation and a flow chart of its data system, see below.
Updated at 3:03 p.m. PT: with more info and a slideshow.