Secrets of Flickr Scaling Architecture

I came across this interesting post on highscalability about Flickr's architecture and how the team scaled its hardware and software as the business grew.

If you are new to scaling, this slide might help.


The Platform
Memcached for a caching layer
Squid as a reverse proxy for HTML and images
Linux (Red Hat)
Smarty for templating
PEAR for XML and email parsing
ImageMagick for image processing
Java for the node service
SystemImager for deployment
Ganglia for distributed system monitoring
Subcon for storing essential system configuration files in a Subversion repository, for easy deployment to machines in a cluster
CVSup for distributing and updating collections of files across a network
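Flickr's application tier was PHP, and the post does not show how it used memcached. As an illustration only, here is a minimal cache-aside sketch in Python. `TTLCache` is a hypothetical in-process stand-in for a memcached client, and `get_photo_meta`, the `photo:<id>` key format and the 300-second TTL are assumptions, not Flickr's code.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-process stand-in for a memcached-style key/value store."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._store: Dict[str, Tuple[Any, float]] = {}
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # An expired entry is treated as a miss, as memcached does.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._store[key] = (value, self._clock() + ttl)

    def delete(self, key: str) -> None:
        self._store.pop(key, None)


def get_photo_meta(
    photo_id: int,
    cache: TTLCache,
    load_from_db: Callable[[int], dict],
    ttl: float = 300.0,
) -> dict:
    """Cache-aside read: serve from the cache, fall back to the database on a miss."""
    key = f"photo:{photo_id}"  # hypothetical key format
    cached = cache.get(key)
    if cached is not None:
        return cached
    meta = load_from_db(photo_id)
    cache.set(key, meta, ttl)  # populate so the next read is a hit
    return meta


if __name__ == "__main__":
    db_reads = []

    def fake_db(photo_id: int) -> dict:
        db_reads.append(photo_id)
        return {"id": photo_id, "title": "example"}

    cache = TTLCache()
    get_photo_meta(42, cache, fake_db)
    get_photo_meta(42, cache, fake_db)
    print(len(db_reads))  # 1: the second read was served from the cache
```

On a write, the application would update the database and then call `cache.delete(key)` so the next read repopulates the entry; that invalidation step is likewise an assumption about how such a layer is typically used.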

The Stats
More than 4 billion queries per day
~35M photos in Squid cache (total)
~2M photos in Squid's RAM
~470M photos, 4 or 5 sizes of each
38k req/sec to memcached (12M objects)
2 PB raw storage (about 1.5 TB consumed on Sunday)
Over 400,000 photos added every day
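The daily figures are easier to reason about as per-second rates. Here is a quick back-of-envelope calculation in Python using only the numbers listed above. Since several of them are approximations or lower bounds ("more than", "over", "~"), the results are rough averages and say nothing about peak load.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


# Figures taken from the stats list above.
queries_per_sec = per_second(4_000_000_000)  # more than 4 billion queries per day
uploads_per_sec = per_second(400_000)        # over 400,000 photos added per day
stored_files_low = 470_000_000 * 4           # ~470M photos, 4 sizes each
stored_files_high = 470_000_000 * 5          # ~470M photos, 5 sizes each
squid_ram_share = 2_000_000 / 35_000_000     # photos in Squid RAM vs. all Squid-cached photos

print(f"average queries/sec: {queries_per_sec:,.0f}")                # 46,296
print(f"average uploads/sec: {uploads_per_sec:.1f}")                 # 4.6
print(f"stored image files: {stored_files_low:,} to {stored_files_high:,}")
print(f"share of Squid-cached photos in RAM: {squid_ram_share:.1%}")  # 5.7%
```

The stored-file line prints 1,880,000,000 to 2,350,000,000, which is why the image tier needs petabyte-scale raw storage even though only a small fraction of photos sits in Squid's RAM.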

The Architecture
A pretty picture of Flickr's architecture can be found on this slide. A simple depiction is:
- Pair of ServerIrons
  - Squid Caches
    - NetApps
  - PHP App Servers
    - Storage Manager
    - Master-Master Shards
    - Dual Tree Central Database
    - Memcached Cluster
    - Big Search Engine
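The post only names the pieces, so the following is an assumption about how a PHP app server might route a user's queries, written in Python for illustration. A central lookup (standing in for the central database) maps each user to a shard, and each shard is a pair of masters: a user's queries go to one master and fall back to the other if it is marked down. `ShardPair`, `ShardDirectory`, the even/odd pinning rule and the host names are all hypothetical, and real master-master setups must also handle replication conflicts, which this sketch ignores.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ShardPair:
    """Hypothetical model of one shard served by two masters replicating to each other."""

    name: str
    master_a: str
    master_b: str
    down: Set[str] = field(default_factory=set)

    def pick_master(self, user_id: int) -> str:
        # Pin each user to one master in normal operation (even ids -> a, odd ids -> b)
        # so both masters carry load, and fail over to the other if it is marked down.
        if user_id % 2 == 0:
            preferred, fallback = self.master_a, self.master_b
        else:
            preferred, fallback = self.master_b, self.master_a
        if preferred not in self.down:
            return preferred
        if fallback not in self.down:
            return fallback
        raise RuntimeError(f"both masters of shard {self.name} are down")


class ShardDirectory:
    """Stand-in for a central user -> shard lookup table (an assumption, not Flickr's schema)."""

    def __init__(self, shards: Dict[str, ShardPair]) -> None:
        self._shards = shards
        self._user_to_shard: Dict[int, str] = {}

    def assign(self, user_id: int, shard_name: str) -> None:
        if shard_name not in self._shards:
            raise KeyError(f"unknown shard: {shard_name}")
        self._user_to_shard[user_id] = shard_name

    def host_for(self, user_id: int) -> str:
        """Return the database host that should serve this user's queries."""
        shard = self._shards[self._user_to_shard[user_id]]
        return shard.pick_master(user_id)


if __name__ == "__main__":
    shards = {
        "shard1": ShardPair("shard1", "db1a", "db1b"),
        "shard2": ShardPair("shard2", "db2a", "db2b"),
    }
    directory = ShardDirectory(shards)
    directory.assign(1001, "shard2")
    print(directory.host_for(1001))  # db2b (odd id prefers master_b)
    shards["shard2"].down.add("db2b")
    print(directory.host_for(1001))  # db2a (failover to the other master)
```

Pinning a user to one master, rather than alternating per query, keeps that user's writes on a single host in normal operation, which limits the conflicts that master-master replication is prone to.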

Impressive, very impressive. I think it's every developer's dream and goal to work on a team that builds a successful application and scales it the way Flickr, Craigslist and Facebook have. To be honest, though, not every application grows to the point where it needs to scale.
