Bogdana, a fellow GSoC student, has started developing an activity feed for the dashboard. This will give users a better overview of recent activity relating to the review requests relevant to them.
Developing this feature in a scalable manner is important. After a short design discussion in today’s meeting, it became apparent that simply using the database for storage and retrieval of activity would be insufficient. As many social networks have discovered, an RDBMS quickly becomes too slow as the number of users and activities grows. Message queues, distributed processing, and caching can play a big role in keeping things snappy.
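To make the idea concrete, here is a minimal sketch of the "fan-out on write" pattern that several of the articles below describe. It is not Review Board code: it uses only the Python standard library, with `queue.Queue` standing in for a real message queue such as RabbitMQ and an in-memory dict standing in for a cache. The names `subscribers`, `feeds`, `publish`, `worker`, `run_demo` and `FEED_LENGTH` are all made up for illustration.

```python
import queue
import threading
from collections import defaultdict, deque

FEED_LENGTH = 50  # hypothetical cap on stored entries per user

# Hypothetical data: which users follow which review request.
subscribers = {
    "review-42": ["alice", "bob"],
    "review-43": ["bob"],
}

# Per-user feeds, standing in for a cache (an assumption, not a real backend).
feeds = defaultdict(lambda: deque(maxlen=FEED_LENGTH))

# Stands in for a real message queue such as RabbitMQ.
activity_queue = queue.Queue()


def publish(review_id, text):
    """Called from the web request: enqueue the activity and return at once."""
    activity_queue.put((review_id, text))


def worker():
    """Runs outside the request cycle and copies activities into user feeds."""
    while True:
        item = activity_queue.get()
        if item is None:  # sentinel value: shut the worker down
            break
        review_id, text = item
        for user in subscribers.get(review_id, []):
            feeds[user].appendleft(text)  # newest entry first


def run_demo():
    feeds.clear()
    thread = threading.Thread(target=worker)
    thread.start()
    publish("review-42", "alice updated the diff on review-42")
    publish("review-43", "carol commented on review-43")
    activity_queue.put(None)  # ask the worker to stop once the queue drains
    thread.join()
    return {user: list(feed) for user, feed in feeds.items()}


if __name__ == "__main__":
    print(run_demo())
```

The point of the design is that the request handler only pays for a single enqueue, while reading a feed becomes a lookup of a precomputed list rather than a join across activity and subscription tables. In a real deployment the worker would presumably be a Celery task and the feeds would live in something like Redis or memcached.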
Review Board has many uses for distributed processing and message queues. Developing the activity feed in such a way that the architecture will be reusable for other purposes is highly desirable. Distributing the processing of diffs is one change that could greatly increase Review Board’s scalability.
Recently I’ve been developing Review Bot, a distributed tool for automating static analysis with Review Board. Review Bot uses Celery, a distributed task queue, with RabbitMQ as its message queue backend. I wanted to share some articles I’ve discovered while developing Review Bot and researching queues and activity feeds. Hopefully they will be useful.
- A presentation on Etsy’s activity feed architecture: http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture
- A short explanation of message queueing for microblogs (think Twitter). To relate this to the activity stream, think of our activities as the messages, and our users as subscribers: http://www.russellbeattie.com/blog/let-the-microblogs-bloom
- Good explanation of the benefits of queuing and distributing processing for web sites. Gets you in the right mindset for this type of work: http://decafbad.com/blog/2008/07/04/queue-everything-and-delight-everyone
- A Stack Overflow question with applicable information: http://stackoverflow.com/questions/762490/how-do-social-networking-websites-compute-friends-updates
- Best practices for building News Feeds, some good answers: http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed?q=news+feeds
- How to build a news feed with Redis. This is for Rails, but I’m sure everyone can abstract: http://blog.waxman.me/how-to-build-a-fast-news-feed-in-redis
- RabbitMQ – AMQP message queue: http://www.rabbitmq.com/
- Good overview of RabbitMQ and AMQP: http://blogs.digitar.com/jjww/2009/01/rabbits-and-warrens/
- Celery – Distributed task processing: http://www.celeryproject.org/
- Celery dev docs, helpful when fixing issues with celeryd: http://ask.github.com/celery/internals/app-overview.html
- Django and Celery quick start: http://mathematism.com/2010/02/16/message-queues-django-and-celery-quick-start/
- Redis – Key-value store, can be used as Celery backend: http://redis.io/
- Beanstalk – simple, fast work queue, can be used as Celery backend: http://kr.github.com/beanstalkd/