Last night I went to the
[[http://www.meetup.com/San-Francisco-Perl-Mongers/][SF.pm]] meetup,
hosted by Craigslist (thanks for the food!), where
[[https://twitter.com/jzawodn][Jeremy Zawodny]] talked about
[[http://redis.io][Redis]] and his module
[[https://metacpan.org/module/AnyEvent::Redis::Federated][AnyEvent::Redis::Federated]].
We were about 30 mongers.

I was eating at the same table as Craigslist's CTO, and he went through
some details of their infrastructure. I was surprised by the number of
places where they use Perl, and the amount of traffic they deal with.

** Redis

Jeremy started his talk by explaining their current problem: they have
hundreds of hosts in multiple data centers, and they continuously
collect dozens of metrics. They looked at MySQL to store them, but it
was too slow to support the writes. Another important point is that
mostly only the most recent data matters: they want to know what's going
on /now/; they don't really care about the past.

So their goal is simple: they need something fast, /really/ fast, and
simple. That's where Redis enters the game.

They want data replication, but Redis doesn't have this feature: there's
only a master/slave replication mechanism (so, one-way), and they need a
multi-master solution, where a node becoming master does not drop data.
They address this issue with a "syncer", which I'll describe later.

Because Redis is single-threaded, and servers have multiple cores, they
start 8 processes on each node to take advantage of them.

To me, the main benefit of Redis over Memcached is that you can use it
as a data structure server.
If you only need something to store key/value pairs, I'd prefer to stick
with Memcached: the community around it is bigger, there are a lot of
well-known patterns, and a lot of big companies are contributing to it
(lately, Twitter and Facebook).

The structure they use the most is the
[[http://redis.io/commands#sorted_set][/sorted set/]]. The format to
store a metric is:

- key: =$time_period:$host:$metric= (where =$time_period= is usually a
  day)
- score: =$timestamp=
- value: =$timestamp:$value=

In addition to storing those metrics in the nodes, they also keep a
journal of what has changed. The journal looks like this:

- score: =$timestamp= of the last time something changed
- value: =$key= that changed

The journal is only one big structure, and it's used by their syncer
(more about that in a moment). The benefit of having ZSETs is that they
can delete old data easily by using the key (they don't have enough
memory to store more than a couple of days, so they need to be able to
delete by day quickly).

The journal is used for replication. Each process has a syncer that
tracks all its peers, pulls the data from those nodes and merges it with
the local data. Earlier Jeremy mentioned that they have 8 instances on
each node, so the syncer from process 1 on node A will only check
process 1 on node B.

He also mentioned a memory optimization done by Redis (you can read more
about that [[http://redis.io/topics/memory-optimization][here]]).

** AnyEvent::Redis::Federated

Now it's time to see the Perl code. =AnyEvent::Redis::Federated= is a
layer on top of =AnyEvent::Redis= that implements consistent hashing.
I guess by now everybody has given up hope of ever seeing
[[http://redis.io/topics/cluster-spec][redis cluster]] (and I'm more and
more convinced that it should never be implemented, and clients should
implement their own solution for hashing / replication).
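To make the per-day key scheme concrete, here is a minimal sketch in
Python (an in-memory stand-in for the real ZADD/DEL calls; the host
names, values, and function names are mine, not from the talk) showing
why one sorted set per day makes expiring old data a simple key
deletion:

```python
# In-memory stand-in for Redis: key -> {member: score} (a sorted set).
store = {}

def record_metric(day, host, metric, timestamp, value):
    """ZADD equivalent: member "$timestamp:$value", scored by $timestamp."""
    key = f"{day}:{host}:{metric}"  # $time_period:$host:$metric
    store.setdefault(key, {})[f"{timestamp}:{value}"] = timestamp
    # Journal: one big sorted set of changed keys, scored by change time.
    store.setdefault("journal", {})[key] = timestamp

def drop_day(day):
    """DEL equivalent: expiring a whole day is just deleting its keys."""
    for key in [k for k in store if k.startswith(day + ":")]:
        del store[key]

record_metric("2012-11-27", "web1", "load", 1354000000, 0.42)
record_metric("2012-11-28", "web1", "load", 1354090000, 0.37)
drop_day("2012-11-27")  # old day gone, no per-member scan needed
```

A syncer can then read the journal sorted by score to find every key
that changed since its last pull.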
Some of the nice features of the module:

- call chaining
- [[https://metacpan.org/module/AnyEvent::Redis::Federated#SHARED-CONNECTIONS][you
  can get a singleton object for the connection]]
- you can also use it in blocking mode
- query all nodes (where you send the same command to all the nodes,
  which can be useful for sanity checks on the data)
- the client writes to one node, and lets the syncer do the job

He then showed us some code (with a very gross example:
=new AnyEvent::Redis::Federated=; I know at least
[[http://search.cpan.org/perldoc?indirect][one person]] who would
probably have said something :).

** Concerns

The idea seems fine, but, as one person noted during the Q&A, how will
this scale when you have more than 2 or 4 nodes in your cluster? Since
each process' syncer needs to talk to /all/ the other nodes, it will
probably be very expensive for that process to gather information from
all the nodes and write it. Also, by adding more nodes, you're storing
less information in each process, since you replicate everything. Maybe
a good solution is to keep many small clusters of 2 to 4 nodes, and let
each of them deal with some specific metrics.

The module is not yet used in production, but they've tested it heavily,
in a lot of conditions (though I would note that there are no unit
tests :). They intend to use it soon with a home-made dashboard to
display the metrics.
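To put a rough number on that scaling concern: if every node replicates
to every other node, the number of sync streams grows quadratically with
cluster size. A back-of-the-envelope sketch (my own arithmetic, assuming
the 8-processes-per-node setup described above):

```python
def sync_streams(nodes, processes_per_node=8):
    # The syncer of process i on each node pulls from process i on
    # every other node: nodes * (nodes - 1) directed streams per
    # process slot, times the number of slots per node.
    return processes_per_node * nodes * (nodes - 1)

for n in (2, 4, 8, 16):
    print(f"{n:2d} nodes -> {sync_streams(n)} sync streams")
```

Going from 2 nodes to 16 multiplies the sync traffic by 120, which is
why keeping many small clusters looks more attractive than one big one.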
