| field | value | date |
|---|---|---|
| author | Franck Cuny <franck.cuny@gmail.com> | 2016-08-10 14:33:04 -0700 |
| committer | Franck Cuny <franck.cuny@gmail.com> | 2016-08-10 20:17:56 -0700 |
| commit | 8d7d02f42c3947f756c18cb4d37d9d97fbd0d27d (patch) | |
| tree | a6cecddaaea7e87d901a6c28bebe3a531438f24b /posts/2012-11-28-perl-redis-and-anyevent-at-craiglist.org | |
| parent | Merge branch 'convert-to-org' (diff) | |
| download | lumberjaph-8d7d02f42c3947f756c18cb4d37d9d97fbd0d27d.tar.gz | |
convert back to md
Diffstat (limited to 'posts/2012-11-28-perl-redis-and-anyevent-at-craiglist.org')
| mode | file | lines |
|---|---|---|
| -rw-r--r-- | posts/2012-11-28-perl-redis-and-anyevent-at-craiglist.org | 107 |

1 file changed, 0 insertions(+), 107 deletions(-)
diff --git a/posts/2012-11-28-perl-redis-and-anyevent-at-craiglist.org b/posts/2012-11-28-perl-redis-and-anyevent-at-craiglist.org
deleted file mode 100644
index a3691a5..0000000
--- a/posts/2012-11-28-perl-redis-and-anyevent-at-craiglist.org
+++ /dev/null
@@ -1,107 +0,0 @@

Last night I went to the [[http://www.meetup.com/San-Francisco-Perl-Mongers/][SF.pm]] meetup, hosted by Craigslist (thanks for the food!), where [[https://twitter.com/jzawodn][Jeremy Zawodny]] talked about [[http://redis.io][Redis]] and his module [[https://metacpan.org/module/AnyEvent::Redis::Federated][AnyEvent::Redis::Federated]]. We were about 30 mongers.

I was eating at the same table as Craigslist's CTO, and he went through some details of their infrastructure. I was surprised by the number of places where they use Perl, and by the amount of traffic they handle.

** Redis

Jeremy started his talk by explaining their current problem: they have hundreds of hosts in multiple data centers, and they continuously collect dozens of metrics. They looked at MySQL to store them, but it was too slow to keep up with the writes. Another important point for them is that mostly only the most recent data matters: they want to know what's going on /now/; they don't really care about the past.

So their goal is simple: they need something fast, /really/ fast, and simple. That's where Redis enters the game.

They want data replication, but Redis doesn't have that feature: there's only a master/slave replication mechanism (so, one way), and they need a multi-master solution where a node becoming master does not drop data. They address this issue with a "syncer", which I'll describe later.

Because Redis is single-threaded and their servers have multiple cores, they start 8 processes on each node to take advantage of them.

To me, the main benefit of Redis over Memcached is that you can use it as a data structure server. If you only need something to store key/value pairs, I'd prefer to stick with Memcached: the community around it is bigger, there are a lot of well-known patterns, and a lot of big companies contribute to it (lately, Twitter and Facebook).

The structure they use the most is the [[http://redis.io/commands#sorted_set][/sorted set/]]. The format to store a metric is:

- key: =$time_period:$host:$metric= (where =$time_period= is usually a day)
- score: =$timestamp=
- value: =$timestamp:$value=

In addition to storing those metrics on the nodes, they also keep a journal of what has changed. A journal entry looks like this:

- score: =$timestamp= of the last time something changed
- value: =$key= that changed

The journal is a single big structure, and it's used by their syncer (more about that in a moment). The benefit of using a ZSET is that they can delete old data easily by key (they don't have enough memory to store more than a couple of days, so they need to be able to drop a whole day quickly). The first sketch below shows this write path.

The journal is what drives replication. Each process has a syncer that tracks all its peers, pulls the data from those nodes, and merges it with the local data. Earlier Jeremy mentioned that they run 8 instances on each node, so the syncer from process 1 on node A will only check process 1 on node B. The second sketch below is my guess at what that loop does.

He also mentioned a memory optimization done by Redis (you can read more about that [[http://redis.io/topics/memory-optimization][here]]).
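Jeremy didn't show the storage code itself, so here is a minimal sketch of the write path as I understood it, using the plain [[https://metacpan.org/module/Redis][Redis]] client against a single local node. The host name, the metric name, and the =journal= key are made up for illustration:

#+begin_src perl
use strict;
use warnings;
use Redis;
use POSIX qw(strftime);

# One plain connection; in their setup this would go through the
# federated client instead.
my $redis = Redis->new(server => 'localhost:6379');

my ($host, $metric, $value) = ('web01', 'cpu.load', 0.42);   # sample data
my $now    = time;
my $period = strftime('%Y-%m-%d', localtime($now));          # one-day period

# key: $time_period:$host:$metric
my $key = join ':', $period, $host, $metric;

# score: $timestamp, value: "$timestamp:$value" -- embedding the
# timestamp keeps members unique inside the sorted set.
$redis->zadd($key, $now, "$now:$value");

# The journal: one big ZSET recording which key changed and when, so a
# syncer can later ask "what changed since my last pull?".
$redis->zadd('journal', $now, $key);

# Dropping a whole day of data is a single DEL, since keys are per-day.
my $yesterday = strftime('%Y-%m-%d', localtime($now - 86_400));
$redis->del(join ':', $yesterday, $host, $metric);
#+end_src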
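And here is my reconstruction of the syncer loop, again with the plain client. This is only a guess at the mechanism described in the talk, not their actual code; =sync_from_peer= is a name I made up:

#+begin_src perl
use strict;
use warnings;
use Redis;

# Pull everything that changed on $peer since $last_sync and merge it
# into $local; both arguments are plain Redis handles.
sub sync_from_peer {
    my ($local, $peer, $last_sync) = @_;

    # Journal scores are timestamps, so ZRANGEBYSCORE with an exclusive
    # lower bound returns every key that changed since the last pull.
    my @changed = $peer->zrangebyscore('journal', "($last_sync", '+inf');

    for my $key (@changed) {
        # Members embed their own timestamp ("$timestamp:$value"), so
        # re-adding an entry that already exists locally is harmless.
        my @entries = $peer->zrange($key, 0, -1, 'WITHSCORES');
        while (my ($member, $score) = splice @entries, 0, 2) {
            $local->zadd($key, $score, $member);
        }
    }
    return time;    # the caller stores this as the new $last_sync
}
#+end_src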
** AnyEvent::Redis::Federated

Now, it's time to see the Perl code. =AnyEvent::Redis::Federated= is a layer on top of =AnyEvent::Redis= that implements consistent hashing. I guess by now everybody has given up hope of ever seeing [[http://redis.io/topics/cluster-spec][redis cluster]] (and I'm more and more convinced that it should never be implemented, and that clients should implement their own solution for hashing / replication).

Some of the nice features of the module:

- call chaining
- [[https://metacpan.org/module/AnyEvent::Redis::Federated#SHARED-CONNECTIONS][you can get a singleton object for the connection]]
- you can also use it in blocking mode
- querying all nodes (sending the same command to every node, which can be useful for sanity checks on the data)
- the client writes to one node and lets the syncer do the rest

He then showed us some code (with a very gross example: =new AnyEvent::Redis::Federated=; I know at least [[http://search.cpan.org/perldoc?indirect][one person]] who would probably have said something :).
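To give an idea of the API, here is roughly what a client looks like, written with a direct method call rather than the indirect =new= form. The node map is invented and the constructor options may differ from the released module, so treat this as a sketch and check the [[https://metacpan.org/module/AnyEvent::Redis::Federated][documentation]]:

#+begin_src perl
use strict;
use warnings;
use AnyEvent::Redis::Federated;

# Two-node cluster; node names and addresses are made up.
my $config = {
    nodes => {
        node_1 => { address => 'redis1:6379' },
        node_2 => { address => 'redis2:6379' },
    },
};

# A direct method call -- no "new AnyEvent::Redis::Federated" here :)
my $redis = AnyEvent::Redis::Federated->new(config => $config);

# Calls take a callback and return the object, so they chain; poll()
# runs the event loop until the pending callbacks have fired.
$redis->set('foo', 'bar', sub { print "set foo\n" })
      ->get('foo', sub { my ($val) = @_; print "foo is $val\n" })
      ->poll;
#+end_src

Both commands use the same key, so consistent hashing sends them to the same node and they stay ordered on one connection.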
** Concerns

The idea seems fine, but, as one person noted during the Q&A, how will this scale when you have more than 2 or 4 nodes in your cluster? Since each process' syncer needs to talk to /all/ the other nodes, it will probably be very expensive for a process to gather information from all the nodes and write it locally. Also, adding more nodes doesn't buy you any capacity: since everything is replicated, each process still has to hold the full data set. Maybe a good solution is to keep many small clusters of 2 to 4 nodes, and let each of them deal with some specific metrics.

The module is not yet used in production, but they've tested it heavily, under a lot of conditions (though I would note that there are no unit tests :). They intend to use it soon, with some home-made dashboards to display the metrics.