Last night I went to the
[SF.pm](http://www.meetup.com/San-Francisco-Perl-Mongers/) meetup,
hosted by Craigslist (thanks for the food!), where
[Jeremy Zawodny](https://twitter.com/jzawodn) talked about
[Redis](http://redis.io) and his module
[AnyEvent::Redis::Federated](https://metacpan.org/module/AnyEvent::Redis::Federated).
There were about 30 of us mongers.

I was sitting at the same table as Craigslist's CTO, and he went through
some details of their infrastructure. I was surprised by the number of
places where they use Perl, and by the amount of traffic they handle.

## Redis

Jeremy started his talk by explaining their current problem: they have
hundreds of hosts in multiple data centers, and they continuously
collect dozens of metrics from them. They looked at MySQL to store the
metrics, but it was too slow to keep up with the writes. Another
important point for them is that only the most recent data really
matters: they want to know what's going on *now*; they don't care much
about the past.

So their goal is simple: they need something fast, *really* fast, and
simple. That's where Redis enters the game.

They want data replication, but Redis doesn't have the feature they
need: it only offers a master/slave replication mechanism (so, one
way), while they need a multi-master solution, where a node becoming
master does not drop data. They address this issue with a "syncer",
which I'll describe later.

Because Redis is single-threaded, and their servers have multiple
cores, they start 8 processes on each node to take advantage of them.

To me, the main benefit of Redis over Memcached is that you can use it
as a data structure server. If you only need a key-value store, I'd
rather stick with Memcached: the community around it is bigger, there
are a lot of well-known patterns, and a lot of big companies are
contributing to it (lately, Twitter and Facebook).

The structure they use the most is the
[*sorted set*](http://redis.io/commands#sorted_set). The format to
store a metric is (a concrete sketch follows the list):

 * key: `$time_period:$host:$metric` (where `$time_period` is
   usually a day)
 * score: `$timestamp`
 * value: `$timestamp:$value`

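To make that concrete, here is a minimal sketch of a metric write using
the plain [Redis](https://metacpan.org/module/Redis) client; the host,
metric name, and values are made up for the example:

```perl
use strict;
use warnings;
use Redis;

my $redis = Redis->new(server => 'localhost:6379');

# Hypothetical metric: load average on host web42, bucketed by day.
my ($time_period, $host, $metric) = ('2012-11-28', 'web42', 'load_avg');
my $timestamp = time;
my $value     = 0.73;

# key: $time_period:$host:$metric, score: $timestamp,
# member: "$timestamp:$value" (the timestamp prefix keeps members unique)
$redis->zadd("$time_period:$host:$metric", $timestamp, "$timestamp:$value");
```
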
In addition to storing those metrics in the nodes, they also keep a
journal of what has changed. Journal entries look like this (again, a
short sketch follows):

 * score: `$timestamp` of the last time something changed
 * value: `$key` that changed

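Sticking with the sketch above, appending to the journal alongside each
metric write might look like this (the journal key name is my guess):

```perl
# Record which metric key changed and when, in a single global
# sorted set that the syncer can later scan by timestamp.
$redis->zadd('journal', $timestamp, "$time_period:$host:$metric");
```
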
The journal is a single big structure, and it's used by their syncer
(more about that in a moment). The benefit of using ZSETs with day-based
keys is that they can delete old data easily by key (they don't have
enough memory to store more than a couple of days, so they need to be
able to delete a whole day quickly).

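Under those assumptions, expiring a day of data is one `DEL` on the
day-based key, plus a trim of the journal by score (the key names here
are hypothetical):

```perl
# Drop an entire day of metrics by deleting its key...
$redis->del('2012-11-26:web42:load_avg');

# ...and trim journal entries older than a cutoff timestamp.
my $cutoff = time - 2 * 24 * 3600;    # keep roughly two days
$redis->zremrangebyscore('journal', '-inf', "($cutoff");
```
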
The journal is used for replication. Each process has a syncer that
tracks all its peers, pulls the data from those nodes, and merges it
with the local data. Earlier Jeremy mentioned that they have 8
instances on each node, so the syncer of process 1 on node A will only
check process 1 on node B.

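Here is how I picture one pass of such a syncer; this is my own rough
sketch, not their actual code:

```perl
use strict;
use warnings;
use Redis;

# Process 1 on node A pulling from its peer, process 1 on node B.
my $peer  = Redis->new(server => 'nodeb:6379');
my $local = Redis->new(server => 'localhost:6379');

my $last_sync = 0;    # in practice this would be persisted between passes

# Ask the peer's journal which keys changed since the last pass
# (the "(" prefix makes the lower bound exclusive).
my @changed = $peer->zrangebyscore('journal', "($last_sync", '+inf');
$last_sync = time;

# Pull each changed sorted set and merge it locally; replaying an
# identical (score, member) pair through ZADD is harmless.
for my $key (@changed) {
    my @entries = $peer->zrange($key, 0, -1, 'WITHSCORES');
    while (my ($member, $score) = splice @entries, 0, 2) {
        $local->zadd($key, $score, $member);
    }
}
```
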
He also mentioned a memory optimization done by Redis (you can read
more about that [here](http://redis.io/topics/memory-optimization)).

## AnyEvent::Redis::Federated

Now, it's time to see the Perl code. `AnyEvent::Redis::Federated` is a
layer on top of `AnyEvent::Redis` that implements consistent hashing. I
guess by now everybody has given up hope of someday seeing
[redis cluster](http://redis.io/topics/cluster-spec) (and I'm more and
more convinced that it should never be implemented, and that clients
should implement their own solution for hashing / replication).

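Setting up the client looks roughly like this; the node layout below is
made up, so check the module's documentation for the exact config
format:

```perl
use strict;
use warnings;
use AnyEvent::Redis::Federated;

# Two hypothetical nodes; keys are spread over them by consistent hashing.
my $config = {
    nodes => {
        redis_1 => { address => 'nodea:6379' },
        redis_2 => { address => 'nodeb:6379' },
    },
};

my $redis = AnyEvent::Redis::Federated->new(config => $config);

# Commands are pipelined; poll() runs the event loop until the
# outstanding requests have completed.
$redis->set("metric:$_", $_) for 1 .. 10;
$redis->poll;
```
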
Some of the nice features of the module:

 * call chaining (see the sketch after this list)
 * [you can get a singleton object for the connection](https://metacpan.org/module/AnyEvent::Redis::Federated#SHARED-CONNECTIONS)
 * you can also use it in blocking mode
 * querying all nodes (where you send the same command to all the
   nodes, which can be useful to do sanity checks on the data)
 * the client only writes to one node, and lets the syncer do the rest

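A quick sketch of the chaining style, reusing the `$redis` object from
above (I'm guessing the callback signature follows the
`AnyEvent::Redis` convention of passing the result):

```perl
# Each command returns the object itself, so calls can be chained,
# and poll() flushes them all at once.
$redis->set('answer', 42)
      ->get('answer', sub { my ($val) = @_; print "answer = $val\n" })
      ->poll;
```
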
He then showed us some code (with one very gross example: `new
AnyEvent::Redis::Federated`, i.e. the indirect object notation; I know
at least [one person](http://search.cpan.org/perldoc?indirect) who
would probably have said something :).

## Concerns

The idea seems fine, but, as one person noted during the Q&A, how will
this scale when you have more than 2 or 4 nodes in your cluster? Since
each process's syncer needs to talk to *all* the other nodes, it will
probably become very expensive for that process to gather data from
every node and write it locally. Also, since everything is replicated
everywhere, adding more nodes means each process can store less of its
own information. Maybe a good solution is to keep many small clusters
of 2 to 4 nodes, and let each of them deal with some specific metrics.

The module is not yet used in production, but they've tested it
heavily, in a lot of conditions (though I would note that there are no
unit tests :). They intend to use it soon with some home-made
dashboards to display the metrics.