diff options
Diffstat (limited to 'posts/2010-03-25-github-explorer.org')
| -rw-r--r-- | posts/2010-03-25-github-explorer.org | 236 |
1 files changed, 236 insertions, 0 deletions
diff --git a/posts/2010-03-25-github-explorer.org b/posts/2010-03-25-github-explorer.org new file mode 100644 index 0000000..efa4816 --- /dev/null +++ b/posts/2010-03-25-github-explorer.org @@ -0,0 +1,236 @@ +#+BEGIN_QUOTE + /More informations about the poster are available on + [[http://lumberjaph.net/graph/2010/04/02/github-poster.html][this + post]]/ +#+END_QUOTE + +Last year, with help from my coworkers at +[[http://linkfluence.net/][Linkfluence]], I created two sets of maps of +the [[http://perl.org][Perl]] and [[http://search.cpan.org/][CPAN]]'s +community. For this, I collected data from CPAN to create three maps: + +- [[http://cpan-explorer.org/2009/07/28/new-version-of-the-distributions-map-for-yapceu/][dependencies + between distributions]] +- [[http://cpan-explorer.org/2009/07/28/version-of-the-authors-graph-for-yapceu/][which + authors wre important in term of reliability]] +- [[http://cpan-explorer.org/2009/07/28/new-web-communities-map-for-yapceu/][and + how the websites theses authors are structured]] + +I wanted to do something similar again, but not with the same data. So I +took a look at what could be a good subject. One of the things that we +saw from the map of the websites is the importance +[[http://github.com/][GitHub]] is gaining inside the Perl community. +GitHub provides a [[http://develop.github.com/][really good API]], so I +started to play with it. + +#+BEGIN_QUOTE + This graph will be printed on a poster, size will be + [[http://en.wikipedia.org/wiki/A2_paper_size][A2]] and + [[http://en.wikipedia.org/wiki/A1_paper_size][A1]]. Please, contact me + franck.cuny [at] linkfluence.net if you will be interested by one. +#+END_QUOTE + +This time, I didn't aim for the Perl community only, but the whole +github communities. I've created several graphs: + +#+BEGIN_QUOTE + all the graph are available "on my flickr + account":http://www.flickr.com/photos/franck\_/sets/72157623447857405/ +#+END_QUOTE + +- [[http://www.flickr.com/photos/franck_/4460144638/][a graph of all + languages]] +- [[http://www.flickr.com/photos/franck_/4456072255/in/set-72157623447857405/][a + graph of the Perl community]] +- [[http://www.flickr.com/photos/franck_/4456914448/][a graph of the + Ruby community]] +- [[http://www.flickr.com/photos/franck_/4456118597/in/set-72157623447857405/][a + graph of the Python community]] +- [[http://www.flickr.com/photos/franck_/4456830956/in/set-72157623447857405/][a + graph of the PHP community]] +- [[http://www.flickr.com/photos/franck_/4456862434/in/set-72157623447857405/][a + graph of the European community]] +- [[http://www.flickr.com/photos/franck_/4456129655/in/set-72157623447857405/][a + graph of the Japan community]] + +I think a disclaimer is important at this point. I know that github +doesn't represent the whole open source community. With these maps, I +don't claim to represent what the open source world looks like right +now. This is not a troll about which language is best, or used at large. +It's *ONLY* about GitHub. + +Also, I won't provide deep analysis for each of these graphs, as I lack +insight about some of those communities. So feel free to +[[http://franck.lumberjaph.net/graphs.tgz][re-use the graphs]] and +provide your own analyses. + +** Methodology + +I didn't collect all the profiles. We (with +[[http://twitter.com/gfouetil][Guilhem]] decided to limit to peoples who +are followed by at least two other people. We did the same thing for +repositories, limiting to repositories which are at least forked once. +Using this technique, more than 17k profiles have been collected, and +nearly as many repositories. + +For each profile, using the github API, I've tried to determine what the +main language for this person is. And with the help of the +[[http://www.geonames.org][geonames]], find the right country to attach +the profile to. + +Each profile is represented by a node. For each node, the following +attributes are set: + +- name of the profile +- main language used by this profile, determined by github +- name of the country +- follower count +- following count +- repository count + +An edge is a link between two profiles. Each time someone follows +another profile, a link is created. By default, the weight of this link +is 1. For each project this person forked from the target profile, the +weight is incremented. + +As always, I've used [[http://gephi.org/][Gephi]] (now in version 0.7) +to create the graphs. Feel free to download the various graph files and +use them with Gephi. + +** Github + +#+BEGIN_QUOTE + properties of the graph: 16443 nodes / 130650 edges +#+END_QUOTE + +The first map is about all the languages available on github. This one +was really heavy, with more than 17k nodes, and 130k edges. The final +version of the graph use the 2270 more connected nodes. + +You can't miss Ruby on this map. As github uses Ruby on Rails, it's not +really surprising that the Ruby community has a particular interest on +this website. The main languages on github are what we can expect, with +PHP, Python, Perl, Javascript. + +Some languages are not really well represented. We can assume that most +Haskell projects might use darcs, and therefore are not on github. Some +other languages may use other platforms, like launchpad, or sourceforge. + +** Perl + +#+BEGIN_QUOTE + properties of the graph: 365 nodes / 4440 edges +#+END_QUOTE + +The Perl community is split into two parts. On the left side, there is +the occidental community, driven by people like +"Florian":http://github.com/rafl, "Yuval":http://github.com/nothingmuch, +"rjbs":http://github.com/rjbs, ... The second part are the japanese Perl +hackers, with Tokhuirom, Typester, Yappo, ... And in between them, +Miyagawa acts as a glue. This map looks a lot like the previous map of +the CPAN. We can see that this community is international, with the +exception of Japan that don't mix with others. + +There is no main project on github that gathers people, even though we +can see a fair amount of MooseX:: projects. Most of the developers will +work on different modules, that may not have the same purpose. Lately we +have seen a fair amount of work on various Plack stuff, mainly +middleware, but also HTTP servers (twiggy, starman, ...) and web +framework (dancer). + +One important project that is not (deliberately) represented on this +graph is the gitpan, Schwern's project. The gitpan is an import of all +the CPAN modules, with complete history using the Backpan. + +To conclude about Perl, there are only 365 nodes on this graph, but no +less than 4440 edges. That's nearly two times the number of edges +compared to the Python community. Perl is a really well structured +community, probably thanks to the CPAN, which already acted as hub for +contributors. + +** Python + +#+BEGIN_QUOTE + properties of the graph: 532 nodes / 2566 edges +#+END_QUOTE + +The Python community looks a lot like the Perl community, but only in +the structure of the graph. If we look closely, Django is the main +project that represent Python on Github, in contrast with Perl where +there is no leader. Some small projects gather small community of +developers. + +** PHP + +#+BEGIN_QUOTE + properties of the graph: 301 nodes / 1071 edges +#+END_QUOTE + +PHP is the only community that is structured this way on Github. We can +clearly see that people are structured based on a project where they +mainly contribute. + +CakePHP and Symphony are the two main projects. Nearly all the projects +gather an international community, at the exception of a few +japanese-only projects + +** Ruby + +#+BEGIN_QUOTE + properties of the graph: 3742 nodes / 24571 edges +#+END_QUOTE + +As for the Github graph, we can clearly see that some countries are +isolated. On the right side, we have: the Japan community is at the +bottom; the Spanish at the top. Australian are represented on the upper +right corner, while on the left side we got the Brazilians. + +The main projects that gather most of the hackers are Rails and Sinatra, +two famous web frameworks. + +** Europe + +#+BEGIN_QUOTE + properties of the graph: 2711 nodes / 11259 edges +#+END_QUOTE + +This one shows interesting features. Some countries are really isolated. +If we look at Spain, we can see a community of Ruby programmers, with an +important connectivity between them, but no really strong connection +with any foreign developers. We can clearly see the Perl community +exists as only one community, and is not split by country. The same is +true for Python. + +** Japanese hackers community + +#+BEGIN_QUOTE + properties of the graph: 559 nodes / 5276 edges +#+END_QUOTE + +This community is unique on github. In 2007, Yappo created +coderepos.org, a repository for open source developers in Japan. It was +a subversion repository, with Trac as an HTTP front-end. It gathered +around 900 developers, with all kind of projects (Perl, Python, Ruby, +Javascript, ...). Most of these users have switched to github now. + +Three main communities are visible on this graph: Perl; Ruby; PHP. As +always, the Javascript community as a glue between them. And yes, we can +confirm that Perl is big in Japan. + +We have seen in the previous graph that the Japanese hackers are always +isolated. We can assume that their language is an obstacle. + +This is a really well-connected graph too. + +** Conclusions and graphs + +I may have not provided a deep analysis of all the graph. I don't have +knowledge of most of the community outside of Perl. Feel free to +download the graph, to load them in Gephi, experiment, and provides your +own thoughts. + +I would like to thanks everybody at Linkfluence (guilhem for his +advices, camille for giving me time to work on this, and antonin for the +amazing poster), who have helped me and let me use time and resources to +finish this work. Special thanks to blob for reviewing my prose and cdlm +for the discussion :) |
