diff options
Diffstat (limited to '')
| -rw-r--r-- | posts/2010-03-25-github-explorer.org | 236 |
1 files changed, 0 insertions, 236 deletions
diff --git a/posts/2010-03-25-github-explorer.org b/posts/2010-03-25-github-explorer.org deleted file mode 100644 index efa4816..0000000 --- a/posts/2010-03-25-github-explorer.org +++ /dev/null @@ -1,236 +0,0 @@ -#+BEGIN_QUOTE - /More informations about the poster are available on - [[http://lumberjaph.net/graph/2010/04/02/github-poster.html][this - post]]/ -#+END_QUOTE - -Last year, with help from my coworkers at -[[http://linkfluence.net/][Linkfluence]], I created two sets of maps of -the [[http://perl.org][Perl]] and [[http://search.cpan.org/][CPAN]]'s -community. For this, I collected data from CPAN to create three maps: - -- [[http://cpan-explorer.org/2009/07/28/new-version-of-the-distributions-map-for-yapceu/][dependencies - between distributions]] -- [[http://cpan-explorer.org/2009/07/28/version-of-the-authors-graph-for-yapceu/][which - authors wre important in term of reliability]] -- [[http://cpan-explorer.org/2009/07/28/new-web-communities-map-for-yapceu/][and - how the websites theses authors are structured]] - -I wanted to do something similar again, but not with the same data. So I -took a look at what could be a good subject. One of the things that we -saw from the map of the websites is the importance -[[http://github.com/][GitHub]] is gaining inside the Perl community. -GitHub provides a [[http://develop.github.com/][really good API]], so I -started to play with it. - -#+BEGIN_QUOTE - This graph will be printed on a poster, size will be - [[http://en.wikipedia.org/wiki/A2_paper_size][A2]] and - [[http://en.wikipedia.org/wiki/A1_paper_size][A1]]. Please, contact me - franck.cuny [at] linkfluence.net if you will be interested by one. -#+END_QUOTE - -This time, I didn't aim for the Perl community only, but the whole -github communities. I've created several graphs: - -#+BEGIN_QUOTE - all the graph are available "on my flickr - account":http://www.flickr.com/photos/franck\_/sets/72157623447857405/ -#+END_QUOTE - -- [[http://www.flickr.com/photos/franck_/4460144638/][a graph of all - languages]] -- [[http://www.flickr.com/photos/franck_/4456072255/in/set-72157623447857405/][a - graph of the Perl community]] -- [[http://www.flickr.com/photos/franck_/4456914448/][a graph of the - Ruby community]] -- [[http://www.flickr.com/photos/franck_/4456118597/in/set-72157623447857405/][a - graph of the Python community]] -- [[http://www.flickr.com/photos/franck_/4456830956/in/set-72157623447857405/][a - graph of the PHP community]] -- [[http://www.flickr.com/photos/franck_/4456862434/in/set-72157623447857405/][a - graph of the European community]] -- [[http://www.flickr.com/photos/franck_/4456129655/in/set-72157623447857405/][a - graph of the Japan community]] - -I think a disclaimer is important at this point. I know that github -doesn't represent the whole open source community. With these maps, I -don't claim to represent what the open source world looks like right -now. This is not a troll about which language is best, or used at large. -It's *ONLY* about GitHub. - -Also, I won't provide deep analysis for each of these graphs, as I lack -insight about some of those communities. So feel free to -[[http://franck.lumberjaph.net/graphs.tgz][re-use the graphs]] and -provide your own analyses. - -** Methodology - -I didn't collect all the profiles. We (with -[[http://twitter.com/gfouetil][Guilhem]] decided to limit to peoples who -are followed by at least two other people. We did the same thing for -repositories, limiting to repositories which are at least forked once. -Using this technique, more than 17k profiles have been collected, and -nearly as many repositories. - -For each profile, using the github API, I've tried to determine what the -main language for this person is. And with the help of the -[[http://www.geonames.org][geonames]], find the right country to attach -the profile to. - -Each profile is represented by a node. For each node, the following -attributes are set: - -- name of the profile -- main language used by this profile, determined by github -- name of the country -- follower count -- following count -- repository count - -An edge is a link between two profiles. Each time someone follows -another profile, a link is created. By default, the weight of this link -is 1. For each project this person forked from the target profile, the -weight is incremented. - -As always, I've used [[http://gephi.org/][Gephi]] (now in version 0.7) -to create the graphs. Feel free to download the various graph files and -use them with Gephi. - -** Github - -#+BEGIN_QUOTE - properties of the graph: 16443 nodes / 130650 edges -#+END_QUOTE - -The first map is about all the languages available on github. This one -was really heavy, with more than 17k nodes, and 130k edges. The final -version of the graph use the 2270 more connected nodes. - -You can't miss Ruby on this map. As github uses Ruby on Rails, it's not -really surprising that the Ruby community has a particular interest on -this website. The main languages on github are what we can expect, with -PHP, Python, Perl, Javascript. - -Some languages are not really well represented. We can assume that most -Haskell projects might use darcs, and therefore are not on github. Some -other languages may use other platforms, like launchpad, or sourceforge. - -** Perl - -#+BEGIN_QUOTE - properties of the graph: 365 nodes / 4440 edges -#+END_QUOTE - -The Perl community is split into two parts. On the left side, there is -the occidental community, driven by people like -"Florian":http://github.com/rafl, "Yuval":http://github.com/nothingmuch, -"rjbs":http://github.com/rjbs, ... The second part are the japanese Perl -hackers, with Tokhuirom, Typester, Yappo, ... And in between them, -Miyagawa acts as a glue. This map looks a lot like the previous map of -the CPAN. We can see that this community is international, with the -exception of Japan that don't mix with others. - -There is no main project on github that gathers people, even though we -can see a fair amount of MooseX:: projects. Most of the developers will -work on different modules, that may not have the same purpose. Lately we -have seen a fair amount of work on various Plack stuff, mainly -middleware, but also HTTP servers (twiggy, starman, ...) and web -framework (dancer). - -One important project that is not (deliberately) represented on this -graph is the gitpan, Schwern's project. The gitpan is an import of all -the CPAN modules, with complete history using the Backpan. - -To conclude about Perl, there are only 365 nodes on this graph, but no -less than 4440 edges. That's nearly two times the number of edges -compared to the Python community. Perl is a really well structured -community, probably thanks to the CPAN, which already acted as hub for -contributors. - -** Python - -#+BEGIN_QUOTE - properties of the graph: 532 nodes / 2566 edges -#+END_QUOTE - -The Python community looks a lot like the Perl community, but only in -the structure of the graph. If we look closely, Django is the main -project that represent Python on Github, in contrast with Perl where -there is no leader. Some small projects gather small community of -developers. - -** PHP - -#+BEGIN_QUOTE - properties of the graph: 301 nodes / 1071 edges -#+END_QUOTE - -PHP is the only community that is structured this way on Github. We can -clearly see that people are structured based on a project where they -mainly contribute. - -CakePHP and Symphony are the two main projects. Nearly all the projects -gather an international community, at the exception of a few -japanese-only projects - -** Ruby - -#+BEGIN_QUOTE - properties of the graph: 3742 nodes / 24571 edges -#+END_QUOTE - -As for the Github graph, we can clearly see that some countries are -isolated. On the right side, we have: the Japan community is at the -bottom; the Spanish at the top. Australian are represented on the upper -right corner, while on the left side we got the Brazilians. - -The main projects that gather most of the hackers are Rails and Sinatra, -two famous web frameworks. - -** Europe - -#+BEGIN_QUOTE - properties of the graph: 2711 nodes / 11259 edges -#+END_QUOTE - -This one shows interesting features. Some countries are really isolated. -If we look at Spain, we can see a community of Ruby programmers, with an -important connectivity between them, but no really strong connection -with any foreign developers. We can clearly see the Perl community -exists as only one community, and is not split by country. The same is -true for Python. - -** Japanese hackers community - -#+BEGIN_QUOTE - properties of the graph: 559 nodes / 5276 edges -#+END_QUOTE - -This community is unique on github. In 2007, Yappo created -coderepos.org, a repository for open source developers in Japan. It was -a subversion repository, with Trac as an HTTP front-end. It gathered -around 900 developers, with all kind of projects (Perl, Python, Ruby, -Javascript, ...). Most of these users have switched to github now. - -Three main communities are visible on this graph: Perl; Ruby; PHP. As -always, the Javascript community as a glue between them. And yes, we can -confirm that Perl is big in Japan. - -We have seen in the previous graph that the Japanese hackers are always -isolated. We can assume that their language is an obstacle. - -This is a really well-connected graph too. - -** Conclusions and graphs - -I may have not provided a deep analysis of all the graph. I don't have -knowledge of most of the community outside of Perl. Feel free to -download the graph, to load them in Gephi, experiment, and provides your -own thoughts. - -I would like to thanks everybody at Linkfluence (guilhem for his -advices, camille for giving me time to work on this, and antonin for the -amazing poster), who have helped me and let me use time and resources to -finish this work. Special thanks to blob for reviewing my prose and cdlm -for the discussion :) |
