| author | Franck Cuny <franckcuny@gmail.com> | 2016-08-04 11:45:44 -0700 |
|---|---|---|
| committer | Franck Cuny <franckcuny@gmail.com> | 2016-08-04 11:45:44 -0700 |
| commit | 585b48b6a605cb71ef99dd767880e1b7ee5bf24e (patch) | |
| tree | c65377350d12bd1e62e0bdd58458c1044541c27b /posts/2011-06-20-stargit.org | |
| parent | Use Bullet list for the index. (diff) | |
| parent | Mass convert all posts from markdown to org. (diff) | |
| download | lumberjaph-585b48b6a605cb71ef99dd767880e1b7ee5bf24e.tar.gz | |
Merge branch 'convert-to-org'
Diffstat (limited to 'posts/2011-06-20-stargit.org')
| -rw-r--r-- | posts/2011-06-20-stargit.org | 459 |
1 files changed, 459 insertions, 0 deletions
diff --git a/posts/2011-06-20-stargit.org b/posts/2011-06-20-stargit.org
new file mode 100644
index 0000000..4d4f77f
--- /dev/null
+++ b/posts/2011-06-20-stargit.org
@@ -0,0 +1,459 @@
+Last year I did a
+[[http://lumberjaph.net/graph/2010/03/25/github-explorer.html][small
+exploration of GitHub]] to show the various communities using
+[[http://github.com][GitHub]] and how they work. I wanted to do it again
+this year, but I was lacking time and motivation to start over. A couple
+of months ago, I got a message from
+[[https://twitter.com/#!/mojombo][mojombo]] asking me if I was planning
+to do a new poster. This triggered the motivation to work on it again.
+
+This time I got help from [[https://twitter.com/#!/jacomyal][Alexis]] to
+provide you with an awesome tool: [[http://www.stargit.net][a real
+explorer of your graph]], but more on this later ;)
+
+And of course, [[http://labs.linkfluence.net][the poster]]. Feel free to
+print it yourself, the size of the poster is A1.
+
+** The data
+
+All the data are available! Last year I got some mails asking me for the
+dataset. So this time I asked first if I could release the
+[[http://maps.startigt.net/dump/github.tgz][data]] with the
+[[http://git.lumberjaph.net/p5-stargit.git/][code]] and the poster, and
+the answer is yes! So if you're interested, you can download it.
+
+The data are stored in MongoDB, so I provide the dump which you can
+easily use:
+
+#+BEGIN_SRC sh
+    % wget http://maps.stargit.net/dump/github.
+    % tar xvzf github.tgz
+    % cd github
+    % mongorestore -d github .
+#+END_SRC
+
+Now you can use MongoDB to browse the imported database. There are 5
+collections: profiles / repositories / relations / contributions /
+edges.
+
+** Methodology
+
+Last year I did a simple "follower/following" graph. It was already
+interesting, but it was also /really/ too simple. This time I wanted to
+go deeper in the exploration.
+
+The various steps to process all this data are:
+
+- using the GitHub API, fetch information from the profiles.
+- when all the profiles are collected, information about the
+  repositories is fetched. Only forked repositories are kept.
+- "simple" relations (followers/following) are kept and used later to
+  add weight to relations.
+- tag users with the main programming language they use. Using the
+  GitHub API, I was able to categorize ~40k profiles (about 1/3 of my
+  whole dataset).
+- using the GeoNames API, extract the name of the country the user is
+  in. This time, about 55k profiles were tagged.
+- fetch contributions for each repository
+- compute a score between the author of the contribution and the owner
+  of the repo
+- add a weight to each edge, using the computed score and "+1" if the
+  developer follows the other developer
+
+For all the graphs, each of the following groups has its own color:
+
+- Ruby
+- JavaScript
+- Python
+- C (C++, C#)
+- Perl
+- PHP
+- JVM (Java, Clojure, Scala)
+- Lisp (Emacs Lisp, Common Lisp)
+- Other
+
+** Exploring
+
+Feel free to do your own analysis in the comments :) For each map,
+you'll find a PDF of the map, and the graph to explore using Gephi (in
+GEXF or GDF format).
+
+*** But first, some numbers
+
+I've collected:
+
+- 123 562 profiles
+- 2 730 organizations
+- 40 807 repositories
+
+It took me about a month to collect the data and to build the tools I
+needed.
+
+*** Account creations
+
+The following chart shows the number of accounts created per month.
+"Everyone" means the total of accounts created. You can also see the
+numbers for each community.
+
+On the "Everyone" graph, you can see a huge peak around April 2008,
+that's the date GitHub [[https://github.com/blog/40-we-launched][was
+launched]].
+
+For most of the communities, the number of created accounts has been
+decreasing since 2010. I think the reason is that most of the developers
+from those communities are now on GitHub.
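As a rough illustration of how per-month account counts like the ones charted in this section could be derived, here is a minimal Python sketch. It is not the author's code: the ISO 8601 date format, the `monthly_series` helper, and the sample values are all assumptions. The output shape (`label` plus `[x, y]` pairs in `data`) mirrors what the page's plotting script reads, with x expressed in milliseconds since the epoch, as flot's time mode expects.

```python
from collections import Counter
from datetime import datetime, timezone


def monthly_series(label, created_at_values):
    """Bucket ISO 8601 creation dates by month and return a flot-style series.

    `created_at_values` is assumed to hold strings such as
    "2008-04-02T10:00:00Z"; the real field name and format may differ.
    """
    counts = Counter()
    for value in created_at_values:
        created = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        counts[(created.year, created.month)] += 1

    data = []
    for (year, month), count in sorted(counts.items()):
        month_start = datetime(year, month, 1, tzinfo=timezone.utc)
        # x value: first day of the month, in milliseconds since the epoch
        data.append([int(month_start.timestamp() * 1000), count])
    return {"label": label, "data": data}


# Hypothetical sample values, not taken from the dataset.
sample = ["2008-04-02T10:00:00Z", "2008-04-20T08:30:00Z", "2008-05-01T12:00:00Z"]
series = monthly_series("Everyone", sample)
```

Serializing `series` with `json.dumps` would give one JSON document per community, which is the kind of payload the script's `onDataReceived` callback consumes.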
+
+#+BEGIN_HTML
+  <script language="javascript" type="text/javascript" src="/js/jquery.js"></script>
+#+END_HTML
+
+#+BEGIN_HTML
+  <script language="javascript" type="text/javascript" src="/js/jquery.flot.js"></script>
+#+END_HTML
+
+#+BEGIN_HTML
+  <div id="placeholder" style="width:800px;height:300px;">
+#+END_HTML
+
+#+BEGIN_HTML
+  </div>
+#+END_HTML
+
+#+BEGIN_HTML
+  <ul class="actions">
+#+END_HTML
+
+#+BEGIN_HTML
+  <li class="minibutton">
+#+END_HTML
+
+#+BEGIN_HTML
+  </li>
+#+END_HTML
+
+#+BEGIN_HTML
+  <li class="minibutton">
+#+END_HTML
+
+#+BEGIN_HTML
+  </li>
+#+END_HTML
+
+#+BEGIN_HTML
+  <li class="minibutton">
+#+END_HTML
+
+#+BEGIN_HTML
+  </li>
+#+END_HTML
+
+#+BEGIN_HTML
+  <li class="minibutton">
+#+END_HTML
+
+#+BEGIN_HTML
+  </li>
+#+END_HTML
+
+#+BEGIN_HTML
+  <li class="minibutton">
+#+END_HTML
+
+#+BEGIN_HTML
+  </li>
+#+END_HTML
+
+#+BEGIN_HTML
+  <li class="minibutton">
+#+END_HTML
+
+#+BEGIN_HTML
+  </li>
+#+END_HTML
+
+#+BEGIN_HTML
+  <li class="minibutton">
+#+END_HTML
+
+#+BEGIN_HTML
+  </li>
+#+END_HTML
+
+#+BEGIN_HTML
+  <li class="minibutton">
+#+END_HTML
+
+#+BEGIN_HTML
+  </li>
+#+END_HTML
+
+#+BEGIN_HTML
+  <li class="minibutton">
+#+END_HTML
+
+#+BEGIN_HTML
+  </li>
+#+END_HTML
+
+#+BEGIN_HTML
+  <li class="minibutton">
+#+END_HTML
+
+#+BEGIN_HTML
+  </li>
+#+END_HTML
+
+#+BEGIN_HTML
+  <li class="minibutton">
+#+END_HTML
+
+#+BEGIN_HTML
+  </li>
+#+END_HTML
+
+#+BEGIN_HTML
+  </ul>
+#+END_HTML
+
+#+BEGIN_HTML
+  <script type="text/javascript">
+  $(function () {
+      var options = {
+          lines: { show: true },
+          points: { show: true },
+          xaxis: { mode:"time" }
+      };
+      var data = [];
+      var placeholder = $("#placeholder");
+
+      $.plot(placeholder, data, options);
+
+      // fetch one series, adding to what we got
+      var alreadyFetched = {};
+
+      $("input.resetSeries").click(function() {
+          alreadyFetched = {};
+          data = [];
+          $.plot(placeholder, data, options);
+      });
+
+      $("input.fetchSeries").click(function () {
+          var button = $(this);
+
+          // find the URL in the link right next to us
+          var dataurl = button.attr('href');
+
+          // then fetch the data with jQuery
+          function onDataReceived(series) {
+              // extract the first coordinate pair so you can see that
+              // data is now an ordinary Javascript object
+              var firstcoordinate = '(' + series.data[0][0] + ', ' + series.data[0][1] + ')';
+
+              // let's add it to our current data
+              if (!alreadyFetched[series.label]) {
+                  alreadyFetched[series.label] = true;
+                  data.push(series);
+              }
+
+              // and plot all we got
+              $.plot(placeholder, data, options);
+          }
+
+          $.ajax({
+              url: dataurl,
+              method: 'GET',
+              dataType: 'json',
+              success: onDataReceived
+          });
+      });
+  });
+  </script>
+#+END_HTML
+
+*** Languages
+
+(Keep in mind that these numbers come from the profiles I was able
+to tag, roughly 40k)
+
+- Ruby: 10046 (28%)
+- Python: 5403 (15%)
+- JavaScript: 5282 (15%) (JavaScript + CoffeeScript)
+- C: 5093 (14%) (C, C++, C#)
+- PHP: 3933 (11%)
+- JVM: 3790 (10%) (Java, Clojure, Scala, Groovy)
+- Perl: 1215 (3%)
+- Lisp: 348 (0%) (Emacs Lisp, Common Lisp)
+
+Those numbers don't really match
+[[https://github.com/languages][what GitHub gave]], but that could be
+explained by the way I've selected my users.
+
+*** Country
+
+- United States: 19861 (36%)
+- United Kingdom: 3533 (6%)
+- Germany: 3009 (5%)
+- Canada: 2657 (4%)
+- Brazil: 2454 (4%)
+- France: 1833 (3%)
+- Japan: 1799 (3%)
+- Russia: 1604 (2%)
+- Australia: 1441 (2%)
+- China: 1159 (2%)
+
+The United States is still the main country represented on GitHub, no
+surprise here.
+
+If you are interested in the "geography" of Open Source, you should read
+these two articles: [[http://takhteyev.org/dissertation/][Coding
+Places]] and
+[[http://takhteyev.org/papers/Takhteyev-Hilts-2010.pdf][Investigating
+the Geography of Open Source Software through GitHub]].
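As a side note on the country figures above, the post does not say how the percentages were computed. The sketch below is one reading, under two assumptions: the denominator is the "about 55k" profiles tagged with a country (the exact total is not given), and the shares are truncated rather than rounded. `share_percent` and `ASSUMED_TOTAL` are illustrative names, not part of the original code.

```python
def share_percent(count, total):
    """Share of `total` as a whole percentage, truncated rather than rounded."""
    return count * 100 // total


# Counts copied from the country list; the total is an assumption based on
# the "about 55k" tagged profiles mentioned in the methodology.
ASSUMED_TOTAL = 55_000
countries = {"United States": 19861, "United Kingdom": 3533, "Russia": 1604}
shares = {
    name: share_percent(count, ASSUMED_TOTAL)
    for name, count in countries.items()
}
```

Under these assumptions the three sample countries come out at 36, 6 and 2, which agrees with the list; with rounding instead of truncation, Russia would show 3% rather than 2%.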
+
+*** Companies
+
+Looking at the "company" field of users' profiles, here are some stats
+about which companies have employees using GitHub:
+
+- ThoughtWorks: 102
+- Google: 66
+- Mozilla: 65
+- Yahoo!: 65
+- Red Hat: 64
+- Globo.com: 55
+- Twitter: 53
+- Facebook: 45
+- Yandex: 43
+- Intridea: 34
+- Microsoft: 33
+- Engine Yard: 32
+- Pivotal Labs: 29
+- MIT: 28
+- Rackspace: 27
+- IBM: 24
+- Caelum: 23
+- Novell: 22
+- GitHub: 22
+- VMware: 22
+
+I didn't know the first company, ThoughtWorks, and I was expecting to
+see Facebook or Twitter as the company with the most developers on GitHub.
+It's also interesting to see Yandex here.
+
+** Global graph (1628 nodes, 9826 edges)
+
+([[http://maps.stargit.net/global/global.pdf][download PDF]],
+[[http://maps.stargit.net/global/global.gdf][download GDF]])
+
+The main difference from last year is the Android / modders community.
+They're developing mostly in C and Java. The poster has been created
+from this map.
+
+** Ruby (1968 nodes, 9662 edges)
+
+([[http://maps.stargit.net/ruby/ruby.pdf][download PDF]],
+[[http://maps.stargit.net/ruby/ruby.gdf][download GDF]],
+[[http://maps.stargit.net/ruby/ruby.gexf][download GEXF]])
+
+This is still the main community on GitHub, even if JavaScript is now
+[[https://github.com/languages/JavaScript][the most popular language]].
+This graph is really dense and not easy to read, since there is no
+real cluster in it.
+
+** Python (1062 nodes, 2631 edges)
+
+([[http://maps.stargit.net/python/python.pdf][download PDF]],
+[[http://maps.stargit.net/python/python.gdf][download GDF]])
+
+Here we have some clusters. I'm not familiar with the Python community,
+so I can't really give any insight.
+
+** Perl (608 nodes, 2967 edges)
+
+([[http://maps.stargit.net/perl/perl.pdf][download PDF]],
+[[http://maps.stargit.net/perl/perl.gdf][download GDF]],
+[[http://maps.stargit.net/perl/perl.gexf][download GEXF]])
+
+I really like this graph since it shows (in my opinion) one of the real
+strengths of this community: everybody works with everybody. People
+working on a web framework will collaborate with people working on Moose,
+or an ORM, or other tools. It shows that in this community, people are
+competent in more than one field.
+
+The Perl community is about the same size as last year. However, we can
+extract the following information:
+
+- the Japanese Perl hackers are still a cluster by themselves
+- [[http://github.com/miyagawa][miyagawa]] is still the glue between
+  the Japanese community and the "rest of the world"
+- other leaders are: Florian Ragwitz
+  ([[http://github.com/rafl][rafl]]), Andy Armstrong
+  ([[http://github.com/andya][AndyA]]), Dave Rolsky
+  ([[http://github.com/autarch][autarch]])
+- some clusters exist for Moose and Dancer.
+
+As we can see on the previous charts, the number of created accounts for
+the Perl developers is stalling.
+
+** United States (2646 nodes, 11344 edges)
+
+([[http://maps.startgit.net/unitedstates/unitedstates.pdf][download
+PDF]],
+[[http://maps.startgit.net/unitedstates/unitedstates.gdf][download
+GDF]],
+[[http://maps.startgit.net/unitedstates/unitedstates.gexf][download
+GEXF]])
+
+This one is really nice. We can clearly see all the communities. There
+is something interesting:
+
+- C and Ruby are on opposite sides (C on the left, Ruby on the
+  right)
+- Python and Perl are also opposed (Perl at the bottom and Python at
+  the top)
+
+I'll let you draw your own conclusions on this one ;)
+
+** France (706 nodes, 1059 edges)
+
+([[http://maps.stargit.net/france/france.pdf][download PDF]],
+[[http://maps.stargit.net/france/france.gdf][download GDF]],
+[[http://maps.stargit.net/france/france.gexf][download GEXF]])
+
+We have a lot of small clusters on this one, and some very big
+authorities.
+
+** Japan (464 nodes, 1091 edges)
+
+([[http://maps.stargit.net/japan/japan.pdf][download PDF]],
+[[http://maps.stargit.net/japan/japan.gdf][download GDF]],
+[[http://maps.stargit.net/japan/japan.gexf][download GEXF]])
+
+There are three dominant clusters on this one:
+
+- Ruby
+- Perl
+- C
+
+The Ruby and Perl ones are well connected. There are a lot of Japanese
+hackers on CPAN using both languages.
+
+** StarGit
+
+[[http://stargit.net][StarGit]] is a great tool we built with Alexis to
+let you explore *your* community on GitHub. You can read more about the
+application on
+[[http://ofnodesandedges.com/2011/06/20/stargit.html][Alexis' blog]].
+
+It's hosted on [[http://dotcloud.com][dotcloud]] (I'm still amazed at
+how easy it was to deploy the code ...), using the Perl
+[[http://perldancer.org][Dancer web framework]], MongoDB to store the
+data, and Redis to do some caching.
+
+** Credits
+
+I would like to thank the whole GitHub team for being interested in the
+previous poster and for asking for another one this year :)
+
+A *huge* thanks to Alexis for his help on building the awesome StarGit.
+Another big thanks to Antonin for his work on the poster.
