authorFranck Cuny <franck.cuny@gmail.com>2016-08-10 14:33:04 -0700
committerFranck Cuny <franck.cuny@gmail.com>2016-08-10 20:17:56 -0700
commit8d7d02f42c3947f756c18cb4d37d9d97fbd0d27d (patch)
treea6cecddaaea7e87d901a6c28bebe3a531438f24b /posts/2011-06-20-stargit.org
parentMerge branch 'convert-to-org' (diff)
convert back to md
Diffstat (limited to '')
-rw-r--r--posts/2011-06-20-stargit.org459
1 files changed, 0 insertions, 459 deletions
diff --git a/posts/2011-06-20-stargit.org b/posts/2011-06-20-stargit.org
deleted file mode 100644
index 4d4f77f..0000000
--- a/posts/2011-06-20-stargit.org
+++ /dev/null
@@ -1,459 +0,0 @@
-Last year I did a
-[[http://lumberjaph.net/graph/2010/03/25/github-explorer.html][small
-exploration of GitHub]] to show the various communities using
-[[http://github.com][GitHub]] and how they work. I wanted to do it again
-this year, but I was lacking time and motivation to start over. A couple
-of months ago, I got a message from
-[[https://twitter.com/#!/mojombo][mojombo]] asking me if I was planning
-to do a new poster. This triggered the motivation to work on it again.
-
-This time I got help from [[https://twitter.com/#!/jacomyal][Alexis]] to
-provide you with an awesome tool: [[http://www.stargit.net][a real
-explorer of your graph]], but more on this later ;)
-
-And of course, [[http://labs.linkfluence.net][the poster]]. Feel free to
-print it yourself, the size of the poster is A1.
-
-** The data
-
-All the data are available! Last year I got some mails asking me for the
-dataset. So this time I asked first if I could release the
-[[http://maps.startigt.net/dump/github.tgz][data]] with the
-[[http://git.lumberjaph.net/p5-stargit.git/][code]] and the poster, and
-the answer is yes! So if you're interested, you can download it.
-
-The data are stored in MongoDB, so I provide the dump which you can
-easily use:
-
-#+BEGIN_SRC sh
- % wget http://maps.stargit.net/dump/github.
- % tar xvzf github.tgz
- % cd github
- % mongorestore -d github .
-#+END_SRC
-
-Now you can use MongoDB to browse the imported database. There are 5
-collections: profiles / repositories / relations / contributions /
-edges.
-
-** Methodology
-
-Last year I did a simple "follower/following" graph. It was already
-interesting, but it was also /really/ too simple. This time I wanted to
-go deeper in the exploration.
-
-The various steps to process all this data are:
-
-- using the GitHub API, fetch information from the profiles.
-- when all the profiles are collected, information about the
- repositories is fetched. Only forked repositories are kept.
-- "simple" relations (followers/following) are kept and used later to
- add weight to relations.
-- tag users with the main programming language they use. Using the
- GitHub API, I was able to categorize ~40k profiles (about 1/3 of my
- whole dataset).
-- using the GeoNames API, extract the name of the country the user is
- in. This time, about 55k profiles were tagged.
-- fetch contributions for each repository
-- compute a score between the author of the contribution and the owner
- of the repo
-- add a weight to each edge, using the computed score and "+1" if the
- developer follows the other developer
-
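The last two steps of the list above can be sketched as follows. This is only an illustration, not the code used for the poster: the post does not say how the score is computed, so the sketch assumes it is simply the number of contributions an author made to repositories of a given owner. The function name `edge_weights` and the sample users are hypothetical.

```python
from collections import Counter


def edge_weights(contributions, follows):
    """Compute a weight for each (author, owner) edge.

    contributions: iterable of (author, owner) pairs, one per contribution.
    follows: set of (follower, followed) pairs.

    Assumption: the score is the raw contribution count; the post does
    not give the actual formula.
    """
    # Count contributions per (author, owner), ignoring self-contributions.
    scores = Counter((a, o) for a, o in contributions if a != o)
    # Add the "+1" bonus when the author follows the owner.
    return {
        (a, o): score + (1 if (a, o) in follows else 0)
        for (a, o), score in scores.items()
    }


# Hypothetical sample data.
contributions = [
    ("alice", "bob"),
    ("alice", "bob"),
    ("carol", "bob"),
    ("bob", "bob"),
]
follows = {("alice", "bob")}
weights = edge_weights(contributions, follows)
```

With this sample, the alice/bob edge ends up heavier than the carol/bob edge, both because alice contributed more and because she follows bob.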
-For all the graphs, I've used the following colors for:
-
-- Ruby
-- JavaScript
-- Python
-- C (C++, C#)
-- Perl
-- PHP
-- JVM (Java, Clojure, Scala)
-- Lisp (Emacs Lisp, Common Lisp)
-- Other
-
-** Exploring
-
-Feel free to do your own analysis in the comments :) For each map,
-you'll find a PDF of the map, and the graph to explore using Gephi (in
-GEXF or GDF format).
-
-*** But first, some numbers
-
-I've collected:
-
-- 123 562 profiles
-- 2 730 organizations
-- 40 807 repositories
-
-It took me about a month to collect the data and to build the
-necessary tools.
-
-*** Account creations
-
-The following chart shows the number of accounts created per month.
-"Everyone" means the total of accounts created. You can also see the
-numbers for each community.
-
-On the "Everyone" graph, you can see a huge peak around April 2008,
-that's the date GitHub [[https://github.com/blog/40-we-launched][was
-launched]].
-
-For most of the communities, the number of created accounts has been
-decreasing since 2010. I think the reason is that most of the developers
-from those communities are now on GitHub.
-
-#+BEGIN_HTML
- <script language="javascript" type="text/javascript" src="/js/jquery.js"></script>
-#+END_HTML
-
-#+BEGIN_HTML
- <script language="javascript" type="text/javascript" src="/js/jquery.flot.js"></script>
-#+END_HTML
-
-#+BEGIN_HTML
- <div id="placeholder" style="width:800px;height:300px;">
-#+END_HTML
-
-#+BEGIN_HTML
- </div>
-#+END_HTML
-
-#+BEGIN_HTML
- <ul class="actions">
-#+END_HTML
-
-#+BEGIN_HTML
- <li class="minibutton">
-#+END_HTML
-
-#+BEGIN_HTML
- </li>
-#+END_HTML
-
-#+BEGIN_HTML
- <li class="minibutton">
-#+END_HTML
-
-#+BEGIN_HTML
- </li>
-#+END_HTML
-
-#+BEGIN_HTML
- <li class="minibutton">
-#+END_HTML
-
-#+BEGIN_HTML
- </li>
-#+END_HTML
-
-#+BEGIN_HTML
- <li class="minibutton">
-#+END_HTML
-
-#+BEGIN_HTML
- </li>
-#+END_HTML
-
-#+BEGIN_HTML
- <li class="minibutton">
-#+END_HTML
-
-#+BEGIN_HTML
- </li>
-#+END_HTML
-
-#+BEGIN_HTML
- <li class="minibutton">
-#+END_HTML
-
-#+BEGIN_HTML
- </li>
-#+END_HTML
-
-#+BEGIN_HTML
- <li class="minibutton">
-#+END_HTML
-
-#+BEGIN_HTML
- </li>
-#+END_HTML
-
-#+BEGIN_HTML
- <li class="minibutton">
-#+END_HTML
-
-#+BEGIN_HTML
- </li>
-#+END_HTML
-
-#+BEGIN_HTML
- <li class="minibutton">
-#+END_HTML
-
-#+BEGIN_HTML
- </li>
-#+END_HTML
-
-#+BEGIN_HTML
- <li class="minibutton">
-#+END_HTML
-
-#+BEGIN_HTML
- </li>
-#+END_HTML
-
-#+BEGIN_HTML
- <li class="minibutton">
-#+END_HTML
-
-#+BEGIN_HTML
- </li>
-#+END_HTML
-
-#+BEGIN_HTML
- </ul>
-#+END_HTML
-
-#+BEGIN_HTML
- <script type="text/javascript">
- $(function () {
- var options = {
- lines: { show: true },
- points: { show: true },
- xaxis: { mode:"time" }
- };
- var data = [];
- var placeholder = $("#placeholder");
-
- $.plot(placeholder, data, options);
-
- // fetch one series, adding to what we got
- var alreadyFetched = {};
-
- $("input.resetSeries").click(function() {
- alreadyFetched = {};
- data = [];
- $.plot(placeholder, data, options);
- });
-
- $("input.fetchSeries").click(function () {
- var button = $(this);
-
- // find the URL in the link right next to us
- var dataurl = button.attr('href');
-
- // then fetch the data with jQuery
- function onDataReceived(series) {
- // extract the first coordinate pair so you can see that
- // data is now an ordinary Javascript object
- var firstcoordinate = '(' + series.data[0][0] + ', ' + series.data[0][1] + ')';
-
- // let's add it to our current data
- if (!alreadyFetched[series.label]) {
- alreadyFetched[series.label] = true;
- data.push(series);
- }
-
- // and plot all we got
- $.plot(placeholder, data, options);
- }
-
- $.ajax({
- url: dataurl,
- method: 'GET',
- dataType: 'json',
- success: onDataReceived
- });
- });
- });
- </script>
-#+END_HTML
-
-*** Languages
-
-(Keep in mind that these numbers are coming from the profiles I was able
-to tag, roughly 40k)
-
-- Ruby: 10046 (28%)
-- Python: 5403 (15%)
-- JavaScript: 5282 (15%) (JavaScript + CoffeeScript)
-- C: 5093 (14%) (C, C++, C#)
-- PHP: 3933 (11%)
-- JVM: 3790 (10%) (Java, Clojure, Scala, Groovy)
-- Perl: 1215 (3%)
-- Lisp: 348 (0%) (Emacs Lisp, Common Lisp)
-
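The percentages above can be reproduced from the counts. The sketch below assumes two things inferred from the published figures, not stated in the post: each share is taken over the sum of the eight listed counts (35110 profiles), and the result is truncated, not rounded. The function name `shares` is hypothetical.

```python
# Counts copied from the list above.
counts = {
    "Ruby": 10046,
    "Python": 5403,
    "JavaScript": 5282,
    "C": 5093,
    "PHP": 3933,
    "JVM": 3790,
    "Perl": 1215,
    "Lisp": 348,
}


def shares(counts):
    """Return the truncated percentage of each language.

    Assumption: integer division (truncation) matches the published
    figures; rounding would give 15% for C and 11% for JVM instead.
    """
    total = sum(counts.values())
    return {lang: 100 * n // total for lang, n in counts.items()}


percentages = shares(counts)
```

This also explains the 0% shown for Lisp: 348 out of 35110 is just under 1%, and truncation drops it to zero.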
-Those numbers don't really match
-[[https://github.com/languages][what GitHub gave]], but it could be
-explained by the way I've selected my users.
-
-*** Country
-
-- United States: 19861 (36%)
-- United Kingdom: 3533 (6%)
-- Germany: 3009 (5%)
-- Canada: 2657 (4%)
-- Brazil: 2454 (4%)
-- France: 1833 (3%)
-- Japan: 1799 (3%)
-- Russia: 1604 (2%)
-- Australia: 1441 (2%)
-- China: 1159 (2%)
-
-The United States is still the most represented country on GitHub, no
-surprise here.
-
-If you are interested in the "geography" of Open Source, you should read
-these two articles: [[http://takhteyev.org/dissertation/][Coding
-Places]] and
-[[http://takhteyev.org/papers/Takhteyev-Hilts-2010.pdf][Investigating
-the Geography of Open Source Software through GitHub]].
-
-*** Companies
-
-Looking at the "company" field on users' profiles, here are some stats
-about which companies have employees using GitHub:
-
-- ThoughtWorks: 102
-- Google: 66
-- Mozilla: 65
-- Yahoo!: 65
-- Red Hat: 64
-- Globo.com: 55
-- Twitter: 53
-- Facebook: 45
-- Yandex: 43
-- Intridea: 34
-- Microsoft: 33
-- Engine Yard: 32
-- Pivotal Labs: 29
-- MIT: 28
-- Rackspace: 27
-- IBM: 24
-- Caelum: 23
-- Novell: 22
-- GitHub: 22
-- VMware: 22
-
-I didn't know the first company, ThoughtWorks, and I was expecting to
-see Facebook or Twitter as the company with the most developers on GitHub.
-It's also interesting to see Yandex here.
-
-** Global graph (1628 nodes, 9826 edges)
-
-([[http://maps.stargit.net/global/global.pdf][download PDF]],
-[[http://maps.stargit.net/global/global.gdf][download GDF]])
-
-The main difference from last year is the Android / modders community.
-They're developing mostly in C and Java. The poster has been created
-from this map.
-
-** Ruby (1968 nodes, 9662 edges)
-
-([[http://maps.stargit.net/ruby/ruby.pdf][download PDF]],
-[[http://maps.stargit.net/ruby/ruby.gdf][download GDF]],
-[[http://maps.stargit.net/ruby/ruby.gexf][download GEXF]])
-
-This is still the main community on GitHub, even if JavaScript is now
-[[https://github.com/languages/JavaScript][the most popular language]].
-This graph is really dense and not easy to read, since there is no
-real cluster in this one.
-
-** Python (1062 nodes, 2631 edges)
-
-([[http://maps.stargit.net/python/python.pdf][download PDF]],
-[[http://maps.stargit.net/python/python.gdf][download GDF]])
-
-Here we have some clusters. I'm not familiar with the Python community,
-so I can't really give any insight.
-
-** Perl (608 nodes, 2967 edges)
-
-([[http://maps.stargit.net/perl/perl.pdf][download PDF]],
-[[http://maps.stargit.net/perl/perl.gdf][download GDF]],
-[[http://maps.stargit.net/perl/perl.gexf][download GEXF]])
-
-I really like this graph since it shows (in my opinion) one of the real
-strengths of this community: everybody works with everybody. People
-working on a web framework will collaborate with people working on Moose,
-or an ORM, or other tools. It shows that in this community, people are
-competent in more than one field.
-
-The Perl community is about the same size as last year. However, we can
-extract the following information:
-
-- the Japanese Perl hackers are still a cluster by themselves
-- [[http://github.com/miyagawa][miyagawa]] is still the glue between
- the Japanese community and the "rest of the world"
-- other leaders are: Florian Ragwitz
- ([[http://github.com/rafl][rafl]]), Andy Armstrong
- ([[http://github.com/andya][AndyA]]), Dave Rolsky
- ([[http://github.com/autarch][autarch]])
-- some clusters exist for Moose and Dancer.
-
-As we can see on the previous charts, the number of created accounts for
-the Perl developers is stalling.
-
-** United States (2646 nodes, 11344 edges)
-
-([[http://maps.startgit.net/unitedstates/unitedstates.pdf][download
-PDF]],
-[[http://maps.startgit.net/unitedstates/unitedstates.gdf][download
-GDF]],
-[[http://maps.startgit.net/unitedstates/unitedstates.gexf][download
-GEXF]])
-
-This one is really nice. We can clearly see all the communities. There
-is something interesting:
-
-- C and Ruby are on the opposite side (C on the left, Ruby on the
- right)
-- Python and Perl are also opposed (Perl at the bottom and Python at
- the top)
-
-I'll let you draw your own conclusions on this one ;)
-
-** France (706 nodes, 1059 edges)
-
-([[http://maps.stargit.net/france/france.pdf][download PDF]],
-[[http://maps.stargit.net/france/france.gdf][download GDF]],
-[[http://maps.stargit.net/france/france.gexf][download GEXF]])
-
-We have a lot of small clusters on this one, and some very big
-authorities.
-
-** Japan (464 nodes, 1091 edges)
-
-([[http://maps.stargit.net/japan/japan.pdf][download PDF]],
-[[http://maps.stargit.net/japan/japan.gdf][download GDF]],
-[[http://maps.stargit.net/japan/japan.gexf][download GEXF]])
-
-There are three dominant clusters on this one:
-
-- Ruby
-- Perl
-- C
-
-The Ruby and Perl ones are well connected. There are a lot of Japanese
-hackers on CPAN using both languages.
-
-** StarGit
-
-[[http://stargit.net][StarGit]] is a great tool Alexis and I built to
-let you explore *your* community on GitHub. You can read more about the
-application on
-[[http://ofnodesandedges.com/2011/06/20/stargit.html][Alexis' blog]].
-
-It's hosted on [[http://dotcloud.com][dotcloud]] (I'm still amazed at
-how easy it was to deploy the code ...), using the Perl
-[[http://perldancer.org][Dancer web framework]], MongoDB to store the
-data, and Redis to do some caching.
-
-** Credits
-
-I would like to thank the whole GitHub team for being interested in the
-previous poster and for asking for another one this year :)
-
-A *huge* thanks to Alexis for his help on building the awesome StarGit.
-Another big thanks to Antonin for his work on the poster.