author	Franck Cuny <franckcuny@gmail.com>	2016-07-31 10:16:40 -0700
committer	Franck Cuny <franckcuny@gmail.com>	2016-07-31 13:42:48 -0700
commit	63f413891d5adc596e4d51dfba4d0d23fdea3ca4 (patch)
tree	c2726b60515057a20f434bd89c596360ef17852b /content/post/2009-06-06-modules-i-like-web-scraper.md
parent	Add Google Analytic tracker. (diff)
download	lumberjaph-63f413891d5adc596e4d51dfba4d0d23fdea3ca4.tar.gz
Stop generating a static site.
Diffstat (limited to 'content/post/2009-06-06-modules-i-like-web-scraper.md')
-rw-r--r--	content/post/2009-06-06-modules-i-like-web-scraper.md	100
1 files changed, 0 insertions, 100 deletions
diff --git a/content/post/2009-06-06-modules-i-like-web-scraper.md b/content/post/2009-06-06-modules-i-like-web-scraper.md
deleted file mode 100644
index ba383d1..0000000
--- a/content/post/2009-06-06-modules-i-like-web-scraper.md
+++ /dev/null
@@ -1,100 +0,0 @@
----
-date: 2009-06-06T00:00:00Z
-summary: In which I talk about Web::Scraper.
-title: Modules I like Web::Scraper
----
-
-For [$work](http://rtgi.fr) I need to write scrapers. It used to be boring and painful, but thanks to [miyagawa](http://search.cpan.org/~miyagawa/), this is not true anymore. [Web::Scraper](http://search.cpan.org/perldoc?Web::Scraper) offers a nice API: you can write your rules using XPath or CSS selectors, chain rules, and the syntax stays simple.
-
-I wanted to export the data from my last.fm account, but there is no API for this, so I have to scrape it. All the data is available [as a web page](http://www.last.fm/user/franckcuny/tracks) that lists your tracks. The scraper needs to find how many pages there are, then extract the list of plays from each page.
-
-For the total number of pages, it's easy. Let's take a look at the HTML code and search for something like this:
-
-```html
-<a class="lastpage" href="/user/franckcuny/tracks?page=272">272</a>
-```
-
-The information is in the link with the class **lastpage**.
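-
-To test this first rule in isolation, a minimal script could look like this (assuming the tracks URL shown above):
-
-```perl
-use Web::Scraper;
-use URI;
-
-# a scraper with a single rule: grab the text of the "lastpage" link
-my $pages = scraper {
-    process 'a[class="lastpage"]', 'last' => 'TEXT';
-};
-
-my $res = $pages->scrape(URI->new('http://www.last.fm/user/franckcuny/tracks'));
-print $res->{last}, "\n";    # with the HTML above, this would print 272
-```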
-
-Now we need to find our data: the artist name, the song title and the date I played the song.
-
-All this data is in a **table**: each entry is a **tr**, and each field is in a **td**.
-
-```html
-<tr id="r9_1580_1920248170" class="odd">
-[...]
- <td class="subjectCell">
- <a href="/music/Earth">Earth</a>
- <a href="/music/Earth/_/Sonar+and+Depth+Charge">Sonar and Depth Charge</a>
- </td>
-[...]
-<td class="dateCell last">
- <abbr title="2009-05-13T15:18:25Z">13 May 3:18pm</abbr>
-</td>
-```
-
-It's simple: information about a song is stored in a **subjectCell**, with the artist and the song title each in an **a** tag. The date is in a **dateCell**, and we need the **title** attribute of the **abbr** tag.
-
-The scraper we need to write is:
-
-```perl
-my $scrap = scraper {
- process 'a[class="lastpage"]', 'last' => 'TEXT';
- process 'tr', 'songs[]' => scraper {
- process 'abbr', 'date' => '@title';
- process 'td[class="subjectCell"]', 'song' => scraper {
- process 'a', 'info[]' => 'TEXT';
- };
- }
-};
-```
-
-The first rule extracts the total number of pages. The second iterates over each **tr** and stores the content in an array named **songs**. Each **tr** needs to be scraped in turn: we look at the **abbr** tag and store its **title** attribute in **date**, then we look for the song and artist information in the **td** with the class **subjectCell** and extract the text of all its links.
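-
-The result of a scrape is a plain hash reference; with the HTML above, it should look roughly like this:
-
-```perl
-{
-    last  => '272',
-    songs => [
-        {
-            date => '2009-05-13T15:18:25Z',
-            song => { info => [ 'Earth', 'Sonar and Depth Charge' ] },
-        },
-        # one entry per tr; rows without an abbr or a subjectCell
-        # (like the header row) have no date or song key
-    ],
-}
-```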
-
-Our final script will look like this:
-
-```perl
-#!/usr/bin/perl -w
-use strict;
-use feature ':5.10';
-
-use Web::Scraper;
-use URI;
-use IO::All -utf8;
-
-my $username = shift;
-my $output = shift;
-
-my $scrap = scraper {
- process 'a[class="lastpage"]', 'last' => 'TEXT';
- process 'tr', 'songs[]' => scraper {
- process 'abbr', 'date' => '@title';
- process 'td[class="subjectCell"]', 'song' => scraper {
- process 'a', 'info[]' => 'TEXT';
- };
- }
-};
-
-my $url = "http://www.last.fm/user/" . $username . "/tracks?page=";
-scrap_lastfm(1);
-
-sub scrap_lastfm {
- my $page = shift;
- my $scrap_uri = $url . $page;
- say $scrap_uri;
- my $res = $scrap->scrape(URI->new($scrap_uri));
- my $lastpage = $res->{last};
-  foreach my $record (@{$res->{songs}}) {
-    # rows without song data (like the header row) would crash the join, skip them
-    next unless $record->{song} && $record->{date};
-    my $line = join("\t", @{$record->{song}->{info}}, $record->{date});
-    $line . "\n" >> io $output;
-  }
- $page++;
- scrap_lastfm($page) if $page <= $lastpage;
-}
-```
-
-You can use this script like this:
-
-```bash
-% perl lastfmscraper.pl franckcuny store_data.txt
-```