summaryrefslogtreecommitdiff
path: root/posts/2012-02-17-HTTP_requests_with_python.org
diff options
context:
space:
mode:
Diffstat (limited to '')
-rw-r--r--posts/2012-02-17-HTTP_requests_with_python.org202
1 files changed, 202 insertions, 0 deletions
diff --git a/posts/2012-02-17-HTTP_requests_with_python.org b/posts/2012-02-17-HTTP_requests_with_python.org
new file mode 100644
index 0000000..d0e0370
--- /dev/null
+++ b/posts/2012-02-17-HTTP_requests_with_python.org
@@ -0,0 +1,202 @@
+** Hey! I'm alive!
+
+I've started to write some Python for work, and since I'm new at the
+game, I've decided to start using it for some personal project too.
+
+Most of what I do is related to web stuff: writing API, API client, web
+framweork, etc. At [[http://www.saymedia.com/][Say]] I'm working on our
+platform. Nothing fancy, but really interesting (at least to me) and
+challenging work (and we're recruting, drop me a mail if you want to
+know more).
+
+** Writing HTTP requests with Python
+
+*** httplib
+
+[[http://docs.python.org/library/httplib.html][httplib]] is part of the
+standard library. The documentation says: /It is normally not used
+directly/. And when you look at the API you understand why: it's very
+low-level. It uses the HTTPMessage library (not documented, and not
+easily accessible). It will return an HTTPResponse object, but again, no
+documentation, and poor interface.
+
+*** httplib2
+
+[[http://code.google.com/p/httplib2/][httplib2]] is a very popular
+library for writing HTTP request with Python. It's the one used by
+Google for it's
+[[http://code.google.com/p/google-api-python-client/][google-api-python-client]]
+library. There's absolutly nothing in common between httplib's API and
+this one.
+
+I dont like it's API: the way the library handles the *Response* object
+seems wrong to me. You should get one object for the response, not a
+tuple with the response and the content. The request should also be an
+object. Also, The status code is considered as a header, and you lose
+the message that comes with the status.
+
+There is also an important issue with httplib2 that we discovered at
+work. In some case, if there is an error, httplib2 will retry the
+request. That means, in the case of a POST request, it will send twice
+the payload. There is
+[[http://code.google.com/p/httplib2/issues/detail?id=124][a ticket that
+ask to fix that]], marked as *won't fix*.
+[[http://codereview.appspot.com/4365054/][Even when there is a perfectly
+acceptable patch for this issue.]] (it's a
+[[https://www.destroyallsoftware.com/talks/wat][WAT]] moment). I'm
+really curious to know what was the motiviation behind this, because it
+doesn'nt makes sense at all. Why would you want your client to retry
+twice your request if it fails ?
+
+*** urllib
+
+[[http://docs.python.org/library/urllib.html][urllib]] is also part of
+the standard library. I was suprised, because given the name, I was
+expecting a lib to /manipulate/ an URL. And indeed, it also does that!
+This library mix too many different things.
+
+*** urllib2
+
+[[http://docs.python.org/library/urllib2.html][urllib2]] And because 2
+is not enough, also ...
+
+*** urllib3
+
+[[http://code.google.com/p/urllib3/][urllib3]]. I thought for a moment
+that, maybe, the number number was related to the version of Python.
+I'll spare you the suspense, it's not the case. Now I would have
+expected them to be related to each other (sharing some common API, the
+number being just a way to provides a better API than the previous
+version). Sadly it's not the case, they all implement different API.
+
+At least, urllib3 has some interesting features:
+
+- Thread-safe connection pooling and re-using with HTTP/1.1 keep-alive
+- HTTP and HTTPS (SSL) support
+
+*** request
+
+A few persons pointed me to
+[[http://pypi.python.org/pypi/requests][requests]]. And indeed, this one
+is the nicest of all. Still, not exactly what /I/'m looking for. This
+library looks like
+[[https://metacpan.org/module/LWP::Simple][LWP::Simple]], a library
+build on top of various HTTP components to help you for the common case.
+For most of the developers it will be fine and do the work as intented.
+
+** What I want
+
+Since I'm primarly a Perl developer (here is were 99% of the readers are
+leaving the page), I've been using
+[[https://metacpan.org/module/LWP][LWP]] and HTTP::Messages for more
+than 8 years. LWP is an awesome library. It's 16 years old, and it's
+still actively developed by it's original author
+[[https://metacpan.org/author/GAAS][Gisle Aas]]. He deserves a lot of
+respect for his dedication.
+
+There is a few other library in Perl to do HTTP request, like:
+
+- [[https://metacpan.org/module/AnyEvent::HTTP][AnyEvent::HTTP]]: if
+ you need to do asynchronous call
+- [[https://metacpan.org/module/Furl][Furl]]: by Tokuhiro and his
+ yakuza gang
+
+but most of the time, you end up using LWP with HTTP::Messages.
+
+One of the reason this couple is so popular is because it provides the
+right abstraction:
+
+- a user-agent is provided by LWP::UserAgent (that you can easily
+ extends to build some custom useragent)
+- a Response class to encapsulates HTTP style responses, provided by
+ HTTP::Message
+- a Request class to encapsulates HTTP style request, provided by
+ HTTP::Message
+
+The response and request objects use HTTP::Headers and HTTP::Cookies.
+This way, even if your building a web framework and not a HTTP client,
+you'll endup using HTTP::Headers and HTTP::Cookies since they provide
+the right API, they're well tested, and you only have to learn one API,
+wether you're in an HTTP client or a web framework.
+
+** http
+
+So now you start seeing where I'm going. And you're saying "ho no, don't
+tell me you're writing /another/ HTTP library". Hell yeah, I am (sorry,
+Masa). But to be honest, I doubt you'll ever use it. It's doing the job
+/I/ want, the way /I/ want. And it's probably not what you're expecting.
+
+[[http://git.lumberjaph.net/py-http.git/][http]] is providing an
+abstraction for the following things:
+
+- http.headers
+- http.request
+- http.response
+- http.date
+- http.url (by my good old friend "bl0b":https://github.com/bl0b)
+
+I could have named it *httplib3*, but *http* seems a better choice: it's
+a library that deals with the HTTP protocol and provide abstraction on
+top of it.
+
+You can found the
+[[http://http.readthedocs.org/en/latest/index.html][documentation here]]
+and install it from [[http://pypi.python.org/pypi/http/][PyPI]].
+
+*** examples
+
+A few examples
+
+#+BEGIN_SRC python
+ >>> from http import Request
+ >>> r = Request('GET', 'http://lumberjaph.net')
+ >>> print r.method
+ GET
+ >>> print r.url
+ http://lumberjaph.net
+ >>> r.headers.add('Content-Type', 'application/json')
+ >>> print r.headers
+ Content-Type: application/json
+
+
+ >>>
+#+END_SRC
+
+#+BEGIN_SRC python
+ >>> from http import Headers
+ >>> h = Headers()
+ >>> print h
+
+
+ >>> h.add('X-Foo', 'bar')
+ >>> h.add('X-Bar', 'baz', 'foobarbaz')
+ >>> print h
+ X-Foo: bar
+ X-Bar: baz
+ X-Bar: foobarbaz
+
+
+ >>> for h in h.items():
+ ... print h
+ ...
+ ('X-Foo', 'bar')
+ ('X-Bar', 'baz')
+ ('X-Bar', 'foobarbaz')
+ >>>
+#+END_SRC
+
+*** a client
+
+With this, you can easily build a very simple client combining thoses
+classes, or a more complex one. Or maybe you want to build a web
+framework, or a framework to test HTTP stuff, and you need a class to
+manipulate HTTP headers. Then you can use http.headers. The same if you
+need to create some HTTP responses: http.response.
+
+I've started to write
+[[http://git.lumberjaph.net/py-httpclient.git/][httpclient]] based on
+this library that will mimic LWP's API.
+
+I've started
+[[http://httpclient.readthedocs.org/en/latest/index.html][to document
+this library]] and I hope to put something on PyPI soon.