posts/2012-02-17-HTTP_requests_with_python.org



** Hey! I'm alive!

I've started to write some Python for work, and since I'm new at the
game, I've decided to start using it for some personal projects too.

Most of what I do is related to web stuff: writing APIs, API clients, web
frameworks, etc. At [[http://www.saymedia.com/][Say]] I'm working on our
platform. Nothing fancy, but really interesting (at least to me) and
challenging work (and we're recruiting, drop me a mail if you want to
know more).

** Writing HTTP requests with Python

*** httplib

[[http://docs.python.org/library/httplib.html][httplib]] is part of the
standard library. The documentation says: /It is normally not used
directly/. And when you look at the API you understand why: it's very
low-level. It uses the HTTPMessage library (not documented, and not
easily accessible). It will return an HTTPResponse object, but again,
there is no documentation, and the interface is poor.
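
To give a feel for how low-level this is, here is a minimal sketch of a
single GET. It is written for Python 3, where httplib was renamed
http.client. The throwaway local server, the Hello handler and the fetch
helper are my own scaffolding (not part of the library), there only so
the example runs without touching the network.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Hello(BaseHTTPRequestHandler):
    """Tiny local server used only to have something to talk to."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch(host, port, path="/"):
    # Every step is explicit: open the connection, send the request,
    # fetch the response object, read the body, close the connection.
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.reason, resp.getheader("Content-Type"), resp.read()
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), Hello)  # port 0: let the OS pick one
threading.Thread(target=server.serve_forever, daemon=True).start()
result = fetch("127.0.0.1", server.server_port)
server.shutdown()
server.server_close()
```

Nothing here is hard, but you can see why nobody wants to write this by
hand for every request.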

*** httplib2

[[http://code.google.com/p/httplib2/][httplib2]] is a very popular
library for writing HTTP requests with Python. It's the one used by
Google for its
[[http://code.google.com/p/google-api-python-client/][google-api-python-client]]
library. There's absolutely nothing in common between httplib's API and
this one.

I don't like its API: the way the library handles the *Response* object
seems wrong to me. You should get one object for the response, not a
tuple with the response and the content. The request should also be an
object. Also, the status code is considered as a header, and you lose
the message that comes with the status.
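
To make the complaint concrete, here is a rough sketch of what I mean by
"one object for the response". Everything in it is hypothetical: the
Response class and the from_pair helper are names I made up, and the
pair at the bottom only imitates the shape described above; it is not
produced by httplib2.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    """Hypothetical response object: status, reason, headers and body together."""
    status: int
    reason: str
    headers: dict = field(default_factory=dict)
    content: bytes = b""

    @property
    def status_line(self):
        return "%d %s" % (self.status, self.reason)

    @property
    def is_success(self):
        return 200 <= self.status < 300


def from_pair(header_map, content, reason=""):
    # Pull the status out of the header mapping so that it stops
    # pretending to be a header; everything else stays a real header.
    headers = {k: v for k, v in header_map.items() if k != "status"}
    return Response(int(header_map["status"]), reason, headers, content)


# A made-up (headers, content) pair, with the status stored as a header.
pair = ({"status": "404", "content-type": "text/plain"}, b"nope")
resp = from_pair(*pair, reason="Not Found")
```

With a single object, the caller can ask for resp.status_line or
resp.is_success without having to remember which half of a tuple holds
what.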

There is also an important issue with httplib2 that we discovered at
work. In some cases, if there is an error, httplib2 will retry the
request. That means, in the case of a POST request, it will send the
payload twice. There is
[[http://code.google.com/p/httplib2/issues/detail?id=124][a ticket
asking to fix that]], marked as *won't fix*,
[[http://codereview.appspot.com/4365054/][even though there is a perfectly
acceptable patch for this issue]] (it's a
[[https://www.destroyallsoftware.com/talks/wat][WAT]] moment). I'm
really curious to know what the motivation behind this was, because it
doesn't make sense at all. Why would you want your client to silently
send your request a second time if it fails?
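
Here is a small sketch of the behaviour I would expect instead: retry
only the methods that HTTP defines as idempotent, and never replay a
POST. This is not httplib2's code. The send_with_retry function and the
FlakyTransport class are hypothetical, and the transport is a fake that
fails on its first call so the effect can be observed without a network.

```python
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS"}


def send_with_retry(send, method, url, body=None, retries=1):
    """Call send(method, url, body), retrying only idempotent methods."""
    attempts = retries + 1 if method.upper() in IDEMPOTENT_METHODS else 1
    for attempt in range(attempts):
        try:
            return send(method, url, body)
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide what to do


class FlakyTransport:
    """Fake transport: records every request, fails on the first call."""

    def __init__(self):
        self.received = []

    def __call__(self, method, url, body):
        self.received.append((method, body))
        if len(self.received) == 1:
            raise ConnectionError("connection dropped after the payload was sent")
        return 200


# A GET is retried, so the server sees it twice and the caller gets a 200.
get_transport = FlakyTransport()
get_status = send_with_retry(get_transport, "GET", "http://example.invalid/")

# A POST is attempted once; the error reaches the caller instead of
# the payload being sent a second time behind their back.
post_transport = FlakyTransport()
try:
    send_with_retry(post_transport, "POST", "http://example.invalid/", b"payload")
    post_failed = False
except ConnectionError:
    post_failed = True
```

The point is that the decision to replay a payload belongs to the
caller, who knows whether the operation is safe to repeat.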

*** urllib

[[http://docs.python.org/library/urllib.html][urllib]] is also part of
the standard library. I was surprised, because given the name, I was
expecting a lib to /manipulate/ a URL. And indeed, it also does that!
This library mixes too many different things.
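
For the record, this is the kind of thing I expected from the name:
pure URL manipulation, with no network involved. The sketch below uses
Python 3, where these helpers live in urllib.parse and the fetching
code was moved to urllib.request. The with_query function is a helper
of my own, not something the library provides.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_query(url, **params):
    """Return url with the given query parameters added or replaced."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    query.update({name: [str(value)] for name, value in params.items()})
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


parts = urlsplit("http://lumberjaph.net/posts?page=2#top")
# parts.netloc is 'lumberjaph.net', parts.path is '/posts',
# parts.query is 'page=2' and parts.fragment is 'top'

updated = with_query("http://lumberjaph.net/posts?page=2", lang="en")
# updated is 'http://lumberjaph.net/posts?page=2&lang=en'
```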

*** urllib2

[[http://docs.python.org/library/urllib2.html][urllib2]]. And because two
is not enough, there is also ...

*** urllib3

[[http://code.google.com/p/urllib3/][urllib3]]. I thought for a moment
that, maybe, the number was related to the version of Python.
I'll spare you the suspense, it's not the case. Now I would have
expected them to be related to each other (sharing some common API, the
number being just a way to provide a better API than the previous
version). Sadly it's not the case, they each implement a different API.

At least, urllib3 has some interesting features:

-  Thread-safe connection pooling and re-using with HTTP/1.1 keep-alive
-  HTTP and HTTPS (SSL) support

*** requests

A few people pointed me to
[[http://pypi.python.org/pypi/requests][requests]]. And indeed, this one
is the nicest of all. Still, not exactly what /I/'m looking for. This
library looks like
[[https://metacpan.org/module/LWP::Simple][LWP::Simple]], a library
built on top of various HTTP components to help you with the common case.
For most developers it will be fine and do the job as intended.

** What I want

Since I'm primarily a Perl developer (this is where 99% of the readers
leave the page), I've been using
[[https://metacpan.org/module/LWP][LWP]] and HTTP::Message for more
than 8 years. LWP is an awesome library. It's 16 years old, and it's
still actively developed by its original author
[[https://metacpan.org/author/GAAS][Gisle Aas]]. He deserves a lot of
respect for his dedication.

There are a few other libraries in Perl for making HTTP requests, like:

-  [[https://metacpan.org/module/AnyEvent::HTTP][AnyEvent::HTTP]]: if
   you need to do asynchronous calls
-  [[https://metacpan.org/module/Furl][Furl]]: by Tokuhiro and his
   yakuza gang

but most of the time, you end up using LWP with HTTP::Message.

One of the reasons this pair is so popular is that it provides the
right abstractions:

-  a user-agent is provided by LWP::UserAgent (that you can easily
   extend to build a custom user-agent)
-  a Response class to encapsulate HTTP-style responses, provided by
   HTTP::Message
-  a Request class to encapsulate HTTP-style requests, provided by
   HTTP::Message

The response and request objects use HTTP::Headers and HTTP::Cookies.
This way, even if you're building a web framework and not an HTTP client,
you'll end up using HTTP::Headers and HTTP::Cookies since they provide
the right API, they're well tested, and you only have to learn one API,
whether you're in an HTTP client or a web framework.
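
Translated to Python, the shape I have in mind looks roughly like the
sketch below. It is only an illustration of the layering: the Headers,
Message, Request and Response classes are hypothetical names with
made-up methods, not the API of HTTP::Message or of any existing Python
library.

```python
class Headers:
    """Hypothetical multi-valued, case-insensitive header container."""

    def __init__(self):
        self._items = []  # (name, value) pairs, in insertion order

    def add(self, name, *values):
        for value in values:
            self._items.append((name, value))

    def get_all(self, name):
        wanted = name.lower()
        return [v for n, v in self._items if n.lower() == wanted]

    def items(self):
        return list(self._items)

    def __str__(self):
        return "".join("%s: %s\r\n" % item for item in self._items)


class Message:
    """What requests and responses have in common: headers and a body."""

    def __init__(self, headers=None, content=b""):
        self.headers = headers if headers is not None else Headers()
        self.content = content


class Request(Message):
    def __init__(self, method, url, headers=None, content=b""):
        super().__init__(headers, content)
        self.method = method
        self.url = url


class Response(Message):
    def __init__(self, status, reason, headers=None, content=b""):
        super().__init__(headers, content)
        self.status = status
        self.reason = reason


# A client builds a request ...
req = Request("GET", "http://lumberjaph.net")
req.headers.add("Accept", "application/json")

# ... a web framework builds a response, with the very same Headers API.
res = Response(200, "OK")
res.headers.add("Set-Cookie", "a=1", "b=2")
```

The client side and the server side never need to know about each
other; they only share the message and header classes.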

** http

So now you start seeing where I'm going. And you're saying "oh no, don't
tell me you're writing /another/ HTTP library". Hell yeah, I am (sorry,
Masa). But to be honest, I doubt you'll ever use it. It's doing the job
/I/ want, the way /I/ want. And it's probably not what you're expecting.

[[http://git.lumberjaph.net/py-http.git/][http]] provides
abstractions for the following things:

-  http.headers
-  http.request
-  http.response
-  http.date
-  http.url (by my good old friend [[https://github.com/bl0b][bl0b]])

I could have named it *httplib3*, but *http* seems a better choice: it's
a library that deals with the HTTP protocol and provides abstractions on
top of it.

You can find the
[[http://http.readthedocs.org/en/latest/index.html][documentation here]]
and install it from [[http://pypi.python.org/pypi/http/][PyPI]].

*** examples

A few examples

#+BEGIN_SRC python
    >>> from http import Request
    >>> r = Request('GET', 'http://lumberjaph.net')
    >>> print r.method
    GET
    >>> print r.url
    http://lumberjaph.net
    >>> r.headers.add('Content-Type', 'application/json')
    >>> print r.headers
    Content-Type: application/json


    >>>
#+END_SRC

#+BEGIN_SRC python
    >>> from http import Headers
    >>> h = Headers()
    >>> print h


    >>> h.add('X-Foo', 'bar')
    >>> h.add('X-Bar', 'baz', 'foobarbaz')
    >>> print h
    X-Foo: bar
    X-Bar: baz
    X-Bar: foobarbaz


    >>> for item in h.items():  # don't reuse the name h for the loop variable
    ...     print item
    ...
    ('X-Foo', 'bar')
    ('X-Bar', 'baz')
    ('X-Bar', 'foobarbaz')
    >>>
#+END_SRC

*** a client

With this, you can easily build a very simple client combining those
classes, or a more complex one. Or maybe you want to build a web
framework, or a framework to test HTTP stuff, and you need a class to
manipulate HTTP headers. Then you can use http.headers. The same goes if
you need to create some HTTP responses: use http.response.

I've started to write
[[http://git.lumberjaph.net/py-httpclient.git/][httpclient]], a client
based on this library that mimics LWP's API.
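
To show the general idea of an LWP-style user-agent (a request object
goes in, a response object comes out, and the transport can be
overridden), here is a rough sketch in Python 3 on top of the standard
library. It is not httpclient's code or API: UserAgent, CannedAgent and
the two namedtuples are hypothetical names picked for this example.

```python
import http.client
from collections import namedtuple
from urllib.parse import urlsplit

Request = namedtuple("Request", "method url headers body")
Response = namedtuple("Response", "status reason headers body")


class UserAgent:
    """Hypothetical LWP-flavoured client: takes a Request, returns a Response."""

    def __init__(self, agent="sketch-agent/0.1", timeout=10):
        self.agent = agent
        self.timeout = timeout

    def prepare(self, request):
        # Add default headers without clobbering the caller's own.
        headers = {"User-Agent": self.agent}
        headers.update(request.headers)
        return request._replace(headers=headers)

    def send(self, request):
        # Transport step, kept separate so a subclass can replace it.
        parts = urlsplit(request.url)
        conn_class = (http.client.HTTPSConnection if parts.scheme == "https"
                      else http.client.HTTPConnection)
        conn = conn_class(parts.netloc, timeout=self.timeout)
        try:
            path = parts.path or "/"
            if parts.query:
                path += "?" + parts.query
            conn.request(request.method, path,
                         body=request.body, headers=request.headers)
            raw = conn.getresponse()
            return Response(raw.status, raw.reason, raw.getheaders(), raw.read())
        finally:
            conn.close()

    def request(self, request):
        return self.send(self.prepare(request))

    def get(self, url, headers=None):
        return self.request(Request("GET", url, headers or {}, None))


class CannedAgent(UserAgent):
    """Custom agent that overrides the transport, so no network is needed."""

    def send(self, request):
        self.last_request = request
        return Response(200, "OK", [("Content-Type", "text/plain")], b"pong")


ua = CannedAgent(agent="demo/1.0")
res = ua.get("http://example.invalid/ping", headers={"Accept": "text/plain"})
```

Subclassing to swap the transport is the same trick that makes
LWP::UserAgent so easy to extend and to test.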

I've started
[[http://httpclient.readthedocs.org/en/latest/index.html][to document
this library]] and I hope to put something on PyPI soon.