pants.http.client¶
pants.http.client implements a basic asynchronous HTTP client on top of
Pants with an API modelled after that of the wonderful requests library.
The client supports
keep-alive and SSL for connections, domain verification for SSL certificates,
basic WWW authentication, sessions with persistent cookies, automatic redirect
handling, automatic decompression of responses, connection timeouts, file
uploads, and saving large responses to temporary files to decrease
memory usage.
Logic is implemented using a series of request handlers.
Making Requests¶
Making a request is simple: all you need is an instance of HTTPClient.
from pants.http import HTTPClient
client = HTTPClient()
Like with requests, there are simple methods for making requests with the different HTTP methods. For now, let’s get information for a bunch of Pants’ commits on GitHub.
client.get("https://api.github.com/repos/ecdavis/pants/commits")
You’ll notice that this is very similar to making a request with requests.
However, we do not get a response object back. Calling HTTPClient.get()
actually returns an instance of HTTPRequest rather than anything to do with
a response, but we’ll get to that later.
The Pants HTTP client is asynchronous, so to get your response, you need a
response handler. There are several ways to set one up, but the easiest way is
to pass it to your HTTPClient during initialization.
def handle_response(response):
    if response.status_code != 200:
        print "There was a problem!"

client = HTTPClient(handle_response)
Here, response is an instance of HTTPResponse, and it has an API modelled
after the response objects that requests would give you.
Making Useful Requests¶
Basic GET requests are nice, but you’ll often want to send data to the server. For query parameters you can use the optional params argument of the various request methods, like so:
data = {'since': '2013-11-01'}
client.get("https://api.github.com/repos/ecdavis/pants/commits", params=data)
Once the response arrives, its url attribute shows the full URL that was requested, query string included.
>>> response.url
'https://api.github.com/repos/ecdavis/pants/commits?since=2013-11-01'
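How a params dictionary becomes part of the final URL can be sketched with the standard library alone. The helper below is not Pants' implementation; `build_url` is a hypothetical name used only to illustrate the encoding step.

```python
try:
    from urllib import urlencode  # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3


def build_url(url, params=None):
    """Hypothetical helper: append query parameters to a URL."""
    if not params:
        return url
    # Accept either a ready-made query string or a dictionary.
    query = params if isinstance(params, str) else urlencode(params)
    # Use "&" if the URL already carries a query string.
    separator = "&" if "?" in url else "?"
    return url + separator + query


url = build_url(
    "https://api.github.com/repos/ecdavis/pants/commits",
    {"since": "2013-11-01"},
)
```

The resulting string matches the response.url value shown above.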
You can also post data to the server, either as a pre-made string, or as a dictionary of values to be encoded.
client.post("http://httpbin.org/post", data="Hello World!")
client.post("http://httpbin.org/post", data={"greeting": "Hello"})
By default, the Content-Type header will be set to
application/x-www-form-urlencoded when you provide data for the request
body. If any files are present, it will instead default to
multipart/form-data to transmit those. You can also manually set the
header when making your request.
You set files via the files parameter, which expects a dictionary of form field names and file objects. You can also provide filenames if desired.
client.post("http://httpbin.org/post", files={'file': open("test.txt")})
client.post("http://httpbin.org/post", files={'file': ("test.txt", open("test.txt"))})
You can, of course, use data and files together. Please note that, if you do use them together, you’ll need to supply data as a dictionary. Data strings are not supported.
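The Content-Type defaults described above can be summarised as a small decision function. This is only a sketch of the documented rules: `default_content_type` is a hypothetical helper, and raising ValueError for a data string combined with files is an assumption about how such a case might be reported.

```python
FORM = "application/x-www-form-urlencoded"
MULTIPART = "multipart/form-data"


def default_content_type(data=None, files=None, headers=None):
    """Hypothetical helper mirroring the defaults described above."""
    headers = headers or {}
    # A Content-Type header set by the caller always wins.
    for name, value in headers.items():
        if name.lower() == "content-type":
            return value
    if files:
        if data is not None and not isinstance(data, dict):
            # Assumption: the real client may signal this differently.
            raise ValueError("data must be a dictionary when files are given")
        return MULTIPART
    if data is not None:
        return FORM
    return None
```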
As many of you have probably noticed, this is very similar to using requests. The Pants API was implemented this way to make it easier to switch between the two libraries.
Reading Responses¶
Making your request is only half the battle, of course. You have to read your response when it comes in. And, for that, you start with the status code.
>>> response.status_code
200
>>> response.status_text
'OK'
>>> response.status
'200 OK'
Unlike with requests, there is no raise_for_status() method available.
Raising a strange exception in an asynchronous framework that your code isn’t
designed to catch just wouldn’t work.
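Without raise_for_status(), a response handler has to inspect the status itself. The sketch below uses a stand-in class, `FakeResponse`, that only mimics the three status attributes shown above; `check_status` is likewise a hypothetical helper.

```python
class FakeResponse(object):
    """Stand-in for HTTPResponse, for illustration only."""

    def __init__(self, status_code, status_text):
        self.status_code = status_code
        self.status_text = status_text
        self.status = "%d %s" % (status_code, status_text)


def check_status(response):
    """Return an error message for 4xx/5xx responses, else None."""
    if 400 <= response.status_code < 600:
        return "Request failed: " + response.status
    return None


ok = check_status(FakeResponse(200, "OK"))
failed = check_status(FakeResponse(404, "Not Found"))
```

Returning a message rather than raising keeps the error inside the handler, where asynchronous code can act on it.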
Headers¶
HTTP headers are case-insensitive, and so the headers are stored in a special
case-insensitive dictionary made available as HTTPResponse.headers.
>>> response.headers
HTTPHeaders({
    'Content-Length': 986,
    'Server': 'gunicorn/0.17.4',
    'Connection': 'keep-alive',
    'Date': 'Wed, 06 Nov 2013 05:58:53 GMT',
    'Access-Control-Allow-Origin': '*',
    'Content-Type': 'application/json'
})
>>> response.headers['content-length']
986
Nothing special here.
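For readers curious how a case-insensitive dictionary can work, here is a minimal sketch. It is not Pants' HTTPHeaders class; `CaseInsensitiveDict` is a hypothetical name, and unlike the output above it stores keys in lower case rather than preserving their original spelling.

```python
class CaseInsensitiveDict(dict):
    """Minimal sketch; not Pants' actual HTTPHeaders class."""

    def __init__(self, items=None):
        super(CaseInsensitiveDict, self).__init__()
        for key, value in (items or {}).items():
            self[key] = value

    def __setitem__(self, key, value):
        # Normalise every key to lower case on the way in.
        super(CaseInsensitiveDict, self).__setitem__(key.lower(), value)

    def __getitem__(self, key):
        return super(CaseInsensitiveDict, self).__getitem__(key.lower())

    def __contains__(self, key):
        return super(CaseInsensitiveDict, self).__contains__(key.lower())

    def get(self, key, default=None):
        return super(CaseInsensitiveDict, self).get(key.lower(), default)


headers = CaseInsensitiveDict({"Content-Length": 986})
```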
Cookies¶
Cookies are a weak point of Pants’ HTTP client at this time. Cookies are stored
in instances of Cookie.SimpleCookie, which doesn’t handle multiple
domains. Pants has logic to prevent sending cookies to the wrong domains, but
ideally it should move to using a better cookie storage structure in future
versions that handles multiple domains elegantly.
>>> response.cookies['cake']
<Morsel: cake='lie'>
>>> response.cookies['cake'].value
'lie'
As you can see, Pants does not yet handle cookies as well as requests. Setting cookies is a bit better.
client.get("http://httpbin.org/cookies", cookies={"cake": "lie"})
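The Morsel objects shown above come from the standard library, so their behaviour can be explored without Pants. The following sketch uses only `SimpleCookie` (named `Cookie.SimpleCookie` on Python 2 and `http.cookies.SimpleCookie` on Python 3).

```python
try:
    from Cookie import SimpleCookie  # Python 2
except ImportError:
    from http.cookies import SimpleCookie  # Python 3

cookies = SimpleCookie()
cookies["cake"] = "lie"

morsel = cookies["cake"]
# morsel.key is the cookie name, morsel.value its value.
# A Morsel can carry a domain attribute, but SimpleCookie looks cookies
# up by name only, so same-named cookies for two domains would collide.
morsel["domain"] = "httpbin.org"
```

That name-only lookup is the multiple-domain limitation described at the start of this section.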
Redirects¶
The HTTP client will follow redirects automatically. When this happens, the
redirecting responses are stored in the HTTPResponse.history list.
>>> response.history
[<HTTPResponse [301 Moved Permanently] at 0x2C988F0>]
You can limit the number of times the HTTP client will automatically follow
redirects with the max_redirects argument.
client.get("http://github.com/", max_redirects=0)
By default, Pants will follow up to 10 redirects.
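The interplay between max_redirects and history can be illustrated with a small simulation. Everything here is hypothetical: `FAKE_SERVER` stands in for the network, the set of redirect status codes is an assumption, and returning the redirect response itself once the limit is reached is a guess at reasonable behaviour rather than a statement about Pants.

```python
REDIRECT_CODES = (301, 302, 303, 307)

# Hypothetical canned server: URL -> (status code, Location header or None).
FAKE_SERVER = {
    "http://github.com/": (301, "https://github.com/"),
    "https://github.com/": (200, None),
}


def fetch(url, max_redirects=10):
    """Return (final_url, status_code, history) for a simulated request."""
    history = []
    while True:
        status, location = FAKE_SERVER[url]
        if status not in REDIRECT_CODES or location is None:
            return url, status, history
        if max_redirects <= 0:
            # Out of redirects: hand back the redirect response itself.
            return url, status, history
        history.append((status, url))
        max_redirects -= 1
        url = location
```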
Exceptions¶
class pants.http.client.HTTPClientException
    The base exception for all the exceptions used by the HTTP client, aside from CertificateError.

class pants.http.client.MalformedResponse
    The exception returned when the response is malformed in some way.
HTTPClient¶
class pants.http.client.HTTPClient(*args, **kwargs)
    An easy to use, asynchronous HTTP client implementing HTTP 1.1. All arguments passed to HTTPClient are used to initialize the default session. See Session for more details.

    The following is a basic example of using an HTTPClient to fetch a remote resource:

        from pants.http import HTTPClient
        from pants.engine import Engine

        def response_handler(response):
            Engine.instance().stop()
            print response.content

        client = HTTPClient(response_handler)
        client.get("http://httpbin.org/ip")
        Engine.instance().start()

    Groups of requests can have their behavior customized with the use of sessions:

        from pants.http import HTTPClient
        from pants.engine import Engine

        def response_handler(response):
            Engine.instance().stop()
            print response.content

        def other_handler(response):
            print response.content

        client = HTTPClient(response_handler)
        client.get("http://httpbin.org/cookies")

        with client.session(cookies={'pie':'yummy'}):
            client.get("http://httpbin.org/cookies")

        Engine.instance().start()
on_error(response, exception)
    Placeholder. Called when an error occurs.

    =========  ===========
    Argument   Description
    =========  ===========
    exception  An Exception instance with information about the error that occurred.
    =========  ===========
on_headers(response)
    Placeholder. Called when we’ve received headers for a request. You can abort a request at this time by returning False from this function. It must be False, and not simply a false-like value, such as an empty string.

    Note: This function isn’t called for HTTP HEAD requests.

    ========  ===========
    Argument  Description
    ========  ===========
    response  An HTTPResponse instance with information about the received response.
    ========  ===========
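The requirement that the return value be exactly False amounts to an identity check rather than a truthiness check. The sketch below shows the difference; `should_abort`, `MAX_BYTES`, and the simplified `on_headers` that takes a plain dictionary of headers instead of an HTTPResponse are all assumptions made for illustration.

```python
MAX_BYTES = 1024 * 1024  # hypothetical size limit for this example


def on_headers(headers):
    """Simplified handler: takes a plain dict, not an HTTPResponse."""
    length = int(headers.get("Content-Length", 0))
    if length > MAX_BYTES:
        return False  # an explicit False aborts the request
    # Falling through returns None, which does not abort.


def should_abort(result):
    """Abort only on the literal False, not on other falsy values."""
    return result is False
```

With this check, a handler that simply forgets to return a value never aborts a request by accident.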
on_progress(response, received, total)
    Placeholder. Called when progress is made in downloading a response.

    ========  ===========
    Argument  Description
    ========  ===========
    response  An HTTPResponse instance with information about the response.
    received  The number of bytes received thus far.
    total     The total number of bytes expected for the response. This will be 0 if we don’t know how much to expect.
    ========  ===========
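Because total is 0 when the size is unknown, a progress handler should guard against dividing by it. The helper below is a hypothetical formatting function that such a handler might call with the received and total values.

```python
def format_progress(received, total):
    """Describe download progress; total == 0 means the size is unknown."""
    if total > 0:
        percent = 100.0 * received / total
        return "%d/%d bytes (%.1f%%)" % (received, total, percent)
    return "%d bytes (total unknown)" % received
```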
on_response(response)
    Placeholder. Called when a complete response has been received.

    ========  ===========
    Argument  Description
    ========  ===========
    response  An HTTPResponse instance with information about the received response.
    ========  ===========
on_ssl_error(response, certificate, exception)
    Placeholder. Called when the remote server’s SSL certificate failed initial verification. If this method returns True, the certificate will be accepted; otherwise, the connection will be closed and on_error() will be called.

    ===========  ===========
    Argument     Description
    ===========  ===========
    response     An HTTPResponse instance with information about the response. Notably, with the host to expect.
    certificate  A dictionary representing the certificate that wasn’t automatically verified.
    exception    A CertificateError instance with information about the error that occurred.
    ===========  ===========
post(url, data=None, files=None, **kwargs)
    Begin a POST request. See request() for more details.
request(*args, **kwargs)
    Begin a request. Missing parameters will be taken from the active session when available. See Session.request() for more details.
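One way to picture how missing parameters fall back to the active session is a layered merge. This is a sketch under assumptions, not the client's real code: `resolve_options` is a hypothetical helper, None is treated as "not given" because the documented signatures default to None, and the fallback values are the defaults listed for Session.

```python
# Defaults as listed in the Session argument table.
SESSION_DEFAULTS = {"timeout": 30, "max_redirects": 10, "keep_alive": True}


def resolve_options(session_options, **request_options):
    """Hypothetical merge: request values win, then session, then defaults."""
    resolved = dict(SESSION_DEFAULTS)
    # None means "not given", so it never overrides a lower layer.
    resolved.update((k, v) for k, v in session_options.items() if v is not None)
    resolved.update((k, v) for k, v in request_options.items() if v is not None)
    return resolved


options = resolve_options({"timeout": 5}, max_redirects=0)
```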
HTTPRequest¶
class pants.http.client.HTTPRequest(session, method, path, url, headers, cookies, body, timeout, max_redirects, keep_alive, auth)
    A very basic structure for storing HTTP request information.
response
    The HTTPResponse instance representing the response to this request.

method
    The HTTP method of this request, such as GET, POST, or HEAD.

path
    The path of this request.

url
    A tuple containing the full URL of the request, as processed by urlparse.urlparse().
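The shape of that tuple is defined by the standard library rather than by Pants, so it can be inspected directly. The sketch below parses the example URL used earlier in this page (`urlparse.urlparse` on Python 2, `urllib.parse.urlparse` on Python 3).

```python
try:
    from urlparse import urlparse  # Python 2
except ImportError:
    from urllib.parse import urlparse  # Python 3

parts = urlparse(
    "https://api.github.com/repos/ecdavis/pants/commits?since=2013-11-01"
)

# The result behaves like a 6-tuple:
# (scheme, netloc, path, params, query, fragment)
scheme, netloc, path, params, query, fragment = parts
```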
headers
    A dictionary of headers sent with this request.

cookies
    A Cookie.SimpleCookie instance of cookies sent with this request.

body
    A list of strings and files sent as this request’s body.

timeout
    The number of seconds of inactivity to allow before timing out.

max_redirects
    The number of redirects that may still be followed automatically.

keep_alive
    Whether or not the connection should be reused after this request.

auth
    Either a tuple of (username, password) or an instance of AuthBase responsible for authorizing this request with the server.
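For basic authentication, a (username, password) pair ultimately becomes an Authorization header. How Pants performs that conversion is not shown here; the sketch below only demonstrates the standard HTTP Basic encoding, with `basic_auth_header` as a hypothetical helper name.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an Authorization header for HTTP Basic auth."""
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


header = basic_auth_header("user", "pass")
```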
HTTPResponse¶
class pants.http.client.HTTPResponse(request)
    The HTTPResponse class represents a single HTTP response, and has all the available information about a response, including the redirect history and the original HTTPRequest.
length
    The length of the raw response.

http_version
    The HTTP version of the response.

status_code
    The HTTP status code of the response, such as 200.

status_text
    The human readable status text explaining the status code, such as Not Found.

cookies
    A Cookie.SimpleCookie instance of all the cookies received with the response.

headers
    A dictionary of all the headers received with the response.

content
    The content of the response as a byte string. Be careful when using this with large responses, as it will load the entire response into memory. None if no data has been received.

encoding
    This is the detected character encoding of the response. You can also set this to a specific character set to have text decoded properly.

    Pants will attempt to fill this value from the Content-Type response header. If no value was available, it will be None.
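Detecting the encoding from a Content-Type header comes down to reading its charset parameter. The function below is a rough sketch of that idea, not the parser Pants uses; `charset_from_content_type` is a hypothetical name.

```python
def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    # Parameters follow the media type, separated by semicolons.
    for parameter in content_type.split(";")[1:]:
        name, _, value = parameter.partition("=")
        if name.strip().lower() == "charset":
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("\"'") or None
    return None
```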
file
    The content of the response as a tempfile.SpooledTemporaryFile. Pants uses temporary files to decrease memory usage for large responses. None if no data has been received.

iter_content(chunk_size=1, decode_unicode=False)
    Iterate over the content of the response. Using this, rather than content or text can prevent the loading of large responses into memory in their entirety.

    ==============  =======  ===========
    Argument        Default  Description
    ==============  =======  ===========
    chunk_size      1        The number of bytes to read at once.
    decode_unicode  False    Whether or not to decode the bytes into unicode using the response’s encoding.
    ==============  =======  ===========
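Chunked iteration over a spooled temporary file can be sketched with the standard library. This is an illustration of the general technique rather than the body of iter_content(); `iter_chunks` is a hypothetical generator and the 1024-byte max_size is an arbitrary choice.

```python
import tempfile


def iter_chunks(fileobj, chunk_size=1):
    """Yield successive chunks of at most chunk_size bytes from fileobj."""
    fileobj.seek(0)
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Data stays in memory until max_size is exceeded, then spills to disk.
body = tempfile.SpooledTemporaryFile(max_size=1024)
body.write(b"hello world")

chunks = list(iter_chunks(body, chunk_size=4))
```

Only one chunk is held at a time while iterating, which is what keeps memory usage low for large responses.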
iter_lines(chunk_size=512, decode_unicode=False)
    Iterate over the content of the response, one line at a time. By using this rather than content or text you can prevent loading of the entire response into memory. The two arguments to this method are passed directly to iter_content().

json(**kwargs)
    The content of the response, having been interpreted as JSON. This uses the value of encoding if possible. If encoding is not set, it will default to UTF-8.

    Any provided keyword arguments will be passed to json.loads().
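The decoding rule for json() can be expressed in a few lines. The sketch below assumes the response body is available as a byte string; `parse_json` is a hypothetical helper, not the method itself.

```python
import json


def parse_json(content, encoding=None, **kwargs):
    """Decode bytes with the given encoding (UTF-8 if unset), then parse."""
    text = content.decode(encoding or "utf-8")
    # Extra keyword arguments go straight to json.loads().
    return json.loads(text, **kwargs)


payload = parse_json(b'{"origin": "127.0.0.1", "count": 1.5}', parse_float=str)
```

Here parse_float=str shows a keyword argument being forwarded: the float in the payload comes back as the string "1.5".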
status
    The status code and status text as a string.
Session¶
class pants.http.client.Session(client, on_response=None, on_headers=None, on_progress=None, on_ssl_error=None, on_error=None, timeout=None, max_redirects=None, keep_alive=None, auth=None, headers=None, cookies=None, verify_ssl=None, ssl_options=None)
    The Session class is the heart of the HTTP client, making it easy to share state between multiple requests, and enabling the use of with syntax. Sessions are responsible for determining everything about a request before handing it back to HTTPClient to be executed.

    =============  =======  ===========
    Argument       Default  Description
    =============  =======  ===========
    client                  The HTTPClient instance this Session is associated with.
    on_response             Optional. A callable that will handle any received responses, rather than the HTTPClient’s own on_response() method.
    on_headers              Optional. A callable for when response headers have been received.
    on_progress             Optional. A callable for progress notifications.
    on_ssl_error            Optional. A callable responsible for handling SSL verification errors, if verify_ssl is True.
    on_error                Optional. A callable that will handle any errors that occur.
    timeout        30       Optional. The number of seconds of inactivity to allow before timing out.
    max_redirects  10       Optional. The maximum number of times to follow a server-issued redirect.
    keep_alive     True     Optional. Whether or not a single connection will be reused for multiple requests.
    auth           None     Optional. An instance of AuthBase for authenticating requests to the server.
    headers        None     Optional. A dictionary of default headers to send with requests.
    verify_ssl     False    Optional. Whether or not to attempt to check the certificate of the remote secure server against its hostname.
    ssl_options    None     Optional. Options to use when initializing SSL. See Stream.startSSL() for more.
    =============  =======  ===========
client
    The HTTPClient this Session is associated with.

post(url, data=None, files=None, **kwargs)
    Begin a POST request. See request() for more details.

request(method, url, params=None, data=None, headers=None, cookies=None, files=None, auth=None, timeout=None, max_redirects=None, keep_alive=None)
    Begin a request.

    =============  ===========
    Argument       Description
    =============  ===========
    method         The HTTP method of the request.
    url            The URL to request.
    params         Optional. A dictionary or string of query parameters to add to the request.
    data           Optional. A dictionary or string of content to send in the request body.
    headers        Optional. A dictionary of headers to send with the request.
    cookies        Optional. A dictionary or CookieJar of cookies to send with the request.
    files          Optional. A dictionary of file-like objects to upload with the request.
    auth           Optional. An instance of AuthBase to use to authenticate the request.
    timeout        Optional. The number of seconds of inactivity to allow before timing out.
    max_redirects  Optional. The maximum number of times to follow a server-issued redirect.
    keep_alive     Optional. Whether or not to reuse the connection for multiple requests.
    =============  ===========