pants.http.client

pants.http.client implements a basic asynchronous HTTP client on top of Pants with an API modelled after that of the wonderful requests library. The client supports keep-alive and SSL for connections, domain verification for SSL certificates, basic WWW authentication, sessions with persistent cookies, automatic redirect handling, automatic decompression of responses, connection timeouts, file uploads, and saving large responses to temporary files to decrease memory usage.

Logic is implemented using a series of request handlers.

Making Requests

It’s simple and easy to make requests, and it only requires that you have an instance of HTTPClient ready.

from pants.http import HTTPClient
client = HTTPClient()

As with requests, there is a simple method for each HTTP method. For now, let’s get information about a bunch of Pants’ commits on GitHub.

client.get("https://api.github.com/repos/ecdavis/pants/commits")

You’ll notice that this is very similar to making a request with requests. However, we don’t get a response object back. In fact, calling HTTPClient.get() returns an instance of HTTPRequest rather than anything to do with a response, but we’ll get to that later.

The Pants HTTP client is asynchronous, so to get your response, you need a response handler. There are several ways to set one up, but the easiest way is to pass it to your HTTPClient during initialization.

def handle_response(response):
    if response.status_code != 200:
        print "There was a problem!"

client = HTTPClient(handle_response)

response in this situation is an instance of HTTPResponse, and it has an API modelled after the response objects that requests would give you.

Making Useful Requests

Basic GET requests are nice, but you’ll often want to send data to the server. For query parameters you can use the optional params argument of the various request methods, like so:

data = {'since': '2013-11-01'}
client.get("https://api.github.com/repos/ecdavis/pants/commits", params=data)

Once the response arrives, its url attribute shows the full URL that was requested, including the encoded query string.

>>> response.url
'https://api.github.com/repos/ecdavis/pants/commits?since=2013-11-01'
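
Conceptually, the params dictionary is URL-encoded and appended to the URL as a query string. The following standalone Python 3 sketch reproduces that with the standard library. build_url is a hypothetical helper for illustration, not part of Pants, and Pants’ own encoding may differ in details.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_url(url, params=None):
    """Append URL-encoded query parameters to url (hypothetical helper)."""
    if not params:
        return url
    scheme, netloc, path, query, fragment = urlsplit(url)
    # Accept either a pre-encoded string or a dict, like the params argument.
    extra = params if isinstance(params, str) else urlencode(params)
    # Keep any query string that is already present on the URL.
    query = "&".join(part for part in (query, extra) if part)
    return urlunsplit((scheme, netloc, path, query, fragment))


print(build_url("https://api.github.com/repos/ecdavis/pants/commits",
                {"since": "2013-11-01"}))
# https://api.github.com/repos/ecdavis/pants/commits?since=2013-11-01
```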

You can also post data to the server, either as a pre-made string, or as a dictionary of values to be encoded.

client.post("http://httpbin.org/post", data="Hello World!")
client.post("http://httpbin.org/post", data={"greeting": "Hello"})

By default, the Content-Type header will be set to application/x-www-form-urlencoded when you provide data for the request body. If any files are present, it will instead default to multipart/form-data to transmit those. You can also manually set the header when making your request.
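
The default described above comes down to a simple precedence rule: an explicitly set header wins, then the presence of files, then the presence of data. This standalone Python 3 sketch spells out that rule. default_content_type is a hypothetical helper, and returning None when there is no body is an assumption. A real multipart Content-Type header also carries a boundary parameter, which is omitted here.

```python
def default_content_type(data=None, files=None, headers=None):
    """Pick a Content-Type following the rules described above (hypothetical helper)."""
    headers = headers or {}
    # An explicitly supplied header always wins (header names are case-insensitive).
    for name, value in headers.items():
        if name.lower() == "content-type":
            return value
    if files:
        return "multipart/form-data"
    if data:
        return "application/x-www-form-urlencoded"
    return None  # assumption: no body means no Content-Type is needed
```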

You set files via the files parameter, which expects a dictionary of form field names and file objects. You can also provide filenames if desired.

client.post("http://httpbin.org/post", files={'file': open("test.txt", "rb")})
client.post("http://httpbin.org/post", files={'file': ("test.txt", open("test.txt", "rb"))})

You can, of course, use data and files together. Please note that, if you do use them together, you’ll need to supply data as a dictionary. Data strings are not supported.
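
The restriction on data makes sense once you look at a multipart/form-data body (RFC 7578). Every form field becomes its own part, and each part needs a field name, which a bare string doesn’t have. The following standalone Python 3 sketch assembles such a body by hand to show the layout. encode_multipart, the fixed boundary, and passing files as (filename, bytes) tuples are illustrative assumptions, not Pants’ code.

```python
def encode_multipart(fields, files, boundary="pants-example-boundary"):
    """Build a simplified multipart/form-data body (hypothetical helper)."""
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    # Ordinary form fields: one part per dictionary key.
    for name, value in fields.items():
        lines.append(delimiter)
        lines.append(('Content-Disposition: form-data; name="%s"' % name).encode("utf-8"))
        lines.append(b"")
        lines.append(str(value).encode("utf-8"))
    # File fields: same layout plus a filename and a content type.
    for name, (filename, content) in files.items():
        lines.append(delimiter)
        lines.append(('Content-Disposition: form-data; name="%s"; filename="%s"'
                      % (name, filename)).encode("utf-8"))
        lines.append(b"Content-Type: application/octet-stream")
        lines.append(b"")
        lines.append(content)
    # The closing delimiter has two trailing dashes.
    lines.append(delimiter + b"--")
    lines.append(b"")
    body = b"\r\n".join(lines)
    return "multipart/form-data; boundary=" + boundary, body
```

Calling encode_multipart({"greeting": "Hello"}, {"file": ("test.txt", b"hi")}) yields a Content-Type value carrying the boundary and a body with one part per field.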

As many of you have probably noticed, this is very similar to using requests. The Pants API was implemented this way to make it easier to switch between the two libraries.

Reading Responses

Making your request is only half the battle, of course. You have to read your response when it comes in. And, for that, you start with the status code.

>>> response.status_code
200
>>> response.status_text
'OK'
>>> response.status
'200 OK'

Unlike requests, Pants provides no raise_for_status() method. In an asynchronous framework, an exception raised where your code isn’t set up to catch it simply wouldn’t work, so check status_code in your response handler instead.
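
A callback-friendly alternative is a small check at the top of your handler that routes failures to a function of your choosing instead of raising. This standalone Python 3 sketch shows the idea. check_status is hypothetical, and SimpleNamespace objects stand in for HTTPResponse, providing only the status_code and status attributes documented here.

```python
from types import SimpleNamespace


def check_status(response, on_failure):
    """Return True for 2xx responses; otherwise call on_failure and return False.

    Hypothetical stand-in for requests' raise_for_status() that never raises.
    """
    if 200 <= response.status_code < 300:
        return True
    on_failure(response)
    return False


# Stand-in objects exposing only the attributes used above.
failures = []
ok = SimpleNamespace(status_code=200, status="200 OK")
missing = SimpleNamespace(status_code=404, status="404 Not Found")
check_status(ok, failures.append)
check_status(missing, failures.append)
print([r.status for r in failures])  # ['404 Not Found']
```

In a real handler you would call check_status(response, log_failure) first and return early when it returns False.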

Headers

HTTP header names are case-insensitive, so headers are stored in a special case-insensitive dictionary, available as HTTPResponse.headers.

>>> response.headers
HTTPHeaders({
    'Content-Length': 986,
    'Server': 'gunicorn/0.17.4',
    'Connection': 'keep-alive',
    'Date': 'Wed, 06 Nov 2013 05:58:53 GMT',
    'Access-Control-Allow-Origin': '*',
    'Content-Type': 'application/json'
    })
>>> response.headers['content-length']
986

Nothing special here.
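
Case-insensitive lookup just means keys are normalized before comparison, while the original spelling is kept for display. This is a minimal Python 3 sketch of such a mapping. It illustrates the idea and is not Pants’ HTTPHeaders class.

```python
from collections.abc import MutableMapping


class CaseInsensitiveDict(MutableMapping):
    """Mapping whose keys compare case-insensitively but keep their spelling."""

    def __init__(self, data=None):
        self._store = {}  # lowercased key -> (original key, value)
        if data:
            self.update(data)

    def __setitem__(self, key, value):
        self._store[key.lower()] = (key, value)

    def __getitem__(self, key):
        return self._store[key.lower()][1]

    def __delitem__(self, key):
        del self._store[key.lower()]

    def __iter__(self):
        # Yield keys in the spelling they were last set with.
        return (original for original, _ in self._store.values())

    def __len__(self):
        return len(self._store)


headers = CaseInsensitiveDict({"Content-Length": 986, "Content-Type": "application/json"})
print(headers["content-length"])  # 986
```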

Cookies

Cookies are a weak point of Pants’ HTTP client at this time. Cookies are stored in instances of Cookie.SimpleCookie, which doesn’t handle multiple domains. Pants has logic to prevent sending cookies to the wrong domains, but a future version should ideally move to a cookie storage structure that handles multiple domains elegantly.

>>> response.cookies['cake']
<Morsel: cake='lie'>
>>> response.cookies['cake'].value
'lie'

As you can see, Pants does not yet handle cookies as well as requests. Setting cookies is a bit better.

client.get("http://httpbin.org/cookies", cookies={"cake": "lie"})
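
The single-domain limitation mentioned above comes from SimpleCookie being keyed by cookie name alone. This Python 3 snippet uses only the standard library (http.cookies is the Python 3 name of the Cookie module). It shows two same-named cookies from different domains colliding. The domains and values are made up for illustration.

```python
from http.cookies import SimpleCookie  # the Cookie module in Python 2

jar = SimpleCookie()
# Two servers each set a cookie with the same name.
jar.load("session=abc123; Domain=example.com")
jar.load("session=zzz999; Domain=example.org")

# SimpleCookie is keyed by cookie name only, so the second load replaced the first.
print(list(jar.keys()))          # ['session']
print(jar["session"].value)      # zzz999
print(jar["session"]["domain"])  # example.org
```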

Redirects

The HTTP client will follow redirects automatically. When this happens, the redirecting responses are stored in the HTTPResponse.history list.

>>> response.history
[<HTTPResponse [301 Moved Permanently] at 0x2C988F0>]

You can limit the number of times the HTTP client will automatically follow redirects with the max_redirects argument.

client.get("http://github.com/", max_redirects=0)

By default, Pants will follow up to 10 redirects.
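
The interplay of automatic redirects, history, and max_redirects can be modelled as a small loop. This standalone Python 3 sketch is a hypothetical model, not Pants’ implementation. fetch is an assumed callable returning (status, headers, body). The set of redirect codes is an assumption, and so is returning the last redirect response unfollowed once the limit is reached.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)  # assumption for this sketch


def follow_redirects(fetch, url, max_redirects=10):
    """Follow redirects up to max_redirects times (hypothetical sketch).

    Returns the final (status, headers, body) tuple plus a history list of
    (status, url) pairs for every redirect that was followed.
    """
    history = []
    while True:
        status, headers, body = fetch(url)
        if status not in REDIRECT_CODES or len(history) >= max_redirects:
            return (status, headers, body), history
        history.append((status, url))
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, headers["Location"])
```

With max_redirects=0, the first redirect response is handed back as-is, which matches the client.get("http://github.com/", max_redirects=0) example above.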

Exceptions

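class pants.http.client.CertificateError

The exception used when a remote server’s SSL certificate fails verification. An instance is passed to HTTPClient.on_ssl_error().

class pants.http.client.HTTPClientException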

The base exception for all the exceptions used by the HTTP client, aside from CertificateError.

class pants.http.client.MalformedResponse

The exception returned when the response is malformed in some way.

class pants.http.client.RequestClosed

The exception returned when the connection closes before the entire response has been downloaded.

class pants.http.client.RequestTimedOut

The exception returned when a connection times out.
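
Because HTTPClientException is the common base for the client’s own errors, an on_error() handler can branch on exception type. This Python 3 sketch mirrors the documented relationships with stand-in classes so that it runs without Pants. The stand-in bodies are assumptions, and so is using plain Exception as the base of CertificateError. In real code you would import the classes from pants.http.client instead.

```python
class HTTPClientException(Exception):
    """Stand-in for pants.http.client.HTTPClientException."""


class MalformedResponse(HTTPClientException):
    pass


class RequestClosed(HTTPClientException):
    pass


class RequestTimedOut(HTTPClientException):
    pass


class CertificateError(Exception):
    """Stand-in; documented as not deriving from HTTPClientException."""


def on_error(response, exception):
    """Classify an error the way an HTTPClient.on_error() override might."""
    if isinstance(exception, RequestTimedOut):
        return "retry later"
    if isinstance(exception, CertificateError):
        return "untrusted certificate"
    if isinstance(exception, HTTPClientException):
        return "client error: %s" % type(exception).__name__
    return "unexpected: %r" % (exception,)
```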

HTTPClient

class pants.http.client.HTTPClient(*args, **kwargs)

An easy-to-use, asynchronous HTTP client implementing HTTP/1.1. All arguments passed to HTTPClient are used to initialize the default session; see Session for more details. The following is a basic example of using an HTTPClient to fetch a remote resource:

from pants.http import HTTPClient
from pants.engine import Engine

def response_handler(response):
    Engine.instance().stop()
    print response.content

client = HTTPClient(response_handler)
client.get("http://httpbin.org/ip")
Engine.instance().start()

Groups of requests can have their behavior customized with the use of sessions:

from pants.http import HTTPClient
from pants.engine import Engine

def response_handler(response):
    Engine.instance().stop()
    print response.content

def other_handler(response):
    print response.content

client = HTTPClient(response_handler)
client.get("http://httpbin.org/cookies")

with client.session(on_response=other_handler, cookies={'pie': 'yummy'}):
    client.get("http://httpbin.org/cookies")

Engine.instance().start()

delete(url, **kwargs)

Begin a DELETE request. See request() for more details.

get(url, params=None, **kwargs)

Begin a GET request. See request() for more details.

head(url, params=None, **kwargs)

Begin a HEAD request. See request() for more details.

on_error(response, exception)

Placeholder. Called when an error occurs.

Argument Description
response An HTTPResponse instance with information about the response.
exception An Exception instance with information about the error that occurred.

on_headers(response)

Placeholder. Called when we’ve received headers for a request. You can abort a request at this time by returning False from this function. It must be False, and not simply a false-like value, such as an empty string.

Note

This function isn’t called for HTTP HEAD requests.

Argument Description
response An HTTPResponse instance with information about the received response.
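
A typical use of on_headers() is refusing to download bodies that are too large. This standalone Python 3 sketch shows such a handler. MAX_BYTES and should_abort are hypothetical. should_abort only illustrates the documented rule that the return value must be False itself, not merely false-like. SimpleNamespace objects stand in for HTTPResponse.

```python
from types import SimpleNamespace

MAX_BYTES = 10 * 1024 * 1024  # hypothetical 10 MiB limit


def on_headers(response):
    """Abort downloads that announce a body larger than MAX_BYTES."""
    length = int(response.headers.get("Content-Length", 0))
    if length > MAX_BYTES:
        return False  # must be the False object itself to abort
    return None  # anything else, including other falsy values, continues


def should_abort(result):
    # Identity test: only the False singleton aborts, per the documentation.
    return result is False


big = SimpleNamespace(headers={"Content-Length": "20000000"})
print(should_abort(on_headers(big)))  # True
```
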

on_progress(response, received, total)

Placeholder. Called when progress is made in downloading a response.

Argument Description
response An HTTPResponse instance with information about the response.
received The number of bytes received thus far.
total The total number of bytes expected for the response. This will be 0 if we don’t know how much to expect.
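
Because total is 0 when the size isn’t known in advance, a progress handler should not divide by it blindly. This standalone Python 3 sketch formats the arguments accordingly. format_progress is a hypothetical helper, and the on_progress function shown would be installed as an override or passed as a session callback.

```python
def format_progress(received, total):
    """Render on_progress() arguments as a status string (hypothetical helper)."""
    if total <= 0:
        # total is 0 when the size of the response isn't known.
        return "%d bytes received" % received
    percent = 100.0 * received / total
    return "%d/%d bytes (%.1f%%)" % (received, total, percent)


def on_progress(response, received, total):
    print(format_progress(received, total))


on_progress(None, 250, 1000)  # 250/1000 bytes (25.0%)
```
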

on_response(response)

Placeholder. Called when a complete response has been received.

Argument Description
response An HTTPResponse instance with information about the received response.

on_ssl_error(response, certificate, exception)

Placeholder. Called when the remote server’s SSL certificate failed initial verification. If this method returns True, the certificate will be accepted, otherwise, the connection will be closed and on_error() will be called.

Argument Description
response An HTTPResponse instance with information about the response, notably the host the certificate is expected to belong to.
certificate A dictionary representing the certificate that wasn’t automatically verified.
exception A CertificateError instance with information about the error that occurred.
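
One use for on_ssl_error() is accepting certificates for a few known hosts, such as internal servers with self-signed certificates. This standalone Python 3 sketch assumes the certificate dictionary follows the layout returned by ssl.SSLSocket.getpeercert(). TRUSTED_NAMES and certificate_names are hypothetical. Accepting a certificate by name alone skips chain validation, so this only makes sense for hosts you control.

```python
TRUSTED_NAMES = {"internal.example.com"}  # hypothetical allowlist


def certificate_names(certificate):
    """Collect DNS names from a getpeercert()-style certificate dict."""
    names = set()
    for kind, value in certificate.get("subjectAltName", ()):
        if kind == "DNS":
            names.add(value)
    for rdn in certificate.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.add(value)
    return names


def on_ssl_error(response, certificate, exception):
    # Returning True accepts the certificate; otherwise the connection is closed.
    return bool(certificate_names(certificate) & TRUSTED_NAMES)
```
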

options(url, **kwargs)

Begin an OPTIONS request. See request() for more details.

patch(url, data=None, **kwargs)

Begin a PATCH request. See request() for more details.

post(url, data=None, files=None, **kwargs)

Begin a POST request. See request() for more details.

put(url, data=None, **kwargs)

Begin a PUT request. See request() for more details.

request(*args, **kwargs)

Begin a request. Missing parameters will be taken from the active session when available. See Session.request() for more details.

session(*args, **kwargs)

Create a new session. See Session for details.

trace(url, **kwargs)

Begin a TRACE request. See request() for more details.

HTTPRequest

class pants.http.client.HTTPRequest(session, method, path, url, headers, cookies, body, timeout, max_redirects, keep_alive, auth)

A very basic structure for storing HTTP request information.

response

The HTTPResponse instance representing the response to this request.

session

The Session this request was made in.

method

The HTTP method of this request, such as GET, POST, or HEAD.

path

The path of this request.

url

A tuple containing the full URL of the request, as processed by urlparse.urlparse().

headers

A dictionary of headers sent with this request.

cookies

A Cookie.SimpleCookie instance of cookies sent with this request.

body

A list of strings and files sent as this request’s body.

timeout

The number of seconds of inactivity to allow before timing out.

max_redirects

The number of redirects that may still be followed automatically for this request.

keep_alive

Whether or not the connection should be reused after this request.

auth

Either a tuple of (username, password) or an instance of AuthBase responsible for authorizing this request with the server.
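
When auth is a (username, password) tuple, the natural reading is HTTP Basic authentication, which the overview lists as supported. This standalone Python 3 sketch builds the header value defined by RFC 7617. The helper name is hypothetical, and encoding the credentials as UTF-8 is an assumption.

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header value (RFC 7617)."""
    token = ("%s:%s" % (username, password)).encode("utf-8")
    return "Basic " + base64.b64encode(token).decode("ascii")


print(basic_auth_header("Aladdin", "open sesame"))  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```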

HTTPResponse

class pants.http.client.HTTPResponse(request)

The HTTPResponse class represents a single HTTP response and holds all the available information about it, including the redirect history and the original HTTPRequest.

length

The length of the raw response.

http_version

The HTTP version of the response.

status_code

The HTTP status code of the response, such as 200.

status_text

The human-readable status text explaining the status code, such as Not Found.

cookies

A Cookie.SimpleCookie instance of all the cookies received with the response.

headers

A dictionary of all the headers received with the response.

content

The content of the response as a byte string. Be careful when using this with large responses, as it will load the entire response into memory. None if no data has been received.

encoding

This is the detected character encoding of the response. You can also set this to a specific character set to have text decoded properly.

Pants will attempt to fill this value from the Content-Type response header. If no value was available, it will be None.
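
Extracting the charset from a Content-Type value such as application/json; charset=utf-8 takes only a little parsing. This standalone Python 3 sketch shows one way to do it. The helper name is hypothetical, lowercasing the result is a choice made for this sketch, and Pants’ own parsing may differ, for example in its handling of quoting.

```python
def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type header value, or None."""
    if not value:
        return None
    # Parameters follow the media type, separated by semicolons.
    for param in value.split(";")[1:]:
        name, _, param_value = param.strip().partition("=")
        if name.lower() == "charset" and param_value:
            return param_value.strip('"').lower()
    return None
```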

file

The content of the response as a tempfile.SpooledTemporaryFile. Pants uses temporary files to decrease memory usage for large responses. None if no data has been received.
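
Because file is an ordinary file object, a large response can be written to disk in fixed-size pieces without going through content. This standalone Python 3 sketch does that with shutil.copyfileobj. save_response_file is a hypothetical helper, and rewinding with seek(0) first is an assumption about where the file position may be left.

```python
import shutil
import tempfile


def save_response_file(fileobj, path, chunk_size=64 * 1024):
    """Copy a response body file object to path without reading it all at once."""
    fileobj.seek(0)  # assumption: rewind in case something already read from it
    with open(path, "wb") as out:
        shutil.copyfileobj(fileobj, out, chunk_size)


# Demo with a spooled file like the one HTTPResponse.file is documented to be.
spool = tempfile.SpooledTemporaryFile(max_size=16)
spool.write(b"x" * 100)
```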

handle_301(client)

Handle the different redirect codes.

handle_401(client)

Handle authorization, if we know how.

iter_content(chunk_size=1, decode_unicode=False)

Iterate over the content of the response. Using this, rather than content or text, can prevent large responses from being loaded into memory in their entirety.

Argument Default Description
chunk_size 1 The number of bytes to read at once.
decode_unicode False Whether or not to decode the bytes into unicode using the response’s encoding.
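
When decoding while iterating, a multi-byte character can be split across two chunks. An incremental decoder avoids garbling it. This standalone Python 3 sketch shows chunked reading with optional decoding. It is a stand-in function that takes a file object and an encoding name in place of the method’s decode_unicode flag, and whether Pants decodes incrementally is not stated here.

```python
import codecs
import io


def iter_content(fileobj, chunk_size=1, encoding=None):
    """Yield chunks from fileobj, decoded to str if encoding is given."""
    decoder = codecs.getincrementaldecoder(encoding)() if encoding else None
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        if decoder is None:
            yield chunk
        else:
            # The decoder buffers incomplete multi-byte sequences between calls.
            text = decoder.decode(chunk)
            if text:
                yield text
    if decoder is not None:
        tail = decoder.decode(b"", final=True)
        if tail:
            yield tail


print("".join(iter_content(io.BytesIO("café".encode("utf-8")), 1, "utf-8")))  # café
```
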

iter_lines(chunk_size=512, decode_unicode=False)

Iterate over the content of the response, one line at a time. By using this rather than content or text you can prevent loading of the entire response into memory. The two arguments to this method are passed directly to iter_content().
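
Splitting a stream into lines needs a small buffer, because a line (or even a \r\n pair) can straddle two chunks. This standalone Python 3 sketch shows that buffering over an iterable of byte chunks, such as one produced by iter_content(). It is an illustrative stand-in, and stripping the line terminators is an assumption about the output format.

```python
def iter_lines(chunks):
    """Yield complete lines, without terminators, from an iterable of byte chunks."""
    pending = b""
    for chunk in chunks:
        pending += chunk
        lines = pending.splitlines(True)
        # Hold back the last piece until a "\n" (or the end of input) completes it.
        if lines and not lines[-1].endswith(b"\n"):
            pending = lines.pop()
        else:
            pending = b""
        for line in lines:
            yield line.rstrip(b"\r\n")
    if pending:
        yield pending.rstrip(b"\r\n")
```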

json(**kwargs)

The content of the response, having been interpreted as JSON. This uses the value of encoding if possible. If encoding is not set, it will default to UTF-8.

Any provided keyword arguments will be passed to json.loads().
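
The behavior described above amounts to decoding the body with encoding, or UTF-8 when that is unset, and passing the keyword arguments through to json.loads(). This standalone Python 3 sketch shows that. decode_json is a hypothetical helper operating on raw bytes rather than an HTTPResponse.

```python
import json


def decode_json(content, encoding=None, **kwargs):
    """Decode a JSON body using encoding, falling back to UTF-8."""
    text = content.decode(encoding or "utf-8")
    return json.loads(text, **kwargs)


print(decode_json(b'{"x": 1.5}', parse_float=str))  # {'x': '1.5'}
```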

status

The status code and status text as a string.

text

The content of the response, after being decoded into unicode with encoding. Be careful when using this with large responses, as it will load the entire response into memory. None if no data has been received.

If encoding is None, this will default to UTF-8.

Session

class pants.http.client.Session(client, on_response=None, on_headers=None, on_progress=None, on_ssl_error=None, on_error=None, timeout=None, max_redirects=None, keep_alive=None, auth=None, headers=None, cookies=None, verify_ssl=None, ssl_options=None)

The Session class is the heart of the HTTP client, making it easy to share state between multiple requests and enabling the use of with syntax. Sessions are responsible for determining everything about a request before handing it back to HTTPClient to be executed.
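
The with-syntax behavior can be pictured as a stack of option sets, where a request takes any option it doesn’t specify from the innermost active session. This standalone Python 3 sketch models only that lookup. SessionStack, session() and resolve() here are hypothetical stand-ins, not the Pants classes, and treating None as “unset” is an assumption.

```python
from contextlib import contextmanager


class SessionStack:
    """Toy model of nested sessions whose options fall back to their parents."""

    def __init__(self, **defaults):
        self._stack = [defaults]  # the bottom entry plays the default session

    @contextmanager
    def session(self, **options):
        self._stack.append(options)
        try:
            yield self
        finally:
            self._stack.pop()  # leaving the with block restores the parent

    def resolve(self, name, value=None):
        """Return an explicit per-request value, else the innermost session's."""
        if value is not None:
            return value
        for options in reversed(self._stack):
            if options.get(name) is not None:
                return options[name]
        return None
```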

Argument Default Description
client   The HTTPClient instance this Session is associated with.
on_response   Optional. A callable that will handle any received responses, rather than the HTTPClient’s own on_response() method.
on_headers   Optional. A callable for when response headers have been received.
on_progress   Optional. A callable for progress notifications.
on_ssl_error   Optional. A callable responsible for handling SSL verification errors, if verify_ssl is True.
on_error   Optional. A callable that will handle any errors that occur.
timeout 30 Optional. The number of seconds of inactivity to allow before timing out.
max_redirects 10 Optional. The maximum number of times to follow a server-issued redirect.
keep_alive True Optional. Whether or not a single connection will be reused for multiple requests.
auth None Optional. An instance of AuthBase for authenticating requests to the server.
headers None Optional. A dictionary of default headers to send with requests.
cookies None Optional. Default cookies to send with requests.
verify_ssl False Optional. Whether or not to attempt to check the certificate of the remote secure server against its hostname.
ssl_options None Optional. Options to use when initializing SSL. See Stream.startSSL() for more.

client

The HTTPClient this Session is associated with.

delete(url, **kwargs)

Begin a DELETE request. See request() for more details.

get(url, params=None, **kwargs)

Begin a GET request. See request() for more details.

head(url, params=None, **kwargs)

Begin a HEAD request. See request() for more details.

options(url, **kwargs)

Begin an OPTIONS request. See request() for more details.

patch(url, data=None, **kwargs)

Begin a PATCH request. See request() for more details.

post(url, data=None, files=None, **kwargs)

Begin a POST request. See request() for more details.

put(url, data=None, **kwargs)

Begin a PUT request. See request() for more details.

request(method, url, params=None, data=None, headers=None, cookies=None, files=None, auth=None, timeout=None, max_redirects=None, keep_alive=None)

Begin a request.

Argument Description
method The HTTP method of the request.
url The URL to request.
params Optional. A dictionary or string of query parameters to add to the request.
data Optional. A dictionary or string of content to send in the request body.
headers Optional. A dictionary of headers to send with the request.
cookies Optional. A dictionary or CookieJar of cookies to send with the request.
files Optional. A dictionary of file-like objects to upload with the request.
auth Optional. An instance of AuthBase to use to authenticate the request.
timeout Optional. The number of seconds of inactivity to allow before timing out.
max_redirects Optional. The maximum number of times to follow a server-issued redirect.
keep_alive Optional. Whether or not to reuse the connection for multiple requests.

session(*args, **kwargs)

Create a new session. See Session for details.

trace(url, **kwargs)

Begin a TRACE request. See request() for more details.