pants.http.client¶
pants.http.client implements a basic asynchronous HTTP client on top of
Pants with an API modelled after that of the wonderful
requests library. The client supports
keep-alive and SSL for connections, domain verification for SSL certificates,
basic WWW authentication, sessions with persistent cookies, automatic redirect
handling, automatic decompression of responses, connection timeouts, file
uploads, and saving large responses to temporary files to decrease
memory usage.
Logic is implemented using a series of request handlers.
Making Requests
Making requests is simple: all you need is an instance of HTTPClient.
from pants.http import HTTPClient
client = HTTPClient()
As with requests, there are simple methods for making requests with each HTTP method. For now, let’s fetch information about Pants’ commits on GitHub.
client.get("https://api.github.com/repos/ecdavis/pants/commits")
You’ll notice that this is very similar to making a request with requests.
However, we do not get a response object back. Calling HTTPClient.get()
actually returns an instance of HTTPRequest rather than anything to do with
a response, but we’ll get to that later.
The Pants HTTP client is asynchronous, so to get your response, you need a
response handler. There are several ways to set one up, but the easiest way is
to pass it to your HTTPClient during initialization.
def handle_response(response):
    # Called with an HTTPResponse once the response has arrived.
    if response.status_code != 200:
        print("There was a problem!")
client = HTTPClient(handle_response)
Here, response is an instance of HTTPResponse, whose API is modelled after
the response objects that requests would give you.
Making Useful Requests
Basic GET requests are nice, but you’ll often want to send data to the server. For query parameters you can use the optional params argument of the various request methods, like so:
data = {'since': '2013-11-01'}
client.get("https://api.github.com/repos/ecdavis/pants/commits", params=data)
Once the response arrives, its url attribute shows the full URL, including the encoded query string:
>>> response.url
'https://api.github.com/repos/ecdavis/pants/commits?since=2013-11-01'
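To see what params does to the URL, here is a minimal sketch of the encoding step using Python 3’s urllib.parse (Pants itself targets Python 2’s urllib and urlparse modules). The helper add_params is hypothetical, not part of Pants. It percent-encodes the dictionary and appends it to any query string already in the URL.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def add_params(url, params):
    """Append query parameters to a URL (hypothetical helper, not Pants' API).

    params may be a dict or an already-encoded string, mirroring the
    "dictionary or string" wording of Session.request().
    """
    if not params:
        return url
    encoded = params if isinstance(params, str) else urlencode(params)
    scheme, netloc, path, query, fragment = urlsplit(url)
    # Keep any query string that was already part of the URL.
    query = query + "&" + encoded if query else encoded
    return urlunsplit((scheme, netloc, path, query, fragment))


print(add_params("https://api.github.com/repos/ecdavis/pants/commits",
                 {"since": "2013-11-01"}))
# https://api.github.com/repos/ecdavis/pants/commits?since=2013-11-01
```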
You can also post data to the server, either as a pre-made string, or as a dictionary of values to be encoded.
client.post("http://httpbin.org/post", data="Hello World!")
client.post("http://httpbin.org/post", data={"greeting": "Hello"})
By default, the Content-Type header will be set to
application/x-www-form-urlencoded when you provide data for the request
body. If any files are present, it will instead default to
multipart/form-data to transmit those. You can also manually set the
header when making your request.
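The defaulting rules above fit in a few lines. This is a sketch, not Pants’ code. default_content_type and encode_form are hypothetical helpers, and a real multipart Content-Type also carries a boundary parameter.

```python
from urllib.parse import urlencode


def default_content_type(data=None, files=None, headers=None):
    """Return the Content-Type a request body would get by default."""
    # A header you set yourself always wins; header names are case-insensitive.
    for name, value in (headers or {}).items():
        if name.lower() == "content-type":
            return value
    if files:
        # A real request also appends "; boundary=..." to this value.
        return "multipart/form-data"
    if data is not None:
        return "application/x-www-form-urlencoded"
    return None  # no body, so no Content-Type


def encode_form(data):
    """Encode a dict as a urlencoded body; strings are sent as-is."""
    return data if isinstance(data, str) else urlencode(data)


print(default_content_type(data={"greeting": "Hello"}), encode_form({"greeting": "Hello"}))
# application/x-www-form-urlencoded greeting=Hello
```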
You set files via the files parameter, which expects a dictionary mapping form field names to file objects. To provide a filename as well, use a (filename, file object) tuple as the value. Open files in binary mode so their bytes are sent unchanged.
client.post("http://httpbin.org/post", files={'file': open("test.txt", "rb")})
client.post("http://httpbin.org/post", files={'file': ("test.txt", open("test.txt", "rb"))})
You can, of course, use data and files together. If you do, data must be a dictionary. Data strings are not supported in that case.
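Multipart bodies help explain why data must be a dictionary when files are present. Every data key becomes its own form part next to the file parts, and a bare string has no field name to attach to. The sketch below builds such a body with the standard library only. encode_multipart is a hypothetical helper, and the part layout follows the general multipart/form-data format rather than Pants’ exact output.

```python
import io
import os
import uuid


def encode_multipart(data, files, boundary=None):
    """Build a multipart/form-data body from form fields and files.

    data: dict mapping field names to string values (may be None)
    files: dict mapping field names to a file object or a
           (filename, file object) tuple
    Returns (content_type, body) where body is bytes.
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = ("--" + boundary).encode("ascii")
    parts = []
    # Each ordinary form field becomes its own part, named by its dict key.
    for name, value in (data or {}).items():
        parts += [delimiter,
                  ('Content-Disposition: form-data; name="%s"' % name).encode("utf-8"),
                  b"",
                  str(value).encode("utf-8")]
    for name, item in files.items():
        if isinstance(item, tuple):
            filename, fileobj = item
        else:
            # Without an explicit filename, fall back to the file's own path.
            fileobj = item
            filename = os.path.basename(getattr(item, "name", name))
        content = fileobj.read()
        if isinstance(content, str):  # text-mode files return str
            content = content.encode("utf-8")
        parts += [delimiter,
                  ('Content-Disposition: form-data; name="%s"; filename="%s"'
                   % (name, filename)).encode("utf-8"),
                  b"Content-Type: application/octet-stream",
                  b"",
                  content]
    # Closing delimiter, followed by a final CRLF.
    parts += [delimiter + b"--", b""]
    return "multipart/form-data; boundary=" + boundary, b"\r\n".join(parts)


content_type, body = encode_multipart(
    {"greeting": "Hello"}, {"file": ("test.txt", io.BytesIO(b"Hello World!"))})
print(content_type)
```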
As many of you have probably noticed, this is very similar to using requests. The Pants API was implemented this way to make it easier to switch between the two libraries.
Reading Responses
Making your request is only half the battle, of course. You also have to read the response when it comes in, starting with the status code.
>>> response.status_code
200
>>> response.status_text
'OK'
>>> response.status
'200 OK'
Unlike requests, HTTPResponse has no raise_for_status() method. Responses
are delivered to an asynchronous callback, so an exception raised there would
not reach any code designed to catch it. Check status_code in your response
handler instead.
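Because there is no raise_for_status(), status checks belong in the response handler. The sketch below uses a namedtuple as a stand-in for HTTPResponse, with only the documented status_code and status_text fields, so it runs without Pants. Treating every 2xx code as success is an assumption you may want to tighten.

```python
from collections import namedtuple

# Stand-in for HTTPResponse with only the status fields documented above;
# in real code the response object comes from Pants.
FakeResponse = namedtuple("FakeResponse", "status_code status_text")


def status_line(response):
    # Same shape as HTTPResponse.status, e.g. '200 OK'.
    return "%d %s" % (response.status_code, response.status_text)


def handle_response(response):
    """Check the status inside the callback instead of raising.

    Returns True for a 2xx response so processing can continue, and
    reports anything else in place.
    """
    if 200 <= response.status_code < 300:
        return True
    print("Request failed: " + status_line(response))
    return False


handle_response(FakeResponse(404, "Not Found"))  # prints: Request failed: 404 Not Found
```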
Headers
HTTP headers are case-insensitive, and so the headers are stored in a special
case-insensitive dictionary made available as HTTPResponse.headers.
>>> response.headers
HTTPHeaders({
'Content-Length': 986,
'Server': 'gunicorn/0.17.4',
'Connection': 'keep-alive',
'Date': 'Wed, 06 Nov 2013 05:58:53 GMT',
'Access-Control-Allow-Origin': '*',
'Content-Type': 'application/json'
})
>>> response.headers['content-length']
986
Nothing special here.
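A case-insensitive mapping is easy to build on collections.abc.MutableMapping. This is an illustrative stand-in, not Pants’ HTTPHeaders class. It looks keys up by their lower-cased form and remembers the spelling most recently used to set each key.

```python
from collections.abc import MutableMapping


class CaseInsensitiveDict(MutableMapping):
    """Minimal case-insensitive mapping (a sketch, not Pants' HTTPHeaders)."""

    def __init__(self, data=None, **kwargs):
        # Maps lower-cased key -> (key as last written, value).
        self._store = {}
        self.update(data or {}, **kwargs)

    def __setitem__(self, key, value):
        self._store[key.lower()] = (key, value)

    def __getitem__(self, key):
        return self._store[key.lower()][1]

    def __delitem__(self, key):
        del self._store[key.lower()]

    def __iter__(self):
        # Iterate using the original spelling of each key.
        return (original for original, _ in self._store.values())

    def __len__(self):
        return len(self._store)

    def __repr__(self):
        return "CaseInsensitiveDict(%r)" % dict(self.items())


headers = CaseInsensitiveDict({"Content-Length": 986, "Content-Type": "application/json"})
print(headers["content-length"])  # 986
```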
Cookies
Cookies are a weak point of Pants’ HTTP client at this time. They are stored
in instances of Cookie.SimpleCookie, which doesn’t handle multiple domains.
Pants has logic to prevent sending cookies to the wrong domains, but ideally
future versions should move to a cookie storage structure that handles
multiple domains cleanly.
>>> response.cookies['cake']
<Morsel: cake='lie'>
>>> response.cookies['cake'].value
'lie'
Pants does not yet handle received cookies as well as requests does. Setting cookies is easier: pass a dictionary as the cookies argument.
client.get("http://httpbin.org/cookies", cookies={"cake": "lie"})
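Both sides of cookie handling can be tried with Python 3’s http.cookies.SimpleCookie (Cookie.SimpleCookie in Python 2). The sketch shows why a name-keyed jar struggles with multiple domains, and what a cookies dictionary plausibly turns into on the wire. cookie_header is a hypothetical helper, and the example domains are placeholders.

```python
from http.cookies import SimpleCookie  # Cookie.SimpleCookie on Python 2

# Parse a Set-Cookie value the way a client would store it.
jar = SimpleCookie()
jar.load("cake=lie; Domain=a.example; Path=/")
print(jar["cake"].value)  # lie

# SimpleCookie is keyed by cookie name only, so a same-named cookie from
# another domain overwrites the first one. This is the multi-domain
# limitation described above.
jar.load("cake=truth; Domain=b.example; Path=/")
print(len(jar), jar["cake"].value, jar["cake"]["domain"])  # 1 truth b.example


def cookie_header(cookies):
    """Serialize a {name: value} dict into a Cookie request header value.

    Hypothetical helper; values are sent as given, without quoting.
    """
    return "; ".join("%s=%s" % (name, value) for name, value in cookies.items())


print(cookie_header({"cake": "lie"}))  # cake=lie
```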
Redirects
The HTTP client will follow redirects automatically. When this happens, the
redirecting responses are stored in the HTTPResponse.history list.
>>> response.history
[<HTTPResponse [301 Moved Permanently] at 0x2C988F0>]
You can limit the number of times the HTTP client will automatically follow
redirects with the max_redirects argument.
client.get("http://github.com/", max_redirects=0)
By default, Pants will follow up to 10 redirects.
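You can model the redirect behavior without a network. In this sketch a dictionary plays the server and fetch is a hypothetical function. max_redirects counts down like HTTPRequest.max_redirects, and once it reaches zero the redirecting response itself is returned. The sketch does not model method changes for 303 responses or the other details a real client handles.

```python
from urllib.parse import urljoin

# A fake "server": maps URL -> (status, Location header or None).
# Purely illustrative; a real client makes network requests here.
ROUTES = {
    "http://github.com/": (301, "https://github.com/"),
    "https://github.com/": (200, None),
}

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch(url, routes, max_redirects=10):
    """Follow redirects, recording each redirecting response in history.

    Returns (final_status, final_url, history), where history lists the
    (status, url) of every redirecting response, oldest first.
    """
    history = []
    while True:
        status, location = routes[url]
        if status not in REDIRECT_CODES or location is None or max_redirects <= 0:
            return status, url, history
        history.append((status, url))
        max_redirects -= 1
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)


print(fetch("http://github.com/", ROUTES))
# (200, 'https://github.com/', [(301, 'http://github.com/')])
```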
Exceptions
class pants.http.client.HTTPClientException
The base exception for all the exceptions used by the HTTP client, aside from CertificateError.
class pants.http.client.MalformedResponse
The exception returned when the response is malformed in some way.
HTTPClient
class pants.http.client.HTTPClient(*args, **kwargs)
An easy-to-use, asynchronous HTTP client implementing HTTP 1.1. All arguments passed to HTTPClient are used to initialize the default session. See Session for more details.
The following is a basic example of using an HTTPClient to fetch a remote resource:
from pants.http import HTTPClient
from pants.engine import Engine
def response_handler(response):
    # Stop the engine once the response arrives, then print the body.
    Engine.instance().stop()
    print(response.content)
client = HTTPClient(response_handler)
client.get("http://httpbin.org/ip")
Engine.instance().start()
Groups of requests can have their behavior customized with the use of sessions:
from pants.http import HTTPClient
from pants.engine import Engine
def response_handler(response):
    Engine.instance().stop()
    print(response.content)
def other_handler(response):
    print(response.content)
client = HTTPClient(response_handler)
client.get("http://httpbin.org/cookies")
# Requests made inside this block send the session's cookies, and their
# responses go to other_handler instead of response_handler.
with client.session(on_response=other_handler, cookies={'pie': 'yummy'}):
    client.get("http://httpbin.org/cookies")
Engine.instance().start()
on_error(response, exception)
Placeholder. Called when an error occurs.
exception: An Exception instance with information about the error that occurred.
on_headers(response)
Placeholder. Called when we’ve received headers for a request. You can abort a request at this time by returning False from this function. It must be False, and not simply a false-like value, such as an empty string.
Note: This function isn’t called for HTTP HEAD requests.
response: An HTTPResponse instance with information about the received response.
on_progress(response, received, total)
Placeholder. Called when progress is made in downloading a response.
response: An HTTPResponse instance with information about the response.
received: The number of bytes received thus far.
total: The total number of bytes expected for the response. This will be 0 if we don’t know how much to expect.
on_response(response)
Placeholder. Called when a complete response has been received.
response: An HTTPResponse instance with information about the received response.
on_ssl_error(response, certificate, exception)
Placeholder. Called when the remote server’s SSL certificate failed initial verification. If this method returns True, the certificate will be accepted; otherwise, the connection will be closed and on_error() will be called.
response: An HTTPResponse instance with information about the response, notably the host to expect.
certificate: A dictionary representing the certificate that wasn’t automatically verified.
exception: A CertificateError instance with information about the error that occurred.
post(url, data=None, files=None, **kwargs)
Begin a POST request. See request() for more details.
request(*args, **kwargs)
Begin a request. Missing parameters will be taken from the active session when available. See Session.request() for more details.
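The callback rules above have a few sharp edges that are easier to see in isolation. on_headers aborts only on the exact object False, on_progress reports a total of 0 when the size is unknown, and on_ssl_error decides acceptance through its return value. The helpers below are hypothetical and could be called from an HTTPClient subclass’s callbacks. accept_certificate assumes the certificate dictionary has the shape returned by Python’s ssl.SSLSocket.getpeercert(), which this page does not confirm, and TRUSTED_NAMES is a placeholder allowlist.

```python
def should_abort(on_headers_result):
    # Only the exact object False aborts; "", 0 and None do not.
    return on_headers_result is False


def progress_text(received, total):
    """Format an on_progress update; total == 0 means the size is unknown."""
    if total:
        return "%d/%d bytes (%.0f%%)" % (received, total, 100.0 * received / total)
    return "%d bytes (total unknown)" % received


# Hypothetical allowlist: accept a certificate that failed verification only
# if its subject commonName is one we explicitly trust.
TRUSTED_NAMES = {"internal.example"}


def accept_certificate(certificate):
    """Return True to accept a certificate that failed initial verification.

    Assumes getpeercert()-style data: 'subject' is a tuple of RDNs, each a
    tuple of (key, value) pairs.
    """
    for rdn in certificate.get("subject", ()):
        for key, value in rdn:
            if key == "commonName" and value in TRUSTED_NAMES:
                return True
    return False


print(progress_text(512, 0))  # 512 bytes (total unknown)
```

In a subclass, on_ssl_error(self, response, certificate, exception) could simply return accept_certificate(certificate). Anything it rejects then falls through to on_error().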
HTTPRequest
class pants.http.client.HTTPRequest(session, method, path, url, headers, cookies, body, timeout, max_redirects, keep_alive, auth)
A very basic structure for storing HTTP request information.
response
The HTTPResponse instance representing the response to this request.
method
The HTTP method of this request, such as GET, POST, or HEAD.
path
The path of this request.
url
A tuple containing the full URL of the request, as processed by urlparse.urlparse().
headers
A dictionary of headers sent with this request.
cookies
A Cookie.SimpleCookie instance of cookies sent with this request.
body
A list of strings and files sent as this request’s body.
timeout
The time, in seconds, to wait with no activity before timing out.
max_redirects
The maximum number of remaining redirects that will be followed automatically.
keep_alive
Whether or not the connection should be reused after this request.
auth
Either a tuple of (username, password) or an instance of AuthBase responsible for authorizing this request with the server.
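HTTPRequest stores the URL as a urlparse() result and auth as a (username, password) tuple. The sketch below shows how those fields could become a Host value, a request path, and an HTTP Basic Authorization header. Pants’ use of Basic authentication for the tuple form is an assumption here, consistent with the basic WWW authentication mentioned at the top of this page. Both helpers are hypothetical, and the sketch uses Python 3’s urllib.parse in place of Python 2’s urlparse module.

```python
import base64
from urllib.parse import urlparse  # urlparse.urlparse on Python 2


def request_target(url):
    """Split a URL into (host, path) for an HTTP/1.1 request.

    Assumes the request path is the URL path plus any query string,
    with "/" used when the path is empty.
    """
    parsed = urlparse(url)
    path = parsed.path or "/"
    if parsed.query:
        path += "?" + parsed.query
    return parsed.netloc, path


def basic_auth_header(username, password):
    """Encode a (username, password) pair as an HTTP Basic Authorization value."""
    token = base64.b64encode(("%s:%s" % (username, password)).encode("utf-8"))
    return "Basic " + token.decode("ascii")


print(request_target("https://api.github.com/repos/ecdavis/pants/commits?since=2013-11-01"))
print(basic_auth_header("Aladdin", "open sesame"))  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```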
HTTPResponse
class pants.http.client.HTTPResponse(request)
The HTTPResponse class represents a single HTTP response and holds all the available information about it, including the redirect history and the original HTTPRequest.
length
The length of the raw response.
http_version
The HTTP version of the response.
status_code
The HTTP status code of the response, such as 200.
status_text
The human-readable status text explaining the status code, such as Not Found.
cookies
A Cookie.SimpleCookie instance of all the cookies received with the response.
headers
A dictionary of all the headers received with the response.
content
The content of the response as a byte string. Be careful when using this with large responses, as it will load the entire response into memory. None if no data has been received.
encoding
The detected character encoding of the response. You can also set this to a specific character set to have text decoded properly. Pants will attempt to fill this value from the Content-Type response header; if no value was available, it will be None.
file
The content of the response as a tempfile.SpooledTemporaryFile. Pants uses temporary files to decrease memory usage for large responses. None if no data has been received.
iter_content(chunk_size=1, decode_unicode=False)
Iterate over the content of the response. Using this rather than content or text can prevent large responses from being loaded into memory in their entirety.
chunk_size (default 1): The number of bytes to read at once.
decode_unicode (default False): Whether or not to decode the bytes into unicode using the response’s encoding.
iter_lines(chunk_size=512, decode_unicode=False)
Iterate over the content of the response, one line at a time. Using this rather than content or text can prevent the entire response from being loaded into memory. The two arguments to this method are passed directly to iter_content().
json(**kwargs)
The content of the response, interpreted as JSON. This uses the value of encoding if possible; if encoding is not set, it defaults to UTF-8. Any provided keyword arguments will be passed to json.loads().
status
The status code and status text as a string.
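HTTPResponse.file, iter_content(), iter_lines(), and json() all work off a spooled temporary file. The standalone sketch below reimplements that idea with tempfile.SpooledTemporaryFile. The functions take the file explicitly and are hypothetical stand-ins for the methods, not Pants’ implementation. Lines are split on newline characters only, and decoding uses an incremental decoder so that multi-byte characters split across chunks are decoded correctly.

```python
import codecs
import json
import tempfile


def iter_content(fileobj, chunk_size=1, decode_unicode=False, encoding="utf-8"):
    """Yield the body in pieces of up to chunk_size bytes."""
    fileobj.seek(0)
    # An incremental decoder copes with characters split across chunks.
    decoder = codecs.getincrementaldecoder(encoding)() if decode_unicode else None
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        if decoder is None:
            yield chunk
        else:
            text = decoder.decode(chunk)
            if text:  # empty while a multi-byte character is incomplete
                yield text
    if decoder is not None:
        tail = decoder.decode(b"", final=True)
        if tail:
            yield tail


def iter_lines(fileobj, chunk_size=512, decode_unicode=False, encoding="utf-8"):
    """Yield one line at a time, buffering only the unfinished line."""
    newline = "\n" if decode_unicode else b"\n"
    pending = newline[:0]  # empty str or bytes, matching the chunk type
    for chunk in iter_content(fileobj, chunk_size, decode_unicode, encoding):
        pending += chunk
        *complete, pending = pending.split(newline)
        for line in complete:
            yield line
    if pending:
        yield pending


def load_json(fileobj, encoding=None, **kwargs):
    """Parse the whole body as JSON, falling back to UTF-8."""
    fileobj.seek(0)
    return json.loads(fileobj.read().decode(encoding or "utf-8"), **kwargs)


# Small bodies stay in memory; past max_size the data spills to a real file.
page = tempfile.SpooledTemporaryFile(max_size=1024)
page.write("héllo\nwörld".encode("utf-8"))
print(list(iter_lines(page, chunk_size=4, decode_unicode=True)))  # ['héllo', 'wörld']
```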
Session
class pants.http.client.Session(client, on_response=None, on_headers=None, on_progress=None, on_ssl_error=None, on_error=None, timeout=None, max_redirects=None, keep_alive=None, auth=None, headers=None, cookies=None, verify_ssl=None, ssl_options=None)
The Session class is the heart of the HTTP client, making it easy to share state between multiple requests and enabling the use of with syntax. Sessions are responsible for determining everything about a request before handing it back to HTTPClient to be executed.
client: The HTTPClient instance this Session is associated with.
on_response: Optional. A callable that will handle any received responses, rather than the HTTPClient’s own on_response() method.
on_headers: Optional. A callable for when response headers have been received.
on_progress: Optional. A callable for progress notifications.
on_ssl_error: Optional. A callable responsible for handling SSL verification errors, if verify_ssl is True.
on_error: Optional. A callable that will handle any errors that occur.
timeout (default 30): Optional. The time, in seconds, to wait with no activity before timing out.
max_redirects (default 10): Optional. The maximum number of times to follow a server-issued redirect.
keep_alive (default True): Optional. Whether or not a single connection will be reused for multiple requests.
auth (default None): Optional. An instance of AuthBase for authenticating requests to the server.
headers (default None): Optional. A dictionary of default headers to send with requests.
verify_ssl (default False): Optional. Whether or not to attempt to check the certificate of the remote secure server against its hostname.
ssl_options (default None): Optional. Options to use when initializing SSL. See Stream.startSSL() for more.
client
The HTTPClient this Session is associated with.
post(url, data=None, files=None, **kwargs)
Begin a POST request. See request() for more details.
request(method, url, params=None, data=None, headers=None, cookies=None, files=None, auth=None, timeout=None, max_redirects=None, keep_alive=None)
Begin a request.
method: The HTTP method of the request.
url: The URL to request.
params: Optional. A dictionary or string of query parameters to add to the request.
data: Optional. A dictionary or string of content to send in the request body.
headers: Optional. A dictionary of headers to send with the request.
cookies: Optional. A dictionary or CookieJar of cookies to send with the request.
files: Optional. A dictionary of file-like objects to upload with the request.
auth: Optional. An instance of AuthBase to use to authenticate the request.
timeout: Optional. The time, in seconds, to wait with no activity before timing out.
max_redirects: Optional. The maximum number of times to follow a server-issued redirect.
keep_alive: Optional. Whether or not to reuse the connection for multiple requests.
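Requests fill in missing parameters from the active session, and sessions can be nested with with. The hypothetical SessionStack below models one way that resolution could work. A request’s own arguments beat the innermost session, which beats outer sessions and the defaults from the Session argument table. Merging header and cookie dictionaries, rather than replacing them, is an assumption of this sketch and not documented Pants behavior.

```python
from contextlib import contextmanager

# Defaults taken from the Session argument table.
SESSION_DEFAULTS = {"timeout": 30, "max_redirects": 10, "keep_alive": True,
                    "verify_ssl": False}


class SessionStack(object):
    """Hypothetical model of nested session settings (not Pants' code)."""

    MERGED = ("headers", "cookies")

    def __init__(self, **defaults):
        settings = dict(SESSION_DEFAULTS)
        settings.update(defaults)
        self._layers = [settings]

    @contextmanager
    def session(self, **overrides):
        # Each with block pushes a layer of overrides and pops it on exit.
        self._layers.append(overrides)
        try:
            yield self
        finally:
            self._layers.pop()

    def resolve(self, **request_args):
        """Combine all layers, innermost last; None means "inherit"."""
        resolved = {}
        for layer in self._layers + [request_args]:
            for key, value in layer.items():
                if value is None:
                    continue
                if key in self.MERGED:
                    merged = dict(resolved.get(key, {}))
                    merged.update(value)
                    resolved[key] = merged
                else:
                    resolved[key] = value
        return resolved


stack = SessionStack(headers={"User-Agent": "demo"})
with stack.session(timeout=5):
    print(stack.resolve(max_redirects=0)["timeout"])  # 5
```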