How to Request Webpages Using Python
Introduction to HTTP and Python’s Requests Library
HTTP (HyperText Transfer Protocol) is the backbone of communication on the web, enabling clients and servers to exchange data. Whether a weather app delivers the latest forecast or a social media platform loads the feed, HTTP operates behind the scenes to enable these interactions. HTTP requests allow developers to retrieve resources, send data, and interact with web APIs.
Whether you need to request webpages for web scraping, fetch webpage content for data analysis, or retrieve JSON data from APIs, Python’s requests library makes these tasks straightforward.
In Python, the requests library provides an efficient way to send HTTP requests, enabling developers to interact with web servers with minimal effort.
In this article, we will learn how to use the requests library to make common types of HTTP requests, including GET and POST, and how to handle the responses from the server.
Let’s start by exploring how to set up the requests library, make our first HTTP request, and start communicating with the web.
Setting Up the Environment
The requests library is not included in Python’s standard library but can be added using Python’s package manager, pip.
The requests library is compatible with Python versions 3.8 and above, making it suitable for most modern Python environments. This versatile library handles everything from basic webpage requests to retrieving complex JSON data structures from modern web services.
To install the library, use the following command in a terminal:
pip install requests
Once installed, you can quickly verify this by running the following command in the terminal:
pip show requests
If the requests library is installed correctly, the above command will display information about the package, including its version, location, and dependencies, like this:
Name: requests
Version: 2.32.2
Summary: Python HTTP for Humans.
Home-page: https://requests.readthedocs.io
Author: Kenneth Reitz
Author-email: [email protected]
License: Apache-2.0
Location: /opt/homebrew/lib/python3.10/site-packages
Requires: certifi, charset-normalizer, idna, urllib3
Required-by: CacheControl, poetry, requests-toolbelt
Note: The output above reflects the current state of the requests library at the time of writing. It might vary depending on the installed version, environment, or future updates to the library.
If the above command gives an error or fails, there might be some minor issues during installation. Here are some common problems and how to resolve them:
- pip command not recognized: This error indicates that pip is either not installed or not added to the system’s PATH. To fix this, ensure that pip is installed correctly. On some systems, we may need to use python -m pip install requests instead of just pip to install the package.
- Permission denied error: This happens when there are insufficient permissions to install packages globally. To resolve this, use the --user flag with the pip command, like this: pip install requests --user, which installs the package in the user’s local directory.
- Outdated pip version: This happens when using an outdated version of pip. We can update pip by running the command python -m pip install --upgrade pip, which ensures we are using the latest version of the package manager.
- Incorrect Python version: This happens when we are using an incompatible Python version. The requests library requires Python 3.8 or above. To verify the Python version, run python3 --version in the terminal. If the version is below 3.8, consider updating Python to a newer version.
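The version and installation checks described above can also be run from inside Python. The following is a minimal sketch that uses only the standard library. The helper name check_environment is hypothetical and chosen for illustration, and the 3.8 minimum is the figure stated in this article rather than something read from the package itself.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 8)  # minimum Python version stated in this article (assumption)


def check_environment(package="requests"):
    """Report whether the interpreter is new enough and the package is installed."""
    python_ok = sys.version_info >= MIN_PYTHON
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        package_version = None  # the package is not installed in this environment
    return {"python_ok": python_ok, "package_version": package_version}


report = check_environment()
print("Python version OK:", report["python_ok"])
print("requests version:", report["package_version"] or "not installed")
```

If the second line prints "not installed", the pip command most likely installed the package into a different Python environment than the one running the script.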
Once the requests library is successfully installed and any installation issues are resolved, we are all set to make our first HTTP request.
Making HTTP GET Requests
A GET request is used to fetch data from a server, whether you’re trying to request webpage content or get JSON data from an API. The data can come in various formats like JSON or XML.
Making a GET Request to JSONPlaceholder API
Let’s start by making a GET request to a public API. For this example, we’ll use the JSONPlaceholder API, a free online REST API for testing and prototyping.
import requests

# Making a GET request to a public API
response = requests.get("https://jsonplaceholder.typicode.com/todos/1")

# Checking the status code of the response
print("Status Code:", response.status_code)

# Printing the content of the response
print("Response Content:", response.json())
This example demonstrates how to request a webpage and retrieve JSON data from the server in a single step.
In this example, we are requesting data for a specific “to-do” item from the JSONPlaceholder API.
When we make a GET request, the server responds with an HTTP response, which is an object containing several important components:
Status Code: The status code indicates the result of the HTTP request. For example:
- 200 OK: It indicates that the request was processed successfully.
- 301 Moved Permanently: It indicates that the requested resource has been assigned a new permanent URL.
- 404 Not Found: It indicates that the requested resource was not found on the server.
- 500 Internal Server Error: It indicates that there was an unexpected error on the server.
We can check the status code using response.status_code.
Headers: HTTP headers provide additional information about the response, such as content type or server details. We can access headers using response.headers.
Content: The content of the response is typically the data we’re interested in. It can be in various formats, such as JSON, HTML, or plain text. Use response.text to retrieve the body as text and response.json() to parse JSON content.
The code above produces the following output, illustrating how the server responds to a GET request. The output includes the status code, which indicates the success of the request, and the content, which contains the data retrieved from the API:
Status Code: 200
Response Content: {'userId': 1, 'id': 1, 'title': 'delectus aut autem', 'completed': False}
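The status codes listed earlier fall into classes based on their first digit. As a minimal sketch, the standard library’s http.HTTPStatus can turn a numeric code into its reason phrase; the helper describe_status and its category labels are hypothetical names used only for illustration.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short description such as '404 Not Found (client error)'."""
    if 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirection"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "informational"  # 1xx; other values are rejected by HTTPStatus below
    # HTTPStatus maps a numeric code to its standard reason phrase
    return f"{code} {HTTPStatus(code).phrase} ({category})"


for code in (200, 301, 404, 500):
    print(describe_status(code))
```

The same check can be applied to response.status_code after any request to decide how the program should react.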
We can also pass parameters in a GET request to filter or customize the data we receive. Here’s how we can pass parameters in a GET request to get all the posts that belong to the first user:
import requests

# Define the base URL and parameters
url = "https://jsonplaceholder.typicode.com/posts"
params = {'userId': 1}

# Send GET request with parameters
response = requests.get(url, params=params)

# Check if the request was successful
if response.status_code == 200:
    print(response.json())  # Print the filtered posts for userId 1
else:
    print(f"Failed to retrieve data. Status code: {response.status_code}")
The above code prints a Python list of posts that belong to the first user. The first entry in that list corresponds to the following JSON object, and the remaining posts follow the same structure:
{"userId":1,"id":1,"title":"sunt aut facere repellat provident occaecati excepturi optio reprehenderit","body":"quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"}
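To see how the params dictionary ends up in the request, it can help to build the request without sending it. The sketch below uses requests.Request and its prepare() method, so it needs no network access; the variable name prepared is only illustrative.

```python
import requests

url = "https://jsonplaceholder.typicode.com/posts"
params = {"userId": 1}

# Build the request without sending it, so no network access is needed
prepared = requests.Request("GET", url, params=params).prepare()

print(prepared.method)  # GET
print(prepared.url)     # https://jsonplaceholder.typicode.com/posts?userId=1
```

The library appends the parameters to the URL as a query string, which avoids having to concatenate and escape them by hand.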
Now that we know how to retrieve data using GET requests, let’s say we want to create a new post. This can be done using a POST request, which sends the item’s details to the server. Let’s see how to send a POST request.
Making HTTP POST Requests
While GET requests are typically used to request webpages or retrieve JSON data, POST requests enable us to send data to a server, such as submitting forms or uploading content.
Making a POST Request to JSONPlaceholder API
Let’s look at an example of submitting data using a POST request.
For this example, we’ll simulate submitting a form by sending a JSON payload to the JSONPlaceholder API, which allows us to create fake posts in a blog.
import requests

# API URL where the request will be sent
url = "https://jsonplaceholder.typicode.com/posts"

# The data to be sent in the request (JSON format)
data = {
    "title": "My New Post",
    "body": "This is the content of the post.",
    "userId": 1
}

# Sending the POST request with the data
response = requests.post(url, json=data)

# Checking the status code of the response
print("Status Code:", response.status_code)

# Printing the response content (the created post)
print("Response Content:", response.json())
- URL: The URL is the endpoint to which the POST request will be sent. In this case, we’re sending data to https://jsonplaceholder.typicode.com/posts, a fake API endpoint for creating posts.
- Data: The data is the payload we are sending in the body of the request. It is formatted as a JSON object, which includes the title, body, and userId of the post.
- POST Request: We use requests.post() to send the POST request, passing the URL and the data as a JSON object.
- Response: The response returned by the server includes the created post, an ID assigned by the API, and a 201 Created status code, indicating that the post has been successfully added.
The output for the above code will look like this:
Status Code: 201
Response Content: {'title': 'My New Post', 'body': 'This is the content of the post.', 'userId': 1, 'id': 101}
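The POST example passes the payload with the json argument. An HTML form would normally be sent with the data argument instead, which encodes the same dictionary differently. The following sketch prepares both variants without sending them, so the difference can be inspected offline; the names json_request and form_request are illustrative only.

```python
import json
import requests

url = "https://jsonplaceholder.typicode.com/posts"
data = {"title": "My New Post", "body": "This is the content of the post.", "userId": 1}

# Prepare (but do not send) the same payload in two encodings
json_request = requests.Request("POST", url, json=data).prepare()
form_request = requests.Request("POST", url, data=data).prepare()

print(json_request.headers["Content-Type"])   # application/json
print(form_request.headers["Content-Type"])   # application/x-www-form-urlencoded

# The JSON body decodes back to the original dictionary
print(json.loads(json_request.body) == data)  # True

# The form body is a URL-encoded string of key=value pairs
print(form_request.body)
```

Which variant to use depends on what the server expects: APIs that accept JSON generally need json, while endpoints that process traditional form submissions generally need data.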
When making HTTP requests, it’s crucial to handle errors properly. In the next section, we’ll discuss common issues we might encounter when sending POST requests and how to handle them effectively.
Basic Error Handling in HTTP Requests
When working with HTTP requests, there can be errors due to various factors like network issues, invalid URLs, or unexpected server responses. It’s important to handle these errors gracefully to ensure that the program continues running smoothly, even when things don’t go as planned.
Common Errors in HTTP Requests
- Connection Errors: Connection errors happen when there is a problem establishing a connection to the server. This can occur if the server is down, the network is unavailable, or the URL is incorrect. If no response is received, Python’s requests library raises a requests.exceptions.ConnectionError.
- Timeout Errors: A timeout error occurs when the server takes longer than expected to respond to a request. By default, requests does not time out unless a timeout value is set explicitly, so a limit should be passed with the timeout argument. If the timeout is reached, a requests.exceptions.Timeout is raised.
- HTTP Status Code Errors: Sometimes, the server may respond with a status code that indicates something went wrong. If the HTTP status code signals an error (e.g., 404 for “Not Found” or 500 for “Internal Server Error”), it should be identified and handled appropriately.
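The exception classes named in this list are related to each other, which affects the order in which they should be caught. As a small sketch, the relationships can be inspected with issubclass; nothing here contacts a server, and the variable names are illustrative.

```python
import requests

exc = requests.exceptions

# Every error listed above derives from the common base class RequestException
for name in ("ConnectionError", "Timeout", "HTTPError"):
    cls = getattr(exc, name)
    print(name, "is a RequestException:", issubclass(cls, exc.RequestException))

# A timeout while connecting belongs to both the connection and timeout families
connect_timeout_is_connection_error = issubclass(exc.ConnectTimeout, exc.ConnectionError)
connect_timeout_is_timeout = issubclass(exc.ConnectTimeout, exc.Timeout)
print("ConnectTimeout is a ConnectionError:", connect_timeout_is_connection_error)
print("ConnectTimeout is a Timeout:", connect_timeout_is_timeout)
```

Because ConnectTimeout inherits from both classes, whichever of the two except clauses appears first in a try-except block is the one that handles it.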
Common HTTP Error Status Codes
The most common HTTP error status codes are the following:
400 Bad Request: Indicates that the server could not process the request because of a client error such as missing required parameters, incorrect request body formatting, etc.
401 Unauthorized: Indicates that the request lacks valid authentication credentials.
403 Forbidden: Indicates that access to the requested resource is forbidden due to lack of proper permissions. In this case, the server understood the request but refused to fulfill it.
404 Not Found: Indicates that the server could not find the requested resource.
500 Internal Server Error: Indicates that the server is unable to process a request because of an unexpected condition.
502 Bad Gateway: Indicates that a server acting as a gateway or proxy received an invalid response from the upstream server it contacted.
The above errors can be handled by calling response.raise_for_status(), which raises a requests.exceptions.HTTPError when the status code is in the 4xx or 5xx range.
Using try-except to Handle Errors
In Python, try-except blocks help manage errors by allowing the program to continue running even if something goes wrong.
Code that might cause an error is placed inside the try block, and the except block defines how to handle the error.
import requests

# The URL where we will be sending the POST request
url = "https://jsonplaceholder.typicode.com/posts"

# The data to be sent in the request
data = {
    "title": "My New Post",
    "body": "This is the content of the post.",
    "userId": 1
}

try:
    # Sending the POST request; without a timeout, requests waits indefinitely
    response = requests.post(url, json=data, timeout=10)

    # Raise an HTTPError for 4xx and 5xx status codes
    response.raise_for_status()

    # If successful, print the response content
    print("Response Status Code:", response.status_code)
    print("Response Content:", response.json())
except requests.exceptions.Timeout:
    # Checked before ConnectionError because a connect timeout is a subclass of both
    print("Error: The request timed out. Please try again later.")
except requests.exceptions.ConnectionError:
    print("Error: Failed to connect to the server. Please check your internet connection or URL.")
except requests.exceptions.HTTPError as e:
    # Raised by raise_for_status() for error status codes (e.g., 400, 404, 500)
    print(f"HTTP error: {e}")
except requests.exceptions.RequestException as e:
    # Catch any other error raised by requests
    print(f"An error occurred: {e}")
By checking for common issues like connection problems or unexpected status codes, we can provide users with helpful messages and avoid application crashes. Without proper error handling, the application could fail silently or display confusing error messages to users.
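To observe raise_for_status() in isolation, the sketch below builds bare Response objects by hand instead of contacting a server. Constructing a Response this way is done here purely for illustration and is not how responses are normally obtained; the helper check and the example.com URL are placeholders.

```python
import requests


def check(status_code, reason):
    """Build a bare Response by hand (illustration only) and validate it."""
    response = requests.Response()
    response.status_code = status_code
    response.reason = reason
    response.url = "https://example.com/resource"  # placeholder URL, never contacted
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as error:
        return f"error: {error}"
    return "ok"


print(check(200, "OK"))
print(check(404, "Not Found"))
print(check(500, "Internal Server Error"))
```

A 2xx code passes silently, while 4xx and 5xx codes raise an HTTPError whose message includes the status code, the reason, and the URL.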
With basic error handling in place, we are now ready to explore more advanced topics like handling timeouts, retrying failed requests, or working with authentication in HTTP requests.
Conclusion
In this article, we have learned the basics of making HTTP requests in Python using the requests library, how to send GET and POST requests, how to handle responses, and how to manage errors effectively. These skills are essential for interacting with web APIs and retrieving or sending data over the internet.
Author
'The Codecademy Team, composed of experienced educators and tech experts, is dedicated to making tech skills accessible to all. We empower learners worldwide with expert-reviewed content that develops and enhances the technical skills needed to advance and succeed in their careers.'