How to Request Webpages Using Python

Learn how to request webpages and get JSON data using Python's requests library. A step-by-step guide with practical examples for making HTTP GET and POST requests in Python.

Introduction to HTTP and Python’s Requests Library

HTTP (HyperText Transfer Protocol) is the backbone of communication on the web, enabling clients and servers to exchange data. Whether a weather app delivers the latest forecast or a social media platform loads the feed, HTTP operates behind the scenes to enable these interactions. HTTP requests allow developers to retrieve resources, send data, and interact with web APIs.

Whether you need to request webpages for web scraping, fetch webpage content for data analysis, or retrieve JSON data from APIs, Python’s requests library lets you send HTTP requests and interact with web servers in a few lines of code.

In this article, we will learn how to use the requests library to make common types of HTTP requests, including GET and POST, and how to handle the responses from the server.

Let’s start by exploring how to set up the requests library, make our first HTTP request, and start communicating with the web.

Setting Up the Environment

The requests library is not included in Python’s standard library but can be added using Python’s package manager pip.

The requests library is compatible with Python versions 3.8 and above, making it suitable for most modern Python environments. This versatile library handles everything from basic webpage requests to retrieving complex JSON data structures from modern web services.

To install the library, use the following command in a terminal:

pip install requests

Once installed, you can quickly verify this by running the following command in the terminal:

pip show requests

If the requests library is installed correctly, the above command will display information about the package, including its version, location, and dependencies like this:

Name: requests
Version: 2.32.2
Summary: Python HTTP for Humans.
Home-page: https://requests.readthedocs.io
Author: Kenneth Reitz
Author-email: [email protected]
License: Apache-2.0
Location: /opt/homebrew/lib/python3.10/site-packages
Requires: certifi, charset-normalizer, idna, urllib3
Required-by: CacheControl, poetry, requests-toolbelt

Note: The output above reflects the current state of the requests library at the time of writing. It might vary depending on the installed version, environment, or future updates to the library.

If the above command fails or reports that the package was not found, the installation did not complete successfully. Here are some common problems and how to resolve them:

  • pip command not recognized: This error indicates that pip is either not installed or not on the system’s PATH. To fix this, make sure pip is installed correctly. On some systems, we may need to run python -m pip install requests instead of calling pip directly.

  • Permission denied error: This happens when there are insufficient permissions to install packages globally. To resolve this, use the --user flag with the pip command, like this: pip install requests --user, which installs the package in the user’s local directory.

  • Outdated pip version: This happens when using an outdated version of pip. We can update pip by running the command python -m pip install --upgrade pip, which ensures we are using the latest version of the package manager.

  • Incorrect Python version: This happens when we are using an incompatible Python version. The requests library requires Python 3.8 or above. To verify the Python version, run python3 --version in the terminal. If the version is below 3.8, consider updating Python to a newer version.
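The Python version and installation checks above can also be run from Python itself. The following is a minimal sketch that uses only the standard library; the helper name check_environment and the MIN_PYTHON constant are illustrative names, not part of the requests library.

```python
import importlib.util
import sys

# Minimum Python version stated above for the requests library
MIN_PYTHON = (3, 8)


def check_environment():
    """Return (python_ok, requests_installed) for the current interpreter."""
    python_ok = sys.version_info >= MIN_PYTHON
    # find_spec returns None when the package cannot be found for import
    requests_installed = importlib.util.find_spec("requests") is not None
    return python_ok, requests_installed


python_ok, requests_installed = check_environment()
print("Python version OK:", python_ok)
print("requests installed:", requests_installed)
```

Run the script with the same interpreter that the project uses, since pip may have installed the package for a different Python installation.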

Once the requests library is successfully installed and any installation issues are resolved, we are all set to make our first HTTP request.

Making HTTP GET Requests

A GET request is used to fetch data from a server, whether you’re trying to request webpage content or get JSON data from an API. The data can come in various formats like JSON or XML.

Making a GET Request to JSONPlaceholder API

Let’s start by making a GET request to a public API. For this example, we’ll use the JSONPlaceholder API, a free online REST API for testing and prototyping.

import requests
# Making a GET request to a public API
response = requests.get("https://jsonplaceholder.typicode.com/todos/1")
# Checking the status code of the response
print("Status Code:", response.status_code)
# Printing the content of the response
print("Response Content:", response.json())

This example demonstrates how to request a webpage and retrieve JSON data from the server in a single step.

In this example, we are requesting data for a specific “to-do” item from the JSONPlaceholder API.

When we make a GET request, the server responds with an HTTP response, which is an object containing several important components:

  1. Status Code: The status code indicates the result of the HTTP request. We can check it using response.status_code. For example:

    • 200 OK: It indicates that the request was processed successfully.
    • 301 Moved Permanently: It indicates that the requested resource has been assigned a new permanent URL.
    • 404 Not Found: It indicates that the requested resource was not found on the server.
    • 500 Internal Server Error: It indicates that there was an unexpected error on the server.

  2. Headers: HTTP headers provide additional information about the response, such as content type or server details. We can access headers using response.headers.

  3. Content: The content of the response is typically the data we’re interested in. It can be in various formats, such as JSON, HTML, or plain text. Use response.text to retrieve the raw content and response.json() to parse JSON content.

The code above produces the following output, illustrating how the server responds to a GET request. The output includes the status code, which indicates the success of the request, and the content, which contains the data retrieved from the API:

Status Code: 200
Response Content: {'userId': 1, 'id': 1, 'title': 'delectus aut autem', 'completed': False}
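The three response components described above (status code, headers, and content) can be collected in one place. The sketch below is an illustration under stated assumptions: describe is a hypothetical helper, and the response is built by hand (setting the private _content attribute) only so the example runs without a network connection. In real code the object would come from requests.get().

```python
from http import HTTPStatus

import requests


def describe(response):
    """Collect the status code, a header, and the body of a response."""
    content_type = response.headers.get("Content-Type", "")
    return {
        "status": response.status_code,
        # Standard reason phrase for the code, e.g. "OK" for 200
        "reason": HTTPStatus(response.status_code).phrase,
        "content_type": content_type,
        # Parse JSON only when the server labels the body as JSON
        "body": response.json() if "application/json" in content_type else response.text,
    }


# Assumption: a hand-built response standing in for a real server reply.
# _content is a private attribute and is set here only to stay offline.
fake = requests.Response()
fake.status_code = 200
fake.headers["Content-Type"] = "application/json; charset=utf-8"
fake._content = b'{"userId": 1, "id": 1, "title": "delectus aut autem", "completed": false}'

summary = describe(fake)
print(summary)
```

Checking the Content-Type header before calling response.json() avoids a decoding error when a server returns HTML or plain text instead of JSON.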

We can also pass parameters in a GET request to filter or customize the data we receive. Here’s how we can pass parameters in a GET request to get all the posts that belong to the first user:

import requests
# Define the base URL and parameters
url = "https://jsonplaceholder.typicode.com/posts"
params = {'userId': 1}
# Send GET request with parameters
response = requests.get(url, params=params)
# Check if the request was successful
if response.status_code == 200:
    print(response.json())  # Print the filtered posts for userId 1
else:
    print(f"Failed to retrieve data. Status code: {response.status_code}")

The above code prints a list of every post that belongs to the first user. The first item in that list looks like this (the remaining posts are omitted here):

{'userId': 1, 'id': 1, 'title': 'sunt aut facere repellat provident occaecati excepturi optio reprehenderit', 'body': 'quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto'}
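To see what the params argument does to the URL, the request can be prepared without being sent. This is a small sketch using requests.Request and its prepare() method; no network connection is needed because nothing is transmitted.

```python
import requests

url = "https://jsonplaceholder.typicode.com/posts"
params = {"userId": 1}

# Build the request object and prepare it, but do not send it
prepared = requests.Request("GET", url, params=params).prepare()

# The parameters are encoded into the query string of the final URL
print(prepared.url)  # https://jsonplaceholder.typicode.com/posts?userId=1
```

Passing a dictionary through params is generally safer than building the query string by hand, because the library takes care of URL-encoding the keys and values.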

Now that we know how to retrieve data using GET requests, let’s say we want to create a new post. This can be done using a POST request, which sends the post’s details to the server. Let’s see how to send a POST request.

Making HTTP POST Requests

While GET requests are typically used to request webpages or retrieve JSON data, POST requests enable us to send data to a server, such as submitting forms or uploading content.

Making a POST Request to JSONPlaceholder API

Let’s look at a real-world example of submitting form data using a POST request.

For this example, we’ll simulate submitting a form using the JSONPlaceholder API, which allows us to create fake posts in a blog.

import requests
# API URL where the request will be sent
url = "https://jsonplaceholder.typicode.com/posts"
# The data to be sent in the request (JSON format)
data = {
    "title": "My New Post",
    "body": "This is the content of the post.",
    "userId": 1
}
# Sending the POST request with the data
response = requests.post(url, json=data)
# Checking the status code of the response
print("Status Code:", response.status_code)
# Printing the response content (the created post)
print("Response Content:", response.json())

Here is what each part of the code does:

  • URL: The URL is the endpoint to which the POST request will be sent. In this case, we’re sending data to https://jsonplaceholder.typicode.com/posts, a fake API endpoint for creating posts.

  • Data: The data is the payload we are sending in the body of the request. It is formatted as a JSON object, which includes the title, body, and userId of the post.

  • POST Request: We use requests.post() to send the POST request, passing the URL and the data as a JSON object.

  • Response: The response returned by the server includes the created post, an ID assigned by the API, and a 201 Created status code, indicating that the post has been successfully added.

The output for the above code will look like this:

Status Code: 201
Response Content: {'title': 'My New Post', 'body': 'This is the content of the post.', 'userId': 1, 'id': 101}
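The POST example above passes the payload with json=, which sends a JSON body. A classic HTML form submission is usually sent form-encoded instead, which requests does when the payload is passed with data=. The sketch below prepares both variants without sending them so the difference can be inspected offline; the variable names as_json and as_form are illustrative.

```python
import json

import requests

url = "https://jsonplaceholder.typicode.com/posts"
data = {
    "title": "My New Post",
    "body": "This is the content of the post.",
    "userId": 1,
}

# json= serializes the dictionary to JSON and sets the matching header
as_json = requests.Request("POST", url, json=data).prepare()
# data= form-encodes the dictionary, like a submitted HTML form
as_form = requests.Request("POST", url, data=data).prepare()

print(as_json.headers["Content-Type"])  # application/json
print(as_form.headers["Content-Type"])  # application/x-www-form-urlencoded

# The JSON body round-trips back to the original dictionary
print(json.loads(as_json.body) == data)  # True
# The form body is a URL-encoded string of key=value pairs
print(as_form.body)
```

Which variant to use depends on what the server expects: most JSON APIs want json=, while endpoints that process HTML forms typically want data=.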

When making HTTP requests, it’s crucial to handle errors properly. In the next section, we’ll discuss common issues we might encounter when sending HTTP requests and how to handle them effectively.

Basic Error Handling in HTTP Requests

When working with HTTP requests, there can be errors due to various factors like network issues, invalid URLs, or unexpected server responses. It’s important to handle these errors gracefully to ensure that the program continues running smoothly, even when things don’t go as planned.

Common Errors in HTTP Requests

  • Connection Errors: Connection errors happen when there is a problem establishing a connection to the server. This can occur if the server is down, the network is unavailable, or the URL is incorrect. If no response is received, Python’s requests library raises a requests.exceptions.ConnectionError.

  • Timeout Errors: A timeout error occurs when the server takes longer than expected to respond to a request. This error can be handled by setting a timeout value in the request. If the timeout is reached, a requests.exceptions.Timeout is raised.

  • HTTP Status Code Errors: Sometimes, the server may respond with a status code that indicates something went wrong. If the HTTP status code signals an error (e.g., 404 for “Not Found” or 500 for “Internal Server Error”), it should be identified and handled appropriately.
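The exception classes named above are related to each other, and that relationship affects the order in which they should be caught. The following sketch inspects the hierarchy in requests.exceptions without making any request.

```python
from requests import exceptions as exc

# Each error type discussed above derives from RequestException
for name in ("ConnectionError", "Timeout", "HTTPError"):
    cls = getattr(exc, name)
    print(name, "is a RequestException:", issubclass(cls, exc.RequestException))

# A timeout while connecting is both a ConnectionError and a Timeout
connect_timeout_is_connection_error = issubclass(exc.ConnectTimeout, exc.ConnectionError)
connect_timeout_is_timeout = issubclass(exc.ConnectTimeout, exc.Timeout)
print("ConnectTimeout is a ConnectionError:", connect_timeout_is_connection_error)
print("ConnectTimeout is a Timeout:", connect_timeout_is_timeout)
```

Because RequestException is the common base class, an except clause for it belongs last, after the more specific ones. A connection timeout matches whichever of ConnectionError or Timeout is listed first.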

Common HTTP Error Status Codes

The most common HTTP error status codes are the following:

  • 400 Bad Request: Indicates that the server could not process the request because of a client error such as missing required parameters, incorrect request body formatting, etc.

  • 401 Unauthorized: Indicates that the request lacks valid authentication credentials.

  • 403 Forbidden: Indicates that access to the requested resource is forbidden due to lack of proper permissions. In this case, the server understood the request but refused to fulfill it.

  • 404 Not Found: Indicates that the server could not find the requested resource.

  • 500 Internal Server Error: Indicates that the server is unable to process a request because of an unexpected condition.

  • 502 Bad Gateway: Indicates that a server acting as a gateway or proxy received an invalid response from the upstream server it contacted to fulfill the request.

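The above errors can be detected by calling response.raise_for_status(), which raises a requests.exceptions.HTTPError when the status code is in the 4xx or 5xx range and does nothing otherwise.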
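To illustrate how raise_for_status() treats different status codes, the sketch below applies it to hand-built response objects. This is an assumption made to keep the example offline: check is a hypothetical helper, and real responses would come from requests.get() or requests.post().

```python
import requests


def check(status_code):
    """Return None for a non-error status, or the HTTPError message."""
    # Assumption: an empty, hand-built response with only the status set
    response = requests.Response()
    response.status_code = status_code
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as error:
        return str(error)
    return None


# 2xx and 3xx codes pass silently; 4xx and 5xx codes raise HTTPError
for code in (200, 301, 404, 502):
    print(code, "->", check(code))
```

The error message distinguishes client errors (4xx) from server errors (5xx), which can help decide whether retrying the same request is likely to succeed.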

Using try-except to Handle Errors

In Python, try-except blocks help manage errors by allowing the program to continue running even if something goes wrong.

Code that might cause an error is placed inside the try block, and the except block defines how to handle the error.

import requests
# The URL where we will be sending the POST request
url = "https://jsonplaceholder.typicode.com/posts"
# The data to be sent in the request
data = {
    "title": "My New Post",
    "body": "This is the content of the post.",
    "userId": 1
}
try:
    # Sending the POST request; the timeout (in seconds) lets a Timeout be raised instead of waiting indefinitely
    response = requests.post(url, json=data, timeout=10)
    # Raise an HTTPError if the status code is 4xx or 5xx
    response.raise_for_status()
    # If successful, print the response content
    print("Response Status Code:", response.status_code)
    print("Response Content:", response.json())
except requests.exceptions.ConnectionError:
    print("Error: Failed to connect to the server. Please check your internet connection or URL.")
except requests.exceptions.Timeout:
    print("Error: The request timed out. Please try again later.")
except requests.exceptions.RequestException as e:
    # Catch everything else, including HTTP status errors (e.g., 400, 404) raised by raise_for_status()
    print(f"An error occurred: {e}")

By checking for common issues like connection problems or unexpected status codes, we can provide users with helpful messages and avoid application crashes. Without proper error handling, the application could fail silently or display confusing error messages to users.
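The same error-handling pattern can be wrapped in a reusable function so that callers receive either the data or a readable message. This is a minimal sketch, not a definitive implementation: fetch_json, its (data, error) return convention, and the default timeout of 10 seconds are all illustrative choices.

```python
import requests


def fetch_json(url, timeout=10):
    """Return (data, error): parsed JSON on success, otherwise a message."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
        return response.json(), None
    except requests.exceptions.Timeout:
        return None, "The request timed out."
    except requests.exceptions.ConnectionError:
        return None, "Could not connect to the server."
    except requests.exceptions.HTTPError as error:
        return None, f"The server returned an error: {error}"
    except requests.exceptions.RequestException as error:
        # Any other requests failure, such as a malformed URL
        return None, f"The request failed: {error}"


# A URL without a scheme is rejected before any network traffic is sent
data, error = fetch_json("not-a-valid-url")
print("Data:", data)
print("Error:", error)
```

With a real endpoint, for example fetch_json("https://jsonplaceholder.typicode.com/todos/1"), the first element of the returned pair would hold the parsed JSON and the second would be None.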

With basic error handling in place, we are now ready to explore more advanced topics like handling timeouts, retrying failed requests, or working with authentication in HTTP requests.

Conclusion

In this article, we have learned the basics of making HTTP requests in Python using the requests library, how to send GET and POST requests, how to handle responses, and how to manage errors effectively. These skills are essential for interacting with web APIs and retrieving or sending data over the internet.

Author

Codecademy Team
