Solving Django N+1 problem

In this article, you will find answers to the following questions:

  • django n+1 problem
  • django select_related
  • django prefetch_related
  • select_related vs prefetch_related django difference
  • invalid field name(s) given in select_related
  • invalid parameter to prefetch_related() error

Optimizing Django model relation queries with select_related and prefetch_related

When working with Django models, you may run into inefficient queries. This can have a significant impact on performance, especially when dealing with large datasets.

Django’s select_related and prefetch_related methods are powerful tools to improve the speed of those queries.

Understanding model relationships

Before diving into optimization techniques, let’s revisit the concept of model relationships:

Foreign Key. A one-to-many relationship where an instance in one model (child) references a single instance in another model (parent). A foreign key is being located in the “many” model.

Many-to-Many. A relationship where multiple instances in one model can be associated with multiple instances in another model through an intermediary model.

The N+1 Query Problem

When fetching model instances with related objects without optimization, Django’s ORM might execute separate queries for each related object. This is known as the N+1 query problem. For example, consider a Book model with a ForeignKey to an Author model:

class Author(models.Model):
    name = models.CharField(max_length=50)

class Book(models.Model):
    name = models.CharField(max_length=50)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)

Do not forget to apply migrations:

python3 manage.py makemigrations
python3 manage.py migrate

Alright, without optimization, fetching all Book objects might trigger an additional query for each author field:

  1. Query to fetch all Book instances.
  2. N separate queries (where N is the number of books) to fetch the corresponding author for each book.

Let’s optimize it. Open the Django shell and use it as playground:

python3 manage.py shell

Import the models:

from app1.models import Author, Book

select_related optimization

This one makes Django fetch related objects along with the main query using SQL joins. So, there will be only one SQL query with a join.

Ideal for One-to-One or ForeignKey One-to-Many relationships where you typically need all related data for each instance. It would look like this:

Book.objects.all().select_related("author")

Keep in mind that we can use it only for the models that have ForeignKey defined. When trying to apply it vice versa:

Author.objects.all().select_related("book")

We get the error:

line 279, in get_select
    related_klass_infos = self.get_related_selections(select, select_mask)
line 1367, in get_related_selections
    raise FieldError(
django.core.exceptions.FieldError: Invalid field name(s) given in select_related: 'book'. Choices are: (none)

But when we change the relationship in favor of One-to-One using OneToOneField this way:

class Author(models.Model):
    name = models.CharField(max_length=50)

class Book(models.Model):
    name = models.CharField(max_length=50)
    author = models.OneToOneField(Author, on_delete=models.CASCADE)

We can use select_related in a bilateral fashion! Either way will work:

Book.objects.all().select_related("author")
Author.objects.all().select_related("book")

Pros and cons of select_related:

  • Efficient for small result sets, reduces database round trips.
  • Can become inefficient for large datasets due to bulky queries, not suitable for many-to-many relationships.

prefetch_related optimization

It executes separate queries for each related model but combines the results in Python. It will fire two SQL queries (one main, one for related objects)

Well-suited for Many-to-Many relationships or when you only need a limited set of related objects. So, in this case:

Book.objects.prefetch_related('author')

Django executes a separate query once to retrieve all Author objects associated with the fetched Book instances. This query might leverage optimizations like filtering based on foreign keys.

Then Django cleverly combines the results in Python memory. For each Book object, it links it to the corresponding Author instance based on foreign key relationships.

Again, when using the ForeignKey type of thing, this thing will work only for the model that has ForeignKey field defined (“many” model):

Book.objects.prefetch_related('author')

And won’t fork for the “one” model:

Author.objects.all().prefetch_related("book")

It will throw an error:

line 1273, in _prefetch_related_objects
    prefetch_related_objects(self._result_cache, *self._prefetch_related_lookups)
line 2296, in prefetch_related_objects
    raise AttributeError(
AttributeError: Cannot find 'book' on Author object, 'book' is an invalid parameter to prefetch_related()

Choosing between select_related and prefetch_related

The optimal choice depends on your specific use case:

Use select_related when:

  • You need all related data for each main object.
  • Dealing with small result sets.
  • The related model has a limited number of fields.

Use prefetch_related when:

  • You have a many-to-many relationship.
  • Dealing with large datasets.
  • You only need a limited set of fields from the related model.

Cheat sheet

Suppose Book and Author are One-to-Many. ForeignKey (“many” side) is Book. In this case, these operations will work:

Book.objects.all().select_related("author")
Book.objects.all().prefetch_related("author")

And these will fail:

Author.objects.all().select_related("book")
Author.objects.all().prefetch_related("book")

Suppose Book and Author are One-to-One. In this case, these operations will work for each model:

Book.objects.all().select_related("author")
Author.objects.all().select_related("book")
Book.objects.all().prefetch_related("author")
Author.objects.all().prefetch_related("book")

Consider Many-to-Many relationships as well.

Leave a Reply

Your email address will not be published. Required fields are marked *