
This post explores practical strategies for addressing database inefficiencies in Django Admin, focusing on optimization techniques that directly target query performance without relying on caching as the primary solution.
Introduction
Django is one of the most popular web frameworks out there, enabling rapid development of secure and maintainable websites. It provides a robust set of tools to build web applications quickly, including an auto-generated admin interface for managing application data. While Django simplifies many aspects of web development, from database abstraction to templating, optimizing its performance—particularly Django Admin pages—can be a complex task and often presents unique challenges.
While caching is commonly suggested to improve performance, it is not always the most effective approach. In our case, the primary performance bottlenecks stemmed from inefficient database queries rather than issues with view generation or URL processing. Caching can be helpful, but applying custom rules for query caching introduces unnecessary complexity and risks masking deeper problems.
How We Use Django
At Blip, we are the maintainers of Surface Security, a security intelligence and automation platform we developed in-house for our security needs, written in Python using Django. Since its inception, we wanted to spend our limited resources on building ways to retrieve data from every platform we could get access to. To get there faster, we made a design decision to use the Django Admin as the main interface of the application, since it provided all the features we needed to display, maintain, and work with the data we threw at the platform.
Over time, the application grew in data complexity and size, and we started experiencing performance bottlenecks: some pages took too long to load, or even failed to load due to timeouts. In addition, our application performance monitoring tools were reporting excessive N+1 queries in many of the models that were experiencing issues.
The Django Admin interface is a powerful feature for developers, allowing easy management of data within web applications. However, as applications scale and handle larger datasets, the performance of the Django Admin can degrade. Optimizing this interface is crucial not just for developers but for end users who rely on quick access to data and administrative functionalities.
Django provides a thoughtful and thorough page on optimizing Django applications. The keyword to any attempt to improve something in the software industry is stated at the beginning of that page: What are we optimizing for?
As with any web application, performance bottlenecks may appear for many reasons: from database issues all the way to HTTP-layer issues (templating, connectivity problems in our infrastructure, inefficient logic in our components, and so on). The first step is to understand where the main problem lies and start working through the issues.
As mentioned in the introduction of the section, we were confident the bulk of our performance issues lay in the database layer: heavy queryset reads and unoptimized queries. We later learned how some of these inefficiencies emerged from the Django Admin templates, which we will also mention in this post.
Broad Applicability of Optimization Strategies
While our test subject is a Django application that leans heavily on the Admin interface (not a common setup), the strategies and concepts discussed here are not limited to the Admin – the same principles apply to regular Django applications, and to applications outside the Django realm entirely.
Whether you’re building a user-facing web app, an API, or another admin-like interface, improving database efficiency and reducing these bottlenecks can lead to significant performance gains across the entire application, saving potential infrastructure costs (vertical or horizontal scaling) and improving the user experience of your customers.
The Solution: Focusing on Database Efficiency
To address the performance bottlenecks in the Django Admin interface, the focus was placed primarily on optimizing database queries. While caching can be a useful tool in some scenarios, the key insight was that database inefficiencies were the main issue; applying caching would likely have masked the underlying problems. Moreover, for a data-I/O-intensive application, caching queries specifically would be a considerable challenge on its own without introducing issues for end users. Instead, the optimization process revolved around reducing unnecessary queries, improving query efficiency, and refining how data is loaded and displayed in the admin interface.
The following summarizes the main suggestions and lessons learned during this process, all of which can be applied to any Django application facing similar challenges.
Tip 1: Leverage django-debug-toolbar for Efficient Performance Analysis
Regardless of whether you have access to more advanced Application Performance Monitoring (APM) tools, the django-debug-toolbar is an invaluable resource for performance optimization work. It enables detailed analysis of SQL queries, pinpoints bottlenecks, shows where the slowness is happening, and goes deep enough for users to understand which function call is actually slow. It is not covered in this post, but it can do wonders when working with templates too. The django-debug-toolbar provides ample information to help you begin optimizing your application with minimal friction, and it is straightforward to set up and use. However, a crucial note: ensure that the toolbar is not enabled in production environments. Exposing this tool in a live environment can lead to significant security problems, especially if your application handles sensitive data or is publicly accessible.
During the optimization of several views within the Django Admin, the SQL tab will be our primary focus, as it provides the most relevant information for identifying and resolving slow queries. Overall, the django-debug-toolbar is an essential tool for anyone looking to improve the performance of their Django applications.
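For reference, a minimal development-only setup looks roughly like the following sketch (assuming the package was installed with pip and that your project uses the conventional settings.py/urls.py layout):

```python
# settings.py (development only — never ship this enabled to production)
INSTALLED_APPS = [
    # ...
    "debug_toolbar",
]

MIDDLEWARE = [
    # The toolbar middleware should come as early as possible
    "debug_toolbar.middleware.DebugToolbarMiddleware",
    # ... the rest of your middleware
]

# The toolbar only renders for requests coming from these addresses
INTERNAL_IPS = ["127.0.0.1"]

# urls.py
from django.urls import include, path

urlpatterns = [
    # ... your routes
    path("__debug__/", include("debug_toolbar.urls")),
]
```

Gating the whole block behind `if DEBUG:` is a simple way to make sure none of it leaks into production settings.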
Tip 2: select_related for all Foreign Keys
A common trait of data-driven applications is to have data relationships represented by foreign keys. This is a basic primitive concept where there is a connection between table A and table B, typically in the form of an additional column with the ID of the object representing this relationship.
In Django, this is defined using the ForeignKey model field type. In the example below, we establish a relationship between an Author and a Book. A book has (at least) one author.
```python
class Book(models.Model):
    name = models.CharField(...)
    author = models.ForeignKey("app.Author", on_delete=models.CASCADE)


class Author(models.Model):
    name = models.TextField(...)
```
And to represent this model in our Django Admin view, we typically write this admin model:
```python
# In your admin.py
@admin.register(models.Book)
class BookAdmin(admin.ModelAdmin):
    list_display = ("name", "author__name")
    readonly_fields = ("name", "author__name")
    search_fields = ("name", "author__name")
```
However, with a setup like the above, where Django displays a list of books and each book's author name, it will execute one additional query per book to fetch the associated author. As the number of books grows (reaching hundreds of thousands, for example), this results in significant performance degradation: the page load time can increase to several seconds due to this N+1 query pattern, ultimately putting a considerable amount of strain on the entire application.
To mitigate this, there are two effective solutions. One is a straightforward approach, while the other offers more granular control:
- list_select_related = True in the Admin model definition: this tells Django to automatically apply select_related() to the changelist queryset, following all of the model's foreign keys and reducing the number of queries needed.
- Explicitly specify which foreign keys to preload (e.g., list_select_related = ("author",)): this lets you selectively optimize only the relationships you need, rather than following every foreign key. It also avoids a small overhead of the first option: with True, Django has to inspect the model's metadata to discover the foreign key fields, whereas the explicit tuple names them directly.
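Applied to the BookAdmin above, the explicit form is a one-line addition:

```python
@admin.register(models.Book)
class BookAdmin(admin.ModelAdmin):
    list_display = ("name", "author__name")
    readonly_fields = ("name", "author__name")
    search_fields = ("name", "author__name")
    # Join the author table in the changelist query instead of
    # issuing one extra query per row:
    list_select_related = ("author",)
```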
Both methods result in an optimized query structure using an inner join, replacing the one-query-per-row pattern with a single query for the whole page. In many cases, this simple adjustment alone significantly improved performance, effectively eliminating the N+1 query problem.
Relevant references:
- https://docs.djangoproject.com/en/5.0/ref/contrib/admin/#django.contrib.admin.ModelAdmin.list_select_related
- https://docs.djangoproject.com/en/5.0/ref/models/querysets/#select-related
Tip 3: prefetch_related all those ManyToMany
Many-to-many relationships are commonly used as well to represent complex associations between data entities. These relationships provide greater flexibility in modeling real-world connections within applications. To illustrate this, let's revisit the example from the previous section and refine it further for improved accuracy. A book can have multiple authors, and a better way to represent this relationship would be to change the Book model to have a ManyToMany relationship with the Author.
```python
class Book(models.Model):
    name = models.CharField(...)
    author = models.ManyToManyField("app.Author")


class Author(models.Model):
    name = models.TextField(...)
```
This change represents the real world more accurately, and it is a far better and more scalable solution than, for instance, keeping the ForeignKey relationship and stuffing all author names into the name field, which would cause problems when searching for books by a given author.
This approach also opens opportunities for more sophisticated data visualizations, such as tracking the number of books an author has contributed to, regardless of whether they were the sole author or part of a collaboration.
The Django Admin definition can remain the same as the previous one:
```python
# In your admin.py
@admin.register(models.Book)
class BookAdmin(admin.ModelAdmin):
    list_display = ("name", "author__name")
    readonly_fields = ("name", "author__name")
    search_fields = ("name", "author__name")
```
In Django, ManyToManyField creates a secondary (through) table that simply holds the ID of object A and the ID of object B.
Failing to optimize calls against these tables multiplies the extra queries issued for each book: one against the intermediate table that stores the relationship IDs, and another against the destination table (e.g., the authors table) to fetch the associated data. This inefficiency can quickly become a performance bottleneck as data and traffic increase.
In this case, list_select_related will not work, as it is designed for ForeignKey and OneToOne relationships. Instead, prefetch_related should be used to prefetch related data from the defined models and perform the joining operation in Python, rather than relying on the database.
Unlike with select_related, there is no ModelAdmin attribute to leverage for this optimization. Instead, we need a more customized approach: overriding the ModelAdmin's get_queryset method. Since this method returns a queryset, we can chain the necessary prefetch_related call to ensure the data is efficiently retrieved.
```python
# In your admin.py
@admin.register(models.Book)
class BookAdmin(admin.ModelAdmin):
    list_display = ("name", "author__name")
    readonly_fields = ("name", "author__name")
    search_fields = ("name", "author__name")

    def get_queryset(self, request):
        return super().get_queryset(request).prefetch_related("author")
```
The code invokes `super().get_queryset(request)` first, which runs the framework's normal code and returns a queryset. Remember that querysets are lazily evaluated – at this point in the code, no SQL has been executed. We then chain `prefetch_related` onto the queryset and return the modified queryset.
With this simple adjustment, the Django Admin query execution becomes efficient again: a single additional query fetches the related authors for the whole page, and the joining is handled in Python, ensuring minimal strain on the database.
Relevant references:
- https://docs.djangoproject.com/en/5.0/ref/models/querysets/#prefetch-related
- https://docs.djangoproject.com/en/4.2/ref/contrib/admin/#django.contrib.admin.ModelAdmin.get_queryset
Tip 4: Use annotations instead of model attributes
In many scenarios, we need to perform some computations on top of our data, like keeping track of how many books an author has published. Several approaches can be taken to implement this functionality:
- Maintain a field in the model that updates the count each time the model is saved, incrementing the counter when a new book is added.
- Create a method on the model to calculate the count when needed and call this method in the admin view.
- Leverage annotations to perform the count at the database level, allowing the database to handle the computation efficiently.
For example, consider the following model with a book count field:
```python
class Author(models.Model):
    name = models.TextField(...)
    book_count = models.IntegerField()

    def save(self, *args, **kwargs):
        # Pretty tricky logic goes here to increment book_count,
        # ensuring no duplicate books are counted and only increasing
        # the counter when a book is actually added.
        return super().save(*args, **kwargs)
```
Option 1 involves manually updating the book_count field, which quickly becomes overly complex and difficult to maintain; it is generally not recommended. Furthermore, it adds logic to a method that should not be used for this purpose. What about when objects are modified via the queryset update() method, which bypasses save() entirely? You will lose the counter updates.
Option 2 involves adding a callable in the Admin class field list_display, to calculate the count when each object is loaded:
```python
@admin.register(models.Author)
class AuthorAdmin(admin.ModelAdmin):
    list_display = ("name", "book_count")
    readonly_fields = ("name", "book_count")
    search_fields = ("name",)

    def book_count(self, obj):
        return models.Book.objects.filter(author=obj).count()

    book_count.short_description = "Published books"
```
While this approach may seem straightforward, it introduces a performance concern. For each author in the list, another query is fired individually to count the books, putting significant stress on the database to load the changelist view.
Option 3 is the correct approach, using Django's ORM capabilities more efficiently to perform the count directly within the database query, minimizing the number of queries and reducing the load on the application:
```python
from django.db.models import Count


@admin.register(models.Author)
class AuthorAdmin(admin.ModelAdmin):
    list_display = ("name", "book_count")
    readonly_fields = ("name", "book_count")
    search_fields = ("name",)

    def get_queryset(self, request):
        # "book" is the default reverse name for the Book.author
        # ManyToManyField; adjust if you set a related_name.
        return super().get_queryset(request).annotate(num_books=Count("book"))

    def book_count(self, obj):
        return obj.num_books

    book_count.short_description = "Published books"
```
Here, get_queryset is overridden again to annotate each author with a num_books value that counts their books. This is a far more efficient approach, as the database handles the counting inside the single changelist query, eliminating the per-author count queries entirely.
This solution not only improves performance, but also simplifies the code. Django intelligently handles the necessary ID filtering when querying related models, which improves maintainability and readability. In the end, this method not only boosts performance, but also makes the codebase cleaner and easier to manage.
Relevant references:
- https://docs.djangoproject.com/en/5.0/ref/contrib/admin/#django.contrib.admin.ModelAdmin.list_display
- https://docs.djangoproject.com/en/5.0/ref/models/querysets/#count
Tip 5: Proper use of indexes
While the tips in this post are presented in no particular order, understanding and optimizing your models' relationships and querying patterns is crucial before diving into indexing.
One of the final optimization techniques to consider is the correct use of indexes. Previously, adding db_index=True to a model field was the standard approach, but Django has since moved toward a more explicit and detailed declaration of indexes through Meta options.
By using the db_index attribute or the indexes Meta option, you can create an index on specific fields, significantly improving query speed. Indexes should only be added to frequently queried fields, particularly those used in filter conditions or join operations. However, it's important to note that ForeignKey fields, ManyToMany through tables, and fields covered by unique constraints do not require an explicit index definition, as Django creates indexes for these automatically.
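As an illustration using the running example (indexing the name field here is for demonstration, not a recommendation), the Meta-based declaration looks like this:

```python
from django.db import models


class Author(models.Model):
    name = models.TextField()

    class Meta:
        indexes = [
            # Speeds up frequent filters such as
            # Author.objects.filter(name="..."):
            models.Index(fields=["name"]),
        ]
```

Remember that each new index costs write performance and storage, which is another reason to index only the fields your queries actually filter or join on.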
Relevant references:
- https://stackoverflow.com/questions/59596176/when-we-should-use-db-index-true-in-django
- https://docs.djangoproject.com/en/5.0/ref/models/options/#django.db.models.Options.indexes
- https://docs.djangoproject.com/en/5.0/ref/models/fields/#db-index
Tip 6: Query results caching
While this post has focused primarily on the SQL side of optimization, caching is sometimes the right solution. Repeated queries are a good candidate for a specific kind of caching possible with Django's out-of-the-box tools: caching query results. This can significantly improve performance and is particularly useful for code that repeatedly executes the same query, even with variable parameters, which is a common pattern in many Django applications.
To solve this problem, we can use Django's built-in caching framework, or a simpler solution using Python's built-in functools.cache decorator. They work differently and serve different purposes, however, so it's worth going over each of them:
Using functools
This approach requires no additional packages or setup. functools.cache has been part of Python's standard library since Python 3.9, so any recent Python/Django project can benefit from this implementation.
This option is best suited for optimizing a long-running I/O-bound process in which a given query happens repeatedly. Let's assume there is a piece of logic in your application that performs a query inside a loop:
```python
def my_method(self):
    for entry in MyModel.objects.all():
        another_entry = AnotherModel.objects.filter(entry=entry).first()
        ...  # do something with another_entry
```
The query within the for loop can be cached. Caching it site-wide may not be appropriate, because the results may have changed by the next run; but if we cache the results only while the process is running, we can decrease the processing time considerably.
Here's how to implement it:
```python
import functools


@functools.cache
def getAnotherModel(entry):
    return AnotherModel.objects.filter(entry=entry).first()


def handle(self):
    for entry in MyModel.objects.all():
        another_entry = getAnotherModel(entry)
        ...  # do something with another_entry
```
With this modification, for the duration of the custom command, the query results will be cached in memory. This means that subsequent calls to getAnotherModel(entry) will retrieve the cached result instead of performing the same query again, significantly reducing database load.
While this example may seem trivial, many housekeeping commands and other processes involve repetitive queries with identical parameters. By caching the results, we unload the database servers and improve overall application performance.
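Django aside, the effect is easy to demonstrate in plain Python. In this sketch, fake_query is a hypothetical stand-in for the database call, and a counter tracks how many times it actually runs:

```python
import functools

call_count = 0


def fake_query(entry):
    # Stand-in for AnotherModel.objects.filter(entry=entry).first()
    global call_count
    call_count += 1
    return f"result-for-{entry}"


@functools.cache
def get_cached(entry):
    return fake_query(entry)


# 30,000 loop iterations, but only 3 distinct parameters:
for entry in ["a", "b", "c"] * 10_000:
    get_cached(entry)

print(call_count)  # 3 — every repeated call was served from the cache
```

The cache key is the function's arguments, so this only pays off when the same parameters recur; with 30,000 unique parameters, it would just consume memory for no gain.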
Relevant references:
- https://docs.djangoproject.com/en/5.0/ref/django-admin/#running-management-commands-from-your-code
- https://docs.python.org/3/library/functools.html#functools.cache
Using Django Cache Framework
This option suits you if you want to cache query results across the whole project, typically when the same query repeats in several places. The remainder of this section purposely skips setting up the caching backend, as that is a separate discussion in itself.
The Django caching framework supports many primitives, but the one we are looking for is the low-level cache API, which we can use to cache a queryset's results.
```python
from django.core.cache import cache

from .models import Book


def get_expensive_query():
    # Try to get the data from the cache
    books = cache.get("expensive_query_key")
    if books is None:
        # If not cached, perform the query and store the results
        books = list(Book.objects.filter(author__name="Zé Carlos"))
        cache.set("expensive_query_key", books, 300)  # Cache for 5 minutes
    return books
```
The main advantage of this approach is that this would be available everywhere in the project. Your view could invoke cache.get("expensive_query_key") and get the cached results, as opposed to the previous solution which would only be available at runtime and for the lifetime of the execution.
This approach also comes with its challenges:
- Caching strategy is required, as we do not want to cache everything, and we need to compromise on expiration times.
- We need ways to invalidate caches when the underlying data changes.
- Caching may not solve your problems, so it's worth first verifying that it is actually helping (go back to tip 1, with the django-debug-toolbar, as there is a tab specifically for caching metrics).
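One common way to address the invalidation point above is to clear the key whenever the underlying data changes, for example with model signals. A sketch, reusing the expensive_query_key from the earlier example:

```python
from django.core.cache import cache
from django.db.models.signals import post_delete, post_save
from django.dispatch import receiver

from .models import Book


@receiver(post_save, sender=Book)
@receiver(post_delete, sender=Book)
def invalidate_book_cache(sender, **kwargs):
    # Drop the cached results; the next caller repopulates them.
    cache.delete("expensive_query_key")
```

Note that, as with the counter example earlier, queryset update() calls bypass these signals, so bulk updates still need explicit invalidation.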
Inlines: A Performance Bottleneck to Consider
Inlines have been a notable challenge when optimizing the Django Admin interface. While they offer significant convenience, the performance trade-offs often outweigh their benefits, particularly in complex, data-heavy applications. Searching for solutions to inline performance issues reveals numerous discussions, many offering quick fixes that adjust querysets or apply other hacks. In our case, the added complexity of such solutions didn't justify the performance improvements, but depending on your specific application, they might still prove valuable.
This leads to the broader consideration of custom SQL queries. When standard optimizations no longer suffice, or when you’re working with particularly large datasets, custom SQL can offer a more effective solution. In our case, working with a large-scale database, custom SQL queries have been a game changer, significantly improving performance. However, this comes with its own complexities and requires careful planning.
While custom SQL is outside the scope of this article, it’s important to note that inlines—if used excessively—can become a performance bottleneck. If your application relies heavily on inlines, consider revisiting their usage and exploring alternatives that may better suit your performance needs.
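For completeness, the quick fix most of the threads below converge on is overriding the inline's queryset. A sketch, assuming a hypothetical BookInline whose rows display a publisher foreign key (publisher is illustrative, not part of the earlier models):

```python
class BookInline(admin.TabularInline):
    model = Book

    def get_queryset(self, request):
        # Join the related table once instead of querying per inline row.
        return super().get_queryset(request).select_related("publisher")
```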
Relevant References:
- https://stackoverflow.com/questions/29647418/django-admin-inline-select-related
- https://stackoverflow.com/questions/49411204/django-excessive-queries-with-inlines
- https://stackoverflow.com/questions/16305908/slow-performance-for-django-admin-inline
- https://stackoverflow.com/questions/559701/django-queries-made-repeat-inefficient?rq=4
Conclusion
In this article, we've discussed various optimization strategies that can help improve the performance of Django applications, particularly when working with large datasets or complex models. From reducing query counts using select_related and prefetch_related to leveraging django-debug-toolbar for detailed query analysis, these tips provide a practical foundation for tackling common performance issues.
However, optimization is an ongoing process. The techniques presented here, such as proper indexing and query caching, can make a significant difference, but they need to be tailored to the specific needs of your application. Whether you’re dealing with the performance trade-offs of inlines, implementing custom SQL queries, or refining other parts of your system, the key is continuous evaluation and adaptation to ensure your application scales effectively without compromising performance.