About MongoDB Aggregations

Table of Contents

Introduction
What is an aggregation in MongoDB?
Why should you use it?
How MongoDB does that? Aggregation pipelines
Expression operators
Example
Conclusion

Introduction

Most of the things I did with MongoDB so far involved using mongoose to perform basic CRUD operations.

I had already heard of MongoDB aggregations but never stopped to learn more about it. This changed recently when I was fixing an issue that involved a MongoDB aggregation pipeline, which kind of freaked me out at first because I had no idea what it was and how it worked.

Turns out they’re pretty straightforward and really, really, useful.

TLDR

“Aggregation operations process multiple documents from different collections and return computed results” - MongoDB docs

Aggregations are pretty useful with aggregation pipelines because it combines multiple stages and can be used with operators

What is an aggregation in MongoDB?

As MongoDB docs put it:

Aggregation operations process multiple documents and return computed results

So whenever you think about certain operations like:

getting the total sum of products that a certain user has
counting the valid users (e.g. Users with removedAt: null and emailVerified: true) of your system
filtering data that involves multiple documents
grouping documents by a certain value

You can use - and probably should be using - aggregations to do that.

Why should you use it?

As I’ve said, I had some experience with Mongoose for basic CRUD operations before, so my usual approach to the operations above would be something like:

await ModelOne.find() for a certain (or multiple) document(s)
use another await ModelTwo.find() for querying/filtering a different set of documents that requires information from ModelOne
…repeat the steps above (super inefficient)
then on again on JavaScript side:
- Array.length to count the fields
- group them with Object.groupBy
- create some more complex function or use an external lib for operations

Nothing of it is inherently bad, but it’s far from optimal, especially for the amount of await’s (even using something like Promise.all()).

With aggregations instead, you can move all that logic to the database layer, this means that you’ll leave the database engine to do the joining and processing (no reason to have multiple await’s). That enables you to avoid performance bottlenecks and poor concurrency.

How MongoDB does that? Aggregation pipelines

MongoDB aggregations runs in pipelines. Each step of the pipeline is called a stage, and there are a lot of stages.

In each stage you can process, transform or filter documents in a collection. And each stage is the output for the next stage. In that way, you can have multiple stages, passing them forward in the pipeline, combine data from different documents, join collections and, in general, have MongoDB executing a much more complex and robust logic on its side (instead of your application’s). That is, you have a wide range of possibilities to deal with any document in your database, in whichever way you prefer, without worrying about making your server’s code too messy.

For more optimized queries (and a more in-depth content) see aggregation pipeline optimization, especially about indexes

Stages that I used the most and were really helpful:

$lookup to performs an outer join in the collection specified
$match to perform filters
$project to specify the inclusion or exclusion of given fields; for example:
- setting _id: 0 to exclude the _id field
$unwind for deconstruct an array field
- really useful when you expect your filter to find a single document to be used on a next stage

Expression operators

Expressions operators are some useful built-in functions. They can be used for:

string operations
- regex
- date from string
- concatenate strings
- replace text
- remove whitespaces
- split a string into substrings
arithmetic expressions
- subtract
- multiply
- power
- sum
- absolute
- value
arrays
- reduce the values
- map the values
- get a subset of an array
- sort
boolean expressions (and, not, or)
dates
- difference between two dates
- get day of the month, year, week
- get hour
much more stuff

Example

As a simple example, let’s say you have two collections: User and Product. You want to fetch all valid products (not removed) given the user that’s logged in bought:

db.users.aggregate([
  {
    $lookup: {
      from: "Product",
      let: { userId: "$_id" },
      $match: {
        removedAt: null,
      }
      pipeline: [
        {
          $match: {
            $expr: {
              $eq: ["$userId", "$$userId"]
            }
          }
        }
      ],
      as: "products"
    }
  }
])

Now we can access the products via the products array:

{
  "_id": "64a...",
  "name": "Gustavo",
  "products": [
    { "_id": "99a...", "name": "Product 1", "userId": "64a..." },
    ...
  ]
}

Conclusion

I hope this post helped you to at least grasp the idea of why and how MongoDB aggregations can be useful and more performative for your projects.

I’m open to answer any doubt or just talk more about it (or any other cool stuff you might want to chat about). Just contact me in any of my links.