Table of Contents
Introduction
Most of the things I did with MongoDB so far involved using mongoose to perform basic CRUD operations.
I had already heard of MongoDB aggregations but never stopped to learn more about it. This changed recently when I was fixing an issue that involved a MongoDB aggregation pipeline, which kind of freaked me out at first because I had no idea what it was and how it worked.
Turns out they’re pretty straightforward and really, really, useful.
TLDR
- “Aggregation operations process multiple documents from different collections and return computed results” - MongoDB docs
- Aggregations are pretty useful with aggregation pipelines because it combines multiple stages and can be used with operators
What is an aggregation in MongoDB?
As MongoDB docs put it:
Aggregation operations process multiple documents and return computed results
So whenever you think about certain operations like:
- getting the total sum of products that a certain user has
- counting the valid users (e.g.
Users
withremovedAt: null
andemailVerified: true
) of your system - filtering data that involves multiple documents
- grouping documents by a certain value
You can use - and probably should be using - aggregations to do that.
Why should you use it?
As I’ve said, I had some experience with Mongoose for basic CRUD operations before, so my usual approach to the operations above would be something like:
await ModelOne.find()
for a certain (or multiple) document(s)- use another
await ModelTwo.find()
for querying/filtering a different set of documents that requires information fromModelOne
- …repeat the steps above (super inefficient)
- then on again on JavaScript side:
Array.length
to count the fields- group them with
Object.groupBy
- create some more complex function or use an external lib for operations
Nothing of it is inherently bad, but it’s far from optimal, especially for the amount of await
’s (even using something like Promise.all()
).
With aggregations instead, you can move all that logic to the database layer, this means that you’ll leave the database engine to do the joining and processing (no reason to have multiple await
’s). That enables you to avoid performance bottlenecks and poor concurrency.
How MongoDB does that? Aggregation pipelines
MongoDB aggregations runs in pipelines. Each step of the pipeline is called a stage, and there are a lot of stages.
In each stage you can process, transform or filter documents in a collection. And each stage is the output for the next stage. In that way, you can have multiple stages, passing them forward in the pipeline, combine data from different documents, join collections and, in general, have MongoDB executing a much more complex and robust logic on its side (instead of your application’s). That is, you have a wide range of possibilities to deal with any document in your database, in whichever way you prefer, without worrying about making your server’s code too messy.
For more optimized queries (and a more in-depth content) see aggregation pipeline optimization, especially about indexes
Stages that I used the most and were really helpful:
$lookup
to performs an outer join in the collection specified$match
to perform filters$project
to specify the inclusion or exclusion of given fields; for example:- setting
_id: 0
to exclude the_id
field
- setting
$unwind
for deconstruct an array field- really useful when you expect your filter to find a single document to be used on a next stage
Expression operators
Expressions operators are some useful built-in functions. They can be used for:
- string operations
- regex
- date from string
- concatenate strings
- replace text
- remove whitespaces
- split a string into substrings
- arithmetic expressions
- subtract
- multiply
- power
- sum
- absolute
- value
- arrays
- reduce the values
- map the values
- get a subset of an array
- sort
- boolean expressions (and, not, or)
- dates
- difference between two dates
- get day of the month, year, week
- get hour
- much more stuff
Example
As a simple example, let’s say you have two collections: User
and Product
. You want to fetch all valid products (not removed) given the user that’s logged in bought:
db.users.aggregate([
{
$lookup: {
from: "Product",
let: { userId: "$_id" },
$match: {
removedAt: null,
}
pipeline: [
{
$match: {
$expr: {
$eq: ["$userId", "$$userId"]
}
}
}
],
as: "products"
}
}
])
Now we can access the products via the products
array:
{
"_id": "64a...",
"name": "Gustavo",
"products": [
{ "_id": "99a...", "name": "Product 1", "userId": "64a..." },
...
]
}
Conclusion
I hope this post helped you to at least grasp the idea of why and how MongoDB aggregations can be useful and more performative for your projects.
I’m open to answer any doubt or just talk more about it (or any other cool stuff you might want to chat about). Just contact me in any of my links.