MongoDB - Map-Reduce

In MongoDB, map-reduce is a data processing programming model that helps to perform operations on large data sets and produce aggregated results.
MongoDB provides the mapReduce() function to perform map-reduce operations. It takes two main functions as arguments: a map function and a reduce function.
The map function emits key-value pairs from each document, and MongoDB groups the emitted values by key. The reduce function then performs the aggregation on each group of mapped values. The data is mapped and reduced independently, the partial results are combined, and the final result is saved to the specified output collection.
The mapReduce() function is typically used on large data sets. With map-reduce you can perform aggregation operations such as max or avg on the data grouped by some key, which is similar to GROUP BY in SQL, and the map and reduce steps can run independently and in parallel. Note that starting in MongoDB 5.0, map-reduce is deprecated in favor of the aggregation pipeline, which is why the later examples also show an aggregation alternative.
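To make the map → group → reduce flow concrete, here is a minimal in-memory sketch in plain JavaScript that runs with Node.js, no MongoDB needed. The helper name simulateMapReduce is hypothetical and is not part of the MongoDB API. One assumption differs from MongoDB: there, emit() is available inside the map function automatically, while this sketch passes emit to the map function as an argument. The sketch does mimic MongoDB's rule of not calling reduce for a key that has only one value.

```javascript
// Hypothetical helper that imitates mapReduce() in memory; it is NOT a MongoDB API.
function simulateMapReduce(docs, mapFn, reduceFn) {
  // Intermediate groups: JSON form of the key -> { key, values: [...] }
  const groups = new Map();

  // emit(key, value) appends one value to the group for that key
  const emit = (key, value) => {
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { key, values: [] });
    groups.get(id).values.push(value);
  };

  // Map phase: run the map function once per document, with `this` bound to it
  for (const doc of docs) mapFn.call(doc, emit);

  // Reduce phase: like MongoDB, skip reduce when a key has a single value
  const results = [];
  for (const { key, values } of groups.values()) {
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

// Usage with the section/marks records from Requirement 1 below
const marksDocs = [
  { id: 1, sec: "A", marks: 80 },
  { id: 2, sec: "A", marks: 90 },
  { id: 1, sec: "B", marks: 99 },
  { id: 1, sec: "B", marks: 95 },
  { id: 1, sec: "C", marks: 90 },
];

const maxPerSection = simulateMapReduce(
  marksDocs,
  function (emit) { emit(this.sec, this.marks); },        // group by sec, value = marks
  function (key, values) { return Math.max(...values); }  // keep the highest mark
);

// maxPerSection holds { _id: "A", value: 90 }, { _id: "B", value: 99 }, { _id: "C", value: 90 }
console.log(maxPerSection);
```

The map function is written with the `function` keyword rather than an arrow function so that `this` can be bound to each document, which mirrors how MongoDB map functions refer to the current document.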
Let’s try to understand mapReduce() using the following example:
→Requirement 1)
In this example, we have five records from which we need to extract the maximum marks of each section; each record has the fields id, sec and marks.

db.user_collection.insertMany([
{"id":1, "sec":"A", "marks":80},
{"id":2, "sec":"A", "marks":90},
{"id":1, "sec":"B", "marks":99},
{"id":1, "sec":"B", "marks":95},
{"id":1, "sec":"C", "marks":90}])
Here we need to find the maximum marks in each section, so the key by which we group documents is the sec field and the value is marks. Inside the map function we call emit() once per document, passing the section as the key and the marks as the value. This is similar to GROUP BY in SQL. MongoDB requires the reduce function to return a value of the same type as the emitted values (reduce may be called again on its own output, and it is not called at all for a key with a single value), so the map function wraps the marks in the same { maximum_marks: ... } shape that reduce returns:
var map = function() { emit({ section: this.sec }, { maximum_marks: this.marks }); };
After iterating over each document, the emitted values are grouped by key like this:
{ section: "A" } → [ { maximum_marks: 80 }, { maximum_marks: 90 } ]
{ section: "B" } → [ { maximum_marks: 99 }, { maximum_marks: 95 } ]
{ section: "C" } → [ { maximum_marks: 90 } ]
Up to this point, that is what the map() function does. The grouped data is then the input to our reduce function, which is where the actual aggregation takes place. In our example we pick the maximum of each section: A: [80, 90] → 90, B: [99, 95] → 99, C: [90] → 90.
var reduce = function(key, values) {
  var marks = values.map(function(v) { return v.maximum_marks; });
  return { maximum_marks: Math.max.apply(null, marks) };
};
Once reduce() has collapsed the records, we output them into a new collection with the option { out: "collectionName" }.
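MongoDB's documentation requires the reduce function to return the same type as the emitted values, because reduce may be called more than once for the same key, receiving its own earlier output mixed with new values. It is also not called at all for a key with a single value (section C here), so the emitted value itself lands in the output. The sketch below checks this re-reduce property in plain Node.js. The function names are hypothetical, and the split into a partial batch is an assumption chosen to imitate what MongoDB is allowed to do, not a trace of what it actually does.

```javascript
// Plain-JavaScript check (no MongoDB) that a reducer is safe to re-reduce.
// Emitted values and reduced values share one shape: { maximum_marks: <number> }.
function reduceMaxMarks(key, values) {
  const marks = values.map(function (v) { return v.maximum_marks; });
  return { maximum_marks: Math.max.apply(null, marks) };
}

// Values emitted for section B, one per document
const emittedB = [{ maximum_marks: 99 }, { maximum_marks: 95 }];

// One-shot reduce over all values
const oneShot = reduceMaxMarks({ section: "B" }, emittedB);

// Re-reduce: reduce a partial batch first, then feed that result back in with the rest
const partial = reduceMaxMarks({ section: "B" }, emittedB.slice(0, 1));
const reReduced = reduceMaxMarks({ section: "B" }, [partial, ...emittedB.slice(1)]);

console.log(oneShot, reReduced); // both are { maximum_marks: 99 }

// Counter-example: a reducer that takes plain numbers but returns an object breaks
// as soon as its own output is fed back in, because Math.max(object) is NaN.
function reduceNumbersToObject(key, values) {
  return { maximum_marks: Math.max.apply(null, values) };
}
const firstPass = reduceNumbersToObject("B", [99]);
const broken = reduceNumbersToObject("B", [firstPass, 95]);
console.log(broken); // { maximum_marks: NaN }
```

The counter-example shows why emitting bare numbers while returning an object is fragile: the result depends on how MongoDB happens to batch the values.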
Syntax:
db.collectionName.mapReduce(
  <map function>,
  <reduce function>,
  {
    query: <filter document>,
    out: <output collection>
  }
);
Here,
- map function: calls emit(key, value) for each document. The key is what we group on, like GROUP BY in SQL (for example, grouping by age or name), and the value is what the aggregation, such as an average or a sum, is computed on.
- reduce function: receives a key and the array of values emitted for it, and performs the aggregation, such as computing an average or a sum. It must return a value of the same type as the emitted values.
- query: an optional filter document; only matching documents are passed to the map function.
- out: the name of the collection where the result will be stored.
db.user_collection.mapReduce(map, reduce, { out: "output" });
The map and reduce functions were already defined above. To check the result, look into the newly created collection with db.output.find(), which returns:
{ "_id" : { "section" : "C" }, "value" : { "maximum_marks" : 90 } }
{ "_id" : { "section" : "A" }, "value" : { "maximum_marks" : 90 } }
{ "_id" : { "section" : "B" }, "value" : { "maximum_marks" : 99 } }

→Requirement 2)
Create a sample collection orders with these documents:
db.orders.insertMany([
{ _id: 1, cust_id: "Ant O. Knee", ord_date: new Date("2020-03-01"), price: 25, items: [ { sku: "oranges", qty: 5, price: 2.5 }, { sku: "apples", qty: 5, price: 2.5 } ], status: "A" },
{ _id: 2, cust_id: "Ant O. Knee", ord_date: new Date("2020-03-08"), price: 70, items: [ { sku: "oranges", qty: 8, price: 2.5 }, { sku: "chocolates", qty: 5, price: 10 } ], status: "A" },
{ _id: 3, cust_id: "Busby Bee", ord_date: new Date("2020-03-08"), price: 50, items: [ { sku: "oranges", qty: 10, price: 2.5 }, { sku: "pears", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 4, cust_id: "Busby Bee", ord_date: new Date("2020-03-18"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 5, cust_id: "Busby Bee", ord_date: new Date("2020-03-19"), price: 50, items: [ { sku: "chocolates", qty: 5, price: 10 } ], status: "A" },
{ _id: 6, cust_id: "Cam Elot", ord_date: new Date("2020-03-19"), price: 35, items: [ { sku: "carrots", qty: 10, price: 1.0 }, { sku: "apples", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 7, cust_id: "Cam Elot", ord_date: new Date("2020-03-20"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 8, cust_id: "Don Quis", ord_date: new Date("2020-03-20"), price: 75, items: [ { sku: "chocolates", qty: 5, price: 10 }, { sku: "apples", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 9, cust_id: "Don Quis", ord_date: new Date("2020-03-20"), price: 55, items: [ { sku: "carrots", qty: 5, price: 1.0 }, { sku: "apples", qty: 10, price: 2.5 }, { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" },
{ _id: 10, cust_id: "Don Quis", ord_date: new Date("2020-03-23"), price: 25, items: [ { sku: "oranges", qty: 10, price: 2.5 } ], status: "A" }
])
Return the Total Price Per Customer
Perform the map-reduce operation on the orders collection to group by the cust_id, and calculate the sum of the price for each cust_id:
1. Define the map function to process each input document:
- In the function, this refers to the document that the map-reduce operation is processing.
- The function maps the price to the cust_id for each document and emits the cust_id and price.
var mapFunction1 = function() {
emit(this.cust_id, this.price);
};
2. Define the corresponding reduce function with two arguments keyCustId and valuesPrices:
- valuesPrices is an array whose elements are the price values emitted by the map function and grouped by keyCustId.
- The function reduces the valuesPrices array to the sum of its elements.
var reduceFunction1 = function(keyCustId, valuesPrices) {
return Array.sum(valuesPrices);
};
3. Perform map-reduce on all documents in the orders collection using the mapFunction1 map function and the reduceFunction1 reduce function:
db.orders.mapReduce(
mapFunction1,
reduceFunction1,
{ out: "map_reduce_example" }
)
This operation outputs the results to a collection named map_reduce_example. If the map_reduce_example collection already exists, the operation will replace its contents with the results of this map-reduce operation.
4. Query the map_reduce_example collection to verify the results:
db.map_reduce_example.find().sort( { _id: 1 } )
The operation returns these documents:
{ "_id" : "Ant O. Knee", "value" : 95 }
{ "_id" : "Busby Bee", "value" : 125 }
{ "_id" : "Cam Elot", "value" : 60 }
{ "_id" : "Don Quis", "value" : 155 }
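As a quick sanity check outside the database, the sketch below recomputes the per-customer totals in plain Node.js from the same order prices. Array.sum() is not standard JavaScript; it is a helper available in the MongoDB JavaScript environment where map-reduce functions run, so the sketch assumes Array.prototype.reduce as a stand-in. Only _id, cust_id and price are copied from the sample orders, and the name sumPrices is hypothetical.

```javascript
// Order prices copied from the sample orders collection (other fields omitted)
const orders = [
  { _id: 1, cust_id: "Ant O. Knee", price: 25 },
  { _id: 2, cust_id: "Ant O. Knee", price: 70 },
  { _id: 3, cust_id: "Busby Bee", price: 50 },
  { _id: 4, cust_id: "Busby Bee", price: 25 },
  { _id: 5, cust_id: "Busby Bee", price: 50 },
  { _id: 6, cust_id: "Cam Elot", price: 35 },
  { _id: 7, cust_id: "Cam Elot", price: 25 },
  { _id: 8, cust_id: "Don Quis", price: 75 },
  { _id: 9, cust_id: "Don Quis", price: 55 },
  { _id: 10, cust_id: "Don Quis", price: 25 },
];

// Map step: group prices by cust_id (what emit(this.cust_id, this.price) produces)
const pricesByCustomer = new Map();
for (const order of orders) {
  if (!pricesByCustomer.has(order.cust_id)) pricesByCustomer.set(order.cust_id, []);
  pricesByCustomer.get(order.cust_id).push(order.price);
}

// Reduce step: standard-JavaScript stand-in for Array.sum(valuesPrices)
const sumPrices = (keyCustId, valuesPrices) =>
  valuesPrices.reduce((total, price) => total + price, 0);

const totals = [...pricesByCustomer].map(([custId, prices]) => ({
  _id: custId,
  value: sumPrices(custId, prices),
}));

// Ant O. Knee: 95, Busby Bee: 125, Cam Elot: 60, Don Quis: 155
console.log(totals);
```

Because addition is associative, this reducer also gives the same answer if MongoDB re-reduces partial sums, which is what makes a plain sum a safe reduce function.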
Aggregation Alternative
Using the available aggregation pipeline operators, you can rewrite the map-reduce operation without defining custom functions:
db.orders.aggregate([
{ $group: { _id: "$cust_id", value: { $sum: "$price" } } },
{ $out: "agg_alternative_1" }
])
1. The $group stage groups by the cust_id and calculates the value field (see also $sum). The value field contains the total price for each cust_id.
The stage outputs the following documents to the next stage:
{ "_id" : "Don Quis", "value" : 155 }
{ "_id" : "Ant O. Knee", "value" : 95 }
{ "_id" : "Cam Elot", "value" : 60 }
{ "_id" : "Busby Bee", "value" : 125 }
2. Then, the $out stage writes the output to the collection agg_alternative_1. Alternatively, you could use $merge instead of $out.
3. Query the agg_alternative_1 collection to verify the results:
db.agg_alternative_1.find().sort( { _id: 1 } )
The operation returns the following documents:
{ "_id" : "Ant O. Knee", "value" : 95 }
{ "_id" : "Busby Bee", "value" : 125 }
{ "_id" : "Cam Elot", "value" : 60 }
{ "_id" : "Don Quis", "value" : 155 }
→Requirement 3)
Calculate Order and Total Quantity with Average Quantity Per Item
In the following example, you will see a map-reduce operation on the orders collection for all documents that have an ord_date value greater than or equal to 2020-03-01.
The operation in the example:
- Groups by the items.sku field, and calculates the number of orders and the total quantity ordered for each sku.
- Calculates the average quantity per order for each sku value and merges the results into the output collection.
When merging results, if an existing document has the same key as the new result, the operation overwrites the existing document. If there is no existing document with the same key, the operation inserts the document.
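The difference between replacing the output collection (out: "collectionName") and merging into it (out: { merge: "collectionName" }) can be sketched with plain JavaScript Maps standing in for collections keyed by _id. This only models the overwrite-or-insert rule described above; outReplace and outMerge are hypothetical names, not MongoDB APIs, and the "kiwis" document is made-up leftover data used purely for illustration.

```javascript
// Collections are modeled as Map(_id -> document); both helper names are hypothetical.

// out: "collectionName": previous contents are dropped and replaced by the new results
function outReplace(_existing, results) {
  return new Map(results.map((doc) => [doc._id, doc]));
}

// out: { merge: "collectionName" }: a result with an existing _id overwrites that document,
// a result with a new _id is inserted, and all other existing documents are kept
function outMerge(existing, results) {
  const merged = new Map(existing); // copy, so the input Map is not mutated
  for (const doc of results) merged.set(doc._id, doc);
  return merged;
}

// Hypothetical contents left over from an earlier run
const existing = new Map([
  ["apples", { _id: "apples", value: { count: 1, qty: 5, avg: 5 } }],
  ["kiwis", { _id: "kiwis", value: { count: 2, qty: 4, avg: 2 } }],
]);

// Two of the results produced by this example's map-reduce
const newResults = [
  { _id: "apples", value: { count: 4, qty: 35, avg: 8.75 } },
  { _id: "pears", value: { count: 1, qty: 10, avg: 10 } },
];

console.log([...outReplace(existing, newResults).keys()]); // [ 'apples', 'pears' ]
console.log([...outMerge(existing, newResults).keys()]);   // [ 'apples', 'kiwis', 'pears' ]
```

With merge, the stale "kiwis" document survives because no new result shares its _id, which is worth keeping in mind when re-running a merge-based job over a narrower query.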
Steps:
1. Define the map function to process each input document:
- In the function, this refers to the document that the map-reduce operation is processing.
- For each item, the function associates the sku with a new object value that contains a count of 1 and the item qty for the order, and emits the sku (stored in key) and the value.
var mapFunction2 = function() {
  for (var idx = 0; idx < this.items.length; idx++) {
    var key = this.items[idx].sku;
    var value = { count: 1, qty: this.items[idx].qty };
    emit(key, value);
  }
};
2. Define the corresponding reduce function with two arguments keySKU and countObjVals:
- countObjVals is an array whose elements are the objects emitted by the map function for the grouped keySKU value.
- The function reduces the countObjVals array to a single object reducedVal that contains the count and the qty fields.
- In reducedVal, the count field contains the sum of the count fields from the individual array elements, and the qty field contains the sum of the qty fields from the individual array elements.
var reduceFunction2 = function(keySKU, countObjVals) {
  // declare reducedVal locally instead of creating an implicit global
  var reducedVal = { count: 0, qty: 0 };
  for (var idx = 0; idx < countObjVals.length; idx++) {
    reducedVal.count += countObjVals[idx].count;
    reducedVal.qty += countObjVals[idx].qty;
  }
  return reducedVal;
};
3. Define a finalize function with two arguments key and reducedVal. The function modifies the reducedVal object to add a computed field named avg and returns the modified object:
var finalizeFunction2 = function (key, reducedVal) {
reducedVal.avg = reducedVal.qty/reducedVal.count;
return reducedVal;
};
4. Perform the map-reduce operation on the orders collection using the mapFunction2, reduceFunction2, and finalizeFunction2 functions:
db.orders.mapReduce(
mapFunction2,
reduceFunction2,
{
out: { merge: "map_reduce_example2" },
query: { ord_date: { $gte: new Date("2020-03-01") } },
finalize: finalizeFunction2
}
);
This operation uses the query field to select only those documents with ord_date greater than or equal to new Date("2020-03-01"). Then it outputs the results to a collection map_reduce_example2.
If the map_reduce_example2 collection already exists, the operation will merge the existing contents with the results of this map-reduce operation. That is, if an existing document has the same key as the new result, the operation overwrites the existing document. If there is no existing document with the same key, the operation inserts the document.
5. Query the map_reduce_example2 collection to verify the results:
db.map_reduce_example2.find().sort( { _id: 1 } )
The operation returns these documents:
{ "_id" : "apples", "value" : { "count" : 4, "qty" : 35, "avg" : 8.75 } }
{ "_id" : "carrots", "value" : { "count" : 2, "qty" : 15, "avg" : 7.5 } }
{ "_id" : "chocolates", "value" : { "count" : 3, "qty" : 15, "avg" : 5 } }
{ "_id" : "oranges", "value" : { "count" : 7, "qty" : 63, "avg" : 9 } }
{ "_id" : "pears", "value" : { "count" : 1, "qty" : 10, "avg" : 10 } }
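To see how the map, reduce and finalize functions fit together, the sketch below replays their logic in plain Node.js over the items of the sample orders. The driver loop is a hypothetical stand-in for what mapReduce() does internally; since Node has no built-in emit, the sketch passes emit to the map function as an argument. The ord_date filter is left out because every sample order is dated on or after 2020-03-01.

```javascript
// items arrays copied from the sample orders collection (other fields omitted)
const orders = [
  { _id: 1, items: [{ sku: "oranges", qty: 5 }, { sku: "apples", qty: 5 }] },
  { _id: 2, items: [{ sku: "oranges", qty: 8 }, { sku: "chocolates", qty: 5 }] },
  { _id: 3, items: [{ sku: "oranges", qty: 10 }, { sku: "pears", qty: 10 }] },
  { _id: 4, items: [{ sku: "oranges", qty: 10 }] },
  { _id: 5, items: [{ sku: "chocolates", qty: 5 }] },
  { _id: 6, items: [{ sku: "carrots", qty: 10 }, { sku: "apples", qty: 10 }] },
  { _id: 7, items: [{ sku: "oranges", qty: 10 }] },
  { _id: 8, items: [{ sku: "chocolates", qty: 5 }, { sku: "apples", qty: 10 }] },
  { _id: 9, items: [{ sku: "carrots", qty: 5 }, { sku: "apples", qty: 10 }, { sku: "oranges", qty: 10 }] },
  { _id: 10, items: [{ sku: "oranges", qty: 10 }] },
];

// Same logic as mapFunction2, but emit arrives as a parameter
function mapFunction2(emit) {
  for (let idx = 0; idx < this.items.length; idx++) {
    emit(this.items[idx].sku, { count: 1, qty: this.items[idx].qty });
  }
}

// Same logic as reduceFunction2: sum count and qty; output shape matches input shape
function reduceFunction2(keySKU, countObjVals) {
  const reducedVal = { count: 0, qty: 0 };
  for (const val of countObjVals) {
    reducedVal.count += val.count;
    reducedVal.qty += val.qty;
  }
  return reducedVal;
}

// Same logic as finalizeFunction2: add the average qty per order
function finalizeFunction2(key, reducedVal) {
  reducedVal.avg = reducedVal.qty / reducedVal.count;
  return reducedVal;
}

// Hypothetical driver: map every order, group by sku, reduce, then finalize
const groups = new Map();
const emit = (key, value) => {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
};
for (const order of orders) mapFunction2.call(order, emit);

const results = [...groups]
  .map(([sku, values]) => {
    // A single-value key skips reduce, but finalize still runs (see pears)
    const reduced = values.length === 1 ? values[0] : reduceFunction2(sku, values);
    return { _id: sku, value: finalizeFunction2(sku, reduced) };
  })
  .sort((a, b) => a._id.localeCompare(b._id));

console.log(results.map((r) => `${r._id}: ${JSON.stringify(r.value)}`).join("\n"));
```

Keeping count and qty separate until finalize is the design choice that makes this correct: averaging inside reduce would give wrong answers once MongoDB re-reduces partial results.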
Aggregation Alternative
Using the available aggregation pipeline operators, you can rewrite the map-reduce operation without defining custom functions:
db.orders.aggregate( [
{ $match: { ord_date: { $gte: new Date("2020-03-01") } } },
{ $unwind: "$items" },
{ $group: { _id: "$items.sku", qty: { $sum: "$items.qty" }, orders_ids: { $addToSet: "$_id" } } },
{ $project: { value: { count: { $size: "$orders_ids" }, qty: "$qty", avg: { $divide: [ "$qty", { $size: "$orders_ids" } ] } } } },
{ $merge: { into: "agg_alternative_3", on: "_id", whenMatched: "replace", whenNotMatched: "insert" } }
] )
1. The $match stage selects only those documents with ord_date greater than or equal to new Date("2020-03-01").
2. The $unwind stage breaks down the document by the items array field to output a document for each array element. For example:
{ "_id" : 1, "cust_id" : "Ant O. Knee", "ord_date" : ISODate("2020-03-01T00:00:00Z"), "price" : 25, "items" : { "sku" : "oranges", "qty" : 5, "price" : 2.5 }, "status" : "A" }
{ "_id" : 1, "cust_id" : "Ant O. Knee", "ord_date" : ISODate("2020-03-01T00:00:00Z"), "price" : 25, "items" : { "sku" : "apples", "qty" : 5, "price" : 2.5 }, "status" : "A" }
{ "_id" : 2, "cust_id" : "Ant O. Knee", "ord_date" : ISODate("2020-03-08T00:00:00Z"), "price" : 70, "items" : { "sku" : "oranges", "qty" : 8, "price" : 2.5 }, "status" : "A" }
{ "_id" : 2, "cust_id" : "Ant O. Knee", "ord_date" : ISODate("2020-03-08T00:00:00Z"), "price" : 70, "items" : { "sku" : "chocolates", "qty" : 5, "price" : 10 }, "status" : "A" }
{ "_id" : 3, "cust_id" : "Busby Bee", "ord_date" : ISODate("2020-03-08T00:00:00Z"), "price" : 50, "items" : { "sku" : "oranges", "qty" : 10, "price" : 2.5 }, "status" : "A" }
{ "_id" : 3, "cust_id" : "Busby Bee", "ord_date" : ISODate("2020-03-08T00:00:00Z"), "price" : 50, "items" : { "sku" : "pears", "qty" : 10, "price" : 2.5 }, "status" : "A" }
{ "_id" : 4, "cust_id" : "Busby Bee", "ord_date" : ISODate("2020-03-18T00:00:00Z"), "price" : 25, "items" : { "sku" : "oranges", "qty" : 10, "price" : 2.5 }, "status" : "A" }
{ "_id" : 5, "cust_id" : "Busby Bee", "ord_date" : ISODate("2020-03-19T00:00:00Z"), "price" : 50, "items" : { "sku" : "chocolates", "qty" : 5, "price" : 10 }, "status" : "A" }
…
3. The $group stage groups by items.sku, calculating for each sku:
- The qty field, which contains the total qty ordered for each items.sku (see $sum).
- The orders_ids array, which contains the distinct order _id values for each items.sku (see $addToSet).
{ "_id" : "chocolates", "qty" : 15, "orders_ids" : [ 2, 5, 8 ] }
{ "_id" : "oranges", "qty" : 63, "orders_ids" : [ 4, 7, 3, 2, 9, 1, 10 ] }
{ "_id" : "carrots", "qty" : 15, "orders_ids" : [ 6, 9 ] }
{ "_id" : "apples", "qty" : 35, "orders_ids" : [ 9, 8, 1, 6 ] }
{ "_id" : "pears", "qty" : 10, "orders_ids" : [ 3 ] }
4. The $project stage reshapes the output document to mirror the map-reduce output, with two fields _id and value. The $project sets:
- value.count to the size of the orders_ids array using $size.
- value.qty to the qty field of the input document.
- value.avg to the average qty per order using $divide and $size.
{ "_id" : "apples", "value" : { "count" : 4, "qty" : 35, "avg" : 8.75 } }
{ "_id" : "pears", "value" : { "count" : 1, "qty" : 10, "avg" : 10 } }
{ "_id" : "chocolates", "value" : { "count" : 3, "qty" : 15, "avg" : 5 } }
{ "_id" : "oranges", "value" : { "count" : 7, "qty" : 63, "avg" : 9 } }
{ "_id" : "carrots", "value" : { "count" : 2, "qty" : 15, "avg" : 7.5 } }
5. Finally, the $merge stage writes the output to the collection agg_alternative_3. If an existing document has the same key _id as the new result, the operation overwrites the existing document. If there is no existing document with the same key, the operation inserts the document.
6. Query the agg_alternative_3 collection to verify the results:
db.agg_alternative_3.find().sort( { _id: 1 } )
The operation returns the following documents:
{ "_id" : "apples", "value" : { "count" : 4, "qty" : 35, "avg" : 8.75 } }
{ "_id" : "carrots", "value" : { "count" : 2, "qty" : 15, "avg" : 7.5 } }
{ "_id" : "chocolates", "value" : { "count" : 3, "qty" : 15, "avg" : 5 } }
{ "_id" : "oranges", "value" : { "count" : 7, "qty" : 63, "avg" : 9 } }
{ "_id" : "pears", "value" : { "count" : 1, "qty" : 10, "avg" : 10 } }
▶️ In the end, thanks for reading my article. I hope I was able to explain how to implement MongoDB map-reduce in a real-world scenario.