
MongoDB: Create Operations


Understanding the CRUD operations is the first step in understanding a database, and Create is the first step in CRUD.

MongoDB allows different ways of creating data. It can be created implicitly on first use, or added using the insert / insertOne / insertMany commands. Data can also be imported directly from an external file. Let's check out each of these.

Implicit


MongoDB implicitly creates a database or a collection the first time we store data in it. So we do not have to create a new database or collection up front. We just use it, and MongoDB creates it if required.

Check out the code below:

> use testDb
switched to db testDb
> show dbs
admin   0.000GB
config  0.000GB
local   0.000GB
> db.testCollection.insertOne({"hello":"world"})
{
        "acknowledged" : true,
        "insertedId" : ObjectId("5f5b729c877493e07117f449")
}
> show dbs
admin   0.000GB
config  0.000GB
local   0.000GB
testDb  0.000GB
> db.testCollection.find({})
{ "_id" : ObjectId("5f5b729c877493e07117f449"), "hello" : "world" }
>

Note that testDb did not exist at the beginning. MongoDB did not throw an error when we switched over to testDb, but it did not create the database either. Since the database did not exist, testCollection did not exist either. Still, MongoDB did not throw an error when we inserted data into it.

It just accepted the command. And at that point, created the database as well as the collection. We can call it a lazy implicit creation upon use.

This is very convenient, and saves a lot of time in creating and defining the structure. But we should be careful. We can get into trouble if we have typos in the code. An incorrectly spelled database or collection name will not generate an error; MongoDB will quietly create a new one. The problem surfaces only much later, as a functional error caused by the broken data model.

An admin can prevent this by restricting the permissions of the user.

Insert


The next is Insert. MongoDB provides us three flavors of the insert command - insert(), insertOne() and insertMany().

Originally, we had only insert(). The other two were introduced later to address its shortcomings, and insert() is now deprecated in the current shell (mongosh). Our code should use insertOne() or insertMany(). These two are versatile enough to handle all scenarios, so we do not need anything more.

Their syntax is quite intuitive.

//InsertOne:
db.<collection name>.insertOne(<document>, {writeConcern: <document>})

//InsertMany
db.<collection name>.insertMany([<document>,...], {writeConcern: <document>, ordered: <boolean>})

When we invoke insertOne, we pass in one document that should be added to the collection. But, when we invoke insertMany, we pass in an array of documents that should be added.

Apart from this, we have two optional parameters: writeConcern and ordered. Let's check out what they mean.

Ordered


When we insert multiple documents using insertMany(), we pass in an array of documents. MongoDB can interpret this as an ordered sequence of documents, or just as a set of documents. Why does that matter?

Normally it does not. But the question arises when one of the inserts fails. Suppose we insert 3 documents and the second one in the array fails. What should happen to the third? This is defined by the ordered parameter. If it is true (the default), the documents in the array are treated as ordered, so the third is not inserted when the second fails. If it is false, the second document failing will not stop the third from getting through.
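The rule can be sketched with a small simulation. This is plain JavaScript, not MongoDB code: simulateInsertMany and the in-memory Map standing in for a collection are hypothetical names made up for this illustration, and the only failure modelled is a duplicate _id.

```javascript
// Simulation only: a Map stands in for a collection with a unique _id.
// simulateInsertMany is a hypothetical helper, not part of MongoDB.
function simulateInsertMany(collection, docs, options = {}) {
  const ordered = options.ordered !== false; // mirrors the default of ordered: true
  const insertedIds = [];
  const errors = [];

  for (let index = 0; index < docs.length; index++) {
    const doc = docs[index];
    if (collection.has(doc._id)) {
      errors.push({ index, message: `duplicate _id ${doc._id}` });
      if (ordered) break; // ordered: stop at the first failure
      continue;           // unordered: skip the failed document, keep going
    }
    collection.set(doc._id, doc);
    insertedIds.push(doc._id);
  }
  return { insertedIds, errors };
}

// Three documents; the second repeats the _id of the first, so it fails.
const docs = [{ _id: 1 }, { _id: 1 }, { _id: 3 }];

const orderedResult = simulateInsertMany(new Map(), docs, { ordered: true });
const unorderedResult = simulateInsertMany(new Map(), docs, { ordered: false });

console.log(orderedResult.insertedIds);   // [ 1 ]
console.log(unorderedResult.insertedIds); // [ 1, 3 ]
```

In both runs the first document goes in and the second is reported as an error. Only the fate of the third document changes.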

Write Concerns


This is an important concept. When an insert command is fired, the data passes through several steps on its way to the disk on each of the DB instances in the cluster. How long should the caller wait before it concludes that the request was successful? The write could still break at the final step, but the caller cannot always wait that long.

Hence, we need to define a trade-off between performance and data consistency. It can be a simple fire and forget, or it can wait till each replica responds with a success. This is configured using the Write Concerns in the insert commands. There are three parts to this: w, j and wtimeout.

  • The w defines the number of replicas that need to acknowledge the write before we decide that it is a success. 0 means fire and forget.
  • The j defines how deep a write should go into the given replica before we conclude that it is a success: up to the storage engine, or up to the disk? {j : true} means it has to be written to the journal on disk before we conclude.
  • And the wtimeout defines the time, in milliseconds, that the caller will wait for the acknowledgement. If it does not arrive in that time, the command returns an error, although the write itself may still complete later.

These fields are simple to understand, but it requires a lot of understanding and experience to assign correct values to these parameters. So, MongoDB allows us to just go with the defaults, by not specifying any of these values.
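As a rough illustration, here is how the two ends of that trade-off might look as write concern documents. The specific values (w: "majority", a 5000 ms timeout) and the helper insertOptions are assumptions chosen for this sketch, not recommendations from the text above.

```javascript
// Fire and forget: do not wait for any acknowledgement.
const fireAndForget = { w: 0 };

// Wait until a majority of the replica set members have the write in their
// on-disk journal, but stop waiting after 5 seconds (wtimeout is in ms).
// "majority" and 5000 are illustrative values, not taken from the article.
const durable = { w: "majority", j: true, wtimeout: 5000 };

// Hypothetical helper: builds the second argument of insertOne / insertMany.
function insertOptions(writeConcern) {
  return { writeConcern };
}

console.log(insertOptions(fireAndForget));
console.log(insertOptions(durable));

// In the shell, the result would be passed along with the document:
// db.testCollection.insertOne({ hello: "world" }, insertOptions(durable))
```

Leaving the second argument out altogether keeps the defaults.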

_id


Whenever we insert a document that does not carry an _id, MongoDB assigns a unique one to it. By default this is an ObjectId, a 12-byte value built from a timestamp, a random value and a counter (it is not a UUID). It is unique within the collection and serves as the primary key of the document. We cannot change it once assigned.

If the insert is successful, we get this _id in the response. We can also assign our own _id when we create a document. In that case, the developer owns the responsibility of making sure that the _id is always unique. If we try to insert a document with a duplicate _id, the request is rejected.
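A generated _id is an ObjectId, and its first 4 bytes hold the creation time in seconds since the Unix epoch. The sketch below reads that time from the hex string seen in the earlier shell output. It is plain JavaScript, and objectIdToDate is a hypothetical helper written for this example (the shell's own ObjectId objects offer getTimestamp() for the same purpose).

```javascript
// Read the creation time from an ObjectId given as a 24-character hex string.
// The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
function objectIdToDate(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error("expected a 24-character hex string");
  }
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// The _id returned by insertOne in the implicit-creation example.
const created = objectIdToDate("5f5b729c877493e07117f449");
console.log(created.toISOString()); // 2020-09-11T12:50:36.000Z
```

Because the timestamp comes first, generated ids sort roughly by creation time.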

Import Data


We can also import data directly into a collection. A well-formed JSON object or JSON array can be imported directly from a file.

mongoimport testDataArray.json -d testDB -c testCollection --jsonArray --drop

This reads the file testDataArray.json and checks that its contents are a JSON array. If they are, it adds all the data to testCollection in testDB. As mentioned before, if the database or the collection is missing, it will be created. The final --drop specifies that if the collection already exists, it is dropped before the new data is added. Without it, the new data is appended.

This way, we can import a single document or an array of documents. The --jsonArray parameter tells mongoimport that the file holds an array; without it, the file is expected to hold individual JSON documents.
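To make the two input shapes concrete, the following Node.js sketch writes the same documents once as a JSON array and once as one document per line. The sample documents, the temporary directory and the file name testDataLines.json are assumptions made for this illustration. The script only prepares the files; it does not run mongoimport.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Sample documents, invented for this sketch.
const docs = [{ hello: "world" }, { hello: "mongo" }];
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "import-"));

// Shape to use with --jsonArray: one JSON array holding every document.
const arrayFile = path.join(dir, "testDataArray.json");
fs.writeFileSync(arrayFile, JSON.stringify(docs, null, 2));

// Shape to use without --jsonArray: one JSON document per line.
const linesFile = path.join(dir, "testDataLines.json");
fs.writeFileSync(linesFile, docs.map((d) => JSON.stringify(d)).join("\n") + "\n");

console.log(fs.readFileSync(arrayFile, "utf8"));
console.log(fs.readFileSync(linesFile, "utf8"));
```

The first file would be imported with --jsonArray, the second without it.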