MongoDB

• A document-oriented database, which aims to ease scaling out
• It automatically balances data and load across a cluster, redistributing documents and routing reads and writes to the correct machines.
• By allowing embedded documents and arrays, the document-oriented approach makes it possible to represent complex hierarchical relationships with a single record.
• There are also no predefined schemas: a document’s keys and values are not of fixed types or sizes so adding or removing fields as needed becomes easier.
• It is type- and case-sensitive.
• It can query for all documents where <value> is an element of the <field> array.
• If a query is common, you can create an index on the <field> key to improve the query’s speed.
• It allows atomic updates that modify the contents of arrays.
• It understands the structure of embedded documents and is able to reach inside them to build indexes, perform queries, or make updates.

Basic concepts

Collection

• MongoDB groups documents into collections.
• Analogous to a table in an RDBMS
• It has dynamic schemas.
• A single collection can have any number of different shapes.
• Defining a schema is still good practice; it can be enforced with MongoDB’s document validation functionality or with the object-document mapping libraries available for many programming languages.
• It is identified by its name.
• The empty string ("") is not a valid name.
• It must not contain \0 (the null character), which signifies the end of a collection name.
• It must not start with system., a prefix reserved for internal collections.
• $ has special properties and should only be used in certain circumstances.
• Naming: <database_name>.<collection_name>

Why we should use more than one collection

• If we’re querying for blog posts, it’s a hassle to weed out documents containing author data.
• It’s much faster to get a list of collections than to extract a list of the types of documents in a collection.
• Grouping documents of the same kind together in the same collection allows for data locality.
• By putting only documents of a single type into the same collection, we can index our collections more efficiently.

Subcollections

• <collection>.<subcollection> is a naming convention used only for organizational purposes.

Database

• MongoDB groups collections into databases.
• A single instance of MongoDB can host multiple independent databases, each grouping together zero or more collections.
• Storing all data for a single application in the same database is practical.
• A database name must not contain /, \, ., ", *, <, >, :, |, ?, $, a space, or \0.
• Database names are case-insensitive and limited to a maximum of 64 bytes.
• Reserved names
• admin db plays a role in authentication and authorization.
• local db stores data specific to a single server.
• config db stores information about each shard.

show dbs
use <database_name>
<database_name>

Creating Users

db.createUser({ user : <user_name>, pwd : <password>, roles : ["readWrite", "dbAdmin"] });

Document

• The basic unit of data for MongoDB (roughly equivalent to a row in an RDBMS)
• An ordered set of keys with associated values
• Its representation depends on the programming language.
• Keys are strings.
• A key must not contain \0 (the null character), which signifies the end of a key.
• . and $ have special properties and should only be used in certain circumstances.
• A document cannot contain duplicate keys.

Mongo shell

• It has built-in support for administering MongoDB instances and manipulating data using the MongoDB query language.
• mongo <code1>.js <code2>.js runs the given scripts in order and then exits.

Data types

Arrays

• Arrays can contain different data types as values.
• e.g. {"x" : [ "pie", 3.14 ]}
• Ordered operations: lists, stacks, queues
• Unordered operations: sets
• Arrays use 0-based indexing.
• You can manipulate the values in an array in two ways: by position, or with the positional operator $.

Array operators

• "$push" adds elements to the end of an array if the array exists. If not, it creates a new array.
• "$push" accepts the following modifiers:
• "$each" pushes multiple values in one operation.
• e.g. db.stock.ticker.updateOne({"_id" : "GOOG"}, {"$push" : {"hourly" : {"$each" : [562.776, 562.790, 559.123]}}})
• "$slice" prevents an array from growing beyond a certain size.
• e.g. db.movies.updateOne({"genre" : "horror"}, {"$push" : {"top10" : {"$each" : ["Nightmare on Elm Street", "Saw"], "$slice" : -10}}}) limits the array to the last 10 elements pushed.
• "$sort" sorts all of the objects in the array by <field>.
• To treat an array as a set, put "$ne" in the query filter so that "$push" adds a value only if it is not already present.
• e.g. db.papers.updateOne({"authors cited" : {"$ne" : "Richie"}}, {$push : {"authors cited" : "Richie"}})
• "$addToSet" adds a value only if it is not already present. It prevents duplicates in cases where "$ne" does not work. "$addToSet" combined with "$each" can add multiple unique values, which cannot be done with "$ne" and "$push".
• "$pop" treats an array like a queue or stack.
• {"$pop" : {"key" : 1}} removes an element from the end of the array; {"$pop" : {"key" : -1}} removes it from the beginning.
• "$pull" removes all elements that match the given criteria, e.g. pulling 1 from [1, 1, 2, 1] leaves [2].

db.lists.insertOne({"todo" : ["dishes", "laundry", "dry cleaning"]})
db.lists.updateOne({}, {"$pull" : {"todo" : "laundry"}}) // todo is now ["dishes", "dry cleaning"]
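The array operators above can be pictured with ordinary JavaScript arrays. The sketch below is only a plain-JavaScript model of the documented behaviour, not MongoDB's implementation. The helper names pushEach, addToSet, pop, and pull are hypothetical, and pushEach handles only a negative "$slice".

```javascript
// Plain-JavaScript model of a few array update operators.
// These helpers are hypothetical and not part of any MongoDB driver.

// "$push" with "$each" and an optional negative "$slice" (keep the last N).
function pushEach(array, values, slice) {
  const result = (array || []).concat(values); // a missing array is created
  return slice < 0 ? result.slice(slice) : result;
}

// "$addToSet": append only if the value is not already present.
function addToSet(array, value) {
  return array.includes(value) ? array.slice() : array.concat([value]);
}

// "$pop": 1 removes the last element, -1 removes the first.
function pop(array, direction) {
  return direction === 1 ? array.slice(0, -1) : array.slice(1);
}

// "$pull": remove every element equal to the given value.
function pull(array, value) {
  return array.filter((element) => element !== value);
}

console.log(pushEach(["a"], ["b", "c", "d"], -2)); // keeps the last two: "c", "d"
console.log(addToSet(["Richie"], "Richie"));       // unchanged: "Richie"
console.log(pop([1, 2, 3], -1));                   // 2, 3
console.log(pull([1, 1, 2, 1], 1));                // 2
```

Each helper returns a new array instead of modifying its input, which keeps the model easy to test; the real operators modify the stored document in place.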

find()

• It performs queries in MongoDB.
• It returns a subset of documents in a collection.
• find(<query_criteria>, <specify_the_keys_you_want>)
• e.g. db.users.find({}, {"username" : 1, "email" : 1}) returns only the username and email keys.
• e.g. db.users.find({}, {"fatal_weakness" : 0}) returns every key except fatal_weakness.
• {} matches everything in the collection.
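The effect of find()'s second argument on a single document can be modelled in a few lines. This is a sketch under simplifying assumptions (top-level keys only, values of 1 or 0 only), and project is a hypothetical helper name rather than a MongoDB function. It also reflects that "_id" is returned by default unless it is explicitly excluded.

```javascript
// Hypothetical helper modelling a projection applied to one document.
function project(doc, spec) {
  const keys = Object.keys(spec);
  // Any key set to 1 switches the projection to "include only these" mode.
  const including = keys.some((key) => spec[key] === 1);
  const result = {};
  for (const key of Object.keys(doc)) {
    if (key === "_id") {
      // "_id" is kept by default unless explicitly excluded.
      if (spec._id !== 0) result._id = doc._id;
    } else if (including ? spec[key] === 1 : !keys.includes(key)) {
      result[key] = doc[key];
    }
  }
  return result;
}

const user = { _id: 1, username: "joe", email: "joe@example.com", fatal_weakness: "kryptonite" };
console.log(project(user, { username: 1, email: 1 })); // _id, username, email
console.log(project(user, { fatal_weakness: 0 }));     // everything except fatal_weakness
```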

Query conditionals

• "$lt", "$lte", "$gt", "$gte"

db.users.find({"age" : {"$gte" : 18, "$lte" : 30}}) // 18 <= age <= 30

// Dates
start = new Date("01/01/2007")
db.users.find({"registered" : {"$lt" : start}}) // before January 1, 2007

• "$ne" := not equal to
• e.g. db.users.find({"username" : {"$ne" : "joe"}})
• "$in" and "$nin" (not in) query for a variety of values for a single key; "$or" combines conditions that may span multiple keys.
• While "$or" will always work, use "$in" whenever possible as the query optimizer handles it more efficiently.

db.raffle.find({"ticket_no" : {"$in" : [12345, "joe"]}})
db.raffle.find({"$or" : [{"ticket_no" : 725}, {"winner" : true}]})
db.raffle.find({"$or" : [{"ticket_no" : {"$in" : [725, 542, 390]}}, {"winner" : true}]}) // $or can contain other conditionals.

• "$not" is a metaconditional, so it can be applied on top of any other criteria.
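How the conditionals above combine can be sketched in plain JavaScript. This is an illustrative model only: matchesField and matchesOr are hypothetical names, and the model ignores MongoDB's type-aware comparison rules and its handling of array fields.

```javascript
// Hypothetical model of query conditionals on scalar fields.
const operators = {
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $ne: (value, arg) => value !== arg,
  $in: (value, arg) => arg.includes(value),
  $nin: (value, arg) => !arg.includes(value),
};

// Every operator in one condition document must hold (implicit AND).
function matchesField(value, condition) {
  return Object.keys(condition).every((op) => operators[op](value, condition[op]));
}

// "$or" succeeds when at least one clause matches the document.
function matchesOr(doc, clauses) {
  return clauses.some((clause) =>
    Object.keys(clause).every((key) => {
      const condition = clause[key];
      const isOperatorDoc = typeof condition === "object" && condition !== null;
      return isOperatorDoc ? matchesField(doc[key], condition) : doc[key] === condition;
    })
  );
}

console.log(matchesField(18, { $gte: 18, $lte: 30 })); // true: the bounds are inclusive
const clauses = [{ ticket_no: { $in: [725, 542, 390] } }, { winner: true }];
console.log(matchesOr({ ticket_no: 542, winner: false }, clauses)); // true
console.log(matchesOr({ ticket_no: 1, winner: false }, clauses));   // false
```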

Type-specific queries

null

• null matches itself, and it also matches “does not exist.” Thus, querying for a key with the value null will return all documents lacking that key.

Regular expressions

• "$regex" matches strings against a regular expression.
• e.g. db.users.find({"name" : {"$regex" : /joe/i }}) performs a case-insensitive search, matching both Joe and joe.

Querying arrays

db.food.insertOne({"fruit" : ["apple", "banana", "peach"]})
db.food.find({"fruit" : "banana"}) // will match

• "$all" matches a list of elements. Order does not matter.
db.food.insertOne({"_id" : 1, "fruit" : ["apple", "banana", "peach"]})
db.food.insertOne({"_id" : 2, "fruit" : ["apple", "kumquat", "orange"]})
db.food.insertOne({"_id" : 3, "fruit" : ["cherry", "banana", "apple"]})

db.food.find({"fruit" : {"$all" : ["apple", "banana"]}}) // ["cherry", "banana", "apple"], ["apple", "banana", "peach"]

// Query a specific position with key.index notation (index 2, 0-based)
db.food.find({"fruit.2" : "peach"}) // ["apple", "banana", "peach"]

• "$size" queries for arrays of a given size.
• It cannot be combined with another conditional such as "$gt".
• e.g. db.food.find({"fruit" : {"$size" : 3}}) // returns all of the above
• "$slice" in the projection returns a subset of an array's elements.
• e.g. db.blog.posts.findOne(criteria, {"comments" : {"$slice" : 10}}) limits the result to the first 10 comments.
• The $ operator returns the matching element. It is helpful when you do not know the index of the element.
• e.g. db.blog.posts.find({"comments.name" : "bob"}, {"comments.$" : 1})
• "$elemMatch" forces MongoDB to compare all clauses against a single array element.

// For the following documents
{"x" : 5}, {"x" : 15}, {"x" : 25}, {"x" : [5, 25]}
db.test.find({"x" : {"$gt" : 10, "$lt" : 20}}) // {"x" : 15}, {"x" : [5, 25]}
db.test.find({"x" : {"$elemMatch" : {"$gt" : 10, "$lt" : 20}}}) // no results: "$elemMatch" does not match non-array values
db.test.find({"x" : {"$gt" : 10, "$lt" : 20}}).min({"x" : 10}).max({"x" : 20}) // {"x" : 15}; requires an index on "x"
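Why {"x" : [5, 25]} matches the plain range query is easier to see in a small model. The sketch below is plain JavaScript, not MongoDB code; inRange, matchesRange, and matchesElemMatch are hypothetical names chosen for illustration.

```javascript
// Hypothetical model of a range condition with and without "$elemMatch".
function inRange(value, low, high) {
  return value > low && value < high;
}

// Without "$elemMatch": each clause may be satisfied by a different element.
function matchesRange(x, low, high) {
  const values = Array.isArray(x) ? x : [x];
  return values.some((v) => v > low) && values.some((v) => v < high);
}

// With "$elemMatch": a single array element must satisfy every clause,
// and non-array values never match.
function matchesElemMatch(x, low, high) {
  return Array.isArray(x) && x.some((v) => inRange(v, low, high));
}

const docs = [{ x: 5 }, { x: 15 }, { x: 25 }, { x: [5, 25] }];
console.log(docs.filter((d) => matchesRange(d.x, 10, 20)));     // x: 15 and x: [5, 25]
console.log(docs.filter((d) => matchesElemMatch(d.x, 10, 20))); // no documents
```

In the array document, 25 satisfies "$gt" : 10 and 5 satisfies "$lt" : 20, so the plain query matches even though no single element lies between 10 and 20.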


Querying on embedded documents

• There are two ways:

1. query for the whole embedded document (an exact match, including key order)
2. query for individual key/value pairs
• Query documents can contain dots, which mean “reach into an embedded document”.

• e.g. db.people.find({"name.first" : "Joe", "name.last" : "Schmoe"})
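Dot notation can be thought of as walking a path through nested objects. The following is a minimal plain-JavaScript sketch of that idea, supporting equality only; getPath and matchesAll are hypothetical helper names, not part of MongoDB.

```javascript
// Hypothetical helper: resolve a dotted path such as "name.first".
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value === null || value === undefined ? undefined : value[key]),
    doc
  );
}

// A document matches when every dotted path equals the requested value.
function matchesAll(doc, query) {
  return Object.keys(query).every((path) => getPath(doc, path) === query[path]);
}

const person = { name: { first: "Joe", last: "Schmoe" }, age: 45 };
console.log(matchesAll(person, { "name.first": "Joe", "name.last": "Schmoe" })); // true
console.log(matchesAll(person, { "name.first": "Joe", "name.middle": "X" }));    // false
```

Because the query names individual keys, it keeps matching if further fields are later added to the embedded name document.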

$where queries

• It allows you to execute arbitrary JavaScript as part of your query.
• It should not be used unless strictly necessary because it is much slower than a regular query.
• For security, use of “$where” clauses should be highly restricted or eliminated. End users should never be allowed to execute arbitrary “$where” clauses.

Cursors

• The database returns results from find using a cursor, which lazily returns batches of documents as you need them.
• Metaoperations on a cursor include skipping a certain number of results, limiting the number of results returned, and sorting results by any combination of keys in any direction.

How to create a cursor

1. Put some documents into a collection.
2. Do a query on them.
3. Assign the results to a local variable.

for (i = 0; i < 100; i++) {
  db.collection.insertOne({x : i});
}
var cursor = db.collection.find();

// Either iterate with forEach ...
cursor.forEach(function(doc) {
  print(doc.x);
});

// ... or step through the cursor manually (a cursor can only be consumed once).
while (cursor.hasNext()) {
  obj = cursor.next();
  // do stuff
}

Covered queries

• When an index contains all the values requested by a query, the query is covered.
• If your query is only looking for the fields that are included in the index, it does not need to fetch the document.

CRUD operations

Create

• Adding new documents to a collection
• MongoDB will add an _id key if the document does not have one.
• db.<collection_name>.insertOne(<document>), where <document> = { <key>: <value> }.
• db.<collection_name>.insertMany([<document1>, <document2>]) passes an array of documents to the database.
• The ordered option defaults to true: documents are inserted in the order given, which is slower than an unordered insert.

Read

• db.<collection_name>.findOne();
• db.<collection_name>.find(); displays up to 20 documents in the shell.
• db.<collection_name>.find().pretty();, where pretty() is a helper function that formats the output.

Update

• Updating existing documents
• Updating a document is atomic: if two updates happen at the same time, whichever one reaches the server first will be applied, and then the next one will be applied.
• db.<collection_name>.updateOne(<filter_document>, <modifier_document>)
• db.<collection_name>.updateMany(<filter_document>, <modifier_document>)
• db.<collection_name>.replaceOne(<filter_document>, <replace_document>) fully replaces a matching document with a new one.
• e.g. db.customers.updateOne({first_name:"Steven"}, {$set:{gender:"male"}}); adds gender: "male" to the first document whose first_name is Steven.

• Your update should always specify a unique document, perhaps by matching on a key like _id.

Update operators

• Special keys that can alter, add, or remove keys, and even manipulate arrays and embedded documents.

• $inc increments the value of the field by the specified amount. If the field does not yet exist, it will be created. It can be used only on values of type integer, long, double, or decimal.
• $currentDate sets the value of a field to the current date, either as a Date or a Timestamp.
• $min only updates the field if the specified value is less than the existing field value.
• $max only updates the field if the specified value is greater than the existing field value.
• $mul multiplies the value of the field by the specified amount.
• $rename renames a field.
• $set sets the value of a field in a document. If the field does not yet exist, it will be created.
• $setOnInsert sets the value of a field if an update results in an insert of a document. It has no effect on update operations that modify existing documents.
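The field operators above can be modelled as a function from an old document to a new one. The sketch below is a plain-JavaScript approximation, not MongoDB's implementation: applyUpdate is a hypothetical name, only top-level fields are handled, and the operators are applied in a fixed order chosen for the sketch.

```javascript
// Hypothetical model of a few field update operators applied to one document.
function applyUpdate(doc, update) {
  const result = { ...doc }; // shallow copy so the input is left untouched
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount; // created if missing
  }
  for (const [field, factor] of Object.entries(update.$mul || {})) {
    result[field] = (result[field] || 0) * factor; // a missing field becomes 0
  }
  for (const [field, value] of Object.entries(update.$min || {})) {
    if (!(field in result) || value < result[field]) result[field] = value;
  }
  for (const [field, value] of Object.entries(update.$max || {})) {
    if (!(field in result) || value > result[field]) result[field] = value;
  }
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value; // created if missing
  }
  for (const [oldName, newName] of Object.entries(update.$rename || {})) {
    if (oldName in result) {
      result[newName] = result[oldName];
      delete result[oldName];
    }
  }
  return result;
}

const before = { _id: 1, first_name: "Steven", visits: 2, best: 40 };
const after = applyUpdate(before, {
  $inc: { visits: 1, logins: 1 }, // visits becomes 3, logins is created as 1
  $max: { best: 35 },             // 35 is not greater than 40, so best stays 40
  $set: { gender: "male" },
});
console.log(after);
```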

Inefficient operators

• In general, negation is inefficient.

Indexing objects and arrays

• MongoDB allows you to reach into your documents and create indexes on nested fields and arrays.

Indexing embedded docs

• Indexes can be created on keys in embedded documents in the same way that they are created on normal keys.
• e.g. db.users.createIndex({"<field>.<subfield>" : 1})

Indexing arrays

• Indexing an array field indexes each element of the array, not the array itself.
• Array indexes are more expensive than single-value ones. For a single insert, update, or remove, every array entry might have to be updated.

Multikey index implications

• If any document has an array field for the indexed key, the index is immediately flagged as a multikey index, which may be a bit slower than a non-multikey index.
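A rough way to picture a multikey index is as a sorted list with one entry per array element. The sketch below is a simplified plain-JavaScript model, not how MongoDB stores indexes; buildIndexEntries is a hypothetical name.

```javascript
// Hypothetical model: build sorted index entries for one field.
function buildIndexEntries(docs, field) {
  const entries = [];
  let multikey = false;
  for (const doc of docs) {
    const value = doc[field];
    if (Array.isArray(value)) {
      multikey = true; // a single array value flags the whole index
      for (const element of value) entries.push({ key: element, id: doc._id });
    } else {
      entries.push({ key: value, id: doc._id });
    }
  }
  entries.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return { multikey, entries };
}

const posts = [
  { _id: 1, tags: ["mongodb", "index"] },
  { _id: 2, tags: "shell" },
];
console.log(buildIndexEntries(posts, "tags")); // three entries, multikey: true
```

The document with two tags contributes two entries, which is why inserting, updating, or removing such a document can touch several index entries.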

Index cardinality

• Cardinality is the number of distinct values for a field in a collection.
• In general, the greater the cardinality of a field, the more helpful an index on that field can be.
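• e.g. gender has a cardinality of about 2, while name can have up to N distinct values across N documents, so an index on name is more helpful.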
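Cardinality is simple to compute for a small in-memory sample. The sketch below uses plain JavaScript on made-up data; cardinality is a hypothetical helper name and the users array is invented purely for illustration.

```javascript
// Hypothetical helper: count the distinct values of one field.
function cardinality(docs, field) {
  return new Set(docs.map((doc) => doc[field])).size;
}

// Invented sample data for illustration only.
const users = [
  { name: "Ann", gender: "f" },
  { name: "Bob", gender: "m" },
  { name: "Cid", gender: "m" },
  { name: "Dee", gender: "f" },
];
console.log(cardinality(users, "gender")); // 2
console.log(cardinality(users, "name"));   // 4
```

A lookup on name narrows the sample to one document, while a lookup on gender still leaves half of the documents to scan.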

GridFS

• A protocol for storing large files
• It uses subcollections to store file metadata separately from content chunks.
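To picture the split between metadata and content, the sketch below models the two kinds of documents GridFS stores by default: one metadata document (in fs.files) and a series of chunk documents (in fs.chunks). It is a plain Node.js model, not the driver's GridFS API. splitIntoGridFsDocs is a hypothetical name, and the 5-byte chunk size is chosen only to keep the example small (the default chunk size is 255 KB).

```javascript
// Hypothetical model of how a file is split into GridFS-style documents.
function splitIntoGridFsDocs(fileId, filename, data, chunkSize) {
  // Metadata document, as stored in the fs.files subcollection.
  const file = { _id: fileId, filename, length: data.length, chunkSize };
  // Content documents, as stored in the fs.chunks subcollection.
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n++) {
    chunks.push({ files_id: fileId, n, data: data.subarray(offset, offset + chunkSize) });
  }
  return { file, chunks };
}

const { file, chunks } = splitIntoGridFsDocs(1, "notes.txt", Buffer.from("hello gridfs"), 5);
console.log(file);
console.log(chunks.map((chunk) => chunk.data.toString())); // "hello", " grid", "fs"
```

Each chunk refers back to its file through files_id, and n records the chunk's position so the content can be reassembled in order.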

Geospatial Indexes

1. 2d

• It indexes points stored on a two-dimensional plane.
• 2d indexes support both flat geometries and distance-only calculations on spheres.

2. 2dsphere

• It works with spherical geometries that model the surface of the earth based on the WGS84 datum.
• It allows you to specify geometries for points, lines, and polygons in the GeoJSON format.
• Queries using spherical geometries will be more performant and accurate with a 2dsphere index.
• You can create a geospatial index using the “2dsphere” type with createIndex.
• e.g. db.openStreetMap.createIndex({"loc" : "2dsphere"})
// A point is given by a two-element array, representing [longitude, latitude].
{
  "name" : "New York City",
  "loc" : {
    "type" : "Point",
    "coordinates" : [50, 2]
  }
}

// A line is given by an array of points.
{
  "name" : "Hudson River",
  "loc" : {
    "type" : "LineString",
    "coordinates" : [[0,1], [0,2], [1,2]]
  }
}

// A polygon is given by an array of rings; each ring is an array of points whose first and last points are the same.
{
  "name" : "New England",
  "loc" : {
    "type" : "Polygon",
    "coordinates" : [[[0,1], [0,2], [1,2], [0,1]]]
  }
}
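GeoJSON (RFC 7946) requires each ring of a Polygon to be closed, with identical first and last positions, and to contain at least four positions. A minimal shape check is sketched below in plain JavaScript; isClosedRing and isPolygonShape are hypothetical helper names, and MongoDB performs its own validation when it indexes a document.

```javascript
// Hypothetical helper: a linear ring needs at least 4 positions and must be closed.
function isClosedRing(ring) {
  if (!Array.isArray(ring) || ring.length < 4) return false;
  const first = ring[0];
  const last = ring[ring.length - 1];
  return first[0] === last[0] && first[1] === last[1];
}

// Hypothetical helper: check only the overall shape of a Polygon geometry.
function isPolygonShape(geometry) {
  return (
    geometry.type === "Polygon" &&
    Array.isArray(geometry.coordinates) &&
    geometry.coordinates.length > 0 &&
    geometry.coordinates.every(isClosedRing)
  );
}

const closed = { type: "Polygon", coordinates: [[[0, 1], [0, 2], [1, 2], [0, 1]]] };
const open = { type: "Polygon", coordinates: [[[0, 1], [0, 2], [1, 2]]] };
console.log(isPolygonShape(closed)); // true
console.log(isPolygonShape(open));   // false: the ring is not closed
```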


Types of geospatial queries: intersection, within, nearness

var eastVillage = {
  "type" : "Polygon",
  "coordinates" : [
    [
      [ -73.9732566, 40.7187272 ],
      [ -73.9724573, 40.7217745 ],
      // a valid ring needs at least four positions; the remaining vertices are not listed here
      [ -73.9732566, 40.7187272 ]
    ]
  ]
}

db.openStreetMap.find({"loc" : {"$geoIntersects" : {"$geometry" : eastVillage}}}) // intersection operator
db.openStreetMap.find({"loc" : {"$geoWithin" : {"$geometry" : eastVillage}}}) // within operator
db.openStreetMap.find({"loc" : {"$near" : {"$geometry" : {"type" : "Point", "coordinates" : [-73.9732566, 40.7187272]}}}}) // nearness operator; "$near" takes a GeoJSON Point