Friday, April 6, 2018

MongoDB Evaluation

Introduction
MongoDB is an open-source document database that provides high performance, high availability, and automatic scaling. MongoDB obviates the need for an Object Relational Mapping (ORM) to facilitate development.

Documents

A record in MongoDB is a document, which is a data structure composed of field and value pairs. MongoDB documents are similar to JSON objects. The values of fields may include other documents, arrays, and arrays of documents.
{
  "_id" : ObjectId("54c955492b7c8eb21818bd09"),
  "address" : {
     "street" : "2 Avenue",
     "zipcode" : "10075",
     "building" : "1480",
     "coord" : [ -73.9557413, 40.7720266 ]
  },
  "borough" : "Manhattan",
  "cuisine" : "Italian",
  "grades" : [
     {
        "date" : ISODate("2014-10-01T00:00:00Z"),
        "grade" : "A",
        "score" : 11
     },
     {
        "date" : ISODate("2014-01-16T00:00:00Z"),
        "grade" : "B",
        "score" : 17
     }
  ],
  "name" : "Vella",
  "restaurant_id" : "41704620"
}
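To make the nesting concrete, here is a minimal sketch that walks the same document as a plain JavaScript object. ObjectId(...) and ISODate(...) are shell helpers, so a string and Date objects stand in for them here, and reading coord as [longitude, latitude] is an assumption about this sample data.

```javascript
// Plain JavaScript stand-in for the restaurant document above.
// Assumption: a string replaces ObjectId(...) and Date replaces ISODate(...).
const restaurant = {
  _id: "54c955492b7c8eb21818bd09",
  address: {
    street: "2 Avenue",
    zipcode: "10075",
    building: "1480",
    coord: [-73.9557413, 40.7720266]
  },
  borough: "Manhattan",
  cuisine: "Italian",
  grades: [
    { date: new Date("2014-10-01T00:00:00Z"), grade: "A", score: 11 },
    { date: new Date("2014-01-16T00:00:00Z"), grade: "B", score: 17 }
  ],
  name: "Vella",
  restaurant_id: "41704620"
};

// Field of an embedded document.
const zipcode = restaurant.address.zipcode;

// Array value inside an embedded document (order assumed: longitude, latitude).
const [longitude, latitude] = restaurant.address.coord;

// Array of documents: collect one field, then aggregate another.
const gradeLetters = restaurant.grades.map(g => g.grade);
const averageScore =
  restaurant.grades.reduce((sum, g) => sum + g.score, 0) / restaurant.grades.length;

console.log(zipcode, longitude, latitude, gradeLetters, averageScore);
```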

Collections

MongoDB stores documents in collections. Collections are analogous to tables in relational databases. Unlike a table, however, a collection does not require its documents to have the same schema.
In MongoDB, documents stored in a collection must have a unique _id field that acts as a primary key.

Installation

Docker

The easiest way to run MongoDB is in a Docker container:
$ docker run --name some-mongo -d mongo
This image includes EXPOSE 27017 (the mongo port), so standard container linking will make it automatically available to the linked containers.
See configuration details here.

Local

Install MongoDB Community Edition
These documents provide instructions to install MongoDB Community Edition.
Install MongoDB Community Edition and required dependencies on Linux.
Install MongoDB Community Edition on macOS systems from Homebrew packages or from MongoDB archives.
Install MongoDB Community Edition on Windows systems and optionally start MongoDB as a Windows service.

Data Modeling Concepts

Data Model Design

This section presents the different strategies that you can choose from when determining your data model, along with their strengths and weaknesses.

Embedded Data Models

With MongoDB, you may embed related data in a single structure or document. These schemas are generally known as “denormalized” models, and take advantage of MongoDB’s rich documents.
Embedded data models allow applications to store related pieces of information in the same database record. As a result, applications may need to issue fewer queries and updates to complete common operations.
In general, use embedded data models when you have “contains” (one-to-one) relationships between entities, or one-to-many relationships in which the child documents always appear with, or are viewed in the context of, their parent document.

Normalized Data Models

Normalized data models describe relationships using references between documents.
In general, use normalized data models:
      • when embedding would result in duplication of data but would not provide sufficient read performance advantages to outweigh the implications of the duplication.
      • to represent more complex many-to-many relationships.
      • to model large hierarchical data sets.
References provide more flexibility than embedding. However, client-side applications must issue follow-up queries to resolve the references. In other words, normalized data models can require more round trips to the server.
See Model One-to-Many Relationships with Document References for an example of referencing. For examples of various tree models using references, see Model Tree Structures.
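The trade-off between the two models can be sketched in plain JavaScript. This is only an in-memory illustration, not MongoDB itself: the arrays stand in for collections, findOne is a hypothetical helper in which each call counts as one round trip to the server, and all document and field names are made up for the example.

```javascript
// Each call to this hypothetical helper stands for one query sent to the server.
let roundTrips = 0;
function findOne(collection, predicate) {
  roundTrips += 1;
  return collection.find(predicate) || null;
}

// Embedded (denormalized) model: the address lives inside the user document.
const usersEmbedded = [
  { _id: "u1", name: "Ann", address: { city: "Paris", zip: "75001" } }
];

// Normalized model: two collections linked by a reference field (user_id).
const users = [{ _id: "u1", name: "Ann" }];
const addresses = [{ _id: "a1", user_id: "u1", city: "Paris", zip: "75001" }];

// Embedded: a single read returns the user and the address together.
roundTrips = 0;
const embeddedUser = findOne(usersEmbedded, u => u._id === "u1");
const embeddedTrips = roundTrips;

// Normalized: read the user, then issue a follow-up query for the address.
roundTrips = 0;
const user = findOne(users, u => u._id === "u1");
const address = findOne(addresses, a => a.user_id === user._id);
const referencedTrips = roundTrips;

console.log(embeddedUser.address.city, embeddedTrips); // same city, one read
console.log(address.city, referencedTrips);            // same city, two reads
```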

Data Model Patterns

Overview

Data in MongoDB has a flexible schema. Collections do not enforce document structure. Decisions that affect how you model data can affect application performance and database capacity.

Data Model Examples and Patterns

See official documentation here.

Modeling Relationships with Embedding and Referencing

Embedding 

Data with a 1:1 or 1:many relationship (where the “many” objects always appear with, or are viewed in the context of, their parent documents) is a natural candidate for embedding within a single document. The concept of data ownership and containment can also be modeled with embedding. For example, product pricing, both current and historical, should be embedded within the product document since it is owned by and contained within that specific product. If the product is deleted, the pricing becomes irrelevant.
Architects should also embed fields that need to be modified together atomically. (Refer to the Application Integration section of this guide for more information.)
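As a rough sketch of the product pricing case, the update below changes the current price and appends the old one to the history in a single update document. Because both fields are embedded in the same product document, the whole change is one single-document write. The collection name products, the pricing field layout, and the buildPriceChange helper are all hypothetical.

```javascript
// Hypothetical product document with embedded current and historical pricing.
const product = {
  _id: "p1",
  name: "Widget",
  pricing: { current: 10.0, history: [] }
};

// Builds one update document that touches both embedded fields together.
function buildPriceChange(oldPrice, newPrice, changedAt) {
  return {
    $set: { "pricing.current": newPrice },
    $push: { "pricing.history": { price: oldPrice, until: changedAt } }
  };
}

const priceChange = buildPriceChange(
  product.pricing.current,
  12.5,
  new Date("2018-04-06T00:00:00Z")
);

// In the mongo shell this would be sent as one write:
//   db.products.updateOne({ _id: "p1" }, priceChange)
console.log(JSON.stringify(priceChange, null, 2));
```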
Not all 1:1 and 1:m relationships should be embedded in a single document. Referencing between documents in different collections should be used when:
    • A document is frequently read, but contains an embedded document that is rarely accessed. An example might be a customer record that embeds copies of the annual general report. Embedding the report only increases the in-memory requirements (the working set) of the collection.
    • One part of a document is frequently updated and constantly growing in size, while the remainder of the document is relatively static.
    • The combined document size would exceed MongoDB’s 16MB document limit.

Referencing

Referencing enables data normalization, and can give more flexibility than embedding. But the application must issue follow-up queries to resolve the reference, requiring additional round-trips to the server. References are usually implemented by saving the _id field of one document in the related document as a reference. A second query is then executed by the application to return the referenced data.
Referencing should be used:
    • With m:1 or m:m relationships where embedding would not provide sufficient read performance advantages to outweigh the implications of data duplication
    • Where the object is referenced from many different sources
    • To represent complex many-to-many relationships
    • To model large, hierarchical data sets
The $lookup stage in an aggregation pipeline can be used to match the references with the _ids from the second collection to automatically embed the referenced data in the result set.
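A $lookup stage could look like the sketch below. The orders and customers collections and their fields are hypothetical. The small lookup function is not MongoDB code; it is a plain JavaScript emulation of the equality-match form of the stage, included only to show the shape of the result: each input document gains an array field holding the matching referenced documents, and that array is empty when nothing matches.

```javascript
// Hypothetical collections: each order references a customer by customer_id.
const pipeline = [
  {
    $lookup: {
      from: "customers",          // collection holding the referenced documents
      localField: "customer_id",  // reference stored in each order
      foreignField: "_id",        // field the reference points at
      as: "customer"              // name of the output array field
    }
  }
];
// In the mongo shell: db.orders.aggregate(pipeline)

// In-memory emulation of the equality-match $lookup, for illustration only.
function lookup(localDocs, foreignDocs, { localField, foreignField, as: outputField }) {
  return localDocs.map(doc => ({
    ...doc,
    [outputField]: foreignDocs.filter(f => f[foreignField] === doc[localField])
  }));
}

const customers = [{ _id: "c1", name: "Ann" }];
const orders = [
  { _id: "o1", customer_id: "c1", total: 30 },
  { _id: "o2", customer_id: "c9", total: 5 } // dangling reference
];

const joined = lookup(orders, customers, pipeline[0].$lookup);
console.log(JSON.stringify(joined, null, 2));
```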

MongoDB CRUD Operations

CRUD operations create, read, update, and delete documents.

Create Operations

Create or insert operations add new documents to a collection. If the collection does not currently exist, insert operations will create the collection.
MongoDB provides the following methods to insert documents into a collection:
    • db.collection.insertOne()
    • db.collection.insertMany()
In MongoDB, insert operations target a single collection. All write operations in MongoDB are atomic on the level of a single document.
For examples, see Insert Documents.
For other operations, see CRUD.
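As a quick sketch of the four kinds of operation, the function below runs one of each against a collection handle. It is written for the mongo shell, where collection methods are synchronous; the inventory collection and its documents are hypothetical.

```javascript
// Sketch for the mongo shell. "coll" is any collection handle,
// for example db.inventory (a hypothetical collection name).
function crudDemo(coll) {
  // Create: the collection is created on first insert if it does not exist.
  coll.insertOne({ item: "canvas", qty: 100 });
  coll.insertMany([
    { item: "journal", qty: 25 },
    { item: "mat", qty: 85 }
  ]);

  // Read: documents whose qty is below 50.
  const lowStock = coll.find({ qty: { $lt: 50 } }).toArray();

  // Update: change one field of the first matching document.
  coll.updateOne({ item: "journal" }, { $set: { qty: 30 } });

  // Delete: remove the first matching document.
  coll.deleteOne({ item: "mat" });

  return lowStock;
}

// Usage in the shell: crudDemo(db.inventory)
```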

Schema Validation

MongoDB provides the capability to perform schema validation during updates and insertions.

Specify Validation Rules

Validation rules are on a per-collection basis.
To specify validation rules when creating a new collection, use db.createCollection() with the validator option.
To add document validation to an existing collection, use the collMod command with the validator option.
MongoDB also provides the following related options:
    • validationLevel option, which determines how strictly MongoDB applies validation rules to existing documents during an update, and
    • validationAction option, which determines whether MongoDB should error and reject documents that violate the validation rules or warn about the violations in the log but allow invalid documents.
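Put together, a collMod command using these options might look like the sketch below. The contacts collection and the rule itself are only placeholders; "moderate" and "warn" are chosen here to show the lenient settings, not as a recommendation.

```javascript
// Hypothetical example: add a validator to an existing "contacts" collection.
const collModCommand = {
  collMod: "contacts",
  // Rule to enforce (placeholder): phone must be a string.
  validator: { phone: { $type: "string" } },
  // "moderate": apply the rules to inserts and to updates of documents
  // that already satisfy them; "strict" applies them to all inserts and updates.
  validationLevel: "moderate",
  // "warn": log violations but accept the write; "error" rejects it.
  validationAction: "warn"
};

// In the mongo shell: db.runCommand(collModCommand)
console.log(JSON.stringify(collModCommand, null, 2));
```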

JSON Schema

Starting in version 3.6, MongoDB supports JSON Schema validation. To specify JSON Schema validation, use the $jsonSchema operator in your validator expression.
NOTE: JSON Schema is the recommended means of performing schema validation.

The following example specifies validation rules using JSON Schema:

db.createCollection("students", {
   validator: {
      $jsonSchema: {
         bsonType: "object",
         required: [ "name", "year", "major", "gpa" ],
         properties: {
            name: {
               bsonType: "string",
               description: "must be a string and is required"
            },
            gender: {
               bsonType: "string",
               description: "must be a string and is not required"
            },
            year: {
               bsonType: "int",
               minimum: 2017,
               maximum: 3017,
               exclusiveMaximum: false,
               description: "must be an integer in [ 2017, 3017 ] and is required"
            },
            major: {
               enum: [ "Math", "English", "Computer Science", "History", null ],
               description: "can only be one of the enum values and is required"
            },
            gpa: {
               bsonType: [ "double" ],
               minimum: 0,
               description: "must be a double and is required"
            }
         }
      }
   }
})
For more information, see $jsonSchema.

Query Expressions

In addition to JSON Schema validation, MongoDB supports validation with query filter expressions using the query operators, with the exception of $near, $nearSphere, $text, and $where.
The following example specifies validator rules using a query expression:
db.createCollection( "contacts",
   { validator: { $or:
      [
         { phone: { $type: "string" } },
         { email: { $regex: /@mongodb\.com$/ } },
         { status: { $in: [ "Unknown", "Incomplete" ] } }
      ]
   }
} )
SEE ALSO query operators
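To see which documents such a rule lets through, here is a plain JavaScript predicate that mirrors the $or expression above. It is a simplified stand-in written for illustration, not MongoDB's own evaluation (for instance, it ignores array-valued fields), and the sample documents are made up.

```javascript
// Simplified mirror of the "contacts" validator: a document passes
// if at least one of the three conditions holds.
function passesContactsValidator(doc) {
  const phoneOk = typeof doc.phone === "string";
  const emailOk = typeof doc.email === "string" && /@mongodb\.com$/.test(doc.email);
  const statusOk = ["Unknown", "Incomplete"].includes(doc.status);
  return phoneOk || emailOk || statusOk;
}

const samples = [
  { phone: "555-1234" },                           // passes: phone is a string
  { email: "ann@mongodb.com" },                    // passes: email matches the pattern
  { status: "Unknown" },                           // passes: status is an allowed value
  { email: "ann@example.com", status: "Active" },  // fails: no condition holds
  { name: "Ann" }                                  // fails: no condition holds
];

const results = samples.map(passesContactsValidator);
console.log(results);
```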

Spring Data MongoDB

Introduction

The Spring Data MongoDB project provides integration with the MongoDB document database. Key functional areas of Spring Data MongoDB are a POJO centric model for interacting with a MongoDB DBCollection and easily writing a Repository style data access layer.

Features

    • Spring configuration support using Java based @Configuration classes or an XML namespace for a Mongo driver instance and replica sets.
    • MongoTemplate helper class that increases productivity performing common Mongo operations. Includes integrated object mapping between documents and POJOs.
    • Exception translation into Spring’s portable Data Access Exception hierarchy
    • Feature Rich Object Mapping integrated with Spring’s Conversion Service
    • Annotation based mapping metadata but extensible to support other metadata formats
    • Persistence and mapping lifecycle events
    • Low-level mapping using MongoReader/MongoWriter abstractions
    • Java based Query, Criteria, and Update DSLs
    • Automatic implementation of Repository interfaces including support for custom finder methods.
    • QueryDSL integration to support type-safe queries.
    • Cross-store persistence - support for JPA Entities with fields transparently persisted/retrieved using MongoDB
    • Log4j log appender
    • GeoSpatial integration
    • Map-Reduce integration
    • JMX administration and monitoring
    • CDI support for repositories
    • GridFS support
The recommended way to get started using spring-data-mongodb in your project is with a dependency management system. The Maven snippet below can be copied and pasted into your build; the Spring getting started guides cover building with both Maven and Gradle.
<dependencies>
   <dependency>
       <groupId>org.springframework.data</groupId>
       <artifactId>spring-data-mongodb</artifactId>
       <version>2.0.4.RELEASE</version>
   </dependency>
</dependencies>

Mapping annotation overview

The MappingMongoConverter can use metadata to drive the mapping of objects to documents. An overview of the annotations is provided below
    • @Id - applied at the field level to mark the field used for identity purposes.
    • @Document - applied at the class level to indicate this class is a candidate for mapping to the database. You can specify the name of the collection where the documents will be stored.
    • @DBRef - applied at the field to indicate it is to be stored using a com.mongodb.DBRef.
    • @Indexed - applied at the field level to describe how to index the field.
    • @CompoundIndex - applied at the type level to declare Compound Indexes
    • @GeoSpatialIndexed - applied at the field level to describe how to geoindex the field.
    • @Transient - by default all private fields are mapped to the document, this annotation excludes the field where it is applied from being stored in the database
    • @PersistenceConstructor - marks a given constructor - even a package protected one - to use when instantiating the object from the database. Constructor arguments are mapped by name to the key values in the retrieved DBObject.
    • @Value - this annotation is part of the Spring Framework. Within the mapping framework it can be applied to constructor arguments. This lets you use a Spring Expression Language statement to transform a key's value retrieved in the database before it is used to construct a domain object.
More details here.

MongoDBRepository

Spring Data MongoDB focuses on storing data in MongoDB. It also inherits functionality from the Spring Data Commons project, such as the ability to derive queries. Essentially, you don’t have to learn the query language of MongoDB; you can simply write a handful of methods and the queries are written for you.
package hello;

import java.util.List;

import org.springframework.data.mongodb.repository.MongoRepository;

public interface CustomerRepository extends MongoRepository<Customer, String> {

    public Customer findByFirstName(String firstName);
    public List<Customer> findByLastName(String lastName);

}
CustomerRepository extends the MongoRepository interface and plugs in the type of values and id it works with: Customer and String. Out-of-the-box, this interface comes with many operations, including standard CRUD operations (create-read-update-delete).
You can define other queries as needed by simply declaring their method signature. In this case, you add findByFirstName, which essentially seeks documents of type Customer and finds the one that matches on firstName.
You also have findByLastName to find a list of people by last name.
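For readers who know the shell, the two derived finders correspond roughly to the filter documents sketched below. This assumes the default mapping, where the Java property names are used as field names and the Customer class maps to a collection named customer; the helper function names are made up for the illustration.

```javascript
// Roughly the filters behind the derived finders (assumption: default
// field names, with no custom field mapping on the Customer class).
const findByFirstNameFilter = firstName => ({ firstName: firstName });
const findByLastNameFilter = lastName => ({ lastName: lastName });

// Shell equivalents, assuming the default collection name "customer":
//   db.customer.findOne(findByFirstNameFilter("Alice"))  // a single Customer
//   db.customer.find(findByLastNameFilter("Smith"))      // a list of Customers
console.log(findByFirstNameFilter("Alice"), findByLastNameFilter("Smith"));
```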

Examples

MongoDB UI

Overview

Several UI tools let you manage your database operations through a graphical client instead of the MongoDB shell. Here are a couple of them:

MongoDB Compass Community

MongoDB Compass Community is a free tool for developing with MongoDB and includes a subset of the features of Compass. It allows you to:
      • View, add, and delete databases and collections
      • View and interact with documents with full CRUD functionality
      • Build and run ad hoc queries
      • View and optimize query performance with visual explain plans
      • Manage indexes: view stats, create, and delete

IntelliJ Mongo Plugin

MongoDB Limits and Thresholds

MongoDB has some limitations. The official "MongoDB Limits and Thresholds" document provides a collection of hard and soft limitations of the MongoDB system.

Auditing

MongoDB Enterprise supports auditing of various operations. MongoDB community edition does not. A complete auditing solution must involve all mongod server and mongos router processes.

Account Management Plugin schema

This ERD represents the Account Management schema together with the schemas of all the microservices.
 PDF

MongoDB vs RDBMS

RDBMS (SQL Database) | MongoDB (NoSQL Database)
Relational database | Non-relational, document-oriented database
You need to design your tables, data structures, and relations before you can start coding. | You can start coding without worrying about tables, and you can modify your objects at a lower development cost.
Supports the SQL query language | Supports a JSON-based query language
Does not provide a JavaScript client for querying | Provides a JavaScript client for querying
Table based | Collection based, with key-value pairs
Row based | Document based
Column based | Field based
Indexing | Indexing
Primary key | Primary key (default key _id provided by MongoDB itself)
Group By | Aggregation
Not that easy to set up | Comparatively easy to set up and get running. Its Java client is also very simple.
Supports foreign keys | No support for foreign keys. If you need this type of constraint, you have to handle it in the application code, which is a bit complex.
Supports joins | No SQL-style joins. You can change your document structure and embed the other document inside the first one, keeping in mind that MongoDB has a maximum document size of 16MB, or resolve references with the $lookup aggregation stage.
Supports triggers | No support for triggers
Provides very fine granularity of locking | Provides only one level of locking, at the document (row) level. Earlier versions of MongoDB locked at a coarser level.
Contains a predefined schema | Contains a dynamic schema
Not suitable for hierarchical data storage | Well suited to hierarchical data storage
Vertically scalable, by increasing RAM | Horizontally scalable, by adding more servers (i.e. sharding)
Vulnerable to SQL injection | Unaffected by SQL injection
Emphasizes ACID properties (Atomicity, Consistency, Isolation, and Durability) | Emphasizes the CAP theorem (Consistency, Availability, and Partition tolerance)
Slower than NoSQL databases for some workloads | Reported to be up to 100 times faster than traditional database systems for some workloads
Using a NoSQL database like MongoDB will provide the following:

Pros

      • Document oriented
      • High performance
      • High availability — Replication
      • High scalability – Sharding
      • Dynamic — No rigid schema.
      • Flexible – field addition/deletion has little or no impact on the application
      • Heterogeneous Data
      • No Joins
      • Distributed
      • Data Representation in JSON or BSON
      • Geospatial support
      • Easy Integration with BigData Hadoop
      • Document-based query language that’s nearly as powerful as SQL
      • Cloud distributions such as AWS, Microsoft, RedHat, dotCloud, and SoftLayer. In fact, MongoDB is built for the cloud. Its native scale-out architecture, enabled by sharding, aligns well with the horizontal scaling and agility afforded by cloud computing.

Cons

      • A downside of NoSQL is that most solutions are not as strongly ACID-compliant (Atomicity, Consistency, Isolation, Durability) as the more well-established RDBMS systems.
      • Complex transactions are difficult to handle
      • There are no functions or stored procedures in which you can bind the logic

Conclusion

The main advantages of using a non-relational database are:
      • Flexible Data Model. Unlike relational databases, NoSQL databases easily store and combine any type of data, both structured and unstructured. You can also dynamically update the schema to evolve with changing requirements and without any interruption or downtime to your application.
      • Elastic Scalability. NoSQL databases scale out on low cost, commodity hardware, allowing for almost unlimited growth.
      • High Performance. NoSQL databases are built for great performance, measured in terms of both throughput and latency. (MongoDB is reported to be up to 100 times faster than traditional database systems for some workloads.)

Model

Based on the ERD above, we can see that all microservices (JenkinsService, BambooService, GitlfsService, SeleniumBoxService, DedicatedJenkinsService, SonarQubeService) are fairly straightforward and contain only one or two tables. They are perfect candidates for denormalization and should benefit from it.
What we store in the database is mainly a JSON description of how the service is structured, with no relationship to other entities, which makes it a great fit for MongoDB.
ArtifactoryService is a bit more complex, but we have only a couple of one-to-many relationships to handle, therefore I don't think that would be an issue. If that proves not to be the case, Artifactory could use an RDBMS, as every microservice owns its own database. The Spring implementation is fairly similar in both cases.

Transaction

The main drawback of MongoDB is that it doesn't handle transaction management very well, at least not until version 4.0 is available.
In my opinion, if we have to create multiple collections (think tables) with a lot of dependencies (joins) between them and have to update different collections at the same time, then the denormalization process has been unsuccessful and the schema shouldn't be used on a NoSQL database.
In our case, it isn't an issue as the services' schemas are very limited in scope.

Scalability

Not having any schema to manage allows us to get rid of all the complexity of keeping those schemas in sync. There is no need to use Liquibase to track the different versions in place, so it will be easier to install new instances when needed.

References

 PDF
