
Popular NoSQL Databases

Use cases:

  • The application requires low latency.

  • The data is unstructured or has no relational dependencies.

  • You only need to serialize and deserialize data (JSON, XML, YAML, etc.).

  • You need to store a massive amount of data.

1. CouchDB: CouchDB uses multiple formats and protocols to store, transfer, and process its data. It stores data as JSON documents, uses JavaScript as its query language via MapReduce, and exposes an HTTP API.




Feature | Couchbase | MongoDB
------- | --------- | -------
Multi-document ACID transactions | Extremely limited | Yes
Choose your own shard key and refine it at any time without downtime | No | Yes
Continuous backup with cross-cluster consistency and point-in-time recovery | No | Yes
Client-side field-level encryption | No (requires significant application code, limited data type support, no enforcement, no queries) | Yes
Built-in full-text search powered by Lucene | No | Yes
Partner ecosystem | ~100 | 1k+
Performance | Fast compared to competitors like Oracle | Fast compared to competitors like Oracle
Supported languages | ~10, including JavaScript, C, Go, .NET, Python | 20+, including JavaScript, .NET, Python, Go, Java, PHP, Ruby, Scala
Note: table credit: https://www.mongodb.com/

2. Neo4j: a graph database that stores data as nodes and relationships and is queried with the Cypher query language.

3. Cassandra: a wide-column store designed for high write throughput and linear scalability across commodity machines.

4. HBase: a wide-column store modeled after Google's Bigtable, built on top of HDFS.

Amazon DynamoDB:

DynamoDB is essentially a key-value store. It can be thought of as a hash map backed by persistent storage. The two most important operations it supports are Get and Put.
Put("key", "value");
Get("key") → returns "value";
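As a mental model (this is a toy sketch, not the real DynamoDB API or client), the Put/Get interface above behaves like a tiny in-memory store:

```python
class KeyValueStore:
    """Toy model of a key-value store exposing Put and Get."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Persist the value under the key (in-memory here; on disk in DynamoDB).
        self._data[key] = value

    def get(self, key):
        # Return the stored value, or None if the key was never written.
        return self._data.get(key)


store = KeyValueStore()
store.put("key", "value")
print(store.get("key"))  # value
```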

DynamoDB uses a cluster of machines, and each machine is responsible for storing a portion of the data on its disks. When a machine is added to the cluster, it is randomly assigned an integer token.
For example, assume the cluster has 3 machines:
A is assigned a token of 10, B a token of 15, and C a token of 20.


When we insert a key-value pair, the key is first hashed to an integer, and the pair is stored on the machine whose token is the next one at or above that hash, wrapping around the ring. Keys that hash into (20, 2⁶⁴−1] or [0, 10] are stored on machine A, keys in (10, 15] on machine B, and keys in (15, 20] on machine C.
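The ring lookup described above can be sketched as follows (the hash function and the tiny token values are illustrative, not DynamoDB's actual partitioning scheme):

```python
import bisect
import hashlib

TOKENS = [(10, "A"), (15, "B"), (20, "C")]  # (token, machine), sorted by token


def hash_key(key: str, ring_size: int = 2**64) -> int:
    # Illustrative hash onto the [0, 2^64) ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % ring_size


def owner(hash_value: int) -> str:
    """Return the machine whose token is the next one >= hash_value,
    wrapping around to the smallest token if none is larger."""
    tokens = [t for t, _ in TOKENS]
    i = bisect.bisect_left(tokens, hash_value)
    if i == len(tokens):  # past the largest token: wrap around to machine A
        i = 0
    return TOKENS[i][1]


print(owner(12))  # B  (10 < 12 <= 15)
print(owner(99))  # A  (past C's token of 20, so it wraps around)
```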

Replication

If the data is stored on only one machine and that machine crashes or its disk is corrupted, the data is lost; while the machine is down, we can't access the data at all. To resolve this, DynamoDB replicates the data N times. Let's assume N = 3, which is commonly used in AWS.

Each machine has complete knowledge of the tokens of all machines in the cluster. For example, all machines know that A has a token of 10. When a machine receives a request to persist data, it forwards the request to the two machines with the next two greater tokens on the ring.
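Picking the replica set, i.e. the owner plus the next N−1 machines clockwise on the ring, can be sketched like this (same illustrative tokens as above):

```python
import bisect


def replicas(hash_value, tokens, n=3):
    """Return the n machines that replicate a key: the owner plus the
    next n - 1 machines clockwise on the ring (illustrative sketch)."""
    ring = sorted(tokens)  # [(token, machine), ...] sorted by token
    token_values = [t for t, _ in ring]
    # Index of the owner: the next token >= hash_value, wrapping around.
    i = bisect.bisect_left(token_values, hash_value) % len(ring)
    return [ring[(i + k) % len(ring)][1] for k in range(n)]


print(replicas(12, [(10, "A"), (15, "B"), (20, "C")]))  # ['B', 'C', 'A']
```

With N = 3 and only 3 machines, every machine holds a copy of every key; in a larger cluster each key would live on just 3 of the machines.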

Machine A considers the data persisted after it receives acknowledgements from both machine B and machine C, and only then responds to the client. However, this has an issue: waiting for every replica means a single slow or failed machine blocks the whole write.

To resolve this, DynamoDB doesn't require an ack from all N machines. It only requires W machines (the write quorum) to write the data to disk; W is normally set to 2 in AWS. Machine A returns to the client once 2 of the following 3 have happened:

1. The write to A's own local disk succeeds
2. An ack is received from B
3. An ack is received from C

This approach reduces write latency, but it introduces another issue. Imagine that machine A gets acks from B and C but fails to write to its own local disk. If A then served a Get request from its local disk alone, it would return a stale result, since its disk doesn't have the updated data.

Thus, DynamoDB further requires machine A to read R copies of the data and return the latest one to the client. As long as R + W > N, the set of replicas read must overlap the set of replicas successfully written in at least one machine, so the client is guaranteed to see the latest data. R is normally set to 2 in AWS. In our case, machine A must fetch a copy from at least one of machines B and C and return the newer of the values, so the correct value is returned.
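The quorum condition above can be checked with a one-liner (N, W, R values as in the running example):

```python
def quorum_ok(n: int, w: int, r: int) -> bool:
    """R + W > N guarantees that any read set of r replicas overlaps
    any successful write set of w replicas in at least one machine,
    so a read always sees the latest acknowledged write."""
    return r + w > n


print(quorum_ok(n=3, w=2, r=2))  # True: any 2 replicas read must
                                 # overlap the 2 replicas written
print(quorum_ok(n=3, w=2, r=1))  # False: a single-replica read may
                                 # hit the one stale copy
```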

Multi-masters and Circular replication
