
Popular NoSQL DBs


Use Cases:

  • If the application requires low latency.

  • If the data is unstructured or has no relational structure.

  • If you only need to serialize and deserialize data (JSON, XML, YAML, etc.).

  • If you need to store a massive amount of data.

1. CouchDB: CouchDB uses multiple formats and protocols to store, transfer, and process its data. It stores data as JSON documents, uses JavaScript as its query language via MapReduce views, and exposes an HTTP API.
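Because the API is plain HTTP plus JSON, any HTTP client can talk to CouchDB. Here is a minimal sketch using Java's built-in HttpClient, assuming a local CouchDB at http://localhost:5984 with a database named books already created and no authentication configured (the database name and document id are illustrative):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CouchDbDemo {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Store a JSON document under id "book-1" (PUT /{db}/{docid}).
        HttpRequest put = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:5984/books/book-1"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString("{\"title\":\"DDIA\"}"))
                .build();
        System.out.println(client.send(put, HttpResponse.BodyHandlers.ofString()).body());

        // Read the document back (GET /{db}/{docid}).
        HttpRequest get = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:5984/books/book-1"))
                .build();
        System.out.println(client.send(get, HttpResponse.BodyHandlers.ofString()).body());
    }
}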




Couchbase vs MongoDB (credit: https://www.mongodb.com/):

  • Multi-document ACID transactions: Couchbase is extremely limited; MongoDB supports them.

  • Choosing your own shard key and refining it at any time without downtime: Couchbase no; MongoDB yes.

  • Continuous backup with cross-cluster consistency and point-in-time recovery: Couchbase no; MongoDB yes.

  • Client-side field level encryption: Couchbase no (requires significant application code, limited data type support, no enforcement, no queries); MongoDB yes.

  • Built-in full-text search powered by Lucene: Couchbase no; MongoDB yes.

  • Partner ecosystem: Couchbase ~100; MongoDB 1k+.

  • Performance: both are fast compared to competitors like Oracle.

  • Supported languages: Couchbase ~10 (including JavaScript, C, Go, .NET, Python); MongoDB 20+ (including JavaScript, .NET, Python, Go, Java, PHP, Ruby, Scala).

2. Neo4j:

3. Cassandra:

4. HBase:

5. Amazon DynamoDB:

DynamoDB is basically a key-value store. It can be thought of as a hash map backed by persistent storage. The two most important operations supported by DynamoDB are Put and Get.
Put("key", "value");
Get("key") → returns "value";

DynamoDB uses a cluster of machines, and each machine is responsible for storing a portion of the data on its disks. When a machine is added to the cluster, it is randomly assigned an integer token.
For example, assume the cluster has 3 machines:
A is assigned a token of 10, B a token of 15, and C a token of 20.


When we insert a key-value pair into DynamoDB, the key is first hashed to an integer, and the pair is stored on the machine that owns that integer. Keys that hash into (20, 2⁶⁴-1] or [0, 10] are stored on machine A. Machine B stores keys whose hash values fall in (10, 15], and machine C stores the rest, (15, 20].
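A minimal sketch of this token ring in Java, using the example tokens above (the Ring class and the toy hash values are illustrative, not DynamoDB's actual implementation):

import java.util.Map;
import java.util.TreeMap;

public class Ring {
    // token -> machine, using the tokens from the example above
    private final TreeMap<Long, String> ring =
            new TreeMap<>(Map.of(10L, "A", 15L, "B", 20L, "C"));

    // The owner of a key is the machine with the smallest token >= the
    // key's hash; past the largest token we wrap around to the smallest.
    public String ownerOf(long keyHash) {
        Map.Entry<Long, String> owner = ring.ceilingEntry(keyHash);
        return (owner != null) ? owner.getValue() : ring.firstEntry().getValue();
    }
}

With this sketch, ownerOf(5) and ownerOf(22) both return A, ownerOf(12) returns B, and ownerOf(18) returns C, matching the ranges described above.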

Replication

If the data is stored on only one machine and that machine crashes or its disk is corrupted, the data is lost; while the machine is down, we can't access the data at all. To resolve this issue, DynamoDB replicates the data N times. Let's assume N=3, which is commonly used in AWS.

Each machine has complete knowledge of the tokens of all machines in the cluster; for example, every machine knows that A has a token of 10. When a machine receives a request to persist data, it forwards the request to the two machines with the next two greater tokens on the ring, as sketched below.
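Continuing the Ring sketch above (add java.util.List and java.util.ArrayList imports), the owner plus the machines with the next two greater tokens, wrapping around the ring, form the N = 3 replicas; replicasFor is a hypothetical helper, not DynamoDB's real API:

public List<String> replicasFor(long keyHash, int n) {
    List<String> replicas = new ArrayList<>();
    Long token = ring.ceilingKey(keyHash);           // owner's token
    if (token == null) token = ring.firstKey();      // wrap around the ring
    while (replicas.size() < n) {
        replicas.add(ring.get(token));
        token = ring.higherKey(token);               // next greater token,
        if (token == null) token = ring.firstKey();  // wrapping at the end
    }
    return replicas;
}

For example, replicasFor(12, 3) returns [B, C, A]: B owns the key, and C and A hold the two replicas.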

Machine A considers the data persisted only after it receives acknowledgements from both machine B and machine C, and only then responds to the client. However, this has an issue: waiting for all N replicas makes every write as slow as the slowest machine, and a single unavailable replica blocks writes entirely.

To resolve the issue, DynamoDB doesn't require acks from all N machines.
It only requires W machines (the write quorum) to write the data to disk. W is normally set to 2 in AWS. Machine A returns to the client when any 2 of the following have happened:

1. The write to A's own disk succeeds
2. An ack is received from B
3. An ack is received from C
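A minimal sketch of this "return after W of N" behavior in Java, where each replica write is modeled as a hypothetical CompletableFuture (the names are illustrative, not DynamoDB's real API):

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;

public class QuorumWrite {
    static final int W = 2; // acks required before replying to the client

    // writes: one future per replica write (A's local disk, B, C)
    public static void awaitQuorum(List<CompletableFuture<Void>> writes)
            throws InterruptedException {
        CountDownLatch quorum = new CountDownLatch(W);
        writes.forEach(w -> w.thenRun(quorum::countDown));
        quorum.await(); // unblocks once any W of the N writes have finished
    }
}

Calling awaitQuorum(List.of(localWrite, ackFromB, ackFromC)) returns as soon as any two of the three complete, so one slow or dead replica no longer stalls the write.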

This approach reduces write latency; however, it introduces another issue. Imagine that machine A gets the acks from B and C but fails to write to its own local disk. When the client then makes a Get request, A could return a stale result, since its local disk doesn't have the updated data.

Thus, DynamoDB further requires machine A to fetch R copies of the data from the replicas and return the latest copy to the client. As long as R + W > N, the R replicas consulted on a read are guaranteed to overlap the W replicas that acknowledged the write, so the user always sees the latest data. R is normally set to 2. In our case, machine A must get a copy of the data from at least one of machines B and C and return the latest value, so the correct value is returned.
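A sketch of the read side in Java 16+ (the scalar version number stands in for DynamoDB's real versioning scheme, which uses vector clocks; Versioned and latest are illustrative names):

import java.util.Comparator;
import java.util.List;

public class QuorumRead {
    record Versioned(String value, long version) {}

    // With N = 3, W = 2 and R = 2 give R + W > N: every read quorum
    // overlaps every write quorum in at least one replica, so the
    // highest-versioned reply is always the latest committed value.
    static Versioned latest(List<Versioned> rReplies) {
        return rReplies.stream()
                .max(Comparator.comparingLong(Versioned::version))
                .orElseThrow();
    }

    public static void main(String[] args) {
        Versioned fromA = new Versioned("old", 1); // A's disk missed the write
        Versioned fromB = new Versioned("new", 2); // B acknowledged it
        System.out.println(latest(List.of(fromA, fromB)).value()); // prints "new"
    }
}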

Multi-masters and Circular replication
