
Popular NoSQL Databases

Use cases:

  • The application requires low latency.

  • The data is unstructured or has no relational dependencies.

  • You only need to serialize and deserialize data (JSON, XML, YAML, etc.).

  • You need to store a massive amount of data.

1. CouchDB: CouchDB uses multiple formats and protocols to store, transfer, and process its data. It stores data as JSON documents, uses JavaScript as its query language via MapReduce, and exposes an HTTP API.




Feature | Couchbase | MongoDB
------- | --------- | -------
Multi-document ACID transactions | Extremely limited | Yes
Choose your own shard key and refine it at any time without downtime | No | Yes
Continuous backup with cross-cluster consistency and point-in-time recovery | No | Yes
Client-side field-level encryption | No (requires significant application code, limited data type support, no enforcement, no queries) | Yes
Built-in full-text search powered by Lucene | No | Yes
Partner ecosystem | ~100 | 1k+
Performance | Fast compared to competitors like Oracle | Fast compared to competitors like Oracle
Supported languages | ~10, including JavaScript, C, Go, .NET, Python | 20+, including JavaScript, .NET, Python, Go, Java, PHP, Ruby, Scala
Note: table credit: https://www.mongodb.com/

2. Neo4j: a graph database that stores data as nodes and relationships and is queried with the Cypher query language.

3. Cassandra: a wide-column store designed for high write throughput and linear scalability across commodity machines.

4. HBase: a wide-column store modeled after Google's Bigtable, built on top of HDFS.

Amazon DynamoDB:

DynamoDB is essentially a key-value store. It can be thought of as a hash map backed by persistent storage. The two most important operations it supports are Get and Put.
Put("key", "value");
Get("key") → returns "value";
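As a mental model (this is a toy sketch, not the real DynamoDB API or client), the Put/Get interface above behaves like a tiny in-memory store:

```python
class KeyValueStore:
    """Toy model of a key-value store exposing Put and Get."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Persist the value under the key (in-memory here; on disk in DynamoDB).
        self._data[key] = value

    def get(self, key):
        # Return the stored value, or None if the key was never written.
        return self._data.get(key)


store = KeyValueStore()
store.put("key", "value")
print(store.get("key"))  # value
```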

DynamoDB uses a cluster of machines, and each machine is responsible for storing a portion of the data on its disks. When a machine is added to the cluster, it is randomly assigned an integer token.
For example, assume the cluster has 3 machines:
A is assigned a token of 10, B a token of 15, and C a token of 20.


When we insert a key-value pair, the key is first hashed to an integer, and the pair is stored on the machine whose token is the next one at or above that hash, wrapping around the ring. Keys that hash into (20, 2⁶⁴−1] or [0, 10] are stored on machine A, keys in (10, 15] on machine B, and keys in (15, 20] on machine C.
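The ring lookup described above can be sketched as follows (the hash function and the tiny token values are illustrative, not DynamoDB's actual partitioning scheme):

```python
import bisect
import hashlib

TOKENS = [(10, "A"), (15, "B"), (20, "C")]  # (token, machine), sorted by token


def hash_key(key: str, ring_size: int = 2**64) -> int:
    # Illustrative hash onto the [0, 2^64) ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % ring_size


def owner(hash_value: int) -> str:
    """Return the machine whose token is the next one >= hash_value,
    wrapping around to the smallest token if none is larger."""
    tokens = [t for t, _ in TOKENS]
    i = bisect.bisect_left(tokens, hash_value)
    if i == len(tokens):  # past the largest token: wrap around to machine A
        i = 0
    return TOKENS[i][1]


print(owner(12))  # B  (10 < 12 <= 15)
print(owner(99))  # A  (past C's token of 20, so it wraps around)
```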

Replication

If the data is stored on only one machine and that machine crashes or its disk is corrupted, the data is lost; while the machine is down, we can't access the data at all. To resolve this, DynamoDB replicates the data N times. Let's assume N = 3, which is commonly used in AWS.

Each machine has complete knowledge of the tokens of all machines in the cluster. For example, all machines know that A has a token of 10. When a machine receives a request to persist data, it forwards the request to the two machines with the next two greater tokens on the ring.
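Picking the replica set, i.e. the owner plus the next N−1 machines clockwise on the ring, can be sketched like this (same illustrative tokens as above):

```python
import bisect


def replicas(hash_value, tokens, n=3):
    """Return the n machines that replicate a key: the owner plus the
    next n - 1 machines clockwise on the ring (illustrative sketch)."""
    ring = sorted(tokens)  # [(token, machine), ...] sorted by token
    token_values = [t for t, _ in ring]
    # Index of the owner: the next token >= hash_value, wrapping around.
    i = bisect.bisect_left(token_values, hash_value) % len(ring)
    return [ring[(i + k) % len(ring)][1] for k in range(n)]


print(replicas(12, [(10, "A"), (15, "B"), (20, "C")]))  # ['B', 'C', 'A']
```

With N = 3 and only 3 machines, every machine holds a copy of every key; in a larger cluster each key would live on just 3 of the machines.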

Machine A considers the data persisted after it receives acknowledgements from both machine B and machine C, and only then responds to the client. However, this has an issue: waiting for every replica means a single slow or failed machine blocks the whole write.

To resolve this, DynamoDB doesn't require an ack from all N machines. It only requires W machines (the write quorum) to write the data to disk; W is normally set to 2 in AWS. Machine A returns to the client once 2 of the following 3 have happened:

1. The write to A's own local disk succeeds
2. An ack is received from B
3. An ack is received from C

This approach reduces write latency, but it introduces another issue. Imagine that machine A gets acks from B and C but fails to write to its own local disk. If A then served a Get request from its local disk alone, it would return a stale result, since its disk doesn't have the updated data.

Thus, DynamoDB further requires machine A to read R copies of the data and return the latest one to the client. As long as R + W > N, the set of replicas read must overlap the set of replicas successfully written in at least one machine, so the client is guaranteed to see the latest data. R is normally set to 2 in AWS. In our case, machine A must fetch a copy from at least one of machines B and C and return the newer of the values, so the correct value is returned.
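The quorum condition above can be checked with a one-liner (N, W, R values as in the running example):

```python
def quorum_ok(n: int, w: int, r: int) -> bool:
    """R + W > N guarantees that any read set of r replicas overlaps
    any successful write set of w replicas in at least one machine,
    so a read always sees the latest acknowledged write."""
    return r + w > n


print(quorum_ok(n=3, w=2, r=2))  # True: any 2 replicas read must
                                 # overlap the 2 replicas written
print(quorum_ok(n=3, w=2, r=1))  # False: a single-replica read may
                                 # hit the one stale copy
```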

Multi-masters and Circular replication
