NoSQL

How to choose the right one?

Rémi Alvado / @remialvado / Shopping Adventure

Heads Up

NoSQL does not mean:

We don't want to work with SQL anymore

but:

We don't want to work only with SQL

Main features

  1. Data warehouse
  2. Search Engine
  3. Big data processing

Storage Format

Multi column

id | street | postcode | city

Document based


<?xml version="1.0" encoding="UTF-8" ?>
<address>
  <city postcode="38000">Grenoble</city>
  <street>85 rue des Alliés</street>
</address>
						

{
  "city": {
    "postcode": "38000",
    "value" : "Grenoble"
  },
  "street": "85 rue des Alliés"
}
						

address:
  city:
    postcode: "38000"
    value: Grenoble
  street: 85 rue des Alliés
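
The XML and JSON documents carry the same information, only the syntax differs. A minimal Python sketch, using only the standard library, that loads both into the same dictionary. The helper names `address_from_json` and `address_from_xml` are hypothetical, chosen for this illustration.

```python
import json
import xml.etree.ElementTree as ET

# The same address, once as JSON and once as XML.
JSON_DOC = (
    '{"city": {"postcode": "38000", "value": "Grenoble"},'
    ' "street": "85 rue des Alliés"}'
)
XML_DOC = (
    "<address>"
    '<city postcode="38000">Grenoble</city>'
    "<street>85 rue des Alliés</street>"
    "</address>"
)


def address_from_json(text):
    """Parse the JSON form; the result is already a nested dict."""
    return json.loads(text)


def address_from_xml(text):
    """Parse the XML form and rebuild the same nested dict by hand."""
    root = ET.fromstring(text)
    city = root.find("city")
    return {
        "city": {"postcode": city.get("postcode"), "value": city.text},
        "street": root.find("street").text,
    }


# Both documents describe the same data.
same_data = address_from_json(JSON_DOC) == address_from_xml(XML_DOC)
```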
						

Data structure

Is your data well structured and never changing? Prefer a schema!

Is your data structure evolving all the time to match new requirements? Consider schema-less data warehouses.
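
A rough Python sketch of the trade-off, not tied to any particular database. `AddressRow`, `insert_row` and `insert_document` are hypothetical names. The schema version rejects anything that does not match the four expected fields, while the schema-less version accepts whatever shape the document has.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressRow:
    """Fixed schema: every record has exactly these four fields."""
    id: int
    street: str
    postcode: str
    city: str


def insert_row(table, **fields):
    # Raises TypeError when a field is missing or unknown.
    table.append(AddressRow(**fields))


def insert_document(collection, document):
    # Schema-less: any dictionary is accepted, new fields included.
    collection.append(dict(document))


table, collection = [], []
insert_row(table, id=1, street="85 rue des Alliés",
           postcode="38000", city="Grenoble")
insert_document(collection, {"street": "85 rue des Alliés", "country": "FR"})
```

Adding a `country` field later costs nothing in the schema-less collection, whereas the schema version needs a change to `AddressRow` first.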

Data Storage

Master Slave

Master group

S1: A, B, C, D, E

Slaves group

S2: E
S3: E
S4: E
S5: (empty)
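
A minimal in-memory sketch of the idea in Python. It assumes the usual master/slave behaviour, which the slide does not spell out: writes go to the master, are copied to every slave, and reads are spread over the slaves. `Node` and `MasterSlaveCluster` are hypothetical names.

```python
class Node:
    """One server holding a plain key/value dictionary."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class MasterSlaveCluster:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._next = 0  # round-robin cursor for reads

    def write(self, key, value):
        # Every write lands on the master first...
        self.master.data[key] = value
        # ...and is then copied to each slave (synchronously, for simplicity).
        for slave in self.slaves:
            slave.data[key] = value

    def read(self, key):
        # Reads are served by the slaves in turn.
        slave = self.slaves[self._next % len(self.slaves)]
        self._next += 1
        return slave.data[key]


cluster = MasterSlaveCluster(Node("S1"), [Node(f"S{i}") for i in range(2, 6)])
cluster.write("E", "some value")
```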

Data Storage

Distributed with master

Master group

S1: A, M, PV, S

Slaves group

S2: D, GW, I, N
S3: B, CX, L, O
S4: E, H, K, Q
S5: F, J, R, T
S6: P, G, C, U

Data Storage

Distributed without master (masterless)

S1: A, M, V, S
S2: D, W, I, N
S3: B, X, L, O
S4: E, H, K, Q
S5: F, J, R, T
S6: P, G, C, U
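
One common way to get rid of the master is to let every node compute where a key lives from the key itself. A simplified Python sketch, assuming plain hash-modulo placement; real engines typically use consistent hashing so that fewer keys move when a node joins or leaves. The `owner` function is a hypothetical name.

```python
import hashlib

NODES = ["S1", "S2", "S3", "S4", "S5", "S6"]


def owner(key, nodes=NODES):
    """Return the node responsible for a key.

    Any node can run this function and get the same answer,
    so no central master is needed to route a request.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    position = int.from_bytes(digest[:8], "big")
    return nodes[position % len(nodes)]


# Where each letter of the example would be stored.
placement = {key: owner(key) for key in "ABCDEFGHIJKLMNOPQRSTUVWX"}
```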

Data Storage

Distributed with replication

S1: A, F, H, L
S2: B, E, I, K
S3: B, A, G, J
S4: F, C, J, K
S5: G, J, I, D
S6: D, E, H, L
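
Replication can be sketched as storing each key on several nodes instead of one. The Python example below assumes a simple placement rule (the hashed node plus its successors in the list) and a hypothetical `replicas` function; actual engines use their own strategies.

```python
import hashlib

NODES = ["S1", "S2", "S3", "S4", "S5", "S6"]


def replicas(key, nodes=NODES, copies=2):
    """Return the `copies` distinct nodes that hold a copy of `key`."""
    if copies > len(nodes):
        raise ValueError("cannot place more copies than there are nodes")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    # The first copy goes to the hashed node, the others to its successors.
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]


def readable_after_failure(key, failed_node):
    """True when at least one copy of `key` survives the loss of one node."""
    return any(node != failed_node for node in replicas(key))
```

With two copies per key, losing any single node still leaves every key readable.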

Main features

  1. Data warehouse
  2. Search Engine
  3. Big data processing

Search data using an index

Data can always be accessed through one primary key. The key can be user defined or system defined.

Multiple secondary indexes can be set up. They come mainly at the price of extra storage.
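
A small in-memory Python sketch of the two access paths. `IndexedStore` is a hypothetical class, not any engine's API. The secondary indexes are extra dictionaries kept next to the records, which is where the storage cost comes from.

```python
from collections import defaultdict


class IndexedStore:
    def __init__(self, indexed_fields):
        self.records = {}  # primary key -> record
        # One extra structure per secondary index: field value -> set of keys.
        self.indexes = {field: defaultdict(set) for field in indexed_fields}

    def put(self, key, record):
        old = self.records.get(key)
        if old is not None:
            # Drop stale index entries before overwriting the record.
            for field, index in self.indexes.items():
                if field in old:
                    index[old[field]].discard(key)
        self.records[key] = record
        for field, index in self.indexes.items():
            if field in record:
                index[record[field]].add(key)

    def get(self, key):
        """Access by primary key."""
        return self.records[key]

    def find(self, field, value):
        """Access by secondary index; returns the matching primary keys."""
        return sorted(self.indexes[field].get(value, set()))


store = IndexedStore(indexed_fields=["city"])
store.put("a1", {"street": "85 rue des Alliés", "city": "Grenoble"})
store.put("a2", {"street": "1 place Victor Hugo", "city": "Grenoble"})
```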

Indexing process

Fully manual: developers use an indexer API to define what should be indexed

Schema driven: a data schema helps the NoSQL database perform the indexing

Automatic: the NoSQL database tries to guess the best type for each element (column or node)

Synchronous or asynchronous

Near real time (less than 20 ms) or... quite slow (more than 3-4 seconds)
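
The difference between the two indexing modes can be sketched in a few lines of Python. `SearchableStore` is a hypothetical class: in synchronous mode a document is searchable as soon as the write returns, in asynchronous mode only after a separate indexer has run.

```python
from collections import deque


class SearchableStore:
    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.docs = {}          # doc id -> text
        self.index = {}         # word -> set of doc ids
        self.pending = deque()  # doc ids waiting to be indexed

    def put(self, doc_id, text):
        self.docs[doc_id] = text
        if self.synchronous:
            self._index(doc_id)          # the write pays the indexing cost
        else:
            self.pending.append(doc_id)  # the write returns immediately

    def run_indexer(self):
        """Stand-in for the background job of an asynchronous indexer."""
        while self.pending:
            self._index(self.pending.popleft())

    def _index(self, doc_id):
        for word in self.docs[doc_id].lower().split():
            self.index.setdefault(word, set()).add(doc_id)

    def search(self, word):
        return sorted(self.index.get(word.lower(), set()))


sync_store = SearchableStore(synchronous=True)
async_store = SearchableStore(synchronous=False)
for s in (sync_store, async_store):
    s.put("doc1", "85 rue des Alliés Grenoble")
```

Right after the write, `sync_store.search("grenoble")` finds `doc1`, while `async_store` only finds it once `run_indexer()` has been called.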

Main features

  1. Data warehouse
  2. Search Engine
  3. Big data processing

Regular processing

Search for unique vowels
Filter
Processing
S1: A, F, H, L
S2: B, E, I, K
S3: B, A, G, J
S4: F, C, J, K
S5: G, J, I, D
S6: D, E, H, L

Map Reduce

Search for unique vowels
Filter
Processing
S1: A, F, H, L
S2: B, E, I, K
S3: B, A, G, J
S4: F, C, J, K
S5: G, J, I, D
S6: D, E, H, L

Map Reduce

Only supported by a few NoSQL engines so far, but this is a cool feature!

Multiple supported languages: Java, JavaScript, Erlang, ...

The map phase can use extra features such as search, key filtering, ...
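
The "search for unique vowels" example can be sketched in plain Python, with no real engine involved. It assumes the six-server layout S1 to S6 from the slides and treats A, E, I, O, U as the vowels. `map_phase` and `reduce_phase` are hypothetical names. The point of the sketch is that the filter runs next to the data, so only small partial results travel.

```python
SERVERS = {
    "S1": ["A", "F", "H", "L"],
    "S2": ["B", "E", "I", "K"],
    "S3": ["B", "A", "G", "J"],
    "S4": ["F", "C", "J", "K"],
    "S5": ["G", "J", "I", "D"],
    "S6": ["D", "E", "H", "L"],
}
VOWELS = set("AEIOU")


def map_phase(letters):
    """Runs on each server: keep only the vowels stored locally."""
    return {letter for letter in letters if letter in VOWELS}


def reduce_phase(partials):
    """Runs once: merge the per-server results and drop duplicates."""
    result = set()
    for partial in partials:
        result |= partial
    return sorted(result)


# Regular processing: pull every letter to one place, then filter.
all_letters = [l for letters in SERVERS.values() for l in letters]
regular_result = sorted({l for l in all_letters if l in VOWELS})
regular_transfer = len(all_letters)

# Map Reduce: filter on each server, then merge the small partial results.
partials = [map_phase(letters) for letters in SERVERS.values()]
mapreduce_result = reduce_phase(partials)
mapreduce_transfer = sum(len(p) for p in partials)

print(regular_result, regular_transfer)      # ['A', 'E', 'I'] 24
print(mapreduce_result, mapreduce_transfer)  # ['A', 'E', 'I'] 6
```

Both approaches give the same answer, but the regular one moves 24 letters while the Map Reduce one moves 6.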

Summary

Many supported features related to big data management: storage, search, processing, ...

Performance is really great for most of them, so it should not be the only criterion for your choice

You can use multiple NoSQL frameworks at the same time, e.g. Yokozuna, MongoDB Connector, ...

Questions?
