From Postgres to Riak
a success story
The journey from relational
database to Riak
Created by Johan Sommerfeld / @s2hc
/me
- M.Sc. in Electrical Engineering with a focus on signal theory
- Background in both operations and development
- Manic about availability, scalability and automation ...
- ... modular and distributed systems
- Founded S2HC Sweden in 2010
/client
- Founded in 1981
- Builds hardware and software for measurement systems, used in large construction projects around the world
- Early adopters of remote measuring
The old setup
- PC program
- Project management
- Hardware management
- Reports (pdf, txt)
The new system
An online, multitenant management and reporting system
Challenges / requirements
- Sensor data
- Chunks
- Unpredictable amount
- Unpredictable delivery (time) to the backend
- Multitenant
- High availability
- Scale
Usage: overview
Usage: details
Usage: status / control
Usage: integration
overview
How hard can it be?
Problems
- Single point of failure
- Scaling
- Resource sharing
Problems | measurements
Solutions
- Big servers
- Sharding
- SAN
- NoSQL (Cassandra, MongoDB, Riak) ... winner!
Big servers
- Lots of db black magic optimization
- No need to rewrite logic
- Limited scaling
- No HA
Sharding
- New logic needed
- Good scalability (if done right)
- No HA
SAN
- Only scales storage
- Good redundancy on data
NoSQL
- Lots of new logic
- A bit uncomfortable (at first)
- HA, hopefully
- Scales, hopefully
- Super fast, hopefully
NoSQL, the showdown
As a small company, we didn't have the luxury of creating a full stack prototype for each database.
Settled on theory, concepts and ... gut feeling
- Feature set, ease of migration from a relational database
- Scaling
- HA
The ring / art of distribution
- Consistent hash f(key) -> hash
- Hash keyspace: 2^160
- Vnodes: each gets an even share of the keyspace
- Vnodes: power of 2 (... 8, 16, 32, 64, ... 1024)
The ring / art of distribution
- vnodes: 32, nodes: 3
- n = 3: data lands on vnode, vnode + 1, vnode + 2
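The two slides above can be sketched in a few lines: hash the key onto the 2^160 keyspace, map it to one of the evenly sized vnode partitions, then replicate to the next n - 1 vnodes. Riak does this internally in Erlang; the bucket/key format here is purely illustrative.

```python
import hashlib

NUM_VNODES = 32  # ring size from the slide; always a power of 2
N_VAL = 3        # replication factor (n = 3)
PARTITION = 2**160 // NUM_VNODES  # each vnode owns an even slice

def key_to_vnode(bucket, key):
    # Consistent hash f(key) -> hash over the 2^160 keyspace (SHA-1),
    # then map the hash onto one of the evenly sized vnode partitions.
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    return int.from_bytes(digest, "big") // PARTITION

def preference_list(vnode):
    # With n = 3 the data lands on vnode, vnode + 1, vnode + 2 (wrapping).
    return [(vnode + i) % NUM_VNODES for i in range(N_VAL)]

v = key_to_vnode("data", "acme-sn1234-ch1-1-2015-11-04-00")
print(v, preference_list(v))
```

Because the hash is deterministic, any node can compute which vnodes hold a key without a central lookup, which is what makes the ring scale.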
NoSQL, the hype (getting started)
- We have a pure relational data model!?
- We have to have pagination!?
- Think way outside your box
know your data
The data flow
Initial extraction design
- user selects project, measuring points, and time span
- prep, get serial numbers and make two types of queries
- 1. query on chunks, getting meta data (unit, interval time etc), analyze step
- 2. query on the data, data process step
- sql:
select ... where serial and range;
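The two-step extraction above might look like this; a runnable sketch using sqlite3 as an in-memory stand-in for Postgres, with illustrative table and column names (not the client's real schema):

```python
import sqlite3

# Stand-in schema: chunk metadata and raw measurement rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (serial TEXT, ts INTEGER, unit TEXT, interval_s INTEGER)")
conn.execute("CREATE TABLE data (serial TEXT, ts INTEGER, value REAL, flags INTEGER)")
conn.executemany("INSERT INTO chunks VALUES (?,?,?,?)", [("sn-1", 1000, "mm", 5)])
conn.executemany("INSERT INTO data VALUES (?,?,?,?)",
                 [("sn-1", 1000, 24.5, 250), ("sn-1", 1005, 24.6, 250)])

def extract(serials, t0, t1):
    ph = ",".join("?" * len(serials))
    # 1. query on chunks: metadata (unit, interval time etc) drives the analyze step
    meta = conn.execute(
        f"SELECT * FROM chunks WHERE serial IN ({ph}) AND ts BETWEEN ? AND ?",
        (*serials, t0, t1)).fetchall()
    # 2. query on the data: rows feed the data-process step
    rows = conn.execute(
        f"SELECT * FROM data WHERE serial IN ({ph}) AND ts BETWEEN ? AND ?",
        (*serials, t0, t1)).fetchall()
    return meta, rows

meta, rows = extract(["sn-1"], 0, 2000)
```

Both queries are range scans keyed on serial numbers, which is exactly the access pattern the known-key Riak design later replaces.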
The data flow
Initial extraction design
The data flow
first riak extraction design
The data flow
conditions
- Incoming and outgoing requests separated
- Immutable data
- Deterministic in time
Gives
- Known keys
- Fast read, r=1
The data flow
Storage
- Slice per day
- 1s data
- Store time, value and flags as json
- Store as list instead of object
# 60 * 60 * 24 = 86 400
{"time": 16315353989120, "value": 24.54592940, "flags": 250} # ~5.8M
[16315353989120, 24.54592940, 250] # ~2.9M
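The halving can be checked directly; a small sketch (exact byte counts depend on float precision, so the ~5.8M / ~2.9M figures from the slide won't reproduce to the byte):

```python
import json

# One day of 1s data: 60 * 60 * 24 = 86,400 points per slice.
points = [(16315353989120 + i, 24.54592940, 250) for i in range(86_400)]

as_objects = json.dumps([{"time": t, "value": v, "flags": f} for t, v, f in points])
as_lists = json.dumps([[t, v, f] for t, v, f in points])

# The list form drops the repeated field names, roughly halving the payload.
print(len(as_objects), len(as_lists))
```

The trade-off is that the reader must know the positional layout [time, value, flags]; since the format is fixed and immutable, that is cheap to pay.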
The data flow
Storage part 2
- Slice per hour if interval <10s, else per day
- Meta data (unit, interval time, etc) is sent less frequently
- Store meta data in its own key + existing data keys
header_key = "{{company_id}}-{{serial}}-{{channel}}-2015-11-04"
data_key = "{{company_id}}-{{serial}}-{{channel}}-" +\
"{{interval_time}}-2015-11-04-00"
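A sketch of how those deterministic keys could be built, assuming the hour suffix is zero-padded and using the <10s-per-hour / else-per-day rule from this slide (function names are illustrative):

```python
from datetime import date

def header_key(company_id, serial, channel, day):
    # Metadata (unit, interval time, etc) lives under its own per-day key.
    return f"{company_id}-{serial}-{channel}-{day.isoformat()}"

def data_keys(company_id, serial, channel, interval_time, day):
    # Slice per hour if interval < 10s, else one key for the whole day.
    prefix = f"{company_id}-{serial}-{channel}-{interval_time}-{day.isoformat()}"
    if interval_time < 10:
        return [f"{prefix}-{hour:02d}" for hour in range(24)]
    return [prefix]

print(header_key("acme", "sn1234", "ch1", date(2015, 11, 4)))
print(data_keys("acme", "sn1234", "ch1", 1, date(2015, 11, 4))[:2])
```

Because every key can be computed from the query parameters alone, reads need no secondary index or list-keys operation; that is what makes r=1 fetches on known keys fast.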
Good wants better
- Now that we're fast, what else can we do?
- Searches of 20+ sensors, with 5s between measurements, over a 1 year time span
- Huge memory consumption when fetching all data
- Not usable in average browser
20 * (60 * 60 / 5) * 24 * 365 = 126 144 000 # data points
20 * 365 = 7 300 # meta keys to fetch from riak
20 * 24 * 365 = 175 200 # data keys to fetch from riak
The data flow
New extraction
Develop and scale
PostgreSQL vs riak
- Input vs output
- Auto increment, unique key
- Ranges, list resources
- Updates, isolation
Riak and the system
- Yesterday riak was a necessity
- Today we have both data and user input in riak
- Tomorrow we have written a notification system using riak_core and Erlang. Maybe something for #RICON16
- The day after that we have moved all critical parts into riak; at that point the PostgreSQL database will be <50MB