TimescaleDB

TimescaleDB: An open-source database built for analyzing time-series data with the power and convenience of SQL — on premise, at the edge or in the cloud.

Demo stand

While investigating everything else, and to avoid becoming a bottleneck, I created a DigitalOcean VM with the server:

  • hostname: 167.172.169.230
  • username (ssh): rancher
  • username (sql): postgres
  • password: xxxx
docker run -d --name=db --restart=always -p 5432:5432 -e POSTGRES_PASSWORD="rtB7aYaFF5WBx8cL" -v $PWD/data:/var/lib/postgresql/data timescale/timescaledb:1.7.0-pg12

In case of issues, just connect via SSH and restart the container.
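For example (a minimal sketch, assuming the container name db from the command above):

ssh rancher@167.172.169.230
docker restart db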

For convenience, pgAdmin is also installed:

mkdir pgadmin
sudo chown -R 5050:5050 pgadmin
docker run -d --name=pgadmin --restart=always -p 80:80 -e PGADMIN_DEFAULT_EMAIL=svc@rabota.ua -e PGADMIN_DEFAULT_PASSWORD=xxxxx -v $PWD/pgadmin:/var/lib/pgadmin dpage/pgadmin4

Seed database with sample data

The official docs have tutorial articles with examples - Tutorial: How to simulate a basic IoT sensor dataset on PostgreSQL or TimescaleDB.

Example with containers:

003_seed.sh

#!/bin/bash

psql --username=postgres --command="CREATE database tutorial"
psql --username=postgres --dbname=tutorial --command="CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;"

psql --username=postgres --dbname=tutorial --command="
-- create conditions table
CREATE TABLE conditions (
  time        TIMESTAMPTZ       NOT NULL,
  location    TEXT              NOT NULL,
  temperature DOUBLE PRECISION  NULL,
  humidity    DOUBLE PRECISION  NULL
);

-- create hypertable
SELECT create_hypertable('conditions', 'time');

-- seed conditions table with a million rows
insert into conditions (time, location, temperature, humidity)
select
    now() - INTERVAL '1 min' * (1000000 - i) AS time,
    (array['office', 'home', 'garage'])[floor(random() * 3 + 1)] AS location,
    floor(random() * (100 - 1 + 1) + 1) AS temperature,
    floor(random() * (100 - 1 + 1) + 1) AS humidity
from generate_series(1, 1000000) s(i);
"

docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password -v /d/OneDrive/Desktop/timescaledocs/003_seed.sh:/docker-entrypoint-initdb.d/003_seed.sh timescale/timescaledb:1.7.0-pg12

The tutorial database will be created in the running container, with the conditions table containing 1M rows.
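A quick sanity check (a sketch, assuming the container name db from the command above):

docker exec -it db psql --username=postgres --dbname=tutorial -c "select count(*) from conditions;"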

Clients

A list of clients is available in the official wiki.

mkdir pgadmin
sudo chown -R 5050:5050 pgadmin
docker run -d --name=pgadmin --restart=always -p 80:80 -e PGADMIN_DEFAULT_EMAIL=svc@rabota.ua -e PGADMIN_DEFAULT_PASSWORD=xxxxx -v $PWD/pgadmin:/var/lib/pgadmin dpage/pgadmin4

psql, pgcli

pgcli - an open-source console client
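Connecting with pgcli to the demo stand might look like this (a sketch; the password is prompted for):

pgcli -h 167.172.169.230 -p 5432 -U postgres postgres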

Configuration

The official docs recommend using the built-in tuning feature.

On startup, the timescaledb-tune command is run, which configures everything with recommended settings; you can always override any of the settings by passing them as arguments.

For example, shared_buffers by default will be 128MB:

docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password  -e NO_TS_TUNE=y timescale/timescaledb:1.7.0-pg12
docker exec -it db psql --username=postgres -c "select name, (cast(setting as int) * 8) / 1024 as mb from pg_settings where name = 'shared_buffers';"

timescaledb-tune on my machine sets it to ~1.5GB:

docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb:1.7.0-pg12
docker exec -it db psql --username=postgres -c "select name, (cast(setting as int) * 8) / 1024 as mb from pg_settings where name = 'shared_buffers';"

Override example:

docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb:1.7.0-pg12 postgres -cshared_buffers=100MB
docker exec -it db psql --username=postgres -c "select name, (cast(setting as int) * 8) / 1024 as mb from pg_settings where name = 'shared_buffers';"

To see what exactly timescaledb-tune does, run:

docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password  -e NO_TS_TUNE=y timescale/timescaledb:1.7.0-pg12
docker exec -it db timescaledb-tune

Example of recommended settings:

Recommendations based on 5.81 GB of available memory and 4 CPUs for PostgreSQL 12

Category       Setting                             Old    New
memory         shared_buffers                      128MB  1487MB
memory         effective_cache_size                N/A    4462MB
memory         maintenance_work_mem                N/A    761648kB
memory         work_mem                            N/A    3808kB
parallelism    timescaledb.max_background_workers  N/A    8
parallelism    max_worker_processes                8      15
parallelism    max_parallel_workers_per_gather     N/A    2
parallelism    max_parallel_workers                N/A    4
wal            wal_buffers                         N/A    16MB
wal            min_wal_size                        80MB   512MB
miscellaneous  default_statistics_target           N/A    500
miscellaneous  random_page_cost                    N/A    1.1
miscellaneous  checkpoint_completion_target        N/A    0.9
miscellaneous  max_locks_per_transaction           N/A    64
miscellaneous  autovacuum_max_workers              N/A    10
miscellaneous  autovacuum_naptime                  N/A    10
miscellaneous  effective_io_concurrency            N/A    200

Additional settings

On startup there is a check that verifies whether the data folder already exists, and if so the tune is skipped, so we need to rerun the tune on each new server; settings can be edited in $PGDATA/postgresql.conf.
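Rerunning the tune non-interactively inside a running container might look like this (a sketch; --yes accepts all recommendations, and a restart is needed for the new settings to take effect):

docker exec -it db timescaledb-tune --yes --conf-path=/var/lib/postgresql/data/postgresql.conf
docker restart db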

Disk Write Settings

synchronous_commit = 'off' - turns off synchronous commit, which speeds up writes at the risk of losing the most recently acknowledged transactions on a crash - in our case this makes sense

With synchronous commit on I was able to get around 7K TPS; with it disabled - 39K.

Here is how the check was done:

# experiment 1: start database with synchronous commit on
docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb:1.7.0-pg12

# experiment 2: start database with synchronous commit off
# docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb:1.7.0-pg12 -csynchronous_commit=off

# check current setting value
docker exec -it db psql --username=postgres -c "select name, setting from pg_settings where name = 'synchronous_commit'"

# create table
docker exec -it db psql --username=postgres -c "CREATE TABLE conditions (
    time        TIMESTAMPTZ       NOT NULL,
    location    TEXT              NOT NULL,
    temperature DOUBLE PRECISION  NULL,
    humidity    DOUBLE PRECISION  NULL
);"

# create hypertable
docker exec -it db psql --username=postgres -c "SELECT create_hypertable('conditions', 'time');"

echo "insert into conditions (time, location, temperature, humidity) values (now(), (array['office', 'home', 'garage'])[floor(random() * 3 + 1)], floor(random() * (100 - 1 + 1) + 1), floor(random() * (100 - 1 + 1) + 1))" > bench.sql
docker cp bench.sql db:/var/lib/postgresql/bench.sql

# run benchmark
docker exec -it db pgbench --username=postgres --progress=2 --client=10 --time=60 --jobs=2 --no-vacuum --file=/var/lib/postgresql/bench.sql

Lock Settings

The recommended setting is max_locks_per_transaction = 2 * num_chunks, where num_chunks is the maximum number of chunks expected in a single hypertable; if indexes are used, this number should be doubled.
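A sketch of how one might size this, using the 1.x information view (the final value is illustrative):

-- how many chunks do the hypertables have right now?
SELECT table_name, num_chunks FROM timescaledb_information.hypertable;

-- e.g. for up to 100 chunks with indexes: 2 * 100 * 2 = 400
-- (requires a server restart to take effect)
ALTER SYSTEM SET max_locks_per_transaction = 400;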

pgtune

pgtune is an online tuning tool; it gives approximately the same results as timescaledb-tune (I set it to expect around 500 connections):

Setting                           Original  Recommended  Description
max_connections                   100       500          docs
effective_io_concurrency          200       200          docs
min_wal_size                      512MB     4GB          docs
max_wal_size                      1024MB    16GB         docs
max_parallel_maintenance_workers  2         2            docs
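Applying such settings to the container can be done with the same override mechanism shown above (a sketch, with values taken from the table):

docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb:1.7.0-pg12 postgres -cmax_connections=500 -cmin_wal_size=4GB -cmax_wal_size=16GB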

Backup, restore

pg_dump did not work out of the box, so we are going to use pg_basebackup, which does a full physical backup of everything.

Example

# start database
docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password -v $PWD/data:/var/lib/postgresql/data -v $PWD/backup:/backup timescale/timescaledb:1.7.0-pg12

# create table
docker exec -it db psql --username=postgres -c "create table demo (id int, value int);"

# insert few rows
docker exec -it db psql --username=postgres -c "insert into demo values (1,1), (2,2), (3,3)"

# check
docker exec -it db psql --username=postgres -c "select * from demo"

# backup
docker exec -it db pg_basebackup --pgdata=/backup --format=tar --gzip  --username=postgres

# kill database
docker rm -f db

# cleanup data
rm -rf data && mkdir data

# extract backup into data
tar -xvC data -f backup/base.tar.gz
tar -xvC data/pg_wal -f backup/pg_wal.tar.gz

# start database
docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password -v $PWD/data:/var/lib/postgresql/data -v $PWD/backup:/backup timescale/timescaledb:1.7.0-pg12

# check
docker exec -it db psql --username=postgres -c "select * from demo"

Notes:

  • clean up the backup folder before a full backup (see the sketch below)
  • on Windows there were permission issues
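A minimal cleanup sketch (pg_basebackup refuses to write into a non-empty target directory):

# empty the target before a new full backup
docker exec -it db sh -c "rm -rf /backup/*"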

Upgrade

Upgrading TimescaleDB itself seems quite easy; problems can arise when upgrading Postgres itself.

Some useful info can be found here: https://github.com/docker-library/postgres/issues/37

# start database timescale - 1.5.1, postgres - 11
docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password -v $PWD/data:/var/lib/postgresql/data timescale/timescaledb:1.5.1-pg11

# getting started guide

# create table
docker exec -it db psql --username=postgres -c "CREATE TABLE conditions (
    time        TIMESTAMPTZ       NOT NULL,
    location    TEXT              NOT NULL,
    temperature DOUBLE PRECISION  NULL,
    humidity    DOUBLE PRECISION  NULL
);"

# create hypertable
docker exec -it db psql --username=postgres -c "SELECT create_hypertable('conditions', 'time');"

# seed 1K rows
docker exec -it db psql --username=postgres -c "insert into conditions (time, location, temperature, humidity)
select
    now() - INTERVAL '1 min' * (1000 - i) AS time,
    (array['office', 'home', 'garage'])[floor(random() * 3 + 1)] AS location,
    floor(random() * (100 - 1 + 1) + 1) AS temperature,
    floor(random() * (100 - 1 + 1) + 1) AS humidity
from generate_series(1, 1000) s(i);"

# check number of rows
docker exec -it db psql --username=postgres -c "select count(*) from conditions"

# create continuous aggregate
docker exec -it db psql --username=postgres -c "CREATE VIEW conditions_summary_hourly
WITH (timescaledb.continuous) AS
SELECT location,
       time_bucket(INTERVAL '1 hour', time) AS bucket,
       AVG(temperature),
       MAX(temperature),
       MIN(temperature)
FROM conditions
GROUP BY location, bucket;"

# check continuous aggregate
docker exec -it db psql --username=postgres -c "SELECT * FROM conditions_summary_hourly limit 3;"

# gracefully stop database
docker stop db

# start database timescale - 1.6.1, postgres - 11
docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password -v $PWD/data:/var/lib/postgresql/data timescale/timescaledb:1.6.1-pg11

# list extensions (equivalent of \dx in psql)
docker exec -it db psql --username=postgres -c "SELECT extname, extversion FROM pg_catalog.pg_extension"

# update extensions
docker exec -it db psql --username=postgres -c "ALTER EXTENSION timescaledb UPDATE"

# check
docker exec -it db psql --username=postgres -c "select count(*) from conditions"

# check continuous aggregate
docker exec -it db psql --username=postgres -c "SELECT * FROM conditions_summary_hourly limit 3;"

# gracefully stop database
docker stop db

# start database timescale - 1.7.0, postgres - 12
# docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password -v $PWD/data:/var/lib/postgresql/data timescale/timescaledb:1.7.0-pg12

# FATAL: database files are incompatible with server
# DETAIL: The data directory was initialized by PostgreSQL version 11, which is not compatible with this version 12.2.
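So a major Postgres upgrade needs a dump/restore (or pg_upgrade). A hedged sketch of the dump/restore route using TimescaleDB's pre/post restore helpers (untested here; names and paths are illustrative):

# in the old pg11 container: logical dump of the database
docker exec -it db pg_dump --username=postgres --format=custom --file=/backup/postgres.bak postgres

# then in a fresh pg12 container started with an empty data dir:
docker exec -it db psql --username=postgres -c "SELECT timescaledb_pre_restore();"
docker exec -it db pg_restore --username=postgres --dbname=postgres /backup/postgres.bak
docker exec -it db psql --username=postgres -c "SELECT timescaledb_post_restore();"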

Compression

We still need to figure out how we will distinguish categorical from linear metadata, so we can compress it. We also need to decide after what amount of time data will be compressed; in the example it is 7 days, which may be too long.

Compression is conceptually similar to a columnstore index in SQL Server; the docs describe it very well.

After experiments I was able to compress the data roughly 4x.

# start database
docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb:1.7.0-pg12

# create table
docker exec -it db psql --username=postgres -c "CREATE TABLE conditions (
    time        TIMESTAMPTZ       NOT NULL,
    location    TEXT              NOT NULL,
    temperature DOUBLE PRECISION  NULL,
    humidity    DOUBLE PRECISION  NULL
);"


# create hypertable
docker exec -it db psql --username=postgres -c "SELECT create_hypertable('conditions', 'time');"

# seed 1M rows
docker exec -it db psql --username=postgres -c "insert into conditions (time, location, temperature, humidity)
select
    now() - INTERVAL '1 min' * (1000000 - i) AS time,
    (array['office', 'home', 'garage'])[floor(random() * 3 + 1)] AS location,
    floor(random() * (100 - 1 + 1) + 1) AS temperature,
    floor(random() * (100 - 1 + 1) + 1) AS humidity
from generate_series(1, 1000000) s(i);"

# get size of whole database - 112MB
docker exec -it db psql --username=postgres -c "SELECT pg_size_pretty( pg_database_size('postgres') );"
# get size of conditions table - 16KB
docker exec -it db psql --username=postgres -c "SELECT pg_size_pretty( pg_total_relation_size('conditions') );"
# get size of conditions hypertable - 102MB
docker exec -it db psql --username=postgres -c "SELECT table_name, num_chunks, total_size FROM timescaledb_information.hypertable;"
# get disk usage - 305MB
docker exec -it db du -hcs /var/lib/postgresql/data


# enable compression
docker exec -it db psql --username=postgres -c "ALTER TABLE conditions SET ( timescaledb.compress, timescaledb.compress_segmentby = 'location')"
docker exec -it db psql --username=postgres -c "SELECT add_compress_chunks_policy('conditions', INTERVAL '7 days');"

# get compression stats (after a while 98 of 100 chunks should be compressed)
docker exec -it db psql --username=postgres -c "SELECT hypertable_name, chunk_name, compression_status, uncompressed_total_bytes, compressed_total_bytes FROM timescaledb_information.compressed_chunk_stats;"

# get size of whole database - 29MB
docker exec -it db psql --username=postgres -c "SELECT pg_size_pretty( pg_database_size('postgres') );"
# get size of conditions hypertable - 17MB
docker exec -it db psql --username=postgres -c "SELECT table_name, num_chunks, total_size FROM timescaledb_information.hypertable;"
# get disk usage - 222MB
docker exec -it db du -hcs /var/lib/postgresql/data
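If compressed data needs to be modified, the chunk has to be decompressed first; a minimal sketch (the chunk name is illustrative, real names come from show_chunks):

SELECT decompress_chunk('_timescaledb_internal._hyper_1_2_chunk');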

Data Retention

May be useful for product metrics, for example our internal telemetry.

Data retention is a mechanism for automated cleanup of outdated data, i.e. we can ask TimescaleDB to automatically delete data older than X days. Example:

-- 100 chunks
SELECT show_chunks('conditions');
-- 99 chunks
SELECT show_chunks('conditions', older_than => INTERVAL '24 hours');
-- delete manually
-- SELECT drop_chunks(INTERVAL '24 hours', 'conditions');
-- create job
SELECT add_drop_chunks_policy('conditions', INTERVAL '24 hours');
-- list jobs
SELECT * FROM timescaledb_information.drop_chunks_policies;
-- 1 chunk
SELECT show_chunks('conditions');
-- 3.2K rows of 1M left
select count(*) from conditions;
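The policy can be removed later if needed (1.x API; a sketch):

-- remove the retention job
SELECT remove_drop_chunks_policy('conditions');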

Performance test

For performance tests there is the built-in utility pgbench; here is an example of usage:

# start database
docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb:1.7.0-pg12

# create table
docker exec -it db psql --username=postgres -c "CREATE TABLE conditions (
    time        TIMESTAMPTZ       NOT NULL,
    location    TEXT              NOT NULL,
    temperature DOUBLE PRECISION  NULL,
    humidity    DOUBLE PRECISION  NULL
);"

# create hypertable
docker exec -it db psql --username=postgres -c "SELECT create_hypertable('conditions', 'time');"

echo "insert into conditions (time, location, temperature, humidity) values (now(), (array['office', 'home', 'garage'])[floor(random() * 3 + 1)], floor(random() * (100 - 1 + 1) + 1), floor(random() * (100 - 1 + 1) + 1))" > bench.sql
docker cp bench.sql db:/var/lib/postgresql/bench.sql

# initialize pgbench
docker exec -it db pgbench -U postgres -i

# run write bench for 10 clients over 60 seconds
docker exec -it db pgbench --username=postgres --progress=2 --client=10 --time=60 --jobs=2 --no-vacuum --file=/var/lib/postgresql/bench.sql

Output example:

transaction type: /var/lib/postgresql/bench.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 2
duration: 10 s
number of transactions actually processed: 72755
latency average = 1.371 ms
latency stddev = 0.349 ms
tps = 7272.065619 (including connections establishing)
tps = 7273.068218 (excluding connections establishing)

We are mostly interested in the number of transactions processed and the average latency.
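A read-side variant of the same benchmark can be run the same way (a sketch; the query is illustrative):

echo "select avg(temperature) from conditions where time > now() - interval '1 hour';" > read_bench.sql
docker cp read_bench.sql db:/var/lib/postgresql/read_bench.sql
docker exec -it db pgbench --username=postgres --progress=2 --client=10 --time=60 --jobs=2 --no-vacuum --file=/var/lib/postgresql/read_bench.sql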

Monitoring

At the moment it is not clear what exactly to monitor.

TimescaleDB on its own does not have anything special for monitoring.

Presumably, we will use node_exporter, postgres_exporter, and cAdvisor:

ssh rancher@167.172.169.230

docker run -d --name=node_exporter --restart=always --net=host --pid=host -v "/:/host:ro,rslave" -p 9100:9100 quay.io/prometheus/node-exporter --path.rootfs=/host
docker run -d --name=postgres_exporter --restart=always --net=host -e DATA_SOURCE_NAME="postgresql://postgres:rtB7aYaFF5WBx8cL@localhost:5432/postgres?sslmode=disable" wrouesnel/postgres_exporter
docker run -d --name=cadvisor_exporter --restart=always --volume=/:/rootfs:ro --volume=/var/run:/var/run:ro --volume=/sys:/sys:ro --volume=/var/lib/docker/:/var/lib/docker:ro --volume=/dev/disk/:/dev/disk:ro --publish=8080:8080 --privileged gcr.io/google-containers/cadvisor

curl -s 167.172.169.230:9100/metrics
curl -s 167.172.169.230:9187/metrics
curl -s 167.172.169.230:8080/metrics

mkdir prometheus
touch prometheus.yml
docker run -d --name=prometheus --restart=always --net=host -v $PWD/prometheus.yml:/etc/prometheus/prometheus.yml -v $PWD/prometheus:/prometheus --user=root prom/prometheus


mkdir grafana
docker run -d --name=grafana --restart=always -p 3000:3000 -v $PWD/grafana:/var/lib/grafana --user=$(id -u) grafana/grafana

and prometheus.yml:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: node_exporter
    static_configs:
      - targets:
          - localhost:9100
  - job_name: postgres_exporter
    static_configs:
      - targets:
          - localhost:9187
  - job_name: cadvisor_exporter
    static_configs:
      - targets:
          - localhost:8080

Read Only User

CREATE ROLE mac WITH LOGIN PASSWORD '1234' NOSUPERUSER INHERIT NOCREATEDB NOCREATEROLE NOREPLICATION VALID UNTIL 'infinity';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO mac;
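Note that this GRANT only covers tables that already exist; for tables created later, default privileges can be set as well (a sketch, assuming new tables are created by postgres):

ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO mac;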