TimescaleDB
TimescaleDB: An open-source database built for analyzing time-series data with the power and convenience of SQL — on premise, at the edge or in the cloud.
Demo stand
While investigating everything else, to avoid becoming a bottleneck, I created a DigitalOcean VM with the server:
- hostname: 167.172.169.230
- username (ssh): rancher
- username (sql): postgres
- password: xxxx
docker run -d --name=db --restart=always -p 5432:5432 -e POSTGRES_PASSWORD="rtB7aYaFF5WBx8cL" -v $PWD/data:/var/lib/postgresql/data timescale/timescaledb:1.7.0-pg12
In case of issues, just connect via SSH and restart the container.
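A minimal sketch of that recovery step (host and container name as above):
ssh rancher@167.172.169.230
docker restart db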
For convenience, pgAdmin is also deployed:
mkdir pgadmin
sudo chown -R 5050:5050 pgadmin
docker run -d --name=pgadmin --restart=always -p 80:80 -e PGADMIN_DEFAULT_EMAIL=svc@rabota.ua -e PGADMIN_DEFAULT_PASSWORD=xxxxx -v $PWD/pgadmin:/var/lib/pgadmin dpage/pgadmin4
Seed database with sample data
The official docs include articles with examples - Tutorial: How to simulate a basic IoT sensor dataset on PostgreSQL or TimescaleDB
Example with containers:
003_seed.sh
#!/bin/bash
psql --username=postgres --command="CREATE database tutorial"
psql --username=postgres --dbname=tutorial --command="CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;"
psql --username=postgres --dbname=tutorial --command="
-- create conditions table
CREATE TABLE conditions (
time TIMESTAMPTZ NOT NULL,
location TEXT NOT NULL,
temperature DOUBLE PRECISION NULL,
humidity DOUBLE PRECISION NULL
);
-- create hypertable
SELECT create_hypertable('conditions', 'time');
-- seed conditions table with million rows
insert into conditions (time, location, temperature, humidity)
select
now() - INTERVAL '1 min' * (1000000 - i) AS time,
(array['office', 'home', 'garage'])[floor(random() * 3 + 1)] AS location,
floor(random() * (100 - 1 + 1) + 1) AS temperature,
floor(random() * (100 - 1 + 1) + 1) AS humidity
from generate_series(1, 1000000) s(i);
"
docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password -v /d/OneDrive/Desktop/timescaledocs/003_seed.sh:/docker-entrypoint-initdb.d/003_seed.sh timescale/timescaledb:1.7.0-pg12
The tutorial database will be created in the running container, with the conditions table containing 1M rows.
Clients
List of clients from official wiki
- pgAdmin - browser-based client (deployment commands are shown in the demo stand section above)
- psql - standard console client shipped with PostgreSQL
- pgcli - open-source console client
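A minimal sketch of connecting with pgcli to the demo stand (assuming pgcli is installed from PyPI; host and credentials are from the demo stand section):
pip install pgcli
pgcli --host=167.172.169.230 --username=postgres --dbname=postgres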
Configuration
The official docs recommend using the built-in tuning feature.
On startup the timescaledb-tune command is run, which configures everything with recommended settings; you can always override any setting by passing it as an argument.
For example, shared_buffers by default will be 128MB:
docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password -e NO_TS_TUNE=y timescale/timescaledb:1.7.0-pg12
docker exec -it db psql --username=postgres -c "select name, (cast(setting as int) * 8) / 1024 as mb from pg_settings where name = 'shared_buffers';"
timescaledb-tune on my machine sets it to 1.5GB:
docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb:1.7.0-pg12
docker exec -it db psql --username=postgres -c "select name, (cast(setting as int) * 8) / 1024 as mb from pg_settings where name = 'shared_buffers';"
Override example:
docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb:1.7.0-pg12 postgres -cshared_buffers=100MB
docker exec -it db psql --username=postgres -c "select name, (cast(setting as int) * 8) / 1024 as mb from pg_settings where name = 'shared_buffers';"
To see what exactly timescaledb-tune does, run:
docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password -e NO_TS_TUNE=y timescale/timescaledb:1.7.0-pg12
docker exec -it db timescaledb-tune
Example of recommended settings:
Recommendations based on 5.81 GB of available memory and 4 CPUs for PostgreSQL 12
| Category | Setting | Old | New |
|---|---|---|---|
| memory | shared_buffers | 128MB | 1487MB |
| memory | effective_cache_size | N/A | 4462MB |
| memory | maintenance_work_mem | N/A | 761648kB |
| memory | work_mem | N/A | 3808kB |
| parallelism | timescaledb.max_background_workers | N/A | 8 |
| parallelism | max_worker_processes | 8 | 15 |
| parallelism | max_parallel_workers_per_gather | N/A | 2 |
| parallelism | max_parallel_workers | N/A | 4 |
| wal | wal_buffers | N/A | 16MB |
| wal | min_wal_size | 80MB | 512MB |
| miscellaneous | default_statistics_target | N/A | 500 |
| miscellaneous | random_page_cost | N/A | 1.1 |
| miscellaneous | checkpoint_completion_target | N/A | 0.9 |
| miscellaneous | max_locks_per_transaction | N/A | 64 |
| miscellaneous | autovacuum_max_workers | N/A | 10 |
| miscellaneous | autovacuum_naptime | N/A | 10 |
| miscellaneous | effective_io_concurrency | N/A | 200 |
Additional settings
On startup there is a check for whether the data folder already exists, and if so the tuning is skipped, so we need to rerun the tuner on each new server; settings can be edited in $PGDATA/postgresql.conf.
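A minimal sketch of rerunning the tuner inside an already-initialized container (--yes accepts all recommendations; container name "db" as in the examples above):
docker exec -it db sh -c 'timescaledb-tune --yes --conf-path="$PGDATA/postgresql.conf"'
docker restart db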
Disk Write Settings
synchronous_commit = 'off' - turns off synchronous commit, which speeds up writes at the risk of losing uncommitted data; in our case this makes sense.
With synchronous commit enabled I got around 7K TPS, with it disabled - around 39K.
Here is an example of how the check was done:
# experiment 1: start database with synchronous commit on
docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb:1.7.0-pg12
# experiment 2: start database with synchronous commit off
# docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb:1.7.0-pg12 -csynchronous_commit=off
# check current setting value
docker exec -it db psql --username=postgres -c "select name, setting from pg_settings where name = 'synchronous_commit'"
# create table
docker exec -it db psql --username=postgres -c "CREATE TABLE conditions (
time TIMESTAMPTZ NOT NULL,
location TEXT NOT NULL,
temperature DOUBLE PRECISION NULL,
humidity DOUBLE PRECISION NULL
);"
# create hypertable
docker exec -it db psql --username=postgres -c "SELECT create_hypertable('conditions', 'time');"
echo "insert into conditions (time, location, temperature, humidity) values (now(), (array['office', 'home', 'garage'])[floor(random() * 3 + 1)], floor(random() * (100 - 1 + 1) + 1), floor(random() * (100 - 1 + 1) + 1))" > bench.sql
docker cp bench.sql db:/var/lib/postgresql/bench.sql
# run benchmark
docker exec -it db pgbench --username=postgres --progress=2 --client=10 --time=60 --jobs=2 --no-vacuum --file=/var/lib/postgresql/bench.sql
Lock Settings
The recommended setting is max_locks_per_transaction = 2 * num_chunks, where num_chunks is the maximum number of chunks expected in a single hypertable; if indexes are used, this number should be doubled.
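To pick a value, the current chunk count per hypertable can be checked via the same informational view used in the compression section below:
docker exec -it db psql --username=postgres -c "SELECT table_name, num_chunks FROM timescaledb_information.hypertable;"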
pgtune
pgtune - an online tuning tool that gives approximately the same results as timescaledb-tune (I set the expected number of connections to around 500):
| Setting | Original | Recommended | Description |
|---|---|---|---|
| max_connections | 100 | 500 | docs |
| effective_io_concurrency | 200 | 200 | docs |
| min_wal_size | 512MB | 4GB | docs |
| max_wal_size | 1024MB | 16GB | docs |
| max_parallel_maintenance_workers | 2 | 2 | docs |
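Any of these can be applied the same way as the shared_buffers override shown above, for example (a sketch):
docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb:1.7.0-pg12 postgres -cmax_connections=500 -cmin_wal_size=4GB -cmax_wal_size=16GB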
Backup, restore
pg_dump did not work out of the box, so we are going to use pg_basebackup, which does a full backup of everything.
Example
# start database
docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password -v $PWD/data:/var/lib/postgresql/data -v $PWD/backup:/backup timescale/timescaledb:1.7.0-pg12
# create table
docker exec -it db psql --username=postgres -c "create table demo (id int, value int);"
# insert few rows
docker exec -it db psql --username=postgres -c "insert into demo values (1,1), (2,2), (3,3)"
# check
docker exec -it db psql --username=postgres -c "select * from demo"
# backup
docker exec -it db pg_basebackup --pgdata=/backup --format=tar --gzip --username=postgres
# kill database
docker rm -f db
# cleanup data
rm -rf data && mkdir data
# extract backup into data
tar -xvC data -f backup/base.tar.gz
tar -xvC data/pg_wal -f backup/pg_wal.tar.gz
# start database
docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password -v $PWD/data:/var/lib/postgresql/data -v $PWD/backup:/backup timescale/timescaledb:1.7.0-pg12
# check
docker exec -it db psql --username=postgres -c "select * from demo"
Notes:
- before a full backup, clean up the backup folder (see the snippet below)
- on Windows there were permission issues
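A minimal sketch of the cleanup mentioned above (pg_basebackup refuses to write into a non-empty target directory):
rm -rf backup && mkdir backup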
Upgrade
Upgrading TimescaleDB itself seems quite easy; problems can arise when upgrading PostgreSQL itself.
https://github.com/docker-library/postgres/issues/37 - here you can find some useful info
# start database timescale - 1.5.1, postgres - 11
docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password -v $PWD/data:/var/lib/postgresql/data timescale/timescaledb:1.5.1-pg11
# getting started guide
# create table
docker exec -it db psql --username=postgres -c "CREATE TABLE conditions (
time TIMESTAMPTZ NOT NULL,
location TEXT NOT NULL,
temperature DOUBLE PRECISION NULL,
humidity DOUBLE PRECISION NULL
);"
# create hypertable
docker exec -it db psql --username=postgres -c "SELECT create_hypertable('conditions', 'time');"
# seed 1K rows
docker exec -it db psql --username=postgres -c "insert into conditions (time, location, temperature, humidity)
select
now() - INTERVAL '1 min' * (1000 - i) AS time,
(array['office', 'home', 'garage'])[floor(random() * 3 + 1)] AS location,
floor(random() * (100 - 1 + 1) + 1) AS temperature,
floor(random() * (100 - 1 + 1) + 1) AS humidity
from generate_series(1, 1000) s(i);"
# check number of rows
docker exec -it db psql --username=postgres -c "select count(*) from conditions"
# create continuous aggregate
docker exec -it db psql --username=postgres -c "CREATE VIEW conditions_summary_hourly
WITH (timescaledb.continuous) AS
SELECT location,
time_bucket(INTERVAL '1 hour', time) AS bucket,
AVG(temperature),
MAX(temperature),
MIN(temperature)
FROM conditions
GROUP BY location, bucket;"
# check continuous aggregate
docker exec -it db psql --username=postgres -c "SELECT * FROM conditions_summary_hourly limit 3;"
# gracefully stop database
docker stop db
# start database timescale - 1.6.1, postgres - 11
docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password -v $PWD/data:/var/lib/postgresql/data timescale/timescaledb:1.6.1-pg11
# list extensions \dx timescaledb
docker exec -it db psql --username=postgres -c "SELECT extname, extversion FROM pg_catalog.pg_extension"
# update extensions
docker exec -it db psql --username=postgres -c "ALTER EXTENSION timescaledb UPDATE"
# check
docker exec -it db psql --username=postgres -c "select count(*) from conditions"
# check continuous aggregate
docker exec -it db psql --username=postgres -c "SELECT * FROM conditions_summary_hourly limit 3;"
# gracefully stop database
docker stop db
# start database timescale - 1.7.0, postgres - 12
# docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password -v $PWD/data:/var/lib/postgresql/data timescale/timescaledb:1.7.0-pg12
# FATAL: database files are incompatible with server
# DETAIL: The data directory was initialized by PostgreSQL version 11, which is not compatible with this version 12.2.
Compression
We need to figure out how we will distinguish categorical from linear metadata, so we can compress it. We also need to decide after what amount of time data should be compressed; in the example it is 7 days, which may be too long.
Compression is something similar to a columnstore index in SQL Server; the docs describe it very well.
In my experiments I was able to compress the data about 4x:
# start database
docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb:1.7.0-pg12
# create table
docker exec -it db psql --username=postgres -c "CREATE TABLE conditions (
time TIMESTAMPTZ NOT NULL,
location TEXT NOT NULL,
temperature DOUBLE PRECISION NULL,
humidity DOUBLE PRECISION NULL
);"
# create hypertable
docker exec -it db psql --username=postgres -c "SELECT create_hypertable('conditions', 'time');"
# seed 1M rows
docker exec -it db psql --username=postgres -c "insert into conditions (time, location, temperature, humidity)
select
now() - INTERVAL '1 min' * (1000000 - i) AS time,
(array['office', 'home', 'garage'])[floor(random() * 3 + 1)] AS location,
floor(random() * (100 - 1 + 1) + 1) AS temperature,
floor(random() * (100 - 1 + 1) + 1) AS humidity
from generate_series(1, 1000000) s(i);"
# get size of whole database - 112MB
docker exec -it db psql --username=postgres -c "SELECT pg_size_pretty( pg_database_size('postgres') );"
# get size of conditions table - 16KB
docker exec -it db psql --username=postgres -c "SELECT pg_size_pretty( pg_total_relation_size('conditions') );"
# get size of conditions hypertable - 102MB
docker exec -it db psql --username=postgres -c "SELECT table_name, num_chunks, total_size FROM timescaledb_information.hypertable;"
# get disk usage - 305MB
docker exec -it db du -hcs /var/lib/postgresql/data
# enable compression
docker exec -it db psql --username=postgres -c "ALTER TABLE conditions SET ( timescaledb.compress, timescaledb.compress_segmentby = 'location')"
docker exec -it db psql --username=postgres -c "SELECT add_compress_chunks_policy('conditions', INTERVAL '7 days');"
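# (a sketch, per the TimescaleDB 1.x docs) alternatively, compress eligible chunks manually instead of waiting for the policy
docker exec -it db psql --username=postgres -c "SELECT compress_chunk(i) FROM show_chunks('conditions', older_than => INTERVAL '7 days') i;"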
# get compression stats (after a while 98 of 100 chunks should be compressed)
docker exec -it db psql --username=postgres -c "SELECT hypertable_name, chunk_name, compression_status, uncompressed_total_bytes, compressed_total_bytes FROM timescaledb_information.compressed_chunk_stats;"
# get size of whole database - 29MB
docker exec -it db psql --username=postgres -c "SELECT pg_size_pretty( pg_database_size('postgres') );"
# get size of conditions hypertable - 17MB
docker exec -it db psql --username=postgres -c "SELECT table_name, num_chunks, total_size FROM timescaledb_information.hypertable;"
# get disk usage - 222MB
docker exec -it db du -hcs /var/lib/postgresql/data
Data Retention
This may be useful for product metrics, for example our internal telemetry.
Data retention is a mechanism for automated cleanup of outdated data, i.e. we can ask TimescaleDB to automatically delete data older than X days. Example:
-- 100 chunks
SELECT show_chunks('conditions');
-- 99 chunks
SELECT show_chunks('conditions', older_than => INTERVAL '24 hours');
-- delete manually
-- SELECT drop_chunks(INTERVAL '24 hours', 'conditions');
-- create job
SELECT add_drop_chunks_policy('conditions', INTERVAL '24 hours');
-- list jobs
SELECT * FROM timescaledb_information.drop_chunks_policies;
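-- (a sketch, per the 1.x API) the policy can be removed again later
-- SELECT remove_drop_chunks_policy('conditions');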
-- 1 chunk
SELECT show_chunks('conditions');
-- 3.2K rows of 1M left
select count(*) from conditions;
Performance test
For performance tests there is the built-in pgbench utility; here is an example of its usage:
# start database
docker run -it --rm --name=db -p 5432:5432 -e POSTGRES_PASSWORD=password timescale/timescaledb:1.7.0-pg12
# create table
docker exec -it db psql --username=postgres -c "CREATE TABLE conditions (
time TIMESTAMPTZ NOT NULL,
location TEXT NOT NULL,
temperature DOUBLE PRECISION NULL,
humidity DOUBLE PRECISION NULL
);"
# create hypertable
docker exec -it db psql --username=postgres -c "SELECT create_hypertable('conditions', 'time');"
echo "insert into conditions (time, location, temperature, humidity) values (now(), (array['office', 'home', 'garage'])[floor(random() * 3 + 1)], floor(random() * (100 - 1 + 1) + 1), floor(random() * (100 - 1 + 1) + 1))" > bench.sql
docker cp bench.sql db:/var/lib/postgresql/bench.sql
# initialize pgbench
docker exec -it db pgbench -U postgres -i
# run write bench for 10 clients over 60 seconds
docker exec -it db pgbench --username=postgres --progress=2 --client=10 --time=60 --jobs=2 --no-vacuum --file=/var/lib/postgresql/bench.sql
Output example:
transaction type: /var/lib/postgresql/bench.sql
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 2
duration: 10 s
number of transactions actually processed: 72755
latency average = 1.371 ms
latency stddev = 0.349 ms
tps = 7272.065619 (including connections establishing)
tps = 7273.068218 (excluding connections establishing)
We are mainly interested in the number of transactions and the average latency.
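The same approach can be used to sketch a read benchmark (bench_read.sql and its query are hypothetical, for illustration):
echo "select count(*) from conditions where time > now() - interval '1 hour'" > bench_read.sql
docker cp bench_read.sql db:/var/lib/postgresql/bench_read.sql
docker exec -it db pgbench --username=postgres --progress=2 --client=10 --time=60 --jobs=2 --no-vacuum --file=/var/lib/postgresql/bench_read.sql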
Monitoring
At the moment it is not clear what exactly to monitor.
TimescaleDB on its own does not have anything special for monitoring.
Presumably we will use:
ssh rancher@167.172.169.230
docker run -d --name=node_exporter --restart=always --net=host --pid=host -v "/:/host:ro,rslave" -p 9100:9100 quay.io/prometheus/node-exporter --path.rootfs=/host
docker run -d --name=postgres_exporter --restart=always --net=host -e DATA_SOURCE_NAME="postgresql://postgres:rtB7aYaFF5WBx8cL@localhost:5432/postgres?sslmode=disable" wrouesnel/postgres_exporter
docker run -d --name=cadvisor_exporter --restart=always --volume=/:/rootfs:ro --volume=/var/run:/var/run:ro --volume=/sys:/sys:ro --volume=/var/lib/docker/:/var/lib/docker:ro --volume=/dev/disk/:/dev/disk:ro --publish=8080:8080 --privileged gcr.io/google-containers/cadvisor
curl -s 167.172.169.230:9100/metrics
curl -s 167.172.169.230:9187/metrics
curl -s 167.172.169.230:8080/metrics
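# (a sketch) spot-check a specific metric exposed by postgres_exporter, e.g. backend counts
curl -s 167.172.169.230:9187/metrics | grep pg_stat_database_numbackends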
mkdir prometheus
touch prometheus.yml
docker run -d --name=prometheus --restart=always --net=host -v $PWD/prometheus.yml:/etc/prometheus/prometheus.yml -v $PWD/prometheus:/prometheus --user=root prom/prometheus
mkdir grafana
docker run -d --name=grafana --restart=always -p 3000:3000 -v $PWD/grafana:/var/lib/grafana --user=$(id -u) grafana/grafana
And prometheus.yml:
global:
scrape_interval: 15s
scrape_configs:
- job_name: node_exporter
static_configs:
- targets:
- localhost:9100
- job_name: postgres_exporter
static_configs:
- targets:
- localhost:9187
- job_name: cadvisor_exporter
static_configs:
- targets:
- localhost:8080
Read Only User
CREATE ROLE mac WITH LOGIN PASSWORD '1234' NOSUPERUSER INHERIT NOCREATEDB NOCREATEROLE NOREPLICATION VALID UNTIL 'infinity';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO mac;
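-- (a sketch) the GRANT above only covers existing tables; default privileges extend it to future ones
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO mac;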