Achieving fast upserts for Apache Druid

Druid’s data hierarchy

Druid servers architecture. Taken from https://druid.apache.org/docs/latest/design/processes.html

Data in Singular’s pipeline

Data in Singular’s Druid

How we used to load data into Druid

How we load data into Druid now

Small segments

Filter complexity

(
  table_type='Append' AND (
    (source='Facebook' AND date='2021-01-01' AND version='ver1') OR
    (source='Facebook' AND date='2021-01-02' AND version='ver2') OR
    (source='Adwords' AND date='2021-01-01' AND version='ver3')
  )
) OR
(
  table_type='Primary' AND (
    date NOT IN ('2021-01-01', '2021-01-02') OR
    (date='2021-01-01' AND source NOT IN ('Facebook', 'Adwords')) OR
    (date='2021-01-02' AND source NOT IN ('Facebook'))
  )
)
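The filter follows a regular pattern: rows in the Append table are kept only for the current version of each overridden (source, date) pair, and rows in the Primary table are kept unless their (source, date) pair has been overridden. A filter like this could therefore be generated mechanically from a version state. The sketch below is a minimal Python illustration under that assumption; the `{(source, date): version}` mapping and the helper names `build_upsert_filter`, `sql_literal` and `sql_in_list` are hypothetical and are not Singular's actual code.

```python
from collections import defaultdict


def sql_literal(value):
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def sql_in_list(values):
    """Render an IN-list such as ('a', 'b'); one value gives ('a'), no trailing comma."""
    return "(" + ", ".join(sql_literal(v) for v in values) + ")"


def build_upsert_filter(active_versions):
    """Build a row-visibility filter from a hypothetical version state.

    active_versions maps (source, date) -> version for every pair that was
    re-loaded into the Append table. Pairs not listed are served from the
    Primary table only.
    """
    if not active_versions:
        # Nothing has been upserted, so only Primary rows are visible.
        return "table_type='Primary'"

    # Append rows are visible only for the current version of each pair.
    append_terms = [
        f"(source={sql_literal(s)} AND date={sql_literal(d)} AND version={sql_literal(v)})"
        for (s, d), v in active_versions.items()
    ]

    # Group overridden sources by date (dicts preserve insertion order).
    overridden = defaultdict(list)
    for s, d in active_versions:
        overridden[d].append(s)

    # Primary rows are visible unless their (source, date) pair was overridden:
    # either the date was never touched, or the source is not overridden on that date.
    primary_terms = [f"date NOT IN {sql_in_list(overridden)}"]
    primary_terms += [
        f"(date={sql_literal(d)} AND source NOT IN {sql_in_list(sources)})"
        for d, sources in overridden.items()
    ]

    return (
        "(table_type='Append' AND (" + " OR ".join(append_terms) + "))"
        " OR (table_type='Primary' AND (" + " OR ".join(primary_terms) + "))"
    )


if __name__ == "__main__":
    # Hypothetical version state matching the example filter above.
    state = {
        ("Facebook", "2021-01-01"): "ver1",
        ("Facebook", "2021-01-02"): "ver2",
        ("Adwords", "2021-01-01"): "ver3",
    }
    print(build_upsert_filter(state))
```

Grouping the overridden sources by date keeps the Primary side to one term per date rather than one per (source, date) pair, but the Append side still grows by one term for every upserted pair, so a generated filter like this gets longer as upserts accumulate.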

Version state

Query performance

Results

Closing thoughts

Singular’s engineering blog. We post here about tech topics we encounter and solutions we build. For more info go to → www.singular.net :)