Use Tableau to Connect to DuckDB

Brock Tibert
4 min readSep 24, 2022

I recently came across a situation where DuckDB fit the need on a recent project. The ask was to demonstrate the construction of a data pipeline and highlight the analytical possibilities of a spatial tracking dataset. DuckDB is an impressive in-process single-file database that proved to be blazingly fast, but also an absolute delight to code against.

To expand upon my use case, I needed to prototype an ETL pipeline that would be scaled out to process about 10GB of data each day. This is a drop in the bucket for FAANG-like orgs, but for most SMBs, this can be a daunting task, especially for small analytics teams that tend to work with shared data on a network and run their analyses locally.

DuckDB has a great API, and using the python client library was a snap. Moreover, the performance is amazing; SELECT * from table on tables with 1.5M rows and 50+ columns results in full dataframes in only a few seconds. DuckDB is a great way to introduce performance for those who need to consider local options before expanding/scaling to the cloud.

To demonstrate the analytical possibilities for the client, I wanted to build a dashboard in Tableau but found the resources on connecting to DuckDB quite limited.

Below is aimed to be a quick start guide for those who may want to follow a similar analytical path.

Setup

While DuckDB provides a number of Client APIs, this guide will focus on development in python.

  1. Install the DuckDB library into your python environment via pip install duckdb
  2. You will need to download the JDBC driver for use in Tableau. You can download the JDBC jar file here.
  3. Tableau provides instructions on where to place the downloaded jar file here. In my case, because I am on a Mac, I placed the jar file in ~/Library/Tableau/Drivers.

With the necessary components in place, the simple script below will create a small DuckDB database with a single table in python.

import duckdb
import string
import random
import pandas as pd
# connect to DuckDB.
# This will create an instance if it doesn't already exist
con = duckdb.connect("data/example.db")# create the table in DuckDB
SQL = """
CREATE TABLE example(a INTEGER, b VARCHAR, c BOOLEAN, d REAL);
"""
con.execute(SQL)
# confirm the table exists
con.execute("PRAGMA show_tables;").fetch_df()

Note above, I am using .fetch_df() to return the results as a pandas Dataframe.

With the schema setup, we can insert data directly into DuckDB from pandas quite easily.

# create a simple dataset
data = {'a': list(range(3)),
'b': list(string.ascii_lowercase[:3]),
'c': [True, False, True],
'd': [random.random() for _ in range(3)]}
# make the dictionary a dataframe
df = pd.DataFrame(data)
# insert the dataframe into the table defined above
con.execute("INSERT INTO example SELECT * FROM df")

To confirm that the data exists:

# confirm the data exists in the table
con.execute("SELECT * from example").fetch_df()
a b c d
0 0 a True 0.529113
1 1 b False 0.780074
2 2 c True 0.004097

Connect Tableau to DuckDB

With the DuckDB jar downloaded and placed in the appropriate folder for Tableau, we can now setup a connection to our small DuckDB database.

  1. Select Other Databases (JDBC) from a To a Server section on the Connect screen.
  2. The connection string should be setup as follows: jdbc:duckdb:///path/path/data/example.db It’s worth noting that this is the fully qualified path to the location of the DuckDB you created programmatically. While there are number of ways to access this path, if you are using VS Code, simply right-click on the file and select Copy Path.

An example of the connection screen is shown below:

It may take a few moments, but Tableau will connect and present the Data Source page.

And the data preview screen:

Finally, an arbitrary data viz:

Some Notes

  • It is possible to create a database table directly from a pandas dataframe via CREATE TABLE example AS SELECT * from df.
  • DuckDB can handle nested data in columns (e.g. JSON). In my case, I didn’t want to flatten this data for the MVP, and inserting the data into DuckDB was seamless.

However, while I was able to connect Tableau to my instance, I was unable to pull data. I suspect that could be two reasons for this:

  1. There may be datatypes supported in DuckDB that Tableau does not support or recognize. For this reason, instead of creating your tables and views on the fly, I would recommend defining your schema before inserting the data into the table as we did above. Simply, explicitly define the schema.
  2. If you still run into issues, avoid using more “complex” datatypes like blob, list, struct, or maps.

Even if do not have a use case that incorporates Tableau into your workflow, I would recommend taking a look at DuckDB if you haven’t already, as the toolkit is extremely impressive.

--

--

Brock Tibert

IS Faculty (Questrom) — Embedded Data/Product/Analytics Advisor — @brocktibert