A Streamlit dashboard: 5 steps from zero to Docker-deployed
Data visualization is a vital activity in data science: there is no point in gathering data, making it pristine clean, building complex queries to answer business questions and find insights, or building an ML model if you cannot show your findings and predictions in a compelling, functional way.
Data science outputs often come in tabular form with thousands of rows. Showing something like that in a business presentation can be quite a letdown, even when the information is relevant. People react better to ordered, colored, and graphically coherent information, so if you present your analysis or results that way, the odds that they will be used to make key decisions improve dramatically.
To visualize complex data there is a wide variety of tools, and in this post I am going to use Streamlit. Why? Well… The objective of this post is not to compare visualization libraries but to help you build a quick framework for deploying a Hello world! Streamlit dashboard with Docker, which you can use as a base to build on. That said, from my personal experience, I can tell you these good things about Streamlit.
- You can build a dynamic dashboard that runs in a browser without writing a single line of front-end code: no JavaScript, no endpoints, no controllers, no AJAX. If you are like me (I hate writing front-end code), you will love the fact that only Python is needed to deploy a basic functional Streamlit dashboard.
- There are plenty of components to use. If you need a button, a collapsing panel, numeric and date inputs, drop-downs, multi-selects, or radio buttons, they are all included. You can instantiate each one with a line of Python code, and they look great out of the box. You can also lay out elements in columns. Streamlit handles the responsive heavy lifting for you.
- Its visual style is customizable. Yes, I said you are not required to write front-end code; however, if you want to, you can customize the look and feel of your dashboard with custom CSS. This is only for the very picky, because Streamlit components look great out of the box.
- Streamlit is only a visualization tool, so it works with whatever other Python libraries you use to retrieve, handle, and process your data. In this post I will use Pandas and Seaborn to read and plot static data; however, you can use lists, dictionaries, and bare Pyplot scripts. You could just as well query MySQL to gather real-time data and show it right away.
The problem
To provide some context, let's assume the task at hand is to build a dashboard for some marketing KPIs that shows a weekly plot describing how they behave and also provides some stats. Something like this, and please forgive my ugly handwriting.
Step 0. Prerequisites
From now on I assume you know nothing about Streamlit but are familiar with Python, virtual environments (such as virtualenv), and Docker.
Step 1. Installation
The project file structure I will use is super simple.
streamlit-demo/
    app/
        __init__.py
        dashboard.py
    .gitignore
    Dockerfile
    requirements.txt
After creating the structure I will install some libraries with the following commands, assuming you are using Linux.
# clone and move to the directory
cd streamlit-demo
# create virtual environment and activate it
virtualenv env --python=python3
source env/bin/activate
# install libraries
pip install streamlit
pip install pandas
pip install seaborn
pip install matplotlib
Step 2. Say Hello World
Let's allow Streamlit to say Hello world! In the dashboard.py file write these lines.
import streamlit as st
st.title('Streamlit dashboard')
st.text('Hello world')
Now, from the project root directory (streamlit-demo), run the following command.
streamlit run app/dashboard.py
And that is all! You will see a message stating that the Streamlit app is ready to be seen in your browser like this.
You have to admit it is great. Streamlit took care of the whole process: creating a web server, launching it, and generating the HTML to render the title and text components. Streamlit even generates the JS code whenever it is required.
This app is quite useless, though. Let's change that now.
Step 3. Gathering some data
To make things easier I am just generating some random data; however, you could pull it from a database, build it with SQL from other sources, or use whatever fancy process you can think of. As long as it ends up as a Pandas dataframe, it will work.
The following code will generate some random data containing a daily record of impressions and clicks.
import streamlit as st
import pandas as pd
import numpy as np

# make sure you get the same data each time
np.random.seed(1)

def build_dataframe():
    # create two columns with random data
    data = {
        'impressions': np.random.randint(low=111, high=10000, size=100),
        'clicks': np.random.randint(low=0, high=1000, size=100)
    }
    df = pd.DataFrame(data)
    # add a date column and calculate the weekday of each row
    df['date'] = pd.date_range(start='1/1/2018', periods=100)
    df['weekday'] = df['date'].dt.dayofweek
    return df

st.title('Streamlit dashboard')
df = build_dataframe()
st.dataframe(df.describe())
The important part in this snippet is the st.dataframe(df.describe()) line. This is the way to tell Streamlit to render a dataframe. With this line, Streamlit will generate the required HTML to properly display a Pandas dataframe. This code will yield this new web app.
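For reference, here is a minimal sketch of what describe() hands over to st.dataframe, using a tiny made-up frame (the toy values are an assumption for illustration, not the dashboard data). The result is itself a dataframe with one row per statistic, which is why Streamlit can render it like any other table.

```python
import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "impressions": [100, 200, 300, 400],
    "clicks": [10, 20, 30, 40],
})

# describe() summarizes each numeric column and returns a new DataFrame
# whose index holds the statistic names.
stats = toy.describe()

print(stats.index.tolist())
print(stats.loc["mean", "clicks"], stats.loc["max", "impressions"])
```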
It is getting interesting. However, I am still far from what I set out to build: the KPI selection logic and the weekday grouping are missing. Let's do the KPI selector first by adding this code below the st.title statement.
selected_kpi = st.selectbox(
    'Select a KPI: ',
    ['clicks', 'impressions']
)
And this is the result.
OMG! A couple of lines and the KPI selector is done and working!
Those lines are powerful. The st.selectbox statement builds the HTML selector and also assigns the selected value to a Python variable you can use. How awesome is that?
Now I am going to use some Pandas magic to group the dataframe by weekday using the following code.
# numeric_only=True skips the date column, which cannot be summed
weekday_df = df.groupby('weekday').sum(numeric_only=True)
If you modify dashboard.py to show weekday_df instead of the source dataframe description and rerun the Streamlit app (either by pressing the r key in your browser or by using the rerun option in the hamburger menu in the upper right), you get this.
Pandas Magic! That line grouped the source dataframe by weekday and aggregated the result with a sum. Looking good.
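To see what the grouping does, here is a small sketch on a hand-made frame (the toy values are assumptions for illustration). Rows that share a weekday are collapsed into one row holding the sum of each numeric column, and the weekday becomes the index of the result. On recent pandas versions a date column like the one in the dashboard data cannot be summed, so the sketch passes numeric_only=True to keep it out of the aggregation.

```python
import pandas as pd

# Ten consecutive days starting on Monday, 1 January 2018 (toy values).
toy = pd.DataFrame({
    "date": pd.date_range(start="2018-01-01", periods=10),
    "clicks": list(range(10, 110, 10)),  # 10, 20, ..., 100
})
toy["weekday"] = toy["date"].dt.dayofweek  # Monday=0 ... Sunday=6

# One row per weekday; the date column is left out of the sum.
totals = toy.groupby("weekday").sum(numeric_only=True)

# reset_index() turns the weekday index back into a regular column.
tidy = totals.reset_index()

print(tidy)
```

Monday, Tuesday, and Wednesday each appear twice in these ten days, so their rows hold the sum of two values, while the other weekdays keep a single value.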
The final task is to generate a weekly plot per KPI. For that, I need to pick only the required KPI column using the selected_kpi variable, make a bar plot using Seaborn, and tell Streamlit to display the plot. The dashboard.py file with the required code updates looks like this.
import matplotlib.pyplot as plt
import streamlit as st
import pandas as pd
import numpy as np
import seaborn as sns

np.random.seed(1)

def build_dataframe():
    data = {
        'impressions': np.random.randint(low=111, high=10000, size=100),
        'clicks': np.random.randint(low=0, high=1000, size=100)
    }
    df = pd.DataFrame(data)
    # add a date column and calculate the weekday of each row
    df['date'] = pd.date_range(start='1/1/2018', periods=100)
    df['weekday'] = df['date'].dt.dayofweek
    return df

def build_weekly_bar_plot(df: pd.DataFrame, kpi: str):
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    sns.barplot(x='weekday', y=kpi, data=df, ax=ax)
    return fig, ax

st.title('Streamlit dashboard')
selected_kpi = st.selectbox(
    'Select a KPI: ',
    ['clicks', 'impressions']
)
df = build_dataframe()
# numeric_only=True skips the date column, which cannot be summed
weekday_df = df.groupby('weekday').sum(numeric_only=True)
# required to get a column called weekday from the index
weekday_df.reset_index(inplace=True)
weekday_df['weekday'] = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'Sat', 'Sun']
weekday_fig, weekday_ax = build_weekly_bar_plot(weekday_df, kpi=selected_kpi)
st.pyplot(weekday_fig)
st.dataframe(df.describe())
The new code comes after the weekday grouping and in the build_weekly_bar_plot function. Here is what is happening.
- Literal weekday names replace the integer weekdays; 0 stood for Monday, 1 for Tuesday, and so on.
- build_weekly_bar_plot uses Seaborn's barplot function to create the required weekly plot, with the column named by selected_kpi as the bar heights on the y-axis and the weekday names as the x-axis values.
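As a side note, assigning a seven-item list works here because all seven weekdays are present after grouping. A slightly more defensive sketch maps each integer to its label through a dictionary, so it still works when a weekday is missing. The WEEKDAY_LABELS name and the sample values below are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical lookup table; pandas numbers weekdays from Monday=0 to Sunday=6.
WEEKDAY_LABELS = {0: "Mo", 1: "Tu", 2: "We", 3: "Th", 4: "Fr", 5: "Sat", 6: "Sun"}

# 1 January 2018, the first date in the dashboard data, was a Monday.
first_day = pd.Timestamp("2018-01-01").dayofweek

# A grouped frame where some weekdays are absent (toy values).
partial = pd.DataFrame({"weekday": [0, 2, 6], "clicks": [90, 130, 70]})

# map() looks up each integer in the dictionary, row by row.
partial["weekday"] = partial["weekday"].map(WEEKDAY_LABELS)

print(first_day, partial["weekday"].tolist())
```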
This code yields this dashboard… And it looks like we are done here. The dashboard shows a per-weekday view of the selected KPI and some general stats of the whole data.
Nevertheless, it looks ugly. Can it be tweaked a bit so it looks more decent? I bet it can. Let's update the last lines of code as follows.
with st.expander("Show weekday values table"):
    st.dataframe(weekday_df)
with st.expander("Show full stats"):
    st.dataframe(df.describe())
And this is the result.
Nice! The dashboard now has some nice collapsible panels that show the additional data below the main plot. The st.expander command does the trick.
The final dashboard.py file looks like this now.
import matplotlib.pyplot as plt
import streamlit as st
import pandas as pd
import numpy as np
import seaborn as sns

np.random.seed(1)

def build_dataframe():
    data = {
        'impressions': np.random.randint(low=111, high=10000, size=100),
        'clicks': np.random.randint(low=0, high=1000, size=100)
    }
    df = pd.DataFrame(data)
    # add a date column and calculate the weekday of each row
    df['date'] = pd.date_range(start='1/1/2018', periods=100)
    df['weekday'] = df['date'].dt.dayofweek
    return df

def build_weekly_bar_plot(df: pd.DataFrame, kpi: str):
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    sns.barplot(x='weekday', y=kpi, data=df, ax=ax)
    return fig, ax

st.title('Streamlit dashboard')
selected_kpi = st.selectbox(
    'Select a KPI: ',
    ['clicks', 'impressions']
)
df = build_dataframe()
# numeric_only=True skips the date column, which cannot be summed
weekday_df = df.groupby('weekday').sum(numeric_only=True)
# required to get a column called weekday from the index
weekday_df.reset_index(inplace=True)
weekday_df['weekday'] = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'Sat', 'Sun']
weekday_fig, weekday_ax = build_weekly_bar_plot(weekday_df, kpi=selected_kpi)
st.pyplot(weekday_fig)
with st.expander("Show weekday values table"):
    st.dataframe(weekday_df)
with st.expander("Show full stats"):
    st.dataframe(df.describe())
Let's stop for a moment to fully grasp what is going on here. Assume for a moment that there is no Python and no Streamlit, and you must develop a similar-looking dashboard… Perhaps using MySQL, PHP, Angular, and some web server such as Apache. Just think about the work required; it would include some of these tasks.
- Develop the MySQL queries to gather the data
- Write the PHP code to run the MySQL queries, retrieve the data, and manipulate it
- Perhaps write the API code to receive POST or GET requests and return data in JSON format
- Write the Angular code (controllers, views) to send the requests and receive the data
- Design components (or reuse pre-made ones) to render the visual aspects of the dashboard
- Configure the Apache server
Ugly. That sounds like a week-long or even month-long job, assuming you have experience with all these technologies.
However, with Streamlit you can have this dashboard up and running locally in about 10 minutes, or 30 minutes if you are new to Streamlit, know Python, and are really eager to understand what is going on.
Streamlit just saved you a ton of time.
So far so good; however, the server is only running locally. Let's deploy it using Docker.
Step 4. Build a Dockerfile
First, a heads-up. Docker base images can run into issues when pip libraries are updated (or is it the other way around?). A couple of months ago, when I first built a Dockerized Streamlit dashboard, the usual procedure worked just fine: create and activate the virtual environment, install the dependencies with pip, and freeze them into a requirements.txt file for the Dockerfile to use. While writing this post, however, that procedure did not work with the Docker base image. So, to spare you some time and headaches, I provide the specific requirements.txt you can use. It is in the Git repository linked at the end of the post.
Now, back to business. This is the Dockerfile, dive in.
FROM python:3.8.0-slim
ENV VIRTUAL_ENV=/opt/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
RUN apt-get update \
&& apt-get install g++ -y \
&& apt-get install gcc -y \
&& apt-get install -y default-libmysqlclient-dev \
&& apt-get clean
# Install dependencies:
COPY requirements.txt .
RUN pip install -r requirements.txt
# Run the application:
COPY app/ app/
# Requires a .env file in the project root; remove this line if you do not have one
COPY .env .
# Expose the Streamlit app port
EXPOSE 8501
WORKDIR app/
# Run the Streamlit app when the container starts
ENTRYPOINT ["streamlit", "run"]
CMD ["dashboard.py"]
The Dockerfile creates a virtual environment the proper way, installs the requirements, exposes port 8501 so you can reach the Streamlit app running inside the Docker container from the host machine, and finally sets the command that launches Streamlit.
Create an image tagged streamlit-demo with this command. Streamlit depends on several libraries, so the build will take a while depending on your machine. Have a bit of patience.
docker build -t streamlit-demo .
I have used this Dockerfile for quite a few data analytics projects and it works just fine. However, it is not optimized for size or performance; if you need something more tailored, some improvements will be required.
Step 5. Deploy to a Docker container!
It has been quite a ride, but it is almost over. The remaining step is to run a Docker container with the image built in the previous step and check in a browser that the dashboard is working.
To run the container you can use the following command. Since I am using the detached mode with the -d flag you will not see much feedback in your console.
docker run -p 8501:8501 -t -d --name dashboard streamlit-demo
The important bit of this command is the -p 8501:8501 flag, which indicates that it should map the host 8501 port to the same container port.
So, drum roll please… 🥁
Head to http://localhost:8501/
If you see your dashboard then we are done!
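If you prefer to check from a script rather than a browser, a minimal sketch with the Python standard library could look like this. The dashboard_is_up name is hypothetical, and the function only checks that something answers over HTTP on the mapped port, not that the dashboard rendered correctly.

```python
import urllib.request


def dashboard_is_up(url: str = "http://localhost:8501/", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at url with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError are subclasses of OSError).
        return False
```

Calling dashboard_is_up() right after docker run may return False for a moment while Streamlit starts, so retrying for a few seconds is reasonable.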
Conclusions
Streamlit is a go-to option if you want to deploy a useful dashboard in a short time, without the hassle of building it from scratch with technologies such as PHP and Angular. One line of Python code is enough to build a fully functional HTML component.
An important remark is that Streamlit runs top to bottom: code that comes first in the script is rendered first in the dashboard. Also, when an input changes, by means of a radio button for instance, the whole script is rerun and the dashboard is re-rendered.
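To make this rerun model concrete, here is a plain-Python sketch that imitates it without Streamlit. It is a simplification and not how Streamlit is implemented: run_script is a hypothetical stand-in for dashboard.py, and widget_state stands for the values the browser widgets currently hold. Every interaction calls the whole function again, and the output is rebuilt in the order the statements appear.

```python
def run_script(widget_state: dict) -> list:
    """Hypothetical stand-in for dashboard.py: runs top to bottom, returns what was rendered."""
    rendered = []
    rendered.append("title: Streamlit dashboard")
    # A widget call returns its current value, or a default on the first run.
    kpi = widget_state.get("kpi", "clicks")
    rendered.append(f"selectbox: {kpi}")
    # Everything below the widget is recomputed with the new value.
    rendered.append(f"bar plot: {kpi} per weekday")
    return rendered


first_run = run_script({})                       # the page is opened
second_run = run_script({"kpi": "impressions"})  # the user picks another KPI

for line in second_run:
    print(line)
```

This is also why the fixed seed at the top of dashboard.py matters: the random data is regenerated on every rerun, and the seed keeps the numbers stable between interactions.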
Streamlit is designed as an internal tool. If you want to show some data to your stakeholders, colleagues, or other internal users, Streamlit will work like magic. However, if you are building a tool for the whole world to use, perhaps you should pass on Streamlit, since it has some limitations, such as the lack of an out-of-the-box login feature. Nevertheless, Streamlit is a terrific tool.
You can grab the working code from my Github.
If you are interested in going further with Streamlit try this routing and auth approach.
If you found this post useful, please follow and share! Any comments and input are welcome.