December 13, 2020

Create a web app for your data science project in under an hour

Prefer to watch this? Check out my Streamlit tutorial on YouTube.

Having a portfolio is crucial to land the data science job of your dreams. As long as you get your hands dirty working with data on interesting use cases, you will impress your future employer. But when it comes to presenting your work, you can always go the extra mile... and turn the project you built into an interactive web app.

Until very recently, this required learning web development and setting up complex React or Angular projects. But no more!

Streamlit is an amazing tool that makes it extremely easy to build an interactive front-end. It is made specifically with data science projects in mind, so it has a lot of useful functionality for showing off your projects.

Let's walk through how to set up a very simple yet very impressive front-end that you can use to show off your project.

If you're thinking, "But Mısra, I don't have a project I can build a web app on yet!", fear not. I have just the thing for you. My Hands-on Data Science course is specifically designed to help you get your first data science project out. In the course, we get hands-on immediately and learn by doing. Go check it out. I promise you will be impressed by how fast you can progress when you put your practical skills to the test and experience working on a real-life project first hand. And then, you can build and be proud of your web app.

Download and install Streamlit

Make sure you have pip installed on your computer. Pip is the package installer for Python. Once you have pip, you can install Streamlit with:


  pip install streamlit

You can check that it is installed by running the command below, which launches Streamlit's demo app in your browser:


  streamlit hello

Now, create a folder that will hold everything for your front-end. Inside that folder, create a folder named data, where you will keep the data you worked with, and a Python file called main.py.

This is what your folder should look like

Decide on a design

Before we go any further and start building the front-end, I think it's a good idea to sketch out what you want it to look like. For that, it helps to know what Streamlit can do in terms of layout. By default, Streamlit displays things linearly, so everything you show goes below the previous thing. In addition, you can have a sidebar, where you can put fields to collect input from the user, like sliders, text fields, drop-down selections, etc.

With a recent update, Streamlit also lets you divide your content into containers and columns. Containers divide the page into horizontal sections, and columns divide it into vertical sections. Columns go inside containers, but you don't have to use columns at all: you can just have one column in the middle.

So, in light of this information, here is what I sketched for my app. I am not using the sidebar because I want to use the whole width of the screen and show 2 columns comfortably.

Quick design on paper

Set up the main structure of the page

Now it’s time to code! Open your main.py file, import streamlit, and right away create all the horizontal sections I want.

 
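  import streamlit as st

  # One container per horizontal section of the sketch
  # (newer Streamlit releases call st.beta_container st.container)
  siteHeader = st.beta_container()
  dataExploration = st.beta_container()
  newFeatures = st.beta_container()
  modelTraining = st.beta_container()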

Let’s add some of the text we want. Here is how you write text:


  st.text('Hello, welcome to my app!')

Very simple, right? There are also a couple of options for writing headings and titles.


  st.title('My title')  # corresponds to an H1 heading
  st.header('My header')  # corresponds to an H2 heading
  st.subheader('My subheader')  # corresponds to an H3 heading

This is what my code looks like after I add some text. Note that the 'with' statement lets me write the code that belongs to a certain container inside that container's block.


  import streamlit as st

  siteHeader = st.beta_container()
  dataExploration = st.beta_container()
  newFeatures = st.beta_container()
  modelTraining = st.beta_container()

  with siteHeader:
    st.title('Welcome to the Awesome project!')
    # Adjacent string literals inside the parentheses are joined into one string
    st.text('In this project I look into ... '
            'And I try ... I worked with the dataset from ...')

  with dataExploration:
    st.header('Dataset: Iris flower dataset')
    st.text('I found this dataset at... '
            'I decided to work with it because ...')

  with newFeatures:
    st.header('New features I came up with')
    st.text("Let's take a look into the "
            'features I generated.')

  with modelTraining:
    st.header('Model training')
    st.text('In this section you can select '
            'the hyperparameters!')

At this point I want to see what my app looks like. So I open a terminal window and make sure to navigate to the folder where I have my main.py file. Then I run:


  streamlit run main.py    
  # or whatever the name of your python file is

The browser window automatically pops up and here is what I see:

That’s a great start! Most of what we’re going to do next is actually normal Python code. Next up is using the data to build the first section. I get my data and put it into the data folder I created inside the folder for this Streamlit application.

By the way, when you make changes to your application, you do not need to run streamlit run main.py again. If you have not closed the browser window, you will see the rerun options on the top right of your screen.

Bring in your data

I want to create a visualization of my dataset. Here is how to do it:


  import pandas as pd  #1

  taxi_data = pd.read_csv('data/taxi_data.csv')  #2
  distribution_pickup = pd.DataFrame(
    taxi_data['PULocationID'].value_counts())  #3
  st.bar_chart(distribution_pickup)  #4

#1 import the pandas library (put this at the beginning of the file)

#2 read the data file

#3 count the number of times each pick-up location ID occurs in the data

#4 display the values in a bar chart
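To see exactly what step #3 computes, here is a minimal sketch of the same counting logic in plain Python using collections.Counter. The list of pickup IDs is made up for illustration. In the app, the values come from the PULocationID column, and pandas' value_counts does this work for you, returning the counts sorted from most to least frequent.

```python
from collections import Counter

# Hypothetical pickup location IDs standing in for taxi_data['PULocationID']
pickup_ids = [132, 161, 132, 237, 161, 132]

# Count how often each ID occurs, most frequent first,
# which mirrors the ordering value_counts() uses
distribution_pickup = Counter(pickup_ids).most_common()
print(distribution_pickup)  # [(132, 3), (161, 2), (237, 1)]
```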

In the next section, you might want to talk about some of the features you came up with. One useful way to show them is a list. Streamlit lets you write Markdown text, so you can decide how your text should look. Here is a simple guide to Markdown syntax: https://www.markdownguide.org/basic-syntax/

In the new features container, I add the explanations:


  st.markdown('* **first feature:** this is the explanation')
  st.markdown('* **second feature:** another explanation')
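If you have more than a couple of features, you may not want one st.markdown call per line. Below is a sketch of a hypothetical helper, features_to_markdown, that builds the same bullet list from a dictionary. The feature names and explanations are the placeholders from the example above.

```python
def features_to_markdown(features):
    """Build a Markdown bullet list with a bold feature name on each line."""
    lines = [f"* **{name}:** {explanation}" for name, explanation in features.items()]
    return "\n".join(lines)


features = {
    "first feature": "this is the explanation",
    "second feature": "another explanation",
}
markdown_text = features_to_markdown(features)
print(markdown_text)
```

Inside the newFeatures container, you could then call st.markdown(markdown_text) once to render the whole list.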

Here is how my app looks now:

Collect user input

In the last section, I want to get some input from the user. I want to show you three different ways to do this:

  • Slider
  • Drop down menu
  • Text input

You can create a slider with st.slider(). You need to set the minimum value the slider can take, the maximum value it can take, the default value the app starts with, and the step, meaning how far the slider moves at a time. I use it to set the max_depth of my machine learning model. To read the user's input, simply assign the return value of the slider function to a variable.


  max_depth = st.slider(
    'What should be the max_depth of the model?',
    min_value=10, max_value=100,
    value=20,
    step=10)
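To make the step parameter concrete, this plain-Python sketch lists the values the slider above can return. It assumes min_value and max_value line up with the step, as 10 and 100 do with a step of 10. It does not call Streamlit.

```python
# Same numbers as the slider above
min_value, max_value, step = 10, 100, 10

# Every value the user can land on: start at min_value and move in
# increments of `step` up to and including max_value
possible_depths = list(range(min_value, max_value + 1, step))
print(possible_depths)  # [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
```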

Similarly, we can have a drop-down menu. In Streamlit it is called a select box. You need to give it the options to display as a list and the index of the option to show by default.

As you can see, you can have numerical and non-numerical values in the same list, so it’s possible to have options like “All” or “No limit”. We read the user's selection with the same technique.


  number_of_trees = st.selectbox(
    'How many trees should there be?',
    options=[100, 200, 300, 'No limit'],
    index=0)
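Because the list mixes numbers and a string, the value you get back keeps whichever type the user picked, so your code has to handle both cases. The sketch below uses a hypothetical parse_limit helper that maps 'No limit' to None. This matches how scikit-learn's max_depth expresses "no limit". Note that n_estimators has no such option and always needs an integer, so for the number of trees you would still need to choose a concrete fallback.

```python
def parse_limit(choice):
    """Turn a selectbox choice into a value a model parameter can accept.

    Hypothetical helper: 'No limit' becomes None, numbers pass through as ints.
    """
    if choice == "No limit":
        return None
    return int(choice)


print(parse_limit(200))         # 200
print(parse_limit("No limit"))  # None
```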

Lastly, you can also get text input from the user. You need to give the text_input function the prompt to show the user, and you can optionally include a default value. I also show the user the available features before they type their input.


  st.text('Here is a list of features: ')
  st.write(taxi_data.columns)
  input_feature = st.text_input(
    'Which feature would you like to input to the model?',
    'PULocationID')
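Free text can be anything. A typo would make the model training step fail with a KeyError when the column is looked up. The sketch below uses a hypothetical validate_feature helper that checks the input against the available columns first. The column list is made up for illustration. In the app, you would pass list(taxi_data.columns), and you could show a message with st.error when the helper returns None.

```python
def validate_feature(name, columns):
    """Return the stripped feature name if it is a known column, else None."""
    cleaned = name.strip()
    return cleaned if cleaned in columns else None


# Hypothetical column names standing in for taxi_data.columns
columns = ["PULocationID", "DOLocationID", "trip_distance"]
print(validate_feature(" PULocationID ", columns))  # PULocationID
print(validate_feature("pickup", columns))          # None
```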

In my original design, I wanted to have 2 columns in the last section. To do that, I define two columns inside the container section.


  selection_col, display_col = st.beta_columns(2)    
  # 2 because I want to create 2 columns. 
  # You can create more if you want to.

But even though I created the columns, the input widgets still take up the whole width. To put them in one column, we need to assign them to that column. To do this, I replace the Streamlit identifier "st" with the name of the column I want to assign the component to.

So my code for the last section looks like this:


  with modelTraining:
    st.header('Model training')
    st.text('In this section you can select '
            'the hyperparameters!')

    selection_col, display_col = st.beta_columns(2)

    max_depth = selection_col.slider(
      'What should be the max_depth of the model?',
      min_value=10,
      max_value=100,
      value=20,
      step=10)

    number_of_trees = selection_col.selectbox(
      'How many trees should there be?',
      options=[100, 200, 300, 'No limit'],
      index=0)

    selection_col.text('Here is a list of features: ')
    selection_col.write(taxi_data.columns)
    input_feature = selection_col.text_input(
      'Which feature would you like to input to the model?',
      'PULocationID')

And the section looks like this:

Lastly, I want to report the performance of the model trained with the user's choices. It is just like any other time you train a machine learning model, except that the hyperparameters are chosen by the user.


  from sklearn.ensemble import RandomForestRegressor
  from sklearn.metrics import mean_absolute_error

  # 'No limit' is not a valid number of trees, so (as an assumption)
  # fall back to scikit-learn's default of 100 trees in that case
  n_trees = 100 if number_of_trees == 'No limit' else number_of_trees
  regr = RandomForestRegressor(max_depth=max_depth,
    n_estimators=n_trees)    #1

  X = taxi_data[[input_feature]]     #2
  y = taxi_data['trip_distance']     #3

  regr.fit(X, y) #4
  prediction = regr.predict(X) #5

  display_col.subheader('Mean absolute error:') #6
  display_col.write(mean_absolute_error(y, prediction)) #7

#1  set up the model with the selected inputs

#2  set the input feature, again selected by the user

#3  set the output feature

#4 and #5 fit the model to the data and get the predictions. (PS: I didn’t split the data into train and test sets, for the sake of keeping the code simple.)

#6 Display an explanation for the metric

#7 Display the calculated metric for the performance of these settings
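Two parts of the code above deserve a closer look. The metric is simply the average absolute gap between the true trip distance and the prediction. Without a train/test split, the model is scored on the same rows it learned from, so the error looks better than it would on new data. Here is a minimal sketch of both ideas in plain Python. The holdout_split and mae helpers are hypothetical stand-ins. In the app itself, you would more likely use scikit-learn's train_test_split and the mean_absolute_error already imported above.

```python
import random


def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and keep the last test_fraction as test data."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]


def mae(y_true, y_pred):
    """Mean absolute error: average of |target - prediction|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


train, test = holdout_split(range(10))
print(len(train), len(test))  # 8 2
print(mae([2.0, 4.0, 6.0], [3.0, 4.0, 4.0]))  # 1.0
```

With a split in place, you would fit on the training rows and report the error on the test rows only.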

Optimize your app's run time

One other very nice feature of Streamlit is caching. You might not notice this when working with a small application built on a small amount of data, but every time the user changes an input, the whole script runs again from the top. If your application does a lot of calculations, works with large amounts of data or makes calls to a database to retrieve data, it will quickly become too slow to interact with.

As a solution, the Streamlit team developed caching. It is very simple to set up: when you mark a piece of code to be cached, the application saves its result and does not re-run it unless the input given directly to that piece of code has changed.

We can do this for reading the data in our case because once it is read, we do not make any changes on the data and we do not need it to be re-read from its source.

All we have to do is turn the data-reading code into a function and decorate it with the cache decorator.


  # Newer Streamlit releases replace st.cache with st.cache_data
  @st.cache
  def get_data():
    taxi_data = pd.read_csv('data/taxi_data.csv')
    return taxi_data

Then we can call this function to load our data into the application:


  taxi_data = get_data()
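The idea behind the cache decorator is memoization: remember the result for a given input and skip the work when the same input comes back. Streamlit's own cache works differently under the hood. To show only the principle, this sketch uses Python's built-in functools.lru_cache instead. The path argument and the call counter are illustrative additions, and nothing is actually read from disk.

```python
from functools import lru_cache

calls = 0  # counts how many times the function body actually runs


@lru_cache(maxsize=None)
def get_data(path):
    """Stand-in for an expensive load; lru_cache remembers results per path."""
    global calls
    calls += 1
    return f"contents of {path}"


get_data("data/taxi_data.csv")
get_data("data/taxi_data.csv")  # same input: served from the cache
get_data("data/other.csv")      # new input: the body runs again
print(calls)  # 2
```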

Personalize

Streamlit is designed to get interactive front-ends out there quickly, which is why there is not much flexibility when it comes to how the application looks. But there is still something we can do to personalize it a little. All you need to do is add this little section to your code.


  st.markdown(
      """
      
      """,
      unsafe_allow_html=True
  )

By filling the gap between the triple quotes with CSS code, you can personalize your application. I especially recommend making the content area wider if you're going to use multiple columns, so the application takes up more of the screen and is not crammed into the center of the page.

One example is changing the background color:


  <style>
    .main {
      background-color: #FA6B6D;
    }
  </style>
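If you end up with more than one or two rules, writing the CSS by hand inside the triple-quoted string gets fiddly. The sketch below uses a hypothetical helper, build_style_block, that assembles the style block from a dictionary. The .main selector comes from the example above. Class names in Streamlit's generated HTML can change between versions, so check any selector in your browser's developer tools before relying on it.

```python
def build_style_block(rules):
    """Turn {selector: {property: value}} into a <style> block for st.markdown."""
    parts = ["<style>"]
    for selector, declarations in rules.items():
        body = " ".join(f"{prop}: {value};" for prop, value in declarations.items())
        parts.append(f"{selector} {{ {body} }}")
    parts.append("</style>")
    return "\n".join(parts)


css = build_style_block({".main": {"background-color": "#FA6B6D"}})
print(css)
```

You would then pass the result to st.markdown(css, unsafe_allow_html=True). If all you need is a wider page, newer Streamlit versions also offer st.set_page_config(layout="wide").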

Let's wrap this up here. I will prepare a separate guide on how to deploy and share your Streamlit apps with the world.