Your Data Science Visualizations Will Never Be The Same – Plotly & Dash

Using Plotly and Dash to create interactive dashboards

Photo by Isaac Smith on Unsplash

Not so long ago, I wrote a simple intro to four Python Data Visualization libraries where I showcased their pros and cons, and used practical examples to show what they are capable of.

As we’re going to get deeper into the ones that I like the most, I highly encourage you to check that article first, as this one will expand on what was shown there:

Building Interactive Data Visualizations with Python – The Art of Storytelling

Today we’ll focus on Plotly[1] and Dash[2]. Why two? Because they go hand-in-hand. As I stated in the article linked above, "Dash isn’t a plotting library per se. It’s an amazing framework used to generate dashboards."

So Plotly is the library we use to plot, and Dash is the framework we use to generate cool, interactive dashboards from those plots.

Here’s the set of steps we’ll follow to build today’s dashboard:

  • Setup and installation – to get us in the proper state.
  • Some simple use cases – to show how Plotly works.
  • Building a dashboard with Dash – to create the best dashboards.
  • Conclusions – to wrap up the story and see the results.

Before going deeper, we need to talk about the data. We need some sort of data to be able to visualize it, right? Keeping up with most of my latest Medium content, I’ll be focusing on sports and, more concretely, football (soccer).

I’ll be using Statsbomb’s free data[3] from the 2015–16 LaLiga campaign.

There’s a lot of data from that season but I want to visualize Futbol Club Barcelona’s players’ performance focusing mostly on attacking terms: shots, goals, assists…

The purpose might differ based on the analyst’s position: are you a Real Madrid analyst? Then I’m sure you’ll want to decipher how your team can stop Leo Messi (spoiler: you can’t).

But if you work within the Barça organization, you might just want to check your players’ numbers and see where some perform better than others.

Whatever it is, always make sure you define your goals before creating any dashboard – there’s so much info you can visualize that you have to purposely pick the plots you want to look at.

And always aim for simplicity; non-technical people will have to draw conclusions from your dashboards.

Setup and Installation

I like to keep things ordered and structured, so the first thing we’ll do is create a new directory wherever you want your app to live. I’ll create it on my Desktop, for simplicity. Here are the two commands I run in a terminal:

$ cd ~/Desktop
$ mkdir plotly-dash

Now, the next natural step is to create a new Python environment within the new directory. I’ll use pipenv [4] but you can use your virtualenv management tool of preference (or none).

If you haven’t got pipenv installed on your machine, run this command first:

$ pip install --user pipenv

Then, create the environment:

$ cd plotly-dash
$ pipenv shell

This will create a new environment and automatically activate it. Anything you install from that terminal from now on is installed in that environment only.

So let’s start installing libraries using pip:

(plotly-dash) $ pip install dash pandas statsbombpy

Yep, installing these three is more than enough. Each pulls in its own dependencies and we’re going to use some of those directly: Plotly comes along with Dash, and NumPy with pandas.

With everything set up, we’re now ready to start exploring Plotly.

Visualizing Data with Plotly

My recommendation here is to test it from a Jupyter notebook, as it will make your development phase more fluid. If you go that route, you need to install it too – I promise it’s the last installation we run – and then launch it:

(plotly-dash) $ pip install notebook
... (installation outputs)

(plotly-dash) $ jupyter notebook

As always, we’ll need to prepare the data and we’ll create a new notebook called plotly.ipynb. To avoid extremely large notebooks and files, I like to modularize my code. For that reason, I created a src directory within the project folder and added two new files there: functions.py and classes.py. The structure now looks like:

- plotly-dash/
    - src/
        - classes.py
        - functions.py
    - plotly.ipynb

The first function I’ll create is going to be called prepare_team_data() and will return events, shots, and assist data from the specified team (in our case, Barcelona).

As the function itself is not the focus today, because we want to concentrate on plotting and creating dashboards, I won’t walk through its code here. You’ll find the link to the whole code in the Resources section[5].
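For readers who want a rough idea of the kind of filtering such a function might do, here is a minimal sketch. It is not the code from the repository: the helper name split_team_events is hypothetical, it works on an events table that has already been loaded, and the column names (team, type, shot_outcome, pass_goal_assist) are assumptions based on how statsbombpy usually flattens event data.

```python
import pandas as pd


def split_team_events(events: pd.DataFrame, team: str):
    """Hypothetical helper: return (events, shots, assists) for one team."""
    team_events = events[events["team"] == team].copy()

    # Shots: one row per shot, plus a boolean flag marking the goals.
    shots = team_events[team_events["type"] == "Shot"].copy()
    shots["goal"] = shots["shot_outcome"] == "Goal"

    # Assists: passes explicitly flagged as goal assists.
    # Missing flags compare as False, so only True rows are kept.
    assists = team_events[team_events["pass_goal_assist"].eq(True)].copy()

    return team_events, shots, assists


# Toy data standing in for the real StatsBomb events (assumed columns).
toy = pd.DataFrame({
    "team": ["Barcelona", "Barcelona", "Barcelona", "Real Madrid"],
    "player": ["A", "B", "A", "C"],
    "type": ["Shot", "Pass", "Shot", "Shot"],
    "shot_outcome": ["Goal", None, "Saved", "Goal"],
    "pass_goal_assist": [None, True, None, None],
})

team_events, shots, assists = split_team_events(toy, "Barcelona")
print(len(team_events), len(shots), int(shots["goal"].sum()), len(assists))
```

The real function also has to fetch the matches and derive columns such as x, y and the shot time, which this sketch leaves out.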

# Third-party libraries
import pandas as pd
from statsbombpy import sb

# Custom modules
from src.functions import prepare_team_data

events, shots, assists = prepare_team_data('Barcelona')
shots.head()

And here’s a snapshot of what the shots DF looks like.

shots DF screenshot - image by the author

Good, let’s start with shots then. I want to plot a player’s shot distribution, to see where he shoots from most and where his goals come from. To do this, I’ve created a FootballPitch class in the classes.py module.

This class allows us to plot a complete football pitch, just the attacking half, or even a heatmap, as we’ll be doing.

Again, you can find the code in the GitHub link[5] in the Resources section at the bottom of this article. But we’re going to inspect it a little bit because here’s where we’ve used a lot of Plotly’s gifts.

The class has basically two methods: plot_pitch() and plot_heatmap(). As we’re first interested in displaying the player’s shots, let’s start with the first one, dividing it into small code chunks.

Note that you will see some variables and class attributes that we haven’t assigned any value to. These are either function parameters or attributes initialized when the object is created.

First things first: let’s declare the essential variables the function will use.

# Fig to update
fig = go.Figure()

# Internal variables
self.height_px = self.pitch_width*10*zoom_ratio
self.width_px = self.pitch_length*10*zoom_ratio

pitch_length_half = self.pitch_length/2 if not self.half else 0
pitch_width_half = self.pitch_width/2
corner_arc_radius = 1

centre_circle_radius = 9.15

goal = 7.32
goal_area_width = goal + (5.5*2)
goal_area_length = 5.5
penalty_area_width = goal_area_width + (11*2)
penalty_area_length = goal_area_length + 11
penalty_spot_dist = 11
penalty_circle_radius = 9.15

Now that we have the figure declared, what we’ll do over and over again is add traces or shapes to it to customize it as we want. So, for example, the first thing the function does is plot a rectangle, which is the pitch itself:

fig.add_trace(
    go.Scatter(
        x=[0, self.pitch_length, self.pitch_length, 0, 0], 
        y=[0, 0, self.pitch_width, self.pitch_width, 0], 
        mode='lines',
        hoverinfo='skip',
        marker_color=line_color,
        showlegend=False,
        fill="toself",
        fillcolor=bg_color
    )
)

Here, we add a trace which is a scatter plot with mode lines – meaning we want a line, not a real scatter plot with independent dots. The parameters are pretty self-explanatory, such as x and y (the data we want to plot), the colors… The hoverinfo parameter determines what is shown when we hover our mouse over these lines. As the pitch is only part of the background and doesn’t tell us anything about the data we want to analyze, I’m setting it to skip.

Then we set some extra configurations into the figure’s layout:

fig.update_layout(
    yaxis_range=[-self._vertical_margin, self.pitch_width + self._vertical_margin], 
    xaxis_range=[-self._horizontal_margin, self.pitch_length + self._horizontal_margin],
    height=self.height_px,
    width=self.width_px,
    plot_bgcolor='rgba(0,0,0,0)',
    xaxis=dict(showgrid=False, visible=False),
    yaxis=dict(showgrid=False, visible=False)
)

That gives us the following result:

Football pitch (grass only) - image by the author

And we now have our pitch plotted. Not really meaningful… Yet.

Plotting in Plotly is really this easy! By adding some more traces and shapes into the plot, here’s what my pitch background ends up looking like:

Football pitch - image by the author

Now, you might not be interested in displaying a football pitch. That’s why I didn’t put all the code here… But great dashboards are the result of creativity and skills, and plotting a pitch is a great way to display football events that happen on the pitch (if we’re interested in location).

So let’s get going and start displaying real data!

As we want to display shots – and goals – a scatter plot looks like a fair option to use. Remember that we already have the data prepared, we just have to filter it and display it.

Let’s plot Leo Messi’s shots and goals:

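import plotly.graph_objects as go
from src.classes import FootballPitch
# Assumption: get_player_shots is defined in src/functions.py with the other helpers
from src.functions import get_player_shots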

player = 'Leo Messi'

pitch = FootballPitch(half=True)
fig = pitch.plot_pitch(False, bg_color='#C1E1C1') 

player_shots = get_player_shots(player, shots.copy(), pitch)
scatter_colors = ["#E7E657", "#57C8E7"]

for i, group in enumerate([True, False]):
    fig.add_trace(go.Scatter(
        x=player_shots[player_shots['goal'] == group]['x'],
        y=player_shots[player_shots['goal'] == group]['y'],
        mode="markers",
        name='Goal' if group else 'No Goal',
        marker=dict(
            color=scatter_colors[i],
            size=8,
            line=dict(
                color='black',
                width=1
            )
        ),
    ))

fig.update_layout(
    title='Shot distribution'
)

The first part is self-explanatory: we just declare variables, instantiate the pitch, store the figure in the fig variable, and run a function that filters the shots data frame to return only that player’s shots.

Then, in a 2-iteration loop, we add a scatter trace twice: first for the shots that resulted in a goal (displayed in yellow) and then for the shots that didn’t (displayed in blue). The result:

Leo Messi's shot and goal distribution in 2015/16 - image by the author

And what makes Plotly amazing is that this plot is fully interactive. We can hover our mouse over the points to see the exact shot locations, or click a legend entry to hide the non-goal shots and inspect just the scoring ones…

Let’s go ahead now and build a line plot. It will be interactive of course, and we’ll use it to inspect the player’s shots by quarter and also to compare it with his teammates’ and the team’s average.

To do so, we’ll start by grouping shots in a quarterly manner (in 15-minute chunks) for each player. The next part will be plotting the values themselves and playing with line opacity to highlight the current player (Messi).

from plotly.subplots import make_subplots

player = 'Leo Messi'
max_shots = 0
fig = make_subplots()

for p in shots.player.unique():
    player_shots = get_player_shots(p, shots)

    xy = 15 * (player_shots[['float_time', 'minutes']]/15).round()
    xy = xy.groupby(['float_time']).count()[['minutes']]

    max_shots = xy.minutes.max() if xy.minutes.max() > max_shots else max_shots

    fig.add_trace(
        go.Scatter(
            name=p,
            x = xy.index, 
            y = xy.minutes,
            mode='lines',
            opacity=1 if p == player else 0.2
        )
    )
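The binning step in the loop is easier to follow on a toy table. In the sketch below the data is made up, and the minutes column is just a copy of float_time so that there is something to count.

```python
import pandas as pd

# Toy shot times (in minutes) for a single player.
player_shots = pd.DataFrame(
    {"float_time": [3.0, 10.0, 14.0, 20.0, 44.0, 50.0, 88.0]}
)
player_shots["minutes"] = player_shots["float_time"]

# Snap every value to the nearest multiple of 15, as in the loop above.
xy = 15 * (player_shots[["float_time", "minutes"]] / 15).round()

# Count how many shots landed on each multiple.
counts = xy.groupby("float_time").count()["minutes"]
print(counts.to_dict())  # {0.0: 1, 15.0: 3, 45.0: 2, 90.0: 1}
```

Because the values are rounded rather than floored, the point at 15 collects the shots taken roughly between minutes 7.5 and 22.5, not those between 0 and 15. That is worth keeping in mind when reading the x-axis.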

Now that we have all players ready, we’ll add the team’s average as a dashed line. The code does exactly the same as the snippet above but uses team-level data.

# Add team's avg
xy = 15 * (shots[['float_time', 'minutes']]/15).round()
xy = xy.groupby(['float_time']).count()[['minutes']]/len(shots.player.unique())

fig.add_trace(
    go.Scatter(
        name="Team's Average",
        x = xy.index, 
        y = xy.minutes,
        line = go.scatter.Line(dash='dash'),
        marker=None,
        mode='lines'
    )
)

And we’ll end up adding some styling to the layout:

fig.update_xaxes(range=[0, 91])
fig.update_layout(
    #title='Shots by Quarter',
    margin=dict(l=20, r=20, t=5, b=20),
    xaxis = dict(
        tickmode = 'array',
        tickvals = xy.index.values
    ),
    height=200,
    plot_bgcolor="#F9F9F9", 
    paper_bgcolor="#F9F9F9",
    yaxis_range=[-3,max_shots+5]
)

The result:

The green, highlighted line is Leo Messi’s data (as the label shows when I hovered over the 60th-minute shot count). For some reason, maybe due to fatigue, Messi’s shots decreased during the 60-75 minutes but they increased in the last minutes of the game.

Most of the team’s shot counts drop during the last 15 minutes, but Leo’s go the other way. This shows a lot about his impact on the team and his desire to win.

Anyway, enough for the intro. We’ve managed to plot two different plots and also create an amazing background for our plots. I think we’ve covered more than Plotly’s basics.

Creating the Dashboard

A dashboard is just a combination of plots displayed in an ordered and attractive way. And we already have the plots created – we did it in the last section – so we just need to display them.

Now, it isn’t as straightforward. We’ll have to add some changes to the code snippets shared above but I promise they’ll be tiny.

To complete the dashboard, I’ll add some more plots and functionalities to make it fully interactive.

Having Dash already installed, I’ll create a new file called app.py:

- plotly-dash/
    - src/
        - classes.py
        - functions.py
    - plotly.ipynb
    - app.py

And the file’s template starts out this simple:

from dash import html, Dash, dcc, Input, Output, callback

app = Dash(__name__) 

if __name__ == '__main__':
    app.run(debug=True)

If you went on and executed the file (python app.py), you’d get a message in your terminal like this one:

(plotly-dash) $ python app.py
Dash is running on http://127.0.0.1:8050/

 * Serving Flask app 'app'
 * Debug mode: on

Go ahead and navigate to http://127.0.0.1:8050/. You’ll see a blank page, but that’s actually your dashboard.

Let’s start adding stuff, shall we? Take a look at the next code.

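import plotly.graph_objects as go
from src.classes import FootballPitch
# Assumption: the helpers live in src/functions.py and the data is loaded
# once at module level, mirroring the notebook call
from src.functions import get_player_shots, prepare_team_data

EVENTS, SHOTS, ASSISTS = prepare_team_data('Barcelona')

@callback(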
    Output('shot_distribution', 'figure'),
    Input('player_dropdown', 'value')
)
def create_shot_distribution(player):
    pitch = FootballPitch(half=True)
    fig = pitch.plot_pitch(False, bg_color='#C1E1C1', zoom_ratio=0.8) 

    player_shots = get_player_shots(player, SHOTS.copy(), pitch)

    scatter_colors = ["#E7E657", "#57C8E7"]

    for i, group in enumerate([True, False]):
        fig.add_trace(go.Scatter(
            x=player_shots[player_shots['goal'] == group]['x'],
            y=player_shots[player_shots['goal'] == group]['y'],
            mode="markers",
            name='Goal' if group else 'No Goal',
            marker=dict(
                color=scatter_colors[i],
                size=8,
                line=dict(
                    color='black',
                    width=1
                )
            ),
        ))

    fig.update_layout(
        margin=dict(l=20, r=20, t=5, b=20),
    )

    return fig

By now, it should sound familiar. It’s exactly the same code we used to display Messi’s shots… But now, instead of defining the player to be Leo Messi, it’s the function argument.

And where does this argument come from? Just above the function declaration, we have the callback decorator. These callbacks are what make Dash’s dashboards interactive.

We use them to determine the inputs and outputs of the associated app component. In this case, we’re saying that the function needs the player parameter which will come from the element called player_dropdown (which we haven’t defined yet).

As for the output, we made the function return the fig. Thanks to the callback decorator, the app knows that this will be the figure being used in the shot_distribution element from our dashboard.

You probably have plenty of questions right now. How do I define a dropdown or any other interactive component? How do I actually get the shot_distribution element onto the page?

Let’s start with the first question: the dropdown. Dash has its own core components (dcc) and the dropdown is one of them. Creating it is as simple as:

dcc.Dropdown(
    PLAYER_OPTIONS,
    'All players', 
    id='player_dropdown', 
    style={'width': '200px', 'margin': '20px auto', 'textAlign': 'left'}
)

This will create a dropdown using all player names as possible options, using All players as the default value. But the most important part is the id. Here’s where we get to tell Dash that this dropdown is the one associated with the previous function’s input callback.

In other words, the value this dropdown has will be the player being shown on the shot distribution plot.

But we still need to place both of these components into our dashboard. The page remains blank.

You’ll need some HTML knowledge now, but basic knowledge will be more than enough (though it can get as complex as you want).

We need to place these components within HTML. Dash, again, makes it extremely easy for us to do so. In the case of the dropdown, it can be done by simply wrapping the code with an html.Div component, which places the dropdown within a <div></div> HTML element:

filter = html.Div(
    [
        dcc.Dropdown(
            PLAYER_OPTIONS,
            'All players', 
            id='player_dropdown', 
            style={
                'width': '200px', 
                'margin': '20px auto', 
                'textAlign': 'left'
            }
        )
    ],
    style={'display': 'inline-block'}
)

The way this works is the html.Div can have many child elements (hence the list) and then we can set the element’s CSS style using the style attribute, which is a dictionary. Easy, right?

In the case of the shot distribution graph, here’s the equivalent:

shot_distribution_graph = html.Div(
    [
        html.H2('Shot Distribution'),
        dcc.Graph(id='shot_distribution', figure={})
    ], 
    style={
        'padding': '2%',
        'display': 'inline-block'
    }
)

Same structure, but to display graphs we use the dcc.Graph component and, as you probably guessed, the id attribute is key here too. It links this particular component with the output callback from the function we declared. So whatever is computed there will be displayed here.

We have now wrapped the components with HTML code. But they aren’t being displayed yet. We need to add them to the dashboard’s layout:

app.layout = html.Div([
    shot_distribution_graph, filter
], style={
    'width': '1650px', 
    'margin': 'auto'
})

No secret here; the structure is the same but on a higher level. We’re placing the previous <div></div> elements into a big one (the whole website container) and providing some extra styling. Now, yes, if you refresh the website or restart the app, you’ll see your first results:

Shot distribution plot - image by the author

Amazing what we’ve built already, right? This interactivity is powerful.

To finish this section, let’s do the same but with the other plot we built. This time, I’ll paste the whole new code here so you can check it all at once:

# Functions
@callback(
    Output('shots_by_quarter', 'figure'),
    Input('player_dropdown', 'value')
)
def create_shots_by_quarter(player):
    fig = make_subplots()

    max_shots = 0

    for p in SHOTS.player.unique():
        player_shots = get_player_shots(p, SHOTS)

        xy = 15 * (player_shots[['float_time', 'minutes']]/15).round()
        xy = xy.groupby(['float_time']).count()[['minutes']]

        max_shots = xy.minutes.max() if xy.minutes.max() > max_shots else max_shots

        fig.add_trace(
            go.Scatter(
                name=p,
                x = xy.index, 
                y = xy.minutes,
                mode='lines',
                opacity=1 if p == player else 0.2
            )
        )

    # Add team's avg
    xy = 15 * (SHOTS[['float_time', 'minutes']]/15).round()
    xy = xy.groupby(['float_time']).count()[['minutes']]/len(SHOTS.player.unique())

    fig.add_trace(
        go.Scatter(
            name="Team's Average",
            x = xy.index, 
            y = xy.minutes,
            line = go.scatter.Line(dash='dash'),
            marker=None,
            mode='lines'
        )
    )

    fig.update_xaxes(range=[0, 91])
    fig.update_layout(
        margin=dict(l=20, r=20, t=5, b=20),
        xaxis = dict(
            tickmode = 'array',
            tickvals = xy.index.values
        ),
        height=200,
        plot_bgcolor="#F9F9F9", 
        paper_bgcolor="#F9F9F9",
        yaxis_range=[-3,max_shots+5]
    )

    return fig

# Dashboard's layout components
shots_by_quarter = html.Div(
    [
        html.H2('Shots By Quarter', style={'marginTop': '20px'}),
        dcc.Graph(id='shots_by_quarter', figure={})
    ],
    style={
        'padding': '2%'
    }
)

# Create layout
app = Dash(__name__)
app.layout = html.Div([
    shot_distribution_graph, filter, shots_by_quarter
], style={'width': '1650px', 'margin': 'auto'})

# Run app
if __name__ == '__main__':
    app.run(debug=True)

Resulting dashboard with two plots – image by the author

Now, this is functional. But it isn’t really attractive… HTML and CSS are the tools for making it more visually appealing (even though I’m not good at design).

However, this is outside of our scope. Our goal was to create a dashboard and we’ve done it. This one is really simple, but if you understood everything we did, then the final dashboard, which I showed at the beginning and will show again in the next section, will hold no secrets for you (again, the code is linked at the bottom of this article).

Wrapping Up

Today we built a dashboard with two plots and one dropdown. But we can scale it as needed. For example, knowing how to place a dropdown, we also know how to place a slider. Or two.

Everything we learned today can be applied to any data you want to visualize, from economic reports to medical results or ad campaign insights. I chose to apply it to football because I’m deeply passionate about it, but please generalize the knowledge and apply it anywhere.

Knowing how to place two plots, we can create many, many more. And different ones: one showing assists, another showing the player’s influence on the pitch, the comparison between his goals and the expected… And with all this plus a little bit of HTML and CSS, we get the final dashboard:

Final dashboard - image by the author

I really hope you can see how good this tool is.

Dash and Plotly must be in any data analyst’s skillset. They are amazing libraries we can use to share our data and insights in a way that’s highly customized – i.e. adapted to your needs – and easy to comprehend.

Thanks for reading the post! 

I really hope you enjoyed it and found it insightful.


@polmarin

Resources

[1] Plotly: Low-code Data App Development

[2] Dash Documentation and User Guide – Plotly

[3] Free Data | StatsBomb

[4] Pipenv: Python Dev Workflow for Humans

[5] Plotly & Dash Project Code – GitHub

