
Hello! I noticed that people are really interested in my articles about marketing mix modeling, and that’s why I have created a small present for you: a library that helps you build simple marketing mix models yourself! In this article, I will show you how to use it.
⚠ Obligatory warning ⚠: I tried my best to make this library as solid and error-free as possible. However, there might still be bugs, so please always do some sanity checks before reporting anything to the stakeholders. If you spot an error or want to request a feature, just drop me a message or – even better – create a pull request on GitHub! 😉
In case you don’t know what marketing mix modeling is: imagine you are in a company selling stuff. To sell even more of that stuff, you do advertising. At some point, you want to know how well your advertising performed per channel, i.e., TV, radio, web banners, …, and answer questions like: "These 1000 € that I put into TV ads that week – by how much did they increase my revenue?". Marketing mix modeling is a simple way to do that. You can find more information in my earlier articles about it.
The library I created is creatively called MaMiMo, and if you know how to use scikit-learn, you can also use this one. Let the fun begin with a simple
pip install mamimo
A Small Example
If you have read both of my previous articles – which I assume from now on – you probably remember that we used an artificial dataset there. We then defined some saturation and carryover transformations to conduct marketing mix modeling. I put a similar but slightly more complicated example dataset into mamimo to get you started:
from mamimo.datasets import load_fake_mmm
data = load_fake_mmm()
X = data.drop(columns=['Sales'])
y = data['Sales']
This gives us a dataset with a date index on a weekly basis, three media channels, and a sales column that we want to explain.
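To get a feeling for the data, you can take a quick peek with plain pandas:
print(data.head())  # weekly date index; columns: TV, Radio, Banners, Sales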

Let us plot the sales to see what is going on:
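One way to do this, using pandas’ built-in matplotlib plotting:
import matplotlib.pyplot as plt

# Plot the weekly sales time series.
y.plot(figsize=(16, 6), title='Sales', xlabel='Date', ylabel='Sales')
plt.show()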

Here, we can see a general upward trend. There might also be seasonality, but it is hard to tell just by looking at it. Let us also address the elephant in the room: this big spike in the first week of January 2020. We will assume this was a very important day for our product, not just an outlier.
However, let us keep it simple for the moment and just try to explain the sales using the TV, radio, and banner spending. As in the second article, we want to build the following model: each channel’s spending is spread out over time (carryover) and then dampened (saturation) before everything enters a linear regression, i.e.,

Sales = base + β₁ · saturate(carryover(TV)) + β₂ · saturate(carryover(Radio)) + β₃ · saturate(carryover(Banners))
Starting Out
We can straightforwardly implement this pipeline using mamimo’s carryover and saturation submodules, as well as its LinearRegression, which is more flexible than scikit-learn’s version. What we need from scikit-learn, though, is the pipelining functionality. As a reminder: ColumnTransformer applies different pipelines to different columns. We need it because each channel should get its own hyperparameters.
from mamimo.carryover import ExponentialCarryover
from mamimo.saturation import ExponentialSaturation
from mamimo.linear_model import LinearRegression
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
adstock = ColumnTransformer(
    [
        ('tv_pipe', Pipeline([
            ('carryover', ExponentialCarryover()),
            ('saturation', ExponentialSaturation())
        ]), ['TV']),
        ('radio_pipe', Pipeline([
            ('carryover', ExponentialCarryover()),
            ('saturation', ExponentialSaturation())
        ]), ['Radio']),
        ('banners_pipe', Pipeline([
            ('carryover', ExponentialCarryover()),
            ('saturation', ExponentialSaturation())
        ]), ['Banners']),
    ]
)

model = Pipeline([
    ('adstock', adstock),
    ('regression', LinearRegression(positive=True))
])
This yields a model that is not very good on its own because it still needs hyperparameter tuning.
print(model.fit(X, y).score(X, y))
# Output:
# 0.10985072579909416

Even training and evaluating on the same set (never 👏 do 👏 that 👏 in 👏 production 👏) yields a pretty bad result – we are underfitting. So, let us tune some hyperparameters.
Hyperparameter Tuning
Let us use scikit-learn’s RandomizedSearchCV to tune the hyperparameters. We will tune the saturation exponent as well as the carryover strength and length for each channel separately. You can address the radio carryover length via adstock__radio_pipe__carryover__window, for example.
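If you are unsure about the exact parameter names, scikit-learn can list all of them for you:
# Show every tunable hyperparameter name of the pipeline.
print(sorted(model.get_params().keys()))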
from scipy.stats import uniform, randint
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit
tuned_model = RandomizedSearchCV(
    model,
    param_distributions={
        'adstock__tv_pipe__carryover__window': randint(1, 10),
        'adstock__tv_pipe__carryover__strength': uniform(0, 1),
        'adstock__tv_pipe__saturation__exponent': uniform(0, 1),
        'adstock__radio_pipe__carryover__window': randint(1, 10),
        'adstock__radio_pipe__carryover__strength': uniform(0, 1),
        'adstock__radio_pipe__saturation__exponent': uniform(0, 1),
        'adstock__banners_pipe__carryover__window': randint(1, 10),
        'adstock__banners_pipe__carryover__strength': uniform(0, 1),
        'adstock__banners_pipe__saturation__exponent': uniform(0, 1),
    },
    cv=TimeSeriesSplit(),
    random_state=0,
    n_iter=100
)

tuned_model.fit(X, y)
This tries to find the best hyperparameters in the range 0 to 1 for the carryover strengths and saturation exponents, and integers between 1 and 9 (weeks) for the carryover lengths – note that randint excludes the upper bound. The algorithm tries n_iter=100 different random hyperparameter combinations and evaluates the r² using a 5-fold expanding-window time series split via scikit-learn’s TimeSeriesSplit. If we feel like it, we could also use the MAPE – business people enjoy this metric – via the scoring='neg_mean_absolute_percentage_error' keyword.
After training, we can check the best hyperparameters:
print(tuned_model.best_params_)
# Output:
# {'adstock__banners_pipe__carryover__strength': 0.6817399450693612,
#  'adstock__banners_pipe__carryover__window': 1,
#  'adstock__banners_pipe__saturation__exponent': 0.097493384215085,
#  'adstock__radio_pipe__carryover__strength': 0.8518536993666015,
#  'adstock__radio_pipe__carryover__window': 1,
#  'adstock__radio_pipe__saturation__exponent': 0.1598452868541913,
#  'adstock__tv_pipe__carryover__strength': 0.04680635471218875,
#  'adstock__tv_pipe__carryover__window': 4,
#  'adstock__tv_pipe__saturation__exponent': 0.0038603515102610952}
So the model thinks that the TV carryover effect lasts four weeks, with about 4.68% of the effect carried into the next week, for example. Statements like this are gold for the business.
This output lets you explain your model to the stakeholders in a simple and comprehensible way.
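If you want to see what such a carryover transformation actually does, you can also run it on its own. A minimal sketch with a made-up spend series and the tuned TV parameters from above:
import numpy as np
from mamimo.carryover import ExponentialCarryover

spend = np.array([[1000], [0], [0], [0], [0]])  # one spike, then nothing
carryover = ExponentialCarryover(window=4, strength=0.0468)
print(carryover.fit_transform(spend))  # the spike bleeds into the following weeks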
Still, the model is terrible, so I would not trust the hyperparameters found so far.

The output has the same problems as before: there is no trend, and the spike cannot be explained either. In times like this, we should incorporate more features that might explain the sales: the price of the product, time features, Google Trends, the weather, holidays, anything you want.
We will now enhance the model by giving it a trend, the one-hot encoded month, as well as the information that something happened in the first week of January 2020.
Incorporating Time Features
I added some convenience functions for this to mamimo. Just look at this:
from mamimo.time_utils import add_time_features, add_date_indicators
X = (X
     .pipe(add_time_features, month=True)
     .pipe(add_date_indicators, special_date=["2020-01-05"])
     .assign(trend=range(200))
)
This adds
- a month column (integers between 1 and 12),
- a binary column named special_date that is 1 on the 5th of January 2020 and 0 everywhere else, and
- a (so far linear) trend column that just counts up from 0 to 199.
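A quick check that everything is in place (the exact column order may differ):
print(X.columns)  # the three channels plus the new month, special_date and trend columns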

All of these features get refined in a moment when we build the next model. In addition to the media channels, we will do the following preprocessing of the new features:
- the month gets one-hot encoded,
- the linear trend may be raised to some power, e.g., trend² for a trend that grows quadratically (business would be happy), and
- the special date can have a carryover effect as well, i.e., not only the week of the 5th of January 2020 is important, but potentially also the weeks after it – the same logic as for the carryover in the media channels.
from mamimo.time_utils import PowerTrend
from mamimo.carryover import ExponentialCarryover
from mamimo.saturation import ExponentialSaturation
from mamimo.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
cats = [list(range(1, 13))]  # different months, known beforehand

preprocess = ColumnTransformer(
    [
        ('tv_pipe', Pipeline([
            ('carryover', ExponentialCarryover()),
            ('saturation', ExponentialSaturation())
        ]), ['TV']),
        ('radio_pipe', Pipeline([
            ('carryover', ExponentialCarryover()),
            ('saturation', ExponentialSaturation())
        ]), ['Radio']),
        ('banners_pipe', Pipeline([
            ('carryover', ExponentialCarryover()),
            ('saturation', ExponentialSaturation())
        ]), ['Banners']),
        ('month', OneHotEncoder(sparse=False, categories=cats), ['month']),
        ('trend', PowerTrend(), ['trend']),
        ('special_date', ExponentialCarryover(), ['special_date'])
    ]
)

new_model = Pipeline([
    ('preprocess', preprocess),
    ('regression', LinearRegression(
        positive=True,
        fit_intercept=False  # no intercept because of the month dummies
    ))
])
Fitting this still-untuned model shows a much better performance:
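For a quick check, we can again fit and score on the same data (as before, just to get a feeling, not a proper evaluation):
print(new_model.fit(X, y).score(X, y))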

Wow! This is nearly a perfect fit already. Adding the trend and the indicator for the 5th of January 2020 helped a lot. Just imagine what happens after we tune the hyperparameters.
The Last Tune
Let us tune the new hyperparameters as well. Maybe a linear trend is not the best we can do. Furthermore, all the carryovers are non-existent so far because that is the default behavior of ExponentialCarryover if you don’t provide hyperparameters. We will now create one more hyperparameter tuning job:
from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit
tuned_new_model = RandomizedSearchCV(
    new_model,
    param_distributions={
        'preprocess__tv_pipe__carryover__window': randint(1, 10),
        'preprocess__tv_pipe__carryover__strength': uniform(0, 1),
        'preprocess__tv_pipe__saturation__exponent': uniform(0, 1),
        'preprocess__radio_pipe__carryover__window': randint(1, 10),
        'preprocess__radio_pipe__carryover__strength': uniform(0, 1),
        'preprocess__radio_pipe__saturation__exponent': uniform(0, 1),
        'preprocess__banners_pipe__carryover__window': randint(1, 10),
        'preprocess__banners_pipe__carryover__strength': uniform(0, 1),
        'preprocess__banners_pipe__saturation__exponent': uniform(0, 1),
        'preprocess__trend__power': uniform(0, 2),            # new
        'preprocess__special_date__window': randint(1, 10),   # new
        'preprocess__special_date__strength': uniform(0, 1),  # new
    },
    cv=TimeSeriesSplit(),
    random_state=0,
    n_iter=1000,  # some more iterations, takes more time
)
tuned_new_model.fit(X, y)
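To recreate the plot of actual versus predicted sales, a possible sketch (again using pandas’ plotting; the label names are my choice):
import pandas as pd

predictions = pd.Series(tuned_new_model.predict(X), index=X.index)
ax = y.plot(figsize=(16, 6), label='True Sales', legend=True)
predictions.plot(ax=ax, label='Predicted Sales', legend=True)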

The modeling of the spike got worse, but I would argue that the rest looks much better now. Maybe we could model the spike better again by trying more hyperparameter combinations, but let us assume we are satisfied with the result now. Let us get the hyperparameters:
print(tuned_new_model.best_params_)
# Output:
# {'preprocess__banners_pipe__carryover__strength': 0.98037507922965,
#  'preprocess__banners_pipe__carryover__window': 1,
#  'preprocess__banners_pipe__saturation__exponent': 0.1763329074644,
#  'preprocess__radio_pipe__carryover__strength': 0.9417421432655166,
#  'preprocess__radio_pipe__carryover__window': 1,
#  'preprocess__radio_pipe__saturation__exponent': 0.069184804692642,
#  'preprocess__special_date__strength': 0.8667029791268241,
#  'preprocess__special_date__window': 6,
#  'preprocess__trend__power': 1.463860555363072,
#  'preprocess__tv_pipe__carryover__strength': 0.3422263312509606,
#  'preprocess__tv_pipe__carryover__window': 1,
#  'preprocess__tv_pipe__saturation__exponent': 0.3591065076533001}
Great, so the trend is not linear but of the form t^1.464, i.e., it grows even faster than linearly.
If you are interested in the linear regression coefficients as well, you can get them via
import pandas as pd

best_model = tuned_new_model.best_estimator_
pd.Series(
    best_model.named_steps['regression'].coef_,
    index=best_model[:-1].get_feature_names_out()
)
# Output:
# tv_pipe__TV 3389.936227
# radio_pipe__Radio 2278.722723
# banners_pipe__Banners 2455.014524
# month__month_1 2724.333162
# month__month_2 2991.294009
# month__month_3 4080.414741
# month__month_4 4542.696378
# month__month_5 3484.384654
# month__month_6 3785.648376
# month__month_7 2497.006415
# month__month_8 2068.016338
# month__month_9 1883.746572
# month__month_10 2179.914547
# month__month_11 2135.526582
# month__month_12 2233.341158
# trend__trend 9.801207
# special_date__special_date 96822.051131
Computing The Channel Contributions
Now that we have a trained model, we want to know how much every channel contributed to the weekly sales. For convenience, I created a breakdown function to do just that.
from mamimo.analysis import breakdown

contributions = breakdown(tuned_new_model.best_estimator_, X, y)
ax = contributions.plot.area(
    figsize=(16, 10),
    linewidth=1,
    title="Predicted Sales and Breakdown",
    ylabel="Sales",
    xlabel="Date",
)
handles, labels = ax.get_legend_handles_labels()
ax.legend(
    handles[::-1],
    labels[::-1],
    title="Channels",
    loc="center left",
    bbox_to_anchor=(1.01, 0.5),
)

Phew, that’s a mishmash of colors – highly incomprehensible. That is because the grouping is unnecessarily fine-grained; we have a separate color for every single month, for example.
We can take all the month__ variables as well as trend__trend and put them into the baseline, i.e., the sales that we would have had regardless of our media spending, according to the model. We can tell the breakdown function to sum up these fine-grained contributions like this:
group_channels = {
    'Baseline': [f'month__month_{i}' for i in range(1, 13)] + ['Base', 'trend__trend']
}  # read: 'The baseline consists of the months, the base, and the trend.'
# You can add more groups!

contributions = breakdown(
    tuned_new_model.best_estimator_,
    X,
    y,
    group_channels
)
This produces a smaller data frame.

Let’s give this data frame nicer column names
contributions.columns = [
    'TV', 'Radio', 'Banners',
    'Special Date', 'Baseline'
]
and then plot once more.
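The same area-plot call as before works here as well, for example:
contributions.plot.area(
    figsize=(16, 10),
    linewidth=1,
    title="Predicted Sales and Breakdown",
    ylabel="Sales",
    xlabel="Date",
)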

Sweet! The content is relatively easy to understand now:
- the baseline is increasing over time and seems to be seasonal as well
- TV, radio, and banners all have some contribution
- we can see the effect of the special date
Computing Return on Investments
I did not create a convenience function for this (yet?), but you can also calculate the ROIs for each channel like this:
for channel in ['TV', 'Radio', 'Banners']:
    roi = contributions[channel].sum() / X[channel].sum()
    print(f'{channel}: {roi:.2f}')
# Output:
# TV: 0.33
# Radio: 0.47
# Banners: 1.23
From here, it seems like TV and radio did not perform well: their ROIs are below 1, i.e., each Euro that we spent on TV (radio) in the last 200 weeks only generated 0.33 € (0.47 €) in return on average. Banners performed well: each Euro we spent there turned into 1.23 €.
Note: these statements might change if we consider other time periods, e.g., only the year 2019 or only 2020. For 2019, for example, we could compute it like this:
for channel in ['TV', 'Radio', 'Banners']:
    roi = (
        contributions.loc['2019-01-01':'2019-12-31', channel].sum()
        / X.loc['2019-01-01':'2019-12-31', channel].sum()
    )
    print(f'{channel}: {roi:.2f}')
# Output:
# TV: 0.36
# Radio: 0.50
# Banners: 1.22
That’s all, folks!
Conclusion
I presented my new library MaMiMo to you to make your marketing mix modeling life easier. It integrates well with scikit-learn and similar libraries, and it lets you do the following things (so far!):
- define saturations (Exponential, Hill, Adbudg, BoxCox)
- define carryovers (Exponential, Gaussian)
- add time features (day, month, week of month, …, trend)
- change the trend
- analyze the model by checking the channel contributions
Install it via pip install mamimo!
I hope that you learned something new, interesting, and useful today. Thanks for reading!
If you have any questions, write me on LinkedIn!
And if you want to dive deeper into the world of algorithms, give my new publication All About Algorithms a try! I’m still searching for writers!