Imagine you’re a data scientist at a charming little pet shop specializing in just five products: two varieties of cat food and three types of dog food. Your mission? To help this small business flourish by accurately forecasting the daily sales of each product. The goal is to provide a comprehensive sales forecast – total sales, as well as detailed predictions for cat food and dog food sales, and even individual product sales.
The Data
You have 200 days of sales data for the two cat food products A and B, as well as the three dog food products C, D, and E. Lucky for us, the data is exceptionally clean: no missing values, no outliers, and no trend. It looks like this:

Note: I generated the data myself.
In addition to the individual sales, we also have the aggregated sales for all cat food products, all dog food products, and all products. We call such a collection of time series hierarchical time series. In our case, they respect the following sales hierarchy:
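sales_total
├── sales_cat
│   ├── sales_cat_A
│   └── sales_cat_B
└── sales_dog
    ├── sales_dog_C
    ├── sales_dog_D
    └── sales_dog_E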

The individual sales A, B, C, D, and E are at the bottom of the hierarchy; the total sales are at the top.
This structure translates to the following equations:
- sales_cat_A + sales_cat_B = sales_cat,
- sales_dog_C + sales_dog_D + sales_dog_E = sales_dog,
- sales_dog + sales_cat = sales_total.
So far so good.
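If you want to play along, here is a minimal sketch of how you could generate toy data with the same structure. The base levels and the Poisson noise are arbitrary choices, not the exact process behind the plots above:

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_days = 200

# five bottom-level series: Poisson noise around arbitrary base levels
sales = pd.DataFrame({
    "sales_cat_A": rng.poisson(lam=95, size=n_days),
    "sales_cat_B": rng.poisson(lam=160, size=n_days),
    "sales_dog_C": rng.poisson(lam=120, size=n_days),
    "sales_dog_D": rng.poisson(lam=70, size=n_days),
    "sales_dog_E": rng.poisson(lam=50, size=n_days),
})

# the aggregated series follow directly from the hierarchy equations
sales["sales_cat"] = sales["sales_cat_A"] + sales["sales_cat_B"]
sales["sales_dog"] = sales[["sales_dog_C", "sales_dog_D", "sales_dog_E"]].sum(axis=1)
sales["sales_total"] = sales["sales_cat"] + sales["sales_dog"]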
Modeling
Now, the idea is to train a model for each of the eight time series and call it a day. You train the models on the first 140 days and try to predict the remaining 60. You also want to produce forecasts for the 60 days after that. After playing around with it a bit, this is what you came up with (using exponential smoothing with an additive trend):

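In code, the training could look roughly like this, using ExponentialSmoothing from statsmodels and the sales DataFrame from the snippet above (a sketch, not the exact setup behind the plots):

from statsmodels.tsa.holtwinters import ExponentialSmoothing

train = sales.iloc[:140]  # first 140 days for training

# one independent model per time series, stored by column name
forecasts = {}
for column in sales.columns:
    model = ExponentialSmoothing(train[column], trend="add").fit()
    forecasts[column] = model.forecast(60)  # predict days 140-199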
The performance on the test set (days 140–200) looks quite good, so you decide to send the forecasts for days 200–260 to your stakeholders. But a day later, you receive the following message:
"Hi, thanks again for these awesome forecasts! But there is something weird about them: Somehow they don’t add up. For example, if I add the predicted sales of products A and B, I don’t get the predicted sales of the whole cat food category. Can you please look into that again?" – Bob
Sweating, you look at the first few rows of your produced outputs:

You calculate the first row of the cat food category by hand: 93.85 + 160.42 = 254.27. But the direct model for sales_cat says 254.45. That is only 0.18 off, but still, it is not the same. Which numbers should you trust now?
Note: When the time series are harder to learn and the numbers get bigger, you can easily be off by a few thousand or worse.
Forecast Reconciliation
Oh damn, that makes sense from a technical perspective. You have just trained eight independent models, so it would have been a miracle if their outputs magically added up. But don’t despair, there is a quite simple technique to save the day: forecast reconciliation!
The goal is to adjust the outputs that you have just created in a way that they respect the hierarchy again:
- the individual cat food product forecasts add up to the cat food forecast,
- the individual dog food product forecasts add up to the dog food forecast,
- and, last but not least, the cat and dog food forecasts add up to the total sales forecast.
Forecast reconciliation sounds complicated, but it is a technique that comes in many flavors, some of which are very simple to understand and implement. Let us start with the easiest one.
Bottom-Up
The natural thing to do for me is to only forecast the individual product sales of A, B, C, D, and E – the bottom forecasts – and then add them up according to the hierarchy to create the forecasts of the higher levels.
So, basically:
- sales_cat_forecast := sales_cat_A_forecast + sales_cat_B_forecast,
- sales_dog_forecast := sales_dog_C_forecast + sales_dog_D_forecast + sales_dog_E_forecast,
- sales_total_forecast := sales_dog_forecast + sales_cat_forecast.
This also means that you only have to train five models instead of eight. So, you can take your individual product forecasts again and just add up a bunch of columns.

Now, you can add a column sales_cat by just summing sales_cat_A and sales_cat_B. Doing the same for sales_dog and finally sales_total, you create bottom-up reconciled forecasts:

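In code, assuming the forecasts sit in a DataFrame with one column per series (as in the sketches above), bottom-up reconciliation is just a few column sums:

import pandas as pd

forecast_df = pd.DataFrame(forecasts)  # `forecasts` from the modeling sketch

# overwrite the aggregate columns with sums of the bottom-level columns
forecast_df["sales_cat"] = forecast_df["sales_cat_A"] + forecast_df["sales_cat_B"]
forecast_df["sales_dog"] = forecast_df[["sales_dog_C", "sales_dog_D", "sales_dog_E"]].sum(axis=1)
forecast_df["sales_total"] = forecast_df["sales_cat"] + forecast_df["sales_dog"]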
By definition, your forecasts respect the hierarchy now! 🎉 That was easy, right?
But what happens with the quality of your forecasts in the higher hierarchy levels if you do that?
Sure, at the bottom level, your forecasts are as good as before. But which approach is better on the higher, more aggregated levels: the direct modeling approach from before, or summing up forecasts from the lower levels? The answer, as so often: it depends.
Some time series are hard to forecast, but the disaggregated, lower series might be easier. In this case, the bottom-up approach – apart from giving you coherent forecasts – could even improve the forecast quality.
But sometimes, the disaggregated time series are very jumpy and hard to predict, while on an aggregated level, forecasting would be simpler. Here, a direct model is better from the performance perspective.
In our small example, the result for sales_total looks like this:

Not a big difference visually, but in numbers, the direct forecast has a test MSE of 2547, while the reconciled bottom-up forecast has a higher MSE of 2699.
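The comparison behind these numbers is a plain mean squared error on the test period; as a sketch (the variable names are placeholders):

import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between actual values and forecasts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

# e.g., mse(test_total, direct_total_forecast) vs.
#       mse(test_total, bottom_up_total_forecast)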
Top-Down
Where there is a bottom-up, there must also be a top-down. It is simple as well, but for me, it does not come as naturally as the bottom-up approach. In this approach, you only forecast a single time series: the topmost one! Once you have this, you can break it down to get forecasts for the lower levels. The idea is the following:
Assume you have an overall forecast for the total sales, and you know that historically, 40% of the sales come from cat food and 60% from dog food. Then, you can just multiply the total sales forecast by 40% to get the cat food sales, and multiply it by 60% to get the dog food sales.
You can do the same one level lower. If you know that 20% of the cat food sales come from product A and 80% from product B, multiply the cat food sales forecast by 20% to get a forecast for cat food product A, and multiply it by 80% to get a forecast for cat food product B. If you do it like that, you get a set of forecasts that respect the sales hierarchy.
So, you start with your direct model forecast of sales_total:

compute the percentages from historical data, and end up with the following:

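As a minimal sketch in code (reusing train and forecasts from the earlier snippets; the helper names are mine):

bottom_columns = ["sales_cat_A", "sales_cat_B", "sales_dog_C", "sales_dog_D", "sales_dog_E"]

# historical share of each product in the total sales
shares = train[bottom_columns].sum() / train["sales_total"].sum()

# break the direct total forecast down to the bottom level ...
top_down = {col: share * forecasts["sales_total"] for col, share in shares.items()}

# ... and sum the bottom forecasts back up to fill the hierarchy
top_down["sales_cat"] = top_down["sales_cat_A"] + top_down["sales_cat_B"]
top_down["sales_dog"] = top_down["sales_dog_C"] + top_down["sales_dog_D"] + top_down["sales_dog_E"]
top_down["sales_total"] = top_down["sales_cat"] + top_down["sales_dog"]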
However, the big problem that I have with this method is that all the time series that you create are perfectly positively correlated by construction.

Of course, there could be cases where all products behave the same way, but usually, this is unlikely. Think about a department store selling fans and heaters. In summer, they sell a lot of fans, but hardly any heaters. In winter, it is the other way around. This means that the time series are negatively correlated: if one goes up, the other goes down.
With the top-down approach, both time series can only have the same pattern, i.e., both go up together or both go down together. That’s why you have to be careful if you want to use the top-down approach.
Here is a comparison between a direct forecast and the top-down reconciled forecast for sales_cat_A:

Note: There is also the middle-out approach, where you forecast the time series on some middle level and then work your way up and down as described in the bottom-up and top-down approaches. However, it has the same disadvantage as the top-down approach for the lower levels.
General Reconciliation
Ok, cool, so we have seen two simple techniques to reconcile forecasts. Presented like this, they look quite different, but they follow a general pattern that can be used to develop even better forecast reconciliation techniques!
We can describe this general reconciliation as matrix multiplication with some matrix A, turning the raw, incoherent forecasts into reconciled ones.

In order to find A, we define two other matrices first.
Summing Matrix
Let us define a matrix S – called the summing matrix – that encodes the hierarchy. We want this matrix to take all five bottom forecasts and sum them together to also get the higher forecasts in the hierarchy, like this:

It is easy to derive this matrix; in our case, it is the following:

If you do the matrix-vector multiplication (and you should!), you will see that you get what you want. But you can also read it as:
To get sales_total, you need all five bottom forecasts. To get sales_cat, you only need the first two bottom forecasts, which correspond to sales_cat_A and sales_cat_B, and so on.
This matrix is used for the bottom-up, top-down, and all other approaches.
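If you would rather let numpy do the multiplication, here is a quick check (the full reconciliation pipeline follows below):

import numpy as np

S = np.array([
    [1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
])

bottom = np.array([1, 0, 2, 0, 4])  # sales of A, B, C, D, E
print(S @ bottom)  # [7 1 6 1 0 2 0 4] = total, cat, dog, A, B, C, D, E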
Mapping Matrix
We need to define another matrix M – called the mapping matrix – that encodes the reconciliation logic. This matrix should take all direct forecasts and turn them into a smaller number of forecasts – as many as there are bottom time series. In our example, it would take a vector of length 8 and turn it into a vector of length 5.
Since this is still very abstract, let us check how M looks for the bottom-up approach in our example:

So, what can you do with this matrix now? Well, if you multiply this matrix with your eight incoherent base forecasts from the start, you end up with a shorter vector containing only the bottom forecasts.

And if we multiply the vector on the right-hand side with S, we get exactly what the bottom-up approach gives us. So, if we define A := S · M, we get what we want: we go from eight unreconciled base forecasts to eight reconciled forecasts, using the bottom-up approach.
Let us also do the top-down approach now. We only have to define another matrix M as

Here, the pᵢ's are the percentages that you multiply the top forecast by to get the bottom forecasts. Try it out by multiplying the matrix above with a concrete vector of length 8 of your choice!
Technically, implementing it like this works a bit differently from how I explained the top-down approach before, but the result is exactly the same: you first compute the bottom-level forecasts by multiplying the top-level forecast with some percentages, and then you use the matrix S to build the higher levels again. You can see this in code right after the bottom-up example below.
More Mapping Matrices
So, we have seen that you can express the bottom-up as well as the top-down reconciliation approaches via matrix multiplication. To be more precise, the reconciliation works as follows:
- For each time series, train a model.
- Compute the matrix S (defined by the hierarchy, unique), and choose a matrix M (defines the algorithm, your choice).
- Make forecasts y with your models, and multiply the result with S · M (y_rec = S · M · y).
- You now have reconciled forecasts y_rec.
Here is some code, so you can see how it works in detail:
import numpy as np

y_raw = np.array([
    [1, 2],  # forecasts of the total sales for the next 2 days
    [4, 2],  # forecasts of the cat food sales for the next 2 days
    [2, 3],  # ... dog food ...
    [1, 0],  # ... cat food A ...
    [1, 2],  # ... cat food B ...
    [2, 1],  # ... dog food C ...
    [0, 2],  # ... dog food D ...
    [4, 3],  # ... dog food E ...
])  # you can see that the cat food sales are not the sum of A and B

S = np.array([
    [1, 1, 1, 1, 1],  # the total sales are the sum of all bottom sales
    [1, 1, 0, 0, 0],  # cat food sales are the sum of only A and B
    [0, 0, 1, 1, 1],  # dog food sales are the sum of only C, D, and E
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
])

M = np.array([  # bottom-up mapping matrix: keep only the 5 bottom forecasts
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
])

y_reconciled = S @ M @ y_raw  # @ = matrix multiplication
print(y_reconciled)
# Output:
# [[8 8]
#  [2 2]
#  [6 6]
#  [1 0]
#  [1 2]
#  [2 1]
#  [0 2]
#  [4 3]]
# The bottom 5 rows stay the same, as in the bottom-up approach.
# The aggregate sales get replaced by sums of the bottom forecasts,
# e.g., [8 8] is the sum of the 5 bottom forecasts
# [1 0] + [1 2] + [2 1] + [0 2] + [4 3].
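To switch to the top-down approach, you only swap M. A sketch, reusing S and y_raw from the snippet above (the proportions pᵢ are made up):

# top-down mapping matrix: the first column holds the proportions p_i
p = np.array([0.1, 0.2, 0.3, 0.15, 0.25])  # made-up shares, must sum to 1
M_top_down = np.zeros((5, 8))
M_top_down[:, 0] = p  # each bottom forecast is a share of the total forecast

y_reconciled_td = S @ M_top_down @ y_raw
print(y_reconciled_td)
# The first row (total) stays [1 2], and every other row is a fixed
# multiple of it: perfectly positively correlated by construction.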
But what happens if we choose a completely different M? There is no reason why the special forms of the bottom-up or top-down mapping matrices M should be any good.
For example, Wickramasuriya et al. [1] found a matrix M that minimizes the total variance of the reconciled forecast errors. This is called the MinT (Minimum Trace) optimal reconciliation approach. However, this method is a bit more complicated and has its problems, since you have to estimate error covariances that you cannot observe directly. There are some heuristics to construct M that work well in practice, though.
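To give you a taste of such a "different M": a classic and simple choice (a least-squares baseline, not MinT itself) is M = (SᵀS)⁻¹Sᵀ, which projects the raw forecasts onto the closest coherent forecasts in the least-squares sense. Reusing S and y_raw from above:

# OLS reconciliation: project the raw forecasts onto the closest
# coherent forecasts in the least-squares sense
M_ols = np.linalg.inv(S.T @ S) @ S.T
y_reconciled_ols = S @ M_ols @ y_raw
print(y_reconciled_ols)  # coherent again: cat + dog = total, etc.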
Luckily, many libraries, such as sktime and the ones created by Nixtla, implement all of the approaches mentioned above.
Conclusion
In this article, we have explored the challenges of training independent models when you expect aggregate consistency in the predictions. For instance, the forecasts for individual items within a category should add up to the forecast for the entire category. However, independent models do not inherently ensure this alignment. To achieve coherence, you need a subsequent step known as reconciliation, where the outputs of the models are adjusted so that they sum up correctly.
There are two simple ways to do the reconciliation that you can implement on your own right away. For the more complicated, but potentially better methods, you can resort to time series forecasting libraries.
References
[1] Wickramasuriya, S. L., Athanasopoulos, G., & Hyndman, R. J. (2019). Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. Journal of the American Statistical Association, 114(526), 804–819.
I hope that you learned something new, interesting, and valuable today. Thanks for reading!
If you have any questions, write me on LinkedIn!
And if you want to dive deeper into the world of algorithms, give my new publication All About Algorithms a try! I’m still searching for writers!