
Must-Know in Statistics: The Bivariate Normal Projection Explained

Derivation and practical examples of this powerful concept

Introduction

In statistics and Machine Learning, understanding the relationships between variables is crucial for building predictive models and analyzing data. One of the basic techniques for exploring these relationships is the bivariate projection, which relies on the concept of the bivariate normal distribution. This technique allows for the examination and prediction of the behavior of one variable in terms of another, utilizing the dependency structure between them.

Bivariate projection helps determine the expected value of one random variable given a specific value of another variable. For instance, in linear regression, the projection is used to estimate how a dependent variable changes with respect to an independent variable.

This article is divided into 3 parts: in the first part, I will explore the fundamentals of bivariate projection, deriving its formulation and demonstrating its application in regression models. In the second part, I will provide some intuition behind the projection and some plots to better understand its implications. In the third part, I will use the projection to derive the parameters for a linear regression.

In my derivation of the bivariate projection formula, I will use some well-known results. To keep the main text light, I will provide the proofs and references for these statements in the Appendix at the end of the article.

Part 1: Formula for bivariate normal projection

Let Z be a random vector following a bivariate normal distribution N(μ, Σ).

Here Z = (X, Y), where X and Y are random variables, each distributed as a univariate normal. The mean vector μ and the covariance matrix Σ are written in terms of the means and variances of X and Y, and ρ denotes the correlation between X and Y:
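$$
Z = \begin{pmatrix} X \\ Y \end{pmatrix}, \qquad
\mu = \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} \sigma_X^2 & \rho\,\sigma_X\sigma_Y \\ \rho\,\sigma_X\sigma_Y & \sigma_Y^2 \end{pmatrix}
$$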

Then the conditional distribution of Y given X = x is normal and is given by:
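$$
f_{Y \mid X}(y \mid x) = \frac{1}{\sigma_Y\sqrt{2\pi(1-\rho^2)}}\,\exp\!\left(-\frac{\left(y - \mu_Y - \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)\right)^2}{2\,\sigma_Y^2(1-\rho^2)}\right)
$$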

You will find the derivation of this result in the Appendix at the end of the article

This is the density of a normal distribution with conditional mean
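$$
\mu_{Y \mid X = x} = \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}\,(x - \mu_X)
$$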

and conditional variance
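$$
\sigma_{Y \mid X = x}^2 = \sigma_Y^2\,(1 - \rho^2)
$$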

Now we can write the linear projection of Y on X, that is, the conditional mean of Y given X = x:
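$$
E[Y \mid X = x] = \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}\,(x - \mu_X)
$$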

The conditional mean is a linear (affine) function of x, which is why this is called the linear projection of Y on X.

What does this formula tell us? What can we use it for in real applications? Let’s find out!

Part 2: Interpretation and simulations

Bivariate projection plays a crucial role in predictive modeling by allowing us to estimate the expected value of one variable based on the value of another. In Part 3, I will work through a practical example using linear regression.

In addition to its predictive power, bivariate projection provides valuable insights into the nature and strength of the relationship between two variables. For instance, I will use this result in a future article on the Kyle model, in which a Market Maker observes the order flow and tries to infer the expected value of the security given that order flow.

Another machine learning application is the detection of anomalies or outliers. This process becomes more manageable through the projection, as it highlights deviations from the expected relationship between variables.

Before working through the linear regression example, I will run some Python simulations to better highlight the shape of the bivariate normal distribution and what to expect from its projection.

In the following plots, the random variables X and Y are distributed as standard normals N(0, 1). We will see how the plots change when setting different values for their correlation ρ.
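Below is a minimal sketch of how such a simulation can be set up, assuming NumPy and Matplotlib (the function and variable names here are my own illustrative choices):

```python
import numpy as np
import matplotlib.pyplot as plt

# Standard normal marginals, as in the plots below
mu_x, mu_y = 0.0, 0.0
sigma_x, sigma_y = 1.0, 1.0

def sample_bivariate_normal(rho, n=5000, seed=42):
    """Draw n samples from a bivariate normal with correlation rho."""
    mean = [mu_x, mu_y]
    cov = [[sigma_x**2, rho * sigma_x * sigma_y],
           [rho * sigma_x * sigma_y, sigma_y**2]]
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n)

# Scatter plots for a few values of the correlation rho
for rho in (0.0, 0.9, -0.9):
    xy = sample_bivariate_normal(rho)
    plt.figure(figsize=(4, 4))
    plt.scatter(xy[:, 0], xy[:, 1], s=2, alpha=0.3)
    plt.title(f"rho = {rho}")
    plt.xlabel("X")
    plt.ylabel("Y")
plt.show()
```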

A first edge case could be to set ρ = 0, meaning that the two random variables are not correlated:

Here the two random variables are centered at their mean value 0, and their scatter plot has a circular shape. Since X and Y are jointly normal, zero correlation actually implies independence, and indeed there is no linear relationship between the variables. In the following 3D plot you can appreciate the shape of the distribution even better.

Let us now apply the projection formula and see what happens to the distribution of Y for different values of X = x.
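A minimal sketch of this step, assuming SciPy is available for the univariate normal density (again with illustrative names), plugging the conditional mean and standard deviation from the projection formula into a univariate normal density:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def conditional_params(x, rho, mu_x=0.0, mu_y=0.0, sigma_x=1.0, sigma_y=1.0):
    """Conditional mean and standard deviation of Y given X = x."""
    cond_mean = mu_y + rho * (sigma_y / sigma_x) * (x - mu_x)
    cond_std = sigma_y * np.sqrt(1.0 - rho**2)
    return cond_mean, cond_std

rho = 0.0  # try 0.9 or -0.9 as well
ys = np.linspace(-4, 4, 400)
for x in (-2, 0, 2):
    m, s = conditional_params(x, rho)
    plt.plot(ys, norm.pdf(ys, loc=m, scale=s), label=f"x = {x}")
plt.xlabel("y")
plt.ylabel("density of Y given X = x")
plt.legend()
plt.show()
```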

As we could imagine, the distribution of Y is not affected by different values of x. The mean value of Y and its variance remain the same.

Let us now see what happens for a more meaningful correlation. Let us set ρ = 0.9:

The mean value is still centered at 0 for both variables, but the scatter plot now shows a clear linear relation. In the following 3D plot you can appreciate that the distribution no longer has the circular "cone" shape of the previous example.

Plotting the projection, we can now see that the distribution of Y is indeed affected by different values of x. It is interesting to notice that the mean of Y depends on x (its value depends on the difference between x and μ_X), while the variance of Y does not change with x, as it depends only on σ_Y and the correlation ρ. Moreover, notice that the variance is smaller than in the case ρ = 0, since it scales with 1 - ρ².

The last case that I will show is ρ = -0.9. The comments are very similar to the previous case:

Part 3: Application – linear regression

Let us now apply the projection to a simple machine learning case: linear regression. Suppose we want to build a machine learning model that predicts the price of a house (the Y variable) from its surface area (the X variable), and that we have a dataset with historical data for X and Y.

Let’s suppose that the variables are distributed as follows and that they have a linear relation:
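That is, assuming the same bivariate normal set-up as in Part 1:

$$
X \sim N(\mu_X, \sigma_X^2), \qquad Y \sim N(\mu_Y, \sigma_Y^2), \qquad \operatorname{corr}(X, Y) = \rho
$$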

We want to build a model capable of predicting the value of Y given a certain specific value for X:
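$$
E[Y \mid X = x] = \beta_0 + \beta_1\, x
$$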

where the betas are the usual linear regression coefficients (intercept and slope).

Using the projection formula, we have
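$$
E[Y \mid X = x] = \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}\,(x - \mu_X)
$$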

In this way, we can use the distribution parameters (estimated from the dataset) to recover the linear regression coefficients. Let's first equate the two expressions:
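$$
\beta_0 + \beta_1\, x = \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}\,(x - \mu_X)
$$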

rearrange the terms on the right-hand side to separate the terms that multiply x from the terms that don't:
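$$
\beta_0 + \beta_1\, x = \left(\mu_Y - \rho\,\frac{\sigma_Y}{\sigma_X}\,\mu_X\right) + \rho\,\frac{\sigma_Y}{\sigma_X}\, x
$$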

For the equation to hold for every value of x, the parameters must be:
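$$
\beta_1 = \rho\,\frac{\sigma_Y}{\sigma_X} = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}, \qquad \beta_0 = \mu_Y - \beta_1\,\mu_X
$$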

Notice that these are exactly the usual estimators of the linear regression parameters!
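As a quick numerical sanity check, here is a sketch (with made-up parameter values for the house-price example) showing that the slope and intercept implied by the projection formula agree with an ordinary least-squares fit on simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up population parameters for surface (X, in square meters)
# and price (Y, in euros) -- purely illustrative
mu_x, sigma_x = 100.0, 20.0
mu_y, sigma_y = 250_000.0, 50_000.0
rho = 0.8

cov = [[sigma_x**2, rho * sigma_x * sigma_y],
       [rho * sigma_x * sigma_y, sigma_y**2]]
data = rng.multivariate_normal([mu_x, mu_y], cov, size=100_000)
x, y = data[:, 0], data[:, 1]

# Slope and intercept implied by the projection formula
beta_1_proj = rho * sigma_y / sigma_x
beta_0_proj = mu_y - beta_1_proj * mu_x

# Ordinary least-squares fit on the simulated sample
beta_1_ols, beta_0_ols = np.polyfit(x, y, deg=1)

print(beta_1_proj, beta_1_ols)  # these should be close
print(beta_0_proj, beta_0_ols)  # these should be close
```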

Conclusion

As we saw in this article, the linear projection is a powerful tool in Statistics. Its applications are varied, and you might be surprised by how often it is used implicitly.

You can find the code used to generate the plots here.

As usual, if you have any questions or suggestions, feel free to comment or reach out (you will find my contacts in my GitHub readme).


References

[1] Joel Hasbrouck (2007). Empirical Market Microstructure, Chapter 7

[2] Alex Tsun, Probability & Statistics with Applications to Computing, Chapter 5.9

Unless otherwise noted, all images are by the author.


Appendix – Bivariate Linear Projection derivation

Let us start by writing the joint density function of the two random variables X and Y:
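$$
f_{X,Y}(x, y) = \frac{1}{2\pi\sqrt{\det\Sigma}}\,\exp\!\left(-\tfrac{1}{2}\,(z - \mu)^{\top}\Sigma^{-1}(z - \mu)\right), \qquad z = \begin{pmatrix} x \\ y \end{pmatrix}
$$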

For reference: https://web.stanford.edu/class/archive/cs/cs109/cs109.1218/files/student_drive/5.9.pdf

Compute the determinant of the covariance matrix:
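$$
\det\Sigma = \sigma_X^2\,\sigma_Y^2 - \rho^2\,\sigma_X^2\,\sigma_Y^2 = \sigma_X^2\,\sigma_Y^2\,(1 - \rho^2)
$$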

and the inverse of the covariance matrix:
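$$
\Sigma^{-1} = \frac{1}{\sigma_X^2\,\sigma_Y^2\,(1 - \rho^2)}\begin{pmatrix} \sigma_Y^2 & -\rho\,\sigma_X\sigma_Y \\ -\rho\,\sigma_X\sigma_Y & \sigma_X^2 \end{pmatrix}
$$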

For reference: https://math.stackexchange.com/questions/21533/shortcut-for-finding-a-inverse-of-matrix#:~:text=For%20a%202x2%20matrix%2C%20the,'%2C%20just%20memorize%20that%20pattern.

Substituting these into the expression for the density function, we obtain:
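$$
f_{X,Y}(x, y) = \frac{1}{2\pi\,\sigma_X\sigma_Y\sqrt{1-\rho^2}}\,\exp\!\left(-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho\,(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right)
$$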

Now, the marginal probability density functions of a bivariate normal are univariate normals. The marginal density of X is given by the following formula:
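$$
f_X(x) = \frac{1}{\sigma_X\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu_X)^2}{2\,\sigma_X^2}\right)
$$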

For reference: https://en.wikipedia.org/wiki/Marginal_distribution

Now we can finally compute the conditional distribution of Y given X=x. Notice that this is still normal:
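$$
f_{Y \mid X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}
$$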

For reference: https://en.wikipedia.org/wiki/Conditional_probability_distribution

Substituting the joint density function and the marginal density function, we obtain the projection density:
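$$
f_{Y \mid X}(y \mid x) = \frac{1}{\sigma_Y\sqrt{2\pi(1-\rho^2)}}\,\exp\!\left(-\frac{\left(y - \mu_Y - \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)\right)^2}{2\,\sigma_Y^2(1-\rho^2)}\right)
$$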

The projection formula is the expectation of Y given X = x, which can be computed by integrating against the projection density. Notice that the quadratic term inside the exponential has the form of the random variable minus its mean: here, the mean is μ_Y shifted by a term that depends on x. We observed this effect when plotting the distributions. The variance is scaled by 1 - ρ².

The expected value of the distribution is then
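$$
E[Y \mid X = x] = \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}\,(x - \mu_X)
$$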

that is the bivariate projection.

