Squashing the Average: A Dive into Penalized Quantile Regression for Python
How to build penalized quantile regression models (with code!)

This is my third post in the series about penalized regression. In the first one we talked about how to implement a sparse group lasso in Python, one of the best variable selection alternatives available nowadays for regression models, and in the second we talked about adaptive estimators and how they are much better than their traditional counterparts. But today I would like to talk about quantile regression, and delve into the realm of high-dimensional quantile regression using the robust asgl package, focusing on the implementation of quantile regression with an adaptive lasso penalization.

Today we will see:

  • What is quantile regression
  • What are the advantages of quantile regression compared to traditional least squares regression
  • How to implement penalized quantile regression models in Python

What is quantile regression

Let’s kick things off with something many of us have probably encountered: least squares regression. This is the classic go-to method when we’re looking to predict an outcome based on some input variables. It works by finding the line (or hyperplane in higher dimensions) that best fits the data by minimizing the squared differences between observed and predicted values. In simpler terms, it’s like trying to draw the smoothest line through a scatterplot of data points. But here’s the catch: it’s all about the mean. Least squares regression focuses solely on modeling the average trend in the data.

So, what’s the issue with just modeling the mean? Well, life isn’t always about averages. Imagine you’re analyzing income data, which is often skewed by a few high earners. Or consider data with outliers, like real estate prices in a neighborhood with a sudden luxury condo development. In these situations, concentrating on the mean can give a skewed view, potentially leading to misleading insights.
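As a tiny numeric sketch (with made-up income figures, purely for illustration), a single very high earner drags the mean far more than it drags the median:

import numpy as np

# Hypothetical incomes: five typical earners and one very high earner
incomes = np.array([30_000, 35_000, 40_000, 45_000, 50_000, 1_000_000])

print(np.mean(incomes))    # 200000.0 -- dominated by the single high earner
print(np.median(incomes))  # 42500.0  -- a far more representative "typical" value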

Advantages of quantile regression

Enter quantile regression. Unlike its least squares sibling, quantile regression allows us to explore various quantiles (or percentiles) of the data distribution. This means we can understand how different parts of the data behave, beyond just the average. Want to know how the bottom 10% or the top 90% of your data are reacting to changes in input variables? Quantile regression has got you covered. It’s especially useful when dealing with data that has outliers or is heavily skewed, as it provides a more nuanced picture by looking at the distribution as a whole. They say one image is worth a thousand words, so let’s see what quantile regression and least squares regression look like in a couple of simple examples.

Image by author: Examples comparing quantile regression and least squares regression.

These two images show very simple regression models with one predictive variable and one response variable. The left image has an outlier in the top right corner (that lonely dot over there). This outlier affects the estimation provided by least squares (the red line), which is pulled far away from the bulk of the data and gives very poor predictions. But quantile regression is not affected by outliers, and its predictions are spot-on. In the right image we have a dataset that is heteroscedastic. What does that mean? Picture your data forming a cone shape, widening as the value of X increases. More technically, the variability of our response variable isn’t playing by the rules – it expands as X grows. Here, the least squares (red) and quantile regression for the median (green) trace similar paths, but they only tell part of the story. By introducing additional quantiles into the mix (in blue, 10%, 25%, 75% and 90%) we are able to capture how our data dances across the spectrum and see its behavior.
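To make the heteroscedastic case concrete, here is a minimal sketch using statsmodels on synthetic data (not the exact data behind the figures above): the noise grows with X, so the least squares slope summarizes only the average trend, while the fits at different quantiles reveal how the spread widens.

# Minimal sketch on synthetic heteroscedastic data: the noise grows with x,
# so different quantiles have noticeably different slopes.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=500)
y = 2 * x + rng.normal(scale=0.5 + 0.5 * x)  # variability increases with x

X = sm.add_constant(x)  # add an intercept column

ols_slope = sm.OLS(y, X).fit().params[1]
print(f'least squares slope: {ols_slope:.2f}')

for q in [0.1, 0.25, 0.5, 0.75, 0.9]:
    qr_slope = sm.QuantReg(y, X).fit(q=q).params[1]
    print(f'quantile {q:.2f} slope: {qr_slope:.2f}')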

Implementations of quantile regression

High-dimensional scenarios, where the number of predictors exceeds the number of observations, are increasingly common in today’s data-driven world, popping up in fields like genomics, where thousands of genes might predict a single outcome, or in image processing, where countless pixels contribute to a single classification task. These complex situations demand the use of penalized regression models to manage the multitude of variables effectively. However, most existing software in R and Python offers limited options for penalizing quantile regression in such high-dimensional contexts.

This is where my Python package, asgl, comes in. The asgl package provides a comprehensive framework for fitting various penalized regression models, including sparse group lasso and adaptive lasso – techniques I’ve previously talked about in other posts. It is built on cutting-edge research and offers full compatibility with scikit-learn, allowing seamless integration with other machine learning tools.

Example (with code!)

Let’s see how we can use asgl to perform quantile regression with an adaptive lasso penalization. First, ensure the asgl library is installed:

pip install asgl

Next, we’ll demonstrate the implementation using synthetic data:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from asgl import Regressor

# Generate synthetic data
X, y = make_regression(n_samples=100, n_features=200, n_informative=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the quantile regression model with adaptive lasso penalization
model = Regressor(model='qr', penalization='alasso', quantile=0.5)

# Fit the model
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)

# Evaluate the model
mae = mean_absolute_error(y_test, predictions)
print(f'Mean Absolute Error: {mae:.3f}')

In this example, we generate a dataset with 100 samples and 200 features, where only 10 features are truly informative (making it a high-dimensional regression problem). The Regressor class from the asgl package is configured to perform quantile regression (by selecting model='qr') for the median (by selecting quantile=0.5). If we are interested in other quantiles, we just need to set quantile to another value in the (0, 1) interval. We solve an adaptive lasso penalization (by selecting penalization='alasso'), and we could optimize other aspects of the model, like how the adaptive weights are estimated, or simply use the default configuration.
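For example, fitting the same model at several quantiles only requires changing the quantile argument. Here is a small sketch reusing the data from above (the quantile values are arbitrary choices for illustration):

# Sketch: fit the adaptive-lasso quantile regression at different quantiles
# and compare their test mean absolute error.
for q in [0.1, 0.5, 0.9]:
    qr_model = Regressor(model='qr', penalization='alasso', quantile=q)
    qr_model.fit(X_train, y_train)
    q_mae = mean_absolute_error(y_test, qr_model.predict(X_test))
    print(f'quantile={q}: MAE={q_mae:.3f}')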

Advantages of asgl

Let me finish by summarising the benefits of asgl:

  1. Scalability: The package efficiently handles high-dimensional datasets, making it suitable for applications in a wide range of scenarios.
  2. Flexibility: With support for various models and penalizations, asgl caters to diverse analytical needs.
  3. Integration: Compatibility with scikit-learn simplifies model evaluation and hyperparameter tuning (see the sketch below).
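As a quick illustration of the third point, since Regressor follows the scikit-learn estimator API it can be dropped straight into GridSearchCV. The sketch below assumes the penalty strength is exposed as lambda1 (as in the sparse group lasso examples elsewhere in this series); the grid values are arbitrary choices for the example.

# Sketch: cross-validate the penalty strength of the adaptive-lasso
# quantile regression model using scikit-learn's GridSearchCV.
from sklearn.model_selection import GridSearchCV

qr_model = Regressor(model='qr', penalization='alasso', quantile=0.5)
param_grid = {'lambda1': [1e-3, 1e-2, 1e-1, 1]}
search = GridSearchCV(qr_model, param_grid, scoring='neg_mean_absolute_error')
search.fit(X_train, y_train)
print(search.best_params_)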

And that’s it for this post about quantile regression! By squashing the average and exploring the full distribution of the data, we open up new possibilities for data-driven decision-making. Stay tuned for more insights into the world of penalized regression and the asgl library.

Sparse Group Lasso in Python

How to use one of the best variable selection techniques in regression
Preparing to use LASSO and catch some meaningful variables. Photo by Priscilla Du Preez on Unsplash

So I’m here to talk about the wonderful asgl package (the name comes from Adaptive Sparse Group Lasso), which adds a lot of features that were already available in R packages but not in Python, like solving sparse group lasso models, and goes beyond that, adding extra features that improve the results that sparse group lasso can provide.

I would like to start by talking about the sparse group lasso: what it is and how to use it. Specifically, here we will see:

  • What is sparse group lasso
  • How to use sparse group lasso in Python
  • How to perform k-fold cross validation
  • How to use grid search to find the optimal solution

What is sparse group lasso

To understand what sparse group lasso is, we need to talk (briefly) about two techniques: lasso and group lasso. Given a risk function, for example the linear regression risk,

$$R(\beta) = \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - x_i^{T}\beta \right)^2$$
Risk function of a linear regression model

Lasso: this is defined by adding a penalization on the absolute value of the β coefficients,

$$\lambda \sum_{j=1}^{p} \lvert \beta_j \rvert$$
Lasso penalty formula

This definition provides sparse solutions, because it will send to zero some of the β coefficients (those least related to the response variable). The effect of this penalization can be controlled using the λ parameter. A large λ value gives the penalization greater importance, and thus there are more zeros among the β coefficients. This is mainly useful in high-dimensional datasets, where there are more variables than observations but we only expect a small fraction of the variables to be truly meaningful.
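As a quick sketch of this effect (using scikit-learn’s Lasso, where the penalty strength is called alpha and plays the role of λ here), increasing the penalty sends more and more coefficients to exactly zero:

# High-dimensional toy data: 100 predictors, only 5 of them informative.
# The larger the penalty, the fewer non-zero coefficients survive.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X_toy, y_toy = make_regression(n_samples=50, n_features=100, n_informative=5,
                               noise=1.0, random_state=0)

for penalty in [0.01, 0.1, 1.0, 10.0]:
    coef = Lasso(alpha=penalty, max_iter=10_000).fit(X_toy, y_toy).coef_
    print(f'penalty={penalty}: {np.sum(coef != 0)} non-zero coefficients')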

However, there are situations in which the predictor variables in X have a natural grouped structure. For example, in biostatistics it is common to deal with genetic datasets in which predictors are grouped into genetic pathways. In stock market analysis one can group companies from the same business segment. In climate data one can group different regions… And lasso provides individually sparse solutions, not group-sparse ones.

Group lasso: so here comes group lasso to the rescue. The group lasso penalty is built on the ℓ₂ norm of the coefficients belonging to each group (the square root of the sum of their squares).

$$\lambda \sum_{l=1}^{G} \sqrt{p_l}\, \lVert \beta^{(l)} \rVert_2$$
Group lasso penalty formula (where $p_l$ is the size of group $l$ and $\beta^{(l)}$ its coefficients)

This way it takes into account the possible grouped structure of predictors, and it sends to zero whole groups of variables. If all the groups are of size 1 (only one predictor per group) we will be solving a lasso model. Let’s see lasso and group lasso graphically,

Lasso, group lasso and ridge penalizations comparison

In the image above we have a simple problem with three coefficients: β₁, β₁₁ and β₁₂. The last two coefficients form a group, and as we can see, lasso (left image) does not take this grouping information into account, but group lasso does. So group lasso can be seen as lasso between groups and ridge within groups. If a group is meaningful, we select the whole group. If it is not, we send it to zero.

Sparse group lasso: and finally here it is,

$$\alpha \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert + (1-\alpha)\, \lambda \sum_{l=1}^{G} \sqrt{p_l}\, \lVert \beta^{(l)} \rVert_2$$
Sparse group lasso penalty function

Sparse group lasso is a linear combination of lasso and group lasso, so it provides solutions that are both between-group and within-group sparse.

This technique selects the most meaningful predictors from the most meaningful groups, and is one of the best variable selection alternatives of recent years. However, there was no implementation of sparse group lasso for Python… until now.

Moving to Python: install asgl

Let’s start by installing asgl. This can be easily done using pip:

pip install asgl

Or alternatively, one can clone the GitHub repository and install from source:

git clone https://github.com/alvaromc317/asgl.git
cd asgl
python setup.py install

Import libraries

Once we have the package installed, we can start using it. First, let’s create some data to analyse.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from asgl import Regressor

X, y = make_regression(n_samples=1000, n_features=10, bias=10, noise=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=250)
group_index = np.array([1, 1, 2, 2, 3, 3, 4, 4, 4, 4])

Here, in addition to generating the dataset, we have created a variable called group_index. This variable describes the group structure of the data: if we have 10 predictors, group_index should be a vector of length 10, and if the first two predictors form a group, they should share the same group_index value. However, our dataset does not have a natural grouped structure, so here we define an artificial one just for the sake of this article.
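A quick way to double-check what group_index encodes is to print which columns fall into each group:

# Sketch: list the columns that belong to each group under group_index
for g in np.unique(group_index):
    print(f'group {g}: columns {np.where(group_index == g)[0]}')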

Parameters for the sgl model

model = Regressor(model='lm', penalization='sgl', lambda1=0.1, alpha=0.5)
model.fit(X_train, y_train, group_index)

predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)

If we have a look at the sparse group lasso equation above, we can see that there are two parameters, α and λ, that can be optimized. λ controls how much weight we want to give to the penalization, so larger λ values produce sparser solutions. And α controls the tradeoff between lasso and group lasso: α equal to 1 provides a lasso, and α equal to 0 provides a group lasso. Now, usually, we can define a grid of possible values for both parameters and try to find the combination that minimizes the error.

Additionally, we specify the type of model to solve (lm, because we are solving a linear model), the penalization (sgl, because we want the sparse group lasso) and compute the mean squared error using scikit-learn’s functions.
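To make the two extremes of α concrete, here is a small sketch on the same data that fits both limiting cases simply by changing alpha:

# alpha=1 ignores the group structure (pure lasso);
# alpha=0 penalizes whole groups at once (pure group lasso).
lasso_like = Regressor(model='lm', penalization='sgl', lambda1=0.1, alpha=1.0)
lasso_like.fit(X_train, y_train, group_index)

group_lasso_like = Regressor(model='lm', penalization='sgl', lambda1=0.1, alpha=0.0)
group_lasso_like.fit(X_train, y_train, group_index)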

Cross validation

We can define a grid of possible values for the hyperparameters α and λ, and find the optimal combination using cross validation from scikit-learn.

from sklearn.model_selection import GridSearchCV

model = Regressor(model='lm', penalization='sgl')

param_grid = {'lambda1': [1e-4, 1e-3, 1e-2, 1e-1, 1], 'alpha': [0, 0.2, 0.4, 0.6, 0.8, 1]}
gscv = GridSearchCV(model, param_grid, scoring='neg_median_absolute_error')
gscv.fit(X_train, y_train, **{'group_index': group_index})

So first we define our model, then we define the grid of possible values for the hyperparameters, and finally we initialize and run the GridSearchCV object, which will perform cross validation over all possible combinations of hyperparameters in the grid. This means that the function will evaluate all 30 possible models (5 possible values for λ and 6 possible values for α).

We can finally see what our optimal model is by looking into

gscv.best_estimator_

As simple as that, we have found our optimal model.
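From here we can also check the selected hyperparameters and score the refitted best model on the held-out test set, for example:

# Inspect the winning hyperparameters and evaluate the refitted model
print(gscv.best_params_)

best_predictions = gscv.best_estimator_.predict(X_test)
print(f'Test MSE: {mean_squared_error(y_test, best_predictions):.3f}')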

And that’s it on how to implement sparse group lasso in Python. I hope you enjoyed this post and found it useful, so stay tuned for future posts in this series, and please do not hesitate to contact me if you have any questions or suggestions.

For a deeper review of what the asgl package has to offer, I recommend reading the Jupyter notebook provided in the GitHub repository.

Have a good day!
