
No Free Lunch with Feature Bias

Nonlinear models may not automatically incorporate pairwise interactions


Photo by Alla Hetman on Unsplash

The potential of nonlinear models to capture interactions has led to suggestions that sensitive features such as sex or race should be included in models to mitigate feature bias [1]. This viewpoint challenges the conventional wisdom that sensitive features should be left out of models because of ethical concerns about using them as a basis for decisions. Here, I will argue that the ability of tree-based ensemble models to spontaneously incorporate interactions is imperfect. Interactions may be only partially captured, or sensitive features may be incorporated into models in a "main-effect-like" way. We are not necessarily correcting feature bias, just compensating for it in a non-specific manner.

Feature bias occurs when the meaning of predictor variables in a machine learning model varies across groups. This is especially likely to occur for self-reported information or features that rely on human judgment. Some examples might be gender differences in self-reported health information [2,3], performance scores that reflect implicit biases of assessors, or racial biases in arrest rates for minor crimes when human discretion is involved in which offenses to pursue and which to ignore [1]. When features reflecting such biases are included in models, they have the potential to unjustly penalize some groups.

I started reading papers on fairness topics due to concerns about my own project. I wanted to understand and follow best practices. One paper, which I found to be very helpful overall, suggested some mitigation strategies for feature bias [1]:

"Fortunately, in the absence of label bias (i.e., measurement error in y), such feature bias is statistically straightforward to address. Specifically, one can include group membership (e.g., race and gender) in the predictive model itself, or alternatively, one can fit separate risk models for each group. In this manner – and under the assumption that y is accurately measured – the statistical models would automatically learn to appropriately weight predictors according to group membership. For example, when predicting violent recidivism, a model might learn to down-weight past drug arrests for black defendants."

When I first read this, I didn’t think much of it. It made sense at first that nonlinear models would incorporate relevant interactions into their solutions. However, I’ve recently been forced to challenge those assumptions.

My wrong thinking went something like this: "if there is an interaction effect between features x and y, then during training, trees will be built that first split on x and then on y, thereby capturing the dependency." When I think a little harder, I realize that this is how a human with foreknowledge of the interaction might build a tree, but machine learning models work differently.

My first clue that my thinking might be incorrect was Scott Lundberg’s article in Towards Data Science [4]. He simulates feature bias in an XGBoost model and shows changes in aggregated Shapley plots with and without a sensitive feature included in the model. To me, these plots did not look as though the biased feature had interactions with the sensitive feature, and I wondered why not (I will discuss the specifics of the Shapley-based technique later).

After looking at Lundberg’s plots, I read some existing literature on pairwise interactions in tree-based models. Although not specific to issues of fairness or feature bias, this literature provides reason for caution in assuming that tree-based models will incorporate pairwise interactions. Several authors have shown that although tree-based models can incorporate interactions [5], there is no guarantee that they will do so [6, 7].

In the context of fairness, we need to understand not just whether interactions are incorporated, but to what extent, and whether sensitive features contribute to any degree in a "main-effect-like" way. If an interaction is captured weakly, we may not be fully mitigating bias. Or, if a model contains any "main-effect-like" contributions from a sensitive feature, we risk stereotyping, potentially increasing rather than reducing bias. We may be compensating for feature bias instead of correcting it, which could improve population metrics but negatively impact individual results.

In the sections below, I will simulate a feature bias scenario, making up a sensitive feature that has no effect on the actual outcome, but which affects the values of an influential feature. In this scenario, the sensitive feature acts only through interactions with the biased feature. However, I will show that the model incorporates the expected interaction weakly, and the sensitive feature has a substantial marginal influence. Thus, I am under-correcting feature bias, while risking stereotyping.

Example Dataset

I use the same public loans dataset [8] that I used in a previous blog post [9]; this file contains information about loan amounts and borrower characteristics, as well as loan default status, which is the outcome I will model. As before, I create an artificial, simplified "gender" category with just 2 levels. However, for this article, I assign gender at random, assigning 40% of cases to the "female" category. Source code for this project can be found on GitHub [10].
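As a rough sketch of this setup (the full source is on GitHub [10]; the snippet below is a simplified illustration, and details such as the random seed are my own choices):

```python
import numpy as np
import pandas as pd

# Lending Club data from the h2o example repository [8].
loans = pd.read_csv(
    "https://raw.githubusercontent.com/h2oai/app-consumer-loan/master/data/loan.csv"
)

# Artificial gender category, assigned completely at random:
# roughly 40% of cases are labeled "female" (encoded as 1).
rng = np.random.default_rng(42)
loans["female"] = (rng.random(len(loans)) < 0.40).astype(int)
```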

I use my made-up gender category to introduce feature bias into the annual income feature, which is a strong overall predictor [9], encoding gender as a binary female indicator. For females only, I reduce the value of the income feature by 40% on average, with some randomization in the amount of suppression. The result is an exaggerated feature bias scenario in which the average income for males is $62k, while for females it is $35k.
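The income suppression might look roughly like this (the column name annual_inc and the exact noise distribution are assumptions on my part; the repository code [10] is authoritative):

```python
# Suppress reported income for the female group only: reduce it by
# about 40% on average, with some randomness in the suppression factor.
is_female = loans["female"] == 1
factor = np.clip(rng.normal(loc=0.60, scale=0.10, size=is_female.sum()), 0.2, 1.0)
loans.loc[is_female, "annual_inc"] = loans.loc[is_female, "annual_inc"] * factor
```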

In this example, gender (which is made up) has no effect at all on whether someone actually defaults on a loan, but it impacts how reliably their income is reported. I will explore how well introducing the gender feature into the model captures interactions and compensates for feature bias. I focus on 2 models:

  1. A model built using all predictors in the data set, but with the income feature modified to simulate feature bias
  2. The same as model 1, but also including a female status feature.

All results are measured on a hold-out validation data set. I show analysis for a random forest model because the results are simple to interpret. I repeated the analysis with XGBoost models and saw less under-correction, but still substantial main effects of the female feature; I plan to discuss the XGBoost model in a future blog post on feature bias mitigation strategies.
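A simplified sketch of the two random forest models follows. The target column name, the preprocessing, and all hyperparameters other than the 150 trees are assumptions for illustration, not the exact settings used in the repository [10]:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Crude preprocessing for illustration only: one-hot encode categoricals
# and fill missing values. The target column name "bad_loan" is assumed.
target = "bad_loan"
X = pd.get_dummies(loans.drop(columns=[target])).fillna(0)
y = loans[target]
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Model 1: all predictors with the biased income, no sensitive feature.
rf_no_gender = RandomForestClassifier(n_estimators=150, random_state=0)
rf_no_gender.fit(X_train.drop(columns=["female"]), y_train)

# Model 2: the same predictors plus the female indicator.
rf_with_gender = RandomForestClassifier(n_estimators=150, random_state=0)
rf_with_gender.fit(X_train, y_train)
```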

Demographic Parity

Because assignment is random, actual outcomes for the two populations are very similar. However, feature bias leads to differences in predictions across groups, as shown below:

Inclusion of the sensitive feature seems to correct rates to only a very modest extent, making me think that it is not fully compensating for feature bias. Whether the effects it does have are due to interactions incorporated in the model, or to a "main-effect-like" adjustment of probabilities, remains to be seen.
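For reference, the group difference in mean predicted default rate can be computed directly on the hold-out set, reusing the objects from the sketch above:

```python
# Mean predicted default probability for females minus males, per model.
p_no_gender = rf_no_gender.predict_proba(X_val.drop(columns=["female"]))[:, 1]
p_with_gender = rf_with_gender.predict_proba(X_val)[:, 1]
female_mask = (X_val["female"] == 1).to_numpy()

for name, p in [("without female feature", p_no_gender),
                ("with female feature", p_with_gender)]:
    gap = p[female_mask].mean() - p[~female_mask].mean()
    print(f"{name}: female - male mean predicted default rate = {gap:.4f}")
```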

Accumulated Local Effect Plots

Accumulated Local Effect (ALE) plots are one method for illustrating the average effect of a feature value on the model prediction. ALE plots are like partial dependence plots, except they are designed to adjust for correlations that can bias averages when unlikely combinations of feature values are included in calculations [11].

For my purposes, ALE plots are highly desirable because they can reveal the main effects of the female feature, after accounting for interactions, including the income-female interaction. Therefore, the single-feature ALE plot for the female feature reflects main-effect-like contributions only. If I am strictly correcting for feature bias and not introducing stereotyping, I expect 1-way ALE values for the female feature to be near zero.
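ALE plots are normally produced with a dedicated implementation (see Molnar [11] for the full algorithm), but for a two-level feature the main effect can be roughly approximated by hand: average the change in predicted probability when the indicator is flipped, then center so the data-weighted mean effect is zero. The sketch below is only my rough check, run on the full hold-out set rather than the 50/50 sample described next:

```python
# Rough manual approximation of the 1-way ALE for the binary female feature:
# average the prediction change when the indicator flips, then center.
X0 = X_val.copy()
X0["female"] = 0
X1 = X_val.copy()
X1["female"] = 1
delta = (rf_with_gender.predict_proba(X1)[:, 1]
         - rf_with_gender.predict_proba(X0)[:, 1]).mean()

p_female = (X_val["female"] == 1).mean()
ale_female_0 = -p_female * delta        # centered ALE at female = 0
ale_female_1 = (1 - p_female) * delta   # centered ALE at female = 1
print(ale_female_0, ale_female_1)
```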

I create ALE plots for sample validation data consisting of 50% males and 50% females. For the model that includes both income and female status, here is the ALE plot for the female feature:

1-way ALE plot for the female indicator. Image by author.

We see some evidence of a main effect, with female status tending to compensate for feature bias across the board.

Two-way ALE plots show second-order effects only [11], enabling isolation of the income-female interaction:

2-way ALE plot for income and female status. Image by author.

In the two-way plot, the interaction effect varies by income. At lower incomes, the interaction term pushes male default rates higher, while the trend is reversed at high incomes.
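To put rough numbers on this pattern without a full two-way ALE implementation, one can bin income and look at how the average effect of flipping the female indicator within each bin deviates from its overall mean. This reuses X0 and X1 from the earlier sketch and is only a crude proxy for the second-order ALE:

```python
# Income-conditional effect of flipping the female indicator, minus its
# overall mean: a crude proxy for the income-female interaction pattern.
flip_effect = (rf_with_gender.predict_proba(X1)[:, 1]
               - rf_with_gender.predict_proba(X0)[:, 1])
income_bins = pd.qcut(X_val["annual_inc"], q=10, duplicates="drop")
by_bin = pd.Series(flip_effect, index=X_val.index).groupby(income_bins).mean()
print(by_bin - flip_effect.mean())   # deviation from the main effect, by income bin
```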

Comparing values in the two-way plot to the single-feature result provides information about the relative strengths of the main effect and interaction terms. Note that, for many income levels, the magnitudes of the ALE interaction values (from the 2-way plot) are comparable to the main effect ALE values (from the 1-way plot). Therefore, we can conclude that our model includes both interaction and main effects from the female feature. We are to some extent correcting our feature bias, but to a similar extent we are making decisions based on gender only (stereotyping).

Inspection of Trees

Although tree inspection is not quantitative, we can assess how female status is incorporated into a model by investigating tree structures. If female status is incorporated only as an interaction with income, we expect the female feature to occur only on decision paths that also involve income.

This random forest model contains 150 trees with an average of 1,360 nodes each, so direct inspection of individual trees is impractical. But by tracing decision paths in each tree, we can get information about interactions. The table below summarizes information about decision paths for all trees (mean column) and for 3 randomly selected individual trees.

The average tree contains about 16 paths that involve the female feature but not income – these reflect unexpected contributions from the sensitive feature. These could be "main-effect-like" if the paths don’t disproportionately contain specific other features, or they could represent spurious interactions. Interestingly, one of the sample trees (#1) includes the female feature only in combination with income, which is what we would hope for in our solution. However, many other trees show extraneous effects from the female feature.
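The path counts can be obtained from scikit-learn’s tree internals. A sketch of one way to do it (my own reconstruction, not necessarily how the repository code [10] does it):

```python
# Count root-to-leaf paths that split on the female feature, with and
# without income, for every tree in the forest.
feature_names = list(X_train.columns)
idx_female = feature_names.index("female")
idx_income = feature_names.index("annual_inc")

def count_paths(tree):
    counts = {"female_only": 0, "female_and_income": 0, "total": 0}
    stack = [(0, frozenset())]                      # (node id, features seen so far)
    while stack:
        node, seen = stack.pop()
        if tree.children_left[node] == -1:          # leaf: a complete decision path
            counts["total"] += 1
            if idx_female in seen:
                key = "female_and_income" if idx_income in seen else "female_only"
                counts[key] += 1
            continue
        seen = seen | {int(tree.feature[node])}
        stack.append((tree.children_left[node], seen))
        stack.append((tree.children_right[node], seen))
    return counts

summaries = [count_paths(est.tree_) for est in rf_with_gender.estimators_]
print("Mean paths with female but not income per tree:",
      np.mean([s["female_only"] for s in summaries]))
```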

Aggregated Shapley Values

Aggregation of Shapley values was suggested by Scott Lundberg as a way to identify features contributing to population differences in model outcomes [4]. Because individual probabilities can be averaged to get group probabilities, and individual probabilities can be distributed across features using Shapley values, population differences can also be distributed across features. Lundberg’s article shows the effect of including a sensitive feature in a dataset that contains both feature and label bias; in his example, adding the sensitive feature does not fully compensate for feature bias [4].

I apply Lundberg’s technique to my models here. I modify his technique to examine female samples using a "foil", or reference set, consisting of males only (see also [9]). The resulting plots show contributions to the excess probability for females compared to males. I look at the top 5 features, grouping the remaining features into an "other" category, and plot Shapley aggregate values with and without the sensitive feature:

Aggregated Shapley values for female cases compared to a male reference. Colors indicate the feature bias model with and without the female feature. Image by author.

For random forest, the impact of the income feature increases modestly when the female feature is introduced to the model, and we see a negative contribution from the female feature. We also see small changes in contributions from other features.
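A sketch of how such an aggregation might be computed with the shap package, using a male-only background as the foil (my own reconstruction; the sample sizes are illustrative):

```python
import shap

# SHAP values for female validation cases, with a male-only background
# ("foil") so contributions are measured relative to the male group.
X_female = X_val[X_val["female"] == 1].sample(500, random_state=0)
X_male_bg = X_val[X_val["female"] == 0].sample(200, random_state=0)

explainer = shap.TreeExplainer(rf_with_gender, data=X_male_bg,
                               feature_perturbation="interventional")
sv = explainer.shap_values(X_female)
# Depending on the shap version, the output is a per-class list or a 3-D
# array; take the positive (default) class either way.
sv_pos = sv[1] if isinstance(sv, list) else sv[..., 1]

# Aggregate: mean contribution of each feature to the female-vs-male
# difference in predicted default probability.
agg = pd.Series(sv_pos.mean(axis=0), index=X_female.columns).sort_values()
print(agg)
```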

How should inclusion of the female feature affect the Shapley plot? With feature bias, we expect the direction of the interaction term to be opposite that of the original effect of income. The Shapley value from the income feature will include contributions from the main effect and some from interactions. Depending on how interactions are incorporated into the model, and whether the main effect is altered by inclusion of the sensitive feature, the Shapley contribution from the income feature might increase, decrease, or remain roughly the same.

The Shapley plot clearly shows that feature bias is not fully corrected by inclusion of the sensitive feature. Since I assigned female status completely at random, and the aggregated Shapley plot shows probability differences for females vs. males, I expect the combined contribution from income plus female status to be around zero. The two effects should be of similar magnitude and opposite sign, but it’s clear the female effect is substantially weaker than the income effect.

Random Forest Model Summary

In this example scenario, inclusion of female status in the model under-corrects for feature bias while incorporating some main effects of female status, introducing stereotyping risk. This is consistent with existing literature on limitations in the abilities of tree-based models to incorporate pairwise interactions [6, 7].

Even if adding a sensitive feature isn’t a perfect fix, is it better than doing nothing? I think that answer would vary depending on the situation. In some scenarios, fairness may be improved by inclusion of such a feature. For this example, I would personally say that inclusion of female status isn’t justified. Its benefits in terms of changing final predictions are very modest, and it introduces risk. Not to mention that inclusion of such a feature might be a very tough sell with compliance departments!

In addition, my simulation of feature bias is in some ways very unrealistic. Here, feature bias affects all members of one group to a similar extent. In a real scenario, it’s much more likely that feature bias would affect some members of all groups, but in differing proportions. When feature bias affects only some members of each group, individuals would be able to reasonably argue that "if I were a different gender (or race, etc.), I would have been approved".

Furthermore, my example had female status linked only with income. Female status was neither correlated with any other feature, nor did it have any independent effects. Group membership may be correlated with other features (included in the data or not), in which case it might be incorporated into the model as a proxy or substitute for these. This would complicate feature bias correction via this method.

Final Thoughts

Existing literature on feature bias implies that introducing a sensitive feature automatically mitigates feature bias. However, Shapley and ALE plots, as well as existing research on interactions outside of the fairness context, would suggest caution in assuming that tree-based models will reliably incorporate important pairwise interactions. A simple example shown here demonstrates under-correction, as well as a main-effect-like contribution that might introduce stereotyping risk.

The fact that sensitive features don’t necessarily correct bias doesn’t mean that we should never use inclusion as a mitigation strategy. Adding a sensitive feature to a model may be the best option in some situations. But it’s probably a good idea to consider other methods. Whichever strategy is chosen, it’s crucial to thoroughly test your model to ensure that the method is behaving as expected! I plan to discuss some other possible feature bias mitigation strategies in a future blog post.

References

[1] S. Corbett-Davies and S. Goel, The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning (2018), Working paper, arXiv.org.

[2] N.E. Betz, L. Mintz and G. Speakmon, Gender differences in the accuracy of self-reported weight (1994), Sex Roles 30:543–552.

[3] M. Sieverding, A.L. Arbogast, S. Zintel, and C. von Wagner, Gender differences in self‐reported family history of cancer: A review and secondary data analysis (2020), Cancer Medicine, 9:7772–7780.

[4] Scott Lundberg, Explaining Measures of Fairness (2020), Towards Data Science.

[5] M.N. Wright, A. Ziegler and I.R. König, Do little interactions get lost in dark random forests? (2016), BMC Bioinformatics 17,145.

[6] S. Th. Gries, On classification trees and random forests in corpus linguistics: Some words of caution and suggestions for improvement (2019), Corpus Linguistics and Linguistic Theory 16,3:617–647.

[7] A-L Boulesteix, S. Janitza, A. Hapfelmeier, K. Van Steen, and C. Strobl, Letter to the Editor: On the term ‘interaction’ and related phrases in the literature on Random Forests (2015), Brief Bioinform 16,2:338–345.

[8] h2o.ai, Lending Club Data Set, https://raw.githubusercontent.com/h2oai/app-consumer-loan/master/data/loan.csv.

[9] V. Carey, Fairness Metrics Won’t Save You from Stereotyping (2020), Towards Data Science.

[10] V. Carey, GitHub Repository, https://github.com/vla6/Stereotyping_ROCDS.

[11] C. Molnar, 5.3 Accumulated Local Effects (ALE) Plot, Interpretable Machine Learning: A Guide for Making Black Box Models Explainable (2018).

