Aesthetic Map: Bivariate Choropleth for Association Context on Spatial Data

How do you observe the relationship between the two variables? A scatter plot for non-spatial data, bivariate choropleth for spatial data.

Introduction

Do Covid-19 cases correlate with urban density? Does population density correlate with gross economic output? What is the relationship between GRDP (gross regional domestic product) and pollution? Are cities more productive than rural areas?

These questions can be answered by analyzing two spatial datasets. A simple scatter plot might answer them quickly, but in this article I want to put heavy emphasis on the spatial aspect, because these questions depend on where the elements are located. Discarding that aspect with a scatter plot might hide strategic or impactful insights about the actual nature of the data. As Tobler’s first law of geography states:

‘Everything is related to everything else, but near things are more related than distant things.’ – Tobler, 1970

This article introduces a way to understand the association between two variables in their spatial context using geospatial visualization. As stated, this is an alternative to the scatter plot, and we’ll see how the geospatial method differs. The analysis is descriptive and uses a bivariate choropleth.

(Note: there are many methods to observe a spatial relationship, and this is just one of them. I chose it because it’s simple, easy to understand, and visually communicative. Plus, it makes a beautiful map!)

Spatial Data in a Nutshell

Spatial data is data that has spatial dimensions. To put it simply, we can make a map of the data. Just like typical data, spatial data can be represented in spreadsheet form. The difference between typical spreadsheet data and spatial data is that spatial data has a geometry column that stores geometry information. This geometry information is usually represented in Well-Known Text (WKT) format (check table 1 in the article on the link, or check this link).
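To make the "spreadsheet with a geometry column" idea concrete, here is a minimal sketch in plain Python. The district names, populations, and coordinates are made up for illustration, and the tiny parser only handles a single-ring POLYGON; real work would normally rely on a dedicated geospatial library.

```python
# Hypothetical rows: names, populations and coordinates are invented for illustration.
rows = [
    {"name": "District A", "population": 1200,
     "geometry": "POLYGON ((0 0, 4 0, 4 3, 0 3, 0 0))"},
    {"name": "District B", "population": 800,
     "geometry": "POLYGON ((4 0, 6 0, 6 3, 4 3, 4 0))"},
]


def parse_wkt_polygon(wkt):
    """Parse a single-ring 'POLYGON ((x y, ...))' string into (x, y) tuples."""
    prefix = "POLYGON (("
    if not (wkt.startswith(prefix) and wkt.endswith("))")):
        raise ValueError("only simple single-ring POLYGON text is supported")
    body = wkt[len(prefix):-2]
    return [tuple(float(v) for v in pair.split()) for pair in body.split(",")]


def ring_area(ring):
    """Shoelace formula for a closed ring (first vertex repeated at the end)."""
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0


# The geometry column lets us derive spatial attributes such as area and density.
for row in rows:
    ring = parse_wkt_polygon(row["geometry"])
    row["area"] = ring_area(ring)
    row["density"] = row["population"] / row["area"]
    print(row["name"], row["area"], row["density"])
```

The point is only that the geometry text carries enough information to compute things an ordinary spreadsheet cannot, such as area or population density.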

For example, the figure below shows a table of Jakarta City’s administrative boundaries with their respective populations. Notice that it has a "geometry" column, which stores the geographic information. For more about the data format, please refer to my other article.

Choropleth Map

I’m sure you’ve encountered a choropleth map before; you just may not know that it’s called a "choropleth map". It’s a map like this:

A Choropleth Map of Jakarta Population (source: author, 2020; data: Indonesian Statistical Agency, 2020)

This is a visualization of the previous spreadsheet data. Looks neat, right? It’s a great alternative to a typical bar chart. The spatial relationship is intuitive, yet the quantitative information (population data) is still delivered.

"Choropleth maps are shaded maps where the intensity of the colour is indicative of the intensity of the phenomenon in question." – Royal Geographical Society

A choropleth depends on classifying the data into several bins, just like a histogram, and a unique color is assigned to each bin. The method of classification strongly affects the result and can therefore yield different interpretations. More about the classification can be found here.
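As a rough illustration of how much the classification method matters, the sketch below bins the same values with two common schemes, equal interval and quantile. The population numbers are invented, and the helper names are my own rather than any library’s API.

```python
import math


def equal_interval_breaks(values, k):
    """Upper bounds of k bins of equal width between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]


def quantile_breaks(values, k):
    """Upper bounds of k bins holding roughly equal numbers of observations."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[math.ceil(n * i / k) - 1] for i in range(1, k + 1)]


def classify(value, breaks):
    """Index of the first bin whose upper bound is >= value."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks) - 1  # guard against floating-point rounding at the top


# Hypothetical, strongly skewed populations (one very large district).
populations = [10, 12, 15, 18, 20, 25, 30, 40, 400]

ei = equal_interval_breaks(populations, 3)
qt = quantile_breaks(populations, 3)
equal_classes = [classify(v, ei) for v in populations]
quantile_classes = [classify(v, qt) for v in populations]

print("equal interval:", equal_classes)
print("quantile:      ", quantile_classes)
```

With equal intervals, the single outlier pushes every other district into the lowest class, so the map would look almost uniform. With quantiles, the same data spread evenly over three classes. Neither is wrong, but they tell different stories.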

This reminds me of how classification and graphs can mislead (please refer to the video below; it’s a great guide to critiquing graphs and data visualizations). So, be aware of the bin scale and the classification method of a choropleth.

Further Extent – Bivariate Colors

Now, the map above only explains one variable. We can add one more variable so that the map uses bivariate colors. A choropleth that consists of two variables is called a bivariate choropleth.

Let’s just look at the example, shall we?

Bivariate Choropleth

Here’s the latest bivariate choropleth map that I’ve made. Here, I would like to identify the degree of urbanization by observing the relationship between the sum of the population and the standard deviation of the population in a hexagon-tessellated map. This map is my work for the #30DayMapChallenge on Twitter.
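The two variables can be sketched as a simple group-and-summarize step. I am assuming here that each hexagon covers several population grid cells and that the standard deviation is taken over those cells; the hexagon ids and values are hypothetical, and assigning cells to hexagons is not shown.

```python
from collections import defaultdict
from statistics import pstdev

# Hypothetical input: (hexagon_id, population of one grid cell inside that hexagon).
cells = [
    ("hex_1", 120), ("hex_1", 80), ("hex_1", 100),
    ("hex_2", 5), ("hex_2", 295), ("hex_2", 0),
    ("hex_3", 10), ("hex_3", 10), ("hex_3", 10),
]

# Collect the cell values that fall inside each hexagon.
grouped = defaultdict(list)
for hex_id, population in cells:
    grouped[hex_id].append(population)

# One row per hexagon with the two variables used on the map.
summary = {
    hex_id: {"pop_sum": sum(values), "pop_std": pstdev(values)}
    for hex_id, values in grouped.items()
}

for hex_id, stats in summary.items():
    print(hex_id, stats["pop_sum"], round(stats["pop_std"], 2))
```

Note that hex_1 and hex_2 have the same total population but very different spreads, which is exactly the kind of difference a single-variable map would not show.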

Bivariate Choropleth (Revised) (source: Author, 2020)

Notice that the legend now has two variables: the standard deviation and the population sum. Each hexagon contains the aggregated value of the population within it; in other words, the hexagons summarize the underlying data as statistical properties such as the mean, median, etc. The relationship is now indicated by the combination of the colors for each variable. Mixing these colors produces a unique color for each combination, which represents the relationship between the variables. Furthermore, the hexagon map visualizes the spatial context of the data. With a map, you can see the (geospatial) clustering intuitively.
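One way to picture the color mixing is a 3×3 legend built by blending two light-to-dark color ramps. The sketch below uses a multiply blend and hex codes I picked for illustration; it is not the palette used in my map, and published bivariate palettes are usually tuned by hand.

```python
def hex_to_rgb(code):
    """'#rrggbb' -> (r, g, b) integers in 0..255."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """(r, g, b) integers -> '#rrggbb'."""
    return "#" + "".join(f"{channel:02x}" for channel in rgb)


def multiply(a, b):
    """Multiply blend: white leaves the other color unchanged, dark tones add up."""
    return tuple(round(x * y / 255) for x, y in zip(a, b))


# Two 3-step ramps from light to dark (illustrative hex codes, an assumption).
ramp_x = ["#ffffff", "#9fd0e8", "#3f9fd0"]  # variable 1, e.g. population sum
ramp_y = ["#ffffff", "#e8a0b8", "#d04070"]  # variable 2, e.g. standard deviation

# Every (class_x, class_y) pair gets its own blended color: 9 legend cells.
palette = {
    (i, j): rgb_to_hex(multiply(hex_to_rgb(cx), hex_to_rgb(cy)))
    for i, cx in enumerate(ramp_x)
    for j, cy in enumerate(ramp_y)
}


def bivariate_color(class_x, class_y):
    """Color for a feature whose two variables fall in the given classes (0..2)."""
    return palette[(class_x, class_y)]


for j in reversed(range(3)):  # print the legend with high y at the top
    print(" ".join(bivariate_color(i, j) for i in range(3)))
```

A hexagon that is low on both variables stays white, one that is high on only one variable takes that ramp’s pure color, and one that is high on both gets the darkest mixed tone.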

Now, let’s see how a scatter plot visualizes the same relationship, and do some comparisons.

The Scatter Plot

Alright, here’s the scatter plot graph (with the same coloring as the map).

Scatter Plot Analysis (source: Author, 2020)

I don’t know about you, but I found the scatter plot very odd after looking at the bivariate map. My first thought was: "I cannot simply say that the relationship is linear, because there is clustering in the data and this clustering must be accounted for. It makes the data spatially dependent instead of independent." The scatter plot does not represent the geographical nature of the data; you cannot observe spatial clustering in it. Simply running a regression analysis would be misleading unless the analysis accounts for location, for example by being geographically weighted. "(The clustering) can result in spatial autocorrelation which causes problems for statistical methods that make assumptions about the independence of residuals".
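Spatial autocorrelation can also be put into a number. One common measure is Moran’s I, which I did not use for the map above; the sketch below is only a toy version on a small grid, assuming binary weights between cells that share an edge. Both grids are made up.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid with binary weights between edge-sharing cells."""
    n_rows, n_cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    denominator = sum((v - mean) ** 2 for v in values)

    numerator = 0.0
    weight_total = 0
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    numerator += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_total += 1
    return (n / weight_total) * (numerator / denominator)


# Same values (eight 1s, eight 0s), arranged differently in space.
clustered = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
checkerboard = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]

print("clustered:   ", round(morans_i(clustered), 3))
print("checkerboard:", round(morans_i(checkerboard), 3))
```

The two grids have identical histograms, so any non-spatial summary treats them as the same data. Moran’s I is clearly positive for the clustered grid and negative for the checkerboard, which is the information a scatter plot throws away.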

Concluding Remark

In my view, a simple scatter plot is not suitable for spatial data, that is, data that relies on the spatial element. A scatter plot hides the spatial element and the spatial relationships in the data, such as clustering (spatial autocorrelation). One way to address the issue is to produce a bivariate choropleth map: a choropleth map of two variables. The map represents the geographical information, while the colors represent the intensity of the variables’ values. With a bivariate choropleth, the spatial sense of the data is also captured and presented.

(Personal note: also, the map becomes beautiful, at least to me! It has a subtle aesthetic quality that stirs emotion, especially with some meticulous design choices. In this case, I see cartography as a form of art rather than mere data visualization.)

