Improving the user experience of overcrowded scatterplots

Of all my regular stops for data-vis design, Our World in Data is probably my favourite. Unlike (for example) FiveThirtyEight, which keeps complex graphics and news articles on different parts of the site, and unlike news sites which embed data graphics as iframes, OWID has a unique approach to mixing graphics and prose. This is much how I once imagined scholarly articles working, and was to some extent my inspiration for how to build this website.

In October, OWID announced a rebuild of Grapher, the engine behind their embedded graphics. This introduces several great features, including access to the raw data and easy-to-find information about data sources. This matters. As someone with a deep interest in data graphic design for the web, I decided to dig into this.

There is a particular type of graph I'm interested in, because it is ubiquitous on OWID and on the web at large. It is a kind of scatterplot where the sizes of the data points convey information. Often, the data points represent countries, and the area of each point is proportional to the country's population. This is useful in that bigger countries should play a bigger role in our understanding of overall world trends.

This type of chart is a real challenge for the web medium, because large circles quickly overlap and obscure each other on small screens. In light of the OWID redesign, I wanted to see how they cracked that nut, and if their solution can be improved. My intention here is not to troll anyone. I'm well aware that OWID is operating with constraints - technical and otherwise - that I don't have. And as I've written before, I think that good UX for data graphics is a very hard problem. But I do think that data-vis should be a critical area for web developers and UX professionals to improve on, because a quantitative understanding of the world we live in will help societies make better decisions.

Graphical representation of the Gini coefficient

In this article, OWID breaks down the Gini coefficient, a measure of income inequality within a population. (I won't go through how it is calculated here; you can read the original article.) The second chart on the page plots how income tax changes inequality: by comparing the position of a country on the x-axis to its position on the y-axis, we can see how much income redistribution via taxes levels the playing field.

Here is the original chart, albeit embedded in an iframe. A more "accurate" representation, including responsive features, may be obtained by following the links above.

The chart is impressive in terms of how much information is conveyed. Let's break that down:

Each data point on its own conveys four pieces of essential information: the Gini coefficient before taxes (x-axis), the Gini coefficient after taxes (y-axis), the population (size), and the continent/region (colour).
Many of the points are labeled; for those that aren't, the name (and other information) is accessible in a tooltip that appears via hover.
Historical information is available, too, accessible via the bottom range slider.
Countries can be highlighted/selected in various ways - clicking/tapping on a country, selecting an entire region via the legend, or using the "Select countries and regions" button. We can go back in time, or we can view historical changes via a temporal trace (this view can be accessed by clicking where it says "1999" and then changing the range handles).

All in all, a lot of useful information is communicated by this chart. Can we improve on it?

Well, for one thing, the chart is much less usable on mobile/touch devices than it is on desktop browsers. Most obviously, the data points and labels overlap with each other too much to be visible. The legend remains next to the chart, taking up even more space. To get around this, I tried rotating my phone to landscape mode. This caused the y-axis to collapse completely. This wasn't a "runtime" bug of changing the window size; it happened even when I booted the page in landscape mode. (This is a legit bug and is logged here.)

Another issue was apparent when tapping on the legend, which seems to mess with the selection state in a way I can't make sense of. (To illustrate this, try tapping on the same continent twice, and then tap on a different continent. I can't tell what state I'm in at that point, but it's clearly unexpected.) I think what's going on is that the legend elements respond differently to both click and hover, and get confused which is which when responding to a tap. (This is another bug, logged here.)

There are a few other UX annoyances and inconsistencies which are apparent on all devices. The behaviour of the year selection slider is mysterious. It appears to be possible to convert the chart to a comet-tail view, where changes in the Gini coefficient are tracked in two dimensions over time. Unexpectedly, clicking the "1999" button on the slider converts to this view, while placing the slider handles on top of one another converts back. None of this is behaviour a user would expect.

Improving the crowded scatterplot

This type of chart is clearly a challenge to design well for the web. When we don't know how much horizontal space is available, how can we make effective use of the "size" of each circle, and how can we make it possible for users to get the most out of it, regardless of what device they are using to read?

I think the particular problem of this kind of chart is that the exact identity of each data point matters quite a bit. We can't count on there being enough space to label all of them, yet we want to make it easy for the user to find what they're looking for. The most likely thing a user will want to know is how their own country compares to the big players (I'm Canadian, and share my nation's tendency to constantly compare ourselves to our better-known neighbour.) A user also might want to know what's going on with some of the obvious outliers (Like, WTF South Africa?)

Here is my reimagining of the OWID chart. In this case, my goal is not a verbatim reproduction, but an attempt to demonstrate how the usability can be improved to benefit more users.

[Interactive chart; controls: Select…, Single Year / Range, Year]

Zeroing in on a data point

Finding "Canada" on the OWID chart required at least 3 clicks and probably a few keystrokes. This can be significantly improved with a live search. There are few enough countries represented in the chart that the implementation can be done entirely on the frontend, through string matching. The component is just an input box with a datalist:

<div>
  <label for="search-input">Search</label>
  <input
    type="text"
    value=""
    id="search-input"
    list="search-datalist"
  />
  <datalist id="search-datalist">
    <option key="IND" value="India" />
    <!-- etc. -->
  </datalist>
</div>

To help users find a particular country, we can highlight text matches as they type:

const matches = data.filter((point) =>
  point.country.toLowerCase().includes(searchString.toLowerCase())
);

But when they choose an exact match (probably by selecting from the dropdown list) the tooltip pops up as well:

const exactMatch = data.find((point) => point.country === searchString);
if (exactMatch) {
  // show tooltip
}

The user can also tap on any data point to bring up the tooltip, which contains the name of the country and other information. To improve usability on touch devices, I chose to make this happen only on an actual tap or click, and ignore hover events.

Selecting groups

Following which countries/regions are selected in the OWID chart is complicated. There seem to be at least three different states a data point can be in, and the interplay between them is not obvious. Additionally, there seems to be a bug where taps are interpreted as hover events on touch devices, meaning that it's easy to get into a weird state. Tap around on the legend and individual data points for a bit and it quickly becomes hard to follow where you are and how to get back to where you started.

This set off my mutable-state spidey senses and I decided there needed to be a central store to track which countries are highlighted and which aren't. The "active countries" list behaves like a finite state machine:

we start with no active countries
when we search, any countries matching the search query are activated. All others are deactivated.
given that ALL countries in a continent are inactive, clicking on the continent in the legend activates all of them
given that ANY country in a continent is active, clicking on the continent in the legend deactivates all of them
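
As a minimal sketch of these transitions (not the actual code behind the chart), the store can be written as a pure function from the current state and an action to the next state. The data shape, the action names `search` and `toggleContinent`, and the function name `reduceActive` are all hypothetical, and the handling of an empty search string is my own assumption.

```javascript
// Hypothetical data shape: each point has a country name and a continent.
const data = [
  { country: "Canada", continent: "North America" },
  { country: "United States", continent: "North America" },
  { country: "India", continent: "Asia" },
  { country: "China", continent: "Asia" },
];

// The state is the set of active country names; we start with none.
const initialState = new Set();

// Pure transition function: (state, action) -> next state.
function reduceActive(state, action) {
  switch (action.type) {
    case "search": {
      const query = action.query.toLowerCase();
      // Assumption: an empty query activates nothing rather than everything.
      if (query === "") return new Set();
      // Matching countries are activated; all others are deactivated.
      return new Set(
        data
          .filter((point) => point.country.toLowerCase().includes(query))
          .map((point) => point.country)
      );
    }
    case "toggleContinent": {
      const members = data
        .filter((point) => point.continent === action.continent)
        .map((point) => point.country);
      const anyActive = members.some((name) => state.has(name));
      const next = new Set(state);
      for (const name of members) {
        // ANY active -> deactivate all; ALL inactive -> activate all.
        if (anyActive) next.delete(name);
        else next.add(name);
      }
      return next;
    }
    default:
      return state;
  }
}

// Usage: search for "can", then click "Asia", then click "North America".
let state = reduceActive(initialState, { type: "search", query: "can" });
state = reduceActive(state, { type: "toggleContinent", continent: "Asia" });
state = reduceActive(state, { type: "toggleContinent", continent: "North America" });
console.log([...state]); // the two Asian countries remain active
```

Because every transition returns a new set and never mutates the old one, it is always possible to say exactly which state the chart is in.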

This state then maps to the user interface according to the following:

if there are zero active countries, all of the data is shown in full colour. The 10 biggest countries are also labelled, if the chart is at least 768px wide. Otherwise, no labels are shown.
if ANY country is active, each active country is shown in full colour with its label. All inactive countries are ghosted (shown in grey).
if there are zero active countries, all items in the legend are shown in full colour.
if ANY country outside of the continent is active, but no countries within that continent are active, the legend item is ghosted.
if ALL countries in a continent are active, the legend item is shown in full colour.
if SOME countries in a continent are active, the legend is "partially ghosted".
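
The mapping above can be sketched as two small pure functions, one for data points and one for legend items. This is an illustration under assumptions rather than the chart's real code: the names `pointStyle` and `legendState`, the returned shapes, and the idea of passing in a precomputed set of the ten biggest countries are all hypothetical.

```javascript
// Decide how a single data point is drawn.
// `active` is a Set of active country names; `topTen` is a Set holding the
// names of the 10 biggest countries (assumed to be computed elsewhere).
function pointStyle(country, active, chartWidth, topTen) {
  if (active.size === 0) {
    // Nothing selected: full colour, labels only for big countries on wide charts.
    return { ghosted: false, labelled: chartWidth >= 768 && topTen.has(country) };
  }
  const isActive = active.has(country);
  // Something selected: active points get colour and a label, the rest are grey.
  return { ghosted: !isActive, labelled: isActive };
}

// Decide how one legend item is drawn: "full", "partial", or "ghosted".
// `continentCountries` is an array of the country names in that continent.
function legendState(continentCountries, active) {
  if (active.size === 0) return "full";
  const activeHere = continentCountries.filter((name) => active.has(name)).length;
  if (activeHere === 0) return "ghosted"; // only countries elsewhere are active
  if (activeHere === continentCountries.length) return "full";
  return "partial";
}

// Usage with made-up inputs:
const asia = ["India", "China"];
console.log(legendState(asia, new Set(["India"])));
console.log(pointStyle("China", new Set(["India"]), 1024, new Set(["India", "China"])));
```

Keeping the rendering rules as functions of the state alone means the legend and the data points cannot drift out of sync with each other.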

Finally, the easiest way to ensure users don't get lost in weird places is to give them the option to reset the chart to its original state at any point. Clicking Reset puts everything back to the way it was at page load.

Improving the experience on small screens

We've already discussed some aspects of responsive design: removing (all) data labels on small screens so that they don't obscure the view when there's too little space available. A further improvement we can make is to scale the size of the circles so that they don't overlap with each other too much.

In my previous post on responsive chart design, I discussed how to measure the chart size in pixels so that we can scale things to the space available. In this case, let's scale each data point like so:

const r = Math.max(
  Math.sqrt(width * pop),
  2,
);

where pop is the population (in billions), width is the chart width in pixels, and r is the radius of the circle. There are a couple of things to note about this expression. Firstly, we use Math.max to ensure each circle has a minimum radius of 2 pixels, otherwise it might not be rendered at all. Secondly, we are scaling the radius based on the square root of population. This means that the area of the circle is directly proportional to the population. Since people tend to perceive area as "size" this is a more appropriate relationship than directly relating radius to population.
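
To check the area claim, here is a small worked example that wraps the same expression in a function. The function name `radius`, the chart width of 800px, and the population values are round illustrative numbers, not figures from the dataset.

```javascript
// Same expression as above, wrapped in a (hypothetically named) function.
function radius(width, pop) {
  return Math.max(Math.sqrt(width * pop), 2);
}

const width = 800; // illustrative chart width in pixels

// Two illustrative populations (in billions), one 4x the other.
const big = radius(width, 1.4);
const small = radius(width, 0.35);

// The radius only doubles, but the area (pi * r^2) quadruples,
// matching the 4x difference in population.
const areaRatio = (Math.PI * big ** 2) / (Math.PI * small ** 2);
console.log(big / small, areaRatio);

// A very small population falls below the floor and is clamped to 2px.
console.log(radius(width, 0.0003));
```

One caveat: for circles clamped to the 2 pixel minimum, area is no longer proportional to population. That seems an acceptable trade for keeping the smallest countries visible.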

However, the biggest improvement we can make in this area concerns how the chart stacks with its controls, not the chart area itself. The OWID chart legend is rendered as part of the SVG, making control of stacking behaviour hard or impossible. I just moved the legend into HTML-land, and used a combination of flexbox and CSS columns to change the layout when the screen becomes too small.

The best way to see how this chart would look on a phone is to look at it on a phone. Here's a QR code for this page:

Introducing the dimension of time

One detail of the data we've glossed over so far is that data isn't available for all countries in a single year. OWID is showing data for the "closest" available data point; I simply decided to show the latest available data. Nevertheless, how inequality changes over time is likely to be something users are interested in.
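
The two strategies can be sketched side by side. The row shape, the function names, and the Gini values below are made up for illustration; in particular, I don't know OWID's exact rule for "closest" (for example, whether it applies a tolerance or how it breaks ties), so `closestPerCountry` is only one plausible reading.

```javascript
// Hypothetical rows: one per country per survey year, with made-up values.
const rows = [
  { country: "Country A", year: 2005, gini: 0.31 },
  { country: "Country A", year: 2015, gini: 0.3 },
  { country: "Country B", year: 2012, gini: 0.45 },
];

// My approach: keep the most recent row for each country.
function latestPerCountry(rows) {
  const latest = new Map();
  for (const row of rows) {
    const seen = latest.get(row.country);
    if (!seen || row.year > seen.year) latest.set(row.country, row);
  }
  return [...latest.values()];
}

// One reading of the "closest" approach: keep the row nearest to a target year.
// Assumption: on a tie, the row encountered first wins.
function closestPerCountry(rows, targetYear) {
  const closest = new Map();
  for (const row of rows) {
    const seen = closest.get(row.country);
    const distance = Math.abs(row.year - targetYear);
    if (!seen || distance < Math.abs(seen.year - targetYear)) {
      closest.set(row.country, row);
    }
  }
  return [...closest.values()];
}

console.log(latestPerCountry(rows).map((r) => `${r.country}: ${r.year}`));
console.log(closestPerCountry(rows, 2006).map((r) => `${r.country}: ${r.year}`));
```

Either way, every country ends up with exactly one point on the chart, even though those points may come from different years.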

(One surprise I had while doing this analysis was how little change there is over time, at least according to the Gini coefficient. At least, the differences between countries seem to be much greater than the change within a country over time. I'm wondering if the Gini coefficient is a reasonable way to measure the type of inequality that gets frequently discussed in the news, and plan to address that in a future post.)

Let's discuss some features that will help users see a temporal view of this data.

Temporal charts within the tooltip

The most obvious way to show change over time is a simple line plot showing time on the x-axis and the Gini coefficient on the y-axis. This has the additional advantage that it clearly shows the years in which data was collected for a given country (the plot points). I simply embedded a small line chart within the tooltips, so they can be called up for a given country when one wants more information. The one downside to presenting the data this way is that comparisons cannot be made between countries; however, the main chart shows that pretty clearly. The tooltips therefore continue to act as a way to call up more information about each data point - and even a small chart can convey a lot of data visually.

Improving the year selection UX

OWID's year selector slider is hard to use. It converts between two different modes: one where a single year is selected and a different mode where a range of years can be selected. The only way to access the range mode is to click on either the "1999" or "2021" button, after which a second slider handle appears. One can then convert back to the single-year mode by setting the slider handles to the same point, which "collapses" them back to a single handle.

None of these actions have the consequence users expect. Moreover, the elements in the slider design are all divs, meaning screenreaders won't make sense of them, nor can they be controlled by the keyboard.

Given all of this, I decided to throw out the slider entirely and use explicitly labelled form elements. The data cover a range of only 22 years: enough for a dropdown to be an acceptable solution. The user can swap modes with a radio switch, making it entirely apparent what they are trying to look at.
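
Here is a rough sketch of the logic behind such controls, assuming the selection is stored as a plain object with an explicit `mode` field. The names `yearOptions` and `rowsForSelection`, the selection shape, and the sample rows are hypothetical; the 1999 and 2021 endpoints are taken from the labels on OWID's slider.

```javascript
// Build the list of years offered in the dropdowns.
function yearOptions(first, last) {
  return Array.from({ length: last - first + 1 }, (_, i) => first + i);
}

// Filter rows according to an explicit selection object:
//   { mode: "single", year }  or  { mode: "range", from, to }
function rowsForSelection(rows, selection) {
  if (selection.mode === "single") {
    return rows.filter((row) => row.year === selection.year);
  }
  // Tolerate the two dropdowns being set in either order.
  const from = Math.min(selection.from, selection.to);
  const to = Math.max(selection.from, selection.to);
  return rows.filter((row) => row.year >= from && row.year <= to);
}

// Usage with made-up rows:
const sampleRows = [{ year: 2001 }, { year: 2010 }, { year: 2019 }];
console.log(yearOptions(1999, 2021).length);
console.log(rowsForSelection(sampleRows, { mode: "single", year: 2010 }));
console.log(rowsForSelection(sampleRows, { mode: "range", from: 2015, to: 2000 }));
```

Since the mode is stored explicitly instead of being inferred from where the handles sit, a range whose two ends happen to be equal stays a range, and the user never changes mode by accident.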

Conclusion

Making data graphics work well across a range of device capabilities is challenging even for simple plots, and scatterplots with this amount of information in them are not simple. There really is a unifying principle behind UX design and chart design: empathy for the user, followed by consideration of all the myriad ways a user might interact with your work. While a great experience across every device might not be possible if we bind ourselves to the chart itself, we can also externalize the legend, controls, and tooltips, giving us all the usual responsive design tricks, while picking semantically correct HTML elements.