data-viz-workshop-2021

Plotting using logarithmic scales

Logarithmic scales (or log scales) are useful, but not everyone understands them. We are familiar with reading numbers on a number line or reading data from a graph with a standard (linear) scale. Under certain circumstances, however, a standard scale is not useful: if the data grow or shrink exponentially, you may need a logarithmic scale.

A. Intuitions

1. You may have overlooked some log scales

Here is a screenshot of the Battery window from my MacBook. If you look at the ‘turn-off display after’ scale, you will notice that it ranges from 1 minute to 15 minutes, then to 1 hour, and then from 3 hours to ‘never.’ This is not difficult for us to understand, and it is the ‘intuition’ behind a logarithmic scale. On a log scale, numbers that are far apart are pulled closer together (compressed) so that a very large range can be shown in a small area.

2. Distance from Boston

Imagine that you want to visualize/compare the distance from Boston to a few other cities of interest. The graphic below shows how your dot plot looks. As you can see, the distance to San Francisco dominates the visualization. All other cities appear to be very close compared to San Francisco, and this is true. But it keeps us from comparing the distances among the cities other than San Francisco.

Consider what the graph would look like if you were to translate the x-axis from a linear scale to a logarithmic scale. The overall x-axis scale looks odd/non-linear, but the San Francisco distance no longer dominates the graphic. Notice carefully how the gap between 500 and 1000 is much smaller than the gap between 50 and 500.

Suppose that, instead of using a dot plot, you visualized the inter-city distances using a bar diagram. In a bar diagram, too, the bar showing the distance to San Francisco dominates the plot.

Similar to the previous example, your graphic looks different if you translate your y-axis to a log scale. Notice how the gaps from 1 to 10, 10 to 100, and 100 to 1000 are all the same.
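The gap comparisons in the two log-scale plots above can be checked with a few lines of arithmetic. This is only a sketch: the helper name `log_gap` is made up for illustration, and it assumes a base-10 axis. It measures how far apart two values sit on a log axis, which depends on their ratio rather than on their difference.

```python
import math

def log_gap(low, high):
    """Distance between two positive values on a base-10 log axis.

    The result is in 'decades': 1.0 means the values differ by a factor of 10.
    """
    return math.log10(high / low)

# Gaps mentioned for the x-axis of the dot plot.
gap_50_500 = log_gap(50, 500)      # a tenfold increase
gap_500_1000 = log_gap(500, 1000)  # only a doubling

# Gaps mentioned for the y-axis of the bar diagram.
decade_gaps = [log_gap(1, 10), log_gap(10, 100), log_gap(100, 1000)]

print(round(gap_50_500, 3), round(gap_500_1000, 3))
print([round(g, 3) for g in decade_gaps])
```

Because 50 to 500 is a tenfold step while 500 to 1000 is only a doubling, the first gap is more than three times as wide, even though both spans are close to 500 units on a linear axis.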

Showing data on a logarithmic scale can cure skewness towards large values.

B. Exponentially growing variables

3. Examples in the real-world

The human population has grown exponentially over the past century, and it has done so mainly by producing large amounts of food and learning how to control diseases. Ten thousand years ago, when humans first invented agriculture, there were maybe one million humans on the planet.

Epidemics and pandemics also can grow very fast. The graph below illustrates the trends of the new confirmed coronavirus cases from January 22 to March 19, 2020. Source: Popsci.

Fire also grows exponentially. The three pictures below show how the Creek Fire in eastern Fresno and Madera counties grew rapidly since it started on Friday, Sept. 4, 2020. By Tuesday morning, the California wildfire was estimated at more than 140,000 acres. Source: Fresnobee.

4. Exponential variables grow very fast

Exponential growth is a pattern in which the increases get larger as time passes, tracing the curve of an exponential function. Here is an example. Suppose that a mouse population doubles every month, starting with two in the first month, then four in the second month, eight in the third, 16 in the fourth, and so on. The population is multiplied by 2 each month in this case. The graph below illustrates how exponential growth (green) surpasses both linear (red) and cubic (blue) growth.
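To get a feel for how an exponential curve overtakes a linear and a cubic one, here is a small sketch. The three functions are hypothetical stand-ins (x, x cubed, and 2 to the power x); the functions behind the original graph are not given in the text.

```python
# Hypothetical stand-ins for the three curves; the functions used in the
# original graph are not stated, so these are assumptions for illustration.
def linear(x):
    return x

def cubic(x):
    return x ** 3

def exponential(x):
    return 2 ** x

xs = range(1, 31)

# Last x (within 1..30) at which the exponential curve is still at or below
# the cubic curve; beyond this point the exponential stays ahead.
last_behind = max(x for x in xs if exponential(x) <= cubic(x))

for x in (5, 10, 20, 30):
    print(x, linear(x), cubic(x), exponential(x))
print("exponential stays ahead after x =", last_behind)
```

The cubic curve leads for a while, but once the exponential passes it, the gap widens quickly.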

Mathematically, an exponential growth (or decay) function is a function that grows (or shrinks) at a constant percent growth rate.

The equation can be written in the form $f(x) = a(1 + r)^x$ or $f(x) = ab^x$, where $b = 1 + r$.

Here, a is the initial or starting value of the function, r is the percent growth or growth rate, and b is the growth factor/multiplier.
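The formula can be written as a short function. This is a minimal sketch: the name `exp_growth` and the example numbers (a starting value of 100 growing 5% per period) are assumptions chosen for illustration, not values from the text.

```python
def exp_growth(a, r, x):
    """Value after x periods, starting from a and growing at rate r per period."""
    b = 1 + r          # growth factor/multiplier
    return a * b ** x  # f(x) = a * b^x

# Hypothetical numbers: start at 100 and grow by 5% each period.
for x in range(4):
    print(x, round(exp_growth(100, 0.05, x), 2))

# A negative rate between -1 and 0 gives exponential decay (0 < b < 1).
print(round(exp_growth(100, -0.5, 3), 2))
```

With r = 1 the growth factor is b = 2, which is the doubling case.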

Since powers of negative numbers behave strangely, we limit b to positive values.

For a related reason, a log scale should not be used (or should be used cautiously) when you have negative numbers in your data: the logarithm of a negative number is not defined among the real numbers.

5. An exponential trend in a linear scale translates to a linear line in a log scale

Imagine that you are growing mushrooms, and that they are doubling each week. You start with two mushrooms in week one, which double to 4 in week two, then 8 in week three, then 16, 32, 64, 128, 256, 512, 1024, 2048, and so on, up to about 1 million in just 20 weeks. This range would be too much for a standard graph, but it can be easily displayed on a logarithmic scale. Both plots below show the same data. The only difference is how the y-axis is scaled: uniformly in the linear scale and compressed/expanded in the logarithmic scale. In the plot below, since values double each time, the base of the logarithmic scale used is 2.
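The mushroom example can be reproduced numerically. The sketch below assumes, as in the paragraph above, two mushrooms in week one and a doubling every week. It compares the week-to-week steps on a linear axis with the steps on a base-2 log axis.

```python
import math

weeks = list(range(1, 21))
mushrooms = [2 ** w for w in weeks]  # 2, 4, 8, ... doubling every week

# On a linear axis, the week-to-week increase keeps getting bigger...
linear_steps = [b - a for a, b in zip(mushrooms, mushrooms[1:])]

# ...but on a base-2 log axis, every week adds exactly one unit.
log2_steps = [math.log2(b) - math.log2(a)
              for a, b in zip(mushrooms, mushrooms[1:])]

print(mushrooms[-1])                       # count in week 20
print(linear_steps[:3], linear_steps[-1])  # steps grow on a linear axis
print(set(log2_steps))                     # one constant step on a log axis
```

Constant steps are what a straight line looks like, which is why the exponential trend plots as a line once the y-axis is logarithmic.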

But a linearly increasing trend in a linear scale translates differently to a log scale.

This implies that when the values appear to level off on a logarithmic y-axis, they may still be growing linearly in the real world (on a linear scale).

Say that you start to laugh, and within 100 seconds, another 100 people start to laugh on seeing you laugh. Seeing all these 100 people laugh, an additional 100 people start laughing in the next 100 seconds, and so on. In other words, the number of people laughing is increasing linearly over time. On a log scale, this linearly increasing trend looks like a ‘flattening’ trend.

When the trend looks flattening/non-increasing in a log scale, in reality, the trend is still ‘increasing’.
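The ‘flattening’ effect can be seen directly in the numbers. This sketch assumes the laughing example above, with 100 more people every 100 seconds, and for simplicity ignores the original laugher. The counts rise by the same amount each step, yet the steps on a base-10 log axis keep shrinking.

```python
import math

# Linear growth: 100 more people laughing every 100 seconds
# (simplifying assumption: the first laugher is not counted).
seconds = [100 * k for k in range(1, 11)]
laughing = [100 * k for k in range(1, 11)]

linear_steps = [b - a for a, b in zip(laughing, laughing[1:])]

# Heights on a base-10 log axis, and the step between consecutive points.
log_heights = [math.log10(n) for n in laughing]
log_steps = [b - a for a, b in zip(log_heights, log_heights[1:])]

print(linear_steps)
print([round(s, 3) for s in log_steps])
```

Every log step is still positive, so the curve never stops rising; it only rises by less each time.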

C. In my plot, when do I need a log scale?

6. A log plot can show us ‘patterns’ that are not clear on a linear scale

If we have two trends which ‘appear to be’ exponentially growing, comparison is difficult.

The data look very different when plotted on a logarithmic scale. In a typical graph, values on the (vertical) y-axis are plotted linearly: 1, 2, 3, and so on, or 10, 20, 30, or the like. By contrast, in a logarithmic plot, each tick on the y-axis represents a tenfold increase over the previous one: 1, then 10, then 100, then 1,000, then 10,000, and so on. (The interval doesn’t have to be a factor of 10, it could be a factor of 2, or 5, or 27, or any other number, but humans seem to prefer factors of 10.) Source: NYTimes.

Logarithmic scales can emphasize the rate of change in a way that linear scales do not. Italy seems to be slowing the coronavirus infection rate, while the number of cases in the United States continues to double every few days. This is clear only if we translate the y-axis into a log scale.
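One way to make ‘rate of change’ concrete is the doubling time, which corresponds to the slope of the curve on a log axis. The sketch below uses made-up case counts, not real figures for Italy or the United States, and the function names are hypothetical.

```python
import math

def doubling_time(t0, y0, t1, y1):
    """Time needed to double, assuming steady exponential growth
    between the points (t0, y0) and (t1, y1)."""
    doublings = math.log2(y1 / y0)  # vertical distance on a base-2 log axis
    return (t1 - t0) / doublings

def segment_doubling_times(series):
    """Doubling time for each pair of consecutive (day, cases) points."""
    return [doubling_time(t0, y0, t1, y1)
            for (t0, y0), (t1, y1) in zip(series, series[1:])]

# Hypothetical (day, cases) series, for illustration only.
steady = [(0, 100), (6, 400), (12, 1600)]
slowing = [(0, 100), (6, 400), (12, 800)]

print(segment_doubling_times(steady))   # same slope in both segments
print(segment_doubling_times(slowing))  # slope flattens in the second segment
```

Both series keep rising on a linear axis. Only the log view makes it obvious that the second one now takes longer to double.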

The log scale is also useful to read/study stock prices. For example, if Garmin went up $100/share it would be more substantial than if Google went up $100/share. On a linear scale, Garmin’s change will show as insignificant compared to Google’s changes. Read more here (pdf).
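The stock example comes down to absolute versus relative change. In the sketch below, the two starting prices are hypothetical round numbers, not actual Garmin or Google quotes, and the helper names are made up for illustration.

```python
import math

def percent_change(old, new):
    """Relative change, in percent."""
    return 100 * (new - old) / old

def log_change(old, new):
    """Vertical distance moved on a base-10 log axis."""
    return math.log10(new / old)

# Hypothetical share prices (assumptions, not real quotes).
cheaper_stock = 100.0
pricier_stock = 1000.0
rise = 100.0  # the same dollar increase for both

for price in (cheaper_stock, pricier_stock):
    new_price = price + rise
    print(price,
          round(percent_change(price, new_price), 1),
          round(log_change(price, new_price), 3))
```

The same $100 move is a doubling for the cheaper stock but only a 10% gain for the pricier one. A log axis draws each move in proportion to its relative size.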

7. A log scale can rescue your plot from skewness

The graph below shows the relationship between a drug’s dose (in nM) on X-axis and response on the Y-axis. The doses were chosen so each dose is twice the previous dose. When plotted with a linear axis many of the values are superimposed and it is hard to see what is going on.

With a logarithmic axis, the values are equally spaced horizontally, making the graph easier to understand.

If some of your data points are too large, they can dominate your plot. A log scale can rescue your plot from skewness.

7. When NOT to use a log scale?

If your data column has numbers less than or equal to 0, it cannot be represented on a log scale (without further manipulation). But, in general, this is not a problem for most datasets. For example, there are never a negative number of covid cases, there is never a negative fire size, and there is never a negative number of mice.

A log scale cannot represent a “0” or any negative quantities. So if your data column has numbers less than or equal to 0, a log scale cannot be used on this column.
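A quick check shows why zero and negative values have no place on a log axis. The data column below is hypothetical, and `log10_or_none` is just an illustrative helper. Python's `math.log10` raises `ValueError` for non-positive input.

```python
import math

def log10_or_none(value):
    """Return log10(value), or None if the value cannot be placed on a log axis."""
    if value <= 0:
        return None
    return math.log10(value)

column = [0, -5, 1, 10, 1000]  # hypothetical data column
positions = [log10_or_none(v) for v in column]
print(positions)

# Calling math.log10 directly on a non-positive number fails.
try:
    math.log10(0)
    failed = False
except ValueError:
    failed = True
print("log10(0) raised ValueError:", failed)
```

What to do with such values (drop them, shift them, or use a different scale) is the ‘further manipulation’ mentioned above, and depends on the data.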

D. Understanding the log scale better

8. How to read a logarithmic scale?

  1. Looking at the plot, determine whether you are reading a semi-log or a log-log graph. A semi-log graph has a logarithmic scale on either the x-axis or the y-axis; a log-log graph has one on both. A logarithmic scale has unevenly spaced grid lines, whereas a standard/linear scale has evenly spaced grid lines.

  2. Read the scale of the main/major divisions. On a logarithmic scale, the evenly spaced major marks represent successive powers of whatever base you are working with. The most common bases are 10 (the common logarithm) and ‘e’ (the natural logarithm); base 2 is also common.

  3. Notice that the minor marks between the main divisions are not evenly spaced. For example, in the plot below, the mark for 50 would be placed about 70% of the way from 10 to 100, not midway between them. Similarly, 5 would be placed about 70% of the way from 1 to 10, not in the middle.

  4. The minor interval marks are based on the logarithm of each number. On a base-10 scale, 1 is the first major mark, 10 is the second, 100 is the third, and so on. So where do the minor interval marks between, say, 1 and 10 go? Their positions can be calculated as follows:

    • log10(1) = 0.00
    • log10(2) = 0.30
    • log10(3) = 0.48
    • log10(4) = 0.60
    • log10(5) = 0.70
    • log10(6) = 0.78
    • log10(7) = 0.85
    • log10(8) = 0.90
    • log10(9) = 0.95
    • log10(10) = 1.00
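The tick positions listed above can be computed rather than looked up. This sketch assumes a base-10 axis and rounds to two decimals. It also works out how far along the 10-to-100 interval the mark for 50 falls.

```python
import math

# Position of each minor mark between 1 and 10, as a fraction of one decade.
minor_ticks = {n: round(math.log10(n), 2) for n in range(1, 11)}
for n, position in minor_ticks.items():
    print(n, position)

# Position of 50 as a fraction of the way from 10 to 100.
fraction_50 = (math.log10(50) - math.log10(10)) / (math.log10(100) - math.log10(10))
print(round(fraction_50, 2))
```

The marks bunch up toward the top of each decade because the step from 1 to 2 is a doubling, while the step from 9 to 10 is only an increase of about 11%.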

Mushrooms can proliferate. They may double every week, with an exponential growth factor of 2: $y = 2^{t}$, where y is the number of mushrooms and t is the time in weeks. Or, they may increase by a factor of 10 each week, with an exponential growth factor of 10. Regardless of the growth factor, as long as they grow exponentially on a linear scale, the corresponding trend is a straight line on a logarithmic scale. At a growth factor of 2, the exponential growth appears linear on a log scale:

The same holds for a growth factor of ‘e’:

And, the same for the growth factor of 10:

All exponential trends linearize, and the log base determines the slope of the linear line:
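The reason every exponential trend becomes a straight line, and why the base sets the slope, can be written in one step. In this sketch, b is the growth factor, t is time, and B is the base of the logarithmic axis (B is notation introduced here, not used elsewhere in the text).

```latex
y = b^{t}
\quad\Longrightarrow\quad
\log_{B} y = t \,\log_{B} b
```

The right-hand side is a straight line in t with slope $\log_{B} b$. When the axis base matches the growth factor (B = b), the slope is exactly 1. With growth factor 2 on a base-10 axis, the slope is $\log_{10} 2 \approx 0.30$.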

10. Scientific notations take longer to decode; they decrease the effectiveness of a log plot

E. Scientific research

We are born with an innate ability to understand log. But, our mathematical and scientific literacy makes log difficult for us to comprehend. Poor log!

11. Is our intuitive number line (in our mind) linear or logarithmic?

We (humans) seem to be born with a number line in our heads. But it may look less like an evenly segmented ruler and more like a logarithmic slide rule on which the distance between two numbers represents their ratio (when divided) rather than their difference (when subtracted). Previous studies of Westerners showed that people tend to map numbers on a linear scale, with the numerals evenly spaced along the line. But if the numbers are presented as hard-to-count groups of dots, people will logarithmically group the larger numbers closer together on one end of the scale in what researchers call a “compression effect.” Dehaene says the research suggests that a logarithmic number line might be an intuitive mathematical concept, whereas the idea of a linear number line might have to be learned. Source: Science.

12. Why do two light bulbs 💡💡 not seem twice as bright as one 💡? And, why do we perceive so many things logarithmically?

Number is important to human survival – you would want to know whether one lion is facing you or several. Indeed, it could be argued that perceiving numbers logarithmically rather than linearly could give an evolutionary advantage: it could be more important to know whether it is five lions facing you or three than to know if the deer herd you are chasing contains 100 animals or just 98. But in fact, perceptual systems of all kinds display a non-linear relationship between external stimulus and internal representation. If we double the force on your hand, it will feel like less than double the pressure. If we double the salinity of water, the taste will not be twice as salty. Non-linear scalings that give greater perceptual resolution to less intense stimuli are ubiquitous across animal species and across sensory modalities: heaviness, pain, warmth, taste, loudness, pitch, brightness, distance, time delay, and color saturation among others, are all perceived this way. Moreover, these mappings between observable stimulus and our internal perception-space – these psychophysical scales and laws – are approximately logarithmic. Source: Focus.