Joyce Robbins 2/7/2018
Since I happened to be preparing to teach, among other things, how to reorder the bars in a bar chart when I saw Claus Wilke’s response, I figured this would be a good topic for a tutorial.
My theory is that this is a troublesome topic because several different
problems are conflated into one question. If, however, you first
identify how you want the bars ordered, and why they are not
ordered, the road will be less rocky. And with
forcats (now a core
tidyverse package) functions – namely,
fct_inorder(), fct_relevel(), fct_reorder(), and fct_infreq() –
it will become second nature to get the order of the bars right.
Before getting into reordering, it’s important to understand R defaults,
which happen to be the same for base R and ggplot2. For factor
data, the default is for the bars to be plotted in the order of the
factor levels. This applies to ordered and non-ordered factors–all
types of factors have ordered levels. Character data, in contrast, is
plotted in alphabetical order by default. Check what you have with
str().
library(tidyverse)
mycolor <- "#002448"; myfill = "#7192E3"
df <- tibble(chardata = c("cold", "warm", "hot", "hot", "warm", "warm", "cold", "cold", "cold"),
factordata = factor(c("cold", "warm", "hot", "hot", "warm", "warm", "cold", "cold", "cold"),
levels = c("cold", "warm", "hot")))
head(df)## # A tibble: 6 x 2
## chardata factordata
## <chr> <fct>
## 1 cold cold
## 2 warm warm
## 3 hot hot
## 4 hot hot
## 5 warm warm
## 6 warm warm
In the examples below, all of the reordering happens in the calls to
ggplot2, which means that you don’t have to alter your data at all to
achieve the desired effect. In fact, you don’t even have to have to
convert your character data to factor data before plotting. The
forcats functions will perform the conversion for plotting purposes
only. However, if you prefer to change the data itself, of course, you
can. See the U.S. births example below for an example on how to do so.
Now, there are two key rules to follow when deciding how to reorder bars in a bar chart:
If the levels or categories have a natural order to them (a.k.a. ordinal data), they should be plotted in that order. However, unless specified otherwise, whether your categories are stored as factor or character data, they will appear in alphabetical order in a bar chart (unless you changed around the levels of the factors).
In this example, the levels have a natural order, from Beginner to Expert, that is not reflected in the plot:
mydf <- tibble(Skill = c("Beginner", "Adv Beginner", "Intermediate", "Expert"),
Num = c(75, 60, 15, 25))
ggplot(mydf, aes(Skill, Num)) +
geom_col(color = mycolor, fill = myfill) + theme_grey(14)The simplest way to get the order right depends on the situation at hand:
- = has a count or frequency column, in this case,
Num
In this case, we can simply indicate with fct_inorder() that we want
the levels to be plotted in the order in which they appear in the data
frame:
mydf## # A tibble: 4 x 2
## Skill Num
## <chr> <dbl>
## 1 Beginner 75
## 2 Adv Beginner 60
## 3 Intermediate 15
## 4 Expert 25
Since the row order in the data frame is correct, it’s a simple fix:
ggplot(mydf, aes(fct_inorder(Skill), Num)) +
geom_col(color = mycolor, fill = myfill) + theme_grey(14)- = has a count or frequency column, in this case,
Num
Often there’s just one level out of order. In the case below it’s “Under 15 years”, which should be the first category in the chart, not the last:
# 2015 U.S. Births
MotherAge <- c("15-19 years", "20-24 years", "25-29 years",
"30-34 years", "35-39 years", "40-44 years",
"45-49 years", "50 years and over", "Under 15 years")
Num <- c(229.715, 850.509, 1152.311, 1094.693, 527.996, 111.848,
8.171, .754, 2.500)
Births2015 <- tibble(MotherAge, Num)
ggplot(Births2015, aes(MotherAge, Num)) +
geom_col(color = mycolor, fill = myfill) +
ggtitle("United States Births, 2015", subtitle = "in thousands") +
scale_y_continuous(breaks = seq(0, 1250, 250)) +
coord_flip() + theme_grey(14)We can use fct_relevel() to move it where it needs to
go:
ggplot(Births2015, aes(fct_relevel(MotherAge, "Under 15 years"), Num)) +
ggtitle("United States Births, 2015", subtitle = "in thousands") +
scale_y_continuous(breaks = seq(0, 1250, 250)) +
geom_col(color = mycolor, fill = myfill) + coord_flip() + theme_grey(14)Although we can move the levels around without touching the original data, in this case, we probably do want to change the levels to the correct natural order and then plot, as follows:
Births2015 <- Births2015 %>%
mutate(MotherAge = fct_relevel(MotherAge, "Under 15 years"))
ggplot(Births2015, aes(MotherAge, Num)) +
ggtitle("United States Births, 2015", subtitle = "in thousands") +
scale_y_continuous(breaks = seq(0, 1250, 250)) +
geom_col(color = mycolor, fill = myfill) + coord_flip() + theme_grey(14)As long as the categories that are out of order all need to be moved to the same place, we can use the same technique:
x <- factor(c("A", "B", "C", "move1", "D", "E", "move2", "F"))
x## [1] A B C move1 D E move2 F
## Levels: A B C D E F move1 move2
fct_relevel(x, "move1", "move2") # move to the beginning (default)## [1] A B C move1 D E move2 F
## Levels: move1 move2 A B C D E F
fct_relevel(x, "move1", "move2", after = 4) # move after the fourth item## [1] A B C move1 D E move2 F
## Levels: A B C D move1 move2 E F
fct_relevel(x, "move1", "move2", after = Inf) # move to the end## [1] A B C move1 D E move2 F
## Levels: A B C D E F move1 move2
However, if they’re all in a big jumble, the only solution is to
manually reorder all of the levels with fct_relevel().
Some important notes:
-
This problem has nothing to do with any other variable. There is simply a mismatch between the levels of the factors and the natural order of the categories.
-
Don’t be tempted to use ordered factors even though your data has ordered levels. The levels are ordered for all factors.
Ordering by frequency count is the recommended approach for nominal data, that is, categories that are not naturally ordered.
Once again, the default is for the bars to be ordered alphabetically, which is not what we want. (Since the bar chart is horizontal the categories are alphabetical from bottom to top.)
weekend_gross <- tibble(movie = c("Jumanji", "Maze Runner", "Winchester",
"The Greatest Snowman", "The Post"),
gross = c(10.93, 10.475, 9.307, 7.696, 5.218))
ggplot(weekend_gross, aes(movie, gross)) +
ggtitle("Weekend Box Office", subtitle = "Feb 2-4, 2018") +
ylab("millions of dollars") +
geom_col(color = mycolor, fill = myfill) + coord_flip() + theme_grey(14)This issue can be addressed within the call to ggplot() with
fct_reorder(), also from forcats; we do not have to actually
reorder the factor levels.
# note the change in the first line:
ggplot(weekend_gross, aes(fct_reorder(movie, gross), gross)) +
ggtitle("Weekend Box Office", subtitle = "Feb 2-4, 2018") +
ylab("millions of dollars") +
geom_col(color = mycolor, fill = myfill) + coord_flip() + theme_grey(14)Notes:
- Although it appears that the bars are ordered from highest to lowest
frequency count, in fact, they are ordered from lowest to highest,
and plotted from the bottom up in a horizontal bar chart. If you
need to reverse the order, you can add a minus sign to the variable
which determines the order:
fct_reorder(movie, -gross)or usefct_reorder(movie, gross) %>% fct_rev().
In this case, we can’t order by another variable since we only have one variable: a list of categories:
unbinned <- tibble(response = sample(c("yes", "no", "maybe"), 100,
replace = TRUE, prob = c(.5, .15, .35)))
ggplot(unbinned, aes(response)) + geom_bar(color = mycolor, fill = myfill) +
theme_grey(14)Again the bars are ordered alphabetically by default, not in order of
frequency. The solution is our fourth forcats function,
fct_infreq():
ggplot(unbinned, aes(fct_infreq(response))) + geom_bar(color = mycolor, fill = myfill) +
theme_grey(14)Note that fct_infreq() orders the levels in decreasing order of
frequency, ideal for drawing bar charts (presumably not a coincidence).
Many thanks to Emily Zabor (
@zabormetrics) for convincing me to
try forcats despite my initial reluctance.
For more on best practices for bar charts, see:
Antony Unwin, “Displaying Categorical Data,” Graphical Data Analysis with R (CRC Press: 2015).
For more detail on forcats functions in general, see:
Jenny Bryan, “Be the boss of your factors”
Garrett Grolemund and Hadley Wickham, “Factors” chapter in R for Data Science











