Skip to content

Latest commit

 

History

History
152 lines (102 loc) · 4.77 KB

File metadata and controls

152 lines (102 loc) · 4.77 KB

Drawing a mosaic plot with vcd::mosaic()

Joyce Robbins 2/15/2018

The Data

df <- read.csv("../data/MusicIcecream.csv")
df
##     Age   Favorite     Music Freq
## 1   old bubble gum classical    1
## 2   old bubble gum      rock    1
## 3   old     coffee classical    3
## 4   old     coffee      rock    1
## 5 young bubble gum classical    2
## 6 young bubble gum      rock    5
## 7 young     coffee classical    1
## 8 young     coffee      rock    0

Order of splits

It is best to draw mosaic plots incrementally: start with splitting on one variable and then add additional variables one at a time. The full mosaic plot will have one split per variable.

Important: if your data is in a data frame (see above), the count column must be called Freq. (Other data structure options are tables, matrices -- for 2 variables -- or objects of class structable -- see >?vcd::structable.)

Also note that all of these plots are drawn with vcd::mosaic() not mosaicplot().

Split on Age only:

library(vcd)
mosaic(~Age, df)

Split on Age, then Music:

mosaic(Music ~ Age, df)

Note that the first split is between "young" and "old", while the second set of splits divides each age group into "classical" and "rock".

Split on Age, then Music, then Favorite:

mosaic(Favorite ~ Age + Music, df)

Direction of splits

Note that in the previous example, the direction of the splits is as follows:

  1. Age -- horizontal split

  2. Music -- vertical split

  3. Favorite -- horizontal split

This is the default direction pattern: alternating directions beginning with horizontal. Therefore we get the same plot with the following:

mosaic(Favorite ~ Age + Music, 
       direction = c("h", "v", "h"), df)

The directions can be altered as desired. For example, to create a doubledecker plot, make all splits vertical except the last one:

mosaic(Favorite ~ Age + Music,
       direction = c("v", "v", "h"), df)

Note that the direction vector is in order of splits (Age, Music, Favorite), not in the order in which the variables appear in the formula, where the last variable to be split is listed first, before the "~".

Options

Fill color:

library(grid) # needed for gpar
mosaic(Favorite ~ Age + Music, 
       gp = gpar(fill = c("lightblue", "blue")),
       df)

Rotate labels:

mosaic(Favorite ~ Age + Music, 
       labeling = labeling_border(rot_labels = c(45, -45, 0, 0)),
       df)

The rot_labels = vector sets the rotation in degrees on the four sides of the plot in this order: top, right, bottom, left. (Different from the typical base graphics order!) The default is rot_labels = c(0, 90, 0, 90).

Abbreviate labels:

mosaic(Favorite ~ Age + Music, 
       labeling = labeling_border(abbreviate_labs = c(3, 1, 6)), 
       df)

Labels are abbreviated in the order of the splits (as for direction =). The abbreviation algorithm appears to return the specified number of characters after vowels are eliminated (if necessary).

For more formatting options, see >?vcd::labeling_border.

Remove spacing between cells

mosaic(Favorite ~ Age + Music,
       spacing = spacing_equal(sp = unit(0, "lines")),
       df)

For more details, see >?vcd::spacings

Change border color (must also set fill(?))

mosaic(Favorite ~ Age + Music,
       gp = gpar(fill = c("lightblue", "blue"),
                 col = "white"),
       spacing = spacing_equal(sp = unit(0, "lines")),
       df)

Please feel free to suggest changes by submitting a pull request or ask questions / offer comments by opening an issue.