Title: | Combination Matrix Axis for 'ggplot2' to Create 'UpSet' Plots |
---|---|
Description: | Replace the standard x-axis in 'ggplots' with a combination matrix to visualize complex set overlaps. 'UpSet' has introduced a new way to visualize the overlap of sets as an alternative to Venn diagrams. This package provides a simple way to produce such plots using 'ggplot2'. In addition it can convert any categorical axis into a combination matrix axis. |
Authors: | Constantin Ahlmann-Eltze [aut, cre] |
Maintainer: | Constantin Ahlmann-Eltze <[email protected]> |
License: | GPL-3 |
Version: | 0.4.0.9000 |
Built: | 2024-11-22 04:37:13 UTC |
Source: | https://github.com/const-ae/ggupset |
The function splits the text based on the sep argument and views each occurring element as a potential set.
axis_combmatrix(
  sep = "[^[:alnum:]]+",
  levels = NULL,
  override_plotting_function = NULL,
  xlim = NULL,
  ylim = NULL,
  expand = TRUE,
  clip = "on",
  ytrans = "identity"
)
sep |
The separator that is used to split the string labels. Can be a regex. Default: "[^[:alnum:]]+" |
levels |
The selection of string elements that are displayed in the combination matrix axis. Default: NULL, which means that all elements in the text labels are used |
override_plotting_function |
To achieve maximum flexibility, you can provide a custom plotting function. For more information, see details. Default: NULL |
xlim, ylim |
The limits for the x and y axes |
expand |
Boolean with the same effect as in ggplot2::coord_cartesian(). Default: TRUE |
clip |
String with the same effect as in ggplot2::coord_cartesian(). Default: "on" |
ytrans |
Transformers for the y axis. For more information see ggplot2::coord_trans(). Default: "identity" |
Technically the function appends a coord system to the ggplot object. To maintain compatibility, additional arguments like ytrans, ylim, and clip are forwarded to coord_trans().
Note: make sure that the argument to the 'x' aesthetic is a character vector that contains the sep sequence. The only exception is if axis_combmatrix() is combined with scale_x_mergelist(). This pattern works because in the first step scale_x_mergelist() turns a list argument to 'x' into a character vector that axis_combmatrix() can work with.
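As a rough illustration of the splitting step, the sketch below uses base R's strsplit() as a stand-in (an assumption; the package's internal splitting code is not shown in this manual). It suggests why a label such as "Cyl: 4_Gears: 4" needs an explicit sep = "_" rather than the default regex.

```r
# Base R stand-in for the label splitting; not the package's internal code.
label <- "Cyl: 4_Gears: 4"

# The default separator treats every run of non-alphanumeric characters
# as a delimiter, so the label falls apart into four pieces.
default_split <- strsplit(label, "[^[:alnum:]]+")[[1]]
print(default_split)

# An explicit separator keeps "Cyl: 4" and "Gears: 4" intact as set names.
custom_split <- strsplit(label, "_", fixed = TRUE)[[1]]
print(custom_split)
```

With the default regex the colon and the space also act as separators, which is usually not what is wanted for labels that contain punctuation.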
For maximum flexibility, you can use the 'override_plotting_function' parameter. The function must return a ggplot and is called with a tibble with one entry per point of the combination matrix. Specifically, it contains
labels. the collapsed label string
single_label. an ordered factor with the labels on the left of the plot
id. consecutive numbering of the points
labels_split. a list column that contains the split labels
at. the x-position of the point
observed. boolean to indicate if this element is active in the intersection
index. the row of the point
See the examples for an override_plotting_function that recreates the default combination matrix.
library(ggplot2)
library(ggupset)

mtcars$combined <- paste0("Cyl: ", mtcars$cyl, "_Gears: ", mtcars$gear)
head(mtcars)

ggplot(mtcars, aes(x=combined)) +
  geom_bar() +
  axis_combmatrix(sep = "_")

# Example of 'override_plotting_function'
ggplot(mtcars, aes(x=combined)) +
  geom_bar() +
  axis_combmatrix(sep = "_", override_plotting_function = function(df){
    ggplot(df, aes(x= at, y= single_label)) +
      # striped background: one rectangle per row of the matrix
      geom_rect(aes(fill= index %% 2 == 0), ymin=df$index-0.5, ymax=df$index+0.5, xmin=0, xmax=1) +
      geom_point(aes(color= observed), size = 3) +
      # connect only the points that are active in an intersection
      geom_line(data= function(dat) dat[dat$observed, ,drop=FALSE], aes(group = labels), size= 1.2) +
      ylab("") + xlab("") +
      scale_x_continuous(limits = c(0, 1), expand = c(0, 0)) +
      scale_fill_manual(values= c(`TRUE` = "white", `FALSE` = "#F7F7F7")) +
      scale_color_manual(values= c(`TRUE` = "black", `FALSE` = "#E0E0E0")) +
      guides(color="none", fill="none") +
      theme(
        panel.background = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.y = element_blank(),
        axis.ticks.length = unit(0, "pt"),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        axis.line = element_blank(),
        panel.border = element_blank()
      )
  })
A fictional biological dataset with a complex experimental design
df_complex_conditions
a data frame with 360 rows and 4 variables
KO. Boolean value indicating whether the sample had a knockout.
DrugA. character vector with "Yes" and "No" elements indicating if the sample was treated with drug A.
Timepoint. Numeric vector with elements 8, 24, and 48 indicating the time of measurement since the beginning of the experiment.
response. Numeric vector with the response of the sample to the treatment conditions. Could for example be the concentration of a metabolite.
library(ggupset)
dim(df_complex_conditions)
head(df_complex_conditions)
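One way such a design could be fed to axis_combmatrix() is to collapse the condition columns into a single delimited label per row. The sketch below uses a hypothetical three-row stand-in with the same column types as df_complex_conditions; the values and the label names ("KO", "WT", "Drug", "NoDrug") are assumptions chosen for illustration only.

```r
# Hypothetical stand-in with the same column types as df_complex_conditions.
toy <- data.frame(
  KO = c(TRUE, FALSE, TRUE),
  DrugA = c("Yes", "No", "No"),
  Timepoint = c(8, 24, 48),
  response = c(1.2, 0.7, 2.1),
  stringsAsFactors = FALSE
)

# Build one "_"-delimited label per row from the three condition columns.
toy$label <- paste(
  ifelse(toy$KO, "KO", "WT"),
  ifelse(toy$DrugA == "Yes", "Drug", "NoDrug"),
  paste0(toy$Timepoint, "h"),
  sep = "_"
)
print(toy$label)

# Only attempted when ggupset is installed; use print(p) to draw the plot.
if (requireNamespace("ggupset", quietly = TRUE)) {
  p <- ggplot2::ggplot(toy, ggplot2::aes(x = label, y = response)) +
    ggplot2::geom_point() +
    ggupset::axis_combmatrix(sep = "_")
}
```

The label is a plain character vector that contains the separator, which is the input form axis_combmatrix() expects.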
A fictional dataset describing which genes belong to certain pathways
gene_pathway_membership
a matrix with 6 rows and 37 columns. Each row is one pathway, with its name given as 'rownames' and each column is a gene. The values in the matrix are Boolean indicators if the gene is a member of the pathway.
library(ggupset)
dim(gene_pathway_membership)
gene_pathway_membership[, 1:15]
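A Boolean membership matrix like this one usually has to be reshaped before it can be plotted per gene. The following base R sketch, run on a hypothetical 2 x 3 toy matrix rather than the real dataset, shows one possible way to turn the matrix into a list column that holds the pathways of each gene.

```r
# Hypothetical toy matrix with the same layout as gene_pathway_membership:
# rows are pathways, columns are genes, values are membership flags.
membership <- matrix(
  c(TRUE, FALSE,
    TRUE, TRUE,
    FALSE, TRUE),
  nrow = 2,
  dimnames = list(c("PathwayA", "PathwayB"), c("gene1", "gene2", "gene3"))
)

# For every gene (column), collect the names of the pathways it belongs to.
pathways_per_gene <- lapply(
  colnames(membership),
  function(gene) rownames(membership)[membership[, gene]]
)

# Store the result as a list column, one row per gene.
genes <- data.frame(gene = colnames(membership), stringsAsFactors = FALSE)
genes$pathways <- pathways_per_gene
print(genes)
```

A list column of this shape is the kind of input that scale_x_mergelist() and scale_x_upset() are described as accepting for the 'x' aesthetic.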
The function handles list columns by collapsing them into delimited strings using the sep argument. This is useful for showing sets, especially in combination with the axis_combmatrix() function.
scale_x_mergelist(sep = "-", ..., position = "bottom")
sep |
String that is used to delimit the elements in each list entry. Default: "-". |
... |
additional arguments that are passed on to ggplot2::scale_x_discrete() |
position |
either "top" or "bottom" to specify where the x axis is drawn. Default: "bottom" |
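The collapsing step can be pictured with a few lines of base R. This is a sketch under the assumption that each list entry is simply pasted together with sep; whether the package additionally reorders the elements of an entry is not stated in this manual.

```r
# A small list column, as it might appear in a 'Genres' column.
genres <- list(c("Drama", "Comedy"), "Action", c("Short", "Animation"))

# Collapse every list entry into one delimited string (sep = "-").
merged <- vapply(genres, paste, character(1), collapse = "-")
print(merged)

# Splitting on the same separator recovers the original elements, which is
# what makes the merged labels usable as input for a combination matrix.
recovered <- strsplit(merged, "-", fixed = TRUE)
```

The separator should not occur inside any element name, otherwise the round trip no longer recovers the original sets.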
library(ggplot2)
library(ggupset)

ggplot(tidy_movies[1:100, ], aes(x=Genres)) +
  geom_bar() +
  scale_x_mergelist() +
  theme(axis.text.x = element_text(angle = 90, hjust=1, vjust = 0.5))

ggplot(tidy_movies[1:100, ], aes(x=Genres)) +
  geom_bar() +
  scale_x_mergelist(sep = " & ", name = "Merged Movie Genres", position = "top") +
  theme(axis.text.x = element_text(angle = 90, hjust=0, vjust = 0.5))
This function takes a list column and turns it into a combination matrix axis. It internally wraps the calls to scale_x_mergelist() and axis_combmatrix() and makes sure that the elements are sorted by size.
scale_x_upset(
  order_by = c("freq", "degree"),
  n_sets = Inf,
  n_intersections = Inf,
  sets = NULL,
  intersections = NULL,
  reverse = FALSE,
  ytrans = "identity",
  ...,
  position = "bottom"
)
order_by |
either "freq" or "degree". Default: "freq" |
n_sets |
maximum number of sets that are displayed. Default: Inf |
n_intersections |
maximum number of intersections that are displayed. Default: Inf |
sets |
character vector that specifies which sets are displayed |
intersections |
a list of character vectors that specifies which intersections are displayed |
reverse |
boolean if the order of the intersections is reversed. Default: FALSE |
ytrans |
transformers for y axis. For more information see ggplot2::coord_trans(). Default: "identity" |
... |
additional parameters for ggplot2::scale_x_discrete() |
position |
either "top" or "bottom" to specify where the combination matrix is drawn. Default: "bottom" |
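The difference between the two order_by options can be sketched in base R: "freq" relates to how often an intersection occurs, "degree" to how many sets it contains. The sort direction and the tie-breaking used below are assumptions for illustration and are not taken from the package.

```r
# Hypothetical list column with three distinct intersections.
genres <- list(
  c("Action", "Drama", "Comedy"),
  c("Drama", "Action", "Comedy"),
  c("Comedy", "Action", "Drama"),
  "Drama",
  "Drama",
  c("Drama", "Comedy")
)

# Give every intersection a canonical key by sorting its elements.
key <- vapply(genres, function(g) paste(sort(g), collapse = "-"), character(1))

counts <- table(key)
intersections <- names(counts)
freq <- as.integer(counts)                                    # occurrences
degree <- lengths(strsplit(intersections, "-", fixed = TRUE)) # number of sets

by_freq <- intersections[order(-freq)]    # most frequent first (assumed)
by_degree <- intersections[order(degree)] # fewest sets first (assumed)
print(by_freq)
print(by_degree)
```

In this toy data the three-genre intersection is the most frequent one, so the two orderings place it at opposite ends of the axis.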
library(ggplot2)
library(ggupset)

ggplot(tidy_movies[1:100, ], aes(x=Genres)) +
  geom_bar() +
  scale_x_upset(reverse = TRUE, sets=c("Drama", "Action"))

ggplot(tidy_movies[1:100, ], aes(x=Genres)) +
  geom_bar() +
  scale_x_upset(n_intersections = 5, ytrans="sqrt")

ggplot(tidy_movies[1:100, ], aes(x=Genres, y=year)) +
  geom_boxplot() +
  scale_x_upset(intersections = list(c("Drama", "Comedy"), c("Short"), c("Short", "Animation")),
                sets = c("Drama", "Comedy", "Short", "Animation", "Horror"))
This theme sets the default styling for the combination matrix axis by extending the default ggplot2 theme().
theme_combmatrix(
  combmatrix.label.make_space = TRUE,
  combmatrix.label.width = NULL,
  combmatrix.label.height = NULL,
  combmatrix.label.extra_spacing = 3,
  combmatrix.label.total_extra_spacing = unit(10, "pt"),
  combmatrix.label.text = NULL,
  combmatrix.panel.margin = unit(c(1.5, 1.5), "pt"),
  combmatrix.panel.striped_background = TRUE,
  combmatrix.panel.striped_background.color.one = "white",
  combmatrix.panel.striped_background.color.two = "#F7F7F7",
  combmatrix.panel.point.size = 3,
  combmatrix.panel.line.size = 1.2,
  combmatrix.panel.line.color = "black",
  combmatrix.panel.point.color.fill = "black",
  combmatrix.panel.point.color.empty = "#E0E0E0",
  ...
)
combmatrix.label.make_space |
Boolean indicator if the y-axis label is moved so far to the left to make enough space for the combination matrix labels. Default: TRUE |
combmatrix.label.width |
A unit that specifies how much space to make for the labels of the combination matrix. Default: NULL, which means the width of the label text is used |
combmatrix.label.height |
A unit that specifies how high the combination matrix should be. Default: NULL, which means that the height of the label text + combmatrix.label.extra_spacing is used |
combmatrix.label.extra_spacing |
A single number for the additional height per row. Default: 3 |
combmatrix.label.total_extra_spacing |
A unit that specifies the total offset for the height of the combination matrix. Default: unit(10, "pt") |
combmatrix.label.text |
An element_text() object. Default: NULL |
combmatrix.panel.margin |
A two element unit vector to specify top and bottom margin around the combination matrix. Default: unit(c(1.5, 1.5), "pt") |
combmatrix.panel.striped_background |
Boolean to indicate if the background of the plot is striped. Default: TRUE |
combmatrix.panel.striped_background.color.one |
Color of the first kind of stripes. Default: "white" |
combmatrix.panel.striped_background.color.two |
Color of the second kind of stripes. Default: "#F7F7F7" |
combmatrix.panel.point.size |
Number to specify the size of the points in the combination matrix. Default: 3 |
combmatrix.panel.line.size |
Number to specify the size of the lines connecting the points. Default: 1.2 |
combmatrix.panel.line.color |
Color of the lines connecting the points. Default: "black" |
combmatrix.panel.point.color.fill |
Color of the filled points. Default: "black" |
combmatrix.panel.point.color.empty |
Color of the empty points. Default: "#E0E0E0" |
... |
additional arguments that are passed to ggplot2::theme() |
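The panel arguments listed above can be combined freely. As a small sketch, the hypothetical helper style_combmatrix() below (not part of the package) bundles a few of them; the color choices are arbitrary, and the plotting part only runs when ggupset is installed.

```r
# Hypothetical helper (not part of ggupset): bundles a few styling choices.
style_combmatrix <- function(plot) {
  plot + ggupset::theme_combmatrix(
    combmatrix.panel.point.color.fill = "darkgreen",  # active points
    combmatrix.panel.point.color.empty = "grey90",    # inactive points
    combmatrix.panel.line.size = 0,                   # hide connecting lines
    axis.title.y = ggplot2::element_text(size = 12)   # forwarded via ...
  )
}

# Only attempted when ggupset is installed; use print(styled) to draw the plot.
if (requireNamespace("ggupset", quietly = TRUE)) {
  library(ggplot2)
  library(ggupset)
  p <- ggplot(tidy_movies[1:100, ], aes(x = Genres)) +
    geom_bar() +
    scale_x_upset()
  styled <- style_combmatrix(p)
}
```

Wrapping the theme call in a function is only a convenience for reusing the same look across several plots.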
library(ggplot2)
library(ggupset)

# Ensure that the y-axis label is next to the axis by setting
# combmatrix.label.make_space to FALSE
ggplot(tidy_movies[1:100, ], aes(x=Genres)) +
  geom_bar() +
  scale_x_upset() +
  theme_combmatrix(combmatrix.label.text = element_text(color = "black", size=15),
                   combmatrix.label.make_space = FALSE,
                   plot.margin = unit(c(1.5, 1.5, 1.5, 65), "pt"))

# Change the color of the background stripes
ggplot(tidy_movies[1:100, ], aes(x=Genres)) +
  geom_bar() +
  scale_x_upset() +
  theme_combmatrix(combmatrix.panel.striped_background = TRUE,
                   combmatrix.panel.striped_background.color.one = "grey")
The original ggplot2movies::movies dataset has 7 columns that contain indicators of whether a movie belongs to a certain genre. In this version the 7 columns are collapsed to a single list column to create a tidy dataset. It also has information on only 5,000 movies to reduce the size of the dataset. Furthermore, each star rating is in its own row.
tidy_movies
a data frame with 50,000 rows and 10 columns
title. The title of the movie.
year. Year of release.
budget. Total budget (if known) in US dollars.
length. Length in minutes.
rating. Average IMDB user rating.
votes. Number of IMDB users who rated this movie.
mpaa. MPAA rating
Genres. List column with all genres the movie belongs to
stars, percent_rating. The number of stars and the corresponding percentage of people rating the movie with this many stars.
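Because each movie occupies one row per star rating, counting rows per genre combination counts every movie several times. The base R sketch below uses hypothetical toy data with three star ratings per movie to show one way of reducing such a table to one row per movie before counting; using the title alone as the key is an assumption that only holds if titles are unique.

```r
# Hypothetical toy data: two movies, three star-rating rows each.
toy_movies <- data.frame(
  title = rep(c("Movie A", "Movie B"), each = 3),
  stars = rep(1:3, times = 2),
  stringsAsFactors = FALSE
)
toy_movies$Genres <- rep(list(c("Drama", "Comedy"), "Action"), each = 3)

# Keep only the first row of every movie.
one_row_per_movie <- toy_movies[!duplicated(toy_movies$title), ]

print(nrow(toy_movies))
print(nrow(one_row_per_movie))
```

For the real dataset a combination of columns such as title, year, and length may be a safer key than the title alone.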
library(ggupset)
dim(tidy_movies)
head(tidy_movies)