The ggplot2 package | Box plots
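Series introduction: R is a powerful language that is widely used for data analysis and statistical computing. Its development began in the 1990s. Thanks to enthusiasts around the world, RStudio was later built on top of R. It is an integrated development environment that offers a much better user experience than R's basic editor. R keeps growing in popularity because an expanding community of data scientists and users generously contributes packages. Some of these packages, such as MASS, SparkR, and ggplot2, make data manipulation, visualization, and computation increasingly powerful. R is a free, open-source language and environment for statistical computing and graphics, it is part of the GNU Project, and it combines statistical analysis with graphical output. It runs on UNIX, Windows, and macOS and ships with a convenient built-in help system. Compared with other statistical software, R took root early in academia, which makes it well suited to researchers in fields such as biology and medicine.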
In earlier posts I showed how to draw line charts, area charts, bar charts, scatter plots, and bubble charts with ggplot2.
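This post covers box plots in ggplot2. A box plot summarizes the distribution of a continuous variable (y-axis) within each level of a discrete variable (x-axis). The box runs from the first to the third quartile, and a line inside it marks the median. The whiskers reach the most extreme points that lie within 1.5 times the interquartile range of the box, and any points beyond the whiskers are drawn individually as outliers. We build the plot step by step: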
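The following base-R sketch makes this rule concrete by computing the numbers behind a single box. It assumes ggplot2's documented defaults: hinges at the 25th and 75th percentiles, and whiskers that reach the most extreme points within 1.5 × IQR of the hinges. `box_stats()` is a hypothetical helper written for illustration and is not part of ggplot2. Base R's own `boxplot.stats()` uses Tukey's hinges from `fivenum()`, which can differ slightly when the sample size is even.

```r
# Hypothetical helper (illustration only): statistics for one box,
# following the defaults documented for ggplot2's geom_boxplot().
box_stats <- function(y, coef = 1.5) {
  y <- y[!is.na(y)]                                # missing values are dropped
  q <- unname(quantile(y, c(0.25, 0.5, 0.75)))     # lower hinge, median, upper hinge
  iqr <- q[3] - q[1]                               # interquartile range (box height)
  inside <- y >= q[1] - coef * iqr & y <= q[3] + coef * iqr
  whiskers <- range(c(q, y[inside]))               # most extreme points within the fences
  list(lower_whisker = whiskers[1], lower_hinge = q[1], median = q[2],
       upper_hinge = q[3], upper_whisker = whiskers[2],
       outliers = y[!inside])                      # drawn as individual points
}

box_stats(c(1, 2, 3, 4, 100))          # 100 lies beyond the upper fence
box_stats(datasets::airquality$Ozone)  # all months pooled together
```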
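Step 1: Package management
# Install pacman if it is missing; p_load() then installs (if needed) and loads each package
if (!require("pacman")) {
  install.packages("pacman")
  library(pacman)
}
p_load(datasets, ggplot2, ggthemes, grid, dplyr, RColorBrewer)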
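If you would rather not depend on pacman, the same install-if-missing check can be written in base R. `missing_pkgs()` is a hypothetical helper. It relies on `requireNamespace()`, which returns `FALSE` instead of raising an error when a package is not installed. The install and attach lines are left as comments so that the sketch runs without network access.

```r
# Hypothetical helper: return the packages in `pkgs` that cannot be loaded
missing_pkgs <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

pkgs <- c("ggplot2", "ggthemes", "dplyr", "RColorBrewer")
missing_pkgs(pkgs)

# Usage:
# to_install <- missing_pkgs(pkgs)
# if (length(to_install) > 0) install.packages(to_install)
# for (p in pkgs) library(p, character.only = TRUE)
```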
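Step 2: Load and inspect the data
# airquality: daily air-quality measurements in New York, May to September 1973
data(airquality)
# Turn the numeric month (5 to 9) into a labelled factor so it can serve as the discrete x variable
airquality <- airquality %>%
  mutate(Month = factor(Month,
                        labels = c("May", "Jun", "Jul", "Aug", "Sep")))
str(airquality)      # 153 observations of 6 variables
summary(airquality)  # note the NA counts for Ozone and Solar.R
head(airquality)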
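Ozone has missing values. `geom_boxplot()` drops those rows, with a warning, before it computes each box, so some months rest on fewer observations than others. The following base-R sketch counts them per month. It works on a fresh copy of the built-in data so that the factor version created above is left untouched.

```r
# Work on an untouched copy; Month is still the integers 5 to 9 here
aq <- datasets::airquality

# Number of missing and of usable Ozone readings in each month
na_by_month <- tapply(is.na(aq$Ozone), aq$Month, sum)
n_by_month  <- tapply(!is.na(aq$Ozone), aq$Month, sum)
rbind(missing = na_by_month, used = n_by_month)
```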
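Step 3: Build the box plot step by step
1) Basic box plot
# Rows with missing Ozone are dropped, so ggplot2 warns that it removed rows
p1 <- ggplot(airquality, aes(x = Month, y = Ozone)) +
  geom_boxplot()
p1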
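Each box in `p1` comes from the quartiles of that month's readings. As a cross-check, base R can compute the same three numbers for every month. This sketch uses the default method of `quantile()` and assumes that it matches ggplot2's hinges, which the ggplot2 documentation describes as the 25th and 75th percentiles. The table layout is just one convenient choice.

```r
aq <- datasets::airquality

# Lower hinge, median and upper hinge of Ozone for each month (NAs ignored)
hinge_tab <- t(sapply(split(aq$Ozone, aq$Month), quantile,
                      probs = c(0.25, 0.5, 0.75), na.rm = TRUE))
rownames(hinge_tab) <- month.abb[5:9]   # "May" ... "Sep", matching the factor labels
hinge_tab
```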
2) Set the axis titles
p2 <- p1 + scale_x_discrete(name = "Month") +
  scale_y_continuous(name = "Mean ozone in parts per billion")
p2
3) Adjust the y-axis ticks
# breaks: a tick every 25 ppb; limits: fix the y range to 0 to 175
p3 <- p1 + scale_x_discrete(name = "Month") +
  scale_y_continuous(name = "Mean ozone in\nparts per billion",
                     breaks = seq(0, 175, 25), limits = c(0, 175))
p3
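One caveat about `limits`: in ggplot2, values that fall outside a continuous scale's limits are turned into `NA` before the statistics are computed, so tight limits can silently change the boxes. `coord_cartesian(ylim = ...)`, by contrast, only zooms. For this data the chosen range is safe, as a quick base-R check shows:

```r
aq <- datasets::airquality
breaks <- seq(0, 175, 25)   # tick positions used in step 3
limits <- c(0, 175)         # y range used in step 3

ozone <- aq$Ozone[!is.na(aq$Ozone)]
n_outside <- sum(ozone < limits[1] | ozone > limits[2])
breaks
n_outside   # readings the scale limits would censor
```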
4) Add a title and subtitle
p4 <- p3 +
  labs(title = "Boxplot of mean ozone by month",
       subtitle = "Source: New York State Department of Conservation")
p4
5) Change the box colours
fill <- "#4271AE"  # box fill colour
line <- "#1F3552"  # colour of the outline, median line and whiskers
p5 <- ggplot(airquality, aes(x = Month, y = Ozone)) +
  geom_boxplot(fill = fill, colour = line, alpha = 0.7,  # alpha: 70% opaque fill
               outlier.colour = "#1F3552",
               outlier.shape = 20,                       # small solid dot
               outlier.size = 2) +
  scale_y_continuous(name = "Mean ozone in\nparts per billion",
                     breaks = seq(0, 175, 25), limits = c(0, 175)) +
  scale_x_discrete(name = "Month") +
  labs(title = "Boxplot of mean ozone by month",
       subtitle = "Source: New York State Department of Conservation")
p5
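`alpha = 0.7` makes the fill 70% opaque, so the colour you see mixes `#4271AE` with whatever lies behind it. The sketch below approximates that mix. It assumes simple alpha compositing over a pure white background. The default `theme_grey()` panel is light grey, so the on-screen colour will differ slightly; pass `bg = "grey92"` to approximate it. `blend_over()` is a hypothetical helper.

```r
# Hypothetical helper: colour of `fg` drawn with opacity `alpha` over `bg`
blend_over <- function(fg, alpha, bg = "white") {
  fg_rgb <- grDevices::col2rgb(fg)[, 1]   # red, green, blue on a 0-255 scale
  bg_rgb <- grDevices::col2rgb(bg)[, 1]
  alpha * fg_rgb + (1 - alpha) * bg_rgb
}

blended <- blend_over("#4271AE", 0.7)
blended
grDevices::rgb(blended[1], blended[2], blended[3], maxColorValue = 255)  # as a hex code
```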
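6) Font settings
windowsFonts() exists only on Windows. It registers short aliases for installed fonts, and the theme() calls below refer to those aliases through the family argument.
if (.Platform$OS.type == "windows") {
  windowsFonts(
    # Chinese fonts
    lishu = windowsFont(family = "LiSu"),              # LiSu (隶书)
    yahei = windowsFont(family = "Microsoft YaHei"),   # Microsoft YaHei (微软雅黑)
    xinwei = windowsFont(family = "STXingwei"),        # STXingwei (华文新魏)
    kaiti = windowsFont(family = "KaiTi"),             # KaiTi (楷体)
    heiti = windowsFont(family = "SimHei"),            # SimHei (黑体)
    # English fonts
    arial = windowsFont(family = "Arial"),
    newman = windowsFont(family = "Times New Roman"),
    hand = windowsFont(family = "Lucida Calligraphy"), # handwriting style
    Helvetica = windowsFont(family = "Helvetica"),     # sans-serif print font; not always installed on Windows
    Tahoma = windowsFont(family = "Tahoma")            # used in step 8
  )
}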
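The aliases above exist only after `windowsFonts()` has run on Windows, so a script shared with macOS or Linux users may need a fallback. The following is a minimal sketch. `font_or_default()` is a hypothetical helper that returns the alias when it is registered and R's generic `"sans"` family otherwise.

```r
# Hypothetical helper: pick a registered Windows font alias, else a generic family
font_or_default <- function(alias, fallback = "sans") {
  on_windows <- .Platform$OS.type == "windows"
  # the short-circuiting && means windowsFonts() is only called on Windows
  if (on_windows && alias %in% names(grDevices::windowsFonts())) alias else fallback
}

font_or_default("yahei")   # "yahei" on Windows after step 6, otherwise "sans"
```

In a plot it would be used as, for example, `theme(text = element_text(family = font_or_default("yahei")))`.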
7) Themes
Theme 1: The Economist style (from ggthemes)
p6_1 <- ggplot(airquality, aes(x = Month, y = Ozone)) +
  geom_boxplot(fill = fill, colour = line) +
  scale_y_continuous(name = "Mean ozone in\nparts per billion",
                     breaks = seq(0, 175, 25), limits = c(0, 175)) +
  scale_x_discrete(name = "Month") +
  labs(title = "Boxplot of mean ozone by month",
       subtitle = "Source: New York State Department of Conservation") +
  theme_economist() +
  scale_fill_economist() +  # only takes effect when fill is mapped inside aes()
  theme(axis.line.x = element_line(linewidth = .5, colour = "black"),  # ggplot2 >= 3.4.0 uses linewidth instead of size
        axis.title = element_text(size = 12),
        # the legend.* settings only matter once the plot has a legend (as in 8.2)
        legend.position = "bottom",
        legend.direction = "horizontal",
        legend.box = "horizontal",
        legend.text = element_text(size = 10),
        text = element_text(family = "yahei"),        # alias registered in step 6
        plot.title = element_text(family = "yahei"))
p6_1
Theme 2: A clean custom theme
p6_2 <- ggplot(airquality, aes(x = Month, y = Ozone)) +
  geom_boxplot(colour = "black", fill = "#56B4E9") +
  scale_y_continuous(name = "Mean ozone in\nparts per billion",
                     breaks = seq(0, 175, 25), limits = c(0, 175)) +
  scale_x_discrete(name = "Month") +
  labs(title = "Boxplot of mean ozone by month",
       subtitle = "Source: New York State Department of Conservation") +
  theme(axis.line.x = element_line(linewidth = 1, colour = "black"),  # linewidth: ggplot2 >= 3.4.0
        axis.line.y = element_line(linewidth = 1, colour = "black"),
        axis.text.x = element_text(colour = "black", size = 10),
        axis.text.y = element_text(colour = "black", size = 10),
        legend.position = "bottom",
        legend.direction = "horizontal",
        legend.box = "horizontal",
        legend.key = element_blank(),
        panel.grid.major = element_blank(),   # remove the grid lines
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),   # remove the grey panel background
        plot.title = element_text(family = "Helvetica"),
        text = element_text(family = "Helvetica"))  # alias registered in step 6
p6_2
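8) Compare groups
# Data preparation: keep July to September, then label each day as high or low temperature
# relative to the mean temperature of those three months
airquality_trimmed <- airquality %>%
  filter(Month %in% c("Jul", "Aug", "Sep")) %>%
  mutate(Temp.f = factor(ifelse(Temp > mean(Temp), 1, 0),
                         labels = c("Low temp ", "High temp ")))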
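The split above uses the mean temperature of July to September only, because `mean(Temp)` is evaluated after `filter()`. The following base-R sketch repeats the same preparation and also counts how many days fall into each month × temperature group. These group sizes are worth knowing before you compare the boxes.

```r
aq <- datasets::airquality
late <- aq[aq$Month %in% 7:9, ]        # July, August, September

threshold <- mean(late$Temp)           # mean of these three months only
late$Temp.f <- factor(ifelse(late$Temp > threshold, 1, 0),
                      labels = c("Low temp", "High temp"))
late$Month <- factor(late$Month, labels = c("Jul", "Aug", "Sep"))

threshold
table(late$Month, late$Temp.f)
```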
8.1) Separate panels (facets)
p8_1 <- ggplot(airquality_trimmed, aes(x = Month, y = Ozone)) +
  geom_boxplot(fill = fill, colour = line,
               alpha = 0.7) +
  scale_y_continuous(name = "Mean ozone in\nparts per billion",
                     breaks = seq(0, 175, 50), limits = c(0, 175)) +
  scale_x_discrete(name = "Month") +
  labs(title = "Boxplot of mean ozone by month",
       subtitle = "Source: New York State Department of Conservation") +
  theme_bw() +
  theme(plot.title = element_text(size = 14, family = "Tahoma", face = "bold"),
        panel.border = element_rect(colour = "black", fill = NA, linewidth = .5),
        text = element_text(size = 12, family = "Tahoma"),
        axis.title = element_text(face = "bold"),
        axis.text.x = element_text(size = 11)) +
  facet_grid(. ~ Temp.f)  # one panel per temperature group
p8_1
8.2) In a single plot
# Mapping fill to Temp.f draws the two temperature groups side by side within each month
p8_2 <- ggplot(airquality_trimmed, aes(x = Month, y = Ozone, fill = Temp.f)) +
  geom_boxplot(alpha = 0.7) +
  scale_y_continuous(name = "Mean ozone in\nparts per billion",
                     breaks = seq(0, 175, 25), limits = c(0, 175)) +
  scale_x_discrete(name = "Month") +
  labs(title = "Boxplot of mean ozone by month",
       subtitle = "Source: New York State Department of Conservation") +
  theme_bw() +
  theme(plot.title = element_text(size = 14, family = "Tahoma", face = "bold"),
        panel.border = element_rect(colour = "black", fill = NA, linewidth = .5),
        text = element_text(size = 12, family = "Tahoma"),
        axis.title = element_text(face = "bold"),
        axis.text.x = element_text(size = 11),
        legend.position = "bottom") +
  scale_fill_brewer(palette = "Accent") +  # qualitative ColorBrewer palette
  labs(fill = "Temperature ")
p8_2
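You can also read `p8_2` numerically by tabulating the medians behind its six boxes in base R. This sketch rebuilds the grouping from the built-in data using the same rule: whether each day is above or below the July-to-September mean temperature. Like `geom_boxplot()`, it ignores missing Ozone values.

```r
aq <- datasets::airquality
late <- aq[aq$Month %in% 7:9, ]

month_f <- factor(late$Month, labels = c("Jul", "Aug", "Sep"))
temp_f  <- factor(ifelse(late$Temp > mean(late$Temp), "High temp", "Low temp"),
                  levels = c("Low temp", "High temp"))

# Median Ozone for each Month x temperature combination (rows: months, columns: groups)
med <- tapply(late$Ozone, list(month_f, temp_f), median, na.rm = TRUE)
med
```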
If you have any questions about box plots, please leave a comment.
Appendix: Full code
##################
# Box plots
##################

# 1 Package management
if (!require("pacman")) {
  install.packages("pacman")
  library(pacman)
}
p_load(datasets, ggplot2, ggthemes, grid, dplyr, RColorBrewer)

# 2 Load and inspect the data
data(airquality)
airquality <- airquality %>%
  mutate(Month = factor(Month,
                        labels = c("May", "Jun", "Jul", "Aug", "Sep")))
str(airquality)
summary(airquality)
head(airquality)

# 3 Build the box plot step by step
# 1) Basic box plot
p1 <- ggplot(airquality, aes(x = Month, y = Ozone)) +
  geom_boxplot()
p1

# 2) Set the axis titles
p2 <- p1 + scale_x_discrete(name = "Month") +
  scale_y_continuous(name = "Mean ozone in parts per billion")
p2

# 3) Adjust the y-axis ticks
p3 <- p1 + scale_x_discrete(name = "Month") +
  scale_y_continuous(name = "Mean ozone in\nparts per billion",
                     breaks = seq(0, 175, 25), limits = c(0, 175))
p3

# 4) Add a title and subtitle
p4 <- p3 +
  labs(title = "Boxplot of mean ozone by month",
       subtitle = "Source: New York State Department of Conservation")
p4

# 5) Change the box colours
fill <- "#4271AE"
line <- "#1F3552"
p5 <- ggplot(airquality, aes(x = Month, y = Ozone)) +
  geom_boxplot(fill = fill, colour = line, alpha = 0.7,
               outlier.colour = "#1F3552",
               outlier.shape = 20,
               outlier.size = 2) +
  scale_y_continuous(name = "Mean ozone in\nparts per billion",
                     breaks = seq(0, 175, 25), limits = c(0, 175)) +
  scale_x_discrete(name = "Month") +
  labs(title = "Boxplot of mean ozone by month",
       subtitle = "Source: New York State Department of Conservation")
p5

# 6) Font settings (windowsFonts() is available only on Windows)
if (.Platform$OS.type == "windows") {
  windowsFonts(
    # Chinese fonts
    lishu = windowsFont(family = "LiSu"),              # LiSu (隶书)
    yahei = windowsFont(family = "Microsoft YaHei"),   # Microsoft YaHei (微软雅黑)
    xinwei = windowsFont(family = "STXingwei"),        # STXingwei (华文新魏)
    kaiti = windowsFont(family = "KaiTi"),             # KaiTi (楷体)
    heiti = windowsFont(family = "SimHei"),            # SimHei (黑体)
    # English fonts
    arial = windowsFont(family = "Arial"),
    newman = windowsFont(family = "Times New Roman"),
    hand = windowsFont(family = "Lucida Calligraphy"), # handwriting style
    Helvetica = windowsFont(family = "Helvetica"),     # not always installed on Windows
    Tahoma = windowsFont(family = "Tahoma")            # used in step 8
  )
}

# 7) Themes
# 7.1) The Economist style
p6_1 <- ggplot(airquality, aes(x = Month, y = Ozone)) +
  geom_boxplot(fill = fill, colour = line) +
  scale_y_continuous(name = "Mean ozone in\nparts per billion",
                     breaks = seq(0, 175, 25), limits = c(0, 175)) +
  scale_x_discrete(name = "Month") +
  labs(title = "Boxplot of mean ozone by month",
       subtitle = "Source: New York State Department of Conservation") +
  theme_economist() + scale_fill_economist() +
  theme(axis.line.x = element_line(linewidth = .5, colour = "black"),
        axis.title = element_text(size = 12),
        legend.position = "bottom",
        legend.direction = "horizontal",
        legend.box = "horizontal",
        legend.text = element_text(size = 10),
        text = element_text(family = "yahei"),
        plot.title = element_text(family = "yahei"))
p6_1

# 7.2) A clean custom theme
p6_2 <- ggplot(airquality, aes(x = Month, y = Ozone)) +
  geom_boxplot(colour = "black", fill = "#56B4E9") +
  scale_y_continuous(name = "Mean ozone in\nparts per billion",
                     breaks = seq(0, 175, 25), limits = c(0, 175)) +
  scale_x_discrete(name = "Month") +
  labs(title = "Boxplot of mean ozone by month",
       subtitle = "Source: New York State Department of Conservation") +
  theme(axis.line.x = element_line(linewidth = 1, colour = "black"),
        axis.line.y = element_line(linewidth = 1, colour = "black"),
        axis.text.x = element_text(colour = "black", size = 10),
        axis.text.y = element_text(colour = "black", size = 10),
        legend.position = "bottom",
        legend.direction = "horizontal",
        legend.box = "horizontal",
        legend.key = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        plot.title = element_text(family = "Helvetica"),
        text = element_text(family = "Helvetica"))
p6_2

# 8) Compare groups
# Data preparation
airquality_trimmed <- airquality %>%
  filter(Month %in% c("Jul", "Aug", "Sep")) %>%
  mutate(Temp.f = factor(ifelse(Temp > mean(Temp), 1, 0),
                         labels = c("Low temp ", "High temp ")))

# 8.1) Separate panels (facets)
p8_1 <- ggplot(airquality_trimmed, aes(x = Month, y = Ozone)) +
  geom_boxplot(fill = fill, colour = line,
               alpha = 0.7) +
  scale_y_continuous(name = "Mean ozone in\nparts per billion",
                     breaks = seq(0, 175, 50), limits = c(0, 175)) +
  scale_x_discrete(name = "Month") +
  labs(title = "Boxplot of mean ozone by month",
       subtitle = "Source: New York State Department of Conservation") +
  theme_bw() +
  theme(plot.title = element_text(size = 14, family = "Tahoma", face = "bold"),
        panel.border = element_rect(colour = "black", fill = NA, linewidth = .5),
        text = element_text(size = 12, family = "Tahoma"),
        axis.title = element_text(face = "bold"),
        axis.text.x = element_text(size = 11)) +
  facet_grid(. ~ Temp.f)
p8_1

# 8.2) In a single plot
p8_2 <- ggplot(airquality_trimmed, aes(x = Month, y = Ozone, fill = Temp.f)) +
  geom_boxplot(alpha = 0.7) +
  scale_y_continuous(name = "Mean ozone in\nparts per billion",
                     breaks = seq(0, 175, 25), limits = c(0, 175)) +
  scale_x_discrete(name = "Month") +
  labs(title = "Boxplot of mean ozone by month",
       subtitle = "Source: New York State Department of Conservation") +
  theme_bw() +
  theme(plot.title = element_text(size = 14, family = "Tahoma", face = "bold"),
        panel.border = element_rect(colour = "black", fill = NA, linewidth = .5),
        text = element_text(size = 12, family = "Tahoma"),
        axis.title = element_text(face = "bold"),
        axis.text.x = element_text(size = 11),
        legend.position = "bottom") +
  scale_fill_brewer(palette = "Accent") +
  labs(fill = "Temperature ")
p8_2
Please follow the “恒诺新知” WeChat public account. Thanks to “R语言”, “数据那些事儿”, “老俊俊的生信笔记”, “冷🈚️思”, “珞珈R”, and “生信星球” for their support!