ggplot2 | Linear Regression Plots
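About this series: R is a powerful language widely used for data analysis and statistical computing, and it has been under development since the 1990s. Thanks to the efforts of enthusiasts around the world, RStudio was later built as an integrated development environment for R, with a much friendlier interface than R's basic editor. Generous package contributions from a growing data science community have made R increasingly popular worldwide, and packages such as MASS, SparkR, and ggplot2 keep extending what it can do for data manipulation, visualization, and computation. R is both a language and an environment for statistical analysis and graphics. It is free, open-source software that is part of the GNU project, and it combines statistical analysis with graphical display. It runs on UNIX, Windows, and macOS and ships with a convenient built-in help system. Compared with other statistical software, R took root early in academia, which makes it a good fit for researchers in fields such as biology and medicine.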
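Earlier posts in this series covered line charts, area charts, bar charts, scatter plots, and bubble charts with the ggplot2 package.
This post shows how to draw a linear regression plot with ggplot2.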
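1. Why draw a linear regression plot?
When the scatter plot of two continuous variables shows a roughly linear relationship, a linear regression plot describes that relationship further by drawing the fitted linear model on top of the scatter plot. Drawing one therefore takes three steps: first, draw the scatter plot; second, fit a linear regression model to the two variables; third, add the straight line defined by the fitted parameters.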
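The three steps can be sketched explicitly before turning to the shortcut used later in this post. This is only an illustration: it uses R's built-in `cars` dataset rather than the data analysed below, and the object names (`p_points`, `fit_cars`, `p_line`) are made up for the example.

```r
library(ggplot2)

# Step 1: scatter plot (built-in `cars` data, used here only for illustration)
p_points <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point(shape = 1)

# Step 2: fit the linear regression model
fit_cars <- lm(dist ~ speed, data = cars)
b <- coef(fit_cars)  # named vector with "(Intercept)" and "speed"

# Step 3: add the straight line defined by the fitted parameters
p_line <- p_points +
  geom_abline(intercept = b[["(Intercept)"]], slope = b[["speed"]],
              colour = "blue")
p_line
```

`geom_smooth(method = "lm")`, used in the rest of the post, folds steps 2 and 3 into a single layer.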
2. A worked example
Step 1: Package management.
# Load packages (p_load() also installs any that are missing)
library(pacman)
p_load(ggplot2, ggthemes, grid, dplyr, HistData)
p_load(ggfortify)
Step 2: Import and understand the data
data("Galton")
?Galton
str(Galton)
head(Galton)
summary(Galton)
Step 3: Fit a linear regression model.
Mathematical expression: child = β0 + β1 × parent + ε
# 3 Linear regression model
fit <- lm(child ~ parent, data = Galton)
summary(fit)
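`summary(fit)` prints everything at once. If you need individual numbers, for example to put in a plot label, base R accessors can pull them out. The sketch below refits the same model under a different name (`galton_fit`, chosen here so the block runs on its own); the value 68 passed to `predict()` is just an example input.

```r
# The Galton data ships with the HistData package
data("Galton", package = "HistData")
galton_fit <- lm(child ~ parent, data = Galton)

coefs <- coef(galton_fit)                 # intercept and slope
ci    <- confint(galton_fit, level = 0.95) # 95% confidence intervals for both
r2    <- summary(galton_fit)$r.squared     # share of variance explained

# Predicted child height for a mid-parent height of 68 inches (example value)
pred_68 <- predict(galton_fit, newdata = data.frame(parent = 68))

coefs
ci
r2
pred_68
```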
Step 4: Draw a basic linear regression plot
p1 <- ggplot(
  data = Galton,
  mapping = aes(x = parent, y = child)
) +
  geom_point(shape = 1) +
  geom_smooth(method = 'lm')
p1
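It can be reassuring to check that the line and band drawn by `geom_smooth(method = "lm")` really are the `lm()` predictions. The sketch below does that with `layer_data()`, which returns the values a layer draws. The names `p_check`, `fit_check`, `same_line`, and `same_band` are made up for this example.

```r
library(ggplot2)
data("Galton", package = "HistData")

p_check <- ggplot(Galton, aes(x = parent, y = child)) +
  geom_point(shape = 1) +
  geom_smooth(method = "lm", formula = y ~ x)

# Layer 2 is geom_smooth(); layer_data() exposes the x, y, ymin, ymax it draws
smooth_df <- layer_data(p_check, i = 2)

# Predict at the same x positions with an explicitly fitted model
fit_check <- lm(child ~ parent, data = Galton)
manual <- predict(fit_check,
                  newdata = data.frame(parent = smooth_df$x),
                  interval = "confidence", level = 0.95)

# Compare the drawn line and lower band edge with the lm() predictions
same_line <- isTRUE(all.equal(smooth_df$y, unname(manual[, "fit"])))
same_band <- isTRUE(all.equal(smooth_df$ymin, unname(manual[, "lwr"])))
```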
Step 5: Hide the confidence interval
p2 <- ggplot(
  data = Galton,
  mapping = aes(x = parent, y = child)
) +
  geom_point(shape = 1) +
  geom_smooth(method = 'lm', se = FALSE)
p2
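Besides switching the band off with `se = FALSE`, `geom_smooth()` also takes a `level` argument that changes how wide the band is. A small sketch, with object names (`base_plot`, `p_95`, `p_99`) invented for the example:

```r
library(ggplot2)
data("Galton", package = "HistData")

base_plot <- ggplot(Galton, aes(x = parent, y = child)) +
  geom_point(shape = 1)

# Default band: level = 0.95
p_95 <- base_plot + geom_smooth(method = "lm", formula = y ~ x)
# A 99% band is wider than the default
p_99 <- base_plot + geom_smooth(method = "lm", formula = y ~ x, level = 0.99)

# Average band width of each plot, read from the smooth layer
width_95 <- with(layer_data(p_95, i = 2), mean(ymax - ymin))
width_99 <- with(layer_data(p_99, i = 2), mean(ymax - ymin))
```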
Step 6: Polish the plot
6.1 Customize the axis labels
p3 <- p2 + scale_x_continuous(name = "Parent height") +
  scale_y_continuous(name = "Child height")
p3
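If you only need to rename an axis, `labs()` is a shorter alternative; `scale_x_continuous()` earns its keep when you also want to control tick positions. The sketch below shows both side by side. The "(inches)" suffix and the break positions are choices made for this example, not something the original plot used.

```r
library(ggplot2)
data("Galton", package = "HistData")

p_axes <- ggplot(Galton, aes(x = parent, y = child)) +
  geom_point(shape = 1) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # scale_x_continuous() sets the axis title and the tick positions together
  scale_x_continuous(name = "Parent height (inches)",
                     breaks = seq(64, 73, by = 1)) +
  # labs() only renames the axis
  labs(y = "Child height (inches)")
p_axes
```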
6.2 Add a title
p4 <- p3 +
  labs(title = "Galton regression line",
       subtitle = "Source: R Core Team")
p4
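A common variation is to show the fitted equation in the subtitle and the data source in a caption. The sketch below builds the equation text from the model coefficients with `sprintf()`. The names `eq_fit`, `eq_label`, and `p_eq`, and the caption wording, are assumptions made for this example.

```r
library(ggplot2)
data("Galton", package = "HistData")

eq_fit <- lm(child ~ parent, data = Galton)
b <- coef(eq_fit)

# Build the equation text from the fitted intercept and slope
eq_label <- sprintf("child = %.2f + %.2f * parent",
                    b[["(Intercept)"]], b[["parent"]])

p_eq <- ggplot(Galton, aes(x = parent, y = child)) +
  geom_point(shape = 1) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Galton regression line",
       subtitle = eq_label,
       caption = "Data: Galton, from the HistData package")
p_eq
```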
6.3 Using themes
1) Economist theme
# Font setup (windowsFonts() is available on Windows only)
windowsFonts(
  # Chinese fonts
  lishu = windowsFont(family = "LiSu"), # LiSu (clerical script)
  yahei = windowsFont(family = "Microsoft YaHei"), # Microsoft YaHei
  xinwei = windowsFont(family = "STXinwei"), # STXinwei
  kaiti = windowsFont(family = "KaiTi"), # KaiTi (regular script)
  heiti = windowsFont(family = "SimHei"), # SimHei (sans-serif)
  # English fonts
  arial = windowsFont(family = "Arial"), # Arial
  newman = windowsFont(family = "Times New Roman"), # Times New Roman
  hand = windowsFont(family = "Lucida Calligraphy"), # Lucida Calligraphy (handwriting style)
  Helvetica = windowsFont(family = "Helvetica") # Helvetica (print typeface)
)
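Because `windowsFonts()` does not exist on macOS or Linux, a script meant to run everywhere can guard the call and fall back to a generic family. This is a minimal sketch: the variable `base_family`, the object `font_theme`, and the choice of `"sans"` as the fallback are assumptions for illustration.

```r
library(ggplot2)

# Register the font only on Windows; elsewhere use a generic family
if (.Platform$OS.type == "windows") {
  windowsFonts(yahei = windowsFont(family = "Microsoft YaHei"))
  base_family <- "yahei"
} else {
  base_family <- "sans"  # assumption: generic fallback family
}

# Pass the chosen family to theme() instead of hard-coding "yahei"
font_theme <- theme(text = element_text(family = base_family))
```

Adding `font_theme` to a plot then works on any platform, although the fallback family may not render Chinese characters.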
# Economist theme
p5 <- ggplot(Galton, aes(x = parent, y = child)) +
  geom_point(shape = 1) + geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Galton regression line",
       subtitle = "Source: R Core Team") +
  scale_x_continuous(name = "Parent height") +
  scale_y_continuous(name = "Child height") +
  theme_economist() + scale_fill_economist() +
  # linewidth requires ggplot2 >= 3.4.0; on older versions use size instead
  theme(axis.line.x = element_line(linewidth = 1, colour = "black"),
        axis.title = element_text(size = 12),
        legend.position = "bottom",
        legend.direction = "horizontal",
        legend.box = "horizontal",
        legend.text = element_text(size = 10),
        text = element_text(family = "yahei"),
        plot.title = element_text(family = "yahei"))
p5
2) Custom theme
p6 <- ggplot(Galton, aes(x = parent, y = child)) +
  geom_point(shape = 1) + geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Galton regression line",
       subtitle = "Source: R Core Team") +
  scale_x_continuous(name = "Parent height") +
  scale_y_continuous(name = "Child height") +
  # linewidth requires ggplot2 >= 3.4.0; on older versions use size instead
  theme(panel.border = element_rect(colour = "black", fill = NA, linewidth = .5),
        axis.text.x = element_text(colour = "black", size = 9),
        axis.text.y = element_text(colour = "black", size = 9),
        legend.position = "bottom",
        legend.direction = "horizontal",
        legend.box = "horizontal",
        panel.grid.major = element_line(colour = "#d3d3d3"),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        plot.title = element_text(size = 14, family = "Helvetica", face = "bold"),
        text = element_text(family = "Helvetica"))
p6
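If several plots share the same custom look, the `theme()` call can be wrapped in a small function and reused. The helper below, `theme_regression()`, is hypothetical (it is not part of ggplot2 or ggthemes) and simply repackages the settings from the custom theme; it assumes ggplot2 3.4.0 or later for the `linewidth` argument.

```r
library(ggplot2)

# Hypothetical helper: bundles the custom theme settings for reuse
theme_regression <- function(base_family = "sans") {
  theme(
    panel.border     = element_rect(colour = "black", fill = NA, linewidth = 0.5),
    axis.text.x      = element_text(colour = "black", size = 9),
    axis.text.y      = element_text(colour = "black", size = 9),
    panel.grid.major = element_line(colour = "#d3d3d3"),
    panel.grid.minor = element_blank(),
    panel.background = element_blank(),
    plot.title       = element_text(size = 14, family = base_family, face = "bold"),
    text             = element_text(family = base_family)
  )
}

# Usage on a small example plot (built-in `cars` data, for illustration only)
p_themed <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point(shape = 1) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  theme_regression()
p_themed
```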
Step 7: Visualize the regression diagnostics
1) Economist theme
autoplot(fit, label.size = 3) + theme_economist() +
  # linewidth requires ggplot2 >= 3.4.0; on older versions use size instead
  theme(panel.border = element_rect(colour = "black",
                                    fill = NA, linewidth = .5),
        axis.text.x = element_text(colour = "black", size = 9),
        axis.text.y = element_text(colour = "black", size = 9),
        panel.background = element_blank(),
        plot.title = element_text(family = "arial"),
        text = element_text(family = "arial"))
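By default `autoplot()` on an `lm` object draws four diagnostic panels. To show only some of them, ggfortify's `which` argument selects panels using the same numbering as base R's `plot.lm()`. A small sketch, with `diag_fit` and `diag_two` as made-up names:

```r
library(ggplot2)
library(ggfortify)
data("Galton", package = "HistData")

diag_fit <- lm(child ~ parent, data = Galton)

# which = 1: residuals vs fitted; which = 2: normal Q-Q
diag_two <- autoplot(diag_fit, which = 1:2, ncol = 2, label.size = 3)
diag_two
```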
2) FiveThirtyEight theme
autoplot(fit, label.size = 3) + theme_fivethirtyeight() +
  theme(axis.title = element_text(family = "Atlas Grotesk Regular"),
        axis.text.x = element_text(colour = "black", size = 9),
        axis.text.y = element_text(colour = "black", size = 9),
        legend.position = "bottom",
        legend.direction = "horizontal",
        legend.box = "horizontal",
        plot.title = element_text(family = "yahei", size = 16),
        text = element_text(family = "yahei"))
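To see what the first diagnostic panel is built from, the residuals-versus-fitted plot can also be drawn by hand with plain ggplot2. This is only a sketch of the idea, not how ggfortify implements it; the names `res_fit`, `res_df`, and `p_resid` are invented for the example.

```r
library(ggplot2)
data("Galton", package = "HistData")

res_fit <- lm(child ~ parent, data = Galton)

# One row per observation: fitted value and residual
res_df <- data.frame(fitted = fitted(res_fit), resid = resid(res_fit))

p_resid <- ggplot(res_df, aes(x = fitted, y = resid)) +
  geom_point(shape = 1) +
  geom_hline(yintercept = 0, linetype = "dashed") +  # reference line at zero
  labs(x = "Fitted values", y = "Residuals", title = "Residuals vs fitted")
p_resid
```

If the linear model is adequate, the points should scatter around the dashed zero line without an obvious pattern.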
If you have any questions about drawing linear regression plots, please leave a comment.