Drawing Good-Looking Scatter Plots in R
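Topic introduction: R is a powerful language widely used for data analysis and statistical computing; its development began in the 1990s. Thanks to the efforts of enthusiasts around the world, RStudio was later developed, an environment built around R that offers a better user experience than R's basic text editor. Generous contributions of R packages from a growing community of data scientists and users have made R increasingly popular worldwide. Packages such as MASS, SparkR, and ggplot2 keep extending what R can do in data manipulation, visualization, and computation. R is both a language and an environment for statistical analysis and graphics. It is free, open-source software that belongs to the GNU project, and it combines statistical analysis with graphical display. It runs on UNIX, Windows, and Macintosh operating systems and includes a convenient built-in help system. Compared with other statistical software, R was developed early on in an academic setting, which makes it well suited to researchers in fields such as biology and medicine.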
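This is my 52nd original article; its topic is data visualization and analysis.
After reading this article, you will know:
1. What scatter plots are used for
2. How to draw good-looking scatter plots with the graphics package
3. How to export high-quality scatter plots with the export package
"The mission of humankind lies in the tireless pursuit of perfection." (Tolstoy)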
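A scatter plot is a widely used data visualization. It shows the relationship between two continuous variables, or between every pair of several continuous variables (a scatter plot matrix). With a scatter plot you can explore whether variables are related and, if so, how: whether the relationship is linear or non-linear, strong or weak, positive or negative.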
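To make these distinctions concrete, here is a minimal sketch that simulates three kinds of relationships and summarizes each with Pearson's correlation coefficient. The variable names and the simulated data are illustrative assumptions, not part of the article's data set.

```r
# Simulated data for illustration only; not the article's data set
set.seed(42)
n <- 200
x <- runif(n, min = -3, max = 3)

y_pos    <- 2 * x + rnorm(n, sd = 1)     # strong positive linear relationship
y_neg    <- -0.5 * x + rnorm(n, sd = 2)  # weak negative linear relationship
y_curved <- x^2 + rnorm(n, sd = 0.5)     # non-linear (quadratic) relationship

# Pearson's r measures the direction and strength of a LINEAR relationship
r_pos    <- cor(x, y_pos)
r_neg    <- cor(x, y_neg)
r_curved <- cor(x, y_curved)
round(c(positive = r_pos, negative = r_neg, curved = r_curved), 2)

# Draw the three scatter plots side by side, then restore the previous layout
old_par <- par(mfrow = c(1, 3))
plot(x, y_pos, main = "Strong positive")
plot(x, y_neg, main = "Weak negative")
plot(x, y_curved, main = "Non-linear")
par(old_par)
```

The quadratic case is the reason to look at the plot and not only at the coefficient: the correlation is close to zero even though the two variables are clearly related.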
0 Preparation
0.1 Loading R packages
Code snippet
# Load the required R packages
library(pacman)
p_load(tidyverse)
p_load(export)
0.2 Loading and inspecting the example data
Code snippet
# Load the data set
mydata <- read_csv('./data/mydata.csv')
mydata %>% head
mydata %>% glimpse
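If ./data/mydata.csv is not at hand, a stand-in data set lets you run the remaining snippets. The sketch below is an assumption throughout: the column names (Age, Sex, Weight, Height) are taken from the plotting code in this article, while the value ranges, the units, and the coding Sex == 1 for male are guesses, not properties of the real file.

```r
# Hypothetical stand-in for ./data/mydata.csv; columns and value ranges are
# assumptions inferred from the plotting code, not the real data
set.seed(2020)
n <- 100
Sex    <- sample(c(1, 2), n, replace = TRUE)   # assumed coding: 1 = male, 2 = female
Age    <- round(runif(n, min = 18, max = 75))  # years
Height <- round(ifelse(Sex == 1, 69, 64) + rnorm(n, sd = 2.5), 1)  # inches
Weight <- round(-200 + 5 * Height + rnorm(n, sd = 15))             # lbs
mydata <- data.frame(Age, Sex, Weight, Height)

# Inspect the first rows and the column types
head(mydata)
str(mydata)
```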
1 Drawing good-looking scatter plots
1.1 Exploring the relationship between Height and Weight
Code snippet
# Explore the relationship between Height and Weight
mydata %>% select(Weight, Height) %>% plot()
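1.2 Polishing the scatter plot
A scatter plot can be polished in several ways, for example:
- Add x-axis and y-axis labels
- Set the x-axis and y-axis ranges
- Add a title and set its font size
- Remove the plot frame
- Set the shape and color of the points
Code snippet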
# Polish the scatter plot:
# - add x- and y-axis labels
# - change the x- and y-axis ranges
# - add a title and set its font size
# - remove the plot frame
# - set the color and shape of the points
mydata %>%
  select(Weight, Height) %>%
  plot(
    xlab = "Weight (lbs)",
    ylab = "Height (inches)",
    xlim = c(80, 200),
    ylim = c(55, 75),
    main = "Height vs Weight",
    pch = 2,
    cex.main = 1.5,
    frame.plot = FALSE,
    col = "blue"
  )
1.3 Coloring the points by the Sex variable and adding a legend
Code snippet
# Let the Sex variable in the data control the point color,
# using the ifelse() function
mydata %>%
  select(Weight, Height) %>%
  plot(
    xlab = "Weight (lbs)",
    ylab = "Height (inches)",
    xlim = c(80, 200),
    ylim = c(55, 75),
    main = "Height vs Weight",
    pch = 2,
    cex.main = 1.5,
    frame.plot = FALSE,
    col = ifelse(mydata$Sex == 1, "red", "blue")
  )
# Add a legend with its top-left corner at (80, 75)
legend(80, 75,
       legend = c("Male", "Female"),
       pch = c(2, 2),
       col = c("red", "blue"),
       bty = "o",
       box.col = "darkgreen",
       cex = .8)
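ifelse() is convenient for two groups. As a possible alternative that also scales to more groups, the sketch below turns the codes into a labelled factor and uses it to index a named color vector. The small data frame demo and the coding 1 = male, 2 = female are assumptions for illustration.

```r
# Small hypothetical data set; the coding 1 = male, 2 = female is an assumption
demo <- data.frame(
  Weight = c(150, 128, 172, 115, 160, 135),
  Height = c(70, 64, 72, 62, 69, 66),
  Sex    = c(1, 2, 1, 2, 1, 2)
)

# Turn the codes into a labelled factor, then use it to index a named palette
sex_f       <- factor(demo$Sex, levels = c(1, 2), labels = c("Male", "Female"))
palette_sex <- c(Male = "red", Female = "blue")
point_col   <- unname(palette_sex[as.character(sex_f)])

plot(demo$Weight, demo$Height,
     xlab = "Weight (lbs)", ylab = "Height (inches)",
     pch = 2, col = point_col, frame.plot = FALSE)

# Build the legend from the same palette so labels and colors stay in sync
legend("topleft", legend = names(palette_sex), col = palette_sex,
       pch = 2, bty = "n", cex = 0.8)
```

Because the legend is derived from palette_sex, adding a third group only requires one more entry in the factor levels and in the palette.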
1.4 Combining two scatter plots
In addition, the first plot gets a vertical line at the mean weight, and the second gets a fitted linear regression line.
Code snippet
# Lay out two plots side by side
par(mfrow = c(1, 2))
mydata %>%
  select(Weight, Height) %>%
  plot(
    xlab = "Weight (lbs)",
    ylab = "Height (inches)",
    xlim = c(80, 200),
    ylim = c(55, 75),
    main = "Height vs Weight",
    pch = 2,
    cex.main = 1.5,
    frame.plot = FALSE,
    col = "blue"
  )
# Add a vertical line at the mean weight
abline(v = mean(mydata$Weight, na.rm = TRUE), col = "orange")
text(140, 73, cex = .8, pos = 4, "Orange line is\nsample average\nweight")
mydata %>%
  select(Age, Height) %>%
  plot(
    xlab = "Age (years)",
    ylab = "Height (inches)",
    xlim = c(0, 80),
    ylim = c(55, 75),
    main = "Height vs Age",
    pch = 3,
    cex.main = 1.5,
    frame.plot = FALSE,
    col = "darkred"
  )
# Add the fitted linear regression line and label it with its equation
reg <- lm(Height ~ Age, data = mydata)
abline(reg)
text(0, 72,
     paste0("Height ~ ", round(coef(reg)[1], 2), "+", round(coef(reg)[2], 2), "*Age"),
     pos = 4,
     cex = .8)
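The paste0() label above always inserts a "+", so a negative slope is printed as "+-". A small helper can pick the sign and fix the number of decimals. This is a sketch only: format_lm_label is a hypothetical name, and the toy data are constructed so that the fitted line is known in advance.

```r
# Format a fitted simple regression as "y ~ a + b*x", handling negative slopes
# (format_lm_label is a hypothetical helper, not part of any package)
format_lm_label <- function(fit, digits = 2) {
  b    <- unname(coef(fit))        # c(intercept, slope)
  vars <- all.vars(formula(fit))   # c(response, predictor)
  sign_chr <- if (b[2] < 0) " - " else " + "
  paste0(vars[1], " ~ ",
         formatC(b[1], format = "f", digits = digits),
         sign_chr,
         formatC(abs(b[2]), format = "f", digits = digits),
         "*", vars[2])
}

# Toy data lying exactly on the line Height = 70 - 0.05 * Age
demo <- data.frame(Age = c(20, 30, 40, 50, 60))
demo$Height <- 70 - 0.05 * demo$Age
fit <- lm(Height ~ Age, data = demo)
format_lm_label(fit)
# returns "Height ~ 70.00 - 0.05*Age"
```

The returned string can be passed to text() in place of the paste0() expression.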
The export package makes it easy to export high-quality graphics in formats such as PDF, PNG, and TIFF.
Take a PNG image as an example.
Reference code
# Wrap all plotting calls in a function so they can be replayed on export
plot_fun <- function() {
  par(mfrow = c(1, 2))
  mydata %>%
    select(Weight, Height) %>%
    plot(
      xlab = "Weight (lbs)",
      ylab = "Height (inches)",
      xlim = c(80, 200),
      ylim = c(55, 75),
      main = "Height vs Weight",
      pch = 2,
      cex.main = 1.5,
      frame.plot = FALSE,
      col = "blue"
    )
  # Add a vertical line at the mean weight
  abline(v = mean(mydata$Weight, na.rm = TRUE), col = "orange")
  text(140, 73, cex = .8, pos = 4, "Orange line is\nsample average\nweight")
  mydata %>%
    select(Age, Height) %>%
    plot(
      xlab = "Age (years)",
      ylab = "Height (inches)",
      xlim = c(0, 80),
      ylim = c(55, 75),
      main = "Height vs Age",
      pch = 3,
      cex.main = 1.5,
      frame.plot = FALSE,
      col = "darkred"
    )
  # Add the fitted linear regression line and label it with its equation
  reg <- lm(Height ~ Age, data = mydata)
  abline(reg)
  text(0, 72,
       paste0("Height ~ ", round(coef(reg)[1], 2), "+", round(coef(reg)[2], 2), "*Age"),
       pos = 4,
       cex = .8)
}
# Export as PNG
graph2png(file = 'scatter_plot0201.png', fun = plot_fun, dpi = 400, height = 5, aspectr = 4)
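If the export package is not available, base R's grDevices can write a PNG as well. The sketch below is an alternative, not the method used in this article: demo_plot is a hypothetical stand-in for plot_fun, and the output path, size, and resolution are assumed values.

```r
# Hypothetical plotting function standing in for plot_fun
demo_plot <- function() {
  plot(c(120, 150, 180), c(62, 68, 72),
       xlab = "Weight (lbs)", ylab = "Height (inches)",
       main = "Height vs Weight", pch = 2, col = "blue")
}

# Assumed output location; replace with any writable path
out_file <- file.path(tempdir(), "scatter_demo.png")

# Open a PNG device (10 x 5 inches at 300 dpi), draw, then close the device
png(out_file, width = 10, height = 5, units = "in", res = 300)
demo_plot()
dev.off()   # closing the device writes the file to disk

file.exists(out_file)
```

Nothing is written until dev.off() is called, so a forgotten dev.off() is the usual cause of an empty or missing image file.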
Appendix:
Complete reference code
###################
# Scatter plots in R
# 王路情
# 2020-02-01
##################
# R packages
library(pacman)
p_load(tidyverse)
p_load(export)
# Load the data set
mydata <- read_csv('./data/mydata.csv')
mydata %>% head
mydata %>% glimpse
# Visualize several continuous variables at once (scatter plot matrix)
mydata %>% select(Age, Weight, Height) %>% plot()
par(mfrow=c(1,1))
# Explore the relationship between Height and Weight
mydata %>% select(Weight, Height) %>% plot()
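# Polish the scatter plot:
# - add x- and y-axis labels
# - change the x- and y-axis ranges
# - add a title and set its font size
# - remove the plot frame
# - set the color and shape of the points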
mydata %>%
select(Weight, Height) %>%
plot(
xlab = "Weight (lbs)",
ylab = "Height (inches)",
xlim = c(80, 200),
ylim = c(55, 75),
main="Height vs Weight",
pch=2,
cex.main=1.5,
frame.plot=FALSE ,
col="blue"
)
# Refine further:
# let the Sex variable in the data control the point color,
# using the ifelse() function
mydata %>%
select(Weight, Height) %>%
plot(
xlab = "Weight (lbs)",
ylab = "Height (inches)",
xlim = c(80, 200),
ylim = c(55, 75),
main="Height vs Weight",
pch=2,
cex.main=1.5,
frame.plot=FALSE ,
col=ifelse(mydata$Sex == 1, "red", "blue")
)
mydata %>% View
# Let the Sex variable in the data control the point color,
# using the ifelse() function
mydata %>%
select(Weight, Height) %>%
plot(
xlab = "Weight (lbs)",
ylab = "Height (inches)",
xlim = c(80, 200),
ylim = c(55, 75),
main="Height vs Weight",
pch=2,
cex.main=1.5,
frame.plot=FALSE ,
col=ifelse(mydata$Sex == 1, "red", "blue")
)
# Add a legend
legend(80, 75,
pch = c(2, 2),
col = c("red", "blue"),
c("Male", "Female"),
bty="o",
box.col="darkgreen",
cex=.8)
# Alternatively,
# give the legend position as a keyword
mydata %>%
select(Weight, Height) %>%
plot(
xlab = "Weight (lbs)",
ylab = "Height (inches)",
xlim = c(80, 200),
ylim = c(55, 75),
main="Height vs Weight",
pch=2,
cex.main=1.5,
frame.plot=FALSE ,
col=ifelse(mydata$Sex == 1, "red", "blue")
)
legend("topleft",
pch = c(2, 2),
col = c("red", "blue"),
c("Male", "Female"),
bty="o",
box.col="darkgreen",
cex=.8)
# Combine two scatter plots
# and add a line to each
par(mfrow=c(1,2))
mydata %>%
select(Weight, Height) %>%
plot(
xlab = "Weight (lbs)",
ylab = "Height (inches)",
xlim = c(80, 200),
ylim = c(55, 75),
main="Height vs Weight",
pch=2,
cex.main=1.5,
frame.plot=FALSE ,
col="blue"
)
# Add a vertical line at the mean weight
abline(v = mean(mydata$Weight, na.rm = TRUE), col="orange")
text(140,73, cex=.8, pos=4, "Orange line is\nsample average\nweight")
mydata %>%
select(Age, Height) %>%
plot(
xlab = "Age (years)",
ylab = "Height (inches)",
xlim = c(0, 80),
ylim = c(55, 75),
main="Height vs Age",
pch=3,
cex.main=1.5,
frame.plot=FALSE ,
col="darkred"
)
# Add the fitted linear regression line
reg <- lm(Height~Age, data=mydata)
abline(reg)
text(0,72,
paste0("Height ~ ", round(reg$coef[1],2), "+", round(reg$coef[2],2), "*Age"),
pos=4,
cex=.8)
# # Save the scatter plot as a PDF
# pdf('plot3.pdf', width = 4, height = 4)
#
# # Close the device to write the file
# invisible(dev.off())
plot_fun <- function() {
par(mfrow=c(1,2))
mydata %>%
select(Weight, Height) %>%
plot(
xlab = "Weight (lbs)",
ylab = "Height (inches)",
xlim = c(80, 200),
ylim = c(55, 75),
main="Height vs Weight",
pch=2,
cex.main=1.5,
frame.plot=FALSE ,
col="blue"
)
# Add a vertical line at the mean weight
abline(v = mean(mydata$Weight, na.rm = TRUE), col="orange")
text(140,73, cex=.8, pos=4, "Orange line is\nsample average\nweight")
mydata %>%
select(Age, Height) %>%
plot(
xlab = "Age (years)",
ylab = "Height (inches)",
xlim = c(0, 80),
ylim = c(55, 75),
main="Height vs Age",
pch=3,
cex.main=1.5,
frame.plot=FALSE ,
col="darkred"
)
# Add the fitted linear regression line
reg <- lm(Height~Age, data=mydata)
abline(reg)
text(0,72,
paste0("Height ~ ", round(reg$coef[1],2), "+", round(reg$coef[2],2), "*Age"),
pos=4,
cex=.8)
}
graph2png(file='scatter_plot0201.png', fun=plot_fun, dpi=400, height = 5, aspectr=4)
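If you have any thoughts on drawing good-looking scatter plots in R, please leave a comment.
For a more in-depth discussion, add me on WeChat: luqin360. Include a note with your real name plus your job or field of study; otherwise the request will not be accepted.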