R语言字体设置和批量下载图片踩坑的解决方案
专题介绍:R是一种广泛用于数据分析和统计计算的强大语言,于上世纪90年代开始发展起来。得益于全世界众多 爱好者的无尽努力,大家继而开发出了一种基于R但优于R基本文本编辑器的R Studio(用户的界面体验更好)。也正是由于全世界越来越多的数据科学社区和用户对R包的慷慨贡献,让R语言在全球范围内越来越流行。其中一些R包,例如MASS,SparkR, ggplot2,使数据操作,可视化和计算功能越来越强大。R是用于统计分析、绘图的语言和操作环境。R是属于GNU系统的一个自由、免费、源代码开放的软件,它是一个用于统计计算和统计制图的优秀工具。R作为一种统计分析软件,是集统计分析与图形显示于一体的。它可以运行于UNIX、Windows和Macintosh的操作系统上,而且嵌入了一个非常方便实用的帮助系统,相比于其他统计分析软件,R的学术性开发比较早,适合生物学和医学等学术学科的科研人员使用。
【R语言】开通了R语言群,大家相互学习和交流,请扫描下方二维码,备注:姓名-R群,我会邀请你入群,一起进步和成长。
有时候,我们在对图形做美化和装饰的时候,需要对图形中的文本信息做字体设置,有可能是为了符合规范要求,也有可能是为了更加美观。
我们如何做字体的设置呢?
因为小编的系统是Windows系统,自带了一些字体,有时候需要自己从网上下载字体或者官方指定的字体,比方说下载和使用Google的字体,我们如何把自己下载的字体加载进来,供我们图形的泛文本信息使用呢?
通过R语言字体设置踩坑和实践,提供一种有效的解决方案。
第一步:下载所需要的字体,把字体放到一个文件夹
例如:Google的JosefinSlab-SemiBold字体,下载链接:
https://fonts.google.com/specimen/Josefin+Slab
下载成功后,做解压缩,我放在D盘下Fonts文件夹下(可以自定义)
第二步:使用showtext包的font_add函数,添加字体
第三步:把加载成功的字体做应用
# 1 数据准备
# 数据是原材料,数工难为无数之学。
# 基于业务问题采集或者创建数据集
streaming <- tibble::tribble(
~service, ~`2020`, ~`2021`,
"netflix", 29, 20,
"prime", 21, 16,
"hulu", 16, 13,
"disney", 12, 11,
"apple", 4, 5,
"peacock", 0, 5,
"hbo", 3, 12,
"paramount", 2, 3,
"other", 13, 15,
)
# 数据集检视
streaming
# 2 数据整理
# 数据转换为方便可视化的数据格式
# 宽数据-->转换为长数据
# 使用tidyr包的pivot_longer函数
streaming_long <- tidyr::pivot_longer(streaming,
cols = -service,
names_to = "year",
values_to = "share")
streaming_long
# 3 数据可视化
# 对比分析,柱形图
library(ggplot2)
ggplot(streaming_long) +
geom_col(aes(factor(service, levels = streaming$service),
share, fill = year), position = position_dodge(width = 0.9)) +
geom_point(aes(x = service, y = -10, color = year, fill = year), size = 4) +
geom_text(aes(service, share + 1, label = paste0(share, "%"), group = year),
position = position_dodge(width = 0.9), size = 3) +
scale_fill_manual(values = c(`2020` = "red3", `2021` = "black")) +
scale_color_manual(values = c(`2020` = "red3", `2021` = "black")) +
guides(fill = "none", color = guide_legend(direction = "horizontal")) +
scale_y_continuous(labels = scales::percent_format(scale = 1),
limits = c(0, 35)) +
labs(title = "US Streaming Market Share",
subtitle = "2020 vs 2021",
caption = "Source: Ampere Analytics via The Wrap
Other Streatming Services include ESPN+, Showtime,
Sling TV, Youtube TV, and Starz",
x = "", y = "") +
theme_minimal() +
theme(axis.text = element_text(size = 10),
plot.title = element_text(size = 28, hjust= 0.5),
plot.subtitle = element_text(size = 28, hjust = 0.5),
plot.caption = element_text(size = 7, color = "grey60"),
plot.background = element_rect(fill = "#f4f7fc", size = 0),
legend.title = element_blank(),
legend.text= element_text(size = 12),
panel.grid = element_blank(),
legend.position = c(0.85, 0.8))
效果图如下:
通过字体设置,修改标题的字体属性。
# 第一个坑:R语言字体设置解决方案
# 使用了Google的JosefinSlab-SemiBold字体
# 下载链接:
# https://fonts.google.com/specimen/Josefin+Slab
library(showtext)
font_add("Jose", "D:/Fonts/Josefin_Slab/static/JosefinSlab-SemiBold.ttf")
showtext_auto()
fontfamily <- "Jose"
# 图形字体设置
ggplot(streaming_long) +
geom_col(aes(factor(service, levels = streaming$service),
share, fill = year), position = position_dodge(width = 0.9)) +
geom_point(aes(x = service, y = -10, color = year, fill = year), size = 4) +
geom_text(aes(service, share + 1, label = paste0(share, "%"), group = year),
position = position_dodge(width = 0.9), size = 3) +
scale_fill_manual(values = c(`2020` = "red3", `2021` = "black")) +
scale_color_manual(values = c(`2020` = "red3", `2021` = "black")) +
guides(fill = "none", color = guide_legend(direction = "horizontal")) +
scale_y_continuous(labels = scales::percent_format(scale = 1),
limits = c(0, 35)) +
labs(title = "US Streaming Market Share",
subtitle = "2020 vs 2021",
caption = "Source: Ampere Analytics via The Wrap
Other Streatming Services include ESPN+, Showtime,
Sling TV, Youtube TV, and Starz",
x = "", y = "") +
theme_minimal() +
theme(axis.text = element_text(size = 10),
plot.title = element_text(family = fontfamily, size = 28, hjust= 0.5),
plot.subtitle = element_text(family = fontfamily, size = 28, hjust = 0.5),
plot.caption = element_text(size = 7, color = "grey60"),
plot.background = element_rect(fill = "#f4f7fc", size = 0),
legend.title = element_blank(),
legend.text= element_text(size = 12),
panel.grid = element_blank(),
legend.position = c(0.85, 0.8))
效果图如下:
通过效果图,发现标题的字体设置成功了。
有时候,你想可视化中x轴使用公司的logo图片来表示公司信息,这样我们就需要去找这些图片,一种方法,若是公司数目不是很多,就手动下载,另一种方法(也是我推荐的),找到公司的logo的网站,编写程式自动下载。
我在下载图片的时候,遇到了一个坑,当使用download.file()函数时,若是不指定参数mode,是无法把图片下载到本地。同时,先要在项目里创建存放图片的文件夹,例如:images。也就是说,使用函数下载图片的时候,请记得指定mode=’wb’, 实例代码如下:
# 第二个坑:图片批量下载和应用
# 把可视化的x轴的公司名称换作公司的Logo
# 使用download.file()函数
# 1)记得设置参数mode='wb'
# 2)在项目下提前创建images文件夹
library(tidyverse)
wiki <- "https://upload.wikimedia.org/wikipedia/commons/thumb/"
logos <- tibble::tribble(
~service, ~logo,
"netflix", paste0(wiki, "0/08/Netflix_2015_logo.svg/340px-Netflix_2015_logo.svg.png"),
"prime", paste0(wiki, "1/11/Amazon_Prime_Video_logo.svg/450px-Amazon_Prime_Video_logo.svg.png"),
"hulu", paste0(wiki, "e/e4/Hulu_Logo.svg/440px-Hulu_Logo.svg.png"),
"disney", paste0(wiki, "3/3e/Disney%2B_logo.svg/320px-Disney%2B_logo.svg.png"),
"apple", paste0(wiki, "2/28/Apple_TV_Plus_Logo.svg/500px-Apple_TV_Plus_Logo.svg.png"),
"peacock", paste0(wiki, "d/d3/NBCUniversal_Peacock_Logo.svg/440px-NBCUniversal_Peacock_Logo.svg.png"),
"hbo", paste0(wiki, "d/de/HBO_logo.svg/440px-HBO_logo.svg.png"),
"paramount", paste0(wiki, "a/a5/Paramount_Plus.svg/440px-Paramount_Plus.svg.png"),
"other", "other.png"
) %>%
mutate(path = file.path("images", paste(service, tools::file_ext(logo), sep = ".")))
logos
labels <- setNames(paste0("<img src='", logos$path, "' width='35' />"), logos$service)
labels[["other"]] <- "other<br />streaming<br />services"
# 执行下载操作
for (r in 1:8) {
download.file(logos$logo[r], logos$path[r], mode = 'wb')
}
# 利用公司的Logo再次绘制图形
ggplot(streaming_long) +
geom_col(aes(factor(service, levels = streaming$service),
share, fill = year), position = position_dodge(width = 0.9)) +
geom_point(aes(x = service, y = -10, color = year, fill = year), size = 4) +
geom_text(aes(service, share + 1, label = paste0(share, "%"), group = year),
position = position_dodge(width = 0.9), size = 3) +
scale_fill_manual(values = c(`2020` = "red3", `2021` = "black")) +
scale_color_manual(values = c(`2020` = "red3", `2021` = "black")) +
guides(fill = "none", color = guide_legend(direction = "horizontal")) +
scale_y_continuous(labels = scales::percent_format(scale = 1),
limits = c(0, 35)) +
labs(title = "US Streaming Market Share",
subtitle = "2020 vs 2021",
caption = "Source: Ampere Analytics via The Wrap
Other Streatming Services include ESPN+, Showtime,
Sling TV, Youtube TV, and Starz",
x = "", y = "") +
theme_minimal() +
theme(axis.text = element_text(size = 10),
plot.title = element_text(family = fontfamily, size = 28, hjust= 0.5),
plot.subtitle = element_text(family = fontfamily, size = 28, hjust = 0.5),
plot.caption = element_text(size = 7, color = "grey60"),
plot.background = element_rect(fill = "#f4f7fc", size = 0),
legend.title = element_blank(),
legend.text= element_text(size = 12),
panel.grid = element_blank(),
legend.position = c(0.85, 0.8)) +
scale_x_discrete(labels = labels) +
theme(axis.text.x = ggtext::element_markdown())
效果图如下:
这两个坑顺利解决后,完整代码如下:
# 数据准备
# 1 数据准备
# 数据是原材料,数工难为无数之学。
# 基于业务问题采集或者创建数据集
streaming <- tibble::tribble(
~service, ~`2020`, ~`2021`,
"netflix", 29, 20,
"prime", 21, 16,
"hulu", 16, 13,
"disney", 12, 11,
"apple", 4, 5,
"peacock", 0, 5,
"hbo", 3, 12,
"paramount", 2, 3,
"other", 13, 15,
)
# 数据集检视
streaming
# 2 数据整理
# 数据转换为方便可视化的数据格式
# 宽数据-->转换为长数据
# 使用tidyr包的pivot_longer函数
streaming_long <- tidyr::pivot_longer(streaming,
cols = -service,
names_to = "year",
values_to = "share")
streaming_long
# 3 数据可视化
# 对比分析,柱形图
library(ggplot2)
ggplot(streaming_long) +
geom_col(aes(factor(service, levels = streaming$service),
share, fill = year), position = position_dodge(width = 0.9)) +
geom_point(aes(x = service, y = -10, color = year, fill = year), size = 4) +
geom_text(aes(service, share + 1, label = paste0(share, "%"), group = year),
position = position_dodge(width = 0.9), size = 3) +
scale_fill_manual(values = c(`2020` = "red3", `2021` = "black")) +
scale_color_manual(values = c(`2020` = "red3", `2021` = "black")) +
guides(fill = "none", color = guide_legend(direction = "horizontal")) +
scale_y_continuous(labels = scales::percent_format(scale = 1),
limits = c(0, 35)) +
labs(title = "US Streaming Market Share",
subtitle = "2020 vs 2021",
caption = "Source: Ampere Analytics via The Wrap
Other Streatming Services include ESPN+, Showtime,
Sling TV, Youtube TV, and Starz",
x = "", y = "") +
theme_minimal() +
theme(axis.text = element_text(size = 10),
plot.title = element_text(size = 28, hjust= 0.5),
plot.subtitle = element_text(size = 28, hjust = 0.5),
plot.caption = element_text(size = 7, color = "grey60"),
plot.background = element_rect(fill = "#f4f7fc", size = 0),
legend.title = element_blank(),
legend.text= element_text(size = 12),
panel.grid = element_blank(),
legend.position = c(0.85, 0.8))
# 第一个坑:R语言字体设置解决方案
# 使用了Google的JosefinSlab-SemiBold字体
# 下载链接:
# https://fonts.google.com/specimen/Josefin+Slab
library(showtext)
font_add("Jose", "D:/Fonts/Josefin_Slab/static/JosefinSlab-SemiBold.ttf")
showtext_auto()
fontfamily <- "Jose"
# 图形字体设置
ggplot(streaming_long) +
geom_col(aes(factor(service, levels = streaming$service),
share, fill = year), position = position_dodge(width = 0.9)) +
geom_point(aes(x = service, y = -10, color = year, fill = year), size = 4) +
geom_text(aes(service, share + 1, label = paste0(share, "%"), group = year),
position = position_dodge(width = 0.9), size = 3) +
scale_fill_manual(values = c(`2020` = "red3", `2021` = "black")) +
scale_color_manual(values = c(`2020` = "red3", `2021` = "black")) +
guides(fill = "none", color = guide_legend(direction = "horizontal")) +
scale_y_continuous(labels = scales::percent_format(scale = 1),
limits = c(0, 35)) +
labs(title = "US Streaming Market Share",
subtitle = "2020 vs 2021",
caption = "Source: Ampere Analytics via The Wrap
Other Streatming Services include ESPN+, Showtime,
Sling TV, Youtube TV, and Starz",
x = "", y = "") +
theme_minimal() +
theme(axis.text = element_text(size = 10),
plot.title = element_text(family = fontfamily, size = 28, hjust= 0.5),
plot.subtitle = element_text(family = fontfamily, size = 28, hjust = 0.5),
plot.caption = element_text(size = 7, color = "grey60"),
plot.background = element_rect(fill = "#f4f7fc", size = 0),
legend.title = element_blank(),
legend.text= element_text(size = 12),
panel.grid = element_blank(),
legend.position = c(0.85, 0.8))
# 第二个坑:图片批量下载和应用
# 把可视化的x轴的公司名称换作公司的Logo
# 使用download.file()函数
# 1)记得设置参数mode='wb'
# 2)在项目下提前创建images文件夹
library(tidyverse)
wiki <- "https://upload.wikimedia.org/wikipedia/commons/thumb/"
logos <- tibble::tribble(
~service, ~logo,
"netflix", paste0(wiki, "0/08/Netflix_2015_logo.svg/340px-Netflix_2015_logo.svg.png"),
"prime", paste0(wiki, "1/11/Amazon_Prime_Video_logo.svg/450px-Amazon_Prime_Video_logo.svg.png"),
"hulu", paste0(wiki, "e/e4/Hulu_Logo.svg/440px-Hulu_Logo.svg.png"),
"disney", paste0(wiki, "3/3e/Disney%2B_logo.svg/320px-Disney%2B_logo.svg.png"),
"apple", paste0(wiki, "2/28/Apple_TV_Plus_Logo.svg/500px-Apple_TV_Plus_Logo.svg.png"),
"peacock", paste0(wiki, "d/d3/NBCUniversal_Peacock_Logo.svg/440px-NBCUniversal_Peacock_Logo.svg.png"),
"hbo", paste0(wiki, "d/de/HBO_logo.svg/440px-HBO_logo.svg.png"),
"paramount", paste0(wiki, "a/a5/Paramount_Plus.svg/440px-Paramount_Plus.svg.png"),
"other", "other.png"
) %>%
mutate(path = file.path("images", paste(service, tools::file_ext(logo), sep = ".")))
logos
labels <- setNames(paste0("<img src='", logos$path, "' width='35' />"), logos$service)
labels[["other"]] <- "other<br />streaming<br />services"
# 执行下载操作
for (r in 1:8) {
download.file(logos$logo[r], logos$path[r], mode = 'wb')
}
# 利用公司的Logo再次绘制图形
ggplot(streaming_long) +
geom_col(aes(factor(service, levels = streaming$service),
share, fill = year), position = position_dodge(width = 0.9)) +
geom_point(aes(x = service, y = -10, color = year, fill = year), size = 4) +
geom_text(aes(service, share + 1, label = paste0(share, "%"), group = year),
position = position_dodge(width = 0.9), size = 3) +
scale_fill_manual(values = c(`2020` = "red3", `2021` = "black")) +
scale_color_manual(values = c(`2020` = "red3", `2021` = "black")) +
guides(fill = "none", color = guide_legend(direction = "horizontal")) +
scale_y_continuous(labels = scales::percent_format(scale = 1),
limits = c(0, 35)) +
labs(title = "US Streaming Market Share",
subtitle = "2020 vs 2021",
caption = "Source: Ampere Analytics via The Wrap
Other Streatming Services include ESPN+, Showtime,
Sling TV, Youtube TV, and Starz",
x = "", y = "") +
theme_minimal() +
theme(axis.text = element_text(size = 10),
plot.title = element_text(family = fontfamily, size = 28, hjust= 0.5),
plot.subtitle = element_text(family = fontfamily, size = 28, hjust = 0.5),
plot.caption = element_text(size = 7, color = "grey60"),
plot.background = element_rect(fill = "#f4f7fc", size = 0),
legend.title = element_blank(),
legend.text= element_text(size = 12),
panel.grid = element_blank(),
legend.position = c(0.85, 0.8)) +
scale_x_discrete(labels = labels) +
theme(axis.text.x = ggtext::element_markdown())
大家可以进一步思考:
1 如何优化这个图形?
2 对于这份数据,还可以通过哪些可视化来传达信息?(选择可视化的形式之前,请思考你要给什么受众传达什么信息。)
参考资料:
1https://jcarroll.com.au/2021/07/02/improving-a-visualization/
2https://cran.rstudio.com/web/packages/showtext/vignettes/introduction.html
4https://zhuanlan.zhihu.com/p/27649216
伙伴们,给大家推荐【数据科学与人工智能】公众号,它分享和传播数据科学与人工智能的知识和技能。
请关注“恒诺新知”微信公众号,感谢“R语言“,”数据那些事儿“,”老俊俊的生信笔记“,”冷🈚️思“,“珞珈R”,“生信星球”的支持!