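[Typical algorithm] Apriori
R has two packages dedicated to association analysis: arules and arulesViz. arules represents, manipulates, and mines transaction data and rules, and provides implementations of Apriori and Eclat, two fast algorithms for mining frequent itemsets and association rules. arulesViz is an extension of arules that adds practical and novel visualization techniques for association rules, so the analysis runs as one workflow from running the algorithm to presenting the results.
Usage of the apriori() function in the arules package: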
apriori(data,parameter=NULL,appearance=NULL,control=NULL)
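The parameters are described below (from the R help documentation):
Table: apriori() function parameters
Parameter | Meaning
data | An object of class transactions, or any data structure that can be coerced into transactions (for example, a binary matrix or a data frame).
parameter | An object of class APparameter or a named list. The default mining settings are: support 0.1, confidence 0.8, maximum length 10 (length here means the number of items in a rule, lhs plus rhs).
appearance | An object of class APappearance or a named list. Use this parameter to restrict which items may appear. By default, all items can appear without restriction.
control | An object of class APcontrol or a named list. Controls the algorithmic performance of the mining algorithm (for example, item sorting).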
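To make the roles of the three list arguments concrete, here is a minimal sketch that passes each one as a named list. The thresholds (supp = 0.2, conf = 0.5, minlen = 2, maxlen = 4) and the variable names income_items and rules_income are illustrative assumptions, not values from the original example.

```r
library(arules)
data("Adult")

# Item labels of the income attribute (e.g. "income=small", "income=large").
income_items <- grep("^income=", itemLabels(Adult), value = TRUE)

rules_income <- apriori(
  Adult,
  # parameter: thresholds and rule length limits (coerced to APparameter)
  parameter  = list(supp = 0.2, conf = 0.5, minlen = 2, maxlen = 4,
                    target = "rules"),
  # appearance: income items only on the rhs, everything else only on the lhs
  appearance = list(rhs = income_items, default = "lhs"),
  # control: silence the progress output of the miner
  control    = list(verbose = FALSE)
)

# Show the five rules with the highest lift.
inspect(head(sort(rules_income, by = "lift"), n = 5))
```

Setting minlen = 2 excludes rules with an empty left-hand side, which otherwise appear whenever a single item is frequent enough on its own.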
Example: the following is based on the example in the R help pages, with the statements tidied up:
——————————————————————————————————————–
library(arules)
data("Adult")
## Mine association rules.
rules <- apriori(Adult, parameter = list(supp = 0.7, conf = 0.9, target = "rules"))
summary(rules)
inspect(rules)
——————————————————————————————————————–
The output is:
set of 17 rules
rule length distribution (lhs + rhs):sizes
1 2 3
2 7 8
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 2.000 2.000 2.353 3.000 3.000
summary of quality measures:
support confidence lift
Min. :0.7195 Min. :0.9100 Min. :0.9919
1st Qu.:0.7490 1st Qu.:0.9143 1st Qu.:0.9951
Median :0.7818 Median :0.9206 Median :0.9970
Mean :0.8029 Mean :0.9300 Mean :1.0017
3rd Qu.:0.8548 3rd Qu.:0.9491 3rd Qu.:1.0000
Max. :0.9533 Max. :0.9533 Max. :1.0271
mining info:
data ntransactions support confidence
Adult 48842 0.7 0.9
The mined rules are listed below:
No. | lhs | rhs | support | confidence | lift
1 | {} | {capital-gain=None} | 0.9173867 | 0.9173867 | 1.0000000
2 | {} | {capital-loss=None} | 0.9532779 | 0.9532779 | 1.0000000
3 | {race=White} | {native-country=United-States} | 0.7881127 | 0.9217231 | 1.0270761
4 | {race=White} | {capital-gain=None} | 0.7817862 | 0.9143240 | 0.9966616
5 | {race=White} | {capital-loss=None} | 0.8136849 | 0.9516307 | 0.9982720
6 | {native-country=United-States} | {capital-gain=None} | 0.8219565 | 0.9159062 | 0.9983862
7 | {native-country=United-States} | {capital-loss=None} | 0.8548380 | 0.9525461 | 0.9992323
8 | {capital-gain=None} | {capital-loss=None} | 0.8706646 | 0.9490705 | 0.9955863
9 | {capital-loss=None} | {capital-gain=None} | 0.8706646 | 0.9133376 | 0.9955863
10 | {race=White, native-country=United-States} | {capital-gain=None} | 0.7194628 | 0.9128933 | 0.9951019
11 | {race=White, capital-gain=None} | {native-country=United-States} | 0.7194628 | 0.9202807 | 1.0254689
12 | {race=White, native-country=United-States} | {capital-loss=None} | 0.7490480 | 0.9504325 | 0.9970152
13 | {race=White, capital-loss=None} | {native-country=United-States} | 0.7490480 | 0.9205626 | 1.0257830
14 | {race=White, capital-gain=None} | {capital-loss=None} | 0.7404283 | 0.9470983 | 0.9935175
15 | {race=White, capital-loss=None} | {capital-gain=None} | 0.7404283 | 0.9099693 | 0.9919147
16 | {capital-gain=None, native-country=United-States} | {capital-loss=None} | 0.7793702 | 0.9481891 | 0.9946618
17 | {capital-loss=None, native-country=United-States} | {capital-gain=None} | 0.7793702 | 0.9117168 | 0.9938195
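As a quick check on how the three quality measures relate, the sketch below recomputes confidence and lift for rule 8 ({capital-gain=None} => {capital-loss=None}) in base R, using only the support values printed above. It relies on the standard definitions confidence(A => B) = support(A and B) / support(A) and lift(A => B) = confidence(A => B) / support(B). The variable names are illustrative.

```r
# Support values copied from the rule listing above.
supp_gain      <- 0.9173867  # support of {capital-gain=None}   (rule 1)
supp_loss      <- 0.9532779  # support of {capital-loss=None}   (rule 2)
supp_gain_loss <- 0.8706646  # support of both items together   (rule 8)

# confidence(A => B) = support(A and B) / support(A)
conf_8 <- supp_gain_loss / supp_gain

# lift(A => B) = confidence(A => B) / support(B)
lift_8 <- conf_8 / supp_loss

print(round(c(confidence = conf_8, lift = lift_8), 4))
```

The results agree with the confidence and lift reported for rule 8, up to the rounding of the printed supports. A lift slightly below 1 means the two items co-occur marginally less often than they would if they were independent, so a rule with high support and confidence is not necessarily an interesting one.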
The results can be controlled and ordered by support, confidence, and lift. For example:
——————————————————————————————————————–
## Sort by each quality measure (descending by default) and inspect the result.
rules.sort1 <- sort(rules, by = "support")
inspect(rules.sort1)
rules.sort2 <- sort(rules, by = "confidence")
inspect(rules.sort2)
rules.sort3 <- sort(rules, by = "lift")
inspect(rules.sort3)
——————————————————————————————————————–
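Besides sorting, the quality measures can also be used to filter the rule set. The following is a minimal sketch, assuming the same Adult rules as above; the cut-off lift > 1 and the names rules_pos and top5 are illustrative choices, not part of the original example.

```r
library(arules)
data("Adult")
rules <- apriori(Adult,
                 parameter = list(supp = 0.7, conf = 0.9, target = "rules"),
                 control = list(verbose = FALSE))

# Keep only rules whose lift is strictly greater than 1
# (lhs and rhs occur together more often than expected under independence).
rules_pos <- subset(rules, subset = lift > 1)
inspect(rules_pos)

# Take the five rules with the highest confidence.
top5 <- head(sort(rules, by = "confidence"), n = 5)
inspect(top5)
```

subset() evaluates its condition against the quality columns (support, confidence, lift), so conditions can be combined with & and |.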
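Use the arulesViz package to visualize the association rules:
——————————————————————————————————————–
library(arulesViz)
rules <- apriori(Adult, parameter = list(supp = 0.7, conf = 0.9, target = "rules"))
## Default scatter plot of support against confidence, shaded by lift.
plot(rules)
## Interactive plot; recent arulesViz releases deprecate `interactive` in favor of engine = "interactive".
plot(rules, interactive = TRUE)
## Grouped matrix plot.
plot(rules, method = "grouped")
——————————————————————————————————————–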
Source: http://www.data-analyse.cn/r/322.html