Companies Need Model Interpretability
Editor's note: Model interpretability matters a great deal; solving business problems and setting strategy both depend on explanations that hold up. In fintech, scorecard models favor logistic regression in large part because of its interpretability.
There are many methods for interpreting machine learning models, but what do they all lack?
ICE and partial dependence plots can't tell me how accurately a relationship was fitted. Nor does ICE tell me the probability that any individual line actually occurs. Your model can easily underfit or overfit the data, especially when you use deep learning models. And LIME (it should be called LAME) fails to tell me how the model actually behaves.
Suppose you are working on a price elasticity model that will guide pricing decisions. Right now, all you can show is the relationship your model was able to fit. Given that the model will drive pricing, a sharp stakeholder might ask, "I see the relationship your model fitted, but how do I know it corresponds to the actual relationship?"
What would you do? Hand the stakeholder some model accuracy metrics? Tell them you used deep learning, so they should trust it because it's state of the art?
Here is a simple fix for the shortcomings of partial dependence plots: apply calibration to the predicted relationship. It's that simple. Below is an example plot from the RemixAutoML package in R. The x-axis is the independent variable of interest, and the tick spacing is based on percentiles of that variable's distribution. This means the data is evenly distributed along the x-axis, so there is no need for the rug dashes you see in ICE charts. Second, we can see the relationship between the independent variable and the target variable, just as in a partial dependence plot, but we can also see how well the model fits across the full range of the independent variable. That resolves the stakeholder's doubts about prediction accuracy. If you want to see the variability of the predictions, use the boxplot version. If you want the relationship for a specific group, simply subset the data down to that group of interest and rerun the function.
#######################################################
# Create data to simulate validation data with predicted values
#######################################################
# Correl: This is the correlation used to determine how correlated the variables are to
# the target variable. Switch it up (between 0 and 1) to see how the charts below change.
Correl <- 0.85
data <- data.table::data.table(Target = runif(1000))
# Mock independent variables - they are correlated variables with
# various transformations so you can see different kinds of relationships
# in the charts below
# Helper columns for creating simulated variables
data[, x1 := qnorm(Target)]
data[, x2 := runif(1000)]
# Create one variable at a time
data[, Independent_Variable1 := log(pnorm(Correl * x1 +
sqrt(1-Correl^2) * qnorm(x2)))]
data[, Independent_Variable2 := (pnorm(Correl * x1 +
sqrt(1-Correl^2) * qnorm(x2)))]
data[, Independent_Variable3 := exp(pnorm(Correl * x1 +
sqrt(1-Correl^2) * qnorm(x2)))]
data[, Independent_Variable4 := exp(exp(pnorm(Correl * x1 +
sqrt(1-Correl^2) * qnorm(x2))))]
data[, Independent_Variable5 := sqrt(pnorm(Correl * x1 +
sqrt(1-Correl^2) * qnorm(x2)))]
data[, Independent_Variable6 := (pnorm(Correl * x1 +
sqrt(1-Correl^2) * qnorm(x2)))^0.10]
data[, Independent_Variable7 := (pnorm(Correl * x1 +
sqrt(1-Correl^2) * qnorm(x2)))^0.25]
data[, Independent_Variable8 := (pnorm(Correl * x1 +
sqrt(1-Correl^2) * qnorm(x2)))^0.75]
data[, Independent_Variable9 := (pnorm(Correl * x1 +
sqrt(1-Correl^2) * qnorm(x2)))^2]
data[, Independent_Variable10 := (pnorm(Correl * x1 +
sqrt(1-Correl^2) * qnorm(x2)))^4]
data[, Independent_Variable11 := ifelse(Independent_Variable2 < 0.20, "A",
ifelse(Independent_Variable2 < 0.40, "B",
ifelse(Independent_Variable2 < 0.6, "C",
ifelse(Independent_Variable2 < 0.8, "D", "E"))))]
# We’ll use this as a mock predicted value
data[, Predict := (pnorm(Correl * x1 +
sqrt(1-Correl^2) * qnorm(x2)))]
# Remove the helper columns
data[, ':=' (x1 = NULL, x2 = NULL)]
# In the ParDepCalPlot() function below, note the Function argument -
# we are using mean() to aggregate our values but you
# can use quantile(x, probs = y) for quantile regression
# Partial Dependence Calibration Plot:
p1 <- RemixAutoML::ParDepCalPlots(data,
PredictionColName = "Predict",
TargetColName = "Target",
IndepVar = "Independent_Variable1",
GraphType = "calibration",
PercentileBucket = 0.05,
FactLevels = 10,
Function = function(x) mean(x, na.rm = TRUE))
# Partial Dependence Calibration BoxPlot: note the GraphType argument
p2 <- RemixAutoML::ParDepCalPlots(data,
PredictionColName = "Predict",
TargetColName = "Target",
IndepVar = "Independent_Variable1",
GraphType = "boxplot",
PercentileBucket = 0.05,
FactLevels = 10,
Function = function(x) mean(x, na.rm = TRUE))
# Partial Dependence Calibration Plot:
p3 <- RemixAutoML::ParDepCalPlots(data,
PredictionColName = "Predict",
TargetColName = "Target",
IndepVar = "Independent_Variable4",
GraphType = "calibration",
PercentileBucket = 0.05,
FactLevels = 10,
Function = function(x) mean(x, na.rm = TRUE))
# Partial Dependence Calibration BoxPlot for factor variables:
p4 <- RemixAutoML::ParDepCalPlots(data,
PredictionColName = "Predict",
TargetColName = "Target",
IndepVar = "Independent_Variable11",
GraphType = "calibration",
PercentileBucket = 0.05,
FactLevels = 10,
Function = function(x) mean(x, na.rm = TRUE))
# Plot all the individual graphs in a single pane
RemixAutoML::multiplot(plotlist = list(p1,p2,p3,p4), cols = 2)
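As the comment on the Function argument above suggests, the same plots can be built for quantile regression by swapping the aggregation function. A minimal sketch, assuming the simulated `data` from above is still in scope and that `ParDepCalPlots()` accepts any summary function via `Function` (the 0.80 probability here is an illustrative choice, not from the original post):

```r
# Quantile variant: aggregate each percentile bucket with the 80th
# percentile instead of the mean, to inspect the upper tail of Target
p5 <- RemixAutoML::ParDepCalPlots(data,
        PredictionColName = "Predict",
        TargetColName = "Target",
        IndepVar = "Independent_Variable1",
        GraphType = "calibration",
        PercentileBucket = 0.05,
        FactLevels = 10,
        Function = function(x) quantile(x, probs = 0.80, na.rm = TRUE))
```

Comparing this plot with the mean-based p1 shows whether the model tracks the tail of the target distribution as well as it tracks the center.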

Original article:
https://www.remixinstitute.com/blog/companies-are-demanding-model-interpretability-heres-how-to-do-it-right/#.XMpYk44zbIU