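One problem machine learning has long hoped to solve is the "what if" question, i.e., guiding decisions:
- If I send users a coupon, will they stay?
- If a patient takes this drug, will their blood pressure go down?
- If the app adds this feature, will users spend more time in it?
- If this monetary policy is implemented, will it effectively boost the economy?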
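These questions are hard because the ground truth is never observed in reality. A patient who took the drug saw their blood pressure drop, but we have no way of knowing whether, at that same moment, it would also have dropped had they not taken it.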
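The same problem can be written in the potential-outcomes notation that most of the papers below use. The symbols here are my addition for illustration: $W_i \in \{0,1\}$ marks whether individual $i$ was treated, $Y_i(1)$ and $Y_i(0)$ are the outcomes with and without treatment, and $X_i$ are the individual's features.

```latex
\tau_i = Y_i(1) - Y_i(0), \qquad
Y_i^{\mathrm{obs}} = W_i \, Y_i(1) + (1 - W_i) \, Y_i(0)

% Only one of Y_i(1), Y_i(0) is ever observed, so \tau_i itself is unobservable.
% The estimable target is the conditional average treatment effect (CATE):
\tau(x) = \mathbb{E}\left[ Y(1) - Y(0) \mid X = x \right]

% Under random assignment (an A/B test), W is independent of (Y(1), Y(0)) given X, so
\tau(x) = \mathbb{E}\left[ Y \mid W = 1, X = x \right] - \mathbb{E}\left[ Y \mid W = 0, X = x \right]
```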
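At this point an analyst will probably say: let's run an A/B test! We estimate the overall difference; if it is significant the treatment works, and if not, it doesn't. But is that all we can do?
Of course not! Every individual is different, and a treatment that has no effect overall may still have an effect on particular subgroups!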
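- If only 5% of users are sensitive to coupons, can we target just those users? Or, if different users respond to different coupon thresholds, how can we adjust the threshold to attract more users?
- If the blood pressure drug only works for patients with particular symptoms, how do we find those patients?
- Some users dislike the app's new feature while others love it. Can we find directions for improving the feature by comparing these two groups?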
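The methods below attack this problem from different angles, but the basic idea is the same: we cannot observe the treatment effect for any single user, but we can find a group of similar users and estimate the treatment's effect on that group.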
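The "group of similar users" idea can be sketched in a few lines, assuming a randomized A/B test and a segment that is already known. Everything in the simulation is hypothetical: the segment names, the 5% share (borrowed from the coupon example above), and the retention rates are made up for illustration. The methods in this list differ mainly in how they *find* such groups from data (tree splits, forests, meta-learners, representation learning) rather than taking them as given.

```python
import random
from collections import defaultdict
from statistics import mean


def diff_in_means(rows):
    """Mean outcome of treated rows minus mean outcome of control rows.

    Each row is (segment, treated, outcome). Under random assignment this
    estimates the average treatment effect of the population the rows come from.
    """
    treated = [y for _, w, y in rows if w == 1]
    control = [y for _, w, y in rows if w == 0]
    return mean(treated) - mean(control)


def segment_effects(rows):
    """Estimate the effect separately inside each segment of 'similar' users."""
    by_segment = defaultdict(list)
    for row in rows:
        by_segment[row[0]].append(row)
    return {seg: diff_in_means(seg_rows) for seg, seg_rows in by_segment.items()}


def simulate(n=20000, seed=0):
    """Hypothetical A/B test in which the coupon only helps 'price_sensitive' users."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        segment = "price_sensitive" if rng.random() < 0.05 else "other"
        w = rng.randint(0, 1)                  # randomized assignment to treatment
        base = 0.30                            # hypothetical baseline retention rate
        lift = 0.30 if segment == "price_sensitive" else 0.0
        y = 1 if rng.random() < base + w * lift else 0
        rows.append((segment, w, y))
    return rows


if __name__ == "__main__":
    rows = simulate()
    print(f"overall effect: {diff_in_means(rows):.3f}")
    for seg, eff in sorted(segment_effects(rows).items()):
        print(f"{seg}: {eff:.3f}")
```

Because the true overall effect here is only 0.05 × 0.30 = 0.015, the pooled estimate is small and can easily look insignificant, while the per-segment estimate recovers the large lift for the 5% of users who respond.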
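In upcoming posts, starting with the second Causal Tree paper, Recursive partitioning for heterogeneous causal effects, I will work through the similarities and differences among the methods below.
The whole field is still developing and several of the open-source implementations were released only recently, so this post will keep being updated. If you come across good papers or implementations, feel free to leave a comment below~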
Uplift Modelling/Causal Tree
- Nicholas J Radcliffe and Patrick D Surry. Real-world uplift modelling with significance based uplift trees. White Paper TR-2011-1, Stochastic Solutions, 2011. [Paper]
- Rzepakowski, P. and Jaroszewicz, S., 2012. Decision trees for uplift modeling with single and multiple treatments. Knowledge and Information Systems, 32(2), pp.303-327. [Paper]
- Yan Zhao, Xiao Fang, and David Simchi-Levi. Uplift modeling with multiple treatments and general response types. Proceedings of the 2017 SIAM International Conference on Data Mining, SIAM, 2017. [Paper] [GitHub]
- Athey, S., and Imbens, G. W. 2015. Machine learning methods for estimating heterogeneous causal effects. stat 1050(5). [Paper]
- Athey, S., and Imbens, G. 2016. Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences. [Paper] [GitHub]
- C. Tran and E. Zheleva, "Learning triggers for heterogeneous treatment effects," in Proceedings of the AAAI Conference on Artificial Intelligence, 2019. [Paper] [GitHub]
Forest Based Estimators
- Wager, S. & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association.
- M. Oprescu, V. Syrgkanis and Z. S. Wu. Orthogonal Random Forest for Causal Inference. Proceedings of the 36th International Conference on Machine Learning (ICML), 2019. [Paper] [GitHub]
Double Machine Learning
- V. Chernozhukov, D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, and W. Newey. Double Machine Learning for Treatment and Causal Parameters. ArXiv e-prints. [Paper] [GitHub]
- V. Chernozhukov, M. Goldman, V. Semenova, and M. Taddy. Orthogonal Machine Learning for Demand Estimation: High Dimensional Causal Inference in Dynamic Panels. ArXiv e-prints, December 2017.
- V. Chernozhukov, D. Nekipelov, V. Semenova, and V. Syrgkanis. Two-Stage Estimation with a High-Dimensional Second Stage. 2018.
- X. Nie and S. Wager. Quasi-Oracle Estimation of Heterogeneous Treatment Effects. arXiv preprint arXiv:1712.04912, 2017. [Paper]
- D. Foster and V. Syrgkanis. Orthogonal Statistical Learning. arXiv preprint arXiv:1901.09036, 2019. [Paper]
Meta Learner
- C. Manahan, 2005. A proportional hazards approach to campaign list selection. In SAS User Group International (SUGI) 30 Proceedings.
- Green DP, Kern HL (2012) Modeling heterogeneous treatment effects in survey experiments with Bayesian additive regression trees. Public Opinion Quarterly 76(3):491–511.
- Sören R. Künzel, Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the National Academy of Sciences, 2019. [Paper] [GitHub]
Deep Learning
- Shalit, U., Johansson, F. D., & Sontag, D. (2017). Estimating individual treatment effect: generalization bounds and algorithms. Proceedings of the 34th International Conference on Machine Learning (ICML 2017). [Paper]
- Alaa, A. M., Weisz, M., & van der Schaar, M. (2017). Deep Counterfactual Networks with Propensity-Dropout. ArXiv E-Prints, arXiv:1706.05966. [Paper]
- Shi, C., Blei, D. M., & Veitch, V. (2019). Adapting Neural Networks for the Estimation of Treatment Effects. ArXiv:1906.02120. [Paper] [GitHub]
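Uber Spotlight
It was Uber's blog that first helped me find my way through the vast sea of papers, and hearing that their AI Lab is being disbanded makes me a little sad. As the most-starred open-source HTE project, they deserve a section of their own.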
- Shuyang Du, James Lee, Farzin Ghaffarizadeh, 2017, Improve User Retention with Causal Learning [Paper]
- Zhenyu Zhao, Totte Harinen, 2020, Uplift Modeling for Multiple Treatments with Cost [Paper]
- Will Y. Zou, Smitha Shyam, Michael Mui, Mingshi Wang, 2020, Learning Continuous Treatment Policy and Bipartite Embeddings for Matching with Heterogeneous Causal Effects Optimization [Paper]
- Will Y. Zou, Shuyang Du, James Lee, Jan Pedersen, 2020, Heterogeneous Causal Learning for Effectiveness Optimization in User Marketing [Paper]
Updates ongoing ~