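Notes from an elective statistics course: Advanced statistical methods
A tool-oriented course, light on theory and heavy on practice; after finishing it you can analyze data in R directly.
Course outline: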
- Introduction to R
- Regression model in R
- Applied regression I
- Applied regression II
- Applied regression III
- Conditional logistic regression and propensity score method
- Inverse probability weighting and meta analysis
- Instrumental variable analysis
Course assessment criteria:
- Appropriate analytic method
- Accurate numerical results
- Clear presentation of the results and choice of methods
- Interpretation of the results relevant to the public health context
1 Introduction to R
- Use R to perform basic algebraic operations
- Work with variables, vectors and matrices in R
- Produce clear and well formatted graphs in R
- Install and load R packages for specific needs
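Basic data operations
Basic operators: + - * / ^
Basic math functions: sqrt, exp, log, abs, round
Help: ?, ??
Basic data functions: rep, seq, length, sum, mean, sd, median, min, max, var, sort, order, which, summary, sample, runif
Matrix operations: %*%, solve, t, colSums, colMeans, dim, cbind, rbind
Logical operators: ! & |
Checks: is.na, is.factor
Type conversion: as.factor
Data reshaping and tabulation: aggregate, the plyr package, melt, table, prop.table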
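A minimal sketch of a few of the functions listed above, run on small made-up vectors (the values are illustrative only, not course data):

```r
# Vectors and basic summaries
x <- c(3, 1, 4, 1, 5, 9, 2, 6)
n <- length(x)               # 8
total <- sum(x)              # 31
avg <- mean(x)               # 3.875
sorted <- sort(x)            # values in ascending order
idx <- order(x)              # positions that would sort x
big <- which(x > 4)          # indices of elements greater than 4

# Sequences and repetition
s <- seq(0, 1, by = 0.25)    # 0.00 0.25 0.50 0.75 1.00
r <- rep(c("a", "b"), times = 3)

# Matrices: build, invert, multiply
A <- rbind(c(2, 1), c(1, 3))
A_inv <- solve(A)
I2 <- A %*% A_inv            # identity matrix, up to rounding error

# Missing values and factors
y <- c(1, NA, 3)
n_missing <- sum(is.na(y))
g <- as.factor(c("low", "high", "low"))

# Frequency tables and group means
tab <- table(g)
props <- prop.table(tab)
df <- data.frame(g = g, v = c(10, 20, 40))
group_means <- aggregate(v ~ g, data = df, FUN = mean)
print(group_means)
```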
Reading files
Saving files
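The notes do not say which functions the course used for file I/O. As an assumption, a common base-R round trip uses write.csv/read.csv for text files and saveRDS/readRDS for R objects; the temporary paths below are placeholders:

```r
# A small data frame to write out (made-up values)
df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# CSV round trip: portable, but column types are re-guessed on reading
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
df_csv <- read.csv(csv_path)
str(df_csv)

# RDS round trip: R-specific, preserves the object exactly
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, rds_path)
df_rds <- readRDS(rds_path)
```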
Plotting
Basic plotting
plot
pairs
hist
boxplot
points
lines
text
abline
polygon
legend
title
axis
par
windows
layout
dev.off
Advanced plotting
ggplot2
cowplot
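A short base-graphics sketch combining several of the functions above (plot, abline, legend, hist, par, dev.off). The data are simulated, and the figure is written to a temporary PDF so that it also runs without a display; ggplot2 and cowplot are left out because they may not be installed.

```r
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- 2 + 0.5 * x + rnorm(50, sd = 0.8)

out <- tempfile(fileext = ".pdf")
pdf(out, width = 8, height = 4)      # open a file device
par(mfrow = c(1, 2))                 # two panels side by side

# Panel 1: scatter plot with a fitted line and a legend
plot(x, y, pch = 19, col = "grey40",
     xlab = "x", ylab = "y", main = "Scatter plot")
abline(lm(y ~ x), col = "red", lwd = 2)
legend("topleft", legend = c("data", "least-squares line"),
       col = c("grey40", "red"), pch = c(19, NA), lty = c(NA, 1), bty = "n")

# Panel 2: histogram of the response
hist(y, breaks = 10, main = "Histogram of y", xlab = "y")

dev.off()                            # close the device so the file is written
```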
2 Regression model in R
Generating random data from distributions:
runif - uniform distribution
rbinom
rnorm
sample
set.seed
cut
factor
relevel
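A minimal sketch of how these functions fit together when simulating a dataset. The sample size, distribution parameters, cut points and labels are arbitrary choices for illustration:

```r
set.seed(2024)                        # make the random draws reproducible
n <- 200

u <- runif(n, min = 0, max = 1)       # uniform on [0, 1]
b <- rbinom(n, size = 1, prob = 0.3)  # Bernoulli(0.3) indicator
z <- rnorm(n, mean = 50, sd = 10)     # normal with mean 50, sd 10
pick <- sample(1:n, size = 10)        # 10 row indices without replacement

# cut: turn a continuous variable into an ordered set of categories
z_group <- cut(z, breaks = c(-Inf, 40, 60, Inf),
               labels = c("low", "mid", "high"))

# factor: attach labels to a coded variable
grp <- factor(b, levels = c(0, 1), labels = c("control", "treated"))

# relevel: change the reference category used by regression functions
grp2 <- relevel(grp, ref = "treated")
levels(grp2)
```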
Linear regression
simple linear regression
multiple linear regression
interactions
lm
summary
Residuals - the difference between the actual observed response values and the response values predicted by the model
Coefficients
[You must understand every quantity in the summary output and what it means]
CI
confint
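To make the pieces above concrete, here is a sketch on simulated data (the coefficients used to generate y are arbitrary) that fits a simple, a multiple and an interaction model, then pulls out the coefficient table, residuals and confidence intervals:

```r
set.seed(42)
n <- 100
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
y <- 1 + 2 * x1 - 1 * x2 + 0.5 * x1 * x2 + rnorm(n)
d <- data.frame(y, x1, x2)

fit_simple <- lm(y ~ x1, data = d)        # simple linear regression
fit_multi  <- lm(y ~ x1 + x2, data = d)   # multiple linear regression
fit_inter  <- lm(y ~ x1 * x2, data = d)   # main effects plus interaction

s <- summary(fit_inter)
s$coefficients    # estimate, standard error, t value, p value per term
s$sigma           # residual standard error
s$r.squared       # proportion of variance explained

res <- residuals(fit_inter)               # observed minus fitted values
ci <- confint(fit_inter, level = 0.95)    # 95% CIs for the coefficients
ci
```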
QUICK GUIDE: INTERPRETING SIMPLE LINEAR MODEL OUTPUT IN R
3 Applied regression I
Use a suitable model for the given type of data
- Apply Poisson and negative binomial regression models to count data
- Identify and apply suitable model to overdispersed data
count data
- Nonnegative
- positively skewed
- Variance tends to increase with mean
- Violates homoscedasticity and normality
Generalized Linear Model (GLM)
maximum likelihood
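Oddly, the first model regresses on 1 alone: summary(glm(deaths ~ 1, data=horse, family=poisson)). This is an intercept-only model; the exponentiated intercept is the overall mean count.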
Dispersion parameter for poisson family taken to be 1
Interpreting the glm summary output
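The course's horse dataset is not reproduced in these notes, so the sketch below uses a simulated stand-in (200 Poisson draws with mean 0.61, an assumption) to show what an intercept-only Poisson fit estimates:

```r
set.seed(1)
# Simulated stand-in for the course data; NOT the real horse dataset
horse_sim <- data.frame(deaths = rpois(200, lambda = 0.61))

fit0 <- glm(deaths ~ 1, data = horse_sim, family = poisson)
summary(fit0)

# The model uses a log link, so the intercept is log(mean count)
lambda_hat <- exp(coef(fit0))
lambda_hat
mean(horse_sim$deaths)    # same value as lambda_hat

# For the poisson family the dispersion is fixed at 1, not estimated
summary(fit0)$dispersion
```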
Model checking
compare the observed event counts to data that we might have expected, under a Poisson(0.61) model
Formal model goodness-of-fit
residual deviance/df should not be too much bigger than 1
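A rough sketch of both checks, again on simulated counts standing in for the course data: tabulate observed against expected frequencies under the fitted Poisson model, then look at residual deviance over degrees of freedom. The chi-squared p-value is only an approximate guide, especially when counts are small.

```r
set.seed(1)
deaths <- rpois(200, lambda = 0.61)   # simulated stand-in data
fit0 <- glm(deaths ~ 1, family = poisson)
lambda_hat <- exp(coef(fit0))

# Observed vs expected number of observations with k events
k <- 0:max(deaths)
observed <- as.vector(table(factor(deaths, levels = k)))
expected <- length(deaths) * dpois(k, lambda_hat)
print(cbind(k, observed, expected = round(expected, 1)))

# Residual deviance relative to its degrees of freedom
ratio <- deviance(fit0) / df.residual(fit0)
ratio

# Approximate goodness-of-fit test: a small p-value suggests lack of fit
p_gof <- pchisq(deviance(fit0), df.residual(fit0), lower.tail = FALSE)
p_gof
```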
A Poisson model with covariates in R
summary(glm(deaths~corps, data=horse, family=poisson))
Incidence rate ratios (IRR) / relative risks
Poisson regression with offsets
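A sketch of a Poisson model with a covariate and an offset, reporting incidence rate ratios. The data are simulated with a true rate ratio of 1.5; the variable names (exposed, person_years, events) are hypothetical, and the intervals are Wald intervals from confint.default.

```r
set.seed(7)
n <- 300
exposed <- rbinom(n, 1, 0.5)
person_years <- runif(n, 50, 500)             # follow-up time per unit
rate <- 0.02 * exp(log(1.5) * exposed)        # true IRR = 1.5
events <- rpois(n, lambda = rate * person_years)
d <- data.frame(events, exposed, person_years)

# offset(log(person_years)) makes the model describe rates, not raw counts
fit <- glm(events ~ exposed + offset(log(person_years)),
           data = d, family = poisson)

# Exponentiated coefficients are incidence rate ratios
irr <- exp(coef(fit))
irr_ci <- exp(confint.default(fit))           # Wald 95% CI
print(cbind(IRR = irr, irr_ci))
```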
Overdispersion - Negative Binomial model
the variance (823.475) is much larger than the mean (28.41)
summary(glm.nb(y~1, data=epilepsy))
Comparing models
A lower AIC indicates a ‘better’ model
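A sketch of the overdispersion check and the AIC comparison. The epilepsy data are not included in these notes, so counts are simulated from a negative binomial distribution with a similar mean (parameters chosen arbitrarily); glm.nb comes from the MASS package, which ships with standard R installations.

```r
library(MASS)

set.seed(3)
# Simulated overdispersed counts; NOT the course's epilepsy data
d <- data.frame(y = rnbinom(200, mu = 28, size = 1.2))

# Variance far above the mean signals overdispersion
c(mean = mean(d$y), variance = var(d$y))

fit_pois <- glm(y ~ 1, data = d, family = poisson)
fit_nb <- glm.nb(y ~ 1, data = d)

# Residual deviance / df well above 1 for the Poisson fit
deviance(fit_pois) / df.residual(fit_pois)

# Lower AIC indicates the better-supported model
aic <- AIC(fit_pois, fit_nb)
aic

# Estimated NB dispersion parameter: variance = mu + mu^2 / theta
fit_nb$theta
```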
4 Applied regression II
- Apply Poisson and negative binomial regression models to count data
- Identify and apply suitable model to overdispersed data
- Identify influential observations (influence points: how much the fit changes when a given point is removed)
- Perform model diagnostics
- Understand and deal with multicollinearity
hatvalues(mvc.r.lm)
sort(round(cooks.distance(mvc.r.lm),2), decreasing=T)
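The model object mvc.r.lm comes from the course data, which is not shown here. As a stand-in, the sketch below plants one outlying point in simulated data and shows how leverage and Cook's distance pick it up. The 2 * mean(h) cutoff is a common rule of thumb, not something stated in the notes.

```r
set.seed(11)
n <- 30
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)
x[n] <- 6                      # planted point: extreme x ...
y[n] <- -5                     # ... with a y far from the trend

fit <- lm(y ~ x)

h <- hatvalues(fit)            # leverage: how unusual each x value is
cd <- cooks.distance(fit)      # influence: effect of dropping each point
head(sort(round(cd, 2), decreasing = TRUE))

# Rule of thumb: flag leverage above twice the average leverage
flag <- which(h > 2 * mean(h))
flag

# Refit without the planted point and compare coefficients
fit_drop <- lm(y ~ x, subset = -n)
print(rbind(all_points = coef(fit), without_point = coef(fit_drop)))
```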
Model diagnostics
Estimation method and statistical tests are based on model assumptions
- potential violated assumptions
- extent of violation
- Acknowledge limitation
- alternative statistical model
Assumptions of linear regression model
- Linearity
- Homoscedasticity
- Normality of the errors
- Independence
Residual plot against fitted values
Q-Q Plot
P-P Plot
ACF plot
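One way to draw the four diagnostic plots with base R, sketched on simulated data that satisfies the assumptions. The P-P plot is built by hand from pnorm and ppoints, and the output goes to a temporary PDF.

```r
set.seed(5)
n <- 100
x <- runif(n, 0, 10)
y <- 3 + 1.5 * x + rnorm(n)
fit <- lm(y ~ x)
res <- residuals(fit)

out <- tempfile(fileext = ".pdf")
pdf(out, width = 8, height = 8)
par(mfrow = c(2, 2))

# Linearity and homoscedasticity: expect a patternless band around 0
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# Normality: points should follow the reference line
qqnorm(res, main = "Q-Q plot")
qqline(res)

# Normality again, on the cumulative probability scale
p_theory <- pnorm(sort(res), mean = 0, sd = sd(res))
plot(p_theory, ppoints(n), xlab = "Theoretical cumulative probability",
     ylab = "Empirical cumulative probability", main = "P-P plot")
abline(0, 1)

# Independence: autocorrelations should stay inside the dashed bounds
acf(res, main = "ACF of residuals")

dev.off()
```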
Multicollinearity
VIF
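The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The helper below, vif_manual, is a hypothetical name written for this sketch; the course may well have used a packaged function instead. The data are simulated so that x1 and x2 are strongly correlated.

```r
set.seed(9)
n <- 200
x1 <- rnorm(n)
x2 <- 0.95 * x1 + rnorm(n, sd = 0.3)   # nearly collinear with x1
x3 <- rnorm(n)                          # unrelated to x1 and x2
y <- 1 + x1 + x2 + x3 + rnorm(n)
d <- data.frame(y, x1, x2, x3)

# Hypothetical helper: VIF for each predictor, computed from first principles
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    aux <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(aux)$r.squared)
  })
}

vifs <- vif_manual(d, c("x1", "x2", "x3"))
round(vifs, 2)

# Collinearity inflates standard errors of the affected coefficients
summary(lm(y ~ x1 + x2 + x3, data = d))$coefficients
```

Large values for x1 and x2 next to a value close to 1 for x3 are the pattern to look for.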
5 Applied regression III
- Identify and handle multicollinearity
- Account for confounding factors in regression model
- Assess potential effect modifiers in regression model
- Perform basic mediation analysis
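The objectives above can be sketched with plain lm calls on simulated data. All effect sizes are made up, and the mediation part uses the simple product-of-coefficients approach, which may differ from the method taught in the course.

```r
set.seed(21)
n <- 2000

# Confounding: conf affects both exposure and outcome
conf <- rnorm(n)
exposure <- 0.8 * conf + rnorm(n)
outcome <- 0.5 * exposure + 1.0 * conf + rnorm(n)
crude <- coef(lm(outcome ~ exposure))["exposure"]
adjusted <- coef(lm(outcome ~ exposure + conf))["exposure"]
c(crude = unname(crude), adjusted = unname(adjusted))   # true effect is 0.5

# Effect modification: the exposure effect differs by group
group <- rbinom(n, 1, 0.5)
y_em <- 1 + 0.2 * exposure + 0.6 * exposure * group + rnorm(n)
fit_em <- lm(y_em ~ exposure * group)
summary(fit_em)$coefficients["exposure:group", ]

# Mediation: exposure -> mediator -> outcome, plus a direct path
mediator <- 0.7 * exposure + rnorm(n)
y_med <- 0.3 * exposure + 0.5 * mediator + rnorm(n)
a <- coef(lm(mediator ~ exposure))["exposure"]
fit_y <- lm(y_med ~ exposure + mediator)
b <- coef(fit_y)["mediator"]
direct <- coef(fit_y)["exposure"]
indirect <- a * b
total <- coef(lm(y_med ~ exposure))["exposure"]
c(direct = unname(direct), indirect = unname(indirect), total = unname(total))
```

In linear models fitted by least squares, the total effect equals the direct plus the indirect effect exactly, which is a handy sanity check.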
6 Conditional logistic regression and propensity score method
- Fit conditional logistic regression model to data from case control study
- Understand the assumptions of the propensity score method
- Interpret results from propensity score method
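A sketch of conditional logistic regression for 1:1 matched case-control data, using clogit from the survival package (shipped with standard R installations). The matched pairs are simulated with a true odds ratio of 2; variable names are hypothetical.

```r
library(survival)

set.seed(13)
n_pairs <- 300
beta <- log(2)                          # true log odds ratio

# Exposure of the two members of each matched pair
x1 <- rbinom(n_pairs, 1, 0.4)
x2 <- rbinom(n_pairs, 1, 0.4)

# Given exactly one case per pair, probability that member 1 is the case
p_first <- exp(beta * x1) / (exp(beta * x1) + exp(beta * x2))
first_is_case <- rbinom(n_pairs, 1, p_first)

# Long format: two rows per matched set
d <- data.frame(
  set = rep(1:n_pairs, each = 2),
  exposed = as.vector(rbind(x1, x2)),
  case = as.vector(rbind(first_is_case, 1 - first_is_case))
)

# strata(set) conditions on the matched set, removing set-level effects
fit <- clogit(case ~ exposed + strata(set), data = d)
or <- exp(coef(fit))
or_ci <- exp(confint(fit))
print(cbind(OR = or, or_ci))
```

With 1:1 matching and a single binary exposure, the estimate reduces to the ratio of discordant pairs (case exposed, control unexposed, over the reverse).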
7 Inverse probability weighting and meta analysis
- Appreciate the use of inverse probability weighting
- Apply inverse probability weighting for analysis of missing data
- Perform meta analysis to obtain overall estimate of an intervention effect from multiple studies
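Two small sketches for the objectives above. The first uses simulated data where the outcome is more often missing for one group, and reweights complete cases by the inverse of their probability of being observed. The second pools study estimates with a fixed-effect inverse-variance average; the four log odds ratios and standard errors are invented numbers, and the course may have used a random-effects model or a dedicated package instead.

```r
## Part 1: inverse probability weighting for missing outcomes
set.seed(17)
n <- 5000
x <- rbinom(n, 1, 0.5)
y_full <- 10 + 5 * x + rnorm(n)          # true mean of y is 12.5
p_obs <- ifelse(x == 1, 0.3, 0.9)        # y is observed less often when x = 1
observed <- rbinom(n, 1, p_obs)
d <- data.frame(x, y = ifelse(observed == 1, y_full, NA), observed)

# Model the probability of being observed, then weight by its inverse
obs_fit <- glm(observed ~ x, data = d, family = binomial)
d$w <- 1 / fitted(obs_fit)
cc <- d[d$observed == 1, ]

mean_cc <- mean(cc$y)                    # complete-case mean, biased here
mean_ipw <- weighted.mean(cc$y, cc$w)    # weighted mean, close to 12.5
c(complete_case = mean_cc, ipw = mean_ipw)

## Part 2: fixed-effect (inverse-variance) meta-analysis
est <- c(-0.30, -0.10, -0.25, -0.05)     # hypothetical log odds ratios
se <- c(0.15, 0.20, 0.10, 0.25)          # hypothetical standard errors
w <- 1 / se^2                            # precise studies get more weight
pooled <- sum(w * est) / sum(w)
pooled_se <- sqrt(1 / sum(w))
ci <- pooled + c(-1, 1) * qnorm(0.975) * pooled_se
exp(c(OR = pooled, lower = ci[1], upper = ci[2]))
```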
8 Instrumental variable analysis
- Estimate treatment effect using instrumental variable analysis for noncontrolled experiment
- Understand the assumptions of instrumental variable analysis
- Interpret results from instrumental variable analysis
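A sketch of two-stage least squares done by hand with lm, on simulated data with an unmeasured confounder u and a binary instrument z (true treatment effect 1.0). This is for intuition only: the standard errors printed by the second-stage lm are not valid, so a dedicated IV routine would be needed for inference.

```r
set.seed(23)
n <- 5000
z <- rbinom(n, 1, 0.5)                       # instrument: affects treatment only
u <- rnorm(n)                                # unmeasured confounder
treat <- 0.8 * z + 0.7 * u + rnorm(n)
y <- 1.0 * treat + 1.5 * u + rnorm(n)        # true effect of treat is 1.0

# Naive regression is biased because u is omitted
naive <- coef(lm(y ~ treat))["treat"]

# Stage 1: predict treatment from the instrument
stage1 <- lm(treat ~ z)
treat_hat <- fitted(stage1)
summary(stage1)$fstatistic                   # instrument strength check

# Stage 2: regress the outcome on the predicted treatment
stage2 <- lm(y ~ treat_hat)
iv_est <- coef(stage2)["treat_hat"]

# With one instrument this equals the Wald ratio estimator
wald <- cov(y, z) / cov(treat, z)

c(naive = unname(naive), iv = unname(iv_est), wald = wald)
```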
Basic concepts:
RR
OR and β (estimated coefficients)
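A worked example linking the three quantities, using a made-up 2x2 table (100 exposed with 30 events, 100 unexposed with 15 events). The relative risk is a ratio of risks, the odds ratio is a ratio of odds, and the logistic regression coefficient β is the log odds ratio.

```r
# Hypothetical 2x2 table
tab <- matrix(c(30, 70,
                15, 85), nrow = 2, byrow = TRUE,
              dimnames = list(exposure = c("exposed", "unexposed"),
                              outcome = c("event", "no_event")))

risk <- tab[, "event"] / rowSums(tab)        # 0.30 and 0.15
rr <- risk["exposed"] / risk["unexposed"]    # relative risk

odds <- tab[, "event"] / tab[, "no_event"]
or <- odds["exposed"] / odds["unexposed"]    # odds ratio

# Logistic regression on the same counts
d <- data.frame(exposed = c(1, 0), events = c(30, 15), non_events = c(70, 85))
fit <- glm(cbind(events, non_events) ~ exposed, data = d, family = binomial)
beta <- coef(fit)["exposed"]

# exp(beta) reproduces the odds ratio from the table
c(RR = unname(rr), OR = unname(or), exp_beta = unname(exp(beta)))
```

The OR (about 2.43) is further from 1 than the RR (2) because the outcome is not rare in this table.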
Final exam
An investigator conducted a retrospective analysis on the association between statin therapy and psychological disorders, based on a database of medical records. The analysis adjusted for potential confounders such as age, sex, BMI and comorbidity.
Variables (variable name, description):
- id
- male
- age
- bmi
- comorbid.s, Charlson comorbidity index
- statin, statin users
- psych, psychological disorder
  id male age  bmi comorbid.s statin psych
1  1    0  54 20.9          1      0     0
2  2    0  42 19.1          0      0     0
3  3    1  46 23.9          1      1     0
4  4    1  58 23.5          0      0     1
5  5    1  43 28.7          1      1     0
6  6    1  46 26.6          0      1     0
Questions:
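(A) Carry out a standard regression analysis to estimate the effect of statin therapy on psychological disorder, adjusting for sex, age, BMI and comorbidity. Present the odds ratios with 95% confidence intervals for the variables in a table. [10%] Note: a standard regression model; since the outcome is binary and odds ratios are requested, this means logistic regression rather than an ordinary linear model.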
The investigator also decided to carry out a propensity score analysis. Note: for the propensity score analysis (PSA), refer to assignment 2.
(B) Fit a propensity score model to predict statin use. You may consider main effects only (even when not all patient characteristics can be satisfactorily balanced). Present and interpret the model results. [8%]
(C) Based on your propensity score model, how well the patient characteristics were balanced across statin users and non-users with similar propensity scores? [6%]
(D) State the key assumptions of propensity score analysis and assess if they are satisfied. [6%]
(E) Do you think it is appropriate to use propensity score analysis in this setting? Briefly explain why. [4%]
(F) Estimate the effect of statin therapy (and the corresponding 95% CI) on psychological disorder and compare with the results in (A). [8%]
(G) Based on the results in (A) - (F), summarize and interpret the main findings from the analyses. [8%]
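For (A), a possible shape of the analysis is sketched below. The exam data are not available here, so a data frame with the same column names is simulated (all generating parameters are invented); the model and the odds ratio table are the part that carries over. Wald intervals from confint.default are used for simplicity.

```r
set.seed(100)
n <- 1000
# Simulated stand-in with the exam's column names; NOT the real data
sim <- data.frame(
  id = 1:n,
  male = rbinom(n, 1, 0.5),
  age = round(rnorm(n, 50, 8)),
  bmi = round(rnorm(n, 24, 3), 1),
  comorbid.s = rpois(n, 0.8)
)
sim$statin <- rbinom(n, 1, plogis(-4 + 0.05 * sim$age + 0.05 * sim$bmi +
                                    0.3 * sim$comorbid.s))
sim$psych <- rbinom(n, 1, plogis(-3 + 0.02 * sim$age + 0.3 * sim$comorbid.s))

# Logistic regression of the binary outcome on statin use and covariates
fit_a <- glm(psych ~ statin + male + age + bmi + comorbid.s,
             data = sim, family = binomial)

# Odds ratios with Wald 95% confidence intervals, one row per term
or_table <- round(exp(cbind(OR = coef(fit_a), confint.default(fit_a))), 2)
print(or_table)
```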
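Approach to the solution:
1. Candidate models: standard linear regression; GLM: Poisson, NB (negative binomial); clogit (conditional logistic regression), etc.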
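For the propensity score questions (B) to (F), one possible workflow is sketched below on simulated data with the exam's column names (generating parameters are invented). Stratifying on propensity score quintiles is an assumption made for this sketch; the method from assignment 2 is not recorded in these notes and may differ, for example matching or weighting.

```r
set.seed(100)
n <- 1000
# Simulated stand-in with the exam's column names; NOT the real data
sim <- data.frame(
  male = rbinom(n, 1, 0.5),
  age = round(rnorm(n, 50, 8)),
  bmi = round(rnorm(n, 24, 3), 1),
  comorbid.s = rpois(n, 0.8)
)
sim$statin <- rbinom(n, 1, plogis(-4 + 0.05 * sim$age + 0.05 * sim$bmi +
                                    0.3 * sim$comorbid.s))
sim$psych <- rbinom(n, 1, plogis(-3 + 0.02 * sim$age + 0.3 * sim$comorbid.s))

# (B) Propensity score model: probability of statin use, main effects only
ps_fit <- glm(statin ~ male + age + bmi + comorbid.s,
              data = sim, family = binomial)
sim$ps <- fitted(ps_fit)

# (D) Overlap: compare the score distributions of users and non-users
tapply(sim$ps, sim$statin, summary)

# (C) Balance: covariate means by treatment within score quintiles
sim$ps_q <- cut(sim$ps, breaks = quantile(sim$ps, probs = seq(0, 1, 0.2)),
                include.lowest = TRUE, labels = 1:5)
balance <- aggregate(cbind(male, age, bmi, comorbid.s) ~ ps_q + statin,
                     data = sim, FUN = mean)
print(balance)

# (F) Treatment effect adjusted for propensity score stratum
fit_f <- glm(psych ~ statin + ps_q, data = sim, family = binomial)
or_f <- exp(c(coef(fit_f)["statin"], confint.default(fit_f)["statin", ]))
round(or_f, 2)
```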