Advanced statistical methods


Notes from an elective statistics course: Advanced statistical methods

A tool-oriented course, light on theory and heavy on practice. After finishing it you can analyse data in R directly.

 

Course outline:

  1. Introduction to R
  2. Regression model in R
  3. Applied regression I
  4. Applied regression II
  5. Applied regression III
  6. Conditional logistic regression and propensity score method
  7. Inverse probability weighting and meta analysis
  8. Instrumental variable analysis

 

Course assessment criteria:

  1. Appropriate analytic method
  2. Accurate numerical results
  3. Clear presentation of the results and choice of methods
  4. Interpretation of the results relevant to the public health context

 

1 Introduction to R

  1. Use R to perform basic algebraic operations
  2. Work with variables, vectors and matrices in R
  3. Produce clear and well formatted graphs in R
  4. Install and load R packages for specific needs

Basic data operations

Basic operators: + - * / ^

Basic math functions: sqrt, exp, log, abs, round

Help: ?, ??

Basic data functions: rep, seq, length, sum, mean, sd, median, min, max, var, sort, order, which, summary, sample, runif
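A few of the functions listed above, shown on a small made-up vector. The values of `x` are arbitrary and chosen only so the results are easy to check by hand.

```r
# Build vectors with rep() and seq(), then summarise them
x <- c(5, 3, 8, 1)
rep(x[1], 3)             # 5 5 5
seq(2, 10, by = 2)       # 2 4 6 8 10
length(x)                # 4
sum(x); mean(x)          # 17 and 4.25
sort(x)                  # 1 3 5 8
order(x)                 # 4 2 1 3 (positions that would sort x)
which(x > 4)             # 1 3 (positions where the condition holds)

# sample() and runif() are random, so fix the seed for reproducibility
set.seed(1)
sample(1:10, 3)
runif(2)
```

Note the difference between sort (returns the sorted values) and order (returns the positions), which matters when sorting one column by another.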

Matrix operations: %*%, solve, t, colSums, colMeans, dim, cbind, rbind
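A minimal sketch of the matrix functions on a made-up 2 x 2 system. The matrix `A` and vector `b` are illustrative values, not course data.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # filled column by column
b <- c(1, 2)
x <- solve(A, b)        # solves A %*% x = b
A %*% x                 # reproduces b (as a 2 x 1 matrix)
A_inv <- solve(A)       # with one argument, solve() returns the inverse
t(A)                    # transpose
dim(cbind(A, b))        # 2 3: b appended as a third column
dim(rbind(A, b))        # 3 2: b appended as a third row
colSums(A); colMeans(A)
```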

Logical operators: ! & |

Checks: is.na, is.factor

Type conversion: as.factor

Data reshaping: aggregate, the plyr package, melt, table, prop.table
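The base R functions in this group can be sketched on a toy data frame. The data frame `df` and its columns are hypothetical; plyr and melt are left out because they need extra packages.

```r
# Hypothetical toy data: a grouping variable and a numeric outcome
df <- data.frame(group = c("a", "a", "b", "b", "b"),
                 value = c(1, 3, 2, 4, 6))
df$group <- as.factor(df$group)
is.factor(df$group)                         # TRUE

# Mean of value within each group
means <- aggregate(value ~ group, data = df, FUN = mean)

# Counts and proportions per group
counts <- table(df$group)
props  <- prop.table(counts)

# Missing values: is.na() flags them, na.rm = TRUE skips them
y <- c(1, NA, 3)
is.na(y)
mean(y, na.rm = TRUE)
```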

Reading files

Writing files
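The notes do not name specific functions for reading and writing, so the following is only one common option: a CSV round trip with write.csv and read.csv. The data frame and the temporary path are made up for the example.

```r
df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))
path <- tempfile(fileext = ".csv")      # temporary file, so nothing is overwritten
write.csv(df, path, row.names = FALSE)  # write without the row-name column
df2 <- read.csv(path)                   # read it back as a data frame
str(df2)
```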

Plotting

Base graphics

plot

pairs

hist

boxplot

points

lines

text

abline

polygon

legend

title

axis

par

windows

layout

pdf

dev.off
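A short sketch that strings several of the base graphics functions together and writes the result to a PDF. The data are simulated and the output path is a temporary file; windows() is omitted because it is only available on Windows.

```r
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

out <- tempfile(fileext = ".pdf")
pdf(out, width = 8, height = 4)         # open a PDF graphics device
par(mfrow = c(1, 2))                    # two panels side by side
plot(x, y, pch = 16, xlab = "x", ylab = "y")
abline(lm(y ~ x), col = "red")          # add the least-squares line
legend("topleft", legend = "fitted line", col = "red", lty = 1)
title("Scatter plot")
hist(y, main = "Histogram of y")
dev.off()                               # close the device to finish the file
```

Nothing is written to disk until dev.off() is called, so forgetting it is a common cause of empty or corrupt PDF files.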

Advanced plotting

ggplot2

cowplot

 

2 Regression model in R

Generating random data:

runif - uniform distribution

rbinom

rnorm

sample

set.seed

cut

factor

relevel
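The functions listed above are typically combined to simulate a small data set and prepare categorical predictors. The variable names, distribution parameters and cut points below are all illustrative assumptions.

```r
set.seed(42)                                 # makes the draws reproducible
n <- 200
u   <- runif(n, min = 0, max = 1)            # uniform
b   <- rbinom(n, size = 1, prob = 0.3)       # Bernoulli (binomial with size 1)
age <- rnorm(n, mean = 50, sd = 10)          # normal

# cut() turns a numeric variable into a factor of intervals
age_grp <- cut(age, breaks = c(-Inf, 40, 60, Inf),
               labels = c("<=40", "(40,60]", ">60"))

# factor() sets the level order; relevel() changes the reference level
sex <- factor(sample(c("F", "M"), n, replace = TRUE), levels = c("F", "M"))
sex <- relevel(sex, ref = "M")
levels(sex)                                  # "M" "F"
```

The reference level matters in regression because every other level's coefficient is a contrast against it.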

Linear regression

simple linear regression

multiple linear regression

interactions

lm

summary

Residuals - the difference between the actual observed response values and the response values predicted by the model

Coefficients

[You must understand every quantity in the summary output and what it means]

CI

confint
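A minimal sketch of the three model types and of where the pieces of the summary output live. The data are simulated with known coefficients, and the names `x1`, `x2` and `y` are made up for the example.

```r
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
y  <- 1 + 2 * x1 - 1 * x2 + rnorm(n)      # true coefficients: 1, 2, -1

fit_simple <- lm(y ~ x1)                  # simple linear regression
fit_multi  <- lm(y ~ x1 + x2)             # multiple linear regression
fit_inter  <- lm(y ~ x1 * x2)             # main effects plus x1:x2 interaction

s <- summary(fit_multi)
s$coefficients        # estimate, std. error, t value, p-value per term
s$r.squared           # proportion of variance explained
s$sigma               # residual standard error
head(residuals(fit_multi))   # observed y minus fitted y
confint(fit_multi, level = 0.95)
```

Because the true coefficients are known here, the confidence intervals can be checked against them, which is a useful way to get a feel for the output.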

QUICK GUIDE: INTERPRETING SIMPLE LINEAR MODEL OUTPUT IN R

 

3 Applied regression I

Use a model appropriate to the data at hand

  • Apply Poisson and negative binomial regression models to count data
  • Identify and apply suitable model to overdispersed data

count data

  • Nonnegative
  • positively skewed
  • Variance tends to increase with mean
  • Violates the homoscedasticity and normality assumptions

Generalized Linear Model (GLM)

maximum likelihood

Regressing on 1 looks odd at first, but it simply fits an intercept-only model: summary(glm(deaths ~ 1, data=horse, family=poisson))

Dispersion parameter for poisson family taken to be 1

Interpreting the summary output of glm

Model checking

compare the observed event counts to data that we might have expected, under a Poisson(0.61) model

Formal model goodness-of-fit

residual deviance/df should not be too much bigger than 1
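The intercept-only fit and the two checks above can be sketched as follows. The course's `horse` data set is not reproduced in these notes, so this example assumes it matches the classic Prussian horse-kick frequencies (109, 65, 22, 3, 1 corps-years with 0 to 4 deaths), which give the mean of 0.61 quoted above.

```r
# Assumed frequencies of 0, 1, 2, 3, 4 deaths per corps-year (classic horse-kick data)
freq   <- c(109, 65, 22, 3, 1)
deaths <- rep(0:4, times = freq)
horse  <- data.frame(deaths = deaths)

# Intercept-only model: deaths ~ 1 estimates a single log-mean
fit0 <- glm(deaths ~ 1, data = horse, family = poisson)
lambda <- exp(coef(fit0))            # back-transform to the mean count

# Observed frequencies versus those expected under Poisson(lambda)
expected <- length(deaths) * dpois(0:4, lambda)
round(cbind(deaths = 0:4, observed = freq, expected = expected), 1)

# Rough goodness-of-fit check: residual deviance / residual df
deviance(fit0) / df.residual(fit0)
```

The intercept is on the log scale, which is why exp() is needed to recover the mean count.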

A Poisson model with covariates in R

summary(glm(deaths~corps, data=horse, family=poisson))

Incidence rate ratios (IRR) / relative risks
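Incidence rate ratios come from exponentiating the Poisson coefficients. A sketch on simulated data, where `exposed` is a hypothetical binary covariate and Wald intervals from confint.default are used for simplicity:

```r
set.seed(2)
n <- 500
exposed <- rbinom(n, 1, 0.5)
y <- rpois(n, lambda = exp(0.2 + 0.5 * exposed))   # true IRR = exp(0.5)

fit <- glm(y ~ exposed, family = poisson)
irr <- exp(coef(fit))                # exponentiated coefficients
ci  <- exp(confint.default(fit))     # Wald 95% intervals on the ratio scale
cbind(IRR = irr, ci)
```

Only the `exposed` row is a ratio; the exponentiated intercept is the expected count in the unexposed group. With a single binary covariate the estimated IRR equals the ratio of the two group means.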

Poisson regression with offsets
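An offset is used when observations have different amounts of exposure, such as person-years. The sketch below uses simulated data; `pyears`, `exposed` and `events` are hypothetical names.

```r
set.seed(3)
n <- 300
pyears  <- runif(n, 50, 500)                  # hypothetical person-years at risk
exposed <- rbinom(n, 1, 0.4)
events  <- rpois(n, pyears * exp(-4 + 0.7 * exposed))

# offset(log(pyears)) fixes the coefficient of log exposure time at 1,
# so the remaining coefficients describe log rates rather than log counts
fit <- glm(events ~ exposed + offset(log(pyears)), family = poisson)
exp(coef(fit))    # baseline rate per person-year, and the rate ratio
```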

Overdispersion - Negative Binomial model

the variance (823.475) is much larger than the mean (28.41)

summary(glm.nb(y~1, data=epilepsy))

Comparing models

A lower AIC indicates a ‘better’ model
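The comparison between the Poisson and negative binomial fits can be sketched with simulated overdispersed counts. The `epilepsy` data are not included in these notes, so `epi` below is a stand-in drawn from a negative binomial distribution with a mean of 28; glm.nb comes from the MASS package.

```r
library(MASS)                         # provides glm.nb()

set.seed(4)
y <- rnbinom(300, mu = 28, size = 1)  # simulated overdispersed counts
epi <- data.frame(y = y)              # stand-in for the epilepsy data
c(mean = mean(y), variance = var(y))  # variance far above the mean

fit_pois <- glm(y ~ 1, data = epi, family = poisson)
fit_nb   <- glm.nb(y ~ 1, data = epi)

deviance(fit_pois) / df.residual(fit_pois)   # well above 1 under overdispersion
AIC(fit_pois, fit_nb)                        # the lower AIC is preferred
```

Both models estimate the same mean here; what changes is the variance assumption, and with it the standard errors.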

 

4 Applied regression II

  1. Apply Poisson and negative binomial regression models to count data
  2. Identify and apply suitable model to overdispersed data
  3. Identify influential observations (how much the fit changes when a given point is removed)
  4. Perform model diagnostics
  5. Understand and deal with multicollinearity

hatvalues(mvc.r.lm)

sort(round(cooks.distance(mvc.r.lm),2), decreasing=TRUE)
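The two calls above operate on the course's fitted model `mvc.r.lm`, which is not available here. The sketch below applies the same functions to a simulated regression with one planted outlier, so the effect of an influential point is visible.

```r
set.seed(5)
x <- rnorm(30)
y <- 1 + 2 * x + rnorm(30)
x[30] <- 6; y[30] <- -5               # plant one high-leverage outlier

fit <- lm(y ~ x)
h <- hatvalues(fit)                   # leverage of each observation
d <- cooks.distance(fit)              # change in fit when a point is dropped
head(sort(round(d, 2), decreasing = TRUE))

# Refit without the most influential point and compare coefficients
worst <- which.max(d)
rbind(all = coef(fit), dropped = coef(lm(y ~ x, subset = -worst)))
```

Leverage measures how unusual a point's predictor values are, while Cook's distance combines leverage with the size of the residual.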

Model diagnostics

Estimation method and statistical tests are based on model assumptions

  • potential violated assumptions
  • extent of violation
  • Acknowledge limitation
  • alternative statistical model

Assumptions of linear regression model

  • Linearity
  • Homoscedasticity
  • Normality of the errors
  • Independence

Residual plot against fitted values

Q-Q Plot

P-P Plot

ACF plot
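Three of the diagnostic plots named above can be drawn with base R. This is a sketch on simulated data that satisfy the assumptions, written to a temporary PDF; the P-P plot is left out because base R has no single function for it.

```r
set.seed(6)
x <- runif(100, 0, 10)
y <- 3 + 0.5 * x + rnorm(100)
fit <- lm(y ~ x)
r <- residuals(fit)

out <- tempfile(fileext = ".pdf")
pdf(out, width = 9, height = 3)
par(mfrow = c(1, 3))
plot(fitted(fit), r, xlab = "Fitted", ylab = "Residuals")  # linearity, constant variance
abline(h = 0, lty = 2)
qqnorm(r); qqline(r)                                       # normality of residuals
acf(r, main = "ACF of residuals")                          # serial correlation
dev.off()
```

A funnel shape in the first panel suggests heteroscedasticity, a curve suggests nonlinearity, and spikes outside the bands in the ACF panel suggest dependent errors.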

Multicollinearity

VIF
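The variance inflation factor of a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the others. The sketch below computes it by hand on simulated data to avoid depending on an add-on package; `vif_manual` is a hypothetical helper written for this example.

```r
set.seed(7)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.2)     # nearly a copy of x1, so strongly collinear
x3 <- rnorm(n)                    # unrelated predictor

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing x_j on the others
vif_manual <- function(target, others) {
  r2 <- summary(lm(target ~ ., data = others))$r.squared
  1 / (1 - r2)
}
vifs <- c(x1 = vif_manual(x1, data.frame(x2, x3)),
          x2 = vif_manual(x2, data.frame(x1, x3)),
          x3 = vif_manual(x3, data.frame(x1, x2)))
vifs
```

Common rules of thumb treat values above 5 or 10 as a sign of problematic collinearity; here x1 and x2 should be flagged and x3 should not.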

 

5 Applied regression III

  1. Identify and handle multicollinearity
  2. Account for confounding factors in regression model
  3. Assess potential effect modifiers in regression model
  4. Perform basic mediation analysis


6 Conditional logistic regression and propensity score method

  1. Fit conditional logistic regression model to data from case control study
  2. Understand the assumptions of the propensity score method
  3. Interpret results from propensity score method


7 Inverse probability weighting and meta analysis

  1. Appreciate the use of inverse probability weighting
  2. Apply inverse probability weighting for analysis of missing data
  3. Perform meta analysis to obtain overall estimate of an intervention effect from multiple studies


8 Instrumental variable analysis

  1. Estimate treatment effect using instrumental variable analysis for noncontrolled experiment
  2. Understand the assumptions of instrumental variable analysis
  3. Interpret results from instrumental variable analysis

 

Basic concepts:

RR

OR and β (estimated coefficients)
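In logistic regression the odds ratio is the exponentiated coefficient, OR = exp(β). A sketch with a made-up 2 x 2 table shows that the model reproduces the cross-product odds ratio, and that the risk ratio is a different quantity.

```r
# Hypothetical 2 x 2 data: exposure versus outcome
exposure <- rep(c(1, 1, 0, 0), times = c(30, 70, 15, 85))
outcome  <- rep(c(1, 0, 1, 0), times = c(30, 70, 15, 85))
table(exposure, outcome)

or_table <- (30 * 85) / (70 * 15)          # cross-product odds ratio
rr_table <- (30 / 100) / (15 / 100)        # risk ratio

fit  <- glm(outcome ~ exposure, family = binomial)
beta <- coef(fit)["exposure"]              # log odds ratio
c(OR_from_table = or_table, OR_from_model = exp(unname(beta)), RR = rr_table)
```

The OR is further from 1 than the RR here because the outcome is not rare; the two only approximate each other when the outcome is uncommon.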

 

 

Final exam

An investigator conducted a retrospective analysis on the association between statin therapy and psychological disorders, based on a database of medical records. The analysis adjusted for potential confounders such as age, sex, BMI and comorbidity.


Variable names

  • Id
  • Male
  • Age
  • Bmi
  • comorbid.s, Charlson comorbidity index
  • Statin, Statin users
  • Psych, Psychological disorder
  id male age  bmi comorbid.s statin psych
1  1    0  54 20.9          1      0     0
2  2    0  42 19.1          0      0     0
3  3    1  46 23.9          1      1     0
4  4    1  58 23.5          0      0     1
5  5    1  43 28.7          1      1     0
6  6    1  46 26.6          0      1     0


Questions:

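(A) Carry out a standard regression analysis to estimate the effect of statin therapy on psychological disorder, adjusting for sex, age, BMI and comorbidity. Present the odds ratios with 95% confidence intervals for the variables in a table. [10%] Note: a standard regression model. Because odds ratios are requested for a binary outcome, this means logistic regression rather than an ordinary linear model.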

The investigator also decided to carry out a propensity score analysis. Note: for the propensity score analysis (PSA), refer to assignment 2.
(B) Fit a propensity score model to predict statin use. You may consider main effects only (even when not all patient characteristics can be satisfactorily balanced). Present and interpret the model results. [8%]
(C) Based on your propensity score model, how well were the patient characteristics balanced across statin users and non-users with similar propensity scores? [6%]
(D) State the key assumptions of propensity score analysis and assess if they are satisfied. [6%]
(E) Do you think it is appropriate to use propensity score analysis in this setting? Briefly explain why. [4%]
(F) Estimate the effect of statin therapy (and the corresponding 95% CI) on psychological disorder and compare with the results in (A). [8%]
(G) Based on the results in (A) - (F), summarize and interpret the main findings from the analyses. [8%]
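One possible shape for parts (A), (B), (C) and (F) is sketched below. The exam data are not included beyond the six rows shown, so the data frame `d` is simulated with the same column names and arbitrary coefficients; stratifying on propensity score quintiles is just one of several ways to use the score, and the intervals are Wald intervals from confint.default.

```r
set.seed(8)
n <- 1000
d <- data.frame(male = rbinom(n, 1, 0.5),
                age  = round(rnorm(n, 50, 8)),
                bmi  = round(rnorm(n, 24, 3), 1),
                comorbid.s = rpois(n, 0.6))
# Simulated treatment and outcome; the coefficients are arbitrary
d$statin <- rbinom(n, 1, plogis(-4 + 0.05 * d$age + 0.05 * d$bmi + 0.3 * d$comorbid.s))
d$psych  <- rbinom(n, 1, plogis(-3 + 0.02 * d$age + 0.4 * d$comorbid.s))

# (A) covariate-adjusted logistic regression, reported as odds ratios
fit_a <- glm(psych ~ statin + male + age + bmi + comorbid.s,
             data = d, family = binomial)
or_a <- exp(cbind(OR = coef(fit_a), confint.default(fit_a)))

# (B) propensity score model: probability of statin use given covariates
ps_fit <- glm(statin ~ male + age + bmi + comorbid.s,
              data = d, family = binomial)
d$ps <- fitted(ps_fit)

# (C) balance check within propensity score quintiles
d$stratum <- cut(d$ps, quantile(d$ps, 0:5 / 5), include.lowest = TRUE)
aggregate(cbind(age, bmi) ~ stratum + statin, data = d, FUN = mean)

# (F) treatment effect adjusted for the propensity score stratum
fit_f <- glm(psych ~ statin + stratum, data = d, family = binomial)
exp(c(coef(fit_f)["statin"], confint.default(fit_f)["statin", ]))
```

In the simulation statin use has no effect on the outcome, so both estimates should sit near an odds ratio of 1; with the real data the comparison between (A) and (F) is the point of the exercise.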

Approach to the solution:

1. Candidate models: standard linear regression; GLM: Poisson, NB; clogit, etc.

 

