Advanced statistical methods

Reposted from the original article, 2020-04-06 21:47. Tag: statistics

Notes from an elective statistics course: Advanced statistical methods

A practical, tool-oriented course that is light on theory; after finishing it you can analyse data directly in R.

Course outline:

Introduction to R
Regression model in R
Applied regression I
Applied regression II
Applied regression III
Conditional logistic regression and propensity score method
Inverse probability weighting and meta analysis
Instrumental variable analysis

Assessment criteria:

Appropriate analytic method
Accurate numerical results
Clear presentation of the results and choice of methods
Interpretation of the results relevant to the public health context

1 Introduction to R

Use R to perform basic algebraic operations
Work with variables, vectors and matrices in R
Produce clear and well formatted graphs in R
Install and load R packages for specific needs

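Basic data operations

Basic operators: + - * / ^

Basic arithmetic functions: sqrt, exp, log, abs, round

Help: ?, ??

Basic data-manipulation functions: rep, seq, length, sum, mean, sd, median, min, max, var, sort, order, which, summary, sample, runif

Matrix operations: %*%, solve, t, colSums, colMeans, dim, cbind, rbind

Logical operators: ! & |

Tests: is.na, is.factor

Type conversion: as.factor

Data transformation: aggregate, the plyr package, melt, table, prop.table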
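The basic vector, matrix and table functions listed above can be exercised together in a short script. This is a minimal sketch on made-up numbers; the object names (x, X, A, g, df) are illustrative only.

```r
# Vectors and basic summaries
x <- seq(2, 20, by = 2)                  # 2, 4, ..., 20
n <- length(x)                           # 10
big <- which(x > 10)                     # positions of elements > 10: 6 7 8 9 10
c(sum = sum(x), mean = mean(x), sd = sd(x))
rep(c("a", "b"), times = 3)              # "a" "b" "a" "b" "a" "b"

# Matrices: build with cbind, multiply with %*%, solve linear systems with solve
X <- cbind(1, c(1, 2, 3))                # 3 x 2 matrix, first column all 1s
dim(X)                                   # 3 2
colMeans(X)                              # 1 2
t(X) %*% X                               # 2 x 2 cross-product matrix
A <- matrix(c(2, 1, 1, 3), nrow = 2)     # filled column by column
b <- solve(A, c(5, 10))                  # solves A %*% b = (5, 10)
A %*% b                                  # recovers (5, 10)

# Frequency tables and group summaries
g <- factor(c("ctl", "trt", "trt", "ctl", "trt"))
tab <- table(g)                          # counts per level
prop <- prop.table(tab)                  # proportions: 0.4, 0.6
df <- data.frame(g = g, v = c(1, 2, 3, 4, 5))
agg <- aggregate(v ~ g, data = df, FUN = mean)   # mean of v within each group
agg
```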

Reading files

Saving files

Plotting

Base graphics

plot

pairs

hist

boxplot

points

lines

text

abline

polygon

legend

title

axis

par

windows

layout

pdf

dev.off
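A minimal base-graphics sketch combining several of the functions above (par, plot, abline, legend, title, hist, pdf, dev.off). The data are simulated and the output file name demo_plot.pdf is a hypothetical choice; drawing to a PDF device keeps the example runnable without a screen.

```r
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

f <- file.path(tempdir(), "demo_plot.pdf")   # hypothetical output file
pdf(f, width = 8, height = 4)                # open a PDF graphics device
par(mfrow = c(1, 2))                         # two panels side by side
plot(x, y, pch = 19, col = "grey40", xlab = "x", ylab = "y")
abline(lm(y ~ x), col = "red", lwd = 2)      # add the fitted regression line
legend("topleft", legend = c("data", "fitted line"),
       pch = c(19, NA), lty = c(NA, 1), col = c("grey40", "red"))
title(main = "Scatter plot with fitted line")
hist(x, main = "Histogram of x", xlab = "x")
invisible(dev.off())                         # close the device, which writes the file
```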

Advanced graphics

ggplot2

cowplot

2 Regression model in R

Generating random data:

runif - uniform distribution

rbinom - binomial distribution

rnorm - normal distribution

sample

set.seed

cut

factor

relevel
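A short sketch of the data-generation functions above, on simulated values. The variable names (age, male, sbp, arm) and the distribution parameters are hypothetical.

```r
set.seed(2020)                                  # makes the random draws reproducible
n <- 200
age  <- round(runif(n, min = 20, max = 80))     # uniform, rounded to whole years
male <- rbinom(n, size = 1, prob = 0.5)         # Bernoulli 0/1
sbp  <- rnorm(n, mean = 120, sd = 15)           # normal
arm  <- sample(c("A", "B", "C"), size = n, replace = TRUE)

# cut(): continuous -> grouped factor; intervals [20,40), [40,60), [60,80]
age_grp <- cut(age, breaks = c(20, 40, 60, 80), right = FALSE, include.lowest = TRUE)
table(age_grp)

# factor() and relevel(): choose the reference category used in regression
arm <- factor(arm)                              # levels sorted alphabetically: A, B, C
arm_c <- relevel(arm, ref = "C")                # C becomes the first (reference) level
levels(arm_c)                                   # "C" "A" "B"

# The same seed gives the same draws
set.seed(1); u1 <- runif(3)
set.seed(1); u2 <- runif(3)
identical(u1, u2)                               # TRUE
```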

Linear regression

simple linear regression

multiple linear regression

interactions

summary

Residuals - the differences between the observed response values and the values predicted by the model

Coefficients

[Make sure you understand every statistic in the summary output and what it means.]

confint
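The sketch below fits simple, multiple and interaction models on simulated data and extracts the main quantities from summary() and confint(). The true coefficients are assumptions chosen for illustration, so the estimates can be checked against them.

```r
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.4)
# Hypothetical true model: y = 1 + 2*x1 + 0.5*x2 + 1.5*x1*x2 + N(0, 1) error
y  <- 1 + 2 * x1 + 0.5 * x2 + 1.5 * x1 * x2 + rnorm(n)

fit_simple <- lm(y ~ x1)                 # simple linear regression
fit_multi  <- lm(y ~ x1 + x2)            # multiple linear regression
fit_int    <- lm(y ~ x1 * x2)            # expands to x1 + x2 + x1:x2 (interaction)

s <- summary(fit_int)
s$coefficients      # Estimate, Std. Error, t value, Pr(>|t|) for each term
s$sigma             # residual standard error
s$r.squared         # multiple R-squared
s$adj.r.squared     # adjusted R-squared
s$fstatistic        # overall F statistic and its degrees of freedom
confint(fit_int, level = 0.95)   # 95% confidence intervals for the coefficients
```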

QUICK GUIDE: INTERPRETING SIMPLE LINEAR MODEL OUTPUT IN R

3 Applied regression I

Choose a model appropriate to the type of data

Apply Poisson and negative binomial regression models to count data
Identify and apply suitable model to overdispersed data

count data

Non-negative integers
Positively skewed
Variance tends to increase with the mean
Violates the homoscedasticity and normality assumptions of linear regression

Generalized Linear Model (GLM)

Fitted by maximum likelihood

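Intercept-only model: summary(glm(deaths ~ 1, data=horse, family=poisson)). Regressing on 1 looks strange at first, but it simply fits a single constant rate; with the log link, the intercept equals log(mean(deaths)).

"Dispersion parameter for poisson family taken to be 1": the Poisson model assumes the variance equals the mean, so no separate dispersion parameter is estimated.

Interpreting the glm summary output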

Model checking

Compare the observed event counts with the counts we would expect under a Poisson(0.61) model

Formal goodness-of-fit check

Residual deviance / residual df should not be much bigger than 1; a much larger ratio points to overdispersion or lack of fit
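The horse-kick data set is not reproduced here, so this sketch simulates 200 counts from a Poisson(0.61) distribution as a stand-in (an assumption). It shows that the intercept-only fit recovers the sample mean, tabulates observed against expected counts, and computes the residual deviance / df ratio and a chi-square deviance test.

```r
set.seed(1)
deaths <- rpois(200, lambda = 0.61)      # simulated stand-in for the horse-kick counts
fit0 <- glm(deaths ~ 1, family = poisson)
summary(fit0)

# With the log link and only an intercept, exp(intercept) equals the sample mean
exp(coef(fit0))
mean(deaths)

# Observed vs expected frequencies under a Poisson model with the fitted mean
lam <- mean(deaths)
k <- 0:max(deaths)
observed <- as.vector(table(factor(deaths, levels = k)))
expected <- length(deaths) * dpois(k, lam)
cbind(k, observed, expected = round(expected, 1))

# Dispersion check and deviance goodness-of-fit test
# (the chi-square approximation is rough when many counts are small)
disp <- deviance(fit0) / df.residual(fit0)
p_gof <- pchisq(deviance(fit0), df = df.residual(fit0), lower.tail = FALSE)
c(deviance_per_df = disp, p_value = p_gof)
```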

A Poisson model with covariates in R

summary(glm(deaths~corps, data=horse, family=poisson))

Incidence rate ratios (IRR) / relative risks: the exponentiated coefficients, exp(β)

Poisson regression with offsets
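A sketch of a Poisson model with a covariate and an offset on simulated data. The exposure variable (person-time at risk) and the true rate ratio of 2 are hypothetical. Exponentiating the coefficients gives the IRR, and log(exposure) enters as an offset so that the model describes rates rather than raw counts.

```r
set.seed(7)
n <- 1000
exposure <- runif(n, 0.5, 5)             # hypothetical person-years at risk
grp <- factor(sample(c("low", "high"), n, replace = TRUE), levels = c("low", "high"))
rate <- ifelse(grp == "high", 0.4, 0.2)  # events per person-year; true IRR = 2
events <- rpois(n, lambda = rate * exposure)

# log(exposure) as an offset: its coefficient is fixed at 1, so the model is for the rate
fit <- glm(events ~ grp + offset(log(exposure)), family = poisson)
summary(fit)

irr <- exp(coef(fit))                    # baseline rate and incidence rate ratio
irr_ci <- exp(confint.default(fit))      # Wald 95% CIs on the ratio scale
cbind(IRR = irr, irr_ci)
```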

Overdispersion - Negative Binomial model

The variance (823.475) is much larger than the mean (28.41), a sign of overdispersion

summary(glm.nb(y~1, data=epilepsy))

Comparing models

A lower AIC indicates a ‘better’ model
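A sketch comparing a Poisson and a negative binomial fit on simulated overdispersed counts (the data and the size parameter 1.5 are assumptions). glm.nb comes from the MASS package; rnbinom, glm and AIC are base R.

```r
library(MASS)                            # glm.nb()
set.seed(11)
n <- 300
x <- rnorm(n)
mu <- exp(1 + 0.5 * x)
y <- rnbinom(n, size = 1.5, mu = mu)     # negative binomial: variance = mu + mu^2 / 1.5

c(mean = mean(y), variance = var(y))     # variance well above the mean
fit_pois <- glm(y ~ x, family = poisson)
fit_nb   <- glm.nb(y ~ x)
deviance(fit_pois) / df.residual(fit_pois)   # above 1 for the Poisson fit here
summary(fit_nb)                          # also reports the estimated theta
AIC(fit_pois, fit_nb)                    # lower AIC = better balance of fit and complexity
```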

4 Applied regression II

Apply Poisson and negative binomial regression models to count data
Identify and apply suitable model to overdispersed data
Identify influential observations (points whose removal noticeably changes the fitted model)
Perform model diagnostics
Understand and deal with multicollinearity

hatvalues(mvc.r.lm)

sort(round(cooks.distance(mvc.r.lm),2), decreasing=TRUE)
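mvc.r.lm is not available here, so this sketch fits a hypothetical simple regression with one planted outlier and applies hatvalues and cooks.distance to it. The cut-offs 2p/n for leverage and 4/n for Cook's distance are common rules of thumb, not fixed standards.

```r
set.seed(3)
n <- 50
d <- data.frame(x = rnorm(n))
d$y <- 1 + 2 * d$x + rnorm(n)
# Plant one hypothetical outlier with high leverage (x far from the rest, y far from the line)
d$x[n] <- 4
d$y[n] <- -6
fit <- lm(y ~ x, data = d)

h <- hatvalues(fit)                      # leverage: diagonal of the hat matrix
p <- length(coef(fit))                   # number of coefficients (2)
high_lev <- which(h > 2 * p / n)         # rule of thumb: h > 2p/n
cd <- cooks.distance(fit)                # overall change in fit when each point is dropped
head(sort(round(cd, 2), decreasing = TRUE), 5)
influential <- which(cd > 4 / n)         # rule of thumb: D > 4/n

# Refit without the most influential observation and compare coefficients
drop_i <- which.max(cd)
fit_drop <- lm(y ~ x, data = d[-drop_i, ])
rbind(all = coef(fit), without_top = coef(fit_drop))
```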

Model diagnostics

Estimation methods and statistical tests are based on model assumptions. When checking them, consider:

Which assumptions are potentially violated
The extent of the violation
Acknowledging the limitations
Using an alternative statistical model

Assumptions of linear regression model

Linearity
Homoscedasticity
Normality of the errors
Independence

Residual plot against fitted values

Q-Q Plot

P-P Plot

ACF plot
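A sketch producing the four diagnostic plots above for a simulated linear model whose error variance grows with x (deliberately heteroscedastic). The P-P plot is built by hand here from pnorm and ppoints; the output file name is hypothetical.

```r
set.seed(5)
n <- 200
x <- runif(n, 0, 10)
# Error SD grows with x, so the model is deliberately heteroscedastic
y <- 3 + 0.8 * x + rnorm(n, sd = 1 + 0.3 * x)
fit <- lm(y ~ x)
r <- residuals(fit)

f <- file.path(tempdir(), "diagnostics.pdf")     # hypothetical output file
pdf(f, width = 8, height = 8)
par(mfrow = c(2, 2))
# Residuals vs fitted: look for curvature (non-linearity) or a funnel (heteroscedasticity)
plot(fitted(fit), r, xlab = "Fitted values", ylab = "Residuals", main = "Residuals vs fitted")
abline(h = 0, lty = 2)
# Q-Q plot: points near the line suggest approximately normal errors
qqnorm(r, main = "Normal Q-Q plot")
qqline(r)
# P-P plot: normal CDF of sorted standardized residuals vs plotting positions
z <- sort(rstandard(fit))
plot(pnorm(z), ppoints(n), xlab = "Theoretical probability", ylab = "Empirical probability",
     main = "Normal P-P plot")
abline(0, 1, lty = 2)
# ACF plot: spikes outside the bands suggest correlated (non-independent) errors
acf(r, main = "ACF of residuals")
invisible(dev.off())

cor(abs(r), fitted(fit))    # positive here, matching the funnel in the first panel
```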

Multicollinearity

VIF
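The VIF for predictor j is 1/(1 - R²_j), where R²_j comes from regressing predictor j on the other predictors. This sketch computes it by hand on simulated data in which x2 is built to be highly correlated with x1; vif_manual is a hypothetical helper, and the car package's vif() is a commonly used packaged alternative.

```r
set.seed(9)
n <- 300
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)            # built to be strongly correlated with x1
x3 <- rnorm(n)                           # unrelated predictor
d <- data.frame(y = 1 + x1 + x2 + x3 + rnorm(n), x1, x2, x3)

# Hypothetical helper: VIF_j = 1 / (1 - R^2_j), regressing predictor j on the others
vif_manual <- function(data, predictors) {
  sapply(predictors, function(v) {
    others <- setdiff(predictors, v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_manual(d, c("x1", "x2", "x3"))
round(vifs, 2)                           # x1 and x2 large, x3 close to 1
```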

5 Applied regression III

Identify and handle multicollinearity
Account for confounding factors in regression model
Assess potential effect modifiers in regression model
Perform basic mediation analysis
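A sketch of confounding adjustment, effect modification and product-of-coefficients mediation with linear models on simulated data. The data-generating model (age as confounder, m as mediator) is an assumption; the product method shown assumes linear models with no exposure-mediator interaction, and more general approaches exist.

```r
set.seed(21)
n <- 2000
age <- rnorm(n, mean = 50, sd = 10)
exposure <- rbinom(n, 1, plogis(-5 + 0.1 * age))   # confounding: older people more often exposed
m <- 0.5 * exposure + rnorm(n)                     # mediator affected by the exposure
y <- 1 + 0.3 * exposure + 0.8 * m + 0.05 * age + rnorm(n)
# Assumed truth: direct effect 0.3, indirect 0.5 * 0.8 = 0.4, total 0.7

# Confounding: compare crude and age-adjusted exposure coefficients
crude    <- coef(lm(y ~ exposure))[["exposure"]]
adjusted <- coef(lm(y ~ exposure + age))[["exposure"]]

# Effect modification: the interaction term tests whether the exposure effect varies with age
fit_em <- lm(y ~ exposure * age)
summary(fit_em)$coefficients["exposure:age", ]

# Basic mediation (product of coefficients)
a <- coef(lm(m ~ exposure + age))[["exposure"]]    # exposure -> mediator
fit_y <- lm(y ~ exposure + m + age)
b <- coef(fit_y)[["m"]]                            # mediator -> outcome
direct <- coef(fit_y)[["exposure"]]
indirect <- a * b
round(c(crude = crude, adjusted_total = adjusted, direct = direct, indirect = indirect), 3)
```

With ordinary least squares and the same covariates in each model, the adjusted total effect equals direct + indirect exactly.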

6 Conditional logistic regression and propensity score method

Fit conditional logistic regression model to data from case control study
Understand the assumptions of the propensity score method
Interpret results from propensity score method
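A sketch of conditional logistic regression for 1:2 matched case-control sets, using clogit() from the survival package. The matched data are simulated with an assumed true odds ratio of 3.

```r
library(survival)                        # provides clogit() and strata()
set.seed(8)
n_sets <- 500
# Hypothetical 1:2 matched case-control sets: one case and two controls per set
d <- data.frame(
  set_id = rep(seq_len(n_sets), each = 3),
  case   = rep(c(1, 0, 0), times = n_sets)
)
# Exposure prevalence 0.50 in cases and 0.25 in controls, so the true OR is
# (0.5 / 0.5) / (0.25 / 0.75) = 3
d$exposure <- rbinom(nrow(d), 1, ifelse(d$case == 1, 0.5, 0.25))

# strata(set_id) conditions on each matched set
fit <- clogit(case ~ exposure + strata(set_id), data = d)
summary(fit)$conf.int    # exp(coef) is the conditional OR, with its 95% CI
```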

7 Inverse probability weighting and meta analysis

Appreciate the use of inverse probability weighting
Apply inverse probability weighting for analysis of missing data
Perform meta analysis to obtain overall estimate of an intervention effect from multiple studies
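A two-part sketch on hypothetical data: first, inverse probability weighting for an outcome that is missing at random given x; second, a fixed-effect inverse-variance meta-analysis of four made-up study estimates. Random-effects models, and dedicated packages such as metafor, are not shown.

```r
## Part 1: inverse probability weighting for a missing outcome
set.seed(13)
n <- 5000
x <- rnorm(n)
y <- 2 + x + rnorm(n)                    # true population mean of y is 2
# y is missing at random given x: larger x -> more likely to be observed
r <- rbinom(n, 1, plogis(0.5 + x))       # r = 1 means y is observed
y_obs <- ifelse(r == 1, y, NA)

cc_mean <- mean(y_obs, na.rm = TRUE)     # complete-case mean (biased upwards here)
obs_fit <- glm(r ~ x, family = binomial) # model for the probability of being observed
w <- 1 / fitted(obs_fit)                 # inverse probability weights
ipw_mean <- weighted.mean(y_obs[r == 1], w[r == 1])
c(complete_case = cc_mean, ipw = ipw_mean)

## Part 2: fixed-effect (inverse-variance) meta-analysis
# Hypothetical log odds ratios and standard errors from four studies
yi <- c(-0.30, -0.10, -0.45, -0.20)
se <- c(0.15, 0.20, 0.25, 0.10)
wi <- 1 / se^2                           # weight = 1 / variance
pooled <- sum(wi * yi) / sum(wi)
pooled_se <- sqrt(1 / sum(wi))
ci <- pooled + c(-1, 1) * qnorm(0.975) * pooled_se
Q <- sum(wi * (yi - pooled)^2)           # Cochran's Q heterogeneity statistic
I2 <- if (Q > 0) max(0, (Q - (length(yi) - 1)) / Q) else 0
exp(c(pooled_OR = pooled, lower = ci[1], upper = ci[2]))
```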

8 Instrumental variable analysis

Estimate treatment effects using instrumental variable analysis when treatment is not controlled by the investigator
Understand the assumptions of instrumental variable analysis
Interpret results from instrumental variable analysis
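A sketch of instrumental variable estimation with a binary instrument on simulated data, where an unmeasured confounder u biases the naive regression. The data-generating values are assumptions. Fitting the second stage by hand gives the right point estimate but not valid standard errors; AER::ivreg is one packaged option that handles this.

```r
set.seed(17)
n <- 20000
u <- rnorm(n)                                    # unmeasured confounder
z <- rbinom(n, 1, 0.5)                           # instrument (e.g. randomised encouragement)
d <- rbinom(n, 1, plogis(-1 + 2 * z + u))        # treatment uptake depends on z and u
y <- 1 + 1.0 * d + 1.5 * u + rnorm(n)            # assumed true treatment effect = 1

naive <- coef(lm(y ~ d))[["d"]]                  # biased: u affects both d and y

# Wald estimator: effect of z on y divided by effect of z on d
wald <- (mean(y[z == 1]) - mean(y[z == 0])) / (mean(d[z == 1]) - mean(d[z == 0]))

# Two-stage least squares by hand (point estimate only)
d_hat <- fitted(lm(d ~ z))
tsls <- coef(lm(y ~ d_hat))[["d_hat"]]
round(c(naive = naive, wald = wald, tsls = tsls), 3)
```

With a single binary instrument and no covariates, the Wald and 2SLS estimates coincide.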

Basic concepts:

OR and β (estimated coefficients): in logistic regression, OR = exp(β)
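A sketch on simulated data showing how odds ratios are obtained from logistic regression coefficients: OR = exp(β), with Wald intervals from confint.default(). The true coefficients are assumptions.

```r
set.seed(23)
n <- 1000
grp <- rbinom(n, 1, 0.5)
x <- rnorm(n)
# Assumed truth: log-odds = -1 + 0.7 * grp + 0.4 * x, so the OR for grp is exp(0.7), about 2.01
outcome <- rbinom(n, 1, plogis(-1 + 0.7 * grp + 0.4 * x))

fit <- glm(outcome ~ grp + x, family = binomial)
beta <- coef(fit)                                          # coefficients on the log-odds scale
or_table <- exp(cbind(OR = beta, confint.default(fit)))    # OR with Wald 95% CI
round(or_table, 3)
# exp(beta) for grp: odds of the outcome when grp = 1 relative to grp = 0, holding x fixed
```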

Final exam

An investigator conducted a retrospective analysis on the association between statin therapy and psychological disorders, based on a database of medical records. The analysis adjusted for potential confounders such as age, sex, BMI and comorbidity.


Variables

id: patient identifier
male: sex (1 = male)
age: age in years
bmi: body mass index
comorbid.s: Charlson comorbidity index
statin: statin user (1 = yes)
psych: psychological disorder (1 = yes)

  id male age  bmi comorbid.s statin psych
1  1    0  54 20.9          1      0     0
2  2    0  42 19.1          0      0     0
3  3    1  46 23.9          1      1     0
4  4    1  58 23.5          0      0     1
5  5    1  43 28.7          1      1     0
6  6    1  46 26.6          0      1     0

Questions:

(A) Carry out a standard regression analysis to estimate the effect of statin therapy on psychological disorder, adjusting for sex, age, BMI and comorbidity. Present the odds ratios with 95% confidence intervals for the variables in a table. [10%] (Note: a standard regression model; since the outcome is binary and odds ratios are requested, this means logistic regression.)

The investigator also decided to carry out a propensity score analysis. (Note: for the propensity score analysis, see Assignment 2.)
(B) Fit a propensity score model to predict statin use. You may consider main effects only (even when not all patient characteristics can be satisfactorily balanced). Present and interpret the model results. [8%]
(C) Based on your propensity score model, how well were the patient characteristics balanced across statin users and non-users with similar propensity scores? [6%]
(D) State the key assumptions of propensity score analysis and assess if they are satisfied. [6%]
(E) Do you think it is appropriate to use propensity score analysis in this setting? Briefly explain why. [4%]
(F) Estimate the effect of statin therapy (and the corresponding 95% CI) on psychological disorder and compare with the results in (A). [8%]
(G) Based on the results in (A) - (F), summarize and interpret the main findings from the analyses. [8%]
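The exam data are not available, so this sketch simulates a hypothetical data set with the same variable names and outlines one way to approach parts (A), (B), (C), (D) and (F): an adjusted logistic regression, a main-effects propensity score model, balance checks within propensity score quintiles using standardized mean differences, a look at overlap, and a PS-stratified outcome model. Stratification by quintile is only one option (matching and weighting are alternatives), and the treatment and outcome mechanisms are made up, with no true statin effect.

```r
set.seed(2020)
n <- 2000
# Hypothetical data with the exam's variable names (NOT the exam data)
dat <- data.frame(
  male = rbinom(n, 1, 0.5),
  age = round(rnorm(n, 55, 10)),
  bmi = round(rnorm(n, 25, 3.5), 1),
  comorbid.s = rpois(n, 1)
)
dat$statin <- rbinom(n, 1, plogis(-6 + 0.05 * dat$age + 0.1 * dat$bmi +
                                    0.4 * dat$comorbid.s + 0.3 * dat$male))
# Outcome depends on the confounders but not on statin (assumed null effect)
dat$psych <- rbinom(n, 1, plogis(-3 + 0.02 * dat$age + 0.3 * dat$comorbid.s - 0.2 * dat$male))

# (A) Conventional covariate-adjusted logistic regression
fit_a <- glm(psych ~ statin + male + age + bmi + comorbid.s, family = binomial, data = dat)

# (B) Propensity score model, main effects only
ps_fit <- glm(statin ~ male + age + bmi + comorbid.s, family = binomial, data = dat)
dat$ps <- fitted(ps_fit)

# (C) Balance: standardized mean differences overall and within PS quintiles
dat$ps_q <- cut(dat$ps, breaks = quantile(dat$ps, probs = seq(0, 1, 0.2)),
                include.lowest = TRUE, labels = 1:5)
smd <- function(x, t) (mean(x[t == 1]) - mean(x[t == 0])) /
  sqrt((var(x[t == 1]) + var(x[t == 0])) / 2)
covs <- c("male", "age", "bmi", "comorbid.s")
bal_before <- sapply(covs, function(v) smd(dat[[v]], dat$statin))
bal_within <- sapply(covs, function(v)
  sapply(split(dat, dat$ps_q), function(s) smd(s[[v]], s$statin)))
round(bal_before, 3)
round(bal_within, 3)    # rows = PS quintiles, columns = covariates

# (D) Overlap (positivity): PS ranges in each treatment group.
# The "no unmeasured confounding" assumption cannot be checked from the data.
tapply(dat$ps, dat$statin, range)

# (F) Outcome model stratified on PS quintile; compare the statin OR with (A)
fit_f <- glm(psych ~ statin + ps_q, family = binomial, data = dat)
rbind(
  A_adjusted  = exp(c(OR = coef(fit_a)[["statin"]], confint.default(fit_a)["statin", ])),
  F_ps_strata = exp(c(OR = coef(fit_f)[["statin"]], confint.default(fit_f)["statin", ]))
)
```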

Solution approach:

1. Candidate models: standard linear regression; GLMs (Poisson, negative binomial); conditional logistic regression (clogit), etc.
