1. What is a t-test?
The t-test (also called Student's t-test) compares two averages (means) and tells you whether they are different from each other. It also tells you how significant the difference is. In other words, it tells you whether the difference could plausibly have happened by chance.
Example: Student's t-tests are used in real life to compare means. A drug company may want to test a new cancer drug to find out whether it improves life expectancy. An experiment like this always has a control group, which is given a placebo (a "sugar pill"). The control group may show an average life expectancy of +5 years, while the group taking the new drug shows +6 years. The drug seems to work, but the difference could also be a fluke. To check, researchers would use a Student's t-test to find out whether the results are likely to be repeatable for the entire population.
2. The t-score
The t-score is a ratio. The numerator is the difference between two groups and the denominator is the variation within the groups. For two samples, this is the difference between the means divided by the standard error of that difference. The larger the t-score, the more the groups differ. The smaller the t-score, the more similar they are. A t-score of 3 means the groups are three times as different from each other as they are within each other. When you run a t-test, the larger the t-value (in absolute value), the more likely it is that the results are repeatable.
- A large t-score tells you that the groups are different.
- A small t-score tells you that the groups are similar.
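The ratio above can be sketched in a few lines. This assumes the standard pooled-variance form of the two-sample t-score, which this section does not spell out. The groups are hypothetical: the same values shifted by different amounts, so only the between-group difference changes. `pooled_t` is an illustrative name, not a library function.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Illustrative helper: (difference between means) / (standard error of that difference)."""
    na, nb = len(a), len(b)
    # Pooled variance: the within-group spread, averaged over both groups
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    # Standard error of the difference between the two means
    se = math.sqrt(sp2 * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se

# Hypothetical data: identical spread, different centres
base = [4, 5, 6, 5, 4, 6]
near = [x + 0.5 for x in base]  # means 0.5 apart
far = [x + 3 for x in base]     # means 3 apart

print(round(pooled_t(base, near), 2))  # → -0.97
print(round(pooled_t(base, far), 2))   # → -5.81
```

The within-group spread is the same in both comparisons. |t| grows only because the means move apart.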
3. T-values and p-values
How big is "big enough"? Every t-value has a p-value to go with it. The p-value is the probability of getting results at least as extreme as those in your sample if there were actually no difference between the means, that is, if the null hypothesis were true. P-values range from 0% to 100% and are usually written as decimals. For example, a p-value of 5% is written 0.05. A low p-value suggests that your results are unlikely to have arisen by chance alone. For example, a p-value of 0.01 means that, if there were no real difference, results this extreme would occur only 1% of the time. In most cases, a p-value below 0.05 (5%) is accepted as statistically significant.
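The definition above can be made concrete by simulation. This sketch is not from the source. It repeatedly draws two groups from the same normal distribution, so there is no real difference between them. It then counts how often the t-score is at least as extreme as an observed one. The group size, trial count and seed are arbitrary assumptions, and `t_score`/`simulated_p` are illustrative names.

```python
import math
import random

def t_score(a, b):
    """Illustrative helper: pooled-variance two-sample t-score."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    se = math.sqrt(ss / (na + nb - 2) * (1 / na + 1 / nb))
    return (ma - mb) / se

def simulated_p(observed_t, n_per_group=10, trials=20000, seed=1):
    """Fraction of simulated no-difference experiments with |t| >= |observed_t|."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Both groups come from the same distribution, so the null hypothesis is true
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if abs(t_score(a, b)) >= abs(observed_t):
            hits += 1
    return hits / trials

print(simulated_p(2.10))
```

With 10 observations per group (18 degrees of freedom) and a cutoff of 2.10, the simulated fraction should land close to 0.05. This matches the table value used in the worked example of section 4.1.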
4. Main types of t-test
- An independent samples t-test (also called the unpaired samples t-test) compares the means of two groups.
- A paired samples t-test (also called a correlated pairs t-test, paired samples t-test, or dependent samples t-test) compares means from the same group at different times (say, one year apart).
- A one-sample t-test tests the mean of a single group against a known mean.
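Only the first two types are worked through below, so here is a sketch of the one-sample case. It assumes the textbook form t = (sample mean - known mean) / (s / √n) with n - 1 degrees of freedom. The data and the known mean of 5.0 are hypothetical, and `one_sample_t` is an illustrative name.

```python
import math
from statistics import mean, stdev

def one_sample_t(data, known_mean):
    """Illustrative helper: one-sample t-score and its degrees of freedom."""
    n = len(data)
    se = stdev(data) / math.sqrt(n)  # standard error of the sample mean
    return (mean(data) - known_mean) / se, n - 1

# Hypothetical measurements compared with a known mean of 5.0
sample = [5.1, 4.9, 5.3, 5.0, 5.2]
t, df = one_sample_t(sample, 5.0)
print(round(t, 3), df)  # → 1.414 4
```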
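4.1 Independent Samples t-test
The independent samples t-test (also called the unpaired samples t-test) is the most common form of t-test. It helps you compare the means of two groups. For example, you could run a t-test to see whether the average test scores of males and females differ. The test answers the question: "Could these differences have occurred by random chance?"
The test relies on three assumptions: independence, normality, and homogeneity of variance.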
Assumption of independence: you need two independent, categorical groups that represent your independent variable. In the test-score example above, "males" or "females" would be your independent variable.
Assumption of normality: the dependent variable should be approximately normally distributed and measured on a continuous scale. In the test-score example above, the "test score" is the dependent variable.
Assumption of homogeneity of variance: the variance of the dependent variable should be approximately equal in the two groups.
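The notes do not say how to check the equal-variance assumption. One informal first look, assumed here and not taken from the source, is to compute and compare the two sample variances. Formal tests such as Levene's test also exist. The sketch uses data sets A and B from the worked example that follows.

```python
from statistics import variance

# Data sets A and B from the worked example in this section
a = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6]
b = [1, 2, 4, 5, 5, 5, 6, 6, 7, 9]

var_a, var_b = variance(a), variance(b)  # sample variances (n - 1 denominator)
ratio = max(var_a, var_b) / min(var_a, var_b)
print(var_a, round(var_b, 3), round(ratio, 2))  # → 2.5 5.333 2.13
```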
Calculating an independent samples t-test by hand (source: https://www.statisticshowto.datasciencecentral.com/independent-samples-t-test/)
Sample question: Calculate an independent samples t test for the following data sets:
Data set A: 1,2,2,3,3,4,4,5,5,6
Data set B: 1,2,4,5,5,5,6,6,7,9
Step 1: Sum the two groups:
A: 1 + 2 + 2 + 3 + 3 + 4 + 4 + 5 + 5 + 6 = 35
B: 1 + 2 + 4 + 5 + 5 + 5 + 6 + 6 + 7 + 9 = 50
Step 2: Square the sums from Step 1:
35 * 35 = 1225
50 * 50 = 2500
Step 3: Calculate the means of the two groups:
A: (1 + 2 + 2 + 3 + 3 + 4 + 4 + 5 + 5 + 6)/10 = 35/10 = 3.5
B: (1 + 2 + 4 + 5 + 5 + 5 + 6 + 6 + 7 + 9)/10 = 50/10 = 5
Step 4: Square the individual scores and then add them up:
A: 1² + 2² + 2² + 3² + 3² + 4² + 4² + 5² + 5² + 6² = 145
B: 1² + 2² + 4² + 5² + 5² + 5² + 6² + 6² + 7² + 9² = 298
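Steps 1-4 can be checked mechanically. A minimal sketch that recomputes each quantity from the two data sets:

```python
a = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6]  # data set A
b = [1, 2, 4, 5, 5, 5, 6, 6, 7, 9]  # data set B

sum_a, sum_b = sum(a), sum(b)                    # Step 1: ΣA, ΣB
sum_a_sq, sum_b_sq = sum_a ** 2, sum_b ** 2      # Step 2: (ΣA)², (ΣB)²
mean_a, mean_b = sum_a / len(a), sum_b / len(b)  # Step 3: means
sum_sq_a = sum(x * x for x in a)                 # Step 4: ΣA²
sum_sq_b = sum(x * x for x in b)                 # Step 4: ΣB²
print(sum_a, sum_b, sum_a_sq, sum_b_sq, mean_a, mean_b, sum_sq_a, sum_sq_b)
# → 35 50 1225 2500 3.5 5.0 145 298
```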
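Step 5: Find the degrees of freedom:
df = (nA - 1) + (nB - 1) = (10 - 1) + (10 - 1) = 18
Step 6: Insert your numbers into the following formula and solve:
$$t = \frac{\mu_A - \mu_B}{\sqrt{\left(\dfrac{\left(\Sigma A^2 - \dfrac{(\Sigma A)^2}{n_A}\right) + \left(\Sigma B^2 - \dfrac{(\Sigma B)^2}{n_B}\right)}{n_A + n_B - 2}\right)\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)}}$$
Substituting the values gives t = (3.5 - 5) / √[((145 - 1225/10) + (298 - 2500/10)) / 18 × (1/10 + 1/10)] = -1.5 / √0.783 ≈ -1.69.
(ΣA)²: Sum of data set A, squared (Step 2)
(ΣB)²: Sum of data set B, squared (Step 2)
μA: Mean of data set A (Step 3)
μB: Mean of data set B (Step 3)
ΣA²: Sum of the squares of data set A (Step 4)
ΣB²: Sum of the squares of data set B (Step 4)
nA: Number of items in data set A
nB: Number of items in data set B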
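Steps 5-6 can be sketched as a function that applies the formula term by term. `independent_t` is an illustrative name, not a library function.

```python
import math

def independent_t(a, b):
    """Illustrative helper: pooled-variance t-score built from the sums in Steps 1-4."""
    na, nb = len(a), len(b)
    sum_a, sum_b = sum(a), sum(b)
    mean_a, mean_b = sum_a / na, sum_b / nb
    # Within-group sums of squares: ΣA² - (ΣA)²/nA and ΣB² - (ΣB)²/nB
    ss_a = sum(x * x for x in a) - sum_a ** 2 / na
    ss_b = sum(x * x for x in b) - sum_b ** 2 / nb
    df = (na - 1) + (nb - 1)  # Step 5
    se = math.sqrt((ss_a + ss_b) / df * (1 / na + 1 / nb))
    return (mean_a - mean_b) / se, df

a = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6]
b = [1, 2, 4, 5, 5, 5, 6, 6, 7, 9]
t, df = independent_t(a, b)
print(round(t, 2), df)  # → -1.69 18
```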
Step 7: Look up your degrees of freedom (Step 5) in the t-table (https://www.statisticshowto.datasciencecentral.com/tables/t-distribution-table/).
If you don't know what your alpha level is, use 5% (0.05). For 18 degrees of freedom at an alpha level of 0.05 (two-tailed), the table value is 2.10.
Step 8: Compare your calculated value (Step 6) with the table value (Step 7).
The absolute value of the calculated t, |-1.69| = 1.69, is less than the cutoff of 2.10 from the table. Therefore p > 0.05.
Because the p-value is greater than the alpha level, we cannot conclude that there is a difference between the means.
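Instead of reading the table, you can compute the p-value directly. In practice a library function such as SciPy's `scipy.stats.t.sf` would usually do this. This stdlib-only sketch assumes that numerically integrating the t density with Simpson's rule is accurate enough. `t_pdf` and `two_sided_p` are illustrative names.

```python
import math

def t_pdf(x, df):
    """Illustrative helper: density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Illustrative helper: P(|T| >= |t|) via Simpson's rule on [0, |t|]."""
    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 <= T <= |t|); the density is symmetric about 0
    return 2 * (0.5 - central)

t_stat, df, alpha = -1.69, 18, 0.05  # values from Steps 5-6
critical = 2.10                      # two-tailed table value from Step 7
p = two_sided_p(t_stat, df)
print(abs(t_stat) > critical, round(p, 3), p < alpha)
```

Both routes agree: |t| stays below the cutoff and the computed p-value is above 0.05. Compare the absolute value of t, because its sign only shows which mean is larger.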
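4.2 Paired t-test
A paired t-test (also called a correlated pairs t-test, a paired samples t-test, or a dependent samples t-test) is a t-test run on dependent samples. Dependent samples are inherently connected because they come from the same person or thing. One example is testing the same person twice, before and after training. Another is taking two blood-pressure readings from the same person with different instruments.
Calculating a paired t-test by hand (source: https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/t-test/)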
Sample data: Xiao Ming's scores from two rounds of exams (earlier and later):
score1 (X): 3, 3, 3, 12, 15, 16, 17, 19, 23, 24, 32
score2 (Y): 20, 13, 13, 20, 29, 32, 23, 20, 25, 15, 30
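Step 1: Subtract each Y score from the corresponding X score to get the differences D = X - Y:
D: -17, -10, -10, -8, -14, -16, -6, -1, -2, 9, 2
Step 2: Add up all the values from Step 1: ΣD = -73.
Step 3: Square each difference from Step 1:
D²: 289, 100, 100, 64, 196, 256, 36, 1, 4, 81, 4
Step 4: Add up all the squared differences from Step 3: ΣD² = 1131.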
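Step 5: Calculate the t-score with the following formula:
$$t = \frac{\Sigma D / n}{\sqrt{\dfrac{\Sigma D^2 - \dfrac{(\Sigma D)^2}{n}}{(n - 1)\,n}}}$$
Substituting the values gives t = (-73/11) / √[(1131 - (-73)²/11) / (10 × 11)] ≈ -6.636 / 2.424 ≈ -2.74.
ΣD: Sum of the differences (sum of X - Y from Step 2)
ΣD²: Sum of the squared differences (from Step 4)
(ΣD)²: Sum of the differences (from Step 2), squared
n: Number of pairs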
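Steps 1-5 of the paired calculation can be sketched as follows, treating score1 as X and score2 as Y as in Step 1:

```python
import math

x = [3, 3, 3, 12, 15, 16, 17, 19, 23, 24, 32]     # score1 (X)
y = [20, 13, 13, 20, 29, 32, 23, 20, 25, 15, 30]  # score2 (Y)

d = [xi - yi for xi, yi in zip(x, y)]  # Step 1: D = X - Y for each pair
sum_d = sum(d)                         # Step 2: ΣD
d_sq = [di * di for di in d]           # Step 3: D²
sum_d_sq = sum(d_sq)                   # Step 4: ΣD²
n = len(d)                             # number of pairs
# Step 5: mean difference divided by its standard error
t = (sum_d / n) / math.sqrt((sum_d_sq - sum_d ** 2 / n) / ((n - 1) * n))
print(sum_d, sum_d_sq, round(t, 2))  # → -73 1131 -2.74
```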
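Step 6: Subtract 1 from the sample size to get the degrees of freedom.
We have 11 pairs, so df = 11 - 1 = 10.
Step 7: Find the critical t-value in the t-table, using the degrees of freedom from Step 6.
If you don't have a specified alpha level, use 0.05 (5%). For this sample problem, with df = 10, the two-tailed table value is 2.228.
Step 8: Compare the table value from Step 7 (2.228) with the calculated t-value (-2.74).
At an alpha level of 0.05, the absolute value of the calculated t (2.74) is greater than the table value, so the p-value is less than the alpha level: p < 0.05. We can reject the null hypothesis that there is no difference between the means.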
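Steps 6-8 reduce to a small decision rule. In this sketch the critical value comes from the table rather than being computed. `paired_decision` is an illustrative name, not a library function.

```python
def paired_decision(t_stat, n_pairs, critical):
    """Illustrative helper: df = n - 1, then compare |t| with the two-tailed table value."""
    df = n_pairs - 1                      # Step 6
    reject_null = abs(t_stat) > critical  # Step 8: the sign of t only shows direction
    return df, reject_null

df, reject = paired_decision(-2.74, 11, 2.228)  # values from Steps 5-7
print(df, reject)  # → 10 True
```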
5. References
https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/t-test/
https://www.statisticshowto.datasciencecentral.com/tables/t-distribution-table/