Machine Learning Foundations | Homework 1
My own fundamentals are not great, so these problems took me quite a while; here I share my solutions. Given the weak fundamentals, the write-up may be a bit long-winded.
The original problems, the programs for the programming questions (Jupyter Notebook source files), and a PDF version of this write-up are all available at this link.
Problems
See the file 作業一_Coursera_2020-03-23.html
Q1
§ Q1 Problem
Which of the following problems are best suited for machine learning?
(i) Classifying numbers into primes and non-primes
(ii) Detecting potential fraud in credit card charges
(iii) Determining the time it would take a falling object to hit the ground
(iv) Determining the optimal cycle for traffic lights in a busy intersection
(v) Determining the age at which a particular medical test is recommended
🔘 (i) and (ii)
🔘 (ii), (iv), and (v)
🔘 none of the other choices
🔘 (i), (ii), (iii), and (iv)
🔘 (i), (iii), and (v)
Q1 Solution
(i) ✖️ Classifying numbers into primes and non-primes. This can be solved directly with a simple program, e.g., checking whether a number is divisible only by 1 and itself.
(ii) ✔️ Detecting potential fraud in credit card charges.
(iii) ✖️ Determining the time it takes a falling object to hit the ground. Free fall has a simple programmable formula: \(h = \frac{1}{2}gt^2\)
(iv) ✔️ Determining the optimal cycle for traffic lights at a busy intersection.
(v) ✔️ Determining the age at which a particular medical test is recommended.
Q1 Key Points
Key Essence of Machine Learning: Machine Learning Foundations - Slide 01 - P10
- exists some ‘underlying pattern’ to be learned — so ‘performance measure’ can be improved
- but no programmable (easy) definition — so ‘ML’ is needed
- somehow there is data about the pattern — so ML has some ‘inputs’ to learn from
Q2-Q5
For Questions 2-5, identify the best type of learning that can be used to solve each task below.
§ Q2 Problem
Play chess better by practicing different strategies and receive outcome as feedback.
🔘 active learning
🔘 reinforcement learning
🔘 supervised learning
🔘 none of other choices
🔘 unsupervised learning
Q2 Solution
The key phrase here is receive outcome as feedback: reinforcement learning obtains its learning signal from the reward (feedback) the environment returns for each action, and updates the model accordingly.
Q2 Key Points
Learning with Different Data Label \(y_n\): Machine Learning Foundations - Slide 03 - P14
- supervised: all \(y_n\)
- unsupervised: no \(y_n\)
- semi-supervised: some \(y_n\)
- reinforcement: implicit \(y_n\) by goodness(\(\widetilde{y}_n\))
§ Q3 Problem
Categorize books into groups without pre-defined topics.
🔘 none of other choices
🔘 active learning
🔘 unsupervised learning
🔘 supervised learning
🔘 reinforcement learning
Q3 Solution
The key phrase is without pre-defined topics: this is a classic clustering problem, a form of unsupervised learning.
§ Q4 Problem
Recognize whether there is a face in the picture by a thousand face pictures and ten thousand non-face pictures.
🔘 unsupervised learning
🔘 supervised learning
🔘 active learning
🔘 reinforcement learning
🔘 none of other choices
Q4 Solution
Both face and non-face are labels attached to the pictures, so this is a classic supervised learning problem.
§ Q5 Problem
Selectively schedule experiments on mice to quickly evaluate the potential of cancer medicines.
🔘 reinforcement learning
🔘 unsupervised learning
🔘 supervised learning
🔘 active learning
🔘 none of other choices
Q5 Solution
The slides define active learning as: improve hypothesis with fewer labels (hopefully) by asking questions strategically.
The main mode of active learning is that the model interacts with a user or expert: it poses a "query" (an unlabeled data point) for the expert to label, and repeats this loop, aiming to reach good performance with as little labeled data as possible [1].
Here the machine selectively proposes "experiments worth verifying that might bring big gains," the experts run those experiments to judge whether they are actually useful, and the results are fed back to the machine for learning.
Q5 Key Points
Learning with Different Protocol \(f \Rightarrow (\mathbf{x}_n, y_n)\): Machine Learning Foundations - Slide 03 - P20
- batch: all known data
- online: sequential (passive) data
- active: strategically-observed data
Q6-Q8
Questions 6-8 are about Off-Training-Set error.
Let \(\mathcal{X} = \{\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_N,\mathbf{x}_{N {\!+\!} 1},\ldots,\mathbf{x}_{N {\!+\!} L}\}\) and \(\mathcal{Y} = \{-1,+1\}\) (binary classification). Here the set of training examples is \(\mathcal{D}=\Bigl\{(\mathbf{x}_n,y_n)\Bigr\}^{N}_{n=1}\), where \(y_n \in \mathcal{Y}\), and the set of test inputs is \(\Bigl\{\mathbf{x}_{N {\!+\!} \ell}\Bigr\}_{\ell=1}^L\). The Off-Training-Set error (\(\mathit{OTS}\;\)) with respect to an underlying target \(\mathit{f}\;\) and a hypothesis \(\mathit{g}\;\) is \(E_{OTS}(g, f)= \frac{1}{L} \sum_{\ell=1}^{L}\bigl[\bigl[ g(\mathbf{x}_{N {\!+\!} \ell}) \neq f(\mathbf{x}_{N {\!+\!} \ell})\bigr]\bigr]\) .
§ Q6 Problem
Consider \(f(\mathbf{x})=+1\) for all \(\mathbf{x}\) and \(g(\mathbf{x})=\left \{\begin{array}{cc}+1, & \mbox{ for } \mathbf{x} = \mathbf{x}_k \mbox{ and } k \mbox{ is odd } \mbox{ and } 1 \le k \le N+L\\-1, & \mbox{ otherwise}\end{array}\right.\)
\(E_{OTS}(g,f)=?\) (Please note the difference between floor and ceiling functions in the choices)
🔘 \({\frac{1}{L} \times ( \lfloor \frac{N+L}{2} \rfloor - \lceil \frac{N}{2} \rceil )}\)
🔘 \({\frac{1}{L} \times ( \lceil \frac{N+L}{2} \rceil - \lceil \frac{N}{2} \rceil )}\)
🔘 \({\frac{1}{L} \times ( \lceil \frac{N+L}{2} \rceil - \lfloor \frac{N}{2} \rfloor )}\)
🔘 \({\frac{1}{L} \times ( \lfloor \frac{N+L}{2} \rfloor - \lfloor \frac{N}{2} \rfloor )}\)
🔘 none of the other choices
Q6 Solution
This is really a reading-comprehension problem 😆
Let's read through it together:
- The first sentence says this is a binary classification problem
- The second sentence says the data comes in two parts:
  - training set \(\mathcal{D}\): \(\mathbf{x}_1\) through \(\mathbf{x}_N\) (with their \(y\) labels), \(N\) examples in total
  - test set: \(\mathbf{x}_{N\!+\!1}\) through \(\mathbf{x}_{N\!+\!L}\) (their labels are given by the target \(f\)), \(L\) points in total
- The third sentence defines the Off-Training-Set error: it is simply \(g\)'s error rate on the test set, i.e., the number of test points where \(g\) is wrong divided by the test set size
- The fourth sentence says \(f\) is constantly \(+1\), while \(g\) is \(+1\) on the odd-indexed \(\mathbf{x}\) over the whole dataset and \(-1\) on the even-indexed ones
- We want \(E_{OTS}(g, f)\). By the fourth sentence, \(g\) is wrong (disagrees with \(f\)) exactly on the even-indexed \(\mathbf{x}\)
So after all that setup, the question just asks: how many even numbers are there from \(N+1\) to \(N+L\)???
Now let's solve it:
- (number of evens from \(N\!+\!1\) to \(N\!+\!L\)) = (number of evens from \(1\) to \(N\!+\!L\)) − (number of evens from \(1\) to \(N\))
- Both terms on the right-hand side are computed with the same rule, which rules out the two choices that mix floor in one term with ceiling in the other
- How many even numbers are there from \(1\) to \(N\)? If it is not obvious, try small values: for \(N = 2\) there is \(1\) even number, exactly \(\frac{N}{2}\); for \(N = 3\) there is still only \(1\), and \(\frac{N}{2} = 1.5\) must be rounded down to \(1\). So the count is \(\lfloor{\frac{N}{2}}\rfloor\), and the answer is \({\frac{1}{L} \times ( \lfloor \frac{N+L}{2} \rfloor - \lfloor \frac{N}{2} \rfloor )}\); a quick sanity check follows below
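As a sanity check, here is a minimal sketch (my own addition, not part of the official solution) comparing the closed form against direct enumeration:

```python
def e_ots(N, L):
    # Closed form: fraction of even indices among N+1 .. N+L
    # (// is floor division, matching the floor functions in the answer)
    return ((N + L) // 2 - N // 2) / L

def e_ots_brute(N, L):
    # Direct count: g disagrees with f exactly on even-indexed test points
    return sum(k % 2 == 0 for k in range(N + 1, N + L + 1)) / L

assert all(e_ots(N, L) == e_ots_brute(N, L)
           for N in range(1, 30) for L in range(1, 30))
print("closed form matches brute force")
```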
§ Q7 Problem
We say that a target function \(f\) can "generate" \(\mathcal{D}\) in a noiseless setting if \(f(\mathbf{x}_n) = y_n\) for all \((\mathbf{x}_n,y_n) \in \mathcal{D}\).
For all possible \(f: \mathcal{X} \rightarrow \mathcal{Y}\), how many of them can generate \(\mathcal{D}\) in a noiseless setting?
Note that we call two functions \(f_1\) and \(f_2\) the same if \(f_1(\mathbf{x})=f_2(\mathbf{x})\) for all \(\mathbf{x} \in \mathcal{X}\).
🔘 \(1\)
🔘 \(2^L\)
🔘 \(2^{N+L}\)
🔘 none of the other choices
🔘 \(2^N\)
Q7 Solution
Again, a reading-comprehension problem.
Reading the problem:
- The first sentence defines what \(f\) can "generate" \(\mathcal{D}\) in a noiseless setting means: \(f(\mathbf{x}_n)=y_n\) for every example in the training set \(\mathcal{D}\).
- Combined with the setup of the previous question:
  - training set \(\mathcal{D}\): \(\mathbf{x}_1\) through \(\mathbf{x}_N\) (with their \(y\) labels), \(N\) examples in total
  - test set: \(\mathbf{x}_{N\!+\!1}\) through \(\mathbf{x}_{N\!+\!L}\), \(L\) points in total
- The third sentence says two functions count as the same if they agree on every \(\mathbf{x}\) of the whole dataset (training + test)
Analysis:
- \(f\) can "generate" \(\mathcal{D}\) means all such \(f\) agree on every training example
- Two different \(f\) must differ somewhere on the whole dataset (training + test); since they agree on the training set, they can only differ on the test set
- From the previous question, \(y\) can only take values in \(\left\{-1, +1\right\}\), and there are \(L\) test points, so there are \(2^{L}\) distinct combinations, as the small enumeration below confirms
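A tiny enumeration sketch (my own addition; the sizes and training labels are made up for illustration):

```python
from itertools import product

N, L = 3, 4                     # toy sizes, chosen arbitrarily
D_labels = (+1, -1, +1)         # hypothetical training labels y_1 .. y_N

# Enumerate every f: X -> {-1,+1} on all N+L points; keep those generating D.
consistent = [f for f in product((-1, +1), repeat=N + L) if f[:N] == D_labels]
print(len(consistent), 2 ** L)  # prints: 16 16
```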
§ Q8 Problem
A deterministic algorithm \(\mathcal{A}\) is defined as a procedure that takes \(\mathcal{D}\) as an input, and outputs a hypothesis \(g\). For any two deterministic algorithms \(\mathcal{A}_1\) and \(\mathcal{A}_2\), if all those \(f\) that can "generate" \(\mathcal{D}\) in a noiseless setting are equally likely in probability,
🔘 For any given \(f\) that "generates" \(\mathcal{D}\), \(E_{OTS}(\mathcal{A}_1(\mathcal{D}), f) = E_{OTS}(\mathcal{A}_2(\mathcal{D}), f)\).
🔘 none of the other choices
🔘 \(\mathbb{E}_f\left\{E_{OTS}(\mathcal{A}_1(\mathcal{D}), f)\right\} = \mathbb{E}_f\left\{E_{OTS}(f, f)\right\}\)
🔘 \(\mathbb{E}_f\left\{E_{OTS}(\mathcal{A}_1(\mathcal{D}), f)\right\} = \mathbb{E}_f\left\{E_{OTS}(\mathcal{A}_2(\mathcal{D}), f)\right\}\)
🔘 For any given \(f'\) that does not "generate" \(\mathcal{D}\), \(\left\{E_{OTS}(\mathcal{A}_1(\mathcal{D}), f')\right\} = \left\{E_{OTS}(\mathcal{A}_2(\mathcal{D}), f')\right\}\)
Q8 Solution
- \(\mathcal{D} \xrightarrow{\mathcal{A}} g\)
- Since all \(f\) that generate \(\mathcal{D}\) are equally likely, on each test point \(f\) is \(-1\) or \(+1\) with probability \(\frac{1}{2}\) each, so any fixed \(g\) has expected error \(\frac{1}{2}\) on every test point. Hence the expected \(E_{OTS}\) is the same for every \(\mathcal{A}\): \(\mathbb{E}_f\left\{E_{OTS}(\mathcal{A}_1(\mathcal{D}), f)\right\} = \mathbb{E}_f\left\{E_{OTS}(\mathcal{A}_2(\mathcal{D}), f)\right\}\), as the sketch below also confirms
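A minimal numeric check (my own addition; the training labels and the two "algorithms" are arbitrary stand-ins, since any deterministic \(\mathcal{A}\) just maps \(\mathcal{D}\) to some fixed \(g\)):

```python
from itertools import product

N, L = 3, 3
D_labels = (+1, +1, -1)          # hypothetical training labels

def ots_error(g_test, f):
    # Fraction of test points where g and f disagree
    return sum(gv != fv for gv, fv in zip(g_test, f[N:])) / L

# Two arbitrary deterministic algorithms, represented by their test predictions
g1 = (+1, +1, +1)                # e.g. "always predict +1"
g2 = (-1, +1, -1)                # some other fixed rule

consistent = [f for f in product((-1, +1), repeat=N + L) if f[:N] == D_labels]
for g in (g1, g2):
    avg = sum(ots_error(g, f) for f in consistent) / len(consistent)
    print(avg)                   # prints 0.5 both times
```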
Q9-Q12
For Questions 9-12, consider the bin model introduced in class. Consider a bin with infinitely many marbles, and let \(\mu\) be the fraction of orange marbles in the bin, and \(\nu\) is the fraction of orange marbles in a sample of 10 marbles.
§ Q9 Problem
If \(\mu = 0.5\), what is the probability of \(\nu=\mu\)? Please choose the closest number.
🔘 \(0.12\)
🔘 \(0.90\)
🔘 \(0.56\)
🔘 \(0.24\)
🔘 \(0.39\)
Q9 Solution
\(\mu =\) fraction of orange marbles in bin
\(\nu =\) fraction of orange marbles in sample
\(n_{sample} = 10\)
For the questions whose statements do not mention Hoeffding's Inequality, we compute the exact answer with ordinary combinatorics.
The question can be read as: drawing \(10\) marbles at random from a bin where each marble is orange with probability \(0.5\), what is the probability that exactly \(5\) of them are orange? That is \(\binom{10}{5}(0.5)^{5}(0.5)^{5} = \frac{252}{1024} \approx 0.246\), so the closest choice is \(0.24\).
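The same number in a few lines of Python (a stdlib-only sketch of my own):

```python
from math import comb

def binom_pmf(k, n, p):
    # P[exactly k orange marbles in a sample of n], each orange w.p. p
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binom_pmf(5, 10, 0.5))   # 0.24609375 -> closest choice: 0.24
```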
Q9 Key Points
Basics of permutations and combinations
Here is a mind map from my earlier review of permutations and combinations, based on 知乎 | 如何通俗的解釋排列公式和組合公式的含義?- 浣熊老師的回答 [2].
§ Q10 Problem
If \(\mu = 0.9\), what is the probability of \(\nu=\mu\)? Please choose the closest number.
🔘 \(0.39\)
🔘 \(0.90\)
🔘 \(0.12\)
🔘 \(0.56\)
🔘 \(0.24\)
Q10 Solution
Same method as the previous question, only with different numbers: \(\nu = \mu = 0.9\) requires exactly \(9\) orange marbles out of \(10\), so the probability is \(\binom{10}{9}(0.9)^{9}(0.1)^{1} \approx 0.387\), and the closest choice is \(0.39\).
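Reusing the binomial computation from Q9 (again a stdlib-only sketch):

```python
from math import comb

# nu = mu = 0.9 means exactly 9 of the 10 sampled marbles are orange
print(comb(10, 9) * 0.9**9 * 0.1)   # 0.38742... -> closest choice: 0.39
```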
§ Q11 Problem
If \(\mu = 0.9\), what is the actual probability of \(\nu \le 0.1\)?
🔘 \(9.1 \times 10^{-9}\)
🔘 \(0.1 \times 10^{-9}\)
🔘 \(4.8 \times 10^{-9}\)
🔘 \(1.0 \times 10^{-9}\)
🔘 \(8.5 \times 10^{-9}\)
Q11 Solution
\(\nu \le 0.1\) covers two cases: the \(10\) marbles contain either \(0\) or \(1\) orange marble. Adding the two probabilities: \((0.1)^{10} + \binom{10}{1}(0.9)(0.1)^{9} = 10^{-10} + 9 \times 10^{-9} = 9.1 \times 10^{-9}\).
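The same sum in code (stdlib-only sketch):

```python
from math import comb

mu, n = 0.9, 10
# nu <= 0.1 means 0 or 1 orange marbles in the sample of 10
p = sum(comb(n, k) * mu**k * (1 - mu)**(n - k) for k in (0, 1))
print(p)   # ~9.1e-09
```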
§ Q12 Problem
If \(\mu = 0.9\), what is the bound given by Hoeffding's Inequality for the probability of \(\nu \le 0.1\)?
🔘 \(5.52 \times 10^{-6}\)
🔘 \(5.52 \times 10^{-10}\)
🔘 \(5.52 \times 10^{-4}\)
🔘 \(5.52 \times 10^{-12}\)
🔘 \(5.52 \times 10^{-8}\)
Q12 Solution
This question asks for the bound obtained by plugging into Hoeffding's Inequality, not for the actual probability (as in the previous question).
Hoeffding's Inequality: \(P[|\nu - \mu| > \epsilon] \le 2\exp(-2{\epsilon^2}N)\).
\(\exp(n)\) means \(e^n\).
Given \(\mu = 0.9\), the event \(\nu \le 0.1\) requires \(\epsilon \ge 0.8\).
The right-hand side of the inequality decreases as \(\epsilon\) grows, so the bound for this event is obtained at the smallest value \(\epsilon = 0.8\); plugging in \(N = 10\) gives \(2\exp(-2{\epsilon^2}N) = 5.52 \times 10^{-6}\)
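Evaluating the bound directly (stdlib-only sketch):

```python
from math import exp

eps, N = 0.8, 10
# Hoeffding bound: 2 * exp(-2 * eps^2 * N)
print(2 * exp(-2 * eps**2 * N))   # 5.52...e-06
```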
Q12 Key Points
Hoeffding’s Inequality: Machine Learning Foundations - Slide 04 - P11
For the in-lecture quiz on Slide 04 - P13: once you know how options ③ and ④ there are derived, Q9-Q12 can all be solved the same way.
Q13-Q14
Questions 13-14 illustrate what happens with multiple bins using dice to indicate 6 bins. Please note that the dice is not meant to be thrown for random experiments in this problem. They are just used to bind the six faces together. The probability below only refers to drawing from the bag.
Consider four kinds of dice in a bag, with the same (super large) quantity for each kind.
A: all even numbers are colored orange, all odd numbers are colored green
B: all even numbers are colored green, all odd numbers are colored orange
C: all small (1~3) are colored orange, all large numbers (4~6) are colored green
D: all small (1~3) are colored green, all large numbers (4~6) are colored orange
§ Q13 Problem
If we pick \(5\) dice from the bag, what is the probability that we get \(5\) orange 1's?
🔘 \(\frac{1}{256}\)
🔘 \(\frac{8}{256}\)
🔘 \(\frac{31}{256}\)
🔘 \(\frac{46}{256}\)
🔘 none of the other choices
Q13 Solution
Note the sentence They are just used to bind the six faces together: every die drawn from the bag has all six faces (six numbers); the four kinds differ only in how the faces are colored. Do not read it as each drawn die carrying a single number.
Die type | Probability of drawing this type | Is the 1 on it orange? |
---|---|---|
Type A | \(P_A = \frac{1}{4}\) | ✖️ |
Type B | \(P_B = \frac{1}{4}\) | ✔️ |
Type C | \(P_C = \frac{1}{4}\) | ✔️ |
Type D | \(P_D = \frac{1}{4}\) | ✖️ |
\(P_{Q_{13}} = \left(P_B+P_C\right)^{5} = \left(\frac{1}{4} + \frac{1}{4}\right)^{5} = \frac{1}{32} = \frac{8}{256}\)
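A brute-force check over all \(4^5\) equally likely draws (my own addition):

```python
from fractions import Fraction
from itertools import product

ORANGE_ONE = {"B", "C"}   # the types whose face 1 is colored orange

# Count draws of 5 dice in which every die shows an orange 1
hits = sum(all(t in ORANGE_ONE for t in draw)
           for draw in product("ABCD", repeat=5))
print(Fraction(hits, 4 ** 5))   # 1/32 == 8/256
```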
Q13 Key Points
dice: dice - vocabulary.com
The noun dice is the plural form of the singular die. Although many people use the word dice when they're talking about a single die, it's actually only correct to call two or more of the dotted cubes dice. You can also use the word as a verb to mean "chop into tiny pieces or cubes." You might, for example, read a recipe instruction that says: "Dice three tomatoes."
In English, dice is the plural form, referring to two or more of the dotted cubes; the singular is die. That is why the problem says 5 dice rather than 5 dices.
That said, even among native English speakers, the misuse a dice is quite common.
§ Q14 Problem
If we pick \(5\) dice from the bag, what is the probability that we get "some number" that is purely orange?
🔘 \(\frac{1}{256}\)
🔘 \(\frac{8}{256}\)
🔘 \(\frac{31}{256}\)
🔘 \(\frac{46}{256}\)
🔘 none of the other choices
Q14 Solution
Die type | Probability of drawing this type | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
Type A | \(P_A = \frac{1}{4}\) | green | orange | green | orange | green | orange |
Type B | \(P_B = \frac{1}{4}\) | orange | green | orange | green | orange | green |
Type C | \(P_C = \frac{1}{4}\) | orange | orange | orange | green | green | green |
Type D | \(P_D = \frac{1}{4}\) | green | green | green | orange | orange | orange |
For some number to be purely orange, all 5 drawn dice must come from one of these pairs of types:
Number | Pair of types (draws containing only one type of the pair also count, e.g. all B makes the 1 purely orange too) |
---|---|
1 | BC |
2 | AC |
3 | BC |
4 | AD |
5 | BD |
6 | AD |
Taking the union of these events, the possible combinations are: AC, AD, BC, BD, each with probability \(\left(\frac{1}{2}\right)^{5} = \frac{8}{256}\).
However, the intersection of the AC event and the AD event is "all A", so "all A" is counted twice; the same holds for all B, all C, and all D. Subtracting the duplicates from the sum: \(4 \times \frac{8}{256} - 4 \times \left(\frac{1}{4}\right)^{5} = \frac{128}{1024} - \frac{4}{1024} = \frac{124}{1024} = \frac{31}{256}\)
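A brute-force verification over all \(4^5\) draws (my own addition, mirroring the Q13 check):

```python
from fractions import Fraction
from itertools import product

# Face colors per die type: True = orange, indexed by faces 1..6
COLORS = {
    "A": (False, True, False, True, False, True),   # even faces orange
    "B": (True, False, True, False, True, False),   # odd faces orange
    "C": (True, True, True, False, False, False),   # small faces orange
    "D": (False, False, False, True, True, True),   # large faces orange
}

# Count draws of 5 dice where SOME face is orange on every die drawn
hits = sum(
    any(all(COLORS[t][face] for t in draw) for face in range(6))
    for draw in product("ABCD", repeat=5)
)
print(Fraction(hits, 4 ** 5))   # 31/256
```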
Q15-Q17
See the file my_hw1_Q15~Q17.html
Q18-Q20
See the file my_hw1_Q18~Q20.html