Coursera - Machine Learning Foundations - Hsuan-Tien Lin | Homework 1 - Questions, Answers & Solutions


Machine Learning Foundations | Homework 1

My fundamentals are not very strong, so these problems took me quite a while; here is how I worked through them. Given that, the explanations may be a bit verbose.

The original questions, the code for the programming problems (Jupyter Notebook source files), and a PDF version of these solutions are all available at this link.


Questions

See the file 作業一_Coursera_2020-03-23.html


Q1

§ Q1 Question


Which of the following problems are best suited for machine learning?

(i) Classifying numbers into primes and non-primes

(ii) Detecting potential fraud in credit card charges

(iii) Determining the time it would take a falling object to hit the ground

(iv) Determining the optimal cycle for traffic lights in a busy intersection

(v) Determining the age at which a particular medical test is recommended

🔘 (i) and (ii)
🔘 (ii), (iv), and (v)
🔘 none of the other choices
🔘 (i), (ii), (iii), and (iv)
🔘 (i), (iii), and (v)

Q1 Solution

(i) ✖️ Classifying numbers into primes and non-primes. This can be solved with a simple program, e.g. checking whether a number is divisible only by 1 and itself.

(ii) ✔️ Detecting potential credit card fraud.

(iii) ✖️ Determining the time it takes a falling object to hit the ground. Free fall has a simple programmable formula: \(h = \frac{1}{2}gt^2\)

(iv) ✔️ Determining the optimal cycle for traffic lights at a busy intersection.

(v) ✔️ Determining the age at which a particular medical test is recommended.

Q1 Key Points

Key Essence of Machine Learning: Machine Learning Foundations - Slide 01 - P10

  • exists some ‘underlying pattern’ to be learned
    —so ‘performance measure’ can be improved
  • but no programmable (easy) definition
    —so ‘ML’ is needed
  • somehow there is data about the pattern
    —so ML has some ‘inputs’ to learn from

Q2-Q5

For Questions 2-5, identify the best type of learning that can be used to solve each task below.

§ Q2 Question


Play chess better by practicing different strategies and receive outcome as feedback.

🔘 active learning
🔘 reinforcement learning
🔘 supervised learning
🔘 none of other choices
🔘 unsupervised learning

Q2 Solution

The key phrase is receive outcome as feedback: reinforcement learning obtains its learning signal from the reward (feedback) the environment gives for each action, and updates the model parameters accordingly.

Q2 Key Points

Learning with Different Data Label \(y_n\): Machine Learning Foundations - Slide 03 - P14

  • supervised: all \(y_n\)
  • unsupervised: no \(y_n\)
  • semi-supervised: some \(y_n\)
  • reinforcement: implicit \(y_n\) by goodness(\(\widetilde{y}_n\))

§ Q3 Question


Categorize books into groups without pre-defined topics.

🔘 none of other choices
🔘 active learning
🔘 unsupervised learning
🔘 supervised learning
🔘 reinforcement learning

Q3 Solution

The key phrase is without pre-defined topics: this is a classic clustering task, i.e. unsupervised learning.


§ Q4 Question


Recognize whether there is a face in the picture by a thousand face pictures and ten thousand non-face pictures.

🔘 unsupervised learning
🔘 supervised learning
🔘 active learning
🔘 reinforcement learning
🔘 none of other choices

Q4 Solution

face and non-face are labels attached to the pictures, so this is a classic supervised learning problem.


§ Q5 Question


Selectively schedule experiments on mice to quickly evaluate the potential of cancer medicines.

🔘 reinforcement learning
🔘 unsupervised learning
🔘 supervised learning
🔘 active learning
🔘 none of other choices

Q5 Solution

The slides define active learning as: improve hypothesis with fewer labels
(hopefully) by asking questions strategically.

In active learning, the model interacts with a user or expert: it poses "queries" (unlabeled data) for the expert to label, and repeats this loop, aiming to reach good performance with as few labeled examples as possible[1].

Here, the machine strategically selects "experiments likely to yield the biggest progress", the experts run those experiments and report whether they actually worked, and the results are fed back to the machine for learning.

Q5 Key Points

Learning with Different Protocol \(f \Rightarrow (\mathbf{x}_n, y_n)\): Machine Learning Foundations - Slide 03 - P20

  • batch: all known data
  • online: sequential (passive) data
  • active: strategically-observed data

Q6-Q8

Questions 6-8 are about Off-Training-Set error.

Let \(\mathcal{X} = \{\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_N,\mathbf{x}_{N {\!+\!} 1},\ldots,\mathbf{x}_{N {\!+\!} L}\}\) and \(\mathcal{Y} = \{-1,+1\}\) (binary classification). Here the set of training examples is \(\mathcal{D}=\Bigl\{(\mathbf{x}_n,y_n)\Bigr\}^{N}_{n=1}\), where \(y_n \in \mathcal{Y}\), and the set of test inputs is \(\Bigl\{\mathbf{x}_{N {\!+\!} \ell}\Bigr\}_{\ell=1}^L\). The Off-Training-Set error (\(\mathit{OTS}\;\)) with respect to an underlying target \(\mathit{f}\;\) and a hypothesis \(\mathit{g}\;\) is \(E_{OTS}(g, f)= \frac{1}{L} \sum_{\ell=1}^{L}\bigl[\bigl[ g(\mathbf{x}_{N {\!+\!} \ell}) \neq f(\mathbf{x}_{N {\!+\!} \ell})\bigr]\bigr]\) .

§ Q6 題目


Consider \(f(\mathbf{x})=+1\) for all \(\mathbf{x}\) and \(g(\mathbf{x})=\left \{\begin{array}{cc}+1, & \mbox{ for } \mathbf{x} = \mathbf{x}_k \mbox{ and } k \mbox{ is odd } \mbox{ and } 1 \le k \le N+L\\-1, & \mbox{ otherwise}\end{array}\right.\)

\(E_{OTS}(g,f)=?\) (Please note the difference between floor and ceiling functions in the choices)

🔘 \({\frac{1}{L} \times ( \lfloor \frac{N+L}{2} \rfloor - \lceil \frac{N}{2} \rceil )}\)

🔘 \({\frac{1}{L} \times ( \lceil \frac{N+L}{2} \rceil - \lceil \frac{N}{2} \rceil )}\)

🔘 \({\frac{1}{L} \times ( \lceil \frac{N+L}{2} \rceil - \lfloor \frac{N}{2} \rfloor )}\)

🔘 \({\frac{1}{L} \times ( \lfloor \frac{N+L}{2} \rfloor - \lfloor \frac{N}{2} \rfloor )}\)

🔘 none of the other choices

Q6 Solution

This is really a reading-comprehension question 😆

Let's read through it:

  1. The first sentence says this is a binary classification problem
  2. The second sentence splits the data into two parts:
    • training set \(\mathcal{D}\) - \(\mathbf{x}_1\) through \(\mathbf{x}_N\) (with the corresponding labels \(y\)), \(N\) examples in total
    • test set - \(\mathbf{x}_{N\!+\!1}\) through \(\mathbf{x}_{N\!+\!L}\), \(L\) inputs in total
  3. The third sentence defines the Off-Training-Set error: the error rate of \(g\) on the test set, i.e. the number of test points where \(g\) is wrong divided by the test set size
  4. The fourth sentence says \(f\) is constantly \(+1\), while \(g\) is \(+1\) on the odd-indexed \(\mathbf{x}\) of the whole dataset and \(-1\) on the even-indexed \(\mathbf{x}\)
  5. Find \(E_{OTS}(g, f)\). By the fourth sentence, \(g\) is wrong (disagrees with \(f\)) exactly on the even-indexed \(\mathbf{x}\)

So after all that setup, the question just asks how many even numbers there are between \(N+1\) and \(N+L\)!

Now to solve it:

  1. (number of evens from \(N\!+\!1\) to \(N\!+\!L\)) = (number of evens from \(1\) to \(N\!+\!L\)) - (number of evens from \(1\) to \(N\))
  2. The two terms on the right use the same counting rule, which rules out the two options that mix floor and ceiling
  3. How many evens are there from \(1\) to \(N\)? If unsure, plug in small values: for \(N = 2\) there is \(1\) even number, exactly \(\frac{N}{2}\); for \(N=3\) there is still only one even number, and \(\frac{N}{2} = 1.5\) must be rounded down to \(1\), so the term is \(\lfloor{\frac{N}{2}}\rfloor\)

§ Q7 Question


We say that a target function \(f\) can "generate'' \(\mathcal{D}\) in a noiseless setting if \(f(\mathbf{x}_n) = y_n\) for all \((\mathbf{x}_n,y_n) \in \mathcal{D}\).

For all possible \(f: \mathcal{X} \rightarrow \mathcal{Y}\), how many of them can generate \(\mathcal{D}\) in a noiseless setting?

Note that we call two functions \(f_1\) and \(f_2\) the same if \(f_1(\mathbf{x})=f_2(\mathbf{x})\) for all \(\mathbf{x} \in \mathcal{X}\).

🔘 \(1\)

🔘 \(2^L\)

🔘 \(2^{N+L}\)

🔘 none of the other choices

🔘 \(2^N\)

Q7 Solution

Still a reading-comprehension question.

Reading:

  1. The first sentence defines what it means for \(f\) to "generate'' \(\mathcal{D}\) in a noiseless setting: \(f(\mathbf{x}_n)=y_n\) for every example in the training set \(\mathcal{D}\).
  2. From the previous question:
    • training set \(\mathcal{D}\) - \(\mathbf{x}_1\) through \(\mathbf{x}_N\) (with the corresponding labels \(y\)), \(N\) examples in total
    • test set - \(\mathbf{x}_{N\!+\!1}\) through \(\mathbf{x}_{N\!+\!L}\), \(L\) inputs in total
  3. The third sentence says two functions count as the same if they output the same value on every \(\mathbf{x}\) of the whole dataset (training + test)

Analysis:

  1. \(f\) can "generate'' \(\mathcal{D}\) means all such \(f\) agree on every point of the training set
  2. Distinct \(f\) must differ somewhere on the whole dataset (training + test); since they agree on the training set, they can only differ on the test set
  3. From the previous question, \(y\) can only take values in \(\left\{-1, +1\right\}\), and there are \(L\) test points, so there are \(2^{L}\) different combinations
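For small \(N\) and \(L\), the count \(2^L\) can be verified by exhaustively enumerating all functions on \(\mathcal{X}\) (a sketch; the training labels are made up for illustration):

```python
from itertools import product

N, L = 3, 4
train_labels = (+1, -1, +1)   # arbitrary labels y_1 .. y_N, for illustration

# a function f: X -> {-1,+1} is just its tuple of values on the N+L points;
# keep those f that "generate" D, i.e. agree with the training labels
consistent = [f for f in product((-1, +1), repeat=N + L)
              if f[:N] == train_labels]

assert len(consistent) == 2 ** L   # one function per labeling of the L test points
```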

§ Q8 Question


A deterministic algorithm \(\mathcal{A}\) is defined as a procedure that takes \(\mathcal{D}\) as an input, and outputs a hypothesis \(g\). For any two deterministic algorithms \(\mathcal{A}_1\) and \(\mathcal{A}_2\), if all those \(f\) that can "generate'' \(\mathcal{D}\) in a noiseless setting are equally likely in probability,

🔘 For any given \(f\) that "generates" \(\mathcal{D}\), \(E_{OTS}(\mathcal{A}_1(\mathcal{D}), f) = E_{OTS}(\mathcal{A}_2(\mathcal{D}), f)\).

🔘 none of the other choices

🔘 \(\mathbb{E}_f\left\{E_{OTS}(\mathcal{A}_1(\mathcal{D}), f)\right\} = \mathbb{E}_f\left\{E_{OTS}(f, f)\right\}\)

🔘 \(\mathbb{E}_f\left\{E_{OTS}(\mathcal{A}_1(\mathcal{D}), f)\right\} = \mathbb{E}_f\left\{E_{OTS}(\mathcal{A}_2(\mathcal{D}), f)\right\}\)

🔘 For any given \(f'\) that does not "generate" \(\mathcal{D}\), \(\left\{E_{OTS}(\mathcal{A}_1(\mathcal{D}), f')\right\} = \left\{E_{OTS}(\mathcal{A}_2(\mathcal{D}), f')\right\}\)

Q8 Solution

  1. \(\mathcal{D} \xrightarrow{\mathcal{A}} g\)
  2. On the test set, since all \(f\) that generate \(\mathcal{D}\) are equally likely, at any test point \(f\) takes the value \(-1\) or \(+1\) with probability \(\frac{1}{2}\) each, so any \(g\) has expected error \(\frac{1}{2}\) at every test point. Hence the expected \(E_{OTS}\) is the same for every \(\mathcal{A}\): \(\mathbb{E}_f\left\{E_{OTS}(\mathcal{A}_1(\mathcal{D}), f)\right\} = \mathbb{E}_f\left\{E_{OTS}(\mathcal{A}_2(\mathcal{D}), f)\right\}\)
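The equal-expectation claim can be checked numerically by averaging \(E_{OTS}\) over all \(f\) that generate \(\mathcal{D}\) (a sketch; the fixed hypotheses `g1` and `g2` stand in for the outputs of two arbitrary deterministic algorithms):

```python
from itertools import product

N, L = 2, 3
train_labels = (+1, +1)        # labels y_1, y_2 of D (arbitrary)
g1 = (+1, +1, +1, +1, +1)      # two arbitrary hypotheses, given as their
g2 = (+1, +1, -1, +1, -1)      # values on all N + L points

def expected_ots(g):
    # average E_OTS(g, f) over all equally likely f that generate D
    fs = [f for f in product((-1, +1), repeat=N + L) if f[:N] == train_labels]
    total = sum(sum(g[N + i] != f[N + i] for i in range(L)) for f in fs)
    return total / (L * len(fs))

# both hypotheses have the same expected off-training-set error: 1/2
assert expected_ots(g1) == expected_ots(g2) == 0.5
```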

Q9-Q12

For Questions 9-12, consider the bin model introduced in class. Consider a bin with infinitely many marbles, and let \(\mu\) be the fraction of orange marbles in the bin, and \(\nu\) is the fraction of orange marbles in a sample of 10 marbles.

§ Q9 Question


If \(\mu = 0.5\), what is the probability of \(\nu=\mu\)? Please choose the closest number.

🔘 \(0.12\)

🔘 \(0.90\)

🔘 \(0.56\)

🔘 \(0.24\)

🔘 \(0.39\)

Q9 Solution

\(\mu =\) fraction of orange marbles in bin
\(\nu =\) fraction of orange marbles in sample

\(n_{sample} = 10\)

For the questions whose statements do not mention Hoeffding's Inequality, we compute the exact answer with ordinary combinatorics.

The question asks: drawing \(10\) marbles at random from a bin where the fraction of orange marbles is \(0.5\), what is the probability that exactly \(5\) of them are orange?

\[\begin{equation} \begin{aligned} P_{Q_9} &= (\text{10 choose 5}) \times (\text{probability of 5 orange and 5 green})\\ &= \dbinom{10}{5} \times (0.5)^{5} \times (1-0.5)^{10-5}\\ &\approx 0.24 \end{aligned} \end{equation} \]
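The computation is easy to reproduce in a couple of lines (a sketch; `binom_pmf` is a hypothetical helper name):

```python
from math import comb

def binom_pmf(n, k, p):
    # probability of exactly k successes in n independent trials
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_q9 = binom_pmf(10, 5, 0.5)
print(round(p_q9, 4))   # 0.2461, closest choice: 0.24
```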

Q9 Key Points

Basics of permutations and combinations

Here is a mind map from an earlier review of permutations and combinations, based on 知乎 | 如何通俗的解釋排列公式和組合公式的含義?- 浣熊老師的回答[2]

[image: mind map - basics of permutations and combinations]


§ Q10 Question


If \(\mu = 0.9\), what is the probability of \(\nu=\mu\)? Please choose the closest number.

🔘 \(0.39\)

🔘 \(0.90\)

🔘 \(0.12\)

🔘 \(0.56\)

🔘 \(0.24\)

Q10 Solution

Same method as the previous question, with different numbers.

\[\begin{equation} \begin{aligned} P_{Q_{10}} &= (\text{10 choose 9}) \times (\text{probability of 9 orange and 1 green})\\ &= \dbinom{10}{9} \times (0.9)^9 \times (1-0.9)^{10-9}\\ &\approx 0.39 \end{aligned} \end{equation} \]

§ Q11 Question


If \(\mu = 0.9\), what is the actual probability of \(\nu \le 0.1\)?

🔘 \(9.1 \times 10^{-9}\)

🔘 \(0.1 \times 10^{-9}\)

🔘 \(4.8 \times 10^{-9}\)

🔘 \(1.0 \times 10^{-9}\)

🔘 \(8.5 \times 10^{-9}\)

Q11 Solution

\(\nu \le 0.1\) covers two cases: \(0\) or \(1\) orange marbles among the \(10\). Add the probabilities of the two cases.

\[\begin{equation} \begin{aligned} P_{Q_{11}} &= P(\text{no orange among 10}) + P(\text{exactly 1 orange among 10})\\ &= (\text{10 choose 0}) \times (\text{probability of 0 orange, 10 green}) + (\text{10 choose 1}) \times (\text{probability of 1 orange, 9 green})\\ &= {\dbinom{10}{0} \times (1-0.9)^{10}} + {\dbinom{10}{1} \times (0.9)^1 \times (1-0.9)^{9}}\\ &= 9.1 \times 10^{-9} \end{aligned} \end{equation} \]
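A quick check of the sum (a sketch; `binom_pmf` is my own helper):

```python
from math import comb

def binom_pmf(n, k, p):
    # probability of exactly k orange marbles among n draws
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# nu <= 0.1 means 0 or 1 orange marbles among the 10
p_q11 = binom_pmf(10, 0, 0.9) + binom_pmf(10, 1, 0.9)
print(f"{p_q11:.1e}")   # 9.1e-09
```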

§ Q12 Question


If \(\mu = 0.9\), what is the bound given by Hoeffding's Inequality for the probability of \(\nu \le 0.1\)?

🔘 \(5.52 \times 10^{-6}\)

🔘 \(5.52 \times 10^{-10}\)

🔘 \(5.52 \times 10^{-4}\)

🔘 \(5.52 \times 10^{-12}\)

🔘 \(5.52 \times 10^{-8}\)

Q12 Solution

This question asks for the bound obtained by plugging into Hoeffding's Inequality, not for the actual probability (as in the previous question).

Hoeffding's Inequality: \(P[|\nu - \mu| > \epsilon] \le 2\exp(-2{\epsilon^2}N)\).

\(\exp(n)\) denotes \(e^n\).

With \(\mu = 0.9\), the event \(\nu \le 0.1\) requires \(|\nu - \mu| \ge 0.8\), i.e. \(\epsilon \ge 0.8\).

The right-hand side decreases as \(\epsilon\) grows, so the bound is obtained at the smallest value \(\epsilon = 0.8\); substituting \(N = 10\) gives \(2\exp(-2{\epsilon^2}N) = 5.52 \times 10^{-6}\)
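Plugging the numbers in (a sketch):

```python
from math import exp

N, eps = 10, 0.8          # sample size, and the deviation |nu - mu| >= 0.8
bound = 2 * exp(-2 * eps**2 * N)
print(f"{bound:.2e}")     # 5.52e-06
```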

Q12 Key Points

Hoeffding’s Inequality: Machine Learning Foundations - Slide 04 - P11

If you understand how options ③ and ④ of the in-class quiz on Slide 04 - P13 are derived, Q9-Q12 follow by the same reasoning.


Q13-Q14

Questions 13-14 illustrate what happens with multiple bins using dice to indicate 6 bins. Please note that the dice is not meant to be thrown for random experiments in this problem. They are just used to bind the six faces together. The probability below only refers to drawing from the bag.

Consider four kinds of dice in a bag, with the same (super large) quantity for each kind.

A: all even numbers are colored orange, all odd numbers are colored green

B: all even numbers are colored green, all odd numbers are colored orange

C: all small (1~3) are colored orange, all large numbers (4~6) are colored green

D: all small (1~3) are colored green, all large numbers (4~6) are colored orange

§ Q13 Question


If we pick \(5\) dice from the bag, what is the probability that we get \(5\) orange 1's?

🔘 \(\frac{1}{256}\)

🔘 \(\frac{8}{256}\)

🔘 \(\frac{31}{256}\)

🔘 \(\frac{46}{256}\)

🔘 none of the other choices

Q13 Solution

Note the sentence They are just used to bind the six faces together. Every die drawn still has all six faces (six numbers); only the coloring scheme differs between types. Do not read it as a drawn die showing just one number.

  • Type A (\(P_A = \frac{1}{4}\)): the 1 is green ✖️
  • Type B (\(P_B = \frac{1}{4}\)): the 1 is orange ✔️
  • Type C (\(P_C = \frac{1}{4}\)): the 1 is orange ✔️
  • Type D (\(P_D = \frac{1}{4}\)): the 1 is green ✖️

\(P_{Q_{13}} = \left(P_B+P_C\right)^{5} = \left(\frac{1}{4} + \frac{1}{4}\right)^{5} = \frac{1}{32} = \frac{8}{256}\)
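Enumerating all \(4^5\) equally likely draws confirms the answer (a sketch; the orange-face sets encode the coloring rules from the problem statement):

```python
from itertools import product

# orange faces for each die type, per the problem statement
ORANGE = {'A': {2, 4, 6}, 'B': {1, 3, 5}, 'C': {1, 2, 3}, 'D': {4, 5, 6}}

draws = list(product('ABCD', repeat=5))           # 4^5 = 1024 equally likely draws
fav = [d for d in draws if all(1 in ORANGE[t] for t in d)]

assert len(fav) / len(draws) == 8 / 256           # 32 / 1024
```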

Q13 Key Points

dice: dice - vocabulary.com

The noun dice is the plural form of the singular die. Although many people use the word dice when they're talking about a single die, it's actually only correct to call two or more of the dotted cubes dice. You can also use the word as a verb to mean "chop into tiny pieces or cubes." You might, for example, read a recipe instruction that says: "Dice three tomatoes."

In English, dice is the plural, referring to two or more; the singular form is die. That is why the question says 5 dice rather than 5 dices.

Even among native English speakers, though, a dice is a common misuse.


§ Q14 Question


If we pick \(5\) dice from the bag, what is the probability that we get "some number" that is purely orange?

🔘 \(\frac{1}{256}\)

🔘 \(\frac{8}{256}\)

🔘 \(\frac{31}{256}\)

🔘 \(\frac{46}{256}\)

🔘 none of the other choices

Q14 Solution

Face colors per die type:

  • Type A (\(P_A = \frac{1}{4}\)): 1 green, 2 orange, 3 green, 4 orange, 5 green, 6 orange
  • Type B (\(P_B = \frac{1}{4}\)): 1 orange, 2 green, 3 orange, 4 green, 5 orange, 6 green
  • Type C (\(P_C = \frac{1}{4}\)): 1 orange, 2 orange, 3 orange, 4 green, 5 green, 6 green
  • Type D (\(P_D = \frac{1}{4}\)): 1 green, 2 green, 3 green, 4 orange, 5 orange, 6 orange

For each number to be purely orange, all 5 dice must come from the corresponding pair of types (a draw may also use just one type of the pair, e.g. all B still makes every 1 purely orange):

  • 1: {B, C}
  • 2: {A, C}
  • 3: {B, C}
  • 4: {A, D}
  • 5: {B, D}
  • 6: {A, D}

Taking the union, the possible pairs are: {A, C}, {A, D}, {B, C}, {B, D}

The intersection of the {A, C} draws and the {A, D} draws is "all A", so "all A" is counted twice; the same holds for the other single-type draws. Subtract the duplicates from the sum of the four probabilities:

\[\begin{equation} \begin{aligned} P_{Q_{14}} &= P_{\{A,C\}} + P_{\{A,D\}} + P_{\{B,C\}} + P_{\{B,D\}} - P_{\text{all A}} - P_{\text{all B}} - P_{\text{all C}} - P_{\text{all D}}\\ &= \frac{4 \times (\text{5 dice, 2 choices each}) - 4 \times (\text{5 dice, 1 choice each})}{\text{5 dice, 4 choices each}}\\ &= \frac{4\times2^5 - 4\times1^5}{4^5}\\ &= \frac{31}{256} \end{aligned} \end{equation} \]
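The same enumeration as in Q13 extends to this question and sidesteps the inclusion-exclusion bookkeeping (a sketch; `some_number_purely_orange` is my own helper):

```python
from itertools import product

# orange faces for each die type, per the problem statement
ORANGE = {'A': {2, 4, 6}, 'B': {1, 3, 5}, 'C': {1, 2, 3}, 'D': {4, 5, 6}}

def some_number_purely_orange(draw):
    # is there a face value that is orange on every die in the draw?
    return any(all(n in ORANGE[t] for t in draw) for n in range(1, 7))

draws = list(product('ABCD', repeat=5))
hits = sum(some_number_purely_orange(d) for d in draws)

assert hits / len(draws) == 31 / 256   # 124 / 1024
```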


Q15-Q17

See the file my_hw1_Q15~Q17.html


Q18-Q19

See the file my_hw1_Q18~Q20.html


References


  1. 知乎專欄 | 主動學習(Active Learning)-少標簽數據學習 ↩︎

  2. 知乎 | 如何通俗的解釋排列公式和組合公式的含義?- 浣熊老師的回答 ↩︎

