1. Question 1

You are building a 3-class object classification and localization algorithm. The classes are: pedestrian (c=1), car (c=2), motorcycle (c=3). What would be the label for the following image? Recall $y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]$.

2. Question 2

Continuing from the previous problem, what should y be for the image below? Remember that "?" means "don't care", which means that the neural network loss function won't care what the neural network gives for that component of the output. As before, $y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]$.

3. Question 3

You are working on a factory automation task. Your system will see a can of soft drink coming down a conveyor belt, and you want it to take a picture and decide whether (i) there is a soft-drink can in the image, and if so (ii) its bounding box. Since the soft-drink can is round, the bounding box is always square, and the can always appears at the same size in the image. There is at most one soft-drink can in each image. Here are some typical images in your training set:

What is the most appropriate set of output units for your neural network?

Logistic unit (for classifying if there is a soft-drink can in the image)

Logistic unit, $b_x$ and $b_y$ √

Logistic unit, $b_x$, $b_y$, $b_h$ (since $b_w = b_h$)

Logistic unit, $b_x, b_y, b_h, b_w$

4. Question 4

If you build a neural network that inputs a picture of a person’s face and outputs N landmarks on the face (assume the input image always contains exactly one face), how many output units will the network have?

N

2N √

3N

$N^2$

5. Question 5

When training one of the object detection systems described in lecture, you need a training set that contains many pictures of the object(s) you wish to detect. However, bounding boxes do not need to be provided in the training set, since the algorithm can learn to detect the objects by itself.

True

False √

6. Question 6

Suppose you are applying a sliding windows classifier (non-convolutional implementation). Increasing the stride would tend to increase accuracy, but decrease computational cost.

True

False √
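
To see why a larger stride lowers cost, it can help to count window positions. The sketch below is only an illustration: the helper `num_windows` and the 64-pixel image with a 16-pixel window are made-up values, not part of the quiz. Fewer windows mean fewer classifier evaluations, but also coarser localization, which tends to hurt accuracy rather than help it.

```python
def num_windows(image_size, window_size, stride):
    """Count the positions of a square window slid over a square image."""
    per_axis = (image_size - window_size) // stride + 1
    return per_axis * per_axis


# Hypothetical sizes: 64x64 image, 16x16 window.
for stride in (1, 2, 4, 8):
    print(stride, num_windows(image_size=64, window_size=16, stride=stride))
# 1 2401
# 2 625
# 4 169
# 8 49
```

Each window is one forward pass of the classifier in the non-convolutional implementation, so the counts above are proportional to the computational cost.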

7. Question 7

In the YOLO algorithm, at training time, only one cell (the one containing the center/midpoint of an object) is responsible for detecting this object.

True √

False
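
A minimal sketch of how the responsible cell could be found, assuming the midpoint $(b_x, b_y)$ is given in image-relative coordinates in $[0, 1)$. The function name `responsible_cell` and the default 19x19 grid are assumptions made for illustration.

```python
def responsible_cell(b_x, b_y, grid_size=19):
    """Return (row, col) of the grid cell that contains the box midpoint.

    b_x and b_y are the midpoint in image-relative coordinates.
    """
    # Clamp so that a midpoint exactly on the right/bottom edge stays in the grid.
    col = min(int(b_x * grid_size), grid_size - 1)
    row = min(int(b_y * grid_size), grid_size - 1)
    return row, col


print(responsible_cell(0.3, 0.7))               # (13, 5)
print(responsible_cell(0.5, 0.5, grid_size=3))  # (1, 1)
```

Only the label entry for that one cell gets $p_c = 1$ for the object, even if the box itself spans several cells.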

8. Question 8

What is the IoU between these two boxes? The upper-left box is 2x2, and the lower-right box is 2x3. The overlapping region is 1x1.

1/6

1/9 √

1/10

None of the above
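
The arithmetic can be checked with a short IoU helper. The corner coordinates below are an assumption, chosen only so that the boxes have the stated sizes (2x2 and 2x3) and a 1x1 overlap; the figure itself is not reproduced here.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


upper_left = (0, 0, 2, 2)   # 2x2 box
lower_right = (1, 1, 3, 4)  # 2 wide, 3 tall
print(iou(upper_left, lower_right))  # 1 / (4 + 6 - 1) = 1/9, about 0.111
```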

9. Question 9

Suppose you run non-max suppression on the predicted boxes above. The parameters you use for non-max suppression are that boxes with probability $\leq 0.4$ are discarded, and the IoU threshold for deciding if two boxes overlap is 0.5. How many boxes will remain after non-max suppression?

3

4

5 √

6

7
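
The procedure can be sketched as follows. This is a simplified single-class version under stated assumptions: the boxes and scores are invented, since the quiz figure is not reproduced here, and the function names are illustrative. With several classes, the same routine would be run once per class.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def non_max_suppression(boxes, scores, score_threshold=0.4, iou_threshold=0.5):
    """Return the indices of the boxes that survive, highest score first."""
    # Step 1: discard boxes whose probability is <= the score threshold.
    candidates = [i for i, s in enumerate(scores) if s > score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Step 2: keep a box only if it does not overlap a kept box too much.
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in kept):
            kept.append(i)
    return kept


# Hypothetical predictions (not the ones from the quiz image).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30),
         (21, 20, 31, 30), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7, 0.3, 0.4]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

In this made-up example, boxes 3 and 4 are dropped by the probability threshold, and box 1 is suppressed because its IoU with the higher-scoring box 0 is 81/119, which is above 0.5.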

10. Question 10

Suppose you are using YOLO on a 19x19 grid, on a detection problem with 20 classes, and with 5 anchor boxes. During training, for each image you will need to construct an output volume $y$. What are its dimensions?

19x19x(5x25) √

19x19x(25x20)

19x19x(5x20)

19x19x(20x25)
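
The shape follows from counting the entries per anchor box: $p_c$, four box coordinates, and one entry per class, so $5 + 20 = 25$ numbers for each of the 5 anchors in every grid cell. A small sketch of that count (the helper name `yolo_output_shape` is an assumption for illustration):

```python
def yolo_output_shape(grid_size, num_anchors, num_classes):
    """Shape of the YOLO target volume for one image."""
    # Per anchor: p_c, b_x, b_y, b_h, b_w, plus one entry per class.
    per_anchor = 5 + num_classes
    return (grid_size, grid_size, num_anchors * per_anchor)


print(yolo_output_shape(grid_size=19, num_anchors=5, num_classes=20))
# (19, 19, 125), i.e. 19x19x(5x25)
```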

-----------------------------------------------------------Chinese version-------------------------------------------------------------------------

The Chinese version is excerpted from: https://blog.csdn.net/u013733326/article/details/80306093

Detection algorithms

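You are building an algorithm that classifies and localizes three classes of object: pedestrian (c=1), car (c=2), and motorcycle (c=3). Which of the following labels is correct for the image below? Note: $y = [p_{c}, b_{x}, b_{y}, b_{h}, b_{w}, c_{1}, c_{2}, c_{3}]$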
- 【★】 y=[1, 0.3, 0.7, 0.3, 0.3, 0, 1, 0]
- 【】 y=[1, 0.7, 0.5, 0.3, 0.3, 0, 1, 0]
- 【】 y=[1, 0.3, 0.7, 0.5, 0.5, 0, 1, 0]
- 【】 y=[1, 0.3, 0.7, 0.5, 0.5, 1, 0, 0]
- 【】 y=[0, 0.2, 0.4, 0.5, 0.5, 0, 1, 0]
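
A minimal sketch of how such a label could be assembled, assuming image-relative box coordinates and 1-based class ids. The helper `make_label` is hypothetical, and `None` stands in for the "don't care" entries.

```python
def make_label(box=None, class_id=None, num_classes=3):
    """Build y = [p_c, b_x, b_y, b_h, b_w, c_1, ..., c_K] as a list.

    box is (b_x, b_y, b_h, b_w); class_id is 1-based.
    With no object, every entry after p_c is None ("don't care").
    """
    if box is None:
        return [0] + [None] * (4 + num_classes)
    one_hot = [1 if k == class_id else 0 for k in range(1, num_classes + 1)]
    return [1, *box, *one_hot]


# A car (c=2) centered at (0.3, 0.7) with height and width 0.3.
print(make_label(box=(0.3, 0.7, 0.3, 0.3), class_id=2))
# [1, 0.3, 0.7, 0.3, 0.3, 0, 1, 0]
```
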
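Continuing from the previous question, what should y be for the image below? Note: "?" means "don't care", which means the neural network's loss function ignores whatever the network outputs for that component. As before, $y = [p_{c}, b_{x}, b_{y}, b_{h}, b_{w}, c_{1}, c_{2}, c_{3}]$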
- 【】 y=[1, ?, ?, ?, ?, 0, 0, 0]
- 【★】y=[0, ?, ?, ?, ?, ?, ?, ?]
- 【】 y=[?, ?, ?, ?, ?, ?, ?, ?]
- 【】 y=[0, ?, ?, ?, ?, 0, 0, 0]
- 【】 y=[1, ?, ?, ?, ?, ?, ?, ?]
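
One way to picture "don't care" is a loss that skips those components when $p_c = 0$. The sketch below assumes a plain squared-error loss for simplicity; the function name `localization_loss` and the sample predictions are made up for illustration.

```python
def localization_loss(y_true, y_pred):
    """Squared-error loss that ignores "don't care" entries.

    Both arguments are length-8 sequences
    [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3].
    """
    if y_true[0] == 1:
        # Object present: every component is penalized.
        return sum((p - t) ** 2 for p, t in zip(y_pred, y_true))
    # No object: only p_c is penalized; the other seven outputs are ignored.
    return (y_pred[0] - y_true[0]) ** 2


y_true = [0, None, None, None, None, None, None, None]
y_pred = [0.1, 0.5, 0.5, 0.2, 0.2, 0.3, 0.3, 0.4]
print(localization_loss(y_true, y_pred))  # about 0.01, from p_c alone
```

Changing any of the last seven predicted values leaves this loss unchanged, which is exactly what the "?" entries express.
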
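You are working on a factory automation task. Your system will see a soft-drink can coming down a conveyor belt, and you want it to take a picture and decide whether (i) there is a can in the image, and if so (ii) its bounding box. Since the can is round, the bounding box is always square, and the can always appears at the same size in the image. There is at most one can in each image. Given some typical training-set images, which set of output units is most appropriate?
- 【】 Logistic unit (for classifying whether there is a can in the image)
- 【★】Logistic unit, $b_{x}$ and $b_{y}$
- 【】 Logistic unit, $b_{x}$, $b_{y}$, $b_{h}$ (since $b_{w} = b_{h}$)
- 【】 Logistic unit, $b_{x}$, $b_{y}$, $b_{h}$, $b_{w}$
Blogger's note: since every can has the same size, we only need to know its center position.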
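If you build a neural network that takes a picture of a face as input and outputs N landmarks (assume the image contains exactly one face), how many output units will the network have?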
- 【】 N
- 【★】2N
- 【】 3N
- 【】 $N^{2}$
Blogger's note: an image is two-dimensional, so a position is specified as (x, y); each landmark therefore needs two output units.
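When training one of the object detection systems described in the videos, you need a training set containing many pictures of the objects you wish to detect. However, bounding boxes do not need to be provided in the training set, since the algorithm can learn to detect the objects by itself. Is this statement correct?
- 【】True
- 【★】False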
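Suppose you are applying a sliding windows classifier (non-convolutional implementation). Increasing the stride would tend to increase accuracy, but decrease computational cost.
- 【】True
- 【★】False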
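In the YOLO algorithm, at training time, only one cell (the one containing the center/midpoint of an object) is responsible for detecting this object.
- 【★】True
- 【】False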
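What is the IoU between these two boxes? The upper-left box is 2x2, the lower-right box is 2x3, and the overlapping region is 1x1.
- 【】 1/6
- 【★】1/9
- 【】 1/10
- 【】None of the above
Blogger's note: $\frac{1 \times 1}{2 \times 2 + 2 \times 3 - 1 \times 1} = \frac{1}{9}$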
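Suppose you run non-max suppression on the predicted boxes in the image below. The parameters are that boxes with probability ≤ 0.4 are discarded, and the IoU threshold for deciding whether two boxes overlap is 0.5. How many boxes will remain after non-max suppression?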
- 【】 3
- 【】 4
- 【★】5
- 【】 6
- 【】 7
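Suppose you are using YOLO on a 19x19 grid to detect 20 classes, with 5 anchor boxes. During training, for each image you need to construct an output volume $y$. What are its dimensions?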
- 【】 19x19x(25x20)
- 【】 19x19x(20x25)
- 【★】19x19x(5x25)
- 【】 19x19x(5x20)

Andrew Ng Deep Learning notes: course 4, week 3 quiz
