Andrew Ng's Deep Learning Specialization - Quizzes - Course 5: Sequence Models - Week 2: Natural Language Processing and Word Embeddings


Week 2 Quiz: Natural Language Processing and Word Embeddings

1. Suppose you learn a word embedding for a vocabulary of 10000 words. Then the embedding vectors should be 10000 dimensional, so as to capture the full range of variation and meaning in those words.

【 】 True 【 】 False

Answer

False

Note: The dimension of word vectors is usually smaller than the size of the vocabulary; the most common sizes for word vectors range between 50 and 400.

 

2. What is t-SNE?

【 】 A linear transformation that allows us to solve analogies on word vectors

【 】 A non-linear dimensionality reduction technique

【 】 A supervised learning algorithm for learning word embeddings

【 】 An open-source sequence modeling library

Answer

【★】 A non-linear dimensionality reduction technique
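For intuition, here is a minimal, hypothetical sketch of using t-SNE to project high-dimensional word vectors down to 2-D for plotting. The word list and the randomly generated 500-dimensional embeddings are placeholders (not real trained vectors), and scikit-learn's `TSNE` is just one common implementation.

```python
# Hypothetical illustration: project 500-d "word embeddings" to 2-D with t-SNE.
# The words and vectors below are random placeholders, not trained embeddings.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
words = ["king", "queen", "man", "woman", "apple"]
embeddings = rng.normal(size=(len(words), 500))  # stand-in for learned embeddings

# t-SNE is a non-linear dimensionality reduction technique (the correct answer above).
tsne = TSNE(n_components=2, perplexity=2, init="random", random_state=0)
points_2d = tsne.fit_transform(embeddings)

for word, (x, y) in zip(words, points_2d):
    print(f"{word:>6s}: ({x:.2f}, {y:.2f})")
```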

 

3. Suppose you download a pre-trained word embedding which has been trained on a huge corpus of text. You then use this word embedding to train an RNN for a language task of recognizing if someone is happy from a short snippet of text, using a small training set. Then even if the word "ecstatic" does not appear in your small training set, your RNN might reasonably be expected to recognize "I'm ecstatic" as deserving a label \(y = 1\).

【 】 True

【 】 False

Answer

True

Note: Because the pre-trained embedding captures the meaning of "ecstatic", the RNN can be expected to generalize and label "I'm ecstatic" as \(y = 1\) even though the word never appears in the small training set.

 

4. Which of these equations do you think should hold for a good word embedding? (Check all that apply)

【 】 \(e_{boy} - e_{girl} \approx e_{brother} - e_{sister}\)

【 】 \(e_{boy} - e_{girl} \approx e_{sister} - e_{brother}\)

【 】 \(e_{boy} - e_{brother} \approx e_{girl} - e_{sister}\)

【 】 \(e_{boy} - e_{brother} \approx e_{sister} - e_{girl}\)

Answer

【★】 \(e_{boy} - e_{girl} \approx e_{brother} - e_{sister}\)

【★】 \(e_{boy} - e_{brother} \approx e_{girl} - e_{sister}\)
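To make the analogy equations concrete, below is a small illustrative sketch with invented 4-dimensional toy vectors (real embeddings would typically be 50-400 dimensional): a good embedding makes the two difference vectors in each correct equation point in nearly the same direction, which is usually measured with cosine similarity.

```python
# Toy illustration of the analogy test e_boy - e_girl ≈ e_brother - e_sister.
# These 4-d vectors are invented for the example, not real embeddings.
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; close to 1 means nearly parallel."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

e = {
    "boy":     np.array([ 1.0, 0.9, 0.1, 0.0]),
    "girl":    np.array([-1.0, 0.9, 0.1, 0.0]),
    "brother": np.array([ 1.0, 0.1, 0.8, 0.2]),
    "sister":  np.array([-1.0, 0.1, 0.8, 0.2]),
}

# First correct option: the two "male - female" difference vectors should align.
print(cosine_similarity(e["boy"] - e["girl"], e["brother"] - e["sister"]))   # ≈ 1.0

# Second correct option: the two "child - sibling" difference vectors should align.
print(cosine_similarity(e["boy"] - e["brother"], e["girl"] - e["sister"]))   # ≈ 1.0
```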

 

5. Let \(E\) be an embedding matrix, and let \(o_{1234}\) be a one-hot vector corresponding to word 1234. Then to get the embedding of word 1234, why don't we call \(E * o_{1234}\) in Python?

【 】 It is computationally wasteful

【 】 The correct formula is \(E^T * o_{1234}\)

【 】 This doesn't handle unknown words

【 】 None of the above: calling the Python snippet as described above is fine

Answer

【★】 It is computationally wasteful

Note: Yes, the multiplication is extremely inefficient; since \(o_{1234}\) contains a single 1, the product just picks out one column of \(E\), and a direct lookup does the same job.
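The sketch below illustrates why the full product is wasteful, under the course's convention that \(E\) has shape (embedding dimension × vocabulary size): multiplying \(E\) by a one-hot vector merely selects one column, so real implementations use a direct index instead. The sizes and values here are made up for illustration.

```python
# Why E @ o_1234 is wasteful: the one-hot vector just picks out one column of E,
# so a direct index gives the same result without a (300 x 10000) matrix product.
import numpy as np

vocab_size, embed_dim = 10000, 300           # illustrative sizes
E = np.random.randn(embed_dim, vocab_size)   # embedding matrix (course convention)

word_index = 1234
o = np.zeros(vocab_size)
o[word_index] = 1.0                          # one-hot vector o_1234

via_matmul = E @ o                           # mathematically correct but wasteful
via_lookup = E[:, word_index]                # what real implementations do instead

print(np.allclose(via_matmul, via_lookup))   # True: identical embedding vector
```

This indexing is essentially what an embedding layer (e.g. `tf.keras.layers.Embedding` or `torch.nn.Embedding`) does internally.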

 

6. When learning word embeddings, we create an artificial task of estimating \(P(\text{target} \mid \text{context})\). It is okay if we do poorly on this artificial prediction task; the more important by-product of this task is that we learn a useful set of word embeddings.

【 】 True 【 】 False

Answer

True

 

7. In the word2vec algorithm, you estimate \(P(t \mid c)\), where \(t\) is the target word and \(c\) is a context word. How are \(t\) and \(c\) chosen from the training set? Pick the best answer.

【 】 \(c\) and \(t\) are chosen to be nearby words.

【 】 \(c\) is a sequence of several words immediately before \(t\).

【 】 \(c\) is the sequence of all the words in the sentence before \(t\).

【 】 \(c\) is the one word that comes immediately before \(t\).

Answer

【★】 \(c\) and \(t\) are chosen to be nearby words.
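As a rough illustration of "nearby words", here is a toy skip-gram-style sampling loop: for each context word \(c\), a target \(t\) is drawn from a small window of surrounding words. The sentence and window size are arbitrary choices for the example, not part of the quiz.

```python
# Illustrative skip-gram style sampling: (context, target) pairs are nearby words.
import random

sentence = "i want a glass of orange juice to go along with my cereal".split()
window = 4          # target is chosen within +/- window words of the context word
random.seed(0)

pairs = []
for i, context in enumerate(sentence):
    lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
    candidates = [j for j in range(lo, hi) if j != i]   # nearby positions, excluding i
    target = sentence[random.choice(candidates)]
    pairs.append((context, target))

print(pairs[:5])    # a few (context, target) pairs drawn from nearby words
```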

 

8. Suppose you have a 10000 word vocabulary, and are learning 500-dimensional word embeddings. The word2vec model uses the following softmax function: \(P(t \mid c) = \frac{e^{\theta_t^T e_c}}{\sum_{t'=1}^{10000} e^{\theta_{t'}^T e_c}}\). Which of these statements are correct? Check all that apply.

【 】 \(\theta_t\) and \(e_c\) are both 500 dimensional vectors.

【 】 \(\theta_t\) and \(e_c\) are both 10000 dimensional vectors.

【 】 \(\theta_t\) and \(e_c\) are both trained with an optimization algorithm such as Adam or gradient descent.

【 】 After training, we should expect \(\theta_t\) to be very close to \(e_c\) when \(t\) and \(c\) are the same word.

Answer

【★】 \(\theta_t\) and \(e_c\) are both 500 dimensional vectors.

【★】 \(\theta_t\) and \(e_c\) are both trained with an optimization algorithm such as Adam or gradient descent.
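A small numpy sketch of the shapes involved in that softmax, under the quiz's numbers (10000-word vocabulary, 500-dimensional embeddings): both \(\theta_t\) and \(e_c\) are 500-dimensional, and both parameter matrices are what an optimizer such as gradient descent or Adam would update. The random initial values are placeholders.

```python
# Shapes in the word2vec softmax P(t|c) = exp(theta_t . e_c) / sum_t' exp(theta_t' . e_c).
import numpy as np

vocab_size, embed_dim = 10000, 500
rng = np.random.default_rng(0)

Theta = rng.normal(scale=0.01, size=(vocab_size, embed_dim))  # one theta_t per target word
E     = rng.normal(scale=0.01, size=(vocab_size, embed_dim))  # one e_c per context word
# Both Theta and E are trainable parameters (updated by gradient descent or Adam).

c = 42                                  # index of the context word
logits = Theta @ E[c]                   # (10000,) scores theta_t . e_c for every target t
logits -= logits.max()                  # numerical stability
p_t_given_c = np.exp(logits) / np.exp(logits).sum()

print(Theta[0].shape, E[c].shape)       # both (500,): 500-dimensional vectors
print(p_t_given_c.sum())                # ≈ 1.0, a valid distribution over 10000 words
```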

 

9. Suppose you have a 10000 word vocabulary, and are learning 500-dimensional word embeddings. The GloVe model minimizes this objective: \(\min \sum_{i=1}^{10000} \sum_{j=1}^{10000} f(X_{ij})\,(\theta_i^T e_j + b_i + b'_j - \log X_{ij})^2\). Which of these statements are correct? Check all that apply.

【 】 \(\theta_i\) and \(e_j\) should be initialized to 0 at the beginning of training.

【 】 \(\theta_i\) and \(e_j\) should be initialized randomly at the beginning of training.

【 】 \(X_{ij}\) is the number of times word \(i\) appears in the context of word \(j\).

【 】 The weighting function \(f(\cdot)\) must satisfy \(f(0) = 0\).

Answer

【★】 \(\theta_i\) and \(e_j\) should be initialized randomly at the beginning of training.

【★】 \(X_{ij}\) is the number of times word \(i\) appears in the context of word \(j\).

【★】 The weighting function \(f(\cdot)\) must satisfy \(f(0) = 0\).

Note: \(f(0) = 0\) is needed so that word pairs with \(X_{ij} = 0\) contribute nothing to the objective (otherwise \(\log X_{ij}\) would be undefined); the weighting also keeps extremely common word pairs from dominating training.
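For concreteness, here is a tiny sketch of evaluating the GloVe objective on a made-up co-occurrence matrix. It uses the common weighting \(f(x) = \min((x/x_{max})^{0.75}, 1)\) with \(f(0) = 0\), random parameter initialization, and skips terms with \(X_{ij} = 0\) since \(\log X_{ij}\) would otherwise be undefined; all sizes and counts are invented for illustration.

```python
# Evaluating the GloVe objective on a toy co-occurrence matrix X (counts are made up).
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 6, 4            # tiny sizes for illustration
X = rng.integers(0, 5, size=(vocab_size, vocab_size)).astype(float)  # X[i, j]: count of word i in context of word j

# Parameters are initialized randomly (initializing them all to 0 is the wrong option above).
Theta = rng.normal(scale=0.1, size=(vocab_size, embed_dim))
E     = rng.normal(scale=0.1, size=(vocab_size, embed_dim))
b, b_prime = rng.normal(size=vocab_size), rng.normal(size=vocab_size)

def f(x, x_max=100.0, alpha=0.75):
    """GloVe-style weighting; f(0) = 0 so pairs that never co-occur contribute nothing."""
    return np.where(x > 0, np.minimum((x / x_max) ** alpha, 1.0), 0.0)

objective = 0.0
for i in range(vocab_size):
    for j in range(vocab_size):
        if X[i, j] > 0:                 # f(0) = 0 term: skip, which also avoids log(0)
            err = Theta[i] @ E[j] + b[i] + b_prime[j] - np.log(X[i, j])
            objective += f(X[i, j]) * err ** 2

print(objective)
```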

 

10. You have trained word embeddings using a text dataset of \(m_1\) words. You are considering using these word embeddings for a language task, for which you have a separate labeled dataset of \(m_2\) words. Keeping in mind that using word embeddings is a form of transfer learning, under which of these circumstances would you expect the word embeddings to be helpful?

【 】 \(m_1 \gg m_2\) 【 】 \(m_1 \ll m_2\)

Answer

【★】 \(m_1 \gg m_2\)

 

 



Week 2 Code Assignments:

✧ Course 5 - Sequence Models - Week 2: Natural Language Processing and Word Embeddings

Assignment 1: Operations on word vectors

Assignment 2: Emojify!

