[LeetCode] 737. Sentence Similarity II

Reposted (see the original article) · 2017-12-17 23:49 · LeetCode

Given two sentences words1, words2 (each represented as an array of strings), and a list of similar word pairs pairs, determine if two sentences are similar.

For example, words1 = ["great", "acting", "skills"] and words2 = ["fine", "drama", "talent"] are similar, if the similar word pairs are pairs = [["great", "good"], ["fine", "good"], ["acting","drama"], ["skills","talent"]].

Note that the similarity relation is transitive. For example, if "great" and "good" are similar, and "fine" and "good" are similar, then "great" and "fine" are similar.

Similarity is also symmetric. For example, "great" and "fine" being similar is the same as "fine" and "great" being similar.

Also, a word is always similar with itself. For example, the sentences words1 = ["great"], words2 = ["great"], pairs = [] are similar, even though there are no specified similar word pairs.

Finally, sentences can only be similar if they have the same number of words. So a sentence like words1 = ["great"] can never be similar to words2 = ["doubleplus","good"].

Note:

The length of words1 and words2 will not exceed 1000.
The length of pairs will not exceed 2000.
The length of each pairs[i] will be 2.
The length of each words[i] and pairs[i][j] will be in the range [1, 20].

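This problem extends the earlier Sentence Similarity problem. There, similarity between words was not transitive; here it is, which makes the problem harder. No matter: the classic trio still applies, namely BFS, DFS, and Union Find. Start with BFS. At its core this is a connectivity problem on an undirected graph, so the first step is to build the graph's data structure. For every node we record all the nodes adjacent to it, that is, a map from each node to the set of its neighbors. For example, for the three similar pairs (a, b), (b, c), and (c, d), we get the following mapping: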

a -> {b}

b -> {a, c}

c -> {b, d}

d -> {c}
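
The mapping above can be built in a few lines of C++. This is only a minimal sketch: the names `Graph` and `buildGraph` are illustrative and do not come from the original post.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical alias: each word maps to the set of words directly similar to it.
using Graph = std::unordered_map<std::string, std::unordered_set<std::string>>;

// Build an undirected adjacency map: every pair adds an edge in both directions.
Graph buildGraph(const std::vector<std::pair<std::string, std::string>>& pairs) {
    Graph g;
    for (const auto& p : pairs) {
        g[p.first].insert(p.second);
        g[p.second].insert(p.first);
    }
    return g;
}
```

Calling `buildGraph({{"a", "b"}, {"b", "c"}, {"c", "d"}})` produces the four entries listed above, with b and c each having two neighbors.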

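To check whether a and d are similar we have to use transitivity. From a we can only reach b; from b we can reach a and c. To avoid looping forever, visited nodes are put into a set visited, so from b we can now only move on to c, and from c only to d, which shows that a and d are similar. Use a for loop to compare the words at corresponding positions. If the two words are equal, skip to the next position. Otherwise create a set visited, create a queue and push the word from words1 into it, and create a boolean succ that records whether the target was found. The rest is a standard BFS: pop an element from the queue; if the nodes adjacent to it include the corresponding word from words2, set succ to true and break. Otherwise add the popped node to visited, iterate over its neighbors, and push the unvisited ones onto the queue to keep the loop going. If the queue empties while succ is still false, the sentences are not similar. See the code below: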

Solution 1:

class Solution {
public:
    bool areSentencesSimilarTwo(vector<string>& words1, vector<string>& words2, vector<pair<string, string>> pairs) {
        if (words1.size() != words2.size()) return false;
        // Adjacency map: each word -> set of words directly similar to it.
        unordered_map<string, unordered_set<string>> m;
        for (const auto& p : pairs) {
            m[p.first].insert(p.second);
            m[p.second].insert(p.first);
        }
        for (size_t i = 0; i < words1.size(); ++i) {
            if (words1[i] == words2[i]) continue;
            // BFS from words1[i], looking for words2[i].
            unordered_set<string> visited;
            queue<string> q{{words1[i]}};
            bool succ = false;
            while (!q.empty()) {
                auto t = q.front(); q.pop();
                if (m[t].count(words2[i])) {
                    succ = true; break;
                }
                visited.insert(t);
                for (const auto& a : m[t]) {
                    if (!visited.count(a)) q.push(a);
                }
            }
            if (!succ) return false;
        }
        return true;
    }
};
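
The BFS above marks a word as visited only when it is dequeued, so the same word can be pushed onto the queue more than once before it is processed. One possible variant, sketched below under that observation, marks words when they are enqueued and looks words up with `find` so that a lookup never inserts an empty entry into the map. The names `Graph` and `reachable` are illustrative and not part of the original post.

```cpp
#include <queue>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical alias: each word maps to the set of words directly similar to it.
using Graph = std::unordered_map<std::string, std::unordered_set<std::string>>;

// Returns true if `target` can be reached from `start` by following edges in g.
bool reachable(const Graph& g, const std::string& start, const std::string& target) {
    if (start == target) return true;  // a word is always similar to itself
    std::unordered_set<std::string> visited{start};
    std::queue<std::string> q;
    q.push(start);
    while (!q.empty()) {
        std::string cur = q.front();
        q.pop();
        auto it = g.find(cur);
        if (it == g.end()) continue;  // this word appears in no similar pair
        for (const auto& next : it->second) {
            if (next == target) return true;
            // insert().second is true only the first time `next` is seen,
            // so each word enters the queue at most once.
            if (visited.insert(next).second) q.push(next);
        }
    }
    return false;
}
```

With this helper, the main loop would reduce to calling `reachable(g, words1[i], words2[i])` for each position and returning false on the first failure.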

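Next comes the recursive (DFS) version. The idea is exactly the same as above, with the main work moved into a recursive helper function. See the code below: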

Solution 2:

class Solution {
public:
    bool areSentencesSimilarTwo(vector<string>& words1, vector<string>& words2, vector<pair<string, string>> pairs) {
        if (words1.size() != words2.size()) return false;
        // Adjacency map: each word -> set of words directly similar to it.
        unordered_map<string, unordered_set<string>> m;
        for (const auto& p : pairs) {
            m[p.first].insert(p.second);
            m[p.second].insert(p.first);
        }
        for (size_t i = 0; i < words1.size(); ++i) {
            unordered_set<string> visited;
            if (!helper(m, words1[i], words2[i], visited)) return false;
        }
        return true;
    }
    // DFS: returns true if target is reachable from cur through similar pairs.
    bool helper(unordered_map<string, unordered_set<string>>& m, const string& cur, const string& target, unordered_set<string>& visited) {
        if (cur == target) return true;
        visited.insert(cur);
        for (const string& word : m[cur]) {
            if (!visited.count(word) && helper(m, word, target, visited)) return true;
        }
        return false;
    }
};

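The last approach is the excellent Union Find. Its core is a getRoot function: if two elements belong to the same group, calling getRoot on each returns the same value. There are two steps. The first step builds the groups. Suppose that at the start every element stands alone, each in its own group. Then, for every given pair, call getRoot on both words to find their roots. If the two have never been linked, their roots differ, and we link them by pointing one root at the other. Once all pairs have been processed, the second step checks whether two given elements belong to the same group, which only requires comparing their roots. It feels a little like deep learning, where you first train a model and then test it. (The author is only joking; the two are unrelated.) The data structure that stores the groups is sometimes an array and sometimes a HashMap, depending on the input type. If the elements are integers, a root array is enough; if they are strings, as in this problem, a HashMap is needed to map each node to its parent. Note that the node a word maps to is not necessarily the final root, whereas the final root always maps to itself. getRoot is therefore designed to find the final root: it returns when a node maps to itself and otherwise keeps following the mapping. It can be written recursively or iteratively; either works. Note that the first line of getRoot, which checks whether the word is already in the map, serves as initialization. It could also be done outside the function; the point is that initially every element belongs to its own group. See the code below: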

Solution 3:

class Solution {
public:
    bool areSentencesSimilarTwo(vector<string>& words1, vector<string>& words2, vector<pair<string, string>> pairs) {
        if (words1.size() != words2.size()) return false;
        // Maps each word to its parent; a root maps to itself.
        unordered_map<string, string> m;
        // Step 1: union the two groups of every similar pair.
        for (const auto& p : pairs) {
            string x = getRoot(p.first, m), y = getRoot(p.second, m);
            if (x != y) m[x] = y;
        }
        // Step 2: words at the same position must share a root.
        for (size_t i = 0; i < words1.size(); ++i) {
            if (getRoot(words1[i], m) != getRoot(words2[i], m)) return false;
        }
        return true;
    }
    string getRoot(string word, unordered_map<string, string>& m) {
        // A word seen for the first time starts as the root of its own group.
        if (!m.count(word)) m[word] = word;
        return word == m[word] ? word : getRoot(m[word], m);
    }
};
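
The root lookup can also be written iteratively. The sketch below additionally applies path compression, a standard Union Find optimization that the original code does not use: after the root is found, every node on the walked path is pointed directly at it, so later lookups are shorter. The names `findRoot` and `unite` are illustrative and not part of the original post.

```cpp
#include <string>
#include <unordered_map>

// Iterative root lookup with path compression (an added optimization,
// not taken from the original solution).
std::string findRoot(std::unordered_map<std::string, std::string>& parent,
                     const std::string& word) {
    // A word seen for the first time becomes the root of its own group.
    if (!parent.count(word)) parent[word] = word;
    // Pass 1: walk up until a node maps to itself.
    std::string root = word;
    while (parent[root] != root) root = parent[root];
    // Pass 2: point every node on the path directly at the root.
    std::string cur = word;
    while (cur != root) {
        std::string next = parent[cur];
        parent[cur] = root;
        cur = next;
    }
    return root;
}

// Merge the groups containing a and b.
void unite(std::unordered_map<std::string, std::string>& parent,
           const std::string& a, const std::string& b) {
    std::string x = findRoot(parent, a), y = findRoot(parent, b);
    if (x != y) parent[x] = y;
}
```

After calling `unite` for the pairs ("great", "good") and ("fine", "good"), `findRoot` returns the same root for "great" and "fine", which is exactly the transitive case from the problem statement.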

GitHub mirror:

https://github.com/grandyang/leetcode/issues/737

Similar problems:

Friend Circles

Accounts Merge

Sentence Similarity

References:

https://leetcode.com/problems/sentence-similarity-ii/

https://leetcode.com/problems/sentence-similarity-ii/discuss/109747/Java-Easy-DFS-solution-with-Explanation

https://leetcode.com/problems/sentence-similarity-ii/discuss/109752/JavaC%2B%2B-Clean-Code-with-Explanation

LeetCode All in One: collected problem explanations (continuously updated...)
