Lua file reading and writing (processing a sensitive-word list)

Reposted article, 2019-07-27 20:18. Tags: utilities / lua / io library / file reading and writing

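I recently needed to build a new sensitive-word system for our game. I went with the commonly used DFA (deterministic finite automaton) approach. I will not go into the algorithm here; what matters is that implementing it requires a matching sensitive-word list.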

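When I got the word list, I found it held 8000+ words, including many duplicates and many words in a head-containment relationship, where one word begins with another.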

What is a head-contained word? Consider the following example.

Suppose the DFA loads these two sensitive words:

Word 1: "ab"  Word 2: "abc"

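After loading, the sensitive word "ab" no longer exists on its own: it is overwritten by "abc". Our game does not replace the sensitive words in a sentence with other characters (such as **). Instead, if a sentence is found to contain a sensitive word, it cannot be sent at all. So if "ab" is already a sensitive word, "abc" has no reason to be in the list, because any sentence containing "abc" also contains "ab". I therefore needed to do two things to the word list: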

1. Keep only one copy of identical words

2. Delete every sensitive word that begins with another sensitive word
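To make the two rules concrete, here is a minimal sketch that applies them to a small in-memory table. The helper names `hasProperPrefix` and `reduceWords` and the sample words are assumptions made for illustration; they are not part of the original script.

```lua
-- Hypothetical helper: true when `word` begins with `prefix` and is longer than it.
local function hasProperPrefix(word, prefix)
    return #word > #prefix and string.sub(word, 1, #prefix) == prefix
end

-- Hypothetical helper: apply both rules to a list of words and return the survivors.
local function reduceWords(words)
    local result, seen = {}, {}
    for _, word in ipairs(words) do
        -- Rule 1: keep only one copy of each word (empty entries are ignored).
        local keep = word ~= "" and not seen[word]
        if keep then
            for _, other in ipairs(words) do
                -- Rule 2: drop the word if another non-empty word is its head.
                if other ~= "" and hasProperPrefix(word, other) then
                    keep = false
                    break
                end
            end
        end
        if keep then
            seen[word] = true
            result[#result + 1] = word
        end
    end
    return result
end

local reduced = reduceWords({ "ab", "abc", "ab", "xyz", "x." })
print(table.concat(reduced, ","))
```

The prefix test uses `string.sub` rather than `string.find` because `string.find` treats its second argument as a Lua pattern unless the plain flag is passed. With pattern matching, the entry "x." would match the first two characters of "xyz" and wrongly mark it as head-contained.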

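But the existing list has 8000+ words, and I could not go through them one by one. So I used Lua's built-in io library to process the original list, which saves a great deal of time. The code is as follows: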

local function getNewWord()
    local wordsDataInput  = {}
    local wordsDataOutput = {}

    -- Read the file
    -- Open the input file read-only; assert reports the reason if it cannot be opened
    local file_input = assert(io.open("sensitive_words_input.txt", "r"))

    -- Read the file line by line, skipping empty lines
    for string_l in file_input:lines() do
        if string_l ~= "" then
            table.insert(wordsDataInput, string_l)
        end
    end
    file_input:close()

    -- Process the data
    -- removed[i] is true when word i begins with another sensitive word
    local removed = {}

    -- Mark every word that has str as its head
    local function ifIsHeadInTable(str)
        for i = 1, #wordsDataInput do
            local word = wordsDataInput[i]
            -- Plain byte comparison (no Lua patterns): word starts with str
            -- and is longer than str, so this is a head-containment relationship
            if #word > #str and string.sub(word, 1, #str) == str then
                removed[i] = true
            end
        end
    end

    -- Has this word already been kept? seen is a set of the kept words
    local seen = {}
    local function isHasSameInTable(str)
        return seen[str] == true
    end

    -- First remove head-contained words
    for _, value in ipairs(wordsDataInput) do
        ifIsHeadInTable(value)
    end

    -- Then remove duplicates
    for index, value in ipairs(wordsDataInput) do
        if not removed[index] and not isHasSameInTable(value) then
            seen[value] = true
            table.insert(wordsDataOutput, value)
        end
    end

    -- Write the file
    -- Open the output file in "w" mode so a rerun overwrites it instead of appending
    local file_output = assert(io.open("sensitive_words.txt", "w"))
    for _, word in ipairs(wordsDataOutput) do
        file_output:write(word, "\n")
    end
    file_output:close()
end

getNewWord()
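The script above compares every word with every other word, which is fine for a one-off run over a few thousand entries. If the list grew much larger, one possible alternative is to sort the words first: a head-contained word then sorts directly after the word it begins with, so each word only needs to be compared with the last word kept. The sketch below is an assumption-laden variant, not the original method: `reduceSorted` is a hypothetical name, the output order becomes sorted rather than the input order, and `table.sort` is assumed to compare strings in byte order (the default "C" locale).

```lua
-- Hypothetical alternative: sort, then compare each word only with the last kept word.
local function reduceSorted(words)
    local sorted = {}
    for _, word in ipairs(words) do
        if word ~= "" then
            sorted[#sorted + 1] = word
        end
    end
    table.sort(sorted)  -- byte order under the default "C" locale

    local result = {}
    for _, word in ipairs(sorted) do
        local last = result[#result]
        -- Skip the word if it equals the last kept word (duplicate)
        -- or begins with it (head containment).
        if last == nil or string.sub(word, 1, #last) ~= last then
            result[#result + 1] = word
        end
    end
    return result
end

local kept = reduceSorted({ "abc", "xyz", "ab", "abcd", "ab", "b" })
print(table.concat(kept, ","))
```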

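After processing, the file has a full 4000 fewer words, roughly 35 KB less, so loading the word list takes far less space and time. One caveat: Lua strings are plain byte sequences and the io library performs no encoding conversion, so the script compares words byte by byte. My word list was UTF-8, matching what the game uses; a file in another encoding should be converted first.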
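As a small illustration that Lua counts bytes rather than characters, the sketch below builds one Chinese character from its UTF-8 bytes using decimal escapes, so it does not depend on how the source file itself is saved. The variable name `word` is only for illustration, and the `utf8` library is assumed to be present only on Lua 5.3 or later, hence the guard.

```lua
-- "\230\149\143" is the three-byte UTF-8 encoding of the character U+654F.
local word = "\230\149\143"

print(#word)                      -- length in bytes, not in characters
print(string.byte(word, 1, -1))   -- the individual byte values

-- The utf8 library exists only in Lua 5.3 and later, so guard its use.
if utf8 then
    print(utf8.len(word))         -- length in UTF-8 characters
end
```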
