A Lua string-split function [handles mixed Chinese text and special symbols]

Reposted article · 2020-01-02 08:26 · lua

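Lua's standard library has no string-split function. I started with a simple one, then found that it split incorrectly whenever the string or the separator contained Chinese text or special symbols. That is how zsplit came about.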
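A minimal sketch of the two failure modes behind this, assuming the source file is saved as UTF-8 (the variable names are only for illustration). Lua strings are byte sequences, so a Chinese character takes several bytes, and a separator handed to string.find is read as a pattern unless plain matching is requested.

```lua
-- Assumes this file is saved as UTF-8.
local s = "你好.abc"

-- 1. Length and indexing count bytes, not characters:
--    each of the two Chinese characters takes 3 bytes.
print(#s)                       -- 10 bytes for 6 visible characters
local firstByte = s:sub(1, 1)   -- one third of "你", not a valid character
local firstChar = s:sub(1, 3)   -- the whole character "你"

-- 2. A separator is interpreted as a pattern by default:
--    "." matches any byte, so the match is reported at position 1.
local asPattern = s:find(".")
-- With plain matching (4th argument true) the literal "." is found instead.
local asPlain = s:find(".", 1, true)
print(asPattern, asPlain)
```

A naive splitter that walks the string one byte at a time, or that passes the separator straight into a pattern function, runs into one of these two problems.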

function zsplit(strn, chars)

  -- Escape every Lua pattern magic character in the separator
  -- so that it is matched literally.
  local function stringPatch(str)
    local str_p = str:gsub("[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0")
    return str_p
  end

  -- Byte length of the UTF-8 character whose first byte is curByte.
  -- Returns 0 for a continuation byte (128-191) or an invalid lead byte.
  local function charWidth(curByte)
    if curByte <= 127 then
      return 1
    elseif curByte >= 192 and curByte <= 223 then
      return 2
    elseif curByte >= 224 and curByte <= 239 then
      return 3
    elseif curByte >= 240 and curByte <= 247 then
      return 4
    end
    return 0
  end

  -- The second argument is optional; by default split into single characters.
  if not chars then
    chars = ''
  end
  -- Raise an error when the first argument is missing.
  if not strn then
    error("zsplit error: argument #1 is nil", 2)
  end
  local strSun = {}
  if chars == '' then
    -- Per-character split. Stepping one byte at a time would cut
    -- multi-byte characters such as Chinese apart, so each character
    -- is taken at its full UTF-8 width.
    for i = 1, #strn do
      local byteCount = charWidth(string.byte(strn, i))
      -- byteCount is 0 on continuation bytes, which belong to a
      -- character that has already been emitted.
      if byteCount > 0 then
        table.insert(strSun, string.sub(strn, i, i + byteCount - 1))
      end
    end
  else
    local pattern = stringPatch(chars)
    -- Replace every match with itself ("%0") just to count the separators.
    local ongsubs, gsubs = string.gsub(strn, pattern, "%0")
    -- Debug output.
    print('\nAfter replacement:', ongsubs,
    '\nReplacement count:', gsubs,
    '\nSource string:', strn,
    '\nEscaped pattern:', pattern,
    '\nOriginal separator', chars)
    for i = 0, gsubs do
      local wi = string.find(ongsubs, pattern)
      if wi == nil then
        -- No separator left: the remainder is the last piece.
        table.insert(strSun, ongsubs)
        break
      end
      table.insert(strSun, string.sub(ongsubs, 1, wi - 1))
      -- Skip the separator. string.find works on bytes, so advance
      -- by the separator's length in bytes.
      ongsubs = string.sub(ongsubs, wi + #chars)
    end
  end
  return strSun
end
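As a point of comparison, and not the approach zsplit takes, the escaping step can be avoided entirely: string.find accepts a fourth argument that switches off pattern matching. The sketch below uses a hypothetical helper named plain_split and assumes a non-empty separator.

```lua
-- Hypothetical helper, not part of the original zsplit.
local function plain_split(s, sep)
  assert(sep ~= "", "separator must not be empty")
  local pieces = {}
  local start = 1
  while true do
    -- 4th argument true: treat sep as plain text, not as a pattern.
    local first, last = string.find(s, sep, start, true)
    if not first then
      break
    end
    table.insert(pieces, string.sub(s, start, first - 1))
    start = last + 1
  end
  -- Whatever remains after the last separator is the final piece.
  table.insert(pieces, string.sub(s, start))
  return pieces
end

local parts = plain_split("嗯emm-嗯-嗯emm嗯**", "嗯emm-")
local percent = plain_split("a%b%c", "%")
print(#parts, parts[2], #percent)
```

Because the search is byte-based and the separator is a complete UTF-8 sequence, multi-byte separators work without any special handling.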

Usage:

zsplit(string to split [string], separator [string])
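When the separator is omitted, zsplit splits the string into individual characters. The same idea can be sketched compactly with a pattern that matches one non-continuation byte followed by its continuation bytes (128 to 191). This is only an illustration: utf8_chars is a hypothetical name, and the sketch assumes valid UTF-8 input in a UTF-8 source file.

```lua
-- Hypothetical helper: split a valid UTF-8 string into characters.
local function utf8_chars(s)
  local chars = {}
  -- One lead (non-continuation) byte followed by its continuation bytes.
  for ch in string.gmatch(s, "[^\128-\191][\128-\191]*") do
    table.insert(chars, ch)
  end
  return chars
end

local chars = utf8_chars("a中😳")
print(#chars)  -- 3
```

Each table entry holds one complete character, whether it is 1, 3 or 4 bytes long.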

Let's test it:

str = "中文:你好，哈嘻嘻哈。,英文:abcdefg,emoji表情😳🙄👍🍎🌹,特殊字符:%\\*$'/@|,混合:嗯emm-嗯-嗯emm嗯**";

sstr=zsplit(str,"哈");-- Chinese
for i=1,#sstr do print(sstr[i]);end

sstr=zsplit(str,"de");-- English
for i=1,#sstr do print(sstr[i]);end

sstr=zsplit(str,"\\");-- special symbol
for i=1,#sstr do print(sstr[i]);end

sstr=zsplit(str,"嗯emm-");-- mixed
for i=1,#sstr do print(sstr[i]);end

sstr=zsplit(str,"😳");-- emoji: the output may render incorrectly on some displays, but the result itself is correct
for i=1,#sstr do print(sstr[i]);end

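The output looks like this (originally posted as an image, because cnblogs had earlier mistaken the text version for a broken layout, haha):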

After replacement:	中文:你好，哈嘻嘻哈。,英文:abcdefg,emoji表情😳🙄👍🍎🌹,特殊字符:%\*$'/@|,混合:嗯emm-嗯-嗯emm嗯**	
Replacement count:	2	
Source string:	中文:你好，哈嘻嘻哈。,英文:abcdefg,emoji表情😳🙄👍🍎🌹,特殊字符:%\*$'/@|,混合:嗯emm-嗯-嗯emm嗯**	
Escaped pattern:	哈	
Original separator	哈
中文:你好，
嘻嘻
。,英文:abcdefg,emoji表情😳🙄👍🍎🌹,特殊字符:%\*$'/@|,混合:嗯emm-嗯-嗯emm嗯**
After replacement:	中文:你好，哈嘻嘻哈。,英文:abcdefg,emoji表情😳🙄👍🍎🌹,特殊字符:%\*$'/@|,混合:嗯emm-嗯-嗯emm嗯**	
Replacement count:	1	
Source string:	中文:你好，哈嘻嘻哈。,英文:abcdefg,emoji表情😳🙄👍🍎🌹,特殊字符:%\*$'/@|,混合:嗯emm-嗯-嗯emm嗯**	
Escaped pattern:	de	
Original separator	de
中文:你好，哈嘻嘻哈。,英文:abc
fg,emoji表情😳🙄👍🍎🌹,特殊字符:%\*$'/@|,混合:嗯emm-嗯-嗯emm嗯**
After replacement:	中文:你好，哈嘻嘻哈。,英文:abcdefg,emoji表情😳🙄👍🍎🌹,特殊字符:%\*$'/@|,混合:嗯emm-嗯-嗯emm嗯**	
Replacement count:	1	
Source string:	中文:你好，哈嘻嘻哈。,英文:abcdefg,emoji表情😳🙄👍🍎🌹,特殊字符:%\*$'/@|,混合:嗯emm-嗯-嗯emm嗯**	
Escaped pattern:	\	
Original separator	\
中文:你好，哈嘻嘻哈。,英文:abcdefg,emoji表情😳🙄👍🍎🌹,特殊字符:%
*$'/@|,混合:嗯emm-嗯-嗯emm嗯**
After replacement:	中文:你好，哈嘻嘻哈。,英文:abcdefg,emoji表情😳🙄👍🍎🌹,特殊字符:%\*$'/@|,混合:嗯emm-嗯-嗯emm嗯**	
Replacement count:	1	
Source string:	中文:你好，哈嘻嘻哈。,英文:abcdefg,emoji表情😳🙄👍🍎🌹,特殊字符:%\*$'/@|,混合:嗯emm-嗯-嗯emm嗯**	
Escaped pattern:	嗯emm%-	
Original separator	嗯emm-
中文:你好，哈嘻嘻哈。,英文:abcdefg,emoji表情😳🙄👍🍎🌹,特殊字符:%\*$'/@|,混合:
嗯-嗯emm嗯**
After replacement:	中文:你好，哈嘻嘻哈。,英文:abcdefg,emoji表情😳🙄👍🍎🌹,特殊字符:%\*$'/@|,混合:嗯emm-嗯-嗯emm嗯**	
Replacement count:	1	
Source string:	中文:你好，哈嘻嘻哈。,英文:abcdefg,emoji表情😳🙄👍🍎🌹,特殊字符:%\*$'/@|,混合:嗯emm-嗯-嗯emm嗯**	
Escaped pattern:	😳	
Original separator	😳
中文:你好，哈嘻嘻哈。,英文:abcdefg,emoji表情
🙄👍🍎🌹,特殊字符:%\*$'/@|,混合:嗯emm-嗯-嗯emm嗯**

If you spot a mistake or something that could be optimized, please point it out!
