Reading a Text File in Java and Parsing Fields by Byte Length for Database Import

Reposted article · 2018-05-05 13:55 · Java

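When parsing a file, the fields on each line are usually separated by a designated delimiter such as "|" or ",". In a recent project, however, each line was made up of fields with fixed byte lengths and no separator at all between them, which is less common.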

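The obvious approach is to cut each line with substring, using the specified lengths. In production, however, the files live on a Linux server and are usually GBK-encoded, so the field widths are counted in GBK bytes, and a Chinese character takes two bytes in GBK. substring counts characters, not bytes, so it cannot be used directly.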
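To see why character-based slicing breaks, compare the character count and the GBK byte count of the same string. This is only a minimal sketch: the class and method names are illustrative and not part of the original project, and it assumes the running JDK provides the GBK charset.

```java
import java.nio.charset.Charset;

// Illustrative only: class and method names are not from the original project.
class ByteLengthDemo {
    // Assumes the running JDK ships the GBK charset (full JDK builds normally do).
    static final Charset GBK = Charset.forName("GBK");

    // Number of bytes the string occupies once encoded as GBK.
    static int gbkLength(String s) {
        return s.getBytes(GBK).length;
    }

    public static void main(String[] args) {
        // "\u5F20\u4E09" is a two-character Chinese name, followed by "ab".
        String value = "\u5F20\u4E09ab";
        System.out.println(value.length());   // 4 characters
        System.out.println(gbkLength(value)); // 6 bytes: 2 + 2 + 1 + 1
    }
}
```

A field that is 6 bytes wide holds only 4 characters here, so calling substring(0, 6) on the full line would spill into the next field.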

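So what should be done? Two things:

1. Replace substring with a new method that cuts the string by bytes.

2. Decode the file as GBK when reading it, before parsing.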

Here is the code:

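// Imports needed by this method. LoanBlackList and StringCommonUtil are project classes.
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

    /**
     * Parses the blacklist file.
     *
     * @param filePath
     *            path of the blacklist file
     * @return the list of blacklist entries to insert
     * @throws Exception
     *             if the file cannot be opened or read
     */
    public static List<LoanBlackList> parseCunegFile(String filePath)
            throws Exception {
        // List that collects the parsed entries
        List<LoanBlackList> blackList = new ArrayList<>();
        // try-with-resources closes the stream and the reader even if reading fails;
        // read errors propagate to the caller instead of yielding a partial list
        try (InputStream is = new FileInputStream(filePath);
             BufferedReader br = new BufferedReader(new InputStreamReader(is, Charset.forName("GBK")))) {
            String line;
            while ((line = br.readLine()) != null) {
                // Customer ID number: 19 bytes, byte positions 6-25
                String customerIdNumber = StringCommonUtil.substringByte(line, 6, 19).trim();
                // Blacklist type: 5 bytes, byte positions 167-172
                String blackListType = StringCommonUtil.substringByte(line, 167, 5).trim();
                LoanBlackList lbl = new LoanBlackList();
                // Populate the entry with the parsed values
                lbl.setCustomerIdNumber(customerIdNumber);
                lbl.setBlackListType(blackListType);
                blackList.add(lbl);
            }
        }
        return blackList;
    }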

In the code above, this line decodes the file's input stream as GBK:

BufferedReader br = new BufferedReader(new InputStreamReader(is,Charset.forName("GBK")));
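The effect of the charset argument can be checked without a real file. The sketch below feeds the same GBK bytes through an InputStreamReader twice, once with GBK and once with UTF-8. The class name, method name, and sample text are made up for illustration, and it assumes the JDK provides the GBK charset.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: names and sample data are not from the original project.
class GbkReaderDemo {
    // Reads the first line of the given bytes, decoding them with the given charset.
    static String firstLine(byte[] raw, Charset charset) {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(raw), charset))) {
            return br.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        // A two-character Chinese name followed by ASCII text.
        String original = "\u5F20\u4E09|01";
        byte[] raw = (original + "\n").getBytes(gbk);

        // Decoding with the charset the bytes were written in restores the text.
        System.out.println(original.equals(firstLine(raw, gbk)));                    // true
        // Decoding the same bytes as UTF-8 does not.
        System.out.println(original.equals(firstLine(raw, StandardCharsets.UTF_8))); // false
    }
}
```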

The byte-based substring method is as follows:

    /**
     * Extracts part of a string by byte position, measured in GBK bytes.
     *
     * @param orignal
     *            the string to cut
     * @param start
     *            0-based byte offset at which to start; negative values are treated as 0
     * @param count
     *            number of bytes to take
     * @return the extracted string; orignal itself if it is null or empty or if
     *         count is not positive; null if start lies beyond the end of the string
     * @throws IllegalStateException
     *             if the JVM does not support the GBK encoding
     */
    public static String substringByte(String orignal, int start, int count) {

        // A null or empty string is returned as-is; there is nothing to cut.
        if (orignal == null || orignal.isEmpty())
            return orignal;

        // The number of bytes to take must be greater than 0.
        if (count <= 0)
            return orignal;

        // The start offset cannot be negative.
        if (start < 0)
            start = 0;

        // Buffer that collects the selected characters
        StringBuilder buff = new StringBuilder();

        try {
            // Return null when the start offset lies beyond the string's byte length.
            if (start >= getStringByteLenths(orignal))
                return null;

            // Byte offset at which the current character begins
            int pos = 0;
            for (int i = 0; i < orignal.length(); i++) {
                char c = orignal.charAt(i);
                int end = pos + String.valueOf(c).getBytes("GBK").length;

                // Stop once a character would run past the requested byte range.
                if (end > start + count)
                    break;
                // Keep only characters that begin at or after the start offset.
                if (pos >= start)
                    buff.append(c);
                pos = end;
            }
        } catch (UnsupportedEncodingException e) {
            // Fail loudly rather than return a partial result.
            throw new IllegalStateException("GBK encoding is not supported", e);
        }
        return buff.toString();
    }

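    /**
     * Returns the number of bytes the string occupies when encoded as GBK.
     *
     * @param args
     *            the string to measure
     * @return the byte length of the string, or 0 if args is null or empty
     * @throws UnsupportedEncodingException
     *             if the JVM does not support the GBK encoding
     */
    public static int getStringByteLenths(String args)
            throws UnsupportedEncodingException {
        // isEmpty() checks content; comparing strings with != only compares references.
        return args != null && !args.isEmpty() ? args.getBytes("GBK").length : 0;
    }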
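A different way to get the same kind of result, shown here only as a sketch and not as the method used in this article, is to encode the line back to GBK once and decode just the byte range of the field. FixedWidthSlicer, sliceBytes, and the record layout below are hypothetical. The sketch assumes that field boundaries never fall in the middle of a double-byte character and that every character in the line can be represented in GBK.

```java
import java.nio.charset.Charset;

// Hypothetical helper; not part of the original project.
class FixedWidthSlicer {
    // Assumes the running JDK ships the GBK charset.
    static final Charset GBK = Charset.forName("GBK");

    // Returns the text stored in `count` bytes starting at 0-based byte offset `start`.
    // Returns an empty string when the arguments select nothing.
    static String sliceBytes(String line, int start, int count) {
        byte[] bytes = line.getBytes(GBK);
        if (start < 0 || count <= 0 || start >= bytes.length) {
            return "";
        }
        // Clamp the length so a short line does not cause an out-of-bounds error.
        int length = Math.min(count, bytes.length - start);
        return new String(bytes, start, length, GBK);
    }

    public static void main(String[] args) {
        // Made-up layout: 6-byte sequence number, 8-byte name, 2-byte type code.
        // The name is two Chinese characters (4 bytes) padded with 4 spaces.
        String line = "000001\u5F20\u4E09    01";
        System.out.println(sliceBytes(line, 0, 6));                                // 000001
        System.out.println(sliceBytes(line, 6, 8).trim().equals("\u5F20\u4E09")); // true
        System.out.println(sliceBytes(line, 14, 2));                               // 01
    }
}
```

Encoding the whole line once avoids encoding each character separately, at the cost of allocating a byte array per line. If a boundary did split a double-byte character, the decoded field would contain a replacement character instead of silently dropping it.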
