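I've been writing a novel reader for mobile lately, and decided to start a series of blog posts to record the pitfalls I ran into along the way.
First up: a novel reader obviously needs smart chapter splitting. Without further ado, here's the code.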

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class NovelParser {
    public static final int MAX_PARSE_NUMBER = 200;

    private String path;
    private String charset;
    private String name;
    private List<TitleInfo> titleList;

    public NovelParser(String path, String charset) {
        this.path = path;
        this.charset = charset;
        titleList = new ArrayList<>();
        // The book name is the file name without its directory and extension.
        int index = path.lastIndexOf("\\");
        int dot = path.lastIndexOf(".");
        name = index == -1 ? path : path.substring(index + 1, dot > index ? dot : path.length());
    }

    // Chapters found by parseTitleInfo().
    public List<TitleInfo> getTitleList() {
        return titleList;
    }

    // Execute only once.
    public void parseTitleInfo() {
        long time = System.currentTimeMillis();
        int count = 0;
        // try-with-resources closes all three streams, even when an exception is thrown.
        try (FileInputStream inputStream = new FileInputStream(path);
             InputStreamReader inputStreamReader = new InputStreamReader(inputStream, charset);
             BufferedReader reader = new BufferedReader(inputStreamReader)) {
            String rawLine;
            // Some TXT files repeat the title at the start of a chapter, which would split
            // one chapter into two. So two chapter titles must be at least 5 lines apart.
            int number = 5;
            // TXT files usually open with some introductory text that must not be counted
            // as part of chapter one, so it gets a separate chapter entry of its own.
            TitleInfo titleInfo = new TitleInfo();
            titleInfo.setTitle(name);
            titleInfo.setIndex(0);
            titleInfo.setStartLength(0);
            titleList.add(titleInfo);
            System.out.println("Book start chapter : " + titleInfo.toString());
            StringBuilder builder = new StringBuilder();
            int parseLength = 0;
            while ((rawLine = reader.readLine()) != null) {
                // Match against the trimmed line, but count the raw line's bytes:
                // trimmed-off whitespace still occupies space in the file.
                String line = rawLine.trim();
                if (line.equals("")) {
                    // +2 for the line break (assumes \r\n line endings)
                    parseLength += rawLine.getBytes(charset).length + 2;
                    continue;
                }
                if (line.length() < 4) {
                    if (number >= 5 && TitleMatches.isExtra(line)) { // extra chapter
                        count++;
                        parseLength += builder.toString().getBytes(charset).length;
                        builder.delete(0, builder.length());
                        titleInfo = new TitleInfo(count, line, parseLength);
                        titleList.add(titleInfo);
                        number = 0;
                        System.out.println("Extra chapter detected " + titleInfo.toString());
                    }
                } else {
                    if (number >= 5 && TitleMatches.isZhang(line)) { // regular chapter
                        count++;
                        parseLength += builder.toString().getBytes(charset).length;
                        builder.delete(0, builder.length());
                        titleInfo = new TitleInfo(count, line, parseLength);
                        titleList.add(titleInfo);
                        number = 0;
                        System.out.println("New chapter detected " + titleInfo.toString());
                    }
                }
                builder.append(rawLine);
                parseLength += 2;
                number++;
                if (number >= MAX_PARSE_NUMBER) {
                    // If a file never matches a new chapter, the StringBuilder would keep
                    // growing until Android runs out of memory, so its size is capped:
                    // once this many lines have been parsed, the StringBuilder is cleared
                    // and parseLength updated even if no new chapter was matched.
                    // Note: this value affects parsing time, so set it with care!
                    parseLength += builder.toString().getBytes(charset).length;
                    builder.delete(0, builder.length());
                    number = 5;
                }
            }
        } catch (IOException e) {
            // Also covers FileNotFoundException and UnsupportedEncodingException.
            e.printStackTrace();
        }
        System.out.println("Done, took : " + (System.currentTimeMillis() - time)
                + " ms, detected " + titleList.size() + " chapters");
    }
}
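NovelParser is the main worker class. The principle is simple: read the text file line by line with a BufferedReader, then use a regex to decide whether each line starts a new chapter. The key parts are commented. Next is the TitleInfo class, which stores the chapter information: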

public class TitleInfo {
    private int index;        // chapter index
    private String title;     // chapter title
    // Byte offset where the chapter starts; used with RandomAccessFile for
    // chapter jumps and single-chapter parsing. Stored as long to match the
    // getter and RandomAccessFile.seek(long).
    private long startLength;

    public TitleInfo() {
    }

    public TitleInfo(int index, String title, long startLength) {
        this.index = index;
        this.title = title;
        this.startLength = startLength;
    }

    public int getIndex() {
        return index;
    }

    public String getTitle() {
        return title;
    }

    public long getStartLength() {
        return startLength;
    }

    public void setIndex(int index) {
        this.index = index;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public void setStartLength(long startLength) {
        this.startLength = startLength;
    }

    @Override
    public String toString() {
        return "[index = " + index + ",title = " + title + ",startLength = " + startLength + "]";
    }
}
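Since startLength is a byte offset, it depends on both the file's charset and its line endings. Here is a minimal sketch of that arithmetic. The class OffsetDemo and the method lineBytes are names I made up for illustration; they are not part of the reader. It uses UTF-8 and UTF-16LE only because every JVM ships them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class OffsetDemo {
    // Bytes one line occupies on disk: the encoded text plus its line terminator.
    static int lineBytes(String line, Charset charset, String terminator) {
        return line.getBytes(charset).length + terminator.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String title = "第一章";
        // UTF-8: 3 bytes per character here, 1 byte each for \r and \n.
        System.out.println(lineBytes(title, StandardCharsets.UTF_8, "\r\n"));    // 11
        // UTF-16LE: 2 bytes per character, including \r and \n.
        System.out.println(lineBytes(title, StandardCharsets.UTF_16LE, "\r\n")); // 10
    }
}
```

The fixed +2 per line in NovelParser therefore only holds for files with \r\n line endings in an encoding where \r and \n take one byte each, which is the case for GBK. A file with bare \n endings would need +1 instead.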
And the TitleMatches class, which is used to match new chapters:

import java.util.regex.Pattern;

/**
 * Decides whether a line is the start of a new chapter.
 * Rules:
 * 1. A regular chapter must start with "第", contain at least one element of the
 *    keyword array key, and the text between "第" and that keyword must match regex p.
 * 2. An extra chapter is a single line of at most 3 characters (after trimming)
 *    that satisfies at least one of the following:
 *    a. its first or second character is "序" (e.g. 序, 序言, 序章, 魔序) and it is
 *       at most 2 characters long
 *    b. it starts with any entry of the extra_key_start array (e.g. 前言, 附錄1, 后記1)
 */
public class TitleMatches {
    // Chapter keywords, tried in this order.
    public static final String[] key = {"部", "卷", "章", "節", "集", "回", "幕", "計"};
    public static final Pattern p = Pattern.compile("^[0-9零一二三四五六七八九十百千]+$");

    public static boolean isZhang(String line) {
        if (!line.startsWith("第")) {
            return false;
        }
        // Try every keyword rather than stopping at the first one found: a keyword
        // that appears later in the title (e.g. "部" in "第一章 ...部...") must not
        // hide the real match.
        for (String k : key) {
            int index = line.indexOf(k);
            if (index != -1 && p.matcher(line.substring(1, index)).matches()) {
                return true;
            }
        }
        return false;
    }

    public static final String[] extra_key = {"序"};
    public static final String[] extra_key_start = {"前言", "后記", "楔子", "附錄", "外傳"};

    public static boolean isExtra(String line) {
        if (line.length() > 3) {
            return false;
        }
        int index = line.indexOf(extra_key[0]);
        if (index != -1) {
            return (index == 0 || index == 1) && line.length() <= 2;
        }
        for (String start : extra_key_start) {
            if (line.startsWith(start)) {
                return true;
            }
        }
        return false;
    }
}
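For comparison, the regular-chapter rule can also be written as one regex. This is only a sketch of an alternative, not what the reader uses: TitleRegexSketch and looksLikeChapter are hypothetical names, and I have not run it against the test novel.

```java
import java.util.regex.Pattern;

// Hypothetical single-regex variant of the regular-chapter rule, for illustration only.
class TitleRegexSketch {
    // "第", then one or more digits or Chinese numerals, then one chapter keyword.
    static final Pattern CHAPTER =
            Pattern.compile("^第[0-9零一二三四五六七八九十百千]+[部卷章節集回幕計]");

    static boolean looksLikeChapter(String line) {
        // trim() only strips ASCII whitespace and control characters.
        return CHAPTER.matcher(line.trim()).find();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeChapter("第一章 開端"));        // true
        System.out.println(looksLikeChapter("第12卷"));             // true
        System.out.println(looksLikeChapter("第一次見到她的時候")); // false
    }
}
```

The last line shows why the numeral check matters: an ordinary sentence that happens to start with "第" is rejected because the character after the numeral is not a chapter keyword.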
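Once these classes are in place, everything is ready; the points to watch out for are commented in the code. Below is the test code. The novel used for testing is 希靈帝國, 15.7 MB in size.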

public static void main(String[] args) {
    NovelParser parser = new NovelParser("D:\\Users\\Excalibur\\Desktop\\希靈帝國.txt", "GBK");
    parser.parseTitleInfo();
}
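Here is a screenshot of the run:
And here it is with the output statements removed:
With that, smart chapter splitting is done: you can jump freely between chapters without the reader lagging.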
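To show how the recorded offsets might be used for a jump, here is a rough sketch that reads a single chapter with RandomAccessFile. ChapterReader and readChapter are hypothetical names. I am assuming start comes from a chapter's TitleInfo.getStartLength(), and end is the next chapter's start offset (or the file length for the last chapter).

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;

// Hypothetical helper, for illustration only.
class ChapterReader {
    // Reads the bytes in [start, end) and decodes them with the given charset.
    // Assumes a single chapter is far smaller than 2 GB, so the cast to int is safe.
    static String readChapter(String path, long start, long end, Charset charset)
            throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            byte[] buffer = new byte[(int) (end - start)];
            file.seek(start);       // jump straight to the chapter's first byte
            file.readFully(buffer); // read exactly end - start bytes
            return new String(buffer, charset);
        }
    }
}
```

Only one chapter is held in memory at a time, so the cost of a jump should depend on the chapter's size rather than on the size of the whole file.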