Writing XML in UTF-8 with XMLWriter

Reposted article. 2016-03-15 16:39. Tags: Java / XML

XMLWriter in dom4j provides the following constructors:

XMLWriter() 
XMLWriter(OutputFormat format) 
XMLWriter(OutputStream out) 
XMLWriter(OutputStream out, OutputFormat format) 
XMLWriter(Writer writer) 
XMLWriter(Writer writer, OutputFormat format)

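The simplest and probably most common form is new XMLWriter(new FileWriter(...)). Used this way, however, it causes an encoding problem. dom4j leaves the choice of file encoding to the underlying Java class (as the source code shows), so here the encoding is whatever FileWriter uses, and a FileWriter created this way has no encoding parameter: it falls back to the platform default encoding. On a Chinese-language operating system, for example, that is GBK. XMLWriter nevertheless writes <?xml version="1.0" encoding="UTF-8"?> as the file header by default.
In other words, the file is saved in the operating system's current encoding while the header automatically declares UTF-8. Many programs then fail to read the file correctly, and portability is poor (for example, when developing on Windows and deploying to a Linux server, where the default locale is usually UTF-8).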
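
To make the mismatch concrete, here is a minimal sketch that uses only the JDK (no dom4j involved). It assumes GBK stands in for the platform default encoding and falls back to UTF-16 if the JDK at hand does not ship GBK; the class and method names are made up for illustration. The text is encoded the way a Writer bound to that charset would encode it, then read back as UTF-8, as a consumer trusting the XML declaration would.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    // Assumption: GBK plays the role of a non-UTF-8 platform default.
    static Charset legacyCharset() {
        return Charset.isSupported("GBK") ? Charset.forName("GBK") : StandardCharsets.UTF_16;
    }

    // Encodes the text the way a Writer bound to the given charset would.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // True if the bytes, decoded as UTF-8, give back exactly the original text.
    static boolean readsBackAsUtf8(String text, byte[] encoded) {
        return text.equals(new String(encoded, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // \u4e2d\u6587 is the two-character Chinese string used in the article's test.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<root StartPolicy=\"1\u4e2d\u6587\"/>";
        System.out.println(readsBackAsUtf8(xml, encode(xml, legacyCharset())));        // false
        System.out.println(readsBackAsUtf8(xml, encode(xml, StandardCharsets.UTF_8))); // true
    }
}
```

Only the bytes produced with UTF-8 survive being read as UTF-8; the header text alone changes nothing about how the content was encoded.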

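Solution 1:

Use new XMLWriter(new FileOutputStream(...))

This works because dom4j defaults to UTF-8: that is the encoding declared in the XML header by default, and when given an OutputStream, XMLWriter also encodes the content as UTF-8. The declared and actual encodings agree, so no problem arises.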

Solution 2:

Use the new XMLWriter(new FileOutputStream(...), outputFormat) constructor

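// createPrettyPrint() returns a format that indents the output for readability.
OutputFormat xmlFormat = OutputFormat.createPrettyPrint();
// Encoding used both in the XML declaration and for the bytes written to the stream.
xmlFormat.setEncoding("utf-8");
XMLWriter writer = new XMLWriter(new FileOutputStream(...), xmlFormat);
writer.write(document);
writer.close();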

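As shown above, setEncoding sets the encoding in which the file is stored, and createPrettyPrint returns a format that pretty-prints the XML output. The file is then written and read with the same encoding in every environment, and the declared encoding is guaranteed to match the actual one.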
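
One way to check that the declared and actual encodings agree is to parse the result with a parser that trusts the declaration. The sketch below uses only the JDK: it writes a one-element document through an OutputStreamWriter bound to the same charset that the header names, then reads the attribute back with the JDK's built-in DOM parser. The class and method names are hypothetical, and the attribute value is inserted without XML escaping, which is acceptable only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class DeclaredEncodingCheck {
    // Writes the header and the content with one and the same charset.
    static byte[] writeXml(String value, String encoding) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, Charset.forName(encoding))) {
            writer.write("<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>\n");
            writer.write("<root StartPolicy=\"" + value + "\"/>");
        }
        return bytes.toByteArray();
    }

    // Parses raw bytes; the parser takes the charset from the XML declaration.
    static String readStartPolicy(byte[] xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        return doc.getDocumentElement().getAttribute("StartPolicy");
    }

    // True if the value survives a write/parse round trip; false on any failure.
    static boolean roundTrips(String value, String encoding) {
        try {
            return value.equals(readStartPolicy(writeXml(value, encoding)));
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // \u4e2d\u6587 is the Chinese text used in the article's test data.
        System.out.println(roundTrips("1\u4e2d\u6587", "UTF-8")); // true
    }
}
```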

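Note: if the purpose of OutputFormat is to set the file encoding, never use the XMLWriter(new FileWriter(...), outputFormat) constructor. As explained above, FileWriter does not handle the encoding,
so even after format.setEncoding("utf-8") the content is still not written as UTF-8; only the header declares UTF-8, just as when no OutputFormat is used.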
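
If a Writer really is required, the charset has to be fixed on the writer itself, because that is where characters become bytes. Here is a minimal sketch, assuming a temporary file and a hypothetical helper name: Files.newBufferedWriter takes the charset explicitly, so the result does not depend on the platform default.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class ExplicitCharsetWriterDemo {
    // Writes the text to a temporary file through a UTF-8 writer and returns the raw bytes.
    static byte[] writeAndReadBack(String text) {
        try {
            Path file = Files.createTempFile("explicit-charset", ".xml");
            try {
                // The charset is chosen here, at construction, not by any later setting.
                try (Writer writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                    writer.write(text);
                }
                return Files.readAllBytes(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "<root StartPolicy=\"1\u4e2d\u6587\"/>";
        byte[] expected = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(expected, writeAndReadBack(text))); // true
    }
}
```

A writer built this way could then be handed to the XMLWriter(Writer writer, OutputFormat format) constructor listed earlier, with the format's encoding set to the same value so that the header and the bytes agree.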

The following is my own code from practice:

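    // Requires: java.io.FileOutputStream, java.io.IOException, java.io.OutputStream,
    // org.dom4j.Document, org.dom4j.io.OutputFormat, org.dom4j.io.XMLWriter

    /**
     * Writes a document to an XML file encoded as UTF-8.
     *
     * @param document the dom4j document to write
     * @param filePath path of the output file; an existing file is overwritten
     * @throws IOException if the file cannot be created or written
     */
    public static void writeXml(Document document, String filePath) throws IOException {
        // createPrettyPrint() already defaults to UTF-8; it is set explicitly for clarity.
        OutputFormat format = OutputFormat.createPrettyPrint();
        format.setEncoding("UTF-8");
        // FileOutputStream truncates an existing file, so no separate delete() is needed.
        // try-with-resources closes the stream exactly once, even if writing fails.
        try (OutputStream out = new FileOutputStream(filePath)) {
            XMLWriter writer = new XMLWriter(out, format);
            writer.write(document);
            writer.flush();
        }
    }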

    @Test
    public void testXMLDoc() {
        try {
            String filePath = "E:/eXML.xml";
            // XMLUtil is the author's own helper class; only writeXml is shown above.
            Document document = XMLUtil.createDocument("vrvscript", "Class", "POLICY_BASE_LINE");
            Element root = document.getRootElement();
            root.addAttribute("P_ID", "12");
            root.addAttribute("StartPolicy", "1中文");
            root.addAttribute("PolicyVersion", "1.0");
            root.addAttribute("ScheduleMode", "6");
            root.addAttribute("ScheduleTime", "1:1:1");
            root.addAttribute("RuleHandle", "2");
            XMLUtil.writeXml(document, filePath);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

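Test result: when the written content contains Chinese characters, the generated XML file shows up as UTF-8; when it contains only ANSI (ASCII) characters, as in the variant below, the generated file shows up as ANSI. This is expected rather than a bug: ASCII characters have the same byte values in UTF-8 and in ANSI code pages, and no byte order mark is written, so an ASCII-only file is valid in both encodings and nothing in its bytes tells them apart.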

    @Test
    public void testXMLDoc() {
        try {
            String filePath = "E:/eXML.xml";
            // XMLUtil is the author's own helper class; only writeXml is shown above.
            Document document = XMLUtil.createDocument("vrvscript", "Class", "POLICY_BASE_LINE");
            Element root = document.getRootElement();
            root.addAttribute("P_ID", "12");
            root.addAttribute("StartPolicy", "1");
            root.addAttribute("PolicyVersion", "1.0");
            root.addAttribute("ScheduleMode", "6");
            root.addAttribute("ScheduleTime", "1:1:1");
            root.addAttribute("RuleHandle", "2");
            XMLUtil.writeXml(document, filePath);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
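
The difference between the two test runs can be reproduced without dom4j. In the sketch below, US-ASCII is an assumed stand-in for the ASCII range shared by ANSI code pages, and the class and method names are hypothetical. It compares the bytes that the same text produces in UTF-8 and in US-ASCII.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiSubsetDemo {
    // True if the text encodes to identical bytes in UTF-8 and in US-ASCII.
    static boolean sameBytesInAsciiAndUtf8(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(utf8, ascii);
    }

    public static void main(String[] args) {
        // ASCII-only attribute value, as in the second test.
        System.out.println(sameBytesInAsciiAndUtf8("StartPolicy=\"1\""));             // true
        // Value with Chinese characters (\u4e2d\u6587), as in the first test.
        System.out.println(sameBytesInAsciiAndUtf8("StartPolicy=\"1\u4e2d\u6587\"")); // false
    }
}
```

When the bytes are identical, a tool that guesses the encoding from the file content has no basis for choosing UTF-8 over ANSI, which matches the observation above.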
