The default encoding used for strings when PrintWriter decorates FileWriter

This article is a repost, dated 2020-02-13 21:19.

import java.io.*;

public class test2 {
    public static void main(String[] args) throws IOException {
        PrintWriter out = new PrintWriter(
                new BufferedWriter(new FileWriter("BasicFileOutput.out")));
        out.println("我是第一行");
        out.close();
        // Show the stored file (try-with-resources closes the reader afterwards):
        try (BufferedReader in = new BufferedReader(new FileReader("BasicFileOutput.out"))) {
            System.out.println(in.readLine());
        }
    }
}

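The program above writes to a file. Writers (like Readers) operate on characters, but what is finally stored in the file is bytes, so every character has to be encoded into one or more bytes.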
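To make the character-to-byte step concrete, here is a small sketch (the class name CharsetByteCounts is made up for illustration) that encodes the line used above with two standard charsets. The same five characters turn into a different number of bytes depending on the charset chosen.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetByteCounts {
    // Number of bytes the text occupies once encoded with the given charset.
    public static int encodedLength(String text, Charset cs) {
        return text.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String line = "我是第一行"; // 5 chars, all in the Basic Multilingual Plane
        System.out.println("chars:    " + line.length());                                // 5
        System.out.println("UTF-8:    " + encodedLength(line, StandardCharsets.UTF_8));    // 15 (3 bytes per char here)
        System.out.println("UTF-16BE: " + encodedLength(line, StandardCharsets.UTF_16BE)); // 10 (2 bytes per char)
    }
}
```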

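The java.io library is built on the decorator pattern, and PrintWriter and BufferedWriter are both decorator classes whose job is to add functionality.
Ctrl+clicking out.println to trace the source shows that each decorator ultimately calls the write method of a Writer field it holds. The chain is: the PrintWriter calls the BufferedWriter's write, and the BufferedWriter calls the FileWriter's write. So the implementation that matters is FileWriter's write.
FileWriter's source, however, has no write method at all: write is already implemented in its superclass OutputStreamWriter, which simply calls write on its StreamEncoder field se.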
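The delegation chain can also be observed without reading JDK source. The sketch below puts a made-up decorator, CountingWriter (a FilterWriter subclass written for this demo, not a JDK class), in the innermost position, with a StringWriter standing in for the file, and counts how many chars the outer decorators hand down to it.

```java
import java.io.*;

public class DecoratorChainDemo {
    // Hypothetical decorator (not part of the JDK): forwards every write to the
    // wrapped Writer and counts how many chars pass through it.
    static class CountingWriter extends FilterWriter {
        int count = 0;

        CountingWriter(Writer out) {
            super(out);
        }

        @Override
        public void write(int c) throws IOException {
            count++;
            super.write(c);
        }

        @Override
        public void write(char[] cbuf, int off, int len) throws IOException {
            count += len;
            super.write(cbuf, off, len);
        }

        @Override
        public void write(String str, int off, int len) throws IOException {
            count += len;
            super.write(str, off, len);
        }
    }

    // Builds PrintWriter -> BufferedWriter -> CountingWriter -> StringWriter,
    // prints one line, and returns how many chars reached CountingWriter.
    public static int charsReachingInnerWriter(String line) {
        StringWriter sink = new StringWriter(); // stands in for the file
        CountingWriter counter = new CountingWriter(sink);
        PrintWriter out = new PrintWriter(new BufferedWriter(counter));
        out.println(line);
        out.close(); // BufferedWriter flushes its buffer down to counter on close
        return counter.count;
    }

    public static void main(String[] args) {
        // The text plus the platform line separator
        System.out.println(charsReachingInnerWriter("我是第一行"));
    }
}
```

Because BufferedWriter buffers, the chars only reach the inner writer when the buffer is flushed, which here happens on close.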

//OutputStreamWriter
    private final StreamEncoder se;

    public void write(char cbuf[], int off, int len) throws IOException {
        se.write(cbuf, off, len);
    }
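Since the encoding happens inside OutputStreamWriter, its output can be captured in memory and compared with String.getBytes for the same charset. This is a sketch with a made-up class name (OutputStreamWriterBytes), using a ByteArrayOutputStream in place of the FileOutputStream that FileWriter would supply.

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class OutputStreamWriterBytes {
    // Writes text through an OutputStreamWriter and returns the raw bytes it produced.
    public static byte[] bytesWrittenBy(String text, Charset cs) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(sink, cs)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String s = "我是第一行";
        for (Charset cs : new Charset[] {StandardCharsets.UTF_8, StandardCharsets.UTF_16BE}) {
            byte[] viaWriter = bytesWrittenBy(s, cs);
            // The writer's output should match String.getBytes for the same charset.
            System.out.println(cs.name() + ": " + viaWriter.length + " bytes, same as getBytes: "
                    + Arrays.equals(viaWriter, s.getBytes(cs)));
        }
    }
}
```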

Next, look at StreamEncoder's write implementation:

//StreamEncoder
    public void write(char cbuf[], int off, int len) throws IOException {
        synchronized (lock) {
            ensureOpen();
            if ((off < 0) || (off > cbuf.length) || (len < 0) ||
                ((off + len) > cbuf.length) || ((off + len) < 0)) {
                throw new IndexOutOfBoundsException();
            } else if (len == 0) {
                return;
            }
            implWrite(cbuf, off, len); // calls the method below
        }
    }

    void implWrite(char cbuf[], int off, int len)
        throws IOException
    {
        CharBuffer cb = CharBuffer.wrap(cbuf, off, len);

        if (haveLeftoverChar)
            flushLeftoverChar(cb, false);

        while (cb.hasRemaining()) {
            CoderResult cr = encoder.encode(cb, bb, false); // key line: calls encode on the encoder field; set a breakpoint here
            if (cr.isUnderflow()) {
                assert (cb.remaining() <= 1) : cb.remaining();
                if (cb.remaining() == 1) {
                    haveLeftoverChar = true;
                    leftoverChar = cb.get();
                }
                break;
            }
            if (cr.isOverflow()) {
                assert bb.position() > 0;
                writeBytes();
                continue;
            }
            cr.throwException();
        }
    }
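The loop in implWrite is the standard CharsetEncoder pattern: encode until UNDERFLOW (input consumed), drain the byte buffer on OVERFLOW, and throw on any other result. Below is a minimal sketch of the same pattern using only the public API, with a deliberately small buffer so the overflow branch actually runs. The class and method names are made up. One assumption differs from StreamEncoder: it passes endOfInput = false and keeps a leftover char (a lone high surrogate at the end of a chunk) for the next write, whereas this sketch passes true because the whole text is available at once, and then calls flush as the CharsetEncoder contract requires.

```java
import java.io.ByteArrayOutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.*;

public class EncodeLoopDemo {
    // Encodes text with a small ByteBuffer so the encoder keeps reporting
    // OVERFLOW, mirroring the "drain and continue" branch of implWrite.
    public static byte[] encode(String text, Charset cs, int bufSize) {
        CharsetEncoder encoder = cs.newEncoder(); // default error action: REPORT
        CharBuffer cb = CharBuffer.wrap(text);
        ByteBuffer bb = ByteBuffer.allocate(bufSize);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            while (true) {
                // endOfInput = true: the whole text is available in one go
                CoderResult cr = encoder.encode(cb, bb, true);
                if (cr.isUnderflow()) {
                    break;                      // all input consumed
                }
                if (cr.isOverflow()) {
                    if (bb.position() == 0) {   // not even one char fits
                        throw new IllegalArgumentException("bufSize too small");
                    }
                    drain(bb, out);             // plays the role of writeBytes()
                    continue;
                }
                cr.throwException();            // malformed or unmappable input
            }
            while (encoder.flush(bb).isOverflow()) {
                drain(bb, out);
            }
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
        drain(bb, out);
        return out.toByteArray();
    }

    // Moves the bytes produced so far from bb into out, then empties bb.
    private static void drain(ByteBuffer bb, ByteArrayOutputStream out) {
        bb.flip();
        out.write(bb.array(), bb.arrayOffset() + bb.position(), bb.remaining());
        bb.clear();
    }

    public static void main(String[] args) {
        byte[] bytes = encode("我是第一行", StandardCharsets.UTF_8, 4);
        System.out.println(bytes.length + " bytes"); // 15 bytes
    }
}
```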

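After setting a breakpoint on CoderResult cr = encoder.encode(cb, bb, false), the debugger shows that encoder is a UTF-8 encoder. The analysis could stop here, but that still does not explain where UTF-8 came from, so let's keep going.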
Since encoder is a field of StreamEncoder, check whether its constructor assigns it:

//StreamEncoder
    private StreamEncoder(OutputStream out, Object lock, CharsetEncoder enc) { // set a breakpoint here
        super(lock);
        this.out = out;
        this.ch = null;
        this.cs = enc.charset();
        this.encoder = enc;

        // This path disabled until direct buffers are faster
        if (false && out instanceof FileOutputStream) {
            ch = ((FileOutputStream)out).getChannel();
            if (ch != null)
                bb = ByteBuffer.allocateDirect(DEFAULT_BYTE_BUFFER_SIZE);
        }
        if (ch == null) {
            bb = ByteBuffer.allocate(DEFAULT_BYTE_BUFFER_SIZE);
        }
    }
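This constructor receives a ready-made CharsetEncoder, so the error-handling policy is decided by whoever builds it. In the JDK 8 sources, forOutputStreamWriter reaches it through a StreamEncoder(OutputStream, Object, Charset) overload that creates the encoder with cs.newEncoder() and sets CodingErrorAction.REPLACE for malformed and unmappable input; treat that as a detail to verify against your own JDK. The observable consequence is easy to check, though: a bare newEncoder() reports errors, while OutputStreamWriter silently substitutes the charset's replacement byte. The sketch below (made-up class name, US-ASCII chosen only because it cannot represent 我) compares the three cases.

```java
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.*;

public class EncoderErrorActions {
    // A fresh encoder uses REPORT, so an unmappable char makes encode() throw.
    public static boolean strictAsciiFails(String text) {
        CharsetEncoder strict = StandardCharsets.US_ASCII.newEncoder();
        try {
            strict.encode(CharBuffer.wrap(text));
            return false;
        } catch (CharacterCodingException e) {
            return true; // e.g. UnmappableCharacterException for non-ASCII chars
        }
    }

    // The same encoder configured with REPLACE substitutes the replacement byte.
    public static byte[] lenientAscii(String text) {
        CharsetEncoder lenient = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            ByteBuffer bb = lenient.encode(CharBuffer.wrap(text));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e); // not expected with REPLACE
        }
    }

    // What an OutputStreamWriter actually writes for the same text.
    public static byte[] viaOutputStreamWriter(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(sink, StandardCharsets.US_ASCII)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String s = "A我";
        System.out.println(strictAsciiFails(s));                                              // true
        System.out.println(new String(lenientAscii(s), StandardCharsets.US_ASCII));          // A?
        System.out.println(new String(viaOutputStreamWriter(s), StandardCharsets.US_ASCII)); // A?
    }
}
```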

The constructor does assign it, so go back to OutputStreamWriter and see how its StreamEncoder field se is created:

//FileWriter
    // The FileWriter constructor overload used by the program in this article
    public FileWriter(String fileName) throws IOException {
        super(new FileOutputStream(fileName)); // calls the constructor of FileWriter's superclass, OutputStreamWriter
    }
//OutputStreamWriter
    // Following from the above, this OutputStreamWriter constructor overload is the one called
    public OutputStreamWriter(OutputStream out) {
        super(out);
        try {
            se = StreamEncoder.forOutputStreamWriter(out, this, (String)null); // key line: se is assigned here, with a null charset name
        } catch (UnsupportedEncodingException e) {
            throw new Error(e);
        }
    }
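The null charset name is exactly what makes the article's program depend on the platform default. A minimal sketch of the same decorator stack with the charset passed explicitly follows; the class name is made up, and UTF-8 is an arbitrary choice that the reading side simply has to match.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetWrite {
    // Same PrintWriter -> BufferedWriter -> OutputStreamWriter stack as the article's
    // program, but the charset is passed explicitly instead of coming from file.encoding.
    public static String writeThenRead(String fileName, String line) {
        try {
            PrintWriter out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(
                    new FileOutputStream(fileName), StandardCharsets.UTF_8)));
            out.println(line);
            out.close();
            if (out.checkError()) { // PrintWriter swallows IOExceptions; this reports them
                throw new IOException("writing " + fileName + " failed");
            }
            // Decode with the same explicit charset when reading back.
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    new FileInputStream(fileName), StandardCharsets.UTF_8))) {
                return in.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("BasicFileOutput", ".out"); // temp file instead of a fixed path
        tmp.deleteOnExit();
        System.out.println(writeThenRead(tmp.getPath(), "我是第一行"));
    }
}
```

On Java 11 and later, FileWriter itself also has charset-taking constructors, e.g. new FileWriter(fileName, StandardCharsets.UTF_8), which avoids the explicit OutputStreamWriter.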

Following the call into StreamEncoder.forOutputStreamWriter:

//StreamEncoder
    public static StreamEncoder forOutputStreamWriter(OutputStream out,
                                                      Object lock,
                                                      String charsetName) // per the above, this argument is null
        throws UnsupportedEncodingException
    {
        String csn = charsetName;
        if (csn == null) // so this branch is taken
            csn = Charset.defaultCharset().name();
        try {
            if (Charset.isSupported(csn))
                return new StreamEncoder(out, lock, Charset.forName(csn));
        } catch (IllegalCharsetNameException x) { }
        throw new UnsupportedEncodingException (csn);
    }
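When a charset name is given instead of null, the same method checks Charset.isSupported and throws UnsupportedEncodingException otherwise. A small sketch (made-up class name; "no-such-charset" is a deliberately unknown but syntactically legal name) probes that path through the String-named OutputStreamWriter constructor. Alias lookup is case-insensitive, so "utf8" resolves to UTF-8.

```java
import java.io.*;
import java.nio.charset.Charset;

public class CharsetNameLookup {
    // Uses the String-named OutputStreamWriter constructor, which ends up in
    // StreamEncoder.forOutputStreamWriter with a non-null charsetName.
    public static boolean canOpenWriter(String charsetName) {
        try (Writer w = new OutputStreamWriter(new ByteArrayOutputStream(), charsetName)) {
            return true;
        } catch (UnsupportedEncodingException e) {
            return false; // the name is unknown to this JVM
        } catch (IOException e) {
            throw new UncheckedIOException(e); // close() on an in-memory stream should not fail
        }
    }

    public static void main(String[] args) {
        String[] names = {"UTF-8", "utf8", "no-such-charset"};
        for (String name : names) {
            System.out.println(name + ": isSupported=" + Charset.isSupported(name)
                    + ", writer=" + canOpenWriter(name));
        }
    }
}
```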

And finally into Charset.defaultCharset:

//Charset
    public static Charset defaultCharset() {
        if (defaultCharset == null) {
            synchronized (Charset.class) {
                String csn = AccessController.doPrivileged(
                    new GetPropertyAction("file.encoding"));
                Charset cs = lookup(csn);
                if (cs != null)
                    defaultCharset = cs;
                else
                    defaultCharset = forName("UTF-8");
            }
        }
        return defaultCharset;
    }
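To see which value this resolves to on a given machine, print the property next to the resolved charset. This is a sketch with a made-up class name. Because the result is cached in the static field defaultCharset, calling System.setProperty("file.encoding", ...) at runtime will generally not change it; pass -Dfile.encoding=... on the command line instead (on JDK 18 and later, JEP 400 documents only UTF-8 and COMPAT as values for this property).

```java
import java.nio.charset.Charset;

public class DefaultCharsetInfo {
    // Returns the file.encoding property alongside the charset that
    // Charset.defaultCharset() actually resolved.
    public static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```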

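So the mystery is solved: the charset used to encode what is written to the file is the one named by the system property "file.encoding". Its value depends on the platform and locale the JVM starts with (often UTF-8 on Linux and macOS, but GBK on a Simplified Chinese Windows system, for example); since JDK 18 (JEP 400) the default is UTF-8 on every platform. If the JVM does not support the named charset, defaultCharset falls back to "UTF-8".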
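The conclusion can be checked end to end: the bytes a no-charset FileWriter puts on disk should equal the same text encoded with Charset.defaultCharset(). Both paths replace unencodable characters with the charset's replacement bytes, so the comparison should hold even when the default charset cannot represent Chinese. The class name and the temp file are illustrative assumptions.

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.Arrays;

public class FileWriterUsesDefault {
    // Writes text with a no-charset FileWriter and compares the file's bytes
    // with the same text encoded by Charset.defaultCharset().
    public static boolean matchesDefaultCharset(String fileName, String text) {
        try {
            try (FileWriter w = new FileWriter(fileName)) {
                w.write(text);
            }
            byte[] onDisk = Files.readAllBytes(new File(fileName).toPath());
            return Arrays.equals(onDisk, text.getBytes(Charset.defaultCharset()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("BasicFileOutput", ".out");
        tmp.deleteOnExit();
        System.out.println(matchesDefaultCharset(tmp.getPath(), "我是第一行"));
    }
}
```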

To sum up:

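In new PrintWriter(new BufferedWriter(new FileWriter("BasicFileOutput.out"))), the outer PrintWriter and BufferedWriter are only decorators that add functionality (convenient printing, buffering); they work purely with characters in memory.
FileWriter is the part that actually deals with the file: it encodes each char according to some charset and hands the resulting bytes to its underlying FileOutputStream, which writes them to the file.
A word on PrintWriter and DataOutputStream as well: to write to a file, each must directly or indirectly wrap a class that writes to the file, such as FileWriter or FileOutputStream. A file stores only bytes, so each needs a rule for turning its data into bytes.
PrintWriter relies on a charset, because a charset provides exactly the mapping "character <===> one or more bytes";
DataOutputStream relies on the mapping "Java primitive value <===> its fixed-size, big-endian binary representation" (for example, writeInt always writes 4 bytes).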
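The difference between the two mappings shows up clearly when the same int is written both ways. In this sketch (made-up class name, UTF-8 chosen for the text side), DataOutputStream emits the 4-byte binary form of 1000, while PrintWriter emits the encoded decimal text "1000".

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DataVsText {
    // DataOutputStream: the int's 4-byte big-endian binary representation.
    public static byte[] asData(int value) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // PrintWriter: the int's decimal text, encoded with a charset (UTF-8 here).
    public static byte[] asText(int value) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintWriter out = new PrintWriter(
                new OutputStreamWriter(sink, StandardCharsets.UTF_8))) {
            out.print(value);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(asData(1000))); // [0, 0, 3, -24]
        System.out.println(Arrays.toString(asText(1000))); // [49, 48, 48, 48]
    }
}
```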
