String, StringBuffer, and StringBuilder: Reading the Source Code

Reposted article · 2016-09-24 19:55 · Java / Source Code Reading

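Preface

It has been a long time since I last wrote a proper blog post; the past year was a busy one. I was responsible for databases, the production operations environment, writing code, code review, and a good deal more.
I learned a lot across many areas, but I still need to get back to fundamentals and strengthen my core skills before I can carry real weight.

Last year, while studying MySQL, I drew up an outline and wrote part of it. Then work took over and I stopped. In truth, most of what I never wrote up has either been summarized in the company's internal wiki or already been put to use on the job, so I never bothered to write it down. When the mood strikes again I will go back and summarize it.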

Next steps in my study plan

Data structures, algorithms, source code reading, and multithreading (sigh, there is no end to learning).

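Why start with String?

In most business development, String is the most heavily used class. When analyzing memory with a tool such as JProfiler, it is common to see char[] account for more than 70% of memory usage (why char[]? The source code below makes that clear).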

Class relationship diagram

Quick comparison

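Difference	String	StringBuffer	StringBuilder
Constant / variable	Constant (immutable)	Variable (mutable)	Variable (mutable)
Thread-safe?	Yes	Yes	No
Memory area	String constant pool (literals and interned strings); other instances on the heap	heap	heap
Can be subclassed?	No	No	No
Lines of code	3157	718	448
When to use	Strings that rarely change	Frequent string operations (concatenation, replacement, deletion, etc.) in a multithreaded environment	Frequent string operations (concatenation, replacement, deletion, etc.) in a single-threaded environment
Typical examples	Declaring constants; a small amount of variable manipulation	XML parsing; parsing and wrapping HTTP parameters	Assembling SQL statements; building JSON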

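By line count, String is by far the largest of the three classes, and much of that length comes from its many overloaded methods. Its Javadoc is also thorough. A recurring pattern in the comments is a short definition followed immediately by an attributive clause introduced by "that" or "the" (an attributive clause normally follows the noun or pronoun it modifies).
Example: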

    /**
     * Allocates a new {@code String} that contains characters from a subarray
     * of the <a href="Character.html#unicode">Unicode code point</a> array
     * argument.  The {@code offset} argument is the index of the first code
     * point of the subarray and the {@code count} argument specifies the
     * length of the subarray.  The contents of the subarray are converted to
     * {@code char}s; subsequent modification of the {@code int} array does not
     * affect the newly created string.
     */


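AbstractStringBuilder: both StringBuffer and StringBuilder extend AbstractStringBuilder, and the abstract parent implements nearly every method except toString.
StringBuilder: the methods it overrides simply delegate to super.function() without adding any behavior. It does not declare substring itself; it inherits the implementation from AbstractStringBuilder.
StringBuffer: overrides the same methods and adds the synchronized keyword to almost all of them.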
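
The inheritance relationship can be checked from outside the JDK with reflection. This is a minimal sketch, not part of the original article; the class name HierarchyDemo and the helper parentOf are made up for illustration.

```java
import java.lang.reflect.Modifier;

class HierarchyDemo {
    /** Simple name of the direct superclass, looked up via reflection. */
    static String parentOf(Class<?> type) {
        return type.getSuperclass().getSimpleName();
    }

    public static void main(String[] args) {
        // Both mutable classes share the same package-private abstract parent.
        System.out.println(parentOf(StringBuilder.class));
        System.out.println(parentOf(StringBuffer.class));
        // All three concrete classes are declared final, so none can be subclassed.
        System.out.println(Modifier.isFinal(String.class.getModifiers()));
        System.out.println(Modifier.isFinal(StringBuilder.class.getModifiers()));
        System.out.println(Modifier.isFinal(StringBuffer.class.getModifiers()));
        // substring comes from the abstract parent, so StringBuilder has it too.
        System.out.println(new StringBuilder("hello").substring(1, 3));
    }
}
```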

Constant vs. variable

String is backed by a private final char array:

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];


StringBuffer and StringBuilder both extend AbstractStringBuilder, which is backed by a non-final char array that can be reassigned and modified:

abstract class AbstractStringBuilder implements Appendable, CharSequence {
    /**
     * The value is used for character storage.
     */
    char[] value;

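
The practical effect of the final versus non-final backing array can be seen from calling code. The sketch below is illustrative only; MutabilityDemo and its two helper methods are hypothetical names, not anything from the JDK.

```java
class MutabilityDemo {
    /** True when a String operation leaves the original object untouched. */
    static boolean stringIsUnchangedAfterConcat() {
        String original = "abc";
        String result = original.concat("d"); // produces a new String
        return original.equals("abc") && result.equals("abcd");
    }

    /** True when append changes the builder in place and returns the same object. */
    static boolean builderIsModifiedInPlace() {
        StringBuilder builder = new StringBuilder("abc");
        StringBuilder returned = builder.append("d"); // mutates and returns this
        return returned == builder && builder.toString().equals("abcd");
    }

    public static void main(String[] args) {
        System.out.println("String unchanged after concat: " + stringIsUnchangedAfterConcat());
        System.out.println("Builder modified in place:     " + builderIsModifiedInPlace());
    }
}
```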

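Thread-safety analysis

Why is String thread-safe?
First, String is backed by a final char array whose contents are never changed after construction.
Second, every method that appears to modify a String returns a new String object and leaves the original untouched.
Example: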

    public String replace(char oldChar, char newChar) {
        if (oldChar != newChar) {
            int len = value.length;
            int i = -1;
            char[] val = value; /* avoid getfield opcode */
            while (++i < len) {
                if (val[i] == oldChar) {
                    break;
                }
            }
            if (i < len) {
                char buf[] = new char[len];
                for (int j = 0; j < i; j++) {
                    buf[j] = val[j];
                }
                while (i < len) {
                    char c = val[i];
                    buf[i] = (c == oldChar) ? newChar : c;
                    i++;
                }
                return new String(buf, true);
            }
        }
        return this;
    }


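Why do these two points make String thread-safe? It helps to look at a class that lacks them.

Take the append method of StringBuilder as an example. The System.arraycopy call in getChars copies the characters being appended into dst, which here is the builder's own mutable value array.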

 AbstractStringBuilder append(AbstractStringBuilder asb) {
        if (asb == null)
            return appendNull();
        int len = asb.length();
        ensureCapacityInternal(count + len);
        asb.getChars(0, len, value, count); // value is the char[] field; StringBuilder is backed by a mutable char array
        count += len;
        return this;
    }
    public void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin)
    {
        if (srcBegin < 0)
            throw new StringIndexOutOfBoundsException(srcBegin);
        if ((srcEnd < 0) || (srcEnd > count))
            throw new StringIndexOutOfBoundsException(srcEnd);
        if (srcBegin > srcEnd)
            throw new StringIndexOutOfBoundsException("srcBegin > srcEnd");
        System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
    }


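Scenario:

Suppose there are two threads, A and B, and a StringBuilder whose initial content is "1".
Thread A calls append("2").
Thread B calls append("3").

Analysis:
The CPU runs part of thread A's logic, up to the System.arraycopy call, and by that point thread B has already run to completion.
When thread A began executing append("2"), the StringBuilder held "1".
Partway through A's call, it became "13".
The final result is "132".
Note that "132" on its own is still a valid ordering of the two appends. The real hazard is that both threads can read the same count: they then write to the same position and one append overwrites the other, or count ends up inconsistent with the contents of value.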
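
The lost-update problem is easier to see with many appends than with two. The following is a rough sketch rather than a rigorous test: AppendRaceDemo, run, THREADS and APPENDS_PER_THREAD are names invented for this example. The StringBuffer count is always complete; the StringBuilder count usually falls short, but the exact number differs from run to run.

```java
import java.util.function.IntSupplier;

class AppendRaceDemo {
    static final int THREADS = 4;
    static final int APPENDS_PER_THREAD = 10_000;

    /** Calls appendOnce from several threads at once and returns the final length. */
    static int run(Runnable appendOnce, IntSupplier length) {
        Thread[] workers = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < APPENDS_PER_THREAD; i++) {
                    try {
                        appendOnce.run();
                    } catch (RuntimeException e) {
                        // An unsynchronized builder can fail with an index error mid-race.
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return length.getAsInt();
    }

    public static void main(String[] args) {
        StringBuffer buffer = new StringBuffer();
        StringBuilder builder = new StringBuilder();
        System.out.println("expected:      " + THREADS * APPENDS_PER_THREAD);
        // synchronized append: no update is ever lost
        System.out.println("StringBuffer:  " + run(() -> buffer.append('x'), buffer::length));
        // unsynchronized append: appends may overwrite each other
        System.out.println("StringBuilder: " + run(() -> builder.append('x'), builder::length));
    }
}
```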

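Process diagram (figure not included):

Sigh, I don't think I picked a good example to explain this. Readers who already understand the principle will find it too shallow, and those who don't may still be none the wiser. I will cover thread safety in detail in a later post.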

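Performance analysis

The common impression is that concatenating with String is inefficient. Many classic performance-tuning guides go as far as warning never to concatenate strings inside a loop.
Why is that?
The usual reasoning is that String is a constant, so any change must allocate new memory to hold the newly produced String. Concatenating inside a loop therefore allocates memory over and over for no benefit, and each new String leaves the previous one unreferenced, to be reclaimed later.

But is that really the case?

Test code: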

import java.util.function.Supplier;

/**
 * @author snifferhu
 * @date 16/9/24 18:50
 */
public class StrTest {
    private static final int TIMES = 30000; // number of loop iterations per test

    // Literal-only concatenation; javac folds this into the constant "abc".
    private static Supplier<CharSequence> singleStringAppend = () -> {
        String tmp = "a" + "b" + "c";
        return tmp;
    };
    // String concatenation inside a loop: every += builds a new String.
    private static Supplier<CharSequence> stringAppend = () -> {
        String tmp = "1";
        for (int i = 0; i < TIMES; i++) {
            tmp += "add";
        }
        return tmp;
    };
    // Same loop with a synchronized, mutable buffer.
    private static Supplier<CharSequence> stringBufferAppend = () -> {
        StringBuffer tmp = new StringBuffer("1");
        for (int i = 0; i < TIMES; i++) {
            tmp.append("add");
        }
        return tmp;
    };
    // Same loop with an unsynchronized, mutable builder.
    private static Supplier<CharSequence> stringBuilderAppend = () -> {
        StringBuilder tmp = new StringBuilder("1");
        for (int i = 0; i < TIMES; i++) {
            tmp.append("add");
        }
        return tmp;
    };

    public static void main(String[] args) {
        timerWrapper(singleStringAppend);
        timerWrapper(stringAppend);
        timerWrapper(stringBufferAppend);
        timerWrapper(stringBuilderAppend);
    }

    // Runs the supplier once and prints the elapsed wall-clock time in milliseconds.
    public static void timerWrapper(Supplier<CharSequence> supplier) {
        long start = System.currentTimeMillis();
        supplier.get();
        // getName() is used because getCanonicalName() can return null for lambda classes on newer JDKs.
        System.out.println(String.format("function [%s] time cost is %s",
                supplier.getClass().getName(),
                (System.currentTimeMillis() - start)));
    }
}


Output:

function [com.string.StrTest$$Lambda$1/1198108795] time cost is 0
function [com.string.StrTest$$Lambda$2/1706234378] time cost is 2339
function [com.string.StrTest$$Lambda$3/1867750575] time cost is 1
function [com.string.StrTest$$Lambda$4/2046562095] time cost is 1

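The results show that the simple String concatenation finished in under 1 ms (the compiler folds "a" + "b" + "c" into the single constant "abc", so nothing is concatenated at run time), StringBuffer and StringBuilder each took about 1 ms, and String concatenation inside the loop took a very long 2339 ms.
This is a single unwarmed run timed with System.currentTimeMillis, so the figures are rough, but the gap is large enough to support the conclusion: concatenating Strings inside a loop is indeed a very time-consuming operation.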
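
To make the cost visible, here is a hand-written approximation of what the loop does. On Java 8-era compilers, tmp += "add" is translated into roughly the builder chain shown in concatInLoop; newer compilers use a different mechanism, but each += still yields a new String containing a full copy of the old one. ConcatDemo and its method names are hypothetical.

```java
class ConcatDemo {
    /** Roughly what each tmp += "add" does: a fresh builder and a full copy per iteration. */
    static String concatInLoop(int times) {
        String tmp = "1";
        for (int i = 0; i < times; i++) {
            tmp = new StringBuilder().append(tmp).append("add").toString();
        }
        return tmp;
    }

    /** One builder reused for the whole loop; a String is created only at the end. */
    static String appendInLoop(int times) {
        StringBuilder tmp = new StringBuilder("1");
        for (int i = 0; i < times; i++) {
            tmp.append("add");
        }
        return tmp.toString();
    }

    /** Literal-only concatenation is a compile-time constant, hence the same interned object. */
    static boolean literalsAreFolded() {
        return ("a" + "b" + "c") == "abc";
    }

    public static void main(String[] args) {
        System.out.println(concatInLoop(1000).equals(appendInLoop(1000)));
        System.out.println(literalsAreFolded());
    }
}
```

Both loops build the same text; the difference is that the first copies the whole accumulated string on every iteration, so its total work grows quadratically with the number of iterations.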

hashCode

String caches its hash code in the private field hash as a performance optimization. Because a String is a constant that never changes, the cached value can never become stale.

    /** Cache the hash code for the string */
    private int hash; // Default to 0
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;
            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }


Neither StringBuffer nor StringBuilder overrides hashCode, so both fall back to the identity-based Object.hashCode.
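
A small sketch that reproduces the 31 * h + c recurrence by hand and compares it with the built-in result. HashDemo and manualHash are illustrative names only.

```java
class HashDemo {
    /** Recomputes the hash with the same 31 * h + c recurrence, without caching. */
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // char is promoted to int; overflow simply wraps
        }
        return h;
    }

    public static void main(String[] args) {
        String text = "hello";
        System.out.println(manualHash(text) == text.hashCode());

        // The builder classes inherit Object.hashCode, which is based on identity, not content.
        StringBuilder builder = new StringBuilder(text);
        System.out.println(builder.hashCode() == System.identityHashCode(builder));
    }
}
```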

equals

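String first compares references with "==", then checks whether the argument is a String, then compares lengths, and finally compares the contents char by char.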

    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = value.length;
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }


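Neither StringBuffer nor StringBuilder overrides equals, so two instances with identical contents are not equal unless they are the same object.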
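
In practice this means builder contents have to be compared through a String. A minimal sketch, with EqualsDemo and sameContent as made-up names; String.contentEquals(CharSequence) is a standard JDK method.

```java
class EqualsDemo {
    /** Compares two character sequences by content rather than by identity. */
    static boolean sameContent(CharSequence a, CharSequence b) {
        return a.toString().contentEquals(b);
    }

    public static void main(String[] args) {
        StringBuilder first = new StringBuilder("abc");
        StringBuilder second = new StringBuilder("abc");

        System.out.println(first.equals(second));        // identity comparison
        System.out.println(sameContent(first, second));  // content comparison
        System.out.println("abc".contentEquals(first));  // String against any CharSequence
    }
}
```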

toString

Each class implements toString in its own way. String simply returns itself.

 /**
     * This object (which is already a string!) is itself returned.
     *
     * @return  the string itself.
     */
    public String toString() {
        return this;
    }


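StringBuffer adds the synchronized keyword to keep the method thread-safe.

To improve performance it also caches the contents in a private field; the cache is marked transient, so it is not serialized.
Every modification of the StringBuffer resets toStringCache to null.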

/**
     * A cache of the last value returned by toString. Cleared
     * whenever the StringBuffer is modified.
     */
    private transient char[] toStringCache;
    @Override
    public synchronized String toString() {
        if (toStringCache == null) {
            toStringCache = Arrays.copyOfRange(value, 0, count);
        }
        return new String(toStringCache, true);
    }

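
The cache itself is an internal detail that calling code cannot observe directly. What can be observed is that every String returned by toString is an independent snapshot. A short sketch, with ToStringSnapshotDemo and snapshots as hypothetical names:

```java
class ToStringSnapshotDemo {
    /** Takes one snapshot before and one after a modification. */
    static String[] snapshots() {
        StringBuffer buffer = new StringBuffer("ab");
        String before = buffer.toString();
        buffer.append("c"); // the modification also invalidates the internal cache
        String after = buffer.toString();
        return new String[] { before, after };
    }

    public static void main(String[] args) {
        String[] result = snapshots();
        System.out.println(result[0]); // the earlier String did not change
        System.out.println(result[1]);
    }
}
```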

valueOf

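Why single out this method?
It is a static method. Most classes already have a toString method that achieves much the same thing,
but valueOf adds a safeguard: it checks for null first, so a null argument does not cause a NullPointerException.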

    public static String valueOf(Object obj) {
        return (obj == null) ? "null" : obj.toString();
    }


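StringBuffer and StringBuilder have no valueOf method of their own, but their insert methods call String.valueOf.
There is a pitfall here: when the value passed in is null, the literal text "null" gets inserted. Keep that in mind.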

    public synchronized StringBuffer insert(int offset, Object obj) {
        toStringCache = null;
        super.insert(offset, String.valueOf(obj));
        return this;
    }

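
The pitfall and one possible way around it, sketched under the assumption that an empty string is an acceptable substitute for null. NullInsertDemo, insertRaw and insertOrEmpty are names invented for this example; Objects.toString(Object, String) is a standard JDK method.

```java
import java.util.Objects;

class NullInsertDemo {
    /** Inserts the value as-is; a null ends up as the four characters n, u, l, l. */
    static String insertRaw(Object value) {
        return new StringBuilder("ab").insert(1, value).toString();
    }

    /** Substitutes an empty string for null before inserting. */
    static String insertOrEmpty(Object value) {
        return new StringBuilder("ab").insert(1, Objects.toString(value, "")).toString();
    }

    public static void main(String[] args) {
        System.out.println(String.valueOf((Object) null));
        System.out.println(insertRaw(null));
        System.out.println(insertOrEmpty(null));
        System.out.println(insertRaw(42));
    }
}
```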

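substring

StringBuffer and StringBuilder again rely on the implementation inherited from AbstractStringBuilder; the only difference is that StringBuffer adds the synchronized keyword. The exception handling deserves a close look: each check throws with the offending value, which makes the problem easy to pinpoint.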

public String substring(int start, int end) {
        if (start < 0)
            throw new StringIndexOutOfBoundsException(start);
        if (end > count)
            throw new StringIndexOutOfBoundsException(end);
        if (start > end)
            throw new StringIndexOutOfBoundsException(end - start);
        return new String(value, start, end - start);
    }


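By comparison, the String implementation checks, just before returning, whether a new object is really needed, which is a worthwhile performance optimization.
Because the return type is String, AbstractStringBuilder cannot return this the way String does.
It has to create a new String object either way, so it simply omits the check.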

   public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > value.length) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        int subLen = endIndex - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }

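
The bounds checks can be exercised from calling code. This is a small sketch; SubstringDemo and its helpers are hypothetical. The identity check printed in main reflects the source shown above and is not something the Javadoc promises.

```java
class SubstringDemo {
    /** True if String.substring rejects the range with StringIndexOutOfBoundsException. */
    static boolean stringRejects(String text, int begin, int end) {
        try {
            text.substring(begin, end);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    /** Same check against the implementation inherited by StringBuilder. */
    static boolean builderRejects(StringBuilder text, int begin, int end) {
        try {
            text.substring(begin, end);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "abc";
        System.out.println(stringRejects(s, 2, 1));  // begin after end
        System.out.println(stringRejects(s, 0, 4));  // end past the length
        System.out.println(builderRejects(new StringBuilder(s), -1, 2));
        // Whole-range substring: the source above returns this instead of a copy.
        System.out.println(s.substring(0, s.length()) == s);
    }
}
```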

replace

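String.replace first checks whether the old and new characters are the same, which avoids needless work. Nice!
The while loop finds the index of the first character equal to oldChar.
The local reference val avoids repeated field reads, and the result is built in a new array, buf. Because value itself never changes, there is no risk of it being tampered with while the copy is in progress.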

    public String replace(char oldChar, char newChar) {
        if (oldChar != newChar) {
            int len = value.length;
            int i = -1;
            char[] val = value; /* avoid getfield opcode */
            while (++i < len) {
                if (val[i] == oldChar) {
                    break;
                }
            }
            if (i < len) {
                char buf[] = new char[len];
                for (int j = 0; j < i; j++) {
                    buf[j] = val[j];
                }
                while (i < len) {
                    char c = val[i];
                    buf[i] = (c == oldChar) ? newChar : c;
                    i++;
                }
                return new String(buf, true);
            }
        }
        return this;
    }

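
The two return paths of String.replace(char, char) can be told apart by comparing references. A minimal sketch; CharReplaceDemo and its helper are illustrative names.

```java
class CharReplaceDemo {
    /** True when replace handed back the very same String object. */
    static boolean returnsSameObject(String text, char oldChar, char newChar) {
        return text.replace(oldChar, newChar) == text;
    }

    public static void main(String[] args) {
        String text = "hello";
        // oldChar never occurs: no new array is allocated, this is returned.
        System.out.println(returnsSameObject(text, 'x', 'y'));
        // oldChar occurs: a new String is built and the original stays as it was.
        System.out.println(text.replace('l', 'L'));
        System.out.println(text);
    }
}
```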

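The corresponding method in StringBuffer and StringBuilder has different parameters and a different return type from the one in String.
After validating the indices, it calls ensureCapacityInternal to grow the backing array if necessary.
Because StringBuilder is not synchronized, another thread can modify value while System.arraycopy is running, which can leave the contents and the length inconsistent.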

    public AbstractStringBuilder replace(int start, int end, String str) {
        if (start < 0)
            throw new StringIndexOutOfBoundsException(start);
        if (start > count)
            throw new StringIndexOutOfBoundsException("start > length()");
        if (start > end)
            throw new StringIndexOutOfBoundsException("start > end");
        if (end > count)
            end = count;
        int len = str.length();
        int newCount = count + len - (end - start);
        ensureCapacityInternal(newCount);
        System.arraycopy(value, end, value, start + len, count - end);
        str.getChars(value, start);
        count = newCount;
        return this;
    }
    /**
     * This method has the same contract as ensureCapacity, but is
     * never synchronized.
     */
    private void ensureCapacityInternal(int minimumCapacity) {
        // overflow-conscious code
        if (minimumCapacity - value.length > 0)
            expandCapacity(minimumCapacity);
    }
    /**
     * This implements the expansion semantics of ensureCapacity with no
     * size check or synchronization.
     */
    void expandCapacity(int minimumCapacity) {
        int newCapacity = value.length * 2 + 2;
        if (newCapacity - minimumCapacity < 0)
            newCapacity = minimumCapacity;
        if (newCapacity < 0) {
            if (minimumCapacity < 0) // overflow
                throw new OutOfMemoryError();
            newCapacity = Integer.MAX_VALUE;
        }
        value = Arrays.copyOf(value, newCapacity);
    }

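
A short usage sketch of the range-based replace and of capacity growth. BuilderReplaceDemo and replaced are made-up names. The capacity printed after growth depends on the JDK's expansion policy (the source above uses value.length * 2 + 2), so the example prints it rather than assuming a number.

```java
class BuilderReplaceDemo {
    /** Replaces the characters in [start, end) and returns the resulting text. */
    static String replaced(String text, int start, int end, String with) {
        return new StringBuilder(text).replace(start, end, with).toString();
    }

    public static void main(String[] args) {
        // The replacement may be longer or shorter than the range it replaces.
        System.out.println(replaced("hello world", 0, 5, "HELLO!!"));
        // An end index past the length is clamped to the length instead of failing.
        System.out.println(replaced("hello world", 6, 100, "x"));

        StringBuilder builder = new StringBuilder(); // documented initial capacity: 16
        System.out.println(builder.capacity());
        builder.append("12345678901234567");         // 17 chars force the array to grow
        System.out.println(builder.capacity());
    }
}
```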

trim

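The neat part of String.trim is that it compares chars directly with <=. I verified that the chars are promoted to int and compared by their numeric character codes, so every character at or below the space character (code 32), including ASCII control characters, is treated as whitespace.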

   public String trim() {
        int len = value.length;
        int st = 0;
        char[] val = value;    /* avoid getfield opcode */
        while ((st < len) && (val[st] <= ' ')) {
            st++;
        }
        while ((st < len) && (val[len - 1] <= ' ')) {
            len--;
        }
        return ((st > 0) || (len < value.length)) ? substring(st, len) : this;
    }

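
Because the test is a plain numeric comparison against the space character, trim removes control characters too, but leaves whitespace with higher code points alone. A minimal sketch; TrimDemo and removedByTrim are hypothetical names.

```java
class TrimDemo {
    /** True when a string consisting only of c is emptied by trim. */
    static boolean removedByTrim(char c) {
        return String.valueOf(c).trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(removedByTrim(' '));       // code 32
        System.out.println(removedByTrim('\t'));      // code 9
        System.out.println(removedByTrim('\u0000'));  // code 0, a control character
        System.out.println(removedByTrim('\u00A0'));  // non-breaking space, code 160
        System.out.println("[" + "\t a \n".trim() + "]");
    }
}
```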
