A Detailed Look at Java's String Class

Reposted article. 2019-11-22 20:08

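Java's String class

Apart from the primitive types, String is the most heavily used class in Java, and arguably it is used even more than the primitives. The JDK accordingly applies many optimizations to the String class.

Class definition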

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence{
   /** The value is used for character storage. */
    private final char value[];

    /** Cache the hash code for the string */
    private int hash; // Default to 0

    /** use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = -6849794470754667710L;

    /**
     * Class String is special cased within the Serialization Stream Protocol.
     *
     * A String instance is written into an ObjectOutputStream according to
     * <a href="{@docRoot}/../platform/serialization/spec/output.html">
     * Object Serialization Specification, Section 6.2, "Stream Elements"</a>
     */
    private static final ObjectStreamField[] serialPersistentFields =
        new ObjectStreamField[0];

    /**
     * Initializes a newly created {@code String} object so that it represents
     * an empty character sequence.  Note that use of this constructor is
     * unnecessary since Strings are immutable.
     */
    public String() {
        this.value = "".value;
    }

    /**
     * Initializes a newly created {@code String} object so that it represents
     * the same sequence of characters as the argument; in other words, the
     * newly created string is a copy of the argument string. Unless an
     * explicit copy of {@code original} is needed, use of this constructor is
     * unnecessary since Strings are immutable.
     *
     * @param  original
     *         A {@code String}
     */
    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

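The final modifier means String cannot be subclassed, so its methods cannot be overridden. Many important JDK classes are declared final so that application code cannot subclass them and undermine the JDK's safety guarantees.
It implements Serializable, so instances can be serialized safely.
It implements Comparable, so strings can be sorted in their natural order.
It implements CharSequence, the central interface for character sequences.
The char array value holds the characters and is declared final. (This is the JDK 8 source; since JDK 9 the field is a byte[] accompanied by a coder flag.)
The int field hash caches the hashCode value so that it is not recomputed on every call.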

Implementation of compareTo from the Comparable interface

public int compareTo(String anotherString) {
    int len1 = value.length;
    int len2 = anotherString.value.length;
    int lim = Math.min(len1, len2); 
    char v1[] = value;
    char v2[] = anotherString.value;

    int k = 0;
    while (k < lim) {  // only loops up to the length of the shorter string
        char c1 = v1[k];
        char c2 = v2[k];
        if (c1 != c2) {
            return c1 - c2;
        }
        k++;
    }
    return len1 - len2;  // if the common prefix is identical, the longer string is greater
}

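Characters are compared one by one from left to right, so the code shows that "S" > "ASSSSSSSSSSSSSSS".
The loop only runs up to the length of the shorter string.
If the common prefix is identical, the longer string is the greater one.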
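The comparison rules above can be checked with a small sketch. The class name CompareToDemo and the helper sorted are made up for illustration; only String.compareTo and Arrays.sort are JDK APIs.

```java
import java.util.Arrays;

class CompareToDemo {
    // Hypothetical helper: returns a copy sorted by natural (compareTo) order.
    static String[] sorted(String... items) {
        String[] copy = items.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // The first differing char decides: 'S' (83) - 'A' (65) = 18
        System.out.println("S".compareTo("ASSSSSSSSSSSSSSS")); // 18
        // Identical common prefix: the length difference decides (3 - 4)
        System.out.println("abc".compareTo("abcd"));           // -1
        // Uppercase letters have smaller char values than lowercase ones
        System.out.println(Arrays.toString(sorted("b", "ab", "a", "B"))); // [B, a, ab, b]
    }
}
```

Because the order is based on raw char values, it is not a dictionary order; for user-facing sorting a java.text.Collator is usually the better fit.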

Constructors

/**
 * Initializes a newly created {@code String} object so that it represents
 * an empty character sequence.  Note that use of this constructor is
 * unnecessary since Strings are immutable.
 */
public String() {
    this.value = "".value;
}

/**
 * Initializes a newly created {@code String} object so that it represents
 * the same sequence of characters as the argument; in other words, the
 * newly created string is a copy of the argument string. Unless an
 * explicit copy of {@code original} is needed, use of this constructor is
 * unnecessary since Strings are immutable.
 *
 * @param  original
 *         A {@code String}
 */
public String(String original) {
    this.value = original.value;
    this.hash = original.hash;
}
/**
 * Decodes the given range of bytes using the given charset.
 */
public String(byte bytes[], int offset, int length, Charset charset) {
    if (charset == null)
        throw new NullPointerException("charset");
    checkBounds(bytes, offset, length);
    this.value = StringCoding.decode(charset, bytes, offset, length);
}

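The no-argument constructor simply produces the empty string "".
The copy constructor only copies the other string's value reference and hash value. The two strings cannot interfere with each other: value is private final and no String method modifies the array's contents, while hash is an int copied by value and is only ever written by hashCode() as a cache.
The no-argument constructor does not set hash, so the field keeps its default value of 0.
The byte-array constructor converts the bytes to characters through StringCoding.decode(charset, bytes, offset, length).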

StringCoding is a package-private class and its methods are package-private static methods, so unfortunately application code cannot call them directly.
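Since StringCoding is not accessible, the public way in is the constructor itself. The following is a minimal sketch; the class BytesToStringDemo and the helper decodeUtf8 are illustrative names, not part of the JDK.

```java
import java.nio.charset.StandardCharsets;

class BytesToStringDemo {
    // Hypothetical helper: decodes `length` bytes starting at `offset` as UTF-8.
    static String decodeUtf8(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two CJK characters, each encoded as three UTF-8 bytes
        byte[] bytes = "\u4f60\u597d".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length);                              // 6
        // Decoding only the first three bytes yields the first character
        System.out.println(decodeUtf8(bytes, 0, 3).equals("\u4f60"));  // true
        // A byte that is not valid UTF-8 is replaced with U+FFFD, not rejected
        String bad = decodeUtf8(new byte[] {(byte) 0xFF}, 0, 1);
        System.out.println(bad.equals("\uFFFD"));                      // true
    }
}
```

The silent replacement matches the CodingErrorAction.REPLACE settings visible in the decode listing; code that must detect bad input has to use a CharsetDecoder directly.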

The StringCoding.decode method

static char[] decode(Charset cs, byte[] ba, int off, int len) {
    // (1)We never cache the "external" cs, the only benefit of creating
    // an additional StringDe/Encoder object to wrap it is to share the
    // de/encode() method. These SD/E objects are short-lifed, the young-gen
    // gc should be able to take care of them well. But the best approash
    // is still not to generate them if not really necessary.
    // (2)The defensive copy of the input byte/char[] has a big performance
    // impact, as well as the outgoing result byte/char[]. Need to do the
    // optimization check of (sm==null && classLoader0==null) for both.
    // (3)getClass().getClassLoader0() is expensive
    // (4)There might be a timing gap in isTrusted setting. getClassLoader0()
    // is only chcked (and then isTrusted gets set) when (SM==null). It is
    // possible that the SM==null for now but then SM is NOT null later
    // when safeTrim() is invoked...the "safe" way to do is to redundant
    // check (... && (isTrusted || SM == null || getClassLoader0())) in trim
    // but it then can be argued that the SM is null when the opertaion
    // is started...
    CharsetDecoder cd = cs.newDecoder();
    int en = scale(len, cd.maxCharsPerByte());
    char[] ca = new char[en];
    if (len == 0)
        return ca;
    boolean isTrusted = false;
    if (System.getSecurityManager() != null) {
        if (!(isTrusted = (cs.getClass().getClassLoader0() == null))) {
            ba =  Arrays.copyOfRange(ba, off, off + len);
            off = 0;
        }
    }
    cd.onMalformedInput(CodingErrorAction.REPLACE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE)
      .reset();
    if (cd instanceof ArrayDecoder) {
        int clen = ((ArrayDecoder)cd).decode(ba, off, len, ca);
        return safeTrim(ca, clen, cs, isTrusted);
    } else {
        ByteBuffer bb = ByteBuffer.wrap(ba, off, len);
        CharBuffer cb = CharBuffer.wrap(ca);
        try {
            CoderResult cr = cd.decode(bb, cb, true);
            if (!cr.isUnderflow())
                cr.throwException();
            cr = cd.flush(cb);
            if (!cr.isUnderflow())
                cr.throwException();
        } catch (CharacterCodingException x) {
            // Substitution is always enabled,
            // so this shouldn't happen
            throw new Error(x);
        }
        return safeTrim(ca, cb.position(), cs, isTrusted);
    }
}

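The actual byte[] to char[] conversion is performed by the abstract class CharsetDecoder, and the decoder instance is created by the Charset you pass in (cs.newDecoder()).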
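The same decoder can be driven through the public java.nio.charset API. This sketch mirrors the configuration seen in the listing (both error actions set to REPLACE) but uses the convenience method decode(ByteBuffer) rather than the internal ArrayDecoder path; DecoderDemo and its decode helper are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class DecoderDemo {
    // Hypothetical helper: decodes all bytes with a decoder obtained from the charset.
    static String decode(Charset cs, byte[] bytes) {
        CharsetDecoder cd = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            CharBuffer chars = cd.decode(ByteBuffer.wrap(bytes));
            return chars.toString();
        } catch (CharacterCodingException e) {
            // Not expected: with REPLACE, bad input is substituted instead of reported
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] ok = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(StandardCharsets.UTF_8, ok).equals("h\u00e9llo")); // true
        byte[] broken = {0x61, (byte) 0xFF};
        System.out.println(decode(StandardCharsets.UTF_8, broken).equals("a\uFFFD")); // true
    }
}
```

Switching the two actions to CodingErrorAction.REPORT makes the decoder throw a CharacterCodingException instead, which is the usual way to validate input.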

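Let's look at the CharsetDecoder implementation for UTF-8.

The UTF-8 decoder is a static nested class that extends CharsetDecoder and implements the ArrayDecoder interface. Its methods are long and consist entirely of byte-to-char arithmetic; following them requires some knowledge of the encoding. The trace stops here. (The listing appears to be decompiled output, hence the var1, var2 names.)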

private static class Decoder extends CharsetDecoder implements ArrayDecoder {
    private Decoder(Charset var1) {
        super(var1, 1.0F, 1.0F);
    }
     // unrelated methods omitted
      /**
      * The method that actually converts bytes to chars
      */
      public int decode(byte[] var1, int var2, int var3, char[] var4) {
            int var5 = var2 + var3;
            int var6 = 0;
            int var7 = Math.min(var3, var4.length);

            ByteBuffer var8;
            for(var8 = null; var6 < var7 && var1[var2] >= 0; var4[var6++] = (char)var1[var2++]) {
            }

            while(true) {
                while(true) {
                    while(var2 < var5) {
                        byte var9 = var1[var2++];
                        if (var9 < 0) {
                            byte var10;
                            if (var9 >> 5 != -2 || (var9 & 30) == 0) {
                                byte var11;
                                if (var9 >> 4 == -2) {
                                    if (var2 + 1 < var5) {
                                        var10 = var1[var2++];
                                        var11 = var1[var2++];
                                        if (isMalformed3(var9, var10, var11)) {
                                            if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                                return -1;
                                            }

                                            var4[var6++] = this.replacement().charAt(0);
                                            var2 -= 3;
                                            var8 = getByteBuffer(var8, var1, var2);
                                            var2 += malformedN(var8, 3).length();
                                        } else {
                                            char var15 = (char)(var9 << 12 ^ var10 << 6 ^ var11 ^ -123008);
                                            if (Character.isSurrogate(var15)) {
                                                if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                                    return -1;
                                                }

                                                var4[var6++] = this.replacement().charAt(0);
                                            } else {
                                                var4[var6++] = var15;
                                            }
                                        }
                                    } else {
                                        if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                            return -1;
                                        }

                                        if (var2 >= var5 || !isMalformed3_2(var9, var1[var2])) {
                                            var4[var6++] = this.replacement().charAt(0);
                                            return var6;
                                        }

                                        var4[var6++] = this.replacement().charAt(0);
                                    }
                                } else if (var9 >> 3 != -2) {
                                    if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                        return -1;
                                    }

                                    var4[var6++] = this.replacement().charAt(0);
                                } else if (var2 + 2 < var5) {
                                    var10 = var1[var2++];
                                    var11 = var1[var2++];
                                    byte var12 = var1[var2++];
                                    int var13 = var9 << 18 ^ var10 << 12 ^ var11 << 6 ^ var12 ^ 3678080;
                                    if (!isMalformed4(var10, var11, var12) && Character.isSupplementaryCodePoint(var13)) {
                                        var4[var6++] = Character.highSurrogate(var13);
                                        var4[var6++] = Character.lowSurrogate(var13);
                                    } else {
                                        if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                            return -1;
                                        }

                                        var4[var6++] = this.replacement().charAt(0);
                                        var2 -= 4;
                                        var8 = getByteBuffer(var8, var1, var2);
                                        var2 += malformedN(var8, 4).length();
                                    }
                                } else {
                                    if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                        return -1;
                                    }

                                    int var14 = var9 & 255;
                                    if (var14 <= 244 && (var2 >= var5 || !isMalformed4_2(var14, var1[var2] & 255))) {
                                        ++var2;
                                        if (var2 >= var5 || !isMalformed4_3(var1[var2])) {
                                            var4[var6++] = this.replacement().charAt(0);
                                            return var6;
                                        }

                                        var4[var6++] = this.replacement().charAt(0);
                                    } else {
                                        var4[var6++] = this.replacement().charAt(0);
                                    }
                                }
                            } else {
                                if (var2 >= var5) {
                                    if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                        return -1;
                                    }

                                    var4[var6++] = this.replacement().charAt(0);
                                    return var6;
                                }

                                var10 = var1[var2++];
                                if (isNotContinuation(var10)) {
                                    if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                        return -1;
                                    }

                                    var4[var6++] = this.replacement().charAt(0);
                                    --var2;
                                } else {
                                    var4[var6++] = (char)(var9 << 6 ^ var10 ^ 3968);
                                }
                            }
                        } else {
                            var4[var6++] = (char)var9;
                        }
                    }

                    return var6;
                }
            }
        }
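The magic constants in the listing become clearer with a worked example. In the three-byte branch, -123008 equals ((byte) 0xE0 << 12) ^ ((byte) 0x80 << 6) ^ (byte) 0x80, so XOR-ing with it cancels the marker bits (and the sign extension) of all three bytes in one step. The sketch below compares that XOR form with the textbook mask-and-shift form; both method names are made up for illustration.

```java
class Utf8ThreeByteDemo {
    // The XOR form used in the listing for a well-formed three-byte sequence.
    static char xorForm(byte b1, byte b2, byte b3) {
        return (char) (b1 << 12 ^ b2 << 6 ^ b3 ^ -123008);
    }

    // Textbook form: strip the marker bits, then combine the payload bits.
    static char maskForm(byte b1, byte b2, byte b3) {
        return (char) (((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F));
    }

    public static void main(String[] args) {
        // U+4F60 is encoded in UTF-8 as E4 BD A0
        byte b1 = (byte) 0xE4, b2 = (byte) 0xBD, b3 = (byte) 0xA0;
        System.out.println(Integer.toHexString(xorForm(b1, b2, b3)));  // 4f60
        System.out.println(Integer.toHexString(maskForm(b1, b2, b3))); // 4f60
    }
}
```

The two forms agree only for well-formed input, which is why the listing calls isMalformed3 before applying the formula.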

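**Conclusion: converting bytes to a string goes through the decode method of the utility class StringCoding, which relies on the CharsetDecoder supplied by the given Charset (for UTF-8, the static nested class Decoder shown above) to do the real byte-to-char work. By defining the contract as an interface, Java pushes the concrete implementation into each charset class, and String only has to program against that interface, which also makes it easy to add new encodings.**

Likewise, String's getBytes method hands the main work to the encoder provided by the specific Charset.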
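A short sketch of the encoding direction, using only the charsets guaranteed by StandardCharsets. GetBytesDemo and encodedLength are illustrative names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class GetBytesDemo {
    // Hypothetical helper: number of bytes the string occupies in the given charset.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        System.out.println(encodedLength(s, StandardCharsets.UTF_8));      // 2
        System.out.println(encodedLength(s, StandardCharsets.ISO_8859_1)); // 1
        System.out.println(encodedLength(s, StandardCharsets.UTF_16BE));   // 2
        // Encoding and decoding with the same charset round-trips the text
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(s)); // true
    }
}
```

Passing the charset explicitly avoids depending on the platform default, which getBytes() without arguments would use.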

The hashCode method

String overrides this method, and the result depends on every character.

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];   // why multiply the previous value by 31?
        }
        hash = h;
    }
    return h;
}
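As for the question in the comment: a commonly given explanation is that 31 is an odd prime and that 31 * h can be computed as (h << 5) - h, a cheap shift and subtract. The sketch below recomputes the hash by hand with a hypothetical manualHash helper and compares it with the built-in result.

```java
class HashDemo {
    // Hypothetical helper: the same polynomial as String.hashCode, without the cache.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // "abc": ((97 * 31) + 98) * 31 + 99 = 96354
        System.out.println(manualHash("abc"));   // 96354
        System.out.println("abc".hashCode());    // 96354
        // The empty string never enters the loop, so its hash stays 0
        System.out.println("".hashCode());       // 0
        int h = 12345;
        System.out.println(31 * h == (h << 5) - h); // true
    }
}
```

Because 0 doubles as the "not yet computed" marker in the listing, a non-empty string whose hash happens to be 0 is recomputed on every call in that version.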

String concatenation: concat and the static join method

The concat method

public String concat(String str) {
    int otherLen = str.length();
    if (otherLen == 0) {
        return this;
    }
    int len = value.length;
    char buf[] = Arrays.copyOf(value, len + otherLen);
    str.getChars(buf, len);
    return new String(buf, true);
}

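concat copies both strings into a new array and then creates a new String object. It is thread-safe because no shared state is modified, but every call allocates, so repeated calls are comparatively slow.
You can also concatenate directly with +.

According to https://blog.csdn.net/youanyyou/article/details/78992978, + is compiled to bytecode that uses StringBuilder (on JDK 8 and earlier; since JDK 9 javac emits an invokedynamic call instead), while concat copies arrays and creates a new object on every call. When several pieces are joined in one expression, + is therefore generally the better-performing choice.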
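The sketch below puts the three approaches side by side. It makes no timing claims; it only shows that they produce the same text and that, in a loop, an explicit StringBuilder avoids building a fresh intermediate string per iteration. ConcatDemo and joinWithBuilder are illustrative names.

```java
class ConcatDemo {
    // Hypothetical helper: appends all parts into one builder, one final copy.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = "foo";
        String b = "bar";
        System.out.println(a.concat(b));               // foobar
        System.out.println(a + b);                     // foobar
        // concat returns the receiver itself when the argument is empty
        System.out.println(a.concat("") == a);         // true
        System.out.println(joinWithBuilder(new String[] {"x", "y", "z"})); // xyz
    }
}
```

For a single a + b the difference between the variants is small; the choice matters mostly for repeated concatenation inside loops.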

The static join method

public static String join(CharSequence delimiter, CharSequence... elements) {
    Objects.requireNonNull(delimiter);
    Objects.requireNonNull(elements);
    // Number of elements not likely worth Arrays.stream overhead.
    StringJoiner joiner = new StringJoiner(delimiter);
    for (CharSequence cs: elements) {
        joiner.add(cs);
    }
    return joiner.toString();
}

The actual work happens in the StringJoiner class.

public final class StringJoiner {
    private final String prefix;
    private final String delimiter;
    private final String suffix;

    /*
     * StringBuilder value -- at any time, the characters constructed from the
     * prefix, the added element separated by the delimiter, but without the
     * suffix, so that we can more easily add elements without having to jigger
     * the suffix each time.
     */
    private StringBuilder value;
  
  /**
     * Adds a copy of the given {@code CharSequence} value as the next
     * element of the {@code StringJoiner} value. If {@code newElement} is
     * {@code null}, then {@code "null"} is added.
     *
     * @param  newElement The element to add
     * @return a reference to this {@code StringJoiner}
     */
    public StringJoiner add(CharSequence newElement) {
        prepareBuilder().append(newElement);
        return this;
    }

    private StringBuilder prepareBuilder() {
        if (value != null) {
            value.append(delimiter);
        } else {
            value = new StringBuilder().append(prefix);
        }
        return value;
    }

Internally it is still built on StringBuilder; join is purely a convenience method.
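A brief usage sketch of both entry points. The three-argument StringJoiner constructor (delimiter, prefix, suffix) is the public counterpart of the prefix and suffix fields in the listing; JoinDemo and bracketed are illustrative names.

```java
import java.util.StringJoiner;

class JoinDemo {
    // Hypothetical helper: joins items as "[a, b, c]" using prefix and suffix.
    static String bracketed(String... items) {
        StringJoiner joiner = new StringJoiner(", ", "[", "]");
        for (String item : items) {
            joiner.add(item);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(String.join("-", "a", "b", "c")); // a-b-c
        // A null element is appended as the text "null"
        System.out.println(String.join(",", "a", null));     // a,null
        System.out.println(bracketed("x", "y"));             // [x, y]
        // With no elements, only prefix and suffix remain
        System.out.println(bracketed());                     // []
    }
}
```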

The replace method

public String replace(char oldChar, char newChar)

Walks the char array and replaces each matching character.

public String replace(CharSequence target, CharSequence replacement)

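Implemented with the regex engine, although target is matched literally (JDK 8 compiles it with Pattern.LITERAL). The regex source code will be analyzed in a later article.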
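The difference between literal and regex replacement is easy to miss, so here is a small sketch. ReplaceDemo and dotsToDashes are illustrative names; replace and replaceAll are the real String methods.

```java
class ReplaceDemo {
    // Hypothetical helper: replaces literal dots, not "any character".
    static String dotsToDashes(String s) {
        return s.replace(".", "-");
    }

    public static void main(String[] args) {
        // char overload: plain character substitution
        System.out.println("hello".replace('l', 'L'));    // heLLo
        // CharSequence overload: "." is a literal dot
        System.out.println(dotsToDashes("a.b.c"));        // a-b-c
        // replaceAll treats "." as a regex matching every character
        System.out.println("a.b.c".replaceAll(".", "-")); // -----
        // The replacement is literal too: "$1" is not a group reference here
        System.out.println("cost".replace("cost", "$1")); // $1
    }
}
```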

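The static format method builds formatted strings; because it is locale-aware, it is often used for internationalized text.

Internally it uses the Formatter class, which (in JDK 8) itself parses the format string with a regular expression.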
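A minimal sketch of the locale-aware behavior, assuming the standard US and German locale data shipped with the JDK. FormatDemo and grouped are illustrative names.

```java
import java.util.Locale;

class FormatDemo {
    // Hypothetical helper: formats an integer with locale-specific grouping.
    static String grouped(Locale locale, long n) {
        return String.format(locale, "%,d", n);
    }

    public static void main(String[] args) {
        System.out.println(grouped(Locale.US, 1234567));                // 1,234,567
        System.out.println(grouped(Locale.GERMANY, 1234567));           // 1.234.567
        System.out.println(String.format(Locale.US, "%.2f", 3.14159));      // 3.14
        System.out.println(String.format(Locale.GERMANY, "%.2f", 3.14159)); // 3,14
        System.out.println(String.format("%s has %d items", "cart", 3));    // cart has 3 items
    }
}
```

Without an explicit Locale argument, format uses the default format locale, so output such as decimal separators can differ between machines.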

The toLowerCase method

public String toLowerCase(Locale locale)

Walks the char array and lowercases each character with Character.toLowerCase, with special handling for a few locale-specific cases.
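The Locale parameter matters more than it may seem. The sketch below shows the well-known Turkish case, where uppercase I lowercases to the dotless U+0131; LowerCaseDemo and normalize are illustrative names.

```java
import java.util.Locale;

class LowerCaseDemo {
    // Hypothetical helper: locale-independent lowercasing for identifiers and keys.
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("TITLE")); // title
        // In Turkish, 'I' lowercases to dotless i (U+0131)
        String turkish = "TITLE".toLowerCase(Locale.forLanguageTag("tr"));
        System.out.println(turkish.equals("t\u0131tle")); // true
        System.out.println(turkish.equals("title"));      // false
    }
}
```

For protocol keywords, file extensions and similar machine-facing text, passing Locale.ROOT avoids surprises on machines whose default locale is Turkish.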

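The trim method

Scans inward from both ends past whitespace, where "whitespace" means any char <= ' ' (a detail worth knowing), and then uses substring to cut out the remaining part.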
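The char <= ' ' rule has two consequences that a small sketch makes visible: control characters are trimmed too, while Unicode spaces above U+0020 are not. TrimDemo and trimmedLength are illustrative names.

```java
class TrimDemo {
    // Hypothetical helper: length of the string after trim().
    static int trimmedLength(String s) {
        return s.trim().length();
    }

    public static void main(String[] args) {
        // Space, tab, newline and even U+0001 are all <= ' ' and get removed
        System.out.println("\u0001 a\t\n".trim().equals("a")); // true
        // NO-BREAK SPACE (U+00A0) is greater than ' ', so trim keeps it
        System.out.println(trimmedLength("\u00A0a\u00A0"));    // 3
        // Inner whitespace is never touched
        System.out.println(" a b ".trim());                    // a b
    }
}
```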

The substring method

Internally it uses the constructor public String(char value[], int offset, int count) to create the new string; that constructor copies the requested range of the array.
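The copy made by that constructor can be observed from outside, since the same constructor is public. In this sketch, SubstringDemo and firstWord are illustrative names.

```java
class SubstringDemo {
    // Hypothetical helper: text before the first space, or the whole string.
    static String firstWord(String s) {
        int space = s.indexOf(' ');
        return space < 0 ? s : s.substring(0, space);
    }

    public static void main(String[] args) {
        System.out.println("hello world".substring(6)); // world
        System.out.println(firstWord("hello world"));   // hello
        // The constructor copies the range, so later changes to the
        // source array do not leak into the string
        char[] src = {'a', 'b', 'c', 'd'};
        String s = new String(src, 1, 2);
        src[1] = 'X';
        System.out.println(s);                          // bc
    }
}
```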

The valueOf method

public static String valueOf(Object obj) {
    return (obj == null) ? "null" : obj.toString();
}
// Calls the argument's own toString(); if its class does not override toString, the default Object.toString() is used.

public static String valueOf(char data[]) {
    return new String(data);
}
// Delegates to the constructor that matches the array argument to create the String

public static String valueOf(boolean b) {
    return b ? "true" : "false";
}
// Returns "true" or "false" depending on the boolean argument
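The overloads above interact in one surprising way when null is involved, which this sketch demonstrates. ValueOfDemo and describe are illustrative names.

```java
class ValueOfDemo {
    // Hypothetical helper: the static type Object selects valueOf(Object).
    static String describe(Object o) {
        return String.valueOf(o);
    }

    public static void main(String[] args) {
        System.out.println(describe(null));                          // null
        System.out.println(String.valueOf(new char[] {'h', 'i'}));   // hi
        System.out.println(String.valueOf(true));                    // true
        try {
            // With a char[]-typed null the char[] overload is chosen,
            // and new String(null) fails
            String.valueOf((char[]) null);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException");
        }
    }
}
```

A bare String.valueOf(null) also resolves to the char[] overload, because char[] is more specific than Object, so it throws rather than returning "null".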

The static copyValueOf method

public static String copyValueOf(char data[], int offset, int count) {
    return new String(data, offset, count);
}
// Static utility method; delegates to the matching constructor to copy the given range into a new string

The native intern method

This method involves String memory layout and the constant pool; it will be covered in detail in another article.

public native String intern();
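Until that article, a minimal sketch of the observable behavior: string literals are interned automatically, new String(...) creates a distinct object, and intern() maps it back to the pooled instance. InternDemo and samePooledInstance are illustrative names.

```java
class InternDemo {
    // Hypothetical helper: true if both strings resolve to the same pooled object.
    static boolean samePooledInstance(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String literal = "abc";
        String copy = new String("abc");
        System.out.println(copy == literal);          // false: distinct objects
        System.out.println(copy.equals(literal));     // true: same characters
        System.out.println(copy.intern() == literal); // true: pooled instance
        System.out.println(samePooledInstance(copy, literal)); // true
    }
}
```

This is also why string contents should be compared with equals rather than ==.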
