Java's String Class in Detail

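Apart from the primitive types, String is the most heavily used class in Java, and it is arguably used even more than the primitives. Accordingly, the JDK applies many optimizations to the String class.

Class definition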

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence{
   /** The value is used for character storage. */
    private final char value[];

    /** Cache the hash code for the string */
    private int hash; // Default to 0

    /** use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = -6849794470754667710L;

    /**
     * Class String is special cased within the Serialization Stream Protocol.
     *
     * A String instance is written into an ObjectOutputStream according to
     * <a href="{@docRoot}/../platform/serialization/spec/output.html">
     * Object Serialization Specification, Section 6.2, "Stream Elements"</a>
     */
    private static final ObjectStreamField[] serialPersistentFields =
        new ObjectStreamField[0];

    /**
     * Initializes a newly created {@code String} object so that it represents
     * an empty character sequence.  Note that use of this constructor is
     * unnecessary since Strings are immutable.
     */
    public String() {
        this.value = "".value;
    }

    /**
     * Initializes a newly created {@code String} object so that it represents
     * the same sequence of characters as the argument; in other words, the
     * newly created string is a copy of the argument string. Unless an
     * explicit copy of {@code original} is needed, use of this constructor is
     * unnecessary since Strings are immutable.
     *
     * @param  original
     *         A {@code String}
     */
    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }
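  • The final modifier means the class cannot be subclassed, so none of its methods can be overridden. Many important JDK classes are final to keep application code from subclassing them and undermining the JDK's safety guarantees.

  • It implements the Serializable interface, so strings can be serialized safely.

  • It implements Comparable, so strings can be sorted by their natural order.

  • It implements CharSequence, the core interface for character sequences.

  • The char array value holds the characters and is declared final. (This is the JDK 8 layout shown above; since JDK 9 the backing field is a byte[] accompanied by a coder flag.)

  • The int field hash caches the hashCode value so that it is not recomputed on every call.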

Implementation of compareTo from the Comparable interface

public int compareTo(String anotherString) {
    int len1 = value.length;
    int len2 = anotherString.value.length;
    int lim = Math.min(len1, len2); 
    char v1[] = value;
    char v2[] = anotherString.value;

    int k = 0;
    while (k < lim) {  // only loop up to the length of the shorter string
        char c1 = v1[k];
        char c2 = v2[k];
        if (c1 != c2) {
            return c1 - c2;
        }
        k++;
    }
    return len1 - len2;  // if the common prefix is identical, the longer string is greater
}
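  • Characters are compared one by one from left to right. The code shows that "S" > "ASSSSSSSSSSSSSSS", because the first differing character decides the result.

  • The loop only runs up to the length of the shorter string.

  • If the two strings are identical over that common prefix, the longer string is the greater one.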
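
The three rules above can be checked with a small sketch. The class name CompareToDemo and the helper compare are made up for illustration; the helper re-implements the loop on the public String API and is not the JDK's own code.

```java
import java.util.Arrays;

class CompareToDemo {
    // Illustrative re-implementation of the comparison loop using charAt.
    static int compare(String a, String b) {
        int lim = Math.min(a.length(), b.length());
        for (int k = 0; k < lim; k++) {
            char c1 = a.charAt(k);
            char c2 = b.charAt(k);
            if (c1 != c2) {
                return c1 - c2; // first differing char decides
            }
        }
        return a.length() - b.length(); // same prefix: longer string is greater
    }

    public static void main(String[] args) {
        System.out.println(compare("S", "ASSSSSSSSSSSSSSS")); // 18, i.e. 'S' - 'A'
        System.out.println(compare("abc", "abcde"));          // -2, i.e. 3 - 5

        String[] words = {"banana", "Apple", "apple", "app"};
        Arrays.sort(words); // natural order, driven by String.compareTo
        System.out.println(Arrays.toString(words)); // [Apple, app, apple, banana]
    }
}
```

Note that the order is by char value, so every uppercase ASCII letter sorts before every lowercase one.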

Constructors

/**
 * Initializes a newly created {@code String} object so that it represents
 * an empty character sequence.  Note that use of this constructor is
 * unnecessary since Strings are immutable.
 */
public String() {
    this.value = "".value;
}

/**
 * Initializes a newly created {@code String} object so that it represents
 * the same sequence of characters as the argument; in other words, the
 * newly created string is a copy of the argument string. Unless an
 * explicit copy of {@code original} is needed, use of this constructor is
 * unnecessary since Strings are immutable.
 *
 * @param  original
 *         A {@code String}
 */
public String(String original) {
    this.value = original.value;
    this.hash = original.hash;
}
/**
 * Constructs a new String by decoding the given subarray of bytes
 * with the specified charset.
 */
public String(byte bytes[], int offset, int length, Charset charset) {
    if (charset == null)
        throw new NullPointerException("charset");
    checkBounds(bytes, offset, length);
    this.value = StringCoding.decode(charset, bytes, offset, length);
}

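  • The no-argument constructor simply produces the "" string by sharing the value array of the empty literal.

  • The constructor that takes another string only copies that string's value reference and its hash value. There is no risk of the two strings interfering with each other: value is private final, String has no method that changes the characters, and hash is a private cache that is only ever set to the hash computed from value.

  • The no-argument constructor does not set hash, so it keeps the field's default value (// Default to 0).

  • The constructor that takes a byte array converts the bytes to characters with StringCoding.decode(charset, bytes, offset, length).

    StringCoding is a package-private class (no access modifier), and its methods are package-private static methods, so unfortunately application code cannot call them directly.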
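
A short sketch of the constructors discussed above, as seen from application code. The class name ConstructorDemo and the helper decodeUtf8 are illustrative names only.

```java
import java.nio.charset.StandardCharsets;

class ConstructorDemo {
    // Decodes a slice of a byte array as UTF-8 via the four-argument constructor.
    static String decodeUtf8(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String empty = new String();
        System.out.println(empty.isEmpty());   // true
        System.out.println(empty.hashCode());  // 0

        String original = "hello";
        String copy = new String(original);
        System.out.println(copy.equals(original)); // true: same characters
        System.out.println(copy == original);      // false: a distinct object

        // 'h', 'i', then the two-byte UTF-8 encoding of U+00E9
        byte[] bytes = {'h', 'i', (byte) 0xC3, (byte) 0xA9};
        System.out.println(decodeUtf8(bytes, 0, 2));                   // hi
        System.out.println(decodeUtf8(bytes, 2, 2).equals("\u00e9"));  // true
    }
}
```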

The StringCoding.decode method

static char[] decode(Charset cs, byte[] ba, int off, int len) {
    // (1)We never cache the "external" cs, the only benefit of creating
    // an additional StringDe/Encoder object to wrap it is to share the
    // de/encode() method. These SD/E objects are short-lifed, the young-gen
    // gc should be able to take care of them well. But the best approash
    // is still not to generate them if not really necessary.
    // (2)The defensive copy of the input byte/char[] has a big performance
    // impact, as well as the outgoing result byte/char[]. Need to do the
    // optimization check of (sm==null && classLoader0==null) for both.
    // (3)getClass().getClassLoader0() is expensive
    // (4)There might be a timing gap in isTrusted setting. getClassLoader0()
    // is only chcked (and then isTrusted gets set) when (SM==null). It is
    // possible that the SM==null for now but then SM is NOT null later
    // when safeTrim() is invoked...the "safe" way to do is to redundant
    // check (... && (isTrusted || SM == null || getClassLoader0())) in trim
    // but it then can be argued that the SM is null when the opertaion
    // is started...
    CharsetDecoder cd = cs.newDecoder();
    int en = scale(len, cd.maxCharsPerByte());
    char[] ca = new char[en];
    if (len == 0)
        return ca;
    boolean isTrusted = false;
    if (System.getSecurityManager() != null) {
        if (!(isTrusted = (cs.getClass().getClassLoader0() == null))) {
            ba =  Arrays.copyOfRange(ba, off, off + len);
            off = 0;
        }
    }
    cd.onMalformedInput(CodingErrorAction.REPLACE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE)
      .reset();
    if (cd instanceof ArrayDecoder) {
        int clen = ((ArrayDecoder)cd).decode(ba, off, len, ca);
        return safeTrim(ca, clen, cs, isTrusted);
    } else {
        ByteBuffer bb = ByteBuffer.wrap(ba, off, len);
        CharBuffer cb = CharBuffer.wrap(ca);
        try {
            CoderResult cr = cd.decode(bb, cb, true);
            if (!cr.isUnderflow())
                cr.throwException();
            cr = cd.flush(cb);
            if (!cr.isUnderflow())
                cr.throwException();
        } catch (CharacterCodingException x) {
            // Substitution is always enabled,
            // so this shouldn't happen
            throw new Error(x);
        }
        return safeTrim(ca, cb.position(), cs, isTrusted);
    }
}
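  • The actual byte[]-to-char[] conversion is done by the abstract class CharsetDecoder, and the decoder instance is created by the Charset you pass in (cs.newDecoder()).

    Take the UTF-8 CharsetDecoder implementation as an example.

    UTF-8's decoder is a private static nested class that extends CharsetDecoder and implements the ArrayDecoder interface. Its decode method is long and consists mostly of byte-to-char arithmetic; following it requires some knowledge of the encoding, so the trace stops here. The listing that follows is decompiled output, which is why the variables are named var1, var2, and so on.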

    private static class Decoder extends CharsetDecoder implements ArrayDecoder {
        private Decoder(Charset var1) {
            super(var1, 1.0F, 1.0F);
        }
         // unrelated methods omitted
          /**
          * The method that actually converts bytes to chars
          */
          public int decode(byte[] var1, int var2, int var3, char[] var4) {
                int var5 = var2 + var3;
                int var6 = 0;
                int var7 = Math.min(var3, var4.length);
    
                ByteBuffer var8;
                for(var8 = null; var6 < var7 && var1[var2] >= 0; var4[var6++] = (char)var1[var2++]) {
                }
    
                while(true) {
                    while(true) {
                        while(var2 < var5) {
                            byte var9 = var1[var2++];
                            if (var9 < 0) {
                                byte var10;
                                if (var9 >> 5 != -2 || (var9 & 30) == 0) {
                                    byte var11;
                                    if (var9 >> 4 == -2) {
                                        if (var2 + 1 < var5) {
                                            var10 = var1[var2++];
                                            var11 = var1[var2++];
                                            if (isMalformed3(var9, var10, var11)) {
                                                if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                                    return -1;
                                                }
    
                                                var4[var6++] = this.replacement().charAt(0);
                                                var2 -= 3;
                                                var8 = getByteBuffer(var8, var1, var2);
                                                var2 += malformedN(var8, 3).length();
                                            } else {
                                                char var15 = (char)(var9 << 12 ^ var10 << 6 ^ var11 ^ -123008);
                                                if (Character.isSurrogate(var15)) {
                                                    if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                                        return -1;
                                                    }
    
                                                    var4[var6++] = this.replacement().charAt(0);
                                                } else {
                                                    var4[var6++] = var15;
                                                }
                                            }
                                        } else {
                                            if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                                return -1;
                                            }
    
                                            if (var2 >= var5 || !isMalformed3_2(var9, var1[var2])) {
                                                var4[var6++] = this.replacement().charAt(0);
                                                return var6;
                                            }
    
                                            var4[var6++] = this.replacement().charAt(0);
                                        }
                                    } else if (var9 >> 3 != -2) {
                                        if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                            return -1;
                                        }
    
                                        var4[var6++] = this.replacement().charAt(0);
                                    } else if (var2 + 2 < var5) {
                                        var10 = var1[var2++];
                                        var11 = var1[var2++];
                                        byte var12 = var1[var2++];
                                        int var13 = var9 << 18 ^ var10 << 12 ^ var11 << 6 ^ var12 ^ 3678080;
                                        if (!isMalformed4(var10, var11, var12) && Character.isSupplementaryCodePoint(var13)) {
                                            var4[var6++] = Character.highSurrogate(var13);
                                            var4[var6++] = Character.lowSurrogate(var13);
                                        } else {
                                            if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                                return -1;
                                            }
    
                                            var4[var6++] = this.replacement().charAt(0);
                                            var2 -= 4;
                                            var8 = getByteBuffer(var8, var1, var2);
                                            var2 += malformedN(var8, 4).length();
                                        }
                                    } else {
                                        if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                            return -1;
                                        }
    
                                        int var14 = var9 & 255;
                                        if (var14 <= 244 && (var2 >= var5 || !isMalformed4_2(var14, var1[var2] & 255))) {
                                            ++var2;
                                            if (var2 >= var5 || !isMalformed4_3(var1[var2])) {
                                                var4[var6++] = this.replacement().charAt(0);
                                                return var6;
                                            }
    
                                            var4[var6++] = this.replacement().charAt(0);
                                        } else {
                                            var4[var6++] = this.replacement().charAt(0);
                                        }
                                    }
                                } else {
                                    if (var2 >= var5) {
                                        if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                            return -1;
                                        }
    
                                        var4[var6++] = this.replacement().charAt(0);
                                        return var6;
                                    }
    
                                    var10 = var1[var2++];
                                    if (isNotContinuation(var10)) {
                                        if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                            return -1;
                                        }
    
                                        var4[var6++] = this.replacement().charAt(0);
                                        --var2;
                                    } else {
                                        var4[var6++] = (char)(var9 << 6 ^ var10 ^ 3968);
                                    }
                                }
                            } else {
                                var4[var6++] = (char)var9;
                            }
                        }
    
                        return var6;
                    }
                }
            }
    

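Conclusion: converting bytes to a string goes through the decode method of the utility class StringCoding, which in turn relies on the CharsetDecoder created by the Charset that was passed in (for UTF-8, the nested Decoder class shown above) to do the actual byte-to-char conversion. By putting the work behind an abstraction, Java moves the concrete implementation into each charset class; String only has to program against that abstraction, which also makes it easy to add new encodings.

Likewise, String's getBytes method hands the main work to the CharsetEncoder of the concrete Charset.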
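
The REPLACE actions configured in StringCoding.decode are visible from the outside: malformed bytes turn into U+FFFD instead of raising an error. The following sketch contrasts that with a decoder obtained directly from the Charset, whose default action is to report errors. CharsetDemo, lenient and isValidUtf8 are illustrative names, not JDK API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // new String(...) never fails on bad input; it substitutes U+FFFD.
    static String lenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // A fresh CharsetDecoder reports malformed input by default.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] bad = {'a', (byte) 0xFF, 'b'}; // 0xFF can never appear in UTF-8
        System.out.println(lenient(bad).equals("a\uFFFDb")); // true
        System.out.println(isValidUtf8(bad));                // false

        // Round trip: three CJK characters take three bytes each in UTF-8.
        String text = "\u5b57\u7b26\u4e32";
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(encoded.length);                  // 9
        System.out.println(lenient(encoded).equals(text));   // true
    }
}
```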

The hashCode method

String overrides this method, and the result depends on every character.

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];   // why is the previous value multiplied by 31?
        }
        hash = h;
    }
    return h;
}
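
The loop evaluates the polynomial s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic, as documented in the String Javadoc. A commonly cited reason for 31 is that it is an odd prime and that 31 * h can be computed as (h << 5) - h; treat that as the usual explanation rather than something the listing itself states. The sketch below (HashDemo and polynomialHash are illustrative names) reproduces the computation.

```java
class HashDemo {
    // Same recurrence as String.hashCode, written against the public API.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // 'a' = 97, 'b' = 98, 'c' = 99: 97*961 + 98*31 + 99 = 96354
        System.out.println(polynomialHash("abc"));   // 96354
        System.out.println("abc".hashCode());        // 96354

        // Multiplying by 31 is the same as a shift and a subtraction.
        int h = 12345;
        System.out.println(31 * h == (h << 5) - h);  // true

        // Different strings can still collide: both hash to 2112.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
    }
}
```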

String concatenation: the concat method and the static join method

The concat method

public String concat(String str) {
    int otherLen = str.length();
    if (otherLen == 0) {
        return this;
    }
    int len = value.length;
    char buf[] = Arrays.copyOf(value, len + otherLen);
    str.getChars(buf, len);
    return new String(buf, true);
}
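  • concat copies both strings into a new array and then creates a new String object. It is thread-safe because no shared state is modified, but every call allocates and copies.

  • Strings can also be concatenated with +.

    According to https://blog.csdn.net/youanyyou/article/details/78992978 , javac on JDK 8 compiles + into StringBuilder.append calls, whereas concat copies the arrays and creates a new object on every call. For an expression that joins several pieces, + therefore avoids the intermediate strings that chained concat calls would create, so + is usually the better choice. Inside a loop, each + statement still creates a fresh StringBuilder per iteration, so an explicit StringBuilder is preferable there.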
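
A small sketch of the behavioral differences between concat, + and an explicit StringBuilder. ConcatDemo, joinWithBuilder and concatThrowsOnNull are illustrative names; no timing claims are made here.

```java
class ConcatDemo {
    // One builder for the whole loop, instead of one per + expression.
    static String joinWithBuilder(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    // concat dereferences its argument, so null is rejected.
    static boolean concatThrowsOnNull() {
        String missing = null;
        try {
            "a".concat(missing);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("foo".concat("bar"));              // foobar
        System.out.println(joinWithBuilder("a", "b", "c"));   // abc

        String missing = null;
        System.out.println("a" + missing);                    // anull
        System.out.println(concatThrowsOnNull());             // true

        // Appending an empty string returns the receiver itself.
        String s = "foo";
        System.out.println(s.concat("") == s);                // true
    }
}
```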

The static join method

public static String join(CharSequence delimiter, CharSequence... elements) {
    Objects.requireNonNull(delimiter);
    Objects.requireNonNull(elements);
    // Number of elements not likely worth Arrays.stream overhead.
    StringJoiner joiner = new StringJoiner(delimiter);
    for (CharSequence cs: elements) {
        joiner.add(cs);
    }
    return joiner.toString();
}

The real work happens in the StringJoiner class

public final class StringJoiner {
    private final String prefix;
    private final String delimiter;
    private final String suffix;

    /*
     * StringBuilder value -- at any time, the characters constructed from the
     * prefix, the added element separated by the delimiter, but without the
     * suffix, so that we can more easily add elements without having to jigger
     * the suffix each time.
     */
    private StringBuilder value;
  
  /**
     * Adds a copy of the given {@code CharSequence} value as the next
     * element of the {@code StringJoiner} value. If {@code newElement} is
     * {@code null}, then {@code "null"} is added.
     *
     * @param  newElement The element to add
     * @return a reference to this {@code StringJoiner}
     */
    public StringJoiner add(CharSequence newElement) {
        prepareBuilder().append(newElement);
        return this;
    }

    private StringBuilder prepareBuilder() {
        if (value != null) {
            value.append(delimiter);
        } else {
            value = new StringBuilder().append(prefix);
        }
        return value;
    }


  • Internally it is still built on StringBuilder; join is purely a convenience method.
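
A brief usage sketch of String.join and of StringJoiner with a prefix and suffix. JoinDemo and bracketed are illustrative names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.StringJoiner;

class JoinDemo {
    // Joins items as "[a, b, c]" using the three-argument StringJoiner constructor.
    static String bracketed(List<String> items) {
        StringJoiner joiner = new StringJoiner(", ", "[", "]");
        for (String item : items) {
            joiner.add(item);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(String.join("-", "a", "b", "c"));        // a-b-c
        System.out.println(String.join(",", "a", null));            // a,null
        System.out.println(bracketed(Arrays.asList("a", "b", "c"))); // [a, b, c]
        System.out.println(bracketed(Collections.<String>emptyList())); // []
    }
}
```

A null element is appended as the text "null", matching the Javadoc of add quoted above.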

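The replace methods

public String replace(char oldChar, char newChar)

  • Replaces by scanning the char array.

public String replace(CharSequence target, CharSequence replacement)

  • In JDK 8 this is implemented with the regex engine: the target is compiled with Pattern.LITERAL, so it is matched literally rather than as a pattern. The regex source code will be analyzed in a follow-up article.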
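
Because replace treats its target as literal text while replaceAll treats it as a regular expression, the two can give very different results for the same arguments. A minimal sketch (ReplaceDemo is an illustrative name):

```java
class ReplaceDemo {
    public static void main(String[] args) {
        String path = "a.b.c";

        // char overload: plain character substitution
        System.out.println(path.replace('.', '/'));     // a/b/c

        // CharSequence overload: "." is a literal dot
        System.out.println(path.replace(".", "-"));     // a-b-c

        // replaceAll: "." is a regex that matches any character
        System.out.println(path.replaceAll(".", "-"));  // -----

        // The replacement is literal too, so "$1" is not a group reference.
        System.out.println("price".replace("price", "$1")); // $1
    }
}
```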

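The static format method

format builds a string from a template and its arguments. The overload that takes a Locale produces locale-sensitive output, which is what makes it useful for internationalization.

Internally it uses the Formatter class, and in JDK 8 Formatter in turn uses a regular expression to parse the format specifiers.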
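
A short sketch of locale-sensitive formatting with String.format. FormatDemo is an illustrative name; passing the Locale explicitly keeps the output independent of the machine's default locale.

```java
import java.util.Locale;

class FormatDemo {
    public static void main(String[] args) {
        // The grouping separator depends on the locale.
        System.out.println(String.format(Locale.US, "%,d", 1234567));      // 1,234,567
        System.out.println(String.format(Locale.GERMANY, "%,d", 1234567)); // 1.234.567

        // Width, left-justification and precision flags.
        System.out.println(String.format(Locale.US, "%-5s|%5.2f", "ab", 3.14159)); // ab   | 3.14
    }
}
```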

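The toLowerCase method

public String toLowerCase(Locale locale)

  • Walks the characters and lowercases each one with Character.toLowerCase; a few locale-dependent special cases (Turkish, for example) are handled separately.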
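
The Locale parameter matters in practice. The sketch below (LowerCaseDemo is an illustrative name) shows the well-known Turkish case, where an uppercase I lowercases to the dotless U+0131.

```java
import java.util.Locale;

class LowerCaseDemo {
    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Locale-neutral lowercasing
        System.out.println("TITLE".toLowerCase(Locale.ROOT));                      // title

        // Turkish rules: 'I' becomes dotless i (U+0131)
        System.out.println("TITLE".toLowerCase(turkish).equals("t\u0131tle"));     // true
    }
}
```

For identifiers, protocol keywords and similar machine-readable text, passing Locale.ROOT avoids surprises from the default locale.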

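The trim method

trim scans for whitespace from both ends. A character counts as whitespace when char <= ' ' (a neat trick worth remembering), so Unicode whitespace above U+0020 is not removed. substring is then used to extract the non-whitespace part.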
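
The char <= ' ' rule can be sketched as follows. TrimDemo and trimLike are illustrative names; trimLike mirrors the described logic using the public API and is not the JDK source.

```java
class TrimDemo {
    // Strips leading and trailing chars whose code is <= U+0020.
    static String trimLike(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && s.charAt(start) <= ' ') {
            start++;
        }
        while (start < end && s.charAt(end - 1) <= ' ') {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        // Control characters such as U+0001 are also <= ' ' and get removed.
        System.out.println(trimLike(" \t\n abc \u0001"));            // abc
        System.out.println(" \t\n abc \u0001".trim());               // abc

        // No-break space (U+00A0) is greater than ' ' and survives.
        System.out.println("\u00A0abc".trim().length());             // 4
    }
}
```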

The substring method

Internally it uses the constructor public String(char value[], int offset, int count) to create the new string; that constructor copies the requested range of the array.
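
Because the constructor copies the range, the resulting string is independent of the source array. A minimal sketch (SubstringDemo and copyRange are illustrative names):

```java
class SubstringDemo {
    // Builds a String from a slice of a char array; the slice is copied.
    static String copyRange(char[] data, int offset, int count) {
        return new String(data, offset, count);
    }

    public static void main(String[] args) {
        String s = "hello world";
        System.out.println(s.substring(6));     // world
        System.out.println(s.substring(0, 5));  // hello

        char[] data = {'a', 'b', 'c', 'd'};
        String slice = copyRange(data, 1, 2);
        data[1] = 'X';                          // later changes to the array...
        System.out.println(slice);              // bc  (...do not affect the string)
    }
}
```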

The valueOf methods

public static String valueOf(Object obj) {
    return (obj == null) ? "null" : obj.toString();
}
// Calls the argument's own toString method; if its class does not override toString, the default Object.toString is used.

public static String valueOf(char data[]) {
    return new String(data);
}
// Delegates to the constructor that matches the array argument to create the String object


public static String valueOf(boolean b) {
    return b ? "true" : "false";
}
// Returns "true" or "false" according to the boolean argument
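
One consequence of these overloads is worth seeing in code: a null Object yields the text "null", while a null char[] reaches the constructor and fails. ValueOfDemo and charArrayNullThrows are illustrative names.

```java
class ValueOfDemo {
    // The char[] overload passes the array to new String(...), which rejects null.
    static boolean charArrayNullThrows() {
        try {
            String.valueOf((char[]) null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(String.valueOf((Object) null));            // null
        System.out.println(String.valueOf(42));                       // 42
        System.out.println(String.valueOf(true));                     // true
        System.out.println(String.valueOf(new char[] {'h', 'i'}));    // hi
        System.out.println(charArrayNullThrows());                    // true

        // No toString override: falls back to Object.toString.
        System.out.println(String.valueOf(new Object()).startsWith("java.lang.Object@")); // true
    }
}
```

A bare String.valueOf(null) resolves to the char[] overload, so the cast to Object is what makes the first call safe.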

The static copyValueOf method

public static String copyValueOf(char data[], int offset, int count) {
    return new String(data, offset, count);
}
// Static utility method; delegates to the matching constructor to copy the range into a new string

The native intern method

This method involves String's memory layout and the constant pool; it will be covered in detail in another article.

public native String intern();
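
Until that article, a minimal sketch of the observable behavior: intern returns the canonical pooled instance, which is the same object a literal with equal content refers to. InternDemo and sameObject are illustrative names.

```java
class InternDemo {
    // Reference comparison, spelled out so the intent is explicit.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "abc";            // literals are interned automatically
        String heap = new String("abc");   // a separate object with equal content

        System.out.println(heap.equals(literal));              // true
        System.out.println(sameObject(heap, literal));         // false
        System.out.println(sameObject(heap.intern(), literal)); // true
    }
}
```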
