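The Java String class
Apart from the primitive types, String is the most heavily used type in Java, arguably used even more than the primitives. The JDK accordingly contains many optimizations for it. The source quoted here matches JDK 8, where String is still backed by a char[].
Class definition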
public final class String
implements java.io.Serializable, Comparable<String>, CharSequence{
/** The value is used for character storage. */
private final char value[];
/** Cache the hash code for the string */
private int hash; // Default to 0
/** use serialVersionUID from JDK 1.0.2 for interoperability */
private static final long serialVersionUID = -6849794470754667710L;
/**
* Class String is special cased within the Serialization Stream Protocol.
*
* A String instance is written into an ObjectOutputStream according to
* <a href="{@docRoot}/../platform/serialization/spec/output.html">
* Object Serialization Specification, Section 6.2, "Stream Elements"</a>
*/
private static final ObjectStreamField[] serialPersistentFields =
new ObjectStreamField[0];
/**
* Initializes a newly created {@code String} object so that it represents
* an empty character sequence. Note that use of this constructor is
* unnecessary since Strings are immutable.
*/
public String() {
this.value = "".value;
}
/**
* Initializes a newly created {@code String} object so that it represents
* the same sequence of characters as the argument; in other words, the
* newly created string is a copy of the argument string. Unless an
* explicit copy of {@code original} is needed, use of this constructor is
* unnecessary since Strings are immutable.
*
* @param original
* A {@code String}
*/
public String(String original) {
this.value = original.value;
this.hash = original.hash;
}
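- The final modifier means String cannot be subclassed, so none of its methods can be overridden. Many important JDK classes are final for the same reason: it prevents application code from subclassing them and undermining the JDK's safety guarantees.
- It implements Serializable, so instances can be serialized safely.
- It implements Comparable<String>, so strings can be sorted by their natural order.
- It implements CharSequence, the core interface for character sequences.
- The char array value holds the characters and is declared final.
- The int field hash caches the hashCode value so that it is not recomputed on every call.
compareTo: the Comparable implementation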
public int compareTo(String anotherString) {
int len1 = value.length;
int len2 = anotherString.value.length;
int lim = Math.min(len1, len2);
char v1[] = value;
char v2[] = anotherString.value;
int k = 0;
while (k < lim) { // compare only up to the length of the shorter string
char c1 = v1[k];
char c2 = v2[k];
if (c1 != c2) {
return c1 - c2;
}
k++;
}
return len1 - len2; // all compared chars are equal: the longer string is greater
}
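- Characters are compared one by one from left to right, and the first difference decides the result. The code shows that "S" > "ASSSSSSSSSSSSSSS", because 'S' is greater than 'A'.
- The loop only runs up to the length of the shorter string.
- If one string is a prefix of the other, the longer string is the greater one.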
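The comparison rules above can be sketched outside the JDK. This is a minimal re-implementation on top of the public charAt/length API (the private value array is not reachable from application code); the class name CompareToDemo and the helper compare are made up for illustration.

```java
import java.util.Arrays;

class CompareToDemo {
    // Mirrors the loop in String.compareTo using only public methods.
    static int compare(String a, String b) {
        int lim = Math.min(a.length(), b.length());
        for (int k = 0; k < lim; k++) {
            char c1 = a.charAt(k);
            char c2 = b.charAt(k);
            if (c1 != c2) {
                return c1 - c2; // the first differing char decides
            }
        }
        return a.length() - b.length(); // common prefix: the longer string is greater
    }

    public static void main(String[] args) {
        System.out.println(compare("S", "ASSSSSSSSSSSSSSS")); // 18, i.e. 'S' - 'A'
        System.out.println(compare("abc", "abcde"));          // -2, i.e. 3 - 5

        String[] words = {"banana", "S", "ASSSS", "apple"};
        Arrays.sort(words); // natural order, which is String.compareTo
        System.out.println(Arrays.toString(words)); // [ASSSS, S, apple, banana]
    }
}
```

Note that uppercase letters sort before lowercase ones because the comparison is on raw char values, not on dictionary order.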
Constructors
/**
* Initializes a newly created {@code String} object so that it represents
* an empty character sequence. Note that use of this constructor is
* unnecessary since Strings are immutable.
*/
public String() {
this.value = "".value;
}
/**
* Initializes a newly created {@code String} object so that it represents
* the same sequence of characters as the argument; in other words, the
* newly created string is a copy of the argument string. Unless an
* explicit copy of {@code original} is needed, use of this constructor is
* unnecessary since Strings are immutable.
*
* @param original
* A {@code String}
*/
public String(String original) {
this.value = original.value;
this.hash = original.hash;
}
/**
*
*/
public String(byte bytes[], int offset, int length, Charset charset) {
if (charset == null)
throw new NullPointerException("charset");
checkBounds(bytes, offset, length);
this.value = StringCoding.decode(charset, bytes, offset, length);
}
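- The no-argument constructor simply produces the empty string "".
- The constructor that takes another string just copies that string's value reference and its hash. The two strings then share one char array, but they cannot interfere with each other: value is private final and no String method modifies the array's contents, while hash, although not final, is only ever written by hashCode() as a cache of a value derived from those same characters.
- The no-argument constructor does not set hash, so the field keeps its default value of 0.
- The constructor that takes a byte array converts the bytes to characters by calling StringCoding.decode(charset, bytes, offset, length). StringCoding is a package-private class and its methods are package-private static methods, so unfortunately application code cannot call them directly.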
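The three constructors discussed above can be exercised as follows. This is only a small sketch; the class name ConstructorDemo and the helper decodeUtf8 are illustrative names, and the CJK characters are written as Unicode escapes so the source file encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

class ConstructorDemo {
    // Hypothetical helper: decode a slice of a byte array as UTF-8.
    static String decodeUtf8(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String empty = new String();
        System.out.println(empty.isEmpty()); // true

        String original = "hello";
        String copy = new String(original);
        System.out.println(copy.equals(original)); // true: same characters
        System.out.println(copy == original);      // false: a distinct object

        // "xx" followed by two CJK characters (3 bytes each in UTF-8)
        byte[] bytes = "xx\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        String tail = decodeUtf8(bytes, 2, 6); // skip the two ASCII bytes
        System.out.println(tail.equals("\u4e2d\u6587")); // true
    }
}
```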
The StringCoding.decode method
static char[] decode(Charset cs, byte[] ba, int off, int len) {
// (1)We never cache the "external" cs, the only benefit of creating
// an additional StringDe/Encoder object to wrap it is to share the
// de/encode() method. These SD/E objects are short-lifed, the young-gen
// gc should be able to take care of them well. But the best approash
// is still not to generate them if not really necessary.
// (2)The defensive copy of the input byte/char[] has a big performance
// impact, as well as the outgoing result byte/char[]. Need to do the
// optimization check of (sm==null && classLoader0==null) for both.
// (3)getClass().getClassLoader0() is expensive
// (4)There might be a timing gap in isTrusted setting. getClassLoader0()
// is only chcked (and then isTrusted gets set) when (SM==null). It is
// possible that the SM==null for now but then SM is NOT null later
// when safeTrim() is invoked...the "safe" way to do is to redundant
// check (... && (isTrusted || SM == null || getClassLoader0())) in trim
// but it then can be argued that the SM is null when the opertaion
// is started...
CharsetDecoder cd = cs.newDecoder();
int en = scale(len, cd.maxCharsPerByte());
char[] ca = new char[en];
if (len == 0)
return ca;
boolean isTrusted = false;
if (System.getSecurityManager() != null) {
if (!(isTrusted = (cs.getClass().getClassLoader0() == null))) {
ba = Arrays.copyOfRange(ba, off, off + len);
off = 0;
}
}
cd.onMalformedInput(CodingErrorAction.REPLACE)
.onUnmappableCharacter(CodingErrorAction.REPLACE)
.reset();
if (cd instanceof ArrayDecoder) {
int clen = ((ArrayDecoder)cd).decode(ba, off, len, ca);
return safeTrim(ca, clen, cs, isTrusted);
} else {
ByteBuffer bb = ByteBuffer.wrap(ba, off, len);
CharBuffer cb = CharBuffer.wrap(ca);
try {
CoderResult cr = cd.decode(bb, cb, true);
if (!cr.isUnderflow())
cr.throwException();
cr = cd.flush(cb);
if (!cr.isUnderflow())
cr.throwException();
} catch (CharacterCodingException x) {
// Substitution is always enabled,
// so this shouldn't happen
throw new Error(x);
}
return safeTrim(ca, cb.position(), cs, isTrusted);
}
}
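- The actual byte[] to char[] conversion is done by CharsetDecoder, an abstract class; the concrete decoder object is created by the Charset you pass in, via cs.newDecoder().
Take the CharsetDecoder implementation for UTF-8 as an example.
The UTF-8 decoder is a static nested class that extends CharsetDecoder and implements the ArrayDecoder interface. Its decode method is long and consists entirely of byte-to-char arithmetic; following it requires some knowledge of character encodings, so the trace stops here. The listing is decompiler output, which is why the variables are named var1, var2, and so on.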
private static class Decoder extends CharsetDecoder implements ArrayDecoder {
    private Decoder(Charset var1) {
        super(var1, 1.0F, 1.0F);
    }

    // unrelated methods omitted ...

    /**
     * The method that actually converts bytes to chars
     */
    public int decode(byte[] var1, int var2, int var3, char[] var4) {
        int var5 = var2 + var3;
        int var6 = 0;
        int var7 = Math.min(var3, var4.length);

        ByteBuffer var8;
        for(var8 = null; var6 < var7 && var1[var2] >= 0; var4[var6++] = (char)var1[var2++]) {
        }

        while(true) {
            while(true) {
                while(var2 < var5) {
                    byte var9 = var1[var2++];
                    if (var9 < 0) {
                        byte var10;
                        if (var9 >> 5 != -2 || (var9 & 30) == 0) {
                            byte var11;
                            if (var9 >> 4 == -2) {
                                if (var2 + 1 < var5) {
                                    var10 = var1[var2++];
                                    var11 = var1[var2++];
                                    if (isMalformed3(var9, var10, var11)) {
                                        if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                            return -1;
                                        }

                                        var4[var6++] = this.replacement().charAt(0);
                                        var2 -= 3;
                                        var8 = getByteBuffer(var8, var1, var2);
                                        var2 += malformedN(var8, 3).length();
                                    } else {
                                        char var15 = (char)(var9 << 12 ^ var10 << 6 ^ var11 ^ -123008);
                                        if (Character.isSurrogate(var15)) {
                                            if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                                return -1;
                                            }

                                            var4[var6++] = this.replacement().charAt(0);
                                        } else {
                                            var4[var6++] = var15;
                                        }
                                    }
                                } else {
                                    if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                        return -1;
                                    }

                                    if (var2 >= var5 || !isMalformed3_2(var9, var1[var2])) {
                                        var4[var6++] = this.replacement().charAt(0);
                                        return var6;
                                    }

                                    var4[var6++] = this.replacement().charAt(0);
                                }
                            } else if (var9 >> 3 != -2) {
                                if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                    return -1;
                                }

                                var4[var6++] = this.replacement().charAt(0);
                            } else if (var2 + 2 < var5) {
                                var10 = var1[var2++];
                                var11 = var1[var2++];
                                byte var12 = var1[var2++];
                                int var13 = var9 << 18 ^ var10 << 12 ^ var11 << 6 ^ var12 ^ 3678080;
                                if (!isMalformed4(var10, var11, var12) && Character.isSupplementaryCodePoint(var13)) {
                                    var4[var6++] = Character.highSurrogate(var13);
                                    var4[var6++] = Character.lowSurrogate(var13);
                                } else {
                                    if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                        return -1;
                                    }

                                    var4[var6++] = this.replacement().charAt(0);
                                    var2 -= 4;
                                    var8 = getByteBuffer(var8, var1, var2);
                                    var2 += malformedN(var8, 4).length();
                                }
                            } else {
                                if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                    return -1;
                                }

                                int var14 = var9 & 255;
                                if (var14 <= 244 && (var2 >= var5 || !isMalformed4_2(var14, var1[var2] & 255))) {
                                    ++var2;
                                    if (var2 >= var5 || !isMalformed4_3(var1[var2])) {
                                        var4[var6++] = this.replacement().charAt(0);
                                        return var6;
                                    }

                                    var4[var6++] = this.replacement().charAt(0);
                                } else {
                                    var4[var6++] = this.replacement().charAt(0);
                                }
                            }
                        } else {
                            if (var2 >= var5) {
                                if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                    return -1;
                                }

                                var4[var6++] = this.replacement().charAt(0);
                                return var6;
                            }

                            var10 = var1[var2++];
                            if (isNotContinuation(var10)) {
                                if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                    return -1;
                                }

                                var4[var6++] = this.replacement().charAt(0);
                                --var2;
                            } else {
                                var4[var6++] = (char)(var9 << 6 ^ var10 ^ 3968);
                            }
                        }
                    } else {
                        var4[var6++] = (char)var9;
                    }
                }

                return var6;
            }
        }
    }
}
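**Conclusion: converting bytes to a string goes through the decode method of the utility class StringCoding, which in turn relies on the decoder supplied by the Charset that was passed in (for UTF-8, the static nested Decoder class shown above) to do the real byte-to-char work. By defining these interfaces, Java pushes the concrete implementation into the individual charset classes; String only has to program against the interface, which also makes it easy to add new encodings.**
Likewise, String's getBytes method hands the main work to the encoder (a CharsetEncoder) of the specific Charset.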
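Although StringCoding itself is not accessible, the same delegation can be reproduced with the public java.nio.charset API. The sketch below configures a CharsetDecoder with the REPLACE actions that StringCoding.decode sets, and compares its result with the String constructor. The class name CharsetDemo and the helper decodeLenient are assumptions made for this example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Decode UTF-8 with malformed or unmappable input replaced instead of reported.
    static String decodeLenient(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer chars = decoder.decode(ByteBuffer.wrap(bytes));
        return chars.toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // getBytes goes the other way, through the charset's encoder.
        byte[] utf8 = "\u4e2d".getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 3
        System.out.println(decodeLenient(utf8).equals("\u4e2d")); // true

        // 0xFF can never appear in valid UTF-8, so it becomes U+FFFD.
        byte[] broken = {'a', (byte) 0xFF};
        String viaString = new String(broken, StandardCharsets.UTF_8);
        System.out.println(viaString.equals("a\uFFFD"));             // true
        System.out.println(decodeLenient(broken).equals(viaString)); // true
    }
}
```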
The hashCode method
String overrides this method, and the result depends on every character in the string.
public int hashCode() {
int h = hash;
if (h == 0 && value.length > 0) {
char val[] = value;
for (int i = 0; i < value.length; i++) {
h = 31 * h + val[i]; // why 31? it is an odd prime, and 31 * h equals (h << 5) - h, which is cheap to compute
}
hash = h;
}
return h;
}
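The polynomial can be checked by hand. The sketch below recomputes the hash through the public API and compares it with String.hashCode; HashCodeDemo and polyHash are illustrative names, not JDK code. For "abc" the steps are 97, then 31 * 97 + 98 = 3105, then 31 * 3105 + 99 = 96354.

```java
class HashCodeDemo {
    // Same recurrence as String.hashCode: h = 31 * h + c for each char.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polyHash("abc"));   // 96354
        System.out.println("abc".hashCode());  // 96354
        System.out.println(polyHash(""));      // 0, the field's default value

        // The multiplication can be rewritten as a shift and a subtraction,
        // and the identity still holds when int arithmetic overflows.
        int h = 123456789;
        System.out.println(31 * h == (h << 5) - h); // true
    }
}
```

One consequence visible in the JDK code quoted earlier: a non-empty string whose hash happens to be 0 is recomputed on every call, because 0 doubles as the "not cached yet" marker.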
String concatenation: the concat method and the static join method
The concat method
public String concat(String str) {
int otherLen = str.length();
if (otherLen == 0) {
return this;
}
int len = value.length;
char buf[] = Arrays.copyOf(value, len + otherLen);
str.getChars(buf, len);
return new String(buf, true);
}
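- concat copies the characters into a new, larger array and then creates a new String from it. It is thread-safe because neither input string is modified, but every call allocates and copies, so chaining several concat calls is comparatively slow.
- You can also concatenate with the + operator.
According to https://blog.csdn.net/youanyyou/article/details/78992978, a + concatenation is compiled to bytecode that uses StringBuilder (this holds for the JDK 8 compiler; newer compilers use a different strategy), whereas concat copies an array and creates a new object on every call. On balance, prefer + for concatenation: when several pieces are joined in one expression it performs better.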
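To make the three options concrete, here is a small sketch that produces the same result with concat, with +, and with an explicit StringBuilder. It does not measure performance; the class name ConcatDemo and the helper viaBuilder are made up for the example.

```java
class ConcatDemo {
    // Roughly what a + chain amounts to on JDK 8: one builder, several appends.
    static String viaBuilder(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = "foo";
        String b = "bar";

        System.out.println(a.concat(b));      // foobar
        System.out.println(a + b);            // foobar
        System.out.println(viaBuilder(a, b)); // foobar

        // concat short-circuits on an empty argument and returns this.
        System.out.println(a.concat("") == a); // true

        // Each concat in a chain allocates an intermediate String;
        // the builder version allocates only the final one.
        System.out.println(a.concat(b).concat("!").equals(viaBuilder(a, b, "!"))); // true
    }
}
```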
The static join method
public static String join(CharSequence delimiter, CharSequence... elements) {
Objects.requireNonNull(delimiter);
Objects.requireNonNull(elements);
// Number of elements not likely worth Arrays.stream overhead.
StringJoiner joiner = new StringJoiner(delimiter);
for (CharSequence cs: elements) {
joiner.add(cs);
}
return joiner.toString();
}
The real work happens in the StringJoiner class
public final class StringJoiner {
private final String prefix;
private final String delimiter;
private final String suffix;
/*
* StringBuilder value -- at any time, the characters constructed from the
* prefix, the added element separated by the delimiter, but without the
* suffix, so that we can more easily add elements without having to jigger
* the suffix each time.
*/
private StringBuilder value;
/**
* Adds a copy of the given {@code CharSequence} value as the next
* element of the {@code StringJoiner} value. If {@code newElement} is
* {@code null}, then {@code "null"} is added.
*
* @param newElement The element to add
* @return a reference to this {@code StringJoiner}
*/
public StringJoiner add(CharSequence newElement) {
prepareBuilder().append(newElement);
return this;
}
private StringBuilder prepareBuilder() {
if (value != null) {
value.append(delimiter);
} else {
value = new StringBuilder().append(prefix);
}
return value;
}
- Internally it still uses a StringBuilder; join is purely a convenience method.
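A short usage sketch of String.join and of StringJoiner with a prefix and suffix, which String.join does not expose. JoinDemo and bracketed are hypothetical names used only here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.StringJoiner;

class JoinDemo {
    // Hypothetical helper: join items with ", " and wrap them in brackets.
    static String bracketed(List<String> items) {
        StringJoiner joiner = new StringJoiner(", ", "[", "]");
        for (String item : items) {
            joiner.add(item);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(String.join("-", "a", "b", "c"));  // a-b-c
        System.out.println(String.join(",", "a", null, "b")); // a,null,b

        System.out.println(bracketed(Arrays.asList("x", "y")));          // [x, y]
        System.out.println(bracketed(Collections.<String>emptyList())); // []
    }
}
```

A null element is appended as the text "null", as the Javadoc of add states; a null delimiter or a null elements array is rejected by the requireNonNull checks.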
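The replace method
public String replace(char oldChar, char newChar)
- Walks the char array and replaces matching characters.
public String replace(CharSequence target, CharSequence replacement)
- In JDK 8 this is implemented on top of the regular-expression engine, with target compiled as a literal pattern rather than as a regex. The regex source code will be analyzed in a later article.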
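The practical difference between replace and the regex-based replaceAll shows up as soon as the target contains a regex metacharacter. A minimal sketch, with ReplaceDemo and dotsToDashes as illustrative names:

```java
class ReplaceDemo {
    // replace treats "." as a literal dot, not as "any character".
    static String dotsToDashes(String s) {
        return s.replace(".", "-");
    }

    public static void main(String[] args) {
        String s = "a.b.c";

        System.out.println(s.replace('.', '-'));      // a-b-c (char overload)
        System.out.println(dotsToDashes(s));          // a-b-c (literal target)
        System.out.println(s.replaceAll(".", "-"));   // ----- (regex: any char)
        System.out.println(s.replaceAll("\\.", "-")); // a-b-c (escaped regex)

        // The replacement is literal too: "$1" is not a group reference here.
        System.out.println("price".replace("price", "$1")); // $1
    }
}
```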
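The static format method formats strings and accepts a Locale, which makes it useful for internationalization.
Internally it uses the Formatter class, which in this JDK version in turn uses a regular expression to parse the format specifiers.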
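The locale sensitivity is easiest to see with number formatting. The sketch below passes an explicit Locale so the output does not depend on the machine's default; FormatDemo and grouped are hypothetical names.

```java
import java.util.Locale;

class FormatDemo {
    // Format a number with the locale's grouping separator.
    static String grouped(Locale locale, long n) {
        return String.format(locale, "%,d", n);
    }

    public static void main(String[] args) {
        System.out.println(grouped(Locale.US, 1234567));      // 1,234,567
        System.out.println(grouped(Locale.GERMANY, 1234567)); // 1.234.567

        System.out.println(String.format(Locale.US, "%.2f", 3.14159));      // 3.14
        System.out.println(String.format(Locale.GERMANY, "%.2f", 3.14159)); // 3,14

        // Specifiers without locale-dependent output.
        System.out.println(String.format("%s has %d items", "cart", 3)); // cart has 3 items
        System.out.println(String.format("%5s|%-5s|", "ab", "ab"));      //    ab|ab   |
    }
}
```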
The toLowerCase method
public String toLowerCase(Locale locale)
- Walks the char array and lowercases each character with Character.toLowerCase, with extra handling for locale-specific rules.
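The Locale parameter matters more than it may seem. The classic case is Turkish, where an uppercase I lowercases to a dotless ı (U+0131). The sketch uses Locale.ROOT for locale-neutral behavior; LowerCaseDemo and lowerNeutral are names invented for the example.

```java
import java.util.Locale;

class LowerCaseDemo {
    // Locale-neutral lowercasing, suitable for identifiers and protocol keys.
    static String lowerNeutral(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        System.out.println(lowerNeutral("TITLE")); // title

        // In Turkish, 'I' maps to dotless i (U+0131), not to 'i'.
        String tr = "TITLE".toLowerCase(turkish);
        System.out.println(tr.equals("t\u0131tle")); // true
        System.out.println(tr.equals("title"));      // false
    }
}
```

The no-argument toLowerCase() uses the default locale, so code that compares lowercased keys can behave differently on a machine configured for Turkish.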
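The trim method
It scans from both ends to skip whitespace, treating any char <= ' ' as whitespace (a detail worth remembering), and then uses substring to extract the part in between.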
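The char <= ' ' rule has two visible consequences: control characters below U+0020 are trimmed, while Unicode spaces above it, such as the ideographic space U+3000, are not. The following is a sketch of the same scan written against the public API; TrimDemo and trimLikeJdk are illustrative names, not the JDK source.

```java
class TrimDemo {
    // Scan inward from both ends while the char is <= ' ', then cut.
    static String trimLikeJdk(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && s.charAt(start) <= ' ') {
            start++;
        }
        while (start < end && s.charAt(end - 1) <= ' ') {
            end--;
        }
        return (start > 0 || end < s.length()) ? s.substring(start, end) : s;
    }

    public static void main(String[] args) {
        String messy = "\t\u0001 a b \n";
        System.out.println(trimLikeJdk(messy).equals("a b"));        // true
        System.out.println(trimLikeJdk(messy).equals(messy.trim())); // true

        // U+3000 (ideographic space) is greater than ' ', so trim keeps it.
        String cjk = "\u3000a";
        System.out.println(cjk.trim().equals(cjk)); // true
    }
}
```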
The substring method
Internally it calls the constructor public String(char value[], int offset, int count)
to create the new string; that constructor copies the requested range of the array.
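Because the constructor copies, the new string is independent of the array it was built from. A small sketch using the public char[] constructor; SubstringDemo and sub are names made up for illustration.

```java
class SubstringDemo {
    // Equivalent of s.substring(begin, end) via the char[] constructor.
    static String sub(String s, int begin, int end) {
        char[] chars = s.toCharArray();
        return new String(chars, begin, end - begin); // copies chars[begin, end)
    }

    public static void main(String[] args) {
        String s = "hello world";
        System.out.println(sub(s, 6, 11));        // world
        System.out.println(s.substring(6, 11));   // world

        // Changing the source array afterwards does not affect the string.
        char[] data = {'a', 'b', 'c'};
        String ab = new String(data, 0, 2);
        data[0] = 'z';
        System.out.println(ab); // ab
    }
}
```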
The valueOf method
public static String valueOf(Object obj) {
return (obj == null) ? "null" : obj.toString();
}
// Delegates to the argument's own toString method; if the class does not override toString, the default Object.toString is used.
public static String valueOf(char data[]) {
return new String(data);
}
// Picks the matching constructor for the array argument and creates the String.
public static String valueOf(boolean b) {
return b ? "true" : "false";
}
// Returns "true" or "false" according to the boolean argument.
The static copyValueOf method
public static String copyValueOf(char data[], int offset, int count) {
return new String(data, offset, count);
}
// Static utility method: it uses the matching constructor to extract the range and create a new string.
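A brief usage sketch of the valueOf overloads and copyValueOf. The cast to Object is deliberate: it selects the Object overload, which is the one that turns null into the text "null". ValueOfDemo and describe are hypothetical names.

```java
class ValueOfDemo {
    // Null-safe string conversion through the Object overload.
    static String describe(Object o) {
        return String.valueOf(o);
    }

    public static void main(String[] args) {
        System.out.println(describe(null));       // null
        System.out.println(describe(42));         // 42
        System.out.println(String.valueOf(true)); // true

        char[] data = {'a', 'b', 'c', 'd'};
        String all = String.valueOf(data);           // whole array
        String mid = String.copyValueOf(data, 1, 2); // offset 1, count 2

        data[1] = 'X'; // the strings hold their own copies
        System.out.println(all); // abcd
        System.out.println(mid); // bc
    }
}
```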
The native intern method
This method involves String's memory layout and the constant pool; the details will be covered in a separate article.
public native String intern();
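Without going into the memory details, the observable behavior of intern can be shown in a few lines: strings with equal content always yield the same canonical instance. InternDemo and sameCanonical are names invented for this sketch.

```java
class InternDemo {
    // Two strings with equal content always intern to the same instance.
    static boolean sameCanonical(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String literal = "hello";             // literals are interned already
        String heap = new String("hello");    // a separate object on the heap

        System.out.println(heap.equals(literal));   // true: same characters
        System.out.println(heap == literal);        // false: different objects
        System.out.println(heap.intern() == literal); // true: canonical instance

        String built = new StringBuilder("hel").append("lo").toString();
        System.out.println(sameCanonical(built, heap)); // true
    }
}
```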