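☕ [Java Technical Guide] "Serialization Series": A deep dive into the features and internals of FST, a fast serialization tool that also shrinks memory footprint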

Reposted article (original dated 2021-11-04 12:43) [Technical Zone: Java]

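Concept and definition of FST

FST stands for Fast Serialization Tool. It is a replacement implementation of Java serialization. The two serious shortcomings of Java serialization mentioned in the previous article are greatly improved in FST, whose main features are:

Roughly 10 times faster than the serialization provided by the JDK, with output more than 3 to 4 times smaller
Supports off-heap maps, and persistence of off-heap maps
Supports serialization to JSON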

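Using FST serialization

FST can be used in two ways: a shortcut API, or the ObjectOutput and ObjectInput style.

Using the serialization and deserialization methods provided directly by FSTConfiguration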

public static void serialSample() {
    FSTConfiguration conf = FSTConfiguration.createAndroidDefaultConfiguration();
    User object = new User();
    object.setName("huaijin");
    object.setAge(30);
    System.out.println("serialization, " + object);
    byte[] bytes = conf.asByteArray(object);
    User newObject = (User) conf.asObject(bytes);
    System.out.println("deSerialization, " + newObject);
}

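FSTConfiguration also provides an interface for registering classes. If a class is not registered, its class name is written into the stream by default. The shortcut API is easy to use and efficient: it returns a byte[] directly, without going through a ByteArrayOutputStream.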
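
Why registration saves space can be seen in a small sketch that uses only the JDK. It is not FST code: the ClassTagWriter class, its id table, and the two-marker byte layout are all assumptions made for illustration. It only shows that a short numeric id for a pre-registered class costs fewer bytes than the full class name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of "registered id vs. class name"; not FST's implementation.
final class ClassTagWriter {
    private final Map<Class<?>, Integer> registered = new HashMap<>();

    // Writer and reader would have to register the same classes in the same order.
    void register(Class<?> type) {
        registered.put(type, registered.size() + 1);
    }

    // Encodes only the part of a record that identifies the class.
    byte[] writeClassTag(Class<?> type) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Integer id = registered.get(type);
        if (id != null) {
            out.write(0);            // marker: an id follows
            out.write(id);           // toy limit: fewer than 256 registered classes
        } else {
            byte[] name = type.getName().getBytes(StandardCharsets.US_ASCII);
            out.write(1);            // marker: a class name follows
            out.write(name.length);  // toy limit: names shorter than 256 bytes
            out.write(name, 0, name.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        ClassTagWriter writer = new ClassTagWriter();
        System.out.println("unregistered: " + writer.writeClassTag(StringBuilder.class).length + " bytes");
        writer.register(StringBuilder.class);
        System.out.println("registered:   " + writer.writeClassTag(StringBuilder.class).length + " bytes");
    }
}
```

The saving grows with the length of the class name and with the number of objects written, which is why registering frequently serialized classes up front tends to pay off.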

Using ObjectOutput and ObjectInput gives finer control over how data is written and read:

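static FSTConfiguration conf = FSTConfiguration.createAndroidDefaultConfiguration();

static void writeObject(OutputStream outputStream, User user) throws IOException {
    FSTObjectOutput out = conf.getObjectOutput(outputStream);
    // Pass User.class here because readObject(User.class) below expects the same hint
    out.writeObject(user, User.class);
    // Streams obtained from the configuration are reused, so flush rather than close them
    out.flush();
    outputStream.close();
}

static User readObject(InputStream inputStream) throws Exception {
    FSTObjectInput input = conf.getObjectInput(inputStream);
    User user = (User) input.readObject(User.class);
    // Close the underlying stream, not the reused FSTObjectInput
    inputStream.close();
    return user;
}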

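FST in Dubbo

Dubbo wraps FST's input and output streams in its own FstObjectInput and FstObjectOutput, which fixes null pointer problems during serialization and deserialization.
It also provides an FstFactory class that uses the factory pattern to create FstObjectInput and FstObjectOutput. The factory is a singleton as well, so the whole application shares a single FSTConfiguration, and all classes that need to be serialized are registered with that FSTConfiguration at initialization.
To callers it exposes a unified serialization interface, FstSerialization, which provides serialize and deserialize.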

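FST serialization and deserialization

FST serialized storage format

Almost every serialized object stored as bytes has a similar layout. Class files, .so files, and dex files are all alike in this respect. The formats themselves contain little innovation; at most, the field contents are compressed or optimized, and even the UTF-8 encoding we use every day works this way.

FST's serialized storage is no different from ordinary byte-oriented formats. Take the following FST-serialized byte dump as an example: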

00000000:  0001 0f63 6f6d 2e66 7374 2e46 5354 4265
00000010:  616e f701 fc05 7630 7374 7200

Format:

Header | class name length | class name (String) | field 1 type (1 byte) | [length] | content | field 2 type (1 byte) | [length] | content | …

0000: record type tag: 00 denotes OBJECT
0001: class name encoding: 00 denotes UTF, 01 denotes ASCII
0002: length of class name (1 byte) = 15
0003~0011: class name string (15 bytes)
0012: Integer type tag 0xf7
0013: Integer value = 1
0014: String type tag 0xfc
0015: String length = 5
0016~001a: String value "v0str"
001b~001c: END
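
The layout above can be checked by walking the sample dump byte by byte. The sketch below uses only the JDK; the FstDumpWalkthrough class is hypothetical, and its parsing rules are inferred from this one sample rather than taken from FST's source, so it will not decode arbitrary FST streams.

```java
import java.nio.charset.StandardCharsets;

// Walks the 28-byte sample dump according to the layout listed above (illustration only).
final class FstDumpWalkthrough {
    static final byte[] SAMPLE = hex(
            "00010f636f6d2e6673742e4653544265" + "616ef701fc05763073747200");

    // Converts a hex string into the bytes it denotes.
    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Returns "className|intTag|intValue|stringTag|stringValue".
    static String describe(byte[] b) {
        int pos = 0;
        if (b[pos++] != 0) {                        // offset 0x00: OBJECT tag
            throw new IllegalArgumentException("not an OBJECT record");
        }
        pos++;                                      // offset 0x01: class name encoding (01 in the sample)
        int nameLength = b[pos++] & 0xff;           // offset 0x02: class name length
        String className = new String(b, pos, nameLength, StandardCharsets.US_ASCII);
        pos += nameLength;
        byte intTag = b[pos++];                     // 0xf7 is -9 as a signed byte
        int intValue = b[pos++];                    // the value fits in one byte here
        byte stringTag = b[pos++];                  // 0xfc is -4 as a signed byte
        int stringLength = b[pos++] & 0xff;
        String text = new String(b, pos, stringLength, StandardCharsets.US_ASCII);
        return className + "|" + intTag + "|" + intValue + "|" + stringTag + "|" + text;
    }

    public static void main(String[] args) {
        System.out.println(describe(SAMPLE));
    }
}
```

Read as signed bytes, the two type tags come out as -9 and -4, which match the BIG_INT and STRING constants of FSTObjectOutput.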

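As the dump shows, the Integer value (1) occupies only a single byte after serialization, rather than the 4 bytes an int takes in memory, so FST evidently compresses values according to certain rules. For the details, see how FSTObjectInput#instantiateSpecialTag reads each type. The tag value for each type is defined as a constant in FSTObjectOutput: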

public class FSTObjectOutput implements ObjectOutput {
    private static final FSTLogger LOGGER = FSTLogger.getLogger(FSTObjectOutput.class);
    public static Object NULL_PLACEHOLDER = new Object() {
        public String toString() { return "NULL_PLACEHOLDER"; }
    };
    public static final byte SPECIAL_COMPATIBILITY_OBJECT_TAG = -19; // see issue 52
    public static final byte ONE_OF = -18;
    public static final byte BIG_BOOLEAN_FALSE = -17;
    public static final byte BIG_BOOLEAN_TRUE = -16;
    public static final byte BIG_LONG = -10;
    public static final byte BIG_INT = -9;
    public static final byte DIRECT_ARRAY_OBJECT = -8;
    public static final byte HANDLE = -7;
    public static final byte ENUM = -6;
    public static final byte ARRAY = -5;
    public static final byte STRING = -4;
    public static final byte TYPED = -3; // var class == object written class
    public static final byte DIRECT_OBJECT = -2;
    public static final byte NULL = -1;
    public static final byte OBJECT = 0;
    protected FSTEncoder codec;
    ...
}
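
The one-byte Integer in the dump suggests a variable-length integer encoding. The sketch below shows one such scheme using only the JDK: small values are written as a single byte, and larger ones as a marker byte followed by a 2-byte or 4-byte value. The CompactInt class and the marker values -128 and -127 are assumptions chosen for illustration; FST's own encoder may differ in its details.

```java
import java.io.ByteArrayOutputStream;

// Illustrative variable-length int encoding; names and marker values are assumptions.
final class CompactInt {
    static final byte SHORT_FOLLOWS = -128;   // marker: a 2-byte value follows
    static final byte INT_FOLLOWS = -127;     // marker: a 4-byte value follows

    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value > -127 && value <= 127) {
            out.write(value);                 // fits in one byte, no marker needed
        } else if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            out.write(SHORT_FOLLOWS);
            out.write(value >>> 8);           // big-endian: high byte first
            out.write(value);
        } else {
            out.write(INT_FOLLOWS);
            out.write(value >>> 24);
            out.write(value >>> 16);
            out.write(value >>> 8);
            out.write(value);
        }
        return out.toByteArray();
    }

    static int decode(byte[] d) {
        if (d[0] == SHORT_FOLLOWS) {
            return (short) (((d[1] & 0xff) << 8) | (d[2] & 0xff));
        }
        if (d[0] == INT_FOLLOWS) {
            return ((d[1] & 0xff) << 24) | ((d[2] & 0xff) << 16) | ((d[3] & 0xff) << 8) | (d[4] & 0xff);
        }
        return d[0];                          // the byte itself is the value
    }

    public static void main(String[] args) {
        for (int v : new int[] {1, 127, 128, -127, 40000, Integer.MIN_VALUE}) {
            byte[] enc = encode(v);
            System.out.println(v + " -> " + enc.length + " byte(s), decoded " + decode(enc));
        }
    }
}
```

The trade-off is that large values cost one byte more than a fixed 4-byte int, which is acceptable when most values in real data are small.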

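How FST serialization and deserialization work

Serializing an object to bytes amounts to persisting it. If the bean's definition has changed by the time the bytes are deserialized, the deserializer needs a compatibility strategy. In JDK serialization, serialVersionUID plays an important role in version control. FST addresses the problem by ordering fields according to the @Version annotation.

When deserializing, FST first uses reflection to obtain all members of the object's class and sorts them. This ordering is the key to compatibility, and it is the principle behind @Version. FSTClazzInfo defines a comparator, defFieldComparator, that sorts all fields of a bean: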

public final class FSTClazzInfo {
    public static final Comparator<FSTFieldInfo> defFieldComparator = new Comparator<FSTFieldInfo>() {
        @Override
        public int compare(FSTFieldInfo o1, FSTFieldInfo o2) {
            int res = 0;

            if ( o1.getVersion() != o2.getVersion() ) {
                return o1.getVersion() < o2.getVersion() ? -1 : 1;
            }

            // order: version, boolean, primitives, conditionals, object references
            if (o1.getType() == boolean.class && o2.getType() != boolean.class) {
                return -1;
            }
            if (o1.getType() != boolean.class && o2.getType() == boolean.class) {
                return 1;
            }

            if (o1.isConditional() && !o2.isConditional()) {
                res = 1;
            } else if (!o1.isConditional() && o2.isConditional()) {
                res = -1;
            } else if (o1.isPrimitive() && !o2.isPrimitive()) {
                res = -1;
            } else if (!o1.isPrimitive() && o2.isPrimitive())
                res = 1;
//                if (res == 0) // 64 bit / 32 bit issues
//                    res = (int) (o1.getMemOffset() - o2.getMemOffset());
            if (res == 0)
                res = o1.getType().getSimpleName().compareTo(o2.getType().getSimpleName());
            if (res == 0)
                res = o1.getName().compareTo(o2.getName());
            if (res == 0) {
                return o1.getField().getDeclaringClass().getName().compareTo(o2.getField().getDeclaringClass().getName());
            }
            return res;
        }
    };
    ...
}
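
To see the effect of these rules on a concrete field list, here is a reduced version of the comparator that runs on the JDK alone. FieldOrderDemo and FieldDesc are hypothetical stand-ins for FSTClazzInfo and FSTFieldInfo, the primitive check uses Class#isPrimitive as an approximation, and the final declaring-class tie-breaker is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Reduced model of the ordering rules in defFieldComparator (illustration only).
final class FieldOrderDemo {
    static final class FieldDesc {
        final String name;
        final Class<?> type;
        final int version;
        final boolean conditional;

        FieldDesc(String name, Class<?> type, int version, boolean conditional) {
            this.name = name;
            this.type = type;
            this.version = version;
            this.conditional = conditional;
        }
    }

    static final Comparator<FieldDesc> ORDER = (a, b) -> {
        if (a.version != b.version) return Integer.compare(a.version, b.version);   // lower version first
        boolean aBool = a.type == boolean.class;
        boolean bBool = b.type == boolean.class;
        if (aBool != bBool) return aBool ? -1 : 1;                                   // booleans first
        if (a.conditional != b.conditional) return a.conditional ? 1 : -1;          // conditionals last
        if (a.type.isPrimitive() != b.type.isPrimitive()) return a.type.isPrimitive() ? -1 : 1;
        int res = a.type.getSimpleName().compareTo(b.type.getSimpleName());         // then by type name
        return res != 0 ? res : a.name.compareTo(b.name);                           // then by field name
    };

    static List<FieldDesc> sample() {
        return Arrays.asList(
                new FieldDesc("name", String.class, 0, false),
                new FieldDesc("email", String.class, 1, false),
                new FieldDesc("age", int.class, 0, false),
                new FieldDesc("score", long.class, 1, false),
                new FieldDesc("active", boolean.class, 0, false));
    }

    static List<String> sortedNames(List<FieldDesc> fields) {
        List<FieldDesc> copy = new ArrayList<>(fields);
        copy.sort(ORDER);
        List<String> names = new ArrayList<>();
        for (FieldDesc f : copy) names.add(f.name);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(sortedNames(sample()));
    }
}
```

Every version 0 field sorts ahead of every version 1 field, regardless of type, so fields added later with a higher version are always appended to the end of the sequence.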

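The implementation shows that the comparison gives priority to the field's version and then to the field's type, so overall a field with a larger version sorts later. To see why the sorting is needed, look at FSTObjectInput#instantiateAndReadNoSer: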

public class FSTObjectInput implements ObjectInput {
	protected Object instantiateAndReadNoSer(Class c, FSTClazzInfo clzSerInfo, FSTClazzInfo.FSTFieldInfo referencee, int readPos) throws Exception {
        Object newObj;
        newObj = clzSerInfo.newInstance(getCodec().isMapBased());
        ...
        } else {
            FSTClazzInfo.FSTFieldInfo[] fieldInfo = clzSerInfo.getFieldInfo();
            readObjectFields(referencee, clzSerInfo, fieldInfo, newObj,0,0);
        }
        return newObj;
    }

    protected void readObjectFields(FSTClazzInfo.FSTFieldInfo referencee, FSTClazzInfo serializationInfo, FSTClazzInfo.FSTFieldInfo[] fieldInfo, Object newObj, int startIndex, int version) throws Exception {
        
        if ( getCodec().isMapBased() ) {
            readFieldsMapBased(referencee, serializationInfo, newObj);
            if ( version >= 0 && newObj instanceof Unknown == false)
                getCodec().readObjectEnd();
            return;
        }
        if ( version < 0 )
            version = 0;
        int booleanMask = 0;
        int boolcount = 8;
        final int length = fieldInfo.length;
        int conditional = 0;
        for (int i = startIndex; i < length; i++) { // note this loop
            try {
                FSTClazzInfo.FSTFieldInfo subInfo = fieldInfo[i];
                if (subInfo.getVersion() > version ) { // the fields of the next version start here
                    int nextVersion = getCodec().readVersionTag(); // next version found in the object stream
                    if ( nextVersion == 0 ) // old object read
                    {
                        oldVersionRead(newObj);
                        return;
                    }
                    if ( nextVersion != subInfo.getVersion() ) { // a field's version must not change, and it must stay in sync with the version in the stream
                        throw new RuntimeException("read version tag "+nextVersion+" fieldInfo has "+subInfo.getVersion());
                    }
                    readObjectFields(referencee,serializationInfo,fieldInfo,newObj,i,nextVersion); // recurse into the next version
                    return;
                }
                if (subInfo.isPrimitive()) {
                    ...
                } else {
                    if ( subInfo.isConditional() ) {
                        ...
                    }
                    // object: set the value just read on newObj through the FSTFieldInfo
                    Object subObject = readObjectWithHeader(subInfo);
                    subInfo.setObjectValue(newObj, subObject);
                }
                ...

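This code shows how FST keeps serialization and deserialization compatible. Note the loop: it iterates over the sorted fields, and each FSTFieldInfo records details such as the field's position in the object stream and its type:

Serialization:

Sort all fields of the bean by version (static and transient members are excluded); a field without a @Version annotation defaults to version 0. Fields with the same version follow the order given in the comparator's comment: version, boolean, primitives, conditionals, object references
Write the bean's fields to the output stream one by one in sorted order
A @Version value may only increase, never decrease. If a new field reuses an existing version, the default ordering rules can make the field order in the stream differ from the order of the in-memory FSTFieldInfo[] array, so values end up assigned to the wrong fields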
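
The first step, collecting the serializable fields and their versions, can be sketched with plain reflection. To keep the example runnable on the JDK alone, it declares its own FieldVersion annotation as a hypothetical stand-in for FST's @Version; VersionScan and Sample are likewise illustrative names, and the sketch is not how FSTClazzInfo is actually implemented.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Collects field versions by reflection (illustration only).
final class VersionScan {
    // Hypothetical stand-in for org.nustaq.serialization.annotations.Version.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface FieldVersion {
        byte value();
    }

    static class Sample {
        static int counter;              // skipped: static
        transient String cache;          // skipped: transient
        int x;                           // no annotation, so version 0
        @FieldVersion(1) String added;
        @FieldVersion(2) String addedV2;
    }

    // Maps each serializable field name to its version, sorted by field name.
    static Map<String, Integer> versions(Class<?> type) {
        Map<String, Integer> result = new TreeMap<>();
        for (Field f : type.getDeclaredFields()) {
            int mod = f.getModifiers();
            if (Modifier.isStatic(mod) || Modifier.isTransient(mod) || f.isSynthetic()) {
                continue;
            }
            FieldVersion v = f.getAnnotation(FieldVersion.class);
            result.put(f.getName(), v == null ? 0 : (int) v.value());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(versions(Sample.class));
    }
}
```

The annotation needs RUNTIME retention, as FST's @Version has, because otherwise getAnnotation would return null and every field would silently fall back to version 0.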

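Deserialization:

Deserialization parses the object stream according to its format; the field order stored in the stream matches the order of the in-memory FSTFieldInfo array
A field of a given version exists in the stream but is missing from the in-memory bean: an exception may be thrown (a backward compatibility problem)
The stream contains higher-version fields that the in-memory bean does not have: works normally (the old version is compatible with the new one)
A field of a given version is missing from the stream but exists in the in-memory bean: an exception is thrown
The same field has different versions in the stream and in the in-memory bean: an exception is thrown
The in-memory bean adds a field whose version is not higher than the current maximum version: an exception is thrown

The code logic above leads to the usage rule for @Version: whenever a field is added, annotate it with @Version and set the value to the current maximum version plus one (fields added in the same release may share that new version, as in the example in the annotation's own Javadoc). Fields must not be deleted.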
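
The rule can be tried out on a toy model of the version-tag mechanism. The sketch below runs on the JDK alone and invents its own format: the stream is a list of ints, a tag is written whenever the field version rises, and a closing tag 0 marks the end. VersionedStreamToy and this format are assumptions for illustration only, not FST's wire format; only the reader loop is modeled on readObjectFields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of version tags (illustration only, not FST's wire format).
final class VersionedStreamToy {

    // fieldVersions must be sorted ascending; values[i] belongs to fieldVersions[i].
    static List<Integer> write(int[] fieldVersions, int[] values) {
        List<Integer> stream = new ArrayList<>();
        int current = 0;
        for (int i = 0; i < values.length; i++) {
            if (fieldVersions[i] > current) {
                current = fieldVersions[i];
                stream.add(current);          // version tag announcing the next group of fields
            }
            stream.add(values[i]);
        }
        stream.add(0);                        // tag 0: no further version groups
        return stream;
    }

    // Walks the reader's own sorted field list, like the loop in readObjectFields.
    static int[] read(List<Integer> stream, int[] fieldVersions) {
        int[] result = new int[fieldVersions.length];   // unread fields keep the default 0
        int pos = 0;
        int current = 0;
        for (int i = 0; i < fieldVersions.length; i++) {
            if (fieldVersions[i] > current) {
                int tag = stream.get(pos++);
                if (tag == 0) {
                    return result;            // stream was written by an older class
                }
                if (tag != fieldVersions[i]) {
                    throw new IllegalStateException(
                            "read version tag " + tag + ", field has " + fieldVersions[i]);
                }
                current = tag;
            }
            result[i] = stream.get(pos++);
        }
        return result;
    }

    public static void main(String[] args) {
        // Written by the old class (two version 0 fields), read by a class with an added version 1 field.
        List<Integer> oldStream = write(new int[] {0, 0}, new int[] {7, 8});
        System.out.println(Arrays.toString(read(oldStream, new int[] {0, 0, 1})));
    }
}
```

In this toy, a reader that adds a third field at version 0 instead of version 1 would consume the closing tag as if it were that field's value. That is the kind of silent misalignment the rule of always raising the version is meant to prevent.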

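It is also worth reading the Javadoc of the @Version annotation, which states explicitly that it exists for backward compatibility: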

package org.nustaq.serialization.annotations;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD})

/**
 * support for adding fields without breaking compatibility to old streams.
 * For each release of your app increment the version value. No Version annotation means version=0.
 * Note that each added field needs to be annotated.
 *
 * e.g.
 *
 * class MyClass implements Serializable {
 *
 *     // fields on initial release 1.0
 *     int x;
 *     String y;
 *
 *     // fields added with release 1.5
 *     @Version(1) String added;
 *     @Version(1) String alsoAdded;
 *
 *     // fields added with release 2.0
 *     @Version(2) String addedv2;
 *     @Version(2) String alsoAddedv2;
 *
 * }
 *
 * If an old class is read, new fields will be set to default values. You can register a VersionConflictListener
 * at FSTObjectInput in order to fill in defaults for new fields.
 *
 * Notes/Limits:
 * - Removing fields will break backward compatibility. You can only Add new fields.
 * - Can slow down serialization over time (if many versions)
 * - does not work for Externalizable or Classes which make use of JDK-special features such as readObject/writeObject
 *   (AKA does not work if fst has to fall back to 'compatible mode' for an object).
 * - in case you use custom serializers, your custom serializer has to handle versioning
 *
 */
public @interface Version {
    byte value();
}

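public class FSTBean implements Serializable {
    /** serialVersionUID */
    private static final long serialVersionUID = -2708653783151699375L;
    private Integer v0int;
    private String v0str;

    // Setters called by the FSTSerial class
    public void setV0int(Integer v0int) { this.v0int = v0int; }
    public void setV0str(String v0str) { this.v0str = v0str; }
}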

Prepare the serialization and deserialization methods

import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;

// FstSerializer is the author's own wrapper around FST and is not shown in this article;
// it is assumed to expose serialize(Object) and deserialize(byte[], Class<T>).
public class FSTSerial {

    private static void serialize(FstSerializer fst, String fileName) {
        FSTBean fstBean = new FSTBean();
        fstBean.setV0int(1);
        fstBean.setV0str("v0str");
        // try-with-resources closes the file even if writing fails
        try (FileOutputStream fos = new FileOutputStream(fileName)) {
            byte[] v1 = fst.serialize(fstBean);
            fos.write(v1, 0, v1.length);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void deserialize(FstSerializer fst, String fileName) {
        try (FileInputStream fis = new FileInputStream(fileName);
             ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
            byte[] buf = new byte[256];
            int length;
            // Copy the whole file into memory before deserializing
            while ((length = fis.read(buf)) > 0) {
                baos.write(buf, 0, length);
            }
            FSTBean deserial = fst.deserialize(baos.toByteArray(), FSTBean.class);
            System.out.println(deserial);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        FstSerializer fst = new FstSerializer();
        serialize(fst, "byte.bin");
        deserialize(fst, "byte.bin");
    }
}

References

https://github.com/RuedigerMoeller/fast-serialization
