[.NET] Comparing serialization techniques using ten years of stock prices


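1. Introduction

My previous company worked with stocks. Back then we rather recklessly loaded 10 years of prices (plus various indicators) for a stock straight from the server and displayed them in a chart on the client. Because it was a desktop client, the data was transferred crudely as well, using SOAP serialization. The quote-loading interface looked roughly like this: it takes the three parameters symbol, beginDate and endDate and returns the stock's prices for that period: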

public IEnumerable<StockPrice> LoadStockPrices(string symbol,DateTime beginDate,DateTime endDate)
{
    //some code
}

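Later we built a mobile client with Xamarin.Forms, and on a phone we could no longer be so careless. Mobile clients are sensitive to data usage, and displaying that much data is not practical anyway, so we capped the time range that could be requested, and choosing a new serialization format was put on the agenda. I was about to leave the company by then, so I did not follow up on it.
Last week I came across this article: 【開源】C#.NET股票歷史數據采集,【附18年歷史數據和源代碼】 ("[Open source] C#.NET stock historical data collection, with 18 years of historical data and source code"), and on a whim I tried implementing the old requirement with the commonly used serialization techniques.

2. Data structure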

[Serializable]
[ProtoContract]
[DataContract]
public class StockPrice
{
	[ProtoMember(1)]
	[DataMember]
	public double ClosePrice { get; set; }

	[ProtoMember(2)]
	[DataMember]
	public DateTime Date { get; set; }

	[ProtoMember(3)]
	[DataMember]
	public double HighPrice { get; set; }

	[ProtoMember(4)]
	[DataMember]
	public double LowPrice { get; set; }

	[ProtoMember(5)]
	[DataMember]
	public double OpenPrice { get; set; }

	[ProtoMember(6)]
	[DataMember]
	public double PrvClosePrice { get; set; }

	[ProtoMember(7)]
	[DataMember]
	public string Symbol { get; set; }

	[ProtoMember(8)]
	[DataMember]
	public double Turnover { get; set; }

	[ProtoMember(9)]
	[DataMember]
	public double Volume { get; set; }
}

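Above is the data structure for a stock price. It contains the stock symbol, the date, OHLC, the previous close (PrvClosePrice), turnover (Turnover) and volume (Volume). The attributes needed by the serializers are already applied.

The test data is 10 years of prices for CK Hutchison (長和, 00001) starting from 2003, 2717 records in total. To make testing easier they have been exported from the database to a text file, which is only about 200K.

3. Serialization techniques

Serialization in .NET involves many considerations, such as network transport, security and remote objects in .NET Remoting. Here I only look at serialization itself.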
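The listings in this section all override Serialize and Deserialize, but the base class and the timing code are not shown. The following is a minimal sketch of how such a harness might look, assuming a Stopwatch around each call; SerializerBase, Measure and ByteListSerializer are hypothetical names for illustration, not the names used in the original project.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical base class: the article's serializers override Serialize/Deserialize,
// but the real base type and measurement code are not part of the article.
public abstract class SerializerBase<T>
{
    public abstract byte[] Serialize(List<T> instance);
    public abstract List<T> Deserialize(byte[] source);

    // Times one serialize/deserialize round trip and reports the payload size.
    public (long SerializeMs, long DeserializeMs, int Bytes, int Count) Measure(List<T> data)
    {
        var watch = Stopwatch.StartNew();
        byte[] bytes = Serialize(data);
        watch.Stop();
        long serializeMs = watch.ElapsedMilliseconds;

        watch.Restart();
        List<T> copy = Deserialize(bytes);
        watch.Stop();

        return (serializeMs, watch.ElapsedMilliseconds, bytes.Length, copy.Count);
    }
}

// Trivial serializer used only to exercise the harness: one byte per item.
public sealed class ByteListSerializer : SerializerBase<byte>
{
    public override byte[] Serialize(List<byte> instance) => instance.ToArray();
    public override List<byte> Deserialize(byte[] source) => new List<byte>(source);
}
```

A call such as new ByteListSerializer().Measure(data) returns the two timings and the byte count in one tuple. A single cold run like this also includes JIT compilation and type initialization, so repeated runs would give steadier numbers.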

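3.1 Binary serialization

Binary serialization converts an object's public and private fields, together with the name of the class (including the assembly that contains it), into a byte stream. Deserializing that stream creates an exact clone of the original object. For types that are not already serializable in .NET, the simplest way to make them serializable is to mark them with SerializableAttribute.

.NET implements binary serialization with BinaryFormatter. The code is as follows: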

public override byte[] Serialize(List<StockPrice> instance)
{
	using (var stream = new MemoryStream())
	{
		IFormatter formatter = new BinaryFormatter();
		formatter.Serialize(stream, instance);
		return stream.ToArray();
	}
}


public override List<StockPrice> Deserialize(byte[] source)
{
	using (var stream = new MemoryStream(source))
	{
		IFormatter formatter = new BinaryFormatter();
		var target = formatter.Deserialize(stream);
		return target as List<StockPrice>;
	}
}


Result:

Name Serialize(ms) Deserialize(ms) Bytes
BinarySerializer 117 12 242,460

3.2 XML

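XML serialization converts an object's public fields and properties, or a method's parameters and return value, into an XML stream that conforms to a specific XML Schema definition language (XSD) document. Because XML is an open standard, the XML stream can be processed by any application on any platform.

In .NET, XML serialization can be done with XmlSerializer: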

public override byte[] Serialize(List<StockPrice> instance)
{
    using (var stream = new MemoryStream())
    {
        var serializer = new System.Xml.Serialization.XmlSerializer(typeof(List<StockPrice>));
        serializer.Serialize(stream, instance);
        return stream.ToArray();
    }
}

public override List<StockPrice> Deserialize(byte[] source)
{
	using (var stream = new MemoryStream(source))
	{
		var serializer = new System.Xml.Serialization.XmlSerializer(typeof(List<StockPrice>));
		var target = serializer.Deserialize(stream);
		return target as List<StockPrice>;
	}
}

The result is below. XML carries redundant text for the sake of readability, so the output is much larger:

Name Serialize(ms) Deserialize(ms) Bytes
XmlSerializer 133 26 922,900

3.3 SOAP

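XML serialization can also serialize objects into XML streams that conform to the SOAP specification. SOAP is an XML-based protocol designed specifically for transporting procedure calls as XML; anyone familiar with WCF will recognize it.

.NET implements SOAP serialization with SoapFormatter. The list is converted to an array first because SoapFormatter cannot serialize generic types. The code is as follows: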

public override byte[] Serialize(List<StockPrice> instance)
{
    using (var stream = new MemoryStream())
    {
        IFormatter formatter = new SoapFormatter();
        formatter.Serialize(stream, instance.ToArray());
        return stream.ToArray();
    }
}

public override List<StockPrice> Deserialize(byte[] source)
{
    using (var stream = new MemoryStream(source))
    {
        IFormatter formatter = new SoapFormatter();
        var target = formatter.Deserialize(stream);
        return (target as StockPrice[]).ToList();
    }
}

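The result is below. Because of the nature of the format, the output grows even more alarmingly (if I remember correctly, WCF uses SOAP by default?):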

Name Serialize(ms) Deserialize(ms) Bytes
SoapSerializer 105 123 2,858,416

3.4 JSON

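JSON (JavaScript Object Notation) is a lightweight data-interchange format conceived and designed by Douglas Crockford. It is based on human-readable text and is used to transmit data objects made up of attribute-value pairs and ordered lists of values.

.NET ships with DataContractJsonSerializer, although Json.NET is more popular. The listing here uses DataContractJsonSerializer: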

public override byte[] Serialize(List<StockPrice> instance)
{
    using (var stream = new MemoryStream())
    {
        var serializer = new DataContractJsonSerializer(typeof(List<StockPrice>));
        serializer.WriteObject(stream, instance);
        return stream.ToArray();
    }
}

public override List<StockPrice> Deserialize(byte[] source)
{
	using (var stream = new MemoryStream(source))
	{
		var serializer = new DataContractJsonSerializer(typeof(List<StockPrice>));
		var target = serializer.ReadObject(stream);
		return target as List<StockPrice>;
	}
}

The result is below. The JSON output is much smaller than the XML output:

Name Serialize(ms) Deserialize(ms) Bytes
JsonSerializer 40 60 504,320

3.5 Protobuf

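In fact, my colleagues and I knew from the start that Protobuf would be the best choice.

Protocol Buffers is Google's data serialization mechanism. It is fast and produces compact output, but it achieves this with a binary encoding, so the result is not human-readable.

With protobuf-net, the types to be serialized must be marked with ProtoContractAttribute and ProtoMemberAttribute. The serialization and deserialization code is as follows: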

public override byte[] Serialize(List<StockPrice> instance)
{
    using (var stream = new MemoryStream())
    {
        Serializer.Serialize(stream, instance);
        return stream.ToArray();
    }
}

public override List<StockPrice> Deserialize(byte[] source)
{
	using (var stream = new MemoryStream(source))
	{
		return Serializer.Deserialize<List<StockPrice>>(stream);
	}
}

The result is excellent:

Name Serialize(ms) Deserialize(ms) Bytes
ProtobufSerializer 93 18 211,926

3.6 Comparison of results

Name Serialize(ms) Deserialize(ms) Bytes
BinarySerializer 117 12 242,460
XmlSerializer 133 26 922,900
SoapSerializer 105 123 2,858,416
JsonSerializer 40 60 504,320
ProtobufSerializer 93 18 211,926

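Listing the results side by side, Protobuf produces the smallest output. Even so, the Protobuf output (211,926 bytes) is still larger than the 200K text file, so we might as well transfer the text file directly.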
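To put the totals in perspective, it can help to divide each byte count by the 2717 records. The sketch below does only that arithmetic; PayloadMath and BytesPerRecord are hypothetical names introduced for illustration.

```csharp
using System;

public static class PayloadMath
{
    // Record count taken from the article's test data.
    public const int RecordCount = 2717;

    // Average serialized bytes per StockPrice record.
    public static double BytesPerRecord(long totalBytes, int records = RecordCount)
        => (double)totalBytes / records;
}
```

For example, PayloadMath.BytesPerRecord(211926) gives exactly 78 bytes per record for Protobuf, while PayloadMath.BytesPerRecord(2858416) gives roughly 1052 bytes per record for SOAP.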

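4. Optimizing the data structure

The data structure being transferred actually leaves a lot of room for optimization.

First, the stock symbol. As mentioned earlier, the price-loading interface looks roughly like this: IEnumerable<StockPrice> LoadStockPrices(string symbol, DateTime beginDate, DateTime endDate). Since the caller already knows which symbol it asked for, the Symbol property on StockPrice is completely redundant.

Second, OHLC and PrvClosePrice. Hong Kong stock quotes (I do not remember whether other markets are the same) always have 4 significant digits (for example 95.05 and 102.4), so float precision is enough and double is unnecessary.

Finally, Date. Only the date matters, not the time of day, so storing the number of days since 1970-01-01 should be enough.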

private static DateTime _beginDate = new DateTime(1970, 1, 1);

public DateTime Date
{
	get => _beginDate.AddDays(DaysFrom1970);
	set => DaysFrom1970 = (short) Math.Floor((value - _beginDate).TotalDays);
}

[ProtoMember(2)]
[DataMember]
public short DaysFrom1970 { get; set; }
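The day-count conversion can be tried in isolation. This is a small sketch of the same idea, assuming a checked cast is acceptable so that out-of-range dates fail loudly instead of wrapping around; EpochDays, FromDate and ToDate are hypothetical helper names.

```csharp
using System;

public static class EpochDays
{
    private static readonly DateTime Epoch = new DateTime(1970, 1, 1);

    // Whole days between 1970-01-01 and the given date; the time of day is discarded.
    // The checked cast throws OverflowException when the result does not fit in a short.
    public static short FromDate(DateTime date)
        => checked((short)Math.Floor((date - Epoch).TotalDays));

    // Converts a stored day count back to a date at midnight.
    public static DateTime ToDate(short days) => Epoch.AddDays(days);
}
```

Since a short tops out at 32767, EpochDays.ToDate(short.MaxValue) is 2059-09-18, the latest date this layout can represent; that is ample for historical quotes but worth keeping in mind.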

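Do not assume Volume can be changed to int. Some penny stocks occasionally trade several billion shares in a day, which exceeds the int maximum of 2147483647 (incidentally, the maximum value of Int32 is 2 to the power of 31 minus 1, which sometimes comes up in interviews).

The class after these changes looks like this: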
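Both numeric choices, float for prices and something wider than int for volume, can be checked with a few lines. This is only an illustrative sketch; FieldWidths and its two methods are hypothetical names.

```csharp
using System;

public static class FieldWidths
{
    // Stores a quote as float, then rounds it back to the given number of decimals.
    public static double RoundTripAsFloat(double price, int decimals = 3)
        => Math.Round((double)(float)price, decimals);

    // True when a share volume can be stored in a 32-bit signed integer.
    public static bool FitsInInt32(double volume)
        => volume >= int.MinValue && volume <= int.MaxValue;
}
```

FieldWidths.RoundTripAsFloat(95.05) returns 95.05 again, because a float keeps about 7 significant decimal digits, whereas FieldWidths.FitsInInt32(3000000000d) is false for a three-billion-share day.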

[Serializable]
[ProtoContract]
[DataContract]
public class StockPriceSlim
{
	[ProtoMember(1)]
	[DataMember]
	public float ClosePrice { get; set; }

    private static DateTime _beginDate = new DateTime(1970, 1, 1);

    public DateTime Date
	{
		get => _beginDate.AddDays(DaysFrom1970);
		set => DaysFrom1970 = (short) Math.Floor((value - _beginDate).TotalDays);
	}

	[ProtoMember(2)]
	[DataMember]
	public short DaysFrom1970 { get; set; }

	[ProtoMember(3)]
	[DataMember]
	public float HighPrice { get; set; }

	[ProtoMember(4)]
	[DataMember]
	public float LowPrice { get; set; }

	[ProtoMember(5)]
	[DataMember]
	public float OpenPrice { get; set; }

	[ProtoMember(6)]
	[DataMember]
	public float PrvClosePrice { get; set; }

	[ProtoMember(8)]
	[DataMember]
	public double Turnover { get; set; }

	[ProtoMember(9)]
	[DataMember]
	public double Volume { get; set; }
}

The serialized size drops considerably for every serializer except XmlSerializer, whose output grows slightly:

Name Serialize(ms) Deserialize(ms) Bytes
BinarySerializer 11 12 141,930
XmlSerializer 42 24 977,248
SoapSerializer 48 89 2,586,720
JsonSerializer 17 33 411,942
ProtobufSerializer 7 3 130,416

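There was this much room for optimization for three reasons. First, the transferred object was the ORM-generated entity and was never optimized for network transfer. Second, the data feeds of the various brokers transmit data in much the same way. Finally, the interface was originally written for a desktop client, so nobody bothered to think about payload size.

5. Custom serialization

Because the stock data structure is relatively stable and this interface does not need to be general-purpose, we can implement serialization ourselves. The serialized properties of StockPriceSlim add up to 38 bytes (one short, five floats and two doubles: 2 + 5 × 4 + 2 × 8). The test data has 2717 quotes, so the total is 103,246 bytes, less than Protobuf's 130,416 bytes. To store only 38 bytes per quote, simply write each property's value at a fixed position: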


public override byte[] SerializeSlim(List<StockPriceSlim> instance)
{
	var list = new List<byte>();
	foreach (var item in instance)
	{
		var bytes = BitConverter.GetBytes(item.DaysFrom1970);
		list.AddRange(bytes);

		bytes = BitConverter.GetBytes(item.OpenPrice);
		list.AddRange(bytes);

		bytes = BitConverter.GetBytes(item.HighPrice);
		list.AddRange(bytes);

		bytes = BitConverter.GetBytes(item.LowPrice);
		list.AddRange(bytes);

		bytes = BitConverter.GetBytes(item.ClosePrice);
		list.AddRange(bytes);

		bytes = BitConverter.GetBytes(item.PrvClosePrice);
		list.AddRange(bytes);

		bytes = BitConverter.GetBytes(item.Volume);
		list.AddRange(bytes);

		bytes = BitConverter.GetBytes(item.Turnover);
		list.AddRange(bytes);
	}

	return list.ToArray();
}


public override List<StockPriceSlim> DeserializeSlim(byte[] source)
{
	var result = new List<StockPriceSlim>();
	var index = 0;
	using (var stream = new MemoryStream(source))
	{
		while (index < source.Length)
		{
			var price = new StockPriceSlim();
			var bytes = new byte[sizeof(short)];
			stream.Read(bytes, 0, sizeof(short));
			var days = BitConverter.ToInt16(bytes, 0);
			price.DaysFrom1970 = days;
			index += bytes.Length;

			bytes = new byte[sizeof(float)];
			stream.Read(bytes, 0, sizeof(float));
			var value = BitConverter.ToSingle(bytes, 0);
			price.OpenPrice = value;
			index += bytes.Length;

			stream.Read(bytes, 0, sizeof(float));
			value = BitConverter.ToSingle(bytes, 0);
			price.HighPrice = value;
			index += bytes.Length;

			stream.Read(bytes, 0, sizeof(float));
			value = BitConverter.ToSingle(bytes, 0);
			price.LowPrice = value;
			index += bytes.Length;

			stream.Read(bytes, 0, sizeof(float));
			value = BitConverter.ToSingle(bytes, 0);
			price.ClosePrice = value;
			index += bytes.Length;

			stream.Read(bytes, 0, sizeof(float));
			value = BitConverter.ToSingle(bytes, 0);
			price.PrvClosePrice = value;
			index += bytes.Length;

			bytes = new byte[sizeof(double)];
			stream.Read(bytes, 0, sizeof(double));
			var volume = BitConverter.ToDouble(bytes, 0);
			price.Volume = volume;
			index += bytes.Length;

			bytes = new byte[sizeof(double)];
			stream.Read(bytes, 0, sizeof(double));
			var turnover = BitConverter.ToDouble(bytes, 0);
			price.Turnover = turnover;
			index += bytes.Length;

			result.Add(price);
		}
		return result;
	}
}


Result:

Name Serialize(ms) Deserialize(ms) Bytes
CustomSerializer 5 1 103,246
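The same fixed 38-byte layout can also be sketched with BinaryWriter and BinaryReader, which removes the manual byte-array bookkeeping. This is an alternative illustration, not the code that produced the numbers above; SlimQuote is a hypothetical stand-in for StockPriceSlim so that the example compiles on its own. Note that BinaryWriter always writes little-endian, whereas BitConverter follows the machine's byte order.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for StockPriceSlim so the sketch is self-contained.
public sealed class SlimQuote
{
    public short DaysFrom1970;
    public float OpenPrice, HighPrice, LowPrice, ClosePrice, PrvClosePrice;
    public double Volume, Turnover;
}

public static class FixedLayoutSerializer
{
    // 2 (short) + 5 * 4 (float) + 2 * 8 (double) = 38 bytes per record.
    public const int RecordSize = sizeof(short) + 5 * sizeof(float) + 2 * sizeof(double);

    public static byte[] Serialize(IReadOnlyCollection<SlimQuote> quotes)
    {
        using (var stream = new MemoryStream(quotes.Count * RecordSize))
        using (var writer = new BinaryWriter(stream))
        {
            foreach (var q in quotes)
            {
                // Field order defines the wire layout and must match Deserialize.
                writer.Write(q.DaysFrom1970);
                writer.Write(q.OpenPrice);
                writer.Write(q.HighPrice);
                writer.Write(q.LowPrice);
                writer.Write(q.ClosePrice);
                writer.Write(q.PrvClosePrice);
                writer.Write(q.Volume);
                writer.Write(q.Turnover);
            }
            writer.Flush();
            return stream.ToArray();
        }
    }

    public static List<SlimQuote> Deserialize(byte[] source)
    {
        var result = new List<SlimQuote>(source.Length / RecordSize);
        using (var reader = new BinaryReader(new MemoryStream(source)))
        {
            while (reader.BaseStream.Position < reader.BaseStream.Length)
            {
                // Object initializers run in textual order, matching the write order.
                result.Add(new SlimQuote
                {
                    DaysFrom1970 = reader.ReadInt16(),
                    OpenPrice = reader.ReadSingle(),
                    HighPrice = reader.ReadSingle(),
                    LowPrice = reader.ReadSingle(),
                    ClosePrice = reader.ReadSingle(),
                    PrvClosePrice = reader.ReadSingle(),
                    Volume = reader.ReadDouble(),
                    Turnover = reader.ReadDouble()
                });
            }
        }
        return result;
    }
}
```

With 2717 records, FixedLayoutSerializer.RecordSize * 2717 comes to the same 103,246 bytes reported above.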

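This approach not only yields the smallest output, its serialization and deserialization speeds are also excellent. The code, however, is ugly and not extensible. Let's try to improve it with reflection: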

public override byte[] SerializeSlim(List<StockPriceSlim> instance)
{
    var result = new List<byte>();
    foreach (var item in instance)
        foreach (var property in typeof(StockPriceSlim).GetProperties())
        {
            if (property.GetCustomAttribute(typeof(DataMemberAttribute)) == null)
                continue;

            var value = property.GetValue(item);
            byte[] bytes = null;
            if (property.PropertyType == typeof(int))
                bytes = BitConverter.GetBytes((int)value);
            else if (property.PropertyType == typeof(short))
                bytes = BitConverter.GetBytes((short)value);
            else if (property.PropertyType == typeof(float))
                bytes = BitConverter.GetBytes((float)value);
            else if (property.PropertyType == typeof(double))
                bytes = BitConverter.GetBytes((double)value);
            result.AddRange(bytes);
        }

    return result.ToArray();
}

public override List<StockPriceSlim> DeserializeSlim(byte[] source)
{
	using (var stream = new MemoryStream(source))
	{
		var result = new List<StockPriceSlim>();
		var index = 0;

		while (index < source.Length)
		{
			var price = new StockPriceSlim();
			foreach (var property in typeof(StockPriceSlim).GetProperties())
			{
				if (property.GetCustomAttribute(typeof(DataMemberAttribute)) == null)
					continue;

				byte[] bytes = null;
				object value = null;

				if (property.PropertyType == typeof(int))
				{
					bytes = new byte[sizeof(int)];
					stream.Read(bytes, 0, bytes.Length);
					value = BitConverter.ToInt32(bytes, 0);
				}
				else if (property.PropertyType == typeof(short))
				{
					bytes = new byte[sizeof(short)];
					stream.Read(bytes, 0, bytes.Length);
					value = BitConverter.ToInt16(bytes, 0);
				}
				else if (property.PropertyType == typeof(float))
				{
					bytes = new byte[sizeof(float)];
					stream.Read(bytes, 0, bytes.Length);
					value = BitConverter.ToSingle(bytes, 0);
				}
				else if (property.PropertyType == typeof(double))
				{
					bytes = new byte[sizeof(double)];
					stream.Read(bytes, 0, bytes.Length);
					value = BitConverter.ToDouble(bytes, 0);
				}

				property.SetValue(price, value);
				index += bytes.Length;
			}


			result.Add(price);
		}
		return result;
	}
}

Name Serialize(ms) Deserialize(ms) Bytes
ReflectionSerializer 413 431 103,246

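The code looks somewhat better, but performance drops sharply. I seemed to remember someone saying that .NET caches reflection results so I need not worry about reflection performance; apparently my understanding was off. So I simply cached some of the reflection results myself: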

private readonly IEnumerable<PropertyInfo> _properties;

public ExtendReflectionSerializer()
{
	_properties = typeof(StockPriceSlim).GetProperties().Where(p => p.GetCustomAttribute(typeof(DataMemberAttribute)) != null).ToList();
}

Name Serialize(ms) Deserialize(ms) Bytes
ExtendReflectionSerializer 11 11 103,246

With this change the performance is acceptable.
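Only the constructor of the cached version is shown, so here is a self-contained sketch of how the cached property list might be used in both directions. It is an assumption about the overall shape rather than the original ExtendReflectionSerializer: CachedReflectionSerializer and SampleQuote are hypothetical names, and the properties are sorted by name because GetProperties does not guarantee a particular order.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Runtime.Serialization;

[DataContract]
public class SampleQuote   // hypothetical stand-in type
{
    [DataMember] public short DaysFrom1970 { get; set; }
    [DataMember] public float ClosePrice { get; set; }
    [DataMember] public double Volume { get; set; }
    public string Ignored { get; set; }   // no [DataMember], so it is skipped
}

public class CachedReflectionSerializer<T> where T : class, new()
{
    // The reflection lookup runs once per serializer instance, not once per record.
    private readonly PropertyInfo[] _properties = typeof(T).GetProperties()
        .Where(p => p.GetCustomAttribute<DataMemberAttribute>() != null)
        .OrderBy(p => p.Name, StringComparer.Ordinal)
        .ToArray();

    public byte[] Serialize(IEnumerable<T> items)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            foreach (var item in items)
                foreach (var property in _properties)
                {
                    object value = property.GetValue(item);
                    if (value is short s) writer.Write(s);
                    else if (value is int i) writer.Write(i);
                    else if (value is float f) writer.Write(f);
                    else if (value is double d) writer.Write(d);
                    else throw new NotSupportedException(property.PropertyType.Name);
                }
            writer.Flush();
            return stream.ToArray();
        }
    }

    public List<T> Deserialize(byte[] source)
    {
        var result = new List<T>();
        using (var reader = new BinaryReader(new MemoryStream(source)))
        {
            while (reader.BaseStream.Position < reader.BaseStream.Length)
            {
                var item = new T();
                foreach (var property in _properties)
                    property.SetValue(item, ReadValue(reader, property.PropertyType));
                result.Add(item);
            }
        }
        return result;
    }

    // Reads one value of the given primitive type; unsupported types fail loudly.
    private static object ReadValue(BinaryReader reader, Type type)
    {
        if (type == typeof(short)) return reader.ReadInt16();
        if (type == typeof(int)) return reader.ReadInt32();
        if (type == typeof(float)) return reader.ReadSingle();
        if (type == typeof(double)) return reader.ReadDouble();
        throw new NotSupportedException(type.Name);
    }
}
```

Each SampleQuote takes 14 bytes here (2 + 4 + 8). Unsupported property types throw instead of silently producing a null buffer, which is a deliberate difference from the reflection listing above.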

6. Finally, trying compression

Finally, let's apply some quick compression on top of serialization:

public byte[] SerializeWithZip(List<StockPriceSlim> instance)
{
    var bytes = SerializeSlim(instance);

    using (var memoryStream = new MemoryStream())
    {
        using (var deflateStream = new DeflateStream(memoryStream, CompressionLevel.Fastest))
        {
            deflateStream.Write(bytes, 0, bytes.Length);
        }
        return memoryStream.ToArray();
    }
}

public List<StockPriceSlim> DeserializeWithZip(byte[] source)
{
	using (var originalFileStream = new MemoryStream(source))
	{
		using (var memoryStream = new MemoryStream())
		{
			using (var decompressionStream = new DeflateStream(originalFileStream, CompressionMode.Decompress))
			{
				decompressionStream.CopyTo(memoryStream);
			}
			var bytes = memoryStream.ToArray();
			return DeserializeSlim(bytes);
		}
	}
}
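The compression step does not depend on the serializer, so it can be sketched as a pair of byte-array helpers. This is a minimal illustration of the same DeflateStream usage; DeflateHelper is a hypothetical name.

```csharp
using System.IO;
using System.IO.Compression;

public static class DeflateHelper
{
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // Dispose the DeflateStream before reading the buffer so the final block is flushed.
            using (var deflate = new DeflateStream(output, CompressionLevel.Fastest))
            {
                deflate.Write(data, 0, data.Length);
            }
            // MemoryStream.ToArray still works after the stream has been closed.
            return output.ToArray();
        }
    }

    public static byte[] Decompress(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var inflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflate.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Any of the serializers can then be wrapped as DeflateHelper.Compress(serializer.Serialize(data)), with DeflateHelper.Decompress applied before deserializing.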

The results look good:

Name Serialize(ms) Deserialize(ms) Bytes Serialize With Zip(ms) Deserialize With Zip(ms) Bytes With Zip
BinarySerializer 11 12 141,930 22 12 72,954
XmlSerializer 42 24 977,248 24 28 108,839
SoapSerializer 48 89 2,586,720 61 87 140,391
JsonSerializer 17 33 411,942 24 35 90,125
ProtobufSerializer 7 3 130,416 7 6 65,644
CustomSerializer 5 1 103,246 9 3 57,697
ReflectionSerializer 413 431 103,246 401 376 59,285
ExtendReflectionSerializer 11 11 103,246 13 14 59,285

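7. Conclusion

This satisfied my curiosity and let me review the various serialization approaches along the way.

Because the original requirement was very narrow, I did not compare the serializers across different data volumes.

Although Protobuf is excellent, when storing serialized files locally I usually choose XML or JSON for readability.

8. References

Binary serialization
XML and SOAP serialization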
Json.NET
Protocol Buffers - Google's data interchange format

9. Source code

StockDataSample

