Cloud Regions, Availability Zones (AZs), Cross-Region Replication, and Disaster Recovery (Part 1)

This article is a repost, dated 2018-04-22 09:23.

This article has two parts. Part 1 covers AWS; Part 2 covers Alibaba Cloud and OpenStack clouds.

1. AWS

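1.1 Overview of AWS geographic components

AWS provides three kinds of geographic components:

Regions: a region is a geographic area in which AWS offers cloud services. Its purpose is to let users connect to a nearby site and reduce network latency. A region usually consists of several AZs in one metropolitan area. In 2016, AWS announced that it had built a private 100GbE ring network connecting its regions worldwide.
Availability Zones: each region has at least two, and usually three, availability zones, which exist so that users can build highly available architectures. A common view is that one AZ equals one data center. That is not entirely accurate: several data centers located very close together can also form a single AZ. An AZ has at most 8 data centers, and some AZs hold more than 300,000 servers. Each AZ has its own independent infrastructure, including power and networking. AZs are interconnected by low-latency fiber links.
Edge Locations: AWS sites that are usually deployed in large cities and major population centers. Their main job is to cache data and reduce latency. They are independent of regions and AZs, and there are far more of them than AZs. Several AWS services use them, such as AWS CloudFront and AWS Lambda@Edge. CloudFront uses them as globally distributed access points for users, usually called Edge POPs.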

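AWS infrastructure components:

Figure: AWS regions worldwide are interconnected by a 100GbE ring network (except the China regions)

Figure: AZs are interconnected by low-latency fiber links

Figure: relationship between AWS regions and availability zones (every region has 2+ AZs, newly built regions have 3+ AZs, and the largest region has 5 AZs)

Figure: network interconnection between AZs and between regions

Take the largest AWS region, North Virginia (us-east-1), as an example. It has 5 AZs, each AZ has 2 to 8 IDCs, and each AZ can hold up to 300,000 servers.

Orange lines show the network interconnecting the IDCs inside one AZ.
Blue lines show the network interconnecting AZs. The largest AZ has dual links to each of the other large AZs, while medium-sized AZs are connected only to the large AZs and have no links to other medium-sized AZs.
Yellow lines show the network links between AWS regions and to networks outside the AWS region. Each AWS region has two Transit Points for these connections.

Figure: AWS regions and Edge POPs (as of June 2017: 77 PoPs and 11 regional Edge Caches worldwide)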

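1.2 How AWS services relate to the geographic components

AWS offers a large number of services, and each has a different scope:

A small number of services are global, i.e. not tied to a particular region, such as IAM, SES, Route 53, and CloudFront shown below.
Some services are regional, i.e. scoped to one specific region, such as S3 and AMIs shown below.
Some services are zonal, i.e. scoped to one availability zone, such as EC2 and EBS shown below.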
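
The three scopes above can be sketched as a small lookup. This is only an illustrative model: the `Scope` enum, the `RESOURCE_SCOPE` table, and the `in_scope` helper are hypothetical names, not part of any AWS SDK, and the rule is deliberately simplified (for example, an S3 bucket can still be read from another region over the network).

```python
from enum import Enum


class Scope(Enum):
    GLOBAL = "global"
    REGIONAL = "regional"
    ZONAL = "zonal"


# Scopes taken from the examples in the list above; not an exhaustive catalog.
RESOURCE_SCOPE = {
    "IAM": Scope.GLOBAL,
    "SES": Scope.GLOBAL,
    "Route 53": Scope.GLOBAL,
    "CloudFront": Scope.GLOBAL,
    "S3": Scope.REGIONAL,
    "AMI": Scope.REGIONAL,
    "EC2 instance": Scope.ZONAL,
    "EBS volume": Scope.ZONAL,
}


def in_scope(resource, home_region, home_az, target_region, target_az):
    """Return True if a resource created in (home_region, home_az) is
    within scope at (target_region, target_az) without being copied."""
    scope = RESOURCE_SCOPE[resource]
    if scope is Scope.GLOBAL:
        return True
    if scope is Scope.REGIONAL:
        return home_region == target_region
    # Zonal resources must match both the region and the AZ.
    return home_region == target_region and home_az == target_az
```

For instance, `in_scope("AMI", "us-east-1", "us-east-1a", "us-east-1", "us-east-1b")` is true because an AMI is regional, while the same call for `"EBS volume"` is false because a volume is tied to its AZ.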

Service	Sub-service	Global	Regional	Zonal (per AZ)	Notes
IAM (AWS Identity and Access Management)	Users, Groups, Roles, Accounts	Y			The same AWS accounts, users, groups, and roles can be used in all regions. IAM users are bound to the AWS account and are not restricted to a region.
	Key Pairs		Y		Key pairs created by Amazon EC2 are specific to the region.
	RSA key pair	Y			An RSA key pair that you create and upload can be used in all regions.
Virtual Private Cloud	VPC		Y		A VPC is created within a region and spans all AZs of that region. A VPC cannot be migrated to another region; it can only be re-created there.
	Subnet			Y	A subnet can span only a single Availability Zone.
	Security groups		Y		A security group is tied to a region and can be assigned only to instances in the same region.
	VPC Endpoints		Y		You cannot create an endpoint between a VPC and an AWS service in a different region.
	VPC Peering		Y		Before the end of 2017: VPC peering could be set up between VPCs in the same account or in different AWS accounts, but only within the same region; it could not span regions. December 2017: Amazon released inter-region VPC peering, at the time available in only a few regions.
	NAT gateway			Y	A NAT gateway operates in a single Availability Zone.
	Virtual private gateway (VGW)		Y		Virtual private gateways are highly available across a region without additional configuration. However, high availability for the VPN service and Direct Connect is configurable and managed by the user.
	Internet gateway		Y		A single Internet gateway is considered highly available within a region without any other action, just as if you had multiple, equal cost routes to the same destination.
	Elastic IP Address		Y		An Elastic IP address created within a region can be assigned only to instances within that region. Each region has its own address pool, from which EIPs are allocated.
EC2
	Resource Identifiers		Y		Each resource identifier, such as an AMI ID, instance ID, EBS volume ID, or EBS snapshot ID, is tied to its region and can be used only in the region where you created the resource.
	Instances			Y	An instance is tied to the Availability Zones in which you launched it. However, note that its instance ID is tied to the region.
	EBS Volumes			Y	Amazon EBS volume is tied to its Availability Zone and can be attached only to instances in the same Availability Zone.
	EBS Snapshot		Y		An EBS snapshot is tied to its region and can only be used to create volumes in the same region. To use it elsewhere, copy it to the other region with the Snapshot Copy feature.
	AMIs (Amazon Machine Images)		Y		An AMI provides a template for launching EC2 instances. An AMI is tied to the region where its files are stored in Amazon S3. To use an AMI in a different region, copy it there with the AMI Copy feature that AWS provides.
	Auto Scaling		Y		Auto Scaling spans multiple Availability Zones within the same region but cannot span regions.
	ELB (Elastic Load Balancer)		Y		An Elastic Load Balancer distributes traffic across instances in multiple Availability Zones in the same region. An ELB cannot be migrated to another region; you can only create a new ELB in the other region.
	SSH Public Keys		Y		Stored within the region; AWS does not replicate or synchronize keys across regions.
	Placement Groups			Y	A placement group can span instances within the same Availability Zone.
S3			Y		S3 buckets are created within a selected region. The data in a bucket is physically located in one region but can be accessed from other regions, in which case latency must be considered. Stored objects are replicated across Availability Zones for high durability but are not replicated across regions unless you do so explicitly.
Glacier			Y		Migrating data out of Glacier takes several steps: 1. Restore the Glacier data to S3. 2. Use S3 Copy to copy the data to another region. 3. Use an S3 lifecycle policy to move the S3 data into Glacier in the new region. 4. Delete the data from Glacier in the original region.
EFS (Elastic File System)			Y		There are two ways to migrate data between regions. 1. Copy the EFS data to EBS, use EBS Snapshot Copy to copy it to another region, and then copy the data from EBS into EFS. 2. Copy the EFS data to S3, use S3 Cross-Region Replication to copy it to another region, and then copy it from S3 into EFS.
Route53		Y			Route53 services are offered at AWS edge locations and are global.
RDS			Y	Y	RDS instances can be single-AZ or Multi-AZ. AWS Database Migration Service can be used for cross-region migration. Manual cross-region migration steps: 1. Stop transactions. 2. On a temporary EC2 instance, export the DB data to files. 3. Use a tool to copy the files to an EC2 instance in the remote region. 4. Create an RDS instance. 5. Import the data files.
ElastiCache			Y		Supports Redis and Memcached. Redis migration: 1. Manually create a backup of the cluster. 2. Export the backup to S3. 3. Copy the S3 bucket to another region. 4. In the new region, restore the data from S3, which involves creating a new Redis cluster and importing the data. Memcached cross-region migration: create a cluster in the new region, then replicate the data at the application layer.
RedShift			Y		Cluster migration: use the RedShift cross-region snapshot feature to create a snapshot and copy it to the new region, then restore the snapshot to a cluster. Table migration: use RedShift UNLOAD to export the data to S3, use S3 Cross-Region Replication to copy it to another region, then create a RedShift cluster there and load the data from S3 with COPY.
EMR			Y		To migrate EMR across regions, create a new EMR cluster in the new region and then import the data. If the data is in S3, use S3 cross-region replication to move it to the new region. If the data is in HDFS, use the S3DistCp command to copy the HDFS data to S3, then use S3DistCp again to copy the data from S3 into the target HDFS.
Elasticsearch			Y		Create a snapshot of the ES domain, which is stored in S3. Use S3 to replicate it across regions, then restore the data from S3 into Elasticsearch in the new region.
SQS (Simple Queue Service)			Y		SQS queues live within a region. An application must move messages from the queues in the source region to the queues in the destination region.
SNS (Simple Notification Service)			Y		SNS topics live within a region.
Aurora			Y		Create an Aurora cluster in another region as a Read Replica. Once it is created, Amazon RDS takes a snapshot of the source Aurora cluster and sends the snapshot to the Read Replica.
DynamoDB			Y		All data objects are stored within the same region and replicated across multiple Availability Zones in that region. Data objects can be explicitly replicated across regions using cross-region replication.
WAF		Y			Web Application Firewall (WAF) protects web applications from common web exploits; it is offered at AWS edge locations and is global.
CloudFront		Y			CloudFront is the global content delivery network (CDN) service; it is offered at AWS edge locations.
Storage Gateway			Y		AWS Storage Gateway stores volume, snapshot, and tape data in the AWS region in which the gateway is activated.
SES (Simple Email Service)		Y			SES has regional endpoints. Your application can use the SES service in its own region or in another region; in the latter case, cross-region latency must be considered.

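1.3 Cross-region replication of regional and zonal resources

S3 data lives in one region but can be migrated to any other region. Many regional and zonal resources therefore rely on this S3 capability to move their data across regions.

1.3.1 S3 cross-region replication

Once data is sent to S3, it is stored as objects across multiple availability zones within the region. Even so, S3 in a single region remains a single point of failure: if the region fails, its S3 service becomes unavailable. To address this, AWS provides Amazon S3 Cross-Region Replication (CRR), which asynchronously replicates the data in an S3 bucket between different regions.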
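
As a rough illustration of how CRR might be switched on, the sketch below builds a replication configuration in the shape that boto3's S3 client method `put_bucket_replication` accepts. The helper names, bucket names, and role ARN are hypothetical; the sketch also assumes versioning is already enabled on both buckets and that the IAM role allows S3 to replicate on your behalf.

```python
def build_crr_configuration(role_arn, destination_bucket, prefix=""):
    """Build a replication configuration dict for put_bucket_replication.

    An empty prefix means every object in the source bucket is replicated.
    """
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": "replicate-to-" + destination_bucket,
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": "arn:aws:s3:::" + destination_bucket},
            }
        ],
    }


def enable_crr(s3_client, source_bucket, role_arn, destination_bucket):
    """Apply the configuration to the source bucket and return it.

    s3_client is expected to behave like boto3.client("s3").
    """
    config = build_crr_configuration(role_arn, destination_bucket)
    s3_client.put_bucket_replication(
        Bucket=source_bucket, ReplicationConfiguration=config
    )
    return config
```

With boto3 installed, `enable_crr(boto3.client("s3"), "my-source-bucket", role_arn, "my-replica-bucket")` would be one way to call it. Replication then happens asynchronously, so new objects appear in the destination region after a delay.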

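1.3.2 Data replication between regions

The AMIs, EBS snapshots, and RDS snapshots in the figure below are all stored in S3, so they can all be copied to other regions using S3's cross-region replication capability.

1.3.2.1 Cross-region migration of EBS

EBS is zonal. Copying an EBS volume to another region relies on S3's cross-region replication capability.

(1) Create a snapshot of the EBS volume; it is stored in S3.

(2) Use the EBS snapshot copy feature to copy the snapshot to another region.

(3) In the new region, create a new EBS volume from the snapshot.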
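
The three steps above could be scripted roughly as follows. This is a sketch, not a hardened tool: `copy_volume_to_region` is a hypothetical helper, and `src_ec2` and `dst_ec2` are assumed to behave like boto3 EC2 clients created for the source and destination regions. Note that `copy_snapshot` is issued against the destination region and names the source region as a parameter.

```python
def copy_volume_to_region(src_ec2, dst_ec2, volume_id, src_region, dst_az):
    """Copy an EBS volume to another region and return the new volume ID."""
    # Step 1: snapshot the volume in the source region and wait for completion.
    snapshot = src_ec2.create_snapshot(
        VolumeId=volume_id, Description="cross-region copy of " + volume_id
    )
    src_ec2.get_waiter("snapshot_completed").wait(
        SnapshotIds=[snapshot["SnapshotId"]]
    )

    # Step 2: copy the snapshot; the call is made in the destination region.
    copied = dst_ec2.copy_snapshot(
        SourceRegion=src_region,
        SourceSnapshotId=snapshot["SnapshotId"],
        Description="copied from " + src_region,
    )
    dst_ec2.get_waiter("snapshot_completed").wait(
        SnapshotIds=[copied["SnapshotId"]]
    )

    # Step 3: create a new volume from the copied snapshot in the target AZ.
    volume = dst_ec2.create_volume(
        SnapshotId=copied["SnapshotId"], AvailabilityZone=dst_az
    )
    return volume["VolumeId"]
```

With boto3, the two clients would typically come from `boto3.client("ec2", region_name=...)`, one per region.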

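1.3.2.2 Cross-AZ migration of EC2 instances

(1) Create an AMI from the EC2 instance. An AMI is visible throughout the region, so it can be used in another availability zone.

(2) In the other availability zone, launch a new instance from that AMI.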
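
A minimal sketch of these two steps, assuming `ec2` behaves like a boto3 EC2 client for the region in question. The helper name `relaunch_in_other_az` and its parameters are hypothetical; instance type, networking, and security groups are reduced to the bare minimum here.

```python
def relaunch_in_other_az(ec2, instance_id, instance_type, target_az, image_name):
    """Create an AMI from an instance and launch a copy in another AZ.

    Returns the ID of the newly launched instance.
    """
    # Step 1: create the AMI; it is regional, so any AZ in the region can use it.
    image = ec2.create_image(InstanceId=instance_id, Name=image_name)
    ec2.get_waiter("image_available").wait(ImageIds=[image["ImageId"]])

    # Step 2: launch one instance from the AMI, pinned to the target AZ.
    result = ec2.run_instances(
        ImageId=image["ImageId"],
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": target_az},
    )
    return result["Instances"][0]["InstanceId"]
```

The original instance keeps running in its own AZ; whether to stop or terminate it afterwards is left to the caller.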

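1.3.4 AWS RDS replication

(1) AWS Multi-AZ RDS (synchronous data replication between AZs)

The Multi-AZ RDS feature is an example of putting the low-latency network between AZs to use. Its main characteristics are:

Supports MySQL, MariaDB, PostgreSQL, Oracle, and SQL Server database (DB) RDS instances.
Consists of two database processes, one primary and one standby, located in two availability zones of the same region.
Data is replicated synchronously between the primary and the standby.
Only the primary serves reads and writes; the standby does not serve traffic.
DB applications reach the primary node through DNS.
Failover between primary and standby is automatic and typically takes 60 to 120 seconds. The DNS record is updated during failover, so the switch is transparent to database applications.
Compared with single-AZ RDS, latency increases by roughly 2 to 5 ms.
The offering comes with a 99.95% SLA.
It is intended only for HA and DR; it cannot be used to improve performance or to scale out.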
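
To put the 99.95% SLA and the 60 to 120 second failover time in perspective, the arithmetic below converts an availability percentage into a downtime budget. The function names are hypothetical, and the 30-day measurement period is an assumption made for illustration.

```python
def allowed_downtime_minutes(sla_percent, period_days=30):
    """Minutes of downtime permitted by an SLA over the given period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


def failovers_within_budget(sla_percent, failover_seconds, period_days=30):
    """How many failovers of the given length fit in the downtime budget."""
    budget_seconds = allowed_downtime_minutes(sla_percent, period_days) * 60
    return int(budget_seconds // failover_seconds)
```

Under these assumptions, 99.95% over 30 days leaves roughly 21.6 minutes of downtime, which is room for about ten worst-case 120-second failovers.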

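(2) AWS RDS Read Replica (asynchronous data replication between regions)

AWS RDS Read Replica is an example of replicating data between regions. Its main characteristics are:

A Read Replica can be in the same region as the Primary or in another region (cross-region replicas currently support only MySQL and MariaDB).
Data is replicated asynchronously between the Primary and the Read Replica, and is encrypted when it crosses regions.
A Read Replica is read-only; all writes go to the Primary.
Each Read Replica has its own DNS endpoint, and each Primary can have up to 5 Read Replicas.
Typical uses of Read Replicas: offloading heavy read workloads from the Primary (this is not an HA mechanism); promoting a Read Replica to master to keep serving when the Primary fails; and deploying Read Replicas in regions close to users to reduce access latency.
When the Primary has a Standby node, after an automatic failover the source of the Read Replica automatically switches to the former Standby (which is now the Primary).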
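
Because every Read Replica has its own endpoint, the application (or a proxy in front of it) decides which endpoint each statement goes to. The class below is a simplified sketch of that routing. `ReadWriteRouter` is a hypothetical name, the endpoints are placeholder strings, and detecting reads by a leading `SELECT` is a deliberate simplification.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and spread reads across read replicas."""

    MAX_REPLICAS = 5  # per-Primary limit stated in the list above

    def __init__(self, primary_endpoint, replica_endpoints):
        replicas = list(replica_endpoints)
        if len(replicas) > self.MAX_REPLICAS:
            raise ValueError("at most 5 read replicas per primary")
        self.primary = primary_endpoint
        self.replicas = replicas
        self._reset_cycle()

    def _reset_cycle(self):
        # Round-robin iterator over the replicas, or None if there are none.
        self._cycle = itertools.cycle(self.replicas) if self.replicas else None

    def endpoint_for(self, sql):
        """Return the endpoint that should receive this SQL statement."""
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self._cycle is not None:
            return next(self._cycle)
        return self.primary

    def promote(self, replica_endpoint):
        """Model a manual promotion: the replica becomes the new primary."""
        self.replicas.remove(replica_endpoint)
        self.primary = replica_endpoint
        self._reset_cycle()
```

Since replication is asynchronous, a read routed to a replica may briefly return stale data; reads that must see the latest write would need to go to the primary instead.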

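Comparison of the two:

(3) Data replication capabilities of the AWS Oracle and MSSQL services

1.4 Disaster recovery

Disaster recovery (DR) has two parts: preparing the DR plan and environment, and recovering from a disaster. Any event that harms a company's business continuity or finances can count as a disaster, including hardware or software failure, network outage, power outage, fire, flood, and human error. To limit the damage, companies typically invest time and money in planning and preparation, staff training, and defining and updating processes. The size of the investment in DR planning varies widely. Disaster recovery is usually described by two metrics: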