Part 2: nova-api
Step 3: nova-api receives the request
nova-api does not simply accept whatever comes in; rate limits have to be enforced. The default implementation lives in the ratelimit middleware.
Sometimes, however, we want distributed rate limiting, and Turnstile is a good choice for that:
https://github.com/klmitch/turnstile
http://pypi.python.org/pypi/turnstile
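For reference, the default per-API rate limits of that era were configured in nova's api-paste.ini roughly as follows; the limits string below is only an illustrative value, not a recommended setting:
[filter:ratelimit]
paste.filter_factory = nova.api.openstack.compute.limits:RateLimitingMiddleware.factory
limits = (POST, "*", .*, 10, MINUTE);(POST, "*/servers", ^/servers, 50, DAY)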
Step 4: Validate the token
Step 5: Check the policy
Both of these steps were already covered in the keystone discussion.
Step 6: Check the quota
Nova, Neutron, and Cinder each have their own quotas, and all of them can be managed from the command line:
# nova -h | grep quota
quota-class-show List the quotas for a quota class.
quota-class-update Update the quotas for a quota class.
quota-defaults List the default quotas for a tenant.
quota-delete Delete quota for a tenant/user so their quota will
quota-show List the quotas for a tenant/user.
quota-update Update the quotas for a tenant/user.
# nova quota-show
+-----------------------------+-------+
| Quota | Limit |
+-----------------------------+-------+
| instances | 10 |
| cores | 20 |
| ram | 51200 |
| floating_ips | 10 |
| fixed_ips | -1 |
| metadata_items | 128 |
| injected_files | 5 |
| injected_file_content_bytes | 10240 |
| injected_file_path_bytes | 255 |
| key_pairs | 100 |
| security_groups | 10 |
| security_group_rules | 20 |
+-----------------------------+-------+
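To change a limit, quota-update can be used. A hedged example that raises the instance quota for one tenant (the tenant ID is a placeholder):
# nova quota-update --instances 20 <tenant-id>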
# cinder -h | grep quota
quota-class-show List the quotas for a quota class.
quota-class-update Update the quotas for a quota class.
quota-defaults List the default quotas for a tenant.
quota-show List the quotas for a tenant.
quota-update Update the quotas for a tenant.
quota-usage List the quota usage for a tenant.
# cinder quota-show 1779b3bc725b44b98726fb0cbdc617b1
+-----------+-------+
| Property | Value |
+-----------+-------+
| gigabytes | 1000 |
| snapshots | 10 |
| volumes | 10 |
+-----------+-------+
# neutron -h | grep quota
quota-delete Delete defined quotas of a given tenant.
quota-list List quotas of all tenants who have non-default quota values.
quota-show Show quotas of a given tenant
quota-update Define tenant's quotas not to use defaults.
# neutron quota-show 1779b3bc725b44b98726fb0cbdc617b1
+---------------------+-------+
| Field | Value |
+---------------------+-------+
| floatingip | 50 |
| network | 10 |
| port | 50 |
| router | 10 |
| security_group | 10 |
| security_group_rule | 100 |
| subnet | 10 |
+---------------------+-------+
The following articles are recommended:
OpenStack Nova fundamentals: Quota management
http://www.sebastien-han.fr/blog/2012/09/19/openstack-play-with-quota/
Step 7: Create the instance record in the database
For the Nova database schema, see the following article:
http://www.prestonlee.com/2012/05/03/openstack-nova-essex-mysql-database-schema-diagram-and-sql/
MySQL is one of the most important components of OpenStack, so in a production environment High Availability is a must.
MySQL HA can be achieved in several ways:
http://dev.mysql.com/doc/mysql-ha-scalability/en/index.html
Requirement | MySQL Replication | MySQL with DRBD with Corosync and Pacemaker | MySQL Cluster
Availability:
Platform Support | All Supported by MySQL Server | Linux | All Supported by MySQL Cluster |
Automated IP Failover | No | Yes | Depends on Connector and Configuration |
Automated Database Failover | No | Yes | Yes |
Automatic Data Resynchronization | No | Yes | Yes |
Typical Failover Time | User / Script Dependent | Configuration Dependent, 60 seconds and Above | 1 Second and Less |
Synchronous Replication | No, Asynchronous and Semisynchronous | Yes | Yes |
Shared Storage | No, Distributed | No, Distributed | No, Distributed |
Geographic redundancy support | Yes | Yes, via MySQL Replication | Yes, via MySQL Replication |
Update Schema On-Line | No | No | Yes |
Scalability:
Number of Nodes | One Master, Multiple Slaves | One Active (primary), one Passive (secondary) Node | 255 |
Built-in Load Balancing | Reads, via MySQL Replication | Reads, via MySQL Replication | Yes, Reads and Writes |
Supports Read-Intensive Workloads | Yes | Yes | Yes |
Supports Write-Intensive Workloads | Yes, via Application-Level Sharding | Yes, via Application-Level Sharding to Multiple Active/Passive Pairs | Yes, via Auto-Sharding |
Scale On-Line (add nodes, repartition, etc.) | No | No | Yes |
To study MySQL replication systematically, the following book is recommended:
MySQL High Availability: Tools for Building Robust Data Centers
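The "MySQL Replication" column above is the classic asynchronous master/slave setup. A minimal sketch of pointing a slave at a master looks like this; the host name, credentials, and binlog position are placeholders:
mysql> CHANGE MASTER TO
    ->   MASTER_HOST='db-master',
    ->   MASTER_USER='repl',
    ->   MASTER_PASSWORD='secret',
    ->   MASTER_LOG_FILE='mysql-bin.000001',
    ->   MASTER_LOG_POS=107;
mysql> START SLAVE;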
Another option is MySQL + Galera, which gives you an active/active MySQL setup.
See the following two articles:
http://www.sebastien-han.fr/blog/2012/04/08/mysql-galera-cluster-with-haproxy/
http://www.sebastien-han.fr/blog/2012/04/01/mysql-multi-master-replication-with-galera/
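In that kind of setup the Galera nodes usually sit behind a load balancer. A minimal haproxy.cfg sketch along the lines of those articles; the node names, addresses, and the check user are placeholders:
listen galera 0.0.0.0:3306
    mode tcp
    balance roundrobin
    option tcpka
    option mysql-check user haproxy
    server galera-node1 192.168.1.11:3306 check
    server galera-node2 192.168.1.12:3306 check
    server galera-node3 192.168.1.13:3306 check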
Another common HA technique is Pacemaker.
The lowest layer is the messaging layer, corosync/openais,
which handles communication between the nodes in the cluster.
Above it is the Resource Allocation Layer, which contains the following components:
CRM (Cluster Resource Manager)
The overall manager: every operation on a resource goes through it. Every node runs a CRM.
CIB (Cluster Information Base)
Managed by the CRM, the CIB is an in-memory XML database holding the cluster's configuration and state: nodes, resources, constraints, and their relationships. Any configuration we query is read from the CIB.
DC (Designated Coordinator)
Every node has a CRM, and one of them is elected DC, the brain of the whole cluster. The CIB controlled by the DC is the master CIB; all other CIBs are replicas.
PE (Policy Engine)
When the DC needs to make a cluster-wide change, the PE takes the current state and configuration, computes the desired future state, and generates the series of actions that moves the cluster from the initial state to the target state. The PE runs only on the DC.
LRM (Local Resource Manager)
Manages resources locally: it calls the resource agents to start, stop, and monitor resources and reports the results back to the CRM.
On top of that is the Resource Layer,
which contains the resource agents. Resource agents are usually shell scripts used to start, stop, and monitor the state of a resource.
To really understand Pacemaker, the SUSE High Availability Guide is recommended:
https://www.suse.com/documentation/sle_ha/singlehtml/book_sleha/book_sleha.html
I have also made some notes and run some experiments of my own.
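As a small taste of the Resource Layer, here is a hedged sketch of defining a virtual IP resource with the crm shell; the resource name and IP address are placeholders:
crm configure primitive p_vip ocf:heartbeat:IPaddr2 \
    params ip=192.168.1.100 cidr_netmask=24 \
    op monitor interval=10s
crm configure show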
Step 8: Create the filter_properties for the nova scheduler
Step 9: Send an RPC to nova-conductor
For more on nova-conductor, see:
http://cloudystuffhappens.blogspot.com/2013/04/understanding-nova-conductor-in.html
In OpenStack, RPC messages are sent through RabbitMQ.
RabbitMQ can be made highly available with Pacemaker, or you can build a RabbitMQ cluster of your own.
For learning RabbitMQ, the first recommendation is of course RabbitMQ in Action.
I have made some notes on it as well:
RabbitMQ in Action (1): Understanding messaging
RabbitMQ in Action (2): Running and administering Rabbit
RabbitMQ in Action(5): Clustering and dealing with failure
I have not finished the whole book yet, so please bear with me.
There is also a very good article on how OpenStack itself uses RabbitMQ.
I have also walked through the code of the RPC call path.
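For reference, a hedged nova.conf sketch (Grizzly/Havana-era option names) that points Nova at a RabbitMQ cluster with mirrored queues; the host names are placeholders:
rabbit_hosts = rabbit1:5672,rabbit2:5672
rabbit_ha_queues = True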
Step 10: nova-conductor creates the request_spec for the scheduler
Step 11: nova-conductor sends an RPC to nova-scheduler
Part 3: nova-scheduler
Choosing a physical host on which to create the virtual machine is what we call scheduling.
The classic picture of the nova scheduler is: filter first, then weight.
In fact, the scheduling process gets involved much earlier than this step.
Step 13: Filter the hosts
Filtering is driven mainly by two variables, request_spec and filter_properties, both of which were prepared in the earlier steps.
Each filter simply takes this information, together with the HostState information collected by the HostManager, and picks out the matching hosts. Which filters actually run is configured in nova.conf, as sketched below.
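A hedged nova.conf sketch of the relevant options; the filter list here is only an example, not the full default set:
scheduler_available_filters = nova.scheduler.filters.all_filters
scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ImagePropertiesFilter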
The first piece of information in the request_spec is the image properties. This matters especially when you support more than one hypervisor: Xen images, KVM images, and Hyper-V images are all different, so how do you guarantee that an image runs on the right hypervisor? This is where the hypervisor_type image property becomes necessary.
See the following article:
http://www.cloudbase.it/filtering-glance-images-for-hyper-v/
Image properties also include min_ram and min_disk; the image can only be booted where enough memory and disk are available.
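A hedged example of setting these image properties with the glance client; the image ID is a placeholder:
# glance image-update --property hypervisor_type=kvm --min-ram 1024 --min-disk 10 <image-id>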
A flavor can also carry extra_specs, a set of key-value pairs (represented in the code by the instance_type structure). Anything beyond the plain resource requirements can be put there, and the filters can then make use of those values.
Host aggregates divide the hosts into groups. The groups can be distinguished by arbitrary metadata, for example high-performance versus low-performance hosts.
The following example from the OpenStack documentation shows nicely how host aggregates and flavor extra_specs work together:
http://docs.openstack.org/trunk/config-reference/content/section_compute-scheduler.html
Example: Specify compute hosts with SSDs
This example configures the Compute service to enable users to request nodes that have solid-state drives (SSDs). You create a fast-io host aggregate in the nova availability zone and you add the ssd=true key-value pair to the aggregate. Then, you add the node1 and node2 compute nodes to it.
$ nova aggregate-create fast-io nova
+----+---------+-------------------+-------+----------+
| Id | Name    | Availability Zone | Hosts | Metadata |
+----+---------+-------------------+-------+----------+
| 1  | fast-io | nova              |       |          |
+----+---------+-------------------+-------+----------+
$ nova aggregate-set-metadata 1 ssd=true
+----+---------+-------------------+-------+-------------------+
| Id | Name    | Availability Zone | Hosts | Metadata          |
+----+---------+-------------------+-------+-------------------+
| 1  | fast-io | nova              | []    | {u'ssd': u'true'} |
+----+---------+-------------------+-------+-------------------+
$ nova aggregate-add-host 1 node1
+----+---------+-------------------+------------+-------------------+
| Id | Name    | Availability Zone | Hosts      | Metadata          |
+----+---------+-------------------+------------+-------------------+
| 1  | fast-io | nova              | [u'node1'] | {u'ssd': u'true'} |
+----+---------+-------------------+------------+-------------------+
$ nova aggregate-add-host 1 node2
+----+---------+-------------------+----------------------+-------------------+
| Id | Name    | Availability Zone | Hosts                | Metadata          |
+----+---------+-------------------+----------------------+-------------------+
| 1  | fast-io | nova              | [u'node1', u'node2'] | {u'ssd': u'true'} |
+----+---------+-------------------+----------------------+-------------------+
Use the nova flavor-create command to create the ssd.large flavor with an ID of 6, 8 GB of RAM, 80 GB root disk, and four vCPUs.
$ nova flavor-create ssd.large 6 8192 80 4
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+-------------+
| ID | Name      | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public | extra_specs |
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+-------------+
| 6  | ssd.large | 8192      | 80   | 0         |      | 4     | 1           | True      | {}          |
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+-------------+
Once the flavor is created, specify one or more key-value pairs that match the key-value pairs on the host aggregates. In this case, that is the ssd=true key-value pair. Setting a key-value pair on a flavor is done using the nova flavor-key command.
$ nova flavor-key ssd.large set ssd=true
Once it is set, you should see the extra_specs property of the ssd.large flavor populated with a key of ssd and a corresponding value of true.
$ nova flavor-show ssd.large
+----------------------------+-------------------+
| Property                   | Value             |
+----------------------------+-------------------+
| OS-FLV-DISABLED:disabled   | False             |
| OS-FLV-EXT-DATA:ephemeral  | 0                 |
| disk                       | 80                |
| extra_specs                | {u'ssd': u'true'} |
| id                         | 6                 |
| name                       | ssd.large         |
| os-flavor-access:is_public | True              |
| ram                        | 8192              |
| rxtx_factor                | 1.0               |
| swap                       |                   |
| vcpus                      | 4                 |
+----------------------------+-------------------+
Now, when a user requests an instance with the ssd.large flavor, the scheduler only considers hosts with the ssd=true key-value pair. In this example, these are node1 and node2.
Another use of host aggregates is to keep Xen and KVM hosts in separate pools, which makes live migration with Xen easier.
Yet another is to keep Windows and Linux pools separate: Windows guests need licenses while most Linux guests do not, and Windows is licensed per physical host rather than per virtual machine, so it pays to pack Windows VMs onto as few physical hosts as possible.
Inside filter_properties, scheduler_hints is a JSON document in which you can put arbitrary values for the filters to use.
One example is the JsonFilter:
The JsonFilter allows a user to construct a custom filter by passing a scheduler hint in JSON format. The following operators are supported:
- =
- <
- >
- in
- <=
- >=
- not
- or
- and
The filter supports the following variables:
- $free_ram_mb
- $free_disk_mb
- $total_usable_ram_mb
- $vcpus_total
- $vcpus_used
Using the nova command-line tool, use the --hint flag:
$ nova boot --image 827d564a-e636-4fc4-a376-d36f7ebe1747 --flavor 1 --hint query='[">=","$free_ram_mb",1024]' server1
With the API, use the os:scheduler_hints key:
{
    "server": {
        "name": "server-1",
        "imageRef": "cedef40a-ed67-4d10-800e-17455edce175",
        "flavorRef": "1"
    },
    "os:scheduler_hints": {
        "query": "[>=,$free_ram_mb,1024]"
    }
}
We can also pin the instance to a specific physical host with --availability-zone <zone-name>:<host-name>.
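For example (a hedged, admin-only sketch; the image ID, flavor, zone, and host name are placeholders):
$ nova boot --image <image-id> --flavor 1 --availability-zone nova:node1 server-on-node1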
Step 14: Weight the eligible hosts and sort them
Once the hosts have been filtered, the next step is weighting.
Weighting can take many variables into account. In general, memory and disk are the first requirements to satisfy, while CPU and network I/O come second: for a cheaper flavor it is usually enough to satisfy memory and disk, whereas for a more expensive flavor you also want to guarantee CPU and network I/O.
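By default the scheduler of this era weighs hosts mostly by free RAM. A hedged nova.conf sketch of the knobs involved; the multiplier value is only an example:
scheduler_weight_classes = nova.scheduler.weights.all_weighers
ram_weight_multiplier = 1.0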