Android/Linux Thermal Governor: IPA Analysis and Usage

Reposted article. 2017-02-10 22:14. Category: Linux power management.

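The core of the IPA (Intelligent Power Allocator) model is a PID controller. It takes the thermal zone temperature as input and produces an allocatable power budget as output, which is then used to adjust the frequency and voltage of each power actor.

As with power management development in general, the work breaks down into building the model, implementing it, and validating it.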

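1 IPA Model

Starting from the sustainable power, the PID controller adjusts the allocatable power according to the difference between the current temperature and the control temperature. That budget in turn sets the state of the cooling devices, that is, it selects the OPP (a voltage and frequency pair).

Sustainable power is the power drawn at the highest OPP at which the temperature stays essentially stable. At any higher OPP the temperature rises noticeably; at any lower OPP it stays flat or falls. It can be found by monitoring the temperature reached at each OPP.

The other ingredient is a way to estimate the power of the next operating point from the current conditions. Power is generally taken to have two parts, dynamic power and static leakage, a split that comes from measurement experience. Dynamic power depends on voltage and frequency; static leakage depends on voltage and temperature. The measured data is analyzed to find the set of formulas that fits it best. In the HiKey measurements static leakage was small and was therefore ignored, so the final power value depends only on voltage and frequency. From this the power of each OPP can be computed, which ties OPPs to power.

Another important item is the choice of the PID gains P, I and D. This also relies partly on experience: test several parameter sets and compare how well each one controls the temperature.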

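2 IPA Test Environment

1. Bring out a test point as close to the CPU as possible.

2. Connect Ground, V+ and V- to the ARM Energy Probe.

3. Put the system into specific states through software:

   a. For sustainable power, run all 8 cores at 100% workload.

   b. Measuring cluster power and CPU power is more involved and is covered separately below.

4. Use an IPython script to read the thermal zone temperature and the power at the test point.

The cluster and CPU power states on HiKey are as follows: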

Level	Power State	PD_CPUx/CLKIN	PDCORTEXA53	PD_L2	Linux Kernel
CPU	CPU P-State	On	On	On	P-State
CPU	WFI	On, internal clock gating	On	On	C-State
CPU	CPU Off	Off	On	On	C-State
Cluster	Cluster P-State	On or Off	On	On	P-State
Cluster	Cluster L2 Retention	Off	Off	Retention	C-State
Cluster	Cluster Off	Off	Off	Off	C-State

Figure 1: HiKey cluster and CPU states

3 Key IPA Parameters

sustainable-power

OPP (MHz)	Sustainable power (mW)
729	2155
960	3326
1200	5285

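Figure 2: Sustainable power

sustainable-power lives inside the thermal-zone node because the measured temperature comes from that zone's thermal-sensors; each thermal-zone also contains a number of trips and cooling-maps.

Observation shows that the temperature does not rise at 729 MHz, rises slowly at 960 MHz, and rises quickly at 1200 MHz. sustainable-power is therefore set to the 960 MHz value, 3326.

The thermal framework has a work queue that periodically runs thermal_zone_device_check. The delay depends on the trip type: 100 ms in passive mode and 1000 ms otherwise.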

control_temp

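Two temperature parameters matter in the IPA model. Below 65°C, IPA is switched off and the PID controller is reset. Above 65°C, IPA becomes active. 75°C is IPA's control_temp: once the temperature exceeds 75°C, IPA reduces the allocatable power in order to bring the temperature down.

Figure 3: Thermal zones DTS

The cooling-maps need to be read together with the two DTS figures, above and below. trip names the trip point (target) at which cooling starts; contribution assigns weights across multiple actors; the cooling-device property has the form <device min max>. The min and max values set here must lie between cooling-min-level and cooling-max-level. cpufreq translates the resulting value into the voltage and frequency of the corresponding OPP and applies them.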

dynamic-power-coefficient

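echo 0 > /sys/devices/system/cpu/cpu[1…7]/online takes CPU1 through CPU7 offline, leaving only CPU0.

echo mem > /sys/power/state, combined with a hack to the kernel code, steps the SoC down from the state where CPU0 is running: first CPU0 is powered off, then Cluster0, then the whole SoC. This yields the following data: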

OPP(MHz)	Voltage(V)	Cluster Power Off State (mW)	Cluster P-State (mW)	Cluster Power (mW)	CPU WFI (mW)	CPU P-State (mW)	CPU Dynamic Power(mW)
208	1.04	344	360	16	379	429	69
432	1.04	345	374	29	387	498	124
729	1.09	346	393	47	408	617	224
960	1.18	352	427	75	442	794	367
1200	1.33	367	479	112	508	1149	670

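Figure 4: HiKey power measurement data

In the table above, Cluster Power is Cluster P-State minus Cluster Power Off State, and CPU Dynamic Power is CPU P-State minus Cluster P-State.

Power formulas: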

power = dyn_coeff * (freq * volt^2) + static_coeff * F(volt) * F(Temp)

Dynamic power = capacitance * (freq * volt^2)
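
As a quick check of the dynamic term, the sketch below evaluates capacitance * (freq * volt^2) for a single OPP. The function name is made up for illustration. It assumes the units used in this article's tables: frequency in MHz, voltage in V, result in mW.

```c
#include <assert.h>

/* Dynamic power in mW for one OPP: capacitance * (freq * volt^2).
 * Units follow the article's tables: freq in MHz, volt in V.
 * dynamic_power_mw is an illustrative name, not a kernel function. */
double dynamic_power_mw(double capacitance, double freq_mhz, double volt)
{
    assert(freq_mhz >= 0.0 && volt >= 0.0);
    return capacitance * freq_mhz * volt * volt;
}
```

With the through-zero CPU coefficient 0.298 from the CPU model section, 729 MHz and 1.09 V give roughly 258 mW, in line with the "Zero model" column of the CPU table.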

Cluster model

Freq (MHz)	Voltage (V)	F*V^2	Power (mW)	Model power (mW)	Zero model (mW)
208	1.04	224.9728	16	16	12
432	1.04	467.2512	29	29	25
729	1.09	866.1249	47	49	47
960	1.18	1336.704	75	73	72
1200	1.33	2122.68	112	113	115

	Gradient (capacitance)	Intercept (static power)
Linear regression	0.051	4.716716513
L.R. thru zero	0.054	0

Figure 5: Cluster coefficient calculation

Figure 6: Cluster linear fit chart

CPU model

Freq (MHz)	Voltage (V)	F*V^2	Power (mW)	Model power (mW)	Zero model (mW)
208	1.04	224.9728	69	44	67
432	1.04	467.2512	124	121	139
729	1.09	866.1249	224	247	258
960	1.18	1336.704	367	396	399
1200	1.33	2122.68	670	645	633

	Gradient (capacitance)	Intercept (static power)
Linear regression	0.317	-27.12625497
L.R. thru zero	0.298	0

Figure 7: CPU power coefficient calculation

Figure 8: CPU linear fit chart

From the cluster and CPU coefficients above, dynamic-power-coefficient = (0.298 + (0.054 / 4 CPUs)) * 1000 = 311.5, which is taken as 311.
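
The arithmetic can be written as a small helper. This is only a sketch of the formula as stated in this article; the function name is hypothetical and the truncation to an integer is an assumption that matches the quoted result of 311.

```c
#include <assert.h>

/* Per-CPU coefficient plus an equal share of the cluster coefficient,
 * scaled by 1000 and truncated to an integer, following the article's
 * formula. combine_coefficients is a hypothetical helper name. */
int combine_coefficients(double cpu_coeff, double cluster_coeff, int cpus_per_cluster)
{
    assert(cpus_per_cluster > 0);
    return (int)((cpu_coeff + cluster_coeff / cpus_per_cluster) * 1000.0);
}
```

Calling combine_coefficients(0.298, 0.054, 4) reproduces the value 311 used for dynamic-power-coefficient.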

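The gradients and intercepts above come from the spreadsheet function LINEST, which fits the best straight line to known data using the least squares method and returns an array describing that line.

LINEST(known_y's,known_x's,const,stats)

Known_y's is the set of known y values in the relationship y = mx + b.

If the known_y's array is in a single column, each column of known_x's is treated as a separate variable.

If the known_y's array is in a single row, each row of known_x's is treated as a separate variable.

Known_x's is an optional set of known x values in the relationship y = mx + b.

The known_x's array can contain one or more sets of variables. If only one variable is used, known_x's and known_y's can be ranges of any shape as long as they have the same dimensions. If more than one variable is used, known_y's must be a vector (that is, a single row or a single column).

If known_x's is omitted, it is assumed to be the array {1,2,3,...}, the same size as known_y's.

Const is a logical value specifying whether to force the constant b to 0.

If const is TRUE or omitted, b is calculated normally.

If const is FALSE, b is set to 0 and the m values are adjusted so that y = mx.

Stats is a logical value specifying whether to return additional regression statistics.

If stats is TRUE, LINEST returns the additional regression statistics, so the returned array is {mn,mn-1,...,m1,b;sen,sen-1,...,se1,seb;r2,sey;F,df;ssreg,ssresid}.

If stats is FALSE or omitted, LINEST returns only the m coefficients and the constant b.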
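
For readers without a spreadsheet at hand, the single-variable case of this fit can be sketched in C. The code below is an ordinary least-squares line fit written for illustration, not the spreadsheet's implementation; the struct and function names are made up. The with_intercept flag plays the role of const: non-zero fits b, zero forces b to 0.

```c
#include <assert.h>
#include <stddef.h>

struct line_fit {
    double slope;      /* m in y = m*x + b */
    double intercept;  /* b; stays 0 when the fit is forced through the origin */
};

/* Least-squares straight-line fit of y against x for one variable.
 * with_intercept != 0 corresponds to const = TRUE (b is fitted);
 * with_intercept == 0 corresponds to const = FALSE (b forced to 0). */
struct line_fit fit_line(const double *x, const double *y, size_t n, int with_intercept)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    struct line_fit fit = {0.0, 0.0};

    assert(n >= 2);
    for (size_t i = 0; i < n; i++) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }

    if (with_intercept) {
        double denom = (double)n * sxx - sx * sx;
        assert(denom != 0.0);              /* x values must not all be equal */
        fit.slope = ((double)n * sxy - sx * sy) / denom;
        fit.intercept = (sy - fit.slope * sx) / (double)n;
    } else {
        assert(sxx != 0.0);
        fit.slope = sxy / sxx;             /* minimizes sum of (y - m*x)^2 */
    }
    return fit;
}
```

Feeding in the F*V^2 column as x and the measured cluster power column as y gives a gradient of about 0.051 with an intercept of about 4.72, and about 0.054 when forced through zero, which agrees with the cluster coefficient table.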

4 IPA Implementation

static struct thermal_governor thermal_gov_power_allocator = {
    .name = "power_allocator",
    .bind_to_tz = power_allocator_bind,
    .unbind_from_tz = power_allocator_unbind,
    .throttle = power_allocator_throttle,
};

This is the Power Allocator governor structure. It has three core functions: power_allocator_bind, power_allocator_unbind and power_allocator_throttle.

static int power_allocator_bind(struct thermal_zone_device *tz)

power_allocator_bind initializes the PID controller parameters and attaches a power_allocator_params instance to tz->governor_data.

struct power_allocator_params {
    bool allocated_tzp;
    s64 err_integral; // accumulated error in the PID controller
    s32 prev_err; // error in the previous iteration of the PID controller
    int trip_switch_on; // first passive trip point of the thermal zone. The governor switches on when this trip point is crossed.
    int trip_max_desired_temperature; // last passive trip point of the thermal zone. The temperature we are controlling for.
};

PID parameters

if (!tz->tzp->k_po || force)
    tz->tzp->k_po = int_to_frac(sustainable_power) / temperature_threshold;

if (!tz->tzp->k_pu || force)
    tz->tzp->k_pu = int_to_frac(2 * sustainable_power) / temperature_threshold;

if (!tz->tzp->k_i || force)
    tz->tzp->k_i = int_to_frac(10) / 1000;

From the DTS values, temperature_threshold = control_temp - switch_on_temp = 75000 - 65000 = 10000.

tz->tzp->k_po = int_to_frac(sustainable_power) / temperature_threshold = 3326 * 1024 / 10000 = 340.5824, stored as 340 after integer division

tz->tzp->k_pu = int_to_frac(2 * sustainable_power) / temperature_threshold = 3326 * 2 * 1024 / 10000 = 681.1648, stored as 681 after integer division

tz->tzp->k_i = int_to_frac(10) / 1000 = 10 * 1024 / 1000 = 10.24, stored as 10 after integer division

The other two parameters, tz->tzp->k_d and tz->tzp->integral_cutoff, default to 0.
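
The same arithmetic can be reproduced outside the kernel. The sketch below is a userspace illustration, not kernel code: the struct and function names are made up, and INT_TO_FRAC uses a multiplication by 1024 in place of the kernel macro so that the factor matches the one used in the calculation above.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 10
/* Fixed-point conversion: multiply by 2^FRAC_BITS = 1024. */
#define INT_TO_FRAC(x) ((int64_t)(x) * (1 << FRAC_BITS))

/* Illustrative container for the three default gains (hypothetical name). */
struct pid_gains {
    int32_t k_po;
    int32_t k_pu;
    int32_t k_i;
};

/* Derive default gains from sustainable power (mW) and the two trip
 * temperatures (millidegrees Celsius). All divisions are integer
 * divisions, so fractional parts are dropped. */
struct pid_gains default_pid_gains(uint32_t sustainable_power,
                                   int switch_on_temp, int control_temp)
{
    struct pid_gains g;
    int temperature_threshold = control_temp - switch_on_temp;

    assert(temperature_threshold > 0);
    g.k_po = (int32_t)(INT_TO_FRAC(sustainable_power) / temperature_threshold);
    g.k_pu = (int32_t)(INT_TO_FRAC(2 * (int64_t)sustainable_power) / temperature_threshold);
    g.k_i = (int32_t)(INT_TO_FRAC(10) / 1000);
    return g;
}
```

With sustainable power 3326 and trip temperatures 65000 and 75000, this yields k_po = 340, k_pu = 681 and k_i = 10.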

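PID controller

Figure 9: power_allocator_throttle flow

power_allocator_throttle is IPA's regulation entry point. It first checks whether the current temperature is below switch_on_temp. If it is, PID regulation is skipped and the maximum available power is granted. Otherwise the PID controller is used to allocate power. If the temperature falls below switch_on_temp after the PID controller has been running for a while, the controller's accumulated state is reset, so any error it has built up is cleared as well.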
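
The switch-on decision can be sketched as follows. This is a simplified userspace illustration of the logic described above, not the kernel function; the enum, struct and function names are invented, and only the err_integral and prev_err fields are borrowed from power_allocator_params.

```c
#include <assert.h>
#include <stdint.h>

/* What the governor should do on this polling pass (illustrative). */
enum ipa_action {
    IPA_ALLOW_MAX_POWER,  /* below switch-on: no throttling */
    IPA_RUN_PID           /* at or above switch-on: allocate via PID */
};

/* The part of the PID state that gets reset. */
struct pid_memory {
    int64_t err_integral;
    int32_t prev_err;
};

/* Temperatures are in millidegrees Celsius. */
enum ipa_action throttle_decision(int temp, int switch_on_temp, struct pid_memory *pid)
{
    assert(pid != 0);
    if (temp < switch_on_temp) {
        /* Forget accumulated error so the next activation starts clean. */
        pid->err_integral = 0;
        pid->prev_err = 0;
        return IPA_ALLOW_MAX_POWER;
    }
    return IPA_RUN_PID;
}
```

Resetting the state whenever the zone is cool means a long-past overshoot cannot bias the first budget computed after IPA switches on again.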

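Figure 10: allocate_power flow

allocate_power is the core of IPA. It walks all thermal_instances to obtain the number of actors and their weights, then computes max_power and weighted_req_power for each actor, along with max_allocatable_power and total_weighted_req_power across all actors.

pid_controller computes power_range from control_temp, max_allocatable_power and the PID parameters. power_range is the power budget for the next allocation.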
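
To make the role of the gains concrete, here is a simplified userspace sketch of such a controller: a proportional and an integral term in 10-bit fixed point, added to the sustainable power and clamped to the allocatable maximum. It is an illustration under stated assumptions, not the kernel's pid_controller: the derivative term is left out (k_d defaults to 0), the names are made up, and plain multiplication and division replace any shift-based fixed-point macros.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FRAC_ONE 1024  /* 10 fractional bits, the 1024 factor used in this article */

struct pid_params {
    int32_t k_po;                /* gain used when over the control temperature */
    int32_t k_pu;                /* gain used when under the control temperature */
    int32_t k_i;                 /* integral gain */
    int32_t integral_cutoff;     /* millidegrees; integrate only below this error */
    uint32_t sustainable_power;  /* mW */
};

struct pid_state {
    int64_t err_integral;        /* accumulated fixed-point error */
};

static int64_t mul_frac(int64_t a, int64_t b)
{
    return a * b / FRAC_ONE;
}

/* Returns the power budget in mW for the next allocation. */
uint32_t pid_power_range(const struct pid_params *p, struct pid_state *s,
                         int control_temp, int temp, uint32_t max_allocatable_power)
{
    int64_t max_frac = (int64_t)max_allocatable_power * FRAC_ONE;
    int64_t err = (int64_t)(control_temp - temp) * FRAC_ONE;
    int64_t prop = mul_frac(err < 0 ? p->k_po : p->k_pu, err);
    int64_t integ = mul_frac(p->k_i, s->err_integral);
    int64_t range;

    assert(max_allocatable_power > 0);
    if (err < (int64_t)p->integral_cutoff * FRAC_ONE) {
        int64_t next = integ + mul_frac(p->k_i, err);
        if (llabs(next) < max_frac) {   /* keep the integral term bounded */
            integ = next;
            s->err_integral += err;
        }
    }

    /* Feed forward the sustainable power, then clamp to [0, max]. */
    range = (int64_t)p->sustainable_power + (prop + integ) / FRAC_ONE;
    if (range < 0)
        range = 0;
    if (range > (int64_t)max_allocatable_power)
        range = max_allocatable_power;
    return (uint32_t)range;
}
```

With the default gains derived earlier (340, 681, 10) and a sustainable power of 3326, a temperature exactly at control_temp returns a budget equal to the sustainable power; a hotter zone gets less and a cooler zone gets more, up to the allocatable maximum.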

divvy_up_power distributes the available power among the actors based on weighted_req_power, max_power, num_actors, total_weighted_req_power and power_range, producing granted_power.
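
One way such a split can work is sketched below: each actor first receives a share of power_range proportional to its weighted request, capped at its max_power, and whatever the capped actors could not use is then handed to actors that still have headroom. This is a simplified illustration with a hypothetical function name, not a copy of the kernel routine, and its rounding may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Split power_range (mW) among num_actors actors; results go to granted_power. */
void divvy_up(const uint32_t *weighted_req_power, const uint32_t *max_power,
              int num_actors, uint32_t total_weighted_req_power,
              uint32_t power_range, uint32_t *granted_power)
{
    uint32_t extra_power = 0;     /* power that capped actors could not use */
    uint32_t headroom_total = 0;  /* spare capacity summed over all actors */

    assert(num_actors > 0);
    if (total_weighted_req_power == 0)
        total_weighted_req_power = 1;  /* avoid dividing by zero */

    /* Pass 1: proportional share, capped at each actor's maximum. */
    for (int i = 0; i < num_actors; i++) {
        uint64_t share = (uint64_t)weighted_req_power[i] * power_range
                         / total_weighted_req_power;
        if (share > max_power[i]) {
            extra_power += (uint32_t)(share - max_power[i]);
            share = max_power[i];
        }
        granted_power[i] = (uint32_t)share;
        headroom_total += max_power[i] - granted_power[i];
    }

    if (extra_power == 0 || headroom_total == 0)
        return;
    if (extra_power > headroom_total)
        extra_power = headroom_total;

    /* Pass 2: hand the surplus out in proportion to remaining headroom. */
    for (int i = 0; i < num_actors; i++) {
        uint64_t headroom = max_power[i] - granted_power[i];
        granted_power[i] += (uint32_t)(headroom * extra_power / headroom_total);
    }
}
```

For example, with requests {300, 100}, maxima {200, 500} and a budget of 400, the first actor is capped at 200 and the 100 it could not take goes to the second actor, giving {200, 200}.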

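power_actor_set_power programs each cooling device according to the power it was granted. cdev->ops->power2state converts the power value into a cooling device state, and thermal_cdev_update applies it through cdev->ops->set_cur_state. This completes the regulation of the whole thermal zone.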
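
The idea behind converting power to a state can be illustrated with a table lookup: estimate the power of every OPP from the dynamic-power-coefficient, then pick the fastest OPP that fits the budget. The sketch below is an assumption-laden illustration, not the cpufreq cooling driver: the names are invented, the division by 10^9 assumes MHz and mV inputs with a result in mW, and factors a real driver may consider, such as CPU load and the number of CPUs, are ignored.

```c
#include <assert.h>
#include <stdint.h>

struct opp {
    uint32_t freq_mhz;
    uint32_t volt_mv;
};

/* Estimated dynamic power of one CPU at an OPP, in mW.
 * coeff is the dynamic-power-coefficient; the 1e9 divisor is this
 * sketch's assumption for MHz and mV inputs. */
uint32_t opp_power_mw(uint32_t coeff, struct opp o)
{
    uint64_t p = (uint64_t)coeff * o.freq_mhz * o.volt_mv * o.volt_mv;
    return (uint32_t)(p / 1000000000ULL);
}

/* Return the index of the fastest OPP whose power fits the budget.
 * opps[] must be sorted fastest first, so index 0 means no throttling
 * and larger indices mean deeper throttling. If nothing fits, the
 * slowest OPP is returned. */
int power_to_state(uint32_t coeff, const struct opp *opps, int num_opps,
                   uint32_t budget_mw)
{
    assert(num_opps > 0);
    for (int i = 0; i < num_opps; i++)
        if (opp_power_mw(coeff, opps[i]) <= budget_mw)
            return i;
    return num_opps - 1;
}
```

Using coefficient 311 and the HiKey OPPs from the measurement table (voltages converted to mV), 1200 MHz at 1330 mV comes to 660 mW and 960 MHz at 1180 mV to 415 mW, so a 500 mW budget for one CPU maps to state 1 (960 MHz).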

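A few concepts matter here. A thermal_instance is a cooling device bound to a particular trip in a particular thermal_zone. A power actor is an entity that consumes power and can change its power state, so adjusting its state adjusts its power. An actor's weight defaults to 1024; raise it for a more important actor and lower it for a less important one. Power is allocated based on weighted_req_power, not req_power.

A weakness of IPA: the PID controller works well when it runs on a periodic tick. When it is invoked at irregular intervals, for example when triggered by interrupts, it may perform less well.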
