EAS in Android common kernel
● EAS r1.3 is merged into the Android common kernel and the Hikey kernel
○ Many optimizations have been merged in this version!
○ EAS r1.3 “has done a large refactoring of find_best_target” to rework wakeup, simplify the decision making and make heuristics clearer
○ EAS r1.3 makes ‘schedutil’ the recommended CPUFreq governor
○ EAS r1.3 optimizes small tasks by placing them on LITTLE CPUs most of the time
Workload Automation
● “Workload Automation (WA) is a framework for executing workloads and collecting measurements on Android and Linux devices.” [1]
○ WA captures energy instrument data
○ WA captures performance metrics
○ WA captures ftrace logs
○ WA captures kernel statistics (interrupt counts while a test case runs)
● WA can run test cases for power scenarios, interactive performance and intensive performance testing, with multiple iterations in a single test configuration
● WA lacks some features, but we can easily extend it [2]
○ Support flashing multiple kernels
○ Support kernel scheduler statistics
○ Support result comparison across different configurations
● Works together with LISA for detailed scheduler analysis
[1] https://github.com/ARM-software/workload-automation
[2] https://github.com/Leo-Yan/workload-automation
Test suites
● Power-saving cases
○ “idle” - sleep for 120s
○ “audio” - play back mp3 for 120s
○ “video” - play back video (1080p) for 120s with the hardware codec
● Interactive cases
○ UiBench
■ “InflatingListActivity” - Scroll the screen for a list widget
■ “InvalidateActivity” - Reverse color for small grids, invalidate and redraw the view
■ “ActivityTransition” - Click the screen to zoom in and zoom out of a picture
■ “TrivialAnimationActivity” - Animation that renders the whole screen color
○ “Galleryfling” - Launch the gallery with hundreds of pictures and fling through them
○ “Recentfling” - Open the recents screen and fling the recent tasks list
○ “Emailfling” - Open the email app and fling the email list
○ “Browserfling” - Open the IceCat browser and fling the web page
Control groups (CGroups) in Android system
● “Control Groups provide a mechanism for aggregating/partitioning sets of tasks, and all their future children, into hierarchical groups with specialized behaviour … mainly for resource-tracking purposes”:
https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
● A cgroup associates a set of tasks with a set of parameters
○ CPUCtl: restricts CPU time and keeps background tasks to only a small portion of the bandwidth
■ cpu.shares
■ cpu.rt_runtime_us
■ cpu.rt_period_us
○ CPUSet: pins tasks to specified CPUs
■ cpus: CPU affinity for tasks
○ Schedtune: speeds up the time-to-completion of a task activation
■ boost: inflates task and CPU utilization, which impacts OPP selection
■ prefer_idle: selects idle CPUs for tasks to avoid scheduling latency
● Init scripts can statically set cgroup parameters
● Android PowerHAL can dynamically set cgroup parameters for different scenarios
● PowerHAL is disabled in these slides; a WA script statically sets the parameters instead
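Setting a cgroup parameter statically comes down to writing a value into a file inside the group's directory. The helper below is a minimal sketch of that step, not the WA script used for these slides: the name set_cgroup_param() is hypothetical, and the example paths in the comment (/dev/stune for Schedtune, /dev/cpuset for CPUSet) are assumptions about where the target device mounts its hierarchies.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write one cgroup parameter.
 *
 * Example arguments (assumed mount points, check the target device):
 *   group_dir = "/dev/stune/top-app", param = "schedtune.boost", value = "10"
 *   group_dir = "/dev/cpuset/background", param = "cpus", value = "0-3"
 *
 * Returns 0 on success, -1 on any error.
 */
static int set_cgroup_param(const char *group_dir, const char *param,
			    const char *value)
{
	char path[256];
	FILE *f;
	int n, ok;

	assert(group_dir && param && value);

	n = snprintf(path, sizeof(path), "%s/%s", group_dir, param);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;	/* path does not fit in the buffer */

	f = fopen(path, "w");
	if (!f)
		return -1;	/* group or parameter file is missing */

	ok = fputs(value, f) >= 0;
	if (fclose(f) != 0)	/* write errors may only show up on flush */
		ok = 0;

	return ok ? 0 : -1;
}
```

Under these assumptions, a static configuration for the top-app group would be two calls: one writing "1" to schedtune.prefer_idle and one writing the boost percentage to schedtune.boost.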
Understanding ‘prefer_idle’
● ‘prefer_idle’ is an indicator that asks the scheduler to select an idle CPU for ‘prefer_idle’ tasks whenever possible; it is mainly used to boost the performance of concurrent multi-threaded workloads
● Idle CPUs are selected in a specific order, which impacts power heavily
○ When boost > 0, the big cluster is iterated first, so ‘prefer_idle’ tasks have a much higher chance of being migrated to big CPUs
○ When boost = 0, the scheduler first tries to find an idle CPU in the LITTLE cluster; ‘prefer_idle’ tasks can be spread well within the LITTLE cluster, which saves power in medium-workload scenarios (e.g. video playback)
○ Iteration starts from the lowest CPU ID, so low-ID CPUs have higher priority
● Using CPUSET with ‘prefer_idle’ can guarantee latency for important tasks (top-app)
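The iteration order described above can be sketched as a small stand-alone function. This is an illustration of the ordering only, not the kernel's find_best_target: the function names are hypothetical, the CPU layout (CPUs 0-3 LITTLE, CPUs 4-7 big) is an assumption, and falling back to the other cluster when the preferred one has no idle CPU is also an assumption.

```c
#include <assert.h>

#define NR_CPUS		8
#define FIRST_BIG_CPU	4	/* assumption: CPUs 0-3 are LITTLE, 4-7 are big */

/* Return the lowest-numbered idle CPU in [first, last], or -1 if none. */
static int first_idle_in_range(unsigned int idle_mask, int first, int last)
{
	int cpu;

	for (cpu = first; cpu <= last; cpu++)
		if (idle_mask & (1u << cpu))
			return cpu;
	return -1;
}

/*
 * Pick an idle CPU for a 'prefer_idle' task.
 * idle_mask: bit N set means CPU N is idle.
 * boost > 0: search the big cluster first; boost == 0: LITTLE first.
 * Within a cluster, lower CPU IDs win. Returns -1 if no CPU is idle.
 */
static int pick_prefer_idle_cpu(unsigned int idle_mask, int boost)
{
	int cpu;

	assert((idle_mask >> NR_CPUS) == 0);

	if (boost > 0) {
		cpu = first_idle_in_range(idle_mask, FIRST_BIG_CPU, NR_CPUS - 1);
		if (cpu < 0)	/* assumed fallback to the LITTLE cluster */
			cpu = first_idle_in_range(idle_mask, 0, FIRST_BIG_CPU - 1);
	} else {
		cpu = first_idle_in_range(idle_mask, 0, FIRST_BIG_CPU - 1);
		if (cpu < 0)	/* assumed fallback to the big cluster */
			cpu = first_idle_in_range(idle_mask, FIRST_BIG_CPU, NR_CPUS - 1);
	}
	return cpu;
}
```

With every CPU idle (mask 0xFF), an unboosted task lands on CPU 0 and a boosted task on CPU 4, which shows why the boost value has such a large effect on power.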
The first step:
We can start by analyzing the CPU frequency selection for some power-sensitive cases (e.g. video/audio playback); scenarios where the CPU frequency is overestimated are a good starting point for analysis.
The second step:
Building on the first-step optimization, we can explore optimizing task placement across different clusters; this means we do not merely optimize the placement of small tasks on LITTLE CPUs, but also need to find better occasions to place big-workload tasks on big CPUs.
Function calc_total_util() introduced for total utilization estimation
static unsigned long calc_total_util(int cpu, struct task_struct *p,
				     unsigned long cpu_util,
				     unsigned long tsk_util)
{
	unsigned long cfs_util, rt_util, dl_util, total;
	int boost;
	int tsk_boost = p ? schedtune_task_boost(p) : 0;
	int cpu_boost = schedtune_cpu_boost(cpu);

	/*
	 * Estimate cfs utilization with boost margin
	 */
	boost = max(tsk_boost, cpu_boost);
	cfs_util = cpu_util + tsk_util;
	cfs_util += schedtune_margin(cfs_util, boost);

	/* Convert rt class utilization with CPU capacity */
	rt_util = get_rt_cpu_capacity(cpu) * capacity_orig_of(cpu);
	rt_util = rt_util / SCHED_CAPACITY_SCALE;

	/* Convert dl class utilization with CPU capacity */
	dl_util = get_dl_cpu_capacity(cpu) * capacity_orig_of(cpu);
	dl_util = dl_util / SCHED_CAPACITY_SCALE;

	total = cfs_util + rt_util + dl_util;
	return total;
}
Total util = CFS util + CFS boost margin + RT util + DL util
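A worked example makes the sum concrete. The sketch below is a user-space stand-in for calc_total_util(), not the kernel code: all function names are hypothetical, the margin model (boost percent of the remaining headroom up to SCHED_CAPACITY_SCALE, for non-negative boost) is an assumption about what schedtune_margin() returns, and the RT and DL inputs are assumed to be fractions on the 0..1024 scale that get converted to the CPU's original capacity.

```c
#include <assert.h>

#define SCHED_CAPACITY_SCALE 1024UL

/* Assumed margin model: boost_pct percent of the headroom above util. */
static unsigned long boost_margin(unsigned long util, int boost_pct)
{
	assert(boost_pct >= 0 && boost_pct <= 100);

	if (util >= SCHED_CAPACITY_SCALE)
		return 0;	/* no headroom left to boost into */
	return (SCHED_CAPACITY_SCALE - util) * (unsigned long)boost_pct / 100;
}

/* Convert a 0..1024 fraction into units of this CPU's original capacity. */
static unsigned long scale_to_cpu(unsigned long fraction,
				  unsigned long capacity_orig)
{
	return fraction * capacity_orig / SCHED_CAPACITY_SCALE;
}

/* Total util = CFS util + CFS boost margin + RT util + DL util */
static unsigned long total_util(unsigned long cpu_util, unsigned long tsk_util,
				int boost_pct, unsigned long rt_fraction,
				unsigned long dl_fraction,
				unsigned long capacity_orig)
{
	unsigned long cfs_util = cpu_util + tsk_util;

	cfs_util += boost_margin(cfs_util, boost_pct);

	return cfs_util + scale_to_cpu(rt_fraction, capacity_orig) +
	       scale_to_cpu(dl_fraction, capacity_orig);
}
```

With illustrative inputs (CPU util 200, task util 100, boost 10%, RT fraction 102, no DL load, original capacity 462), the CFS part is 300 plus a margin of 72, the RT part is 46, and total_util(200, 100, 10, 102, 0, 462) returns 418. The same call with boost 0 returns 346.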
Conclusion
● EAS on Hikey960 works well for EAS development; we suggest using UEFI and ARM-TF firmware and the latest MCU firmware
● CPUSET/Sched Tune is the recommended interface for simple EAS tuning by device manufacturers who want quick optimization
● Investigation of EAS on Hikey960 has produced some good ideas for further improvements, and some of these are being proposed on Google code review for a future AOSP common
● Proposed optimization
○ First, we need a consistent method for CPU utilization estimation that includes the utilization contributed by RT and DL tasks
○ Power can be optimized by optimizing the frequency of active CPUs; as a side effect, tasks are spread to CPUs with more spare capacity, which can also benefit some interactive cases (optimize frequency for the idle CPU?)
○ Using a cpumask to give the energy calculation more chances to run can significantly save power for interactive cases and sustainable cases