Thursday, December 4, 2014

Another symmetric octa-core CPU-based SoC announced (HiSilicon Kirin 620)

Huawei's chip subsidiary HiSilicon has just announced a new SoC, the Kirin 620, with an octa-core Cortex-A53 CPU. The chip is the latest in a series of newly introduced octa-core Cortex-A53-based SoCs from companies such as MediaTek and Qualcomm as well as other players.

New Kirin 620 chip appears to target cost-sensitive segment


HiSilicon shows some smart design choices with this chip. It is clearly designed to be relatively cheap to manufacture (with a relatively limited chip die area) while still providing good performance for low/mid-range devices.

In the past, HiSilicon used CPU cores with a relatively large die area, such as Cortex-A9 and Cortex-A15, which do not result in a particularly cheap or power-efficient chip. However, the Cortex-A53 is the direct successor to the very power-efficient and extremely small Cortex-A7 core, which means that even with eight cores the chip will still be relatively small and power-efficient.

The maximum CPU clock speed of 1.2 GHz is significantly lower than that of most other announced Cortex-A53-based SoCs, which suggests that the chip is intended for the cost-sensitive segment. It is possibly manufactured on TSMC's relatively economical 28LP process technology, which limits maximum performance.

Compared to MediaTek’s and Qualcomm’s new octa-core Cortex-A53-based chips, the Mali-450 MP4 GPU is notable because it does not support the OpenGL ES 3.0 API. However, OpenGL ES 2.0 is still the standard in the mobile market, and HiSilicon can probably improve cost and performance this way (especially since Mali-T62x and Mali-T760 are not cheap in terms of die size). Mali-T760 would have been faster and more power-efficient, but Mali-450 MP4 saves cost while still providing reasonable performance.

The new chip has several similarities with MediaTek’s MT6592, which is almost a year old and has eight Cortex-A7 cores (instead of Cortex-A53) paired with the same Mali-450 MP4 GPU.

Octa-core Cortex-A53 CPU provides benefits for performance/Watt and performance/dollar


Because the Cortex-A53 (like its predecessor, the Cortex-A7) has a very small die size in comparison to higher-performance cores like Cortex-A57 and Cortex-A15, using eight cores instead of four does not significantly raise the cost of the chip, while greatly increasing multi-core performance. Although this is not the case for HiSilicon's chip with its relatively low clock speed, several other Cortex-A53-based chips are clocked at a relatively high frequency (in excess of 2 GHz for the MT6795), resulting in respectable single-core performance as well and making such a configuration suitable for the performance segment.
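How much eight cores gain over four depends on how much of a workload can actually run in parallel. As a rough illustration (not taken from any benchmark data in this post), the following Python sketch applies Amdahl's law. The function name and the parallel fractions are assumptions chosen for the example.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over a single core according to Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative parallel fractions, not measured values.
    for fraction in (0.5, 0.9, 0.99):
        four = amdahl_speedup(fraction, 4)
        eight = amdahl_speedup(fraction, 8)
        print(f"parallel={fraction:.2f}: 4 cores {four:.2f}x, 8 cores {eight:.2f}x")
```

The estimate ignores clock speed, memory bandwidth and thermal limits, so it is only an upper bound on what doubling the core count can deliver.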

An octa-core configuration can provide real benefits in practice in a multi-threaded OS such as Android. Applications that can readily take advantage of eight cores include the Chrome browser and software video decoding and encoding libraries, all of which can improve the user experience. Because the eight cores are usually physically split into two clusters with a separate L2 cache, there is also room for further optimizations by the kernel scheduler in order to maximize performance and power efficiency.

For example, it might be possible for the scheduler to disable one of the two clusters of four CPU cores and its associated L2 cache during normal operation (when the load is not high), resulting in low power consumption. When more CPU power is needed, the second cluster comes online. Even when there are only a few threads, the scheduler might be able to detect the need for more L2 cache memory in a particular workload and move one or more threads to the second cluster. MediaTek's CorePilot technology, with which it has had experience since the MT6592, probably involves heuristics of this kind.
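The kind of heuristic described above can be sketched in a few lines of Python. This is a toy policy under stated assumptions, not MediaTek's CorePilot and not actual kernel scheduler code: the `LoadSample` fields, the `clusters_needed` function and both thresholds are hypothetical names and values introduced only for illustration.

```python
from dataclasses import dataclass

CORES_PER_CLUSTER = 4  # two clusters of four cores, as described in the text


@dataclass
class LoadSample:
    runnable_threads: int        # threads currently ready to run
    avg_core_utilization: float  # 0.0 .. 1.0 across the online cores
    l2_miss_rate: float          # 0.0 .. 1.0, hypothetical cache-pressure signal


def clusters_needed(sample: LoadSample,
                    util_threshold: float = 0.8,
                    miss_threshold: float = 0.3) -> int:
    """Return 1 or 2: how many clusters this toy policy keeps online."""
    # More busy threads than one cluster has cores: bring up the second cluster.
    if (sample.runnable_threads > CORES_PER_CLUSTER
            and sample.avg_core_utilization >= util_threshold):
        return 2
    # Few threads but heavy cache pressure: spreading them over both
    # clusters gives each thread access to more L2 cache.
    if sample.runnable_threads >= 2 and sample.l2_miss_rate >= miss_threshold:
        return 2
    # Otherwise keep the second cluster and its L2 cache powered down.
    return 1


if __name__ == "__main__":
    print(clusters_needed(LoadSample(2, 0.20, 0.05)))  # light load
    print(clusters_needed(LoadSample(8, 0.95, 0.10)))  # many busy threads
    print(clusters_needed(LoadSample(2, 0.50, 0.50)))  # cache-bound, few threads
```

A real scheduler would also have to weigh the latency and energy cost of powering a cluster up and down, which this sketch leaves out.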

Overview of symmetric octa-core Cortex-A7 and Cortex-A53-based SoCs


The following table shows an overview of currently announced octa-core Cortex-A7 and Cortex-A53-based SoCs, starting with MediaTek's MT6592 which has been available for about a year.

Note that Qualcomm's Snapdragon 615 is not really a symmetric octa-core because it uses a pseudo-big.LITTLE configuration with four Cortex-A53 cores clocked higher and four cores clocked lower.

Performance comparison of octa-core Cortex-A7 and Cortex-A53-based SoCs


The following tables show CPU performance (using a representative Geekbench subtest result) as well as GPU performance based on GFXBench for relevant SoCs and devices for which benchmark data is available. They include both octa-core Cortex-A7 and Cortex-A53-based SoCs, as well as other existing SoCs from different market segments, for reference.

The first few columns of the table show a description of the SoC with CPU configuration, the name of a representative device model using the SoC and the maximum CPU clock speed. Then comes the Geekbench JPEG Compression benchmark test, both single-core and multi-core. This Geekbench subtest has been found to be relatively sensitive to CPU performance without being very sensitive to other factors such as L2 cache size.

The rightmost columns show information about the GPU. First listed are the GPU type and off-screen performance for the GFXBench T-Rex (OpenGL ES 2.0) and Manhattan (OpenGL ES 3.0) benchmarks. The off-screen tests always render into a 1920x1080 off-screen buffer, making results comparable between devices with different screen resolutions. The actual resolution used on the device comes next, followed by on-screen T-Rex benchmark performance and information relevant for battery life and long-term performance (which is affected by thermal throttling). This includes average long-term performance of the T-Rex on-screen benchmark, the battery size of the device and the battery life in minutes when running T-Rex on-screen long-term.
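To relate the on-screen and battery columns to each other, two simple derived figures can be useful. The Python sketch below is only a rough aid under stated assumptions: the function names are hypothetical, the resolution scaling assumes frame rate is inversely proportional to pixel count (which ignores vsync caps and CPU limits), and the battery figure assumes the reported minutes correspond to draining the full battery capacity.

```python
REFERENCE_PIXELS = 1920 * 1080  # resolution of the GFXBench off-screen buffer


def fps_at_1080p_equivalent(onscreen_fps: float, width: int, height: int) -> float:
    """Scale an on-screen frame rate to a 1920x1080 pixel load (rough estimate)."""
    return onscreen_fps * (width * height) / REFERENCE_PIXELS


def frames_per_mah(long_term_fps: float, battery_minutes: float,
                   battery_mah: float) -> float:
    """Frames rendered per mAh of battery capacity during the long-term run."""
    total_frames = long_term_fps * battery_minutes * 60
    return total_frames / battery_mah


if __name__ == "__main__":
    # Made-up example values, not entries from the GFXBench database.
    print(fps_at_1080p_equivalent(30.0, 1280, 720))
    print(frames_per_mah(20.0, 300, 3000))
```

Because battery capacity in mAh ignores screen power draw and battery voltage, the second figure is only meaningful for comparing devices of a broadly similar class.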

Mali-T760 appears to be highly efficient


Notably, the GPU performance of the MT6752 with Mali-T760 MP2 GPU, as represented by the Lenovo A70-A entry in the GFXBench database, is comparable with that of the Snapdragon 615-based HTC Desire 820, despite the latter's higher low-level pixel processing performance (as is evident in the ALU and Alpha Blending scores) provided by the Adreno 405 GPU.

This strongly suggests that ARM has made a big leap in terms of performance efficiency with the Mali-T760 GPU core in conjunction with compression-based bandwidth optimization technologies such as ARM Framebuffer Compression, Transaction Elimination and Smart Composition as well as good integration with the Cortex-A53 CPU architecture (which already shows memory performance improvements).

Based on GFXBench power efficiency data, none of the listed SoCs appears to be particularly power-efficient under a full GPU load with the complex T-Rex benchmark, but data for the Mali-T760 MP2-based MT6752 has yet to come in. However, the best battery life entries in the GFXBench database for the Samsung Galaxy Note 4 with the Mali-T760 MP6-based Exynos 7 Octa show the ability to run the on-screen T-Rex benchmark for more than 300 minutes with reasonable sustained performance on the very high resolution screen of the Note 4, which is consistent with relatively high power efficiency of the Mali-T760 GPU.

Note that power efficiency is likely to be better for typical GPU applications that are less demanding than GFXBench's T-Rex benchmark (this affects lower-end SoCs/GPUs more than higher-end ones).

Sources: CNXSoftware (Kirin 620 announcement), GFXBench results database, Geekbench browser

Updated December 25, 2014 (corrected memory interface information for Snapdragon 615).
