ARM, RISC, and Understanding the Ecosystem

Before I can build custom binaries for my RG351, I must learn more about modern ARM architecture

As a gadget geek, I've always loved small embedded handheld computers. From the Palm Pilots, Game Boys, and Pocket PCs of my youth to the Retroid Pocket, GPD, and Steam Deck of today, I've always had a special place in my heart for pocket computing. Recently, I decided to see exactly what my RG351m could do if I targeted it natively, rather than only running games on emulated hardware. This is the story of that ill-conceived notion.

I have really grown fond of my cheap retro handheld collection. While the Nintendo Switch is clearly a better piece of kit, it's just not pocketable enough to be a truly mobile gaming console. My phone? Even with emulators, having no hardware input other than a touchscreen kills the experience for me. Not only have these retro handhelds gotten inexpensive (under 150 dollars for even the most premium device), they've also gotten a lot more powerful. Keep in mind that emulating a console is far more computationally expensive than running a native application, so a device that can run PSone-era ROMs silky smooth is capable of a lot more!

It really got me thinking - how would a native binary built for one of these devices run? Could I get ... OpenXcom running? How about OpenMW (the open source Morrowind engine)? Jedi Outcast in the palm of my hand? How well would one of Godot's demos run?

The easiest approach would have been to target libretro and RetroArch, load up Lakka, and give it a whirl. I might end up doing that yet, but as I dug deeper I realized that there was a lot of fun to be had exploring this particular end of game development.

So, looking at my Anbernic RG351m (and RG351v), I decided I'd risk bricking it in order to see just what I could do with this hardware if I targeted it directly. Of course, it wasn't long before I realized exactly how little I really understood about the undertaking. But hey, gotta start somewhere.

The first step for me was building a binary - even a little hello world app - that could run on the RG351's Rockchip RK3326 SoC. Well, it's an ARM processor, I knew that. But ... what does that even mean?

A Bit About ARM

Like many lifelong x86 developers, my understanding of ARM was basically that it was a RISC-based CPU used in low-powered devices like mobile phones, network routers, and Raspberry Pis. As I dug into the guides covering cross compiling, I realized that it was a little more complicated than that. It wasn't sufficient just to know that I was building for ARM: there were multiple layers of compatibility and manufacturer-specific implementations of the ARM architecture that I would have to manage in order to successfully build for my device.

I needed to have a better understanding of ARM architecture.

Before we can understand ARM, though, we first need to talk about RISC. It's even in their name: Advanced RISC Machines. So, any understanding of the ARM architecture must start with an understanding of RISC.

What is RISC

RISC stands for "Reduced Instruction Set Computer", and it is a set of design principles for creating CPU instruction sets. In contrast to x86, which is a CISC or "Complex Instruction Set Computer", RISC designers aim to have only a minimal set of general purpose instructions and rely on compilers to build more complex logic out of them. A CISC designer, on the other hand, sees value in additional specialized instructions.

As an illustrative example: a CISC designer might analyze a code base and realize that loading a variable from memory into a register, incrementing it, then storing it back is a very common sequence of operations. They might then decide to package these 3 operations into a single instruction. Maybe they can even optimize it in silicon. A real world example of this mentality is AVX (Advanced Vector Extensions). Most modern Intel and AMD chips support 128-bit or even 256-bit SIMD (single instruction, multiple data) vector operations built directly into the instruction set! Performing these kinds of operations on these huge chunks of data opens up a lot of interesting use cases (and formed the basis of more than a few optimizations at my job).
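The load, increment, store example can be made concrete with a few lines of C. This is only a sketch: the function name `increment` and the `DEMO_MAIN` macro are invented for illustration, and the assembly in the comment is what GCC and Clang typically emit at `-O2`, which can vary with compiler, version, and flags.

```c
#include <assert.h>
#include <stdio.h>

/* Increment a counter that lives in memory.
 *
 * On x86-64 (CISC) a compiler can usually express this as ONE
 * instruction that reads, modifies, and writes memory:
 *
 *     add dword ptr [rdi], 1
 *
 * On AArch64 (RISC, load/store) arithmetic only works on registers,
 * so the same C typically becomes three instructions:
 *
 *     ldr w8, [x0]        <- load from memory into a register
 *     add w8, w8, #1      <- increment the register
 *     str w8, [x0]        <- store the register back to memory
 */
void increment(int *counter)
{
    *counter += 1;
}

#ifdef DEMO_MAIN /* hypothetical macro: build with -DDEMO_MAIN to run */
int main(void)
{
    int c = 41;
    increment(&c);
    assert(c == 42);
    printf("%d\n", c);
    return 0;
}
#endif
```

Compiling this with `-O2 -S` on each platform and reading the generated assembly is a quick way to see the two philosophies side by side.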

By contrast, a RISC designer instead would argue that not only does adding all these specialty instructions add complexity for the users and the compiler, but it has ramifications on the cost of manufacturing, power consumption, and general flexibility of an architecture. There's a reason you don't see many i3 powered smartphones, much less hard drive controllers.

Since RISC is a design philosophy rather than a specific standard, when we talk about ARM architectures we are referring, in part, to a RISC-style instruction set standard developed by ARM (the company).

The University of California, Berkeley also originated an open RISC instruction set standard known as RISC-V (pronounced "risk five", not "risk vee", which is what I always called it prior to this research).

Similarly, the newly released Apple M1 series of chips, which are themselves ARM-based, and the PowerPC chips Apple used previously were also built with RISC principles in mind.

Okay, so now we have an understanding of RISC, and we know how it relates to the ARM processor we want to target. What other information do we need in order to start building binaries that will work with it?

Understanding Your Target

So, I want to target the Rockchip RK3326 that is in my Anbernic RG351m. What do I need to know about this chip in order to start building binaries for it?

We know it is an ARM device, but is that enough to configure and compile? Unfortunately, it turns out it's not quite so simple. The RK3326 is a lot more than just an "ARM CPU", so we'll need a little more information in order to start targeting it successfully.

Here are some additional considerations I need to make:

  • ARM Architecture
  • IP Core
  • System on Chip (SoC)
  • Peripheral Support

Architecture

ARM (the company) also publishes a specification for ARM (the architecture).

These specifications go through various revisions and are named accordingly:

  • ARMv4
  • ARMv5
  • ARMv6
  • ...

The Rockchip RK3326 implements the ARMv8-A architecture.

In addition to specifying an instruction set (A64 for us), the -A designation denotes that it is designed specifically for general purpose computing (tablets, mobile phones, handhelds, etc). The architecture may also specify things like the execution state (AArch64) and the application binary interface. You can read more in ARM's documentation on the ARMv8 instruction set architecture.
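One way to see the architecture choice from inside a program is to check the macros that GCC and Clang predefine for the target. The sketch below is illustrative only: the function names and the `DEMO_MAIN` macro are made up for this example, and it covers just a handful of targets.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Name the instruction set this file was compiled for, based on
 * macros that GCC and Clang predefine for each target. */
const char *target_arch_name(void)
{
#if defined(__aarch64__)
    return "AArch64";
#elif defined(__arm__)
    return "ARM (32-bit)";
#elif defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "x86 (32-bit)";
#else
    return "unknown";
#endif
}

/* Pointer width in bits for the ABI we were compiled against. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

#if defined(__aarch64__) && defined(__LP64__)
/* Compile-time check: the usual 64-bit Linux ABI on AArch64 is LP64. */
static_assert(sizeof(void *) == 8, "expected 64-bit pointers on AArch64");
#endif

#ifdef DEMO_MAIN /* hypothetical macro: build with -DDEMO_MAIN to run */
int main(void)
{
    printf("compiled for %s with %d-bit pointers\n",
           target_arch_name(), pointer_bits());
    return 0;
}
#endif
```

Assuming the file is saved as `hello.c` (a made-up name) and a Debian-style `aarch64-linux-gnu-gcc` cross compiler is installed, something like `aarch64-linux-gnu-gcc -mcpu=cortex-a35 -DDEMO_MAIN hello.c -o hello` should produce an AArch64 binary, while the same source built with plain `gcc` on a PC reports x86-64. Whether that binary actually runs on a given firmware also depends on the C library and kernel shipped on the device.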

IP Cores

Just like how NVIDIA or AMD develop a new graphics chip design and then let other manufacturers build products around it (think: Gigabyte RTX 3080 vs EVGA RTX 3080), ARM operates on a similar business model. They develop IP cores, which describe an implementation of an architecture in a particular configuration, and which manufacturers can license and build into general purpose chips, SoCs, boards, or devices.

There are many IP core classes, but the ones I find most recognizable - and the one that is in our RK3326 - are those in the Cortex line.

The Cortex IP cores come in 3 broad flavors:

  • Cortex-A: application cores, for running operating systems like Linux or Android (things like handhelds, phones, etc - the kind of hardware I see the most).
  • Cortex-R: super small embedded cores for hardware that has realtime requirements, such as routers, media players, or hard disk controllers.
  • Cortex-M: cores for microcontrollers.

Each of these flavors comes in several configurations, each with its own set of tradeoffs. These range from cores like the Cortex-A5 - the smallest and most energy efficient configuration - to larger, more power hungry designs like the Cortex-A72.

The RK3326 is a Quad Core ARM Cortex-A35 (pg7). The Cortex-A35 core was introduced back in 2015.

Different Cortex cores may target the same architecture. The Cortex-A5 through A17 are all based on the ARMv7-A architecture, while the A32 through A73 cores implement ARMv8-A.

Note that this naming convention should not be confused with Apple's A-series chips.
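On a running ARM Linux system, the kernel exposes which core it is running on through the "CPU part" field of /proc/cpuinfo. The sketch below maps a few of those part numbers to core names and architectures. Treat it as an assumption-laden illustration: the struct, the function names, and the `DEMO_MAIN` macro are made up for this example, the table is deliberately incomplete, and the part numbers are the values I understand ARM to assign to these cores, so double-check them against ARM's technical reference manuals before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One row per core: the "CPU part" value Linux reports in
 * /proc/cpuinfo, the core's name, and the architecture it implements. */
struct cortex_core {
    unsigned part;
    const char *name;
    const char *arch;
};

static const struct cortex_core CORES[] = {
    { 0xc05, "Cortex-A5",  "ARMv7-A" },
    { 0xc07, "Cortex-A7",  "ARMv7-A" },
    { 0xc0f, "Cortex-A15", "ARMv7-A" },
    { 0xd04, "Cortex-A35", "ARMv8-A" },
    { 0xd03, "Cortex-A53", "ARMv8-A" },
    { 0xd08, "Cortex-A72", "ARMv8-A" },
};

/* Return the table entry for a part number, or NULL if it is not listed. */
const struct cortex_core *lookup_core(unsigned part)
{
    for (size_t i = 0; i < sizeof CORES / sizeof CORES[0]; i++) {
        if (CORES[i].part == part)
            return &CORES[i];
    }
    return NULL;
}

/* Parse a line such as "CPU part\t: 0xd04". Returns 1 and stores the
 * value in *part on success, 0 if the line is something else. */
int parse_cpu_part(const char *line, unsigned *part)
{
    assert(line != NULL && part != NULL);
    return sscanf(line, "CPU part : %x", part) == 1;
}

#ifdef DEMO_MAIN /* hypothetical macro: build with -DDEMO_MAIN to run */
int main(void)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    char line[256];
    unsigned part;

    if (f == NULL) {
        perror("/proc/cpuinfo");
        return 1;
    }
    /* Report the first core found; a big.LITTLE chip would list several. */
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_cpu_part(line, &part)) {
            const struct cortex_core *c = lookup_core(part);
            printf("part 0x%03x: %s (%s)\n", part,
                   c ? c->name : "not in table", c ? c->arch : "?");
            break;
        }
    }
    fclose(f);
    return 0;
}
#endif
```

On x86 machines /proc/cpuinfo has no "CPU part" line, so the demo simply prints nothing there.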

System on Chip (SoC)

Manufacturers may also add functionality to augment the capabilities of a specific IP core. For example, different manufacturers build chips around the Cortex-A7 that differ significantly:

  • The Allwinner A20 uses a dual core Cortex-A7 IP core, paired with a Mali-400 GPU in the same chip.
  • Some Qualcomm Snapdragon chips pair Cortex-A7 IP cores with an Adreno 305 GPU.
  • Some Samsung Exynos chips use both a quad core Cortex-A7 and a quad core Cortex-A15 in the same chip, giving them the ability to switch between the two to trade off between high performance and low energy consumption.

These additions may have implications for you as a cross compiling developer.

The RK3326 has, in addition to that quad core Cortex-A35, an embedded GPU compatible with OpenGL ES up to 3.2, OpenCL 2.0, and Vulkan 1.0. It also has additional embedded hardware engines for H.264 and H.265 decoding, power management, DMAC, and more.

Peripheral Support

Lastly, though not specifically a CPU concern, we need to be aware of what additional peripherals we will have to support on the device - and how we access them. From SD cards and WiFi interfaces to screens and joysticks, we will need to know exactly what drivers, APIs, and ABIs are available to us at the application level.

Conclusion

Because the ARM ecosystem has to be deployable to such a vast array of devices, sizes, power profiles, and use cases, compiling for it requires a little more effort than just targeting "linux on x86". There's still a lot more for me to learn, but this was definitely very enlightening.