Blue Fox Arm Assembly Internals & Reverse Engineering
2023
Maria Markstedter
Summary
This book, "Blue Fox Arm Assembly Internals & Reverse Engineering," by Maria Markstedter, provides a comprehensive explanation of Arm assembly, reverse engineering techniques, and associated topics. It dives into file formats, OS fundamentals, and the Arm architecture. It's suitable for those interested in advanced software engineering and system analysis.
Full Transcript
Blue Fox Arm Assembly Internals & Reverse Engineering
Maria Markstedter

Copyright © 2023 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada and the United Kingdom.

ISBN: 978-1-119-74530-3
ISBN: 978-1-119-74673-7 (ebk)
ISBN: 978-1-119-74672-0 (ebk)

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at www.wiley.com/go/permission.

Trademarks: WILEY and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose.
No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

If you believe you’ve found a mistake in this book, please bring it to our attention by emailing our Reader Support team at [email protected] with the subject line “Possible Book Errata Submission.”

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data: 2023933796

Cover illustration: © Jonas Jödicke
Cover design: Maria Markstedter and Wiley

To my mother, who made countless sacrifices to provide me with the opportunities that enabled me to pursue my dreams.

About the Author

Maria Markstedter is the founder and CEO of Azeria Labs, which provides training courses on Arm reverse engineering and exploitation. Previously, she worked in the fields of pentesting and threat intelligence and served as the chief product officer of the virtualization startup Corellium, Inc. She has a bachelor’s degree in corporate security and a master’s degree in enterprise security and worked on exploit mitigation research alongside Arm in Cambridge.
Maria has been recognized for her contributions to the field, having been selected for Forbes’ “30 under 30” list for technology in Europe in 2018 and named Forbes Person of the Year in Cybersecurity in 2020. She has also been a member of the Black Hat® EU and US Trainings and Briefings Review Board since 2017.

Acknowledgments

First and foremost, I would like to thank my technical reviewers for spending endless hours patiently reviewing every chapter.

Daniel Cuthbert, who has always been a great friend, supporter, and the best mentor I could ask for
Jon Masters, an Arm genius whose technical knowledge has always inspired me
Maddie Stone, who is a brilliant security researcher and a wonderful person I look up to
Matthias Boettcher, who patiently served as supervisor for my master’s thesis at Arm and became a valuable technical reviewer for this book

Thanks to Patrick Wardle for contributing the malware analysis chapter (Chapter 12, “Reversing arm64 macOS Malware”) to this book.

Thanks to my editors, Jim Minatel and Kelly Talbot, for pushing me to complete this book during the pandemic and for being so patient with my insufferable perfectionism.

I would also like to thank Runa Sandvik for being the best friend anyone could ask for and for giving me strength and support in difficult times.

Most important, I want to thank all the readers for putting their faith in me.
Maria Markstedter

Contents at a Glance

Introduction
Part I Arm Assembly Internals
Chapter 1 Introduction to Reverse Engineering
Chapter 2 ELF File Format Internals
Chapter 3 OS Fundamentals
Chapter 4 The Arm Architecture
Chapter 5 Data Processing Instructions
Chapter 6 Memory Access Instructions
Chapter 7 Conditional Execution
Chapter 8 Control Flow
Part II Reverse Engineering
Chapter 9 Arm Environments
Chapter 10 Static Analysis
Chapter 11 Dynamic Analysis
Chapter 12 Reversing arm64 macOS Malware
Index

Contents

Introduction
Part I Arm Assembly Internals
Chapter 1 Introduction to Reverse Engineering
    Introduction to Assembly
    Bits and Bytes
    Character Encoding
    Machine Code and Assembly
    Assembling
    Cross-Assemblers
    High-Level Languages
    Disassembling
    Decompilation
Chapter 2 ELF File Format Internals
    Program Structure
    High-Level vs. Low-Level Languages
    The Compilation Process
    Cross-Compiling for Other Architectures
    Assembling and Linking
    The ELF File Overview
    The ELF File Header
    The ELF File Header Information Fields
    The Target Platform Fields
    The Entry Point Field
    The Table Location Fields
    ELF Program Headers
    The PHDR Program Header
    The INTERP Program Header
    The LOAD Program Headers
    The DYNAMIC Program Header
    The NOTE Program Header
    The TLS Program Header
    The GNU_EH_FRAME Program Header
    The GNU_STACK Program Header
    The GNU_RELRO Program Header
    ELF Section Headers
    The ELF Meta-Sections
    The String Table Section
    The Symbol Table Section
    The Main ELF Sections
    The .text Section
    The .data Section
    The .bss Section
    The .rodata Section
    The .tdata and .tbss Sections
    Symbols
    Global vs. Local Symbols
    Weak Symbols
    Symbol Versions
    Mapping Symbols
    The Dynamic Section and Dynamic Loading
    Dependency Loading (NEEDED)
    Program Relocations
    Static Relocations
    Dynamic Relocations
    The Global Offset Table (GOT)
    The Procedure Linkage Table (PLT)
    The ELF Program Initialization and Termination Sections
    Initialization and Termination Order
    Thread-Local Storage
    The Local-Exec TLS Access Model
    The Initial-Exec TLS Access Model
    The General-Dynamic TLS Access Model
    The Local-Dynamic TLS Access Model
Chapter 3 OS Fundamentals
    OS Architecture Overview
    User Mode vs. Kernel Mode
    Processes
    System Calls
    Objects and Handles
    Threads
    Process Memory Management
    Memory Pages
    Memory Protections
    Anonymous and Memory-Mapped Memory
    Memory-Mapped Files and Modules
    Address Space Layout Randomization
    Stack Implementations
    Shared Memory
Chapter 4 The Arm Architecture
    Architectures and Profiles
    The Armv8-A Architecture
    Exception Levels
    Armv8-A TrustZone Extension
    Exception Level Changes
    Armv8-A Execution States
    The AArch64 Execution State
    The A64 Instruction Set
    AArch64 Registers
    The Program Counter
    The Stack Pointer
    The Zero Register
    The Link Register
    The Frame Pointer
    The Platform Register (x18)
    The Intraprocedural Call Registers
    SIMD and Floating-Point Registers
    System Registers
    PSTATE
    The AArch32 Execution State
    A32 and T32 Instruction Sets
    The A32 Instruction Set
    The T32 Instruction Set
    Switching Between Instruction Sets
    AArch32 Registers
    The Program Counter
    The Stack Pointer
    The Frame Pointer
    The Link Register
    The Intraprocedural Call Register (IP, r12)
    The Current Program Status Register
    The Application Program Status Register
    The Execution State Registers
    The Instruction Set State Register
    The IT Block State Register (ITSTATE)
    Endianness state
    Mode and Exception Mask Bits
Chapter 5 Data Processing Instructions
    Shift and Rotate Operations
    Logical Shift Left
    Logical Shift Right
    Arithmetic Shift Right
    Rotate Right
    Rotate Right with Extend
    Instruction Forms
    Shift by a Constant Immediate Form
    Shift by Register Form
    Bitfield Manipulation Operations
    Bitfield Move
    Sign- and Zero-Extend Operations
    Bitfield Extract and Insert
    Logical Operations
    Bitwise AND
    The TST Instruction
    Bitwise Bit Clear
    Bitwise OR
    Bitwise OR NOT
    Bitwise Exclusive OR
    The TEQ instruction
    Exclusive OR NOT
    Arithmetic Operations
    Addition and Subtraction
    Reverse Subtract
    Compare
    CMP Instruction Operation Behavior
    Multiplication Operations
    Multiplications on A64
    Multiplications on A32/T32
    Least Significant Word Multiplications
    Most Significant Word Multiplications
    Halfword Multiplications
    Vector (Dual) Multiplications
    Long (64-Bit) Multiplications
    Division Operations
    Move Operations
    Move Constant Immediate
    Move Immediate and MOVT on A32/T32
    Move Immediate, MOVZ, and MOVK on A64
    Move Register
    Move with NOT
Chapter 6 Memory Access Instructions
    Instructions Overview
    Addressing Modes and Offset Forms
    Offset Addressing
    Constant Immediate Offset
    Register Offsets
    Pre-Indexed Mode
    Pre-Indexed Mode Example
    Post-Indexed Addressing
    Post-Indexed Addressing Example
    Literal (PC-Relative) Addressing
    Loading Constants
    Loading an Address into a Register
    Load and Store Instructions
    Load and Store Word or Doubleword
    Load and Store Halfword or Byte
    Example Using Load and Store
    Load and Store Multiple (A32)
    Example for STM and LDM
    A More Complicated Example Using STM and LDM
    Load and Store Pair (A64)
Chapter 7 Conditional Execution
    Conditional Execution Overview
    Conditional Codes
    The NZCV Condition Flags
    Signed vs. Unsigned Integer Overflows
    Condition Codes
    Conditional Instructions
    The If-Then (IT) Instruction in Thumb
    Flag-Setting Instructions
    The Instruction “S” Suffix
    The S Suffix on Add and Subtract Instructions
    The S Suffix on Logical Shift Instructions
    The S Suffix on Multiply Instructions
    The S Suffix on Other Instructions
    Test and Comparison Instructions
    Compare (CMP)
    Compare Negative (CMN)
    Test Bits (TST)
    Test Equality (TEQ)
    Conditional Select Instructions
    Conditional Comparison Instructions
    Boolean AND Conditionals Using CCMP
    Boolean OR Conditionals Using CCMP
Chapter 8 Control Flow
    Branch Instructions
    Conditional Branches and Loops
    Test and Compare Branches
    Table Branches (T32)
    Branch and Exchange
    Subroutine Branches
    Functions and Subroutines
    The Procedure Call Standard
    Volatile vs. Nonvolatile Registers
    Arguments and Return Values
    Passing Larger Values
    Leaf and Nonleaf Functions
    Leaf Functions
    Nonleaf Functions
    Prologue and Epilogue
Part II Reverse Engineering
Chapter 9 Arm Environments
    Arm Boards
    Emulation with QEMU
    QEMU User-Mode Emulation
    QEMU Full-System Emulation
    Firmware Emulation
Chapter 10 Static Analysis
    Static Analysis Tools
    Command-Line Tools
    Disassemblers and Decompilers
    Binary Ninja Cloud
    Call-By-Reference Example
    Control Flow Analysis
    Main Function
    Subroutine
    Converting to char
    if Statement
    Quotient Division
    for Loop
    Analyzing an Algorithm
Chapter 11 Dynamic Analysis
    Command-Line Debugging
    GDB Commands
    GDB Multiuser
    GDB Extension: GEF
    Installation
    Interface
    Useful GEF Commands
    Examine Memory
    Watch Memory Regions
    Vulnerability Analyzers
    checksec
    Radare2
    Debugging
    Remote Debugging
    Radare2
    IDA Pro
    Debugging a Memory Corruption
    Debugging a Process with GDB
Chapter 12 Reversing arm64 macOS Malware
    Background
    macOS arm64 Binaries
    macOS Hello World (arm64)
    Hunting for Malicious arm64 Binaries
    Analyzing arm64 Malware
    Anti-Analysis Techniques
    Anti-Debugging Logic (via ptrace)
    Anti-Debugging Logic (via sysctl)
    Anti-VM Logic (via SIP Status and the Detection of VM Artifacts)
    Conclusion
Index

Introduction

Let’s address the elephant in the room: why “Blue Fox”?

This book was originally supposed to contain an overview of the Arm instruction set, chapters on reverse engineering, and chapters on exploit mitigation internals and bypass techniques. The publisher and I soon realized that covering these topics to a satisfactory extent would make this book about 1,000 pages long. For this reason, we decided to split it into two books: Blue Fox and Red Fox.

The Blue Fox edition covers the analyst view, teaching you everything you need to know to get started in reverse engineering. Without a solid understanding of the fundamentals, you can’t move to more advanced topics such as vulnerability analysis and exploit development. The Red Fox edition will cover the offensive security view: understanding exploit mitigation internals, bypass techniques, and common vulnerability patterns.

As of this writing, the Arm architecture reference manual for the Armv8-A architecture (and Armv9-A extensions) contains 11,952 pages [1] and continues to expand. This reference manual was around 8,000 pages [2] long when I started writing this book two years ago.

Security researchers who are used to reverse engineering x86/64 binaries but want to adapt to the new era of Arm-powered devices are having a hard time finding digestible resources on the Arm instruction set, especially in the context of reverse engineering or binary analysis. Arm’s architecture reference manual can be both overwhelming and discouraging.
In this day and age, nobody has time to read a 12,000-page deeply technical document, let alone identify the most relevant or most commonly used instructions and memorize them.

[1] (version I.a.) https://developer.arm.com/documentation/ddi0487/latest
[2] (version F.a.) https://developer.arm.com/documentation/ddi0487/latest

The truth is that you don’t need to know every single Arm instruction to be able to reverse engineer an Arm binary. Many instructions have very specific use cases that you may or may not ever encounter during your analysis. The purpose of this book is to make it easier for people to get familiar with the Arm instruction set and gain enough knowledge to apply it in their professional lives.

I spent countless hours dissecting the Arm reference manual and categorizing the most common instruction types and their syntax patterns so you don’t have to. But this book isn’t a list of the most common Arm instructions. It contains explanations you won’t find anywhere else, not even in the Arm manual itself. The basic descriptions of a given instruction in the Arm manual are rather brief. That is fine for trivial instructions like MOV or ADD. However, many common instructions perform complex operations that are difficult to understand from their descriptions alone. For this reason, many of the instructions you will encounter in this book are accompanied by graphical illustrations explaining what is actually happening under the hood.

If you’re a beginner in reverse engineering, it is important to understand the binary’s file format, its sections, how it compiles from source code into machine code, and the environment it depends on. Because of limited space and time, this book cannot cover every file format and operating system. It instead focuses on Linux environments and the ELF file format. The good news is, regardless of platform or file format, Arm instructions are Arm instructions.
Even if you reverse engineer an Arm binary compiled for macOS or Windows, the meaning of the instructions themselves remains the same.

This book begins with an introduction explaining what instructions are and where they come from. In the second chapter, you will learn about the ELF file format and its sections, along with a basic overview of the compilation process. Since binary analysis would be incomplete without understanding the context binaries are executed in, the third chapter provides an overview of operating system fundamentals.

With this background knowledge, you are well prepared to delve into the Arm architecture in Chapter 4. You can find the most common data processing instructions in Chapter 5, followed by an overview of memory access instructions in Chapter 6. These instructions are a significant part of the Arm architecture, which is also referred to as a Load/Store architecture. Chapters 7 and 8 discuss conditional execution and control flow, which are crucial components of reverse engineering.

Chapter 9 is where it starts to get particularly interesting for reverse engineers. Knowing the different types of Arm environments is crucial, especially when you perform dynamic analysis and need to analyze binaries during execution.

With the information provided so far, you are already well equipped for your next reverse engineering adventure. To get you started, Chapter 10 includes an overview of the most common static analysis tools, followed by small practical static analysis examples you can follow step-by-step.

Reverse engineering would be boring without dynamic analysis to observe how a program behaves during execution. In Chapter 11, you will learn about the most common dynamic analysis tools as well as examples of useful commands you can use during your analysis. This chapter concludes with two practical debugging examples: debugging a memory corruption vulnerability and debugging a process in GDB.
Reverse engineering is useful for a variety of use cases. You can use your knowledge of the Arm instruction set and reverse engineering techniques to expand your skill set into different areas, such as vulnerability analysis or malware analysis.

Reverse engineering is an invaluable skill for malware analysts, but they also need to be familiar with the environment a given malware sample was compiled for. To get you started in this area, this book includes a chapter on analyzing arm64 macOS malware (Chapter 12) written by Patrick Wardle, who is also the author of The Art of Mac Malware [3]. Unlike previous chapters, this chapter does not focus on Arm assembly. Instead, it introduces you to common anti-analysis techniques that macOS malware uses to avoid being analyzed. The purpose of this chapter is to provide an introduction to macOS malware compatible with Apple Silicon (M1/M2) so that anyone interested in hunting and analyzing Arm-based macOS malware can get a head start.

This book took a little over two years to write. I began writing in March 2020, when the pandemic hit and put us all in quarantine. Two years and a lot of sweat and tears later, I’m happy to finally see it come to life. Thank you for putting your faith in me. I hope that this book will serve as a useful guide as you embark on your reverse engineering journey and that it will make the process smoother and less intimidating.

[3] https://taomm.org

Part I Arm Assembly Internals

If you’ve just picked up this book from the shelf, you’re probably interested in learning how to reverse engineer compiled Arm binaries because major tech vendors are now embracing the Arm architecture. Perhaps you’re a seasoned veteran of x86-64 reverse engineering but want to stay ahead of the curve and learn more about the architecture that is starting to take over the processor market. Perhaps you’re looking to get started on security analysis to find vulnerabilities in Arm-based software or analyze Arm-based malware.
Or perhaps you’re just getting started in reverse engineering and have hit a point where a deeper level of detail is required to achieve your goal.

Wherever you are on your journey into the Arm-based universe of reverse engineering, this book is about preparing you, the reader, to understand the language of Arm binaries, showing you how to analyze them, and, more importantly, preparing you for the future of Arm devices.

Learning assembly language and how to analyze compiled software is useful in a wide variety of applications. As with every skill, learning the syntax can seem difficult and complicated at first, but it eventually becomes easier with practice.

In the first part of this book, we’ll look at the fundamentals of Arm’s main Cortex-A architecture, specifically the Armv8-A, and the main instructions you’ll encounter when reverse engineering software compiled for this platform. In the second part of the book, we’ll look at some common tools and techniques for reverse engineering. To give you inspiration for different applications of Arm-based reverse engineering, we will look at practical examples, including how to analyze malware compiled for Apple’s M1 chip.

CHAPTER 1
Introduction to Reverse Engineering

Introduction to Assembly

If you’re reading this book, you’ve probably already heard about this thing called the Arm assembly language and know that understanding it is the key to analyzing binaries that run on Arm. But what is this language, and why does it exist? After all, programmers usually write code in high-level languages such as C/C++, and hardly anyone programs in assembly directly. High-level languages are far more convenient for programmers to work in.

Unfortunately, these high-level languages are too complex for processors to interpret directly. Instead, programmers compile these high-level programs down into the binary machine code that the processor can run.

This machine code is not quite the same as assembly language.
If you were to look at machine code directly in a text editor, it would look unintelligible. Processors also don’t run assembly language; they run only machine code. So, why is assembly so important in reverse engineering?

To understand the purpose of assembly, let’s do a quick tour of the history of computing to see how we got to where we are and how everything connects.

Bits and Bytes

Back in the mists of time when it all started, people decided to create computers and have them perform simple tasks. Computers don’t speak our human languages (they are just electronic devices, after all), and so we needed a way to communicate with them electronically. At the lowest level, computers operate on electrical signals, and these signals are formed by switching electrical voltages between one of two levels: on and off.

The first problem is that we need a way to describe these “ons” and “offs” for communication, storage, and simply describing the state of the system. Since there are two states, it was only natural to use the binary system for encoding these values. Each binary digit (or bit) could be 0 or 1. Although each bit can store only the smallest amount of information possible, stringing multiple bits together allows representation of much larger numbers. For example, the number 30,284,334,537 could be represented in just 35 bits as the following:

11100001101000101100100010111001001

Already this system allows for encoding large numbers, but now we have a new problem: where does one number in memory (or on a magnetic tape) end and the next one begin? This is perhaps a strange question to ask modern readers, but back when computers were first being designed, this was a serious problem. The simplest solution here would be to create fixed-size groupings of bits. Computer scientists, never wanting to miss out on a good naming pun, called this group of binary digits or bits a byte.

So, how many bits should be in a byte?
This might seem like a blindingly obvious question to our modern ears, since we all know that a modern byte is 8 bits. But it was not always so.

Originally, different systems made different choices for how many bits would be in their bytes. The predecessor of the 8-bit byte we know today is the 6-bit Binary Coded Decimal Interchange Code (BCDIC) format for representing alphanumeric information used in early IBM computers, such as the IBM 1620 in 1959. Before that, bytes were often 4 bits long, and before that, a byte stood for an arbitrary number of bits greater than 1. Only later did the byte start to standardize around 8 bits, with IBM’s 8-bit Extended Binary Coded Decimal Interchange Code (EBCDIC), introduced in the 1960s in its System/360 mainframe product line, which had byte-addressable memory with 8-bit bytes. This then led to the adoption of the 8-bit storage size in other widely used computer systems, including the Intel 8080 and Motorola 6800.

The following excerpt is from a book titled Planning a Computer System, published in 1962, listing three main reasons for adopting the 8-bit byte [1]:

1. Its full capacity of 256 characters was considered to be sufficient for the great majority of applications.
2. Within the limits of this capacity, a single character is represented by a single byte, so that the length of any particular record is not dependent on the coincidence of characters in that record.
3. 8-bit bytes are reasonably economical of storage space.

[1] Planning a Computer System, Project Stretch, McGraw-Hill Book Company, Inc., 1962. (http://archive.computerhistory.org/resources/text/IBM/Stretch/pdfs/Buchholz_102636426.pdf)

An 8-bit byte can hold one of 256 uniquely different values from 00000000 to 11111111. The interpretation of those values, of course, depends on the software using it.
For example, we can treat the byte as an unsigned value and store a number from 0 to 255 inclusive. We can also use the two’s complement scheme to represent signed numbers from -128 to 127 inclusive.

Character Encoding

Of course, computers didn’t just use bytes for encoding and processing integers. They would also often store and process human-readable letters and numbers, called characters.

Early character encodings, such as ASCII, had settled on using 7 bits per character, but this gave only a limited set of 128 possible characters. This allowed for encoding English-language letters and digits, as well as a few symbol characters and control characters, but could not represent many of the letters used in other languages. The EBCDIC standard, using its 8-bit bytes, chose a different character set entirely, with code pages for “swapping” to different languages. But ultimately this character set was too cumbersome and inflexible.

Over time, it became clear that we needed a truly universal character set, supporting all the world’s living languages and special symbols. This culminated in the creation of the Unicode project in 1987. A few different Unicode encodings exist, but the dominant encoding used on the Web is UTF-8. Characters within the ASCII character set are included verbatim in UTF-8, and “extended characters” can spread out over multiple consecutive bytes.

Since characters are now encoded as bytes, we can represent each of those bytes using two hexadecimal digits. For example, the characters A, R, and M are normally encoded with the octets shown in Figure 1.1.

Figure 1.1: Letters A, R, and M and their hexadecimal values

Each hexadecimal digit can be encoded with a 4-bit pattern ranging from 0000 to 1111, as shown in Figure 1.2.
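The encodings described so far can be checked with a few lines of Python. The following is only an illustrative sketch, not something taken from the book: the helper name `describe_byte` is made up, and the two's complement conversion is written out by hand to keep the arithmetic visible. It prints the hexadecimal digits and 4-bit groups for the ASCII bytes of A, R, and M, and it rechecks the 35-bit example from the "Bits and Bytes" section.

```python
def describe_byte(value: int) -> dict:
    """Return several views of one 8-bit value (hypothetical helper)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("a byte holds values from 0 to 255")
    # Two's complement: patterns with the top bit set stand for value - 256.
    signed = value - 256 if value >= 128 else value
    return {
        "hex": f"{value:02X}",                          # two hexadecimal digits
        "bits": f"{value >> 4:04b} {value & 0xF:04b}",  # one 4-bit group per hex digit
        "unsigned": value,
        "signed": signed,
    }


# ASCII characters occupy exactly one byte each (and are identical in UTF-8).
text_bytes = "ARM".encode("ascii")
rows = [describe_byte(b) for b in text_bytes]
for ch, row in zip("ARM", rows):
    print(ch, row["hex"], row["bits"])
# A 41 0100 0001
# R 52 0101 0010
# M 4D 0100 1101

# The same bit pattern read as unsigned vs. two's complement.
print(describe_byte(0xFF)["unsigned"], describe_byte(0xFF)["signed"])  # 255 -1

# The 35-bit number from the "Bits and Bytes" section.
number = 30_284_334_537
number_bits = format(number, "b")
print(number.bit_length(), number_bits)
```

The byte value itself never changes in this sketch; only the interpretation applied to it does, which is the point the text makes about software deciding what a byte means.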
Figure 1.2: Hexadecimal ASCII values and their 8-bit binary equivalents

Since two hexadecimal digits are required to encode an ASCII character, 8 bits seemed like the ideal size for storing text in most written languages around the world, with a multiple of 8 bits for characters that cannot be represented in 8 bits alone.

Using this pattern, we can more easily interpret the meaning of a long string of bits. The following bit pattern encodes the word ARM in uppercase ASCII (hexadecimal 41 52 4D):

0100 0001 0101 0010 0100 1101

Machine Code and Assembly

One uniquely powerful aspect of computers, as opposed to the mechanical calculators that predated them, is that they can also encode their logic as data. This code can also be stored in memory or on disk and be processed or changed on demand. For example, a software update can completely change the operating system of a computer without the need to purchase a new machine.

We’ve already seen how numbers and characters are encoded, but how is this logic encoded? This is where the processor architecture and its instruction set come into play.

If we were to create our own computer processor from scratch, we could design our own instruction encoding, mapping binary patterns to machine codes that our processor can interpret and respond to, in effect, creating our own “machine language.” Since machine codes are meant to “instruct” the circuitry to perform an “operation,” these machine codes are also referred to as instruction codes, or, more commonly, operation codes (opcodes).

In practice, most people use existing computer processors and therefore use the instruction encodings defined by the processor manufacturer. On Arm, instruction encodings have a fixed size and can be either 32-bit or 16-bit, depending on the instruction set in use by the program. The processor fetches and interprets each instruction and runs each in turn to perform the logic of the program.
Each instruction is a binary pattern, or instruction encoding, which follows specific rules defined by the Arm architecture.

By way of example, let’s assume we’re building a tiny 16-bit instruction set and are defining how each instruction will look. Our first task is to designate part of the encoding as specifying exactly what type of instruction is to be run, called the opcode. For example, we might set the first 7 bits of the instruction to be an opcode and specify the opcodes for addition and subtraction, as shown in Table 1.1.

Table 1.1: Addition and Subtraction Opcodes

OPERATION      OPCODE
Addition       0001110
Subtraction    0001111

Writing machine code by hand is possible but unnecessarily cumbersome. In practice, we’ll want to write assembly in some human-readable “assembly language” that will be converted into its machine code equivalent. To do this, we should also define the shorthand for the instruction, called the instruction mnemonic, as shown in Table 1.2.

Table 1.2: Mnemonics

OPERATION      OPCODE     MNEMONIC
Addition       0001110    ADD
Subtraction    0001111    SUB

Of course, it’s not sufficient to tell a processor to just do an “addition.” We also need to tell it what two things to add and what to do with the result. For example, if we write a program that performs “a = b + c,” the values of b and c need to be stored somewhere before the instruction begins, and the instruction needs to know where to write the result a to.

In most processors, and Arm processors in particular, these temporary values are usually stored in registers, which store a small number of “working” values. Programs can pull data in from memory (or disk) into registers ready to be processed and can spill result data back to longer-term storage after processing.

The number and naming conventions of registers are architecture-dependent. As software has become more and more complex, programs must often juggle larger numbers of values at the same time.
Storing and operating on these values in registers is faster than doing so in memory directly, which means that registers reduce the number of times a program needs to access memory and result in faster execution.

Going back to our earlier example, we were designing a 16-bit instruction to perform an operation that adds a value to a register and writes the result into another register. Since we use 7 bits for the operation (ADD/SUB) itself, the remaining 9 bits can be used for encoding the source and the destination registers and a constant value we want to add or subtract. In this example, we split the remaining bits evenly into three 3-bit fields and assign the shortcuts and respective machine codes shown in Table 1.3.

Table 1.3: Manually Assigning the Machine Codes

    OPERATION               MNEMONIC    MACHINE CODE
    Addition                ADD         0001110
    Subtraction             SUB         0001111
    Integer value 2         #2          010
    Operand register        R0          000
    Destination register    R1          001

Instead of generating these machine codes by hand, we could instead write a little program that converts the syntax ADD R1, R0, #2 (R1 = R0 + 2) into the corresponding machine-code pattern and hand that machine-code pattern to our example processor. See Table 1.4.

Table 1.4: Programming the Machine Codes

    INSTRUCTION       BINARY MACHINE CODE    HEXADECIMAL ENCODING
    ADD R1, R0, #2    0001110 010 000 001    0x1C81
    SUB R1, R0, #2    0001111 010 000 001    0x1E81

The bit pattern we constructed represents one of the instruction encodings for 16-bit ADD and SUB instructions that are part of the T32 instruction set. In Figure 1.3 you can see its components and how they are ordered in the instruction encoding.

Figure 1.3: 16-bit Thumb encoding of ADD and SUB immediate instruction

Of course, this is just a simplified example. Modern processors provide hundreds of possible instructions, often with more complex subencodings.
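The packing shown in Table 1.4 can be reproduced in a few lines of C. The following is a minimal sketch rather than anything from the original text: the function name encode_imm3 and the enum constants are hypothetical, and the field order (opcode, constant, source register, destination register) is taken from the table above.

```c
#include <stdint.h>

/* 7-bit opcodes from Table 1.1 (names are illustrative only). */
enum { OPCODE_ADD = 0x0E, OPCODE_SUB = 0x0F };

/* Pack the four fields of the example 16-bit instruction:
 *   bits 15..9  opcode (7 bits)
 *   bits  8..6  constant (3 bits)
 *   bits  5..3  source register (3 bits)
 *   bits  2..0  destination register (3 bits)
 * Each argument is masked so an oversized value cannot spill
 * into a neighbouring field. */
uint16_t encode_imm3(unsigned opcode, unsigned rd, unsigned rn, unsigned imm3)
{
    return (uint16_t)(((opcode & 0x7Fu) << 9) |
                      ((imm3   & 0x07u) << 6) |
                      ((rn     & 0x07u) << 3) |
                       (rd     & 0x07u));
}
```

Calling encode_imm3(OPCODE_ADD, 1, 0, 2) yields 0x1C81, and the same call with OPCODE_SUB yields 0x1E81, matching the two rows of Table 1.4.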
For example, Arm defines the load register instruction (with the LDR mnemonic) that loads a 32-bit value from memory into a register, as illustrated in Figure 1.4. In this instruction, the “address” to load is specified in register 2 (called R2), and the read value is written to register 3 (called R3).

Figure 1.4: LDR instruction loading a value from the address in R2 to register R3

The syntax of writing brackets around R2 indicates that the value in R2 is to be interpreted as an address in memory, rather than an ordinary value. In other words, we do not want to copy the value in R2 into R3, but rather fetch the contents of memory at the address given by R2 and load that value into R3. There are many reasons for a program to reference a memory location, including calling a function or loading a value from memory into a register.

This is, in essence, the difference between machine code and assembly code. Assembly language is the human-readable syntax that shows how each encoded instruction should be interpreted. Machine code, by contrast, is the actual binary data ingested and processed by the actual processor, with its encoding specified precisely by the processor designer.

Assembling

Since processors understand only machine code, and not assembly language, how do we convert between them? To do this we need a program to convert our handwritten assembly instructions into their machine-code equivalents. The programs that perform this task are called assemblers.

In practice, assemblers are capable not only of understanding and translating individual instructions into machine code but also of interpreting assembler directives [2] that direct the assembler to do other things, such as switch between data and code or assemble different instruction sets. Therefore, the terms assembly language and assembler language are just two ways of looking at the same thing.
The syntax and meaning of individual assembler directives and expressions depend on the specific assembler. These directives and expressions are useful shortcuts that can be used in an assembly program; however, they are not strictly part of the assembly language itself, but rather are directions for how the assembler itself should operate.

[2] https://ftp.gnu.org/old-gnu/Manuals/gas-2.9.1/html_chapter/as_7.html

There are different assemblers available on different platforms, such as the GNU assembler as, which is also used to assemble the Linux kernel, the ARM Toolchain assembler armasm, or the Microsoft assembler with the same name (armasm) included in Visual Studio. Suppose, by way of example, we want to assemble the following two 16-bit instructions written in a file named myasm.s:

    .section .text
    .global _start
    _start:
    .thumb
        movs r1, #5
        ldr r3, [r2]

In this program, the .section, .global, and .thumb lines are assembler directives. These tell the assembler where the output should be placed (in this case, in the .text section), declare the label of the entry point of our code (in this case, called _start) as a global symbol, and finally specify that the instruction encoding to use is Thumb. The Thumb instruction set (T32) is part of the Arm architecture and allows instructions to be 16 bits wide.

We can use the GNU assembler, as, to assemble this program on a Linux machine running on an Arm processor.

    $ as myasm.s -o myasm.o

The assembler reads the assembly language program myasm.s and creates an object file called myasm.o. This file contains 4 bytes of machine code corresponding to our two 2-byte instructions, shown here in hexadecimal in little-endian byte order (the halfwords 0x2105 and 0x6813):

    05 21 13 68

Another particularly useful feature of assemblers is the concept of a label, which references a specific address in memory, such as the address of a branch target, function, or global variable.
Let’s take the following assembly program as an example.

    .section .text
    .global _start
    _start:
        mov r1, #5
        mov r2, #6
        b mylabel
    result:
        mov r0, r4
        b _exit
    mylabel:
        add r4, r1, r2
        b result
    _exit:
        mov r7, #1        // syscall number for exit() on 32-bit Arm Linux
        svc #0

This program starts by filling two registers with values and branches, or jumps, to the label mylabel to execute the ADD instruction. After the ADD instruction is executed, the program branches to the result label, executes the move instruction, and ends with a branch to the _exit label. The assembler will use these labels to provide hints to the linker that assigns relative memory locations to them. Figure 1.5 illustrates the program flow.

Figure 1.5: Program flow of an example assembly program

Labels are not only useful for referencing instructions to jump to but can also be used to fetch the contents of a memory location. For instance, the following assembly code snippet uses labels to fetch the contents from a memory location or jump to different instructions in the code:

    .section .text
    .global _start
    _start:
        mov r1, #5          // 1. fill r1 with value 5
        adr r2, myvalue     // 2. fill r2 with address of myvalue
        ldr r3, [r2]        // 3. fill r3 with value at address in r2
        b mylabel           // 4. jump to address of mylabel
    result:
        mov r0, r4          // 7. fill r0 with value in r4
        b _exit             // 8. branch to address of _exit
    mylabel:
        add r4, r1, r3      // 5. fill r4 with result of r1 + r3
        b result            // 6. jump to result
    myvalue:
        .word 2             // word-sized value containing value 2

The ADR instruction loads the address of the variable myvalue into register R2, and the LDR instruction then loads the contents of that address into register R3. The program then branches to the instruction referenced by the label mylabel, executes an ADD instruction, and branches to the instruction referenced by the label result, as illustrated in Figure 1.6.
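For readers more familiar with C, the ADR and LDR pair behaves much like taking the address of a variable and then dereferencing the resulting pointer. The sketch below mirrors the register-filling steps of the snippet above; the function name run_example and the C variable names are illustrative only, the branches are flattened into straight-line code, and the final branch to _exit is left out.

```c
/* Stand-in for the ".word 2" stored at the label myvalue. */
static const int myvalue = 2;

/* Walk through the same steps as the assembly snippet and
 * return the value that ends up in r0. */
int run_example(void)
{
    int r1 = 5;                /* mov r1, #5       */
    const int *r2 = &myvalue;  /* adr r2, myvalue  (address, not value) */
    int r3 = *r2;              /* ldr r3, [r2]     (value at that address) */
    int r4 = r1 + r3;          /* add r4, r1, r3   */
    int r0 = r4;               /* mov r0, r4       */
    return r0;
}
```

Here run_example() returns 7, the sum of the immediate 5 and the word stored at myvalue.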
Figure 1.6: Illustration of ADR and LDR instruction logic

As a slightly more interesting example, the following assembly code prints Hello to the console and then exits. It references the string "Hello\n" by putting the relative address of its label mystring into register R1 with an ADR instruction.

    .section .text
    .global _start
    _start:
        mov r0, #1          // STDOUT
        adr r1, mystring    // R1 = address of string
        mov r2, #6          // R2 = size of string
        mov r7, #4          // R7 = syscall number for 'write()'
        svc #0              // invoke syscall
    _exit:
        mov r7, #1          // R7 = syscall number for 'exit()'
        svc #0
    mystring:
        .string "Hello\n"

After assembling and linking this program on a processor that supports the Arm architecture and the instruction set we use, it prints out Hello when executed.

    $ as myasm2.s -o myasm2.o
    $ ld myasm2.o -o myasm2
    $ ./myasm2
    Hello

Modern assemblers are often incorporated into compiler toolchains and are designed to output files that can be combined into larger executable programs. For this reason, assemblers usually don’t just convert assembly instructions directly into machine code, but rather create an object file containing the encoded instructions, symbol information, and hints for the toolchain’s linker program, which is ultimately responsible for creating full executable files to be run on modern operating systems.

Cross-Assemblers

What happens if we run our Arm program on a different processor architecture? Executing our myasm2 program on an Intel x86-64 processor will result in an error telling us that the binary file cannot be executed due to an error in the executable format.

    user@ubuntu:~$ ./myasm2
    bash: ./myasm2: cannot execute binary file: Exec format error

We can’t run our Arm binary on an x64 machine because instructions are encoded differently on the two platforms.
Even if we want to perform the same operation on different architectures, the assembly language and assigned machine codes can differ significantly. Let’s say you want to execute an instruction to move the decimal number 1 into the first register on three different processor architectures. Even though the operation itself is the same, the instruction encoding and assembly language depend on the architecture. Take the following three general architecture types as an example:

Armv8-A: 64-Bit Instruction Set (AArch64)

    d2 80 00 20       mov x0, #1     // move value 1 into register x0

Armv8-A: 32-Bit Instruction Set (AArch32)

    e3 a0 00 01       mov r0, #1     // move value 1 into register r0

Intel x86-64 Instruction Set

    b8 01 00 00 00    mov rax, 1     // move value 1 into register rax

Not only is the syntax different, but also the corresponding machine code bytes differ significantly between different instruction sets. This means that machine code bytes assembled for the Arm 32-bit instruction set have an entirely different meaning on an architecture with a different instruction set (such as x64 or A64).

The same is true in reverse. The same sequence of bytes can have significantly different interpretations on different processors, for example:

Armv8-A: 64-Bit Instruction Set (AArch64)

    d2 80 00 20       mov x0, #1          // move value 1 into register x0

Armv8-A: 32-Bit Instruction Set (AArch32)

    d2 80 00 20       addle r0, r0, #32   // add value 32 to r0 if LE = true

In other words, our assembly program needs to be written in the assembly language of the architecture we want it to run on and must be assembled with an assembler that supports this instruction set.

Perhaps counterintuitively, however, it is possible to create Arm binaries without using an Arm machine. The assembler itself will need to know about the Arm syntax, of course, but if that assembler is itself compiled for x64, then running it on an x64 machine will let you create Arm binaries.
This is called a cross-assembler and allows you to assemble your code for a different target architecture than the one you are currently working on.

For example, you can download an assembler for AArch32 on an x86-64 Ubuntu machine and assemble your code from there.

    user@ubuntu:~$ arm-linux-gnueabihf-as myasm.s -o myasm.o
    user@ubuntu:~$ arm-linux-gnueabihf-ld myasm.o -o myasm

Using the Linux command “file,” we can see that we created a 32-bit ARM executable file.

    user@ubuntu:~$ file myasm
    myasm: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, not stripped

High-Level Languages

So, why has assembly language not become the dominant programming language for writing software? One major reason is that assembly language is not portable. Imagine having to rewrite your entire application codebase for each processor architecture you want to support! That’s a lot of work. Instead, newer languages have evolved that abstract such processor-specific details away, allowing the same program to be easily compiled for multiple different architectures. These languages are often called higher-level languages, in contrast to the low-level language of assembly that is closer to the hardware and architecture of a specific computer.

The term high-level here is inherently relative. Originally, C and C++ were considered high-level languages, and assembly was considered the low-level language. Since newer, more abstract languages have emerged, such as Visual Basic or Python, C/C++ is often referred to as low-level. Ultimately, it depends on the perspective and who you ask.

As with assembly language, processors do not understand high-level source code directly. Programmers need to convert their high-level programs into machine code using a compiler.
As before, we still need to specify which architecture the binary will run on, and as before we can create Arm binaries from non-Arm systems by making use of a cross-compiler.

The output of a compiler is typically an executable file that can be run on a given operating system, and it is these binary executable files, rather than the source code of the program, that are typically distributed to customers. For this reason, often when we want to analyze a program, all we have is the compiled executable file itself.

Unfortunately for reverse engineers, it is usually not possible to reverse the compilation process back to the original source code. Not only are compilers hideously complex programs with many layers of iteration and abstraction between the original source code and the resulting binary, but also many of these steps drop the human-readable information that makes the program easy for programmers to reason about.

Without the source code of the software we want to analyze, we have broadly two options depending on the level of detail our analysis requires: decompiling or disassembling the executable file.

Disassembling

The process of disassembling a binary includes reconstructing the assembly instructions that the binary would run from their machine-code format into a human-readable assembly language. The most common use cases for disassembly include malware analysis, validation of compiler performance and output accuracy, and vulnerability analysis and exploit or proof-of-concept development against defects in closed-source software.

Of these, exploit development is perhaps the most sensitive to needing analysis of the actual assembly code. Where vulnerability discovery can often be done with techniques such as fuzzing, building exploits from detected crashes or discovering why certain areas of code are not being reached by fuzzers often requires significant assembly knowledge.
Here, intimate knowledge of the exact conditions of the vulnerability, gained by reading the assembly code, is critical. The exact choices of how compilers allocate variables and data structures are often critical to developing exploits, and it is here that in-depth assembly knowledge truly is required. Often a seemingly “unexploitable” vulnerability might, in fact, be exploitable with a bit of creativity and hard work invested in truly understanding the inner mechanics of how a vulnerable function works.

Disassembling an executable file can be done in multiple ways, and we will look at this in more detail in the second part of this book. But, for now, one of the simplest tools to quickly look at the disassembly output of an executable file is the Linux tool objdump. [3]

[3] https://web.mit.edu/gnu/doc/html/binutils_5.html

Let’s compile and disassemble the following write() program:

    #include <unistd.h>

    int main(void) {
        write(1, "Hello!\n", 7);
    }

We can compile this code with GCC and specify the -c option. This option tells GCC to create the object file without invoking the linking process, so we can then run objdump on just our compiled code without seeing the disassembly of all the surrounding object files such as a C runtime. The disassembly output of the main function is as follows:

    user@arm32:~$ gcc -c hello.c
    user@arm32:~$ objdump -d hello.o
    Disassembly of section .text:
    00000000 <main>:
       0:  b580         push   {r7, lr}
       2:  af00         add    r7, sp, #0
       4:  2207         movs   r2, #7
       6:  4b04         ldr    r3, [pc, #16]   ; (18 <main+0x18>)
       8:  447b         add    r3, pc
       a:  4619         mov    r1, r3
       c:  2001         movs   r0, #1
       e:  f7ff fffe    bl     0 <write>
      12:  2300         movs   r3, #0
      14:  4618         mov    r0, r3
      16:  bd80         pop    {r7, pc}
      18:  0000000c     .word  0x0000000c

While Linux utilities like objdump are useful for quickly disassembling small programs, larger programs require a more convenient solution.
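To make concrete what a disassembler does with a listing like the one above, here is a minimal sketch of a decoder for just two 16-bit Thumb encodings: MOVS with an 8-bit immediate, and the ADD/SUB immediate format from Figure 1.3. The function name decode_thumb16 is hypothetical, and real disassemblers handle far more encodings than this. The 16-bit forms set the condition flags when used outside an IT block, so the sketch prints adds and subs.

```c
#include <stdint.h>
#include <stdio.h>

/* Decode one 16-bit Thumb halfword into assembly text.
 * Recognises only:
 *   00100 Rd imm8            -> movs Rd, #imm8
 *   000111 op imm3 Rn Rd     -> adds/subs Rd, Rn, #imm3
 * Returns 1 if the encoding was recognised, 0 otherwise
 * (in which case the raw halfword is emitted as data). */
int decode_thumb16(uint16_t insn, char *out, size_t outlen)
{
    if ((insn >> 11) == 0x04u) {
        unsigned rd   = (insn >> 8) & 0x7u;
        unsigned imm8 = insn & 0xFFu;
        snprintf(out, outlen, "movs r%u, #%u", rd, imm8);
        return 1;
    }
    if ((insn >> 10) == 0x07u) {
        const char *mnemonic = ((insn >> 9) & 1u) ? "subs" : "adds";
        unsigned imm3 = (insn >> 6) & 0x7u;
        unsigned rn   = (insn >> 3) & 0x7u;
        unsigned rd   = insn & 0x7u;
        snprintf(out, outlen, "%s r%u, r%u, #%u", mnemonic, rd, rn, imm3);
        return 1;
    }
    snprintf(out, outlen, ".hword 0x%04x", (unsigned)insn);
    return 0;
}
```

Passing the halfword 0x2207 from the listing produces the text movs r2, #7, and 0x1c81 from Table 1.4 produces adds r1, r0, #2. Any other halfword, such as the push encoding 0xb580, falls through to the .hword fallback.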
Various disassemblers exist to make reverse engineering more efficient, ranging from free open source tools, such as Ghidra, [4] to expensive solutions like IDA Pro. [5] These will be discussed in the second part of this book in more detail.

[4] https://ghidra-sre.org
[5] https://hex-rays.com/ida-pro

Decompilation

A more recent innovation in reverse engineering is the use of decompilers. Decompilers go a step further than disassemblers. Where disassemblers simply show the human-readable assembly code of the program, decompilers try to regenerate equivalent C/C++ code from a compiled binary.

One value of decompilers is that they significantly reduce and simplify the disassembled output by generating pseudocode. This can make it easier to read when skimming over a function to see at a broad-strokes level what the program is up to.

The flipside to this, of course, is that important details can also get lost in the process. Additionally, since compilers are inherently lossy in their conversion from source code to executable file, decompilers cannot fully reconstruct the original source code. Symbol names, local variables, comments, and much of the program structure are inherently destroyed by the compilation process. Similarly, attempts to automatically name or relabel local variables and parameters can be misleading if storage locations are reused by an aggressively optimizing compiler.

Let’s look at an example C function, compile it with GCC, and then decompile it with both IDA Pro’s and Ghidra’s decompilers to show what this looks like in practice. Figure 1.7 shows a function called file_record in the ihex2fw.c [6] file from the Linux source code repository.
Figure 1.7: Source code of file_record function in the ihex2fw.c source file

After compiling the C file on an Armv8-A architecture (without any specific compiler options) and loading the executable file into IDA Pro 7.6, Figure 1.8 shows the pseudocode for the previous function generated by the decompiler.

Figure 1.8: IDA 7.6 decompilation output of the compiled file_record function

[6] https://gitlab.arm.com/linux-arm/linux-dm/-/blob/56299378726d5f2ba8d3c8cbbd13cb280ba45e4f/firmware/ihex2fw.c

In Figure 1.9 you can see the same function decompiled by Ghidra 10.0.4. In both cases we can sort of see the ghost of the original code if we squint hard enough at it, but the code is vastly less readable and far less intuitive than the original. In other words, while there are certainly many cases when decompilers can give us a quick high-level overview of a program, they are no panacea and no substitute for being able to dive into the assembly code of a given program.

Figure 1.9: Ghidra 10.0.4 decompilation output of the compiled file_record function

That said, decompilers are constantly evolving and are becoming better at reconstructing source code, especially for simple functions. Using decompiler output of functions you want to reverse engineer at a higher level is a useful aid, but don’t forget to peek into the disassembly output when you are trying to get a more in-depth view of what’s going on.

CHAPTER 2
ELF File Format Internals

This chapter serves as a reference for understanding the basic compilation process and ELF file format internals. If you are already familiar with its concepts, you can skip this chapter and use it as a reference for details you might need during your analysis.

Program Structure

Before diving into assembly instructions and how to reverse engineer program binaries, it’s worth looking at where those program binaries come from in the first place.
Programs start out as source code written by software developers. The source code describes to a computer how the program should behave and what computations the program should perform under various input conditions.

The programming language used is, to a large extent, a matter of the programmer’s preference. Some languages are well suited to mathematical and machine learning problems. Some are optimized for website development or building smartphone applications. Others, like C and C++, are flexible enough to be used for a wide range of possible application types, from low-level systems software such as device drivers and firmware, through system services, right up to large-scale applications like video games, web browsers, and operating systems. For this reason, many of the programs we encounter in binary analysis start life as C/C++ code.

Computers do not execute source code files directly. Before the program can be run, it must first be translated into the machine instructions that the processor knows how to execute. The programs that perform this translation are called compilers. On Linux, GCC is a commonly used collection of compilers, including a C compiler for converting C code into ELF binaries that Linux can load and run directly. G++ is its counterpart for compiling C++ code. Figure 2.1 shows a compilation overview.

Figure 2.1: Overview of compilation

Reverse engineering is, in a sense, performing the inverse task of the compiler. In reverse engineering, we start with a program binary and work backwards, trying to reverse engineer what the programmer intended the program to do at a higher level. For this reason, it is useful to understand the components of the ELF file format and their purpose.

High-Level vs. Low-Level Languages

C and C++ are often described as high-level languages because they allow a programmer to define the program’s structure and behavior without direct reference to the machine architecture itself. A programmer can write their C/C++ code in terms of abstract programming concepts like if-else blocks, while loops, and programmer-named local variables, without thinking about how those variables will eventually be mapped to machine registers, memory locations, or specific machine instructions in the resulting code.

This abstraction is usually very beneficial to programmers. These programmer abstractions and high-level program flow concepts often make programming in C/C++ far faster and less error-prone than writing equivalent programs directly in assembly code. Additionally, because C and C++ are not strongly coupled to a specific machine architecture, it is possible to compile the same C/C++ code to run on multiple different target processors.

The C++ programming language differs from C through the addition of large amounts of new syntax, programming features, and high-level abstractions that make writing large-scale programs easier and faster. For example, C++ adds direct language support for object-oriented programming and makes constructors, destructors, and object creation a direct part of the language itself. C++ also introduces programming abstractions such as interfaces, C++ exceptions, and operator overloading, as well as additional compile-time checking of program correctness, with stronger type checking and template support than is possible in the original C programming language.

By convention, C and C++ programs begin their core program logic at the main function. This function normally processes the command-line arguments of the program, prepares the program for execution, and then sets about the core program logic itself.
For command-line programs, this may involve processing files and input/output streams. Graphical programs can also process files and input streams but will often also create windows, draw graphics to the screen for the user to interact with, and set up event handlers to respond to user input.

In contrast to high-level languages like C and C++, programmers can also opt to use a low-level “assembly language” for writing their code. These assembly languages are strongly coupled to the target processor they are written for but give programmers much more flexibility to specify exactly which machine instructions should be run by the processor and in which order.

There are a wide variety of reasons why a programmer might choose to write all or parts of their program in a low-level language beyond just personal preference. Table 2.1 gives a few use cases for low-level languages.

Table 2.1: Programming in Assembly Use Cases

    USE CASE                                          EXAMPLES

    Hardware-specific code that operates outside      OS and hypervisor exception handlers
    of the standard C/C++ programmer’s model          Firmware code

    Code with strict restrictions on binary size,     Firmware boot-sequences and self-test routines
    with limited instruction availability, or that    OS and hypervisor bootloaders and
    needs to run before critical parts of the           initialization sequences
    hardware are initialized                          Shellcode for use in exploit development

    Accessing special-purpose instructions that       Access to hardware cryptographic instructions
    C/C++ compilers will not normally generate

    Performance-critical low-level library            memcpy
    functions where hand-written assembly will be     memset
    more efficient than compiler-generated assembly

    Library functions that do not use the             setjmp
    standard C/C++ ABI, or violate C/C++ ABI          longjmp
    semantics                                         C++ exception handling internals

    Compiler and C-runtime internal routines          PLT stubs (for lazy-symbol loading)
    that do not use the standard C/C++ ABI            C runtime initialization sequence
                                                      System call invocation stubs
                                                      Built-in compiler intrinsics

    Debugging and hooking programs                    Detouring functions for analysis or to
                                                        change program behavior
                                                      Breakpoint injection routines used by debuggers
                                                      Thread injection routines

Before looking at how low-level languages are assembled, let’s first look at how compilers convert programs written in high-level languages like C/C++ into low-level assembly.

The Compilation Process

The core job of the compiler is to translate a program written in a high-level language like C/C++ into an equivalent program in a low-level language like the A64 instruction set as part of the Armv8-A architecture. [1] Let’s start off with a simple example program written in C.

[1] https://developer.arm.com/documentation/ddi0487/latest

    #include <stdio.h>

    #define GREETING "Hello"

    int main(int argc, char** argv) {
        printf("%s ", GREETING);
        for(int i = 1; i < argc; i++) {
            printf("%s", argv[i]);
            if(i != argc - 1)
                printf(" and ");
        }
        printf("!\n");
        return 0;
    }

On Linux, a common C compiler is the GNU Compiler Collection, GCC. By default, GCC does not merely compile a C program to assembly code; it also manages the whole compilation process, assembling and linking the resulting output and producing a final ELF program binary that can be directly executed by the operating system.
We can invoke GCC to create a program binary from our source code via the following command line:

    user@arm64:~$ gcc main.c -o example.so

We can also direct the GCC compiler driver to give us details about what is happening behind the scenes by using the -v option, as follows:

    user@arm64:~$ gcc main.c -o example.so -v

The output from this command is large, but if we scroll near the end of the output, we can see that, toward the end of the process, GCC invokes the assembler on an assembly file emitted to a temporary location, such as the following:

    user@arm64:~$ as -v -EL -mabi=lp64 -o /tmp/e.o /tmp/.s

This is because GCC is a collection of compilers. The C compiler itself turns C code into an assembly code listing, and this is sent to the assembler to be converted into an object file and later linked into a target binary.

We can intercept this assembly listing to view what the compiler itself is generating using the command-line option -S, e.g., invoking gcc main.c -S. GCC will then compile our program in main.c into an assembly listing and write it to the file main.s.

Since C++ is, for the most part, a superset of the C language, we can also compile this same example as if it were C++. Here, we use the C++ compiler g++ to compile our code to a target binary via the command line:

    user@arm64:~$ g++ main.cpp -o example.so

We can also direct g++ to output its assembly listing via the -S command-line option, i.e., via g++ main.cpp -S.

If we allow GCC to run to completion, it will eventually output an executable ELF file that can be directly executed from the command line. For example, we can run the program with the two command-line arguments Arm-devs and reverse-engineers, and the program will print its output back to the console as follows:

    user@arm64:~$ ./example.so Arm-devs reverse-engineers
    Hello Arm-devs and reverse-engineers!
Cross-Compiling for Other Architectures

One of the main benefits of writing a program in a high-level language like C/C++ is that the source code is not, by default, strongly coupled to a specific processor architecture. This allows the same program source code to be compiled to run on different target platforms.

In their default configuration, GCC and G++ will create target binaries designed to run on the same machine architecture that we are compiling from. For example, if we run gcc main.c -o example.so on a 64-bit Arm Linux machine, the resulting example.so binary will be an ELF binary designed to run on 64-bit Arm machines. If we were to run the same command on a Linux machine running x86_64, the resulting binary will be designed to run on x86_64 machines.

One way to view the architecture that an ELF binary is compiled for is via the file command, as follows:

    user@arm64:~$ file example.so
    example.so: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV)...

    user@x64:~$ file example.so
    example.so: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV)...

Normally, generating a program binary that matches the system we are running on is a helpful feature: we usually want the compiler to produce binaries that we can immediately run on our development machine. But what if our development machine isn’t the same architecture as our target machine? For example, what if our development machine is x86_64-based, but we want to create a target binary designed to run on a 64-bit Arm processor? For these scenarios we need to use a cross-compiler.

The packages listed in Table 2.2 are the most commonly used Arm cross-compilers for GCC and G++ for creating binaries that can run on 32-bit and 64-bit Arm-based Linux machines.
Table 2.2: GCC Cross-Compilers

    PACKAGE NAME               PURPOSE
    gcc-aarch64-linux-gnu      AArch64 C compiler
    g++-aarch64-linux-gnu      AArch64 C++ compiler
    gcc-arm-linux-gnueabihf    AArch32 C compiler
    g++-arm-linux-gnueabihf    AArch32 C++ compiler

On systems that use apt-get as the main package manager we can install these cross-compilers for Arm via the following command:

    user@x64:~$ sudo apt-get install gcc-aarch64-linux-gnu g++-aarch64-linux-gnu gcc-arm-linux-gnueabihf g++-arm-linux-gnueabihf

Having installed these cross-compilers, we can now generate 32-bit and 64-bit Arm binaries directly from a development machine running a different architecture. We do so by replacing gcc with its target-specific alternative. For example, an x86_64 machine can create a 64-bit Arm binary from C or C++ code as follows:

    user@x64:~$ aarch64-linux-gnu-gcc main.c -o a64.so
    user@x64:~$ aarch64-linux-gnu-g++ main.cpp -o a64.so

We can create target binaries for 32-bit Arm systems in much the same way, just using the 32-bit Arm cross-compilers as follows:

    user@x64:~$ arm-linux-gnueabihf-gcc main.c -o a32.so
    user@x64:~$ arm-linux-gnueabihf-g++ main.cpp -o a32.so

If we check these output binaries with file, we can see that these program binaries are compiled for 64-bit and 32-bit Arm, respectively.

    user@x64:~$ file a64.so
    a64.so: ELF 64-bit LSB pie executable, ARM aarch64, version 1 (SYSV),...
    user@x64:~$ file a32.so
    a32.so: ELF 32-bit LSB pie executable, ARM, EABI5 version 1 (SYSV),...

Assembling and Linking

Compilers and programmers writing assembly by hand create assembly listings that are the input to an assembler. The job of the assembler is to convert human-readable descriptions of machine instructions into their equivalent binary-encoded instructions and to output data and metadata for the program into other sections of the program binary as manually directed by the programmer or compiler. The output of the assembler is an object file.
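The architecture that file reported for a64.so and a32.so above comes from the ELF header, which stores a 16-bit machine identifier (e_machine) at byte offset 18, after the 16-byte identification block and the 2-byte file type. As a rough sketch, assuming the first 20 bytes of a file have already been read into memory, the check could look like the following. The function name elf_describe is hypothetical; the numeric values 40, 62, and 183 are the identifiers the ELF specification assigns to Arm, x86-64, and AArch64.

```c
#include <stddef.h>

/* Return a short architecture name for an ELF header held in memory.
 * hdr must point to at least the first 20 bytes of the file. */
const char *elf_describe(const unsigned char *hdr, size_t len)
{
    /* Every ELF file starts with the magic bytes 0x7F 'E' 'L' 'F'. */
    if (len < 20 || hdr[0] != 0x7F || hdr[1] != 'E' ||
        hdr[2] != 'L' || hdr[3] != 'F')
        return "not an ELF file";

    /* Byte 5 (EI_DATA) gives the byte order of multi-byte fields:
     * 1 = little-endian (LSB), 2 = big-endian (MSB). */
    unsigned machine;
    if (hdr[5] == 2)
        machine = ((unsigned)hdr[18] << 8) | hdr[19];
    else
        machine = ((unsigned)hdr[19] << 8) | hdr[18];

    switch (machine) {
    case 40:  return "ARM";          /* EM_ARM     */
    case 62:  return "x86-64";       /* EM_X86_64  */
    case 183: return "ARM aarch64";  /* EM_AARCH64 */
    default:  return "other";
    }
}
```

A little-endian header whose bytes 18 and 19 are 0xB7 0x00 (decimal 183) is reported as ARM aarch64, which is the same distinction file draws between the a64.so and a32.so binaries.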
Object files are encoded as ELF files, although it is perhaps better to think of these object files as partial ELF files that must be combined into a whole via a final linking process to create the final executable target binary.

By convention, assembly code is written in .s files, and we can assemble these files into an object file using an assembler, such as the GNU Assembler (GAS), which is part of the GCC/G++ suite of tools.

In later chapters in this book, we will see what instructions are available on the Armv8-A architecture and how they work. For now, however, it is useful to define a couple of template assembly programs that you can use to create basic assembly programs yourself.

The following program is a simple assembly program that uses the write() system call to print a string and then exits. The first three lines define the architecture, the section, and the global entry point of the program. The write() function takes three arguments: a file descriptor, a pointer to a buffer where the data (e.g., a string) is stored, and the number of bytes to write from the buffer. These are specified in the first three registers: x0, x1, and x2. Register x8 should hold the syscall number of the write system call, and the SVC instruction invokes it. The ASCII string can be placed at the end of the .text section (in the so-called literal pool) or within a .data or .rodata section.
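As a rough high-level analogue of the three write() arguments just described, the Python sketch below passes a file descriptor, a buffer, and a byte count to the operating system. It is an illustration only, not the assembly template itself: the helper name write_bytes is hypothetical, and because Python's os.write() takes no explicit count, the count is applied by slicing the buffer.

```python
import os

# The buffer to print: 12 bytes, matching the length used by the template.
message = b"Hello world\n"


def write_bytes(fd: int, buf: bytes, count: int) -> int:
    """Mirror write(fd, buf, count): write at most count bytes of buf to fd."""
    return os.write(fd, buf[:count])


if __name__ == "__main__":
    write_bytes(1, message, len(message))  # file descriptor 1 is standard output
    raise SystemExit(0)                    # counterpart of the exit(0) call
```

In the assembly version, the same three values travel in x0, x1, and x2, and the syscall number in x8 selects write().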
Template A64 Assembly Program write64.s

    .arch armv8-a               // This program is a 64-bit Arm program for armv8-a
    .section .text              // Specify the .text section to write code
    .global _start              // Define _start as a global entry symbol

    _start:                     // Specify defined entry point
        mov x0, #1              // First argument to write()
        ldr x1, =mystring       // Second arg: address of mystring
        mov x2, #12             // Third arg: string length
        mov x8, #64             // Syscall number of write()
        svc #1                  // Invoke write() function
        mov x0, #0              // First arg to exit() function
        mov x8, #93             // Syscall number of exit()
        svc #1                  // Invoke exit() function

    mystring:                   // Define mystring label for reference
        .asciz "Hello world\n"  // Specify string as null-terminated ascii

We can also use library functions to achieve the same result. The following programs both perform the same basic task: one for 64-bit Arm and the other for 32-bit Arm. They both define a main function in the .text section of the resulting ELF file and place a zero-terminated string Hello world\n in the .rodata (read-only data) section of the resulting binary. The main function in both cases loads the address of this string into a register, calls printf to output the string to the console, and then calls exit(0) to exit the program.
Template A64 Assembly Program print64.s

    .arch armv8-a               // Define architecture
    .text                       // Begin .text section
    .global main                // Define global symbol main

    main:                       // Start of the main function
        ldr x0, =MYSTRING       // Load the address of MYSTRING into x0
        bl printf               // Call printf to print the string
        mov x0, #0              // Move the value #0 into x0
        bl exit                 // Call exit(0)

    .section .rodata            // Define the .rodata section for the string
    .balign 8                   // Align our string to an 8-byte boundary
    MYSTRING:                   // Define the MYSTRING label
        .asciz "Hello world\n"  // Null-terminated ascii string

Template A32 Assembly Program print32.s

    .arch armv7-a               // Define architecture
    .section .text              // Begin .text section
    .global main                // Define global symbol main

    main:                       // Start of the main function
        ldr r0, =MYSTRING       // Load the address of MYSTRING into r0
        bl printf               // Call printf to print the string
        mov r0, #0              // Move the value #0 into r0
        bl exit                 // Call exit(0)

    .section .rodata            // Define the .rodata section for the string
    .balign 8                   // Align our string to an 8-byte boundary
    MYSTRING:                   // Define the MYSTRING label
        .asciz "Hello world\n"  // Null-terminated ascii string

If our development machine matches the architecture we are compiling for, we can assemble these programs directly using AS, as shown here:

    user@arm64:~$ as print64.s -o print64.o
    user@arm64:~$ as write64.s -o write64.o

If our development machine does not match the target architecture, we can instead use GCC's cross-compiler versions of AS.

    user@x86-64:~$ aarch64-linux-gnu-as print64.s -o print64.o
    user@x86-64:~$ aarch64-linux-gnu-as write64.s -o write64.o
    user@x86-64:~$ arm-linux-gnueabihf-as print32.s -o print32.o

Attempting to run an object file directly will not normally work. First, we must link the binary. In the GCC suite, the linker binary is called ld (or aarch64-linux-gnu-ld and arm-linux-gnueabihf-ld as the case may be).
We must provide the linker with all of the object files needed to create a full program binary and then specify the output file of the linker using the -o option.

For the write64.s program, we need only one object file, write64.o, and no additional libraries. The linked binary can be run directly.

    user@arm64:~$ ld write64.o -o write64
    user@arm64:~$ ./write64
    Hello world

When our assembly program uses library functions, as opposed to making system calls directly, we need to include the necessary object files. For our print64.s example, we specify print64.o as an input object file, but we also need to include several other object files before our program will run. One is libc.so, so our program can access the libc library functions printf and exit. Additionally, we need three object files that together form the C Runtime, needed to bootstrap the process prior to our function main being called. Table 2.3 describes the object dependencies we need.

Table 2.3: Needed Object Files and Their Purpose

    OBJECT FILE                           PURPOSE
    /usr/lib/aarch64-linux-gnu/crt1.o     Implement the C runtime stubs that provide the
    /usr/lib/aarch64-linux-gnu/crti.o     _start function, which bootstraps the program,
    /usr/lib/aarch64-linux-gnu/crtn.o     runs global C++ constructors, and then calls
                                          the program's main function
    /usr/lib/aarch64-linux-gnu/libc.so    The C runtime library export stubs needed to
                                          bootstrap the program; also references the
                                          printf and exit functions that our program uses

The final linker command line will therefore be the following:

    user@arm64:~$ ld print64.o /usr/lib/aarch64-linux-gnu/crt1.o \
        /usr/lib/aarch64-linux-gnu/crti.o /usr/lib/aarch64-linux-gnu/crtn.o \
        /usr/lib/aarch64-linux-gnu/libc.so -o print64.so

The resulting target binary, print64.so, can then be run on a 64-bit Arm machine.

    user@arm64:~$ ./print64.so
    Hello world
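The difference between the two link steps in this section (a bare object file versus an object file plus the C runtime and libc) can be summarized in a small Python sketch that only assembles the argument list and does not invoke the linker. The function name link_command and its parameters are hypothetical; the directory and file names are the ones listed in Table 2.3, and the real paths may differ on other distributions.

```python
import shlex

# Directory that holds the C runtime objects and libc.so in Table 2.3.
LIB_DIR = "/usr/lib/aarch64-linux-gnu"

# Inputs required in addition to our own object file when libc is used.
RUNTIME_INPUTS = ("crt1.o", "crti.o", "crtn.o", "libc.so")


def link_command(obj: str, output: str, use_libc: bool, linker: str = "ld") -> list:
    """Build (but do not run) the linker argument list for one object file."""
    cmd = [linker, obj]
    if use_libc:
        # Programs that call printf/exit need the C runtime stubs and libc.
        cmd += [f"{LIB_DIR}/{name}" for name in RUNTIME_INPUTS]
    cmd += ["-o", output]
    return cmd


if __name__ == "__main__":
    print(shlex.join(link_command("write64.o", "write64", use_libc=False)))
    print(shlex.join(link_command("print64.o", "print64.so", use_libc=True)))
```

When cross-linking from another architecture, the linker argument could be set to aarch64-linux-gnu-ld, the cross-linker name given earlier in this section.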
The ELF File Overview

The final output of the compilation and linking process is an Executable and Linkable Format (ELF) file, which contains all the information needed for the operating system and loader to load and run the program.

At the most abstract level, an ELF file can be thought of as a collection of tables describing the program and how to get it to run. In the ELF format, three types of tables exist: the ELF file header, which is at the start of the file; the program headers, which describe how to load the ELF program into memory; and the section headers, which describe the logical sections of the ELF file that tell the loader how to prepare it for execution.

The ELF File Header

At the beginning of the ELF file is the ELF file header. The ELF header describes global attributes of the program, such as the architecture that the program is designed to run on, the program entry point, and the locations and sizes of the other tables in the file.

Given an ELF file, such as the print32.so and print64.so programs we assembled and linked earlier in the "Assembling and Linking" section, we can view these attributes and sections using a program such as readelf.
The ELF file header can be viewed by using the -h parameter to readelf as follows:

    user@arm64:~$ readelf print64.so -h
    ELF Header:
      Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF64
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              DYN (Shared object file)
      Machine:                           AArch64
      Version:                           0x1
      Entry point address:               0x6a0
      Start of program headers:          64 (bytes into file)
      Start of section headers:          7552 (bytes into file)
      Flags:                             0x0
      Size of this header:               64 (bytes)
      Size of program headers:           56 (bytes)
      Number of program headers:         9
      Size of section headers:           64 (bytes)
      Number of section headers:         29
      Section header string table index: 28

    user@arm64:~$ readelf print32.so -h
    ELF Header:
      Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF32
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              DYN (Shared object file)
      Machine:                           ARM
      Version:                           0x1
      Entry point address:               0x429
      Start of program headers:          52 (bytes into file)
      Start of section headers:          7052 (bytes into file)
      Flags:                             0x5000400, Version5 EABI, hard-float ABI
      Size of this header:               52 (bytes)
      Size of program headers:           32 (bytes)
      Number of program headers:         9
      Size of section headers:           40 (bytes)
      Number of section headers:         29
      Section header string table index: 28

The ELF file header fields subdivide into four main groups: the ELF file header information, information about the program's target platform, the program entry point field, and the table location fields.

The ELF File Header Information Fields

The first of these groups tells the loader what type of ELF file this is and begins with the magic field. The magic field is a constant 16-byte binary pattern, called the ident pattern, indicating that the file is itself a valid ELF file. It always starts with the same 4-byte sequence, beginning with byte 0x7f followed by 3 bytes corresponding to the ASCII characters ELF.
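To make the layout behind the readelf output more concrete, the following Python sketch unpacks a 64-byte ELF64 header with the struct module. It is a minimal illustration under stated assumptions: it handles only little-endian ELF64 headers, the names ELF64_HEADER, Elf64Header, and parse_elf64_header are hypothetical, and the sample header is rebuilt by hand from the print64.so values shown above (with 3 standing for the DYN type and 183 for AArch64) rather than read from a real file.

```python
import struct
from collections import namedtuple

# ELF64 file header, little-endian: 16-byte ident, then the fixed-width fields.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")

Elf64Header = namedtuple(
    "Elf64Header",
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
    "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx",
)


def parse_elf64_header(data: bytes) -> Elf64Header:
    """Unpack the first 64 bytes of a little-endian ELF64 file."""
    if len(data) < ELF64_HEADER.size:
        raise ValueError("need at least 64 bytes")
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch handles only little-endian ELF64")
    return Elf64Header._make(ELF64_HEADER.unpack_from(data))


if __name__ == "__main__":
    # Header rebuilt from the readelf output for print64.so shown above.
    ident = bytes.fromhex("7f454c46020101") + bytes(9)
    raw = ELF64_HEADER.pack(ident, 3, 183, 1, 0x6A0, 64, 7552, 0,
                            64, 56, 9, 64, 29, 28)
    hdr = parse_elf64_header(raw)
    print("entry point:", hex(hdr.e_entry))
    print("program headers at byte", hdr.e_phoff, "count", hdr.e_phnum)
    print("section headers at byte", hdr.e_shoff, "count", hdr.e_shnum)
```

The four field groups named above map onto this tuple: e_ident carries the header information, e_machine and e_flags describe the target platform, e_entry is the entry point, and the remaining offset, size, and count fields locate the other tables.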
The class field tells the loader whether the ELF file itself uses the 32-bit or 64-bit ELF file format. Normally, 32-bit programs use the 32-bit format, and 64-bit programs use the 64-bit format. In our example, we can see that this is the case for programs on Arm: our 32-bit Arm binary uses the 32-bit ELF file format, and our 64-bit one uses the 64-bit format.

The data field tells the loader that the ELF file's own fields should be read as either big- or little-endian. ELF files on Arm normally use the little-endian encoding for the ELF file format itself. We will see later in this book how endianness works and how the processor can sometimes dynamically swap between little- and big-endian modes. For now, however, it is sufficient to know that this field only changes how the operating system and loader read the ELF file structures; this field does not change how the processor will behave when running the program.

Finally, the version field tells the loader that we are using version 1 of the ELF file format. This field is designed to future-proof the ELF file format.

The Target Platform Fields

The next set of fields tells the loader what type of machine the ELF file is designed to run on.

The machine field tells the loader what processor class the program is designed to run on. Our 64-bit program sets this field to AArch64, indicating that the ELF file will run only on 64-bit Arm processors. Our 32-bit program specifies ARM, which means it will run only on 32-bit Arm processors or as a 32-bit process on a 64-bit Linux machine using the processor's 32-bit AArch32 execution mode.

The flags field specifies additional information that might be needed by the loader. This field is architecture-specific. In our 64-bit program, for example, no architecture-specific flags are defined, and this f