Chapter 1 - Foundations PDF
Uploaded by EarnestRosemary
Summary
This chapter introduces reverse engineering, the process of extracting knowledge and design blueprints from man-made artifacts. It explains how reversing resembles scientific research and focuses on extracting information for software engineering purposes.
# Chapter 1 - Foundations

This chapter provides some background information on reverse engineering and the various topics discussed throughout this book.

## What Is Reverse Engineering?

Reverse engineering is the process of extracting the knowledge or design blueprints from anything man-made. The concept has been around since long before computers or modern technology, and probably dates back to the days of the industrial revolution. It is very similar to scientific research, in which a researcher attempts to work out the "blueprint" of the atom or the human mind. The difference between reverse engineering and conventional scientific research is that with reverse engineering the artifact being investigated is man-made, unlike scientific research where it is a natural phenomenon.

Reverse engineering is usually conducted to obtain missing knowledge, ideas, and design philosophy when such information is unavailable.

## Software Reverse Engineering: Reversing

Software is one of the most complex and intriguing technologies around us nowadays, and software reverse engineering is about opening up a program's "box" and looking inside. Of course, we won't need any screwdrivers on this journey. Just like software engineering, software reverse engineering is a purely virtual process, involving only a CPU and the human mind.

Software reverse engineering requires a combination of skills and a thorough understanding of computers and software development, but like most worthwhile subjects, the only real prerequisite is a strong curiosity and desire to learn. Software reverse engineering integrates several arts: code breaking, puzzle solving, programming, and logical analysis. The process is used by a variety of different people for a variety of different purposes, many of which will be discussed throughout this book.
## Reversing Applications

It would be fair to say that in most industries, reverse engineering for the purpose of developing competing products is the most well-known application of reverse engineering. Interestingly, it isn't nearly as popular in the software industry as one would expect. There are several reasons for this, but the primary one is that software is so complex that reverse engineering it for competitive purposes often just doesn't make sense financially.

## Security-Related Reversing

For some people the connection between security and reversing might not be immediately clear, but reversing is related to several different aspects of computer security:

* **Encryption research** – A researcher reverses an encryption product and evaluates the level of security it provides.
* **Malicious Software** – Reversing is heavily used on both ends of the fence: by malware developers and by those developing the antidotes.
* **Copy Protection** – Reversing is very popular with crackers, who use it to analyze and eventually defeat various copy protection schemes.

These are all discussed in the sections that follow.

## Malicious Software

The Internet has completely changed the computer industry in general and the security-related aspects of computing in particular. Malicious software, such as viruses and worms, spreads much faster in a world where millions of users are connected to the Internet and use e-mail daily.

* **Old method of virus spreading** – A virus would usually have to copy itself to a diskette, and that diskette would have to be loaded into another computer for the virus to spread. The infection process was fairly slow, and defense was much simpler because the channels of infection were few and required human intervention for the program to spread.
* **Modern method of virus spreading** – The Internet has created a virtual connection between almost every computer on earth. Nowadays, modern worms can spread automatically to millions of computers without any human intervention.

Reversing is used extensively at both ends of the malicious software chain:

* **Malware developers** – Developers of malicious software often use reversing to locate vulnerabilities in operating systems and other software. Such vulnerabilities can be used to penetrate a system's defense layers and allow infection, usually over the Internet. Beyond infection, culprits sometimes employ reversing techniques to locate software vulnerabilities that allow a malicious program to gain access to sensitive information or even take full control of the system.
* **Antivirus developers** – At the other end of the chain, developers of antivirus software dissect and analyze every malicious program that falls into their hands. They use reversing techniques to trace every step the program takes and assess the damage it could cause, the expected rate of infection, how it could be removed from infected systems, and whether infection can be avoided altogether.

## Reversing Cryptographic Algorithms

Cryptography has always been based on secrecy: Alice sends a message to Bob and encrypts that message using a secret that is (hopefully) known only to her and Bob. Cryptographic algorithms can be roughly divided into two groups: restricted algorithms and key-based algorithms.

* **Restricted algorithms** – The kind some kids play with: writing a letter to a friend with each letter shifted several letters up or down. Here the secret is the algorithm itself, so once the algorithm is exposed, it is no longer secure. Restricted algorithms provide very poor security because reversing makes it very difficult to maintain the secrecy of the algorithm. Once reversers get their hands on the encrypting or decrypting program, it is only a matter of time before the algorithm is exposed.
* **Key-based algorithms** – The secret is a key, some numeric value that the algorithm uses to encrypt and decrypt the message. In key-based algorithms, users encrypt messages using keys that are kept private. Because the algorithm itself is usually already known, this almost makes reversing pointless. To decipher a message encrypted with a key-based cipher, you would have to do one of the following:
  * Obtain the key.
  * Try all possible keys until you find the right one.
  * Look for a flaw in the algorithm that can be employed to extract the key or the original message.

## Digital Rights Management

Modern computers have turned most types of copyrighted materials into digital information. Music, films, and even books, which were once available only on physical analog media, are now available digitally. This trend is a mixed blessing, providing huge benefits to consumers and huge complications to copyright owners and content providers.

* **Benefits for consumers** – Materials have increased in quality and become easily accessible and simple to manage.
* **Benefits and complications for providers** – It has enabled the distribution of high-quality content at low cost, but more importantly, it has made controlling the flow of such content an impossible mission.

Digital information is incredibly fluid. It is very easy to move around and to duplicate. This fluidity means that once copyrighted materials reach the hands of consumers, they can be moved and duplicated so easily that piracy almost becomes common practice.

Traditionally, software companies have dealt with piracy by embedding copy protection technologies into their software. These are additional pieces of software embedded on top of the vendor's software product that attempt to prevent or restrict users from copying the program. In recent years, as digital media became a reality, media content providers have developed or acquired technologies that control the distribution of content such as music and movies.
These technologies are collectively called digital rights management (DRM) technologies. DRM technologies are conceptually very similar to the traditional software copy protection technologies discussed above. The difference is that with software, the thing being protected is active, or "intelligent," and can decide whether to make itself available. Digital media is a passive element that is usually played or read by another program, which makes it more difficult to control or restrict its usage.

## Auditing Program Binaries

One of the strengths of open-source software is that it is often inherently more dependable and secure. Regardless of the real security it provides, it just feels much safer to run software that has often been inspected and approved by thousands of impartial software engineers. Needless to say, open-source software also provides some real, tangible quality benefits. When the source code is unavailable, reversing makes it possible to audit the program binary itself for security vulnerabilities.

## Reversing in Software Development

Reversing can be incredibly useful to software developers:

* **Discover how to interoperate with undocumented or partially documented software**
* **Determine the quality of third-party code**
* **Extract valuable information from a competitor's product for the purpose of improving your own technologies**

## Achieving Interoperability with Proprietary Software

Interoperability is where most software engineers can benefit from reversing almost daily. When working with a proprietary software library or operating system API, documentation is almost always insufficient. Regardless of how much trouble the library vendor has taken to ensure that all possible cases are covered in the documentation, users almost always find themselves scratching their heads with unanswered questions.

* **Two common responses**
  * Contact the vendor asking for answers.
  * Use reverse engineering (which is much easier) to solve the problem.
## Developing Competing Software

As I've already mentioned, in most industries this is by far the most popular application of reverse engineering. Software, however, tends to be more complex than most products, so reversing an entire software product in order to create a competing product just doesn't make sense.

* **Why you would not want to reverse engineer a competitor's product**
  * It is usually much easier to design and develop a product from scratch, or simply license the more complex components from a third party, rather than develop them in-house.
  * Even if a competitor has an unpatented technology, it would never make sense to reverse engineer their entire product.
  * It is almost always easier to independently develop your own software.
* **When you would want to reverse engineer a competitor's product**
  * When a product would be very difficult or costly to develop independently, and only its highly complex or unusual components need to be reverse engineered.

## Evaluating Software Quality and Robustness

Just as it is possible to audit a program binary to evaluate its security and vulnerability, it is also possible to sample a program binary in order to estimate the general quality of the coding practices used in the program. The need is very similar: open-source software is an open book that allows its users to evaluate its quality before committing to it. Software vendors that don't publish their software's source code are essentially asking their customers to "just trust them." It's like buying a used car where you just can't pop the hood: you have no idea what you are really buying.

The need for source-code access to key software products such as operating systems has been made clear by large corporations; several years ago, Microsoft announced that large customers purchasing over 1,000 seats may obtain access to the Windows source code for evaluation purposes.
Those who lack the purchasing power to convince a major corporation to grant them access to the product's source code must either take the company's word that the product is well built or resort to reversing. Again, reversing would never reveal as much about the product's code quality and overall reliability as looking at the source code, but it can be highly informative.

## Low-Level Software

Low-level software (also known as system software) is a generic name for the infrastructure of the software world. It encompasses development tools such as compilers, linkers, and debuggers; infrastructure software such as operating systems; and low-level programming languages such as assembly language. It is the layer that isolates software developers and application programs from the physical hardware.

* **How low-level software isolates software developers and applications from the physical hardware**
  * The development tools isolate software developers from processor architectures and assembly languages.
  * Operating systems isolate software developers from specific hardware devices and simplify the interaction with the end user by managing the display, the mouse, the keyboard, and so on.

Years ago, programmers always had to work at this low level because it was the only possible way to write software: the low-level infrastructure just didn't exist. Nowadays, modern operating systems and development tools aim to isolate us from the details of the low-level world. This greatly simplifies the process of software development, but comes at the cost of reduced power and control over the system.

In order to become an accomplished reverse engineer, you must develop a solid understanding of low-level software and low-level programming. That's because the low-level aspects of a program are often the only thing you have to work with as a reverser; high-level details are almost always eliminated before a software program is shipped to customers.
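Because shipped binaries usually carry only machine code, the reverser's starting point is a stream of raw bytes. The minimal Python sketch below illustrates that starting point by decoding a few bytes into assembly mnemonics. It assumes 32-bit x86 and recognizes only a tiny, hand-picked set of encodings (`push ebp`, `mov ebp, esp`, `mov eax, imm32`, and a few others). The table names and the `disassemble` helper are hypothetical, and a real disassembler must also handle prefixes, ModRM/SIB decoding, and the full instruction set. The sample bytes are roughly what an unoptimized 32-bit compiler might emit for a function that returns 42. Nothing in them records the function's name, its variables, or its comments unless the binary also ships with symbols.

```python
# Toy linear-sweep decoder for a tiny, hand-picked subset of 32-bit x86.
# Illustration only: real x86 decoding handles prefixes, ModRM/SIB bytes,
# and hundreds of opcodes.

# One-byte instructions (opcode -> mnemonic).
ONE_BYTE = {
    0x55: "push ebp",
    0x5D: "pop ebp",
    0x90: "nop",
    0xC3: "ret",
    0xC9: "leave",
}

# Two-byte instructions: an opcode followed by one fixed ModRM byte.
TWO_BYTE = {
    (0x89, 0xE5): "mov ebp, esp",  # MOV r/m32, r32 form
    (0x8B, 0xEC): "mov ebp, esp",  # MOV r32, r/m32 form, same effect
    (0x31, 0xC0): "xor eax, eax",
}

MOV_EAX_IMM32 = 0xB8  # followed by a 4-byte little-endian immediate


def disassemble(code: bytes) -> list[str]:
    """Translate machine-code bytes into assembly mnemonics, one per instruction."""
    listing = []
    i = 0
    while i < len(code):
        opcode = code[i]
        if opcode in ONE_BYTE:
            listing.append(ONE_BYTE[opcode])
            i += 1
        elif opcode == MOV_EAX_IMM32 and i + 5 <= len(code):
            imm = int.from_bytes(code[i + 1:i + 5], "little")
            listing.append(f"mov eax, {imm:#x}")
            i += 5
        elif i + 1 < len(code) and (opcode, code[i + 1]) in TWO_BYTE:
            listing.append(TWO_BYTE[(opcode, code[i + 1])])
            i += 2
        else:
            # Unrecognized byte: show it as raw data instead of guessing.
            listing.append(f"db {opcode:#04x}")
            i += 1
    return listing


# Bytes an unoptimized 32-bit compiler might emit for "return 42;".
function_bytes = bytes([0x55, 0x89, 0xE5, 0xB8, 0x2A, 0x00, 0x00, 0x00, 0x5D, 0xC3])
for line in disassemble(function_bytes):
    print(line)
```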
Mastering low-level software and the various software-engineering concepts is just as important as mastering the actual reversing techniques if one is to become an accomplished reverser. A key concept about reversing that will become painfully clear later in this book is that reversing tools such as disassemblers or decompilers never actually provide the answers; they merely present the information. Ultimately, it is always up to the reverser to extract anything meaningful from that information.

## Assembly Language

Assembly language is the lowest level in the software chain, which makes it incredibly suitable for reversing: nothing moves without it. If software performs an operation, it must be visible in the assembly language code. Assembly language is the language of reversing. To master the world of reversing, one must develop a solid understanding of the chosen platform's assembly language.

* **Important points to remember about assembly language**
  * It is a class of languages, not one language. Every computer platform has its own assembly language that is usually quite different from all the rest.
  * Machine code and assembly language are two different representations of the same thing. A CPU reads machine code, which is nothing but sequences of bits that contain a list of instructions for the CPU to perform. Assembly language is simply a textual representation of those bits: we name elements in these code sequences in order to make them human-readable.

## Compilers

So, considering that the CPU can only run machine code, how are popular programming languages such as C++ and Java translated into machine code? A text file containing instructions that describe the program in a high-level language is fed into a compiler. A compiler is a program that takes a source file and generates a corresponding machine code file.

## Virtual Machines and Bytecodes

Compilers for some high-level languages, such as Java, generate bytecode instead of object code.
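The compile-then-interpret pipeline can be sketched in a few lines of Python. The opcodes (`PUSH`, `ADD`, `SUB`, `MUL`), their byte values, and the helpers `compile_expr` and `run` are all hypothetical and unrelated to real Java bytecode. `compile_expr` plays the role of a compiler that turns an expression into bytecode for a toy stack machine, and `run` plays the role of a virtual machine that fetches, decodes, and executes each opcode. This sketch simply carries out each operation in the host language; it does not generate native object code, as production virtual machines may (for example, through just-in-time compilation). Because the bytecode format does not depend on the host CPU, only `run` would need porting to a new platform.

```python
import operator

# Hypothetical opcodes for a toy stack machine. These are NOT Java (JVM)
# opcodes; they only illustrate the compile-then-interpret idea.
PUSH, ADD, SUB, MUL = 0x01, 0x02, 0x03, 0x04

SOURCE_OPS = {"+": ADD, "-": SUB, "*": MUL}  # used by the "compiler"
VM_OPS = {ADD: operator.add, SUB: operator.sub, MUL: operator.mul}  # used by the VM


def compile_expr(expr) -> bytes:
    """Compile a nested-tuple expression such as ("+", 2, ("*", 3, 4)) to bytecode."""
    if isinstance(expr, int):
        if not 0 <= expr <= 255:
            raise ValueError("this toy PUSH only takes a one-byte operand")
        return bytes([PUSH, expr])
    op, left, right = expr
    # Emit both operands first, then the operator: the order a stack machine needs.
    return compile_expr(left) + compile_expr(right) + bytes([SOURCE_OPS[op]])


def run(bytecode: bytes) -> int:
    """A minimal virtual machine: fetch each opcode, decode it, and execute it."""
    stack = []
    pc = 0  # index of the next opcode to decode
    while pc < len(bytecode):
        opcode = bytecode[pc]
        if opcode == PUSH:
            stack.append(bytecode[pc + 1])
            pc += 2
        elif opcode in VM_OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(VM_OPS[opcode](left, right))
            pc += 1
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at offset {pc}")
    return stack.pop()


program = compile_expr(("+", 2, ("*", 3, 4)))
print(program.hex(" "))  # the bytecode "binary" a reverser would see
print(run(program))
```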
Bytecode is similar to object code, except that it is usually decoded by a program instead of a CPU. The idea is to have a compiler generate the bytecode, and then use a program called a virtual machine to decode the bytecode and perform the operations described in it. Of course, the virtual machine itself must at some point convert the bytecode into standard object code that is compatible with the underlying CPU. Bytecode-based languages offer several major benefits, including:

* **Platform independence** – The virtual machine can be ported to different platforms, which enables running the same binary program on any CPU as long as it has a compatible virtual machine. Regardless of which platform the virtual machine is currently running on, the bytecode format stays the same.

## Operating Systems

An operating system is a program that manages the computer, including the hardware and software applications. An operating system takes care of many different tasks and can be seen as a kind of coordinator between the different elements in a computer. Operating systems are such a key element in a computer that any reverser must have a good understanding of what they do and how they work.

## The Reversing Process

How does one begin reversing? There are many different approaches that work, and I'll try to discuss as many of them as possible throughout this book. For starters, I usually try to divide reversing sessions into two separate phases:

* **System-level reversing** – A kind of large-scale observation of the overall program.
* **Code-level reversing** – Provides detailed information on a selected code chunk.

## System-Level Reversing

System-level reversing involves running various tools on the program and utilizing various operating system services to obtain information, inspect program executables, track program input and output, and so forth.
Most of this information comes from the operating system, because by definition every interaction that a program has with the outside world must go through the operating system.

## Code-Level Reversing

Code-level reversing is really an art form. Extracting design concepts and algorithms from a program binary is a complex process that requires a mastery of reversing techniques along with a solid understanding of software development, the CPU, and the operating system. Software can be highly complex, and even those with access to a program's well-written and properly documented source code are often amazed at how difficult it can be to comprehend. Deciphering the sequences of low-level instructions that make up a program is usually no mean feat. But fear not: the focus of this book is to provide you with the knowledge, tools, and techniques needed to perform effective code-level reversing.

## The Tools

Reversing is all about the tools. Many of the tools used in reverse engineering were not specifically created as reversing tools, but can be quite useful nonetheless. The basic categories are:

* **System-Monitoring Tools** – Used to explore the program and show information gathered by the operating system about the application and its environment.
* **Disassemblers** – Programs that take a program's executable binary as input and generate textual files that contain the assembly language code for the entire program or parts of it.
* **Debuggers** – Programs that allow software developers to observe their program while it is running.
* **Decompilers** – Take an executable binary file and attempt to produce readable high-level language code from it.

## Is Reversing Legal?

The legal debate around reverse engineering has been going on for years. It usually revolves around the question of what social and economic impact reverse engineering has on society as a whole.
## Interoperability

Getting two programs to communicate and interoperate is never an easy task. Even within a single product developed by a single group of people, interfacing issues frequently arise when individual components are made to interoperate. Software interfaces are so complex and programs are so sensitive that these things rarely function properly on the first attempt; it is just the nature of the technology. When a software developer wishes to develop software that communicates with a component developed by another company, the other party must expose large amounts of information about its interfaces.

## Competition

When used for interoperability, reverse engineering clearly benefits society because it simplifies (or enables) the development of new and improved technologies. When reverse engineering is used in the development of competing products, the situation is slightly more complicated. Opponents of reverse engineering usually claim that reversing stifles innovation, because developers of new technologies have little incentive to invest in research and development if their technologies can be easily "stolen" by competitors through reverse engineering.

## Copyright Law

Copyright laws aim to protect software and other intellectual property from any kind of unauthorized duplication. The best example of where copyright laws apply to reverse engineering is the development of competing software. As I described earlier, in software there is a very fine line between directly stealing a competitor's code and reimplementing it.

## Trade Secrets and Patents

When a new technology is developed, developers are usually faced with two primary options for protecting its unique aspects:

* **Patents** – The benefit of patenting is that it grants the inventor or patent owner control of the invention for almost 20 years.
  The main catches for the inventor are that the details of the invention must be published and that after the patent expires, the invention essentially enters the public domain.
* **Trade Secrets** – A newly developed technology that isn't patented automatically receives the legal protection of a trade secret if significant efforts are put into its development and into keeping it confidential. A trade secret legally protects the developer from cases of "trade-secret misappropriation," such as having a rogue employee sell the secret to a competitor.

## The Digital Millennium Copyright Act

The Digital Millennium Copyright Act (DMCA) has been getting much publicity these past few years. The DMCA was enacted in 1998 to protect copyright protection technologies.

* **Purpose of the DMCA** – The basic idea behind the DMCA is that it legally protects copyright protection systems from circumvention. The DMCA is the closest thing you'll find in the United States Code to an anti-reverse-engineering law.
* **What the DMCA applies to** – It only applies to copyright protection systems, which are essentially DRM technologies.

## DMCA Cases

The DMCA is relatively new as far as laws go, and therefore it hasn't really been used extensively so far. Still, there have been several high-profile cases in which the DMCA was invoked.

## License Agreement Considerations

Other than the DMCA, there are no laws that directly prohibit or restrict reversing, and the DMCA applies only to DRM products or to software that contains DRM technologies. For this reason, software vendors add anti-reverse-engineering clauses to their shrink-wrap software license agreements.

## Conclusion

In this chapter, we introduced the basic ground rules for reversing. We discussed some of the most popular applications of reverse engineering and the typical reversing process. We introduced the types of tools that are commonly used by reversers and evaluated the legal aspects of the process.
Armed with this basic understanding of what it is all about, we head on to the next chapters, which provide an overview of the technical basics we must be familiar with before we can actually start reversing.