How to Learn Software Reverse Analysis Systematically

2022.3.15

Software reverse analysis, also known as software reverse engineering, starts from a runnable program and uses techniques such as decryption, disassembly, system analysis, and program comprehension to analyze the software's structure, control flow, algorithms, and code. The aim is to recover its source code, design principles, structure, algorithms, processing flow, operating methods, and related documentation. The whole process is generally called software reverse engineering, and the techniques used in it are collectively called software reverse engineering technology. This article explains how to learn software reverse analysis systematically.

Program decompilation

Reverse analysis usually begins by decompiling the target. As software developers, we are all familiar with compilation: we write source code and use a compiler to turn it into an executable program. Decompilation is the inverse of that process. Which tool to use depends on how the target was built. For programs written in C, C++, Go, and other languages that compile to native code, we generally use IDA for disassembly. For class files and JAR files written in Java, we generally use JD-GUI for decompilation. For executables written in C#, we generally use Reflector. Learning to use these three tools is therefore an important part of learning reverse engineering.

 

Executable file format

Different operating systems use different executable file formats: PE files on Windows, ELF files on Linux, and Mach-O files on macOS. Besides the machine instructions generated from the source code, an executable file also contains static data (such as the strings referenced in the code), import and export information, file attribute information, and so on. Knowing where this information lives and how to extract it is very helpful for understanding the target program. This requires learning the executable formats of the different platforms, especially PE and ELF, which are the formats most often encountered in reverse engineering.
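As a small illustration of why the file format matters, the following Python sketch guesses the format of a file from its first bytes. The function name identify_format is made up for this example, and the check is deliberately minimal: it only looks at the ELF magic, the thin Mach-O magics, and the PE signature that the DOS header points to at offset 0x3C. Universal (fat) Mach-O files and corrupted headers are not handled.

```python
import struct

ELF_MAGIC = b"\x7fELF"

# Thin Mach-O magics as they appear on disk, in both byte orders.
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",  # 32-bit
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",  # 64-bit
}


def identify_format(data: bytes) -> str:
    """Guess the executable format from the leading bytes of a file."""
    if data[:4] == ELF_MAGIC:
        return "ELF"
    if data[:4] in MACHO_MAGICS:
        return "Mach-O"
    # A PE file starts with a 64-byte DOS header ("MZ"); the 4-byte
    # little-endian field at 0x3C holds the offset of the "PE\0\0" signature.
    if data[:2] == b"MZ" and len(data) >= 0x40:
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "PE"
    return "unknown"
```

To try it on a real file, read the file in binary mode and pass the bytes in, for example identify_format(open(path, "rb").read()), where path is whatever target you are examining.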

 

CPU instruction set

When reverse analyzing a program, most of your time and energy goes into reading and analyzing the disassembled instructions. The CPU instruction set and assembly language are therefore required learning for anyone doing reverse engineering. Common PC CPUs use the x86 architecture and its 64-bit extension x64 (also called AMD64), while mobile CPUs mostly use the ARM architecture. It is recommended to start with plain x86. Note in particular that many online tutorials still teach assembly language in 16-bit real mode, which is very misleading. Real mode is worth understanding, but the focus should be on 32-bit assembly language in protected mode. Once you have the basics of x86, you can move on to x64, and later to ARM. Learning assembly language means more than learning the instructions. It also means understanding the CPU itself: which registers it has and what they are used for, how it accesses memory, how addressing works, how it performs arithmetic, and so on.
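To make "how to address" concrete, here is a minimal Python sketch of how a 32-bit x86 memory operand such as [ebx + esi*4 + 8] is turned into an address: base + index * scale + displacement, truncated to 32 bits. The function name effective_address and its parameters are invented for this illustration, and segment bases are ignored (they are normally zero in a flat protected-mode address space).

```python
def effective_address(base: int, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Compute a 32-bit x86 effective address: base + index*scale + disp."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factor must be 1, 2, 4 or 8")
    # The result wraps around at 32 bits, as it does in the CPU.
    return (base + index * scale + disp) & 0xFFFFFFFF


# mov eax, [ebx + esi*4 + 8] with ebx = 0x1000 and esi = 3
# reads from address 0x1000 + 3*4 + 8 = 0x1014.
address = effective_address(base=0x1000, index=3, scale=4, disp=8)
```

This pattern shows up constantly in disassembly: a scale of 4 with an index register usually means the code is indexing an array of 4-byte elements.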

 

High-level language features

The targets of reverse engineering are mostly programs written in high-level languages such as C, C++, Java, and C#. You cannot restore the code logic of a program if you do not understand the high-level language it was written in. Of course, a reverse engineer does not need to know every feature of these languages or master as many programming techniques as a professional developer. The fundamentals, however, are essential. Take C as an example: you need to understand how function calls work, how parameters are passed, how local variables are allocated within a function, how arrays are stored, how structure members are laid out, and how pointers are implemented. Otherwise, you will not know how to map the disassembled code back to the high-level language.
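The point about structure member layout can be explored without a C compiler. The sketch below uses Python's ctypes module, which follows the platform's native C alignment rules, to print the offset and size of each member. The Record structure and the layout helper are made up for this example. The offsets shown in the comment are what common x86, x64, and ARM ABIs produce, and other platforms could differ.

```python
import ctypes


class Record(ctypes.Structure):
    """Roughly: struct Record { uint8_t tag; uint32_t value; uint16_t count; };"""
    _fields_ = [
        ("tag", ctypes.c_uint8),
        ("value", ctypes.c_uint32),
        ("count", ctypes.c_uint16),
    ]


def layout(struct_type):
    """Return (name, offset, size) for every member of a ctypes structure."""
    result = []
    for field in struct_type._fields_:
        name = field[0]
        descriptor = getattr(struct_type, name)
        result.append((name, descriptor.offset, descriptor.size))
    return result


# On common ABIs: tag at offset 0, three padding bytes, value at offset 4,
# count at offset 8, and two bytes of tail padding for a total size of 12.
for name, offset, size in layout(Record):
    print(f"{name}: offset {offset}, size {size}")
print("sizeof(Record) =", ctypes.sizeof(Record))
```

The padding is the part that matters when reading disassembly: an access such as [reg + 4] refers to the second member here, not to the byte right after tag.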

 

When learning the C concepts mentioned above, compare the source code with the assembly instructions the compiler generates from it, and repeat the comparison until recognizing the patterns becomes a conditioned reflex. Beyond these, pay attention to how C++ implements object orientation: the virtual function mechanism, how the this pointer is passed as a parameter, and how new and delete are implemented at the assembly level. Do not rely too heavily on tools, or you will end up able to do nothing but operate them. Beginners in particular gain a much more thorough understanding of the underlying principles by converting assembly instructions back into high-level language by hand.

 

Software debugging

In many cases, static analysis alone cannot achieve the goal. For example, when the program has been packed or protected by similar techniques, static analysis shows only meaningless instructions, and the decompilation tool may not be able to analyze the file at all. In that case we need dynamic analysis: actually running the program and analyzing it while it runs. Mastering software debugging is therefore an indispensable part of reverse engineering. Commonly used debuggers include OllyDbg, which has very powerful plug-ins, WinDbg, Microsoft's official debugger, which supports kernel-level debugging, and GDB on Linux. Mastering these three debuggers lets us quickly analyze the dynamic behavior of the target, debug and study key code segments, and even dump the program image from memory to a file so that it can then be analyzed statically with a decompilation tool.