1. What is a PE file
A PE (Portable Executable) file is the standard executable file format on the Windows operating system, used for programs (.exe) and libraries (.dll).
2. Entry point and base address
Entry point:
The entry point is a specific function or instruction in the PE file, which is the place where the program starts executing at runtime. When the operating system loads a PE file and starts it, it will start executing the code from the entry point. The entry point usually resides in the code section of the PE file and is called by the operating system.
Base address:
The base address is the starting address of the PE file once it is loaded into memory. When a PE file is loaded and run, the operating system reserves a contiguous region of memory for the image, and the base address is the start of that region. The operating system usually chooses the base address dynamically so that loaded images do not conflict with each other.
Difference:
The entry point is the place where the program starts executing, that is, the first instruction or function executed in memory by the program.
The base address is the starting address of the PE file when it is loaded into memory; it determines where the file sits in memory. Adding a relative virtual address (RVA) to the base address gives the virtual address (VA) of that item in the loaded image, not a physical address.
In summary, the entry point is the starting point of program execution, and the base address is the location of the file in memory. The operating system is responsible for loading the PE file into memory and executing the program. Entry points and base addresses are important concepts to ensure the normal execution of the program.
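The relationship between base address, RVA, and VA can be sketched in a few lines of Python. The helper name rva_to_va is hypothetical, and the sample values (base 00400000, RVA 2040) are the ones that appear in the WinDbg session later in this article.

```python
def rva_to_va(image_base: int, rva: int) -> int:
    """Return the virtual address of an RVA inside a loaded image."""
    return image_base + rva


# Sample values from the WinDbg session later in this article.
image_base = 0x00400000
import_table_rva = 0x2040
print(hex(rva_to_va(image_base, import_table_rva)))  # 0x402040
```

The same addition applies to every RVA stored in the PE headers, which is why the loader only needs to know the base address to find everything else.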
3. PE file parsing
When a PE file is loaded with WinDbg, the debugger reports the current base address, that is, image_base, as shown in the figure below:
1. MS-DOS header:
Every PE file starts with a DOS program stub, also known as the MS-DOS header. Its structure is IMAGE_DOS_HEADER, and its first two bytes are the characters MZ, as shown in the figure below:
The IMAGE_DOS_HEADER structure has two relatively important fields:
e_magic and e_lfanew
e_magic is the DOS signature that marks the file as a DOS executable. Its bytes are 4D 5A, the two letters MZ; read as a little-endian WORD this is 0x5A4D, which matches the definition #define IMAGE_DOS_SIGNATURE 0x5A4D.
e_lfanew is the offset of the PE file header from the start of the file (and, once loaded, from image_base). The field itself sits at offset 3Ch in the DOS header, and the bytes at the offset it holds are 50 45, the two letters PE.
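As a minimal sketch of the two checks just described, the Python snippet below validates e_magic and follows e_lfanew to the PE signature. The function name read_e_lfanew and the synthetic buffer are assumptions made for illustration; a real file would be read from disk instead.

```python
import struct

IMAGE_DOS_SIGNATURE = 0x5A4D  # the bytes "MZ" read as a little-endian WORD
E_LFANEW_OFFSET = 0x3C        # position of e_lfanew inside IMAGE_DOS_HEADER


def read_e_lfanew(image: bytes) -> int:
    """Validate the MZ signature and return the PE header offset."""
    (e_magic,) = struct.unpack_from("<H", image, 0)
    if e_magic != IMAGE_DOS_SIGNATURE:
        raise ValueError("not an MZ image")
    (e_lfanew,) = struct.unpack_from("<I", image, E_LFANEW_OFFSET)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("PE signature not found at e_lfanew")
    return e_lfanew


# Synthetic image (not a real file): a DOS header whose e_lfanew is 0x40.
fake = bytearray(0x44)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, E_LFANEW_OFFSET, 0x40)
fake[0x40:0x44] = b"PE\0\0"
print(hex(read_e_lfanew(bytes(fake))))  # 0x40
```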
2. PE file header:
After the MS-DOS header comes the PE file header. Its structure is IMAGE_NT_HEADERS; it contains many fields that the PE loader relies on, and it is the most important part of PE file analysis.
The address of the PE file header is image_base + e_lfanew (e_lfanew being the offset read from the DOS header). The structure contains the following members:
+0h Signature (the PE characters)
+4h IMAGE_FILE_HEADER (basic information about the PE file)
+18h IMAGE_OPTIONAL_HEADER32 (detailed information about the PE file)
Only a few important fields are introduced here.
In IMAGE_FILE_HEADER, one of the more important fields is SizeOfOptionalHeader, located at +14h from the start of the PE file header; it holds the size of IMAGE_OPTIONAL_HEADER32. The figure below shows the value of SizeOfOptionalHeader:
In the figure, the red box marks offset +14h, which holds the size of IMAGE_OPTIONAL_HEADER32. The bytes 50 45 00 00 are the first member of IMAGE_NT_HEADERS, namely Signature; everything after it up to offset 18h belongs to IMAGE_FILE_HEADER, and offset 18h is where IMAGE_OPTIONAL_HEADER32 starts.
Within IMAGE_OPTIONAL_HEADER32, the part we want to analyze is the DataDirectory[16] array, which starts at offset 78h. (Note: this offset is relative to the PE file header, that is, to the start of IMAGE_NT_HEADERS, not to the DOS header.)
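The offsets listed above can be collected into a small Python sketch. The constant names and the helper size_of_optional_header are hypothetical, and the value E0h written into the synthetic buffer is an assumption (it is the usual size of a 32-bit optional header, not a value taken from the figures).

```python
import struct

# Offsets relative to the start of IMAGE_NT_HEADERS (32-bit layout).
OFF_SIGNATURE = 0x00
OFF_FILE_HEADER = 0x04
OFF_SIZE_OF_OPTIONAL_HEADER = 0x14
OFF_OPTIONAL_HEADER = 0x18
OFF_DATA_DIRECTORY = 0x78


def size_of_optional_header(image: bytes, pe_offset: int) -> int:
    """Read the WORD SizeOfOptionalHeader at PE header + 14h."""
    (size,) = struct.unpack_from("<H", image, pe_offset + OFF_SIZE_OF_OPTIONAL_HEADER)
    return size


# Synthetic header fragment: PE signature at 0x40, assumed optional header size 0xE0.
pe_offset = 0x40
fake = bytearray(pe_offset + OFF_OPTIONAL_HEADER)
fake[pe_offset:pe_offset + 4] = b"PE\0\0"
struct.pack_into("<H", fake, pe_offset + OFF_SIZE_OF_OPTIONAL_HEADER, 0xE0)

size = size_of_optional_header(bytes(fake), pe_offset)
print(hex(size))                        # 0xe0
print(hex(OFF_OPTIONAL_HEADER + size))  # 0xf8, the end of the optional header
```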
3. DataDirectory[16]:
This is the data directory table. Windows describes each entry with the IMAGE_DATA_DIRECTORY structure, and the table is an array of these structures. The structure is as follows:
IMAGE_DATA_DIRECTORY
DWORD VirtualAddress: RVA of the data block; its address in memory is image_base + RVA
DWORD Size: length of the data block, which gives the size of the table in memory
Two common IMAGE_DATA_DIRECTORY entries are described here.
The entry at +78h is the export table; its index in the array is IMAGE_DIRECTORY_ENTRY_EXPORT.
Meaning of the export table:
The export table is a data structure that stores information about the functions, variables, and other symbols that an executable file exports. It allows other programs (other executables or DLLs) to find and call these exported functions or symbols at runtime.
In other words, this data directory entry matters most for DLL files, because it describes the functions a DLL provides.
For example, if a DLL exports a function and another program needs it, that program loads the DLL and looks up the function's address in the export table.
The entry at +80h is the second data directory entry, the import table; its index in the array is IMAGE_DIRECTORY_ENTRY_IMPORT.
The import table is the counterpart of the export table. Programs often call functions from other libraries, such as the Windows system library kernel32.dll, and the import table records the external functions and libraries that the program imports. This is particularly useful in reverse engineering, because the list of imported functions gives an intuitive, high-level picture of what the program does and what it is for.
More precisely, the import table stores information about the functions and symbols that an executable file or DLL references in other modules (such as DLLs). It allows the loader to resolve those references so the program can call functions in other modules at runtime.
From the IMAGE_DATA_DIRECTORY entry we can locate the import table of the current program and find the information about its imported functions.
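The +78h and +80h offsets follow from the array layout: entry i sits at 78h + 8*i from the PE file header. The Python sketch below shows this under the assumption of a 32-bit PE; the helper names are hypothetical, and the synthetic buffer is filled with the RVA and size that the WinDbg session later in the article reports.

```python
import struct

DATA_DIRECTORY_OFFSET = 0x78      # DataDirectory[0], relative to IMAGE_NT_HEADERS (PE32)
ENTRY_SIZE = 8                    # DWORD VirtualAddress + DWORD Size
IMAGE_DIRECTORY_ENTRY_EXPORT = 0
IMAGE_DIRECTORY_ENTRY_IMPORT = 1


def directory_entry_offset(index: int) -> int:
    """Offset of DataDirectory[index] from the start of the PE file header."""
    return DATA_DIRECTORY_OFFSET + index * ENTRY_SIZE


def read_data_directory(image: bytes, pe_offset: int, index: int) -> tuple:
    """Return (VirtualAddress, Size) of one IMAGE_DATA_DIRECTORY entry."""
    return struct.unpack_from("<II", image, pe_offset + directory_entry_offset(index))


# Synthetic image: PE header at 0x40, import entry filled with the article's values.
pe_offset = 0x40
fake = bytearray(pe_offset + 0xF8)
struct.pack_into("<II", fake, pe_offset + 0x80, 0x2040, 0x3C)

print(hex(directory_entry_offset(IMAGE_DIRECTORY_ENTRY_EXPORT)))  # 0x78
print(hex(directory_entry_offset(IMAGE_DIRECTORY_ENTRY_IMPORT)))  # 0x80
rva, size = read_data_directory(bytes(fake), pe_offset, IMAGE_DIRECTORY_ENTRY_IMPORT)
print(hex(rva), hex(size))  # 0x2040 0x3c
```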
4. Section table (block table):
The following figure shows the simple structure of a PE file and how it is mapped into memory:
As the figure shows, the section table comes directly after the PE file header.
Meaning of the section table:
The section table is a data structure that stores information about each section of an executable file. Each section is mapped to a region of memory and can contain code, data, resources, and so on. The section table records the attributes and positions of these sections so that the operating system loader can map the file into memory correctly.
Each entry in the section table represents one section and contains the following important information:
Section name: each section has a name, usually an ASCII string, that identifies its purpose, for example a code section, data section, or resource section.
Section virtual address: the starting address of the section in virtual memory, used by the loader to map the section into memory.
Section file offset: the starting position of the section in the PE file, used by the loader to read the section's content from the file.
Section size: the size of the section, that is, the number of bytes it contains.
Section attributes: each section has specific attributes, such as whether it is executable, readable, or writable.
In other words, the section table stores information about sections such as .text, .rdata, and .data.
Each entry has the structure IMAGE_SECTION_HEADER, whose main fields are:
Name: an 8-byte field holding the section name. For example, ".text" indicates code and ".data" indicates initialized data.
VirtualSize: the size of the section when loaded into memory.
VirtualAddress: the RVA at which the section data starts in memory.
SizeOfRawData: the size of the section data on disk (in the PE file).
PointerToRawData: the file offset of the beginning of the section data.
Characteristics: flags describing the section's attributes, such as whether it contains code or data and whether it is readable or writable.
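A section header can be decoded with a few lines of Python. This is only a sketch: parse_section_header is a hypothetical helper, the 40-byte layout and flag values are those of the standard IMAGE_SECTION_HEADER, and the sample entry is synthetic (a .text section using the 400h / 90h / 1000h values from the alignment example in this article).

```python
import struct

# Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics: 40 bytes in total.
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")

IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def parse_section_header(raw: bytes, offset: int = 0) -> dict:
    """Decode one IMAGE_SECTION_HEADER into a dictionary."""
    (name, virtual_size, virtual_address, size_of_raw_data,
     pointer_to_raw_data, _, _, _, _, characteristics) = SECTION_HEADER.unpack_from(raw, offset)
    return {
        "name": name.rstrip(b"\0").decode("ascii"),
        "virtual_size": virtual_size,
        "virtual_address": virtual_address,
        "size_of_raw_data": size_of_raw_data,
        "pointer_to_raw_data": pointer_to_raw_data,
        "executable": bool(characteristics & IMAGE_SCN_MEM_EXECUTE),
        "readable": bool(characteristics & IMAGE_SCN_MEM_READ),
        "writable": bool(characteristics & IMAGE_SCN_MEM_WRITE),
    }


# Synthetic .text entry: code, executable, readable.
raw = SECTION_HEADER.pack(b".text", 0x90, 0x1000, 0x200, 0x400, 0, 0, 0, 0, 0x60000020)
hdr = parse_section_header(raw)
print(hdr["name"], hex(hdr["virtual_address"]), hex(hdr["pointer_to_raw_data"]))
```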
We will not go into every field of this structure now. For the time being, just remember one field: Name, the first field of the section header. It is a BYTE[8] array, 8 bytes in size, and holds the name of the section, as shown in the following figure:
In the figure above, the red box marks the Name of the first section table entry, that is, the .text section. Its position is calculated as:
PE file header start + F0h (offset of the last data directory entry) + 8h (size of one data directory entry)
Why this calculation? The data directory table is the last member of the PE file header, and the section table follows it directly. The table has 16 entries; the last one is at offset F0h and is 8 bytes long, so PE file header start + F0h + 8h is the starting position of the section table, as shown in the following figure:
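The F0h + 8h arithmetic can be checked with a short Python sketch. Both function names are hypothetical. The second function shows the more general form, which uses SizeOfOptionalHeader instead of assuming 16 directory entries; the value E0h passed to it is an assumption (the usual 32-bit size), not something read from the figures.

```python
DATA_DIRECTORY_OFFSET = 0x78   # DataDirectory[0], relative to the PE file header
DATA_DIRECTORY_ENTRIES = 16
DATA_DIRECTORY_ENTRY_SIZE = 8
OPTIONAL_HEADER_OFFSET = 0x18


def section_table_offset(pe_offset: int) -> int:
    """PE header start + offset of the last directory entry + its size."""
    last_entry = DATA_DIRECTORY_OFFSET + (DATA_DIRECTORY_ENTRIES - 1) * DATA_DIRECTORY_ENTRY_SIZE
    return pe_offset + last_entry + DATA_DIRECTORY_ENTRY_SIZE


def section_table_offset_general(pe_offset: int, size_of_optional_header: int) -> int:
    """Same result, derived from SizeOfOptionalHeader instead of a fixed count."""
    return pe_offset + OPTIONAL_HEADER_OFFSET + size_of_optional_header


print(hex(section_table_offset(0x40)))                # 0x138
print(hex(section_table_offset_general(0x40, 0xE0)))  # 0x138
```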
Meanings of some common sections:
.text: the default code section; its content is instruction code.
.data: the default read/write data section; global and static variables are generally placed here.
.rdata: the default read-only data section.
.idata: stores information about the functions and symbols imported from other modules (such as DLLs), which lets the program resolve and call functions in other modules at runtime.
.edata: stores export information for the functions, variables, and other symbols of the file, which lets other programs access and call them at runtime. This data is usually merged into another section (for example .text or .rdata) rather than kept as a separate section.
Section alignment values:
A PE file has two kinds of section alignment: disk (file) alignment and memory alignment.
With disk alignment, the start position of each section in the file is an integer multiple of the alignment value. Suppose the first byte of .text is at 400h and its length is 90h, so the content of .text occupies 400h~490h. If the alignment value is 200h, the next section, for example .idata, must start at 600h, and the gap between 490h and 600h is filled with 00h.
The disk alignment value is defined by the FileAlignment field of the IMAGE_OPTIONAL_HEADER32 structure, which is located at PE file header start + 3Ch.
The PE file header starts where the signature 50 45 (0x4550) appears, and 3Ch is the offset of FileAlignment from that point, so the disk alignment value can be read directly:
The second is the memory alignment value, the SectionAlignment field of the IMAGE_OPTIONAL_HEADER32 structure, at offset 38h from the PE file header start; in this example it is 1000h.
The relationship is as follows: .text starts at 400h on disk and at 1000h in memory, and .idata starts at 600h on disk and at 2000h in memory.
This is the basic idea of disk alignment and memory alignment. The actual values differ from one PE file to another and have to be analyzed case by case, but the general approach is the same, so no further description is given here.
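The rounding in this example can be reproduced with a small Python sketch. The helper align_up is hypothetical; the numbers (400h, 90h, 200h, 1000h) are the ones used in the example above.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


FILE_ALIGNMENT = 0x200      # disk alignment from the example
SECTION_ALIGNMENT = 0x1000  # memory alignment from the example

# On disk: .text starts at 0x400 and is 0x90 bytes long.
text_file_start, text_length = 0x400, 0x90
text_file_end = text_file_start + text_length
next_file_start = align_up(text_file_end, FILE_ALIGNMENT)
padding = next_file_start - text_file_end

# In memory: .text starts at RVA 0x1000.
text_rva = 0x1000
next_rva = align_up(text_rva + text_length, SECTION_ALIGNMENT)

print(hex(text_file_end), hex(next_file_start), hex(padding))  # 0x490 0x600 0x170
print(hex(next_rva))                                           # 0x2000
```

The 170h bytes of padding on disk are the 00h filler between 490h and 600h mentioned in the text.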
5. Import table (input table)
The data directory entry for the import table is IMAGE_DIRECTORY_ENTRY_IMPORT. Next, we analyze the content and meaning of the structure it points to.
The import table starts with an array of IMAGE_IMPORT_DESCRIPTOR (IID) structures, one for each DLL that the PE file links against. The array is terminated by a final entry that is all zeros (NULL). For example, if a PE file imports two DLL files, there are two IID structures describing those two DLLs, followed by one all-zero IID structure that ends the array.
The IID structure is as follows:
typedef struct _IMAGE_IMPORT_DESCRIPTOR {
    union {
        DWORD Characteristics;
        DWORD OriginalFirstThunk;
    } DUMMYUNIONNAME;
    DWORD TimeDateStamp;
    DWORD ForwarderChain;
    DWORD Name;
    DWORD FirstThunk;
} IMAGE_IMPORT_DESCRIPTOR;
The members of the IID structure have the following meanings:
1. OriginalFirstThunk: an RVA pointing to the import name table (INT). The INT is an array of IMAGE_THUNK_DATA structures, one for each imported function, terminated by an empty (all-zero) IMAGE_THUNK_DATA structure;
2. TimeDateStamp: the DLL's time and date stamp, usually ignored;
3. ForwarderChain: usually ignored and not explained here;
4. Name: an RVA pointing to the DLL name, such as 'User32.dll';
5. FirstThunk: an RVA pointing to the import address table (IAT). The IAT is also an array of IMAGE_THUNK_DATA structures, one for each imported function, terminated by an empty (all-zero) IMAGE_THUNK_DATA structure.
In this structure we only need to pay attention to three members: OriginalFirstThunk, Name, and FirstThunk.
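Walking the IID array until the all-zero terminator can be sketched in Python as follows. The function name read_import_descriptors is hypothetical, and the descriptor values are made up for illustration, except FirstThunk 2010h, which is the value seen in the WinDbg session later in the article.

```python
import struct

IID = struct.Struct("<5I")  # five DWORDs, 20 bytes per IMAGE_IMPORT_DESCRIPTOR
FIELDS = ("OriginalFirstThunk", "TimeDateStamp", "ForwarderChain", "Name", "FirstThunk")


def read_import_descriptors(data: bytes, offset: int = 0) -> list:
    """Collect IID entries until the all-zero terminator is reached."""
    descriptors = []
    while True:
        values = IID.unpack_from(data, offset)
        if not any(values):
            break
        descriptors.append(dict(zip(FIELDS, values)))
        offset += IID.size
    return descriptors


# Two illustrative descriptors followed by the zero terminator (60 bytes in total).
table = (
    IID.pack(0x2088, 0, 0, 0x20F0, 0x2010)
    + IID.pack(0x20A0, 0, 0, 0x2110, 0x2028)
    + IID.pack(0, 0, 0, 0, 0)
)
for d in read_import_descriptors(table):
    print(hex(d["OriginalFirstThunk"]), hex(d["Name"]), hex(d["FirstThunk"]))
```

In a real image the table would be read at image_base + RVA of the import directory, and each Name RVA would then be followed to the DLL name string.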
OriginalFirstThunk points to the INT (also known as the hint name table), an array of IMAGE_THUNK_DATA structures. The INT is not modified by the loader, so its entries always keep the values stored in the file, which are RVAs.
FirstThunk points to the IAT. While the PE file is being loaded, the loader overwrites each IAT entry with the actual entry address of the corresponding function. After loading, the entries reached through FirstThunk are therefore real function addresses in memory rather than RVAs.
The structure of IMAGE_THUNK_DATA is as follows:
typedef struct _IMAGE_THUNK_DATA32 {
    union {
        DWORD ForwarderString;
        DWORD Function;
        DWORD Ordinal;
        DWORD AddressOfData;
    } u1;
} IMAGE_THUNK_DATA32;
The only member of this type, u1, is a union whose members are all DWORDs, so IMAGE_THUNK_DATA32 is 4 bytes in size. When the most significant bit of the value is 1, the function is imported by ordinal, and the remaining 31 bits give the function's ordinal. When the most significant bit is 0, the function is imported by name, and the value is an RVA pointing to an IMAGE_IMPORT_BY_NAME structure.
So for functions imported by name, the INT entries point to IMAGE_IMPORT_BY_NAME structures.
The structure of IMAGE_IMPORT_BY_NAME is as follows:
typedef struct _IMAGE_IMPORT_BY_NAME {
    WORD Hint;     // hint: suggested index into the DLL's export name table
    BYTE Name[1];  // function name (null-terminated ASCII string)
} IMAGE_IMPORT_BY_NAME, *PIMAGE_IMPORT_BY_NAME;
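The two cases for an IMAGE_THUNK_DATA32 value, and the layout of IMAGE_IMPORT_BY_NAME, can be sketched in Python. The helper names are hypothetical and the sample data (hint 0123h, name MessageBoxA) is made up for illustration. The ordinal is masked to 16 bits here because ordinals are 16-bit values and the remaining bits below the flag are expected to be zero.

```python
import struct

IMAGE_ORDINAL_FLAG32 = 0x80000000  # most significant bit of the DWORD


def decode_thunk(value: int) -> tuple:
    """Classify a 32-bit thunk value as an ordinal import or a name RVA."""
    if value & IMAGE_ORDINAL_FLAG32:
        # Import by ordinal: ordinals are 16-bit, higher bits below the flag are zero.
        return ("ordinal", value & 0xFFFF)
    # Import by name: the value is the RVA of an IMAGE_IMPORT_BY_NAME structure.
    return ("name_rva", value)


def read_import_by_name(image: bytes, offset: int) -> tuple:
    """Return (Hint, Name) from an IMAGE_IMPORT_BY_NAME structure."""
    (hint,) = struct.unpack_from("<H", image, offset)
    end = image.index(b"\0", offset + 2)  # the name is null-terminated
    return hint, image[offset + 2:end].decode("ascii")


print(decode_thunk(0x80000010))  # ('ordinal', 16)
print(decode_thunk(0x00002080))  # ('name_rva', 8320)

# Illustrative IMAGE_IMPORT_BY_NAME: a 2-byte hint followed by the name.
blob = struct.pack("<H", 0x0123) + b"MessageBoxA\0"
print(read_import_by_name(blob, 0))  # (291, 'MessageBoxA')
```

The two bytes of Hint in front of every name are the same two bytes that show up before each function name in the WinDbg dump later in the article.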
Here we borrow a diagram from a CSDN author. It shows the state before the PE loader has loaded the PE file into memory (FirstThunk has not yet been changed by the loader). At this point both the INT and the IAT lead to the names of the functions imported from kernel32.dll:
Debugging the import table with WinDbg:
The following is a practical walkthrough of using WinDbg to debug a PE program and obtain its IMAGE_DIRECTORY_ENTRY_IMPORT information, that is, the import table (the information about the DLLs it uses).
We use WinDbg to inspect the IAT and INT and observe the structure of the import table after the loader has loaded the PE file into memory.
1. Obtain the data directory entry of the import table
From the definition of the IMAGE_DATA_DIRECTORY structure above, we know:
DWORD VirtualAddress: RVA of the data block; its address in memory is image_base + RVA
DWORD Size: length of the data block, which gives the size of the table in memory
So we need the values of these two fields for the import table in order to determine its starting and ending addresses.
First, we locate the data directory entry of the import table. In a 32-bit PE file its offset is 80h (note that this offset is relative to the PE file header, not the DOS header).
The steps in WinDbg are therefore: obtain the position of the PE file header, then add the import table offset 80h, as shown in the figure below
First, read the offset of the PE file header (e_lfanew) from image_base+3Ch:
Then obtain the data directory entry of the import table, that is, its RVA and size:
Data directory offsets are relative to the PE file header, so PE file header address + offset is the address of the data directory entry. From the figure below we get the RVA of the IID array, and VA (memory address of the IID array) = image_base + RVA gives its address, that is, 400000+2040.
A brief explanation of 400000+40h+80h:
1. 00400000 is image_base, the base address. 40h is the offset of the PE file header, that is, the value of e_lfanew, which is read from image_base+3Ch.
2. 400000+40h is therefore the starting position of the PE file header, and 80h is the offset of the import table's data directory entry, so 400000+40h+80h is the address of that entry. In WinDbg we can run dt _IMAGE_DATA_DIRECTORY 400000+40+80 to display it directly, which gives:
RVA (relative virtual address): 2040; size of the import table data: 3c
Now we can go straight to the import table (the IID array). Note that the address to look at is the VA mapped into memory, calculated as VA = image_base + RVA, that is, 400000+2040.
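The address arithmetic of this step can be written out as a short Python sketch using the values reported in the session (base 400000, e_lfanew 40, RVA 2040, size 3c). The variable names are hypothetical; the last line only observes that 3Ch bytes is room for three 20-byte IID slots, the final one being the terminator.

```python
IID_SIZE = 20  # five DWORDs per IMAGE_IMPORT_DESCRIPTOR

image_base = 0x00400000
e_lfanew = 0x40           # read from image_base + 0x3C
import_dir_offset = 0x80  # import entry, relative to the PE file header
import_rva = 0x2040       # VirtualAddress reported by dt
import_size = 0x3C        # Size reported by dt

dir_entry_va = image_base + e_lfanew + import_dir_offset
table_start = image_base + import_rva
table_end = table_start + import_size
iid_slots = import_size // IID_SIZE

print(hex(dir_entry_va))                  # 0x4000c0
print(hex(table_start), hex(table_end))   # 0x402040 0x40207c
print(iid_slots)                          # 3
```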
The WinDbg command that displays the IID structures is as follows:
Note the five consecutive 00000000 values at the end. As explained above, an all-zero IID structure (five zero DWORDs) marks the end of the IID array.
2. Analysis of FirstThunk in the IID structure
As mentioned before, FirstThunk points to the IAT, and the loader fills each IAT entry with the actual entry address of the imported function when the PE file is loaded into memory.
WinDbg has already loaded the PE file at this point, so the IAT has been filled in and the entries reached through FirstThunk are function addresses, as shown in the following figure:
From the IID structure above, FirstThunk is the fifth DWORD, that is, 00002010 (an RVA) in the figure above.
Therefore dd 400000+2010 shows the addresses of the imported functions directly, and their function names can be resolved as well, as follows:
3. Analysis of OriginalFirstThunk
First, obtain the RVA value of the current OriginalFirstThunk, as shown in the following figure:
As mentioned above, for each entry of the INT that OriginalFirstThunk points to: when the most significant bit is 1, the function is imported by ordinal and the remaining 31 bits give the ordinal; when the most significant bit is 0, the function is imported by name and the value is the RVA of an IMAGE_IMPORT_BY_NAME structure.
So we use WinDbg to check the most significant bit of each entry, as shown in the following figure:
The most significant bit of every entry is 0, so each entry points to an IMAGE_IMPORT_BY_NAME structure.
As mentioned above, the second member of IMAGE_IMPORT_BY_NAME, Name[1], is the function name. The purpose of analyzing the import table is to obtain the names of the imported functions, that is, this Name value. First, let's try to display the IMAGE_IMPORT_BY_NAME entries directly in WinDbg and check whether the names can be read, as shown in the following figure:
We find that there are always two extra bytes before each function name. These two bytes are the WORD Hint member. Skipping two bytes forward from the RVA stored in the INT during debugging gives the name of the imported function:
At this point we have the whole picture of how the import table is used once the loader has loaded the PE file into memory (again borrowing the CSDN author's diagram), as follows:
This completes the process of analyzing the import function table of a PE file with WinDbg.
