From a 20 MB Golang program to 1 KB of shellcode

1. Preface

Golang has many advantages and disadvantages, but for security personnel I think its biggest advantage is that it compiles directly to machine code and does not depend on other libraries. We often need static compilation and cross-compilation, and I still vividly remember the repeated cycles of downloading and compiling link libraries when I statically compiled programs in a C environment. Because of this characteristic, Golang is naturally suited to developing RATs or other C/S architecture software.

At the same time, this advantage is also a serious disadvantage: once a program has more than a few features, the client can easily reach tens or even hundreds of MB, which narrows the range of situations where it can be used.

If you search for Golang and shellcode on a search engine today, almost all the results are about using Golang to write shellcode loaders for antivirus evasion. I was thinking about a different question: how can a Golang program be turned into a very small piece of shellcode?

First, here is a project in which I have already implemented this: https://github.com/veo/vshell

Next, I will explain how I implemented it.

2. PE to shellcode

https://github.com/hasherezade/pe_to_shellcode

This project uses the Reflective DLL Injection technique to convert PE programs into shellcode, including PE programs compiled by Golang.

The usage is also very simple:

pe2shc.exe <path to your PE> [output path*]

It can convert any exe program into shellcode.

Note that the converted file is still an exe file that can be run by double-clicking it, but its contents can also be executed as shellcode.

With this project it is indeed easy to convert a Golang program to shellcode, but the size does not decrease; it actually increases by 1 MB. The real problem, the large size, has not yet been solved.

3. Cobalt Strike Stager

I had not studied Cobalt Strike much before, and without understanding the detailed principles I kept wondering: why can the shellcode of msf and cs be so short when the generated exe is close to 500 KB, and how can such short shellcode provide so many functions?

Later, I learned that the payload is loaded in segments, and that payloads are divided into staged and stageless (unstaged) ones.

A staged payload is just a loader (the stager), which pulls the fully functional payload from the server and runs it in memory. A stageless payload is the fully functional payload itself, which does not need to pull anything further from the server. The short shellcode generated by cs is only the loader's shellcode, not the shellcode of the complete payload.

At this point the basic idea is clear. First, convert the large Golang program to shellcode; then use C/C++ to write a loader, the stager, that pulls this large remote shellcode. Together these complete the conversion from a Golang program to small shellcode.

4. Stager Production

A stager is a loader that downloads the main program remotely and runs it in memory. Both msf and cs provide stagers over HTTP/HTTPS and over TCP; the TCP stagers use the ws2_32.dll library and the HTTP stagers use the wininet.dll library. The VirtualAlloc function is then used to allocate memory, and the payload is run from memory.

The project I developed does not currently support communication over HTTP, so I chose the wsock32 library for programming. It is essentially similar to ws2_32, in many respects identical, and I chose it to avoid static scanning.

First, on the server side, we need to set up a download channel. The server determines the system type from the first 3 bytes sent over the socket and returns a different shellcode accordingly.

The server then needs to receive an IP and a port and use them to customize the large Golang shellcode, that is, to modify the configuration information inside the main program. This involves network byte order, in which the IP takes 4 bytes and the port takes 2 bytes. If you are not familiar with byte order, you can search for it online.

Converting to network byte order is important here: because the IP and port have a fixed size and layout, the configuration can be changed simply by replacing bytes in the program, which makes it easy to generate programs with different configurations.
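The byte layout described above can be illustrated with a minimal sketch. It only shows how an IPv4 address and a port map to 4 + 2 bytes in network (big-endian) byte order; the function names `pack_endpoint` and `unpack_endpoint` are hypothetical and are not taken from the project.

```python
import socket
import struct
from typing import Tuple


def pack_endpoint(ip: str, port: int) -> bytes:
    """Encode an IPv4 address and a port as 6 bytes in network byte order."""
    # inet_aton returns the 4 address bytes, most significant octet first.
    # "!H" packs an unsigned 16-bit integer in network (big-endian) order.
    return socket.inet_aton(ip) + struct.pack("!H", port)


def unpack_endpoint(blob: bytes) -> Tuple[str, int]:
    """Decode the 6-byte form back into (ip, port)."""
    ip = socket.inet_ntoa(blob[:4])
    (port,) = struct.unpack("!H", blob[4:6])
    return ip, port


if __name__ == "__main__":
    encoded = pack_endpoint("192.168.1.10", 8080)
    print(encoded.hex())             # c0a8010a1f90
    print(unpack_endpoint(encoded))  # ('192.168.1.10', 8080)
```

Because the encoded form is always exactly 6 bytes, it can occupy a fixed-size field regardless of which address and port are chosen.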

The stager, written in C++, connects to the server and downloads the remote payload according to this protocol. Network byte order is used here as well, so that programs with different configurations can be generated easily.

Once communication is established, memory is allocated, the payload is downloaded into it, and the payload is then executed. The allocated memory must be larger than the shellcode of the remote main program, so I allocated 30 MB for writing; the remote payload here is about 20 MB.

For Linux, I wrote the stager in C. Linux does have memfd_create, which makes it possible to load ELF programs without writing a file to disk, but there may be some compatibility issues, and there is less antivirus software on Linux, so fileless in-memory loading and execution is not necessary. I therefore chose to download the remote payload to the local disk and then execute it.

5. Stager to shellcode

The next step is to convert the stager written in C++ into shellcode.

Because I had no background in writing shellcode, I used an open-source shellcode generation framework created by someone else, https://github.com/TonyChen56/ShellCodeFrame, and then filled in some knowledge about PE files and shellcode.

In summary, rewrite every function call used in the stager in the shellcode style, and then the shellcode can be generated (it sounds simple, but it is still somewhat difficult).

You need to find the hash corresponding to each function, and there are restrictions to respect, such as not being able to use double-quoted strings.