Basic Concepts
Introduction
Java bytecode is the execution language of the Java Virtual Machine (JVM). When we compile Java source code, the Java compiler converts it into a platform-independent binary format: bytecode. Bytecode is a core component of Java and is what makes its 'write once, run anywhere' (WORA) philosophy possible. It is an intermediate code that sits between Java source code and machine code, and it is usually stored in files with the .class extension. Because bytecode is a set of instructions executed by the JVM rather than directly by the hardware, the same compiled code can run on any hardware platform for which a compatible JVM is available.
Generation
The generation of bytecode is completed by the Java compiler (javac). The compilation process includes the following steps:
- Lexical Analysis: Decomposes the source code into a series of symbols (tokens).
- Syntax Analysis: Organizes symbols into syntactic structures, usually represented in a tree-like form, such as the Abstract Syntax Tree (AST).
- Semantic Analysis: Checks for semantic errors in the syntax tree and fills in necessary information.
- Bytecode Generation: Generates the corresponding bytecode instructions based on the syntax tree.

Each class or interface in a Java source file is compiled into its own .class file, which contains all the bytecode instructions of that class or interface.
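As a rough illustration of the compilation step, the sketch below invokes the compiler from Java code through the standard javax.tools API and reads back the resulting .class file. The class name CompileDemo, the sample source Hello, and the temporary directory are assumptions made for this example; it also assumes the program runs on a full JDK, since a bare JRE does not ship a compiler.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

class CompileDemo {

    /** Compiles a one-class source file in a temp directory and returns the bytes of Hello.class. */
    static byte[] compileHello() throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler found: run on a JDK, not a bare JRE");
        }
        Path dir = Files.createTempDirectory("compile-demo");
        Path source = dir.resolve("Hello.java");
        String code = "public class Hello { int answer() { return 42; } }";
        Files.write(source, code.getBytes(StandardCharsets.UTF_8));

        // Roughly equivalent to running: javac -d <dir> <dir>/Hello.java
        int exitCode = compiler.run(null, null, null, "-d", dir.toString(), source.toString());
        if (exitCode != 0) {
            throw new IllegalStateException("Compilation failed with exit code " + exitCode);
        }
        return Files.readAllBytes(dir.resolve("Hello.class"));
    }

    public static void main(String[] args) throws IOException {
        byte[] classBytes = compileHello();
        System.out.printf("Hello.class: %d bytes, first four bytes %02X%02X%02X%02X%n",
                classBytes.length,
                classBytes[0] & 0xFF, classBytes[1] & 0xFF,
                classBytes[2] & 0xFF, classBytes[3] & 0xFF);
    }
}
```

The resulting file can then be inspected with the javap -c command that ships with the JDK to see the generated instructions.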
Structure
The structure of the bytecode file includes the following parts:
- Magic Number: The first four bytes of each class file, always 0xCAFEBABE, used to identify the file as a Java class file.
- Version Number: The minor and major version of the class file format, which tell the JVM whether it is able to load the file.
- Constant Pool: Stores various constant values, including numbers, strings, and symbolic references to classes and interfaces.
- Access Flags: Identifies the access level and kind of the class or interface (for example public, final, abstract, or interface).
- Class Index, Superclass Index, and Interface Index: Identifies the class and its superclass and implemented interfaces.
- Field Tables Collection: Stores descriptive information about fields.
- Method Tables Collection: Stores descriptive information about methods.
- Attribute Tables Collection: Includes additional information about classes, fields, and methods, such as exception tables, inner classes, and more.
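The first two parts of this layout are easy to check by hand. The following is a minimal sketch, using only the standard library, that reads the magic number and version from a class file. The class name ClassHeaderDemo and the helper readVersion are names chosen for this example, and java/lang/Object.class is used in main only because it is always available.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

class ClassHeaderDemo {

    /** Reads the magic number and version fields; returns {minor, major}. */
    static int[] readVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();              // 4-byte magic number
        if (magic != 0xCAFEBABE) {
            throw new IOException("Not a class file, magic = 0x" + Integer.toHexString(magic));
        }
        int minor = data.readUnsignedShort();    // 2-byte minor version
        int major = data.readUnsignedShort();    // 2-byte major version
        return new int[] {minor, major};
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = ClassLoader.getSystemResourceAsStream("java/lang/Object.class")) {
            if (in == null) {
                throw new IOException("Could not locate java/lang/Object.class");
            }
            int[] version = readVersion(in);
            System.out.println("minor=" + version[0] + ", major=" + version[1]);
        }
    }
}
```

The major version identifies the Java release the file targets: 52 corresponds to Java 8 and 61 to Java 17, so the printed value depends on the JDK in use.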
Execution
The execution of bytecode is handled by the interpreter or the Just-In-Time (JIT) compiler within the JVM. The JVM first loads the .class file and then executes the bytecode it contains. It uses a class loader to read the bytecode and convert it into runtime data structures. Next, the JVM interprets the bytecode or compiles it into native machine code and executes that.
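To make the loading step concrete, here is a small sketch in which a custom class loader turns a byte array into a Class object. To stay self-contained, it writes the bytes of a minimal class by hand (no fields, no methods) instead of reading a file. The names LoadDemo, BytesClassLoader, minimalClass, and the class name Hello are assumptions made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class LoadDemo {

    /** Builds the bytes of a minimal public class that extends Object and has no members. */
    static byte[] minimalClass(String internalName) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(0xCAFEBABE);                            // magic number
        out.writeShort(0);                                   // minor version
        out.writeShort(52);                                  // major version (Java 8)
        out.writeShort(5);                                   // constant pool count = entries + 1
        out.writeByte(7); out.writeShort(2);                 // #1 Class, name at #2
        out.writeByte(1); out.writeUTF(internalName);        // #2 Utf8, this class name
        out.writeByte(7); out.writeShort(4);                 // #3 Class, name at #4
        out.writeByte(1); out.writeUTF("java/lang/Object");  // #4 Utf8, superclass name
        out.writeShort(0x0021);                              // access flags: ACC_PUBLIC | ACC_SUPER
        out.writeShort(1);                                   // this class index
        out.writeShort(3);                                   // superclass index
        out.writeShort(0);                                   // interface count
        out.writeShort(0);                                   // field count
        out.writeShort(0);                                   // method count
        out.writeShort(0);                                   // attribute count
        return buffer.toByteArray();
    }

    /** A class loader that converts raw bytes into a runtime Class object. */
    static class BytesClassLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    public static void main(String[] args) throws IOException {
        Class<?> loaded = new BytesClassLoader().define("Hello", minimalClass("Hello"));
        System.out.println(loaded.getName() + " extends " + loaded.getSuperclass().getName());
    }
}
```

The same define method would accept bytes produced by a bytecode manipulation library, which is why a custom class loader is a common way to put modified classes into use.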
In addition to the basic bytecode concepts, there are several advanced techniques and tools that implement more complex functions through bytecode manipulation. These technologies include ASM, Javassist, the Instrumentation API, and Java agents. We will discuss these technologies and their applications in detail below.
ASM
Basic Principles
ASM is a low-level bytecode manipulation and analysis framework that provides APIs for direct interaction with bytecode. The core of ASM revolves around the production and consumption of bytecode. It uses the visitor pattern to access and modify the structure of class files, including classes, methods, fields, and their attributes.
- ClassVisitor: This is an abstract class, and users can inherit from this class to access different parts of the class file.
- MethodVisitor: Used to access methods in a class, which can obtain and modify bytecode instructions within the method.
- FieldVisitor: Used to access fields in a class.
When ASM reads bytecode, it triggers corresponding events based on the structure of bytecode and calls methods of the visitor object. Developers can insert, modify, or delete bytecode in these methods to change the behavior of the class.
Bytecode processing flow and core classes
Target class bytes ->
ClassReader parses ->
ClassVisitor enhances and modifies the bytecode ->
ClassWriter generates the enhanced class bytes ->
Loaded through Instrumentation as a new class
- ClassReader: Parses a compiled .class file. It reads the bytecode array and passes the data to a ClassVisitor instance.
- ClassWriter: Generates bytecode for modified or new classes. It receives the calls forwarded by the ClassVisitor and outputs the new bytecode array.
- ClassVisitor and MethodVisitor: These abstract classes (and their subclasses) are used to access and modify the structure and methods of classes. ClassVisitor is responsible for class-level information such as the class name and superclass, while MethodVisitor focuses on the bytecode instructions within a method.
Processing flow:
- Reading: Use ClassReader to read the bytecode of the class.
- Modification: The ClassVisitor and MethodVisitor modify the class structure or methods as needed.
- Writing: The ClassWriter receives the modified instructions and generates new bytecode.
- Loading: Load and apply the newly generated bytecode through a custom class loader or the Instrumentation API.
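The read, modify, and write chain can be modelled without any library. The sketch below is a simplified toy, not ASM's actual API: ToyReader, ToyVisitor, ToyWriter, and RemoveDebugVisitor are hypothetical classes invented for this example, and a "class" is reduced to a list of method names. It only shows how events flow through a chain of delegating visitors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class VisitorChainDemo {

    /** Receives one event per method and forwards it to the next visitor by default. */
    static class ToyVisitor {
        protected final ToyVisitor next;
        ToyVisitor(ToyVisitor next) { this.next = next; }
        void visitMethod(String name) {
            if (next != null) next.visitMethod(name);
        }
    }

    /** Plays the reader role: walks the input and fires one event per method. */
    static class ToyReader {
        private final List<String> methods;
        ToyReader(List<String> methods) { this.methods = methods; }
        void accept(ToyVisitor visitor) {
            for (String method : methods) visitor.visitMethod(method);
        }
    }

    /** Plays the writer role: the last visitor in the chain, recording what reaches it. */
    static class ToyWriter extends ToyVisitor {
        final List<String> output = new ArrayList<>();
        ToyWriter() { super(null); }
        @Override void visitMethod(String name) { output.add(name); }
    }

    /** A transforming visitor: methods whose names start with "debug" are not forwarded. */
    static class RemoveDebugVisitor extends ToyVisitor {
        RemoveDebugVisitor(ToyVisitor next) { super(next); }
        @Override void visitMethod(String name) {
            if (!name.startsWith("debug")) super.visitMethod(name);
        }
    }

    /** Wires reader -> transformer -> writer and returns what the writer collected. */
    static List<String> transform(List<String> methods) {
        ToyWriter writer = new ToyWriter();
        new ToyReader(methods).accept(new RemoveDebugVisitor(writer));
        return writer.output;
    }

    public static void main(String[] args) {
        System.out.println(transform(Arrays.asList("run", "debugDump", "stop")));
    }
}
```

The design point carried over from the real framework is that each visitor decides what to forward: passing an event on unchanged keeps the element, altering it modifies the element, and not forwarding it removes the element.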
Example
Suppose we need to add performance monitoring to each method in an existing Java class, recording the execution time of the methods. Using ASM, we can insert time-recording code at the beginning and end of each method.
First, we need to define a MethodVisitor that inserts bytecode at the entry and exit of the method, so that we can measure how long the method takes to execute:
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class PerformanceMonitorMethodVisitor extends MethodVisitor {

    public PerformanceMonitorMethodVisitor(MethodVisitor mv) {
        super(Opcodes.ASM5, mv);
    }

    @Override
    public void visitCode() {
        super.visitCode();
        // System.currentTimeMillis() is a static method, so it is called with INVOKESTATIC.
        mv.visitMethodInsn(Opcodes.INVOKESTATIC, "java/lang/System", "currentTimeMillis", "()J", false);
        // Caution: slot 1 is hard-coded and is only free in methods that do not
        // already use it for a parameter or local variable of their own.
        mv.visitVarInsn(Opcodes.LSTORE, 1);
    }

    @Override
    public void visitInsn(int opcode) {
        // IRETURN..RETURN covers every return instruction (exits via ATHROW are not timed).
        if (opcode >= Opcodes.IRETURN && opcode <= Opcodes.RETURN) {
            // Push System.out first: SWAP cannot reorder a two-slot long value.
            mv.visitFieldInsn(Opcodes.GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
            mv.visitMethodInsn(Opcodes.INVOKESTATIC, "java/lang/System", "currentTimeMillis", "()J", false);
            mv.visitVarInsn(Opcodes.LLOAD, 1);
            mv.visitInsn(Opcodes.LSUB); // end time - start time
            mv.visitMethodInsn(Opcodes.INVOKEVIRTUAL, "java/io/PrintStream", "println", "(J)V", false);
        }
        super.visitInsn(opcode);
    }
}
The visitCode() method is called at the start of the method's code. Here we call System.currentTimeMillis() to get the current time and store it in local variable 1. The visitInsn(int opcode) method is called for each zero-operand instruction. When it encounters an instruction that returns from the method (such as IRETURN), we retrieve the current time again, calculate the difference from the previously stored time, and print it. Note that a hard-coded slot is only safe when the method does not use that slot for a parameter or local variable of its own; ASM's LocalVariablesSorter (in org.objectweb.asm.commons) can allocate an unused slot instead.
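In source terms, the effect of the visitor above on a method is roughly what the hand-written sketch below shows. The class TimedSource and the method sum are hypothetical and exist only to illustrate where the inserted instructions sit relative to the original body.

```java
class TimedSource {

    /** Hand-written equivalent of a method after the timing instructions are inserted. */
    static int sum(int n) {
        long start = System.currentTimeMillis();                  // inserted at method entry
        int total = 0;
        for (int i = 0; i < n; i++) {                             // original method body
            total += i;
        }
        System.out.println(System.currentTimeMillis() - start);   // inserted before the return
        return total;
    }

    public static void main(String[] args) {
        System.out.println("sum = " + sum(10));
    }
}
```

A method with several return statements would get the printing code before each of them, because the visitor reacts to every return instruction it encounters.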
Then, we need a ClassVisitor to visit each method in the class and apply the above MethodVisitor:
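import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class PerformanceMonitorClassVisitor extends ClassVisitor {

    public PerformanceMonitorClassVisitor(ClassVisitor cv) {
        super(Opcodes.ASM5, cv);
    }

    @Override
    public MethodVisitor visitMethod(int access, String name, String desc,
                                     String signature, String[] exceptions) {
        // Let the next visitor in the chain (normally the ClassWriter) create the method.
        MethodVisitor mv = super.visitMethod(access, name, desc, signature, exceptions);
        // A null result means the method is not visited any further, so leave it alone.
        if (mv == null) {
            return null;
        }
        // Wrap the method so the timing instructions are inserted as it is visited.
        return new PerformanceMonitorMethodVisitor(mv);
    }
}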
The visitMethod() method is called for each method in the class. Here we create and return a PerformanceMonitorMethodVisitor instance, so that performance monitoring code is added to each method.
At runtime, a ClassReader reads the specified class, the PerformanceMonitorClassVisitor visits and modifies its bytecode, and finally a ClassWriter generates the new class file, so that the modified class includes the performance monitoring functionality.
This process allows us to add functionality to existing Java applications without intrusive changes to their source code.
Below we will continue this process to demonstrate how to use these classes to actually modify an existing Java class file and insert performance monitoring code.
Modify class files using ASM
To apply the visitors above, we need to read an existing class file with ClassReader, pass it through our visitor, and use ClassWriter to output the modified class file. The following is the implementation code for the entire process:
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.ClassVisitor;

import java.io.FileOutputStream;
import java.io.IOException;

public class ASMExample {

    public static void main(String[] args) throws IOException {
        // Read the existing class file (looked up as a resource on the classpath)
        ClassReader classReader = new ClassReader("com/example/YourClass");
        ClassWriter classWriter = new ClassWriter(ClassWriter.COMPUTE_FRAMES);

        // Create a custom ClassVisitor
        ClassVisitor classVisitor = new PerformanceMonitorClassVisitor(classWriter);

        // Parse and modify the class file
        classReader.accept(classVisitor, ClassReader.EXPAND_FRAMES);

        // Get the modified class bytecode
        byte[] modifiedClassBytes = classWriter.toByteArray();

        // Write the modified class bytecode to the file
        try (FileOutputStream fos = new FileOutputStream("YourClassModified.class")) {
            fos.write(modifiedClassBytes);
        }
    }
}
- ClassReader: This class reads bytecode from the specified class file.
- ClassWriter: This class is responsible for generating the modified bytecode. The COMPUTE_FRAMES flag makes ASM recompute the stack map frames, which the verifier requires for class files targeting Java 7 and above.
- PerformanceMonitorClassVisitor: This is our custom class visitor, which creates a PerformanceMonitorMethodVisitor instance for inserting the performance monitoring code.
Advantages and disadvantages of ASM
Advantages:
- High performance: ASM works directly on bytecode with very little abstraction on top, so reading and transforming classes is fast.
- High flexibility: It can perform almost any form of bytecode operation, giving developers great freedom.
- Mature and stable: ASM is a mature library widely used in production environments, with good community support and documentation.
Disadvantages:
- Complexity: Directly manipulating bytecode requires a deep understanding of the working principles of JVM, which has high requirements for developers.
- Prone to errors: Any small error can lead to runtime errors or unstable behavior, making debugging and maintenance difficult.
- Poor code readability: Directly manipulating bytecode makes the code difficult to understand and maintain, especially in large projects.
Conclusion
ASM is a powerful but complex framework, suitable for advanced application scenarios that require fine-grained control over bytecode. For ordinary developers, it may require a certain amount of learning investment, but the flexibility and performance it provides are irreplaceable. When choosing to use ASM, one should weigh the benefits it brings and the potential complexity.