How to read Java Bytecode for fun and profit
Embarking on a journey through the world of Java Bytecode? This article covers everything you need to know to get started.
What is bytecode?
Back in 1995, Sun Microsystems, the creator of the Java programming language, made a bold claim: Java would allow you to “write once, run anywhere.” That meant the same compiled binaries could run on any system architecture, something compiled C programs could not do. It remains a core tenet of writing Java to this day.
To achieve this cross-platform capability, Java employs a unique approach when compiling. Instead of going from source code directly into machine code (which would be specific to each system architecture), Java compiles its programs into an intermediate form known as bytecode. Bytecode is a set of instructions that is neither tied to a particular machine language nor dependent on any specific hardware architecture. This abstraction is the key to Java's portability.
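As a rough illustration of this intermediate form, the sketch below reads the first eight bytes of a compiled class file. It assumes the runtime exposes `java/lang/Boolean.class` as a system resource; the class name `ClassFileHeader` and the method `readHeader` are made up for this example. Every class file begins with the same magic number, `0xCAFEBABE`, followed by its minor and major version, regardless of the platform it was compiled on.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFileHeader {
    /** Returns {magic, minorVersion, majorVersion} read from the start of a class file. */
    static int[] readHeader(String resourcePath) {
        // Assumption: the class file is visible as a resource to the system class loader.
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IllegalStateException("class file not found: " + resourcePath);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();           // u4 magic
            int minor = data.readUnsignedShort(); // u2 minor_version
            int major = data.readUnsignedShort(); // u2 major_version
            return new int[] {magic, minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader("java/lang/Boolean.class");
        System.out.printf("magic=0x%08X minor=%d major=%d%n", header[0], header[1], header[2]);
    }
}
```

The major version printed depends on the JDK you run it with, so it is not shown here.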
The program that executes Java bytecode instructions is called a Java Virtual Machine (JVM). The JVM translates bytecode into the machine code native to the particular system architecture it is running on. It typically starts by interpreting instructions one at a time and then compiles frequently executed code to native code at run time. That second step, referred to as "just-in-time" (JIT) compilation, lets Java bytecode run efficiently on any given platform.
Viewing Bytecode
Bytecode isn’t just useful for the JVM, though. Because the bytecode of a Java class is helpful for reverse engineering, performance optimization, security research, and other static analysis functions, the JDK ships with utilities to help you and me inspect it.
For an example of bytecode, consider the following two methods from `java.lang.Boolean`, `booleanValue` and `valueOf(boolean)`, which respectively unbox and box the `boolean` primitive type:
    public boolean booleanValue() {
        return value;
    }

    public static Boolean valueOf(boolean b) {
        return (b ? TRUE : FALSE);
    }
Using the `javap` command, which ships with the JDK, we can see the bytecode for each. You can do this by running `javap` with the `-c` option and the fully-qualified name of the class, like so:
javap -c java.lang.Boolean
The result is the bytecode for all the public methods in `java.lang.Boolean`. Here I’ve copied just the bytecode for `booleanValue` and `valueOf(boolean)`:
    public boolean booleanValue();
      Code:
         0: aload_0
         1: getfield      #7                  // Field value:Z
         4: ireturn

    public static java.lang.Boolean valueOf(boolean);
      Code:
         0: iload_0
         1: ifeq          10
         4: getstatic     #27                 // Field TRUE:Ljava/lang/Boolean;
         7: goto          13
        10: getstatic     #31                 // Field FALSE:Ljava/lang/Boolean;
        13: areturn
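If you would rather capture this output from a program than from a shell, one option is the JDK's `java.util.spi.ToolProvider` lookup. The following is a minimal sketch, assuming a full JDK (Java 9 or later) in which `javap` is registered as a tool provider; the class `JavapRunner` and method `disassemble` are names invented for this example.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.spi.ToolProvider;

class JavapRunner {
    /** Runs the equivalent of "javap -c <className>" in-process and returns its output. */
    static String disassemble(String className) {
        // Assumption: the running JDK registers javap as a ToolProvider.
        ToolProvider javap = ToolProvider.findFirst("javap")
                .orElseThrow(() -> new IllegalStateException("javap not available; a full JDK is required"));
        StringWriter out = new StringWriter();
        StringWriter err = new StringWriter();
        PrintWriter outWriter = new PrintWriter(out);
        PrintWriter errWriter = new PrintWriter(err);
        int exitCode = javap.run(outWriter, errWriter, "-c", className);
        outWriter.flush();
        errWriter.flush();
        if (exitCode != 0) {
            throw new IllegalStateException("javap failed: " + err);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(disassemble("java.lang.Boolean"));
    }
}
```

The constant pool indices in the output (such as `#7`) can differ between JDK versions, so expect the numbers to vary from the listing above.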
Dissecting Bytecode
At first glance, it’s an entirely new language to learn. However, it quickly becomes straightforward as you learn what each instruction does and that the JVM is a stack machine: instructions push values onto an operand stack and pop them off again.
Take the three bytecode instructions for `booleanValue`, for example:
`aload_n` pushes the reference stored in local variable `n` onto the stack. In an instance method, local variable 0 holds `this`, so `aload_0` pushes `this`.
`getfield` pops an object reference off the top of the stack (here, `this`), reads one of its fields, and pushes that field’s value onto the stack
`#7` is the index of the field reference in the class’s constant pool
`// Field value:Z` is a comment from `javap` telling us what `#7` refers to: a field named `value` of type `boolean` (`Z`)
`ireturn` pops an `int` off of the stack and returns it; the JVM represents a `boolean` as an `int` on the stack
Long story short, these three instructions look up the instance’s `value` field and return it.
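To try this on your own code, here is a minimal sketch of a class with a getter shaped like `booleanValue`. The class `Flag` and its method `isSet` are hypothetical names made up for this example and do not come from the JDK.

```java
class Flag {
    private final boolean value;

    Flag(boolean value) {
        this.value = value;
    }

    // Same shape as Boolean.booleanValue(): load `this`, read the field, return it.
    boolean isSet() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(new Flag(true).isSet());
    }
}
```

After compiling with `javac Flag.java`, running `javap -c -p Flag` should show an `aload_0`, `getfield`, `ireturn` sequence for `isSet`, though the constant pool index will likely differ from `#7`. The `-p` option is needed here because `isSet` is not public.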
As a second example, take a look at the next method, `valueOf(boolean)`:
`iload_n` pushes the `int` stored in local variable `n` onto the stack. Because `valueOf` is static, there is no `this` in slot 0, so `iload_0` loads the first method parameter
`ifeq n` pops the value off of the stack and compares it to zero (`false`); if it is zero, execution jumps to the instruction at offset `n`, otherwise it proceeds to the next instruction. The numbers in the left column of the listing are byte offsets, not line numbers
`getstatic #n` reads a static field and pushes its value onto the stack
`#27` is the index of the static field’s reference in the constant pool
`// Field TRUE:Ljava/lang/Boolean;` tells us what `#27` refers to: a static field named `TRUE` of type `Boolean`
`goto n` jumps unconditionally to the instruction at offset `n`
`areturn` pops a reference off of the stack and returns it
In other words, these instructions take the first method parameter and return `Boolean.TRUE` if it’s true; otherwise, they return `Boolean.FALSE`.
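To make the stack and the jumps concrete, the walkthrough above can be hand-simulated in plain Java. This is only a sketch of the idea, not how a real JVM is implemented: the class `ValueOfSimulation` is invented for this example, the `switch` cases are the byte offsets from the listing, and the constant pool lookups are replaced by direct references to `Boolean.TRUE` and `Boolean.FALSE`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ValueOfSimulation {
    /** Steps through the six instructions of valueOf(boolean) by bytecode offset. */
    static Boolean run(boolean b) {
        Object[] locals = { b ? 1 : 0 };           // slot 0: the parameter, held as an int
        Deque<Object> stack = new ArrayDeque<>();  // operand stack
        int pc = 0;                                // offset of the next instruction
        while (true) {
            switch (pc) {
                case 0:  // iload_0: push local variable 0
                    stack.push(locals[0]);
                    pc = 1;
                    break;
                case 1:  // ifeq 10: pop; jump to 10 if zero, else fall through to 4
                    pc = ((Integer) stack.pop() == 0) ? 10 : 4;
                    break;
                case 4:  // getstatic TRUE
                    stack.push(Boolean.TRUE);
                    pc = 7;
                    break;
                case 7:  // goto 13
                    pc = 13;
                    break;
                case 10: // getstatic FALSE
                    stack.push(Boolean.FALSE);
                    pc = 13;
                    break;
                case 13: // areturn: pop the reference and return it
                    return (Boolean) stack.pop();
                default:
                    throw new IllegalStateException("no instruction at offset " + pc);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run(true));   // prints "true"
        System.out.println(run(false));  // prints "false"
    }
}
```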
Leveraging Bytecode Analysis
I mentioned earlier that reading bytecode can be helpful for reverse engineering, performance optimization, and security research. Let’s expand on those now.
Reverse Engineering
When working with third-party libraries or closed-source components, bytecode analysis becomes a powerful tool. Decompiling bytecode can provide a glimpse into the inner workings of these libraries, aiding in integration, troubleshooting, and ensuring compatibility.
In situations where you encounter proprietary or closed-source Java code, reading bytecode can be the only feasible way to understand its functionality. Bytecode analysis allows you to reverse engineer and comprehend the behavior of closed-source applications, facilitating interoperability or customization.
By way of a real-life example, I was recently trying to integrate a third-party package tangle analysis tool into our CI system. Unfortunately, the tool was closed-source, and the vendor only documented how to access the library through their proprietary UI. By analyzing the bytecode, I was able to reverse engineer the expected inputs and outputs of the underlying analytics engine.
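A common first step in this kind of investigation, before reading any instructions, is to list what a class exposes. The sketch below uses reflection rather than bytecode and is not necessarily what was done in the anecdote above. `SignatureLister` and `publicSignatures` are hypothetical names, and `java.lang.Boolean` stands in for the closed-source class you would actually inspect.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SignatureLister {
    /** Lists the public methods declared by a class as "returnType name(paramTypes)". */
    static List<String> publicSignatures(Class<?> type) {
        List<String> signatures = new ArrayList<>();
        for (Method method : type.getDeclaredMethods()) {
            if (!Modifier.isPublic(method.getModifiers())) {
                continue; // skip non-public members
            }
            StringBuilder sb = new StringBuilder();
            sb.append(method.getReturnType().getTypeName())
              .append(' ')
              .append(method.getName())
              .append('(');
            Class<?>[] params = method.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(params[i].getTypeName());
            }
            signatures.add(sb.append(')').toString());
        }
        Collections.sort(signatures); // reflection returns methods in no particular order
        return signatures;
    }

    public static void main(String[] args) {
        publicSignatures(Boolean.class).forEach(System.out::println);
    }
}
```

Signatures tell you the inputs and outputs of each method. To learn what a method does with them, you still need `javap -c` or a decompiler.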