Reliable Java Decompiler

I got an idea a few days ago and have been working on it since; it’s turning out pretty well so far. I’m just concerned about how helpful people will find it.

Basically, it should decompile ANY valid Java bytecode, obfuscated or not, into valid Java source code. The downside is that it converts bytecode instructions directly back into equivalent source code statements, so the output isn’t very pretty. In fact, reading it is a lot like reading raw JVM instructions, just written as source code. Would you still find this helpful? I think it might be, since you can do things like insert arbitrary code without worrying about fixing up the stack, as you have to when you edit bytecode directly.

I’ll hopefully have some examples of the code produced tonight, but if not it’ll be the weekend.

If someone wants to help me out with this, I could use a test class that uses every instruction and construct in the Java language.

The major problem is that you can do things in bytecode that you can’t do in source code. The biggest example is goto, which you would have to recreate with some sort of loop construct.

I’d be interested in seeing how you’ve written it. Is it based on another decompiler?

I was concerned about goto for a while, but there actually is a way to implement goto in Java source code. BUT, like I said, it’s not pretty.

It is based on another (GNU GPL, of course) program, but it’s not a decompiler per se.

edit: sort of off-topic, but did anyone know you could do this in Java?

Initializer blocks for instance variables look just like static initializer blocks, but without the static keyword:
{
    // whatever code is needed for initialization goes here
}
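A small runnable example of the instance initializer syntax quoted above (the class name InitDemo is just for illustration). It records the order things run in: the static block once, when the class is initialized, then the instance block on every construction, before the constructor body.

```java
import java.util.ArrayList;
import java.util.List;

class InitDemo {
    static final List<String> log = new ArrayList<>();

    // Static initializer: runs once, when the class is initialized.
    static {
        log.add("static");
    }

    int value;

    // Instance initializer: same syntax, no "static" keyword.
    // Runs on every "new", after super() and before the constructor body.
    {
        value = 42;
        log.add("instance");
    }

    InitDemo() {
        log.add("constructor");
    }

    public static void main(String[] args) {
        new InitDemo();
        new InitDemo();
        // Prints [static, instance, constructor, instance, constructor]
        System.out.println(log);
    }
}
```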

I most certainly did NOT know that before. Interesting.

edit2: I’m down to only 70-something errors in a rather large source file. Most of them are due to the Dalvik VM representing booleans as ints, and I have to find a good way to convert back and forth automatically. (Yes, I’m converting from .dex rather than .class; it’s easier and cleaner since Dalvik is register-based instead of stack-based.) It’s 5am, so I’d better get to bed and get back to this over the weekend. Fun fun. :smiley:

Regarding goto, the way I can envision implementing it is to put the code that will be jumped to in a named loop that iterates only once, then use ‘continue loopName;’ from inside it to jump back, though I’m not sure this is possible in all cases.
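A minimal sketch of the labeled-loop idea, with hypothetical method and label names. One caveat: a ‘do { ... } while (false);’ loop won’t work for this, because ‘continue’ jumps to the loop condition, which is false, so the loop exits instead of restarting. A ‘while (true)’ loop that ends with ‘break;’ runs once but can be re-entered with ‘continue label;’. Forward jumps can use ‘break label;’ out of a labeled block instead.

```java
class LabeledGoto {

    // Backward goto: "continue top" re-enters the labeled loop from the start.
    static int countDown(int n) {
        int steps = 0;
        top:
        while (true) {          // runs once unless we "goto top"
            if (n > 0) {
                n--;
                steps++;
                continue top;   // backward goto: jump to "top"
            }
            break;              // normal exit: the body ran to completion
        }
        return steps;
    }

    // Forward goto: "break skip" jumps to the first statement after the block.
    static String classify(int x) {
        String result = "non-negative";
        skip:
        {
            if (x >= 0) {
                break skip;     // forward goto past the rest of the block
            }
            result = "negative";
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(countDown(3));  // 3
        System.out.println(classify(-5));  // negative
        System.out.println(classify(7));   // non-negative
    }
}
```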

EDIT: I did not know that either.

bads

A reliable Java decompiler doesn’t need to be able to decompile the most fucked-up code, unless it has a deobfuscation transform. frank_ and I have been planning a deobfuscator for a long time.

Planning doesn’t get you anywhere unless it’s carried out. I’ve planned a lot of things; look at Cherokee for just one example.

It would be good if a decompiler could decompile even the most obfuscated code, though, wouldn’t it? :wink:

Alright, I’ve got a first version working, sorta…

Here is the original source code, compiled with Sun javac 1.6.0_22 on Kubuntu Lucid with no compiler options:
http://pastebin.com/LRfBZgbR

Here is the source code output by the ‘decompiler’:
http://pastebin.com/2r38S6yt (updated because of the unimplemented ‘goto-16’ instruction pointed out by frank)

It compiles without errors or warnings, and it also runs to completion (it reaches the final ‘return’ statement in the main method, though the process doesn’t actually exit, probably because some other threads are still running). But it doesn’t produce the expected results, so I probably translated an instruction slightly wrong somewhere along the way. The original instruction is in the comment to the right of each statement; if you see something I did wrong, I’d really appreciate you pointing it out so I can fix it.

By now you probably know what ‘tool’ I am using, because I didn’t bother changing some names in the code. In the next few days I’ll try to finish the instructions I haven’t implemented yet and release the code I’ve written (which is a single XML file; the whole transformation uses XSLT :))

So, again, if someone would like to write me a sample Java class that uses all of Java’s features to test on, I’d appreciate it.

edit: Some details real quick before I go to bed (5am…)
Flow-control obfuscation should be no problem at all because, as you can see, I implement the ‘goto’ instruction in source code exactly as it appears in bytecode. I haven’t implemented try/catch yet, but it basically uses gotos as well, so again, it shouldn’t be a problem.
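A rough sketch of how try/catch could sit on top of label-based gotos, assuming the same ‘__next_label’ switch-inside-a-loop pattern the decompiled output uses; the method, label numbers and exception type are hypothetical. The try/catch wraps the whole switch, and the catch block turns an exception thrown inside the protected range into a jump to the handler label, which is roughly what an exception-table entry means in bytecode.

```java
class GotoTryCatch {

    // Hypothetical translation of:
    //   try { r = a / b; } catch (ArithmeticException e) { return "caught ..."; }
    //   return "result " + r;
    static String divide(int a, int b) {
        int __next_label = 0;
        int result = 0;
        Throwable caught = null;
        while (true) {
            try {
                switch (__next_label) {
                    case 0:                       // protected range
                        result = a / b;           // may throw ArithmeticException
                        __next_label = 2;         // "goto 2"
                        break;
                    case 1:                       // handler label
                        return "caught " + caught.getClass().getSimpleName();
                    case 2:                       // code after the try block
                        return "result " + result;
                    default:
                        throw new IllegalStateException("unknown label " + __next_label);
                }
            } catch (ArithmeticException e) {
                if (__next_label != 0) {
                    throw e;                      // thrown outside the protected range
                }
                caught = e;
                __next_label = 1;                 // "goto handler"
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(divide(10, 2)); // result 5
        System.out.println(divide(1, 0));  // caught ArithmeticException
    }
}
```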

Alright! I’ve actually got a working version that decompiles the Calculator class above. The problem was simple in concept but kind of painful to implement. As I said before, the Dalvik VM doesn’t support boolean register types and instead uses 0/1 in an int to indicate false/true. I originally supported booleans and tried to convert between them, but this turned into a nightmare, so I changed it to use only ints, just as the Dalvik VM expects.
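A small sketch of the ints-only approach, with hypothetical method and register names: a boolean result is stored as 0/1 in an int "register", logical NOT becomes XOR with 1, and the value is converted back (‘!= 0’) only at the point where a method actually needs a boolean parameter.

```java
class IntBooleans {

    // Hypothetical user method that takes a real boolean.
    static String describe(boolean b) {
        return b ? "yes" : "no";
    }

    // Hypothetical user method whose boolean result lands in an int register.
    static boolean isPositive(int x) {
        return x > 0;
    }

    static String run(int x) {
        int __r0 = isPositive(x) ? 1 : 0;   // boolean -> 0/1 int
        int __r1 = __r0 ^ 1;                // logical NOT on a 0/1 int
        // Convert back only where a boolean is required (the call site).
        return describe(__r0 != 0) + "/" + describe(__r1 != 0);
    }

    public static void main(String[] args) {
        System.out.println(run(5));   // yes/no
        System.out.println(run(-5));  // no/yes
    }
}
```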

This is the new decompile:
http://pastebin.com/3N0rvUPQ

@ the inner XMLVMElem class
Frank_ expressed distress that this is an ‘extra dependency’ introduced into the code. The reason for it is that the Dalvik VM is register-based: a register can hold a value of any type, and registers are often reused with different types. There is only one other way I can think of to handle this, which would be something like the following:

Say __r0 holds both a float and an int at different points in the method. Currently it’s handled like this:

XMLVMElem __r0 = new XMLVMElem();
__r0.i = int_value;
someIntMethod(__r0.i);
__r0.f = float_value;
someFloatMethod(__r0.f);

This only requires one XMLVMElem variable, which can be reused any number of times; the class looks like this:

static class XMLVMElem {
    byte bt;
    short s;
    int i;
    long l;
    float f;
    double d;
    char c;
    Object o;
}
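A runnable version of the XMLVMElem snippets above. The class body is copied from the post; someIntMethod and someFloatMethod are stand-ins that just return strings, and the values 7 and 2.5f are arbitrary.

```java
class XMLVMElemDemo {

    // Field layout copied from the post: one slot per Java type.
    static class XMLVMElem {
        byte bt;
        short s;
        int i;
        long l;
        float f;
        double d;
        char c;
        Object o;
    }

    // Stand-ins for the post's someIntMethod/someFloatMethod.
    static String someIntMethod(int v) { return "int " + v; }
    static String someFloatMethod(float v) { return "float " + v; }

    static String run() {
        XMLVMElem __r0 = new XMLVMElem();   // one "register", reused
        __r0.i = 7;
        String a = someIntMethod(__r0.i);
        __r0.f = 2.5f;                      // same register, different type
        String b = someFloatMethod(__r0.f);
        return a + ", " + b;
    }

    public static void main(String[] args) {
        System.out.println(run());  // int 7, float 2.5
    }
}
```

Unlike a real union, each type has its own field, so writing __r0.f leaves __r0.i untouched. That is fine as long as every read uses the slot matching the type last written to that register.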

The alternative is to scan through the method, determine which types each register needs to support, and declare them like this:

int __r0i;
float __r0f;
__r0i = int_value;
someIntMethod(__r0i);
__r0f = float_value;
someFloatMethod(__r0f);

It’s slightly more complicated, but may be better. The way I’m writing the code, it will be extremely easy to switch to this method once everything is done.
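A rough sketch of the scanning step for the alternative approach. It assumes the register uses have already been collected as (register, type) pairs, a hypothetical input format, and emits one declaration per register/type combination, reusing the XMLVMElem field names as suffixes. All reference types share a single Object slot here, so the generated code would need casts at each use; declaring one variable per reference type is the other option.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RegisterSplitter {

    // Suffixes mirror the XMLVMElem field names. No "boolean" entry:
    // this assumes booleans were already lowered to 0/1 ints.
    static final Map<String, String> SUFFIX = new LinkedHashMap<>();
    static {
        SUFFIX.put("byte", "bt");
        SUFFIX.put("short", "s");
        SUFFIX.put("int", "i");
        SUFFIX.put("long", "l");
        SUFFIX.put("float", "f");
        SUFFIX.put("double", "d");
        SUFFIX.put("char", "c");
        SUFFIX.put("Object", "o");
    }

    // Primitives keep their own slot; every reference type shares "Object".
    static String slotType(String type) {
        return SUFFIX.containsKey(type) ? type : "Object";
    }

    // Each use is {registerNumber, javaType}, e.g. {"0", "int"} (hypothetical format).
    static List<String> declarations(List<String[]> uses) {
        // Register -> slot types it needs, in first-seen order.
        Map<String, Set<String>> slots = new LinkedHashMap<>();
        for (String[] use : uses) {
            slots.computeIfAbsent(use[0], r -> new LinkedHashSet<>()).add(slotType(use[1]));
        }
        List<String> decls = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : slots.entrySet()) {
            for (String type : e.getValue()) {
                decls.add(type + " __r" + e.getKey() + SUFFIX.get(type) + ";");
            }
        }
        return decls;
    }

    public static void main(String[] args) {
        List<String[]> uses = new ArrayList<>();
        uses.add(new String[] {"0", "int"});
        uses.add(new String[] {"0", "float"});
        uses.add(new String[] {"0", "int"});
        uses.add(new String[] {"1", "java.lang.String"});
        // Prints [int __r0i;, float __r0f;, Object __r1o;]
        System.out.println(declarations(uses));
    }
}
```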

So, what do you think so far?

I’d have thought XMLVMElem would be more like a union.

It is basically a union, but you can’t write one in Java; you can’t even implement it through inheritance with primitive types.
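For what it’s worth, primitive values can get true union semantics by storing raw bits, using the JDK’s Float.floatToRawIntBits/intBitsToFloat and Double.doubleToRawLongBits/longBitsToDouble conversions. This is a swapped-in technique, not what XMLVMElem does, and the class name is hypothetical. It also shows why a full union is still impossible: an Object reference has no bit representation you can store in a long, so references would need a separate field anyway.

```java
// Sketch: one 64-bit slot reinterpreted through the JDK's raw-bit conversions.
class BitsRegister {
    private long bits;  // shared storage: every setter overwrites it

    void setInt(int v)       { bits = v; }  // widened to long; low 32 bits hold v
    int getInt()             { return (int) bits; }
    void setFloat(float v)   { bits = Float.floatToRawIntBits(v); }
    float getFloat()         { return Float.intBitsToFloat((int) bits); }
    void setLong(long v)     { bits = v; }
    long getLong()           { return bits; }
    void setDouble(double v) { bits = Double.doubleToRawLongBits(v); }
    double getDouble()       { return Double.longBitsToDouble(bits); }

    public static void main(String[] args) {
        BitsRegister r = new BitsRegister();
        r.setFloat(1.0f);
        System.out.println(r.getFloat());                    // 1.0
        System.out.println(Integer.toHexString(r.getInt())); // 3f800000 (same bits read as an int)
        r.setDouble(2.5);
        System.out.println(r.getDouble());                   // 2.5
    }
}
```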

I’ve cleaned up the code substantially, added support for member and static fields of more types, and, most importantly, added array support:

This is my current test class:
http://pastebin.com/36xusAET
and decompiled:
http://pastebin.com/1RPjhBMx

As expected, loops ‘just work’.

Also, I learned something today: I didn’t know you could cast an Object to an array of any type and dimension (even primitive types). Who knew? :slight_smile:
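A quick example of casting an Object to a multi-dimensional primitive array (class and method names are just for illustration). The cast is checked at run time, so the wrong array type throws ClassCastException. The JVM’s internal name for int[][] is ‘[[I’, the same notation as the ‘[Ljava.lang.String;’ in the test output further down.

```java
class ArrayCastDemo {

    // Checked cast: throws ClassCastException if o is not really an int[][].
    static int sum(Object o) {
        int[][] grid = (int[][]) o;
        int total = 0;
        for (int[] row : grid) {
            for (int v : row) {
                total += v;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Object o = new int[][] {{1, 2}, {3, 4}};
        System.out.println(sum(o));                  // 10
        System.out.println(o instanceof int[][]);    // true
        System.out.println(o.getClass().getName());  // [[I
        try {
            long[] wrong = (long[]) o;
        } catch (ClassCastException e) {
            System.out.println("not a long[]");
        }
    }
}
```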

[quote=“Moparisthebest, post:12, topic:370275”]Also, I learned something today, I didn’t know you could cast an Object to any type and dimension of array (even primitive types), who knew? :)[/quote]Pretty sure most people knew.

I believe I’m actually done with this; everything should work perfectly, even exception handling:

Test class:
http://pastebin.com/AbrxCfZL
decompiled:
http://pastebin.com/E7MFmZMV

And they produce identical output, apart from the array’s identity hash code and the line numbers in the stack trace. Original:

Starting Test.init()
i_s: 555
b: true, bt: 5, s: 6, i: 7, l: 8, f: 9.0, d: 10.0, c: z, o: []
b_s: true, bt_s: 5, s_s: 6, i_s: 7, l_s: 8, f_s: 9.0, d_s: 10.0, c_s: z, o_s: []
bytes[4]: 12
bytes.length: 11
fill_data: [Ljava.lang.String;@7987aeca
data: bob
data: tom
data: jim
data: tim
b: true, bt: 122, s: 122, i: 122, l: 122, f: 122.0, d: 122.0, c: z, o: []
i: 8
i: 9
i: 10
l: 9
l: 10
f: 10.0
o is not an instance of String!
o is an instance of java.util.ArrayList!
default case
this try statement is successful
this try statement is successful, until the next statement
this exception should always happen:
java.lang.ArithmeticException: / by zero
        at Test.init(Test.java:123)
        at Test.main(Test.java:136)
Finishing Test.init()
Finishing Main method

decompiled:

Starting Test.init()
i_s: 555
b: true, bt: 5, s: 6, i: 7, l: 8, f: 9.0, d: 10.0, c: z, o: []
b_s: true, bt_s: 5, s_s: 6, i_s: 7, l_s: 8, f_s: 9.0, d_s: 10.0, c_s: z, o_s: []
bytes[4]: 12
bytes.length: 11
fill_data: [Ljava.lang.String;@3ae48e1b
data: bob
data: tom
data: jim
data: tim
b: true, bt: 122, s: 122, i: 122, l: 122, f: 122.0, d: 122.0, c: z, o: []
i: 8
i: 9
i: 10
l: 9
l: 10
f: 10.0
o is not an instance of String!
o is an instance of java.util.ArrayList!
default case
this try statement is successful
this try statement is successful, until the next statement
this exception should always happen:
java.lang.ArithmeticException: / by zero
        at Test.init(Test.java:581)
        at Test.main(Test.java:36)
Finishing Test.init()
Finishing Main method

I’ll be submitting this to the XMLVM project soon, then I’ll release a binary for people to have fun with.

What’s with the loops around switches that shouldn’t be there?

They definitely SHOULD be there; that is how gotos are implemented. Each case in the switch statement is a possible branch target, so with the loop around the switch statement, ‘__next_label = 881; break;’ is equivalent to a goto. When the code reaches its natural end, it returns from the method, breaking out of the loop.
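A minimal sketch of that switch-in-a-loop pattern, assuming label numbers and a ‘__next_label’ variable like the ones described above; the method itself (summing 1..n) and the ‘-1’ entry label are hypothetical, not taken from the actual decompiler output. Setting ‘__next_label’ and breaking out of the switch acts as a goto, straight-line code falls through to the next case, and ‘return’ leaves the loop.

```java
class SwitchGoto {

    // Sums 1..n using label-based gotos instead of a for loop.
    static int sumTo(int n) {
        int __next_label = -1;   // -1 = method entry (hypothetical convention)
        int i = 0;
        int total = 0;
        while (true) {
            switch (__next_label) {
                case -1:                   // entry block
                    i = 1;
                    total = 0;
                    // fall through into the loop header, as straight-line bytecode would
                case 10:                   // loop header: if (i > n) goto 20
                    if (i > n) {
                        __next_label = 20;
                        break;             // leave the switch; the while re-dispatches
                    }
                    total += i;
                    i++;
                    __next_label = 10;     // goto 10
                    break;
                case 20:                   // exit block
                    return total;          // natural end: leaves the loop
                default:
                    throw new IllegalStateException("unknown label " + __next_label);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(sumTo(4));  // 10
    }
}
```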

But you have that construct even where there aren’t any control-flow changes.

Yes, it’s actually in every method. This is basically a direct 1:1 mapping of bytecode to source code, so every method looks and works the same way.

A neat thing to do would be to take this concept of doing ‘gotos’ in source code and modify something like JODE, which works well most of the time, to fall back to this method when it can’t reduce the control flow to loops and such. Then it could even decompile obfuscated code.

It appears the reason “your” so-called “decompiler” is so reliable is that it has nothing to be reliable about.

It gained 500 lines…