Deobfuscating Datalife
So I was digging through whatCMS.org looking for proprietary PHP code to practice deobfuscation on, as one does, and I came across an obscure CMS called Datalife Engine. They offer a free demo on their website, which you can download as a zip archive.
upload
├── admin.php
├── backup
│   └── index.html
├── cron.php
├── engine
│   ├── ajax
│   │   ├── addcomments.php
│   │   ├── adminfunction.php
│   │   ├── allvotes.php
│   │   ├── antivirus.php
│   │   ├── bbcode.php
│   │   ├── calendar.php
│   │   ├── clean.php
│   │   ├── comments.php
│   │   ├── commentssubscribe.php
│   │   ├── complaint.php
│   │   ├── controller.php
│   │   ├── deletecomments.php
│   │   ├── editcomments.php
│   │   ├── editnews.php
│   │   ├── favorites.php
│   │   ├── feedback.php
│   │   ├── find_relates.php
│   │   ├── find_tags.php
│   │   ├── keywords.php
etc..
1408 directories, 3142 files
Aside from some instructions on how to set it up, every PHP file in the archive looks like this:
<?php
/*
=====================================================
DataLife Engine - by SoftNews Media Group
-----------------------------------------------------
http://dle-news.ru/
-----------------------------------------------------
Copyright (c) 2004,2021 SoftNews Media Group
=====================================================
*/
?><?php $_F=__FILE__;$_X='[KMy45Y3MzX3swY1s5Wygisuperlongstring]..';
$_D=strrev('edoced_46esab');eval($_D('[PS5Fey1sczM5R3NrOmEvc1anotherlongstring]'));?>
Clearly some intricate bootstrapping is going on here and I had to find out how it works. This thing almost presents itself as a fun CTF beginner challenge.
Working with what we have⌗
__FILE__
A PHP magic constant that holds the full path and filename of the current file.
strrev('edoced_46esab')
strrev reverses a string in PHP, so when given the argument 'edoced_46esab' it returns base64_decode.
base64_decode
Literally decodes a base64 encoded string.
eval
A PHP language construct that evaluates a string as PHP code.
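To make those pieces concrete, here is a minimal Python sketch of the same two primitives. The encoded string is a hypothetical stand-in built on the spot, not the real argument from the file.

```python
import base64

# 'edoced_46esab' reversed is the name of PHP's decoder function.
func_name = "edoced_46esab"[::-1]
print(func_name)

# Hypothetical stand-in for the encoded argument that gets passed to eval().
encoded = base64.b64encode(b"$_X=base64_decode($_X);").decode()
decoded = base64.b64decode(encoded).decode()
print(decoded)
```

In the obfuscated file, the decoded string is then handed straight to eval.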
Grabbing and decoding the arguments to eval⌗
Given these pieces, my first guess was that we need to base64-decode the two strings and then evaluate the result with a PHP interpreter to learn more. There are two obviously encoded strings here; I started with the second one because it was shorter. Here’s how I extracted that string using some Python regex magic.
import re, base64, subprocess, sys

def main(file_path = sys.argv[1]):
    decode_shit(file_path)

def decode_shit(file_path):
    with open(file_path) as f:
        text = f.readlines()
    for i in text:
        if "eval" in i:
            # grabs only the base64 encoded string
            eval_str = re.search(r"eval\(\$_D\('(.*)'", i)
            eval_str = eval_str.group(1)
            # decodes this base64 encoded string
            eval_str = base64.b64decode(eval_str).decode()
            print(eval_str)

if __name__ == '__main__':
    main()
Formatted Output
<?php
$_X = base64_decode($_X);
$_X = strtr(
$_X,
'.<sd[GNU {cyPpZiT/w}D6RoO4zKjvx3]>V
grWh7JS92lank=AmC01MIuFqEHQ8bYBef5XtL',
'c3o=sr.ALeiHy68]Gpm[Kf{BJ5bFNT9nPVQk<Ij0>u
tYC UlaOESxMqv47zhRWwd/g}XZD21'
);
$_R = str_replace('__FILE__', "'" . $_F . "'", $_X);
eval($_R);
$_R = 0;
$_X = 0;
This reveals that, when evaluated, our string will base64-decode $_X as seen in the original file, then run PHP’s builtin strtr on $_X using seemingly random characters as its arguments.
strtr(), as per PHP’s documentation:
strtr — Translate characters or replace substrings
My guess is that this is similar to the Unix tr (translate characters) command, which replaces each character in one set with the character at the same position in another set:
ex:
echo 'notjoemartinez' | tr 'oat' '0@+'
output
n0+j0em@r+inez
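Python's str.translate does the same character-for-character mapping, which is also how strtr behaves when it is called with two equal-length strings. A quick sketch of the example above:

```python
# Same substitution as: echo 'notjoemartinez' | tr 'oat' '0@+'
table = str.maketrans("oat", "0@+")
result = "notjoemartinez".translate(table)
print(result)
```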
In order to verify this, I grabbed the base64-encoded string from $_X using a regex, then piped it to the command-line decoder.
var_x = re.search(r"__FILE__;\$_X='(.*)';\$", i)
var_x = var_x.group(1)
print(var_x)
python3 decode_code.py antibot.php | base64 -d
output
SaKck{:a=39czs9N/E/
S-----------------------------------------------------
San[{:alU]vlyU
Sddddddddddddddddddddddddddddddddddddddddddddddddddddd
S*Y
S
S
S#aDlU]vlyUa]HAOmlva>mHCrAjatNh
S
S#aUJ9sw=9c.a9{[9a9sa9{kka.sw/J9{G[a=3baEJw=3[a=/=G9
S
S#als/PGcBE9azPaDGJBksIaC{GB{c,athhp,athhF,athhZ,athLL
S#a888N.=/9.E=NGJ,a888N
and so on...
But if we pipe that decoded output to the equivalent tr command, newline characters included, we get this:
python3 decode_code.py antibot.php | base64 -d | tr '.<sd[GNU {cyPpZiT/w}D6RoO4zKjvx3]>V
grWh7JS92lank=AmC01MIuFqEHQ8bYBef5XtL' 'c3o=sr.ALeiHy68]Gpm[Kf{BJ5bFNT9nPVQk<Ij0>u
tYC UlaOESxMqv47zhRWwd/g}XZD21'
output
?><?php
/*
=====================================================
DataLife Engine - by SoftNews Media Group
-----------------------------------------------------
http://dle-news.ru/
-----------------------------------------------------
Copyright (c) 2004-2019 SoftNews Media Group
=====================================================
This code is protected by copyright
=====================================================
File: antibot.php
-----------------------------------------------------
Use: CAPTCHA
=====================================================
*/
# KCAPTCHA PROJECT VERSION 2.0
# Automatic test to tell computers and humans apart
and so on...
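The same two steps, base64 decoding followed by a character translation, could also be done without the shell. Below is a minimal sketch in Python. It assumes toy translation tables ('abc' mapped to 'bca') rather than the real ones, and decode_payload is a hypothetical helper name. For a real file, src and dst would be the two strtr arguments copied exactly, embedded newline included.

```python
import base64

def decode_payload(payload_b64: str, src: str, dst: str) -> str:
    """Base64-decode the payload, then map each character in src to dst."""
    scrambled = base64.b64decode(payload_b64).decode("latin-1")
    return scrambled.translate(str.maketrans(src, dst))

# Toy tables for illustration, NOT the real ones from the bootstrap code.
src = "abc"
dst = "bca"
plain = "<?php echo 'cab';"

# Build a payload the way an obfuscator might: inverse mapping, then base64.
scrambled = plain.translate(str.maketrans(dst, src))
payload = base64.b64encode(scrambled.encode("latin-1")).decode()

recovered = decode_payload(payload, src, dst)
print(recovered)
```

Because the two tables are permutations of the same characters, swapping them undoes the mapping, which is why the obfuscation is trivially reversible once the tables are known.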
$_R⌗
I’m still stumped on what this last part does:
$_R = str_replace('__FILE__', "'" . $_F . "'", $_X);
eval($_R);
$_R = 0;
$_X = 0;
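My first guess was that it overwrites the file's contents with the decoded version of itself, but if that were the case, what would be the point of obfuscating the file? Read literally, though, str_replace never touches the disk: it only swaps every literal __FILE__ in the decoded string for the quoted path saved in $_F, probably so that the code run through eval still sees the original file's path. eval($_R) then executes the decoded source, and both variables are zeroed out afterwards. I’ve yet to actually run the startup script, so I can't confirm this.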
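For what it's worth, here is what that str_replace call amounts to, sketched in Python. Both the decoded source and the path are hypothetical values made up for illustration.

```python
# Hypothetical decoded source and file path, for illustration only.
decoded_source = "<?php include dirname(__FILE__) . '/config.php';"
original_path = "/var/www/upload/engine/antibot.php"

# Python equivalent of: str_replace('__FILE__', "'" . $_F . "'", $_X)
patched = decoded_source.replace("__FILE__", "'" + original_path + "'")
print(patched)
```

A plausible reason for this step is that __FILE__ inside eval'd code does not resolve to the plain path of the file on disk, so the bootstrap hard-codes the path before evaluating.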
In my attempts to see what this thing does, I realized that I was using Python to write and execute PHP… cool?
eval_str = re.search(r"eval\(\$_D\('(.*)'", i)
eval_str = eval_str.group(1)
eval_str = base64.b64decode(eval_str).decode()
var_x = re.search(r"__FILE__;\$_X='(.*)';\$", i)
var_x = var_x.group(1)
# temp file named after the input, e.g. tmp_antibot_decoded.php (needs `import os`)
temp_dcfp = f"tmp_{os.path.basename(file_path)[:-4]}_decoded.php"
# "w" rather than "a", so a re-run doesn't append a second copy
with open(temp_dcfp, "w") as f:
    f.write("<?php\n")
    f.write("$_F=__FILE__;")
    f.write("$_X='{}';".format(var_x))
    f.write(eval_str)
    f.write("echo $_R;")
    f.write("\n?>")
subprocess.run(["php", "-f", temp_dcfp])
This results in the same output as the shell pipeline I showed earlier, but with a traceback referencing a bunch of undefined global variables.
I refactored the script one more time to remove the confusing $_R statements and redirected the output to a _decoded.php file. Here’s what that looks like.
import re, base64, os, subprocess, sys

def main(file_path = sys.argv[1]):
    decode_shit(file_path)

def decode_shit(file_path):
    with open(file_path) as f:
        text = f.readlines()
    for i in text:
        if "eval" in i:
            # base64 argument of eval($_D('...')): the bootstrap code
            eval_str = re.search(r"eval\(\$_D\('(.*)'", i)
            eval_str = eval_str.group(1)
            eval_str = base64.b64decode(eval_str).decode()
            # keep only the part before the $_R statements
            eval_str = re.search(r"^(.*)\$_R=str_rep", eval_str, re.DOTALL)
            eval_str = eval_str.group(1)
            # base64 payload assigned to $_X
            var_x = re.search(r"__FILE__;\$_X='(.*)';\$", i)
            var_x = var_x.group(1)
            # temporary PHP file that decodes $_X and echoes it
            temp_dcfp = f"tmp_{os.path.basename(file_path)[:-4]}_decoded.php"
            # "w" rather than "a", so a re-run doesn't append a second copy
            with open(temp_dcfp, "w") as f:
                f.write("<?php\n")
                f.write("$_X='{}';".format(var_x))
                f.write(eval_str)
                f.write("echo $_X;")
                f.write("\n?>")
            # real decoded filepath
            real_dcfp = f"{file_path[:-4]}_decoded.php"
            # run PHP without a shell and write its stdout to the decoded file
            with open(real_dcfp, "w") as out:
                subprocess.run(["php", "-f", temp_dcfp], stdout=out)

if __name__ == '__main__':
    main()
Part of the output file⌗
class KCAPTCHA{
    // generates keystring and image
    function __construct(){
        $alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"; # do not change without changing font files!
        # symbols used to draw CAPTCHA
        //$allowed_symbols = "0123456789"; #digits
        $allowed_symbols = "23456789abcdegikpqsvxyz"; #alphabet without similar symbols (o=0, 1=l, i=j, t=f)
    }
As a side note, the code this obfuscation was protecting is part of the open-source and weirdly spelled Kcaptcha project. It generates super weak “Completely Automated Public Turing test to tell Computers and Humans Apart” tests. I’ve always wanted to train a captcha solver and this might be a good start.