Sunday, July 11, 2010

Analysis of a malformed and exploited PDF

Hi
today I'm going to analyze an infected PDF which allows Acrobat exploitation :)

The file target is called soreheadprattler.pdf
md5: AF485196F31F66B07D87E63DFCA41239
At moment when I'm writing, referring to Virustotal, PDF is detected by 29.27% of AV ( 12/41 ), to be honest, very low rate to the potential of the exploit in question. This PDF, using the Sophos nomenclatur, is identified as Troj/PDFJs-LJ

Let's go to analyze the PDF.

First of all I take this opportunity to thank my friend Daniel for giving me the opportunity to act as tester, being still under development. Thx =)
The tool in question is PDF Insider, for more info visit ntcore.com.

Opening the PDF file in PDF Insider we immediately notice a malformation.



We warned of an unresolved xref. The xref keyword ( Cross-reference ) in PDF format are used to search the objects, in fact for this problem we have no object apparently, but this is not a problem because PDF Insider provides us special functions for finding objects to solve these mishaps ;).
In fact clicking on Detect Object we get 4 objects: 1.0, 2.0, 3.0 and 4.0.
Here a screenshot:



Each object may contain the JS code and / or compressed Stream. Of course in our case being merely 4 would not be a problem to go through each object and check for interesting content, but if it was a pdf with many object was a real suicide, unless you are masochistic :P.

PDF Insider intervenes again to our aid, showing what object or stream contains JavaScript code.



We can see that the 4.0 object contains both Stream ( compressed ) and Javascript code, as well as being the only one!
But let's go to see what's interesting inside it:



We note immediately between the Info, interesting Filters:

- LZWDecode: This indicats that data are compressed, as I said before. LZW ( Lempel-Ziv-Welsh ) is more used as a data compression algorithm in PDF;
- ASCII85Decode: Other encryption algorithm, also known as Base85 encoding used for communication protocols;
- ASCIIHexDecode: Decodes data encoded in an ASCII hexadecimal
representation, reproducing the original binary data;
- RLE: The RLE ( Run Length Decode ) decompresses data encoded using a byte-oriented run-length encoding algorithm, reproducing the original text or binary data.

Now that we have this info we can also do proper analysis of 4.0 Object. PDF Insider supports LZW algorithm and thus is able to decompress it easily to show the contents of the Stream:



What is immediately evident is the declaration of a variable, specifically named B0b. Skip to the eye because it contains a very long string. But scrolldown to see how this variable is used!

As I thought! It is used in a function that operates a character replacement. It's easy to see that there are many "@" and indeed this character will be replaced by another. Better explain the whole, below the rest of the code:



First are declared some variables. At z variable is assigned the value app.doc which is then chained to complete the function with syncAnnotScan().
Immediatly below B0b varaible is worked. BOb.replace (/ @ / g, String.fromCharCode (32-1 +6) makes a global research (-> /g) throughout the data block to find "@" char and then replace it by the function String.fromCharCode () with the symbol related to hex code 37 (32-1 +6 == 37) that corresponds to the symbol "%". Well, we obtein a new data block:



Before I mentioned app.doc and syncAnnotScan so now I report the explanation from Adobe documentation:

- app: The app object is a static object that represents the Acrobat application itself. It offers a
number of Acrobat-specific functions in addition to a variety of utility routines and
convenience functions.

- doc: The doc object is the primary interface to the PDF document, and it can be used to access
and manipulate its content. The doc object provides the interfaces between a PDF
document open in the viewer and the JavaScript interpreter.

- syncAnnotScan: The syncAnnotScan method guarantees that all annotations in the documents are scanned.

Once we've done all, we find the classic eval () function and inside the unescape() function.
First of all through the unescape function data block which we talked about before is decoded getting the horrible javascript code and then run through eval () function, so oN ().

In the next post I will explain how functions in the javascript code, which we got after these simple steps, are used to exploit vulnerabilities in various versions of Adobe.

Bye, see you in the next post. =)

No comments:

Post a Comment