This is a free tool for the analysis of malicious PDF documents. This tool has been made possible through the use of a mountain of open source code. Thank you to all of the authors involved.
It has specialized tools for dealing with obfuscated JavaScript, low-level PDF headers and objects, and shellcode. For shellcode analysis, it has an integrated interface for libemu sctest, an updated build of iDefense sclog, and a shellcode_2_exe feature.
JavaScript tools include integration with JS Beautifier for code formatting, live script debugging, toolbox classes to handle extra canned functionality, and a pretty stable refactoring engine that parses a script and replaces all the screwy random function and variable names with logical, sanitized versions for readability.
The tool also supports unescaping/formatting manipulated PDF headers, as well as decoding filter chains (multiple filters applied to the same stream object).
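As a rough illustration of what decoding a filter chain means, here is a minimal Python sketch (not the tool's own code, which is VB6). It assumes a stream whose /Filter array lists ASCIIHexDecode followed by FlateDecode, and undoes them in the listed order. The names `DECODERS` and `decode_chain` are hypothetical, and only two filters are handled.

```python
import binascii
import zlib

# Whitespace bytes that PDF allows (and ignores) inside ASCIIHex data.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def ascii_hex_decode(data: bytes) -> bytes:
    """Decode ASCIIHexDecode data: hex digits, whitespace ignored, '>' ends it."""
    body = data.split(b">", 1)[0]
    digits = bytes(b for b in body if b not in PDF_WHITESPACE)
    if len(digits) % 2:
        digits += b"0"  # an odd trailing digit is treated as if followed by 0
    return binascii.unhexlify(digits)


def flate_decode(data: bytes) -> bytes:
    """Decode FlateDecode data (zlib/deflate)."""
    return zlib.decompress(data)


# Hypothetical lookup table; a real tool would support many more filters.
DECODERS = {
    "ASCIIHexDecode": ascii_hex_decode,
    "FlateDecode": flate_decode,
}


def decode_chain(data: bytes, filters: list) -> bytes:
    """Apply each decoder in the order the filters are listed on the stream."""
    for name in filters:
        if name not in DECODERS:
            raise ValueError("unsupported filter: " + name)
        data = DECODERS[name](data)
    return data


if __name__ == "__main__":
    script = b"app.alert(1);"
    stream = binascii.hexlify(zlib.compress(script)) + b">"
    print(decode_chain(stream, ["ASCIIHexDecode", "FlateDecode"]))
```

Keeping the decoders in a table makes it easy to report an unsupported filter by name instead of failing silently partway through the chain.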
Note: .NET v2/3 is required for some filters to work. Add/Remove Programs -> Install Optional Components. If you are on a domain you might get a stupid error when trying to install it.
Supported Platforms: WinXP+
If you are looking for malicious pdf samples to analyze make sure to check out the Contagio and jsunpack sites.
supports filter chaining (i.e. multiple filters applied to the same stream)
supports unescaping encoded pdf headers
scriptable interface to process multiple files and generate reports
view all pdf objects
view deflated streams
view stream details such as file offsets, header, etc
save raw and deflated data
search streams for strings
scan for functions which contain pdf exploits (dumb scan)
format javascript using js beautifier (see credits in readme)
view streams as hex dumps
zlib compress/decompress arbitrary files
replace/update pdf streams with your own data
basic javascript interface so you can run parts of embedded scripts + support for using MS Script Debugger
PdfDecryptor w/source - uses iTextSharp and requires .Net Framework 2.0
Basic Javascript de-obfuscator
can hide: header only streams, duplicate streams, selected streams
js ui also has access to a toolbox class to:
simplify fragmented strings
read/write files
do hexdumps
do unicode safe unescapes
disassembler engine
replicate some common Adobe API (new)
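To give a feel for two of the toolbox helpers listed above, here is a small Python sketch of the same ideas. It is not the toolbox class itself, and both function names are hypothetical. It assumes "simplify fragmented strings" means joining adjacent string literals such as "ev" + "al", and that a "unicode safe unescape" turns %uXXXX and %XX escapes into raw bytes, with the low byte of each %uXXXX unit first.

```python
import re

# Matches the closing quote, '+', and opening quote between two literals.
_FRAGMENT = re.compile(r"""(["'])\s*\+\s*\1""")
_ESCAPE = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")


def simplify_fragmented_strings(source: str) -> str:
    """Join adjacent string literals: "ev" + "al" becomes "eval".

    Naive on purpose: it ignores escaped quotes and would also mangle a
    literal that itself contains the text '" + "'.
    """
    return _FRAGMENT.sub("", source)


def unescape_to_bytes(text: str) -> bytes:
    """Convert %uXXXX / %XX escapes to bytes without any codepage conversion."""
    out = bytearray()
    pos = 0
    while pos < len(text):
        match = _ESCAPE.match(text, pos)
        if match is None:
            # Plain character: keep it as a single byte where possible.
            out += text[pos].encode("latin-1", errors="replace")
            pos += 1
        elif match.group(1) is not None:
            # %uXXXX: one 16-bit unit, stored low byte first.
            out += int(match.group(1), 16).to_bytes(2, "little")
            pos = match.end()
        else:
            # %XX: a single byte.
            out.append(int(match.group(2), 16))
            pos = match.end()
    return bytes(out)


if __name__ == "__main__":
    print(simplify_fragmented_strings('x = "ev" + "al";'))
    print(unescape_to_bytes("%u9090%ucc41%41B").hex())
```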
Current Automation scripts include:
csv_stats.vbs - Builds csv file with results from lower status bar for all files in a directory
pdfbox_extract.vbs - use pdfbox to extract all images and text from current file
string_scan.vbs - scan all decompressed streams in all files in a directory for a string you enter
unsupported_filters.vbs - scan a directory and build list of all pdfs which have unsupported filters
filter_chains.vbs - recursively scans parent dir for pdfs that use multiple encoding filters on a stream.
obsfuscated_headers.vbs - recursively scans parent dir for pdfs that have obfuscated object headers
pdfbox_extract_text_page_by_page.vbs - uses pdfbox to extract page data into individual files
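For readers wondering what an obfuscated object header looks like, here is a minimal Python sketch. It assumes the obfuscation is the #xx hex escaping that PDF allows inside names (for example /J#61vaScript for /JavaScript); that is one common trick, and the tool may well handle others. The function names are hypothetical and this is not how obsfuscated_headers.vbs is implemented.

```python
import re

# A '#' followed by two hex digits stands for one byte inside a PDF name.
_NAME_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def unescape_pdf_name(header: bytes) -> bytes:
    """Replace every #xx escape with the byte it encodes."""
    return _NAME_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), header)


def is_obfuscated(header: bytes) -> bool:
    """Treat a header as obfuscated if unescaping changes it at all."""
    return unescape_pdf_name(header) != header


if __name__ == "__main__":
    header = b"<< /Filter /Fl#61teDecode /S /J#61v#61Script >>"
    print(is_obfuscated(header))
    print(unescape_pdf_name(header))
```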
Current Plugins include:
Build_DB.dll - Search and sort data inside multiple samples, move and organize files
obj_browser.dll - view layout and data inside pdf in text form
Credits:
---------------------------
stream parser was written by VBboy136 - 12/9/2008
http://www.codeproject.com/KB/DLL/PDF2TXTVB.aspx
Scintilla by Neil Hodgson [neilh@scintilla.org]
http://www.scintilla.org/
ScintillaVB by Stu Collier
http://www.ceditmx.com/software/scintilla-vb/
AS3 Sorcerer Trial provided courtesy of Manitu Group.
http://www.as3sorcerer.com/
JS Beautify by Einar Lielmanis, conversion to Javascript code by Vital
http://jsbeautifier.org/
zlib.dll by Jean-loup Gailly and Mark Adler
http://www.zlib.net/
CRC32 code by Steve McMahon
http://www.vbaccelerator.com/home/vb/code/libraries/CRC32/article.asp
iTextDecode/iTextFilters use iTextSharp by Bruno Lowagie and Paulo Soares
http://itextpdf.com/terms-of-use/index.php
olly.dll GPL code Copyright (C) 2001 Oleh Yuschuk.
http://home.t-online.de/home/Ollydbg/
http://sandsprite.com/CodeStuff/olly_dll.html
MuPDF is released under GPL and Copyright 2006-2012 Artifex Software, Inc.
http://www.mupdf.com/
CCTIFaxDecoder copyright Sun MicroSystems and intarsys consulting GmbH.
http://java.net/projects/pdf-renderer/
libemu written by Paul Baecher and Markus Koetter 2007.
http://libemu.carnivore.it/about.html
scdbg homepage
http://sandsprite.com/blogs/index.php?uid=7&pid=152
sclog is a tool I wrote back at iDefense (no longer available on their site)
https://github.com/dzzie/sclog
Interface by dzzie@yahoo.com
http://sandsprite.com
WinGraphViz OOD Tsen oodtsen@gmail.com
http://wingraphviz.sourceforge.net/wingraphviz/index.htm
GraphViz - AT&T Labs
http://graphviz.org/
Other thanks to Didier Stevens for the info on his blog on tags and encodings.
http://blog.didierstevens.com/2008/04/29/pdf-let-me-count-the-ways
Comments: (17)
On 08.20.10 - 2:16pm Dave wrote:
lotta noise searching for pdf decrypter source..here are some of the more interesting links:
Reading the iText source is good info too, but it's mammoth. Turns out if only the owner password is set, you can use iText to make a copy of the pages and transfer them into a new pdf so they are not encrypted anymore, and then this tool can parse them again. Also I think the luckysploit pdf exploits use what may be a malformed pdf. It is owner password encrypted, but has no password set. I could not create this condition in Acrobat Pro 7.x anyway.
On 12.05.10 - 6:55am Dave wrote:
0.9.125 is out, bugfix release..had to do some fixups in sclog to clean up output and make sure hooks for UrlDownloadToFile were being installed correctly. Couple small usability additions were added to main exe as well.
New feature: If javascript is broken up across multiple streams, you can control-select the streams and hit the JS_UI menu item and it will grab them all and put them into the JS ui together..also did a bunch of small bug fixes. Forcing all FlateDecode through zlib for now..noticeably slower on some files, but the iText FlateDecode was causing unexpected crashes on long automation scripts.
On 12.19.10 - 1:02pm dave wrote:
we now have basic support for things like app.doc.getAnnots, app.info.title etc. There are some quirks so you might have to play with it some. I didn't really want to add this feature because it's problematic and will spoil your manual skills..but it is useful and can speed things up a lot. Just don't expect it to be perfect.
On 01.08.11 - 7:33am Dave wrote:
Quick new usability feature: you can drag and drop files onto the desktop icon to launch them. There are 3 actions based on file type:
js/vbs files are treated as automation scripts
.sc file is treated as shellcode and loaded in escaped format in the js form ready for analysis
any other file is considered a pdf and it tries to load it as such.
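The three drag-and-drop rules above boil down to a dispatch on the file extension. A minimal Python sketch of that decision follows; the function name and the returned labels are made up for illustration, and the real tool is a VB6 application.

```python
from pathlib import Path


def classify_dropped_file(path: str) -> str:
    """Return the action for a dropped file, based only on its extension."""
    ext = Path(path).suffix.lower()
    if ext in (".js", ".vbs"):
        return "automation_script"  # run as an automation script
    if ext == ".sc":
        return "shellcode"  # load into the js form in escaped format
    return "pdf"  # anything else: try to load it as a pdf


if __name__ == "__main__":
    for name in ("csv_stats.vbs", "payload.sc", "sample.pdf", "noext"):
        print(name, "->", classify_dropped_file(name))
```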
On 12.27.11 - 5:52pm Dave wrote:
Remember, if you get a script that uses a = "string"; a[x] to access individual characters, you have to replace it with a.charAt(x) or do a = "string".split("") to explicitly turn it into an array for the MS Script control to work with it. Bug fix today dealing with the escaping of headers which contain JS scripts in them, and added the ability to override the default connection string in the sample db plugin.
On 01.19.12 - 1:29am Marc wrote:
I just want to have a portable version of that tool. While investigating an infected system I like using software ad hoc, no installing and so on. Some malware crashes installation, some change the code on the fly etc...
nice to see a portable version in the future
Marc
On 01.19.12 - 3:53am Dave wrote:
Hi Marc, that's an interesting idea. It makes heavy use of COM object components so at a minimum you would have to run a batch script to register those first (add to registry). I will whip up a minimal binary install batch file today and post here. Some packers/binders can actually build the target dlls into the main executable, but I am not sure if they support COM objects as well. That would be powerful kungfu.
If the system is whacked enough that you can't install stuff, I usually boot off a live CD and go into file retrieval mode to analyze the files on a clean analysis system. I never trust an infected system to be stable or tell me the truth.
Also many malicious PDFs will delete themselves as part of the infection routine, replacing the exploit file with a clean copy. Finding the actual infection vector will require tracing it back to the source, such as the URL of the pdf in web browser history or an email attachment, and re-downloading the file.
On 01.19.12 - 10:34pm Dave wrote:
command line option added to extract objects such as flash, fonts, prc, u3d
I have found this not only useful for examining PDFs but also for malicious web pages containing javascript, such as blackhole. Just drag the .html page into the input, switch to the javascript editor and edit out the html. Works great.
Note: I have never been impressed with the MS script debugging capabilities and it doesn't always connect. Sometimes I have to have a VS instance open for it to work. (Probably requires mdm.exe running in background, maybe I will have to manage its startup/exit)
I am also doing some experiments on how to replace the MS script engine entirely in favor of a more standards compliant one. I am currently exploring both QtScript as well as Duktape. Both of these support integrated debugging capabilities and with some more elbow grease can be used with VB6.