PDF Stream Dumper


Author: David Zimmer
Date: 07.21.10 - 7:55pm



This is a free tool for the analysis of malicious PDF documents. This tool has been made possible through the use of a mountain of open source code. Thank you to all of the authors involved.

It has specialized tools for dealing with obfuscated JavaScript, low-level PDF headers and objects, and shellcode. For shellcode analysis, it has an integrated interface for libemu sctest, an updated build of iDefense sclog, and a shellcode_2_exe feature.
JavaScript tools include integration with JS Beautifier for code formatting, the ability to run portions of the script live for deobfuscation, toolbox classes to handle extra canned functionality, and a fairly stable refactoring engine that parses a script and replaces all the screwy random function and variable names with logical, sanitized versions for readability.

The tool also supports unescaping/formatting manipulated PDF headers, and can decode filter chains (multiple filters applied to the same stream object).
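
To make the filter chain idea concrete, here is a rough Python sketch (not the tool's actual code). It assumes the filter names have already been pulled out of the stream's /Filter array and applies them in the listed order, which is the order the PDF spec uses for decoding. Only three filters are covered, and the function names (`decode_chain`, `ascii_hex_decode`, `ascii85_decode`) are made up for this example.

```python
import base64
import binascii
import zlib


def ascii_hex_decode(data):
    """Decode /ASCIIHexDecode data: hex digits, whitespace ignored, '>' ends the data."""
    hex_digits = b"".join(data.split(b">", 1)[0].split())
    if len(hex_digits) % 2:
        # A trailing odd digit is treated as if it were followed by 0.
        hex_digits += b"0"
    return binascii.unhexlify(hex_digits)


def ascii85_decode(data):
    """Decode /ASCII85Decode data, stripping the optional '<~' and the '~>' end marker."""
    body = data.strip()
    if body.startswith(b"<~"):
        body = body[2:]
    if body.endswith(b"~>"):
        body = body[:-2]
    return base64.a85decode(body)


# Filter name -> decoder. Anything not listed here is reported as unsupported.
DECODERS = {
    "FlateDecode": zlib.decompress,
    "ASCIIHexDecode": ascii_hex_decode,
    "ASCII85Decode": ascii85_decode,
}


def decode_chain(data, filters):
    """Apply each filter in the order it appears in the /Filter array."""
    for name in filters:
        if name not in DECODERS:
            raise ValueError("unsupported filter: " + name)
        data = DECODERS[name](data)
    return data
```

For a stream declared as `/Filter [/ASCII85Decode /FlateDecode]`, you would call `decode_chain(raw_stream, ["ASCII85Decode", "FlateDecode"])`.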

Download: PDF Stream Dumper Setup     (Version: 0.9.519)
MD5: 0C90EEA25A2F19DF29E3F145E28676D5
Source code

Supported Platforms: Win2k, XP, Vista, Win7

Update: Stream Parser has finally been optimized and is now 20x faster.

Training videos for PDFStreamDumper:

If you are looking for malicious PDF samples to analyze, make sure to check out the Contagio and jsunpack sites.

Full feature list
  • supported filters: FlateDecode, RunLengthDecode, ASCIIHexDecode, ASCII85Decode, LZWDecode, JBIG2, CCITTFaxDecode, DecodePredictors
  • Integrated shellcode tools:
    • sclog gui (Shellcode Analysis tool I wrote at iDefense)
    • scdbg libemu based Shellcode analysis tool
    • Shellcode_2_Exe functionality
    • Export unescaped bytes to file
  • supports filter chaining (i.e. multiple filters applied to the same stream)
  • supports unescaping encoded pdf headers
  • scriptable interface to process multiple files and generate reports
  • view all pdf objects
  • view deflated streams
  • view stream details such as file offsets, header, etc
  • save raw and deflated data
  • search streams for strings
  • scan for functions which contain pdf exploits (dumb scan)
  • format javascript using js beautifier (see credits in readme)
  • view streams as hex dumps
  • zlib compress/decompress arbitrary files
  • replace/update pdf streams with your own data
  • basic javascript interface so you can run parts of embedded scripts
  • PdfDecryptor w/source - uses iTextSharp and requires .Net Framework 2.0
  • Basic JavaScript deobfuscator
  • can hide: header only streams, duplicate streams, selected streams
  • js ui also has access to a toolbox class to
    • simplify fragmented strings
    • read/write files
    • do hexdumps
    • do unicode safe unescapes
    • disassembler engine
    • replicate some common Adobe API (new)
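
As an illustration of the unescape step in the list above, here is a small Python sketch that turns an escaped shellcode string into raw bytes. It is not the toolbox class itself. It assumes one common convention: `%uXXXX` becomes two bytes in little-endian order and `%XX` becomes one byte. Any literal characters between escapes are assumed to fit in a single byte. The name `unescape_to_bytes` is made up for this example.

```python
import re

# %uXXXX (16-bit unit) or %XX (single byte)
_ESCAPE = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")


def unescape_to_bytes(text):
    """Convert a JavaScript-style escaped string into raw bytes."""
    out = bytearray()
    pos = 0
    for match in _ESCAPE.finditer(text):
        # Literal characters between escapes (assumption: each fits in one byte).
        out += text[pos:match.start()].encode("latin-1")
        if match.group(1):
            # %uXXXX: emit the 16-bit value low byte first.
            out += int(match.group(1), 16).to_bytes(2, "little")
        else:
            # %XX: emit a single byte.
            out.append(int(match.group(2), 16))
        pos = match.end()
    out += text[pos:].encode("latin-1")
    return bytes(out)
```

Working on bytes the whole way avoids any locale or codepage conversion mangling values above 0x7F, which is the usual reason to want a "safe" unescape.
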
Current Automation scripts include:
  • csv_stats.vbs - Builds csv file with results from lower status bar for all files in a directory
  • pdfbox_extract.vbs - use pdfbox to extract all images and text from current file
  • string_scan.vbs - scan all decompressed streams in all files in a directory for a string you enter
  • unsupported_filters.vbs - scan a directory and build list of all pdfs which have unsupported filters
  • filter_chains.vbs - recursively scans parent dir for pdfs that use multiple encoding filters on a stream.
  • obsfuscated_headers.vbs - recursively scans parent dir for pdfs that have obfuscated object headers
  • pdfbox_extract_text_page_by_page.vbs - uses pdfbox to extract page data into individual files
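
For readers wondering what an obfuscated object header looks like: PDF names can hide characters behind `#xx` hex escapes, so `/J#61vaScript` is really `/JavaScript`. The sketch below is Python, not one of the shipped .vbs scripts, and it simply flags every name in a header that contains a hex escape. That is a dumb check, since legitimate PDFs occasionally escape characters too. The function names are made up for this example.

```python
import re

# A name is '/' followed by characters up to whitespace or a PDF delimiter.
_NAME = re.compile(rb"/([^\s/<>\[\](){}%]+)")
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def unescape_name(name):
    """Replace each #xx escape in a PDF name with the byte it stands for."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), name)


def find_obfuscated_names(header):
    """Return (raw, decoded) pairs for every name that contains a hex escape."""
    hits = []
    for match in _NAME.finditer(header):
        raw = match.group(1)
        decoded = unescape_name(raw)
        if decoded != raw:
            hits.append((raw, decoded))
    return hits
```

Running `find_obfuscated_names` over each object header and reporting the files with any hits gives roughly the kind of list the obsfuscated_headers.vbs description talks about.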

Current Plugins include:
  • Build_DB.dll - Search and sort data inside multiple samples, move and organize files
  • obj_browser.dll - view layout and data inside pdf in text form


Credits:
---------------------------
stream parser was written by VBboy136 - 12/9/2008
http://www.codeproject.com/KB/DLL/PDF2TXTVB.aspx
	
Scintilla by Neil Hodgson [neilh@scintilla.org] 
http://www.scintilla.org/

ScintillaVB by Stu Collier
http://www.ceditmx.com/software/scintilla-vb/

AS3 Sorcerer Trial provided courtesy of Manitu Group. 
http://www.as3sorcerer.com/

JS Beautify by Einar Lielmanis, conversion to JavaScript code by Vital
http://jsbeautifier.org/
	
zlib.dll by Jean-loup Gailly and Mark Adler
http://www.zlib.net/
	
CRC32 code by Steve McMahon
http://www.vbaccelerator.com/home/vb/code/libraries/CRC32/article.asp
	
iTextDecode/iTextFilters use iTextSharp by Bruno Lowagie and Paulo Soares
http://itextpdf.com/terms-of-use/index.php
	
olly.dll GPL code Copyright (C) 2001 Oleh Yuschuk.
http://home.t-online.de/home/Ollydbg/
http://sandsprite.com/CodeStuff/olly_dll.html
	
MuPDF is released under GPL and Copyright 2006-2012 Artifex Software, Inc.
http://www.mupdf.com/

CCITTFaxDecoder copyright Sun Microsystems and intarsys consulting GmbH.
http://java.net/projects/pdf-renderer/
  
libemu written by Paul Baecher and Markus Koetter 2007.	
http://libemu.carnivore.it/about.html

scdbg homepage
http://sandsprite.com/blogs/index.php?uid=7&pid=152
	
sclog is a tool I wrote back at iDefense (no longer available on their site)
https://github.com/dzzie/sclog

Interface by dzzie@yahoo.com 
http://sandsprite.com
	
Other thanks to Didier Stevens for the info on his blog on tags and encodings.
http://blog.didierstevens.com/2008/04/29/pdf-let-me-count-the-ways

Comments: (11)

On 08.20.10 - 2:16pm Dave wrote:
lotta noise searching for pdf decrypter source..here are some of the more interesting links:

On 08.21.10 - 4:43pm Dave wrote:
Reading the iText source is good info too, but it's mammoth. Turns out if only the owner password is set, you can use iText to make a copy of the pages and transfer them into a new pdf so they are not encrypted anymore, and then this can parse them again. Also I think the luckysploit pdf exploits use what may be a malformed pdf. It is owner password encrypted, but has no password set. I could not create this condition in Acrobat Pro 7.x anyway.

On 12.05.10 - 6:55am Dave wrote:
0.9.125 is out, bugfix release. Had to do some fixups in sclog to clean up output and make sure hooks for UrlDownloadToFile were being installed correctly. A couple of small usability additions were added to the main exe as well.

New feature: If javascript is broken up across multiple streams, you can Ctrl-select the streams and hit the JS_UI menu item and it will grab them all and put them into the JS ui together. Also did a bunch of small bug fixes. Forcing all FlateDecode through zlib for now. It is noticeably slower on some files, but the iText FlateDecode was causing unexpected crashes on long automation scripts.

On 12.19.10 - 1:02pm dave wrote:
We now have basic support for things like app.doc.getAnnots, app.info.title etc. There are some quirks so you might have to play with it some. I didn't really want to add this feature because it's problematic and will spoil your manual skills, but it is useful and can speed things up a lot. Just don't expect it to be perfect.

On 01.08.11 - 7:33am Dave wrote:
Quick new usability feature: you can drag and drop files onto the desktop icon to launch them. 3 actions based on file type:
  • js/vbs files are treated as automation scripts
  • .sc file is treated as shellcode and loaded in escaped format in the js form ready for analysis
  • any other file is considered a pdf and it tries to load it as such.

On 12.27.11 - 5:52pm Dave wrote:
Remember, if you get a script that uses a = "string"; a[x] to access individual characters, you have to replace it with a.charAt(x) or do a = "string".split("") to explicitly turn it into an array for the MS Script control to work with it. Bug fix today dealing with the escaping of headers which contain JS scripts in them, and added the ability to override the default connection string in the sample db plugin.

On 01.19.12 - 1:29am Marc wrote:
I just want to have a portable version of that tool. While investigating an infected system I like using software ad hoc, no installing and so on. Some malware crashes installation, some change the code on the fly, etc. Would be nice to see a portable version in the future. Marc

On 01.19.12 - 3:53am Dave wrote:
Hi Marc, that's an interesting idea. It makes heavy use of COM object components so at a minimum you would have to run a batch script to register those first (add to registry). I will whip up a minimal binary install batch file today and post here. Some packers/binders can actually build the target dlls into the main executable, but I am not sure if they support COM objects as well. That would be powerful kungfu.

If the system is whacked enough that you can't install stuff, I usually boot off a live CD and go into file retrieval mode to analyze the files on a clean analysis system. I never trust an infected system to be stable or tell me the truth.

Also, many malicious PDFs will delete themselves as part of the infection routine, replacing the exploit file with a clean copy. Finding the actual infection vector will require tracing it back to the source, such as the URL of the pdf in web browser history or an email attachment, and re-downloading the file.

On 01.19.12 - 10:34pm Dave wrote:
command line option added to extract objects such as flash, fonts, prc, u3d

usage:
pdfstreamdumper "c:\file.pdf" /extract "c:\folder"

On 06.06.12 - 7:26pm Dave wrote:
Not real happy with the builtin decrypter functionality using iTextSharp. I will probably remove it soon and add a plugin to launch one of these:

Online PDF Decrypters:

On 06.24.13 - 1:16pm Bill Orvis wrote:
I have found this not only useful for examining PDFs but also for malicious web pages containing javascript, such as blackhole. Just drag the .html page into the input, switch to the java editor and edit out the html. Works great.

 