PDF Stream Dumper

Author: David Zimmer
Date: 07.21.10 - 7:55pm

This is a free tool for the analysis of malicious PDF documents. This tool has been made possible through the use of a mountain of open source code. Thank you to all of the authors involved.

It has specialized tools for dealing with obfuscated JavaScript, low-level PDF headers and objects, and shellcode. For shellcode analysis, it has an integrated interface for libemu sctest, an updated build of iDefense sclog, and a shellcode_2_exe feature.
JavaScript tools include integration with JS Beautifier for code formatting, live script debugging, toolbox classes to handle extra canned functionality, and a fairly stable refactoring engine that parses a script and replaces the random function and variable names with logical, sanitized versions for readability.

The tool also supports unescaping and formatting manipulated PDF headers, and it can decode filter chains (multiple filters applied to the same stream object).
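
To make the filter chain idea concrete, here is a minimal Python sketch. It is not the tool's own code (the stream parser here is VB6 based); it only shows the general technique of applying the filters in the order they are listed. The function names and the DECODERS table are made up for illustration, and only three of the supported filters are covered.

```python
import binascii
import zlib


def ascii_hex_decode(data: bytes) -> bytes:
    """ASCIIHexDecode: hex digits, whitespace ignored, '>' marks end of data."""
    digits = bytes(b for b in data.split(b">", 1)[0] if not chr(b).isspace())
    if len(digits) % 2:              # an odd trailing digit is padded with 0
        digits += b"0"
    return binascii.unhexlify(digits)


def run_length_decode(data: bytes) -> bytes:
    """RunLengthDecode: length byte followed by a literal run or a repeated byte."""
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        if length == 128:            # end-of-data marker
            break
        if length < 128:             # copy the next length+1 bytes as-is
            out += data[i + 1:i + 2 + length]
            i += 2 + length
        else:                        # repeat the next byte 257-length times
            out += data[i + 1:i + 2] * (257 - length)
            i += 2
    return bytes(out)


# Hypothetical lookup table: filter name -> decoder function.
DECODERS = {
    "ASCIIHexDecode": ascii_hex_decode,
    "FlateDecode": zlib.decompress,
    "RunLengthDecode": run_length_decode,
}


def decode_chain(data: bytes, filters: list) -> bytes:
    """Apply each filter in order, feeding the output of one into the next."""
    for name in filters:
        data = DECODERS[name](data)
    return data
```

For a stream whose header says /Filter [/ASCIIHexDecode /FlateDecode], the call would be decode_chain(raw, ["ASCIIHexDecode", "FlateDecode"]).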

Download: PDF Stream Dumper Setup (Version: 0.9.624)
Source code

Supported Platforms: Win2k, XP, Vista, Win7

Update: Stream Parser has finally been optimized and is now 20x faster.

Training videos for PDFStreamDumper:

If you are looking for malicious PDF samples to analyze, make sure to check out the Contagio and jsunpack sites.

Full feature list
  • supported filters: FlateDecode, RunLengthDecode, ASCIIHexDecode, ASCII85Decode, LZWDecode, JBIG2, CCITTFaxDecode, DecodePredictors
  • Integrated shellcode tools:
    • sclog gui (Shellcode Analysis tool I wrote at iDefense)
    • scdbg libemu based Shellcode analysis tool
    • Shellcode_2_Exe functionality
    • Export unescaped bytes to file
  • supports filter chaining (i.e. multiple filters applied to the same stream)
  • supports unescaping encoded pdf headers
  • scriptable interface to process multiple files and generate reports
  • view all pdf objects
  • view deflated streams
  • view stream details such as file offsets, header, etc
  • save raw and deflated data
  • search streams for strings
  • scan for functions which contain pdf exploits (dumb scan)
  • format javascript using js beautifier (see credits in readme)
  • view streams as hex dumps
  • zlib compress/decompress arbitrary files
  • replace/update pdf streams with your own data
  • basic javascript interface so you can run parts of embedded scripts + support for using MS Script Debugger
  • PdfDecryptor w/source - uses iTextSharp and requires .Net Framework 2.0
  • Basic Javascript de-obfuscator
  • can hide: header only streams, duplicate streams, selected streams
  • js ui also has access to a toolbox class to
    • simplify fragmented strings
    • read/write files
    • do hexdumps
    • do unicode safe unescapes
    • disassembler engine
    • replicate some common Adobe API (new)
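
As an illustration of the unescape and "export unescaped bytes" items above, here is a rough Python sketch of turning an escaped shellcode string into raw bytes. It is not the toolbox class itself. It assumes the usual convention that %uXXXX becomes two bytes in little-endian order and %XX becomes one byte, and that any other characters are plain Latin-1. The function name is made up.

```python
import re

# Matches either %uXXXX (16-bit escape) or %XX (8-bit escape).
_ESCAPE = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")


def unescape_to_bytes(text: str) -> bytes:
    """Convert an escaped string such as '%u9090%u4142' into raw bytes."""
    out = bytearray()
    pos = 0
    for m in _ESCAPE.finditer(text):
        # Assumption: literal characters between escapes fit in Latin-1.
        out += text[pos:m.start()].encode("latin-1")
        if m.group(1):
            # %uXXXX: emit the 16-bit value low byte first.
            out += int(m.group(1), 16).to_bytes(2, "little")
        else:
            # %XX: emit a single byte.
            out.append(int(m.group(2), 16))
        pos = m.end()
    out += text[pos:].encode("latin-1")
    return bytes(out)
```

Working on bytes directly avoids the code page conversions that can mangle values above 0x7F when the data is passed around as ordinary strings.
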
Current Automation scripts include:
  • csv_stats.vbs - Builds csv file with results from lower status bar for all files in a directory
  • pdfbox_extract.vbs - use pdfbox to extract all images and text from current file
  • string_scan.vbs - scan all decompressed streams in all files in a directory for a string you enter
  • unsupported_filters.vbs - scan a directory and build list of all pdfs which have unsupported filters
  • filter_chains.vbs - recursively scans parent dir for pdfs that use multiple encoding filters on a stream.
  • obsfuscated_headers.vbs - recursively scans parent dir for pdfs that have obfuscated object headers
  • pdfbox_extract_text_page_by_page.vbs - uses pdfbox to extract page data into individual files
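
For readers who want to see what a scan like string_scan.vbs boils down to, here is a minimal Python sketch. It does not use the tool's scripting interface. It assumes streams are either plain or FlateDecode only, finds them with a simple regex instead of a real parser, and the function name is made up.

```python
import re
import zlib

# Naive stream locator: everything between 'stream' + EOL and 'endstream'.
_STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def find_in_streams(pdf_bytes: bytes, needle: bytes) -> list:
    """Return the indexes of streams whose decoded body contains needle."""
    hits = []
    for index, match in enumerate(_STREAM.finditer(pdf_bytes)):
        raw = match.group(1)
        try:
            # decompressobj tolerates the trailing EOL before 'endstream'.
            body = zlib.decompressobj().decompress(raw)
        except zlib.error:
            body = raw               # not zlib data, search it as-is
        if needle.lower() in body.lower():   # case-insensitive match
            hits.append(index)
    return hits
```

To cover a whole directory, read each file in binary mode and pass its contents to find_in_streams along with the search string.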

Current Plugins include:
  • Build_DB.dll - Search and sort data inside multiple samples, move and organize files
  • obj_browser.dll - view layout and data inside pdf in text form

stream parser was written by VBboy136 - 12/9/2008
Scintilla by Neil Hodgson [neilh@scintilla.org] 

ScintillaVB by Stu Collier

AS3 Sorcerer Trial provided courtesy of Manitu Group. 

JS Beautify by Einar Lielmanis, conversion to Javascript code by Vital
zlib.dll by Jean-loup Gailly and Mark Adler
CRC32 code by Steve McMahon
iTextDecode/iTextFilters use iTextSharp by Bruno Lowagie and Paulo Soares
olly.dll GPL code Copyright (C) 2001 Oleh Yuschuk.
MuPDF is released under GPL and Copyright 2006-2012 Artifex Software, Inc.

CCITTFaxDecoder copyright Sun Microsystems and intarsys consulting GmbH.
libemu written by Paul Baecher and Markus Koetter 2007.

scdbg homepage
sclog is a tool I wrote back at iDefense (no longer available on their site)

Interface by dzzie@yahoo.com 
WinGraphViz OOD Tsen oodtsen@gmail.com

GraphViz - AT&T Labs

Other thanks to Didier Stevens for the info on his blog on tags and encodings.

Comments: (16)

On 08.20.10 - 2:16pm Dave wrote:
lotta noise searching for pdf decrypter source..here are some of the more interesting links:

On 08.21.10 - 4:43pm Dave wrote:
Reading the iText source is good info too, but it's mammoth. Turns out if only the owner password is set, you can use iText to make a copy of the pages and transfer them into a new pdf so they are not encrypted anymore, and then this can parse them again. Also I think the luckysploit pdf exploits use what may be a malformed pdf. It is owner password encrypted, but has no password set. I could not create this condition in Acrobat Pro 7.x anyway.

On 12.05.10 - 6:55am Dave wrote:
0.9.125 is out, bugfix release..had to do some fixups in sclog to clean up output and make sure hooks for UrlDownloadToFile were being installed correctly. Couple small usability additions were added to main exe as well.

New feature: If javascript is broken up across multiple streams, you can control-select the streams and hit the JS_UI menu item and it will grab them all and put them into the JS ui together..also did a bunch of small bug fixes. Forcing all FlateDecode through zlib for now..noticeably slower on some files, but the iText FlateDecode was causing unexpected crashes on long automation scripts.

On 12.19.10 - 1:02pm dave wrote:
we now have basic support for things like app.doc.getAnnots, app.info.title etc. There are some quirks so you might have to play with it some. I didn't really want to add this feature because it's problematic and will spoil your manual skills..but it is useful and can speed things up a lot. Just don't expect it to be perfect.

On 01.08.11 - 7:33am Dave wrote:
quick new usability feature, You can drag and drop files onto the desktop icon to launch them. 3 actions based on file type:
  • js/vbs files are treated as automation scripts
  • .sc file is treated as shellcode and loaded in escaped format in the js form ready for analysis
  • any other file is considered a pdf and it tries to load it as such.
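
The three rules above amount to a dispatch on the file extension. A tiny Python sketch of that logic follows. The function name and return labels are made up, and matching the extension case-insensitively is an assumption.

```python
from pathlib import PureWindowsPath


def classify_dropped_file(path: str) -> str:
    """Pick an action for a dropped file based only on its extension."""
    ext = PureWindowsPath(path).suffix.lower()   # assumption: case-insensitive
    if ext in (".js", ".vbs"):
        return "automation_script"
    if ext == ".sc":
        return "shellcode"
    return "pdf"                                 # everything else is tried as a pdf
```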

On 12.27.11 - 5:52pm Dave wrote:
Remember, if you get a script that uses a = "string"; a[x] to access individual characters, you have to replace it with a.charAt(x) or do a = "string".split("") to explicitly turn it into an array for the MS Script control to work with it. Bug fix today dealing with the escaping of headers which contain JS scripts in them, and added the ability to override the default connection string in the sample db plugin.

On 01.19.12 - 1:29am Marc wrote:
I just want to have a portable version of that tool. While investigating an infected system I like using software ad hoc, no installing and so on. Some malware crashes installations, some changes the code on the fly, etc... It would be nice to see a portable version in the future. Marc

On 01.19.12 - 3:53am Dave wrote:
Hi Marc, that's an interesting idea. It makes heavy use of COM object components, so at a minimum you would have to run a batch script to register those first (add to registry). I will whip up a minimal binary install batch file today and post it here. Some packers/binders can actually build the target dlls into the main executable, but I am not sure if they support COM objects as well. That would be powerful kungfu.

If the system is whacked enough that you can't install stuff, I usually boot off a live CD and go into file retrieval mode to analyze the files on a clean analysis system. I never trust an infected system to be stable or tell me the truth.

Also, many malicious PDFs will delete themselves as part of the infection routine, replacing the exploit file with a clean copy. Finding the actual infection vector will require tracing it back to the source, such as the URL of the pdf in web browser history or an email attachment, and re-downloading the file.

On 01.19.12 - 10:34pm Dave wrote:
command line option added to extract objects such as flash, fonts, prc, u3d

pdfstreamdumper "c:\file.pdf" /extract "c:\folder"
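
If you want to run the /extract option over a batch of samples, a small wrapper like the Python sketch below could build the command lines. Only the command form shown above comes from the tool. The helper names, the per-sample folder naming, and the idea that pdfstreamdumper is on the PATH are assumptions, and whether the tool creates the output folder or exits on its own after extracting is not covered here.

```python
import subprocess


def build_extract_command(pdf_path: str, out_dir: str,
                          exe: str = "pdfstreamdumper") -> list:
    """Build: pdfstreamdumper "<pdf>" /extract "<folder>" as an argument list."""
    return [exe, pdf_path, "/extract", out_dir]


def extract_all(pdf_paths, out_root: str, run=subprocess.run) -> None:
    """Run the extract command once per pdf, one (hypothetical) folder each."""
    for i, pdf_path in enumerate(pdf_paths):
        out_dir = f"{out_root}\\sample_{i}"      # made-up naming scheme
        run(build_extract_command(pdf_path, out_dir), check=False)
```

Passing the arguments as a list lets subprocess handle the quoting of paths that contain spaces.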

On 06.06.12 - 7:26pm Dave wrote:
Not real happy with the builtin decrypter functionality using iTextSharp. I will probably remove it soon and add a plugin to launch one of these:

Online PDF Decrypters:

On 06.24.13 - 1:16pm Bill Orvis wrote:
I have found this not only useful for examining PDFs but also for malicious web pages containing javascript, such as blackhole. Just drag the .html page into the input, switch to the javascript editor and edit out the html. Works great.

On 07.15.15 - 11:49am dave wrote:
Debugger options: Note: I have never been impressed with the MS script debugging capabilities and it doesn't always connect. Sometimes I have to have a VS instance open for it to work. (Probably requires mdm.exe running in the background, maybe I will have to manage its startup/exit.)

I am also doing some experiments on how to replace the MS script engine entirely in favor of a more standards compliant one. I am currently exploring both QtScript and Duktape. Both of these support integrated debugging capabilities and with some more elbow grease can be used with VB6.

On 06.17.17 - 12:00am dAVE wrote:
Good set of links for other pdf tools: http://forensicswiki.org/wiki/PDF

On 08.28.17 - 8:04am Grimm Master wrote:
Hello. I need this tool in my work.

On 08.11.19 - 5:24am Dave wrote:
some good links and other tools for pdf structure analysis: https://stackoverflow.com/questions/3549541/best-tool-for-inspecting-pdf-files

On 10.18.19 - 3:55pm Edwin Steiner wrote:
Thank you very much for this great tool!
