PDF Stream Dumper
Author: David Zimmer
Date: 07.21.10 - 7:55pm
This is a free tool for the analysis of malicious PDF documents. This tool has been made possible through the use of a mountain of open source code. Thank you to all of the authors involved.
The tool also supports unescaping and formatting manipulated PDF headers, and it can decode filter chains (multiple filters applied to the same stream object).
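To illustrate what decoding a filter chain involves, here is a minimal sketch in JavaScript. It is not the tool's own code: decodeAsciiHex, decodeRunLength, decodeFilterChain, toBytes and toText are hypothetical helper names, and only two of the simpler filters are implemented. Each filter's output is fed to the next, in the order the filters are listed on the stream.

```javascript
// Sketch only: these helpers are hypothetical and not part of PDF Stream Dumper.

// ASCIIHexDecode: hex digit pairs, whitespace ignored, '>' marks end of data.
function decodeAsciiHex(bytes) {
  let digits = "";
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    if (ch === ">") break;
    if (/\s/.test(ch)) continue;
    if (!/[0-9a-fA-F]/.test(ch)) throw new Error("invalid hex digit: " + ch);
    digits += ch;
  }
  if (digits.length % 2 === 1) digits += "0"; // an odd final digit is padded with 0
  const out = [];
  for (let i = 0; i < digits.length; i += 2) {
    out.push(parseInt(digits.slice(i, i + 2), 16));
  }
  return out;
}

// RunLengthDecode: a length byte below 128 copies the next length+1 bytes,
// above 128 repeats the next byte 257-length times, and 128 ends the data.
function decodeRunLength(bytes) {
  const out = [];
  let i = 0;
  while (i < bytes.length) {
    const len = bytes[i++];
    if (len === 128) break;
    if (len < 128) {
      for (let n = 0; n <= len && i < bytes.length; n++) out.push(bytes[i++]);
    } else {
      if (i >= bytes.length) break; // truncated run, stop quietly
      const value = bytes[i++];
      for (let n = 0; n < 257 - len; n++) out.push(value);
    }
  }
  return out;
}

const FILTERS = new Map([
  ["ASCIIHexDecode", decodeAsciiHex],
  ["RunLengthDecode", decodeRunLength],
]);

// Apply each named filter in order, passing the result along the chain.
function decodeFilterChain(bytes, filterNames) {
  return filterNames.reduce((data, name) => {
    const decode = FILTERS.get(name);
    if (!decode) throw new Error("unsupported filter: " + name);
    return decode(data);
  }, bytes);
}

const toBytes = (s) => Array.from(s, (c) => c.charCodeAt(0) & 0xff);
const toText = (b) => String.fromCharCode(...b);

// Corresponds to a stream declared with /Filter [/ASCIIHexDecode /RunLengthDecode]
const raw = toBytes("FD 41 01 42 43 80>");
console.log(toText(decodeFilterChain(raw, ["ASCIIHexDecode", "RunLengthDecode"])));
```

A filter the sketch does not know about raises an error instead of passing data through untouched, so an unsupported chain is noticed rather than silently mis-decoded.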
Download: PDF Stream Dumper Setup
Supported Platforms: Win2k, XP, Vista, Win7
Update: Stream Parser has finally been optimized and is now 20x faster.
Training videos for PDFStreamDumper:
If you are looking for malicious PDF samples to analyze, make sure to check out the Contagio and jsunpack sites.
Full feature list:
- supported filters: FlateDecode, RunLengthDecode, ASCIIHexDecode, ASCII85Decode, LZWDecode, JBIG2Decode, CCITTFaxDecode, DecodePredictors
- Integrated shellcode tools:
  - sclog gui (shellcode analysis tool I wrote at iDefense)
  - scdbg, a libemu-based shellcode analysis tool
  - Shellcode_2_Exe functionality
  - Export unescaped bytes to file
- supports filter chaining (i.e., multiple filters applied to the same stream)
- supports unescaping encoded pdf headers
- scriptable interface to process multiple files and generate reports
- view all pdf objects
- view deflated streams
- view stream details such as file offsets, header, etc
- save raw and deflated data
- search streams for strings
- scan for functions which contain pdf exploits (dumb scan)
- view streams as hex dumps
- zlib compress/decompress arbitrary files
- replace/update pdf streams with your own data
- PdfDecryptor w/source - uses iTextSharp and requires .NET Framework 2.0
- can hide: header only streams, duplicate streams, selected streams
- js ui also has access to a toolbox class to:
  - simplify fragmented strings
  - read/write files
  - do hexdumps
  - do unicode safe unescapes
  - disassembler engine
  - replicate some common Adobe API (new)
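As a rough illustration of what the unescape and hexdump helpers in the list above are for, the following JavaScript sketch turns an escaped shellcode string into raw bytes and prints them as a hex dump. The names unescapeToBytes and hexDump are hypothetical and are not the toolbox class's actual methods. It assumes the common convention that each %uXXXX unit becomes two bytes, low byte first, and that any unescaped character contributes only its low byte.

```javascript
// Hypothetical helpers; the tool's own toolbox class is not shown in the source.

// Convert "%u9090%41B" style text into byte values without passing through
// any locale-dependent string conversion.
function unescapeToBytes(escaped) {
  const bytes = [];
  const token = /%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})|([\s\S])/g;
  let m;
  while ((m = token.exec(escaped)) !== null) {
    if (m[1] !== undefined) {
      const word = parseInt(m[1], 16);
      bytes.push(word & 0xff, word >> 8); // assumption: low byte first
    } else if (m[2] !== undefined) {
      bytes.push(parseInt(m[2], 16));
    } else {
      bytes.push(m[3].charCodeAt(0) & 0xff); // assumption: keep the low byte only
    }
  }
  return bytes;
}

// Classic offset / hex / printable-ASCII layout, 16 bytes per row by default.
function hexDump(bytes, width = 16) {
  const lines = [];
  for (let off = 0; off < bytes.length; off += width) {
    const row = bytes.slice(off, off + width);
    const hex = row.map((b) => b.toString(16).padStart(2, "0")).join(" ");
    const ascii = row
      .map((b) => (b >= 0x20 && b < 0x7f ? String.fromCharCode(b) : "."))
      .join("");
    lines.push(
      off.toString(16).padStart(8, "0") + "  " + hex.padEnd(width * 3 - 1) + "  " + ascii
    );
  }
  return lines.join("\n");
}

console.log(hexDump(unescapeToBytes("%u9090%u4141%42C")));
```

Working on byte values directly, instead of on the string that a script engine's own unescape returns, avoids the corruption that can happen when characters above 0x7F are converted through a code page.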
Current Automation scripts include:
- csv_stats.vbs - Builds csv file with results from lower status bar for all files in a directory
- pdfbox_extract.vbs - use pdfbox to extract all images and text from current file
- string_scan.vbs - scan all decompressed streams in all files in a directory for a string you enter
- unsupported_filters.vbs - scan a directory and build list of all pdfs which have unsupported filters
- filter_chains.vbs - recursively scans parent dir for pdfs that use multiple encoding filters on a stream
- obsfuscated_headers.vbs - recursively scans parent dir for pdfs that have obfuscated object headers
- pdfbox_extract_text_page_by_page.vbs - uses pdfbox to extract page data into individual files
Current Plugins include:
- Build_DB.dll - Search and sort data inside multiple samples, move and organize files
- obj_browser.dll - view layout and data inside pdf in text form
Stream parser was written by VBboy136 - 12/9/2008
Scintilla by Neil Hodgson [firstname.lastname@example.org]
ScintillaVB by Stu Collier
AS3 Sorcerer Trial provided courtesy of Manitu Group.
JS Beautify by Einar Lielmanis
zlib.dll by Jean-loup Gailly and Mark Adler
CRC32 code by Steve McMahon
iTextDecode/iTextFilters use iTextSharp by Bruno Lowagie and Paulo Soares
olly.dll GPL code Copyright (C) 2001 Oleh Yuschuk.
MuPDF is released under GPL and Copyright 2006-2012 Artifex Software, Inc.
CCITTFaxDecoder copyright Sun Microsystems and intarsys consulting GmbH.
libemu written by Paul Baecher and Markus Koetter 2007.
sclog is a tool I wrote back at iDefense (no longer available on their site)
Interface by email@example.com
WinGraphViz OOD Tsen firstname.lastname@example.org
GraphViz - AT&T Labs
Other thanks to Didier Stevens for the info on his blog on tags and encodings.
Comments: (12)
On 08.20.10 - 2:16pm Dave wrote:
On 08.21.10 - 4:43pm Dave wrote:
Reading the iText source is good info too, but it's mammoth. It turns out that if only the owner password is set, you can use iText to copy the pages into a new PDF so they are no longer encrypted, and then this tool can parse them again. Also, I think the luckysploit PDF exploits use what may be a malformed PDF: it is owner-password encrypted but has no password set. I could not create this condition in Acrobat Pro 7.x anyway.
On 12.05.10 - 6:55am Dave wrote:
0.9.125 is out, a bugfix release. I had to do some fixups in sclog to clean up the output and make sure hooks for URLDownloadToFile were being installed correctly. A couple of small usability additions were made to the main exe as well.
On 12.19.10 - 1:02pm dave wrote:
We now have basic support for things like app.doc.getAnnots, app.info.title, etc. There are some quirks, so you might have to play with it some. I didn't really want to add this feature because it's problematic and will spoil your manual skills, but it is useful and can speed things up a lot. Just don't expect it to be perfect.
On 01.08.11 - 7:33am Dave wrote:
Quick new usability feature: you can drag and drop files onto the desktop icon to launch them. There are 3 actions based on file type:
- js/vbs files are treated as automation scripts
- a .sc file is treated as shellcode and loaded in escaped format in the js form, ready for analysis
- any other file is considered a pdf, and the tool tries to load it as such
On 12.27.11 - 5:52pm Dave wrote:
Remember, if you get a script that uses a = "string"; a[x] to access individual characters, you have to replace it with a.charAt(x) or do a = "string".split("") to explicitly turn it into an array for the MS Script control to work with it. Bug fix today dealing with the escaping of headers which contain JS scripts, and added the ability to override the default connection string in the sample db plugin.
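The string-indexing workaround described in the comment above can be sketched as follows. rewriteIndexing is a hypothetical helper, not a feature of the tool. It is deliberately naive: it assumes you already know which variable holds a string and that the index expression contains no nested brackets.

```javascript
// Hypothetical helper (not part of PDF Stream Dumper): naively rewrites
// name[expr] into name.charAt(expr) for one variable known to hold a string.
// Assumption: the index expression contains no nested brackets.
function rewriteIndexing(source, name) {
  if (!/^[A-Za-z_]\w*$/.test(name)) {
    throw new Error("unexpected variable name: " + name);
  }
  const pattern = new RegExp("\\b" + name + "\\[([^\\[\\]]+)\\]", "g");
  return source.replace(pattern, name + ".charAt($1)");
}

const original = 'var a = "string"; var c = a[2];';
console.log(rewriteIndexing(original, "a"));

// The two manual workarounds from the comment above, side by side:
const a = "string";
console.log(a.charAt(2));    // character access without bracket indexing
console.log(a.split("")[2]); // explicit array of one-character strings
```

For anything beyond simple cases, editing the script by hand is likely safer than a regex rewrite, since the pattern cannot tell a string variable from an array that happens to share the name.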
On 01.19.12 - 1:29am Marc wrote:
I just want to have a portable version of this tool. While investigating an infected system I like to use software ad hoc, with no installation. Some malware crashes installers, some changes code on the fly, etc.
It would be nice to see a portable version in the future.
On 01.19.12 - 3:53am Dave wrote:
Hi Marc, that's an interesting idea. It makes heavy use of COM object components, so at a minimum you would have to run a batch script to register those first (add to registry). I will whip up a minimal binary install batch file today and post it here. Some packers/binders can actually build the target dlls into the main executable, but I am not sure if they support COM objects as well. That would be powerful kungfu.
If the system is whacked enough that you can't install stuff, I usually boot off a live CD and go into file retrieval mode to analyze the files on a clean analysis system. I never trust an infected system to be stable or tell me the truth.
Also, many malicious PDFs will delete themselves as part of the infection routine, replacing the exploit file with a clean copy. Finding the actual infection vector will require tracing it back to the source, such as the URL of the pdf in web browser history or an email attachment, and re-downloading the file.
On 01.19.12 - 10:34pm Dave wrote:
Command line option added to extract objects such as flash, fonts, prc, u3d:
pdfstreamdumper "c:\file.pdf" /extract "c:\folder"
On 06.06.12 - 7:26pm Dave wrote:
On 06.24.13 - 1:16pm Bill Orvis wrote:
On 07.15.15 - 11:49am dave wrote:
Note: I have never been impressed with the MS script debugging capabilities, and it doesn't always connect. Sometimes I have to have a VS instance open for it to work. (It probably requires mdm.exe running in the background; maybe I will have to manage its startup/exit.)
I am also doing some experiments on how to replace the MS script engine entirely in favor of a more standards-compliant one. I am currently exploring both QtScript and Duktape. Both of these support integrated debugging and, with some more elbow grease, can be used with VB6.