PDF Stream Dumper


Author: David Zimmer
Date: 07.21.10 - 7:55pm



This is a free tool for the analysis of malicious PDF documents. This tool has been made possible through the use of a mountain of open source code. Thank you to all of the authors involved.

It has specialized tools for dealing with obfuscated javascript, low level pdf headers and objects, and shellcode. In terms of shellcode analysis, it has an integrated interface for libemu sctest, an updated build of iDefense sclog, and a shellcode_2_exe feature.
Javascript tools include integration with JS Beautifier for code formatting, live script debugging, toolbox classes to handle extra canned functionality, as well as a pretty stable refactoring engine that will parse a script and replace all the screwy random function and variable names with logical sanitized versions for readability.

The tool also supports unescaping and formatting manipulated pdf headers, and can decode filter chains (multiple filters applied to the same stream object).
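To make the idea of a filter chain concrete, here is a minimal Python sketch. It is not the tool's own implementation, and the names `decode_chain`, `ascii_hex_decode` and `ascii85_decode` are hypothetical. It assumes the filters are applied in the order they are listed in the stream's /Filter entry, and it only covers three of the supported filters.

```python
import base64
import binascii
import zlib

# Whitespace characters that ASCIIHexDecode data may contain and that are skipped.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def ascii_hex_decode(data: bytes) -> bytes:
    """ASCIIHexDecode: pairs of hex digits, whitespace ignored, '>' ends the data."""
    body = data.split(b">", 1)[0]
    digits = bytes(b for b in body if b not in PDF_WHITESPACE)
    if len(digits) % 2:
        digits += b"0"  # an odd trailing digit is read as if followed by 0
    return binascii.unhexlify(digits)


def ascii85_decode(data: bytes) -> bytes:
    """ASCII85Decode: base-85 text, usually terminated by '~>'."""
    body = data.strip()
    if body.startswith(b"<~"):
        body = body[2:]
    if body.endswith(b"~>"):
        body = body[:-2]
    return base64.a85decode(body)


# Hypothetical lookup table; a real tool would cover many more filters.
DECODERS = {
    "FlateDecode": zlib.decompress,
    "ASCIIHexDecode": ascii_hex_decode,
    "ASCII85Decode": ascii85_decode,
}


def decode_chain(data: bytes, filters: list) -> bytes:
    """Apply each named filter in turn, feeding the output of one into the next."""
    for name in filters:
        if name not in DECODERS:
            raise ValueError("unsupported filter: " + name)
        data = DECODERS[name](data)
    return data


if __name__ == "__main__":
    # Build a stream that was compressed, then base-85 encoded, then hex encoded.
    script = b"app.alert('hi');"
    stream = binascii.hexlify(base64.a85encode(zlib.compress(script))) + b">"
    print(decode_chain(stream, ["ASCIIHexDecode", "ASCII85Decode", "FlateDecode"]))
```

Keeping each decoder as a plain bytes-to-bytes function means a chain of any length is just a loop, and an unsupported filter is reported by name instead of producing garbage.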

Download: PDF Stream Dumper Setup v0.9.624 (Source)

Note: .NET v2/3 is required for some filters to work. Install it from Add/Remove Programs -> Install Optional Components. If the machine is joined to a domain, the install may fail with an error.


Supported Platforms: WinXP+

If you are looking for malicious pdf samples to analyze make sure to check out the Contagio and jsunpack sites.

Full feature list
  • supported filters: FlateDecode, RunLengthDecode, ASCIIHexDecode, ASCII85Decode, LZWDecode, JBIG2, CCITTFaxDecode, DecodePredictors
  • Integrated shellcode tools:
    • sclog gui (Shellcode Analysis tool I wrote at iDefense)
    • scdbg libemu based Shellcode analysis tool
    • Shellcode_2_Exe functionality
    • Export unescaped bytes to file
  • supports filter chaining (ie multiple filters applied to same stream)
  • supports unescaping encoded pdf headers
  • scriptable interface to process multiple files and generate reports
  • view all pdf objects
  • view decompressed streams
  • view stream details such as file offsets, header, etc
  • save raw and decompressed data
  • search streams for strings
  • scan for functions which contain pdf exploits (dumb scan)
  • format javascript using js beautifier (see credits in readme)
  • view streams as hex dumps
  • zlib compress/decompress arbitrary files
  • replace/update pdf streams with your own data
  • basic javascript interface so you can run parts of embedded scripts + support for using MS Script Debugger
  • PdfDecryptor w/source - uses iTextSharp and requires .Net Framework 2.0
  • Basic Javascript de-obfuscator
  • can hide: header only streams, duplicate streams, selected streams
  • js ui also has access to a toolbox class that can
    • simplify fragmented strings
    • read/write files
    • do hexdumps
    • do unicode safe unescapes
    • run a disassembler engine
    • replicate some common Adobe API (new)
Current Automation scripts include:
  • csv_stats.vbs - Builds csv file with results from lower status bar for all files in a directory
  • pdfbox_extract.vbs - use pdfbox to extract all images and text from current file
  • string_scan.vbs - scan all decompressed streams in all files in a directory for a string you enter
  • unsupported_filters.vbs - scan a directory and build list of all pdfs which have unsupported filters
  • filter_chains.vbs - recursively scans parent dir for pdfs that use multiple encoding filters on a stream.
  • obsfuscated_headers.vbs - recursively scans parent dir for pdfs that have obfuscated object headers
  • pdfbox_extract_text_page_by_page.vbs - uses pdfbox to extract page data into individual files
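
One common form of header obfuscation is the #xx hex escape that PDF allows inside names, so that /JavaScript can be written as /J#61vaScript. The Python sketch below shows how such names can be unescaped and flagged. It is an illustration only, not what obsfuscated_headers.vbs actually does, and `unescape_pdf_name`, `unescape_header` and `is_obfuscated` are hypothetical names.

```python
import re

# A '#' followed by exactly two hex digits stands for one byte inside a PDF name.
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")

# A name token: '/' followed by characters that are not whitespace or delimiters.
_NAME_TOKEN = re.compile(rb"/[^\s/<>\[\](){}%]+")


def unescape_pdf_name(name: bytes) -> bytes:
    """Replace every #xx escape in a single name with the byte it encodes."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), name)


def unescape_header(header: bytes) -> bytes:
    """Unescape all name tokens in an object header, leaving the rest untouched."""
    return _NAME_TOKEN.sub(lambda m: unescape_pdf_name(m.group(0)), header)


def is_obfuscated(header: bytes) -> bool:
    """True if unescaping changes the header, i.e. at least one name used #xx."""
    return unescape_header(header) != header


if __name__ == "__main__":
    raw = b"<< /Type /Action /S /J#61vaScr#69pt /JS 5 0 R >>"
    print(unescape_header(raw))
    print(is_obfuscated(raw))
```

This sketch does not try to skip string or stream contents, so a real scanner would need a proper tokenizer before trusting the result.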

Current Plugins include:
  • Build_DB.dll - Search and sort data inside multiple samples, move and organize files
  • obj_browser.dll - view layout and data inside pdf in text form


Credits:
---------------------------
stream parser was written by VBboy136 - 12/9/2008
http://www.codeproject.com/KB/DLL/PDF2TXTVB.aspx
	
Scintilla by Neil Hodgson [neilh@scintilla.org] 
http://www.scintilla.org/

ScintillaVB by Stu Collier
http://www.ceditmx.com/software/scintilla-vb/

AS3 Sorcerer Trial provided courtesy of Manitu Group. 
http://www.as3sorcerer.com/

JS Beautify by Einar Lielmanis, conversion to Javascript code by Vital
http://jsbeautifier.org/
	
zlib.dll by Jean-loup Gailly and Mark Adler
http://www.zlib.net/
	
CRC32 code by Steve McMahon
http://www.vbaccelerator.com/home/vb/code/libraries/CRC32/article.asp
	
iTextDecode/iTextFilters use iTextSharp by Bruno Lowagie and Paulo Soares
http://itextpdf.com/terms-of-use/index.php
	
olly.dll GPL code Copyright (C) 2001 Oleh Yuschuk.
http://home.t-online.de/home/Ollydbg/
http://sandsprite.com/CodeStuff/olly_dll.html
	
MuPDF is released under the GPL, Copyright 2006-2012 Artifex Software, Inc.
http://www.mupdf.com/

CCITTFaxDecoder copyright Sun Microsystems and intarsys consulting GmbH.
http://java.net/projects/pdf-renderer/
  
libemu written by Paul Baecher and Markus Koetter 2007.	
http://libemu.carnivore.it/about.html

scdbg homepage
http://sandsprite.com/blogs/index.php?uid=7&pid=152
	
sclog is a tool I wrote back at iDefense (no longer available on their site)
https://github.com/dzzie/sclog

Interface by dzzie@yahoo.com 
http://sandsprite.com
	
WinGraphViz OOD Tsen oodtsen@gmail.com
http://wingraphviz.sourceforge.net/wingraphviz/index.htm

GraphViz - AT&T Labs
http://graphviz.org/

Other thanks to Didier Stevens for the info on his blog on tags and encodings.
http://blog.didierstevens.com/2008/04/29/pdf-let-me-count-the-ways

Comments: (17)

On 08.20.10 - 2:16pm Dave wrote:
lotta noise searching for pdf decrypter source..here are some of the more interesting links:

On 08.21.10 - 4:43pm Dave wrote:
Reading the iText source is good info too, but it's mammoth. It turns out that if only the owner password is set, you can use iText to copy the pages into a new pdf so they are no longer encrypted, and then this tool can parse them again. Also, I think the luckysploit pdf exploits use what may be a malformed pdf: it is owner password encrypted, but has no password set. I could not create this condition in Acrobat Pro 7.x anyway.

On 12.05.10 - 6:55am Dave wrote:
0.9.125 is out, bugfix release..had to do some fixups in sclog to clean up output and make sure hooks for UrlDownloadToFile were being installed correctly. Couple small usability additions were added to main exe as well.

New feature: if javascript is broken up across multiple streams, you can control-select the streams and hit the JS_UI menu item, and it will grab them all and put them into the JS ui together. Also did a bunch of small bug fixes. Forcing all FlateDecode through zlib for now. It is noticeably slower on some files, but the iText FlateDecode was causing unexpected crashes on long automation scripts.

On 12.19.10 - 1:02pm dave wrote:
We now have basic support for things like app.doc.getAnnots, app.info.title etc. There are some quirks so you might have to play with it some. I didn't really want to add this feature because it's problematic and will spoil your manual skills, but it is useful and can speed things up a lot. Just don't expect it to be perfect.

On 01.08.11 - 7:33am Dave wrote:
quick new usability feature, You can drag and drop files onto the desktop icon to launch them. 3 actions based on file type:
  • js/vbs files are treated as automation scripts
  • .sc file is treated as shellcode and loaded in escaped format in the js form ready for analysis
  • any other file is considered a pdf and it tries to load it as such.
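
For readers unfamiliar with escaped shellcode, the sketch below converts raw bytes to and from the %uXXXX notation used by JavaScript's unescape(). It assumes that "escaped format" above means this %u form, which the post does not state explicitly. The function names are hypothetical, and padding an odd-length buffer with a 0x90 byte is a choice made for this example only.

```python
import re

# Matches either a %uXXXX (16-bit) escape or a %XX (8-bit) escape.
_ESCAPE = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")


def to_percent_u(data: bytes) -> str:
    """Encode bytes as %uXXXX words; each word holds two bytes, low byte first."""
    if len(data) % 2:
        data += b"\x90"  # assumption: pad odd-length input with a NOP byte
    return "".join(
        "%u{:02x}{:02x}".format(data[i + 1], data[i])
        for i in range(0, len(data), 2)
    )


def from_percent_u(text: str) -> bytes:
    """Decode %uXXXX and %XX escapes back to bytes; other characters are skipped."""
    out = bytearray()
    for match in _ESCAPE.finditer(text):
        if match.group(1):
            # A %u word is stored little-endian in memory.
            out += int(match.group(1), 16).to_bytes(2, "little")
        else:
            out.append(int(match.group(2), 16))
    return bytes(out)


if __name__ == "__main__":
    raw = b"\x90\x90\xeb\xfe"
    escaped = to_percent_u(raw)
    print(escaped)
    print(from_percent_u(escaped) == raw)
```

Working on bytes rather than on decoded text avoids the problem where values above 0x7f get mangled by a code page conversion.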

On 12.27.11 - 5:52pm Dave wrote:
Remember: if you get a script that uses a = "string"; a[x] to access individual characters, you have to change it to a.charAt(x), or do a = "string".split("") to explicitly turn it into an array, for the MS Script control to work with it. Bug fix today dealing with the escaping of headers that contain JS scripts, and added the ability to override the default connection string in the sample db plugin.

On 01.19.12 - 1:29am Marc wrote:
I just want to have a portable version of this tool. While investigating an infected system I like using software ad hoc, with no installing and so on. Some malware crashes installations, some changes the code on the fly, etc. It would be nice to see a portable version in the future. Marc

On 01.19.12 - 3:53am Dave wrote:
Hi Marc, that's an interesting idea. It makes heavy use of COM object components, so at a minimum you would have to run a batch script to register those first (add to registry). I will whip up a minimal binary install batch file today and post it here. Some packers/binders can actually build the target dlls into the main executable, but I am not sure if they support COM objects as well. That would be powerful kungfu.

If the system is whacked enough that you can't install stuff, I usually boot off a live CD and go into file retrieval mode to analyze the files on a clean analysis system. I never trust an infected system to be stable or tell me the truth.

Also, many malicious PDFs will delete themselves as part of the infection routine, replacing the exploit file with a clean copy. Finding the actual infection vector will require tracing it back to the source, such as the URL of the pdf in web browser history or an email attachment, and re-downloading the file.

On 01.19.12 - 10:34pm Dave wrote:
command line option added to extract objects such as flash, fonts, prc, u3d

usage:
pdfstreamdumper "c:\file.pdf" /extract "c:\folder"
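
If you need to run the same extraction over many samples, a small wrapper can build one command line per file. The Python sketch below is only an illustration: `build_extract_commands` is a hypothetical helper, it assumes pdfstreamdumper is on the PATH, and it assumes a separate output folder per sample is wanted. Only the /extract usage shown above comes from the tool itself.

```python
from pathlib import PureWindowsPath


def build_extract_commands(pdf_paths, out_root, exe="pdfstreamdumper"):
    """Return one argument list per pdf, in the form: exe "file.pdf" /extract "folder"."""
    commands = []
    for pdf in pdf_paths:
        # Assumption: each sample gets its own folder named after the file.
        out_dir = PureWindowsPath(out_root) / PureWindowsPath(pdf).stem
        commands.append([exe, str(pdf), "/extract", str(out_dir)])
    return commands


if __name__ == "__main__":
    samples = ["c:\\samples\\a.pdf", "c:\\samples\\b.pdf"]
    for cmd in build_extract_commands(samples, "c:\\out"):
        # On an analysis machine each list could be passed to subprocess.run(cmd).
        print(cmd)
```

The sketch only builds the argument lists and does not launch anything, so it can be checked before any sample is touched.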

On 06.06.12 - 7:26pm Dave wrote:
Not real happy with the builtin decrypter functionality using iTextSharp. I will probably remove it soon and add a plugin to launch one of these:

Online PDF Decrypters:

On 06.24.13 - 1:16pm Bill Orvis wrote:
I have found this useful not only for examining PDFs but also for malicious web pages containing javascript, such as blackhole. Just drag the .html page into the input, switch to the javascript editor and edit out the html. Works great.

On 07.15.15 - 11:49am dave wrote:
Debugger options: Note: I have never been impressed with the MS script debugging capabilities and it doesn't always connect. Sometimes I have to have a VS instance open for it to work. (It probably requires mdm.exe running in the background, so maybe I will have to manage its startup/exit.)

I am also doing some experiments on how to replace the MS script engine entirely in favor of a more standards compliant one. I am currently exploring both QtScript and Duktape. Both of these support integrated debugging capabilities and with some more elbow grease can be used with VB6.


On 06.17.17 - 12:00am dAVE wrote:
Good set of links for other pdf tools: http://forensicswiki.org/wiki/PDF

On 08.28.17 - 8:04am Grimm Master wrote:
Hello. I need this tool in my work.

On 08.11.19 - 5:24am Dave wrote:
some good links and other tools for pdf structure analysis: https://stackoverflow.com/questions/3549541/best-tool-for-inspecting-pdf-files

On 10.18.19 - 3:55pm Edwin Steiner wrote:
Thank you very much for this great tool!

On 08.28.20 - 9:10am Dave wrote:
Fix for .NET v2 failing to install with error 0x800F0954: I set this registry value,

HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU
UseWUServer=0

rebooted, then ran the following from an admin 64-bit cmd window:

DISM /Online /Enable-Feature /FeatureName:NetFx3 /All

 