amccormack.net

Things I've learned and suspect I'll forget.

Retrieving a Malicious Attachment from Gmail 2012-07-22

A while back I received an email from a friend that had a PDF attachment. Because the email body was blank and the "report" was unsolicited, I assumed my friend's account had been compromised and that the PDF was malicious. When I went to examine the PDF later, I found that Google had recognized the file as malicious and would not let me download it. Instead, it offered only the options "View" and "Learn more". Clicking "View" shows a message that says "virus found", and "Learn more" takes you to a help page.

However, we can still get the file from the original text of the email. Click the drop-down arrow attached to the reply button and select "Show original". Scrolling down a bit in the original message, you'll see a section that looks like this:

--f46d0444ef637df04804b6e6fc01
Content-Type: application/pdf; name="744810.pdf"
Content-Disposition: attachment; filename="744810.pdf"
Content-Transfer-Encoding: base64
X-Attachment-Id: file0

JVBERi0xLjYKJeLjz9MNCjEgMCBvYmoNCjw8L1R5cGUvUGFnZS9QYXJlbnQgNSAwIFIgL01lZGlh
Qm94IFswIDAgNjQwIDQ4MF0vQ29udGVudHMgNiAwIFIgL1Jlc291cmNlcyA3IDAgUj4+DQplbmRv
JSVFT0YNCg==
--f46d0444ef637df04804b6e6fc01--

The sample above has been heavily trimmed to save space, so only the first and last few lines of the attachment are shown. The interesting parts are the "Content-Transfer-Encoding: base64" header and the text on lines 7 through 9 of the sample, which is the base64 encoding of the file. A trivial way to decode this string is to fire up Python and use the base64 module:

>>> import base64
>>> raw = base64.b64decode('JVBERi0xLjYKJeLjz9MNCjEgMCBvYmoNCjw8L1R5cGUvUGFnZS9QYXJlbnQgNSAwIFIgL01lZGlhQm94IFswIDAgNjQwIDQ4MF0vQ29udGVudHMgNiAwIFIgL1Jlc291cmNlcyA3IDAgUj4+DQplbmRvJSVFT0YNCg==')
>>> raw
b'%PDF-1.6\n%\xe2\xe3\xcf\xd3\r\n1 0 obj\r\n<</Type/Page/Parent 5 0 R /MediaBox [0 0 640 480]/Contents 6 0 R /Resources 7 0 R>>\r\nendo%%EOF\r\n'
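A quick sanity check on the decoded bytes is to confirm that they start with the PDF signature "%PDF-", which the output above does. The following is a minimal sketch, assuming Python 3; the helper name looks_like_pdf is made up for this illustration, and the sample string is the trimmed data from above, not a complete PDF.

```python
import base64

# Every PDF file begins with this signature.
PDF_MAGIC = b"%PDF-"

def looks_like_pdf(data):
    """Return True if the bytes begin with the PDF signature."""
    return data.startswith(PDF_MAGIC)

# The trimmed sample from the email, joined into one string.
sample = (
    "JVBERi0xLjYKJeLjz9MNCjEgMCBvYmoNCjw8L1R5cGUvUGFnZS9QYXJlbnQgNSAwIFIgL01lZGlh"
    "Qm94IFswIDAgNjQwIDQ4MF0vQ29udGVudHMgNiAwIFIgL1Jlc291cmNlcyA3IDAgUj4+DQplbmRv"
    "JSVFT0YNCg=="
)
raw = base64.b64decode(sample)
print(looks_like_pdf(raw))
```

A matching signature only says the data decoded to something PDF-shaped; it says nothing about whether the file is safe to open.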

A simple script to decode the attachment automatically would look like this:

#simpledecode.py (Python 3)
import base64
import binascii
import sys

def decode(filein, fileout):
    #Decode each base64 line of filein and write the raw bytes to fileout
    with open(filein, 'r') as emailFile, open(fileout, 'wb') as rawfile:
        for line in emailFile:
            raw = line.strip()
            try:
                rawfile.write(base64.b64decode(raw))
            except binascii.Error:
                #Raised when a line is not a complete base64 chunk
                print("Could not decode line: " + line)
                raise

    print("wrote file: " + fileout)

if __name__ == '__main__':
    if len(sys.argv) == 3:
        decode(sys.argv[1], sys.argv[2])
    else:
        print("simpledecode.py fileIn fileOut")
        print("fileIn should be the base64 MIME encoded string")

simpledecode.py takes a text file containing the base64-encoded data. All you need to do is copy the base64-encoded section of the email (lines 7-9 in the sample above) into a new text file and pass that file as the first argument to simpledecode.py. The second argument is the path where the extracted file should be saved.
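Decoding one line at a time works for the sample above because each full line is 76 characters long, a multiple of four, so every line is a complete base64 chunk. If the text gets rewrapped while copying, that no longer holds. An alternative sketch, assuming Python 3, strips all whitespace and decodes the block in one go; decode_block is a hypothetical helper name and the short input is made up for illustration.

```python
import base64

def decode_block(text):
    """Decode base64 text that may be wrapped across several lines."""
    # Remove newlines, carriage returns and spaces so the line breaks
    # no longer have to fall on four-character boundaries.
    joined = "".join(text.split())
    return base64.b64decode(joined)

# Wrapped after 10 characters, which is not a multiple of four, so
# decoding the first line on its own would fail.
rewrapped = "JVBERi0xLj\nYKJSVFT0YNCg==\n"
data = decode_block(rewrapped)
print(data)
```

The trade-off is that the whole attachment is held in memory at once, which is fine for typical email attachments.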

To make extracting files a little simpler, I wrote a more sophisticated Python program. This script needs only a file containing the original email text. That is, when you view the original source of the email, press Ctrl-A to select everything, then copy and paste it into a text file. Supply that file as the argument to the script below. By default, the script saves each attachment under the filename included with the attachment. You can also supply your own filename; if there are multiple attachments, a number is added to the name to keep them apart.

#emailextract.py (Python 3)
import base64
import binascii
import os
import random
import re
import sys

def processMIME(emailMessageFile, outputFile=None):
    '''
    Takes a MIME email and extracts the attachments

    emailMessageFile - a file containing the plaintext email in MIME format
    outputFile - Where to place the new file (if not keeping the original name)

    Note: outputFile is used as-is only for the first attachment. If there
    are multiple attachments, a count is inserted before the file
    extension (if a file extension exists)
    '''
    with open(emailMessageFile, 'r', errors='replace') as emailFile:
        #Read lines until the boundary string is found
        boundary = None
        line = emailFile.readline()
        while line != "":
            #the boundary value may or may not be quoted
            match = re.search(r'boundary="?([^";\s]+)', line)
            if match is not None:
                boundary = match.group(1)
                break
            line = emailFile.readline()
        if boundary is None:
            print("No MIME boundary found in " + emailMessageFile)
            return

        #now that we have the boundary find the attachments
        count = 0
        while line != "":
            line = emailFile.readline()
            if "Content-Type: application" in line:
                processApplicationSection(line, emailFile, outputFile, boundary, count)
                count += 1


def processApplicationSection(line, emailFile, outputFile, boundary, count):
    '''
    Reads the headers of a Content-Type: application section of a MIME
    message, determines the filename, and moves emailFile to the data
    '''
    makeFileName = outputFile is None
    if not makeFileName and count > 0:
        #A name was supplied: number every attachment after the first
        m = re.search(r"(.*)(\..*)", outputFile)
        if m is not None:
            filename, ext = m.group(1), m.group(2)
        else:
            filename, ext = outputFile, ""
        outputFile = filename + str(count) + ext

    while line != "" and "--" + boundary not in line:
        #Get the name of the file to write (quoted or unquoted)
        if makeFileName:
            matchFilename = (re.search(r'filename="([^"]*)"', line)
                             or re.search(r'filename=([^\s;]+)', line))
            if matchFilename is not None:
                #basename() keeps a hostile name from escaping this directory
                outputFile = os.path.basename(matchFilename.group(1)) or None

        #A blank line ends the section headers; the encoded data follows
        if line.strip() == "":
            if outputFile is None:
                outputFile = ''.join(random.sample('0123456789abcdefg', 10))
            processRawData(emailFile, outputFile, boundary)
            return

        line = emailFile.readline()


def processRawData(emailFile, outputFile, boundary):
    #Decode base64 lines until the next boundary (or the end of the file)
    with open(outputFile, 'wb') as rawfile:
        while True:
            line = emailFile.readline()
            if line == "" or "--" + boundary in line:
                break
            raw = line.strip()
            try:
                rawfile.write(base64.b64decode(raw))
            except binascii.Error:
                print("Could not decode line: " + line)
                raise

    print("wrote file: " + outputFile)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        emailFile = sys.argv[1]
        outputFile = sys.argv[2] if len(sys.argv) > 2 else None
        processMIME(emailFile, outputFile)
    else:
        print("emailextract.py emailIn fileOut(opt)")
        print("emailIn should be the plain text original MIME encoded email")
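For comparison, Python's standard library can do the MIME parsing itself. The following is a rough sketch of the same idea using the email module, assuming Python 3; it is not the script above, and the function name extract_attachments and its out_dir parameter are made up for this illustration.

```python
import email
import os

def extract_attachments(raw_email, out_dir="."):
    """Write every attachment found in raw_email (a str) into out_dir.

    Returns the list of paths that were written.
    """
    msg = email.message_from_string(raw_email)
    written = []
    for index, part in enumerate(msg.walk()):
        filename = part.get_filename()
        if filename is None:
            continue  # this part carries no filename, so skip it
        # basename() guards against path components in a hostile filename.
        safe_name = os.path.basename(filename) or "attachment%d" % index
        # decode=True undoes the Content-Transfer-Encoding (base64 here).
        data = part.get_payload(decode=True)
        if data is None:
            continue
        path = os.path.join(out_dir, safe_name)
        with open(path, "wb") as out:
            out.write(data)
        written.append(path)
    return written
```

Usage would be along the lines of reading the saved "Show original" text into a string and passing it in, for example extract_attachments(open("original.txt").read(), "extracted"), where both names are placeholders and the output directory must already exist. As with the scripts above, the extracted file is still the malicious file, so handle it accordingly.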

published on 2012-07-22 by alex