Things I've learned and suspect I'll forget.
A while back I received an email from a friend that had a PDF attachment. Since the body was blank and the "report" was unsolicited, I assumed my friend's email account had been compromised and that the PDF was malicious. When I went back later to examine the PDF, I found that Google had recognized the file as malicious and would not let me download it. Instead, it offered only two options: "View" and "Learn more". Clicking "View" brings up a "virus found" message, and "Learn more" takes you to an explanatory help page.
We can still get the file, though, from the original text of the email. Click the drop-down arrow attached to the Reply button and select "Show original". Scroll down a bit in the original message and you'll see a section that looks like this:
```
--f46d0444ef637df04804b6e6fc01
Content-Type: application/pdf; name="744810.pdf"
Content-Disposition: attachment; filename="744810.pdf"
Content-Transfer-Encoding: base64
X-Attachment-Id: file0

JVBERi0xLjYKJeLjz9MNCjEgMCBvYmoNCjw8L1R5cGUvUGFnZS9QYXJlbnQgNSAwIFIgL01lZGlh
Qm94IFswIDAgNjQwIDQ4MF0vQ29udGVudHMgNiAwIFIgL1Jlc291cmNlcyA3IDAgUj4+DQplbmRv
JSVFT0YNCg==
--f46d0444ef637df04804b6e6fc01--
```
The sample above has been heavily trimmed to save space: only the first two lines and the final line of the encoded attachment are shown. The interesting part is the text on lines 7 through 9, which is the base64 encoding of the file. A trivial way to decode it is to fire up Python, join those lines into one string, and decode it with the base64 module:
```
>>> import base64
>>> raw = base64.b64decode('JVBERi0xLjYKJeLjz9MNCjEgMCBvYmoNCjw8L1R5cGUvUGFnZS9QYXJlbnQgNSAwIFIgL01lZGlhQm94IFswIDAgNjQwIDQ4MF0vQ29udGVudHMgNiAwIFIgL1Jlc291cmNlcyA3IDAgUj4+DQplbmRvJSVFT0YNCg==')
>>> raw
b'%PDF-1.6\n%\xe2\xe3\xcf\xd3\r\n1 0 obj\r\n<</Type/Page/Parent 5 0 R /MediaBox [0 0 640 480]/Contents 6 0 R /Resources 7 0 R>>\r\nendo%%EOF\r\n'
```
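The decoded data starts with %PDF-1.6, the standard PDF header, so you can check what an attachment really is without opening it in a viewer. Below is a minimal sketch of that kind of "magic number" check, assuming a small hand-picked signature table; `sniff_type` and `SIGNATURES` are illustrative names, not part of the scripts in this post:

```python
import base64

# A few well-known "magic number" prefixes. This table is illustrative
# and far from complete.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip archive (also docx/xlsx/jar)",
    b"MZ": "windows executable",
}


def sniff_type(data):
    """Return a guess at the file type from its leading bytes, or None."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


# The first 24 characters of the sample's base64 data (a multiple of 4,
# so they decode cleanly on their own).
encoded = "JVBERi0xLjYKJeLjz9MNCjEg"
print(sniff_type(base64.b64decode(encoded)))  # prints: pdf
```

Matching on leading bytes only tells you what a file claims to be; it says nothing about whether a PDF is safe to open.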
A simple script to do this automatically would look like this:
```python
# simpledecode.py
import base64
import sys


def decode(filein, fileout):
    # Decode each base64 line of filein and append the bytes to fileout.
    with open(filein, 'r') as emailFile, open(fileout, 'wb') as rawfile:
        for line in emailFile:
            raw = line.strip()
            try:
                rawfile.write(base64.b64decode(raw))
            except ValueError:  # binascii.Error is a subclass of ValueError
                print("Incorrect padding: " + line)
                raise
    print("wrote file: " + fileout)


if __name__ == '__main__':
    if len(sys.argv) == 3:
        decode(sys.argv[1], sys.argv[2])
    else:
        print("usage: simpledecode.py fileIn fileOut")
        print("fileIn should be the base64 MIME encoded string")
```
simpledecode.py takes a text file containing the base64-encoded attachment. All you need to do is copy the base64-encoded section of the message (lines 7-9 in the sample above) into a new text file and pass that file as the first argument to simpledecode.py. The second argument is the path where the decoded file should be saved.
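Decoding one line at a time, as simpledecode.py does, works because each line of the encoded block is a multiple of 4 characters long: every 4 base64 characters decode to exactly 3 bytes, so no line depends on its neighbors. MIME encoders conventionally wrap base64 at 76 characters (the maximum RFC 2045 allows), which gives 57 bytes per full line. A quick check using the three encoded lines from the sample, assuming that wrapping:

```python
import base64

# The three encoded lines from the (trimmed) sample message.
lines = [
    "JVBERi0xLjYKJeLjz9MNCjEgMCBvYmoNCjw8L1R5cGUvUGFnZS9QYXJlbnQgNSAwIFIgL01lZGlh",
    "Qm94IFswIDAgNjQwIDQ4MF0vQ29udGVudHMgNiAwIFIgL1Jlc291cmNlcyA3IDAgUj4+DQplbmRv",
    "JSVFT0YNCg==",
]

# Every line (the last one includes its "=" padding) is a multiple of 4 long...
assert all(len(line) % 4 == 0 for line in lines)

# ...so decoding line by line gives the same bytes as decoding the whole block.
per_line = b"".join(base64.b64decode(line) for line in lines)
whole = base64.b64decode("".join(lines))
assert per_line == whole

print(len(lines[0]), len(base64.b64decode(lines[0])))  # prints: 76 57
```

If an encoder ever wrapped lines at a length that is not a multiple of 4, the per-line approach would fail with a padding error, and joining the lines before decoding would be the safer choice.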
To make extracting attachments a little simpler, I wrote a somewhat more sophisticated Python program. This script only needs a file containing the original email text: when you view the original source of the email, press Ctrl-A to select the whole thing, then copy and paste it into a text file. Pass that file as the first argument to the script below. By default, the script saves each attachment under the filename included with it. You can also supply your own filename as a second argument; if there are multiple attachments, they are written out with a number added to the name.
```python
# emailextract.py
import base64
import os
import random
import re
import sys


def processMIME(emailMessageFile, outputFile=None):
    '''
    Takes a MIME-encoded email and extracts its application/* attachments.
    emailMessageFile - a file containing the plain-text email in MIME format
    outputFile - where to write the attachment (if not keeping the original name)
    Note: if outputFile is given and there are multiple attachments, a count
    is inserted before the file extension (if one exists) for each attachment
    after the first.
    '''
    with open(emailMessageFile, 'r') as emailFile:
        # Read lines until the first Content-Type header is shown; for a
        # multipart message it carries the boundary string.
        line = emailFile.readline()
        while line != "" and "Content-Type:" not in line:
            line = emailFile.readline()
        # Get the boundary string, which may or may not be quoted.
        match = re.search(r'boundary="?([^";\s]+)"?', line)
        if match is None:
            print("no MIME boundary found in " + emailMessageFile)
            return
        boundary = match.group(1)
        # Now that we have the boundary, find the attachments.
        count = 0
        while line != "":
            line = emailFile.readline()
            if "Content-Type: application" in line:
                processApplicationSection(line, emailFile, outputFile,
                                          boundary, count)
                count += 1


def processApplicationSection(line, emailFile, outputFile, boundary, count):
    '''
    Reads the headers of a Content-Type: application section of a MIME
    message, determines the filename, and moves emailFile to the data.
    '''
    makeFileName = outputFile is None
    if not makeFileName and count > 0:
        # Number the user-supplied name once, e.g. out.pdf -> out1.pdf.
        filename, ext = os.path.splitext(outputFile)
        outputFile = filename + str(count) + ext
    while "--" + boundary not in line:
        if line == "":
            return  # end of file before any data
        # Get the name of the file to write from the headers.
        if makeFileName:
            matchFilename = (re.search(r'filename="([^"]*)"', line) or
                             re.search(r'filename=([^\s;]+)', line))
            if matchFilename is not None:
                # Drop any directory parts: the sender controls this name.
                outputFile = os.path.basename(matchFilename.group(1))
        # A blank line separates the section headers from the base64 data.
        if line.strip() == "":
            if not outputFile:
                outputFile = ''.join(random.sample('0123456789abcdefg', 10))
            processRawData(emailFile, outputFile, boundary)
            return
        line = emailFile.readline()


def processRawData(emailFile, outputFile, boundary):
    '''Decodes base64 lines up to the next boundary into outputFile.'''
    with open(outputFile, 'wb') as rawfile:
        while True:
            line = emailFile.readline()
            if line == "" or "--" + boundary in line:
                break
            raw = line.strip()
            try:
                rawfile.write(base64.b64decode(raw))
            except ValueError:  # binascii.Error is a subclass of ValueError
                print("Incorrect padding: " + line)
                raise
    print("wrote file: " + outputFile)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        emailFile = sys.argv[1]
        outputFile = sys.argv[2] if len(sys.argv) > 2 else None
        processMIME(emailFile, outputFile)
    else:
        print("usage: emailextract.py emailIn [fileOut]")
        print("emailIn should be the plain-text original MIME-encoded email")
```
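For comparison, Python's standard email package can do all of the MIME parsing itself. This is a different technique from the line-by-line approach above, not what emailextract.py does: `email.message_from_file` parses the saved "Show original" text, `walk()` visits every part, `get_filename()` reads the attachment name from Content-Disposition (falling back to the name parameter of Content-Type), and `get_payload(decode=True)` undoes the base64. The function name `extract_attachments` is made up for this sketch:

```python
import email
import os


def extract_attachments(email_path, out_dir="."):
    """Save every named part of the message at email_path into out_dir.

    Returns the list of paths written.
    """
    with open(email_path, "r") as f:
        msg = email.message_from_file(f)
    written = []
    for part in msg.walk():
        name = part.get_filename()
        if name is None:
            continue  # body text or a multipart container, not an attachment
        # Strip any directory components: the name comes from a possibly
        # malicious sender.
        path = os.path.join(out_dir, os.path.basename(name))
        with open(path, "wb") as out:
            out.write(part.get_payload(decode=True))
        print("wrote file: " + path)
        written.append(path)
    return written
```

Calling `extract_attachments("email.txt")` writes each named attachment into the current directory and returns the list of paths. Because the parser understands quoted boundaries, folded headers, and nested multipart sections, this is less fragile than matching header lines by hand, at the cost of hiding what is actually in the raw message.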
published on 2012-07-22 by alex