Server-side upload verification with Taffy and Lucee
I was dismayed to learn recently that every image upload process I had ever built was vulnerable to unrestricted (malicious) file uploads. Like many developers, I used HTML <input type="file"> form fields to let users upload images. I made sure to restrict (I thought) those uploads to just images, but it turns out that a lot of the “security” around image uploads is based on nothing more than the file extension or, even worse, on the supplied mime type. Did you catch that? SUPPLIED! The browser supplies the mime type of the file when you upload it, and it takes about two lines of code to spoof that mime type. Cue the face-palms, shaking heads, and utter disbelief at my own insecure code.
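To see how little the supplied mime type is worth, here is a minimal shell sketch that hand-builds the multipart/form-data body a browser sends for a file upload. The Content-Type line for the file part is just text the client writes, so it can claim image/jpeg for a PHP script. The field name "upload", the filename "avatar.jpg", and the boundary string are hypothetical. With curl, the same trick is the ;type= modifier on -F, e.g. curl -F 'upload=@shell.php;type=image/jpeg' pointed at your upload endpoint.

```shell
#!/bin/sh
# Hand-build a multipart/form-data body containing a single file part.
# Field name "upload", filename "avatar.jpg" and the boundary are hypothetical.
build_spoofed_body() {
  boundary="----demo-boundary"
  printf '%s\r\n' "--$boundary"
  printf 'Content-Disposition: form-data; name="upload"; filename="avatar.jpg"\r\n'
  # The "supplied" mime type: nothing ties it to the bytes that follow
  printf 'Content-Type: image/jpeg\r\n\r\n'
  # The actual payload is a PHP script, not a JPEG
  printf '%s\r\n' '<?php echo "not an image"; ?>'
  printf '%s\r\n' "--$boundary--"
}

build_spoofed_body
```

A server that trusts that header would happily accept this "JPEG".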
Like many of you, I’m sure, I’m working on being better at developing zero-trust applications. As a great man once said: “Trust, but verify.”
So, in my quest to be a better developer, I was researching this issue and found Pete Freitag’s excellent post on the topic of secure uploads. Pete is a friend of mine, and his opinion on security is always solid. I wish I had found his post earlier! lol. Anyway, Pete’s post mentions server-side validation, but I want more. I want to lock that crap down to specific mime types: not just “IsImage()”, but something more like “IsJPEG()”. Unfortunately, nothing like “IsJPEG()” exists, and getting what I wanted turned out to be an ordeal; hence this blog post.
Pete mentions a Java library called JHOVE, but in my humble opinion it does not support enough file types. For example, I wanted the ability to detect PNG images, not just JPGs.
I found two potential solutions:
The Linux “file” command
The project I am working on now will be hosted on an Ubuntu server, which includes the Linux “file” command by default. The file command is awesome because it doesn’t take the file extension into account when detecting a file type. Instead, it runs a series of tests on the file: filesystem tests, magic-number tests (looking for known byte signatures inside the file, usually at the very start), and language tests, in that order. The first test that succeeds determines the reported type.
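To make “magic numbers” concrete: many formats begin with a fixed byte signature. JPEG files start with the bytes FF D8 FF, and PNG files start with the eight bytes 89 50 4E 47 0D 0A 1A 0A. The hypothetical sniff_magic function below checks only those leading bytes. It is a rough illustration of one of the tests file performs, not a substitute for file or Tika, and a malicious file can still begin with a valid signature.

```shell
#!/bin/sh
# Hypothetical helper: guess a mime type from a file's leading bytes only.
# Illustrates the idea of magic-number tests; NOT a full detector.
sniff_magic() {
  # First 8 bytes as one lowercase hex string, e.g. "ffd8ffe000104a46"
  sig=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
  case "$sig" in
    ffd8ff*)          echo "image/jpeg" ;;
    89504e470d0a1a0a) echo "image/png" ;;
    *)                echo "application/octet-stream" ;;  # unknown
  esac
}

# Usage: sniff_magic /tmp/upload.tmp
```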
Running the file command directly gives you a lot of information:
# file testFile41.tmp.jpeg
testFile41.tmp.jpeg: JPEG image data, JFIF standard 1.02, aspect ratio, density 1x1, segment length 16, baseline, precision 8, 400x400, components 3
… but that’s obviously far more info than I need. For my purposes, I just want the mime type, and the file command has a nice option for that. The --mime-type option prints the file’s mime type, and the -b option keeps the response “brief” (it drops the filename prefix):
# file --mime-type -b testFile41.tmp.jpeg
image/jpeg
Boom. My temp file is a real JPEG.
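Before wiring this into Lucee, the same allowlist check can be sketched in plain shell, which is handy for testing suspicious files from the command line. ALLOWED_TYPES, is_allowed_type and check_upload are hypothetical names, and the JPEG/PNG allowlist is an assumption. Note that $(...) strips the trailing newline from file’s output, so an exact comparison works here.

```shell
#!/bin/sh
# Space-separated allowlist of accepted mime types (assumption: JPEG and PNG)
ALLOWED_TYPES="image/jpeg image/png"

# Return 0 if $1 exactly matches an entry in the allowlist
is_allowed_type() {
  for t in $ALLOWED_TYPES; do
    [ "$1" = "$t" ] && return 0
  done
  return 1
}

# Detect the real type of file $1 with `file`, then check the allowlist
check_upload() {
  mime=$(file --mime-type -b "$1") || return 1
  is_allowed_type "$mime"
}

# Usage: check_upload /tmp/upload.tmp && echo accepted || echo rejected
```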
To use this in my Taffy API, I need to process the upload, save it to a temp file, then run CFEXECUTE on the temp file to determine its type:
<!--- comma-delimited list of supported mime types (no spaces, so each list item matches exactly) --->
<cfset local.supportedTypes = "image/jpeg,image/jpg,image/png">
<!--- run the file command to get the mime type; a non-zero timeout makes cfexecute wait for the output --->
<cfexecute name="/usr/bin/file" arguments="--mime-type -b #local.tmpFile#" variable="local.imgType" timeout="10" />
<!--- the command's output ends with a newline, so trim it before comparing --->
<cfset local.imgType = trim(local.imgType)>
<!--- make sure the file's detected mime type is on the supported list --->
<cfif listFindNoCase(local.supportedTypes, local.imgType)>
<!--- a supported type was found --->
<cfelse>
<!--- no supported type found: reject the upload and delete the temp file --->
</cfif>
Apache Tika
The Apache Tika project is a file detection and content extraction toolkit written in Java that is used by things like search engines (read: Apache Lucene & Solr) to identify files and their types. It’s well maintained and documented, and unlike the JHOVE project, it supports all image formats supported by Java, along with a great many other file formats. The project’s home page claims support for over a thousand file types.
Similar to the Linux “file” command, Tika uses multiple tests to identify a file’s format. Tika is available as a server or as a standalone Java app (a jar file). Since Lucee already runs inside a Java servlet container (Tomcat), we can simply add that jar file to Lucee by dropping it into Lucee’s “lib” folder and restarting Lucee. This makes Tika available through createObject():
<cfset local.TikaObj = createObject( "java", "org.apache.tika.Tika" ).init() />
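Once Tika is initialized, we can call its detect() method on any file we want. Pass it a java.io.File rather than a plain path string: Tika’s detect(String) overload only looks at the file name’s extension, whereas detect(File) inspects the file’s actual contents:
<cfset local.fileType = local.TikaObj.detect( createObject( "java", "java.io.File" ).init( expandPath( "./myimage.jpg" ) ) ) />
which in this case returns the mime type we’re looking for: image/jpeg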
For testing purposes, you can also run the Tika app from the command-line (note how this test file doesn’t have an extension at all):
# java -jar ./tika-app-2.7.0.jar -d /home/myuser/file23
image/jpeg
Sadly, I could not get Tika to properly identify files that I had loaded in memory (incoming from an API). Every time I tried to identify an in-memory file, I only got an “application/octet-stream” response, which, based on the Tika API docs, appears to be Tika’s “I don’t really know” response. Instead, the files had to be saved to the file system first, then identified. Not ideal, but the same would be true with the “file” command, so the process is identical either way.
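The save-first-then-identify flow can be sketched in shell as well. The hypothetical detect_stream_type function spools whatever arrives on standard input (standing in for an upload held in memory) to a temp file created with mktemp, runs file on it, and deletes the temp file whether or not detection succeeded. In Lucee the equivalent steps would be writing the bytes to a temp path, detecting, and deleting that path; this sketch only illustrates the lifecycle.

```shell
#!/bin/sh
# Hypothetical helper: spool a streamed upload (stdin) to a temp file,
# identify it with `file`, then remove the temp file in every case.
detect_stream_type() {
  tmp=$(mktemp) || return 1
  if ! cat > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  mime=$(file --mime-type -b "$tmp")
  status=$?
  rm -f "$tmp"   # clean up before returning, success or failure
  [ "$status" -eq 0 ] || return "$status"
  printf '%s\n' "$mime"
}

# Usage: some_command_producing_bytes | detect_stream_type
```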
Hope this helps!
3 thoughts on “Server-side upload verification with Taffy and Lucee”
Thanks Jordan,
I’m revisiting this issue after giving up on it a while ago for a slightly different scenario.
The problem I have is that the mime type of ‘office’ files with a false file extension are not correctly reported. For example “myDocxFile.pdf” is reported as ‘application/pdf’.
Interestingly, “myPdf.docx” is reported correctly as ‘application/pdf’.
I’m calling the Tika detect() method with the full file path to get the mime type.
I wonder if you have tested this scenario yourself and whether perhaps you have had a different result?
Sorry, ignore my previous question… Lucee bundles a very old version of Tika (1.28.4) but version 2.6.0 fixes the issue.
Glad you got it figured out, Brett. I didn’t even realize Lucee shipped with Tika; I wonder if that’s something new since I originally wrote this post? Either way, I posted your responses in case anyone else runs into a similar issue. TY!