The Importance of Forensic Tools Validation

I recently finished consulting on a rather high profile case, and once again found myself spending almost as much time correcting reports from third party forensic tools vendors as I did analyzing actual evidence. It’s even sadder that I charged less for my services than these tools manufacturers charge for a single license of their buggy software. I don’t say high profile to sound important; I say it because these types of cases are generally of great importance themselves, and you absolutely need the evidence to be accurate. Many in the law enforcement community have learned to “trust the tools”, citing scientific method and all that. My experience throughout my entire career in forensics, however, has shown me quite the opposite. When it comes to forensic software, the judge should not automatically trust the forensic tools as part of the scientific process, and neither should the forensic examiners using them. Let me explain why…

In forensics, we often misplace our trust in tools that, unlike tried and true scientific methods, are usually closed source. While true scientific process relies on making our findings repeatable and verifiable, the methods these tools use to analyze data are sometimes patented, and almost always considered trade secrets. This is the complete opposite of the scientific method, where methods are fully explained and documented. In the software industry, repeatable is exactly what you don’t want your methods to be, especially by your competitors. The nature of secrecy in the software industry doesn’t rub well against the open scientific nature that you’d expect to find in forensics, or in other scientific disciplines. As such, “software” is not scientific in nature, and should not be trusted using the same rules as science. Sure, we have some validation experts out there. NIST does a good job of validating logical data acquired from a number of devices and has produced some good and interesting results that have helped the industry. Even so, such tests are only a single data point in an ever evolving software manufacturing process riddled with regression bugs and programming errors that only show up in certain specific data sets.

A recent court ruling highlighted some of the technical issues that the criminal justice community is finally starting to wake up to with regard to this closed source “scientific” world we live in. The judge basically said that, until the examiner can explain how he is going to limit his search to the data covered by the warrant, the phone will be considered protected under the 4th Amendment (such a rarity to see coming from a judge these days). This whole matter seems like it could have been resolved if the forensic examiner had been able to explain technically how his tools work… unfortunately, not many can (although perhaps this one will follow up later with the judge).

As one example, I wrote previously about how a few different tools were miscalculating application usage data. In this particular case, I found the data being reported was off by an entire day. Guess what? When you’re dealing with criminal charges, a day can quite literally mean the difference between life in prison or not. Some other misleading information I found was even further off. Did you know that the “uninstall date” that iOS reports for uninstalling an application is actually wrong? It’s misreported by the operating system as the timestamp when the user last rebooted their phone after uninstalling the app in question. Commercial tool developers all seem to have just seen a date stuck in a file somewhere and thought, “oh, well that’s labeled uninstall date, so it must be accurate”. If you’re going to implicate someone in a crime based on, say, uninstalling a texting app they may have used to commit a crime, then you’d better get those dates right. This date could be off by hours, days, weeks, or more, as it’s entirely dependent on when the user reboots their phone after uninstalling the app. In this case, the date reported by the tools would have implicated another forensic examiner for either intentionally or unintentionally deleting the application in the lab. The actual data, however, showed that it was deleted long before the phone was taken into evidence. Huge difference.
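One way to catch this kind of error is to decode the raw stored value by hand and compare it with the date the tool prints. Here is a minimal sketch in Python, assuming the raw value is a count of seconds since January 1, 2001 UTC (the reference date Apple’s Core Foundation uses). The function names and sample values are hypothetical, and the storage format of any particular file has to be confirmed before relying on a conversion like this.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the raw value counts seconds from Core Foundation's
# reference date. Confirm the actual format of the file being examined.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def decode_apple_timestamp(raw_seconds):
    """Convert a raw seconds-since-2001 value to an aware UTC datetime."""
    return APPLE_EPOCH + timedelta(seconds=raw_seconds)


def discrepancy(raw_seconds, tool_reported):
    """Return how far the tool's reported time is from the decoded raw value.

    tool_reported must be a timezone-aware datetime. A result of zero means
    the tool agrees with the manual decoding; anything else needs explaining.
    """
    return tool_reported - decode_apple_timestamp(raw_seconds)


if __name__ == "__main__":
    raw = 432000000  # hypothetical raw value pulled from the evidence
    decoded = decode_apple_timestamp(raw)
    # Hypothetical tool output that is one day later than the raw value.
    reported = decoded + timedelta(days=1)
    print("decoded from raw:", decoded.isoformat())
    print("tool reported:   ", reported.isoformat())
    print("difference:      ", discrepancy(raw, reported))
```

A nonzero difference doesn’t tell you who is wrong, only that the tool’s number can’t be taken at face value until someone explains it.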

As one final example, I’ve read a number of reports from one manufacturer in particular whose tool reports an “access time” for data found on an iOS user data partition; I’ve seen access times used to implicate people in a number of cases. The problem with this, however, is that anyone who knows iOS intimately also knows that the kernel is configured to automatically mount the user partition with the noatime option, so that access times aren’t actually written. It’s part of Apple’s strategy to keep the NAND from wearing down. As it turns out, this particular manufacturer first copies the iOS file system onto the NTFS partition that their software is running on, and then pulls the timestamp information off of the copy of the data. Asinine, and irresponsible, to say the least. Other access times I wasn’t able to trace back to any source; they were seemingly made up by the software.
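The copy problem is easy to demonstrate on any desktop machine. The sketch below is a generic illustration in Python, not a reconstruction of any vendor’s code: it creates a file, backdates its timestamps, makes a plain copy, and reads the access time off the copy. The file names and the 30 day offset are arbitrary choices for the demonstration.

```python
import os
import shutil
import tempfile
import time


def atime_after_copy(seconds_in_past=30 * 24 * 3600):
    """Return (original_atime, copy_atime) for a backdated file and its copy.

    The original's access time is recorded BEFORE copying, because reading
    the file to copy it may itself update that timestamp on some systems.
    """
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "original.db")
        copy = os.path.join(workdir, "copy.db")

        with open(original, "wb") as fh:
            fh.write(b"evidence")

        # Backdate both the access and modification times on the original.
        old = int(time.time()) - seconds_in_past
        os.utime(original, (old, old))
        original_atime = os.stat(original).st_atime

        # copyfile copies contents only; the new file gets fresh timestamps.
        shutil.copyfile(original, copy)
        copy_atime = os.stat(copy).st_atime

    return original_atime, copy_atime


if __name__ == "__main__":
    before, after = atime_after_copy()
    print("original access time:", time.ctime(before))
    print("copy access time:    ", time.ctime(after))
```

The access time on the copy reflects when the copy was made, which says nothing about what the user did on the device. Any timestamp read from a working copy describes the examiner’s workstation, not the evidence.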

The reports I read had these and many other flaws in them. And sadly, this isn’t the only case I’ve worked on where I’ve been asked to review reports from third party tools. I’m often asked to consult on cases when commercial solutions have failed or fallen apart, and my own more hands-on techniques are required. In that respect, forensics feels more like janitorial work than actual crime fighting.

The first issue took reverse engineering one of Apple’s frameworks to validate. The second was initially found merely by taking five minutes to experiment and then further validated with reverse engineering. The last is just general knowledge that anyone intimate with iOS should already know. So why did these companies make such basic mistakes? Well, we can’t just blame laziness. In fact, I don’t blame the programmers at all. The problem here is that software engineers are almost never also infosec (information security) experts. Software engineers understand process, methodology, design patterns, and languages, and spend most of their time adding functionality outlined in user stories or requirements docs. While software engineers do a great job at making software, they’re not usually good at unmaking it, or breaking it. That is, software engineers don’t typically reverse engineer. For that, you need the kind of reverse engineering skill you find in the people who write exploits or do R&D for a living. This kind of reverse engineering is crucial when dealing with operating systems, such as iOS, that are closed source. Apple doesn’t give magic tech notes to these guys to just divulge all of their forensic secrets. Quite the contrary, most closed source manufacturers try very hard to protect whatever code secrets they can. You can’t fully understand how iOS works just by looking at it; you have to completely reverse engineer it down to the bare bones machine code. In a sense, closed source is making it harder for people to get a fair trial these days, and that includes both the closed source of the operating system and that of the forensic tools.

Reverse engineering is becoming less and less of a niche skill every day, and there are a number of great software solutions out there now to make it much easier. The most popular reverse engineering tool is IDA Pro from Hex Rays, which can not only disassemble code for a scary number of different processor architectures, but also has modules to decompile many of them into C-like pseudocode that’s readable by most software engineers. This software package runs about $5K for the full caboodle, but more than pays for itself in productivity (as well as in embarrassment avoided, if you’re an attorney about to lose a case to a bad report). A much less expensive tool that still does a decent job is Hopper, which runs on Mac and costs about $60. Hopper also produces pseudocode; however, it’s not as good or readable as what the much more expensive Hex Rays decompiler produces. It is still a fantastic tool to work with. Sending software engineers to reverse engineering 101, or simply hiring some reverse engineers / pentesters to validate your forensics products, can certainly help to avoid bugs like these. Many of these bugs are otherwise likely to go unnoticed, as bug reports from the convicted felons the tools helped put away are likely to get little attention. That $5K you invest today could save you far more than that in the long run, in everything from embarrassment, loss of business, and lawsuits to just plain old man-hours trying to fix bugs that get reported.

More importantly, however, forensic examiners really need to understand software engineering and reverse engineering in order to do their jobs competently. They don’t just need this to validate the tools they’re using; when their evidence gets called into question, it also gives them solid footing to prove that the software is producing a timestamp or other evidence based on a specific programmed behavior. A forensic examiner has to deal with a number of different scenarios, many of which may include the tiniest of details, which could make a big difference in a case. I’ve met some fantastic forensic examiners who have this thirst for knowledge, and understand that the tools exist only to help streamline their discipline. I’ve also met some terrible examiners who stop once they find the email or image they’re looking for, and don’t care enough about the quality of their work to even process all of the evidence. Being a good forensic examiner means acquiring the skills to be able to validate the tools yourself, by hand if necessary. For this, you need to understand how software works, as well as how to reverse engineer it. This is why a combined criminal justice and computer science (CJ/CS) track seems so appealing to a lot of people.

There are a number of great books and tools you can use to further your knowledge. If you’re working in iOS or other ARM-based mobile platforms, the ARM System Developer’s Guide by Elsevier is a great book to learn ARM instructions, calling conventions, and a lot of other things you’ll need to know to understand ARM applications. The IDA Pro Book by No Starch Press is a great guide on how to use IDA Pro to reverse engineer applications. Learning about Objective-C software development, the runtime, and the iOS operating system is also important if you’re dissecting iOS applications. I’ve written a number of great books on these subjects, but there are also some more recent great titles out there too. I hear great things about Beginning iOS 7 Development as well as iOS Programming: The Big Nerd Ranch Guide.

Bottom line is this: You can’t fully trust forensic tools these days. I’ve used a number of the top contenders, and they all have their shortcomings. A forensic examiner isn’t defined by the tool he or she uses to generate reports. A good forensic examiner simply uses the tools to get a foundation to build their case on, and relies on their own skills to validate the information and ultimately tell the story. If you’re interested in the forensics industry, this is one way to distinguish yourself as an expert in the field. If you’re interested in software engineering, consider also learning how to break and dismantle code, as well as write it. This makes for a very well rounded software engineer. Lastly, if you’re on the criminal justice side of things, question everything, especially the fine details important to your cases that rely on reporting like this from tools, and make sure the examiner you’re using has done their homework to verify this information.