Inside FAT: Data Recovery Algorithm

In 2013, there are plenty of file systems around: FAT, NTFS, HFS, exFAT, ext2/ext3 and many others used by the many different operating systems. And yet the oldest and simplest of them all is still going strong. FAT is aged, and it places tight limits on the maximum volume size and on the size of a single file. It is rather simplistic by today’s standards: it offers no permission management, no built-in transaction roll-back or recovery mechanisms, and no built-in compression or encryption. And yet it remains very popular. FAT is so simple to implement, requires so few resources and imposes such a small overhead that it has become irreplaceable for a wide range of mobile applications.

FAT is used in most digital cameras. The majority of memory cards used in media players, smartphones and tablets are formatted with FAT. Even Android devices take memory cards formatted with FAT. In other words, despite its age, FAT is alive and kicking.

Recovering Information from FAT Volumes

If FAT is so popular, there must be a need for data recovery tools supporting that file system. In this article we’ll be sharing the experience we gained while developing a data recovery tool.

Before we discuss the internals of the file system, let’s have a brief look at why data recovery is possible at all. As a matter of fact, the operating system (Windows, Android, or whatever system is used in a digital camera or media player) does not actually wipe or destroy information once a file gets deleted. Instead, the system marks the file’s record in the file system as deleted and flags the disk space previously occupied by the file as available. This is much faster than actually wiping the disk content. It also reduces wear.
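
To make the idea of a “record marked as deleted” concrete, here is a minimal Python sketch of how a tool might decode one 32-byte FAT directory entry. In FAT, a deleted entry keeps its size and starting cluster, and only the first byte of the name is overwritten with 0xE5. The function name parse_dir_entry and the returned dictionary are our own illustration, not part of any particular tool, and the sketch assumes that the two bytes at offset 20 are either zero or hold the high half of the cluster number, as on FAT32.

```python
import struct

DELETED_MARKER = 0xE5    # first name byte of a deleted entry
UNUSED_MARKER = 0x00     # first name byte of an entry that was never used
ATTR_LONG_NAME = 0x0F    # attribute value used by long-file-name entries


def parse_dir_entry(entry: bytes):
    """Decode one 32-byte short-name directory entry (hypothetical helper).

    Returns None for unused slots and long-file-name entries.
    """
    if len(entry) != 32:
        raise ValueError("a FAT directory entry is exactly 32 bytes")
    first = entry[0]
    attr = entry[11]
    if first == UNUSED_MARKER or (attr & 0x3F) == ATTR_LONG_NAME:
        return None

    deleted = first == DELETED_MARKER
    # The first character of a deleted name is lost; show it as "?".
    raw_name = (b"?" + entry[1:8]) if deleted else entry[0:8]
    name = raw_name.decode("ascii", errors="replace").rstrip()
    ext = entry[8:11].decode("ascii", errors="replace").rstrip()

    # Assumption: offset 20 holds the high cluster word (FAT32) or zero.
    (cluster_hi,) = struct.unpack_from("<H", entry, 20)
    (cluster_lo,) = struct.unpack_from("<H", entry, 26)
    (size,) = struct.unpack_from("<I", entry, 28)

    return {
        "name": f"{name}.{ext}" if ext else name,
        "deleted": deleted,
        "first_cluster": (cluster_hi << 16) | cluster_lo,
        "size": size,
    }
```

Everything a recovery tool needs in order to start looking for the data (the size and the first cluster) is still present in the entry after deletion; only the first letter of the name has to be guessed or supplied by the user.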

As you can see, the actual content of a file remains available somewhere on the disk. This is what allows data recovery tools to work. The question now is how to identify which sectors on the disk contain information belonging to a particular file. In order to do that, a data recovery tool could either analyze the file system or scan the content area on the disk looking for deleted files by matching the raw content against a database of pre-defined persistent signatures.

This second method is often called “signature search” or “content-aware analysis”. In forensic applications, this same approach is called “carving”. Whatever the name, the algorithms are very similar. They read the entire disk surface looking for characteristic signatures identifying files of certain supported formats. Once a known signature is encountered, the algorithm will perform a secondary check, then read and parse what appears to be the file’s header. By analyzing the header, the algorithm can determine the exact length of the file. By reading disk sectors following the beginning of the file, the algorithm recovers what it assumes to be the content of a deleted file.
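
As a rough illustration of the steps just described, the sketch below carves BMP images out of a raw disk image held in memory. BMP is chosen only because its header is easy to parse: the file starts with the bytes "BM", followed by the total file size as a 4-byte little-endian integer. The function name carve_bmp, the 512-byte sector size and the particular plausibility checks are assumptions made for this example; a real tool would support many formats and far stricter validation.

```python
import struct

SECTOR_SIZE = 512       # assumed sector size
BMP_HEADER_SIZE = 14    # size of the BMP file header


def carve_bmp(image: bytes, sector_size: int = SECTOR_SIZE):
    """Yield (offset, length) for every plausible BMP found on a sector boundary."""
    for offset in range(0, len(image) - BMP_HEADER_SIZE + 1, sector_size):
        # Step 1: look for the characteristic signature.
        if image[offset:offset + 2] != b"BM":
            continue
        # Step 2: parse what appears to be the file header.
        size, reserved1, reserved2, data_offset = struct.unpack_from(
            "<IHHI", image, offset + 2
        )
        # Step 3: secondary check to weed out false positives.
        # Assumption: the reserved fields are zero, as they are in typical files.
        if reserved1 != 0 or reserved2 != 0:
            continue
        if not (BMP_HEADER_SIZE <= data_offset < size):
            continue
        if offset + size > len(image):
            continue
        # Step 4: the header told us the length; assume the data is contiguous.
        yield offset, size


def extract(image: bytes, offset: int, length: int) -> bytes:
    """Return the bytes of a carved file, assuming it is stored in one piece."""
    return image[offset:offset + length]
```

Note that extract simply takes the bytes that follow the header. That is exactly the assumption signature search relies on, and it only holds when the file is stored in one contiguous run.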

If you’re following carefully, you may have already noticed several issues with this approach. It works extremely slowly, and it can only identify a finite number of known (supported) file formats. Most importantly, this approach assumes that the disk sectors following the file’s header do belong to that particular file, which is not always true. Files are not always stored contiguously. Instead, the operating system can write chunks into the first available clusters on the disk. As a result, the file can be fragmented into multiple pieces. Recovering fragmented files with signature search is a matter of hit or miss: short, unfragmented files are usually recoverable without breaking a sweat, while long, fragmented ones may not be recovered at all or may come out damaged.

In practice, signature search does work pretty well. Most files that are of any importance to the user are documents, pictures, and other similarly small files. Granted, a lengthy video may not be recovered, but a typical document or JPEG image is usually small enough to be stored in one piece and recovers pretty well.

If, however, one needs to recover fragmented files, the tool must combine information obtained from the file system with information gathered during the disk scan. This, for example, allows excluding clusters that are already occupied by other files, which, as we’ll see in the next section, greatly improves the chance of successful recovery.
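
A minimal sketch of that exclusion step might look like this. It assumes the file allocation table has already been read into a Python list in which a value of 0 means “cluster is free”, as it does in FAT, and that the starting cluster and the file size are known (for example, from a deleted directory entry). The function name guess_clusters is ours. The heuristic of taking the next free clusters in order is only a guess at where the data went, not a guarantee.

```python
FREE = 0  # FAT entry value for an unallocated cluster


def guess_clusters(fat, start_cluster, file_size, cluster_size):
    """Guess which clusters held a deleted file (hypothetical helper).

    Walks forward from start_cluster and skips every cluster that the FAT
    reports as allocated to some other, still existing file.
    Returns a list of cluster numbers, or None if too few free clusters remain.
    """
    needed = -(-file_size // cluster_size)  # ceiling division
    chosen = []
    cluster = start_cluster
    while len(chosen) < needed and cluster < len(fat):
        if fat[cluster] == FREE:
            # Free now, so it may still hold the deleted file's data.
            chosen.append(cluster)
        # Otherwise the cluster belongs to a live file: exclude it.
        cluster += 1
    return chosen if len(chosen) == needed else None
```

For a 10,000-byte file with 4,096-byte clusters starting at cluster 2, where clusters 4 and 5 belong to another file, the sketch picks clusters 2, 3 and 6 rather than the 2, 3, 4 that a plain signature search would read.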

Using Information from the File System to Improve Recovery Quality

As we have seen, signature search alone works great if there is no file system left on the disk, or if the file system is so badly damaged that it becomes unusable. In all other cases, information obtained from the file system can greatly improve the quality of the recovery.

Let’s take a large file we need to recover. Suppose the file was fragmented (as is typical for larger files). Simply using signature search will result in only recovering the first fragment of the file; the other fragments will not recover correctly. It is therefore essential to determine which sectors on the disk belong to that particular file.

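Windows and other operating systems determine which sectors belong to which file by enumerating records in the file system. On a FAT volume, these records are the directory entries, which store each file’s size and first cluster, and the file allocation table itself, which links each cluster of a file to the next one.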
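
For an existing (not deleted) file on a FAT16 volume, that enumeration could be sketched as follows. Each FAT16 table entry is a 2-byte little-endian number naming the next cluster of the file, and values of 0xFFF8 and above mark the end of the chain. The function names and the first_data_sector and sectors_per_cluster parameters are illustrative; a real tool would read those values from the boot sector.

```python
import struct

FAT16_BAD_CLUSTER = 0xFFF7  # values from here up are not valid "next" clusters


def cluster_chain(fat_bytes: bytes, start_cluster: int):
    """Follow a FAT16 cluster chain and return the list of cluster numbers."""
    chain = []
    seen = set()
    cluster = start_cluster
    # Valid data clusters are numbered from 2; stop on free, bad or end markers.
    while 2 <= cluster < FAT16_BAD_CLUSTER:
        if cluster in seen:
            raise ValueError("loop in cluster chain, the FAT is damaged")
        if (cluster + 1) * 2 > len(fat_bytes):
            raise ValueError("cluster number points outside the FAT")
        seen.add(cluster)
        chain.append(cluster)
        (cluster,) = struct.unpack_from("<H", fat_bytes, cluster * 2)
    return chain


def cluster_to_sector(cluster, first_data_sector, sectors_per_cluster):
    """Translate a cluster number into the number of its first sector."""
    return first_data_sector + (cluster - 2) * sectors_per_cluster
```

When a file is deleted, its entries in the table are reset to zero, so this walk no longer works for it. That is why recovering a deleted file takes the extra guesswork about free clusters described earlier.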
