Sniffing The Traces Left Behind By Deleted Files On Hard Disks

Security researcher Shivan Sengar discusses how files are stored on hard disks and the measures that should be taken to secure those files.

Planning on buying a new laptop or smartphone, or changing your hard disk drives to solid state disks in your desktop?

According to the Consumer Electronics Association, the average life expectancy of a smartphone is 4.7 years. Some of us change our smartphone even sooner, when a better model arrives in the market or our favorite brand launches a new one. This change often carries a risk to data privacy. How do you format the phone so that the data stays inaccessible even if the storage gets into the wrong hands? The same question applies to laptop and desktop hard disks. I have found that even today people are not fully aware of the measures they should take to protect the data stored on their old disks.
Before talking about the techniques that mitigate this issue, let's go over how files are stored on disk.

What are filesystems and inodes?

Everything in a computing device is made of 0s and 1s (called binary), even the files stored in it. Managing enormous quantities of these binary digits requires a filesystem. Filesystems mean different things to different people. A filesystem provides the structure to store and retrieve data, provides a namespace, enforces access permissions, and provides an API to read and write the data in a file.
Your operating system doesn't necessarily store a file contiguously. Files are split into small chunks (blocks) and then stored. It is often not possible to keep the whole file in one contiguous segment, because a file that is edited later may need more space than is free next to it, so its data ends up in multiple segments. A new data segment is likely to be placed at a location on the HDD that minimizes access latency or reduces fragmentation. Since these chunks are scattered across the disk, the filesystem needs to keep track of all of them. An inode is a data structure in Linux and Unix-like filesystems that stores a file's attributes, data block locations, owner information, time of last change, and other metadata. Directories are lists of names assigned to inodes. A directory contains an entry for itself, its parent, and each of its children. You can use the df command to check the inode usage of your system.

df -hi
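To make the idea of "names assigned to inodes" concrete, here is a minimal Python sketch. It assumes a POSIX-like system whose temporary directory supports hard links; the file names and the helper functions inode_info and demo are made up for illustration. It creates a file, adds a second name for it, and shows that both names point to the same inode.

```python
import os
import tempfile


def inode_info(path):
    """Return (inode number, hard-link count) for a path."""
    st = os.stat(path)
    return st.st_ino, st.st_nlink


def demo():
    """Create a file, add a second name for it, and compare inode data."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "notes.txt")   # illustrative name
        alias = os.path.join(d, "alias.txt")      # illustrative name
        with open(original, "w") as f:
            f.write("hello")
        before = inode_info(original)
        os.link(original, alias)  # a second directory entry for the same inode
        return before, inode_info(original), inode_info(alias)


if __name__ == "__main__":
    before, after_original, after_alias = demo()
    print("before linking :", before)
    print("original name  :", after_original)
    print("second name    :", after_alias)
```

The inode number is identical for both names, and the link count rises from 1 to 2: the directory only holds names, while the inode holds the file itself.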

What happens when you delete a file?

You can think of inode numbers as pointers to your files. In simple terms, when a file is deleted its directory entry is removed and its inode is marked as free in the inode table. The exact steps taken depend on the type of filesystem (FAT32, EXT4, etc.) and the disk technology being used (solid state disk, hard disk drive, etc.).
Many filesystems don't actually erase a file's contents when you ask them to delete it. Instead they flag the file's entry as deleted (FAT32, for example, overwrites the first byte of the file's directory entry with a marker) to let the system know that this space can be overwritten when required.
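The "mark instead of erase" behaviour can be sketched with a toy model in Python. This is not a real filesystem: the ToyDisk class and its methods are invented for illustration, and the only real-world detail borrowed is the 0xE5 byte that FAT uses to flag a deleted directory entry.

```python
DELETED = 0xE5  # marker FAT writes into the first byte of a deleted entry


class ToyDisk:
    """A toy disk: a directory table plus a pool of data blocks."""

    def __init__(self):
        self.blocks = {}      # block number -> bytes
        self.directory = []   # entries of [name as bytearray, block number]
        self.next_block = 0

    def write_file(self, name, data):
        self.blocks[self.next_block] = data
        self.directory.append([bytearray(name.encode()), self.next_block])
        self.next_block += 1

    def delete(self, name):
        """'Delete' a file by flagging its entry; the data block is untouched."""
        for entry in self.directory:
            if bytes(entry[0]) == name.encode():
                entry[0][0] = DELETED
                return True
        return False

    def list_files(self):
        """What the user sees: only entries without the deleted marker."""
        return [bytes(n).decode() for n, _ in self.directory if n[0] != DELETED]

    def recover_deleted(self):
        """What a recovery tool sees: data still referenced by flagged entries."""
        return [self.blocks[b] for n, b in self.directory if n[0] == DELETED]


if __name__ == "__main__":
    disk = ToyDisk()
    disk.write_file("secret.txt", b"password=hunter2")
    disk.delete("secret.txt")
    print("visible files:", disk.list_files())
    print("recoverable  :", disk.recover_deleted())
```

After the delete, the file no longer shows up in the listing, yet its contents are still sitting in the block pool until something else overwrites them.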

Hard Disk Drives

In Hard Disk Drives, data is never deleted but rather overwritten with new data. Here, data is stored as magnetized regions on the platter, and the analog signal read from them is converted to binary before being processed. There are four possibilities when a bit is overwritten once: the bit was initially 0 and is overwritten with 0 or 1, or it was initially 1 and is overwritten with 0 or 1. In principle, these four cases leave slightly different traces on the disk. So, what a recovery tool does is split the signal into four regions using different thresholds and decide what the previous value was. The same idea can be extended to data that has been overwritten multiple times, although each extra overwrite makes the old traces fainter. This video by Computerphile explains this concept.
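The four-region idea can be illustrated with a small numerical sketch in Python. The signal model here is an assumption made purely for illustration: the new bit contributes 90% of the read level and the old bit leaves a 10% residue, with no noise. Real drives are far messier, and the function names and threshold values are hypothetical.

```python
def analog_level(old_bit, new_bit, residue=0.1):
    """Assumed model: the read level is mostly the new bit plus a faint
    residue of the bit that was there before the overwrite."""
    return new_bit * (1 - residue) + old_bit * residue


def classify(level):
    """Map a read level to (previous bit, current bit) using three
    thresholds that split the range into four regions."""
    if level < 0.05:
        return (0, 0)   # was 0, overwritten with 0
    if level < 0.5:
        return (1, 0)   # was 1, overwritten with 0
    if level < 0.95:
        return (0, 1)   # was 0, overwritten with 1
    return (1, 1)       # was 1, overwritten with 1


def recover_previous(old_bits, new_bits):
    """Simulate one overwrite and recover the bits that were overwritten."""
    levels = [analog_level(o, n) for o, n in zip(old_bits, new_bits)]
    return [classify(level)[0] for level in levels]


if __name__ == "__main__":
    old = [1, 0, 1, 1, 0]
    new = [0, 0, 1, 0, 1]
    print("recovered previous bits:", recover_previous(old, new))
```

Under this idealized model the previous bits come back exactly. With realistic noise the four regions overlap, which is why every additional overwrite pass makes recovery harder.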

Solid State Disks

Solid State Disks (SSDs) work very differently from HDDs. In an SSD the storage is divided into blocks, which are in turn made up of pages, and these units can be addressed by index just like an array. Thus there is no mechanical seek overhead for data access as there is in HDDs. In order to write anything to a location that already holds data, the old data has to be erased first. The OS of your device doesn't access these blocks directly. Instead, it talks to the microcontroller present in the SSD or flash drive, which keeps track of all the blocks using a table (or map). Since erasing takes a lot of time, the microcontroller dynamically redirects a write request to some other block that is already empty, and keeps a log for itself to erase the originally requested block sometime later. This is what makes recovery possible even after you have deleted your data. For more details, you can read this paper.
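The remapping behaviour described above can be sketched as a toy model in Python. The ToyFTL class is hypothetical and heavily simplified (it treats every page as independently erasable and ignores wear levelling), but it shows why an "overwritten" value can still be present in the raw flash.

```python
class ToyFTL:
    """A toy flash translation layer: logical addresses map to physical pages."""

    def __init__(self, num_pages):
        self.pages = [None] * num_pages  # physical pages; None means erased
        self.mapping = {}                # logical address -> physical page index
        self.stale = set()               # pages holding old data, awaiting erase

    def _free_page(self):
        for index, content in enumerate(self.pages):
            if content is None:
                return index
        raise RuntimeError("no free pages; garbage collection needed")

    def write(self, logical, data):
        """Write to a fresh page and remap, instead of erasing in place."""
        target = self._free_page()
        self.pages[target] = data
        old = self.mapping.get(logical)
        if old is not None:
            self.stale.add(old)          # old data stays until it is erased
        self.mapping[logical] = target

    def read(self, logical):
        return self.pages[self.mapping[logical]]

    def garbage_collect(self):
        """Erase the stale pages 'sometime later'."""
        for index in self.stale:
            self.pages[index] = None
        self.stale.clear()

    def raw_dump(self):
        """What someone reading the flash chips directly would find."""
        return [p for p in self.pages if p is not None]


if __name__ == "__main__":
    ftl = ToyFTL(num_pages=4)
    ftl.write(0, b"old secret")
    ftl.write(0, b"new data")            # same logical address
    print("OS reads :", ftl.read(0))
    print("raw flash:", ftl.raw_dump())
```

The OS only ever sees the new value, but the old one remains in the raw pages until the controller gets around to garbage collection.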

Dependence on File Systems

How a file is managed and deleted depends on the type of filesystem you are using. Many filesystems keep redundant copies of information. For example, EXT4, one of the most common filesystems in use today, keeps a journal, and depending on the journaling mode, copies of metadata or even file data can remain there. Deleting a file therefore doesn't ensure that all the information related to that file has been removed.

How to ensure your data privacy?

Now we know the problem, the inner workings of hard disks, and the reason behind the problem. The question now becomes how to solve it.
The best measure is to encrypt your disks. Encryption ensures that nobody without the key can read your data, even if they get physical access to your device. During Linux installation users are given the option of encrypting their disk, but if you're already a Linux user and haven't encrypted your disks, don't worry. Go through this article to encrypt and secure your data. You can find more information related to this topic here.
Windows 10 provides BitLocker to encrypt disks. If you are using some other version of Windows, you can use either VeraCrypt or BitLocker. You can also learn to encrypt your flash drives in this article.
Now you might think that disk encryption is not really necessary because your system is password protected and you never leave the device unlocked. Not leaving your device unlocked is a very good habit, but it doesn't guarantee that your data is protected against theft. To gain access to the data, someone can simply remove the hard disk from your device and read it from a different system. The hard disk stores not only important documents and files, but also cookies, caches, and other sensitive data. Getting hold of this data can allow an attacker to launch other kinds of attacks.

What should you do if you have a non-encrypted disk or flash drive and you want to throw it away? The simple answer is to format it, but there is a catch. You might have observed that while formatting a disk or USB drive you are given two options: a fast (quick) format and a slow (full) format. In a fast format, only the links are removed, marking all sectors/blocks as available for writing. In a slow format, the blocks themselves are overwritten. For secure disposal you want the data overwritten, and tools built for this purpose can do so several times with random data. The more overwrite passes, the harder recovery becomes. On *nix based systems you can use the shred command for this task.

shred -n 50 filename

Here, the -n parameter lets the user set the number of passes of random overwriting. In current versions of GNU shred the default is 3.
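To show what such a tool does in essence, here is a minimal Python sketch of overwriting a file in place with random bytes for several passes and then deleting it. It is an illustration, not a replacement for shred, and the function names are made up. It also assumes the storage really overwrites in place, which, as discussed above, is not guaranteed on SSDs or journaling filesystems.

```python
import os


def overwrite_file(path, passes=3, chunk_size=64 * 1024):
    """Overwrite the file's contents in place with random bytes, `passes` times."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push this pass to the device


def overwrite_and_delete(path, passes=3):
    """Overwrite the file, then remove its directory entry."""
    overwrite_file(path, passes)
    os.remove(path)
```

Calling overwrite_and_delete("filename", passes=50) would roughly correspond to the shred invocation above followed by removing the file.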

Conclusion

What exactly happens when you delete a file depends on the OS, filesystem, and disk technology you are using. It is important that we are well informed about the technology we use and take action accordingly. After all, the best security and anti-virus you can ever get is you yourself.