Operating systems are complex beasts, full of many moving parts that allow the user to interact with the hardware. In this article, we’ll be talking about one of the more important and more easily overlooked services every OS has to provide: the file system.
By file system I don’t mean things like the explorers in OSX or Windows that show you visually where all your files and directories are. I mean the code beneath all of that, the code that makes the concept of a “file” even make sense to the computer.
“Files” aren’t actually something a hard drive, a USB drive, or an SD card naturally understands. All these pieces of storage hardware know how to do is read data from and write data to different locations in the device’s storage, and nothing else. Anything beyond “this value is stored in this tiny box for data” is implemented by the file system and was coded as a part of the OS.
OSX, Windows, and Linux all have their own file systems. Linux actually has a bunch of file systems you can choose from. In the rest of this article, we’ll discuss some of the duties of a file system and how different file systems handle these duties.
What’s the first big complication for a file system? Defining what a “file” is in the first place and keeping track of where its data goes! A good file system makes a file look like it’s just one long chunk of continuous data. On the drive it frequently isn’t, though! Imagine you have an 8 GB file, like a movie, that you want to download, and all you have free on the drive is a bunch of small spaces that are each a few hundred megabytes in size. It’d be silly to refuse to write the file to the drive just because you can’t find one big space.
Instead you’d split the file into a bunch of pieces and put them in all the open spaces. That’s what every modern file system does: it figures out a way to split the data up efficiently into all the spaces available and then keeps track of where all those pieces are.
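To make the bookkeeping concrete, here is a minimal sketch of that idea in Python. It is a toy model, not how any particular file system does it: the Extent class, the allocate function, and the first-fit strategy are all assumptions made up for illustration, and the drive is just a list of free gaps measured in blocks.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    start: int   # first block of a contiguous run
    length: int  # number of blocks in the run


def allocate(free_extents, blocks_needed):
    """Fill free gaps in order (first-fit) until the file fits.

    Returns (file_extents, remaining_free). The first list is the
    "map" the file system must remember to find the file again.
    """
    file_extents = []
    remaining_free = []
    for gap in free_extents:
        if blocks_needed == 0:
            remaining_free.append(gap)  # untouched gap stays free
            continue
        take = min(gap.length, blocks_needed)
        file_extents.append(Extent(gap.start, take))
        blocks_needed -= take
        if take < gap.length:
            # Only part of this gap was used; the tail stays free.
            remaining_free.append(Extent(gap.start + take, gap.length - take))
    if blocks_needed > 0:
        raise OSError("not enough free space on the toy drive")
    return file_extents, remaining_free


# Three free gaps on the drive; the file needs 9 blocks in total.
free = [Extent(0, 3), Extent(10, 4), Extent(50, 8)]
file_map, free_after = allocate(free, 9)
print(file_map)    # the pieces the file was split into
print(free_after)  # what is still free afterwards
```

The file ends up in three pieces, and the list of pieces is exactly the kind of information the file system has to store somewhere so it can present the file as one continuous chunk. Real file systems use much smarter strategies than first-fit to keep pieces close together.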
What might not be obvious reading this is why there’d be weird gaps in the drive to begin with. Well, that’s a consequence of files needing to change! Imagine you’re recording a video to put on your YouTube channel. Raw video data tends to be pretty big! How big, though? You’ll only know after you’re done recording. Once you hit record your computer writes the data to its drive in some available space. It’ll fill up that gap in the drive while it can and then keep going in other available spaces.
Let’s say you record a whopping 20 GB of high-definition video. When you edit it later, you’ll be cutting a lot out and deleting that data. Now you’ve created new gaps that other data will fill later!
One of the other big things a file system needs to do is handle the logical hierarchy of your files: directories (or folders, if you prefer) and their sub-directories. The actual storage doesn’t reflect the hierarchy. Files in the same directory aren’t all near each other in storage. You wouldn’t want them to be! If they were, it’d be so slow to move files around between directories. Have you ever noticed that moving files between directories on the same drive happens really fast but copying is slow? This is why! The file system doesn’t “move” data when you “move” files; it just changes where they live in the hierarchy. (Moving a file to a different drive is the exception: there the data really does have to be copied.)
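You can watch this happen from Python. The sketch below, which assumes both directories live on the same file system, writes a file, moves it with os.rename, and compares the file’s ID number (st_ino) before and after. The directory and file names are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    src_dir = os.path.join(root, "a")
    dst_dir = os.path.join(root, "b")
    os.mkdir(src_dir)
    os.mkdir(dst_dir)

    # Write 1 MiB of zeros so there is real data behind the name.
    old_path = os.path.join(src_dir, "clip.bin")
    with open(old_path, "wb") as f:
        f.write(b"\0" * (1024 * 1024))

    id_before = os.stat(old_path).st_ino

    # "Move" the file into the other directory.
    new_path = os.path.join(dst_dir, "clip.bin")
    os.rename(old_path, new_path)

    id_after = os.stat(new_path).st_ino

    # If the IDs match, the data stayed put and only the
    # directory entry pointing at it changed.
    same_file = id_before == id_after
    print(same_file)
```

On the usual Linux and OSX file systems this should report that the ID is unchanged. A copy, by contrast, creates a brand new file with its own ID and has to write all of the data again.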
This brings up another question, though. Where is all this information about directories and hierarchies stored? Now we’re getting into the differences between file systems! On OSX’s APFS and on native Linux file systems like ext4, there are these things called inodes, which contain information about all the places where the pieces of a file are stored, plus metadata like when the file was created and which users are allowed to read or modify it.
The directories are then just special kinds of files that have lists of filenames paired with the inodes corresponding to those files. Windows, with its NTFS file system, has a master file table that stores where files are. In NTFS, directories are actually special entries in the master file table instead of files that hold special information.
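On an inode-based file system you can peek at this pairing of names and inodes yourself. The sketch below is only an illustration with made-up file names: it gives one file a second name with os.link (a hard link, which some file systems such as FAT32 don’t support) and then lists the directory to show both names pointing at the same inode.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    original = os.path.join(root, "notes.txt")
    with open(original, "w") as f:
        f.write("hello")

    # Add a second directory entry for the same inode.
    alias = os.path.join(root, "same-notes.txt")
    os.link(original, alias)

    # A directory listing is essentially a name -> inode table.
    with os.scandir(root) as entries:
        listing = {entry.name: entry.stat().st_ino for entry in entries}

    shared = os.stat(original).st_ino == os.stat(alias).st_ino
    link_count = os.stat(original).st_nlink  # names pointing at the inode

    for name, inode_number in sorted(listing.items()):
        print(name, inode_number)
    print(shared, link_count)
```

Both names show the same inode number, because the filename lives in the directory while everything else about the file lives in the inode.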
This isn’t the only difference between file systems, though! There’s been a lot of development over the years to make them more efficient and add useful features. One of the big features that’s become standard is journaling. For a file system, journaling means writing down the changes that are supposed to happen in a log before actually making them. Modifying things on a drive, even an SSD, is really slow in comparison to how fast the rest of the computer works. You can queue up a whole ton of work for the file system, like deleting hundreds of files, way faster than the drive can do it all.
What if something goes wrong with your computer? What if your laptop runs out of power in the middle of all these changes? In the old days, you’d be in a really bad state, because there’d be no reliable way to tell which changes had finished, let alone fix the corrupted data. Now that basically every file system has journaling (or a close cousin, like the copy-on-write scheme APFS uses), even if something catastrophic happens your computer can look at the journal of what it had planned to change and then pick up where it left off. The main exceptions are the FAT32 and exFAT file systems. Windows used to run on FAT32, but these days both are used almost exclusively for USB drives and SD cards.
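Here is a toy version of that recovery story in Python. Everything in it is hypothetical: ToyJournaledStore, its methods, and the crash_after parameter are invented for this sketch, a dict stands in for the drive, and a list stands in for the journal. Real journals are far more careful about ordering and about what exactly gets logged.

```python
import json


class ToyJournaledStore:
    """A dict plays the drive; a list of JSON lines plays the journal."""

    def __init__(self):
        self.disk = {}     # file name -> contents
        self.journal = []  # changes that are planned but not yet applied

    def log(self, op, name, data=None):
        # Step 1: write down the intent before touching the "drive".
        self.journal.append(json.dumps({"op": op, "name": name, "data": data}))

    def apply_pending(self, crash_after=None):
        # Step 2: carry out the logged changes, oldest first.
        # crash_after simulates losing power after that many changes.
        done = 0
        while self.journal:
            if crash_after is not None and done == crash_after:
                return False  # "power loss": the rest stays in the journal
            entry = json.loads(self.journal[0])
            if entry["op"] == "write":
                self.disk[entry["name"]] = entry["data"]
            elif entry["op"] == "delete":
                self.disk.pop(entry["name"], None)
            self.journal.pop(0)  # forget an entry only once it is applied
            done += 1
        return True


store = ToyJournaledStore()
store.disk = {"a.txt": "old", "b.txt": "old"}
store.log("write", "a.txt", "new")
store.log("delete", "b.txt")
store.log("write", "c.txt", "hi")

finished = store.apply_pending(crash_after=1)  # dies after one change
state_after_crash = dict(store.disk)

recovered = store.apply_pending()  # on "reboot", replay what is left
print(state_after_crash)
print(store.disk)
```

After the simulated crash the drive is half-updated, but the journal still lists the two changes that never happened, so replaying it finishes the job. Note that each operation here is safe to repeat, which matters because a crash could also hit after a change was applied but before its journal entry was removed.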
The other differences between file systems are largely ones of efficiency: how well they place data across the drive and how they keep track of it in a way that doesn’t eat up too much space. After all, the metadata about where all your files live is itself data that has to live on the drive! If you have a bunch of small files, keeping track of them all can take up a lot of space unless the file system is clever about it.
So that’s a little taste of what goes into file systems and what they actually do. I think it’s really cool that so much work and research goes into every part of an operating system!