Actual Disk Usage and True Size
The size of a file and the space it occupies on your hard drive are rarely the same. Disk space is allocated in blocks. If a file is smaller than a block, an entire block is still allocated to it because the file system doesn’t have a smaller unit of real estate to use.
Unless a file’s size is an exact multiple of blocks, the space it uses on the hard drive must always be rounded up to the next whole block. For example, if a file is larger than two blocks but smaller than three, it still takes three blocks of space to store it.
Two measurements are used in relation to file size. The first is the actual size of the file, which is the number of bytes of content that make up the file. The second is the effective size of the file on the hard disk. This is the number of file system blocks necessary to store that file.
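The rounding can be sketched with shell arithmetic. The 4,096-byte block size and the 9,000-byte file size below are assumptions chosen for illustration; your file system may use a different block size.

```shell
block_size=4096   # assumed file system block size, in bytes
file_size=9000    # hypothetical file: larger than two blocks, smaller than three

# Integer division that rounds up to the next whole block
blocks=$(( (file_size + block_size - 1) / block_size ))

echo "$blocks blocks, $(( blocks * block_size )) bytes on disk"
# prints: 3 blocks, 12288 bytes on disk
```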
An Example
Let’s look at a simple example. We’ll redirect a single character into a file to create a small file:
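A minimal sketch of that step, assuming a file called geek.txt (a hypothetical name; any name will do):

```shell
# Redirect the single character "1" into a new file
echo "1" > geek.txt
```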
Now, we’ll use ls with the -l (long listing) option to look at the file length:
The length is the numeric value that follows the owner and group entries (dave dave in this example), which is two bytes. Why is it two bytes when we only sent one character to the file? Let’s take a look at what’s happening inside the file.
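Something like the following should work. The file name geek.txt is an assumption, and the file is recreated here so the example runs on its own:

```shell
echo "1" > geek.txt   # hypothetical sample file
ls -l geek.txt        # the fifth column of the listing is the length in bytes
```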
We’ll use the hexdump command, which will give us an exact byte count and allow us to “see” non-printing characters as hexadecimal values. We’ll also use the -C (canonical) option to force the output to show hexadecimal values in the body of the output, as well as their alphanumeric character equivalents:
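As a sketch, again assuming the hypothetical file geek.txt:

```shell
echo "1" > geek.txt   # hypothetical sample file
hexdump -C geek.txt   # hexadecimal bytes on the left, printable characters on the right
```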
The output shows us that, beginning at offset 00000000 in the file, there’s a byte that contains a hexadecimal value of 31, and one that contains a hexadecimal value of 0A. The right-hand portion of the output depicts these values as alphanumeric characters, wherever possible.
The hexadecimal value of 31 is used to represent the digit one. The hexadecimal value of 0A is used to represent the Line Feed character, which cannot be shown as an alphanumeric character, so it’s shown as a period (.) instead. The Line Feed character is added by echo. By default, echo starts a new line after it displays the text it needs to write to the terminal window.
That tallies with the output from ls and agrees with the file length of two bytes.
Now, we’ll use the du command to look at the file size:
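A minimal version of that command, with geek.txt as the assumed file name:

```shell
echo "1" > geek.txt   # hypothetical sample file
du geek.txt           # prints a block count followed by the file name
```

The number you see depends on your file system and on the block size du is using.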
It says the size is four, but four of what?
There Are Blocks, and Then There Are Blocks
When du reports file sizes in blocks, the size it uses depends on several factors. You can specify which block size it should use on the command line. If you don’t force du to use a particular block size, it follows a set of rules to decide which one to use.
First, it checks the following environment variables:
DU_BLOCK_SIZE
BLOCK_SIZE
BLOCKSIZE
If any of these is set, du uses its value as the block size and stops checking. If none is set, du defaults to a block size of 1,024 bytes, unless an environment variable called POSIXLY_CORRECT is set, in which case it defaults to 512 bytes.
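These rules can be tried out by setting a variable for a single command. This is a sketch that assumes GNU du, a hypothetical file called geek.txt, and a shell in which none of the variables is already set:

```shell
echo "1" > geek.txt              # hypothetical sample file

du geek.txt                      # no overrides: 1,024-byte blocks
POSIXLY_CORRECT=1 du geek.txt    # fallback changes to 512-byte blocks
DU_BLOCK_SIZE=4096 du geek.txt   # an explicit variable takes precedence
```

Because each assignment is placed in front of the command, it applies to that one invocation only and leaves your shell environment unchanged.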
So, how do we find out which one is in use? You can check each environment variable to work it out, but there’s a quicker way. Let’s compare the results to the block size the file system uses instead.
To discover the block size the file system uses, we’ll use the tune2fs program. We’ll then use the -l (list superblock) option, pipe the output through grep, and then print lines that contain the word “Block.”
In this example, we’ll look at the file system on the first partition of the first hard drive, sda1, and we’ll need to use sudo:
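A guarded sketch of that command follows. The device name /dev/sda1 comes from this example and may not match your system, so the check below only runs tune2fs if the device exists:

```shell
device=/dev/sda1   # assumption: replace with the partition that holds your file system

if [ -b "$device" ]; then
    # List the superblock and keep only the lines that mention "Block"
    sudo tune2fs -l "$device" | grep Block
else
    echo "$device is not a block device on this system"
fi
```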
The file system block size is 4,096 bytes. If we divide that by the result we got from du (four), it shows the du default block size is 1,024 bytes. We now know several important things.
First, we know the smallest amount of file system real estate that can be devoted to storing a file is 4,096 bytes. This means even our tiny, two-byte file is taking up 4 KB of hard drive space.
The second thing to keep in mind is applications dedicated to reporting on hard drive and file system statistics, such as du, ls, and tune2fs, can have different notions of what “block” means. The tune2fs application reports true file system block sizes, while ls and du can be configured or forced to use other block sizes. Those block sizes are not intended to relate to the file system block size; they’re just “chunks” those commands use in their output.
Finally, other than using different block sizes, the answers from du and tune2fs convey the same meaning. The tune2fs result was one block of 4,096 bytes, and the du result was four blocks of 1,024 bytes.
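The arithmetic behind that comparison, using the two values from this example, can be checked in the shell:

```shell
fs_block_size=4096   # bytes per file system block, as reported by tune2fs
du_blocks=4          # block count du reported for the two-byte file

echo $(( fs_block_size / du_blocks ))   # prints 1024, the du block size in bytes
echo $(( du_blocks * 1024 ))            # prints 4096, one file system block
```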
Using du
With no command line parameters or options, du lists the total disk space the current directory and all subdirectories are using.
Let’s take a look at an example:
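Run from whichever directory you want to measure, the bare command is all that's needed:

```shell
du   # one line per subdirectory, with the total for the current directory last
```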
The size is reported in the default block size of 1,024 bytes per block. The entire subdirectory tree is traversed.
Using du on a Different Directory
If you want du to report on a different directory than the current one, you can pass the path to the directory on the command line:
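As a sketch, the directory demo-dir below is a hypothetical one created purely so the example runs anywhere; substitute any path you like:

```shell
mkdir -p demo-dir/sub              # hypothetical directory tree
echo "1" > demo-dir/sub/geek.txt   # small sample file inside it

du demo-dir                        # report on demo-dir instead of the current directory
```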
Using du on a Specific File
If you want du to report on a specific file, pass the path to that file on the command line. You can also pass a shell pattern to select a group of files, such as *.txt:
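Both forms are sketched here, with two hypothetical text files created first so the pattern has something to match:

```shell
echo "1" > geek.txt    # hypothetical sample files
echo "2" > notes.txt

du geek.txt            # a single file
du *.txt               # every file the shell pattern expands to
```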
Reporting on Files in Directories
To have du report on the files in the current directory and subdirectories, use the -a (all files) option:
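A short sketch, with a hypothetical file created so that at least one file appears in the listing:

```shell
echo "1" > geek.txt   # hypothetical sample file
du -a                 # list files as well as directories
```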
For each directory, the size of each file is reported, as well as a total for each directory.
Limiting Directory Tree Depth
You can tell du to list the directory tree to a certain depth. To do so, use the -d (max depth) option and provide a depth value as a parameter. Note that all subdirectories are scanned and used to calculate the reported totals, but they’re not all listed. To set a maximum directory depth of one level, use this command:
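For a depth of one level, the command would look like this:

```shell
du -d 1   # list the current directory and its immediate subdirectories only
```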
The output lists the total size of each subdirectory in the current directory, followed by a total for the current directory itself.
To list directories one level deeper, use this command:
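The only change is the depth value:

```shell
du -d 2   # list directories up to two levels below the current directory
```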
Setting the Block Size
You can use the --block-size option (which du also accepts in the abbreviated form --block) to set a block size for du for the current operation. To use a block size of one byte, use the following command to get the exact sizes of the directories and files:
If you want to use a block size of one megabyte, you can use the -m (megabyte) option, which is the same as --block-size=1M:
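A sketch of that command, assuming GNU du:

```shell
du --block-size=1   # report disk usage in bytes rather than 1,024-byte blocks
```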
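With GNU du, either spelling should give the same figures:

```shell
du -m                # sizes in units of 1,048,576 bytes
du --block-size=1M   # the long form of the same request
```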
If you want the sizes reported in the most appropriate block size according to the disk space used by the directories and files, use the -h (human-readable) option:
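For example:

```shell
du -h   # sizes shown with a unit suffix such as K, M, or G
```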
To see the apparent size of the file rather than the amount of hard drive space used to store the file, use the --apparent-size option:
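The sketch below assumes GNU du and the hypothetical two-byte file from earlier. The second command adds a one-byte block size so the apparent size is shown in bytes:

```shell
echo "1" > geek.txt                          # hypothetical two-byte file

du --apparent-size geek.txt                  # apparent size in the default block size
du --apparent-size --block-size=1 geek.txt   # apparent size in bytes
```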
You can combine this with the -a (all) option to see the apparent size of each file:
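Combined, the options might be used like this (the sample file is hypothetical):

```shell
echo "1" > geek.txt      # hypothetical sample file
du --apparent-size -a    # apparent size of every file and directory
```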
Each file is listed, along with its apparent size.
Displaying Only Totals
If you want du to report only the total for the directory, use the -s (summarize) option. You can also combine this with other options, such as the -h (human-readable) option:
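For instance:

```shell
du -h -s   # a single human-readable total for the current directory
```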
Here, we’ll use it with the --apparent-size option:
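A sketch of that combination, assuming GNU du:

```shell
du --apparent-size -s   # one total, based on file lengths rather than disk blocks
```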
Displaying Modification Times
To see the time and date of the last modification of any file in each directory or its subdirectories, use the --time option:
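As a sketch, combined here with a depth limit (an assumption, added only to keep the listing short):

```shell
du --time -d 2   # adds a date and time column between the size and the path
```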
Strange Results?
If you see strange results from du, especially when you cross-reference sizes to the output from other commands, it’s usually due to the different block sizes to which different commands can be set or those to which they default. It could also be due to the differences between real file sizes and the disk space required to store them.
If you need to match the output of other commands, experiment with the --block-size option in du.