Extended File System (ext) format

The Extended File System (ext) is one of the more common file systems used in Linux.

There are multiple versions of ext.

Version | Remarks
--- | ---
1 | Introduced in April 1992
2 | Introduced in January 1993
3 | Introduced in November 2001, which featured journaling, dynamic growth and large directory indexing (HTree)
4 | Introduced in October 2006 as unstable and became stable in October 2008, which featured extents and improved timestamps

Overview

An Extended File System (ext) consists of:

  • one or more block groups

Characteristics

Characteristics | Description
--- | ---
Byte order | little-endian, with the exception of UUID values that are stored in big-endian
Date and time values | number of seconds since January 1, 1970 00:00:00 (POSIX epoch), disregarding leap seconds. Or number of nanoseconds, when extra precision is enabled. Date and time values are stored in UTC
Character strings | UTF-8 or a narrow character (Single Byte Character (SBC) or Multi Byte Character (MBC)) stored using a system defined codepage

Block group

A block group consists of:

  • optional 1024 bytes of boot code or zero bytes (at offset: 0)
  • optional superblock
  • optional group descriptor table
  • block bitmap
  • inode bitmap
  • allocated and unallocated blocks

The primary superblock is stored at offset 1024 relative from the start of the volume. Backup superblocks are stored at offset 1024 relative from the start of the block group if block size <= 1024 or otherwise at offset 0 from the start of the block group.

The group descriptor table is stored in the block after the superblock.

An ext2 file system with revision 0 stores a copy of the superblock at the start of every block group, along with backups of the group descriptor table. Later revisions reduce the number of backup copies by only putting backups in specific groups (sparse superblock feature EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER).

Not all values in a backup superblock and backup group descriptor tables match those of the primary superblock and group descriptor table.

Note that backup superblocks can be empty (filled with 0-byte values) or contain remnant data on an Android ext file system with sparse_super.
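
The placement rules above can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the function names are hypothetical, the start of a block group is assumed to be `group * blocks_per_group * block_size`, and the sparse superblock rule (groups 0, 1 and powers of 3, 5 and 7) is taken from the read-only compatible feature flags table.

```python
def is_power_of(value: int, base: int) -> bool:
    """Return True if value equals base ** k for some integer k >= 0."""
    if value < 1:
        return False
    while value % base == 0:
        value //= base
    return value == 1


def has_superblock_copy(group: int, sparse_super: bool) -> bool:
    """Tell whether a block group is expected to hold a superblock copy."""
    # Without sparse_super every block group stores a copy.
    if not sparse_super or group in (0, 1):
        return True
    return any(is_power_of(group, base) for base in (3, 5, 7))


def superblock_offset(group: int, block_size: int, blocks_per_group: int) -> int:
    """Return the volume offset of the (backup) superblock of a block group.

    Assumption: a block group starts at group * blocks_per_group * block_size.
    """
    group_start = group * blocks_per_group * block_size
    # The primary superblock is always at offset 1024; backups only keep the
    # 1024 byte gap when the block size is 1024 or smaller.
    if group == 0 or block_size <= 1024:
        return group_start + 1024
    return group_start
```

Keep in mind that a backup location computed this way can still contain zero bytes or remnant data, as noted above for Android images.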

Flex block groups

Flex (or flexible) block groups are a set of block groups that are treated as a single logical block group. Metadata such as the superblock, group descriptors and data block bitmaps spans the entire logical block group and not the individual block groups that are part of the set.

Meta block groups

Meta block groups (META_BG) are a set (or cluster) of block groups for which the group descriptor structures can be stored in a single block.

The first meta block group value in the superblock indicates which meta block group is the first to store its group descriptors this way. For example, if the first meta block group value is 256 and the number of group descriptors that can be stored in a single block is 64, then the group descriptors for the block groups [0, 16383] are stored in the group descriptor table after the primary superblock and corresponding locations of backups.

Successive group descriptor tables, for example [16384, 16447], are stored in the first block group of a meta block group and backups in the second and last block groups of the meta block group.
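
The arithmetic behind this example can be sketched as follows. The function names are hypothetical, and the sketch only restates the layout described above; it does not consult any real implementation.

```python
def first_group_using_meta_bg(first_meta_bg: int, descriptors_per_block: int) -> int:
    """Return the first block group whose descriptor is stored per meta block group.

    All block groups below this number have their descriptors in the group
    descriptor table that follows the primary superblock.
    """
    return first_meta_bg * descriptors_per_block


def descriptor_block_groups(meta_group: int, descriptors_per_block: int) -> tuple:
    """Return the block groups holding the descriptor block of a meta block group.

    The first entry is the primary location, the second and third entries are
    the backups in the second and last block group of the meta block group.
    """
    first = meta_group * descriptors_per_block
    return (first, first + 1, first + descriptors_per_block - 1)
```

With a first meta block group value of 256 and 64 descriptors per block, the first function returns 16384 and the second returns the block groups 16384, 16385 and 16447 for meta block group 256.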

Blocks

The volume is divided into blocks:

block offset = block number * block size

The block size is defined in the superblock.

Note that mke2fs indicates the maximum block size is 65536.
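
As a small worked example, the block size can be derived from the shift value stored in the superblock and then used to turn a block number into a volume offset. The function names are hypothetical.

```python
def block_size_from_shift(shift: int) -> int:
    """Return the block size for the shift value stored at superblock offset 24."""
    return 1024 << shift


def block_offset(block_number: int, block_size: int) -> int:
    """Return the offset of a block relative to the start of the volume."""
    return block_number * block_size
```

A shift value of 0 gives 1024 byte blocks, 2 gives 4096 byte blocks and 6 gives the 65536 byte maximum mentioned for mke2fs.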

The superblock

The ext2 superblock

The ext2 superblock is 208 bytes in size and consists of:

Offset | Size | Value | Description
--- | --- | --- | ---
0 | 4 | | Number of inodes
4 | 4 | | Number of blocks
8 | 4 | | Number of reserved blocks. Reserved blocks are used to prevent the file system from filling up
12 | 4 | | Number of unallocated blocks
16 | 4 | | Number of unallocated inodes
20 | 4 | | First data block number. The block number is relative from the start of the volume
24 | 4 | | Block size, which contains the number of bits to shift 1024 to the MSB (left)
28 | 4 | | Fragment size, which contains the number of bits to shift 1024 to the MSB (left)
32 | 4 | | Number of blocks per block group
36 | 4 | | Number of fragments per block group
40 | 4 | | Number of inodes per block group
44 | 4 | | Last mount time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
48 | 4 | | Last written time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
52 | 2 | | The (current) mount count
54 | 2 | | Maximum mount count
56 | 2 | "\x53\xef" | Signature
58 | 2 | | File system state flags
60 | 2 | | Error-handling status
62 | 2 | | Minor format revision
64 | 4 | | Last consistency check time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
68 | 4 | | Consistency check interval, which contains the number of seconds between consistency checks
72 | 4 | | Creator operating system
76 | 4 | | Format revision
80 | 2 | | Reserved block owner (or user) identifier (UID)
82 | 2 | | Reserved block group identifier (GID)
Dynamic inode information, if format revision is EXT2_DYNAMIC_REV | | |
84 | 4 | | First non-reserved inode
88 | 2 | | Inode size. Note that the inode size must be a power of 2 larger than or equal to 128, the maximum supported by mke2fs is 1024
90 | 2 | | Block group, which contains a block group number
92 | 4 | | Compatible feature flags
96 | 4 | | Incompatible feature flags
100 | 4 | | Read-only compatible feature flags
104 | 16 | | File system identifier, which contains a big-endian UUID
120 | 16 | | Volume label, which contains a narrow character string without end-of-string character
136 | 64 | | Last mount path, which contains a narrow character string without end-of-string character
200 | 4 | | Algorithm usage bitmap
Performance hints, if EXT2_COMPAT_PREALLOC is set | | |
204 | 1 | | Number of pre-allocated blocks per file
205 | 1 | | Number of pre-allocated blocks per directory
206 | 2 | | Unknown (padding)
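
To illustrate how the fixed part of this layout can be read, the following Python sketch unpacks a handful of fields with the standard `struct` module. It is a minimal sketch under stated assumptions: the function name and the dictionary keys are hypothetical, only a subset of the fields is read, and the caller is expected to pass the bytes starting at the superblock (offset 1024 for the primary superblock).

```python
import struct

SIGNATURE = b"\x53\xef"


def parse_superblock(data: bytes) -> dict:
    """Parse a few little-endian fields of an ext2 superblock."""
    if len(data) < 84:
        raise ValueError("superblock data too small")
    if data[56:58] != SIGNATURE:
        raise ValueError("missing ext signature")

    # Offsets 0 to 27: seven consecutive 32-bit values.
    (number_of_inodes, number_of_blocks, _reserved_blocks, _free_blocks,
     _free_inodes, first_data_block, block_size_shift) = struct.unpack_from(
        "<7I", data, 0)
    (blocks_per_group,) = struct.unpack_from("<I", data, 32)
    (inodes_per_group,) = struct.unpack_from("<I", data, 40)
    (format_revision,) = struct.unpack_from("<I", data, 76)

    return {
        "number_of_inodes": number_of_inodes,
        "number_of_blocks": number_of_blocks,
        "first_data_block": first_data_block,
        "block_size": 1024 << block_size_shift,
        "blocks_per_group": blocks_per_group,
        "inodes_per_group": inodes_per_group,
        "format_revision": format_revision,
    }
```

The fields from offset 84 onwards should only be interpreted when the format revision indicates dynamic inode information, which is why the sketch stops before them.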

The ext3 superblock

The ext3 superblock is 336 bytes in size and consists of:

Offset | Size | Value | Description
--- | --- | --- | ---
0 | 4 | | Number of inodes
4 | 4 | | Number of blocks
8 | 4 | | Number of reserved blocks. Reserved blocks are used to prevent the file system from filling up
12 | 4 | | Number of unallocated blocks
16 | 4 | | Number of unallocated inodes
20 | 4 | | First data block number. The block number is relative from the start of the volume
24 | 4 | | Block size, which contains the number of bits to shift 1024 to the MSB (left)
28 | 4 | | Fragment size, which contains the number of bits to shift 1024 to the MSB (left)
32 | 4 | | Number of blocks per block group
36 | 4 | | Number of fragments per block group
40 | 4 | | Number of inodes per block group, which can be 0 in combination with EXT3_FEATURE_INCOMPAT_JOURNAL_DEV
44 | 4 | | Last mount time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
48 | 4 | | Last written time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
52 | 2 | | The (current) mount count
54 | 2 | | Maximum mount count
56 | 2 | "\x53\xef" | Signature
58 | 2 | | File system state flags
60 | 2 | | Error-handling status
62 | 2 | | Minor format revision
64 | 4 | | Last consistency check time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
68 | 4 | | Consistency check interval, which contains the number of seconds between consistency checks
72 | 4 | | Creator operating system
76 | 4 | | Format revision
80 | 2 | | Reserved block owner (or user) identifier (UID)
82 | 2 | | Reserved block group identifier (GID)
Dynamic inode information, if format revision is EXT2_DYNAMIC_REV | | |
84 | 4 | | First non-reserved inode
88 | 2 | | Inode size. Note that the inode size must be a power of 2 larger than or equal to 128, the maximum supported by mke2fs is 1024
90 | 2 | | Block group, which contains a block group number
92 | 4 | | Compatible feature flags
96 | 4 | | Incompatible feature flags
100 | 4 | | Read-only compatible feature flags
104 | 16 | | File system identifier, which contains a big-endian UUID
120 | 16 | | Volume label, which contains a narrow character string without end-of-string character
136 | 64 | | Last mount path, which contains a narrow character string without end-of-string character
200 | 4 | | Algorithm usage bitmap
Performance hints, if EXT2_COMPAT_PREALLOC is set | | |
204 | 1 | | Number of pre-allocated blocks per file
205 | 1 | | Number of pre-allocated blocks per directory
206 | 2 | | Unknown (padding)
Journalling support, if EXT3_FEATURE_COMPAT_HAS_JOURNAL is set | | |
208 | 16 | | Journal identifier, which contains a big-endian UUID
224 | 4 | | Journal inode
228 | 4 | | Unknown (Journal device)
232 | 4 | | Unknown (Head of orphan inode list). The orphan inode list is a list of inodes to delete
236 | 4 x 4 | | Hash-tree seed
252 | 1 | | Default hash version
253 | 1 | | Journal backup type
254 | 2 | | Group descriptor size
256 | 4 | | Default mount options
260 | 4 | | First meta block group (or metablock)
264 | 4 | | File system creation time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
268 | 17 x 4 | | Backup journal inodes

The ext4 superblock

The ext4 superblock is 1024 bytes in size and consists of:

Offset | Size | Value | Description
--- | --- | --- | ---
0 | 4 | | Number of inodes
4 | 4 | | Number of blocks, which contains the lower 32-bit of the value
8 | 4 | | Number of reserved blocks, which contains the lower 32-bit of the value. Reserved blocks are used to prevent the file system from filling up
12 | 4 | | Number of unallocated blocks, which contains the lower 32-bit of the value
16 | 4 | | Number of unallocated inodes, which contains the lower 32-bit of the value
20 | 4 | | Root group block number. The block number is relative from the start of the volume
24 | 4 | | Block size, which contains the number of bits to shift 1024 to the most-significant-bit (MSB)
28 | 4 | | Fragment size, which contains the number of bits to shift 1024 to the most-significant-bit (MSB)
32 | 4 | | Number of blocks per block group
36 | 4 | | Number of fragments per block group
40 | 4 | | Number of inodes per block group, which can be 0 in combination with EXT4_FEATURE_INCOMPAT_JOURNAL_DEV
44 | 4 | | Last mount time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
48 | 4 | | Last written time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
52 | 2 | | The (current) mount count
54 | 2 | | Maximum mount count
56 | 2 | "\x53\xef" | Signature
58 | 2 | | File system state flags
60 | 2 | | Error-handling status
62 | 2 | | Minor format revision
64 | 4 | | Last consistency check time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
68 | 4 | | Consistency check interval, which contains the number of seconds between consistency checks
72 | 4 | | Creator operating system
76 | 4 | | Format revision
80 | 2 | | Reserved block owner (or user) identifier (UID)
82 | 2 | | Reserved block group identifier (GID)
Dynamic inode information, if format revision is EXT2_DYNAMIC_REV | | |
84 | 4 | | First non-reserved inode
88 | 2 | | Inode size. Note that the inode size must be a power of 2 larger than or equal to 128, the maximum supported by mke2fs is 1024
90 | 2 | | Block group
92 | 4 | | Compatible feature flags
96 | 4 | | Incompatible feature flags
100 | 4 | | Read-only compatible feature flags
104 | 16 | | File system identifier, which contains a big-endian UUID
120 | 16 | | Volume label, which contains a narrow character string without end-of-string character
136 | 64 | | Last mount path, which contains a narrow character string without end-of-string character
200 | 4 | | Algorithm usage bitmap
Performance hints, if EXT2_COMPAT_PREALLOC is set | | |
204 | 1 | | Number of pre-allocated blocks per file
205 | 1 | | Number of pre-allocated blocks per directory
206 | 2 | | Unknown (padding)
Journalling support, if EXT3_FEATURE_COMPAT_HAS_JOURNAL is set | | |
208 | 16 | | Journal identifier, which contains a big-endian UUID
224 | 4 | | Journal inode
228 | 4 | | Unknown (Journal device)
232 | 4 | | Unknown (Head of orphan inode list). The orphan inode list is a list of inodes to delete
236 | 4 x 4 | | Hash-tree seed
252 | 1 | | Default hash version
253 | 1 | | Journal backup type
254 | 2 | | Group descriptor size
256 | 4 | | Default mount options
260 | 4 | | First meta block group (or metablock)
264 | 4 | | File system creation time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
268 | 17 x 4 | | Backup journal inodes
If 64-bit support (EXT4_FEATURE_INCOMPAT_64BIT) is enabled | | |
336 | 4 | | Number of blocks, which contains the upper 32-bit of the value
340 | 4 | | Number of reserved blocks, which contains the upper 32-bit of the value
344 | 4 | | Number of unallocated blocks, which contains the upper 32-bit of the value
348 | 2 | | Minimum inode size
350 | 2 | | Reserved inode size
352 | 4 | | Miscellaneous flags
356 | 2 | | RAID stride
358 | 2 | | Multiple mount protection (MMP) update interval in seconds
360 | 8 | | Block for multi-mount protection
368 | 4 | | Unknown (blocks on all data disks (N*stride))
372 | 1 | | Number of block groups per flex block group, which is stored as: 2 ^ value
373 | 1 | | Checksum type
374 | 1 | | Unknown (encryption level)
375 | 1 | | Unknown (padding)
376 | 8 | | Unknown (s_kbytes_written)
384 | 4 | | Inode number of active snapshot
388 | 4 | | Identifier of active snapshot
392 | 8 | | Unknown (reserved s_snapshot_r_blocks_count)
400 | 4 | | Inode number of snapshot list head
404 | 4 | | Unknown (s_error_count)
408 | 4 | | Unknown (s_first_error_time)
412 | 4 | | Unknown (s_first_error_ino)
416 | 8 | | Unknown (s_first_error_block)
424 | 32 | | Unknown (s_first_error_func)
456 | 4 | | Unknown (s_first_error_line)
460 | 4 | | Unknown (s_last_error_time)
464 | 4 | | Unknown (s_last_error_ino)
468 | 4 | | Unknown (s_last_error_line)
472 | 8 | | Unknown (s_last_error_block)
480 | 32 | | Unknown (s_last_error_func)
512 | 64 | | Unknown (s_mount_opts)
576 | 4 | | Unknown (s_usr_quota_inum)
580 | 4 | | Unknown (s_grp_quota_inum)
584 | 4 | | Unknown (s_overhead_clusters)
588 | 2 x 4 | | Unknown (s_backup_bgs)
596 | 4 | | Unknown (s_encrypt_algos)
600 | 16 | | Unknown (s_encrypt_pw_salt)
616 | 4 | | Unknown (s_lpf_ino)
620 | 4 | | Unknown (s_prj_quota_inum)
624 | 4 | | Metadata checksum seed
628 | 1 | | Unknown (s_wtime_hi)
629 | 1 | | Unknown (s_mtime_hi)
630 | 1 | | Unknown (s_mkfs_time_hi)
631 | 1 | | Unknown (s_lastcheck_hi)
632 | 1 | | Unknown (s_first_error_time_hi)
633 | 1 | | Unknown (s_last_error_time_hi)
634 | 1 | | Unknown (s_first_error_errcode)
635 | 1 | | Unknown (s_last_error_errcode)
636 | 2 | | Unknown (s_encoding)
638 | 2 | | Unknown (s_encoding_flags)
640 | 4 | | Unknown (s_orphan_file_inum)
644 | 94 x 4 = 376 | | Unknown (reserved)
1020 | 4 | | Checksum

If checksum type is CRC-32C, the checksum is stored as 0xffffffff - CRC-32C.

Note that some versions of mkfs.ext set the file system creation time even for ext2 and when EXT3_FEATURE_COMPAT_HAS_JOURNAL is not set.

TODO: Is the only way to determine the file system version the compatibility and equivalent flags?

Checksum calculation

If checksum type is CRC-32C, the CRC32-C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the checksum.

The checksum is calculated over the first 1020 bytes of data of the superblock.
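
A minimal sketch of this calculation in Python is shown below. It rests on one assumption that should be stated loudly: "initial value of 0" is read here in the zlib style, where the CRC register is inverted before the first byte and after the last byte, so that the function returns the standard CRC-32C. The bitwise loop uses 0x82f63b78, the bit-reflected form of the Castagnoli polynomial 0x1edc6f41. The function names are hypothetical.

```python
import struct


def crc32c(data: bytes, initial: int = 0) -> int:
    """Calculate a CRC-32C (Castagnoli), zlib-style: inverted on entry and exit."""
    crc = initial ^ 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # 0x82F63B78 is the reflected form of the polynomial 0x1EDC6F41.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def superblock_checksum(superblock: bytes) -> int:
    """Return the value expected at offset 1020 of a 1024 byte superblock."""
    return 0xFFFFFFFF - crc32c(superblock[:1020])


def verify_superblock_checksum(superblock: bytes) -> bool:
    """Compare the stored checksum against the calculated one."""
    (stored,) = struct.unpack_from("<I", superblock, 1020)
    return stored == superblock_checksum(superblock)
```

A table driven or hardware accelerated CRC-32C would be preferable for real workloads; the bitwise loop is used only to keep the sketch self-contained.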

Metadata checksum seed calculation

If checksum type is CRC-32C, the CRC32-C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the metadata checksum seed.

The metadata checksum seed is calculated over:

  • the 16 byte file system identifier in the superblock

If EXT4_FEATURE_INCOMPAT_CSUM_SEED is set the metadata checksum seed value stored in the superblock should be used instead of calculating it based on the file system identifier.

If checksum type is CRC-32C, the metadata checksum seed is stored as 0xffffffff - CRC-32C.

File system state flags

Value | Identifier | Description
--- | --- | ---
0x0001 | | Is clean
0x0002 | | Has errors
0x0004 | | Recovering orphan inodes

Error-handling status

Value | Identifier | Description
--- | --- | ---
1 | | Continue
2 | | Remount as read-only
3 | | Panic

Creator operating system

Value | Identifier | Description
--- | --- | ---
0 | | Linux
1 | | GNU Hurd
2 | | Masix
3 | | FreeBSD
4 | | Lites

Format revision

Value | Identifier | Description
--- | --- | ---
0 | EXT2_GOOD_OLD_REV | Original version with a fixed inode size of 128 bytes
1 | EXT2_DYNAMIC_REV | Version with dynamic inode size support

Compatible feature flags

Value | Identifier | Description
--- | --- | ---
0x00000001 | EXT2_COMPAT_PREALLOC | Pre-allocate directory blocks, which is intended to reduce fragmentation
0x00000002 | EXT2_FEATURE_COMPAT_IMAGIC_INODES | Has AFS server inodes
0x00000004 | EXT3_FEATURE_COMPAT_HAS_JOURNAL | Has a journal
0x00000008 | EXT2_FEATURE_COMPAT_EXT_ATTR | Has extended attributes
0x00000010 | EXT2_FEATURE_COMPAT_RESIZE_INO, EXT2_FEATURE_COMPAT_RESIZE_INODE | Is resizeable, the file system has reserved GDT blocks for expansion, which also requires RO_COMPAT_SPARSE_SUPER
0x00000020 | EXT2_FEATURE_COMPAT_DIR_INDEX | Has indexed directories
0x00000040 | COMPAT_LAZY_BG | Unknown (Lazy block group)
0x00000080 | COMPAT_EXCLUDE_INODE | Unknown (Exclude inode), which is not yet implemented and intended for a future file system snapshot feature
0x00000100 | COMPAT_EXCLUDE_BITMAP | Unknown (Exclude bitmap), which is not yet implemented and intended for a future file system snapshot feature
0x00000200 | EXT4_FEATURE_COMPAT_SPARSE_SUPER2 | Has sparse superblock version 2
0x00000400 | EXT4_FEATURE_COMPAT_FAST_COMMIT | Unknown (fast commit)
0x00000800 | EXT4_FEATURE_COMPAT_STABLE_INODES | Unknown (stable inodes)
0x00001000 | EXT4_FEATURE_COMPAT_ORPHAN_FILE | Has orphan file

Note that EXT2_FEATURE_COMPAT_, EXT3_FEATURE_COMPAT_, EXT4_FEATURE_COMPAT_ and COMPAT_ can be used interchangeably.

Incompatible feature flags

Value | Identifier | Description
--- | --- | ---
0x00000001 | EXT2_FEATURE_INCOMPAT_COMPRESSION | Has compression, which is not yet implemented
0x00000002 | EXT2_FEATURE_INCOMPAT_FILETYPE | Directory entry has file type
0x00000004 | EXT3_FEATURE_INCOMPAT_RECOVER | Needs recovery
0x00000008 | EXT3_FEATURE_INCOMPAT_JOURNAL_DEV | Journal device
0x00000010 | EXT2_FEATURE_INCOMPAT_META_BG | Has meta (or metadata) block groups
0x00000040 | EXT4_FEATURE_INCOMPAT_EXTENTS | Has extents
0x00000080 | EXT4_FEATURE_INCOMPAT_64BIT | Has 64-bit support, which supports more than 2^32 blocks
0x00000100 | EXT4_FEATURE_INCOMPAT_MMP | Multiple mount protection
0x00000200 | EXT4_FEATURE_INCOMPAT_FLEX_BG | Has flex (or flexible) block groups
0x00000400 | EXT4_FEATURE_INCOMPAT_EA_INODE | Has large extended attribute values, which are stored in inodes
0x00001000 | EXT4_FEATURE_INCOMPAT_DIRDATA | Data in directory entry, which is not yet implemented
0x00002000 | EXT4_FEATURE_INCOMPAT_CSUM_SEED, EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM | Initial metadata checksum value (or seed) is stored in the superblock
0x00004000 | EXT4_FEATURE_INCOMPAT_LARGEDIR | Large directory >2GB or 3-level hash tree (HTree)
0x00008000 | EXT4_FEATURE_INCOMPAT_INLINE_DATA | Has data stored in inode
0x00010000 | EXT4_FEATURE_INCOMPAT_ENCRYPT | Has encrypted inodes
0x00020000 | EXT4_FEATURE_INCOMPAT_CASEFOLD | Has case folding

Note that EXT2_FEATURE_INCOMPAT_, EXT3_FEATURE_INCOMPAT_, EXT4_FEATURE_INCOMPAT_ and INCOMPAT_ can be used interchangeably.
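
Because a reader of the format should refuse to interpret a volume with incompatible features it does not know, it can help to split a flags value into known and unknown bits. The sketch below does this for a subset of the incompatible feature flags listed above; the dictionary and function names are hypothetical and the subset is chosen only for illustration.

```python
# Subset of the incompatible feature flags, for illustration only.
INCOMPAT_FLAGS = {
    0x00000002: "EXT2_FEATURE_INCOMPAT_FILETYPE",
    0x00000004: "EXT3_FEATURE_INCOMPAT_RECOVER",
    0x00000010: "EXT2_FEATURE_INCOMPAT_META_BG",
    0x00000040: "EXT4_FEATURE_INCOMPAT_EXTENTS",
    0x00000080: "EXT4_FEATURE_INCOMPAT_64BIT",
    0x00000200: "EXT4_FEATURE_INCOMPAT_FLEX_BG",
    0x00008000: "EXT4_FEATURE_INCOMPAT_INLINE_DATA",
}


def decode_flags(value: int, names: dict) -> tuple:
    """Return (list of known identifiers, remaining unknown bits)."""
    known = []
    unknown = value
    for bit in sorted(names):
        if value & bit:
            known.append(names[bit])
            unknown &= ~bit
    return known, unknown
```

For example, the value 0x000002c2 decodes to FILETYPE, EXTENTS, 64BIT and FLEX_BG with no unknown bits left, while a non-zero second element signals features the table passed in does not cover.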

Read-only compatible feature flags

Value | Identifier | Description
--- | --- | ---
0x00000001 | EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER | Has sparse superblocks and group descriptor tables. If set a superblock is stored in block groups 0, 1 and those that are powers of 3, 5 and 7. If not set a superblock is stored in every block group
0x00000002 | EXT2_FEATURE_RO_COMPAT_LARGE_FILE | Contains large files
0x00000004 | EXT2_FEATURE_RO_COMPAT_BTREE_DIR | Intended for hash-tree directory (or directory B-tree), which is not yet implemented
0x00000008 | EXT4_FEATURE_RO_COMPAT_HUGE_FILE | Has huge file support
0x00000010 | EXT4_FEATURE_RO_COMPAT_GDT_CSUM | Has group descriptors with checksums
0x00000020 | EXT4_FEATURE_RO_COMPAT_DIR_NLINK | The ext3 32000 subdirectory limit does not apply. A directory's number of links will be set to 1 if it is incremented past 64999
0x00000040 | EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE | Has large inodes. The size of an inode can be larger than 128 bytes
0x00000080 | EXT4_FEATURE_RO_COMPAT_HAS_SNAPSHOT | Has snapshots, which is not yet implemented and intended for a future file system snapshot feature
0x00000100 | EXT4_FEATURE_RO_COMPAT_QUOTA | Quota is handled transactionally with the journal
0x00000200 | EXT4_FEATURE_RO_COMPAT_BIGALLOC | Has big block allocation bitmaps. Block allocation bitmaps are tracked in units of clusters (of blocks) instead of blocks
0x00000400 | EXT4_FEATURE_RO_COMPAT_METADATA_CSUM | File system metadata has checksums
0x00000800 | EXT4_FEATURE_RO_COMPAT_REPLICA | Supports replicas
0x00001000 | EXT4_FEATURE_RO_COMPAT_READONLY | Read-only file system image
0x00002000 | EXT4_FEATURE_RO_COMPAT_PROJECT | File system tracks project quotas
0x00004000 | EXT4_FEATURE_RO_COMPAT_SHARED_BLOCKS | File system has (read-only) shared blocks
0x00008000 | EXT4_FEATURE_RO_COMPAT_VERITY | Unknown (Verity inodes may be present on the filesystem)
0x00010000 | EXT4_FEATURE_RO_COMPAT_ORPHAN_PRESENT | Orphan file may be non-empty

EXT2_FEATURE_RO_COMPAT_, EXT3_FEATURE_RO_COMPAT_, EXT4_FEATURE_RO_COMPAT_ and RO_COMPAT_ are used interchangeably.

Note that in some ext file systems used by ChromeOS it has been observed that the upper 8-bits of the read-only compatible feature flags are set as in 0xff000003. debugfs identifies these as FEATURE_R24 - FEATURE_R31.

Checksum types

Value | Identifier | Description
--- | --- | ---
1 | EXT4_CRC32C_CHKSUM | CRC-32C (or CRC32-C), which uses the Castagnoli polynomial (0x1edc6f41)

The group descriptor table

The group descriptor table is stored in the block following the superblock.

The group descriptor table consists of:

  • one or more group descriptors

The ext2 and ext3 group descriptor

The ext2 and ext3 group descriptor is 32 bytes in size and consists of:

Offset | Size | Value | Description
--- | --- | --- | ---
0 | 4 | | Block bitmap block number. The block number is relative from the start of the volume
4 | 4 | | Inode bitmap block number. The block number is relative from the start of the volume
8 | 4 | | Inode table block number. The block number is relative from the start of the volume
12 | 2 | | Number of unallocated blocks
14 | 2 | | Number of unallocated inodes
16 | 2 | | Number of directories
18 | 2 | | Unknown (padding)
20 | 3 x 4 | | Unknown (reserved)

Note that it has been observed that implementations that support ext4 can set a value in the padding. It is currently assumed that this value contains block group flags.

The ext4 group descriptor

The ext4 group descriptor is 64 bytes in size and consists of:

Offset | Size | Value | Description
--- | --- | --- | ---
0 | 4 | | Block bitmap block number, which contains the lower 32-bit of the value. The block number is relative from the start of the volume
4 | 4 | | Inode bitmap block number, which contains the lower 32-bit of the value. The block number is relative from the start of the volume
8 | 4 | | Inode table block number, which contains the lower 32-bit of the value. The block number is relative from the start of the volume
12 | 2 | | Number of unallocated blocks, which contains the lower 16-bit of the value
14 | 2 | | Number of unallocated inodes, which contains the lower 16-bit of the value
16 | 2 | | Number of directories, which contains the lower 16-bit of the value
18 | 2 | | Block group flags
20 | 4 | | Exclude bitmap block number, which contains the lower 32-bit of the value. The block number is relative from the start of the volume
24 | 2 | | Block bitmap checksum, which contains the lower 16-bit of the value
26 | 2 | | Inode bitmap checksum, which contains the lower 16-bit of the value
28 | 2 | | Number of unused inodes, which contains the lower 16-bit of the value
30 | 2 | | Checksum
If 64-bit support (EXT4_FEATURE_INCOMPAT_64BIT) is enabled and group descriptor size > 32 | | |
32 | 4 | | Block bitmap block number, which contains the upper 32-bit of the value. The block number is relative from the start of the volume
36 | 4 | | Inode bitmap block number, which contains the upper 32-bit of the value. The block number is relative from the start of the volume
40 | 4 | | Inode table block number, which contains the upper 32-bit of the value. The block number is relative from the start of the volume
44 | 2 | | Number of unallocated blocks, which contains the upper 16-bit of the value
46 | 2 | | Number of unallocated inodes, which contains the upper 16-bit of the value
48 | 2 | | Number of directories, which contains the upper 16-bit of the value
50 | 2 | | Number of unused inodes, which contains the upper 16-bit of the value
52 | 4 | | Exclude bitmap block number, which contains the upper 32-bit of the value. The block number is relative from the start of the volume
56 | 2 | | Block bitmap checksum, which contains the upper 16-bit of the value
58 | 2 | | Inode bitmap checksum, which contains the upper 16-bit of the value
60 | 4 | | Unknown (padding)

If checksum type is CRC-32C, the checksum is stored as the lower 16-bits of 0xffffffff - CRC-32C, otherwise the checksum is stored as a CRC-16.

Checksum calculation

If checksum type is CRC-32C, the CRC32-C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the checksum.

The checksum is calculated over:

  • the 16 byte file system identifier in the superblock
  • the group number as a 32-bit little-endian integer
  • the data of the group descriptor with the checksum set to 0-byte values
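
The three inputs listed above can be combined as in the following Python sketch. It makes two assumptions: the three parts are checksummed as one concatenated byte sequence, and "initial value of 0" is read in the zlib style (register inverted on entry and exit), so that `crc32c` returns the standard CRC-32C. The function names are hypothetical; the checksum field is taken to be at offset 30 of the group descriptor, as in the table above.

```python
import struct


def crc32c(data: bytes, initial: int = 0) -> int:
    """Calculate a CRC-32C (Castagnoli), zlib-style: inverted on entry and exit."""
    crc = initial ^ 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # 0x82F63B78 is the reflected form of the polynomial 0x1EDC6F41.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def group_descriptor_checksum(file_system_identifier: bytes,
                              group_number: int,
                              descriptor: bytes) -> int:
    """Return the 16-bit checksum expected at offset 30 of a group descriptor."""
    zeroed = bytearray(descriptor)
    # The checksum field itself is treated as 0-byte values.
    zeroed[30:32] = b"\x00\x00"
    data = (file_system_identifier
            + struct.pack("<I", group_number)
            + bytes(zeroed))
    # Stored as the lower 16 bits of 0xffffffff - CRC-32C.
    return (0xFFFFFFFF - crc32c(data)) & 0xFFFF
```

Since the checksum field is zeroed before the calculation, the function gives the same result whether or not the descriptor passed in already contains a checksum.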

TODO: describe the block bitmap checksum calculation: crc32c(s_uuid+grp_num+bbitmap)

TODO: describe the inode bitmap checksum calculation: crc32c(s_uuid+grp_num+ibitmap)

Block group flags

Value | Identifier | Description
--- | --- | ---
0x0001 | EXT4_BG_INODE_UNINIT | The inode table and bitmap are not initialized
0x0002 | EXT4_BG_BLOCK_UNINIT | The block bitmap is not initialized
0x0004 | EXT4_BG_INODE_ZEROED | The inode table is filled with 0

Direct and indirect blocks

Direct blocks are blocks that are part of the data stream of a file entry.

A direct block number of 0 that is part of the data stream represents a sparse data block.

Indirect blocks are blocks that refer to blocks containing direct or indirect block numbers. There are multiple levels of indirect block:

  • indirect blocks (level 1), that refer to direct blocks
  • double indirect blocks (level 2), that refer to indirect blocks
  • triple indirect blocks (level 3), that refer to double indirect blocks

An indirect block number of 0 that is part of the data stream represents sparse data blocks.
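
To make the levels concrete, the sketch below maps a logical block index of a file to the level of indirection that covers it and to the index to follow at each level. It assumes 12 direct block numbers in the inode and 4 byte block numbers inside indirect blocks, which matches the 12 x 4 array and the 4 byte indirect block numbers in the ext2 inode layout; the function name is hypothetical.

```python
NUMBER_OF_DIRECT_BLOCKS = 12


def locate_block(logical_block: int, block_size: int):
    """Return (level, indexes) for a logical block index within a file.

    Level 0 is direct, 1 indirect, 2 double indirect and 3 triple indirect.
    The indexes are the positions to follow at each level, outermost first.
    """
    # Assumption: block numbers inside an indirect block are 4 bytes each.
    per_block = block_size // 4

    if logical_block < NUMBER_OF_DIRECT_BLOCKS:
        return (0, (logical_block,))

    index = logical_block - NUMBER_OF_DIRECT_BLOCKS
    if index < per_block:
        return (1, (index,))

    index -= per_block
    if index < per_block ** 2:
        return (2, divmod(index, per_block))

    index -= per_block ** 2
    if index < per_block ** 3:
        first, remainder = divmod(index, per_block ** 2)
        second, third = divmod(remainder, per_block)
        return (3, (first, second, third))

    raise ValueError("logical block out of range for triple indirection")
```

With a block size of 1024 an indirect block holds 256 block numbers, so logical block 12 is the first entry of the indirect block and logical block 268 is the first block reached through the double indirect block.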

Extents

Extents were introduced in ext4 and are controlled by EXT4_FEATURE_INCOMPAT_EXTENTS.

Extents form an extents B-tree, where an extents B-tree node consists of:

  • extents header
  • extents entries
  • extents footer

Note that inodes can have an implicit last sparse extent if the inode data size is greater than the total data size defined by the extent descriptors.

The ext4 extents header

The ext4 extents header (ext4_extent_header) is 12 bytes in size and consists of:

Offset | Size | Value | Description
--- | --- | --- | ---
0 | 2 | "\x0a\xf3" | Signature
2 | 2 | | Number of entries
4 | 2 | | Maximum number of entries
6 | 2 | | Depth, where 0 represents a leaf node and 1 to 5 different levels of branch nodes
8 | 4 | | Generation, which is used by Lustre, but not by standard ext4

The ext4 extent descriptor

The ext4 extent descriptor (ext4_extent) is 12 bytes in size and consists of:

Offset | Size | Value | Description
--- | --- | --- | ---
0 | 4 | | Logical block number
4 | 2 | | Number of blocks
6 | 2 | | Upper 16-bits of physical block number
8 | 4 | | Lower 32-bits of physical block number

If number of blocks > 32768 the extent is considered “uninitialized”, which is (as far as currently known) comparable to the extent being sparse. The number of blocks of the sparse extent can be determined as follows:

sparse_number_of_blocks = number_of_blocks - 32768

Sparse extents can exist between the extent descriptors. In such a case the logical block number will not align with the information from the previous extent descriptors.

Note that the native Linux ext implementation expects the extents to be stored in order of logical block number.
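
The extent descriptor layout and the handling of the number of blocks can be illustrated with a short Python sketch. The function name and dictionary keys are hypothetical; the field order and the 32768 threshold follow the description above.

```python
import struct


def parse_extent_descriptor(data: bytes) -> dict:
    """Parse a 12 byte little-endian ext4 extent descriptor."""
    logical_block, number_of_blocks, physical_upper, physical_lower = (
        struct.unpack_from("<IHHI", data, 0))

    # A number of blocks larger than 32768 marks an "uninitialized" extent.
    is_uninitialized = number_of_blocks > 32768
    if is_uninitialized:
        number_of_blocks -= 32768

    return {
        "logical_block": logical_block,
        "number_of_blocks": number_of_blocks,
        "physical_block": (physical_upper << 32) | physical_lower,
        "is_uninitialized": is_uninitialized,
    }
```

Note that a value of exactly 32768 is not above the threshold, so it describes a regular extent of 32768 blocks.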

The ext4 extents index

The ext4 extent index (ext4_extent_idx) is 12 bytes in size and consists of:

Offset | Size | Value | Description
--- | --- | --- | ---
0 | 4 | | Logical block number, which contains the first logical block number of next depth extents block
4 | 4 | | Lower 32-bits of physical block number, which contains the block number of the next depth extents block
8 | 2 | | Upper 16-bits of physical block number, which contains the block number of the next depth extents block
10 | 2 | | Unknown (unused)

The ext4 extents footer

The ext4 extents footer (ext4_extent_tail) is 4 bytes in size and consists of:

Offset | Size | Value | Description
--- | --- | --- | ---
0 | 4 | | Checksum of an extents block, which contains a CRC32

The inode

The size of the inode is defined in the superblock when dynamic inode information is present.

Note that the ext4 inode format can be used on an ext2 formatted file system. This was observed in combination with format revision 1 and inode size > 128 created by mkfs.ext2.

The ext2 inode

The ext2 inode is 128 bytes in size and consists of:

Offset | Size | Value | Description
--- | --- | --- | ---
0 | 2 | | File mode, which contains file type and permissions
2 | 2 | | Lower 16-bits of owner (or user) identifier (UID)
4 | 4 | | Data size
8 | 4 | | (last) access time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
12 | 4 | | (last) inode change (or modification) time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
16 | 4 | | (last) content modification time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
20 | 4 | | Deletion time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
24 | 2 | | Lower 16-bits of group identifier (GID)
26 | 2 | | Number of (hard) links
28 | 4 | | Number of blocks
32 | 4 | | Flags
36 | 4 | | Unknown (reserved)
40 | 12 x 4 | | Array of direct block numbers. A block number is relative from the start of the volume
88 | 4 | | Indirect block number. A block number is relative from the start of the volume
92 | 4 | | Double indirect block number. A block number is relative from the start of the volume
96 | 4 | | Triple indirect block number. A block number is relative from the start of the volume
100 | 4 | | NFS generation number
104 | 4 | | File ACL (or extended attributes) block number
108 | 4 | | Unknown (Directory ACL)
112 | 4 | | Fragment block address
116 | 1 | | Fragment block index
117 | 1 | | Fragment size
118 | 2 | | Unknown (padding)
120 | 2 | | Upper 16-bits of owner (or user) identifier (UID)
122 | 2 | | Upper 16-bits of group identifier (GID)
124 | 4 | | Unknown (reserved)

Note that for a character and block device the first 2 bytes of the array of direct block numbers contain the minor and major device number respectively.

If the inode size is larger than 128 bytes, the additional data can be stored using an ext4 inode extension.

The ext3 inode

The ext3 inode is 132 bytes in size and consists of:

Offset | Size | Value | Description
--- | --- | --- | ---
0 | 2 | | File mode, which contains file type and permissions
2 | 2 | | Lower 16-bits of owner (or user) identifier (UID)
4 | 4 | | Data size
8 | 4 | | (last) access time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
12 | 4 | | (last) inode change (or modification) time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
16 | 4 | | (last) content modification time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
20 | 4 | | Deletion time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
24 | 2 | | Lower 16-bits of group identifier (GID)
26 | 2 | | Number of (hard) links
28 | 4 | | Number of blocks
32 | 4 | | Flags
36 | 4 | | Unknown (reserved)
40 | 12 x 4 | | Array of direct block numbers. A block number is relative from the start of the volume
88 | 4 | | Indirect block number. A block number is relative from the start of the volume
92 | 4 | | Double indirect block number. A block number is relative from the start of the volume
96 | 4 | | Triple indirect block number. A block number is relative from the start of the volume
100 | 4 | | NFS generation number
104 | 4 | | File ACL (or extended attributes) block number
108 | 4 | | Unknown (Directory ACL)
112 | 4 | | Fragment block address
116 | 1 | | Fragment block index
117 | 1 | | Fragment size
118 | 2 | | Unknown (padding)
120 | 2 | | Upper 16-bits of owner (or user) identifier (UID)
122 | 2 | | Upper 16-bits of group identifier (GID)
124 | 4 | | Unknown (reserved)
Extension (if inode size > 128) | | |
128 | 2 | | Extended inode size
130 | 2 | | Unknown (padding)

Note that for a character and block device the first 2 bytes of the array of direct block numbers contain the minor and major device number respectively.

If the inode size is larger than 128 bytes, the additional data can be stored using an ext4 inode extension.

The ext4 inode

The ext4 inode is 160 bytes in size and consists of:

Offset | Size | Value | Description
--- | --- | --- | ---
0 | 2 | | File mode, which contains file type and permissions
2 | 2 | | Lower 16-bits of owner (or user) identifier (UID)
4 | 4 | | Lower 32-bits of data size
If EXT4_EA_INODE_FL is not set | | |
8 | 4 | | (last) access time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
12 | 4 | | (last) inode change (or modification) time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
16 | 4 | | (last) content modification time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
If EXT4_EA_INODE_FL is set | | |
8 | 4 | | Unknown (extended attribute value data checksum)
12 | 4 | | Unknown (lower 32-bits of extended attribute reference count)
16 | 4 | | Unknown (inode number that owns the extended attribute)
Common | | |
20 | 4 | | Deletion time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
24 | 2 | | Lower 16-bits of group identifier (GID)
26 | 2 | | Number of (hard) links
28 | 4 | | Lower 32-bits of number of blocks
32 | 4 | | Flags
If EXT4_EA_INODE_FL is not set | | |
36 | 4 | | Lower 32-bits of version
If EXT4_EA_INODE_FL is set | | |
36 | 4 | | Unknown (upper 32-bits of extended attribute reference count)
If EXT4_EXTENTS_FL and EXT4_INLINE_DATA_FL are not set | | |
40 | 12 x 4 | | Array of direct block numbers. A block number is relative from the start of the volume
88 | 4 | | Indirect block number. A block number is relative from the start of the volume
92 | 4 | | Double indirect block number. A block number is relative from the start of the volume
96 | 4 | | Triple indirect block number. A block number is relative from the start of the volume
If EXT4_EXTENTS_FL is set | | |
40 | 12 | | Extents header
52 | 4 x 12 | | Extent descriptors or extents indexes
If EXT4_INLINE_DATA_FL is set | | |
40 | 60 | | File content data
Common | | |
100 | 4 | | NFS generation number
104 | 4 | | Lower 32-bits of file ACL (or extended attributes) block number
108 | 4 | | Upper 32-bits of data size
112 | 4 | | Fragment block address
116 | 2 | | Upper 16-bits of number of blocks
118 | 2 | | Upper 16-bits of file ACL (or extended attributes) block number
120 | 2 | | Upper 16-bits of owner (or user) identifier (UID)
122 | 2 | | Upper 16-bits of group identifier (GID)
124 | 2 | | Lower 16-bits of checksum
126 | 2 | | Unknown (reserved)
Extension (if inode size > 128) | | |
128 | 2 | | Extended inode size, which can vary, values of 4, 28 and 32 have been observed
130 | 2 | | Upper 16-bits of checksum
132 | 4 | | (last) inode change (or modification) time extra precision
136 | 4 | | (last) content modification time extra precision
140 | 4 | | (last) access time extra precision
144 | 4 | | Creation time
148 | 4 | | Creation time extra precision
152 | 4 | | Upper 32-bits of version
156 | 4 | | Unknown (i_projid)

If checksum type is CRC-32C, the checksum is stored as 0xffffffff - CRC-32C.

Note that for a character and block device the first 2 bytes of the array of direct block numbers contain the minor and major device number respectively.
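
As an illustration of how the lower and upper parts in the table above combine, the following is a minimal Python sketch, not a definitive implementation. The function name `parse_ext4_inode_core` and the returned dictionary keys are hypothetical; the offsets are taken from the ext4 inode table and the values are read as little-endian.

```python
import struct


def parse_ext4_inode_core(inode_data: bytes) -> dict:
    """Combine the lower and upper parts of selected ext4 inode values.

    Assumes inode_data holds at least the first 128 bytes of an ext4 inode.
    """
    file_mode, uid_lower, size_lower = struct.unpack_from("<HHI", inode_data, 0)
    gid_lower, number_of_links = struct.unpack_from("<HH", inode_data, 24)
    (flags,) = struct.unpack_from("<I", inode_data, 32)
    (size_upper,) = struct.unpack_from("<I", inode_data, 108)
    uid_upper, gid_upper = struct.unpack_from("<HH", inode_data, 120)

    return {
        "file_mode": file_mode,
        "uid": (uid_upper << 16) | uid_lower,
        "gid": (gid_upper << 16) | gid_lower,
        "data_size": (size_upper << 32) | size_lower,
        "number_of_links": number_of_links,
        "flags": flags,
    }
```

The sketch deliberately ignores the timestamps and the data mapping at offset 40, since their interpretation depends on the inode flags.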

Checksum calculation

If checksum type is CRC-32C, the CRC-32C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the checksum.

The checksum is calculated from:

  • the 16 byte file system identifier in the superblock
  • the inode number as a 32-bit little-endian integer
  • the NFS generation number in the inode as a 32-bit little-endian integer
  • the data of the inode with the lower and upper part of the checksum set to 0-byte values.
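
The calculation above can be sketched in Python as follows. This is a sketch under stated assumptions: the "initial value of 0" is taken to mean the conventional CRC-32C parameterisation (the one whose check value for `b"123456789"` is 0xe3069283), the four inputs are checksummed as one concatenated byte sequence, and inode data larger than 128 bytes is assumed to include the upper 16-bits of the checksum at offset 130. The function names are hypothetical.

```python
import struct


def crc32c(data: bytes) -> int:
    """Conventional CRC-32C, computed bit by bit.

    0x82f63b78 is the bit-reflected form of the Castagnoli polynomial 0x1edc6f41.
    """
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def ext4_inode_checksum(fs_identifier: bytes, inode_number: int, inode_data: bytes) -> int:
    """Return the 32-bit value expected in the inode checksum fields."""
    (generation,) = struct.unpack_from("<I", inode_data, 100)

    # Set the lower (offset 124) and upper (offset 130) checksum parts to 0.
    zeroed = bytearray(inode_data)
    zeroed[124:126] = b"\x00\x00"
    if len(zeroed) >= 132:
        zeroed[130:132] = b"\x00\x00"

    payload = fs_identifier + struct.pack("<II", inode_number, generation) + bytes(zeroed)
    # The checksum is stored as 0xffffffff - CRC-32C.
    return 0xFFFFFFFF - crc32c(payload)
```

To verify an inode, the lower 16 bits of the result would be compared with the value at offset 124 and the upper 16 bits with the value at offset 130, if present.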

Extra precision

The ext4 extra precision is 4 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0.0 | 2 bits | | Extra epoch value |
| 0.2 | 30 bits | | Fraction of second in nanoseconds |

The 34-bit extra precision timestamp (in number of seconds) can be calculated as follows:

extra_precision_timestamp = (extra_epoch_value * 0x100000000) + timestamp
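
A minimal Python sketch of this calculation is shown below. It assumes the extra precision value has already been read as a 32-bit little-endian integer, so that the extra epoch value occupies the 2 least significant bits. The function name `decode_extra_precision` is hypothetical.

```python
from typing import Tuple


def decode_extra_precision(timestamp: int, extra_precision: int) -> Tuple[int, int]:
    """Return (seconds, nanoseconds) for a timestamp and its extra precision."""
    extra_epoch_value = extra_precision & 0x3  # lower 2 bits
    nanoseconds = extra_precision >> 2  # upper 30 bits
    seconds = (extra_epoch_value * 0x100000000) + timestamp
    return seconds, nanoseconds
```

Whether the 32-bit timestamp itself should be read as signed or unsigned is left to the caller here.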

Notes

It has been observed that when EXT4_EA_INODE_FL is set the (last) modification time can contain a valid timestamp.

According to the Linux kernel documentation:

For backward compatibility with older versions of this feature, the i_mtime/i_generation may store a back-reference to the inode number and i_generation of the one owning inode (in cases where the EA inode is not referenced by multiple inodes) to verify that the EA inode is the correct one being accessed.

File mode

| Value | Identifier | Description |
| --- | --- | --- |
| Access other, Bitmask: 0x0007 (S_IRWXO) | | |
| 0x0001 | S_IXOTH | X-access for other |
| 0x0002 | S_IWOTH | W-access for other |
| 0x0004 | S_IROTH | R-access for other |
| Access group, Bitmask: 0x0038 (S_IRWXG) | | |
| 0x0008 | S_IXGRP | X-access for group |
| 0x0010 | S_IWGRP | W-access for group |
| 0x0020 | S_IRGRP | R-access for group |
| Access owner (or user), Bitmask: 0x01c0 (S_IRWXU) | | |
| 0x0040 | S_IXUSR | X-access for owner (or user) |
| 0x0080 | S_IWUSR | W-access for owner (or user) |
| 0x0100 | S_IRUSR | R-access for owner (or user) |
| Other | | |
| 0x0200 | S_ISTXT | Sticky bit |
| 0x0400 | S_ISGID | Set group identifier (GID) on execution |
| 0x0800 | S_ISUID | Set owner (or user) identifier (UID) on execution |
| Type of file, Bitmask: 0xf000 (S_IFMT) | | |
| 0x1000 | S_IFIFO | Named pipe (FIFO) |
| 0x2000 | S_IFCHR | Character device |
| 0x4000 | S_IFDIR | Directory |
| 0x6000 | S_IFBLK | Block device |
| 0x8000 | S_IFREG | Regular file |
| 0xa000 | S_IFLNK | Symbolic link |
| 0xc000 | S_IFSOCK | Socket |
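
The following Python sketch shows one way to split a file mode value into its file type and access permissions using the bitmasks above. The function name `describe_file_mode` and the `rwx` style output string are illustrative choices, not part of the format.

```python
FILE_TYPES = {
    0x1000: "named pipe (FIFO)",
    0x2000: "character device",
    0x4000: "directory",
    0x6000: "block device",
    0x8000: "regular file",
    0xA000: "symbolic link",
    0xC000: "socket",
}

# Permission bits from owner R-access (0x0100) down to other X-access (0x0001).
PERMISSION_BITS = (0x0100, 0x0080, 0x0040, 0x0020, 0x0010, 0x0008, 0x0004, 0x0002, 0x0001)


def describe_file_mode(file_mode: int):
    """Return (file type description, permissions string) for a file mode."""
    file_type = FILE_TYPES.get(file_mode & 0xF000, "unknown")
    permissions = "".join(
        character if file_mode & bit else "-"
        for bit, character in zip(PERMISSION_BITS, "rwxrwxrwx")
    )
    return file_type, permissions
```

For example, `describe_file_mode(0x81a4)` yields a regular file with permissions `rw-r--r--`. The sticky, set GID and set UID bits are not represented in this sketch.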

Inode flags

| Value | Identifier | Description |
| --- | --- | --- |
| 0x00000001 | EXT2_SECRM_FL, EXT3_SECRM_FL, EXT4_SECRM_FL, EXT4_INODE_SECRM | Secure deletion |
| 0x00000002 | EXT2_UNRM_FL, EXT3_UNRM_FL, EXT4_UNRM_FL, EXT4_INODE_UNRM | Undelete |
| 0x00000004 | EXT2_COMPR_FL, EXT3_COMPR_FL, EXT4_COMPR_FL, EXT4_INODE_COMPR | Compressed file, which is not yet implemented |
| 0x00000008 | EXT2_SYNC_FL, EXT3_SYNC_FL, EXT4_SYNC_FL, EXT4_INODE_SYNC | Synchronous updates |
| 0x00000010 | EXT2_IMMUTABLE_FL, EXT3_IMMUTABLE_FL, EXT4_IMMUTABLE_FL, EXT4_INODE_IMMUTABLE | Immutable file |
| 0x00000020 | EXT2_APPEND_FL, EXT3_APPEND_FL, EXT4_APPEND_FL, EXT4_INODE_APPEND | Writes to file may only append |
| 0x00000040 | EXT2_NODUMP_FL, EXT3_NODUMP_FL, EXT4_NODUMP_FL, EXT4_INODE_NODUMP | Do not dump file |
| 0x00000080 | EXT2_NOATIME_FL, EXT3_NOATIME_FL, EXT4_NOATIME_FL, EXT4_INODE_NOATIME | Do not update access time (atime) |
| 0x00000100 | EXT2_DIRTY_FL, EXT3_DIRTY_FL, EXT4_DIRTY_FL, EXT4_INODE_DIRTY | Dirty compressed file, which is not yet implemented |
| 0x00000200 | EXT2_COMPRBLK_FL, EXT3_COMPRBLK_FL, EXT4_COMPRBLK_FL, EXT4_INODE_COMPRBLK | One or more compressed clusters, which is not yet implemented |
| 0x00000400 | EXT2_NOCOMP_FL, EXT3_NOCOMP_FL, EXT4_NOCOMPR_FL, EXT4_INODE_NOCOMPR | Do not compress, which is not yet implemented |
| ext2 and ext3 | | |
| 0x00000800 | EXT2_ECOMPR_FL, EXT3_ECOMPR_FL | Compression error |
| ext4 | | |
| 0x00000800 | EXT4_ENCRYPT_FL, EXT4_INODE_ENCRYPT | Encrypted file |
| Common | | |
| 0x00001000 | EXT2_BTREE_FL, EXT2_INDEX_FL, EXT3_INDEX_FL, EXT4_INDEX_FL, EXT4_INODE_INDEX | Hash-indexed directory (previously referred to as B-tree format) |
| 0x00002000 | EXT2_IMAGIC_FL, EXT3_IMAGIC_FL, EXT4_IMAGIC_FL, EXT4_INODE_IMAGIC | AFS directory |
| 0x00004000 | EXT2_JOURNAL_DATA_FL, EXT3_JOURNAL_DATA_FL, EXT4_JOURNAL_DATA_FL, EXT4_INODE_JOURNAL_DATA | File data must be written using the journal |
| 0x00008000 | EXT2_NOTAIL_FL, EXT3_NOTAIL_FL, EXT4_NOTAIL_FL, EXT4_INODE_NOTAIL | File tail should not be merged, which is not used by ext4 |
| 0x00010000 | EXT2_DIRSYNC_FL, EXT3_DIRSYNC_FL, EXT4_DIRSYNC_FL, EXT4_INODE_DIRSYNC | Directory entries should be written synchronously (dirsync) |
| 0x00020000 | EXT2_TOPDIR_FL, EXT3_TOPDIR_FL, EXT4_TOPDIR_FL, EXT4_INODE_TOPDIR | Top of directory hierarchy |
| ext4 | | |
| 0x00040000 | EXT4_HUGE_FILE_FL, EXT4_INODE_HUGE_FILE | Is a huge file |
| 0x00080000 | EXT4_EXTENTS_FL, EXT4_INODE_EXTENTS | Inode uses extents |
| 0x00100000 | EXT4_INODE_VERITY | Verity protected inode |
| 0x00200000 | EXT4_EA_INODE_FL, EXT4_INODE_EA_INODE | Inode used for large extended attribute |
| 0x00400000 | EXT4_EOFBLOCKS_FL, EXT4_INODE_EOFBLOCKS | Blocks allocated beyond EOF |
| 0x01000000 | EXT4_SNAPFILE_FL | Inode is a snapshot |
| 0x02000000 | EXT4_INODE_DAX | Inode is direct-access (DAX) |
| 0x04000000 | EXT4_SNAPFILE_DELETED_FL | Snapshot is being deleted |
| 0x08000000 | EXT4_SNAPFILE_SHRUNK_FL | Snapshot shrink has completed |
| 0x10000000 | EXT4_INLINE_DATA_FL, EXT4_INODE_INLINE_DATA | Inode has inline data |
| 0x20000000 | EXT4_PROJINHERIT_FL, EXT4_INODE_PROJINHERIT | Create sub file entries with the same project identifier |
| 0x40000000 | EXT4_INODE_CASEFOLD | Casefolded directory |
| 0x80000000 | EXT4_INODE_RESERVED | Unknown (reserved) |
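
The inode flags determine how the 60 bytes at offset 40 of the ext4 inode should be interpreted. The Python sketch below illustrates that decision. The function name `data_layout` is hypothetical, and checking EXT4_INLINE_DATA_FL before EXT4_EXTENTS_FL is an assumption, since the behavior when both flags are set is not described here.

```python
EXT4_EXTENTS_FL = 0x00080000
EXT4_INLINE_DATA_FL = 0x10000000


def data_layout(flags: int) -> str:
    """Return how the data at offset 40 of an ext4 inode should be read."""
    if flags & EXT4_INLINE_DATA_FL:
        return "inline data"
    if flags & EXT4_EXTENTS_FL:
        return "extents"
    # Neither flag set: 12 direct and 3 indirect block numbers.
    return "block numbers"
```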

Reserved inode numbers

| Value | Identifier | Description |
| --- | --- | --- |
| 1 | EXT2_BAD_INO, EXT3_BAD_INO, EXT4_BAD_INO | Bad blocks inode |
| 2 | EXT2_ROOT_INO, EXT3_ROOT_INO, EXT4_ROOT_INO | Root inode |
| 3 | EXT4_USR_QUOTA_INO | Owner (or user) quota inode |
| 4 | EXT4_GRP_QUOTA_INO | Group quota inode |
| 5 | EXT2_BOOT_LOADER_INO, EXT3_BOOT_LOADER_INO, EXT4_BOOT_LOADER_INO | Boot loader inode |
| 6 | EXT2_UNDEL_DIR_INO, EXT3_UNDEL_DIR_INO, EXT4_UNDEL_DIR_INO | Undelete directory inode |
| 7 | EXT3_RESIZE_INO, EXT4_RESIZE_INO | Reserved group descriptors inode |
| 8 | EXT3_JOURNAL_INO, EXT4_JOURNAL_INO | Journal inode |

Inline data

ext4 supports storing file entry data inline when the inode flag EXT4_INLINE_DATA_FL is set.

Note that inodes can have an implicit last sparse extent if the inode data size is greater than 60 bytes.

Huge files

TODO: complete section

Directory entries

Directory entries are stored in the data blocks of a directory inode. The directory entries can be stored in multiple ways:

  • as linear directory entries
  • as inline data directory entries
  • as hash-tree directory entries

Linear directory entries

Linear directory entries are stored in a series of allocation blocks.

Linear directory entries contain:

  • directory entry for “.” (self)
  • directory entry for “..” (parent)
  • directory entry for other file system entries

The directory entry

The directory entry is of variable size, at most 263 bytes, and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Inode number |
| 4 | 2 | | Directory entry size, which must be a multiple of 4 |
| 6 | 1 | | Name size, which contains the size of the name without the end-of-string character and has a maximum of 255 |
| 7 | 1 | | File type |
| 8 | ... | | Name, which contains a narrow character string without end-of-string character |

Older directory entry structures considered the name size a 16-bit value, but the upper byte was never used.

The name can contain any character value except the path segment separator (‘/’) and the NUL-character (‘\0’).
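
As an illustration, a block of linear directory entries could be walked as in the Python sketch below. The function name `parse_directory_entries` is hypothetical. Treating an inode number of 0 as an unused entry is an assumption not stated in this section, and the sketch stops at the first entry whose size is inconsistent rather than attempting recovery.

```python
import struct


def parse_directory_entries(block: bytes) -> list:
    """Return (inode number, file type, name) tuples from a directory block."""
    entries = []
    offset = 0
    while offset + 8 <= len(block):
        inode_number, entry_size, name_size, file_type = struct.unpack_from(
            "<IHBB", block, offset
        )
        # The entry size must be a multiple of 4 and must hold header and name.
        if entry_size < 8 or entry_size % 4 != 0 or name_size > entry_size - 8:
            break
        if offset + entry_size > len(block):
            break
        if inode_number != 0:  # assumption: 0 marks an unused entry
            name = block[offset + 8 : offset + 8 + name_size]
            entries.append((inode_number, file_type, name))
        offset += entry_size
    return entries
```

The names are returned as bytes, since the character encoding depends on the system defined codepage.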

File types

| Value | Identifier | Description |
| --- | --- | --- |
| 0 | EXT2_FT_UNKNOWN | Unknown |
| 1 | EXT2_FT_REG_FILE | Regular file |
| 2 | EXT2_FT_DIR | Directory |
| 3 | EXT2_FT_CHRDEV | Character device |
| 4 | EXT2_FT_BLKDEV | Block device |
| 5 | EXT2_FT_FIFO | FIFO queue |
| 6 | EXT2_FT_SOCK | Socket |
| 7 | EXT2_FT_SYMLINK | Symbolic link |

Inline data directory entries

ext4 supports storing the directory entries as inline data when the inode flag EXT4_INLINE_DATA_FL is set.

The inline data directory entries are of variable size, at most 60 bytes, and consist of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Parent inode number |
| 4 | ... | | Array of directory entries |

Hash tree directory entries

The data of the hash tree (HTree) is stored in the data blocks or extents defined by the directory inode. The hash-indexed directory entries are read-compatible with the linear directory entry.

Hash tree root

The hash tree root consists of:

  • dx_root
    • directory entry for “.” (self)
    • directory entry for “..” (parent)
    • dx_root_info
    • Array of dx_entry
  • directory entry for other file system entries

dx_root_info

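| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | 0 | Unknown (reserved) |
| 4 | 1 | | Hash method (or version) |
| 5 | 1 | 8 | Root information size |
| 6 | 1 | | Number of indirect levels in the hash tree |
| 7 | 1 | | Unknown (unused flags) |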
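
A minimal Python sketch for reading the 8 bytes of dx_root_info is shown below. The function name `parse_dx_root_info` and the dictionary keys are hypothetical, and rejecting a root information size other than 8 is a choice made for this sketch.

```python
import struct


def parse_dx_root_info(data: bytes) -> dict:
    """Parse the 8 bytes of dx_root_info, read as little-endian values."""
    _reserved, hash_method, info_size, indirect_levels, _unused_flags = (
        struct.unpack_from("<IBBBB", data, 0)
    )
    if info_size != 8:
        raise ValueError("unsupported root information size: %d" % info_size)
    return {
        "hash_method": hash_method,
        "number_of_indirect_levels": indirect_levels,
    }
```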

dx_entry

TODO: complete section

struct dx_entry
{
        __le32 hash;
        __le32 block;
};

Symbolic links

If the target path of a symbolic link is less than 60 characters long, it is stored in the 60 bytes in the inode that are normally used for the 12 direct and 3 indirect block numbers. If the target path is longer than 60 characters, a block is allocated, and the block contains the target path. The inode data size contains the length of the target path.
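
The rule for locating a symbolic link target can be sketched in Python as follows. This is a sketch under several assumptions: `read_block` is a hypothetical callable that returns the data of a block given its block number, the allocated block is referenced by the first direct block number (inodes that use extents would need the extent to be resolved instead), and the target path fits within that single block.

```python
import struct


def read_symlink_target(inode_data: bytes, data_size: int, read_block) -> bytes:
    """Return the target path of a symbolic link as bytes."""
    if data_size < 60:
        # Stored in the 60 bytes normally used for the block numbers.
        return inode_data[40 : 40 + data_size]
    # Assumption: the first direct block number references the allocated block.
    (block_number,) = struct.unpack_from("<I", inode_data, 40)
    return read_block(block_number)[:data_size]
```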

Extended attributes

Extended attributes can be stored:

  • in the inode block after the inode data
  • in the block referenced by the file ACL (or extended attributes) block number, if not 0

Note that both should be read to get all the extended attributes.

Extended attributes consist of:

  • An extended attributes header
  • Extended attributes entries with a terminator

The extended attributes inode header

The extended attributes inode header (ext2_xattr_ibody_header, ext3_xattr_ibody_header, ext4_xattr_ibody_header) is 4 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | "\x00\x00\x02\xea" | Signature |

The extended attributes block header

The ext2 and ext3 extended attributes block header

The ext2 and ext3 extended attributes block header (ext2_xattr_header, ext3_xattr_header) is 32 bytes in size and consists of:

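| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | "\x00\x00\x02\xea" | Signature |
| 4 | 4 | | Unknown (reference count) |
| 8 | 4 | | Number of blocks |
| 12 | 4 | | Attributes hash |
| 16 | 4 x 4 | | Unknown (reserved) |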

The ext4 extended attributes block header

The ext4 extended attributes block header (ext4_xattr_header) is 32 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | "\x00\x00\x02\xea" | Signature |
| 4 | 4 | | Unknown (reference count) |
| 8 | 4 | | Number of blocks |
| 12 | 4 | | Attributes hash |
| 16 | 4 | | Checksum |
| 20 | 3 x 4 | | Unknown (reserved) |

The extended attributes entry

The extended attributes entry (ext2_xattr_entry, ext3_xattr_entry, ext4_xattr_entry) is of variable size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 1 | | Name size, which contains the size of the name without the end-of-string character |
| 1 | 1 | | Name index |
| 2 | 2 | | Value data offset, which contains the offset of the value data relative from the start of the extended attributes block or after the extended attributes signature in the inode block data |
| 4 | 4 | | Value data inode number, which contains the inode number that contains the value data or 0 to indicate the current block |
| 8 | 4 | | Value data size |
| 12 | 4 | | Unknown (Attribute hash) |
| 16 | ... | | Name string, which contains an ASCII string without end-of-string character and can be empty, for example in combination with a prefix or with an encrypted file |
| ... | ... | | 32-bit alignment padding |

The last extended attributes entry has the first 4 values set to 0 (8 bytes) and is used as a terminator.

Note that some implementations of older Android versions of ext appear to only set the first 4 bytes to 0 for the terminator.
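
The entry layout and terminator handling can be sketched in Python as follows. The function name `parse_xattr_entries` and its parameters are hypothetical: `entries_offset` is where the first entry starts (for example 32 for a block, after the block header) and `value_base` is the offset that value data offsets are relative to (0 for a block). To cover both terminator variants, the sketch stops as soon as the first 4 bytes of an entry are 0.

```python
import struct


def parse_xattr_entries(data: bytes, entries_offset: int, value_base: int = 0) -> list:
    """Return (name index, name, value) tuples; value is None if stored in another inode."""
    entries = []
    offset = entries_offset
    while offset + 4 <= len(data):
        if data[offset : offset + 4] == b"\x00\x00\x00\x00":
            break  # terminator
        if offset + 16 > len(data):
            break  # truncated entry
        name_size, name_index, value_offset, value_inode, value_size, _hash = (
            struct.unpack_from("<BBHIII", data, offset)
        )
        name = data[offset + 16 : offset + 16 + name_size]
        value = None
        if value_inode == 0:
            start = value_base + value_offset
            value = data[start : start + value_size]
        entries.append((name_index, name, value))
        # Entries are padded to a 32-bit boundary.
        offset += (16 + name_size + 3) & ~3
    return entries
```

For entries stored in the inode block after the inode data, both offsets would presumably be 4 relative to the start of the extended attributes signature.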

The extended attribute name index

The name index indicates the prefix of the extended attribute name.

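| Name index | Name prefix | Description |
| --- | --- | --- |
| 0 | "" | No prefix |
| 1 | "user." | |
| 2 | "system.posix_acl_access" | |
| 3 | "system.posix_acl_default" | |
| 4 | "trusted." | |
| 6 | "security." | |
| 7 | "system." | |
| 8 | "system.richacl" | |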
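
Combining the name index with the name string of an entry gives the full extended attribute name, which the Python sketch below illustrates. The function name `full_xattr_name` is hypothetical, and raising an error for an unlisted name index is a choice made for this sketch.

```python
XATTR_NAME_PREFIXES = {
    0: "",
    1: "user.",
    2: "system.posix_acl_access",
    3: "system.posix_acl_default",
    4: "trusted.",
    6: "security.",
    7: "system.",
    8: "system.richacl",
}


def full_xattr_name(name_index: int, name: bytes) -> str:
    """Return the prefix for the name index followed by the ASCII name string."""
    if name_index not in XATTR_NAME_PREFIXES:
        raise ValueError("unsupported name index: %d" % name_index)
    return XATTR_NAME_PREFIXES[name_index] + name.decode("ascii")
```

Note that for name indexes such as 2 and 3 the name string is expected to be empty, so the prefix alone is the full name.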

Journal

The journal was introduced in ext3.

TODO: complete section

Exclude bitmap

TODO: complete section

Note that the exclude bitmap is used for snapshots.

Corruption scenarios

File entry with invalid extents header signature

File content inaccessible but file entry metadata and extended attributes accessible.

References