Introduction

Keramics provides read-only access to a collection of data formats.

This document is a working collection of specifications for the data formats used by the Keramics project. These specifications are based on available documentation and on analysis of data samples.

Note that these specifications might differ from the authoritative format specifications and are works in progress.

Storage media image formats

A storage media image format is used to store data from storage media devices such as a hard disk, a floppy disk, or an optical disc like a CD-ROM or DVD.

Formats

Expert Witness Compression Format (EWF)

EWF is short for Expert Witness Compression Format. It is a file type used to store storage media images for digital forensic purposes. It is currently widely used in the field of computer forensics in proprietary tooling like EnCase and FTK. The original specification of the format was provided by ASR Data for the SMART application.

The EWF format was succeeded by the Expert Witness Compression Format version 2 in EnCase 7 (EWF2-Ex01 and EWF2-Lx01). EnCase 7 also uses a different version of EWF-L01 than its predecessors.

Overview

The Expert Witness Compression Format (EWF) is used to store:

  • storage media images, such as hard disks, USB sticks, optical disks
  • individual volumes or partitions
  • “physical” RAM and process memory

EWF can store the data of a single image, compressed or uncompressed, in one or more segment files. Each segment file consists of a standard header, followed by multiple sections. A single section cannot span multiple files. Sections are stored back-to-back.

Terminology

In this document, “EWF format” refers to the original specification by ASR Data. Newer formats, such as that of EnCase, are derived from the original specification and are referred to as EWF-E01, after their default file extension. The Logical File Evidence (LVF) format, introduced in EnCase 5 and also stored in the EWF format, is referred to as EWF-L01. The SMART format is described separately, so that differences between its implementation and the ASR Data specification can be discussed, and is referred to as EWF-S01, after its default file extension.

All offsets are relative to the beginning of an individual section, unless otherwise noted. EnCase limits the size of a segment file to a maximum of 2000 MiB. This limit is related to the size of the offset of a chunk of media data: a 32-bit value whose most significant bit (MSB) is used as a compression flag. The remaining 31 bits can therefore address about 2048 MiB. In EnCase 6.7 a base offset was added to the table values to allow for segment files larger than 2048 MiB.
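As a minimal Python sketch, assuming the 32-bit entry is stored little-endian and that a set MSB marks a compressed chunk, a chunk offset entry can be decoded as follows. The function name `decode_chunk_offset` is hypothetical, and treating the EnCase 6.7 base offset as a value that is simply added to the 31-bit offset is an assumption, since the table layout is not described here.

```python
import struct

COMPRESSED_FLAG = 0x80000000  # most significant bit of the 32-bit entry
OFFSET_MASK = 0x7FFFFFFF      # the remaining 31 bits hold the offset

def decode_chunk_offset(entry_bytes: bytes, base_offset: int = 0) -> tuple[bool, int]:
    """Return (is_compressed, offset) for a 4-byte little-endian chunk offset entry.

    Assumption: a set MSB means the chunk is compressed, and base_offset
    (EnCase 6.7 and later) is added to the 31-bit offset.
    """
    (entry,) = struct.unpack("<I", entry_bytes)
    is_compressed = bool(entry & COMPRESSED_FLAG)
    return is_compressed, base_offset + (entry & OFFSET_MASK)

# Without a base offset, the largest addressable offset is OFFSET_MASK,
# i.e. one byte short of 2048 MiB (2^31 bytes).
```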

The size of a chunk is the sector size (512 bytes by default) multiplied by the block size, which is the number of sectors per chunk (block) (64 sectors by default). Data within the EWF format is stored in little-endian byte order. The terms block and chunk are used interchangeably.
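The chunk size arithmetic can be illustrated with a small Python sketch. The function names are hypothetical; the defaults are the 512-byte sectors and 64 sectors per chunk mentioned above.

```python
def chunk_size(bytes_per_sector: int = 512, sectors_per_chunk: int = 64) -> int:
    """Chunk size in bytes: the sector size multiplied by the block size."""
    return bytes_per_sector * sectors_per_chunk

def locate_media_offset(media_offset: int, bytes_per_sector: int = 512,
                        sectors_per_chunk: int = 64) -> tuple[int, int]:
    """Map a media offset to (chunk index, offset within that chunk)."""
    return divmod(media_offset, chunk_size(bytes_per_sector, sectors_per_chunk))

print(chunk_size())                 # 32768
print(locate_media_offset(100000))  # (3, 1696)
```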

Segment file

EWF stores data in one or more segment files (or segments). Each segment file consists of:

  • A file header.
  • One or more sections.

File header

Each segment file starts with a file header.

EWF defines that the file header consists of 2 parts, namely:

  • a signature part
  • a fields part

EWF, EWF-E01 and SMART (EWF-S01)

The file header, used by both the EWF-E01 and SMART (EWF-S01) formats, is 13 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | "EVF\x09\x0d\x0a\xff\x00" | Signature |
| 8 | 1 | 0x01 | Start of fields |
| 9 | 2 | | Segment number, which must be 1 or higher |
| 11 | 2 | 0x0000 | End of fields |

The segment number contains a number which refers to the number of the segment file, starting with 1 for the first file.

Note that this means there can be at most 65535 (0xffff) segment files, if the segment number is an unsigned value.

EWF-L01

The file header, used by the EWF-L01 format, is 13 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | "LVF\x09\x0d\x0a\xff\x00" | Signature |
| 8 | 1 | 0x01 | Start of fields |
| 9 | 2 | | Segment number, which must be 1 or higher |
| 11 | 2 | 0x0000 | End of fields |

The segment number contains a number which refers to the number of the segment file, starting with 1 for the first file.

Note that this means there can be at most 65535 (0xffff) segment files, if the segment number is an unsigned value.
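Both file header variants share one 13-byte layout, so a single parser can handle them. The following is a minimal Python sketch, assuming the segment number is an unsigned 16-bit little-endian value; the names `parse_file_header` and `SIGNATURES` are hypothetical.

```python
import struct

SIGNATURES = {
    b"EVF\x09\x0d\x0a\xff\x00": "EVF",  # EWF, EWF-E01 and SMART (EWF-S01)
    b"LVF\x09\x0d\x0a\xff\x00": "LVF",  # EWF-L01
}

def parse_file_header(data: bytes) -> tuple[str, int]:
    """Parse the 13-byte file header and return (signature name, segment number)."""
    if len(data) < 13:
        raise ValueError("file header must be 13 bytes")
    # Assumption: the segment number is an unsigned 16-bit little-endian value.
    signature, start_of_fields, segment_number, end_of_fields = struct.unpack(
        "<8sBHH", data[:13])
    if signature not in SIGNATURES:
        raise ValueError("unsupported signature")
    if start_of_fields != 0x01 or end_of_fields != 0x0000:
        raise ValueError("unexpected start or end of fields marker")
    if segment_number < 1:
        raise ValueError("segment number must be 1 or higher")
    return SIGNATURES[signature], segment_number
```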

Segment file extensions

The SMART (EWF-S01) and the EWF-E01 formats use a different naming convention for the segment files.

SMART (EWF-S01)

The SMART (EWF-S01) extension naming has two distinct parts.

  • The first segment file has the extension ‘.s01’.
    • The next segment file has the extension ‘.s02’.
    • This will continue up to ‘.s99’.
  • After which the next segment file has the extension ‘.saa’.
    • The next segment file has the extension ‘.sab’.
    • This will continue up to ‘.saz’.
    • The next segment file has the extension ‘.sba’.
    • This will continue up to ‘.szz’.
    • The next segment file has the extension ‘.taa’.
    • This will continue up to ‘.zzz’.
    • Not confirmed, but other sources report that it will even continue with the extension ‘.{aa’.

Keramics supports extensions up to .zzz

EWF-E01

The EWF-E01 extension naming has two distinct parts.

  • The first segment file has the extension ‘.E01’.
    • The next segment file has the extension ‘.E02’.
    • This will continue up to ‘.E99’.
  • After which the next segment file has the extension ‘.EAA’.
    • The next segment file has the extension ‘.EAB’.
    • This will continue up to ‘.EAZ’.
    • The next segment file has the extension ‘.EBA’.
    • This will continue up to ‘.EZZ’.
    • The next segment file has the extension ‘.FAA’.
    • This will continue up to ‘.ZZZ’.
    • Not confirmed, but other sources report that it will even continue with the extension ‘.[AA’.

Keramics supports extensions up to .ZZZ

EWF-L01

The EWF-L01 extension naming has two distinct parts.

  • The first segment file has the extension ‘.L01’.
    • The next segment file has the extension ‘.L02’.
    • This will continue up to ‘.L99’.
  • After which the next segment file has the extension ‘.LAA’.
    • The next segment file has the extension ‘.LAB’.
    • This will continue up to ‘.LAZ’.
    • The next segment file has the extension ‘.LBA’.
    • This will continue up to ‘.LZZ’.
    • The next segment file has the extension ‘.MAA’.
    • This will continue up to ‘.ZZZ’.
    • Not confirmed, but other sources report that it will even continue with the extension ‘.[AA’.

Keramics supports extensions up to .ZZZ
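The three naming schemes follow the same pattern and differ only in the first character and in letter case. The following Python sketch derives the extension for a segment number. It assumes the letter sequences continue without gaps, as listed above, and stops at ‘zzz’ or ‘ZZZ’. The name `segment_file_extension` is hypothetical.

```python
def segment_file_extension(first_character: str, segment_number: int) -> str:
    """Return the extension (without the dot) for a 1-based segment number.

    first_character is "s" for SMART (EWF-S01), "E" for EWF-E01 or "L" for EWF-L01.
    """
    if segment_number < 1:
        raise ValueError("segment number must be 1 or higher")
    if segment_number <= 99:
        # .s01 to .s99, .E01 to .E99 or .L01 to .L99
        return f"{first_character}{segment_number:02d}"
    base = ord("a") if first_character.islower() else ord("A")
    index = segment_number - 100  # 0 corresponds to .saa, .EAA or .LAA
    # The first character advances once every 26 * 26 segment files.
    first = ord(first_character) + index // (26 * 26)
    if first > base + 25:
        raise ValueError("segment number beyond the last extension (zzz or ZZZ)")
    second = base + (index // 26) % 26
    third = base + index % 26
    return chr(first) + chr(second) + chr(third)
```

For example, `"image." + segment_file_extension("E", 100)` yields `image.EAA`, the segment file that follows `image.E99`.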

Segment file set identifier GUID

Segment file sets do not have a strictly unique identifier. However, the volume section contains a GUID that can be used for this purpose, where:

  • linen 5 to 6 use a time and MAC address based version (1) of the GUID
  • EnCase 5 to 7 and linen 6 to 7 use a random based version (4) of the GUID

Note that in linen 6 the switch from a version 1 to a version 4 GUID was made somewhere between versions 6.01 and 6.19.

See RFC4122 for more information about the different GUID versions.
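To tell the two GUID versions apart, the version field of the GUID can be inspected. This Python sketch uses the standard `uuid` module. It assumes the 16 bytes are stored in little-endian (Microsoft GUID) byte order, which has not been confirmed here; the name `guid_version` is hypothetical.

```python
import uuid
from typing import Optional

def guid_version(raw_guid: bytes) -> Optional[int]:
    """Return the RFC 4122 version of a 16-byte GUID, or None if it has none.

    Assumption: the GUID is stored in little-endian (Microsoft GUID) byte order.
    """
    if len(raw_guid) != 16:
        raise ValueError("a GUID is 16 bytes")
    # uuid.UUID.version is None when the variant is not RFC 4122,
    # for example for an all-zero (unset) GUID.
    return uuid.UUID(bytes_le=raw_guid).version
```

Based on the list above, a result of 1 would point to linen 5 or an early linen 6, and a result of 4 to EnCase 5 to 7 or a later linen version.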

The sections

The remainder of the segment file consists of sections. Every section starts with the same data, which will be referred to as the section header.

Section header

The section header is 76 bytes in size and contains information about a specific section.

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 16 | | Section type, a string containing the section type definition, such as "header" or "volume" |
| 16 | 8 | | Next section offset, where the offset is relative to the start of the segment file |
| 24 | 8 | | Section size |
| 32 | 40 | 0x00 | Unknown (Padding) |
| 72 | 4 | | Checksum, which contains an Adler-32 of all the previous data within the section header |

Some sections contain additional data; refer to the section types paragraph for more information.

Note that Expert Witness 1.35 (for Windows) does not set the section size.

Note that in the DOS version of EnCase 2 the padding does not contain 0-byte values but other data, probably because the memory was not filled with 0-byte values.
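The section header layout and the chain of next section offsets can be sketched in Python as follows. The checksum is verified with `zlib.adler32` over the first 72 bytes, and the padding is ignored, which also tolerates the non-zero padding of EnCase 2 for DOS. Because the section size is not always set, the walk follows the next section offset. Stopping at a “next” or “done” section, or when an offset does not advance, is an assumption of this sketch. The function names are hypothetical.

```python
import struct
import zlib
from typing import BinaryIO, Iterator

FILE_HEADER_SIZE = 13
SECTION_HEADER_SIZE = 76

def parse_section_header(data: bytes) -> dict:
    """Parse a 76-byte section header and verify its Adler-32 checksum."""
    if len(data) < SECTION_HEADER_SIZE:
        raise ValueError("section header too short")
    type_bytes, next_offset, section_size, _padding, stored_checksum = struct.unpack(
        "<16sQQ40sI", data[:SECTION_HEADER_SIZE])
    # The checksum covers the 72 bytes that precede the checksum field.
    if zlib.adler32(data[:72]) != stored_checksum:
        raise ValueError("section header checksum mismatch")
    # The type is a NUL-padded string such as "header" or "volume".
    section_type = type_bytes.split(b"\x00", 1)[0].decode("ascii")
    return {"type": section_type, "next_offset": next_offset, "size": section_size}

def iter_sections(stream: BinaryIO) -> Iterator[tuple[int, dict]]:
    """Yield (offset, section header) pairs by following the next section offsets."""
    offset = FILE_HEADER_SIZE  # the first section follows the 13-byte file header
    while True:
        stream.seek(offset)
        header = parse_section_header(stream.read(SECTION_HEADER_SIZE))
        yield offset, header
        # Assumption: "next" and "done" end the chain; the offset check guards against loops.
        if header["type"] in ("next", "done") or header["next_offset"] <= offset:
            break
        offset = header["next_offset"]
```

A segment file opened with `open(path, "rb")` can be passed directly to `iter_sections`.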

Section types

There are multiple section types. ASR Data - E01 Compression Format defines the following:

  • Header section
  • Volume section
  • Table section
  • Next and Done section

The following section types were found by analyzing more recent EnCase files (EWF-E01):

  • Header2 section
  • Disk section
  • Sectors section
  • Table2 section
  • Data section
  • Error2 section
  • Session section
  • Hash section
  • Digest section

The following section types were found by analyzing more recent EnCase files (EWF-L01):

  • Ltree section
  • Ltypes section

Header2 section

The header2 section is identified in the section data type field as “header2”. Some aspects of this section are:

  • Found in EWF-E01 in EnCase 4 to 7, and EWF-L01 in EnCase 5 to 7
  • Found at the start of the first segment file. Not found in subsequent segment files.
  • The same header2 section is stored twice, one directly after the other.

The additional data this section contains is the following:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 76 (0x4c) | ... | | Information about the acquired media |

The information about the acquired media consists of zlib-compressed data. It contains UTF-16 text specifying information about the acquired media. The text consists of multiple lines separated by end-of-line character(s).

The first 2 bytes of the UTF-16 string are the byte order mark (BOM):

  • 0xff 0xfe for UTF-16 little-endian
  • 0xfe 0xff for UTF-16 big-endian
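A minimal Python sketch of decoding the header2 data, assuming `section_data` is the data that follows the 76-byte section header. Python's `utf-16` codec selects the byte order from the BOM and drops the BOM from the result. The name `decode_header2` is hypothetical.

```python
import zlib

def decode_header2(section_data: bytes) -> list[str]:
    """Decompress header2 section data and split the UTF-16 text into lines."""
    # The "utf-16" codec reads the byte order mark (0xff 0xfe or 0xfe 0xff)
    # to select the endianness and does not include it in the decoded text.
    text = zlib.decompress(section_data).decode("utf-16")
    # Lines are separated by a newline (0x0a); empty lines are kept.
    return text.split("\n")
```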

In the next paragraphs the various variants of the header2 section are described.

EnCase 4 (EWF-E01)

In EnCase 4 (EWF-E01) the header2 information consists of 5 lines and contains the same information as the header section.

| Line number | Value | Description |
| --- | --- | --- |
| 1 | 1 | The number of categories provided |
| 2 | main | The name/type of the category provided |
| 3 | | Identifiers for the values in the 4th line |
| 4 | | The data for the different identifiers in the 3rd line |
| 5 | | (an empty line) |

The end of line character(s) is a newline (0x0a).

Note this end of line character differs from the one used in the header section.

The 3rd and the 4th line consist of the following tab (0x09) separated values.

| Identifier number | Character in 3rd line | Value in 4th line |
| --- | --- | --- |
| 1 | a | Unique description |
| 2 | c | Case number |
| 3 | n | Evidence number |
| 4 | e | Examiner name |
| 5 | t | Notes |
| 6 | av | Version, which contains the EnCase version used to acquire the media |
| 7 | ov | Platform, which contains the platform/operating system used to acquire the media |
| 8 | m | Acquisition date and time |
| 9 | u | System date and time |
| 10 | p | Password hash |

Also see header2 values

Note that the password hashing algorithm is the same as for the header section.
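Pairing the identifiers on the 3rd line with the values on the 4th line can be sketched in Python as follows. The sketch assumes the lines have already been split, with line 1 at index 0. The name `parse_main_category` is hypothetical.

```python
def parse_main_category(lines: list[str]) -> dict[str, str]:
    """Map the tab-separated identifiers on the 3rd line to the values on the 4th line."""
    identifiers = lines[2].split("\t")
    values = lines[3].split("\t")
    # zip() stops at the shorter list, so surplus values (or identifiers)
    # are ignored when their numbers differ.
    return dict(zip(identifiers, values))
```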

EnCase 5 to 7 (EWF-E01)

In EnCase 5 to 7 (EWF-E01) the header2 information consists of the following 17 lines:

| Line number | Value | Description |
| --- | --- | --- |
| 1 | 3 | The number of categories provided |
| 2 | main | The name/type of the category provided |
| 3 | | Identifier for the values in the category |
| 4 | | The data for the different identifiers in the category |
| 5 | | (an empty line) |
| 6 | srce | The name/type of the category provided, also see sources category |
| 7 | | |
| 8 | | Identifier for the values in the category |
| 9 | | The data for the different identifiers in the category |
| 10 | | |
| 11 | | (an empty line) |
| 12 | sub | The name/type of the category provided, also see subjects category |
| 13 | | |
| 14 | | Identifier for the values in the category |
| 15 | | The data for the different identifiers in the category |
| 16 | | |
| 17 | | (an empty line) |

The end of line character(s) is a newline (0x0a).

Main category

The 3rd and the 4th line consist of the following tab (0x09) separated values.

Note the actual values in this category are dependent on the version of EnCase.

| Identifier number | Character in 3rd line | Value in 4th line |
| --- | --- | --- |
| 1 | a | Unique description |
| 2 | c | Case number |
| 3 | n | Evidence number |
| 4 | e | Examiner name |
| 5 | t | Notes |
| 6 | md | The model of the media, such as hard disk model (introduced in EnCase 6) |
| 7 | sn | The serial number of media (introduced in EnCase 6) |
| 8 | l | The device label (introduced in EnCase 6.19) |
| 9 | av | Version, which contains the EnCase version used to acquire the media. EnCase limits this value to 12 characters |
| 10 | ov | Platform, which contains the platform/operating system used to acquire the media |
| 11 | m | Acquisition date and time |
| 12 | u | System date and time |
| 13 | p | Password hash |
| 14 | pid | Process identifier, which contains the identifier of the process memory acquired (introduced in EnCase 6.12/Winen 6.11) |
| 15 | dc | Unknown |
| 16 | ext | Extents, which contains the extents of the process memory acquired (introduced in EnCase 6.12/Winen 6.11) |

Also see header2 values

Note that both the acquisition and the system date and time are empty in a file created by winen.

Note that the date values in the header section (not the header2 section) are set to “Thu Jan 1 00:00:00 1970”, where the time depends on the time zone and daylight saving time.

Note that in a header2 section generated by Logicube Dossier an additional empty value in the 4th line was observed. The number of values in the 3rd and 4th lines can differ.

Sources category

Line 6, the srce category, contains information about acquisition sources.

TODO: describe what a source is in the context of EnCase.

Line 7 consists of 2 values, namely “0 1”.

The 8th line consists of the following tab (0x09) separated values.

Note that the actual values in this category are dependent on the version of EnCase.

| Identifier number | Character in 8th line | Meaning |
| --- | --- | --- |
| 1 | p | |
| 2 | n | |
| 3 | id | Identifier, which contains an integer identifying the source |
| 4 | ev | Evidence number, which contains a string |
| 5 | tb | Total bytes, which contains an integer |
| 6 | lo | Logical offset, which contains an integer which is -1 when the value is not set |
| 7 | po | Physical offset, which contains an integer which is -1 when the value is not set |
| 8 | ah | MD5 hash, which contains a string with the MD5 hash of the source |
| 9 | sh | SHA1 hash, which contains a string with the SHA1 hash of the source (introduced in EnCase 6.19) |
| 10 | gu | Device GUID, which contains a string with a GUID or "0" if not set |
| 11 | pgu | Primary device GUID, which contains a string with a GUID or "0" if not set (introduced in EnCase 7) |
| 12 | aq | Acquisition date and time, which contains an integer with a POSIX timestamp |

Line 9 consists of 2 values, namely “0 0”.

Line 10 contains the values defined by line 8.

Note that the default values of some of these values have changed around EnCase 6.12.

If the “ah” value contains “00000000000000000000000000000000” the MD5 hash is not set. Likewise, if the “sh” value contains “0000000000000000000000000000000000000000” the SHA1 hash is not set.
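The "not set" conventions of the srce category can be normalized while parsing. This Python sketch assumes the lines were split with line 1 at index 0, so line 8 (identifiers) is `lines[7]` and line 10 (values) is `lines[9]`. Converting the listed keys to integers is a choice made for the sketch, and the names are hypothetical.

```python
INTEGER_KEYS = ("id", "tb", "lo", "po", "aq")
UNSET_VALUES = {
    "ah": "0" * 32,  # MD5 hash not set
    "sh": "0" * 40,  # SHA1 hash not set
    "gu": "0",       # device GUID not set
    "pgu": "0",      # primary device GUID not set
}

def parse_source_entry(lines: list[str]) -> dict[str, object]:
    """Parse the srce category: identifiers on line 8, values on line 10."""
    identifiers = lines[7].split("\t")
    values = lines[9].split("\t")
    source: dict[str, object] = dict(zip(identifiers, values))
    for key in INTEGER_KEYS:
        if source.get(key):  # skip missing or empty values
            source[key] = int(source[key])
    for key in ("lo", "po"):
        if source.get(key) == -1:  # -1 means the offset is not set
            source[key] = None
    for key, unset in UNSET_VALUES.items():
        if source.get(key) == unset:
            source[key] = None
    return source
```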

Subjects category

Line 12, the sub category, contains information about subjects.

TODO: describe what a subject is in the context of EnCase.

Line 13 consists of 2 values, namely “0 1”.

The 14th line consists of the following tab (0x09) separated values.

| Identifier number | Character in 14th line | Meaning |
| --- | --- | --- |
| 1 | p | |
| 2 | n | |
| 3 | id | Identifier, which contains an integer identifying the subject |
| 4 | nu | Unknown (Number) |
| 5 | co | Unknown (Comment) |
| 6 | gu | Unknown (GUID) |

Line 15 consists of 2 values, namely “0 0”.

Line 16 contains the values defined by line 14.

Note that the default values of some of these values have changed around EnCase 6.12.

EnCase 5 to 7 (EWF-L01)

The EnCase 5 to 7 (EWF-E01) header2 section specification also applies to the EnCase 5 to 7 (EWF-L01) format. However:

  • neither the acquisition nor the system date and time is set

Header2 values

| Identifier | Description | Notes |
| --- | --- | --- |
| a | Unique description | Free form string. Note that EnCase might not respond when this value is large, e.g. >= 1 MiB |
| av | Version | Free form string. EnCase limits this string to 12 - 1 characters |
| c | Case number | Free form string. EnCase limits this string to 3000 - 1 characters |
| dc | Unknown | |
| e | Examiner name | Free form string. EnCase limits this string to 3000 - 1 characters |
| ext | Extents | Extents header value |
| l | Device label | Free form string |
| m | Acquisition date and time | String containing a POSIX 32-bit epoch timestamp, e.g. "1142163845", which represents the date March 12 2006, 11:44:05 |
| md | Model | Free form string. EnCase limits this string to 3000 - 1 characters |
| n | Evidence number | Free form string. EnCase limits this string to 3000 - 1 characters |
| ov | Platform | Free form string. EnCase limits this string to 24 - 1 characters |
| pid | Process identifier | String containing the process identifier (pid) number |
| p | Password hash | String containing the password hash. If no password is set it should be simply the character '0' |
| sn | Serial Number | Free form string. EnCase limits this string to 3000 - 1 characters |
| t | Notes | Free form string. EnCase limits this string to 3000 - 1 characters |
| u | System date and time | String containing a POSIX 32-bit epoch timestamp, e.g. "1142163845", which represents the date March 12 2006, 11:44:05 |

Note that these restrictions were tested with EnCase 7.02.01; older versions could have a restriction of 40 characters instead of 3000 characters.

Extents header value

An extents header value consists of:

  • the number of entries
  • entries that consist of: S <1> <2> <3>

Header section

The header section is identified in the section data type field as “header”. Some aspects of this section are:

  • Defined in ASR Data - E01 Compression Format
  • Found in EWF-E01 in EnCase 1 to 7 or linen 5 to 7 or FTK Imager, EWF-L01 in EnCase 5 to 7, and SMART (EWF-S01)
  • Found at the start of the first segment file or in EnCase 4 to 7 after the header2 section in the first segment file. Typically not found in subsequent segment files with the exception of Logicube Dossier generated EWF-E01 files.

The additional data this section contains is the following:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 76 (0x4c) | ... | | Information about the acquired media |

The information about the acquired media consists of zlib-compressed data. It contains ASCII text specifying information about the acquired media. The text consists of multiple lines separated by end-of-line character(s).

In the next paragraphs the various variants of the header section are described. In all cases the information consists of at least 4 lines:

| Line number | Value | Description |
| --- | --- | --- |
| 1 | 1 | The number of categories provided |
| 2 | main | The name/type of the category provided |
| 3 | | Identifiers for the values in the 4th line |
| 4 | | The data for the different identifiers in the 3rd line |

An additional 5th line is found in FTK Imager and EnCase 1 to 7 (EWF-E01).

| Line number | Value | Description |
| --- | --- | --- |
| 5 | | (an empty line) |
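Because the header section uses either a carriage return followed by a newline or a newline alone, depending on the application, a decoder can accept both. This Python sketch assumes the text is strictly ASCII, as stated above; other bytes would raise `UnicodeDecodeError`. Data written with zlib compression level 0, as SMART may do, is still a valid zlib stream and decompresses the same way. The name `decode_header` is hypothetical.

```python
import zlib

def decode_header(section_data: bytes) -> list[str]:
    """Decompress header section data and split the ASCII text into lines.

    Handles both carriage return + newline and newline-only line endings.
    """
    text = zlib.decompress(section_data).decode("ascii")
    return [line.rstrip("\r") for line in text.split("\n")]
```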

EWF format

According to ASR Data - E01 Compression Format, the 3rd and the 4th line consist of the following tab (0x09) separated values:

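| Identifier number | Character in 3rd line | Value in 4th line |
| --- | --- | --- |
| 1 | c | Case number |
| 2 | n | Evidence number |
| 3 | a | Unique description |
| 4 | e | Examiner name |
| 5 | t | Notes |
| 6 | m | Acquisition date and time |
| 7 | u | System date and time |
| 8 | p | Password hash |
| 9 | r | Compression level |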

Also see header values

ASR Data - E01 Compression Format states that the Expert Witness Compression uses ‘f’, fastest compression.

EnCase 1 (EWF-E01)

Some aspects of this section are:

  • The header section is defined only once.
  • It is the first section of the first segment file. It is not found in subsequent segment files.
  • The header data itself is compressed using zlib.
  • The end of line character(s) is a carriage return (0x0d) followed by a newline (0x0a).

The 3rd and the 4th line consist of the following tab (0x09) separated values:

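| Identifier number | Character in 3rd line | Value in 4th line |
| --- | --- | --- |
| 1 | c | Case number |
| 2 | n | Evidence number |
| 3 | a | Unique description |
| 4 | e | Examiner name |
| 5 | t | Notes |
| 6 | m | Acquisition date and time |
| 7 | u | System date and time |
| 8 | p | Password hash |
| 9 | r | Compression level |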

Also see header values

SMART (EWF-S01)

Some aspects of this section are:

  • The header section is defined once.
  • It is the first section of the first segment file. It is not found in subsequent segment files.
  • The header data is always processed by zlib; however, the same compression level is used as for the chunks. This could mean compression level 0, which is no compression.

The SMART format uses the FTK Imager (EWF-E01) specification for this section. Note that this could be something FTK Imager specific.

EnCase 2 and 3 (EWF-E01)

Some aspects of this section are:

  • The same header section is defined twice.
  • It is the first and second section of the first segment file. It is not found in subsequent segment files.
  • The header data itself is compressed using zlib.
  • The end of line character(s) is a carriage return (0x0d) followed by a newline (0x0a).

The 3rd and the 4th line consist of the following tab (0x09) separated values:

| Identifier number | Character in 3rd line | Value in 4th line |
| --- | --- | --- |
| 1 | c | Case number |
| 2 | n | Evidence number |
| 3 | a | Unique description |
| 4 | e | Examiner name |
| 5 | t | Notes |
| 6 | av | Version, which contains the EnCase version used to acquire the media |
| 7 | ov | Platform, which contains the platform/operating system used to acquire the media |
| 8 | m | Acquisition date and time |
| 9 | u | System date and time |
| 10 | p | Password hash |
| 11 | r | Compression level |

Also see header values

EnCase 4 to 7 (EWF-E01)

Some aspects of this section are:

  • The header is defined only once.
  • It resides after the header2 sections of the first segment file. It is not found in subsequent segment files.
  • The header data itself is compressed using zlib.
  • The end of line character(s) is a carriage return (0x0d) followed by a newline (0x0a).

The 3rd and the 4th line consist of the following tab (0x09) separated values:

| Identifier number | Character in 3rd line | Value in 4th line |
| --- | --- | --- |
| 1 | c | Case number |
| 2 | n | Evidence number |
| 3 | a | Unique description |
| 4 | e | Examiner name |
| 5 | t | Notes |
| 6 | av | Version, which contains the EnCase version used to acquire the media |
| 7 | ov | Platform, which contains the platform/operating system used to acquire the media |
| 8 | m | Acquisition date and time |
| 9 | u | System date and time |
| 10 | p | Password hash |

Also see header values

linen 5 to 7 (EWF-E01)

Some aspects of this section are:

  • The same header section is defined twice.
  • It is the first and second section of the first segment file. It is not found in subsequent segment files.
  • The header data itself is compressed using zlib.
  • The end of line character(s) is a newline (0x0a).

The header information consists of 18 lines.

The remainder of the string contains the following information:

| Line number | Value | Description |
| --- | --- | --- |
| 1 | 3 | The number of categories provided |
| 2 | main | The name/type of the category provided |
| 3 | | Identifier for the values in the 4th line |
| 4 | | The data for the different identifiers in the 3rd line |
| 5 | | (an empty line) |
| 6 | srce | The name/type of the section provided, also see Sources category |
| 7 | | |
| 8 | | Identifier for the values in the section |
| 9 | | |
| 10 | | |
| 11 | | (an empty line) |
| 12 | sub | The name/type of the section provided, also see Subjects category |
| 13 | | |
| 14 | | Identifier for the values in the section |
| 15 | | |
| 16 | | |
| 17 | | (an empty line) |

The end of line character(s) is a newline (0x0a).

Main category - linen 5

The 3rd and the 4th line consist of the following tab (0x09) separated values.

Note the actual values in this category are dependent on the version of linen.

| Identifier number | Character in 3rd line | Value in 4th line |
| --- | --- | --- |
| 1 | a | Unique description |
| 2 | c | Case number |
| 3 | n | Evidence number |
| 4 | e | Examiner name |
| 5 | t | Notes |
| 6 | av | Version, which contains the linen version used to acquire the media |
| 7 | ov | Platform, which contains the platform/operating system used to acquire the media |
| 8 | m | Acquisition date and time |
| 9 | u | System date and time |
| 10 | p | Password hash |

Also see header values

Main category - linen 6 to 7

The 3rd and the 4th line consist of the following tab (0x09) separated values.

Note the actual values in this category are dependent on the version of linen.

| Identifier number | Character in 3rd line | Value in 4th line |
| --- | --- | --- |
| 1 | a | Unique description |
| 2 | c | Case number |
| 3 | n | Evidence number |
| 4 | e | Examiner name |
| 5 | t | Notes |
| 6 | md | The model of the media, such as hard disk model (Introduced in linen 6) |
| 7 | sn | The serial number of media (Introduced in linen 6) |
| 8 | l | The device label (Introduced in linen 6.19) |
| 9 | av | Version, which contains the linen version used to acquire the media |
| 10 | ov | Platform, which contains the platform/operating system used to acquire the media |
| 11 | m | Acquisition date and time |
| 12 | u | System date and time |
| 13 | p | Password hash |
| 14 | pid | Process identifier, which contains the identifier of the process memory acquired (Introduced in linen 6.19 or earlier) |
| 15 | dc | Unknown (Introduced in linen 6) |
| 16 | ext | Extents, which contains the extents of the process memory acquired (Introduced in linen 6.19 or earlier) |

Note that as of linen 6.19 the acquisition date and time is in UTC and the system date and time is in local time, whereas before both values were in local time.

Also see header values

Sources category

Line 6, the srce category, contains information about acquisition sources.

TODO: describe what a source is in the context of EnCase.

Line 7 consists of 2 values, namely “0 1”.

The 8th line consists of the following tab (0x09) separated values.

| Identifier number | Character in 8th line | Meaning |
| --- | --- | --- |
| 1 | p | |
| 2 | n | |
| 3 | id | Identifier, which contains an integer identifying the source |
| 4 | ev | Evidence number, which contains a string |
| 5 | tb | Total bytes, which contains an integer |
| 6 | lo | Logical offset, which contains an integer which is -1 when the value is not set |
| 7 | po | Physical offset, which contains an integer which is -1 when the value is not set |
| 8 | ah | Unknown (MD5?), which contains a string |
| 9 | sh | Unknown (SHA1?), which contains a string (Introduced in linen 6.19 or earlier) |
| 10 | gu | Device GUID, which contains a string with a GUID or "0" if not set |
| 11 | aq | Acquisition date and time, which contains an integer with a POSIX timestamp |

Line 9 consists of 2 values, namely “0 0”.

Line 10 contains the values defined by line 8.

Note that the default values of some of these values have changed around linen 6.19 or earlier.

Subjects category

Line 12, the sub category, contains information about subjects.

TODO: describe what a subject is in the context of EnCase.

Line 13 consists of 2 values, namely “0 1”.

The 14th line consists of the following tab (0x09) separated values.

| Identifier number | Character in 14th line | Meaning |
| --- | --- | --- |
| 1 | p | |
| 2 | n | |
| 3 | id | Identifier, which contains an integer identifying the subject |
| 4 | nu | Unknown (Number) |
| 5 | co | Unknown (Comment) |
| 6 | gu | Unknown (GUID) |

Line 15 consists of 2 values, namely “0 0”.

Line 16 contains the values defined by line 14.

Note that the default values of some of these values have changed around linen 6.19 or earlier.

FTK Imager (EWF-E01)

Some aspects of this section are:

  • In FTK Imager (EWF-E01) the same header section is defined twice.
  • It is the first and second section of the first segment file. It is not found in subsequent segment files.
  • The header data itself is compressed using zlib. Note that the compression level can be none and therefore the header looks uncompressed.
  • In FTK Imager the end of line character(s) is a newline (0x0a).

The 3rd and the 4th line consist of the following tab (0x09) separated values:

| Identifier number | Character in 3rd line | Value in 4th line |
| --- | --- | --- |
| 1 | c | Case number |
| 2 | n | Evidence number |
| 3 | a | Unique description |
| 4 | e | Examiner name |
| 5 | t | Notes |
| 6 | av | Version, which contains the FTK Imager version used to acquire the media |
| 7 | ov | Platform, which contains the platform/operating system used to acquire the media |
| 8 | m | Acquisition date and time |
| 9 | u | System date and time |
| 10 | p | Password hash |
| 11 | r | Compression level |

Also see header values

EnCase 5 to 7 (EWF-L01)

The EnCase 4 to 7 (EWF-E01) header section specification is also used for the EnCase 5 to 7 (EWF-L01) format, with the following aspects:

  • In EnCase 5 both the acquisition and system date and time are set to 0.
  • In EnCase 6 and 7 both the acquisition and system date and time are set to Jan 1, 1970 00:00:00 (the time depends on the local time zone and daylight saving time).

Header values

| Identifier | Description | Notes |
| --- | --- | --- |
| a | Unique description | Free form string. Note that EnCase might not respond when this value is large, e.g. >= 1 MiB |
| av | Version | Free form string. EnCase limits this string to 12 - 1 characters |
| c | Case number | Free form string. EnCase limits this string to 3000 - 1 characters |
| dc | Unknown | |
| e | Examiner name | Free form string. EnCase limits this string to 3000 - 1 characters |
| ext | Extents | Extents header value |
| l | Device label | Free form string |
| m | Acquisition date and time | Contains a date and time header value |
| md | Model | Free form string. EnCase limits this string to 3000 - 1 characters |
| n | Evidence number | Free form string. EnCase limits this string to 3000 - 1 characters |
| ov | Platform | Free form string. EnCase limits this string to 24 - 1 characters |
| pid | Process identifier | String containing the process identifier (pid) number |
| p | Password hash | String containing the password hash. If no password is set it should be simply the character '0' |
| r | Compression level | Compression header value |
| sn | Serial Number | Free form string. EnCase limits this string to 3000 - 1 characters |
| t | Notes | Free form string. EnCase limits this string to 3000 - 1 characters |
| u | System date and time | Contains a date and time header value |

Note that these restrictions were tested with EnCase 7.02.01; older versions could have a restriction of 40 characters instead of 3000 characters.

Date and time header value

In EnCase a date and time contains a string of individual values separated by a space, e.g. “2002 3 4 10 19 59”, which represents March 4, 2002 10:19:59.

In linen a date and time contains a string with a POSIX 32-bit epoch timestamp, e.g. “1142163845” which represents the date: March 12 2006, 11:44:05
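Both date and time representations can be told apart by their number of space-separated fields. This Python sketch assumes that interpretation. The EnCase form carries no time zone, so a naive `datetime` is returned for it. The POSIX form is interpreted as UTC, which is also an assumption, because linen versions before 6.19 reportedly stored local time. Other representations raise `ValueError`. The name `parse_date_time_value` is hypothetical.

```python
from datetime import datetime, timezone

def parse_date_time_value(value: str) -> datetime:
    """Parse a date and time header value in the EnCase or linen form."""
    fields = value.split()
    if len(fields) == 6:
        # EnCase: "year month day hours minutes seconds"; no time zone is stored,
        # so a naive datetime is returned.
        year, month, day, hours, minutes, seconds = (int(field) for field in fields)
        return datetime(year, month, day, hours, minutes, seconds)
    if len(fields) == 1:
        # linen: POSIX timestamp, interpreted here as UTC (assumption).
        return datetime.fromtimestamp(int(fields[0]), tz=timezone.utc)
    raise ValueError(f"unsupported date and time value: {value!r}")
```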

Extents header value

An extents header value consists of:

  • the number of entries
  • entries that consist of: S <1> <2> <3>

Compression header value

A compression header value consists of a single character that represents the compression level.

| Character value | Meaning |
| --- | --- |
| b | Best compression is used |
| f | Fastest compression is used |
| n | No compression is used |

Notes

The text in the 4th line should not contain tab, carriage return or newline characters. Is there a method to escape these characters?

ASR Data - E01 Compression Format states that these characters should not be used in the free form text. This still needs to be confirmed, since the specification only mentions the newline character.

Currently the password serves no purpose other than allowing an application to check it; the data itself is not protected by the password. The password hashing algorithm is unknown and still needs to be determined. The algorithm does not differ between EnCase 1 and 7. FTK Imager does not use a password.

Volume section

The volume section is identified in the section data type field as “volume”. Some aspects of this section are:

  • Defined in ASR Data - E01 Compression Format
  • Found in EWF-E01 in EnCase 1 to 7 or linen 5 to 7 or FTK Imager, EWF-L01 in EnCase 5 to 7, and SMART (EWF-S01)
  • Found after the header section of the first segment file. Not found in subsequent segment files.

In the next paragraphs the various versions of the volume section are described.

EWF specification

The specification according to ASR Data - E01 Compression Format.

The volume section data is 94 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | 0x01 | Unknown (Reserved) |
| 4 | 4 | | The number of chunks within all segment files |
| 8 | 4 | | The number of sectors per chunk, which contains 64 by default |
| 12 | 4 | | The number of bytes per sector, which contains 512 by default |
| 16 | 4 | | The sectors count, the number of sectors within all segment files |
| 20 | 20 | 0x00 | Unknown (Reserved) |
| 40 | 45 | 0x00 | Unknown (Padding) |
| 85 | 5 | | Signature, which contains the EWF file header signature |
| 90 | 4 | | Checksum, which contains an Adler-32 of all the previous data within the volume section data |

The number of chunks is a 32-bit value, which means the maximum number of addressable chunks is 4294967295 (2^32 - 1). With a chunk size of 32768 bytes this amounts to 32768 x 4294967295 bytes, just under 128 TiB. The maximum number of segment files is 2^16 - 1 = 65535. This allows for an equal amount of storage if a segment file is filled to its maximum number of chunks.

However, Keramics is restricted to 14295 segment files due to the extension naming scheme of the segment files.
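The size limit stated above can be worked out for the default chunk size of 64 sectors of 512 bytes:

```latex
\begin{aligned}
\text{chunk size} &= 512 \times 64 = 32768 = 2^{15}\ \text{bytes} \\
\text{maximum media size} &= 2^{15} \times (2^{32} - 1) = 2^{47} - 2^{15} \\
&= 140\,737\,488\,322\,560\ \text{bytes} \approx 128\ \text{TiB}
\end{aligned}
```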

SMART (EWF-S01)

The SMART format uses the EWF specification for this section.

In SMART the signature (reverse) value is the string “SMART” (0x53 0x4d 0x41 0x52 0x54) instead of the file header signature.
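The 94-byte volume section data can be parsed with a fixed `struct` layout. This is a minimal Python sketch that assumes all integers are unsigned little-endian. It returns the 5-byte signature as raw bytes rather than validating it, because its expected value differs between EWF and SMART. The name `parse_ewf_volume` is hypothetical.

```python
import struct
import zlib

def parse_ewf_volume(data: bytes) -> dict:
    """Parse the 94-byte volume section data of the EWF specification."""
    if len(data) < 94:
        raise ValueError("volume section data must be 94 bytes")
    (_reserved, chunk_count, sectors_per_chunk, bytes_per_sector, sector_count,
     _reserved_area, _padding, signature, stored_checksum) = struct.unpack(
        "<5I20s45s5sI", data[:94])
    # The checksum covers the 90 bytes that precede it.
    if zlib.adler32(data[:90]) != stored_checksum:
        raise ValueError("volume section checksum mismatch")
    return {
        "chunk_count": chunk_count,
        "sectors_per_chunk": sectors_per_chunk,
        "bytes_per_sector": bytes_per_sector,
        "sector_count": sector_count,
        "media_size": sector_count * bytes_per_sector,
        "signature": signature,  # b"SMART" in SMART (EWF-S01) files
    }
```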

FTK Imager, EnCase 1 to 7 and linen 5 to 7 (EWF-E01)

The specification for FTK Imager, EnCase 1 to 7 and linen 5 to 7.

The volume section data is 1052 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 1 | | Media type |
| 1 | 3 | 0x00 | Unknown (empty values) |
| 4 | 4 | | The number of chunks within all segment files |
| 8 | 4 | | The number of sectors per chunk (or block size), which contains 64 by default. EnCase 5 is the first version that allows this value to be different from 64 |
| 12 | 4 | | The number of bytes per sector |
| 16 | 8 | | The sectors count, which contains the number of sectors within all segment files. This value was probably changed in EnCase 6 from a 32-bit value to a 64-bit value to support media larger than 2 TiB |
| 24 | 4 | | The number of cylinders of the C:H:S value, which is usually empty (0x00) |
| 28 | 4 | | The number of heads of the C:H:S value, which is usually empty (0x00) |
| 32 | 4 | | The number of sectors of the C:H:S value, which is usually empty (0x00) |
| 36 | 1 | | Media flags |
| 37 | 3 | 0x00 | Unknown (empty values) |
| 40 | 4 | | PALM volume start sector |
| 44 | 4 | 0x00 | Unknown (empty values) |
| 48 | 4 | | SMART logs start sector, which contains an offset relative to the end of the media, e.g. a value of 10 refers to sector = number of sectors - 10 |
| 52 | 1 | | Compression level (introduced in EnCase 5) |
| 53 | 3 | 0x00 | Unknown (empty values, these values seem to be part of the compression level) |
| 56 | 4 | | The sector error granularity, which contains the error block size (introduced in EnCase 5) |
| 60 | 4 | 0x00 | Unknown (empty values) |
| 64 | 16 | | Segment file set identifier, which contains a GUID/UUID generated on the acquisition system, probably used to uniquely identify a set of segment files (introduced in EnCase 5) |
| 80 | 963 | 0x00 | Unknown (empty values) |
| 1043 | 5 | 0x00 | Unknown (Signature) |
| 1048 | 4 | | Checksum, which contains an Adler-32 of all the previous data within the volume section data |
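The following Python sketch extracts the main fields of the 1052-byte volume section data. It assumes little-endian integers; the helper `parse_encase_volume` and its dictionary keys are hypothetical names chosen for illustration.

```python
import struct
import zlib


def parse_encase_volume(data: bytes) -> dict:
    """Parse EWF-E01 volume section data (1052 bytes), hypothetical helper."""
    if len(data) != 1052:
        raise ValueError("expected 1052 bytes of volume section data")
    # Assumption: integers are little-endian.
    number_of_chunks, sectors_per_chunk, bytes_per_sector = struct.unpack_from("<3I", data, 4)
    (number_of_sectors,) = struct.unpack_from("<Q", data, 16)
    (error_granularity,) = struct.unpack_from("<I", data, 56)
    (stored_checksum,) = struct.unpack_from("<I", data, 1048)
    # The checksum covers offsets 0 to 1047.
    if zlib.adler32(data[:1048]) != stored_checksum:
        raise ValueError("volume section checksum mismatch")
    return {
        "media_type": data[0],
        "number_of_chunks": number_of_chunks,
        "chunk_size": sectors_per_chunk * bytes_per_sector,
        "media_size": number_of_sectors * bytes_per_sector,
        "media_flags": data[36],
        "compression_level": data[52],
        "error_granularity": error_granularity,
        # Kept as raw bytes: the byte order of the GUID is not specified here.
        "set_identifier": data[64:80],
    }
```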

TODO: a value that could be in the volume is the RAID stripe size

Note that EnCase requires that the “is physical” media flag is not set for media that contain no partition table, and vice versa. Other tools, like FTK, check the actual storage media data.

EnCase 5 to 7 (EWF-L01)

The EWF-L01 format uses the EnCase 5 (EWF-E01) volume section specification. However:

  • the media type contains 0x0e
  • the number of chunks is 0
  • the number of bytes per sector contains some kind of block size value (4096), perhaps the block size of the source file system
  • the sectors count represents some other value, because bytes per sector x sectors count does not equal the total size. The total size is stored in the ltree section.

Media type

| Value | Identifier | Description |
| --- | --- | --- |
| 0x00 | | A removable storage media device |
| 0x01 | | A fixed storage media device |
| 0x03 | | An optical disc (CD/DVD/BD) |
| 0x0e | | Logical Evidence (LEF or L01) |
| 0x10 | | Physical Memory (RAM) or process memory |

Note that FTK Imager versions before 2.9 set the media type to fixed (0x01). The exact version of FTK Imager in which this behavior changed is unknown.

Media flags

| Value | Identifier | Description |
| --- | --- | --- |
| 0x01 | | Is an image file. In FTK Imager and EnCase 1 to 7 this bit is always set; when it is not set, EnCase seems to treat the image file as a device |
| 0x02 | | Is physical device or device type, where 0 represents a non-physical (logical) device and 1 represents a physical device |
| 0x04 | | Fastbloc write blocker used |
| 0x08 | | Tableau write blocker used. This was added in EnCase 6.13 |

Note that if both the Fastbloc and Tableau write blocker media flags are set, EnCase only shows Fastbloc.
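Decoding the media flags is a matter of bit tests. This sketch uses hypothetical names for the flags and mimics the observed EnCase display behavior when both write blocker flags are set.

```python
# Hypothetical names for the media flag bits listed in the table above.
MEDIA_FLAGS = {
    0x01: "image file",
    0x02: "physical device",
    0x04: "Fastbloc write blocker",
    0x08: "Tableau write blocker",
}


def describe_media_flags(flags: int) -> list:
    """Return the names of the media flags that are set."""
    return [name for bit, name in MEDIA_FLAGS.items() if flags & bit]


def displayed_write_blocker(flags: int):
    """Mimic the observed EnCase behavior: Fastbloc wins if both are set."""
    if flags & 0x04:
        return "Fastbloc"
    if flags & 0x08:
        return "Tableau"
    return None
```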

Compression level

| Value | Identifier | Description |
| --- | --- | --- |
| 0x00 | | no compression |
| 0x01 | | good compression |
| 0x02 | | best compression |

Note that EnCase 7 no longer provides the fast and best compression options.

Disk section

The disk section is identified in the section data type field as “disk”. Some aspects of this section are:

With a disk section in an FTK Imager 2.3 (EWF-E01) image it was confirmed that the disk section is the same as the volume section.

Note that the disk section was only found in FTK Imager 2.3 when acquiring a physical disk, not a floppy. This requires additional research; it is currently assumed that the disk section is an old method to differentiate between a partition (volume) image and a physical disk image.

Data section

The data section is identified in the section data type field as “data”. Some aspects of this section are:

  • Not defined in ASR Data - E01 Compression Format.
  • Found in EWF-E01 in EnCase 1 to 7 or linen 5 to 7 or FTK Imager, and EWF-L01 in EnCase 5 to 7. Not found in SMART (EWF-S01).
  • For multiple segment files it does not reside in the first segment file. For a single segment file it does.
  • Found after the last table2 section in a single segment file or for multiple segment files at the start of the segment files, except for the first.
  • If the data section contains data, it should contain the same information as the volume section.

The data section is a copy of the volume section.

FTK Imager, EnCase 1 to 7 and linen 5 to 7 (EWF-E01)

Note that in Logicube products, Talon (firmware predating April 2013) and Forensic Dossier (before version 3.3.3RC16), the checksum is not calculated and is set to 0.

Sectors section

The sectors section is identified in the section data type field as “sectors”. Some aspects of this section are:

  • Not defined in ASR Data - E01 Compression Format.
  • Found in EWF-E01 in EnCase 2 to 7, or linen 5 to 7 or FTK Imager, EWF-L01 in EnCase 5 to 7. Not found in EnCase 1 (EWF-E01) or SMART (EWF-S01).
  • The first sectors section can be found after the volume section in the first segment file, or after the data section in subsequent segment files. Successive sectors sections are found after the table2 section.

The sectors section contains the actual chunks of media data.

  • The sectors section can contain multiple chunks.
  • The default size of a chunk is 32768 bytes of data (64 standard sectors, with a size of 512 bytes per sector). It is possible in EnCase 5 and 6 and linen 5 and 6 to change the number of sectors per block to 64, 128, 256, 1024, 2048, 4096, 8192, 16384 or 32768. In EnCase 7 and linen 7 this has been reduced to 64, 128, 256, 1024.

Data chunk

The first chunk is often located directly after the section header, although the format does not require this.

When the data is compressed and the compressed data (with checksum) is larger than the uncompressed data (without the checksum), the data chunk is stored uncompressed. The default size of a chunk is 32768 bytes of data (64 standard sectors).

An uncompressed data chunk is of variable size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | ... | | Uncompressed chunk data |
| ... | 4 | | Checksum, which contains an Adler-32 of the chunk data |

The compressed data chunk consists of zlib compressed data. The checksum of the compressed data chunk is part of the zlib compressed data format.
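A chunk is either zlib compressed or stored with a trailing Adler-32; which form applies is indicated by the most significant bit of its table entry. A minimal Python sketch of reading and writing a chunk, assuming a little-endian checksum; `read_chunk` and `write_chunk` are hypothetical helpers.

```python
import struct
import zlib


def read_chunk(stored: bytes, is_compressed: bool) -> bytes:
    """Return the media data of a stored chunk (hypothetical helper)."""
    if is_compressed:
        # The zlib stream carries its own Adler-32, which decompress() verifies.
        return zlib.decompress(stored)
    # Uncompressed: chunk data followed by a 4-byte Adler-32 (assumed little-endian).
    data, checksum = stored[:-4], stored[-4:]
    if zlib.adler32(data) != struct.unpack("<I", checksum)[0]:
        raise ValueError("chunk checksum mismatch")
    return data


def write_chunk(data: bytes):
    """Store compressed unless that is larger than the uncompressed data."""
    compressed = zlib.compress(data)
    if len(compressed) > len(data):
        return data + struct.pack("<I", zlib.adler32(data)), False
    return compressed, True
```

`write_chunk` follows the rule that a chunk is stored uncompressed when its compressed form (including the zlib checksum) is larger than the raw data.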

Optical disc images

For a MODE-1 CD-ROM optical disc image EnCase only seems to support 2048 bytes per sector (the data).

The raw sector size of a MODE-1 CD-ROM is 2352 bytes in size and consists of:

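| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 16 | | Synchronization bytes |
| 16 | 2048 | | Data |
| 2064 | 4 | | Error detection |
| 2068 | 8 | 0x00 | Unknown (Empty values) |
| 2076 | 276 | | Error correction |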
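The 2048 data bytes can be cut out of a raw MODE-1 sector as shown below. The sketch splits the 16 synchronization bytes into a 12-byte sync pattern, a 3-byte address and a mode byte as specified in ECMA-130 (CD-ROM); that breakdown is not taken from EnCase observations, and `mode1_user_data` is a hypothetical helper.

```python
# ECMA-130 MODE-1 layout: 12 sync bytes, 3 address bytes, 1 mode byte.
SYNC_PATTERN = b"\x00" + b"\xff" * 10 + b"\x00"


def mode1_user_data(raw_sector: bytes) -> bytes:
    """Extract the 2048 data bytes from a 2352-byte raw MODE-1 sector."""
    if len(raw_sector) != 2352:
        raise ValueError("expected a 2352-byte raw sector")
    if raw_sector[:12] != SYNC_PATTERN:
        raise ValueError("missing synchronization pattern")
    if raw_sector[15] != 1:
        raise ValueError("not a MODE-1 sector")
    return raw_sector[16:16 + 2048]
```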

TODO: add information about Mode-2 and Mode-XA

Table section

The table section is identified in the section data type field as “table”. Some aspects of this section are:

Note that the offsets within the section header are 8 bytes (64 bits) in size, while the offsets in the table entry array are 4 bytes (32 bits) in size.

In the next paragraphs the various versions of the table section are described.

EWF specification

Some aspects of the table section according to the EWF specification are:

  • The first table section resides after the volume section in the first segment file or after the file header in subsequent segment files.
  • It can be found in every segment file.

The table section consists of:

  • the table header
  • an array of table entries
  • the data chunks
Table header

The table header is 24 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | The number of entries |
| 4 | 16 | 0x00 | Unknown (Padding) |
| 20 | 4 | | Checksum, which contains an Adler-32 of all the previous data within the table header data |

According to ASR Data - E01 Compression Format:

  • the number of entries contains 0x01
  • the table can hold 16375 entries; if more entries are required, an additional table section should be created.
Table entry

The table entry is 4 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Chunk data offset |

The most significant bit (MSB) in the chunk data offset indicates if the chunk is compressed (1) or uncompressed (0).

A chunk data offset points to the start of the chunk of media data, which resides in the same table section within the segment file. The offset contains a value relative to the start of the file.
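A minimal Python sketch of reading the table header and entry array, assuming little-endian integers; `parse_table` is a hypothetical helper. Each entry yields its 31-bit offset and the compression flag taken from the MSB.

```python
import struct
import zlib


def parse_table(data: bytes) -> list:
    """Parse an EWF table header and entry array into (offset, is_compressed)."""
    (number_of_entries,) = struct.unpack_from("<I", data, 0)
    (stored_checksum,) = struct.unpack_from("<I", data, 20)
    if zlib.adler32(data[:20]) != stored_checksum:
        raise ValueError("table header checksum mismatch")
    entries = []
    for (value,) in struct.iter_unpack("<I", data[24:24 + number_of_entries * 4]):
        # MSB: compressed flag; remaining 31 bits: offset from the start of the file.
        entries.append((value & 0x7FFFFFFF, bool(value & 0x80000000)))
    return entries
```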

Data chunk

The first chunk is often located directly after the last table entry, although the format does not require this.

A data chunk is always compressed even when no compression is required. This approach provides a checksum for each chunk. The default size of a chunk is 32768 bytes of data (64 standard sectors). The resulting size of the “compressed” chunk can therefore be larger than the default chunk size.

Note that this was deduced from the behavior of FTK Imager for SMART (EWF-S01).

The compressed data chunk consists of zlib compressed data. The checksum of the compressed data chunk is part of the zlib compressed data format.

SMART (EWF-S01)

The table section in the SMART (EWF-S01) format is equivalent to that of the EWF specification.

EnCase 1 (EWF-E01)

Some aspects of this section are:

  • The table section resides after the volume section in the first segment file or after the file header in subsequent segment files.
  • It can be found in every segment file.

The table section consists of:

  • the table header
  • an array of table entries
  • the table footer
  • the data chunks
Table header

The table header is 24 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | The number of entries |
| 4 | 16 | 0x00 | Unknown (Padding) |
| 20 | 4 | | Checksum, which contains an Adler-32 of all the previous data within the table header data |

The table can hold 16375 entries; if more entries are required, an additional table section should be created.

Table entry

The table entry is 4 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Chunk data offset |

The most significant bit (MSB) in the chunk data offset indicates if the chunk is compressed (1) or uncompressed (0).

A chunk data offset points to the start of the chunk of media data, which resides in the same table section within the segment file. The offset contains a value relative to the start of the file.

The table footer is 4 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Checksum, which contains an Adler-32 of the offset array |
Data chunk

The first chunk is often located directly after the table footer, although the format does not require this.

When the data is compressed and the compressed data (with checksum) is larger than the uncompressed data (without the checksum), the data chunk is stored uncompressed. The default size of a chunk is 32768 bytes of data (64 standard sectors).

An uncompressed data chunk is of variable size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | ... | | Uncompressed chunk data |
| ... | 4 | | Checksum, which contains an Adler-32 of the chunk data |

The compressed data chunk consists of zlib compressed data. The checksum of the compressed data chunk is part of the zlib compressed data format.

FTK Imager and EnCase 2 to 5 and linen 5 (EWF-E01)

Some aspects of this section are:

  • The table section resides after the sectors section.
  • It can be found in every segment file.
  • The data chunks are no longer stored in this section but in the sectors section instead.
  • The table2 section contains a mirror copy of the table section. In EWF-E01 it is always present.

The table section consists of:

  • the table header
  • an array of table entries
  • the table footer
Table header

The sector table header is 24 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | The number of entries |
| 4 | 16 | 0x00 | Unknown (Padding) |
| 20 | 4 | | Checksum, which contains an Adler-32 of all the previous data within the table header data |

The table section can hold 16375 entries; a new table section should be created to hold more entries. Both FTK Imager and EnCase 5 can handle more than 16375 entries; FTK 1 cannot. To contain more than 16375 chunks, new sectors, table and table2 sections need to be created after the table2 section.

Table entry

The table entry is 4 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Chunk data offset |

The most significant bit (MSB) in the chunk data offset indicates if the chunk is compressed (1) or uncompressed (0).

A chunk data offset points to the start of the chunk of media data, which resides in the preceding sectors section within the segment file. The offset contains a value relative to the start of the file.

The table footer is 4 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Checksum, which contains an Adler-32 of the offset array |
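For these versions the footer adds a second checksum over the offset array. A Python sketch of verifying both checksums, assuming little-endian integers; `read_table_entries` is a hypothetical helper that returns the raw 32-bit entry values.

```python
import struct
import zlib


def read_table_entries(data: bytes) -> list:
    """Split EnCase 2 to 5 table section data and verify both checksums."""
    (number_of_entries,) = struct.unpack_from("<I", data, 0)
    (header_checksum,) = struct.unpack_from("<I", data, 20)
    if zlib.adler32(data[:20]) != header_checksum:
        raise ValueError("table header checksum mismatch")
    end_of_array = 24 + number_of_entries * 4
    offset_array = data[24:end_of_array]
    # The footer holds an Adler-32 of the offset array only.
    (footer_checksum,) = struct.unpack_from("<I", data, end_of_array)
    if zlib.adler32(offset_array) != footer_checksum:
        raise ValueError("table footer checksum mismatch")
    return [value for (value,) in struct.iter_unpack("<I", offset_array)]
```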

EnCase 6 to 7 and linen 6 to 7 (EWF-E01)

Some aspects of this section are:

  • Every segment file contains its own table section.
  • It resides after the sectors section.
  • The data chunks are no longer stored in this section but in the sectors section instead.
  • The table2 section contains a mirror copy of the table section. In EWF-E01 it is always present.

The table section consists of:

  • the table header
  • an array of table entries
  • the table footer
Table header

The sector table header is 24 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | The number of entries |
| 4 | 4 | 0x00 | Unknown (Padding) |
| 8 | 8 | | The table base offset |
| 16 | 4 | 0x00 | Unknown (Padding) |
| 20 | 4 | | Checksum, which contains an Adler-32 of all the previous data within the table header data |

As of EnCase 6 the number of entries is no longer restricted to 16375 entries. The new limit seems to be 65534.

Table entry

The table entry is 4 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Chunk data offset |

The most significant bit (MSB) in the chunk data offset indicates if the chunk is compressed (1) or uncompressed (0).

A chunk data offset points to the start of the chunk of media data, which resides in the preceding sectors section within the segment file. The offset contains a value relative to the table base offset.
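Since EnCase 6 the offsets are relative to the table base offset. A Python sketch under the same little-endian assumption; `resolve_chunk_offsets` is a hypothetical helper.

```python
import struct
import zlib


def resolve_chunk_offsets(data: bytes) -> list:
    """Resolve EnCase 6 and 7 table entries to (file offset, is_compressed)."""
    (number_of_entries,) = struct.unpack_from("<I", data, 0)
    (base_offset,) = struct.unpack_from("<Q", data, 8)
    (stored_checksum,) = struct.unpack_from("<I", data, 20)
    if zlib.adler32(data[:20]) != stored_checksum:
        raise ValueError("table header checksum mismatch")
    resolved = []
    for (value,) in struct.iter_unpack("<I", data[24:24 + number_of_entries * 4]):
        # Entries are relative to the table base offset, not to the file start.
        resolved.append((base_offset + (value & 0x7FFFFFFF), bool(value & 0x80000000)))
    return resolved
```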

In EnCase 6.7.1 the sectors section can be larger than 2048 MiB. The table entry offsets are 31-bit values; in EnCase 6 the offset in a table entry will actually use the full 32 bits if 2048 MiB has been exceeded. This behavior is no longer present in EnCase 6.8, so it is assumed to be a bug. Libewf currently assumes that if the 31-bit value overflows, the following chunks are uncompressed. This allows faulty EWF files created by EnCase 6.7.1 to be converted.
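The libewf workaround can be sketched as follows. How the overflow is detected is an assumption made here (the 31-bit offset dropping below the previous one), not a description of libewf's actual code; `resolve_with_overflow` is a hypothetical helper taking the table base offset and the raw entry values.

```python
def resolve_with_overflow(base_offset: int, values: list) -> list:
    """Handle EnCase 6.7.1 tables whose offsets exceed 31 bits (sketch).

    Assumption: an overflow is detected when the 31-bit offset drops below
    the previous one; from then on the full 32-bit value is the offset and
    the chunk is treated as uncompressed.
    """
    resolved = []
    previous_offset = 0
    overflow = False
    for value in values:
        offset = value & 0x7FFFFFFF
        if not overflow and offset < previous_offset:
            overflow = True
        if overflow:
            resolved.append((base_offset + value, False))
        else:
            resolved.append((base_offset + offset, bool(value & 0x80000000)))
            previous_offset = offset
    return resolved
```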

The table footer is 4 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Checksum, which contains an Adler-32 of the offset array |

EnCase 6 to 7 (EWF-L01)

The EWF-L01 format uses the EnCase 6 to 7 (EWF-E01) table section specification.

Table2 section

The table2 section is identified in the section data type field as “table2”. Some aspects of this section are:

  • Not defined in ASR Data - E01 Compression Format.
  • Found in EWF-E01 in EnCase 2 to 7, or linen 5 to 7 or FTK Imager, EWF-L01 in EnCase 5 to 7. Not found in EnCase 1 (EWF-E01) or SMART (EWF-S01).
  • Uses the same format as the table section.
  • Resides directly after the table section.

FTK Imager and EnCase 2 to 7 and linen 5 to 7 (EWF-E01)

The table2 section contains a mirror copy of the table section. Probably intended for recovery purposes.

EnCase 5 to 7 (EWF-L01)

The EWF-L01 format uses the EWF-E01 table2 section specification.

Next section

The next section is identified in the section data type field as “next”. Some aspects of this section are:

  • Defined in ASR Data - E01 Compression Format.
  • Found in EWF-E01 in EnCase 1 to 7 or linen 5 to 7 or FTK Imager, EWF-L01 in EnCase 5 to 7, and SMART (EWF-S01)
  • It is the last section within a segment file, other than the last segment file.
  • The offset to the next section, in the section header of the next section, points to itself (the start of the next section).

SMART (EWF-S01)

It resides after the table or table2 section.

FTK Imager, EnCase and linen (EWF-E01)

It resides after the data section in a single segment file or for multiple segment files after the table2 section.

In the EnCase (EWF-E01) format the size in the section header is 0 instead of 76 (the size of the section header).

Note that FTK Imager versions before 2.9 set the section size to 76. It is currently unknown in which version this behavior changed.

Ltypes section

The ltypes section is identified in the section data type field as “ltypes”. Some aspects of this section are:

  • Found in EWF-L01 in EnCase 7
  • Found in the last segment file after the table2 section and before the ltree section.

The additional ltypes section data is 6 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 2 | | Unknown |
| 2 | 2 | | Unknown |
| 4 | 2 | | Unknown |

Ltree section

The ltree section is identified in the section data type field as “ltree”. Some aspects of this section are:

  • Found in EWF-L01 in EnCase 5 to 7
  • Found in the last segment file after the ltypes section and before the data section.

The ltree section consists of:

  • ltree header
  • ltree data

Ltree header

The ltree header is 48 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 16 | | Integrity hash, which contains the MD5 of the ltree data |
| 16 | 8 | | Data size |
| 24 | 4 | | Checksum, which contains an Adler-32 of all the data within the ltree header where the checksum value itself is zeroed out |
| 28 | 20 | | Unknown (empty values) |
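A Python sketch of verifying the ltree header, assuming little-endian integers and that `ltree_data` is the complete ltree data following the header (the data size is assumed to be its length in bytes); `parse_ltree_header` is a hypothetical helper.

```python
import hashlib
import struct
import zlib


def parse_ltree_header(header: bytes, ltree_data: bytes) -> int:
    """Verify a 48-byte ltree header against the ltree data; return data size."""
    if len(header) != 48:
        raise ValueError("expected a 48-byte ltree header")
    (data_size,) = struct.unpack_from("<Q", header, 16)
    (stored_checksum,) = struct.unpack_from("<I", header, 24)
    # The checksum is calculated with its own 4 bytes set to zero.
    zeroed = header[:24] + bytes(4) + header[28:]
    if zlib.adler32(zeroed) != stored_checksum:
        raise ValueError("ltree header checksum mismatch")
    if hashlib.md5(ltree_data).digest() != header[:16]:
        raise ValueError("ltree integrity hash mismatch")
    return data_size
```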

Ltree data

The ltree data consists of a UTF-16 little-endian encoded string without a byte order mark. The ltree data is not strict UTF-16, since it allows for unpaired surrogates, such as “U+d800” and “U+dc00”.

Other observed cases where the names in the ltree deviate from the original source:

  • [U+0001-U+0008] were converted to U+00ba
  • [U+0009, U+000a] were stripped
  • [U+000b, U+000c] were converted to U+0020
  • U+000d was converted to U+0002
  • U+00ba remained the same

Note that this behavior could be related to EnCase as well and might not be specific to EWF-L01.

The ltree data string contains the following information:

| Line number | Value | Description |
| --- | --- | --- |
| 1 | 5 | The number of categories provided |
| 2 | rec | Information about records (purpose unknown), see also the Records category |
| ... | | (an empty line) |
| ... | perm | Information about file permissions, see also the Permissions category |
| ... | | (an empty line) |
| ... | srce | Information about acquisition sources, see also the Sources category |
| ... | | (an empty line) |
| ... | sub | Information about subjects (purpose unknown), see also the Subjects category |
| ... | | (an empty line) |
| ... | entry | Information about file entries, see also the File entries category |
| ... | | (an empty line) |

The end-of-line character is a newline (0x0a).
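Because the ltree data may contain unpaired surrogates, a strict UTF-16 decoder would fail on it. In Python the `surrogatepass` error handler keeps such code units; this sketch decodes the data and splits it into lines (`decode_ltree` is a hypothetical name).

```python
def decode_ltree(data: bytes) -> list:
    """Decode ltree data into lines (sketch)."""
    # "surrogatepass" keeps unpaired surrogates such as U+D800 instead of failing.
    text = data.decode("utf-16-le", errors="surrogatepass")
    return text.split("\n")
```

A trailing newline produces an empty last element, which a caller can ignore.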

Records category

The rec category contains information about records.

The 1st line of the category contains the string “rec”.

The 2nd line of the category contains tab (0x09) separated type indicators.

| Identifier number | Type indicator | Description |
| --- | --- | --- |
| 1 | tb | Total bytes, which contains an integer with the size of the logical file data (media data) |
| 2 | cl | Unknown (Clusters?) |
| 3 | n | Unknown (introduced in EnCase 6.19) |
| 4 | fp | Unknown (introduced in EnCase 7) |
| 5 | pg | Unknown (introduced in EnCase 7) |
| 6 | lg | Unknown (introduced in EnCase 7) |
| 7 | ig | Unknown (introduced in EnCase 7) |

The 3rd line of the category consists of tab (0x09) separated values.
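Since the indicators on the 2nd line and the values on the 3rd line correspond by position, the rec category can be mapped to a dictionary; `parse_records_category` is a hypothetical helper operating on already decoded lines.

```python
def parse_records_category(lines: list) -> dict:
    """Map the rec category type indicators to their values (sketch)."""
    if lines[0] != "rec":
        raise ValueError("not a rec category")
    type_indicators = lines[1].split("\t")
    values = lines[2].split("\t")
    return dict(zip(type_indicators, values))
```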

Permissions category

The perm category contains information about file permissions.

The 1st line of the category contains the string “perm”.

The 2nd line consists of the following 2 values:

| Value number | Value | Description |
| --- | --- | --- |
| 1 | | The number of permission groups in the category |
| 2 | 1 | Unknown |

The 3rd line of the category contains tab (0x09) separated type indicators. For more information see the sections below.

The remaining lines in the category consist of:

  • category root entry
    • zero or more permissions group entries
      • zero or more permission entries

Each entry consists of 2 lines:

| Line number | Value | Description |
| --- | --- | --- |
| 1 | | Number of entries |
| 2 | | Tab (0x09) separated values that correspond to the type indicators |

The 1st line of the category root entry consists of the following 2 values:

| Value number | Value | Description |
| --- | --- | --- |
| 1 | 0 | Unknown |
| 2 | | The number of permission groups in the category |

The 1st line of the permission group entry consists of the following 2 values:

| Value number | Value | Description |
| --- | --- | --- |
| 1 | 0 | Unknown |
| 2 | | The number of permissions in the group |

The 1st line of the permission entry consists of the following 2 values:

| Value number | Value | Description |
| --- | --- | --- |
| 1 | 0 | Unknown |
| 2 | 0 | Unknown |
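The nested entries can be read recursively: each entry is 2 lines, and the 2nd value on its 1st line gives the number of sub entries that follow. A Python sketch, assuming the 1st-line values are whitespace separated; `parse_entry` is a hypothetical helper. The same layout appears to apply to the srce, sub and entry categories described below.

```python
def parse_entry(lines: list, index: int, type_indicators: list):
    """Parse an entry and its sub entries; return (entry, next line index).

    Assumption: the 2 values on the 1st line are whitespace separated and the
    2nd value is the number of sub entries that directly follow.
    """
    number_of_sub_entries = int(lines[index].split()[1])
    values = lines[index + 1].split("\t")
    entry = {"values": dict(zip(type_indicators, values)), "sub_entries": []}
    index += 2
    for _ in range(number_of_sub_entries):
        sub_entry, index = parse_entry(lines, index, type_indicators)
        entry["sub_entries"].append(sub_entry)
    return entry, index
```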
Permission type indicators
| Identifier number | Type indicator | Description |
| --- | --- | --- |
| 1 | p | Is parent, where 1 represents that the entry is a category root or permissions group and 0 represents that the entry is a permission |
| 2 | n | Name, which contains a string |
| 3 | s | Security identifier, which contains a string with either a Windows NT security identifier (SID) or a POSIX user (uid) or group identifier (gid) in the format " number:" such as " 99:" |
| 4 | pr | Property type, see also Permission types |
| 5 | nta | Access mask |
| 6 | nti | Unknown (Windows NT access control entry (ACE) flags?, which contains an integer with Windows NT access control entry (ACE) flags) |
| 7 | nts | Unknown (Permission?) (Removed in EnCase 6) |
Permission types
| Value | Identifier | Description |
| --- | --- | --- |
| (empty) | | Owner or category root |
| 1 | | Group |
| 2 | | Allow |
| 6 | | Other |
| 10 | | Unknown (permissions group?) |
Access mask

Access mask seen in combination with property types 0, 1 and 6

| Value | Identifier | Description |
| --- | --- | --- |
| (empty) | | Owner or category root |
| 0x00000001 | [Lst Fldr/Rd Data] | List folder / Read data |
| 0x00000002 | [Crt Fl/W Data] | Create file / Write data |
| 0x00000020 | [Trav Fldr/X Fl] | Traverse folder / Execute file |

Access mask seen in combination with property type 2

[0x001200a9] [R&X] [R] [Sync]
[0x001301bf] [M] [R&X] [R] [W] [Sync]
[0x001f01ff] [FC] [M] [R&X] [R] [W] [Sync]
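| Value | Identifier | Description |
| --- | --- | --- |
| (empty) | | Owner or category root |
| 0x00000001 | | |
| 0x00000002 | | |
| 0x00000004 | | |
| 0x00000008 | | |
| 0x00000010 | | |
| 0x00000020 | | |
| 0x00000040 | | |
| 0x00000080 | | |
| 0x00000100 | | |
| 0x00010000 | | |
| 0x00020000 | | |
| 0x00040000 | | |
| 0x00080000 | | |
| 0x00100000 | | |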
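The bit values listed above appear to coincide with the standard Windows NT file access rights, which would also explain the [R&X], [M] and [FC] combinations. Assuming that correspondence, a mask can be decoded as follows; the names are those of the Windows SDK constants, while `decode_access_mask` is a hypothetical helper.

```python
# Assumption: the access mask uses the standard Windows NT file access rights.
ACCESS_RIGHTS = {
    0x00000001: "FILE_READ_DATA",
    0x00000002: "FILE_WRITE_DATA",
    0x00000004: "FILE_APPEND_DATA",
    0x00000008: "FILE_READ_EA",
    0x00000010: "FILE_WRITE_EA",
    0x00000020: "FILE_EXECUTE",
    0x00000040: "FILE_DELETE_CHILD",
    0x00000080: "FILE_READ_ATTRIBUTES",
    0x00000100: "FILE_WRITE_ATTRIBUTES",
    0x00010000: "DELETE",
    0x00020000: "READ_CONTROL",
    0x00040000: "WRITE_DAC",
    0x00080000: "WRITE_OWNER",
    0x00100000: "SYNCHRONIZE",
}


def decode_access_mask(mask: int):
    """Return the names of the set bits and any bits not in the table."""
    names = [name for bit, name in ACCESS_RIGHTS.items() if mask & bit]
    unknown_bits = mask & ~sum(ACCESS_RIGHTS)
    return names, unknown_bits
```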

Sources category

The srce category contains information about acquisition sources of the file entries.

TODO: describe what an acquisition source is in the context of EnCase.

The 1st line of the category contains the string “srce”.

The 2nd line consists of 2 values.

| Value index | Value | Description |
| --- | --- | --- |
| 1 | | The number of sources in the category |
| 2 | 1 | Unknown |

The 3rd line of the category contains tab (0x09) separated type indicators. For more information see the sections below.

The remaining lines in the category consist of:

  • category root
    • zero or more source entries

Each entry consists of 2 lines:

| Line number | Value | Description |
| --- | --- | --- |
| 1 | | Number of entries |
| 2 | | Tab (0x09) separated values that correspond to the type indicators |

The 1st line of the category root entry consists of the following 2 values:

| Value number | Value | Description |
| --- | --- | --- |
| 1 | 0 | Unknown |
| 2 | | The number of sources in the category |

The 1st line of the source entry consists of the following 2 values:

| Value number | Value | Description |
| --- | --- | --- |
| 1 | 0 | Unknown |
| 2 | 0 | Unknown |
Source type indicators
| Identifier number | Type indicator | Description |
| --- | --- | --- |
| 1 | p | |
| 2 | n | |
| 3 | id | Identifier, which contains an integer identifying the source |
| 4 | ev | Evidence number, which contains a string |
| 5 | do | Domain, which contains a string (introduced in EnCase 7.9) |
| 6 | loc | Location, which contains a string (introduced in EnCase 7.9) |
| 7 | se | Serial number, which contains a string (introduced in EnCase 7.9) |
| 8 | mfr | Manufacturer, which contains a string (introduced in EnCase 7.9) |
| 9 | mo | Model, which contains a string (introduced in EnCase 7.9) |
| 10 | tb | Total bytes, which contains an integer |
| 11 | lo | Logical offset, which contains an integer which is -1 when the value is not set |
| 12 | po | Physical offset, which contains an integer which is -1 when the value is not set |
| 13 | ah | MD5 hash, which contains a string with the MD5 hash of the source |
| 14 | sh | SHA1 hash, which contains a string with the SHA1 hash of the source (introduced in EnCase 6.19) |
| 15 | gu | Device GUID, which contains a string with a GUID or "0" if not set |
| 16 | pgu | Primary device GUID, which contains a string with a GUID or "0" if not set (introduced in EnCase 7) |
| 17 | aq | Acquisition date and time, which contains an integer with a POSIX timestamp |
| 18 | ip | IP address, which contains a string (introduced in EnCase 7.9) |
| 19 | si | Unknown (Static IP address?), contains 1 if static, empty otherwise (introduced in EnCase 7.9) |
| 20 | ma | MAC address, which contains a string without separator characters (introduced in EnCase 7.9) |
| 21 | dt | Drive type, which contains a single character (introduced in EnCase 7.9) |

The acquisition date and time is in the form of: “1142163845”, which is a POSIX epoch timestamp and represents the date: March 12 2006, 11:44:05 (UTC).

If the “ah” value contains “00000000000000000000000000000000”, the MD5 hash is not set. The same applies to the “sh” value: when it contains “0000000000000000000000000000000000000000”, the SHA1 hash is not set.

If the “ma” value contains “000000000000” this means the MAC address is not set.
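A Python sketch of interpreting these values: the timestamp conversion uses the standard library, and the “not set” table uses the type indicators listed in the sources table (“ah”, “sh”, “ma”); the helper names are hypothetical.

```python
from datetime import datetime, timezone

# "Not set" values, keyed by the type indicators of the sources table.
NOT_SET_VALUES = {"ah": "0" * 32, "sh": "0" * 40, "ma": "0" * 12}


def parse_posix_time(value: str) -> datetime:
    """Convert an "aq" value (POSIX timestamp) to a UTC datetime."""
    return datetime.fromtimestamp(int(value), tz=timezone.utc)


def is_value_set(type_indicator: str, value: str) -> bool:
    """Return False if the value equals the known "not set" placeholder."""
    return value != NOT_SET_VALUES.get(type_indicator)
```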

Drive type
| Character value | Meaning |
| --- | --- |
| f | Fixed drive |

Subjects category

The sub category contains information about TODO

TODO: describe what a subject is in the context of EnCase.

The 1st line of the category contains the string “sub”.

The 2nd line consists of 2 values.

| Value index | Value | Description |
| --- | --- | --- |
| 1 | | The number of subjects in the category |
| 2 | 1 | Unknown |

The 3rd line of the category contains tab (0x09) separated type indicators. For more information see the sections below.

The remaining lines in the category consist of:

  • category root
    • zero or more subject entries

Each entry consists of 2 lines:

| Line number | Value | Description |
| --- | --- | --- |
| 1 | | Number of entries |
| 2 | | Tab (0x09) separated values that correspond to the type indicators |

The 1st line of the category root entry consists of the following 2 values:

| Value number | Value | Description |
| --- | --- | --- |
| 1 | 0 | Unknown |
| 2 | | The number of subjects in the category |

The 1st line of the subject entry consists of the following 2 values:

| Value number | Value | Description |
| --- | --- | --- |
| 1 | 0 | Unknown |
| 2 | 0 | Unknown |
Subject type indicators
| Identifier number | Type indicator | Description |
| --- | --- | --- |
| 1 | p | |
| 2 | n | |
| 3 | id | Identifier, which contains an integer identifying the subject |
| 4 | nu | Unknown (Number) |
| 5 | co | Unknown (Comment) |
| 6 | gu | Unknown (GUID) |

File entries category

The entry category contains information about the file entries.

The 1st line of the category contains the string “entry”.

The 2nd line consists of 2 values.

| Value index | Value | Description |
| --- | --- | --- |
| 1 | | The number of file entries in the category or 1 if unknown |
| 2 | 1 | Unknown |

The 3rd line of the category contains tab (0x09) separated type indicators. For more information see the sections below.

The remaining lines in the category consist of:

  • category root
    • zero or more file entries
      • zero or more sub file entries

Each entry consists of 2 lines:

| Line number | Value | Description |
| --- | --- | --- |
| 1 | | Number of entries |
| 2 | | Tab (0x09) separated values that correspond to the type indicators |

The 1st line of the category root entry consists of the following 2 values:

| Value number | Value | Description |
| --- | --- | --- |
| 1 | 0 if not set or 26 if | Unknown |
| 2 | | The number of file entries in the category |

The 1st line of the file entry consists of the following 2 values:

| Value number | Value | Description |
| --- | --- | --- |
| 1 | | Number of file entries in the parent file entry or 0 if not set |
| 2 | | The number of sub file entries in the file entry |
EnCase 5 and 6 (EWF-L01) file entry type indicators
| Identifier number | Type indicator | Meaning |
| --- | --- | --- |
| 1 | p | Is parent, where 1 represents that the entry is a directory and (empty) that the entry is a file |
| 2 | n | Name |
| 3 | id | Identifier, which contains an integer identifying the file entry |
| 4 | opr | File entry flags |
| 5 | src | Source identifier, which contains an integer that corresponds to an identifier in the Sources category |
| 6 | sub | Subject identifier, which contains an integer that corresponds to an identifier in the Subjects category |
| 7 | cid | Unknown (record type) |
| 8 | jq | Unknown |
| 9 | cr | Creation date and time |
| 10 | ac | Access date and time, for which it is currently assumed that the precision is date only |
| 11 | wr | (File) modification (last written) date and time |
| 12 | mo | (File system) entry modification date and time |
| 13 | dl | Deletion date and time |
| 14 | aq | Acquisition date and time, which contains an integer with a POSIX timestamp |
| 15 | ha | MD5 hash, which contains a string with the MD5 hash of the file data |
| 16 | ls | File size in bytes. If the file size is 0 the data size should be 1 |
| 17 | du | Duplicate data offset, relative from the start of the media data |
| 18 | lo | Logical offset, which contains an integer which is -1 when the value is not set |
| 19 | po | Physical offset, which contains an integer which is -1 when the value is not set (or does this value contain the segment file in which the start of the data is stored, -1 for a single segment file?) |
| 20 | mid | GUID, which contains a string with a GUID (introduced in EnCase 6.19) |
| 21 | cfi | Unknown (introduced in EnCase 6.14) |
| 22 | be | Binary extents |
| 23 | pm | Permissions group index, which contains an integer that corresponds to an identifier in the Permissions category or -1 if not set. The value is 0 by default |
| 24 | lpt | Unknown (introduced in EnCase 6.19) |

The creation, access and last written date and time are in the form of: “1142163845”, which is a POSIX epoch timestamp and represents the date: March 12 2006, 11:44:05 (UTC).

The “ha” value (hash) consists of an MD5 hash string when file entries are hashed. If the “ha” value contains “00000000000000000000000000000000”, the MD5 hash is not set.

Ltree file entries

The ltree entries of files and directories consist of entries starting with: 0 followed by the number of sub file entries.

The entries of files and directories:

| Line number | Value | Description |
| --- | --- | --- |
| 1 | (empty) | The root directory |
| 2 | | The target drive/mount point |
| 3 | | The actual single file entries |
EnCase 7 (EWF-L01) file entry type indicators
| Identifier number | Type indicator | Meaning |
| --- | --- | --- |
| 1 | mid | GUID, which contains a string with a GUID |
| 2 | ls | File size, in bytes. If the file size is 0 the data size should be 1 |
| 3 | be | Binary extents |
| 4 | id | Identifier, which contains an integer identifying the file entry |
| 5 | cr | Creation date and time |
| 6 | ac | Access date and time |
| 7 | wr | (File) modification (last written) date and time |
| 8 | mo | (File system) entry modification date and time |
| 9 | dl | Deletion date and time |
| 10 | sig | Unknown (Introduced in EnCase 7) |
| 11 | ha | MD5 hash, which contains a string with the MD5 hash of the file data |
| 12 | sha | SHA1 hash, which contains a string with the SHA1 hash of the file data (Introduced in EnCase 7) |
| 13 | ent | Unknown, seen "B" (Introduced in EnCase 7.9) |
| 14 | snh | Short name (or DOS 8.3 name) (Introduced in EnCase 7.9) |
| 15 | p | Is parent, where "1" represents that the entry is a directory and "" (an empty string) that the entry is a file |
| 16 | n | Name |
| 17 | du | Duplicate data offset, relative from the start of the media data |
| 18 | lo | Logical offset, which contains an integer which is -1 when the value is not set |
| 19 | po | Physical offset, which contains an integer which is -1 when the value is not set (or does this value contain the segment file in which the start of the data is stored, -1 for a single segment file?) |
| 20 | pm | Permissions group index, which contains an integer that corresponds to an identifier in the Permissions category or -1 if not set. The value is 0 by default |
| 21 | oes | Unknown (Original extents?) (Introduced in EnCase 7) |
| 22 | opr | File entry flags |
| 23 | src | Source identifier, which contains an integer that corresponds to an identifier in the Sources category |
| 24 | sub | Subject identifier, which contains an integer that corresponds to an identifier in the Subjects category |
| 25 | cid | Unknown (record type?) |
| 26 | jq | Unknown |
| 27 | alt | Unknown (Introduced in EnCase 7) |
| 28 | ep | Unknown (Introduced in EnCase 7) |
| 29 | aq | Acquisition date and time, which contains an integer with a POSIX timestamp |
| 30 | cfi | Unknown |
| 31 | sg | Unknown (Introduced in EnCase 7) |
| 32 | ea | Extended attributes (Introduced in EnCase 7.9) |
| 33 | lpt | Unknown |

If the “ha” value contains “00000000000000000000000000000000”, the MD5 hash is not set. The same applies to the “sha” value: when it contains “0000000000000000000000000000000000000000”, the SHA1 hash is not set.

File entry name

A file entry name (“n” value):

  • can contain path segment separator characters like “\” and “/”
  • uses the “MIDDLE DOT” Unicode character (U+00b7) as a (NTFS) alternate data stream (ADS) name separator

Note that a regular “MIDDLE DOT” Unicode character in a name is encoded in the same way, so there is no reliable way to tell the two apart.

An empty name has been observed to be represented as “NoName”.

Short name

The short name (“snh”) value contains 2 values:

Value numberValueDescription
1The number of characters in the short name including the end-of-string character
2The short name string, without an end-of-string character

For example: “13 FILE10~1.TXT”
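The following is a minimal sketch of splitting an “snh” value into its count and string. It assumes that a single space separates the two values, as in the example above, and that the count is a decimal number. The function name is hypothetical.

```python
def parse_short_name(value: str) -> str:
    """Return the short name string from an "snh" value, e.g. "13 FILE10~1.TXT"."""
    # Assumption: a single space separates the count from the string.
    count_text, separator, name = value.partition(" ")
    if not separator:
        raise ValueError(f"unsupported short name value: {value!r}")
    # The count includes the end-of-string character, the stored string does not.
    number_of_characters = int(count_text, 10)
    if number_of_characters != len(name) + 1:
        raise ValueError(f"short name size mismatch in: {value!r}")
    return name


print(parse_short_name("13 FILE10~1.TXT"))  # FILE10~1.TXT
```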

Original extents

TODO: add some text

1 30a555b 30a6000 12011ae00 9008d7 3f 43 1 12011ae00 30a6000 120113 30a6 9008d7 18530
Ltree file entries

The ltree entries of files and directories consist of entries starting with: 26 followed by the number of sub file entries.

The entries of files and directories:

Line numberValueDescription
1LogicalEntriesThe root directory
2The target drive/mount point
3The actual single file entries
File entry flags
ValueIdentifierDescription
0x00000001Unknown (Is read-only?)
0x00000002HiddenIs hidden
0x00000004SystemIs system
0x00000008ArchiveIs archive
0x00000010Sym LinkIs symbolic link, junction or reparse point
0x00000080DeletedIs deleted
0x00001000Hard LinkedIs hard link
0x00002000StreamIs stream
0x00100000InternalIs internal (used in combination with 0x00000006?)
0x00200000Unallocated ClustersUnknown
0x00400000Unknown
0x01000000Unknown
0x02000000FolderIs folder
0x04000000Data is sparse

If neither 0x00002000 (Stream) nor 0x02000000 (Folder) is set, the file entry is of type “File”.
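A small lookup table can decode the flag values above. The sketch below assumes that the “opr” value has already been converted to an integer. Two choices in it are assumptions: Folder takes precedence over Stream when both flags are set, and unnamed bits such as 0x00000001 are left out of the list of names.

```python
# Named file entry flags from the table above; unnamed bits are omitted.
FILE_ENTRY_FLAGS = {
    0x00000002: "Hidden",
    0x00000004: "System",
    0x00000008: "Archive",
    0x00000010: "Sym Link",
    0x00000080: "Deleted",
    0x00001000: "Hard Linked",
    0x00002000: "Stream",
    0x00100000: "Internal",
    0x00200000: "Unallocated Clusters",
    0x02000000: "Folder",
    0x04000000: "Data is sparse",
}


def describe_file_entry_flags(flags: int) -> tuple[str, list[str]]:
    """Return the file entry type and the names of the set flags."""
    names = [name for bit, name in FILE_ENTRY_FLAGS.items() if flags & bit]
    # Assumption: Folder takes precedence over Stream if both are set.
    if flags & 0x02000000:
        entry_type = "Folder"
    elif flags & 0x00002000:
        entry_type = "Stream"
    else:
        # Neither the Stream nor the Folder flag is set.
        entry_type = "File"
    return entry_type, names


print(describe_file_entry_flags(0x02000002))  # ('Folder', ['Hidden', 'Folder'])
```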

If the sparse data flag is set:

  • the data size should be 1 and data should consist of a single byte value.
  • the data size should be equal to the file size and data should be the same.

If the duplicate data offset value is not set, the single byte value in the data should be used to reconstruct the file data. For example, if the file size is 4096 and the data contains the byte value 0x00, the resulting file consists of 4096 0x00 byte values.

If the duplicate data offset value is set, the single byte in the data is ignored and the duplicate data offset refers to the location where the data is stored.
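The reconstruction rules above can be sketched as follows. Several details are hypothetical and not defined by the format: the `read_media` callback, which reads from the media data, the use of `None` for an unset duplicate data offset, and the assumption that file-size bytes are stored at that offset. How the unset “du” value is actually represented is not specified here.

```python
from typing import Callable, Optional


def read_sparse_file_data(
    data: bytes,
    file_size: int,
    duplicate_data_offset: Optional[int],
    read_media: Callable[[int, int], bytes],
) -> bytes:
    """Reconstruct the data of a file entry that has the sparse data flag set."""
    if duplicate_data_offset is not None:
        # The single byte is ignored. Assumption: file_size bytes are stored at
        # the duplicate data offset, relative to the start of the media data.
        return read_media(duplicate_data_offset, file_size)
    if len(data) != 1:
        raise ValueError("sparse file entry data should consist of a single byte")
    # Repeat the single byte value for the whole file size.
    return data * file_size
```

For example, `read_sparse_file_data(b"\x00", 4096, None, read_media)` returns 4096 0x00 byte values, and `read_media` is never called.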

Binary extents value

The binary extents value contains 3 values separated by a space:

Unknown Offset Size

Where:

  • the unknown value, which is always 1 (possibly the number of extents?)
  • extent data offset, relative from the start of the media data
  • extent data size

The offset and size are specified in hexadecimal values.

Note that the binary extents value contains only 1 value for the first single file entry.
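A sketch of parsing a binary extents value follows. Accepting more than one offset and size pair is an assumption, since only a single extent has been described. The sketch also accepts a value that contains only the first number, as seen for the first single file entry. The example value "1 2a00 1f4" is hypothetical.

```python
def parse_binary_extents(value: str) -> tuple[int, list[tuple[int, int]]]:
    """Split a binary extents value into its unknown value and (offset, size) pairs."""
    parts = value.split()
    if not parts:
        raise ValueError("empty binary extents value")
    # The first value has only been seen as 1; parsing it as hexadecimal is an
    # assumption that makes no difference for that value.
    unknown = int(parts[0], 16)
    remaining = parts[1:]
    if len(remaining) % 2 != 0:
        raise ValueError(f"unsupported binary extents value: {value!r}")
    # Offsets (relative to the start of the media data) and sizes are hexadecimal.
    extents = [
        (int(remaining[index], 16), int(remaining[index + 1], 16))
        for index in range(0, len(remaining), 2)
    ]
    return unknown, extents


# Hypothetical value: one extent of 0x1f4 bytes at media data offset 0x2a00.
print(parse_binary_extents("1 2a00 1f4"))  # (1, [(10752, 500)])
```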

Extended attributes value

The extended attributes value contains base-16 encoded data, which consists of:

  • Extended attributes header (stored as an extended attribute)
  • One or more extended attributes
Extended attributes header

The extended attributes header is 37 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | 0 | Unknown (0 => root, 1 => otherwise)
4 | 1 | 1 | Unknown (0 => is leaf node, 1 => is branch node?)
5 | 4 | 11 | Number of characters in name string including the end-of-string character
9 | 4 | 1 | Number of characters in value string including the end-of-string character
13 | 22 | "Attributes\0" | Name string, which contains a UTF-16 little-endian encoded string including end-of-string character
35 | 2 | "\0" | Value string, which contains a UTF-16 little-endian encoded string including end-of-string character
Extended attribute

An extended attribute is of variable size and consists of:

Offset | Size | Value | Description
0 | 4 | | Unknown (0 => root, 1 => otherwise)
4 | 1 | | Unknown (0 => is leaf node, 1 => is branch node?)
5 | 4 | | Number of characters in name string including the end-of-string character
9 | 4 | | Number of characters in value string including the end-of-string character
13 | ... | | Name string, which contains a UTF-16 little-endian encoded string including end-of-string character
... | ... | | Value string, which contains a UTF-16 little-endian encoded string including end-of-string character
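A minimal sketch of decoding an “ea” value follows. It decodes the base-16 data and walks the attribute records described above. The extended attributes header uses the same layout, so it is returned as the first element, with the name "Attributes". The sketch assumes that the integer values are stored little-endian. The type and function names are hypothetical.

```python
import struct
from typing import NamedTuple


class ExtendedAttribute(NamedTuple):
    unknown: int     # 0 => root, 1 => otherwise
    is_branch: bool  # presumably 0 => leaf node, 1 => branch node
    name: str
    value: str


def _decode_utf16_string(data: bytes) -> str:
    string = data.decode("utf-16-le")
    # Drop the end-of-string character that is included in the stored size.
    return string[:-1] if string.endswith("\x00") else string


def parse_extended_attributes(hex_value: str) -> list[ExtendedAttribute]:
    """Parse a base-16 encoded "ea" value into a list of extended attributes."""
    data = bytes.fromhex(hex_value)
    attributes = []
    offset = 0
    while offset < len(data):
        # Assumption: integers are stored little-endian; "<IBII" is 13 bytes.
        unknown, is_branch, name_size, value_size = struct.unpack_from(
            "<IBII", data, offset)
        name_start = offset + 13
        value_start = name_start + name_size * 2
        value_end = value_start + value_size * 2
        if value_end > len(data):
            raise ValueError(f"extended attribute at offset {offset} is truncated")
        attributes.append(ExtendedAttribute(
            unknown, is_branch != 0,
            _decode_utf16_string(data[name_start:value_start]),
            _decode_utf16_string(data[value_start:value_end])))
        offset = value_end
    return attributes
```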

TODO: complete section

Note that branch nodes are presumably used to group attributes; however, they are not used consistently and are not shown by EnCase 7.

Map section

Some aspects of this section are:

  • Found in EWF-L01 files of EnCase 7 (first seen in EnCase 7.4.1.10).
  • Found in the last segment file, after the data section and before the done section.

The map consists of:

  • map string
  • map entries array

Map string

The map string consists of an UTF-16 little-endian encoded string without the UTF-16 endian byte order mark.

The map string contains the following information:

Line numberValueDescription
11The number of categories provided
2rProbably the type of information provided
3cIdentifier for the values in the 4th line
4The data for the different identifiers in the 3rd line
5(an empty line)
Map string values
Identifier number | Character in 3rd line | Meaning
1 | C | Number of map entries (count)

The number of map entries should match the number of file entries in the ltree.

Map entry

A map entry is 24 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | | Unknown
4 | 4 | | Unknown (empty values or part of previous value)
8 | 16 | | Unknown

Session section

The session section is identified in the section data type field as “session”. Some aspects of this section are:

  • Not defined in ASR Data - E01 Compression Format.
  • It is not found in SMART (EWF-S01) and FTK Imager (EWF-E01).
  • It is found in EnCase 5 and 6 (EWF-E01) files.
  • It is only added to the last segment file for images of optical disc (CD/DVD/BD) media.
  • It is found after the data section and before the error2 section.

The session section data consists of:

  • The session header
  • The session entries array
  • The session footer

Session header

The session header is 36 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | | Number of sessions
4 | 28 | | Unknown (empty values)
32 | 4 | | Checksum, which contains an Adler-32 of all the previous data within the additional session section data

Session entry

A session entry is 32 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | | Flags
4 | 4 | | Start sector
8 | 24 | | Unknown (empty values)

EnCase stores audio tracks as 0 byte data with a sector size of 2048.

Note that for a CD the first session sector is stored as 16, although the actual session starts at sector 0. Could this value be overloaded to indicate the size of the reserved space between the start of the session and the ISO 9660 volume descriptor?

Session flags

ValueIdentifierDescription
0x00000001If set the track is an audio track otherwise the track is a data track

The session footer is 4 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | | Checksum, which contains an Adler-32 of all the data within the session entries array
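A sketch of parsing session section data and verifying both Adler-32 checksums follows. It assumes that the integers and checksums are stored little-endian and that the checksums are standard Adler-32 values, as computed by `zlib.adler32`. The names are hypothetical.

```python
import struct
import zlib
from typing import NamedTuple


class Session(NamedTuple):
    start_sector: int
    is_audio_track: bool


def parse_session_section(data: bytes) -> list[Session]:
    """Parse session section data: 36-byte header, 32-byte entries, 4-byte footer."""
    # Assumption: little-endian integers and checksums.
    number_of_sessions, = struct.unpack_from("<I", data, 0)
    header_checksum, = struct.unpack_from("<I", data, 32)
    if zlib.adler32(data[:32]) != header_checksum:
        raise ValueError("session header checksum mismatch")

    entries_end = 36 + number_of_sessions * 32
    if entries_end + 4 > len(data):
        raise ValueError("session section data too small")
    entries_data = data[36:entries_end]
    footer_checksum, = struct.unpack_from("<I", data, entries_end)
    if zlib.adler32(entries_data) != footer_checksum:
        raise ValueError("session entries checksum mismatch")

    sessions = []
    for entry_offset in range(0, len(entries_data), 32):
        flags, start_sector = struct.unpack_from("<II", entries_data, entry_offset)
        # Flag 0x00000001: audio track, otherwise data track.
        sessions.append(Session(start_sector, bool(flags & 0x00000001)))
    return sessions
```

The start sector is returned as stored. For a CD, the first session therefore reports 16 rather than 0.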

Error2 section

The error2 section is identified in the section data type field as “error2”. Some aspects of this section are:

  • Not defined in ASR Data - E01 Compression Format.
  • It is not found in SMART (EWF-S01).
  • It is found in EnCase 3 to 7 and linen 5 to 7 (EWF-E01) files.
  • It is only added to the last segment file when errors were encountered while reading the input.

TODO: check FTK Imager, EnCase 1 and 2 for presence of the error2 section.

It contains the sectors that had read errors. EnCase fills the sectors where a read error occurred with zeros during acquisition.

The error2 section data consists of:

  • The error2 header
  • The error2 entries array
  • The error2 footer

Error2 header

The error2 header is 520 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | | Number of entries
4 | 512 | | Unknown (empty values)
516 | 4 | | Checksum, which contains an Adler-32 of all the previous data within the error2 header data

Error2 entry

An error2 entry is 8 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | | Start sector
4 | 4 | | The number of sectors

The error2 footer is 4 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | | Checksum, which contains an Adler-32 of all the data within the error2 entries array
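The error2 layout can be read the same way as the session section. This sketch makes the same assumptions: little-endian integers and checksums, and standard Adler-32 via `zlib.adler32`. It returns (start sector, number of sectors) pairs. A reader could use these pairs to flag the zero-filled ranges in the media data.

```python
import struct
import zlib


def parse_error2_section(data: bytes) -> list[tuple[int, int]]:
    """Return (start sector, number of sectors) pairs from error2 section data."""
    # Assumption: little-endian integers and checksums.
    number_of_entries, = struct.unpack_from("<I", data, 0)
    header_checksum, = struct.unpack_from("<I", data, 516)
    if zlib.adler32(data[:516]) != header_checksum:
        raise ValueError("error2 header checksum mismatch")

    entries_end = 520 + number_of_entries * 8
    if entries_end + 4 > len(data):
        raise ValueError("error2 section data too small")
    entries_data = data[520:entries_end]
    footer_checksum, = struct.unpack_from("<I", data, entries_end)
    if zlib.adler32(entries_data) != footer_checksum:
        raise ValueError("error2 entries checksum mismatch")

    # Each 8-byte entry: start sector followed by the number of sectors.
    return [struct.unpack_from("<II", entries_data, offset)
            for offset in range(0, len(entries_data), 8)]
```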

Digest section

The digest section is identified in the section data type field as “digest”. Some aspects of this section are:

  • It is found in EnCase 6 to 7 files, as of EnCase 6.12 and linen 6.12 (EWF-E01).

The digest section contains an MD5 and/or SHA1 hash of the data within the chunks.

The digest section data is 80 bytes in size and consists of:

Offset | Size | Value | Description
0 | 16 | | MD5 hash of the media data
16 | 20 | | SHA1 hash of the media data
36 | 40 | 0x00 | Unknown (Padding)
76 | 4 | | Checksum, which contains an Adler-32 of all the previous data within the digest section data
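A minimal sketch of reading the digest section data follows. It assumes a little-endian checksum and returns both hashes as hexadecimal strings. These strings can be compared with, for example, `hashlib.md5(media_data).hexdigest()` over the media data.

```python
import struct
import zlib


def parse_digest_section(data: bytes) -> tuple[str, str]:
    """Return the MD5 and SHA1 hashes stored in 80 bytes of digest section data."""
    if len(data) < 80:
        raise ValueError("digest section data too small")
    # Assumption: the Adler-32 checksum is stored little-endian.
    stored_checksum, = struct.unpack_from("<I", data, 76)
    if zlib.adler32(data[:76]) != stored_checksum:
        raise ValueError("digest section checksum mismatch")
    return data[0:16].hex(), data[16:36].hex()
```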

Hash section

The hash section is identified in the section data type field as “hash”. Some aspects of this section are:

  • Defined in ASR Data - E01 Compression Format.
  • It is found in SMART (EWF-S01) and FTK Imager, EnCase 1 to 7 and linen 5 to 7 (EWF-E01) files.
  • It is not found in EnCase 5 (EWF-L01).
  • The hash section is optional. If present, it resides in the last segment file before the done section.

The hash section contains an MD5 hash of the data within the chunks.

The hash section data is 36 bytes in size and consists of:

Offset | Size | Value | Description
0 | 16 | | MD5 hash of the media data
16 | 16 | | Unknown
32 | 4 | | Checksum, which contains an Adler-32 of all the previous data within the additional hash section data
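The hash section data can be verified in the same way as the digest section data. This sketch also assumes a little-endian checksum. It returns the 16 unknown bytes unchanged so that they can be inspected against the observations below.

```python
import struct
import zlib


def parse_hash_section(data: bytes) -> tuple[str, bytes]:
    """Return the MD5 hash (hex) and the 16 unknown bytes of hash section data."""
    if len(data) < 36:
        raise ValueError("hash section data too small")
    # Assumption: the Adler-32 checksum is stored little-endian.
    stored_checksum, = struct.unpack_from("<I", data, 32)
    if zlib.adler32(data[:32]) != stored_checksum:
        raise ValueError("hash section checksum mismatch")
    return data[0:16].hex(), data[16:32]
```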

Notes

Observations regarding the unknown value:

  • is zero in SMART
  • is zero in EnCase 3 and below
  • in EnCase 4 the first 4 bytes are 0, the next 8 bytes seem random, the last 4 bytes seem fixed
  • in EnCase 5 and 6 the first 8 bytes seem random, the last 8 bytes equal the file header signature
  • in linen 5 the first and last set of 4 bytes seem the same, the second set of 4 bytes seem to be random, the third set of 4 bytes seem to contain a piece of the file header signature
  • in linen 6 the first and third set of 4 bytes seem random, the second and last set of 4 bytes seem to be the same
  • EnCase 5 seems to contain a GUID of the acquired device?

Tests with EnCase 4 show that:

  • the value does not equal the checksum of the media data
  • the value does not differ for the same media acquired within the same program session using different formats, but differs for different media and different program sessions

Done section

The done section is identified in the section data type field as “done”. Some aspects of this section are:

  • Defined in ASR Data - E01 Compression Format.
  • It is found in SMART (EWF-S01), FTK Imager, EnCase 1 to 7 and linen 5 to 7 (EWF-E01) and EnCase 5 (EWF-L01) files.
  • The done section is the last section within the last segment file.
  • The offset to the next section in the section header of the done section points to itself (the start of the done section).

SMART (EWF-S01)

It resides after the table or table2 section.

FTK Imager, EnCase and linen (EWF-E01)

It resides after the data section in a single segment file or for multiple segment files after the table2 section.

In the EnCase (EWF-E01) format the size in the section header is 0 instead of 76 (the size of the section header).

Note that FTK Imager versions before 2.9 set the section size to 76. It is currently unknown in which version this behavior was changed.

Incomplete section

The incomplete section is identified in the section data type field as “incomplete”.

This section is seen rarely. It was seen in an EnCase 6.13 (EWF-E01) file as the last section within the last segment file. The incomplete section was preceded by a hash and digest section, although later in the set of EWF files another hash and digest section were defined.

It is currently assumed that the incomplete section indicates an incomplete image created using remote imaging. The incomplete section contains data but currently there is no indication what purpose the data has.

EWF-X

EWF-X (extended) is an experimental format to enhance the EWF format. EWF-X is based on the EWF-E01 format. EWF-X does not limit the table entries to 16375. EWF-X is not the same as version 2 of EWF.

TODO: add note about the table entry limit.

Sections

Additional sections provided in the EWF-X format are:

  • xheader
  • xhash

Xheader

The xheader section contains zlib-compressed XML data with the header values.

<?xml version="1.0" encoding="UTF-8"?>
<xheader>
    <case_number>1</case_number>
    <description>Description</description>
    <examiner_name>John D.</examiner_name>
    <evidence_number>1.1</evidence_number>
    <notes>Just a floppy in my system</notes>
    <acquiry_operating_system>Linux</acquiry_operating_system>
    <acquiry_date>Sat Jan 20 18:32:08 2007 CET</acquiry_date>
    <acquiry_software>ewfacquire</acquiry_software>
    <acquiry_software_version>20070120</acquiry_software_version>
</xheader>

Xhash

The xhash section contains zlib-compressed XML data with the hash values.

<?xml version="1.0" encoding="UTF-8"?>
<xhash>
    <md5>ae1ce8f5ac079d3ee93f97fe3792bda3</md5>
    <sha1>31a58f090460b92220d724b28eeb2838a1df6184</sha1>
</xhash>
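The xheader and xhash sections use the same encoding, so one sketch can decode both. It assumes that the section data is exactly one zlib stream and that every child element holds a single text value, as in the examples above. The function name is hypothetical.

```python
import xml.etree.ElementTree as ElementTree
import zlib


def parse_ewfx_xml_section(section_data: bytes) -> tuple[str, dict[str, str]]:
    """Decompress xheader or xhash section data and return (root tag, values)."""
    # Assumption: the section data is a single zlib stream.
    xml_data = zlib.decompress(section_data)
    root = ElementTree.fromstring(xml_data)
    # Each child element holds a single value, e.g. <md5>...</md5>.
    values = {element.tag: (element.text or "").strip() for element in root}
    return root.tag, values
```

The root tag, "xheader" or "xhash", tells the caller which of the two sections was decoded.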

GUID

EWF-X uses a random-based (version 4) GUID.

Corruption scenarios

This chapter contains several corruption scenarios that have been encountered “in the wild”.

Corrupt uncompressed chunk

TODO: add description

Corrupt compressed chunk

TODO: add description

DEFLATE uncompressed block data with copy of uncompressed data size of 0

Seen in combination with some firmware versions of Tableau TD3 forensic imager.

In this corruption scenario the copy of the uncompressed data size value (NLEN) of the DEFLATE uncompressed (stored) block data is set to 0 instead of the ones' complement of the uncompressed data size (LEN).

Libewf currently does not handle this corruption scenario.
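The following is a hypothetical workaround for this scenario, not what libewf does. It rewrites an NLEN of 0 to the ones' complement of LEN and then decompresses the chunk normally. The sketch has two limitations. It only handles chunks whose DEFLATE data consists entirely of stored blocks, because only then does every block header start on a byte boundary. It also assumes a 2-byte zlib header without a preset dictionary.

```python
import struct
import zlib


def repair_stored_block_lengths(chunk: bytes) -> bytes:
    """Rewrite NLEN values of 0 in a zlib stream that consists of stored blocks."""
    data = bytearray(chunk)
    position = 2  # Skip the 2-byte zlib header, assuming no preset dictionary.
    while True:
        if position + 5 > len(data):
            raise ValueError("truncated DEFLATE stored block header")
        block_header = data[position]
        is_final_block = block_header & 0x01
        block_type = (block_header >> 1) & 0x03
        if block_type != 0:
            raise ValueError("only DEFLATE stored (uncompressed) blocks are handled")
        # The 3 block header bits are padded to the byte boundary, followed by
        # LEN and NLEN as 16-bit little-endian values.
        length, complement = struct.unpack_from("<HH", data, position + 1)
        expected_complement = ~length & 0xFFFF
        if complement == 0 and complement != expected_complement:
            struct.pack_into("<H", data, position + 3, expected_complement)
        position += 5 + length
        if is_final_block:
            return bytes(data)


# zlib.decompress(repair_stored_block_lengths(chunk)) then succeeds, because
# the stored data and the trailing Adler-32 are unchanged.
```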

Corrupt section header

TODO: add description

reading section header from file IO pool entry: 1 at offset: 415912423
type                      : table2
next offset               : 415978027
size                      : 65604
checksum                  : 0xf35f03e0
number of offsets         : 16375
base offset               : 0x00000000
checksum                  : 0x180d0137

reading section header from file IO pool entry: 1 at offset: 415978027
type                      : sectors
next offset               : 415978027
size                      : 0
checksum                  : 0x1ad00464

Corrupt table section

TODO: add description

Scenarios:

  • with and without a table2 section
  • corruption in number of entries
  • corruption in entry data

Corrupted segment file header

TODO: add description

Partial segment file

TODO: add description

Missing segment file(s)

TODO: add description

Dual image: section size versus offset

The section headers define both the next section offset and the size of the section. If an implementation reads only one of the two to determine the next section, a dual EWF image can be crafted that consists of two separate images including hashes.

Keramics will mark such an image as corrupted.
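Such a dual image can be detected by requiring the next section offset and the section size to agree. The corrupt section header example above shows that the size includes the section header: 415912423 + 65604 = 415978027. The done section is the exception, because its next offset points to itself and its size is 0, or 76 for FTK Imager before 2.9. Only the done section is treated as terminal in this sketch, and the names are hypothetical.

```python
from typing import NamedTuple

SECTION_HEADER_SIZE = 76


class SectionHeader(NamedTuple):
    offset: int        # Offset of the section header in the segment file
    section_type: str  # Section data type, such as "table2", "sectors" or "done"
    next_offset: int
    size: int          # Section size, including the section header


def check_section_header(section: SectionHeader) -> None:
    """Raise ValueError if the next offset and the size of a section disagree."""
    # Assumption: only the done section is treated as terminal here.
    if section.section_type == "done":
        if section.next_offset != section.offset:
            raise ValueError("done section does not point to itself")
        if section.size not in (0, SECTION_HEADER_SIZE):
            raise ValueError(f"unsupported done section size: {section.size}")
        return
    if section.next_offset <= section.offset:
        raise ValueError(f"{section.section_type} section does not advance")
    if section.offset + section.size != section.next_offset:
        raise ValueError(f"{section.section_type} section size and next offset disagree")
```

With this check, the table2 section header in the corrupt section header example passes. The sectors section header that follows it is rejected, because its next offset points to itself and its size is 0.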

Table entries offset overflow

In EnCase 6.7.1 the sectors section can be larger than 2048 MiB. Table entry offsets are 31-bit values; however, in EnCase 6 the offset in a table entry value actually uses the full 32 bits once 2048 MiB has been exceeded. This behavior is no longer present in EnCase 6.8, so it is assumed to be a bug.

Libewf currently assumes that, if the 31-bit value overflows, the following chunks are uncompressed. This allows faulty EnCase 6.7.1 EWF files to be converted by Keramics.
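A sketch of that assumption follows. Like the EWF table entry format, it treats the most significant bit as the compressed-chunk flag. The overflow detection rule is hypothetical and not necessarily how libewf detects the overflow: a masked 31-bit offset that is smaller than the previous chunk offset is taken to mean that the offset has overflowed into the most significant bit. From that point on, every entry is read as a full 32-bit offset of an uncompressed chunk.

```python
def decode_table_entries(raw_entries: list[int]) -> list[tuple[int, bool]]:
    """Decode 32-bit table entry values into (chunk offset, is compressed) pairs."""
    decoded = []
    overflow = False
    previous_offset = -1
    for raw_entry in raw_entries:
        if not overflow and (raw_entry & 0x7FFFFFFF) < previous_offset:
            # Hypothetical rule: chunk offsets should increase, so a smaller
            # 31-bit offset is treated as an overflow (EnCase 6.7.1 behavior).
            overflow = True
        if overflow:
            # As libewf assumes: the following chunks are uncompressed.
            offset, is_compressed = raw_entry, False
        else:
            offset = raw_entry & 0x7FFFFFFF
            is_compressed = bool(raw_entry & 0x80000000)
        decoded.append((offset, is_compressed))
        previous_offset = offset
    return decoded
```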

Multiple incomplete segment file set identifiers

Although rare, a set of EWF image files can change its segment file set identifier. This was seen in an image created by EnCase 6.13, presumably using remote imaging. The image contained 3 different segment file set identifiers. The first changed after an incomplete section. The second changed without any clear indication. The corresponding data section also changed to some extent, e.g. the compression method and media flags, with the “is physical” flag being dropped. The change was consistent across multiple segment files. It is unlikely that deliberate manipulation is involved. EnCase considers the image invalid.

With some tweaking the individual segment file sets could be read; however, in this case the data read from them was heavily corrupted. For now Keramics does not support reading multiple segment file sets from a single image, but this might change in the future.

AD encryption

As of version 2.8, FTK Imager supports “AD encryption”. Although the output file uses the EWF file extensions, it actually is an AES-256 encrypted container. The EWF can be encrypted using a pass-phrase or a certificate.

TODO: link to format definition

References

Expert Witness Compression Format version 2 (EWF2)

TODO: add description

Mac OS sparse bundle (.sparsebundle) format

The Mac OS sparse bundle (.sparsebundle) format is one of the disk image formats supported natively by Mac OS.

The sparse bundle disk image was introduced in Mac OS X 10.5.

Overview

A sparse bundle consists of a directory (bundle) with the .sparsebundle suffix containing:

  • “Info.bckup” file
  • “Info.plist” file
  • “token” file
  • “bands” directory containing the band files

Characteristics

CharacteristicsDescription
Byte orderN/A
Date and time valuesN/A
Character stringsN/A

Info.plist and Info.bckup files

The Info.plist and its backup (Info.bckup) contain an XML plist.

This plist is also referred to as “Information Property List” and contains a single dictionary with the following key-value pairs.

IdentifierValueDescription
CFBundleInfoDictionaryVersion"6.0"The information property list format version
band-sizeThe maximum size of a band file in bytes
bundle-backingstore-version1Unknown
diskimage-bundle-type"com.apple.diskimage.sparsebundle"The bundle type
sizeThe media size in bytes
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>CFBundleInfoDictionaryVersion</key>
    <string>6.0</string>
    <key>band-size</key>
    <integer>8388608</integer>
    <key>bundle-backingstore-version</key>
    <integer>1</integer>
    <key>diskimage-bundle-type</key>
    <string>com.apple.diskimage.sparsebundle</string>
    <key>size</key>
    <integer>4194304</integer>
</dict>
</plist>
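Python's standard `plistlib` module can read this plist. The sketch below extracts the two values a reader needs and checks the bundle type. The function name is hypothetical.

```python
import plistlib


def read_sparse_bundle_info(plist_data: bytes) -> tuple[int, int]:
    """Return (band size, media size) in bytes from Info.plist or Info.bckup data."""
    info = plistlib.loads(plist_data)
    if info.get("diskimage-bundle-type") != "com.apple.diskimage.sparsebundle":
        raise ValueError("unsupported disk image bundle type")
    return info["band-size"], info["size"]
```

For the example above, this returns (8388608, 4194304). The 4 MiB of media data fits in a single 8 MiB band.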

Token file

The token file is empty.

Bands directory

The bands directory contains the files with the actual data of the bands. Each file is named after its band index in hexadecimal, where “0” is the 1st band, “a” the 11th, “f” the 16th, “10” the 17th, etc.
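A sketch of mapping a media offset to a band file and reading media data across band boundaries follows. Treating missing band files, and data beyond the end of a short band file, as 0-byte values is an assumption that is not stated above. The function names are hypothetical.

```python
import os


def band_location(media_offset: int, band_size: int) -> tuple[str, int]:
    """Return the band file name and the offset within that band file."""
    band_index, offset_in_band = divmod(media_offset, band_size)
    return format(band_index, "x"), offset_in_band


def read_sparse_bundle(bundle_path: str, media_offset: int, size: int,
                       band_size: int, media_size: int) -> bytes:
    """Read media data from the band files in the "bands" directory."""
    result = bytearray()
    while size > 0 and media_offset < media_size:
        band_name, offset_in_band = band_location(media_offset, band_size)
        read_size = min(size, band_size - offset_in_band, media_size - media_offset)
        band_path = os.path.join(bundle_path, "bands", band_name)
        data = b""
        if os.path.exists(band_path):
            with open(band_path, "rb") as band_file:
                band_file.seek(offset_in_band)
                data = band_file.read(read_size)
        # Assumption: missing band files and data beyond the end of a band file
        # read as 0-byte values.
        result += data + bytes(read_size - len(data))
        media_offset += read_size
        size -= read_size
    return bytes(result)
```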

Mac OS sparse image (.sparseimage) format

The Mac OS sparse image (.sparseimage) format is one of the disk image formats supported natively by Mac OS.

Overview

A sparse disk image consists of:

  • header data
  • bands data

Characteristics

CharacteristicsDescription
Byte orderbig-endian
Date and time valuesN/A
Character stringsN/A

The number of bytes per sector is 512.

Header data

The header data is 4096 bytes in size and consists of:

  • file header
  • band numbers array
  • trailing data, which should be filled with 0-byte values

File header

The file header is 64 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | "sprs" | Signature
4 | 4 | | Unknown (format version?), seen 3
8 | 4 | | Number of sectors per band
12 | 4 | | Unknown, seen 1
16 | 4 | | The media data size in sectors
20 | 12 | 0 | Unknown (0-byte values)
32 | 4 | | Unknown
36 | 28 | 0 | Unknown (0-byte values)

Band numbers array

The band numbers array consists of:

  • one or more band numbers

Band number

A band number is 4 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | | Band number, where 0 indicates a sparse range and any other value refers to a location in the media data

Where the corresponding media offset can be calculated as following:

media_offset = (band_number - 1) * sectors_per_band * 512

The offset of band data can be calculated as following:

band_data_offset = 4096 + (array_index * sectors_per_band * 512)

For example if the first array entry contains a band number of 4, then the band data is located at offset 4096 and the corresponding media offset is: 3 * sectors_per_band * 512.
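The two formulas above can be combined into a lookup from media offset to file offset. The sketch below assumes big-endian values, per the characteristics table. Scanning every 32-bit slot up to the end of the 4096 bytes of header data is also an assumption; it is harmless because unused slots and the trailing data are 0. The function names are hypothetical.

```python
import struct
from typing import Optional

HEADER_DATA_SIZE = 4096
FILE_HEADER_SIZE = 64
BYTES_PER_SECTOR = 512


def parse_sparse_image_header(header_data: bytes) -> tuple[int, int, dict[int, int]]:
    """Return (band size, media size, {media band index: band data offset})."""
    signature, _, sectors_per_band, _, media_sectors = struct.unpack_from(
        ">4sIIII", header_data, 0)
    if signature != b"sprs":
        raise ValueError("unsupported signature")
    band_size = sectors_per_band * BYTES_PER_SECTOR
    # Assumption: scan every 32-bit slot up to the end of the header data.
    number_of_slots = (HEADER_DATA_SIZE - FILE_HEADER_SIZE) // 4
    band_numbers = struct.unpack_from(f">{number_of_slots}I", header_data, FILE_HEADER_SIZE)
    band_offsets = {}
    for array_index, band_number in enumerate(band_numbers):
        if band_number != 0:
            # The array index gives the location in the file, the band number
            # (minus 1) the location in the media data.
            band_offsets[band_number - 1] = HEADER_DATA_SIZE + array_index * band_size
    return band_size, media_sectors * BYTES_PER_SECTOR, band_offsets


def media_to_file_offset(media_offset: int, band_size: int,
                         band_offsets: dict[int, int]) -> Optional[int]:
    """Map a media offset to a file offset, or None for a sparse range."""
    band_index, offset_in_band = divmod(media_offset, band_size)
    band_data_offset = band_offsets.get(band_index)
    if band_data_offset is None:
        return None
    return band_data_offset + offset_in_band
```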

Parallels Disk Image (PDI) format

The Parallels Disk Image format is used by Parallels virtualization products as one of their image formats. It is used to store both hard disk images and snapshots.

Overview

A Parallels Disk Image consists of a directory, typically named “{NAME}.hdd” containing:

  • Descriptor file (DiskDescriptor.xml) and backup (DiskDescriptor.xml.Backup)
  • {NAME}.hdd file
  • Storage data file ({NAME}.hdd.0.{GUID}.hds)
  • {NAME}.hdd.drh

Where {NAME} is an arbitrary name and {GUID} is a unique identifier.

Disk types

The Parallels Disk Image format supports multiple disk types:

Identifier | Description
Expanding | Disk that consists of a single (dynamic size) sparse storage data file
Plain | Disk that consists of a single (fixed size) raw storage data file
Split | Disk that consists of one or more split storage data files, either expanding or plain, holding up to 2G of data

Characteristics

CharacteristicsDescription
Byte orderlittle-endian
Character stringsUTF-8 by default, the encoding is defined in the disk descriptor XML file

The number of bytes per sector is 512.

Descriptor file

The DiskDescriptor.xml and its backup (DiskDescriptor.xml.Backup) contain the “Parallels_disk_image” XML element that consists of the following values:

IdentifierDescription
Disk_ParametersThe disk parameters
StorageDataInformation about the storage data files
SnapshotsInformation about snapshots
<?xml version='1.0' encoding='UTF-8'?>
<Parallels_disk_image Version="1.0">
    <Disk_Parameters>
        <Disk_size>134217728</Disk_size>
        <Cylinders>262144</Cylinders>
        <PhysicalSectorSize>4096</PhysicalSectorSize>
        <LogicSectorSize>512</LogicSectorSize>
        <Heads>16</Heads>
        <Sectors>32</Sectors>
        <Padding>0</Padding>
        <Encryption>
            <Engine>{00000000-0000-0000-0000-000000000000}</Engine>
            <Data></Data>
        </Encryption>
        <UID>{GUID}</UID>
        <Name>{NAME}</Name>
        <Miscellaneous>
            <CompatLevel>level2</CompatLevel>
            <Bootable>1</Bootable>
            <ChangeState>0</ChangeState>
            <SuspendState>0</SuspendState>
        </Miscellaneous>
    </Disk_Parameters>
    <StorageData>
        <Storage>
            <Start>0</Start>
            <End>134217728</End>
            <Blocksize>2048</Blocksize>
            <Image>
                <GUID>{GUID}</GUID>
                <Type>Compressed</Type>
                <File>{NAME}.hdd.0.{GUID}.hds</File>
            </Image>
            ...
        </Storage>
        ...
    </StorageData>
    <Snapshots>
        <Shot>
            <GUID>{GUID}</GUID>
            <ParentGUID>{GUID}</ParentGUID>
        </Shot>
        ...
    </Snapshots>
</Parallels_disk_image>

Disk parameters

The disk parameters are stored in the “Disk_Parameters” XML element, which contains the following values.

IdentifierDescription
CylindersNumber of cylinders
Disk_sizeDisk size, in number of sectors
Encryption"Encryption" sub XML element
HeadsNumber of heads
Miscellaneous"Miscellaneous" sub XML element
NameName of the disk
LogicSectorSizeOptional logical sector size, which is 512 bytes by default
PaddingUnknown (padding)
PhysicalSectorSizeOptional physical sector size, which is 4096 bytes by default
SectorsNumber of sectors per cylinder
UIDUnknown (identifier)

Encryption

<Encryption>
    <Engine>{00000000-0000-0000-0000-000000000000}</Engine>
    <Data></Data>
    <Salt></Salt>
</Encryption>

Miscellaneous

<Miscellaneous>
    <CompatLevel>level2</CompatLevel>
    <Bootable>1</Bootable>
    <ChangeState>0</ChangeState>
    <SuspendState>0</SuspendState>
    <DupBlocksCnt>0</DupBlocksCnt>
    <CorruptBlocksCnt>0</CorruptBlocksCnt>
    <UnrefBlocksCnt>0</UnrefBlocksCnt>
    <OutOfDiskBlocksCnt>0</OutOfDiskBlocksCnt>
    <BatOverlapBlocksCnt>0</BatOverlapBlocksCnt>
    <BlocksCnt>0</BlocksCnt>
    <TruncatedBlocksCnt>0</TruncatedBlocksCnt>
    <ReferencedBlocksCnt>0</ReferencedBlocksCnt>
    <ShutdownState>0</ShutdownState>
    <GuestToolsVersion>17.1.1-51537</GuestToolsVersion>
</Miscellaneous>
CompatLevel

Seen: level0 and level2

Storage data

The “StorageData” XML element contains the following values.

IdentifierDescription
StorageOne or more "Storage" XML sub elements

Note that a split disk contains multiple “Storage” XML sub elements.

Storage

The “Storage” XML element contains the following values.

IdentifierDescription
StartStart sector number of the segment stored in the storage data file
EndEnd sector number of the segment stored in the storage data file
BlocksizeBlock size, in number of sectors
ImageOne or more "Image" sub XML elements
Image

The “Image” XML element contains the following values.

IdentifierDescription
GUIDIdentifier of snapshot (or layer)
TypeStorage data file type
FileName (or path) of the storage data file

Snapshots data

The “Snapshots” XML element contains the following values.

IdentifierDescription
ShotOne or more "Shot" sub XML elements

Shot

The “Shot” XML element contains the following values.

IdentifierDescription
GUIDIdentifier of snapshot (or layer)
ParentGUIDIdentifier of parent snapshot (or layer), which contains "{00000000-0000-0000-0000-000000000000}" if not set
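The elements described above can be read with Python's standard `xml.etree.ElementTree`. This sketch extracts the disk size in sectors, the storage segments with their images, and the snapshot parent relationships. It assumes the element names shown in the example descriptor. Missing required elements are not handled gracefully. The type and function names are hypothetical.

```python
import xml.etree.ElementTree as ElementTree
from typing import NamedTuple


class StorageImage(NamedTuple):
    guid: str
    image_type: str   # "Compressed" (sparse) or "Plain" (raw)
    file_name: str


class Storage(NamedTuple):
    start_sector: int
    end_sector: int
    block_size: int   # In number of sectors
    images: list[StorageImage]


def parse_disk_descriptor(xml_data: bytes) -> tuple[int, list[Storage], dict[str, str]]:
    """Return (disk size in sectors, storages, {snapshot GUID: parent GUID})."""
    root = ElementTree.fromstring(xml_data)
    if root.tag != "Parallels_disk_image":
        raise ValueError("unsupported disk descriptor")
    disk_size = int(root.findtext("Disk_Parameters/Disk_size"))
    storages = []
    for storage in root.iterfind("StorageData/Storage"):
        images = [StorageImage(image.findtext("GUID"), image.findtext("Type"),
                               image.findtext("File"))
                  for image in storage.iterfind("Image")]
        storages.append(Storage(int(storage.findtext("Start")),
                                int(storage.findtext("End")),
                                int(storage.findtext("Blocksize")), images))
    # A ParentGUID of all zeros means the snapshot has no parent.
    parents = {shot.findtext("GUID"): shot.findtext("ParentGUID")
               for shot in root.iterfind("Snapshots/Shot")}
    return disk_size, storages, parents
```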

Storage data file

Storage data file types

ValueDescription
"Compressed"Sparse storage data file
"Plain"Raw storage data file

Raw storage data file

The raw (or plain) storage data file contains the disk image data including free space.

Sparse storage data file

The sparse storage data file contains the actual disk image data without free space.

A sparse storage data file consists of:

  • file header
  • block allocation table (BAT)
  • data blocks

Sparse storage data file header

The sparse storage data file header is 64 bytes in size and consists of:

Offset | Size | Value | Description
0 | 16 | "WithoutFreeSpace" or "WithouFreSpacExt" | Signature
16 | 4 | 2 | Format version
20 | 4 | | Number of heads
24 | 4 | | Number of cylinders
28 | 4 | | Block size (or number of tracks) in number of sectors
32 | 4 | | Number of blocks, which is equivalent to the number of block allocation table entries
36 | 8 | | Number of sectors
44 | 4 | | Unknown (Creator?), seen: "\x00\x00\x00\x00", "pd17", "pd22"
48 | 4 | | Data start sector number, which is relative to the start of the sparse storage data file
52 | 4 | | Unknown (Flags?)
56 | 8 | | Unknown (Features start sector?)

Block allocation table (BAT)

The block allocation table consists of 32-bit entries. Each entry contains the sector number where the data block starts, or 0 if the block is sparse or stored in the parent disk image.

For example, block allocation table entry 0 corresponds to disk image offset 0. If it contains a value of 0x800, the corresponding data block is stored at file offset 0x100000 (0x800 x 512).
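A sketch of parsing the sparse storage data file header and translating a disk sector through the block allocation table follows. It assumes little-endian values, per the characteristics table, and a BAT that directly follows the 64-byte header. It applies the sector-number interpretation described above to both signatures. This sketch does not cover whether the "WithouFreSpacExt" variant uses different units. The function names are hypothetical.

```python
import struct
from typing import Optional

SPARSE_SIGNATURES = (b"WithoutFreeSpace", b"WithouFreSpacExt")
BYTES_PER_SECTOR = 512


def parse_sparse_storage_data_file(data: bytes) -> tuple[int, int, tuple[int, ...]]:
    """Return (block size in sectors, number of sectors, BAT entries)."""
    (signature, _format_version, _heads, _cylinders, block_size, number_of_blocks,
     number_of_sectors, _creator, _data_start_sector, _flags,
     _features_start_sector) = struct.unpack_from("<16sIIIIIQ4sIIQ", data, 0)
    if signature not in SPARSE_SIGNATURES:
        raise ValueError("unsupported signature")
    # Assumption: the block allocation table directly follows the 64-byte header.
    bat = struct.unpack_from(f"<{number_of_blocks}I", data, 64)
    return block_size, number_of_sectors, bat


def sector_to_file_offset(sector: int, block_size: int,
                          bat: tuple[int, ...]) -> Optional[int]:
    """Map a disk sector to an offset in the storage data file, or None if sparse."""
    block_index, sector_in_block = divmod(sector, block_size)
    block_start_sector = bat[block_index]
    if block_start_sector == 0:
        # Sparse, or stored in the parent disk image.
        return None
    return (block_start_sector + sector_in_block) * BYTES_PER_SECTOR
```

With a BAT entry 0 of 0x800, sector 0 maps to file offset 0x100000, matching the example above.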

QEMU Copy-On-Write (QCOW) image file format

The QEMU Copy-On-Write (QCOW) image file format is used by the QEMU Open Source Process Emulator to store disk images (storage media).

Overview

A QCOW image file consists of:

  • the file header
    • optional file header extensions
  • the level 1 table (cluster aligned)
  • the reference count table (cluster aligned)
  • reference count blocks
  • snapshot headers (8-byte aligned on cluster boundary)
  • clusters containing:
    • level 2 tables
    • storage media data

The storage media data is stored in clusters. The cluster size is a multiple of 512 bytes. The level 1 (L1) table contains references to level 2 (L2) tables. The level 2 tables contain references to the storage media clusters.

There are multiple versions of the QCOW image file format. QCOW (version 1) and QCOW2 (version 2 and later) are sometimes even considered separate image formats. Version 3 is considered an extended version of QCOW2.

Characteristics

CharacteristicsDescription
Byte orderbig-endian in most cases, note that some values are in little-endian
Date and time valuesNumber of seconds since Jan 1, 1970 00:00:00 UTC (POSIX epoch)
Character stringsUTF-8

Note that this document assumes that character strings are stored in UTF-8.

The number of bytes per sector is 512.

Encryption

The QCOW image format can encrypt the media data stored in the image. Currently supported encryption methods are:

  • AES-CBC 128-bit
  • Linux Unified Key Setup (LUKS)

If no encryption is used the encryption method in the file header is set to none (0).

Note that it is currently unknown if the format supports compression and encryption at the same time. It does not appear to be supported by qemu-img.

AES-CBC 128-bit

Both encryption and decryption use:

  • AES-CBC with a 128-bit key, applied to the sector data

The key is a direct copy of the first 16 characters of a user-provided (narrow character) password. If the password is shorter than 16 characters, the remaining key data is set to 0-byte values.

Note that it is currently unclear which character sets are allowed and how characters outside the 7-bit ASCII set should be handled.

The initialization vector of the AES-CBC consists of the media data sector number (relative to the start of the disk), stored in little-endian format as the first 64 bits of the 128-bit initialization vector. The remaining initialization vector data is set to 0-byte values. The first sector number is 0 and the number of bytes per sector is 512.
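The key and initialization vector derivation described above can be sketched with the standard library alone. The password is taken as bytes because the character set handling is unclear. The actual AES-128-CBC decryption of each 512-byte sector is not shown. It would use a cryptography library, for example `Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()` from the `cryptography` package.

```python
import struct


def qcow_aes_key(password: bytes) -> bytes:
    """Derive the 128-bit AES key: the first 16 password bytes, 0-byte padded."""
    # Assumption: the password is already encoded as narrow (8-bit) characters.
    return password[:16].ljust(16, b"\x00")


def qcow_aes_iv(sector_number: int) -> bytes:
    """Build the IV: 64-bit little-endian sector number followed by 8 0-byte values."""
    return struct.pack("<Q", sector_number) + bytes(8)
```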

Linux Unified Key Setup (LUKS)

TODO: complete section

File header

File header – version 1

The file header - version 1 is 48 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | "QFI\xfb" or "\x51\x46\x49\xfb" | The signature
4 | 4 | 1 | Format version
8 | 8 | | Backing file name offset
16 | 4 | | Backing file name size
20 | 4 | | Modification date and time, which contains a POSIX timestamp
24 | 8 | | Storage media size
32 | 1 | | Number of cluster block bits
33 | 1 | | Number of level 2 table bits
34 | 2 | | Unknown (empty values)
36 | 4 | | Encryption method
40 | 8 | | Level 1 table offset

The cluster block size is calculated as:

cluster_block_size = 1 << number_of_cluster_block_bits

The level 2 table size is calculated as:

level2_table_size = (1 << number_of_level2_table_bits) * 8

The level 1 table size is calculated as:

level1_table_entry_size = cluster_block_size * (1 << number_of_level2_table_bits)

level1_table_size = media_size // level1_table_entry_size
if media_size % level1_table_entry_size != 0:
    level1_table_size += 1

level1_table_size *= 8

The backing file name is set in snapshot image files and is normally stored after the file header.
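The following sketch parses a version 1 file header and derives the table sizes with the formulas above. It uses ceiling division for the number of level 1 entries. Each level 1 entry covers `level1_table_entry_size` bytes, so the same bit counts also split a media offset into a level 1 index, a level 2 index and an offset within the cluster. The function names are hypothetical.

```python
import struct

QCOW_SIGNATURE = b"QFI\xfb"


def parse_qcow1_header(data: bytes) -> dict[str, int]:
    """Parse a 48-byte version 1 file header and derive the table sizes."""
    (signature, format_version, _backing_file_name_offset, _backing_file_name_size,
     _modification_time, media_size, cluster_block_bits, level2_table_bits, _unknown,
     encryption_method, level1_table_offset) = struct.unpack_from(
        ">4sIQIIQBBHIQ", data, 0)
    if signature != QCOW_SIGNATURE or format_version != 1:
        raise ValueError("not a QCOW version 1 file header")
    cluster_block_size = 1 << cluster_block_bits
    level1_table_entry_size = cluster_block_size * (1 << level2_table_bits)
    # Ceiling division: a partially used last entry still needs a reference.
    number_of_level1_entries = -(-media_size // level1_table_entry_size)
    return {
        "cluster_block_size": cluster_block_size,
        "level2_table_size": (1 << level2_table_bits) * 8,
        "level1_table_size": number_of_level1_entries * 8,
        "level1_table_offset": level1_table_offset,
        "encryption_method": encryption_method,
    }


def qcow1_table_indexes(media_offset: int, cluster_block_bits: int,
                        level2_table_bits: int) -> tuple[int, int, int]:
    """Split a media offset into level 1 index, level 2 index and cluster offset."""
    level1_index = media_offset >> (cluster_block_bits + level2_table_bits)
    level2_index = (media_offset >> cluster_block_bits) & ((1 << level2_table_bits) - 1)
    return level1_index, level2_index, media_offset & ((1 << cluster_block_bits) - 1)
```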

File header – version 2

The file header - version 2 is 72 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | "QFI\xfb" or "\x51\x46\x49\xfb" | The signature
4 | 4 | 2 | Format version
8 | 8 | | Backing file name offset
16 | 4 | | Backing file name size
20 | 4 | | Number of cluster block bits
24 | 8 | | Storage media size
32 | 4 | | Encryption method
36 | 4 | | Number of level 1 table references
40 | 8 | | Level 1 table offset
48 | 8 | | Reference count table offset
56 | 4 | | Reference count table clusters
60 | 4 | | Number of snapshots
64 | 8 | | Snapshots offset

The cluster block size is calculated as:

cluster_block_size = 1 << number_of_cluster_block_bits

The number of level 2 table bits is calculated as:

number_of_level2_table_bits = number_of_cluster_block_bits - 3

The level 2 table size is calculated as:

level2_table_size = (1 << number_of_level2_table_bits) * 8

The level 1 table size is calculated as:

level1_table_size = number_of_level1_table_references * 8

The backing file name is set in snapshot image files and is normally stored after the file header.
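A level 2 table entry is 8 (2^3) bytes, so subtracting 3 from the number of cluster block bits produces a level 2 table that fills exactly one cluster. The sketch below parses the first 72 bytes of a version 2 or 3 file header and derives the sizes above. It assumes that the backing file name lies within the data passed in, and it decodes the name as UTF-8. The function name is hypothetical.

```python
import struct

QCOW_SIGNATURE = b"QFI\xfb"


def parse_qcow2_header(data: bytes) -> dict[str, object]:
    """Parse the first 72 bytes of a version 2 (or 3) file header."""
    (signature, format_version, backing_file_name_offset, backing_file_name_size,
     cluster_block_bits, media_size, encryption_method, number_of_level1_references,
     level1_table_offset, _reference_count_table_offset,
     _reference_count_table_clusters, number_of_snapshots,
     _snapshots_offset) = struct.unpack_from(">4sIQIIQIIQQIIQ", data, 0)
    if signature != QCOW_SIGNATURE or format_version not in (2, 3):
        raise ValueError("not a QCOW version 2 or 3 file header")
    level2_table_bits = cluster_block_bits - 3
    backing_file_name = None
    if backing_file_name_offset != 0:
        # Assumption: the backing file name is within the data passed in.
        end_offset = backing_file_name_offset + backing_file_name_size
        backing_file_name = data[backing_file_name_offset:end_offset].decode("utf-8")
    return {
        "format_version": format_version,
        "media_size": media_size,
        "cluster_block_size": 1 << cluster_block_bits,
        "level2_table_size": (1 << level2_table_bits) * 8,
        "level1_table_size": number_of_level1_references * 8,
        "level1_table_offset": level1_table_offset,
        "encryption_method": encryption_method,
        "number_of_snapshots": number_of_snapshots,
        "backing_file_name": backing_file_name,
    }
```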

File header – version 3

The file header - version 3 is 104 or 112 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | "QFI\xfb" or "\x51\x46\x49\xfb" | The signature
4 | 4 | 3 | Format version
8 | 8 | | Backing file name offset
16 | 4 | | Backing file name size
20 | 4 | | Number of cluster block bits
24 | 8 | | Storage media size
32 | 4 | | Encryption method
36 | 4 | | Number of level 1 table references
40 | 8 | | Level 1 table offset
48 | 8 | | Reference count table offset
56 | 4 | | Reference count table clusters
60 | 4 | | Number of snapshots
64 | 8 | | Snapshots offset
72 | 8 | | Incompatible feature flags
80 | 8 | | Compatible feature flags
88 | 8 | | Auto-clear feature flags
96 | 4 | | Reference count order
100 | 4 | 104 or 112 | File header size, which contains the size of the file header, this value does not include the size of the file header extensions
If file header size equals 112:
104 | 1 | | Compression method
105 | 7 | | Unknown (padding)

The cluster block size is calculated as:

cluster_block_size = 1 << number_of_cluster_block_bits

The number of level 2 table bits is calculated as:

number_of_level2_table_bits = number_of_cluster_block_bits - 3

The level 2 table size is calculated as:

level2_table_size = (1 << number_of_level2_table_bits) * 8

The level 1 table size is calculated as:

level1_table_size = number_of_level1_table_references * 8

The backing file name is set in snapshot image files and is normally stored after the file header.
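The fields that version 3 adds after the first 72 bytes can be read as in the following Python sketch. The function name is hypothetical, the sketch only accepts the two file header sizes listed above, and it assumes ZLIB (0) when no compression method byte is present.

```python
import struct


def parse_qcow_v3_file_header_fields(data: bytes) -> dict:
    """Read the version 3 fields that follow the first 72 bytes of the file header."""
    (incompatible_feature_flags, compatible_feature_flags,
     auto_clear_feature_flags, reference_count_order,
     file_header_size) = struct.unpack_from(">QQQII", data, 72)
    # Only the two sizes described above are handled by this sketch.
    if file_header_size not in (104, 112):
        raise ValueError(f"unsupported file header size: {file_header_size}")
    # The compression method byte is only present in a 112-byte file header.
    compression_method = data[104] if file_header_size == 112 else 0
    return {
        "incompatible_feature_flags": incompatible_feature_flags,
        "compatible_feature_flags": compatible_feature_flags,
        "auto_clear_feature_flags": auto_clear_feature_flags,
        "reference_count_order": reference_count_order,
        "file_header_size": file_header_size,
        "compression_method": compression_method,
        # Assumed: file header extensions start directly after the file header.
        "extensions_offset": file_header_size,
    }
```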

Encryption methods

| Value | Identifier | Description |
| --- | --- | --- |
| 0 | QCOW_CRYPT_NONE | No encryption |
| 1 | QCOW_CRYPT_AES | 128-bit AES-CBC encryption |
| 2 | QCOW_CRYPT_LUKS | Linux Unified Key Setup (LUKS) encryption |

Incompatible feature flags

| Value | Identifier | Description |
| --- | --- | --- |
| 0x00000001 | QCOW2_INCOMPAT_DIRTY | |
| 0x00000002 | QCOW2_INCOMPAT_CORRUPT | |
| 0x00000004 | QCOW2_INCOMPAT_DATA_FILE | |
| 0x00000008 | QCOW2_INCOMPAT_COMPRESSION | |
| 0x00000010 | QCOW2_INCOMPAT_EXTL2 | |

Compatible feature flags

| Value | Identifier | Description |
| --- | --- | --- |
| 0x00000001 | QCOW2_COMPAT_LAZY_REFCOUNTS | |

Auto-clear feature flags

| Value | Identifier | Description |
| --- | --- | --- |
| 0x00000001 | QCOW2_AUTOCLEAR_BITMAPS | |
| 0x00000002 | QCOW2_AUTOCLEAR_DATA_FILE_RAW | |

Compression methods

| Value | Identifier | Description |
| --- | --- | --- |
| 0 | | ZLIB compression |

File header extensions

A file header extension consists of:

  • file header extension header
  • file header extension data

File header extension header

The file header extension header is 8 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | The extension type (signature) |
| 4 | 4 | | The extension data size |

File header extension types

| Value | Identifier | Description |
| --- | --- | --- |
| 0x0537be77 | QCOW2_EXT_MAGIC_CRYPTO_HEADER | Crypto header |
| 0x23852875 | QCOW2_EXT_MAGIC_BITMAPS | Bitmaps |
| 0x44415441 or "DATA" | QCOW2_EXT_MAGIC_DATA_FILE | Data-file |
| 0x6803f857 | QCOW2_EXT_MAGIC_FEATURE_TABLE | Feature table |
| 0xe2792aca | QCOW2_EXT_MAGIC_BACKING_FORMAT | Backing format |
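The file header extensions can be walked as a sequence of type and size pairs. This Python sketch relies on two details taken from the QEMU qcow2 specification rather than from this document, so treat them as assumptions: an extension type of 0 marks the end of the extensions, and the extension data is padded to a multiple of 8 bytes. The function name is hypothetical.

```python
import struct

EXTENSION_TYPES = {
    0x0537BE77: "crypto header",
    0x23852875: "bitmaps",
    0x44415441: "data-file",
    0x6803F857: "feature table",
    0xE2792ACA: "backing format",
}


def read_file_header_extensions(data: bytes, offset: int) -> list:
    """Return (type name, extension data) pairs starting at offset."""
    extensions = []
    while offset + 8 <= len(data):
        extension_type, extension_data_size = struct.unpack_from(">II", data, offset)
        if extension_type == 0:
            break  # assumed end-of-extensions marker
        data_start = offset + 8
        extension_data = data[data_start:data_start + extension_data_size]
        name = EXTENSION_TYPES.get(extension_type, f"unknown 0x{extension_type:08x}")
        extensions.append((name, extension_data))
        # Assumed: extension data is padded to a multiple of 8 bytes.
        offset = data_start + ((extension_data_size + 7) & ~7)
    return extensions
```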

Backing format file header extension

The backing format file header extension data is of variable size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | ... | | Backing format identifier, which contains a UTF-8 string without an end-of-string character |

Bitmaps file header extension

TODO: complete section

Crypto header file header extension

The crypto header file header extension data is 16 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | | The crypto data offset |
| 8 | 8 | | The crypto data size |

Data-file file header extension

The data-file file header extension data is of variable size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | ... | | Data-file file name, which contains a UTF-8 string without an end-of-string character |

Feature table file header extension

TODO: complete section

Level 1 table

The level 1 table contains level 2 table references.

A reference value of 0 indicates that the entry is unused (unallocated); the corresponding data is considered sparse or stored in the backing file, if any.

Level 2 table reference – version 1

The level 2 table reference is 8 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0.0 | 63 bits | | Level 2 table offset, which contains an offset relative to the start of the file |
| 7.7 | 1 bit | QCOW_OFLAG_COMPRESSED | Is compressed flag |

Level 2 table reference – version 2 or 3

The level 2 table reference is 8 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0.0 | 62 bits | | Level 2 table offset, which contains an offset relative to the start of the file |
| 7.6 | 1 bit | QCOW_OFLAG_COMPRESSED | Is compressed flag |
| 7.7 | 1 bit | QCOW_OFLAG_COPIED | Is copied flag |

The is copied flag indicates that the reference count of the corresponding level 2 table is exactly one.

Level 2 table

The level 2 table contains cluster block references.

The level 2 table size is calculated as:

level2_table_size = (1 << number_of_level2_table_bits) * 8

A reference value of 0 indicates that the entry is unused (unallocated); the corresponding data is considered sparse or stored in the backing file, if any.

Cluster block reference – version 1

The cluster block reference - version 1 is 8 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0.0 | 63 bits | | Cluster block offset, which contains an offset relative to the start of the file |
| 7.7 | 1 bit | QCOW_OFLAG_COMPRESSED | Is compressed flag |

Cluster block reference – version 2 or 3

The cluster block reference - version 2 or 3 is 8 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0.0 | 62 bits | | Cluster block offset, which contains an offset relative to the start of the file |
| 7.6 | 1 bit | QCOW_OFLAG_COMPRESSED | Is compressed flag |
| 7.7 | 1 bit | QCOW_OFLAG_COPIED | Is copied flag |

The is copied flag indicates that the reference count of the corresponding cluster block is exactly one.
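The reference layouts above can be decoded with a few bit operations. This is a minimal Python sketch that assumes the reference has already been read as a big-endian 64-bit integer. It applies to both level 2 table references and cluster block references, and the function name is hypothetical.

```python
QCOW_OFLAG_COMPRESSED_V1 = 1 << 63
QCOW_OFLAG_COMPRESSED_V2 = 1 << 62
QCOW_OFLAG_COPIED_V2 = 1 << 63


def decode_reference(reference: int, format_version: int) -> tuple:
    """Split an 8-byte reference into (offset, is_compressed, is_copied)."""
    if format_version == 1:
        # Bits 0-62 contain the offset, bit 63 the is compressed flag.
        return (reference & 0x7FFFFFFFFFFFFFFF,
                bool(reference & QCOW_OFLAG_COMPRESSED_V1), False)
    # Versions 2 and 3: bits 0-61 offset, bit 62 compressed, bit 63 copied.
    return (reference & 0x3FFFFFFFFFFFFFFF,
            bool(reference & QCOW_OFLAG_COMPRESSED_V2),
            bool(reference & QCOW_OFLAG_COPIED_V2))
```

For a compressed cluster block reference, the returned offset still contains the compressed size bits, which are split off as described under the compressed cluster data block.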

Reference count table

The cluster data blocks are reference counted. For every cluster data block, a 16-bit reference count is stored in the reference count table.

The reference count table is stored in cluster-sized blocks. The file header contains the number of these blocks (reference count table clusters).

TODO: complete section

Notes

reference count cluster block offset = cluster data block offset /
reference count table offset = cluster data block /

In order to obtain the reference count of a given cluster, you split the cluster offset into a refcount table offset and a refcount block offset.

Since a refcount block is a single cluster of 2-byte entries, the lower number_of_cluster_block_bits - 1 bits of the cluster number (the cluster offset shifted right by number_of_cluster_block_bits) are used as the block offset and the rest of the bits are used as the table offset.
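A Python sketch of this split, assuming 16-bit reference counts so that a reference count block holds 1 << (number_of_cluster_block_bits - 1) entries. The function name is hypothetical. It returns the index into the reference count table and the index of the 16-bit entry within the reference count block.

```python
def reference_count_indexes(offset: int, number_of_cluster_block_bits: int) -> tuple:
    """Return (reference count table index, reference count block index)."""
    # Assumption: 2-byte entries, so a block holds 1 << (bits - 1) entries.
    number_of_block_entry_bits = number_of_cluster_block_bits - 1
    cluster_number = offset >> number_of_cluster_block_bits
    table_index = cluster_number >> number_of_block_entry_bits
    block_index = cluster_number & ((1 << number_of_block_entry_bits) - 1)
    return table_index, block_index
```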

One optimization is that if any cluster pointed to by an L1 or L2 table entry has a refcount exactly equal to one, the most significant bit of the L1/L2 entry is set as a "copied" flag. This indicates that no snapshots are using this cluster and it can be immediately written to without having to make a copy for any snapshots referencing it.

Cluster data block

To retrieve the cluster data block that corresponds to a specific storage media offset:

Determine the level 1 table index from the offset:

level1_table_index_bit_shift = number_of_cluster_block_bits + number_of_level2_table_bits

For version 1:

level1_table_index = (offset & 0x7fffffffffffffff) >> level1_table_index_bit_shift

For version 2 and 3:

level1_table_index = (offset & 0x3fffffffffffffff) >> level1_table_index_bit_shift

Retrieve the level 2 table offset from the level 1 table. If the level 2 table offset is 0 and the image has a backing file, the cluster data block is stored in the backing file; otherwise the cluster block is considered sparse.

Read the corresponding level 2 table.

Determine the level 2 table index from the offset:

level2_table_index_bit_mask = ~(0xffffffffffffffff << number_of_level2_table_bits)
level2_table_index = (offset >> number_of_cluster_block_bits) & level2_table_index_bit_mask

Retrieve the cluster block offset from the level 2 table. If the cluster block offset is 0 and the image has a backing file, the cluster data block is stored in the backing file; otherwise the cluster block is considered sparse.
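The two index calculations can be sketched in Python as follows. For 64-bit values, `~(0xffffffffffffffff << n)` is the same mask as `(1 << n) - 1`, and the sketch uses the latter form to avoid relying on unbounded integer semantics. The function name is hypothetical.

```python
def level_table_indexes(offset: int, number_of_cluster_block_bits: int,
                        number_of_level2_table_bits: int,
                        format_version: int) -> tuple:
    """Return (level 1 table index, level 2 table index) for a media offset."""
    if format_version == 1:
        offset_mask = 0x7FFFFFFFFFFFFFFF
    else:
        offset_mask = 0x3FFFFFFFFFFFFFFF
    level1_table_index_bit_shift = (number_of_cluster_block_bits
                                    + number_of_level2_table_bits)
    level1_table_index = (offset & offset_mask) >> level1_table_index_bit_shift
    # Keep only the level 2 index bits that sit above the in-cluster offset.
    level2_table_index_bit_mask = (1 << number_of_level2_table_bits) - 1
    level2_table_index = ((offset >> number_of_cluster_block_bits)
                          & level2_table_index_bit_mask)
    return level1_table_index, level2_table_index
```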

Uncompressed cluster data block

If the is compressed flag (QCOW_OFLAG_COMPRESSED) is not set:

cluster_block_bit_mask = ~(0xffffffffffffffff << number_of_cluster_block_bits)
cluster_block_data_offset = (offset & cluster_block_bit_mask) + cluster_block_offset

Note that in version 2 or 3 the last cluster block in the file can be smaller than the cluster block size defined by the number of cluster block bits in the file header. This does not seem to be the case for version 1.
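A Python sketch of the uncompressed case. It assumes, as the reference layouts above suggest, that the flag bits must be stripped from the cluster block reference before it is used as a file offset. The function name is hypothetical.

```python
def uncompressed_data_offset(offset: int, cluster_block_reference: int,
                             number_of_cluster_block_bits: int,
                             format_version: int) -> int:
    """Return the file offset of the byte at media offset `offset`."""
    # Strip the flag bits (assumed necessary, e.g. for the is copied flag).
    if format_version == 1:
        cluster_block_offset = cluster_block_reference & 0x7FFFFFFFFFFFFFFF
    else:
        cluster_block_offset = cluster_block_reference & 0x3FFFFFFFFFFFFFFF
    cluster_block_bit_mask = (1 << number_of_cluster_block_bits) - 1
    return (offset & cluster_block_bit_mask) + cluster_block_offset
```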

Compressed cluster data block

If the is compressed flag (QCOW_OFLAG_COMPRESSED) is set, the cluster block data is stored using the compression method defined in the file header, or DEFLATE by default.

Multiple compressed cluster data blocks are stored together in cluster-sized blocks. The compressed cluster data blocks are sector (512 bytes) aligned.

The compressed data uses a DEFLATE (inflate) window bits value of -12. The negative value indicates a raw DEFLATE stream without a zlib header or checksum.
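In Python, the standard `zlib` module accepts a negative `wbits` value for raw DEFLATE data. This is a minimal sketch that assumes the caller has already read the compressed bytes and knows the cluster block size. The function name is hypothetical.

```python
import zlib


def decompress_cluster_block(compressed_data: bytes, cluster_block_size: int) -> bytes:
    """Inflate one compressed cluster block (raw DEFLATE, window bits -12)."""
    decompressor = zlib.decompressobj(wbits=-12)
    # Limit the output to one cluster block; padding bytes after the end
    # of the DEFLATE stream are left unconsumed.
    return decompressor.decompress(compressed_data, cluster_block_size)
```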

Compressed chunk data block – version 1

compressed_size_bit_shift = 63 - number_of_cluster_block_bits
compressed_block_size = (
    (cluster_block_offset & 0x7fffffffffffffff) >> compressed_size_bit_shift)
cluster_block_offset &= ~(0xffffffffffffffff << compressed_size_bit_shift)
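The version 1 calculation can be expressed as a Python sketch, assuming `cluster_block_offset` holds the raw 64-bit cluster block reference with the is compressed flag set. The function name is hypothetical. It returns the file offset of the compressed data and the compressed size stored in the reference.

```python
def decode_compressed_reference_v1(cluster_block_offset: int,
                                   number_of_cluster_block_bits: int) -> tuple:
    """Return (compressed block offset, compressed block size) for version 1."""
    compressed_size_bit_shift = 63 - number_of_cluster_block_bits
    # The size occupies the bits between the offset and the is compressed flag.
    compressed_block_size = ((cluster_block_offset & 0x7FFFFFFFFFFFFFFF)
                             >> compressed_size_bit_shift)
    compressed_block_offset = (cluster_block_offset
                               & ((1 << compressed_size_bit_shift) - 1))
    return compressed_block_offset, compressed_block_size
```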

Compressed chunk data block – version 2 or 3

compressed_size_bit_shift = 62 - (number_of_cluster_block_bits - 8)

According to “the QCOW2 Image Format” the compressed block size is calculated as follows:

compressed_block_size = (
    (((cluster_block_offset & 0x3fffffffffffffff) >> compressed_size_bit_shift) + 1) * 512)

Since the compressed block size is stored in 512-byte sectors, this value does not contain the exact byte size of the compressed cluster block data. It sometimes lacks the size of the last partially filled sector, so one sector should be added if possible within the bounds of the cluster block size and the file size.

cluster_block_offset &= ~(0xffffffffffffffff << compressed_size_bit_shift)
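A Python sketch of the version 2 or 3 calculation. It includes one possible reading of the extra-sector adjustment described above: one sector is added, and the result is clamped to the cluster block size and to the bytes remaining in the file. The clamping and the function name are assumptions.

```python
def decode_compressed_reference_v2(cluster_block_offset: int,
                                   number_of_cluster_block_bits: int,
                                   file_size: int) -> tuple:
    """Return (compressed block offset, number of bytes to read)."""
    compressed_size_bit_shift = 62 - (number_of_cluster_block_bits - 8)
    # The stored value is the number of 512-byte sectors minus one.
    number_of_sectors = ((cluster_block_offset & 0x3FFFFFFFFFFFFFFF)
                         >> compressed_size_bit_shift) + 1
    compressed_block_size = number_of_sectors * 512
    compressed_block_offset = (cluster_block_offset
                               & ((1 << compressed_size_bit_shift) - 1))
    # Add one sector for a possibly missing partial sector, within bounds.
    read_size = min(compressed_block_size + 512,
                    1 << number_of_cluster_block_bits,
                    file_size - compressed_block_offset)
    return compressed_block_offset, read_size
```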

Snapshots

As of version 1, QCOW can use the backing file name in the file header to point to a backing file (or parent image) that contains the snapshot image, where the current image only contains the modifications. Version 2 adds support for storing snapshots inside the image.

Snapshot header - version 2 or 3

An in-image snapshot is created by adding a snapshot header, copying the L1 table and incrementing the reference counts of all L2 tables and data clusters referenced by the L1 table.

The snapshot header is of variable size and consists of:

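| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | | Level 1 table offset |
| 8 | 4 | | Level 1 size, which contains the number of level 1 table entries |
| 12 | 2 | | Identifier string size |
| 14 | 2 | | Name size |
| 16 | 4 | | Date in seconds |
| 20 | 4 | | Date in nanoseconds |
| 24 | 8 | | VM clock in nanoseconds |
| 32 | 4 | | VM state size |
| 36 | 4 | | Extra data size |
| 40 | ... | | Extra data |
| ... | ... | | Identifier string |
| ... | ... | | Name |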
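A Python sketch that reads one snapshot header. It assumes big-endian integers as elsewhere in QCOW and decodes the identifier string and name as UTF-8, which is an assumption about their encoding. The returned header size does not account for any padding between consecutive snapshot headers, which this section does not describe. The function name is hypothetical.

```python
import struct

# Fixed-size part of the snapshot header: 40 bytes, big-endian.
SNAPSHOT_HEADER = struct.Struct(">QIHHIIQII")


def parse_snapshot_header(data: bytes, offset: int = 0) -> dict:
    (level1_table_offset, number_of_level1_entries, identifier_size,
     name_size, date_seconds, date_nanoseconds, vm_clock_nanoseconds,
     vm_state_size, extra_data_size) = SNAPSHOT_HEADER.unpack_from(data, offset)
    position = offset + SNAPSHOT_HEADER.size
    extra_data = data[position:position + extra_data_size]
    position += extra_data_size
    identifier = data[position:position + identifier_size].decode("utf-8")
    position += identifier_size
    name = data[position:position + name_size].decode("utf-8")
    position += name_size
    return {
        "level1_table_offset": level1_table_offset,
        "number_of_level1_entries": number_of_level1_entries,
        "date": (date_seconds, date_nanoseconds),
        "vm_clock_nanoseconds": vm_clock_nanoseconds,
        "vm_state_size": vm_state_size,
        "extra_data": extra_data,
        "identifier": identifier,
        "name": name,
        "header_size": position - offset,
    }
```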

TODO: complete section

References

Universal Disk Image Format (UDIF)

The Universal Disk Image Format (UDIF) (.dmg) is one of the disk image formats supported natively by Mac OS.

Overview

Known UDIF image types are:

| Identifier | Description |
| --- | --- |
| UDBZ | bzip2 compressed UDIF |
| UDCO | Apple Data Compression (ADC) compressed UDIF |
| UDIF | Read-write uncompressed UDIF |
| UDRO | Read-only uncompressed UDIF |
| UDxx | Uncompressed UDIF |
| UDZO | zlib/DEFLATE compressed UDIF |
| ULFO | LZFSE compressed UDIF |
| ULMO | LZMA compressed UDIF |

UDIF images are either uncompressed or compressed.

Uncompressed image format

An uncompressed UDIF image consists of:

  • data
  • optional file footer

Note that an uncompressed UDIF image without file footer is equivalent to a RAW storage media image (CRawDiskImage).

Compressed image format

A compressed UDIF image consists of:

  • Data fork
  • Optional resource fork
  • Optional XML plist
  • File footer at the end of the image file

Characteristics

| Characteristics | Description |
| --- | --- |
| Byte order | big-endian |
| Date and time values | N/A |
| Character strings | N/A |

The number of bytes per sector is 512.

The file footer (also known as resource file or metadata) is 512 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | "koly" | Signature |
| 4 | 4 | 4 | Format version |
| 8 | 4 | 512 | File footer size in bytes |
| 12 | 4 | | Image flags |
| 16 | 8 | | Unknown (RunningDataForkOffset) |
| 24 | 8 | | Data fork offset, where the offset is relative to the start of the image file |
| 32 | 8 | | Data fork size |
| 40 | 8 | | Resource fork offset, where the offset is relative to the start of the image file |
| 48 | 8 | | Resource fork size |
| 56 | 4 | | Unknown (SegmentNumber) |
| 60 | 4 | | Number of segments, which contains 0 if not set |
| 64 | 16 | | Segment identifier, which contains a UUID |
| 80 | 4 | | Data checksum type |
| 84 | 4 | | Data checksum size, as number of bits |
| 88 | 128 | | Data checksum |
| 216 | 8 | | XML plist offset, where the offset is relative to the start of the image file |
| 224 | 8 | | XML plist size |
| 232 | 120 | | Unknown (Reserved) |
| 352 | 4 | | Master checksum type |
| 356 | 4 | | Master checksum size, as number of bits |
| 360 | 128 | | Master checksum |
| 488 | 4 | | Image type (or variant) |
| 492 | 8 | | Number of sectors |
| 500 | 4 | | Unknown (reserved) |
| 504 | 4 | | Unknown (reserved) |
| 508 | 4 | | Unknown (reserved) |

Note that the XML plist size can be 0, such as in a UDIF stub (UDxx) image.
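A Python sketch that reads the file footer from the last 512 bytes of an image and returns a subset of the fields in the table above. The struct format mirrors the table. The segment identifier is interpreted in big-endian byte order, consistent with the byte order above, and the function name is hypothetical.

```python
import struct
import uuid

# 512-byte "koly" file footer, big-endian; "12x" skips the trailing reserved bytes.
KOLY_FOOTER = struct.Struct(">4sIIIQQQQQII16sII128sQQ120sII128sIQ12x")


def read_udif_file_footer(image: bytes) -> dict:
    values = KOLY_FOOTER.unpack(image[-512:])
    if values[0] != b"koly":
        raise ValueError("missing koly signature")
    return {
        "format_version": values[1],
        "image_flags": values[3],
        "data_fork_offset": values[5],
        "data_fork_size": values[6],
        "segment_identifier": uuid.UUID(bytes=values[11]),
        "xml_plist_offset": values[15],
        "xml_plist_size": values[16],
        "image_type": values[21],
        "number_of_sectors": values[22],
    }
```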

Image flags

| Value | Identifier | Description |
| --- | --- | --- |
| 0x00000001 | kUDIFFlagsFlattened | Unknown (flattened?) |
| 0x00000004 | kUDIFFlagsInternetEnabled | Unknown (internet enabled?) |

Image types

| Value | Identifier | Description |
| --- | --- | --- |
| 1 | kUDIFDeviceImageType | Device image |
| 2 | kUDIFPartitionImageType | Partition image |

XML plist

TODO: complete section

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>resource-fork</key>
    <dict>
        <key>blkx</key>
        <array>
            <dict>
                <key>Attributes</key>
                <string>0x0050</string>
                <key>CFName</key>
                <string>Protective Master Boot Record (MBR : 0)</string>
                <key>Data</key>
                <data>
                bWlzaAAAAAEAAAAAAAAAAAAAAAAAAAABAAAAAAAAAAAA
                AAgIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAIAAAAgQfL6MwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAACgAAABQAAAAMAAAAAAAAAAAAAAAAAAAABAAAA
                AAAAIA0AAAAAAAAAH/////8AAAAAAAAAAAAAAAEAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAA=
                </data>
                <key>ID</key>
                <string>-1</string>
                <key>Name</key>
                <string>Protective Master Boot Record (MBR : 0)</string>
            </dict>
            ...
        </array>
        <key>plst</key>
        <array>
            <dict>
                <key>Attributes</key>
                <string>0x0050</string>
                <key>Data</key>
                <data>
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEAAQAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAA
                </data>
                <key>ID</key>
                <string>0</string>
                <key>Name</key>
                <string></string>
            </dict>
        </array>
    </dict>
</dict>
</plist>

The XML plist contains the following key-value pairs:

| Identifier | Description |
| --- | --- |
| resource-fork | dictionary |

XML plist resource-fork dictionary

The resource-fork dictionary contains the following key-value pairs:

| Identifier | Description |
| --- | --- |
| blkx | array of dictionaries |
| plst | array of dictionaries |

XML plist blkx array entry

A blkx array entry contains the following key-value pairs:

| Identifier | Description |
| --- | --- |
| Attributes | string that contains a hexadecimal formatted integer value |
| CFName | string |
| Data | string that contains base64-encoded data of a block table |
| ID | string that contains a decimal formatted integer value |
| Name | string |
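Python's standard `plistlib` module decodes `<data>` elements from base64 into bytes, so the block tables can be extracted from the XML plist as sketched below. The function name is hypothetical, and error handling for missing keys is omitted.

```python
import plistlib


def read_blkx_entries(xml_plist: bytes) -> list:
    """Return (ID, Name, block table bytes) for each blkx array entry."""
    plist = plistlib.loads(xml_plist)
    entries = []
    for entry in plist["resource-fork"]["blkx"]:
        # "ID" is a decimal string; "Data" is already base64-decoded bytes.
        entries.append((int(entry["ID"]), entry.get("Name", ""), entry["Data"]))
    return entries
```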

Block table

The block table (BLKXTable) is of variable size and consists of:

  • block table header
  • block table entries

Block table header

The block table header is 204 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | "mish" | Signature |
| 4 | 4 | 1 | Format version |
| 8 | 8 | | Start sector, which contains the sector number relative to the start of the media data |
| 16 | 8 | | Number of sectors |
| 24 | 8 | | Unknown (DataOffset), which seems to be always 0 |
| 32 | 4 | | Unknown (BuffersNeeded) |
| 36 | 4 | | Unknown (BlockDescriptors). Does this value correspond to the number of block table entries? |
| 40 | 4 | 0 | Unknown (reserved) |
| 44 | 4 | 0 | Unknown (reserved) |
| 48 | 4 | 0 | Unknown (reserved) |
| 52 | 4 | 0 | Unknown (reserved) |
| 56 | 4 | 0 | Unknown (reserved) |
| 60 | 4 | 0 | Unknown (reserved) |
| 64 | 4 | | Checksum type |
| 68 | 4 | | Checksum size |
| 72 | 128 | | Checksum |
| 200 | 4 | | Number of entries |

Block table entry

The block table entry (BLKXChunkEntry) is 40 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Entry type |
| 4 | 4 | | Unknown (comment) |
| 8 | 8 | | Start sector, which contains the sector number relative to the start sector of the block table |
| 16 | 8 | | Number of sectors |
| 24 | 8 | | Data offset, which contains the byte offset relative to the start of the UDIF image file |
| 32 | 8 | | Data size, which contains the number of bytes of data stored, which is 0 for sparse data |

UDIF block table entry types

| Value | Identifier | Description |
| --- | --- | --- |
| 0x00000000 | | Unknown (sparse) |
| 0x00000001 | | Uncompressed (raw) data |
| 0x00000002 | | Sparse (used for Apple_Free) |
| 0x7ffffffe | | Comment |
| 0x80000004 | | ADC compressed data |
| 0x80000005 | | zlib compressed data |
| 0x80000006 | | bzip2 compressed data |
| 0x80000007 | | LZFSE compressed data |
| 0x80000008 | | LZMA compressed data |
| 0xffffffff | | Block table entries terminator |
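A Python sketch that parses a block table, such as the base64-decoded `Data` value of a blkx array entry. It skips comment entries, stops at the terminator, and converts the entry start sectors to absolute sector numbers. The function name and tuple layout are hypothetical, and the six 4-byte reserved header fields are read as one 24-byte run.

```python
import struct

BLOCK_TABLE_HEADER = struct.Struct(">4sIQQQII24sII128sI")  # 204 bytes
BLOCK_TABLE_ENTRY = struct.Struct(">IIQQQQ")  # 40 bytes


def parse_block_table(data: bytes) -> tuple:
    """Return (start sector, list of (type, sector, sectors, offset, size))."""
    (signature, _format_version, start_sector, _number_of_sectors,
     _data_offset, _buffers_needed, _block_descriptors, _reserved,
     _checksum_type, _checksum_size, _checksum,
     number_of_entries) = BLOCK_TABLE_HEADER.unpack_from(data, 0)
    if signature != b"mish":
        raise ValueError("missing mish signature")
    entries = []
    for index in range(number_of_entries):
        (entry_type, _comment, entry_start_sector, entry_number_of_sectors,
         data_offset, data_size) = BLOCK_TABLE_ENTRY.unpack_from(
             data, BLOCK_TABLE_HEADER.size + index * BLOCK_TABLE_ENTRY.size)
        if entry_type == 0xFFFFFFFF:
            break  # block table entries terminator
        if entry_type == 0x7FFFFFFE:
            continue  # comment entry
        # Entry start sectors are relative to the block table start sector.
        entries.append((entry_type, start_sector + entry_start_sector,
                        entry_number_of_sectors, data_offset, data_size))
    return start_sector, entries
```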

UDIF comment

TODO: complete section

UDIF data fork

TODO: complete section

UDIF resource fork

TODO: complete section

Notes

Is the maximum compressed chunk size 2048 sectors?

Comment seems to reference compressed data but has no size or number of sectors value.

Virtual Hard Disk (VHD) image format

The Virtual Hard Disk (VHD) format is used by Microsoft virtualization products as one of their image formats. It is used to store both hard disk images and snapshots.

Overview

There are multiple types of VHD images, namely:

  • Fixed-size VHD image
  • Dynamic-size (or sparse) VHD image
  • Differential (or differencing) VHD image

Fixed-size hard disk image

A fixed-size VHD image consists of:

  • data
  • file footer

Note that a fixed-size VHD image is equivalent to a raw storage media image with an additional footer.

Dynamic-size (or sparse) hard disk image

A dynamic-size (or sparse) VHD image consists of:

  • copy of file footer
  • dynamic disk header
  • block allocation table
  • data in blocks
  • file footer

Differential hard disk image

A differential (or differencing) VHD image consists of:

  • copy of file footer
  • dynamic disk header
  • block allocation table
  • data in blocks
  • file footer

Characteristics

| Characteristics | Description |
| --- | --- |
| Byte order | big-endian |
| Date and time values | Number of seconds since January 1, 2000 00:00:00 UTC |
| Character strings | UCS-2 big-endian, which allows for unpaired Unicode surrogates such as "U+d800" and "U+dc00" |

The number of bytes per sector is 512.

Undo disk image

Virtual PC has a feature to create “Undo Disks”. The undo disk feature stores a differential hard disk image in a file with a name similar to:

VirtualPCUndo_<name>_0_0_hhmmssMMDDYYYY.vud

Where the date and time seems to be stored in UTC and <name> represents the name of the parent image.

File footer

The file footer is 512 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | "conectix" | Signature (also referred to as cookie) |
| 8 | 4 | | Features |
| 12 | 4 | 0x00010000 | Format version, where the upper 16 bits are the major version and the lower 16 bits the minor version |
| 16 | 8 | | Next offset, which contains the offset to the next (metadata) structure. The offset is relative to the start of the file. It should only be set in dynamic and differential disk images; in fixed disk images it should be set to 0xffffffffffffffff (-1) |
| 24 | 4 | | Modification time, which contains the number of seconds since January 1, 2000 00:00:00 UTC |
| 28 | 4 | | Creator application |
| 32 | 4 | | Creator version, where the upper 16 bits are the major version and the lower 16 bits the minor version |
| 36 | 4 | | Creator (host) operating system |
| 40 | 8 | | Disk size, which contains the size of the disk in bytes |
| 48 | 8 | | Data size, which contains the size of the data in bytes |
| 56 | 4 | | Disk geometry |
| 60 | 4 | | Disk type |
| 64 | 4 | | Checksum, which contains the one's complement of the sum of all bytes of the file footer, excluding the checksum itself |
| 68 | 16 | | Identifier, which contains a big-endian UUID |
| 84 | 1 | | Saved state, which contains a flag to indicate the image is in saved state |
| 85 | 427 | 0 | Unknown (reserved), which should contain 0-byte values |
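The checksum could be computed as in this Python sketch, which sums every byte of the structure except the 4-byte checksum field and takes the 32-bit one's complement. The same function would apply to the dynamic disk header, whose checksum field is at offset 36. The function name is hypothetical.

```python
def vhd_checksum(structure: bytes, checksum_offset: int) -> int:
    """One's complement of the byte sum, skipping the 4-byte checksum field."""
    total = sum(structure[:checksum_offset]) + sum(structure[checksum_offset + 4:])
    return ~total & 0xFFFFFFFF
```

The stored checksum is a big-endian 32-bit value, so a footer is consistent when `vhd_checksum(footer, 64) == int.from_bytes(footer[64:68], "big")`.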

Features

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0.0 | 1 bit | | Is temporary disk, which indicates that this disk is a candidate for deletion on shutdown |
| 0.1 | 1 bit | | Unknown (Reserved, must be set to 1) |
| 0.2 | 30 bits | | Unknown (Reserved, must be set to 0) |

A value of 0 represents no features are enabled.

Creator application

| Value | Identifier | Description |
| --- | --- | --- |
| "d2v\x00" | | Disk2vhd |
| "qemu" | | Qemu |
| "vpc\x20" | | Virtual PC |
| "vs\x20\x20" | | Virtual Server |
| "win\x20" | | Windows (Disk Management) |

Creator host operating system

| Value | Identifier | Description |
| --- | --- | --- |
| "Mac\x20" | | Macintosh |
| "Wi2k" | | Windows |

Disk geometry

The disk geometry is 4 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 2 | | Number of cylinders |
| 2 | 1 | | Number of heads |
| 3 | 1 | | Number of sectors per track (cylinder) |

Disk type

| Value | Identifier | Description |
| --- | --- | --- |
| 0 | | None |
| 1 | | Unknown (Deprecated) |
| 2 | | Fixed hard disk |
| 3 | | Dynamic hard disk |
| 4 | | Differential hard disk |
| 5 | | Unknown (Deprecated) |
| 6 | | Unknown (Deprecated) |

Dynamic disk header

The dynamic disk header is 1024 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | "cxsparse" | Signature (Cookie) |
| 8 | 8 | | Next offset, which contains the offset to the next (metadata) structure. The offset is relative to the start of the file. Currently this is unused and should be set to 0xffffffffffffffff (-1) |
| 16 | 8 | | Block allocation table offset, which contains the offset to the block allocation table structure. The offset is relative to the start of the file |
| 24 | 4 | 0x00010000 | Format version, where the upper 16 bits are the major version and the lower 16 bits the minor version |
| 28 | 4 | | Number of blocks, which is equivalent to the number of block allocation table entries |
| 32 | 4 | | Block size. The block size must be a power-of-two multiple of the sector size and does not include the size of the sector bitmap. The default block size is 4096 x 512-byte sectors (2 MiB) |
| 36 | 4 | | Checksum, which contains the one's complement of the sum of all bytes of the dynamic disk header, excluding the checksum itself |
| 40 | 16 | | Parent identifier, which contains a big-endian UUID that identifies the parent image. Only used by differential hard disk images |
| 56 | 4 | | Parent last modification time, which contains the number of seconds since January 1, 2000 00:00:00 UTC. Only used by differential hard disk images |
| 60 | 4 | 0 | Unknown (reserved), which should contain 0-byte values |
| 64 | 512 | | Parent name, which contains a UCS-2 big-endian string. Only used by differential hard disk images |
| 576 | 8 x 24 = 192 | | Array of parent locator entries. Only used by differential hard disk images |
| 768 | 256 | 0 | Unknown (reserved), which should contain 0-byte values |

The maximum number of block allocation table entries should match the maximum possible number of blocks in the disk.

Note that the parent name can also contain a full path, e.g. in .avhd files. The path segments are separated by the \ character.

Parent locator entry

The parent locator entry is 24 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Locator platform code |
| 4 | 4 | | Platform data space, which contains the number of 512-byte sectors needed to store the parent hard disk locator |
| 8 | 4 | | Locator data size |
| 12 | 4 | 0 | Unknown (Reserved should contain 0-byte values) |
| 16 | 8 | | Locator data offset, which contains the offset to the locator data. The offset is relative from the start of the file |

Locator platform code

| Value | Identifier | Description |
| --- | --- | --- |
| 0 | | None |
| "Mac\x20" | | Mac OS alias stored as a blob |
| "MacX" | | File URL with UTF-8 encoding conforming to RFC 2396 |
| "W2ku" | | Absolute Windows path, which contains a UCS-2 big-endian string |
| "W2ru" | | Windows path relative to the differential image, which contains a UCS-2 big-endian string |
| "Wi2k" | | Unknown (Deprecated) |
| "Wi2r" | | Unknown (Deprecated) |

Block allocation table

The block allocation table is only used in dynamic and differential disk images.

The block allocation table consists of 32-bit entries. An entry contains the sector number where the data block starts or is set to 0xffffffff (-1) if the block is sparse or stored in the parent disk image.

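if block_allocation_table_entry == 0xffffffff:
    # The block is sparse or stored in the parent disk image
    file_offset = None
else:
    # The sector data follows the sector bitmap at the start of the block
    file_offset = (block_allocation_table_entry * 512) + sector_bitmap_size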

Unused blocks in a dynamic disk are sparse and should be filled with 0-byte values. In a differential disk the block is stored in the parent disk image.
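Putting the block allocation table lookup together, a media offset can be mapped to a file offset as in this Python sketch. It assumes the block allocation table has already been read into a list of integers, and the function name is hypothetical. In a differential image the sector bitmap still has to be checked, because an allocated block can contain sectors that are stored in the parent.

```python
from typing import List, Optional


def vhd_file_offset(media_offset: int, block_size: int,
                    block_allocation_table: List[int]) -> Optional[int]:
    """Map a media offset to a file offset, or None if the block is not allocated."""
    block_index, offset_in_block = divmod(media_offset, block_size)
    block_allocation_table_entry = block_allocation_table[block_index]
    if block_allocation_table_entry == 0xFFFFFFFF:
        # Sparse (dynamic) or stored in the parent image (differential).
        return None
    # One bit per 512-byte sector, rounded up to a whole 512-byte sector.
    number_of_bitmap_bytes = -(-block_size // (512 * 8))
    sector_bitmap_size = -(-number_of_bitmap_bytes // 512) * 512
    return (block_allocation_table_entry * 512) + sector_bitmap_size + offset_in_block
```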

Data blocks

Data blocks are only used in dynamic and differential disk images.

A data block consists of:

  • sector bitmap
  • sector data

The size of the sector bitmap in bytes is calculated as:

sector_bitmap_size = block_size / (512 * 8)

The size of the sector bitmap is rounded up to the next multiple of the sector size.

Sector bitmap

In dynamic disk images the sector bitmap indicates which sectors contain data (bit set to 1) or are sparse (bit set to 0).

In differential disk images the sector bitmap indicates which sectors are stored within the image (bit set to 1) or in the parent (bit set to 0).

The bitmap is padded to a 512-byte sector boundary.

The bitmap is stored on a per-byte basis, where the most significant bit (MSB) of a byte represents the first of the 8 sectors covered by that byte.
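Testing the bit for a single sector therefore starts at the most significant bit of each byte, as in this Python sketch. The function name is hypothetical. Whether a set bit means "contains data" or "stored in this image" depends on the disk type, as described above.

```python
def is_sector_bit_set(sector_bitmap: bytes, sector_index: int) -> bool:
    """Test the bit for a sector within a block; MSB-first within each byte."""
    byte_index, bit_index = divmod(sector_index, 8)
    return bool(sector_bitmap[byte_index] & (0x80 >> bit_index))
```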

References

Virtual Hard Disk version 2 (VHDX) image format

The Virtual Hard Disk version 2 (VHDX) format is used by Microsoft virtualization products as one of their image formats. It is used to store both hard disk images and snapshots.

Overview

A VHDX image file consists of:

  • file header
  • 2x image headers
  • 2x region tables
  • log or metadata journal
  • block allocation table (BAT) region
  • metadata region
    • metadata table
    • metadata items
  • image (content) data

The elements are stored in 64 KiB (65536 bytes) aligned blocks.

Characteristics

| Characteristics | Description |
| --- | --- |
| Byte order | little-endian |
| Date and time values | N/A |
| Character strings | UCS-2 little-endian, which allows for unpaired Unicode surrogates such as "U+d800" and "U+dc00" |

The number of bytes per sector is 512 or 4096 depending on the logical sector size.

File header

The file header (or file type identifier) is 64 KiB (65536 bytes) in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | "vhdxfile" | Signature |
| 8 | 512 | | Creator application and version, which contains a UCS-2 little-endian string with an end-of-string character |
| 520 | 65016 | | Unknown (reserved) |

Image header

The image header is 4 KiB (4096 bytes) in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | "head" | Signature |
| 4 | 4 | | Checksum |
| 8 | 8 | | Sequence number |
| 16 | 16 | | File write identifier, which contains a GUID |
| 32 | 16 | | Data write identifier, which contains a GUID |
| 48 | 16 | | Log identifier, which contains a GUID |
| 64 | 2 | | Log format version |
| 66 | 2 | 1 | Format version |
| 68 | 4 | | Log size, which according to MS-VHDX must be a multiple of 1 MiB |
| 72 | 8 | | Log offset, which according to MS-VHDX must be a multiple of 1 MiB and greater than or equal to 1 MiB |
| 80 | 4016 | 0 | Unknown (reserved), which according to MS-VHDX must be set to 0 |

Checksum calculation

The CRC32-C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the checksum.

The checksum is calculated over the 4 KiB of image header data, where the checksum value is considered to be 0 during calculation.
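A bitwise Python sketch of the checksum. It assumes the conventional CRC-32C parameters (reflected bit order, register preset to 0xffffffff, final inversion), on the assumption that the "initial value of 0" above follows the convention of implementations that apply the inversion internally. A table-driven implementation would be faster. The same `crc32c` function would apply to the 64 KiB region table, whose checksum field is also at offset 4. The function names are hypothetical.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected, with pre- and post-inversion."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # 0x82f63b78 is the bit-reversed form of polynomial 0x1edc6f41.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def image_header_checksum(image_header: bytes) -> int:
    """Checksum of a 4 KiB image header with its checksum field treated as 0."""
    data = bytearray(image_header[:4096])
    data[4:8] = bytes(4)
    return crc32c(bytes(data))
```

The stored value is little-endian, so a header is consistent when `image_header_checksum(header) == int.from_bytes(header[4:8], "little")`.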

Region table

The region table is stored in a block of 64 KiB (65536 bytes) and consists of:

  • region table header
  • 0 or more region table entries
  • Unknown (reserved)

TODO: determine if 0 entries is actually supported

Region table header

The region table header is 16 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | "regi" | Signature |
| 4 | 4 | | Checksum |
| 8 | 4 | | Number of table entries, which according to MS-VHDX must be less than or equal to 2047 |
| 12 | 4 | 0 | Unknown (reserved), which according to MS-VHDX must be set to 0 |

The CRC32-C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the checksum.

The checksum is calculated over the 64 KiB of region table data, where the region table header checksum value is considered to be 0 during calculation.

Region table entry

The region table entry is 32 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 16 | | Region type identifier, which contains a GUID |
| 16 | 8 | | Region data offset, which contains an offset relative to the start of the file. According to MS-VHDX this value must be a multiple of 1 MiB and greater than or equal to 1 MiB |
| 24 | 4 | | Region data size, which according to MS-VHDX must be a multiple of 1 MiB |
| 28 | 4 | | Is required flag, which contains 1 to indicate the region type needs to be supported |

Region type identifiers

| Value | Identifier | Description |
| --- | --- | --- |
| 2dc27766-f623-4200-9d64-115e9bfd4a08 | | Block allocation table (BAT) region |
| 8b7ca206-4790-4b9a-b8fe-575f050f886e | | Metadata region |
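A Python sketch that decodes one region table entry. Entry i starts at offset 16 + i * 32 of the region table, directly after the 16-byte region table header. It assumes the GUIDs are stored in the Microsoft mixed-endian layout, which Python's `uuid.UUID(bytes_le=...)` reads. The function name is hypothetical.

```python
import struct
import uuid

# Region table entry: 32 bytes, little-endian.
REGION_TABLE_ENTRY = struct.Struct("<16sQII")

REGION_TYPES = {
    uuid.UUID("2dc27766-f623-4200-9d64-115e9bfd4a08"): "block allocation table",
    uuid.UUID("8b7ca206-4790-4b9a-b8fe-575f050f886e"): "metadata",
}


def parse_region_table_entry(data: bytes, offset: int) -> tuple:
    """Return (region type, data offset, data size, is required)."""
    guid_bytes, data_offset, data_size, flags = REGION_TABLE_ENTRY.unpack_from(data, offset)
    # Assumed: GUIDs use the Microsoft mixed-endian (little-endian fields) layout.
    region_type = uuid.UUID(bytes_le=guid_bytes)
    return (REGION_TYPES.get(region_type, str(region_type)),
            data_offset, data_size, bool(flags & 1))
```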

Metadata region

The metadata region contains:

  • metadata table
  • metadata items

Metadata table

The metadata table is stored in a block of 64 KiB (65536 bytes) and consists of:

  • metadata table header
  • 0 or more metadata table entries
  • Unknown (reserved)

TODO: determine if 0 entries is actually supported

Metadata table header

The metadata table header is 32 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | "metadata" | Signature |
| 8 | 2 | 0 | Unknown (reserved), which according to MS-VHDX must be set to 0 |
| 10 | 2 | | Number of table entries, which according to MS-VHDX must be less than or equal to 2047 |
| 12 | 20 | 0 | Unknown (reserved), which according to MS-VHDX must be set to 0 |

Metadata table entry

The metadata table entry is 32 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 16 | | Metadata item identifier, which contains a GUID |
| 16 | 4 | | Metadata item offset, which contains an offset relative to the start of the metadata region. According to MS-VHDX this value must be greater than 64 KiB |
| 20 | 4 | | Metadata item size |
| 24 | 8 | | Unknown |

TODO: describe last 8 bytes

| Value | Identifier | Description |
| --- | --- | --- |
| 0x00000001 | IsUser | |
| 0x00000002 | IsVirtualDisk | |
| 0x00000004 | IsRequired | |

Metadata items

Metadata item identifiers

| Value | Identifier | Description |
| --- | --- | --- |
| 2fa54224-cd1b-4876-b211-5dbed83bf4b8 | | Virtual disk size |
| 8141bf1d-a96f-4709-ba47-f233a8faab5f | | Logical sector size |
| a8d35f2d-b30b-454d-abf7-d3d84834ab0c | | Parent locator |
| beca12ab-b2e6-4523-93ef-c309e000c746 | | Virtual disk identifier |
| caa16737-fa36-4d43-b3b6-33f0aa44e76b | | File parameters |
| cda348c7-445d-4471-9cc9-e9885251c556 | | Physical sector size |

File parameters metadata item

The file parameters metadata item is 8 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Block size, which according to MS-VHDX must be a power of 2, greater than or equal to 1 MiB and not greater than 256 MiB |
| 4.0 | 1 bit | | Blocks remain allocated flag, which indicates the file is a fixed-size image |
| 4.1 | 1 bit | | Has parent flag, which indicates the VHDX file contains a differential image that has a parent image |
| 4.2 | 30 bits | 0 | Unknown (reserved), which according to MS-VHDX must be set to 0 |
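A minimal Python sketch for decoding the file parameters metadata item, assuming little-endian byte order; `parse_file_parameters` is a hypothetical helper and its range checks mirror the MS-VHDX constraints quoted above.

```python
import struct


def parse_file_parameters(data: bytes) -> dict:
    """Decode the 8-byte file parameters metadata item (hypothetical helper)."""
    block_size, bit_fields = struct.unpack_from("<II", data, 0)
    # MS-VHDX: power of 2, at least 1 MiB and at most 256 MiB.
    if not 1024 * 1024 <= block_size <= 256 * 1024 * 1024:
        raise ValueError("block size out of range")
    if block_size & (block_size - 1) != 0:
        raise ValueError("block size is not a power of 2")
    return {
        "block_size": block_size,
        "blocks_remain_allocated": bool(bit_fields & 0x1),  # bit 0: fixed-size image
        "has_parent": bool(bit_fields & 0x2),  # bit 1: differential image
    }
```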

Logical sector size metadata item

The logical sector size metadata item is 4 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Logical sector size, which according to MS-VHDX must be either 512 or 4096 |

Parent locator metadata item

The parent locator metadata item is of variable size and consists of:

  • parent locator header
  • 0 or more parent locator entries
  • parent locator key and value data

TODO: determine if 0 entries is actually supported

Parent locator header

The parent locator header is 20 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 16 | | Parent locator type indicator, which contains the GUID: b04aefb7-d19e-4a81-b789-25b8e9445913 |
| 16 | 2 | 0 | Unknown (reserved), which according to MS-VHDX must be set to 0 |
| 18 | 2 | | Number of entries (or key-value pairs) |

Parent locator entry

The parent locator entry is 12 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Key data offset, which contains the offset relative to the start of the parent locator header |
| 4 | 4 | | Value data offset, which contains the offset relative to the start of the parent locator header |
| 8 | 2 | | Key data size |
| 10 | 2 | | Value data size |

Parent locator key and value data

A parent locator key or value is stored as a UCS-2 little-endian string without an end-of-string character.

Known keys are:

| Value | Description |
| --- | --- |
| absolute_win32_path | The value contains an absolute drive Windows path "\\?\c:\file.vhdx" |
| parent_linkage | The value contains a string of a GUID. This GUID should correspond to the data write identifier of the parent image |
| parent_linkage2 | The value contains a string of a GUID |
| relative_path | The value contains a relative Windows path "..\file.vhdx" |
| volume_path | The value contains an absolute volume Windows path "\\?\Volume{%GUID%}\file.vhdx" |
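The following Python sketch shows how the parent locator header, entries and key and value data fit together. It assumes little-endian integers and a GUID stored in the Microsoft (little-endian) layout; `parse_parent_locator` is a hypothetical helper, and decoding UCS-2 with Python's `utf-16-le` codec is a simplification that would also accept surrogate pairs.

```python
import struct
import uuid
from typing import Dict

PARENT_LOCATOR_TYPE = uuid.UUID("b04aefb7-d19e-4a81-b789-25b8e9445913")


def parse_parent_locator(data: bytes) -> Dict[str, str]:
    """Return the key-value pairs of a parent locator metadata item (hypothetical helper)."""
    if uuid.UUID(bytes_le=data[0:16]) != PARENT_LOCATOR_TYPE:
        raise ValueError("unsupported parent locator type")
    (number_of_entries,) = struct.unpack_from("<H", data, 18)
    pairs = {}
    for index in range(number_of_entries):
        # 12-byte entries follow the 20-byte parent locator header.
        key_offset, value_offset, key_size, value_size = struct.unpack_from(
            "<IIHH", data, 20 + index * 12)
        # Offsets are relative to the start of the parent locator header and
        # the strings have no end-of-string character.
        key = data[key_offset:key_offset + key_size].decode("utf-16-le")
        value = data[value_offset:value_offset + value_size].decode("utf-16-le")
        pairs[key] = value
    return pairs
```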

Physical sector size metadata item

The physical sector size metadata item is 4 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Physical sector size, which according to MS-VHDX must be either 512 or 4096 |

Virtual disk identifier metadata item

The virtual disk identifier metadata item is 16 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 16 | | Virtual disk identifier, which contains a GUID |

Note that in contrast to VHD (version 1) the virtual disk identifier does not change between a differential image and its parent. The data write identifier seems to be used instead.

Virtual disk size metadata item

The virtual disk size metadata item is 8 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | | Virtual disk size |

Block allocation table (BAT) region

The block allocation table (BAT) region contains the block allocation table. The entries of this table describe the location of either blocks containing image content data (or payload blocks) or blocks containing a sector bitmap.

The size of an individual sector bitmap block is 1 MiB which allows for 2^23 sectors to be represented by the bitmap.

Block allocation table (BAT) entries are grouped in chunks. The number of payload BAT entries per chunk can be calculated as follows:

number_of_entries_per_chunk = (2^23 * logical_sector_size) / block_size

The block allocation table (BAT) consists of:

  • one or more chunks containing:
    • number of entries per chunk x BAT entries describing image content data
    • 1 x BAT entry describing a sector bitmap
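Given the chunk layout above, the index of a BAT entry can be derived with simple arithmetic. The Python sketch below is an assumption derived from that layout (the helper names are hypothetical); for example, for a block size of 32 MiB and a logical sector size of 512 bytes a chunk contains 128 payload BAT entries.

```python
def entries_per_chunk(logical_sector_size: int, block_size: int) -> int:
    """Number of payload BAT entries per chunk: (2^23 * logical_sector_size) / block_size."""
    return (2**23 * logical_sector_size) // block_size


def payload_bat_index(block_number: int, chunk_ratio: int) -> int:
    """BAT index of the entry that describes payload block block_number."""
    # Every chunk_ratio payload entries are followed by 1 sector bitmap entry.
    return block_number + block_number // chunk_ratio


def sector_bitmap_bat_index(block_number: int, chunk_ratio: int) -> int:
    """BAT index of the sector bitmap entry of the chunk that contains the block."""
    chunk_number = block_number // chunk_ratio
    return chunk_number * (chunk_ratio + 1) + chunk_ratio
```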

Unused BAT entries are filled with 0-byte values.

The block allocation table (BAT) of:

  • a fixed-size image does not contain sector bitmap entries;
  • a dynamic-size image does contain sector bitmap entries, although according to MS-VHDX these are not used;
  • a differential image does contain sector bitmap entries.

Block allocation table (BAT) entry

The block allocation table (BAT) entry is 64 bits in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0.0 | 3 bits | | Block state |
| 0.3 | 17 bits | 0 | Unknown (reserved), which according to MS-VHDX must be set to 0 |
| 2.4 | 44 bits | | Block offset, which contains the offset relative to the start of the file as a multiple of 1 MiB |

Block states

Payload block states

| Value | Identifier | Description |
| --- | --- | --- |
| 0 | PAYLOAD_BLOCK_NOT_PRESENT | Block is new and therefore not (yet) stored in the file |
| 1 | PAYLOAD_BLOCK_UNDEFINED | Block is not stored in the file |
| 2 | PAYLOAD_BLOCK_ZERO | Block is sparse and therefore filled with 0-byte values |
| 3 | PAYLOAD_BLOCK_UNMAPPED | Block has been unmapped |
| 6 | PAYLOAD_BLOCK_FULLY_PRESENT | Block is stored in the file |
| 7 | PAYLOAD_BLOCK_PARTIALLY_PRESENT | Block is partially stored in the file; the sector bitmap indicates which sectors are stored in the file and which in the parent |
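A 64-bit BAT entry can be split into its block state and file offset with a few bit operations. This Python sketch assumes the entries are stored as little-endian 64-bit integers; the helper names are hypothetical.

```python
import struct
from typing import List, Tuple

PAYLOAD_BLOCK_STATES = {
    0: "PAYLOAD_BLOCK_NOT_PRESENT",
    1: "PAYLOAD_BLOCK_UNDEFINED",
    2: "PAYLOAD_BLOCK_ZERO",
    3: "PAYLOAD_BLOCK_UNMAPPED",
    6: "PAYLOAD_BLOCK_FULLY_PRESENT",
    7: "PAYLOAD_BLOCK_PARTIALLY_PRESENT",
}


def read_bat_entries(data: bytes) -> List[int]:
    """Read consecutive 64-bit little-endian BAT entries."""
    return [entry for (entry,) in struct.iter_unpack("<Q", data)]


def decode_bat_entry(entry: int) -> Tuple[int, int]:
    """Split a BAT entry into (block state, file offset in bytes)."""
    state = entry & 0x7  # bits 0 to 2
    # Bits 3 to 19 are reserved; bits 20 to 63 contain the offset in units of 1 MiB.
    file_offset = (entry >> 20) * 1024 * 1024
    return state, file_offset
```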

Sector bitmap block states

| Value | Identifier | Description |
| --- | --- | --- |
| 0 | SB_BLOCK_NOT_PRESENT | Block is new and therefore not (yet) stored in the file |
| 6 | SB_BLOCK_PRESENT | Block is stored in the file |

Sector bitmap

In differential disk images the sector bitmap indicates which sectors are stored within the image (bit set to 1) or in the parent (bit set to 0).

The bitmap is stored in a 1 MiB block.

The bitmap is stored on a per-byte basis, where the least significant bit (LSB) of each byte represents the first of its 8 bits in the bitmap.
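A minimal Python sketch of a sector bitmap lookup. It assumes that a sector bitmap block covers the 2^23 sectors of one chunk, so that the sector number within the chunk is used as the bit index; the helper names are hypothetical.

```python
from typing import Tuple

SECTORS_PER_BITMAP = 2**23  # 1 MiB bitmap block = 2^23 bits


def bitmap_position(sector_number: int) -> Tuple[int, int]:
    """Return (chunk number, bit index within that chunk's sector bitmap)."""
    return divmod(sector_number, SECTORS_PER_BITMAP)


def sector_is_in_image(bitmap: bytes, bit_index: int) -> bool:
    """True if the sector is stored in the image (bit 1), False if in the parent (bit 0)."""
    byte_value = bitmap[bit_index // 8]
    # The least significant bit of each byte represents the first bit.
    return bool((byte_value >> (bit_index % 8)) & 1)
```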

Log (metadata journal)

TODO: complete section

The log serves as a metadata journal. It is of variable size and consists of a contiguous circular (ring) buffer that contains log entries.

Log entry

TODO: complete section

4 KiB (4096 bytes) in size

Log entry header

TODO: complete section

Zero descriptor

TODO: complete section

Data descriptor

TODO: complete section

Data sector

TODO: complete section

References

VMware Virtual Disk (VMDK) format

The VMware Virtual Disk (VMDK) format is used by VMware virtualization products as one of their image formats.

Overview

A VMDK disk image can consist of multiple files, such as:

  • descriptor file
  • extent data files:
    • raw extent data file
    • VMDK sparse extent data file
    • COWD sparse extent data file

Characteristics

| Characteristics | Description |
| --- | --- |
| Byte order | little-endian |
| Date and time values | |
| Character strings | narrow character (Single Byte Character (SBC) or Multi Byte Character (MBC)) stored using a codepage defined in the descriptor file |

The number of bytes per sector is 512.

Disk types

There are multiple types of VMDK images, namely:

The 2GbMaxExtentFlat (or twoGbMaxExtentFlat) disk image, which consists of:

  • a descriptor file (<name>.vmdk)
  • raw data extent files (<name>-f###.vmdk), where ### contains a decimal value starting with 1.

The 2GbMaxExtentSparse (or twoGbMaxExtentSparse) disk image, which consists of:

  • a descriptor file (<name>.vmdk)
  • VMDK sparse data extent files (<name>-s###.vmdk), where ### contains a decimal value starting with 1.

The monolithicFlat disk image, which consists of:

  • a descriptor file (<name>.vmdk)
  • raw data extent file (<name>-f001.vmdk)

The monolithicSparse disk image, which consists of:

  • a VMDK sparse data extent file (<name>.vmdk) that also contains the descriptor file data.

The vmfs disk image, which consists of:

  • a descriptor file (<name>.vmdk)
  • raw data extent file (<name>-flat.vmdk)

The vmfsSparse differential disk image, which consists of:

  • a descriptor file (<name>.vmdk)
  • COWD sparse data extent files (<name>-delta.vmdk)

TODO: describe more disk types

A delta link is similar to a differential image, where the image contains the changes (or delta) in comparison to a parent image. According to the Virtual Disk Format 5.0 specification one delta image can chain to another delta image.

TODO: Name <name>-delta.vmdk

Descriptor file

The descriptor file is a case-insensitive text based file that contains the following information:

  • optional comment and empty lines
  • header
  • extent descriptions
  • optional change tracking file
  • disk data base (DDB)

Note that the descriptor file can contain leading and trailing whitespace, as well as leading comment lines (starting with #) and empty lines. Lines are separated by a line feed character (0x0a).

The header of a descriptor file looks similar to the data below.

# Disk DescriptorFile
version=1
CID=12345678
parentCID=ffffffff
createType="twoGbMaxExtentSparse"

The header consists of the following values:

| Value | Description |
| --- | --- |
| "# Disk DescriptorFile" | Section header (or file signature) |
| version | Format version |
| encoding | Encoding |
| CID | Content identifier, which contains a random 32-bit value updated the first time the content of the virtual disk is modified after the virtual disk is opened |
| parentCID | The content identifier of the parent, which contains a 32-bit value identifying the parent content, where a value of 'ffffffff' (-1) represents no parent content |
| isNativeSnapshot | TODO: add description. A value of "no" has been observed in a VMware Player 9 descriptor file |
| createType | Disk type |
| parentFileNameHint | Contains the path to the parent image, which is only present if the image is a differential image (delta link) |
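The header values, and the disk database values which use the same key = value form, can be collected with a simple line-based parser. The Python sketch below is hypothetical: it lower-cases keys because the descriptor file is case-insensitive, removes surrounding double quotes from values, and skips comment lines and extent descriptors.

```python
from typing import Dict


def parse_descriptor_values(text: str) -> Dict[str, str]:
    """Collect key=value pairs from a VMDK descriptor file (hypothetical helper)."""
    values = {}
    for line in text.split("\n"):
        line = line.strip()
        # Skip empty lines and comment lines, which include the section headers.
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        key = key.strip()
        # Lines without "=" or with whitespace in the key, such as extent descriptors, are skipped.
        if not separator or not key or any(character.isspace() for character in key):
            continue
        value = value.strip()
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        values[key.lower()] = value
    return values
```

For example, the CID value of the header shown above can then be converted with `int(values["cid"], 16)`.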

TODO: confirm if a content identifier of ‘fffffffe’ (-2) represents that the long content identifier should be used

Format versions

| Value | Description |
| --- | --- |
| 1 | TODO: add description |
| 2 | TODO: add description |
| 3 | TODO: add description |

Encodings

Note that it is currently unknown which encodings are supported; it is assumed that at least the Windows codepages are supported and that the default is UTF-8.

| Value | Description |
| --- | --- |
| Big5 | Big5, assumed to be equivalent to Windows codepage 950 |
| GBK | GBK, assumed to be equivalent to Windows codepage 936, which was observed in VMware Workstation for Windows, Chinese edition |
| Shift_JIS | Shift_JIS, assumed to be equivalent to Windows codepage 932, which was observed in VMware Workstation for Windows, Japanese edition |
| UTF-8 | UTF-8 |
| windows-949-2000 | Windows codepage 949, 2000 version |
| windows-1252 | Windows codepage 1252, which was observed in a VMware Player 9 descriptor file |

Disk types

| Value | Description |
| --- | --- |
| 2GbMaxExtentFlat, twoGbMaxExtentFlat | The disk is split into fixed-size extents of maximum 2 GB, which consist of raw extent data files |
| 2GbMaxExtentSparse, twoGbMaxExtentSparse | The disk is split into sparse (dynamic-size) extents of maximum 2 GB, which consist of VMDK sparse extent data files |
| custom | TODO: add description. Descriptor file with arbitrary extents, used to mount v2i-format |
| fullDevice | The disk uses a full physical disk device |
| monolithicFlat | The disk is a single raw extent data file |
| monolithicSparse | The disk is a single VMDK sparse extent data file |
| partitionedDevice | The disk uses a full physical disk device, using access per partition |
| streamOptimized | The disk is a single compressed VMDK sparse extent data file |
| vmfs | The disk is a single raw extent data file, which is similar to "monolithicFlat" |
| vmfsEagerZeroedThick | The disk is a single raw extent data file |
| vmfsPreallocated | The disk is a single raw extent data file |
| vmfsRaw | The disk uses a full physical disk device |
| vmfsRDM, vmfsRawDeviceMap | The disk uses a full physical disk device, which is also referred to as Raw Device Map (RDM) |
| vmfsRDMP, vmfsPassthroughRawDeviceMap | The disk uses a full physical disk device, which is similar to the Raw Device Map (RDM), but sends SCSI commands to the underlying hardware |
| vmfsSparse | The disk is split into COWD sparse (dynamic-size) extents |
| vmfsThin | The disk is split into COWD sparse (dynamic-size) extents |

Extent descriptions

The extent descriptions of a descriptor file look similar to the data below.

# Extent description
RW 4192256 SPARSE "test-s001.vmdk"
# Extent description
RW 1048576 FLAT "test-f001.vmdk" 0

The extent descriptions consist of the following values:

| Value | Description |
| --- | --- |
| "# Extent description" | Section header |
| Extent descriptors | |

Extent descriptor

The extent descriptor consists of the following values:

| Value | Description |
| --- | --- |
| 1st | Access mode |
| 2nd | The number of sectors |
| 3rd | Extent type |
| *If extent type is not ZERO* | |
| 4th | Path of the VMDK extent data file, relative to the location of the VMDK descriptor file |
| *Optional* | |
| 5th | The extent start sector |
| *Seen in VMware Player 9 in combination with a physical device extent on Windows* | |
| 6th and 7th | "partitionUUID" followed by a device identifier |

The extent start sector (or extent offset) is specified only for flat extents and corresponds to the offset in the file or device where the extent data is located. For device-backed virtual disks (physical or raw disks) the extent offset can be non-zero. For raw extent data files the extent offset should be zero.
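An extent descriptor can be split into its values with a pattern that keeps a double-quoted path together, including paths with spaces or backslashes. This is a hypothetical Python sketch; it ignores any values after the 5th, such as the "partitionUUID" pair.

```python
import re
from typing import NamedTuple, Optional

# A value is either a double-quoted string or a run of non-whitespace characters.
_VALUE_PATTERN = re.compile(r'"([^"]*)"|(\S+)')


class ExtentDescriptor(NamedTuple):
    access_mode: str
    number_of_sectors: int
    extent_type: str
    path: Optional[str]  # None for ZERO extents
    start_sector: int


def parse_extent_descriptor(line: str) -> ExtentDescriptor:
    """Parse a single extent descriptor line (hypothetical helper)."""
    values = [
        match.group(1) if match.group(1) is not None else match.group(2)
        for match in _VALUE_PATTERN.finditer(line)
    ]
    access_mode = values[0].upper()
    number_of_sectors = int(values[1], 10)
    extent_type = values[2].upper()
    path = None
    start_sector = 0
    if extent_type != "ZERO":
        path = values[3]
        if len(values) > 4:
            # Optional extent start sector, used for flat extents.
            start_sector = int(values[4], 10)
    return ExtentDescriptor(access_mode, number_of_sectors, extent_type, path, start_sector)
```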

Extent access mode

The extent access mode consists of the following values:

| Value | Description |
| --- | --- |
| NOACCESS | No access |
| RDONLY | Read only |
| RW | Read write |

Extent types

The extent type consists of the following values:

| Value | Description |
| --- | --- |
| FLAT | raw extent data file |
| SPARSE | VMDK sparse extent data file |
| ZERO | Sparse extent that consists of 0-byte values |
| VMFS | raw extent data file |
| VMFSSPARSE | COWD sparse extent data file |
| VMFSRDM | Unknown (Physical disk device that uses RDM?) |
| VMFSRAW | Unknown (Physical disk device?) |

Note that VMware Player 9 has been observed to use “FLAT” for Windows devices.

Change tracking file section

The change tracking file section was introduced in version 3 and looks similar to:

# Change Tracking File
changeTrackPath="test-flat.vmdk"

The change tracking file section consists of the following values:

| Value | Description |
| --- | --- |
| "# Change Tracking File" | Section header |
| changeTrackPath | Unknown (The path to the change tracking file?) |

Disk database

The disk data base of a descriptor file looks similar to the data below.

# The Disk Data Base
#DDB

ddb.virtualHWVersion = "4"
ddb.geometry.cylinders = "16383"
ddb.geometry.heads = "16"
ddb.geometry.sectors = "63"
ddb.adapterType = "ide"
ddb.toolsVersion = "0"

The disk data base consists of the following values:

| Value | Description |
| --- | --- |
| "# The Disk Data Base" | Section header |
| "#DDB" | Currently assumed to be part of the section header |
| ddb.deletable | Unknown (seen: "true") |
| ddb.virtualHWVersion | The virtual hardware version. For VMware Player and Workstation this seems to correspond with the application version |
| ddb.longContentID | The long content identifier, which contains a 128-bit base16 encoded value, without spaces |
| ddb.uuid | UUID, which contains a 128-bit base16 encoded value, with spaces between bytes |
| ddb.geometry.cylinders | The number of cylinders |
| ddb.geometry.heads | The number of heads |
| ddb.geometry.sectors | The number of sectors |
| ddb.geometry.biosCylinders | The number of cylinders as reported by the BIOS |
| ddb.geometry.biosHeads | The number of heads as reported by the BIOS |
| ddb.geometry.biosSectors | The number of sectors as reported by the BIOS |
| ddb.adapterType | Disk adapter type |
| ddb.toolsVersion | String containing the version of the installed VMware Tools |
| ddb.thinProvisioned | Unknown (seen: "1") |

VirtualBox has been observed to use a different case for “disk” in the section header:

# The disk Data Base

Virtual hardware version

| Value | Description |
| --- | --- |
| 4 | TODO: add description |
| 6 | TODO: add description |
| 7 | TODO: add description |
| 9 | VMware Player/Workstation 9.0 |

Disk adapter types

| Value | Description |
| --- | --- |
| ide | TODO: add description |
| buslogic | TODO: add description |
| lsilogic | TODO: add description |
| legacyESX | TODO: add description |

The buslogic and lsilogic values are for SCSI disks and show which virtual SCSI adapter is configured for the virtual machine. The legacyESX value is for older ESX Server virtual machines when the adapter type used in creating the virtual machine is not known.

The raw extent data file

The raw extent data file contains the actual disk data. The raw extent data file can be a file or a device.

This type of extent data file is also known as “Simple” or “Flat Extent”.

The VMDK sparse extent data file

The VMDK sparse extent data file contains the actual disk data. A VMDK sparse extent data file consists of:

  • file header
  • optional embedded descriptor file
  • optional secondary grain directory
    • optional secondary grain tables
  • (primary) grain directory
    • (primary) grain tables
  • grains
  • optional backup file header

This type of extent data file is also known as “Hosted Sparse Extent” or “Stream-Optimized Compressed Sparse Extent” when markers are used.

Note that the actual layout can vary per file; Stream-Optimized Compressed Sparse Extents have been observed to use secondary file headers.

Changes in format version 2:

  • added encrypted disk support (though this feature seems to never have been implemented).

Changes in format version 3:

  • the size of extent files is no longer limited to 2 GiB;
  • added support for persistent changed block tracking (CBT).

Note that for CBT, the changeTrackPath value in the descriptor file references a file that describes changed areas on the virtual disk.

File header

The file header is 512 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | "KDMV" | Signature |
| 4 | 4 | 1, 2 or 3 | Format version |
| 8 | 4 | | Flags |
| 12 | 8 | | Maximum data number of sectors (capacity) |
| 20 | 8 | | Sectors per grain, which must be a power of 2 and > 8 |
| 28 | 8 | | Embedded descriptor file start sector, which is relative to the start of the file or 0 if not set |
| 36 | 8 | | Embedded descriptor file size in sectors |
| 44 | 4 | 512 | The number of grain table entries |
| 48 | 8 | | Secondary grain directory start sector, which is relative to the start of the file or 0 if not set |
| 56 | 8 | | Primary grain directory start sector, which is relative to the start of the file, 0 if not set or 0xffffffffffffffff (GD_AT_END) if the grain directory is stored at the end of the file |
| 64 | 8 | | Metadata size in sectors |
| 72 | 1 | | Value to determine if the extent data file was cleanly closed (or dirty flag) |
| 73 | 1 | '\n' | Single end of line character |
| 74 | 1 | ' ' | Non end of line character |
| 75 | 1 | '\r' | First double end of line character |
| 76 | 1 | '\n' | Second double end of line character |
| 77 | 2 | | Compression method |
| 79 | 433 | 0 | Unknown (Padding) |

The end of line characters are used to detect corruption due to file transfers that alter line end characters.
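The following Python sketch reads the 512-byte file header using the little-endian byte order listed in the characteristics. Checking the end of line characters only when the "valid new line detection test" flag (0x00000001) is set is an assumption based on that flag's description; the names are hypothetical.

```python
import struct
from typing import NamedTuple

GD_AT_END = 0xFFFFFFFFFFFFFFFF
VALID_NEW_LINE_DETECTION_TEST = 0x00000001


class SparseFileHeader(NamedTuple):
    format_version: int
    flags: int
    capacity: int  # maximum data number of sectors
    sectors_per_grain: int
    descriptor_start_sector: int
    descriptor_number_of_sectors: int
    number_of_grain_table_entries: int
    secondary_grain_directory_start_sector: int
    primary_grain_directory_start_sector: int
    metadata_number_of_sectors: int
    dirty_flag: int
    compression_method: int


def parse_sparse_file_header(data: bytes) -> SparseFileHeader:
    """Parse a VMDK sparse extent data file header (hypothetical helper)."""
    if data[0:4] != b"KDMV":
        raise ValueError("unsupported signature")
    # Offsets 4 up to and including 72: format version to dirty flag.
    fields = struct.unpack_from("<IIQQQQIQQQB", data, 4)
    (compression_method,) = struct.unpack_from("<H", data, 77)
    header = SparseFileHeader(*fields, compression_method)
    # A transfer that altered line endings would change these 4 bytes.
    if header.flags & VALID_NEW_LINE_DETECTION_TEST and data[73:77] != b"\n \r\n":
        raise ValueError("end of line characters are corrupted")
    return header
```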

According to the Virtual Disk Format 5.0 specification the maximum data number of sectors (capacity) should be a multiple of the sectors per grain. Note that it has been observed that this is not always the case.

If the primary grain directory start sector is 0xffffffffffffffff (GD_AT_END) in a Stream-Optimized Compressed Sparse Extent, there should be a secondary file header stored at offset -1024 relative to the end of the file (stream) that contains the correct grain directory start sector.

Flags

The flags consist of the following values:

| Value | Identifier | Description |
| --- | --- | --- |
| 0x00000001 | | Valid new line detection test |
| 0x00000002 | | Use secondary grain directory. The secondary (redundant) grain directory should be used instead of the primary grain directory |
| *As of format version 2* | | |
| 0x00000004 | | Use zeroed-grain table entry. The zeroed-grain table entry overloads grain data sector number 1 to indicate the grain is sparse |
| *Common* | | |
| 0x00010000 | | Has compressed grain data |
| 0x00020000 | | Contains metadata, where the file contains markers to identify metadata or data blocks |

Compression method

The compression method consist of the following values:

| Value | Identifier | Description |
| --- | --- | --- |
| 0x00000000 | COMPRESSION_NONE | No compression |
| 0x00000001 | COMPRESSION_DEFLATE | Compression using Deflate (RFC1951) |

Markers

The markers are used in Stream-Optimized Compressed Sparse Extents. The corresponding flag must be set for markers to be present. An example of the layout of a Stream-Optimized Compressed Sparse Extent that uses markers is:

  • file header
  • embedded descriptor
  • compressed grain markers
  • grain table marker
  • grain table
  • grain directory marker
  • grain directory
  • footer marker
  • secondary file header
  • end-of-stream marker

The marker

The marker is 512 bytes in size and consists of:

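| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | | Value |
| 8 | 4 | | Marker data size |
| *If marker data size equals 0* | | | |
| 12 | 4 | | Marker type |
| 16 | 496 | 0 | Unknown (Padding) |
| *If marker data size > 0* | | | |
| 12 | ... | | Compressed grain data |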

If the marker data size > 0 the marker is a compressed grain marker.
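A hypothetical Python sketch that distinguishes a compressed grain marker from the other markers based on the marker data size, assuming little-endian byte order:

```python
import struct
from typing import Optional, Tuple

MARKER_TYPES = {0: "MARKER_EOS", 1: "MARKER_GT", 2: "MARKER_GD", 3: "MARKER_FOOTER"}


def parse_marker(data: bytes) -> Tuple[str, int, Optional[bytes]]:
    """Return (kind, value, compressed data or None) for a marker (hypothetical helper)."""
    value, data_size = struct.unpack_from("<QI", data, 0)
    if data_size > 0:
        # Compressed grain marker: the compressed grain data starts at offset 12.
        return ("GRAIN", value, data[12:12 + data_size])
    (marker_type,) = struct.unpack_from("<I", data, 12)
    return (MARKER_TYPES.get(marker_type, "UNKNOWN"), value, None)
```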

Marker types

| Value | Identifier | Description |
| --- | --- | --- |
| 0x00000000 | MARKER_EOS | End-of-stream marker |
| 0x00000001 | MARKER_GT | Grain table (metadata) marker |
| 0x00000002 | MARKER_GD | Grain directory (metadata) marker |
| 0x00000003 | MARKER_FOOTER | Footer (metadata) marker |

Compressed grain marker

The compressed grain marker indicates that compressed data follows.

OffsetSizeValueDescription
Compressed grain header
080Logical sector number
84Compressed data size
 
12...Compressed data, which contains Deflate compressed data

Note that the compressed grain data can be larger than the grain data size.

End of stream marker

The end-of-stream marker indicates the end of the virtual disk. The end-of-stream marker is essentially an empty sector block, where all values are 0.

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | 0 | Value |
| 8 | 4 | 0 | Marker data size |
| 12 | 4 | MARKER_EOS | Marker type |
| 16 | 496 | 0 | Unknown (Padding) |

Grain table marker

The grain table marker indicates that a grain table follows the marker sector block.

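| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | 0 | Value |
| 8 | 4 | 0 | Marker data size |
| 12 | 4 | MARKER_GT | Marker type |
| 16 | 496 | 0 | Unknown (Padding) |
| 512 | ... | | Grain table |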

Grain directory marker

The grain directory marker indicates that a grain directory follows the marker sector block.

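| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | 0 | Value |
| 8 | 4 | 0 | Marker data size |
| 12 | 4 | MARKER_GD | Marker type |
| 16 | 496 | 0 | Unknown (Padding) |
| 512 | ... | | Grain directory |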

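Footer marker

The footer marker indicates that a footer follows the marker sector block.

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | 0 | Value |
| 8 | 4 | 0 | Marker data size |
| 12 | 4 | MARKER_FOOTER | Marker type |
| 16 | 496 | 0 | Unknown (Padding) |
| 512 | ... | | Footer |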

Grain directory

The grain directory is also referred to as level 0 metadata.

The size of the grain directory depends on the number of grains in the extent data file. The number of entries in the grain directory can be determined as follows:

grain table size = number of grain table entries * grain size

number of grain directory entries = maximum data size / grain table size
if maximum data size % grain table size > 0:
	number of grain directory entries += 1
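The calculation above, and the mapping of a sector number to its grain directory and grain table entries, can be sketched in Python as follows. All sizes are expressed in sectors here, which is a choice made for the sketch (the formula works equally with sizes in bytes); the helper names are hypothetical. For the 4192256-sector extent from the earlier example, with 128 sectors per grain and 512 grain table entries, this yields 64 grain directory entries.

```python
from typing import Tuple


def number_of_grain_directory_entries(capacity: int, sectors_per_grain: int,
                                      grain_table_entries: int) -> int:
    """Number of grain directory entries, rounding up for a partial last grain table."""
    grain_table_size = grain_table_entries * sectors_per_grain
    number_of_entries, remainder = divmod(capacity, grain_table_size)
    if remainder > 0:
        number_of_entries += 1
    return number_of_entries


def locate_sector(sector_number: int, sectors_per_grain: int,
                  grain_table_entries: int) -> Tuple[int, int, int]:
    """Map a sector to (grain directory index, grain table index, sector within grain)."""
    grain_number, sector_in_grain = divmod(sector_number, sectors_per_grain)
    directory_index, table_index = divmod(grain_number, grain_table_entries)
    return directory_index, table_index, sector_in_grain
```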

The grain directory consists of 32-bit grain table offsets:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Grain table start sector, which is relative to the start of the file or 0 if sparse or the sector is stored in the parent image |

The grain directory is stored in a multiple of 512-byte blocks.

Note that as of VMDK sparse extent data file version 2 if the “use zeroed-grain table entry” flag is set, a start sector of 1 indicates the grain table is sparse.

Grain table

The grain table is also referred to as level 1 metadata.

The grain table is of variable size. The number of entries in the grain table is stored in the file header. Note that the number of entries in the last grain table depends on the maximum data size and is not necessarily the same as the value stored in the file header.

The grain table consists of 32-bit grain data sector numbers:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Grain data sector number, which is relative to the start of the file or 0 if sparse or the sector is stored in the parent image |

The number of entries in a grain table should be 512, therefore the size of the grain table is 512 x 4 = 2048 bytes.

The grain table is stored in a multiple of 512-byte blocks.

Note that as of VMDK sparse extent data file version 2, if the “use zeroed-grain table entry” flag is set, a sector number of 1 indicates the grain is sparse.

Grain data

In an uncompressed sparse extent data file the data is stored at the grain data sector number.

In a compressed sparse extent data file every non-sparse grain is assumed to be stored compressed.

Compressed grain data

The compressed grain data is of variable size and consists of:

OffsetSizeValueDescription
Compressed grain header
080Logical sector number
84Compressed data size
 
12...Compressed data, which contains zlib compressed data
......Unknown (Padding)

The uncompressed data size should be the grain size or less for the last grain.

The footer is only used in Stream-Optimized Compressed Sparse Extents. The footer has the same format as the file header. The footer should be the last block of the disk and be immediately followed by the end-of-stream marker, so that together they make up the last two sectors of the disk.

The header and footer differ in that the grain directory offset value in the header is set to 0xffffffffffffffff (GD_AT_END) and in the footer to the correct value.

Changed block tracking (CBT)

TODO: complete section

The COWD sparse extent data file

The copy-on-write disk (COWD) sparse extent data file contains the actual disk data. The COWD sparse extent data file consists of:

  • file header
  • grain directory
  • grain tables
  • grains

This type of extent data file is also known as ESX Server Sparse Extent.

File header

The file header is 2048 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | "COWD" | Signature |
| 4 | 4 | 1 | Format version |
| 8 | 4 | 0x00000003 | Unknown (Flags) |
| 12 | 4 | | Maximum data number of sectors (capacity) |
| 16 | 4 | | Sectors per grain |
| 20 | 4 | 4 | Grain directory start sector, which is relative to the start of the file or 0 if not set |
| 24 | 4 | | Number of grain directory entries |
| 28 | 4 | | The next free sector |
| *In root extent data file* | | | |
| 32 | 4 | | The number of cylinders |
| 36 | 4 | | The number of heads |
| 40 | 4 | | The number of sectors |
| 44 | 1016 | | Unknown (Empty values) |
| *In child extent data files* | | | |
| 32 | 1024 | | Parent file name |
| 1056 | 4 | | Parent generation |
| *Common* | | | |
| 1060 | 4 | | Generation |
| 1064 | 60 | | Name |
| 1124 | 512 | | Description |
| 1636 | 4 | | Saved generation |
| 1640 | 8 | | Unknown (Reserved) |
| 1648 | 4 | | Value to determine if the extent data file was cleanly closed (or dirty flag) |
| 1652 | 396 | | Unknown (Padding) |
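A minimal Python sketch that reads the fields of the COWD file header that are common to root and child extent data files, assuming little-endian byte order as for the VMDK sparse extent data file; `parse_cowd_header` is hypothetical and does not distinguish root from child extent data files.

```python
import struct


def parse_cowd_header(data: bytes) -> dict:
    """Parse selected fields of a 2048-byte COWD file header (hypothetical helper)."""
    if data[0:4] != b"COWD":
        raise ValueError("unsupported signature")
    # Seven consecutive 32-bit values at offsets 4 to 31.
    (format_version, flags, capacity, sectors_per_grain, grain_directory_start_sector,
     number_of_grain_directory_entries, next_free_sector) = struct.unpack_from("<7I", data, 4)
    (generation,) = struct.unpack_from("<I", data, 1060)
    (dirty_flag,) = struct.unpack_from("<I", data, 1648)
    return {
        "format_version": format_version,
        "flags": flags,
        "capacity": capacity,
        "sectors_per_grain": sectors_per_grain,
        "grain_directory_start_sector": grain_directory_start_sector,
        "number_of_grain_directory_entries": number_of_grain_directory_entries,
        "next_free_sector": next_free_sector,
        "generation": generation,
        "dirty_flag": dirty_flag,
    }
```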

Note that the parent file name seems not to be set in recent delta sparse extent files.

Grain directory

The grain directory is also referred to as level 0 metadata.

The size of the grain directory is dependent on the number of grains in the extent data file. The number of entries in the grain directory is stored in the file header.

The grain directory consists of 32-bit grain table offsets:

OffsetSizeValueDescription
04Grain table start sector, which is relative from the start of the file or 0 if not set

The grain directory is stored in a multiple of 512-byte blocks. Unused bytes are set to 0.

Grain table

The grain table is also referred to as level 1 metadata.

The number of entries in a grain table is fixed at 4096, therefore the size of a grain table is 4096 x 4 = 16384 bytes.

The grain table consists of 32-bit grain sector numbers:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Grain sector number, which is relative to the start of the file or 0 if not set |

The grain table is stored in a multiple of 512-byte blocks. Unused bytes are set to 0.

Change tracking file

TODO: complete section

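| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | "\xa2\x72\x19\xf6" | Unknown (signature?) |
| 4 | 4 | 1 | Unknown (version?) |
| 8 | 4 | | Unknown (empty values) |
| 12 | 4 | 0x200 | Unknown |
| 16 | 8 | | Unknown |
| 24 | 8 | | Unknown |
| 32 | 4 | | Unknown |
| 36 | 4 | | Unknown |
| 40 | 4 | | Unknown |
| 44 | 16 | | Unknown (UUID?) |
| 60 | ... | | Unknown (empty values?) |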

Corruption scenarios

The total size specified by the number of grain table entries is larger than the size specified by the maximum number of sectors. This has been seen in VMDK images generated by qemu-img.

Notes

The markers can be used to scan for the individual parts of a VMDK sparse extent data file if the stream has been truncated, but note that this can be a very expensive process I/O-wise.

References

Volume system formats

A volume (or logical drive) is a single continuous accessible storage area, typically containing a file system. A volume system format is used to manage the storage of one or more volumes.

Although related, a volume is a different concept than a partition.

Formats

Apple Partition Map (APM) format

The Apple Partition Map (APM) format is used on Motorola based Macintosh computers. On Intel based Macintosh computers the GUID Partition Table (GPT) format is used.

Overview

An Apple Partition Map (APM) consists of:

  • a drive descriptor
  • partition map entry of type “Apple_partition_map”
  • zero or more partition map entries

Characteristics

| Characteristics | Description |
| --- | --- |
| Byte order | big-endian |
| Date and time values | N/A |
| Character strings | ASCII |

Terminology

| Term | Description |
| --- | --- |
| Physical block | A fixed location on the storage media defined by the storage media |
| Logical block | An abstract location on the storage media defined by software |

The drive descriptor

The driver descriptor identifies the device drivers installed on a storage medium. The driver descriptor can refer to multiple device drivers. Every device driver is stored in a separate partition.

The drive descriptor is situated in the first block of the storage medium. This block is referred to as the device driver block. The driver descriptor block is not considered part of any partition.

The drive descriptor is 512 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 2 | "\x45\x52" or "ER" | Signature |
| 2 | 2 | | The block size of the device in bytes |
| 4 | 4 | | The number of blocks on the device |
| 8 | 2 | | Device type (Reserved) |
| 10 | 2 | | Device identifier (Reserved) |
| 12 | 4 | | Device data (Reserved) |
| 16 | 2 | | The number of driver descriptors |
| 18 | 8 | | The first device driver descriptor |
| 26 | 486 | | Additional driver descriptors, where unused entries are 16-bit integer values filled with 0 |

The device driver descriptor

The device driver descriptor is 8 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Start block of the device driver |
| 4 | 2 | | Device driver number of blocks |
| 6 | 2 | | Operating system type, where 1 represents "Mac OS" |

The partition map

The partition map is stored after the drive descriptor. The partition map consists of multiple entries that must be stored contiguously. The partition map itself is considered a partition, therefore the first entry in the partition map describes the partition map itself.

The partition map entry

A partition map entry is 512 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 2 | "\x50\x4d" or "PM" | Signature |
| 2 | 2 | 0x00 | Unknown (Reserved) |
| 4 | 4 | | Total number of entries in the partition map |
| 8 | 4 | | Partition start sector |
| 12 | 4 | | Partition number of sectors |
| 16 | 32 | | Partition name, which contains an ASCII string |
| 48 | 32 | | Partition type, which contains an ASCII string |
| 80 | 4 | | Data area start sector |
| 84 | 4 | | Data area number of sectors |
| 88 | 4 | | Status flags |
| 92 | 4 | | Boot code start sector |
| 96 | 4 | | Boot code number of sectors |
| 100 | 4 | | Boot code address |
| 104 | 4 | | Unknown (Reserved) |
| 108 | 4 | | Boot code entry point |
| 112 | 4 | | Unknown (Reserved) |
| 116 | 4 | | Boot code checksum |
| 120 | 16 | | Processor type |
| 136 | 188 x 2 = 376 | 0x00 | Unknown (Reserved) |

Note that the partition name can be empty.
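As an illustration, a partition map entry could be decoded with the Python sketch below. It uses the big-endian byte order listed in the characteristics; treating the name and type fields as strings padded with 0-byte values is an assumption, and the helper names are hypothetical.

```python
import struct


def _ascii_field(data: bytes) -> str:
    """Decode an ASCII string field, assuming it is padded with 0-byte values."""
    return data.split(b"\x00", 1)[0].decode("ascii")


def parse_partition_map_entry(data: bytes) -> dict:
    """Parse a 512-byte partition map entry (hypothetical helper)."""
    if data[0:2] != b"PM":
        raise ValueError("unsupported partition map entry signature")
    number_of_entries, start_sector, number_of_sectors = struct.unpack_from(">III", data, 4)
    data_area_start_sector, data_area_number_of_sectors, status_flags = struct.unpack_from(
        ">III", data, 80)
    return {
        "number_of_entries": number_of_entries,
        "start_sector": start_sector,
        "number_of_sectors": number_of_sectors,
        "name": _ascii_field(data[16:48]),  # can be empty
        "partition_type": _ascii_field(data[48:80]),
        "data_area_start_sector": data_area_start_sector,
        "data_area_number_of_sectors": data_area_number_of_sectors,
        "status_flags": status_flags,
    }
```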

Partition types

The partition types consist of the following values:

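| Value | Identifier | Description |
| --- | --- | --- |
| "Apple_Boot" | | |
| "Apple_Boot_RAID" | | |
| "Apple_Bootstrap" | | |
| "Apple_Driver" | | |
| "Apple_Driver43" | | |
| "Apple_Driver43_CD" | | |
| "Apple_Driver_ATA" | | |
| "Apple_Driver_ATAPI" | | |
| "Apple_Driver_IOKit" | | |
| "Apple_Driver_OpenFirmware" | | |
| "Apple_Extra" | | |
| "Apple_Free" | | |
| "Apple_FWDriver" | | |
| "Apple_HFS" | | |
| "Apple_HFSX" | | |
| "Apple_Loader" | | |
| "Apple_MDFW" | | |
| "Apple_MFS" | | |
| "Apple_partition_map" | | |
| "Apple_Patches" | | |
| "Apple_PRODOS" | | |
| "Apple_RAID" | | |
| "Apple_Rhapsody_UFS" | | |
| "Apple_Scratch" | | |
| "Apple_Second" | | |
| "Apple_UFS" | | |
| "Apple_UNIX_SVR2" | | |
| "Apple_Void" | | |
| "Be_BFS" | | |
| "MFS" | | |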

Status flags

The partition status flags consist of the following values:

| Value | Identifier | Description |
| --- | --- | --- |
| 0x00000001 | | Is valid |
| 0x00000002 | | Is allocated |
| 0x00000004 | | Is in use |
| 0x00000008 | | Contains boot information |
| 0x00000010 | | Is readable |
| 0x00000020 | | Is writable |
| 0x00000040 | | Boot code is position independent |
| 0x00000100 | | Contains a chain-compatible driver |
| 0x00000200 | | Contains a real driver |
| 0x00000400 | | Contains a chain driver |
| 0x40000000 | | Automatic mount at startup |
| 0x80000000 | | Is startup partition |

Note that the “is in use” status flag does not appear to be used consistently.

GUID Partition Table (GPT) format

The GUID Partition Table (GPT) is a partitioning schema that is the successor to the Master Boot Record (MBR) Partition Table for Intel x86 based computers.

Overview

A GUID Partition Table (GPT) consists of:

  • A protective or hybrid Master Boot Record (MBR) stored in block (LBA) 0
  • A GPT partition table header stored in block (LBA) 1
  • GPT partition entries stored in blocks (LBA) 2 - 33
  • partitions area
    • GPT partitions
    • MBR partitions if hybrid MBR/GPT
  • backup GPT partition entries (typically stored in the blocks (LBA) before the last block, -33 to -2)
  • A backup GPT partition table header (typically stored in the last block (LBA) -1)

The GPT partition table header signature can be used to determine the block (LBA) (or sector) size.
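One way to apply this is to probe for the "EFI PART" signature at block (LBA) 1 for a set of candidate block sizes. The candidate sizes in this Python sketch are an assumption, and the helper name is hypothetical.

```python
from typing import Optional, Tuple


def detect_gpt_block_size(image: bytes,
                          candidates: Tuple[int, ...] = (512, 1024, 2048, 4096)) -> Optional[int]:
    """Return the block size at which block (LBA) 1 starts with "EFI PART", or None."""
    for block_size in candidates:
        # The GPT partition table header is stored in block (LBA) 1.
        if image[block_size:block_size + 8] == b"EFI PART":
            return block_size
    return None
```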

Characteristics

| Characteristics | Description |
| --- | --- |
| Byte order | little-endian |
| Date and time values | N/A |
| Character strings | UTF-16 little-endian without byte order mark (BOM) |

Master Boot Record (MBR)

Hybrid Master Boot Record (MBR)

In hybrid configuration both GPT and MBR are used concurrently. Depending on the operating system one might have precedence over the other.

Protective Master Boot Record (MBR)

The Protective Master Boot Record (MBR) is an MBR with a single partition of type “EFI GPT protective partition” (0xee) that allocates as much of the drive as possible.

GPT partition table header

The GPT partition table header is 92 bytes in size and consists of:

OffsetSizeValueDescription
08"EFI PART"Signature
820Minor format version
1021Major format version
12492Header data size, which contains the size of the GPT partition table header data
164Header data checksum
2040Unknown (Reserved)
248Partition header block number (LBA)
328Backup partition header block number (LBA)
408Partitions area start block number (LBA)
488Partitions area end block number (LBA), where the block number is included in the partitions area block range
5616Disk identifier (GUID)
728Partition entries start block number (LBA)
804Number of partition entries
844128Partition entry data size
884Partition entries data checksum
92...0Unknown (Reserved)

The partition entries start block number (LBA) of the backup GPT partition table header points to backup GPT partition entries.

Note that the number of partition entries value contains the number of available partition entries, not the number of used partition entries. Empty partition entries have the “unused entry” partition type identifier (00000000-0000-0000-0000-000000000000).

Checksum calculation

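The CRC-32 algorithm with polynomial 0x04c11db7 and initial value of 0 is used to calculate the checksums. This is the same CRC-32 as computed by zlib's crc32() function when called with an initial value of 0, which internally inverts the CRC value before and after the calculation.

The header data checksum is calculated over the header data size (typically 92 bytes) of the table header data, where the header data checksum value is considered to be 0 during calculation. The partition entries data checksum is calculated over the partition entries data, which is the number of partition entries multiplied by the partition entry data size.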
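A minimal Python sketch of reading and verifying the GPT partition table header follows. It assumes that Python's zlib.crc32(), called with its default starting value of 0, computes the CRC-32 described above. It also assumes that the partition entries checksum covers the number of partition entries multiplied by the partition entry data size. The function names are hypothetical.

```python
import struct
import zlib

# Layout of the 92-byte GPT partition table header (little-endian).
GPT_HEADER = struct.Struct("<8sHHII4xQQQQ16sQIII")


def gpt_header_checksum(block: bytes) -> int:
    """CRC-32 over the header data, with the checksum field (offset 16) treated as 0."""
    header_size = struct.unpack_from("<I", block, 12)[0]
    if header_size < GPT_HEADER.size or header_size > len(block):
        raise ValueError(f"unsupported header data size: {header_size}")
    data = bytearray(block[:header_size])
    data[16:20] = bytes(4)
    return zlib.crc32(data) & 0xFFFFFFFF


def gpt_entries_checksum(entries: bytes, number_of_entries: int, entry_size: int) -> int:
    """CRC-32 over the partition entries data."""
    return zlib.crc32(entries[:number_of_entries * entry_size]) & 0xFFFFFFFF


def parse_gpt_header(block: bytes) -> dict:
    """Parse and verify a GPT partition table header stored at the start of block."""
    (signature, minor_version, major_version, _header_size, checksum,
     header_lba, backup_header_lba, first_lba, last_lba, disk_identifier,
     entries_lba, number_of_entries, entry_size,
     entries_checksum) = GPT_HEADER.unpack_from(block, 0)
    if signature != b"EFI PART":
        raise ValueError("missing GPT partition table header signature")
    if gpt_header_checksum(block) != checksum:
        raise ValueError("GPT partition table header checksum mismatch")
    return {
        "format_version": (major_version, minor_version),
        "header_lba": header_lba,
        "backup_header_lba": backup_header_lba,
        "partitions_area": (first_lba, last_lba),  # both block numbers are included
        "disk_identifier": disk_identifier,
        "entries_lba": entries_lba,
        "number_of_entries": number_of_entries,
        "entry_size": entry_size,
        "entries_checksum": entries_checksum,
    }
```

The entries checksum can then be checked by reading the partition entries data at `entries_lba` and comparing `gpt_entries_checksum()` with `entries_checksum`.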

GPT partition entries

GPT Partition entry

The GPT partition entry is 128 bytes in size and consists of:

OffsetSizeValueDescription
016Partition type identifier (GUID)
1616Partition identifier (GUID)
328Partition start block number (LBA)
408Partition end block number (LBA), where the block number is included in the partition block range
488Attribute flags
5672Partition name, which contains a UTF-16 little-endian string
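A Python sketch of parsing a single 128-byte GPT partition entry follows. It assumes the GUIDs use the common EFI on-disk layout, in which the first three GUID fields are stored little-endian. This layout matches Python's uuid.UUID(bytes_le=...). The name is decoded up to the first 16-bit 0 value. The function name is hypothetical.

```python
import struct
import uuid
from typing import Optional

UNUSED_ENTRY_TYPE = uuid.UUID(int=0)


def parse_gpt_entry(entry: bytes) -> Optional[dict]:
    """Parse a 128-byte GPT partition entry; return None for unused entries."""
    type_identifier = uuid.UUID(bytes_le=bytes(entry[0:16]))
    if type_identifier == UNUSED_ENTRY_TYPE:
        return None
    partition_identifier = uuid.UUID(bytes_le=bytes(entry[16:32]))
    start_lba, end_lba, attribute_flags = struct.unpack_from("<QQQ", entry, 32)

    # The name is UTF-16 little-endian; cut it at the first 16-bit 0 value.
    raw_name = bytes(entry[56:128])
    for offset in range(0, len(raw_name), 2):
        if raw_name[offset:offset + 2] == b"\x00\x00":
            raw_name = raw_name[:offset]
            break
    name = raw_name.decode("utf-16-le", errors="replace")

    return {
        "type_identifier": type_identifier,
        "identifier": partition_identifier,
        "start_lba": start_lba,
        "end_lba": end_lba,
        # The end block number is included in the partition block range.
        "number_of_blocks": end_lba - start_lba + 1,
        "attribute_flags": attribute_flags,
        "name": name,
    }
```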

Partition types

ValueIdentifierDescription
00000000-0000-0000-0000-000000000000Unused entry
024dee41-33e7-11d3-9d69-0008c781f39fMBR partition scheme
c12a7328-f81f-11d2-ba4b-00a0c93ec93bEFI System
21686148-6449-6e6f-744e-656564454649BIOS boot partition
d3bfe2de-3daf-11df-ba40-e3a556d89593Intel Fast Flash (iFFS) partition (for Intel Rapid Start technology)
f4019732-066e-4e12-8273-346c5641494fSony boot partition
bfbfafe7-a34f-448a-9a5b-6213eb736c22Lenovo boot partition
Windows
e3c9e316-0b5c-4db8-817d-f92df00215aeMicrosoft reserved
ebd0a0a2-b9e5-4433-87c0-68b6b72699c7(Microsoft) Basic data
5808c8aa-7e8f-42e0-85d2-e1e90434cfb3Logical Disk Manager (LDM) metadata partition
af9b60a0-1431-4f62-bc68-3311714a69adLogical Disk Manager data partition
de94bba4-06d1-4d40-a16a-bfd50179d6acWindows recovery environment
37affc90-ef7d-4e96-91c3-2d7ae055b174IBM General Parallel File System (GPFS) partition
e75caf8f-f680-4cee-afa3-b001e56efc2dStorage Spaces partition
HP-UX
75894c1e-3aeb-11d3-b7c1-7b03a0000000Data partition
e2a1e728-32e3-11d6-a682-7b03a0000000Service partition
Linux
0fc63daf-8483-4772-8e79-3d69d8477de4Linux filesystem data
a19d880f-05fc-4d3b-a006-743f0f84911eRAID partition
44479540-f297-41b2-9af7-d131d5f0458aRoot partition (x86)
4f68bce3-e8cd-4db1-96e7-fbcaf984b709Root partition (x86-64)
69dad710-2ce4-4e3c-b16c-21a1d49abed3Root partition (32-bit ARM)
b921b045-1df0-41c3-af44-4c6f280d3faeRoot partition (64-bit ARM/AArch64)
0657fd6d-a4ab-43c4-84e5-0933c84b4f4fSwap partition
e6d6d379-f507-44c2-a23c-238f2a3df928Logical Volume Manager (LVM) partition
933ac7e1-2eb4-4f13-b844-0e14e2aef915/home partition
3b8f8425-20e0-4f3b-907f-1a25a76f98e8/srv (server data) partition
7ffec5c9-2d00-49b7-8941-3ea10a5586b7Plain dm-crypt partition
ca7d7ccb-63ed-4c53-861c-1742536059ccLUKS partition
8da63339-0007-60c0-c436-083ac8230908Reserved
FreeBSD
83bd6b9d-7f41-11dc-be0b-001560b84f0fBoot partition
516e7cb4-6ecf-11d6-8ff8-00022d09712bData partition
516e7cb5-6ecf-11d6-8ff8-00022d09712bSwap partition
516e7cb6-6ecf-11d6-8ff8-00022d09712bUnix File System (UFS) partition
516e7cb8-6ecf-11d6-8ff8-00022d09712bVinum volume manager partition
516e7cba-6ecf-11d6-8ff8-00022d09712bZFS partition
Darwin / Mac OS
48465300-0000-11aa-aa11-00306543ecacHierarchical File System Plus (HFS+) partition
7c3457ef-0000-11aa-aa11-00306543ecacApple APFS
55465300-0000-11aa-aa11-00306543ecacApple UFS container
6a898cc3-1dd2-11b2-99a6-080020736631ZFS
52414944-0000-11aa-aa11-00306543ecacApple RAID partition
52414944-5f4f-11aa-aa11-00306543ecacApple RAID partition, offline
426f6f74-0000-11aa-aa11-00306543ecacApple Boot partition (Recovery HD)
4c616265-6c00-11aa-aa11-00306543ecacApple Label
5265636f-7665-11aa-aa11-00306543ecacApple TV Recovery partition
53746f72-6167-11aa-aa11-00306543ecacApple Core Storage (i.e. Lion FileVault) partition
b6fa30da-92d2-4a9a-96f1-871ec6486200SoftRAID_Status
2e313465-19b9-463f-8126-8a7993773801SoftRAID_Scratch
fa709c7e-65b1-4593-bfd5-e71d61de9b02SoftRAID_Volume
bbba6df5-f46f-4a89-8f59-8765b2727503SoftRAID_Cache
Solaris / illumos
6a82cb45-1dd2-11b2-99a6-080020736631Boot partition
6a85cf4d-1dd2-11b2-99a6-080020736631Root partition
6a87c46f-1dd2-11b2-99a6-080020736631Swap partition
6a8b642b-1dd2-11b2-99a6-080020736631Backup partition
6a898cc3-1dd2-11b2-99a6-080020736631/usr partition
6a8ef2e9-1dd2-11b2-99a6-080020736631/var partition
6a90ba39-1dd2-11b2-99a6-080020736631/home partition
6a9283a5-1dd2-11b2-99a6-080020736631Alternate sector
6a8d2ac7-1dd2-11b2-99a6-080020736631Reserved partition
6a945a3b-1dd2-11b2-99a6-080020736631Reserved partition
6a96237f-1dd2-11b2-99a6-080020736631Reserved partition
6a9630d1-1dd2-11b2-99a6-080020736631Reserved partition
6a980767-1dd2-11b2-99a6-080020736631Reserved partition
NetBSD
49f48d32-b10e-11dc-b99b-0019d1879648Swap partition
49f48d5a-b10e-11dc-b99b-0019d1879648FFS partition
49f48d82-b10e-11dc-b99b-0019d1879648LFS partition
49f48daa-b10e-11dc-b99b-0019d1879648RAID partition
2db519c4-b10f-11dc-b99b-0019d1879648Concatenated partition
2db519ec-b10f-11dc-b99b-0019d1879648Encrypted partition
Chrome OS
fe3a2a5d-4f32-41a7-b725-accc3285a309Chrome OS kernel
3cb8e202-3b7e-47dd-8a3c-7ff2a13cfcecChrome OS rootfs
2e0a753d-9e48-43b0-8337-b15192cb1b5eChrome OS future use
Container Linux by CoreOS
5dfbf5f4-2848-4bac-aa5e-0d9a20b745a6/usr partition (coreos-usr)
3884dd41-8582-4404-b9a8-e9b84f2df50eResizable rootfs (coreos-resize)
c95dc21a-df0e-4340-8d7b-26cbfa9a03e0OEM customizations (coreos-reserved)
be9067b9-ea49-4f15-b4f6-f36f8c9e1818Root filesystem on RAID (coreos-root-raid)
Haiku
42465331-3ba3-10f1-802a-4861696b7521Haiku BFS
MidnightBSD
85d5e45e-237c-11e1-b4b3-e89a8f7fc3a7Boot partition
85d5e45a-237c-11e1-b4b3-e89a8f7fc3a7Data partition
85d5e45b-237c-11e1-b4b3-e89a8f7fc3a7Swap partition
0394ef8b-237e-11e1-b4b3-e89a8f7fc3a7Unix File System (UFS) partition
85d5e45c-237c-11e1-b4b3-e89a8f7fc3a7Vinum volume manager partition
85d5e45d-237c-11e1-b4b3-e89a8f7fc3a7ZFS partition
Ceph
45b0969e-9b03-4f30-b4c6-b4b80ceff106Journal
45b0969e-9b03-4f30-b4c6-5ec00ceff106dm-crypt journal
4fbd7e29-9d25-41b8-afd0-062c0ceff05dOSD
4fbd7e29-9d25-41b8-afd0-5ec00ceff05ddm-crypt OSD
89c57f98-2fe5-4dc0-89c1-f3ad0ceff2beDisk in creation
89c57f98-2fe5-4dc0-89c1-5ec00ceff2bedm-crypt disk in creation
cafecafe-9b03-4f30-b4c6-b4b80ceff106Block
30cd0809-c2b2-499c-8879-2d6b78529876Block DB
5ce17fce-4087-4169-b7ff-056cc58473f9Block write-ahead log
fb3aabf9-d25f-47cc-bf5e-721d1816496bLockbox for dm-crypt keys
4fbd7e29-8ae0-4982-bf9d-5a8d867af560Multipath OSD
45b0969e-8ae0-4982-bf9d-5a8d867af560Multipath journal
cafecafe-8ae0-4982-bf9d-5a8d867af560Multipath block
7f4a666a-16f3-47a2-8445-152ef4d03f6cMultipath block
ec6d6385-e346-45dc-be91-da2a7c8b3261Multipath block DB
01b41e1b-002a-453c-9f17-88793989ff8fMultipath block write-ahead log
cafecafe-9b03-4f30-b4c6-5ec00ceff106dm-crypt block
93b0052d-02d9-4d8a-a43b-33a3ee4dfbc3dm-crypt block DB
306e8683-4fe2-4330-b7c0-00a917c16966dm-crypt block write-ahead log
45b0969e-9b03-4f30-b4c6-35865ceff106dm-crypt LUKS journal
cafecafe-9b03-4f30-b4c6-35865ceff106dm-crypt LUKS block
166418da-c469-4022-adf4-b30afd37f176dm-crypt LUKS block DB
86a32090-3647-40b9-bbbd-38d8c573aa86dm-crypt LUKS block write-ahead log
4fbd7e29-9d25-41b8-afd0-35865ceff05ddm-crypt LUKS OSD
OpenBSD
824cc7a0-36a8-11e3-890a-952519ad3f61Data partition
QNX
cef5a9ad-73bc-4601-89f3-cdeeeee321a1Power-safe (QNX6) file system
Plan 9
c91818f9-8025-47af-89d2-f030d7000c2cPlan 9 partition
VMware ESX
9d275380-40ad-11db-bf97-000c2911d1b8vmkcore (coredump partition)
aa31e02a-400f-11db-9590-000c2911d1b8VMFS filesystem partition
9198effc-31c0-11db-8f78-000c2911d1b8VMware Reserved
Android-IA
2568845d-2332-4675-bc39-8fa5a4748d15Bootloader
114eaffe-1552-4022-b26e-9b053604cf84Bootloader2
49a4d17f-93a3-45c1-a0de-f50b2ebe2599Boot
4177c722-9e92-4aab-8644-43502bfd5506Recovery
ef32a33b-a409-486c-9141-9ffb711f6266Misc
20ac26be-20b7-11e3-84c5-6cfdb94711e9Metadata
38f428e6-d326-425d-9140-6e0ea133647cSystem
a893ef21-e428-470a-9e55-0668fd91a2d9Cache
dc76dda9-5ac1-491c-af42-a82591580c0dData
ebc597d0-2053-4b15-8b64-e0aac75f4db1Persistent
c5a0aeec-13ea-11e5-a1b1-001e67ca0c3cVendor
bd59408b-4514-490d-bf12-9878d963f378Config
8f68cc74-c5e5-48da-be91-a0c8c15e9c80Factory
9fdaa6ef-4b3f-40d2-ba8d-bff16bfb887bFactory (alt)
767941d0-2085-11e3-ad3b-6cfdb94711e9Fastboot / Tertiary
ac6d7924-eb71-4df8-b48d-e267b27148ffOEM
Android 6.0+ ARM
19a710a2-b3ca-11e4-b026-10604b889dcfAndroid Meta
193d1ea4-b3ca-11e4-b075-10604b889dcfAndroid EXT
Open Network Install Environment (ONIE)
7412f7d5-a156-4b13-81dc-867174929325Boot
d4e6e2cd-4469-46f3-b5cb-1bff57afc149Config
PowerPC
9e1a2d38-c612-4316-aa26-8b49521e5a8bPReP boot
freedesktop.org OSes (Linux, etc.)
bc13c2ff-59e6-4262-a352-b275fd6f7172Shared boot loader configuration
Atari TOS
734e5afe-f61a-11e6-bc64-92361f002671Basic data partition (GEM, BGM, F32)

Partition attribute flags

OffsetSizeValueDescription
0.01 bitPartition is required by the platform, e.g. an OEM partition
0.11 bitEFI firmware should ignore the content of the partition
0.21 bitPartition is legacy BIOS bootable, equivalent to the MBR active flag
0.345 bitsUnknown (Reserved)
6.016 bitsFlags specific to the partition type

Microsoft basic partition type attribute flags

OffsetSizeValueDescription
7.41 bitPartition is read-only
7.51 bitPartition is a shadow copy (of another partition)
7.61 bitPartition is hidden
7.71 bitPartition should not have a drive letter assigned (no auto-mount)

ChromeOS partition type attribute flags

OffsetSizeValueDescription
6.04 bitsPriority, where 15 is the highest priority, 1 is the lowest and 0 indicates the partition is not bootable
6.44 bitsNumber of tries to attempt to boot from the partition
7.01 bitPartition was previously successfully booted from
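The offsets in the attribute flags tables are given as byte.bit within the 64-bit attribute flags value. For example, offset 7.4 is bit 7 * 8 + 4 = 60. The Python sketch below applies that conversion. It assumes the attribute flags were read as a little-endian 64-bit integer and that the caller knows the partition type, since the meaning of bits 48 to 63 depends on it. The function names are hypothetical.

```python
def bit_index(byte_offset: int, bit_offset: int) -> int:
    """Convert a byte.bit offset from the tables into a bit index (0 = LSB)."""
    return byte_offset * 8 + bit_offset


def get_bits(value: int, byte_offset: int, bit_offset: int, size: int) -> int:
    """Extract a bit field of `size` bits starting at byte_offset.bit_offset."""
    return (value >> bit_index(byte_offset, bit_offset)) & ((1 << size) - 1)


def decode_common_attributes(flags: int) -> dict:
    return {
        "required_by_platform": bool(get_bits(flags, 0, 0, 1)),
        "efi_ignore": bool(get_bits(flags, 0, 1, 1)),
        "legacy_bios_bootable": bool(get_bits(flags, 0, 2, 1)),
        "type_specific": get_bits(flags, 6, 0, 16),
    }


def decode_microsoft_basic_data_attributes(flags: int) -> dict:
    return {
        "read_only": bool(get_bits(flags, 7, 4, 1)),
        "shadow_copy": bool(get_bits(flags, 7, 5, 1)),
        "hidden": bool(get_bits(flags, 7, 6, 1)),
        "no_drive_letter": bool(get_bits(flags, 7, 7, 1)),
    }


def decode_chromeos_attributes(flags: int) -> dict:
    priority = get_bits(flags, 6, 0, 4)
    return {
        "priority": priority,
        "bootable": priority != 0,  # 0 indicates the partition is not bootable
        "tries": get_bits(flags, 6, 4, 4),
        "successful_boot": bool(get_bits(flags, 7, 0, 1)),
    }
```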

Master Boot Record (MBR) partition table format

The Master Boot Record (MBR) partition table is mainly used on the family of Intel x86 based computers.

Overview

An MBR partition table consists of:

  • Master Boot Record (MBR)
  • Extended Partition Records (EPRs)

Characteristics

CharacteristicsDescription
Byte orderlittle-endian
Date and time valuesN/A
Character stringsN/A

Terminology

TermDescription
Physical blockA fixed location on the storage media defined by the storage media
Logical blockAn abstract location on the storage media defined by software

Sector size(s)

Traditionally the sector size is 512 bytes, but modern hard disk drives can use a sector size of 4096 bytes. The Linux fdisk utility supports sector sizes of: 512, 1024, 2048 and 4096.

The location of the “boot signature” of the MBR does not indicate the sector size. Methods to derive the sector size from the data:

  • check the “boot signature” of the first EPR, if present
  • check the content of well known partition types

Cylinder Head Sector (CHS) address

The Cylinder Head Sector (CHS) address is 24 bits in size and consists of:

OffsetSizeValueDescription
0.0 8 bitsHead
1.0 6 bitsSector
1.610 bitsCylinder, where bits 6 and 7 of byte 1 contain the upper 2 bits and byte 2 contains the lower 8 bits

The logical block address (LBA) can be determined from the CHS with the following calculation:

lba = (((cylinder * heads_per_cylinder) + head) * sectors_per_track) + sector - 1
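A Python sketch of decoding the 3-byte CHS address and converting it with the formula above follows. It assumes the common packing in which bits 6 and 7 of the second byte hold the upper 2 bits of the cylinder and the third byte holds the lower 8 bits. The values heads_per_cylinder and sectors_per_track come from the disk geometry, which is not stored in the CHS address itself.

```python
from typing import Tuple


def decode_chs(data: bytes) -> Tuple[int, int, int]:
    """Decode a 3-byte CHS address into (cylinder, head, sector)."""
    head = data[0]
    sector = data[1] & 0x3F                       # lower 6 bits of byte 1
    cylinder = ((data[1] & 0xC0) << 2) | data[2]  # upper 2 bits from byte 1, lower 8 from byte 2
    return cylinder, head, sector


def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Convert a CHS address to a logical block address (LBA)."""
    if sector < 1:
        raise ValueError("CHS sector numbers start at 1")
    return ((cylinder * heads_per_cylinder) + head) * sectors_per_track + sector - 1
```

Consider the common geometry of 255 heads per cylinder and 63 sectors per track. Under that geometry, the highest CHS address (1023, 254, 63), stored as fe ff ff, corresponds to LBA 16450559, which is why larger disks rely on the LBA fields instead.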

The Master Boot Record (MBR)

The Master Boot Record (MBR) is a data structure that describes the properties of the storage medium and its partitions.

The classical MBR can only contain 4 partition table entries. Additional partition entries must be stored using extended partition records (EPR). The classical MBR has evolved into different variants like:

  • The modern MBR
  • The Advanced Active Partitions (AAP) MBR
  • The NEWLDR MBR
  • The AST/NEC MS-DOS and SpeedStor MBR
  • The Disk Manager MBR

The classical MBR

The classical MBR is 512 bytes in size and consists of:

OffsetSizeValueDescription
0446The boot (loader) code
44616Partition table entry 1
46216Partition table entry 2
47816Partition table entry 3
49416Partition table entry 4
5102"\x55\xaa"The (boot) signature

The modern MBR

The modern MBR is 512 bytes in size and consists of:

OffsetSizeValueDescription
0218The first part of the boot (loader) code
Disk timestamp used by Microsoft Windows 95, 98 and ME
21820x0000Unknown (Reserved)
2201Unknown (Original physical drive), which contains a value that ranges from 0x80 to 0xff, where 0x80 is the first drive, 0x81 the second, etc.
2211Seconds, which contains a value that ranges from 0 to 59
2221Minutes, which contains a value that ranges from 0 to 59
2231Hours, which contains a value that ranges from 0 to 23
Without disk identity
224222The second part of the boot (loader) code
With disk identity, used by UEFI, Microsoft Windows NT or later
224216The second part of the boot (loader) code
4404Disk identity (signature)
44420x0000 or 0x5a5aCopy-protection marker
Common
44616Partition table entry 1
46216Partition table entry 2
47816Partition table entry 3
49416Partition table entry 4
5102"\x55\xaa"The (boot) signature

The extended partition record

The extended partition record (EPR), also referred to as extended boot record (EBR), is 512 bytes in size and, like the MBR, contains a 64-byte partition table at offset 446. This partition table contains information about the logical partition (volume) and the location of the next EPR.

OffsetSizeValueDescription
04460x00Unknown (Unused), which should contain zero bytes
44616Partition table entry 1
46216Partition table entry 2, which should contain an extended partition
478160x00Partition table entry 3, which should be unused and contain zero bytes
494160x00Partition table entry 4, which should be unused and contain zero bytes
5102"\x55\xaa"Signature

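The second partition entry contains an extended partition, which points to the next EPR. The start address (LBA) in the first partition entry is relative to the start of the EPR that contains it. The start address (LBA) in the second partition entry is relative to the start of the first EPR.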

The partition table entry that points to the first EPR typically has a partition type of 0x05, but certain versions of Windows, such as Windows 98, are known to use partition type 0x0f.
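A Python sketch of walking the chain of extended partition records in an in-memory image follows. It assumes the addressing convention commonly used by DOS and Windows. Under that convention, the start address in the first partition table entry is relative to the EPR that contains it, and the start address in the second entry, which locates the next EPR, is relative to the first EPR. It also assumes that type 0x05 or 0x0f marks the next EPR, and it refuses to revisit an EPR so that a corrupted chain cannot loop forever. The function names are hypothetical.

```python
import struct

ENTRY = struct.Struct("<B3sB3sII")  # flags, start CHS, type, end CHS, start LBA, size
EXTENDED_PARTITION_TYPES = (0x05, 0x0F)


def read_entry(sector: bytes, index: int):
    """Return (partition type, start LBA, number of sectors) of entry `index` (0-3)."""
    _flags, _start_chs, partition_type, _end_chs, start_lba, number_of_sectors = (
        ENTRY.unpack_from(sector, 446 + index * 16))
    return partition_type, start_lba, number_of_sectors


def walk_logical_partitions(image: bytes, first_epr_lba: int,
                            sector_size: int = 512, max_records: int = 4096) -> list:
    """Return (start LBA, number of sectors, type) for each logical partition."""
    partitions = []
    epr_lba = first_epr_lba
    visited = set()
    for _ in range(max_records):
        if epr_lba in visited:
            raise ValueError(f"loop in EPR chain at LBA {epr_lba}")
        visited.add(epr_lba)

        offset = epr_lba * sector_size
        sector = image[offset:offset + sector_size]
        if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
            raise ValueError(f"missing EPR signature at LBA {epr_lba}")

        partition_type, start_lba, number_of_sectors = read_entry(sector, 0)
        if partition_type != 0x00:
            # First entry: relative to the current EPR.
            partitions.append((epr_lba + start_lba, number_of_sectors, partition_type))

        next_type, next_start_lba, _ = read_entry(sector, 1)
        if next_type not in EXTENDED_PARTITION_TYPES or next_start_lba == 0:
            break
        # Second entry: relative to the first EPR.
        epr_lba = first_epr_lba + next_start_lba
    return partitions
```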

The partition table entry

The partition table entry is 16 bytes in size and consists of:

OffsetSizeValueDescription
01Partition flags
13The partition start address, which contains a CHS address relative to the start of the hard disk
41Partition type
53The partition end address, which contains a CHS address relative to the start of the hard disk
84The partition start address, which contains an LBA (in sectors) relative to the start of the hard disk for MBR partition table entries
124Size of the partition in number of sectors
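A Python sketch that reads the four partition table entries of an MBR (or EPR) sector using the layouts above follows. Entries with partition type 0x00 are treated as empty and skipped. A table whose only used entry has type 0xee is reported as a protective MBR. The function names are hypothetical.

```python
import struct

PARTITION_ENTRY = struct.Struct("<B3sB3sII")


def parse_partition_table(sector: bytes) -> list:
    """Return the used partition table entries of a 512-byte MBR or EPR."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("missing boot signature")
    entries = []
    for index in range(4):
        (flags, start_chs, partition_type, end_chs, start_lba,
         number_of_sectors) = PARTITION_ENTRY.unpack_from(sector, 446 + index * 16)
        if partition_type == 0x00:
            continue  # empty entry
        entries.append({
            "index": index,
            "bootable": bool(flags & 0x80),
            "type": partition_type,
            "start_chs": start_chs,
            "end_chs": end_chs,
            "start_lba": start_lba,
            "number_of_sectors": number_of_sectors,
        })
    return entries


def is_protective_mbr(entries: list) -> bool:
    """True if the only used entry is an EFI GPT protective partition (0xee)."""
    return len(entries) == 1 and entries[0]["type"] == 0xEE
```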

Partition flags

The partition flags consist of the following values:

ValueIdentifierDescription
0x80Partition is bootable

Partition types

The partition types consist of the following values:

ValueIdentifierDescription
0x00Empty
0x01FAT12 (CHS)
0x02XENIX root
0x03XENIX user
0x04FAT16 (16 MiB - 32 MiB CHS)
0x05Extended (CHS)
0x06FAT16 (32 MiB - 2 GiB CHS)
0x07HPFS/NTFS
0x08AIX
0x09AIX bootable
0x0aOS/2 Boot Manager
0x0bFAT32 (CHS)
0x0cFAT32 (LBA)
0x0eFAT16 (32 MiB - 2 GiB LBA)
0x0fExtended (LBA)
0x10OPUS
0x11Hidden FAT12 (CHS)
0x12Compaq diagnostics
0x14Hidden FAT16 (16 MiB - 32 MiB CHS)
0x16Hidden FAT16 (32 MiB - 2 GiB CHS)
0x17Hidden HPFS/NTFS
0x18AST SmartSleep
0x1bHidden FAT32 (CHS)
0x1cHidden FAT32 (LBA)
0x1eHidden FAT16 (32 MiB - 2 GiB LBA)
0x24NEC DOS
0x27Unknown (PackardBell recovery/installation partition)
0x39Plan 9
0x3cPartitionMagic recovery
0x40Venix 80286
0x41PPC PReP Boot
0x42SFS or LDM: Microsoft MBR (Dynamic Disk)
0x4dQNX4.x
0x4eQNX4.x 2nd part
0x4fQNX4.x 3rd part
0x50OnTrack DM
0x51OnTrack DM6 Aux1
0x52CP/M
0x53OnTrack DM6 Aux3
0x54OnTrackDM6
0x55EZ-Drive
0x56Golden Bow
0x5cPriam Edisk
0x61SpeedStor
0x63GNU HURD or SysV
0x64Novell Netware 286
0x65Novell Netware 386
0x70DiskSecure Multi-Boot
0x75PC/IX
0x78XOSL
0x80Old Minix
0x81Minix / old Linux
0x82Solaris x86 or Linux swap
0x83Linux
0x84Hibernation or OS/2 hidden C: drive
0x85Linux extended
0x86NTFS volume set
0x87NTFS volume set
0x8eLinux LVM
0x93Amoeba
0x94Amoeba BBT
0x9fBSD/OS
0xa0IBM Thinkpad hibernation
0xa1Hibernation
0xa5FreeBSD
0xa6OpenBSD
0xa7NeXTSTEP
0xa8Mac OS X
0xa9NetBSD
0xabMac OS X Boot
0xafMac OS X
0xb7BSDI
0xb8BSDI swap
0xbbBoot Wizard hidden
0xc1DRDOS/sec (FAT-12)
0xc4DRDOS/sec (FAT-16 < 32M)
0xc6DRDOS/sec (FAT-16)
0xc7Syrinx
0xdaNon-FS data
0xdbCP/M / CTOS / ...
0xdeDell Utility
0xdfBootIt
0xe1DOS access
0xe3DOS R/O
0xe4SpeedStor
0xebBeOS
0xeeEFI GPT protective partition
0xefEFI system partition (FAT)
0xf0Linux/PA-RISC boot
0xf1SpeedStor
0xf2DOS secondary
0xf4SpeedStor
0xfbVMware file system
0xfcVMware swap
0xfdLinux RAID auto-detect
0xfeLANstep
0xffBBT

File system formats

A file system format is used to manage the storage of files.

Terminology

  • File entry (file system entry): an object that represents an element within the file system, such as a file or directory. A file system typically stores metadata of a file entry, such as the name, size, permissions, date and time values, and location of the content.
  • Data fork (or data stream): a file system object that represents the content of a file entry. NTFS and HFS support multiple data forks (or data streams) for an individual file entry.
  • Extended attribute: A file system object that represents additional (or extended) metadata of an individual file entry.
  • Reparse point: a file system object that redirects to another location or implementation (filter driver), such as Windows Overlay Filter (WOF) compression. NTFS and ReFS support reparse points.

Formats

Apple File System (APFS)

TODO: add description

Apple File System Compression (decmpfs)

Hierarchical File System (HFS) and Apple File System (APFS) use Apple File System Compression (decmpfs) to compress file contents.

Overview

An Apple File System Compression (decmpfs) compressed file consists of:

  • an extended attribute named “com.apple.decmpfs”

Characteristics

CharacteristicsDescription
Byte orderlittle-endian

decmpfs extended attribute

The decmpfs extended attribute consists of:

  • decmpfs header
  • optional compressed data

decmpfs header

The decmpfs header is 16 bytes in size and consists of:

OffsetSizeValueDescription
04"fpmc"Signature
44Compression method
88Uncompressed data size

Note that the signature is likely stored in little-endian and represents “cmpf”.
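A Python sketch of reading the decmpfs header from the content of the “com.apple.decmpfs” extended attribute follows. It checks the signature as the bytes “fpmc”, which is the value 0x636d7066 (“cmpf”) stored little-endian. The function name is hypothetical.

```python
import struct

DECMPFS_HEADER = struct.Struct("<4sIQ")  # signature, compression method, uncompressed size
DECMPFS_SIGNATURE = b"fpmc"


def parse_decmpfs_header(attribute: bytes):
    """Return (compression method, uncompressed data size) of a decmpfs attribute."""
    if len(attribute) < DECMPFS_HEADER.size:
        raise ValueError("decmpfs attribute data too small")
    signature, compression_method, uncompressed_size = DECMPFS_HEADER.unpack_from(attribute, 0)
    if signature != DECMPFS_SIGNATURE:
        raise ValueError("unsupported decmpfs signature")
    return compression_method, uncompressed_size
```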

Compression methods

ValueIdentifierDescription
1CMP_Type1Unknown (uncompressed extended attribute data)
3ZLIB (DEFLATE) compressed extended attribute data, where the compressed data is stored in the extended attribute after the compressed data header
464k chunked ZLIB (DEFLATE) compressed resource fork, where the compressed data is stored in the resource fork
5Unknown (sparse compressed extended attribute data), where the uncompressed data contains 0-byte values
6Unknown (unused)
7LZVN compressed extended attribute data, where the compressed data is stored in the extended attribute after the compressed data header
864k chunked LZVN compressed resource fork, where the compressed data is stored in the resource fork
9Unknown (uncompressed extended attribute data, different from CMP_Type1)
10Unknown (64k chunked uncompressed data resource fork), where the compressed data is stored in the resource fork
11LZFSE compressed extended attribute data, where the compressed data is stored in the extended attribute after the compressed data header
1264k chunked LZFSE compressed resource fork, where the compressed data is stored in the resource fork
0x80000001Unknown (faulting file)

Note that if the ZLIB (DEFLATE) compressed data starts with 0xff, the data is stored uncompressed after the first compressed data byte.

Note that if the LZVN compressed data starts with 0x06 (end of stream opcode), the data is stored uncompressed after the first compressed data byte.
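A Python sketch of decoding compression method 3 (ZLIB compressed extended attribute data) follows. It assumes the data after the 16-byte header is a zlib-wrapped DEFLATE stream that Python's zlib.decompress() accepts. It also applies the 0xff rule described above. Other compression methods, including LZVN and LZFSE, are not handled. The function name is hypothetical.

```python
import struct
import zlib


def decode_decmpfs_method3(attribute: bytes) -> bytes:
    """Decode ZLIB compressed extended attribute data (compression method 3)."""
    signature, method, uncompressed_size = struct.unpack_from("<4sIQ", attribute, 0)
    if signature != b"fpmc" or method != 3:
        raise ValueError("not a decmpfs compression method 3 attribute")
    compressed = attribute[16:]
    if compressed[:1] == b"\xff":
        # Stored uncompressed after the first compressed data byte.
        data = compressed[1:]
    else:
        data = zlib.decompress(compressed)
    if len(data) < uncompressed_size:
        raise ValueError("decoded data smaller than uncompressed data size")
    return data[:uncompressed_size]
```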

Extended File System (ext) format

The Extended File System (ext) is one of the more common file systems used on Linux.

There are multiple versions of ext.

VersionRemarks
1Introduced in April 1992
2Introduced in January 1993
3Introduced in November 2001, which featured journaling, dynamic growth and large directory indexing (HTree)
4Introduced in October 2006 as unstable and became stable in October 2008, which featured extents and improved timestamps

Overview

An Extended File System (ext) consists of:

  • one or more block groups

Characteristics

CharacteristicsDescription
Byte orderlittle-endian, with the exception of UUID values that are stored in big-endian
Date and time valuesnumber of seconds since January 1, 1970 00:00:00 (POSIX epoch), disregarding leap seconds. Or number of nanoseconds, when extra precision is enabled. Date and time values are stored in UTC
Character stringsUTF-8 or a narrow character (Single Byte Character (SBC) or Multi Byte Character (MBC)) stored using a system defined codepage

Block group

A block group consists of:

  • optional 1024 bytes of boot code or zero bytes (at offset: 0)
  • optional superblock
  • optional group descriptor table
  • block bitmap
  • inode bitmap
  • allocated and unallocated blocks

The primary superblock is stored at offset 1024 relative from the start of the volume. Backup superblocks are stored at offset 1024 relative from the start of the block group if block size <= 1024 or otherwise at offset 0 from the start of the block group.

The group descriptor table is stored in the block after the superblock.

An ext2 file system with revision 0 stores a copy of the superblock at the start of every block group, along with backups of the group descriptor table. Later revisions reduce the number of backup copies by only putting backups in specific block groups (sparse superblock feature EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER).

Not all values in a backup superblock and backup group descriptor tables match those of the primary superblock and group descriptor table.

Note that backup superblocks can be empty (filled with 0-byte values) or contain remnant data on an Android ext file system with sparse_super.
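A Python sketch of where superblock copies can be expected follows. It assumes the sparse superblock rule used by e2fsprogs and the Linux kernel, where copies are kept in block groups 0 and 1 and in groups whose number is a power of 3, 5 or 7. It uses the offsets described above, taking block group n to start at n * blocks per block group * block size. The function names are hypothetical. As noted above, a backup location may still not contain a valid superblock.

```python
def has_superblock(group: int, sparse_super: bool) -> bool:
    """Return True if block group `group` is expected to hold a superblock copy."""
    if not sparse_super or group <= 1:
        return True
    for base in (3, 5, 7):
        power = base
        while power < group:
            power *= base
        if power == group:
            return True
    return False


def superblock_offset(group: int, blocks_per_group: int, block_size: int) -> int:
    """Return the byte offset of the (backup) superblock of block group `group`."""
    if group == 0:
        return 1024  # the primary superblock
    group_offset = group * blocks_per_group * block_size
    if block_size <= 1024:
        return group_offset + 1024
    return group_offset
```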

Flex block groups

Flex (or flexible) block groups are a set of block groups that are treated as a single logical block group. Metadata, such as the superblock, group descriptors and data block bitmaps, spans the entire logical block group and not the individual block groups that are part of the set.

Meta block groups

Meta block groups (META_BG) are a set (or cluster) of block groups, for which the group descriptors can be stored in a single block.

The first meta block group value in the superblock indicates from which meta block group onward the group descriptors are stored in the meta block groups themselves. For example, if the first meta block group value is 256 and the number of group descriptors that can be stored in a single block is 64, then the group descriptors for the block groups [0, 16383] are stored in the group descriptor table after the primary superblock and corresponding locations of backups.

Successive group descriptor tables, for example [16384, 16447], are stored in the first block group of a meta block group and backups in the second and last block groups of the meta block group.
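A Python sketch of locating the group descriptor of a block group when meta block groups are used follows, based on the description above. It assumes that the number of group descriptors per block is the block size divided by the group descriptor size. It also assumes that block groups before the first meta block group use the group descriptor table after the superblock. The function name and the returned keys are hypothetical.

```python
def locate_group_descriptor(group: int, descriptors_per_block: int,
                            first_meta_block_group: int) -> dict:
    """Describe where the group descriptor of block group `group` is stored."""
    meta_block_group = group // descriptors_per_block
    index_in_block = group % descriptors_per_block
    if meta_block_group < first_meta_block_group:
        # Stored in block `meta_block_group` of the group descriptor table that
        # follows the primary superblock (and its backups).
        return {"layout": "group descriptor table",
                "table_block_index": meta_block_group,
                "index_in_block": index_in_block}
    first_group = meta_block_group * descriptors_per_block
    return {"layout": "meta block group",
            "primary_group": first_group,
            # Backups in the second and last block groups of the meta block group.
            "backup_groups": [first_group + 1, first_group + descriptors_per_block - 1],
            "index_in_block": index_in_block}
```

With the example values above, block group 16400 has its descriptor at index 16 of a block stored in block group 16384, with backups in block groups 16385 and 16447.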

Blocks

The volume is divided into blocks:

block offset = block number * block size

The block size is defined in the superblock.

Note that mke2fs indicates the maximum block size is 65536.

The superblock

The ext2 superblock

The ext2 superblock is 208 bytes in size and consists of:

OffsetSizeValueDescription
04Number of inodes
44Number of blocks
84Number of reserved blocks. Reserved blocks are used to prevent the file system from filling up
124Number of unallocated blocks
164Number of unallocated inodes
204First data block number. The block number is relative from the start of the volume
244Block size, which contains the number of bits to shift 1024 to the MSB (left)
284Fragment size, which contains the number of bits to shift 1024 to the MSB (left)
324Number of blocks per block group
364Number of fragments per block group
404Number of inodes per block group
444Last mount time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
484Last written time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
522The (current) mount count
542Maximum mount count
562"\x53\xef"Signature
582File system state flags
602Error-handling status
622Minor format revision
644Last consistency check time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
684Consistency check interval, which contains the maximum number of seconds between consistency checks
724Creator operating system
764Format revision
802Reserved block owner (or user) identifier (UID)
822Reserved block group identifier (GID)
Dynamic inode information, if major version is EXT2_DYNAMIC_REV
844First non-reserved inode
882Inode size. Note that the inode size must be a power of 2 larger or equal to 128, the maximum supported by mke2fs is 1024
902Block group, which contains a block group number
924Compatible feature flags
964Incompatible feature flags
1004Read-only compatible feature flags
10416File system identifier, which contains a big-endian UUID
12016Volume label, which contains a narrow character string without end-of-string character
13664Last mount path, which contains a narrow character string without end-of-string character
2004Algorithm usage bitmap
Performance hints, if EXT2_COMPAT_PREALLOC is set
2041Number of pre-allocated blocks per file
2051Number of pre-allocated blocks per directory
2062Unknown (padding)
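A Python sketch of reading a few ext2 superblock values and deriving the block size and number of block groups follows. It makes three assumptions:

  • `data` holds the bytes at offset 1024 of the volume
  • EXT2_DYNAMIC_REV is format revision 1
  • revision 0 file systems use 128-byte inodes

The volume label is decoded as UTF-8 with replacement characters, although the characteristics note that the codepage is system defined. The function names are hypothetical.

```python
import struct

EXT2_DYNAMIC_REV = 1  # assumption: value of the dynamic format revision


def parse_ext2_superblock(data: bytes) -> dict:
    """Parse selected ext2 superblock values from the bytes at volume offset 1024."""
    if data[56:58] != b"\x53\xef":
        raise ValueError("missing ext superblock signature")
    (number_of_inodes, number_of_blocks, _reserved_blocks, _unallocated_blocks,
     _unallocated_inodes, first_data_block, log_block_size, _log_fragment_size,
     blocks_per_group, _fragments_per_group,
     inodes_per_group) = struct.unpack_from("<11I", data, 0)
    format_revision = struct.unpack_from("<I", data, 76)[0]

    inode_size = 128  # assumption for revision 0, which has no inode size field
    if format_revision >= EXT2_DYNAMIC_REV:
        inode_size = struct.unpack_from("<H", data, 88)[0]

    block_size = 1024 << log_block_size
    # Block groups cover the blocks from the first data block onward (rounded up).
    number_of_groups = -(-(number_of_blocks - first_data_block) // blocks_per_group)
    label = data[120:136].split(b"\x00", 1)[0].decode("utf-8", errors="replace")
    return {
        "block_size": block_size,
        "number_of_inodes": number_of_inodes,
        "number_of_blocks": number_of_blocks,
        "first_data_block": first_data_block,
        "blocks_per_group": blocks_per_group,
        "inodes_per_group": inodes_per_group,
        "number_of_block_groups": number_of_groups,
        "inode_size": inode_size,
        "volume_label": label,
    }


def block_offset(block_number: int, block_size: int) -> int:
    """Byte offset of a block relative to the start of the volume."""
    return block_number * block_size
```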

The ext3 superblock

The ext3 superblock is 336 bytes in size and consists of:

OffsetSizeValueDescription
04Number of inodes
44Number of blocks
84Number of reserved blocks. Reserved blocks are used to prevent the file system from filling up
124Number of unallocated blocks
164Number of unallocated inodes
204First data block number. The block number is relative from the start of the volume
244Block size, which contains the number of bits to shift 1024 to the MSB (left)
284Fragment size, which contains the number of bits to shift 1024 to the MSB (left)
324Number of blocks per block group
364Number of fragments per block group
404Number of inodes per block group, which can be 0 in combination with EXT3_FEATURE_INCOMPAT_JOURNAL_DEV
444Last mount time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
484Last written time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
522The (current) mount count
542Maximum mount count
562"\x53\xef"Signature
582File system state flags
602Error-handling status
622Minor format revision
644Last consistency check time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
684Consistency check interval, which contains the maximum number of seconds between consistency checks
724Creator operating system
764Format revision
802Reserved block owner (or user) identifier (UID)
822Reserved block group identifier (GID)
Dynamic inode information, if major version is EXT2_DYNAMIC_REV
844First non-reserved inode
882Inode size. Note that the inode size must be a power of 2 larger or equal to 128, the maximum supported by mke2fs is 1024
902Block group, which contains a block group number
924Compatible feature flags
964Incompatible feature flags
1004Read-only compatible feature flags
10416File system identifier, which contains a big-endian UUID
12016Volume label, which contains a narrow character string without end-of-string character
13664Last mount path, which contains a narrow character string without end-of-string character
2004Algorithm usage bitmap
Performance hints, if EXT2_COMPAT_PREALLOC is set
2041Number of pre-allocated blocks per file
2051Number of pre-allocated blocks per directory
2062Unknown (padding)
Journalling support, if EXT3_FEATURE_COMPAT_HAS_JOURNAL is set
20816Journal identifier, which contains a big-endian UUID
2244Journal inode
2284Unknown (Journal device)
2324Unknown (Head of orphan inode list). The orphan inode list is a list of inodes to delete
2364 x 4hash-tree seed
2521Default hash version
2531Journal backup type
2542Group descriptor size
2564Default mount options
2604First meta block group (or metablock)
2644File system creation time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
26817 x 4Backup journal inodes

The ext4 superblock

The ext4 superblock is 1024 bytes in size and consists of:

OffsetSizeValueDescription
04Number of inodes
44Number of blocks, which contains the lower 32 bits of the value
84Number of reserved blocks, which contains the lower 32 bits of the value. Reserved blocks are used to prevent the file system from filling up
124Number of unallocated blocks, which contains the lower 32 bits of the value
164Number of unallocated inodes
204First data block number. The block number is relative from the start of the volume
244Block size, which contains the number of bits to shift 1024 to the most-significant-bit (MSB)
284Fragment size, which contains the number of bits to shift 1024 to the most-significant-bit (MSB)
324Number of blocks per block group
364Number of fragments per block group
404Number of inodes per block group, which can be 0 in combination with EXT4_FEATURE_INCOMPAT_JOURNAL_DEV
444Last mount time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
484Last written time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
522The (current) mount count
542Maximum mount count
562"\x53\xef"Signature
582File system state flags
602Error-handling status
622Minor format revision
644Last consistency check time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
684Consistency check interval, which contains the maximum number of seconds between consistency checks
724Creator operating system
764Format revision
802Reserved block owner (or user) identifier (UID)
822Reserved block group identifier (GID)
Dynamic inode information, if major version is EXT2_DYNAMIC_REV
844First non-reserved inode
882Inode size. Note that the inode size must be a power of 2 larger or equal to 128, the maximum supported by mke2fs is 1024
902Block group
924Compatible feature flags
964Incompatible feature flags
1004Read-only compatible feature flags
10416File system identifier, which contains a big-endian UUID
12016Volume label, which contains a narrow character string without end-of-string character
13664Last mount path, which contains a narrow character string without end-of-string character
2004Algorithm usage bitmap
Performance hints, if EXT2_COMPAT_PREALLOC is set
2041Number of pre-allocated blocks per file
2051Number of pre-allocated blocks per directory
2062Unknown (padding)
Journalling support, if EXT3_FEATURE_COMPAT_HAS_JOURNAL is set
20816Journal identifier, which contains a big-endian UUID
2244Journal inode
2284Unknown (Journal device)
2324Unknown (Head of orphan inode list). The orphan inode list is a list of inodes to delete
2364 x 4hash-tree seed
2521Default hash version
2531Journal backup type
2542Group descriptor size
2564Default mount options
2604First meta block group (or metablock)
2644File system creation time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
26817 x 4Backup journal inodes
If 64-bit support (EXT4_FEATURE_INCOMPAT_64BIT) is enabled
3364Number of blocks, which contains the upper 32 bits of the value
3404Number of reserved blocks, which contains the upper 32 bits of the value
3444Number of unallocated blocks, which contains the upper 32 bits of the value
3482Minimum inode size
3502Reserved inode size
3524Miscellaneous flags
3562RAID stride
3582Multiple mount protection (MMP) update interval in seconds
3608Block for multi-mount protection
3684Unknown (blocks on all data disks (N*stride))
3721Number of block groups per flex block group, which is stored as: 2 ^ value
3731Checksum type
3741Unknown (encryption level)
3751Unknown (padding)
3768Unknown (s_kbytes_written)
3844Inode number of active snapshot
3884Identifier of active snapshot
3928Unknown (reserved s_snapshot_r_blocks_count)
4004Inode number of snapshot list head
4044Unknown (s_error_count)
4084Unknown (s_first_error_time)
4124Unknown (s_first_error_ino)
4168Unknown (s_first_error_block)
42432Unknown (s_first_error_func)
4564Unknown (s_first_error_line)
4604Unknown (s_last_error_time)
4644Unknown (s_last_error_ino)
4684Unknown (s_last_error_line)
4728Unknown (s_last_error_block)
48032Unknown (s_last_error_func)
51264Unknown (s_mount_opts)
5764Unknown (s_usr_quota_inum)
5804Unknown (s_grp_quota_inum)
5844Unknown (s_overhead_clusters)
5882 x 4Unknown (s_backup_bgs)
5964Unknown (s_encrypt_algos)
60016Unknown (s_encrypt_pw_salt)
6164Unknown (s_lpf_ino)
6204Unknown (s_prj_quota_inum)
6244Metadata checksum seed
6281Unknown (s_wtime_hi)
6291Unknown (s_mtime_hi)
6301Unknown (s_mkfs_time_hi)
6311Unknown (s_lastcheck_hi)
6321Unknown (s_first_error_time_hi)
6331Unknown (s_last_error_time_hi)
6341Unknown (s_first_error_errcode)
6351Unknown (s_last_error_errcode)
6362Unknown (s_encoding)
6382Unknown (s_encoding_flags)
6404Unknown (s_orphan_file_inum)
64494 x 4 = 376Unknown (reserved)
10204Checksum

If checksum type is CRC-32C, the checksum is stored as 0xffffffff - CRC-32C.

Note that some versions of mkfs.ext set the file system creation time even for ext2 and when EXT3_FEATURE_COMPAT_HAS_JOURNAL is not set.

TODO: Is the only way to determine the file system version the compatibility and equivalent flags?

Checksum calculation

If checksum type is CRC-32C, the CRC32-C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the checksum.

The checksum is calculated over the first 1020 bytes of the superblock, that is the superblock data preceding the checksum.
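The calculation above can be sketched in Python as follows. This is a minimal, unoptimized (bit-at-a-time) sketch that assumes the 1024-byte superblock has already been read from volume offset 1024 and that the checksum type is CRC-32C. The function names `crc32c` and `verify_superblock_checksum` are illustrative and not part of any existing API.

```python
import struct


def crc32c(data: bytes, initial_value: int = 0) -> int:
    """CRC-32C (Castagnoli), calculated one bit at a time.

    0x82f63b78 is the bit-reversed form of the polynomial 0x1edc6f41.
    An initial value of 0 yields the standard CRC-32C value.
    """
    crc = initial_value ^ 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def verify_superblock_checksum(superblock: bytes) -> bool:
    """Compare the checksum stored at offset 1020 with a calculated one."""
    if len(superblock) < 1024:
        raise ValueError("superblock data too small")
    (stored_checksum,) = struct.unpack_from("<I", superblock, 1020)
    # The stored value is 0xffffffff - CRC-32C over the first 1020 bytes.
    calculated_checksum = 0xFFFFFFFF - crc32c(superblock[:1020])
    return stored_checksum == calculated_checksum
```

A table-driven CRC would be faster; the bitwise form is used here only because it is short and easy to check against the polynomial.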

Metadata checksum seed calculation

If checksum type is CRC-32C, the CRC32-C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the seed.

The seed is calculated over:

  • the 16 byte file system identifier in the superblock

If EXT4_FEATURE_INCOMPAT_CSUM_SEED is set, the metadata checksum seed stored in the superblock should be used instead of calculating it from the file system identifier.

If checksum type is CRC-32C, the metadata checksum seed is stored as 0xffffffff - CRC-32C.
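A minimal sketch of obtaining the seed, assuming the superblock offsets from the table above (incompatible feature flags at offset 96, file system identifier at offset 104, metadata checksum seed at offset 624). The helper `checksum_with_seed` (a hypothetical name) illustrates why a seed is useful: continuing the CRC-32C calculation from the seed gives the same result as calculating it over the file system identifier followed by the data.

```python
import struct

EXT4_FEATURE_INCOMPAT_CSUM_SEED = 0x00002000


def crc32c(data: bytes, initial_value: int = 0) -> int:
    """CRC-32C (Castagnoli), reflected polynomial 0x82f63b78."""
    crc = initial_value ^ 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def metadata_checksum_seed(superblock: bytes) -> int:
    (incompatible_features,) = struct.unpack_from("<I", superblock, 96)
    if incompatible_features & EXT4_FEATURE_INCOMPAT_CSUM_SEED:
        # Use the seed stored in the superblock.
        (seed,) = struct.unpack_from("<I", superblock, 624)
        return seed
    # Otherwise derive it from the 16-byte file system identifier.
    return 0xFFFFFFFF - crc32c(superblock[104:120])


def checksum_with_seed(seed: int, data: bytes) -> int:
    """Return 0xffffffff - CRC-32C, continuing the calculation from a seed."""
    return 0xFFFFFFFF - crc32c(data, 0xFFFFFFFF - seed)
```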

File system state flags

ValueIdentifierDescription
0x0001Is clean
0x0002Has errors
0x0004Recovering orphan inodes

Error-handling status

ValueIdentifierDescription
1Continue
2Remount as read-only
3Panic

Creator operating system

ValueIdentifierDescription
0Linux
1GNU Hurd
2Masix
3FreeBSD
4Lites

Format revision

ValueIdentifierDescription
0EXT2_GOOD_OLD_REVOriginal version with a fixed inode size of 128 bytes
1EXT2_DYNAMIC_REVVersion with dynamic inode size support

Compatible feature flags

ValueIdentifierDescription
0x00000001EXT2_COMPAT_PREALLOCPre-allocate directory blocks, which is intended to reduce fragmentation
0x00000002EXT2_FEATURE_COMPAT_IMAGIC_INODESHas AFS server inodes
0x00000004EXT3_FEATURE_COMPAT_HAS_JOURNALHas a journal
0x00000008EXT2_FEATURE_COMPAT_EXT_ATTRHas extended attributes
0x00000010EXT2_FEATURE_COMPAT_RESIZE_INO, EXT2_FEATURE_COMPAT_RESIZE_INODEIs resizable; the file system has reserved GDT blocks for expansion, which also requires RO_COMPAT_SPARSE_SUPER
0x00000020EXT2_FEATURE_COMPAT_DIR_INDEXHas indexed directories
0x00000040COMPAT_LAZY_BGUnknown (Lazy block group)
0x00000080COMPAT_EXCLUDE_INODEUnknown (Exclude inode), which is not yet implemented and intended for a future file system snapshot feature
0x00000100COMPAT_EXCLUDE_BITMAPUnknown (Exclude bitmap), which is not yet implemented and intended for a future file system snapshot feature
0x00000200EXT4_FEATURE_COMPAT_SPARSE_SUPER2Has sparse superblock version 2
0x00000400EXT4_FEATURE_COMPAT_FAST_COMMITUnknown (fast commit)
0x00000800EXT4_FEATURE_COMPAT_STABLE_INODESUnknown (stable inodes)
0x00001000EXT4_FEATURE_COMPAT_ORPHAN_FILEHas orphan file

Note that EXT2_FEATURE_COMPAT_, EXT3_FEATURE_COMPAT_, EXT4_FEATURE_COMPAT_ and COMPAT_ can be used interchangeably.

Incompatible feature flags

ValueIdentifierDescription
0x00000001EXT2_FEATURE_INCOMPAT_COMPRESSIONHas compression, which is not yet implemented
0x00000002EXT2_FEATURE_INCOMPAT_FILETYPEDirectory entry has file type
0x00000004EXT3_FEATURE_INCOMPAT_RECOVERNeeds recovery
0x00000008EXT3_FEATURE_INCOMPAT_JOURNAL_DEVJournal device
0x00000010EXT2_FEATURE_INCOMPAT_META_BGHas meta (or metadata) block groups
0x00000040EXT4_FEATURE_INCOMPAT_EXTENTSHas extents
0x00000080EXT4_FEATURE_INCOMPAT_64BITHas 64-bit support, which supports more than 2^32 blocks
0x00000100EXT4_FEATURE_INCOMPAT_MMPMultiple mount protection
0x00000200EXT4_FEATURE_INCOMPAT_FLEX_BGHas flex (or flexible) block groups
0x00000400EXT4_FEATURE_INCOMPAT_EA_INODEHas extended attribute value inodes, which are used to store large extended attribute values
0x00001000EXT4_FEATURE_INCOMPAT_DIRDATAData in directory entry, which is not yet implemented
0x00002000EXT4_FEATURE_INCOMPAT_CSUM_SEED, EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUMInitial metadata checksum value (or seed) is stored in the superblock
0x00004000EXT4_FEATURE_INCOMPAT_LARGEDIRLarge directory >2GB or 3-level hash tree (HTree)
0x00008000EXT4_FEATURE_INCOMPAT_INLINE_DATAHas data stored in inode
0x00010000EXT4_FEATURE_INCOMPAT_ENCRYPTHas encrypted inodes
0x00020000EXT4_FEATURE_INCOMPAT_CASEFOLDHas case folding

Note that EXT2_FEATURE_INCOMPAT_, EXT3_FEATURE_INCOMPAT_, EXT4_FEATURE_INCOMPAT_ and INCOMPAT_ can be used interchangeably.

Read-only compatible feature flags

ValueIdentifierDescription
0x00000001EXT2_FEATURE_RO_COMPAT_SPARSE_SUPERHas sparse superblocks and group descriptor tables. If set, a superblock is stored in block groups 0, 1 and those that are powers of 3, 5 and 7. If not set, a superblock is stored in every block group
0x00000002EXT2_FEATURE_RO_COMPAT_LARGE_FILEContains large files
0x00000004EXT2_FEATURE_RO_COMPAT_BTREE_DIRIntended for hash-tree directory (or directory B-tree), which is not yet implemented
0x00000008EXT4_FEATURE_RO_COMPAT_HUGE_FILEHas huge file support
0x00000010EXT4_FEATURE_RO_COMPAT_GDT_CSUMHas group descriptors with checksums
0x00000020EXT4_FEATURE_RO_COMPAT_DIR_NLINKThe ext3 32000 subdirectory limit does not apply. A directory's number of links will be set to 1 if it is incremented past 64999
0x00000040EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZEHas large inodes. The size of an inode can be larger than 128 bytes
0x00000080EXT4_FEATURE_RO_COMPAT_HAS_SNAPSHOTHas snapshots, which is not yet implemented and intended for a future file system snapshot feature
0x00000100EXT4_FEATURE_RO_COMPAT_QUOTAQuota is handled transactionally with the journal
0x00000200EXT4_FEATURE_RO_COMPAT_BIGALLOCHas big block allocation bitmaps. Block allocation bitmaps are tracked in units of clusters (of blocks) instead of blocks
0x00000400EXT4_FEATURE_RO_COMPAT_METADATA_CSUMFile system metadata has checksums
0x00000800EXT4_FEATURE_RO_COMPAT_REPLICASupports replicas
0x00001000EXT4_FEATURE_RO_COMPAT_READONLYRead-only file system image
0x00002000EXT4_FEATURE_RO_COMPAT_PROJECTFile system tracks project quotas
0x00004000EXT4_FEATURE_RO_COMPAT_SHARED_BLOCKSFile system has (read-only) shared blocks
0x00008000EXT4_FEATURE_RO_COMPAT_VERITYUnknown (Verity inodes may be present on the filesystem)
0x00010000EXT4_FEATURE_RO_COMPAT_ORPHAN_PRESENTOrphan file may be non-empty

Note that EXT2_FEATURE_RO_COMPAT_, EXT3_FEATURE_RO_COMPAT_, EXT4_FEATURE_RO_COMPAT_ and RO_COMPAT_ can be used interchangeably.

Note that in some ext file systems used by ChromeOS, the upper 8 bits of the read-only compatible feature flags have been observed to be set, for example 0xff000003. debugfs identifies these as FEATURE_R24 - FEATURE_R31.

Checksum types

ValueIdentifierDescription
1EXT4_CRC32C_CHKSUMCRC-32C (or CRC32-C), which uses the Castagnoli polynomial (0x1edc6f41)

The group descriptor table

The group descriptor table is stored in the block following the superblock.

The group descriptor table consists of:

  • one or more group descriptors
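The following sketch shows how a group descriptor in the table could be located. It assumes EXT2_FEATURE_INCOMPAT_META_BG is not set, that the block size is already known, and that the superblock starts at byte offset 1024 of the volume. The group descriptor size is taken as 32 bytes unless EXT4_FEATURE_INCOMPAT_64BIT is set, in which case the group descriptor size from the superblock (offset 254) is used. Function names are illustrative.

```python
import struct

EXT4_FEATURE_INCOMPAT_64BIT = 0x00000080


def group_descriptor_size(superblock: bytes) -> int:
    (incompatible_features,) = struct.unpack_from("<I", superblock, 96)
    if incompatible_features & EXT4_FEATURE_INCOMPAT_64BIT:
        (descriptor_size,) = struct.unpack_from("<H", superblock, 254)
        return descriptor_size
    return 32


def group_descriptor_offset(block_size: int, descriptor_size: int, group_number: int) -> int:
    """Volume offset of a group descriptor in the primary table."""
    # The superblock starts at byte offset 1024: that is block 1 for
    # 1024-byte blocks and block 0 for larger block sizes.
    superblock_block_number = 1024 // block_size
    # The group descriptor table is stored in the block following it.
    table_offset = (superblock_block_number + 1) * block_size
    return table_offset + group_number * descriptor_size
```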

The ext2 and ext3 group descriptor

The ext2 and ext3 group descriptor is 32 bytes in size and consists of:

OffsetSizeValueDescription
04Block bitmap block number. The block number is relative to the start of the volume
44Inode bitmap block number. The block number is relative to the start of the volume
84Inode table block number. The block number is relative to the start of the volume
122Number of unallocated blocks
142Number of unallocated inodes
162Number of directories
182Unknown (padding)
203 x 4Unknown (reserved)

Note that it has been observed that implementations that support ext4 can set a value in the padding. It is currently assumed that this value contains block group flags.

The ext4 group descriptor

The ext4 group descriptor is 64 bytes in size and consists of:

OffsetSizeValueDescription
04Block bitmap block number, which contains the lower 32 bits of the value. The block number is relative to the start of the volume
44Inode bitmap block number, which contains the lower 32 bits of the value. The block number is relative to the start of the volume
84Inode table block number, which contains the lower 32 bits of the value. The block number is relative to the start of the volume
122Number of unallocated blocks, which contains the lower 16 bits of the value
142Number of unallocated inodes, which contains the lower 16 bits of the value
162Number of directories, which contains the lower 16 bits of the value
182Block group flags
204Exclude bitmap block number, which contains the lower 32 bits of the value. The block number is relative to the start of the volume
242Block bitmap checksum, which contains the lower 16 bits of the value
262Inode bitmap checksum, which contains the lower 16 bits of the value
282Number of unused inodes, which contains the lower 16 bits of the value
302Checksum
If 64-bit support (EXT4_FEATURE_INCOMPAT_64BIT) is enabled and group descriptor size > 32
324Block bitmap block number, which contains the upper 32 bits of the value. The block number is relative to the start of the volume
364Inode bitmap block number, which contains the upper 32 bits of the value. The block number is relative to the start of the volume
404Inode table block number, which contains the upper 32 bits of the value. The block number is relative to the start of the volume
442Number of unallocated blocks, which contains the upper 16 bits of the value
462Number of unallocated inodes, which contains the upper 16 bits of the value
482Number of directories, which contains the upper 16 bits of the value
502Number of unused inodes, which contains the upper 16 bits of the value
524Exclude bitmap block number, which contains the upper 32 bits of the value. The block number is relative to the start of the volume
562Block bitmap checksum, which contains the upper 16 bits of the value
582Inode bitmap checksum, which contains the upper 16 bits of the value
604Unknown (padding)

If checksum type is CRC-32C, the checksum is stored as the lower 16-bits of 0xffffffff - CRC-32C, otherwise the checksum is stored as a CRC-16.
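The sketch below parses an ext4 group descriptor and combines the lower and upper parts of the values. It assumes `descriptor_size` is 32 unless EXT4_FEATURE_INCOMPAT_64BIT is set, in which case it is the group descriptor size from the superblock. The class and function names are illustrative; the exclude bitmap and bitmap checksums are read but not returned.

```python
import struct
from dataclasses import dataclass


@dataclass
class GroupDescriptor:
    block_bitmap_block_number: int
    inode_bitmap_block_number: int
    inode_table_block_number: int
    number_of_unallocated_blocks: int
    number_of_unallocated_inodes: int
    number_of_directories: int
    block_group_flags: int
    number_of_unused_inodes: int
    checksum: int


def parse_group_descriptor(data: bytes, descriptor_size: int) -> GroupDescriptor:
    # Lower parts (offsets 0 - 31).
    (block_bitmap, inode_bitmap, inode_table, unallocated_blocks,
     unallocated_inodes, directories, flags, _exclude_bitmap,
     _block_bitmap_checksum, _inode_bitmap_checksum, unused_inodes,
     checksum) = struct.unpack_from("<3I4HI4H", data, 0)

    if descriptor_size > 32:
        # Upper parts (offsets 32 - 51), only present with 64-bit support.
        (block_bitmap_upper, inode_bitmap_upper, inode_table_upper,
         unallocated_blocks_upper, unallocated_inodes_upper,
         directories_upper, unused_inodes_upper) = struct.unpack_from("<3I4H", data, 32)
        block_bitmap |= block_bitmap_upper << 32
        inode_bitmap |= inode_bitmap_upper << 32
        inode_table |= inode_table_upper << 32
        unallocated_blocks |= unallocated_blocks_upper << 16
        unallocated_inodes |= unallocated_inodes_upper << 16
        directories |= directories_upper << 16
        unused_inodes |= unused_inodes_upper << 16

    return GroupDescriptor(
        block_bitmap, inode_bitmap, inode_table, unallocated_blocks,
        unallocated_inodes, directories, flags, unused_inodes, checksum)
```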

Checksum calculation

If checksum type is CRC-32C, the CRC32-C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the checksum.

The checksum is calculated over:

  • the 16 byte file system identifier in the superblock
  • the group number as a 32-bit little-endian integer
  • the data of the group descriptor with the checksum set to 0-byte values
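As a sketch of the CRC-32C variant only (the CRC-16 variant is not shown), the checksum could be calculated as follows. It assumes `descriptor` contains the full on-disk group descriptor (32 bytes, or the group descriptor size when 64-bit support is enabled) and that the checksum is at offset 30. Function names are illustrative.

```python
import struct


def crc32c(data: bytes, initial_value: int = 0) -> int:
    """CRC-32C (Castagnoli), reflected polynomial 0x82f63b78."""
    crc = initial_value ^ 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def group_descriptor_checksum(file_system_identifier: bytes, group_number: int, descriptor: bytes) -> int:
    data = bytearray(descriptor)
    # The checksum itself (offset 30) is set to 0-byte values.
    data[30:32] = b"\x00\x00"
    crc = crc32c(file_system_identifier + struct.pack("<I", group_number) + bytes(data))
    # Stored as the lower 16 bits of 0xffffffff - CRC-32C.
    return (0xFFFFFFFF - crc) & 0xFFFF


def verify_group_descriptor_checksum(file_system_identifier: bytes, group_number: int, descriptor: bytes) -> bool:
    (stored_checksum,) = struct.unpack_from("<H", descriptor, 30)
    return stored_checksum == group_descriptor_checksum(file_system_identifier, group_number, descriptor)
```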

TODO: describe the block bitmap checksum calculation: crc32c(s_uuid+grp_num+bbitmap)

TODO: describe the inode bitmap checksum calculation: crc32c(s_uuid+grp_num+ibitmap)

Block group flags

ValueIdentifierDescription
0x0001EXT4_BG_INODE_UNINITThe inode table and bitmap are not initialized
0x0002EXT4_BG_BLOCK_UNINITThe block bitmap is not initialized
0x0004EXT4_BG_INODE_ZEROEDThe inode table is filled with 0

Direct and indirect blocks

Direct blocks are blocks that are part of the data stream of a file entry.

A direct block number of 0 in the data stream represents a sparse data block.

Indirect blocks are blocks that contain block numbers, either of direct blocks or of lower-level indirect blocks. There are multiple levels of indirect blocks:

  • indirect blocks (level 1), that refer to direct blocks
  • double indirect blocks (level 2), that refer to indirect blocks
  • triple indirect blocks (level 3), that refer to double indirect blocks

An indirect block number of 0 in the data stream represents sparse data blocks, that is all data blocks that would be reached through it are sparse.
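To illustrate how the levels relate, the following sketch maps a logical block number onto the inode block pointer to start from and the entry index to use at each level. It assumes 32-bit block numbers (so an indirect block holds block size / 4 entries) and the 12 direct block numbers stored in the inode. A 0 value found at any step means the corresponding data is sparse. The function name is illustrative.

```python
def block_map_path(logical_block: int, block_size: int) -> tuple[str, list[int]]:
    """Return the inode block pointer to start from and the entry index per level."""
    entries_per_block = block_size // 4  # 32-bit block numbers per indirect block

    if logical_block < 12:
        return "direct", [logical_block]
    logical_block -= 12

    if logical_block < entries_per_block:
        return "indirect", [logical_block]
    logical_block -= entries_per_block

    if logical_block < entries_per_block ** 2:
        return "double indirect", list(divmod(logical_block, entries_per_block))
    logical_block -= entries_per_block ** 2

    if logical_block < entries_per_block ** 3:
        upper, remainder = divmod(logical_block, entries_per_block ** 2)
        return "triple indirect", [upper, *divmod(remainder, entries_per_block)]

    raise ValueError("logical block number out of range")
```

For example, with 1024-byte blocks, logical block 268 is the first block reached through the double indirect block number.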

Extents

Extents were introduced in ext4 and are controlled by EXT4_FEATURE_INCOMPAT_EXTENTS.

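Extents form an extent B-tree.

An extents B-tree node consists of:

  • extents header
  • extents entries, which are extent descriptors in a leaf node or extent indexes in a branch node
  • extents footer, which is only present in extent blocks and not in the inode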

Note that inodes can have an implicit last sparse extent if the inode data size is greater than the total data size defined by the extent descriptors.

The ext4 extents header

The ext4 extents header (ext4_extent_header) is 12 bytes in size and consists of:

OffsetSizeValueDescription
02"\x0a\xf3"Signature
22Number of entries
42Maximum number of entries
62Depth, where 0 represents a leaf node and 1 to 5 represent branch nodes at different levels
84Generation, which is used by Lustre, but not by standard ext4
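A minimal sketch of reading and sanity-checking the extents header. In the inode, the extents node is assumed to start at offset 40 (see the ext4 inode); in an extent block it starts at offset 0. The names are illustrative, and the depth check only reflects the range of 0 to 5 described above.

```python
import struct
from typing import NamedTuple


class ExtentsHeader(NamedTuple):
    number_of_entries: int
    maximum_number_of_entries: int
    depth: int
    generation: int


def parse_extents_header(data: bytes) -> ExtentsHeader:
    """Parse the 12-byte extents header at the start of an extents node."""
    if data[0:2] != b"\x0a\xf3":
        raise ValueError("unsupported extents header signature")
    header = ExtentsHeader(*struct.unpack_from("<3HI", data, 2))
    if header.number_of_entries > header.maximum_number_of_entries:
        raise ValueError("number of entries exceeds maximum number of entries")
    if header.depth > 5:
        raise ValueError("unsupported depth")
    return header
```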

The ext4 extent descriptor

The ext4 extent descriptor (ext4_extent) is 12 bytes in size and consists of:

OffsetSizeValueDescription
04Logical block number
42Number of blocks
62Upper 16-bits of physical block number
84Lower 32-bits of physical block number

If the number of blocks is greater than 32768, the extent is considered “uninitialized”, which is (as far as currently known) comparable to the extent being sparse. The number of blocks of the sparse extent can be determined as follows:

sparse_number_of_blocks = number_of_blocks - 32768

Sparse ranges can also exist between extent descriptors. In such a case the logical block number of an extent descriptor does not directly follow the last logical block of the previous extent descriptor.

Note that the native Linux ext implementation expects the extents to be stored in order of logical block number.
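The sketch below parses an extent descriptor and turns a list of leaf extents into contiguous ranges, including sparse gaps between descriptors and the implicit last sparse extent. It assumes the extents are already in order of logical block number and that `number_of_file_blocks` is the inode data size divided by the block size, rounded up. All names are illustrative.

```python
import struct
from typing import NamedTuple, Optional


class Extent(NamedTuple):
    logical_block_number: int
    physical_block_number: int
    number_of_blocks: int
    is_sparse: bool


def parse_extent_descriptor(data: bytes) -> Extent:
    logical_block, number_of_blocks, physical_upper, physical_lower = struct.unpack_from("<IHHI", data, 0)
    # More than 32768 blocks marks an "uninitialized" (sparse) extent.
    is_sparse = number_of_blocks > 32768
    if is_sparse:
        number_of_blocks -= 32768
    return Extent(logical_block, (physical_upper << 32) | physical_lower, number_of_blocks, is_sparse)


def block_ranges(extents: list[Extent], number_of_file_blocks: int) -> list[tuple[int, int, Optional[int]]]:
    """Return (logical block, number of blocks, physical block or None if sparse)."""
    ranges = []
    next_logical_block = 0
    for extent in extents:
        if extent.logical_block_number > next_logical_block:
            # Gap between extent descriptors: sparse.
            ranges.append((next_logical_block, extent.logical_block_number - next_logical_block, None))
        physical_block = None if extent.is_sparse else extent.physical_block_number
        ranges.append((extent.logical_block_number, extent.number_of_blocks, physical_block))
        next_logical_block = extent.logical_block_number + extent.number_of_blocks
    if next_logical_block < number_of_file_blocks:
        # Implicit last sparse extent.
        ranges.append((next_logical_block, number_of_file_blocks - next_logical_block, None))
    return ranges
```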

The ext4 extents index

The ext4 extent index (ext4_extent_idx) is 12 bytes in size and consists of:

OffsetSizeValueDescription
04Logical block number, which contains the first logical block number of the next-depth extents block
44Lower 32-bits of physical block number, which contains the block number of the next-depth extents block
82Upper 16-bits of physical block number, which contains the block number of the next-depth extents block
102Unknown (unused)
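A short sketch of reading an extent index and choosing the child node for a logical block. Note that the order of the physical block number parts differs from the extent descriptor. The selection rule (the last index whose logical block number is less than or equal to the requested one) is an assumption consistent with extents being stored in order of logical block number; the names are illustrative.

```python
import struct
from typing import NamedTuple, Optional


class ExtentIndex(NamedTuple):
    logical_block_number: int
    physical_block_number: int


def parse_extent_index(data: bytes) -> ExtentIndex:
    # Lower 32 bits of the physical block number at offset 4,
    # upper 16 bits at offset 8.
    logical_block, physical_lower, physical_upper = struct.unpack_from("<IIH", data, 0)
    return ExtentIndex(logical_block, (physical_upper << 32) | physical_lower)


def select_index(indexes: list[ExtentIndex], logical_block_number: int) -> Optional[ExtentIndex]:
    """Return the last index whose first logical block is <= the requested one."""
    selected = None
    for index in indexes:  # assumed sorted by logical block number
        if index.logical_block_number > logical_block_number:
            break
        selected = index
    return selected
```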

The ext4 extents footer

The ext4 extents footer (ext4_extent_tail) is 4 bytes in size and consists of:

OffsetSizeValueDescription
04Checksum of an extents block, which contains a CRC-32C

The inode

The size of the inode is defined in the superblock when dynamic inode information is present.

Note that the ext4 inode format can be used on an ext2 formatted file system. This was observed in combination with format revision 1 and an inode size > 128 created by mkfs.ext2.

The ext2 inode

The ext2 inode is 128 bytes in size and consists of:

OffsetSizeValueDescription
02File mode, which contains file type and permissions
22Lower 16-bits of owner (or user) identifier (UID)
44Data size
84(last) access time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
124(last) inode change (or modification) time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
164(last) content modification time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
204Deletion time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
242Lower 16-bits of group identifier (GID)
262Number of (hard) links
284Number of blocks
324Flags
364Unknown (reserved)
4012 x 4Array of direct block numbers. A block number is relative to the start of the volume
884Indirect block number. A block number is relative to the start of the volume
924Double indirect block number. A block number is relative to the start of the volume
964Triple indirect block number. A block number is relative to the start of the volume
1004NFS generation number
1044File ACL (or extended attributes) block number
1084Unknown (Directory ACL)
1124Fragment block address
1161Fragment block index
1171Fragment size
1182Unknown (padding)
1202Upper 16-bits of owner (or user) identifier (UID)
1222Upper 16-bits of group identifier (GID)
1244Unknown (reserved)

Note that for a character and block device the first 2 bytes of the array of direct block numbers contain the minor and major device number respectively.

If the inode size is larger than 128 bytes, the additional data can be stored using an ext4 inode extension.

The ext3 inode

The ext3 inode is 132 bytes in size and consists of:

OffsetSizeValueDescription
02File mode, which contains file type and permissions
22Lower 16-bits of owner (or user) identifier (UID)
44Data size
84(last) access time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
124(last) inode change (or modification) time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
164(last) content modification time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
204Deletion time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
242Lower 16-bits of group identifier (GID)
262Number of (hard) links
284Number of blocks
324Flags
364Unknown (reserved)
4012 x 4Array of direct block numbers. A block number is relative to the start of the volume
884Indirect block number. A block number is relative to the start of the volume
924Double indirect block number. A block number is relative to the start of the volume
964Triple indirect block number. A block number is relative to the start of the volume
1004NFS generation number
1044File ACL (or extended attributes) block number
1084Unknown (Directory ACL)
1124Fragment block address
1161Fragment block index
1171Fragment size
1182Unknown (padding)
1202Upper 16-bits of owner (or user) identifier (UID)
1222Upper 16-bits of group identifier (GID)
1244Unknown (reserved)
Extension (if inode size > 128)
1282Extended inode size
1302Unknown (padding)

Note that for a character and block device the first 2 bytes of the array of direct block numbers contain the minor and major device number respectively.

If the inode size is larger than 128 bytes, the additional data can be stored using an ext4 inode extension.

The ext4 inode

The ext4 inode is 160 bytes in size and consists of:

OffsetSizeValueDescription
02File mode, which contains file type and permissions
22Lower 16-bits of owner (or user) identifier (UID)
44Lower 32-bits of data size
If EXT4_EA_INODE_FL is not set
84(last) access time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
124(last) inode change (or modification) time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
164(last) content modification time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
If EXT4_EA_INODE_FL is set
84Unknown (extended attribute value data checksum)
124Unknown (lower 32-bits of extended attribute reference count)
164Unknown (inode number that owns the extended attribute)
Common
204Deletion time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
242Lower 16-bits of group identifier (GID)
262Number of (hard) links
284Lower 32-bits of number of blocks
324Flags
If EXT4_EA_INODE_FL is not set
364Lower 32-bits of version
If EXT4_EA_INODE_FL is set
364Unknown (upper 32-bits of extended attribute reference count)
If EXT4_EXTENTS_FL and EXT4_INLINE_DATA_FL are not set
4012 x 4Array of direct block numbers. A block number is relative to the start of the volume
884Indirect block number. A block number is relative to the start of the volume
924Double indirect block number. A block number is relative to the start of the volume
964Triple indirect block number. A block number is relative to the start of the volume
If EXT4_EXTENTS_FL is set
4012Extents header
524 x 12extent descriptors or extents indexes
If EXT4_INLINE_DATA_FL is set
4060File content data
Common
1004NFS generation number
1044Lower 32-bits of file ACL (or extended attributes) block number
1084Upper 32-bits of data size
1124Fragment block address
1162Upper 16-bits of number of blocks
1182Upper 16-bits of file ACL (or extended attributes) block number
1202Upper 16-bits of owner (or user) identifier (UID)
1222Upper 16-bits of group identifier (GID)
1242Lower 16-bits of checksum
1262Unknown (reserved)
Extension (if inode size > 128)
1282Extended inode size, which can vary; values of 4, 28 and 32 have been observed
1302Upper 16-bits of checksum
1324(last) inode change (or modification) time extra precision
1364(last) content modification time extra precision
1404(last) access time extra precision
1444Creation time
1484Creation time extra precision
1524Upper 32-bits of version
1564Unknown (i_projid)

If checksum type is CRC-32C, the checksum is stored as 0xffffffff - CRC-32C.

Note that for a character and block device the first 2 bytes of the array of direct block numbers contain the minor and major device number respectively.

Checksum calculation

If checksum type is CRC-32C, the CRC32-C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the checksum.

The checksum is calculated from:

  • the 16 byte file system identifier in the superblock
  • the inode number as a 32-bit little-endian integer
  • the NFS generation number in the inode as a 32-bit little-endian integer
  • the data of the inode with the lower and upper part of the checksum set to 0-byte values.
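The calculation above can be sketched as follows. It assumes `inode_data` is the full on-disk inode (the inode size from the superblock) and that, when the inode is larger than 128 bytes, the extended inode size is large enough to include the upper 16 bits of the checksum (offset 130). Function names are illustrative.

```python
import struct


def crc32c(data: bytes, initial_value: int = 0) -> int:
    """CRC-32C (Castagnoli), reflected polynomial 0x82f63b78."""
    crc = initial_value ^ 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def inode_checksum(file_system_identifier: bytes, inode_number: int, inode_data: bytes) -> int:
    """Return 0xffffffff - CRC-32C for an on-disk inode."""
    data = bytearray(inode_data)
    (nfs_generation,) = struct.unpack_from("<I", data, 100)
    data[124:126] = b"\x00\x00"  # lower 16 bits of the checksum
    if len(data) > 128:
        data[130:132] = b"\x00\x00"  # upper 16 bits of the checksum
    crc = crc32c(file_system_identifier + struct.pack("<II", inode_number, nfs_generation) + bytes(data))
    return 0xFFFFFFFF - crc


def verify_inode_checksum(file_system_identifier: bytes, inode_number: int, inode_data: bytes) -> bool:
    calculated = inode_checksum(file_system_identifier, inode_number, inode_data)
    (lower,) = struct.unpack_from("<H", inode_data, 124)
    if len(inode_data) <= 128:
        # Only the lower 16 bits are stored.
        return (calculated & 0xFFFF) == lower
    (upper,) = struct.unpack_from("<H", inode_data, 130)
    return calculated == (upper << 16) | lower
```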

Extra precision

The ext4 extra precision is 4 bytes in size and consists of:

OffsetSizeValueDescription
0.02 bitsExtra epoch value
0.230 bitsFraction of second in nanoseconds

The 34-bit extra precision timestamp (in number of seconds) can be calculated as follows:

extra_precision_timestamp = (extra_epoch_value * 0x100000000) + timestamp
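A minimal sketch that decodes a timestamp and its extra precision value following the formula above, with the timestamp taken as the unsigned 32-bit value stored in the inode. The function name is illustrative.

```python
def decode_timestamp(timestamp: int, extra_precision: int) -> tuple[int, int]:
    """Return (seconds since the POSIX epoch, nanoseconds)."""
    extra_epoch_value = extra_precision & 0x3  # bits 0 - 1
    nanoseconds = extra_precision >> 2  # bits 2 - 31
    seconds = (extra_epoch_value * 0x100000000) + timestamp
    return seconds, nanoseconds
```

For example, an extra precision value of 1 with a timestamp of 0 corresponds to 0x100000000 seconds and 0 nanoseconds.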

Notes

It has been observed that when EXT4_EA_INODE_FL is set, the (last) modification time can contain a valid timestamp.

According to the Linux kernel documentation:

For backward compatibility with older versions of this feature, the i_mtime/i_generation may store a back-reference to the inode number and i_generation of the one owning inode (in cases where the EA inode is not referenced by multiple inodes) to verify that the EA inode is the correct one being accessed.

File mode

ValueIdentifierDescription
Access other, Bitmask: 0x0007 (S_IRWXO)
0x0001S_IXOTHX-access for other
0x0002S_IWOTHW-access for other
0x0004S_IROTHR-access for other
Access group, Bitmask: 0x0038 (S_IRWXG)
0x0008S_IXGRPX-access for group
0x0010S_IWGRPW-access for group
0x0020S_IRGRPR-access for group
Access owner (or user), Bitmask: 0x01c0 (S_IRWXU)
0x0040S_IXUSRX-access for owner (or user)
0x0080S_IWUSRW-access for owner (or user)
0x0100S_IRUSRR-access for owner (or user)
Other
0x0200S_ISTXTSticky bit
0x0400S_ISGIDSet group identifier (GID) on execution
0x0800S_ISUIDSet owner (or user) identifier (UID) on execution
Type of file, Bitmask: 0xf000 (S_IFMT)
0x1000S_IFIFONamed pipe (FIFO)
0x2000S_IFCHRCharacter device
0x4000S_IFDIRDirectory
0x6000S_IFBLKBlock device
0x8000S_IFREGRegular file
0xa000S_IFLNKSymbolic link
0xc000S_IFSOCKSocket
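As a sketch, the file type can be determined with the S_IFMT bitmask (0xf000). The Python standard library `stat` module uses the same bit values, so `stat.filemode()` can render the file mode in the familiar ls-style form; the dictionary and function names are illustrative.

```python
import stat

FILE_TYPES = {
    0x1000: "named pipe (FIFO)",
    0x2000: "character device",
    0x4000: "directory",
    0x6000: "block device",
    0x8000: "regular file",
    0xA000: "symbolic link",
    0xC000: "socket",
}


def describe_file_mode(file_mode: int) -> tuple[str, str]:
    """Return the file type and an ls-style representation of the file mode."""
    file_type = FILE_TYPES.get(file_mode & 0xF000, "unknown")  # S_IFMT bitmask
    return file_type, stat.filemode(file_mode)
```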

Inode flags

ValueIdentifierDescription
0x00000001EXT2_SECRM_FL, EXT3_SECRM_FL, EXT4_SECRM_FL, EXT4_INODE_SECRMSecure deletion
0x00000002EXT2_UNRM_FL, EXT3_UNRM_FL, EXT4_UNRM_FL, EXT4_INODE_UNRMUndelete
0x00000004EXT2_COMPR_FL, EXT3_COMPR_FL, EXT4_COMPR_FL, EXT4_INODE_COMPRCompressed file, which is not yet implemented
0x00000008EXT2_SYNC_FL, EXT3_SYNC_FL, EXT4_SYNC_FL, EXT4_INODE_SYNCSynchronous updates
0x00000010EXT2_IMMUTABLE_FL, EXT3_IMMUTABLE_FL, EXT4_IMMUTABLE_FL, EXT4_INODE_IMMUTABLEImmutable file
0x00000020EXT2_APPEND_FL, EXT3_APPEND_FL, EXT4_APPEND_FL, EXT4_INODE_APPENDWrites to file may only append
0x00000040EXT2_NODUMP_FL, EXT3_NODUMP_FL, EXT4_NODUMP_FL, EXT4_INODE_NODUMPDo not dump file, which excludes the file from backups made with dump
0x00000080EXT2_NOATIME_FL, EXT3_NOATIME_FL, EXT4_NOATIME_FL, EXT4_INODE_NOATIMEDo not update access time (atime)
0x00000100EXT2_DIRTY_FL, EXT3_DIRTY_FL, EXT4_DIRTY_FL, EXT4_INODE_DIRTYDirty compressed file, which is not yet implemented
0x00000200EXT2_COMPRBLK_FL, EXT3_COMPRBLK_FL, EXT4_COMPRBLK_FL, EXT4_INODE_COMPRBLKOne or more compressed clusters, which is not yet implemented
0x00000400EXT2_NOCOMP_FL, EXT3_NOCOMP_FL, EXT4_NOCOMPR_FL, EXT4_INODE_NOCOMPRDo not compress, which is not yet implemented
ext2 and ext3
0x00000800EXT2_ECOMPR_FL, EXT3_ECOMPR_FLCompression error
ext4
0x00000800EXT4_ENCRYPT_FL, EXT4_INODE_ENCRYPTEncrypted file
Common
0x00001000EXT2_BTREE_FL, EXT2_INDEX_FL, EXT3_INDEX_FL, EXT4_INDEX_FL, EXT4_INODE_INDEXHash-indexed directory (previously referred to as B-tree format)
0x00002000EXT2_IMAGIC_FL, EXT3_IMAGIC_FL, EXT4_IMAGIC_FL, EXT4_INODE_IMAGICAFS directory
0x00004000EXT2_JOURNAL_DATA_FL, EXT3_JOURNAL_DATA_FL, EXT4_JOURNAL_DATA_FL, EXT4_INODE_JOURNAL_DATAFile data must be written using the journal
0x00008000EXT2_NOTAIL_FL, EXT3_NOTAIL_FL, EXT4_NOTAIL_FL, EXT4_INODE_NOTAILFile tail should not be merged, which is not used by ext4
0x00010000EXT2_DIRSYNC_FL, EXT3_DIRSYNC_FL, EXT4_DIRSYNC_FL, EXT4_INODE_DIRSYNCDirectory entries should be written synchronously (dirsync)
0x00020000EXT2_TOPDIR_FL, EXT3_TOPDIR_FL, EXT4_TOPDIR_FL, EXT4_INODE_TOPDIRTop of directory hierarchy
ext4
0x00040000EXT4_HUGE_FILE_FL, EXT4_INODE_HUGE_FILEIs a huge file
0x00080000EXT4_EXTENTS_FL, EXT4_INODE_EXTENTSInode uses extents
0x00100000EXT4_INODE_VERITYVerity protected inode
0x00200000EXT4_EA_INODE_FL, EXT4_INODE_EA_INODEInode used for large extended attribute
0x00400000EXT4_EOFBLOCKS_FL, EXT4_INODE_EOFBLOCKSBlocks allocated beyond EOF
0x01000000EXT4_SNAPFILE_FLInode is a snapshot
0x02000000EXT4_INODE_DAXInode is direct-access (DAX)
0x04000000EXT4_SNAPFILE_DELETED_FLSnapshot is being deleted
0x08000000EXT4_SNAPFILE_SHRUNK_FLSnapshot shrink has completed
0x10000000EXT4_INLINE_DATA_FL, EXT4_INODE_INLINE_DATAInode has inline data
0x20000000EXT4_PROJINHERIT_FL, EXT4_INODE_PROJINHERITCreate sub file entries with the same project identifier
0x40000000EXT4_INODE_CASEFOLDCasefolded directory
0x80000000EXT4_INODE_RESERVEDUnknown (reserved)
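The inode flags determine how parts of the ext4 inode are interpreted. The following sketch selects the interpretation of inode offsets 40 - 99 and of offsets 8 - 19. Checking EXT4_INLINE_DATA_FL before EXT4_EXTENTS_FL is a choice of this sketch, since the ext4 inode description does not cover inodes with both flags set. Function names are illustrative.

```python
EXT4_EXTENTS_FL = 0x00080000
EXT4_EA_INODE_FL = 0x00200000
EXT4_INLINE_DATA_FL = 0x10000000


def data_layout(inode_flags: int) -> str:
    """Select how to interpret inode offsets 40 - 99."""
    if inode_flags & EXT4_INLINE_DATA_FL:
        return "inline data"
    if inode_flags & EXT4_EXTENTS_FL:
        return "extents"
    return "direct and indirect block numbers"


def has_regular_timestamps(inode_flags: int) -> bool:
    """Offsets 8 - 19 contain timestamps only if EXT4_EA_INODE_FL is not set."""
    return not inode_flags & EXT4_EA_INODE_FL
```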

Reserved inode numbers

ValueIdentifierDescription
1EXT2_BAD_INO, EXT3_BAD_INO, EXT4_BAD_INOBad blocks inode
2EXT2_ROOT_INO, EXT3_ROOT_INO, EXT4_ROOT_INORoot inode
3EXT4_USR_QUOTA_INOOwner (or user) quota inode
4EXT4_GRP_QUOTA_INOGroup quota inode
5EXT2_BOOT_LOADER_INO, EXT3_BOOT_LOADER_INO, EXT4_BOOT_LOADER_INOBoot loader inode
6EXT2_UNDEL_DIR_INO, EXT3_UNDEL_DIR_INO, EXT4_UNDEL_DIR_INOUndelete directory inode
7EXT3_RESIZE_INO, EXT4_RESIZE_INOReserved group descriptors inode
8EXT3_JOURNAL_INO, EXT4_JOURNAL_INOJournal inode

Inline data

ext4 supports storing file entry data inline when the inode flag EXT4_INLINE_DATA_FL is set.

Note that inodes can have an implicit last sparse extent if the inode data size is greater than 60 bytes.

Huge files

TODO: complete section

Directory entries

Directory entries are stored in the data blocks of a directory inode. The directory entries can be stored in multiple ways:

  • as linear directory entries
  • as inline data directory entries
  • as hash-tree directory entries

Linear directory entries

Linear directory entries are stored in a series of allocation blocks.

Linear directory entries contain:

  • directory entry for “.” (self)
  • directory entry for “..” (parent)
  • directory entries for other file system entries

The directory entry

The directory entry is of variable size, at most 263 bytes, and consists of:

OffsetSizeValueDescription
04Inode number
42Directory entry size, which must be a multiple of 4
61Name size, which contains the size of the name without the end-of-string character and has a maximum of 255
71File type
8...Name, which contains a narrow character string without end-of-string character

Older directory entry structures considered the name size a 16-bit value, but the upper byte was never used.

The name can contain any character value except the path segment separator (‘/’) and the NUL-character (‘\0’).
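A minimal sketch of reading the linear directory entries in one block. It assumes, as in the Linux implementation, that an entry with inode number 0 is unused, and that the file type byte is only meaningful when EXT2_FEATURE_INCOMPAT_FILETYPE is set. The function name is illustrative.

```python
import struct


def parse_directory_entries(block: bytes) -> list[tuple[int, int, bytes]]:
    """Return (inode number, file type, name) for each used entry in a block."""
    entries = []
    offset = 0
    while offset + 8 <= len(block):
        inode_number, entry_size, name_size, file_type = struct.unpack_from("<IHBB", block, offset)
        if entry_size < 8 or entry_size % 4 != 0 or offset + entry_size > len(block):
            raise ValueError(f"invalid directory entry size at offset {offset}")
        if name_size > entry_size - 8:
            raise ValueError(f"invalid name size at offset {offset}")
        # Assumption: an inode number of 0 marks an unused entry.
        if inode_number != 0:
            name = block[offset + 8:offset + 8 + name_size]
            entries.append((inode_number, file_type, name))
        offset += entry_size
    return entries
```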

File types

ValueIdentifierDescription
0EXT2_FT_UNKNOWNUnknown
1EXT2_FT_REG_FILERegular file
2EXT2_FT_DIRDirectory
3EXT2_FT_CHRDEVCharacter device
4EXT2_FT_BLKDEVBlock device
5EXT2_FT_FIFOFIFO queue
6EXT2_FT_SOCKSocket
7EXT2_FT_SYMLINKSymbolic link

Inline data directory entries

ext4 supports storing the directory entries as inline data when the inode flag EXT4_INLINE_DATA_FL is set.

The inline data directory entries are of variable size, at most 60 bytes, and consist of:

OffsetSizeValueDescription
04Parent inode number
4...Array of directory entries

Hash tree directory entries

The data of the hash tree (HTree) is stored in the data blocks or extents defined by the directory inode. The hash-indexed directory entries are read-compatible with linear directory entries.

Hash tree root

The hash tree root consists of:

  • dx_root
    • directory entry for “.” (self)
    • directory entry for “..” (parent)
    • dx_root_info
    • Array of dx_entry
  • directory entries for other file system entries

dx_root_info

OffsetSizeValueDescription
040Unknown (reserved)
41Hash method (or version)
518Root information size
61Number of indirect levels in the hash tree
71Unknown (unused flags)

dx_entry

TODO: complete section

struct dx_entry
{
        __le32 hash;
        __le32 block;
};

Symbolic links

If the target path of a symbolic link is less than 60 characters long, it is stored in the 60 bytes of the inode that are normally used for the 12 direct and 3 indirect block numbers. If the target path is 60 characters or longer, a block is allocated to store the target path. The inode data size contains the length of the target path.
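A sketch of reading a target path stored in the inode, based on the 60-character rule above. It assumes the file mode already identified the inode as a symbolic link and reads the lower 32 bits of the data size at inode offset 4. The function name is illustrative; reading the target path from a block is not shown.

```python
import struct
from typing import Optional


def fast_symbolic_link_target(inode_data: bytes) -> Optional[bytes]:
    """Return the target stored in the inode, or None if it is stored in a block."""
    (data_size,) = struct.unpack_from("<I", inode_data, 4)
    if data_size < 60:
        # Stored in the 60 bytes normally used for block numbers (offset 40).
        return inode_data[40:40 + data_size]
    return None
```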

Extended attributes

Extended attributes can be stored:

  • in the inode block after the inode data
  • in the block referenced by the file ACL (or extended attributes) block number, if not 0

Note that both should be read to get all the extended attributes.

Extended attributes consist of:

  • An extended attributes header
  • Extended attributes entries with a terminator

The extended attributes inode header

The extended attributes inode header (ext2_xattr_ibody_header, ext3_xattr_ibody_header, ext4_xattr_ibody_header) is 4 bytes in size and consists of:

OffsetSizeValueDescription
04"\x00\x00\x02\xea"Signature

The extended attributes block header

The ext2 and ext3 extended attributes block header

The ext2 and ext3 extended attributes block header (ext2_xattr_header, ext3_xattr_header) is 32 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | "\x00\x00\x02\xea" | Signature
4 | 4 | | Unknown (reference count)
8 | 4 | | Number of blocks
12 | 4 | | Attributes hash
16 | 4 x 4 | | Unknown (reserved)

The ext4 extended attributes block header

The ext4 extended attributes block header (ext4_xattr_header) is 32 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | "\x00\x00\x02\xea" | Signature
4 | 4 | | Unknown (reference count)
8 | 4 | | Number of blocks
12 | 4 | | Attributes hash
16 | 4 | | Checksum
20 | 3 x 4 | | Unknown (reserved)

The extended attributes entry

The extended attributes entry (ext2_xattr_entry, ext3_xattr_entry, ext4_xattr_entry) is of variable size and consists of:

Offset | Size | Value | Description
0 | 1 | | Name size, which contains the size of the name without the end-of-string character
1 | 1 | | Name index
2 | 2 | | Value data offset, which contains the offset of the value data relative to the start of the extended attributes block, or to the end of the extended attributes signature in the inode block data
4 | 4 | | Value data inode number, which contains the inode number that contains the value data or 0 to indicate the current block
8 | 4 | | Value data size
12 | 4 | | Unknown (attribute hash)
16 | ... | | Name string, which contains an ASCII string without end-of-string character and can be empty, for example in combination with a prefix or with an encrypted file
... | ... | | 32-bit alignment padding

The last extended attributes entry has the first 4 values set to 0 (8 bytes) and is used as a terminator.

Note that some ext implementations in older Android versions appear to set only the first 4 bytes to 0 for the terminator.
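
Reading a single entry can be sketched in C as follows. This minimal sketch makes several assumptions:

- `data` starts at the first entry, after the block header or after the signature in the inode.
- Values are little-endian, as elsewhere in ext.
- The structure and function names are hypothetical.

The entry size is derived from the name size plus the 32-bit alignment padding. An entry whose first 4 bytes are 0 is treated as the terminator, so the Android variant noted above is also accepted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure holding the fields of one entry. */
typedef struct {
    uint8_t name_size;
    uint8_t name_index;
    uint16_t value_offset;
    uint32_t value_inode_number;
    uint32_t value_size;
    const uint8_t *name; /* not terminated by an end-of-string character */
} xattr_entry_t;

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t) (p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
           ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Reads the entry at offset. Returns 1 and sets *next_offset if an entry
 * was read, 0 at the terminator, or -1 if data is too short. */
int read_xattr_entry(const uint8_t *data, size_t data_size, size_t offset,
                     xattr_entry_t *entry, size_t *next_offset)
{
    if (offset + 4 > data_size) {
        return -1;
    }
    /* Only the first 4 bytes are checked for the terminator. */
    if (read_le32(&data[offset]) == 0) {
        return 0;
    }
    if (offset + 16 > data_size) {
        return -1;
    }
    entry->name_size = data[offset];
    entry->name_index = data[offset + 1];
    entry->value_offset = read_le16(&data[offset + 2]);
    entry->value_inode_number = read_le32(&data[offset + 4]);
    entry->value_size = read_le32(&data[offset + 8]);

    /* 16 bytes of fixed fields plus the name, padded to 32-bit alignment. */
    size_t entry_size = (16 + (size_t) entry->name_size + 3) & ~(size_t) 3;
    if (offset + entry_size > data_size) {
        return -1;
    }
    entry->name = &data[offset + 16];
    *next_offset = offset + entry_size;
    return 1;
}
```

A caller would call this in a loop until it returns 0. When `value_inode_number` is 0, the value data is read at `value_offset`; otherwise it is read from that inode.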

The extended attribute name index

The name index indicates the prefix of the extended attribute name.

Name index | Name prefix | Description
0 | "" | No prefix
1 | "user." |
2 | "system.posix_acl_access" |
3 | "system.posix_acl_default" |
4 | "trusted." |
6 | "security." |
7 | "system." |
8 | "system.richacl" |
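
The full attribute name is the prefix for the name index followed by the name string. The sketch below is a minimal C version of the table above. Both function names are hypothetical. Indexes that are not listed in the table are rejected rather than guessed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the name prefix of a name index, or NULL if it is not listed. */
const char *xattr_name_prefix(uint8_t name_index)
{
    switch (name_index) {
    case 0: return "";
    case 1: return "user.";
    case 2: return "system.posix_acl_access";
    case 3: return "system.posix_acl_default";
    case 4: return "trusted.";
    case 6: return "security.";
    case 7: return "system.";
    case 8: return "system.richacl";
    default: return NULL;
    }
}

/* Hypothetical helper: writes prefix + name as an end-of-string terminated
 * string. Returns 0 on success or -1 for an unknown index or small buffer. */
int xattr_full_name(uint8_t name_index, const char *name, size_t name_size,
                    char *full_name, size_t full_name_size)
{
    const char *prefix = xattr_name_prefix(name_index);
    if (prefix == NULL) {
        return -1;
    }
    size_t prefix_size = strlen(prefix);
    if (prefix_size + name_size + 1 > full_name_size) {
        return -1;
    }
    memcpy(full_name, prefix, prefix_size);
    memcpy(&full_name[prefix_size], name, name_size);
    full_name[prefix_size + name_size] = '\0';
    return 0;
}
```

With name index 2 and an empty name string, the result is "system.posix_acl_access". This is the prefix-only case mentioned in the entry description.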

Journal

The journal was introduced in ext3.

TODO: complete section

Exclude bitmap

TODO: complete section

Note that the exclude bitmap is used for snapshots.

Corruption scenarios

File entry with invalid extents header signature

File content inaccessible but file entry metadata and extended attributes accessible.


Extensible File Allocation Table (exFAT) file system format

The Extensible File Allocation Table (exFAT) file system format is a successor of the File Allocation Table (FAT) file system format.

Overview

An exFAT file system consists of:

  • One or more reserved sectors
    • a boot record (or boot sector)
  • One or more cluster block allocation tables
  • File and directory data

Characteristics

Characteristics | Description
Byte order | little-endian
Date and time values | FAT date and time
Character strings | UCS-2 little-endian, which allows for unpaired Unicode surrogates such as "U+d800" and "U+dc00"

Boot record

The boot record is stored in the first sector of the volume.

The boot record is at least 512 bytes in size and consists of:

Offset | Size | Value | Description
0 | 3 | "\xeb\x76\x90" | Boot entry point (JMP +120, NOP)
3 | 8 | "EXFAT\x20\x20\x20" | File system signature (or OEM name)
11 | 53 | 0 | Unknown (reserved), which must be 0
64 | 8 | | Partition offset
72 | 8 | | Total number of sectors
80 | 4 | | Cluster block allocation table start sector
84 | 4 | | Cluster block allocation table size in number of sectors, which must be non-zero
88 | 4 | | Data cluster start sector
92 | 4 | | Total number of data clusters
96 | 4 | | Root directory start cluster
100 | 4 | | Volume serial number
104 | 1 | | Format revision minor number
105 | 1 | 1 | Format revision major number
106 | 2 | | Volume flags
108 | 1 | | Bytes per sector, which is stored as 2^n, for example 9 is 2^9 = 512. The bytes per sector value must be 512, 1024, 2048 or 4096
109 | 1 | | Sectors per cluster block, which is stored as 2^n, for example 3 is 2^3 = 8. The sectors per cluster block value must be 1 or more, where the resulting cluster block size (bytes per sector x sectors per cluster block) cannot exceed 32 MiB (2^25 bytes)
110 | 1 | | Number of cluster block allocation tables
111 | 1 | | Drive number
112 | 1 | | Unknown (percent in use), which contains the percentage of allocated cluster blocks in the cluster heap or 0xff if not available
113 | 7 | | Unknown (reserved)
120 | 390 | | Used for boot code
510 | 2 | "\x55\xaa" | Sector signature
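
The boot record can be read with a minimal C sketch like the following. It makes these assumptions:

- The structure and function names are hypothetical.
- The first data cluster is number 2, as in FAT.
- The cluster block size is limited to 32 MiB (2^25 bytes), following Microsoft's exFAT specification.

The shift values at offsets 108 and 109 are turned into byte and sector counts. The sketch then shows how a cluster number maps to a volume offset.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t bytes_per_sector;
    uint32_t sectors_per_cluster_block;
    uint32_t allocation_table_start_sector;
    uint32_t data_cluster_start_sector;
    uint32_t number_of_data_clusters;
    uint32_t root_directory_start_cluster;
} exfat_boot_record_t;

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
           ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Returns 0 on success or -1 if a signature or size value is not valid. */
int parse_exfat_boot_record(const uint8_t sector[512], exfat_boot_record_t *boot_record)
{
    if (memcmp(&sector[3], "EXFAT   ", 8) != 0 ||
        sector[510] != 0x55 || sector[511] != 0xaa) {
        return -1;
    }
    uint8_t bytes_per_sector_shift = sector[108];
    uint8_t sectors_per_cluster_block_shift = sector[109];

    /* 2^9 = 512 up to 2^12 = 4096 bytes per sector. */
    if (bytes_per_sector_shift < 9 || bytes_per_sector_shift > 12) {
        return -1;
    }
    /* Assumption: the cluster block size cannot exceed 2^25 bytes. */
    if (bytes_per_sector_shift + sectors_per_cluster_block_shift > 25) {
        return -1;
    }
    boot_record->bytes_per_sector = UINT32_C(1) << bytes_per_sector_shift;
    boot_record->sectors_per_cluster_block = UINT32_C(1) << sectors_per_cluster_block_shift;
    boot_record->allocation_table_start_sector = read_le32(&sector[80]);
    boot_record->data_cluster_start_sector = read_le32(&sector[88]);
    boot_record->number_of_data_clusters = read_le32(&sector[92]);
    boot_record->root_directory_start_cluster = read_le32(&sector[96]);
    return 0;
}

/* Byte offset of a cluster relative to the start of the volume, assuming
 * the first data cluster is number 2. */
uint64_t exfat_cluster_offset(const exfat_boot_record_t *boot_record, uint32_t cluster_number)
{
    uint64_t sector = (uint64_t) boot_record->data_cluster_start_sector +
                      (uint64_t) (cluster_number - 2) * boot_record->sectors_per_cluster_block;
    return sector * boot_record->bytes_per_sector;
}
```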

Volume flags

Value | Identifier | Description
0x0001 | ActiveFat | Active FAT, 0 for the first FAT, 1 for the second FAT
0x0002 | VolumeDirty | Is dirty
0x0004 | MediaFailure | Has media failures
0x0008 | ClearToZero | Must be cleared
0xfff0 | | Unknown (reserved)

Cluster block allocation table

A cluster block allocation table consists of:

  • One or more cluster block allocation table entries

Cluster block allocation table entry

A cluster block allocation table entry is 32 bits in size and consists of:

Offset | Size | Value | Description
0 | 32 bits | | Data cluster number

Where the data cluster number has the following meanings:

Value(s) | Description
0x00000000 | Unused (free) cluster
0x00000001 | Unknown (invalid)
0x00000002 - 0xffffffef | Used cluster
0xfffffff0 - 0xfffffff6 | Reserved
0xfffffff7 | Bad cluster
0xfffffff8 - 0xffffffff | End of cluster chain
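
Following a cluster chain with these values can be sketched in C as follows. The sketch assumes the 32-bit entries have already been read into host byte order. The function name is hypothetical. The chain is treated as corrupt if it references a free, invalid, reserved, bad or out-of-bounds cluster, or if it is longer than the table.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the number of cluster blocks in the chain that starts at
 * first_cluster, or -1 if the chain is not valid. */
long exfat_cluster_chain_length(const uint32_t *table, size_t number_of_entries,
                                uint32_t first_cluster)
{
    uint32_t cluster = first_cluster;
    long number_of_clusters = 0;

    for (;;) {
        if (cluster < 2 || cluster >= number_of_entries) {
            return -1;
        }
        number_of_clusters++;
        /* A chain longer than the table must contain a loop. */
        if ((size_t) number_of_clusters > number_of_entries) {
            return -1;
        }
        uint32_t value = table[cluster];
        if (value >= 0xfffffff8) {
            return number_of_clusters; /* end of cluster chain */
        }
        if (value < 2 || value >= 0xfffffff0) {
            return -1; /* free, invalid, reserved or bad cluster */
        }
        cluster = value;
    }
}
```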

Directory

A directory consists of:

  • Zero or more directory entries
  • Terminator directory entry

Directory entry

A directory entry is 32 bytes in size and consists of:

Offset | Size | Value | Description
0 | 1 | | Entry type
1 | 19 | | Entry data
20 | 4 | | Data stream start cluster
24 | 8 | | Data stream size

Directory entry type

Offset | Size | Value | Description
0 | 5 bits | | Type code
0.5 | 1 bit | | Is non-critical (also referred to as type importance)
0.6 | 1 bit | | Is secondary entry (also referred to as type category)
0.7 | 1 bit | | In use

Value | Description
0x00 | Terminator directory entry
0x01 - 0x7f | Unused
0x80 | Invalid
0x81 - 0xff | Used

Directory entry type codes

Value | Description
Critical and primary
0x81 | Allocation bitmap
0x82 | Case folding mappings
0x83 | Volume label
0x85 | File entry
Non-critical and primary
0xa0 | Volume identifier
0xa1 | TexFAT padding
Critical and secondary
0xc0 | Data stream
0xc1 | File entry name
Non-critical and secondary
0xe0 | Vendor extension
0xe1 | Vendor allocation

Allocation bitmap directory entry

Offset | Size | Value | Description
0 | 1 | 0x81 | Entry type
1 | 1 | | Bitmap flags
2 | 18 | 0 | Unknown (Reserved)
20 | 4 | | Data stream start cluster
24 | 8 | | Data stream size

Case folding mappings directory entry

Offset | Size | Value | Description
0 | 1 | 0x82 | Entry type
1 | 3 | 0 | Unknown (Reserved)
4 | 4 | | Checksum
8 | 12 | 0 | Unknown (Reserved)
20 | 4 | | Data stream start cluster
24 | 8 | | Data stream size

Volume label directory entry

Offset | Size | Value | Description
0 | 1 | 0x83 | Entry type
1 | 1 | | Name number of characters
2 | 22 | | Name string, which contains a UCS-2 little-endian string without an end-of-string character
24 | 8 | 0 | Unknown (Reserved)

Note that the volume label directory entry should only be stored in the first and/or second directory entry of the root directory.

File entry directory entry

Offset | Size | Value | Description
0 | 1 | 0x85 | Entry type
1 | 1 | | Unknown (Secondary count)
2 | 2 | | Unknown (Set checksum)
4 | 2 | | File attribute flags
6 | 2 | 0 | Unknown (Reserved)
8 | 2 | | Creation time
10 | 2 | | Creation date
12 | 2 | | Last modification time
14 | 2 | | Last modification date
16 | 2 | | Last access time
18 | 2 | | Last access date
20 | 1 | | Creation time fraction of seconds, which contains the fraction of 2 seconds in 10 ms intervals
21 | 1 | | Last modification time fraction of seconds, which contains the fraction of 2 seconds in 10 ms intervals
22 | 1 | | Creation time UTC offset, which contains the signed 7-bit number of 15-minute intervals relative to UTC, where the MSB indicates whether the offset is valid
23 | 1 | | Last modification time UTC offset, which contains the signed 7-bit number of 15-minute intervals relative to UTC, where the MSB indicates whether the offset is valid
24 | 1 | | Last access time UTC offset, which contains the signed 7-bit number of 15-minute intervals relative to UTC, where the MSB indicates whether the offset is valid
25 | 7 | 0 | Unknown (Reserved)

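Decoding one of these time stamps can be sketched in C as follows. The sketch assumes the conventional FAT date and time bit layout, which the characteristics table refers to but which is not spelled out in this section:

- Date: day in bits 0 - 4, month in bits 5 - 8, year since 1980 in bits 9 - 15.
- Time: seconds / 2 in bits 0 - 4, minutes in bits 5 - 10, hours in bits 11 - 15.

It also assumes the UTC offset uses the signed 7-bit encoding described in the table. The structure and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    int year, month, day;
    int hours, minutes, seconds, milliseconds;
    int utc_offset_is_valid;
    int utc_offset_minutes;
} exfat_timestamp_t;

exfat_timestamp_t decode_exfat_timestamp(uint16_t fat_date, uint16_t fat_time,
                                         uint8_t fraction, uint8_t utc_offset)
{
    exfat_timestamp_t ts;

    /* FAT date: day in bits 0-4, month in bits 5-8, year since 1980 in bits 9-15. */
    ts.year = 1980 + (fat_date >> 9);
    ts.month = (fat_date >> 5) & 0x0f;
    ts.day = fat_date & 0x1f;

    /* FAT time: seconds / 2 in bits 0-4, minutes in bits 5-10, hours in bits 11-15. */
    ts.hours = fat_time >> 11;
    ts.minutes = (fat_time >> 5) & 0x3f;

    /* The fraction counts 10 ms intervals (0 - 199) on top of the 2-second value. */
    ts.seconds = (fat_time & 0x1f) * 2 + fraction / 100;
    ts.milliseconds = (fraction % 100) * 10;

    /* MSB: offset is valid; lower 7 bits: signed number of 15-minute intervals. */
    ts.utc_offset_is_valid = (utc_offset & 0x80) != 0;
    int intervals = utc_offset & 0x7f;
    if (intervals & 0x40) {
        intervals -= 0x80;
    }
    ts.utc_offset_minutes = intervals * 15;
    return ts;
}
```

The last access time has no fraction field. A caller would pass 0 for it.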
Volume identifier directory entry

Offset | Size | Value | Description
0 | 1 | 0xa0 | Entry type
1 | 1 | | Unknown (Secondary count)
2 | 2 | | Unknown (Set checksum)
4 | 2 | | Unknown (Flags)
6 | 16 | | Volume identifier, which contains a GUID
22 | 10 | 0 | Unknown (Reserved)

Data stream directory entry

Offset | Size | Value | Description
0 | 1 | 0xc0 | Entry type
1 | 1 | | Unknown (Flags)
2 | 1 | 0 | Unknown (Reserved)
3 | 1 | | Name number of characters
4 | 2 | | Name hash
6 | 2 | 0 | Unknown (Reserved)
8 | 8 | | Data stream valid data size
16 | 4 | 0 | Unknown (Reserved)
20 | 4 | | Data stream start cluster
24 | 8 | | Data stream size

File entry name directory entry

Offset | Size | Value | Description
0 | 1 | 0xc1 | Entry type
1 | 1 | | Unknown (Flags)
2 | 30 | | Name string, which contains a UCS-2 little-endian string without an end-of-string character
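
Each file entry name directory entry holds up to 15 UCS-2 characters (30 bytes). A long name therefore spans several consecutive entries, and its length is taken from the name number of characters in the data stream directory entry. The minimal C sketch below collects the characters. The function name is hypothetical, and it assumes the caller passes the consecutive 32-byte entries that follow the data stream entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies name_size UCS-2 characters from consecutive file entry name
 * directory entries. Returns the number of characters copied or -1. */
int exfat_collect_name(const uint8_t *entries, size_t number_of_entries,
                       uint8_t name_size, uint16_t *name, size_t name_capacity)
{
    size_t wanted = name_size;
    size_t copied = 0;

    if (wanted > name_capacity) {
        return -1;
    }
    for (size_t entry_index = 0; entry_index < number_of_entries && copied < wanted; entry_index++) {
        const uint8_t *entry = &entries[entry_index * 32];
        if (entry[0] != 0xc1) {
            return -1;
        }
        /* The name string starts at offset 2 and holds 15 little-endian characters. */
        for (size_t i = 0; i < 15 && copied < wanted; i++) {
            const uint8_t *character = &entry[2 + i * 2];
            name[copied++] = (uint16_t) (character[0] | (character[1] << 8));
        }
    }
    /* Fail if the entries did not provide all characters. */
    return copied == wanted ? (int) copied : -1;
}
```

Since the characters are UCS-2, unpaired surrogates are copied as-is. Conversion to another encoding is left to the caller.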

File attribute flags

Value | Description
0x0001 | Read-only
0x0002 | Hidden
0x0004 | System
0x0008 | Is volume label
0x0010 | Is directory
0x0020 | Archive
0x0040 | Is device
0x0080 | Unused (reserved)


File Allocation Table (FAT) file system format

The File Allocation Table (FAT) is a widely used file system and has been the default file system for DOS and older versions of Windows.

There are multiple known variants or derivatives of FAT, such as:

  • (original) 8-bit FAT
  • FAT-12
  • FAT-16
  • FAT-32
  • exFAT

Overview

A FAT file system consists of:

  • One or more reserved sectors
    • a boot record (or boot sector)
    • file system information for FAT-32
  • One or more cluster block allocation tables
  • Root directory data for FAT-12 and FAT-16
  • File and directory data

Note that FAT-32 stores the root directory as part of the file and directory data.

Characteristics

Characteristics | Description
Byte order | little-endian
Date and time values | FAT date and time
Character strings | A narrow character Single Byte Character (SBC) ASCII string

Terminology

Term | Description
Hidden sectors | The sectors stored before the FAT volume, such as those used to store a partition table

Determining the FAT format version

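To distinguish between FAT-12, FAT-16 and FAT-32, compute the number of clusters in the data area. All sizes are in sectors. size_of_root_directory is the size of the FAT-12 or FAT-16 root directory (number_of_root_directory_entries * 32 bytes) rounded up to whole sectors, and is 0 for FAT-32:

data_area_size = total_number_of_sectors - (number_of_reserved_sectors + (
    number_of_allocation_tables * allocation_table_size) + size_of_root_directory)
number_of_clusters = floor(data_area_size / sectors_per_cluster)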
  • FAT-12 is used if the number of clusters is less than 4085
  • FAT-16 is used if the number of clusters is less than 65525
  • FAT-32 is used otherwise
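
The determination above can be sketched in C as follows. This minimal sketch assumes:

- The caller already selected the 16-bit or 32-bit boot record values for the total number of sectors and the allocation table size.
- bytes_per_sector and sectors_per_cluster were validated as non-zero.
- The enumeration and function names are hypothetical.

```c
#include <stdint.h>

typedef enum { FAT_12 = 12, FAT_16 = 16, FAT_32 = 32 } fat_format_t;

/* Sizes are in sectors, except bytes_per_sector. */
fat_format_t determine_fat_format(uint32_t total_number_of_sectors,
                                  uint16_t number_of_reserved_sectors,
                                  uint8_t number_of_allocation_tables,
                                  uint32_t allocation_table_size,
                                  uint16_t number_of_root_directory_entries,
                                  uint16_t bytes_per_sector,
                                  uint8_t sectors_per_cluster)
{
    /* Root directory size in sectors, rounded up; 0 for FAT-32. */
    uint32_t size_of_root_directory =
        ((uint32_t) number_of_root_directory_entries * 32 + bytes_per_sector - 1) / bytes_per_sector;
    uint32_t metadata_size = number_of_reserved_sectors +
                             (uint32_t) number_of_allocation_tables * allocation_table_size +
                             size_of_root_directory;
    uint32_t data_area_size = total_number_of_sectors > metadata_size
                                  ? total_number_of_sectors - metadata_size : 0;
    /* Integer division rounds down. */
    uint32_t number_of_clusters = data_area_size / sectors_per_cluster;

    if (number_of_clusters < 4085) {
        return FAT_12;
    }
    if (number_of_clusters < 65525) {
        return FAT_16;
    }
    return FAT_32;
}
```

For a 1.44 MB floppy layout (2880 sectors, 1 reserved sector, 2 tables of 9 sectors, 224 root directory entries, 1 sector per cluster), the data area has 2847 clusters. It is therefore FAT-12.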

Boot record

The boot record is stored in the first sector of the volume.

FAT-12 and FAT-16 boot record

The FAT-12 and FAT-16 boot record is at least 512 bytes in size and consists of:

Offset | Size | Value | Description
0 | 3 | "\xeb\x3c\x90" | Boot entry point (JMP +62, NOP)
3 | 8 | | File system signature (or OEM name)
11 | 2 | | Bytes per sector, which must be 512, 1024, 2048 or 4096
13 | 1 | | Sectors per cluster block, which must be 1, 2, 4, 8, 16, 32, 64 or 128
14 | 2 | | Number of reserved sectors (reserved region), which starts at the first sector of the volume (sector 0) and must be 1 or more (typically 1 or 32)
16 | 1 | | Number of cluster block allocation tables, which must be 1 or more (typically 2)
17 | 2 | | Number of root directory entries
19 | 2 | | Total number of sectors (16-bit)
21 | 1 | | Media descriptor
22 | 2 | | Cluster block allocation table size (16-bit) in number of sectors
24 | 2 | | Number of sectors per track
26 | 2 | | Number of heads
28 | 4 | | Number of hidden sectors
32 | 4 | | Total number of sectors (32-bit)
36 | 1 | | Drive number
37 | 1 | 0 | Unknown (reserved for Windows NT)
38 | 1 | | Extended boot signature
If extended boot signature == 0x29
39 | 4 | | Volume serial number, which can be derived from the system current date and time
43 | 11 | | Volume label, which contains a narrow character string or "NO\x20NAME\x20\x20\x20\x20" if not set
54 | 8 | "FAT12\x20\x20\x20" or "FAT16\x20\x20\x20" | File system hint, which is informational and not required
If extended boot signature != 0x29
39 | 23 | | Unknown
Common
62 | 448 | | Used for boot code
510 | 2 | "\x55\xaa" | Sector signature

Note that the sector signature must be set at offset 510. For sectors larger than 512 bytes, it can in addition be set in the last 2 bytes of the sector.

FAT-32 boot record

The FAT-32 boot record is at least 512 bytes in size and consists of:

Offset | Size | Value | Description
0 | 3 | "\xeb\x58\x90" | Boot entry point (JMP +90, NOP)
3 | 8 | | File system signature (or OEM name)
11 | 2 | | Bytes per sector, which must be 512, 1024, 2048 or 4096
13 | 1 | | Sectors per cluster block, which must be 1, 2, 4, 8, 16, 32, 64 or 128
14 | 2 | | Number of reserved sectors (reserved region), which starts at the first sector of the volume (sector 0) and must be 1 or more (typically 1 or 32)
16 | 1 | | Number of cluster block allocation tables, which must be 1 or more (typically 2)
17 | 2 | 0 | Number of root directory entries, which must be 0 for FAT-32
19 | 2 | 0 | Total number of sectors (16-bit), which must be 0 for FAT-32
21 | 1 | | Media descriptor
22 | 2 | 0 | Cluster block allocation table size (16-bit) in number of sectors, which must be 0 for FAT-32
24 | 2 | | Number of sectors per track
26 | 2 | | Number of heads
28 | 4 | | Number of hidden sectors
32 | 4 | | Total number of sectors (32-bit)
36 | 4 | | Cluster block allocation table size (32-bit) in number of sectors, which must be non-zero for FAT-32
40 | 2 | | Extended flags
42 | 1 | 0 | Format revision minor number
43 | 1 | 0 | Format revision major number
44 | 4 | | Root directory start cluster
48 | 2 | | File system information (FSINFO) sector number
50 | 2 | | Backup boot record sector number
52 | 12 | 0 | Unknown (reserved)
64 | 1 | | Drive number
65 | 1 | 0 | Unknown (reserved for Windows NT)
66 | 1 | | Extended boot signature
If extended boot signature == 0x29
67 | 4 | | Volume serial number, which can be derived from the system current date and time
71 | 11 | | Volume label, which contains a narrow character string or "NO\x20NAME\x20\x20\x20\x20" if not set
82 | 8 | "FAT32\x20\x20\x20" | File system hint, which is informational and not required
If extended boot signature != 0x29
67 | 23 | | Unknown
Common
90 | 420 | | Used for boot code
510 | 2 | "\x55\xaa" | Sector signature

Note that the sector signature must be set at offset 510. For sectors larger than 512 bytes, it can in addition be set in the last 2 bytes of the sector.
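
Reading the fields that the FAT-12, FAT-16 and FAT-32 boot records share can be sketched in C as follows. The structure and function names are hypothetical. The sketch assumes that a 16-bit total number of sectors or allocation table size of 0 means the 32-bit value is used instead, which is the conventional FAT rule and is always the case for FAT-32. It checks only the constraints listed in the tables above.

```c
#include <stdint.h>

typedef struct {
    uint16_t bytes_per_sector;
    uint8_t sectors_per_cluster_block;
    uint16_t number_of_reserved_sectors;
    uint8_t number_of_allocation_tables;
    uint16_t number_of_root_directory_entries;
    uint32_t total_number_of_sectors;
    uint32_t allocation_table_size;
} fat_boot_record_t;

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t) (p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
           ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Returns 0 on success or -1 if a value violates the constraints above. */
int parse_fat_boot_record(const uint8_t sector[512], fat_boot_record_t *boot_record)
{
    if (sector[510] != 0x55 || sector[511] != 0xaa) {
        return -1;
    }
    boot_record->bytes_per_sector = read_le16(&sector[11]);
    boot_record->sectors_per_cluster_block = sector[13];
    boot_record->number_of_reserved_sectors = read_le16(&sector[14]);
    boot_record->number_of_allocation_tables = sector[16];
    boot_record->number_of_root_directory_entries = read_le16(&sector[17]);

    /* Assumption: a 16-bit value of 0 means the 32-bit value is used. */
    uint16_t total_number_of_sectors16 = read_le16(&sector[19]);
    boot_record->total_number_of_sectors =
        total_number_of_sectors16 != 0 ? total_number_of_sectors16 : read_le32(&sector[32]);
    uint16_t allocation_table_size16 = read_le16(&sector[22]);
    boot_record->allocation_table_size =
        allocation_table_size16 != 0 ? allocation_table_size16 : read_le32(&sector[36]);

    uint16_t bytes_per_sector = boot_record->bytes_per_sector;
    uint8_t sectors_per_cluster_block = boot_record->sectors_per_cluster_block;
    if (bytes_per_sector != 512 && bytes_per_sector != 1024 &&
        bytes_per_sector != 2048 && bytes_per_sector != 4096) {
        return -1;
    }
    /* Must be a power of 2; an 8-bit power of 2 is at most 128. */
    if (sectors_per_cluster_block == 0 ||
        (sectors_per_cluster_block & (sectors_per_cluster_block - 1)) != 0) {
        return -1;
    }
    if (boot_record->number_of_reserved_sectors == 0 ||
        boot_record->number_of_allocation_tables == 0) {
        return -1;
    }
    return 0;
}
```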

OEM names

Value | Description
"MSWIN4.1" |
"MSDOS 5.0" |

Media descriptors

Value | Identifier | Description
0xe5 | |
0xed | |
0xee | |
0xef | |
0xf0 | | removable media
0xf4 | |
0xf5 | |
0xf8 | | fixed (non-removable) media
0xf9 | |
0xfa | |
0xfb | |
0xfc | |
0xfd | |
0xfe | |
0xff | |

Cluster block allocation table

A cluster block allocation table consists of:

  • One or more cluster block allocation table entries

FAT-12 cluster block allocation table entry

A FAT-12 cluster block allocation table entry is 12 bits in size and consists of:

Offset | Size | Value | Description
0 | 12 bits | | Data cluster number

Where the data cluster number has the following meanings:

Value(s) | Description
0x000 | Unused (free) cluster
0x001 | Unknown (invalid)
0x002 - 0xfef | Used cluster
0xff0 - 0xff6 | Reserved
0xff7 | Bad cluster
0xff8 - 0xfff | End of cluster chain
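
Because a FAT-12 entry is 12 bits, two consecutive entries share 3 bytes. The minimal C sketch below reads one entry. It assumes the entries are packed back-to-back in little-endian order, which is the conventional FAT-12 layout but is not stated explicitly above. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads the 12-bit entry of a cluster; entry n starts at byte n + (n / 2).
 * Returns the entry value or -1 if the entry lies outside the table. */
int32_t read_fat12_entry(const uint8_t *table, size_t table_size, uint32_t cluster_number)
{
    size_t offset = (size_t) cluster_number + cluster_number / 2;

    if (offset + 1 >= table_size) {
        return -1;
    }
    /* Read the 2 bytes that contain the entry as a little-endian value. */
    uint16_t value = (uint16_t) (table[offset] | (table[offset + 1] << 8));

    /* Even entries use the lower 12 bits, odd entries the upper 12 bits. */
    if (cluster_number % 2 != 0) {
        return value >> 4;
    }
    return value & 0x0fff;
}
```

The returned value can then be interpreted with the table above. For example, 0xff8 - 0xfff ends the chain.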

FAT-16 cluster block allocation table entry

A FAT-16 cluster block allocation table entry is 16 bits in size and consists of:

Offset | Size | Value | Description
0 | 16 bits | | Data cluster number

Where the data cluster number has the following meanings:

Value(s) | Description
0x0000 | Unused (free) cluster
0x0001 | Unknown (invalid)
0x0002 - 0xffef | Used cluster
0xfff0 - 0xfff6 | Reserved
0xfff7 | Bad cluster
0xfff8 - 0xffff | End of cluster chain

FAT-32 cluster block allocation table entry

A FAT-32 cluster block allocation table entry is 32 bits in size and consists of:

Offset | Size | Value | Description
0 | 32 bits | | Data cluster number

Note that only the lower 28 bits are used.

Where the data cluster number has the following meanings:

Value(s) | Description
0x00000000 | Unused (free) cluster
0x00000001 | Unknown (invalid)
0x00000002 - 0x0fffffef | Used cluster
0x0ffffff0 - 0x0ffffff6 | Reserved
0x0ffffff7 | Bad cluster
0x0ffffff8 - 0x0fffffff | End of cluster chain
0x10000000 - 0xffffffff | Unknown
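
Classifying a FAT-32 entry can be sketched in C as follows. As a design choice, this sketch follows the note that only the lower 28 bits are used: it masks off the upper 4 bits instead of treating such values as unknown. The enumeration and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    FAT_ENTRY_FREE,
    FAT_ENTRY_INVALID,
    FAT_ENTRY_USED,
    FAT_ENTRY_RESERVED,
    FAT_ENTRY_BAD,
    FAT_ENTRY_END_OF_CHAIN
} fat_entry_kind_t;

/* Classifies a FAT-32 entry after masking off the upper 4 bits. For a used
 * cluster, *next_cluster receives the next cluster number in the chain. */
fat_entry_kind_t classify_fat32_entry(uint32_t entry_value, uint32_t *next_cluster)
{
    uint32_t value = entry_value & 0x0fffffff;

    *next_cluster = value;
    if (value == 0x00000000) return FAT_ENTRY_FREE;
    if (value == 0x00000001) return FAT_ENTRY_INVALID;
    if (value <= 0x0fffffef) return FAT_ENTRY_USED;
    if (value <= 0x0ffffff6) return FAT_ENTRY_RESERVED;
    if (value == 0x0ffffff7) return FAT_ENTRY_BAD;
    return FAT_ENTRY_END_OF_CHAIN;
}
```

A stricter reader could instead reject values of 0x10000000 or more as unknown, as listed in the table.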

Directory

A directory consists of:

  • self (“.”) directory entry (not used in root directory)
  • parent (“..”) directory entry (not used in root directory)
  • Zero or more directory entries
  • Terminator directory entry

Directory entry

Determining the root directory location

first_allocation_table_offset = number_of_reserved_sectors * bytes_per_sector

FAT-12 and FAT-16 root directory

root_directory_start_offset = first_allocation_table_offset + (
    number_of_allocation_tables * allocation_table_size * bytes_per_sector)
first_cluster_offset = root_directory_start_offset + (number_of_root_directory_entries * 32)

FAT-32 root directory

first_cluster_offset = first_allocation_table_offset + (
    number_of_allocation_tables * allocation_table_size * bytes_per_sector)
root_directory_start_offset = first_cluster_offset + (
    (root_directory_cluster - 2) * number_of_sectors_per_cluster * bytes_per_sector)
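
Both layouts can be computed with the minimal C sketch below. Every term is expressed in bytes relative to the start of the volume. The structure and function names are hypothetical, and the caller is assumed to pass boot record values that were already validated.

```c
#include <stdint.h>

/* Offsets in bytes relative to the start of the volume. */
typedef struct {
    uint64_t root_directory_start_offset;
    uint64_t first_cluster_offset;
} fat_layout_t;

fat_layout_t compute_fat_layout(int is_fat32, uint16_t bytes_per_sector,
                                uint8_t sectors_per_cluster_block,
                                uint16_t number_of_reserved_sectors,
                                uint8_t number_of_allocation_tables,
                                uint32_t allocation_table_size,
                                uint16_t number_of_root_directory_entries,
                                uint32_t root_directory_cluster)
{
    fat_layout_t layout;
    uint64_t first_allocation_table_offset = (uint64_t) number_of_reserved_sectors * bytes_per_sector;
    /* The allocation tables are stored back-to-back after the reserved sectors. */
    uint64_t allocation_tables_end_offset = first_allocation_table_offset +
        (uint64_t) number_of_allocation_tables * allocation_table_size * bytes_per_sector;

    if (!is_fat32) {
        /* FAT-12 and FAT-16: root directory of 32-byte entries, then the clusters. */
        layout.root_directory_start_offset = allocation_tables_end_offset;
        layout.first_cluster_offset = allocation_tables_end_offset +
                                      (uint64_t) number_of_root_directory_entries * 32;
    } else {
        /* FAT-32: the root directory is stored in clusters, numbered from 2. */
        layout.first_cluster_offset = allocation_tables_end_offset;
        layout.root_directory_start_offset = allocation_tables_end_offset +
            (uint64_t) (root_directory_cluster - 2) * sectors_per_cluster_block * bytes_per_sector;
    }
    return layout;
}
```

For a 1.44 MB floppy layout (512 bytes per sector, 1 reserved sector, 2 tables of 9 sectors, 224 root directory entries), this gives a root directory at offset 9728 and a first cluster at offset 16896.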

FAT-12 and FAT-16 directory entry

A FAT-12 and FAT-16 directory entry is 32 bytes in size and consists of:

Offset | Size | Value | Description
0 | 8 | | Name, which is padded with spaces and the first character can have a special meaning
8 | 3 | | Extension, which is padded with spaces
11 | 1 | | File attribute flags
12 | 1 | | Flags
13 | 1 | | Creation time fraction of seconds, which contains fraction of 2-seconds in 10 ms intervals
14 | 2 | | Creation time
16 | 2 | | Creation date
18 | 2 | | Last access date
20 | 2 | | Unknown (OS/2 extended attribute)
22 | 2 | | Last modification time
24 | 2 | | Last modification date
26 | 2 | | Data stream start cluster
28 | 4 | | Data stream data size

FAT-32 directory entry

A FAT-32 directory entry is 32 bytes in size and consists of:

Offset | Size | Value | Description
0 | 8 | | Name, which is padded with spaces and the first character can have a special meaning
8 | 3 | | Extension, which is padded with spaces
11 | 1 | | File attribute flags
12 | 1 | | Flags
13 | 1 | | Creation time fraction of seconds, which contains fraction of 2-seconds in 10 ms intervals
14 | 2 | | Creation time
16 | 2 | | Creation date
18 | 2 | | Last access date
20 | 2 | | Data stream start cluster, which contains the upper 16 bits of the value
22 | 2 | | Last modification time
24 | 2 | | Last modification date
26 | 2 | | Data stream start cluster, which contains the lower 16 bits of the value
28 | 4 | | Data stream data size
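
Combining the two halves of the start cluster can be sketched in C as follows. The sketch treats offset 20 as the upper 16 bits and offset 26 as the lower 16 bits of the data stream start cluster. This follows the definition of these fields (FstClusHI and FstClusLO) in Microsoft's FAT specification and is an assumption as far as this document is concerned. The function names are hypothetical.

```c
#include <stdint.h>

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t) (p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
           ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Returns the data stream start cluster of a 32-byte FAT-32 directory entry. */
uint32_t fat32_entry_start_cluster(const uint8_t entry[32])
{
    return ((uint32_t) read_le16(&entry[20]) << 16) | read_le16(&entry[26]);
}

/* Returns the data stream data size in bytes. */
uint32_t fat_entry_data_size(const uint8_t entry[32])
{
    return read_le32(&entry[28]);
}
```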

Short (or 8.3) file name

A FAT short (or 8.3) file name is stored in an OEM character set (codepage). The first character can have a special meaning.

Valid FAT short file name characters are:

Value | Description
'A-Z' | Upper case character
'0-9' | Numeric character
' ' | Space, where trailing spaces are considered padding and therefore ignored
'.' | Dot, with the exception of "." and "..", where trailing dot characters are ignored
'!' | Exclamation mark
'#' | Hash
'$' | Dollar sign
'%' | Percent sign
'&' | Ampersand
''' | Single quote
'(' | Left parenthesis
')' | Right parenthesis
'-' | Hyphen
'@' | At sign
'^' | Caret
'_' | Underscore
'`' | Grave accent
'{' | Left curly brace
'}' | Right curly brace
'~' | Tilde
0x80 - 0xff | Extended ASCII characters, which are codepage dependent

Note that other characters such as plus sign (‘+’) have been observed in FAT short file names.

First character

Value | Description
0x00 | Last (or terminator) directory entry
0x01 - 0x13 | VFAT long file name directory entry
0x05 | Directory entry pending deallocation (deprecated since DOS 3.0) or substitution of a 0xe5 value
0x41 - 0x54 | Last VFAT long file name directory entry
0xe5 | Unallocated directory entry

File attribute flags

Value | Description
0x01 | Read-only
0x02 | Hidden
0x04 | System
0x08 | Is volume label
0x10 | Is directory
0x20 | Archive
0x40 | Is device
0x80 | Unused (reserved)

Flags

Value | Description
0x01 | Data is EFS encrypted
0x02 | Data contains large EFS header
0x08 | Name should be represented in lower case
0x10 | Extension should be represented in lower case
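
Turning the name and extension fields into a displayable short name can be sketched in C as follows. The sketch drops the space padding, restores a 0x05 first character to 0xe5, and applies the lower case flags. Only 'A' to 'Z' are converted, since the lower case form of the codepage-dependent characters 0x80 - 0xff depends on the codepage. The function name is hypothetical. Entries whose first character is 0x00 or 0xe5 are assumed to have been filtered out by the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Formats the short (8.3) name of a 32-byte directory entry as an
 * end-of-string terminated "NAME.EXT" string of at most 12 characters. */
void format_short_name(const uint8_t entry[32], char name[13])
{
    uint8_t flags = entry[12];
    size_t name_size = 8;
    size_t extension_size = 3;
    size_t name_index = 0;

    /* Trailing spaces are padding. */
    while (name_size > 0 && entry[name_size - 1] == ' ') name_size--;
    while (extension_size > 0 && entry[8 + extension_size - 1] == ' ') extension_size--;

    for (size_t i = 0; i < name_size; i++) {
        uint8_t character = entry[i];
        /* A first character of 0x05 substitutes a 0xe5 value. */
        if (i == 0 && character == 0x05) character = 0xe5;
        if ((flags & 0x08) != 0 && character >= 'A' && character <= 'Z') character += 'a' - 'A';
        name[name_index++] = (char) character;
    }
    if (extension_size > 0) {
        name[name_index++] = '.';
        for (size_t i = 0; i < extension_size; i++) {
            uint8_t character = entry[8 + i];
            if ((flags & 0x10) != 0 && character >= 'A' && character <= 'Z') character += 'a' - 'A';
            name[name_index++] = (char) character;
        }
    }
    name[name_index] = '\0';
}
```

For example, a name field of "README  " with an extension field of "TXT" and flag 0x10 set is formatted as "README.txt".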

VFAT long file name entry

VFAT long file name entries are stored in directory entries. Multiple VFAT long file name entries can be used to store a single long file name, where the entry with the highest (last) sequence number is stored first. A maximum of 20 VFAT long file name entries, of 13 characters each, can be used to store a long file name of 255 UCS-2 characters.

VFAT long file names are stored using UCS-2 little-endian, which allows for unpaired Unicode surrogates such as “U+d800” and “U+dc00”.

VFAT long file name entries are stored before the directory entry containing the short file name and additional file entry information.

A VFAT long file name entry is 32 bytes in size and consists of:

Offset | Size | Value | Description
0 | 1 | | Sequence number
1 | 10 | | First name segment string, which contains 5 UCS-2 string characters
11 | 1 | 0x0f | Unknown (attributes)
12 | 1 | 0x00 | Unknown (type)
13 | 1 | | Checksum of the short (8.3) file name
14 | 12 | | Second name segment string, which contains 6 UCS-2 string characters
26 | 2 | 0 | Unknown (first cluster)
28 | 4 | | Third name segment string, which contains 2 UCS-2 string characters

Note that unused characters in the VFAT long file name segment strings after the end-of-string character (0x0000) are padded with 0xffff.
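
The table does not describe how the checksum of the short file name is computed. The sketch below assumes the rotate-right-and-add algorithm described in Microsoft's FAT specification. It covers the 11 bytes of the name and extension fields as stored, including space padding. The function name is hypothetical.

```c
#include <stdint.h>

/* Computes the checksum of the 11-byte short name (8 name characters
 * followed by 3 extension characters): before each character is added,
 * the 8-bit sum is rotated right by 1 bit. */
uint8_t vfat_short_name_checksum(const uint8_t short_name[11])
{
    uint8_t checksum = 0;

    for (int i = 0; i < 11; i++) {
        checksum = (uint8_t) (((checksum & 1) << 7) + (checksum >> 1) + short_name[i]);
    }
    return checksum;
}
```

A reader can compare the result with the checksum stored in each VFAT long file name entry. A mismatch suggests orphaned long file name entries, for example after the short name entry was reused.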

VFAT long file name sequence number

Offset | Size | Value | Description
0 | 5 bits | | Number
0.5 | 1 bit | 0 | Unknown (reserved)
0.6 | 1 bit | 0 | Unknown (last logical, first physical LFN entry)
0.7 | 1 bit | 0 | Unknown


Hierarchical File System (HFS) format

The Hierarchical File System (HFS) was the default file system for Mac OS after Macintosh File System (MFS) and before Apple File System (APFS).

Note that this document uses Mac OS to refer to the Macintosh Operating System in general, instead of specific versions like Mac OS X or macOS. Mac OS X is used to refer to versions of Mac OS 10.0 or later.

There are multiple known variants or derivatives of HFS, such as:

  • HFS
  • HFS+ 8.10, used by Mac OS 8.1 to 9.2.2
  • HFS+ 10.0, introduced in Mac OS 10.0
  • HFSX, introduced in Mac OS 10.3

Note that HFS can be referred to as “HFS Standard” and HFS+ or HFSX as “HFS Extended”.

HFSX (or HFS/X) is an extension to HFS+ to allow additional features that are incompatible with HFS+. One such feature is case-sensitive file names. A HFSX volume may be either case-sensitive or case-insensitive. Case sensitivity (or lack thereof) applies to all file and directory names on the volume.

Overview

Feature | HFS | HFS+ and HFSX
Maximum file size | 2^31 (2 GiB) | 2^63 (8 EiB)
Maximum file name size | 31 characters | 255 characters
Maximum number of blocks | 2^16 - 1 (65535) | 2^32 - 1 (4294967295)
Character set | narrow character with codepage | Unicode UTF-16 big-endian
Time stamps | In local time | In UTC
Catalog B-tree file node size | 512 bytes | 4096 bytes
File attributes | none | Basic and extended

HFS

A HFS file system consists of:

The backup master directory block (MDB) is stored in the second to last 512-byte sector of the volume, 1024 bytes before the end of the volume.

Characteristics

Characteristics | Description
Byte order | big-endian
Date and time values | HFS timestamp in local time
Character strings | Narrow character (Single Byte Character (SBC) or Multi Byte Character (MBC)) stored using a system defined codepage

HFS+ and HFSX

A HFS+ or HFSX file system consists of:

The backup volume header is stored 1024 bytes before the end of the volume.

Characteristics

Characteristics | Description
Byte order | big-endian
Date and time values | HFS timestamp in UTC
Character strings | UTF-16 big-endian

Terminology

Term | Description
Clump size | Size of the group of (allocation) blocks (or clump), in bytes, to avoid fragmentation

Unicode strings

Unicode strings are stored as UTF-16 big-endian in Normalization Form Canonical Decomposition (NFD) based on Unicode 3.2, with exclusions. Unicode values in the ranges U+2000 - U+2FFF, U+F900 - U+FAFF and U+2F800 - U+2FAFF are not decomposed.

On Mac OS 8.1 through 10.2.x decomposition was based on Unicode 2.1.

TODO: determine what the impact of the different Unicode versions is.

Note that based on observations on Mac OS 10.15.7 on HFS+ the range U+1D000 - U+1D1FF is excluded from decomposition and U+2400 is replaced by U+0.

HFS timestamp

Date and time values are stored as an unsigned 32-bit integer containing the number of seconds since January 1, 1904 at 00:00:00 (midnight), where:

  • MFS and HFS use local time;
  • HFS+ and HFSX use Coordinated Universal Time (UTC).

This document will refer to both forms as HFS timestamp.

The maximum representable date is February 6, 2040 at 06:28:15 UTC.

The HFS timestamp does not account for leap seconds. It includes a leap day in every year that is evenly divisible by 4. This is sufficient because the range of representable dates contains neither 1900 nor 2100, and neither of these is a leap year.
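
Converting an HFS timestamp to a POSIX timestamp only requires subtracting the number of seconds between January 1, 1904 and January 1, 1970. That period spans 66 years with 17 leap days, so 24107 days or 2082844800 seconds. The minimal C sketch below does this. The function name is hypothetical. For MFS and HFS the result is still in local time; the format does not record the time zone, so the caller has to apply an offset.

```c
#include <stdint.h>

/* Seconds from January 1, 1904 to January 1, 1970 (24107 days). */
#define HFS_TO_POSIX_SECONDS INT64_C(2082844800)

/* Converts an HFS timestamp to seconds since January 1, 1970 00:00:00.
 * For HFS+ and HFSX the result is in UTC; for MFS and HFS it is in the
 * (unrecorded) local time of the system that wrote it. */
int64_t hfs_timestamp_to_posix(uint32_t hfs_timestamp)
{
    return (int64_t) hfs_timestamp - HFS_TO_POSIX_SECONDS;
}
```

For example, 0xffffffff converts to 2212122495. This is February 6, 2040 at 06:28:15 UTC, the maximum representable date mentioned above.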

File names

TN1150 states that HFS file names are compared case-insensitively, assuming MacRoman encoding, using the following upper to lower case mapping:

Upper case | Lower case
0x41 - 0x5a (A - Z) | 0x61 - 0x7a (a - z)
0x80 (Ä) | 0x8a (ä)
0x81 (Å) | 0x8c (å)
0x82 (Ç) | 0x8d (ç)
0x83 (É) | 0x8e (é)
0x84 (Ñ) | 0x96 (ñ)
0x85 (Ö) | 0x9a (ö)
0x86 (Ü) | 0x9f (ü)
0xae (Æ) | 0xbe (æ)
0xaf (Ø) | 0xbf (ø)
0xcb (À) | 0x88 (à)
0xcc (Ã) | 0x8b (ã)
0xcd (Õ) | 0x9b (õ)
0xce (Œ) | 0xcf (œ)
0xd9 (Ÿ) | 0xd8 (ÿ)
0xe5 (Â) | 0x89 (â)
0xe6 (Ê) | 0x90 (ê)
0xe7 (Á) | 0x87 (á)
0xe8 (Ë) | 0x91 (ë)
0xe9 (È) | 0x8f (è)
0xea (Í) | 0x92 (í)
0xeb (Î) | 0x94 (î)
0xec (Ï) | 0x95 (ï)
0xed (Ì) | 0x93 (ì)
0xee (Ó) | 0x97 (ó)
0xef (Ô) | 0x99 (ô)
0xf1 (Ò) | 0x98 (ò)
0xf2 (Ú) | 0x9c (ú)
0xf3 (Û) | 0x9e (û)
0xf4 (Ù) | 0x9d (ù)

HFS+ allows for the “/” character in file names. In Finder on Mac OS this is represented as “/”, but in Terminal it is replaced by “:” since “/” is used as the path segment separator. A file name with a “:” created in Terminal will be shown with a “/” in Finder. Finder does not allow the creation of a file containing “:” in the name. A symbolic link created in Terminal to a file with a “:” in its name will not convert the “:” character in the link target data. The Linux HFS+ implementation appears to apply a conversion logic similar to that of Terminal.

B-tree files

HFS, HFS+ and HFSX use multiple B-tree files.

A B-tree file consists of fixed sized nodes:

  • header node
  • map nodes
  • index (root and branch) nodes
  • leaf nodes

Note that only the data fork of a B-tree file is used. The resource fork should be unused.

The size of a B-tree file can be calculated in the following manner:

size = number_of_nodes * node_size

Node size

The node size is determined when the B-tree file is created.

Feature | HFS | HFS+ and HFSX
Node size | 512 bytes | A power of 2 in the range 512 - 32768 bytes

In HFS+ and HFSX the B-tree node size is stored in the header node.

Default node sizes:

Feature | HFS | HFS+ and HFSX
catalog file | 512 bytes | 4 KiB (8 KiB in Mac OS X)
extents overflow file | 512 bytes | 1 KiB (4 KiB in Mac OS X)
attributes file | N/A | 4 KiB

B-tree (file) node

A B-tree file node consists of:

  • node descriptor
  • node records
  • node record offsets

The first node in the file is referenced by node number 0.

The node offset, relative to the start of the file, can be calculated in the following manner:

node_offset = node_number * node_size

B-tree node descriptor

The B-tree node descriptor (BTNodeDescriptor) is 14 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | | Next tree node number (forward link), which contains 0 if empty
4 | 4 | | Previous tree node number (backward link), which contains 0 if empty
8 | 1 | | Node type, which consists of a signed 8-bit integer
9 | 1 | | Node level, which consists of a signed 8-bit integer
10 | 2 | | Number of records
12 | 2 | 0 | Unknown (Reserved), should contain 0

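Leaf nodes have level 1, and an index node has a level one higher than that of its child nodes. The root node therefore has the highest level, which equals the depth of the tree. The header and map nodes have level 0. The maximum depth of the tree is 8.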

B-tree node types

Value | Identifier | Description
-1 | kBTLeafNode | leaf node
0 | kBTIndexNode | index node
1 | kBTHeaderNode | header node
2 | kBTMapNode | map node
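
Reading the node descriptor and locating a node can be sketched in C as follows. HFS data is big-endian, and the node type and level are signed. The structure and function names are hypothetical, and the sketch assumes the caller has read at least the first 14 bytes of the node.

```c
#include <stdint.h>

typedef struct {
    uint32_t next_node_number;
    uint32_t previous_node_number;
    int8_t node_type;
    int8_t node_level;
    uint16_t number_of_records;
} hfs_btree_node_descriptor_t;

static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t) ((p[0] << 8) | p[1]);
}

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Parses the 14-byte node descriptor at the start of a node. */
void parse_hfs_btree_node_descriptor(const uint8_t data[14], hfs_btree_node_descriptor_t *descriptor)
{
    descriptor->next_node_number = read_be32(&data[0]);
    descriptor->previous_node_number = read_be32(&data[4]);
    /* Signed 8-bit values, for example -1 for a leaf node. */
    descriptor->node_type = (int8_t) data[8];
    descriptor->node_level = (int8_t) data[9];
    descriptor->number_of_records = read_be16(&data[10]);
}

/* Offset of a node relative to the start of the B-tree file. */
uint64_t hfs_btree_node_offset(uint32_t node_number, uint16_t node_size)
{
    return (uint64_t) node_number * node_size;
}
```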

B-tree node record

The B-tree node record contains (leaf) data or a reference to an index node and consists of:

  • a key
  • value data

B-tree record offsets

The B-tree record offsets are an array of 16-bit integers, relative to the start of the B-tree node descriptor. The first record offset is found at node size - 2, e.g. 512 - 2 = 510, the second 2 bytes before that, e.g. 508, etc.

An additional record offset is added at the end to signify the start of the free space.

Note that the record offsets are not necessarily stored in linear order.
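
Looking up a record offset can be sketched in C as follows. It assumes the caller has read the whole node and taken the number of records from its node descriptor. The function name is hypothetical. Passing the number of records as the index returns the additional offset that marks the start of the free space.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t) ((p[0] << 8) | p[1]);
}

/* Returns the offset of a record relative to the start of the node, or -1
 * if record_index is out of range. */
int32_t hfs_btree_record_offset(const uint8_t *node, uint16_t node_size,
                                uint16_t number_of_records, uint16_t record_index)
{
    if (record_index > number_of_records) {
        return -1;
    }
    size_t offsets_size = 2 * ((size_t) record_index + 1);
    /* The offsets array cannot overlap the 14-byte node descriptor. */
    if ((size_t) node_size < offsets_size + 14) {
        return -1;
    }
    /* The first offset is stored at node_size - 2, the next 2 bytes before it. */
    return read_be16(&node[(size_t) node_size - offsets_size]);
}
```

Since the offsets are not necessarily stored in linear order, a reader should not assume that record n + 1 starts right after record n.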

B-tree header node

The B-tree header node is stored in the first node of the B-tree file and contains 3 records:

  • the B-tree header record;
  • the user data record, which consists of 128 bytes (reserved within HFS);
  • the B-tree map record.

Note that the records in the B-tree header node do not have keys.

B-tree header record

The B-tree header record (BTHeaderRec) is 106 bytes in size and consists of:

Offset | Size | Value | Description
0 | 2 | | Depth of the tree
2 | 4 | | Root node number
6 | 4 | | Number of data records contained in leaf nodes
10 | 4 | | First leaf node number
14 | 4 | | Last leaf node number
18 | 2 | | Node size, in bytes, where the value must be a power of 2 in the range 512 - 32768
20 | 2 | | Maximum key size, in bytes
22 | 4 | | Number of nodes
26 | 4 | | Number of unused nodes
HFS
30 | 76 | | Unknown (Reserved)
HFS+/HFSX
30 | 2 | | Unknown (Reserved)
32 | 4 | | Clump size, in bytes
36 | 1 | | B-tree file type
37 | 1 | | Key comparison method
38 | 4 | | Flags (or attributes)
42 | 16 x 4 = 64 | | Unknown (Reserved)

TODO: does the number of data records equal the number of leaf nodes?

File type

Value | Identifier | Description
0x00 | | Control file
0x80 | | First user B-tree type
0xff | | Reserved B-tree type

Key comparison method

Value | Identifier | Description
0x00 | | Unknown (not set), observed on HFS standard, HFS+ and an empty HFSX file system
0xbc | | Binary compare (case-sensitive)
0xcf | | Unicode case folding (case-insensitive)

Flags

Value | Identifier | Description
0x00000001 | kBTBadCloseMask | Bad close, which indicates that the B-tree was not closed properly and should be checked for consistency (not used by HFS+ and HFSX)
0x00000002 | kBTBigKeysMask | Big keys, which indicates the key data size value of the keys in index and leaf nodes is a 16-bit integer, otherwise it is an 8-bit integer (must be set for HFS+ and HFSX)
0x00000004 | kBTVariableIndexKeysMask | Variable-size (index) keys, which indicates that the keys in index nodes occupy the number of bytes indicated by their key size; otherwise, the keys in index nodes always occupy maximum key size (must be set for the HFS+ and HFSX Catalog B-tree, and cleared for the HFS+ and HFSX Extents overflow B-tree)

B-tree map record

The B-tree map record consists of a bitmap that indicates which nodes in the B-tree file are used and which are not. If a bit is set, the corresponding node in the B-tree file is in use.

In a 512-byte header node the bitmap is 256 bytes in size and can represent a maximum of 2048 nodes. If more nodes are needed, a map node is used to store additional mappings.
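
Checking whether a node is in use can be sketched in C as follows. The sketch assumes the most significant bit of the first byte corresponds to the first node the bitmap covers. The function name is hypothetical. For bitmaps in map nodes, the caller is assumed to first subtract the number of nodes covered by the earlier bitmaps, for example 2048 for the first map node of a B-tree with 512-byte nodes.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the node is marked as in use, 0 if not, or -1 if the
 * bitmap does not cover the node. */
int hfs_btree_node_is_used(const uint8_t *bitmap, size_t bitmap_size, uint32_t node_number)
{
    size_t byte_index = node_number / 8;

    if (byte_index >= bitmap_size) {
        return -1;
    }
    /* Bits are numbered from the most significant bit of each byte. */
    return (bitmap[byte_index] >> (7 - (node_number % 8))) & 1;
}
```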

The map node

If a B-tree file contains more than 2048 nodes, which are enough for about 8000 files, a map node is used to store additional node-mapping information.

The next tree node value in the B-tree node descriptor of the header node is used to refer to the first map node.

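A map node consists of a B-tree node descriptor and one B-tree map record. In a 512-byte node the map record is 512 - (14 + 2 x 2) = 494 bytes in size. This accounts for the node descriptor and 2 record offsets: one for the map record and one for the start of the free space. The map record can therefore contain mapping information for 494 x 8 = 3952 nodes.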

If a B-tree contains more than 6000 nodes (enough for about 25000 files) a second map node is needed. The next tree node value in the B-tree node descriptor of the first map node is used to refer to the second.

If more map nodes are required, each additional map node is similarly linked to the previous one.

The root node

The root node is the start of the B-tree structure; usually the root node is an index node, but it might be a leaf node if there are no index nodes.

The root node number is stored in the B-tree header record and is 0 if the B-tree is empty.

The index node

The records stored in an index node are called pointer records. A pointer record consists of a key followed by the node number of the corresponding node. The size of the key varies according to the type of B-tree file.

  • In a catalog file, the search key is a combination of the file or directory name and the parent identifier of that file or directory.
  • In an extents overflow file, the search key is a combination of that file’s type, its file identifier and the index of the first block in the extent.

The immediate descendants of an index node are called the children of the index node. An index node can have from 1 to 15 children, depending on the size of the pointer records that the index node contains.

The leaf node

The leaf nodes contain data records. The structure of the leaf node data records varies according to the type of B-tree.

  • In an extents overflow file, the leaf node data records consist of a key and an extent record.
  • In a catalog file, the leaf node data records can be any one of four kinds of records.

HFS Master Directory Block (MDB)

The primary Master Directory Block (MDB) (or volume information block (VIB)) is located at offset 1024 of the volume.

The MDB is 162 bytes in size and consists of:

Offset | Size | Value | Description
0 | 2 | "BD" (or "\x42\x44") | Volume signature
2 | 4 | | Creation time, which contains a HFS timestamp in local time
6 | 4 | | (last) modification time, which contains a HFS timestamp in local time
10 | 2 | | Volume attribute flags
12 | 2 | | Number of files in the root directory
14 | 2 | | Volume bitmap block number, which contains a block number relative to the start of the volume, where 0 is the first block number, typically 3
16 | 2 | | Next allocation search block number
18 | 2 | | Number of blocks, where a volume can contain at most 65535 blocks
20 | 4 | | Block size, in bytes, which must be a multiple of 512
24 | 4 | | Clump size, in bytes
28 | 2 | | Data area block number, which contains a block number relative to the start of the volume, where 0 is the first block number
30 | 4 | | Next available catalog node identifier (CNID), which can be a directory or file record identifier
34 | 2 | | Number of unused blocks
36 | 1 | | Volume label size, with a maximum of 27
37 | 27 | | Volume label
64 | 4 | | (last) backup time, which contains a HFS timestamp in local time
68 | 2 | | Backup sequence number
70 | 4 | | Volume write count, which contains the number of times the volume has been written to
74 | 4 | | Extents overflow file clump size, in bytes
78 | 4 | | Catalog file clump size, in bytes
82 | 2 | | Number of sub directories in the root directory
84 | 4 | | Total number of files, which does not include file system metadata files
88 | 4 | | Total number of directories (folders), which does not include the root folder
92 | 32 | | Finder information
124 | 2 | | Embedded volume signature (drVCSize)
126 | 4 | | Embedded volume extent descriptor (drVBMCSize and drCtlCSize)
130 | 4 | | Extents overflow file size
134 | 12 | | Extents overflow file extents record
146 | 4 | | Catalog file size
150 | 12 | | Catalog file extents record

Note that the volume modification time is not necessarily the date and time when the volume was last flushed.
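The MDB fields can be read with plain big-endian loads. The following C sketch assumes the 162 bytes at volume offset 1024 are already in memory and extracts a few of them; the struct and function names are hypothetical, and the volume label is copied as raw bytes without any character set conversion.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HFS stores multi-byte integers in big-endian byte order. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical subset of the MDB fields. */
typedef struct {
    uint16_t number_of_blocks;
    uint32_t block_size;
    uint16_t data_area_block_number;
    char volume_label[28]; /* up to 27 bytes plus a terminating NUL */
} hfs_mdb_info;

/* Parses the 162-byte MDB. Returns 0 on success or -1 if the signature
 * is not "BD" or the block size is not a non-zero multiple of 512. */
int hfs_parse_mdb(const uint8_t mdb[162], hfs_mdb_info *info)
{
    size_t label_size;

    if (mdb[0] != 'B' || mdb[1] != 'D')
        return -1;
    info->number_of_blocks = read_be16(mdb + 18);
    info->block_size = read_be32(mdb + 20);
    if (info->block_size == 0 || info->block_size % 512 != 0)
        return -1;
    info->data_area_block_number = read_be16(mdb + 28);

    /* Clamp the label size to the documented maximum of 27. */
    label_size = mdb[36] > 27 ? 27 : mdb[36];
    memcpy(info->volume_label, mdb + 37, label_size);
    info->volume_label[label_size] = '\0';
    return 0;
}
```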

Notes

TODO: check

  • drVCSize => Volume cache block size (16-bit)
  • drVBMCSize => Volume bitmap cache block size (16-bit)
  • drCtlCSize => Common volume cache block size (16-bit)

HFS Volume Bitmap

The volume bitmap is used to keep track of block allocation. The bitmap contains one bit for each block in the volume.

  • If a bit is set, the corresponding block is currently in use by some file.
  • If a bit is clear, the corresponding block is not currently in use by any file and is available.

The volume bitmap does not indicate which files occupy which blocks. The actual file-mapping information is maintained in two locations:

  • in the corresponding catalog entry;
  • in the corresponding extents overflow file entry.

The size of the volume bitmap depends on the number of blocks in the volume.

An 800 KiB floppy disk with a block size of 512 bytes contains 1600 blocks and has a volume bitmap size of:

((800 * 1024) / 512) = 1600 bits ((800 * 1024) / (512 * 8) = 200 bytes).

A 32 MiB volume with a block size of 512 bytes has a volume bitmap size of:

((32 * 1024 * 1024) / 512) = 65536 bits ((32 * 1024 * 1024) / (512 * 8) = 8192 bytes).

The number of blocks in the MDB is stored as a 16-bit integer, so no more than 65535 blocks can be addressed. Therefore the volume bitmap is never larger than 8192 bytes (or 16 blocks of 512 bytes). For volumes containing more than 32 MiB of space, the block size must be increased.

A volume containing 40 MiB of space must have a block size of at least 2 x 512 bytes.

A volume containing 80 MiB of space must have a block size of at least 3 x 512 bytes.
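The sizing rules above reduce to two small calculations. This C sketch (function names are hypothetical) computes the bitmap size as one bit per whole block, rounded up to a byte, and searches for the smallest multiple of 512 that keeps the block count within the 16-bit limit.

```c
#include <stdint.h>

/* Size in bytes of an HFS volume bitmap: one bit per whole block,
 * rounded up to a full byte. */
uint32_t hfs_volume_bitmap_size(uint64_t volume_size, uint32_t block_size)
{
    uint64_t number_of_blocks = volume_size / block_size;

    return (uint32_t)((number_of_blocks + 7) / 8);
}

/* Smallest block size, a multiple of 512, for which the volume
 * needs at most 65535 blocks. */
uint32_t hfs_minimum_block_size(uint64_t volume_size)
{
    uint32_t block_size = 512;

    while (volume_size / block_size > 65535)
        block_size += 512;

    return block_size;
}
```

With these definitions a 40 MiB volume yields 1024 and an 80 MiB volume yields 1536, matching the examples above.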

HFS+ and HFSX Volume Header

The volume header (HFSPlusVolumeHeader) replaces the master directory block (MDB). The volume header starts at offset 1024 of the volume.

The blocks containing the first 1536 bytes (reserved space plus volume header) are marked as used in the allocation file.

The volume header is 512 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 2 | "H+" (or "\x48\x2b") or "HX" (or "\x48\x58") | Volume signature, where "H+" (kHFSPlusSigWord) is used for HFS+ and "HX" (kHFSXSigWord) for HFSX |
| 2 | 2 | | Format version, where 4 (kHFSPlusVersion) is used for HFS+ and 5 (kHFSXVersion) for HFSX |
| 4 | 4 | | Volume attribute flags |
| 8 | 4 | | Last mounted version |
| 12 | 4 | | Journal information block number, contains a block number relative from the start of the volume |
| 16 | 4 | | Creation time, which contains a HFS timestamp in local time (not in UTC) |
| 20 | 4 | | (last) content modification time, which contains a HFS timestamp in UTC |
| 24 | 4 | | (last) backup time, which contains a HFS timestamp in UTC |
| 28 | 4 | | Checked time, which contains a HFS timestamp in UTC |
| 32 | 4 | | Total number of files, which does not include file system metadata files |
| 36 | 4 | | Total number of directories (folders), which does not include the root folder |
| 40 | 4 | | Block size, in bytes |
| 44 | 4 | | Total number of blocks |
| 48 | 4 | | Number of unused blocks |
| 52 | 4 | | Next allocation search block number (nextAllocation) |
| 56 | 4 | | Clump size, in bytes, of a resource fork |
| 60 | 4 | | Clump size, in bytes, of a data fork |
| 64 | 4 | | Next available catalog node identifier (CNID), which can be a directory or file record identifier |
| 68 | 4 | | Volume write count, which contains the number of times the volume has been written to |
| 72 | 8 | | Encodings bitmap |
| 80 | 32 | | Finder information |
| 112 | 80 | | Allocation file fork descriptor |
| 192 | 80 | | Extents overflow file fork descriptor |
| 272 | 80 | | Catalog file fork descriptor |
| 352 | 80 | | Attributes file fork descriptor |
| 432 | 80 | | Startup file fork descriptor |
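A minimal C sketch of reading the fixed part of the volume header follows, assuming the 512 bytes at volume offset 1024 are already in memory. Only the signature, format version, attribute flags and block fields are decoded; the names are hypothetical and the format version is stored but not validated.

```c
#include <stdint.h>

/* HFS+ stores multi-byte integers in big-endian byte order. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical subset of the volume header fields. */
typedef struct {
    int is_hfsx;             /* 0 for "H+", 1 for "HX" */
    uint16_t format_version; /* 4 for HFS+, 5 for HFSX */
    uint32_t attribute_flags;
    uint32_t block_size;
    uint32_t number_of_blocks;
    uint32_t number_of_unused_blocks;
} hfsplus_volume_header_info;

/* Returns 0 on success or -1 if the signature is neither "H+" nor "HX". */
int hfsplus_parse_volume_header(const uint8_t header[512],
                                hfsplus_volume_header_info *info)
{
    uint16_t signature = read_be16(header);

    if (signature == 0x482b)        /* "H+" */
        info->is_hfsx = 0;
    else if (signature == 0x4858)   /* "HX" */
        info->is_hfsx = 1;
    else
        return -1;

    info->format_version = read_be16(header + 2);
    info->attribute_flags = read_be32(header + 4);
    info->block_size = read_be32(header + 40);
    info->number_of_blocks = read_be32(header + 44);
    info->number_of_unused_blocks = read_be32(header + 48);
    return 0;
}
```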

Total number of blocks

For a disk whose size is an even multiple of the block size, all areas on the disk are included in a block, including the volume header and backup volume header. For a disk whose size is not an even multiple of the block size, only the blocks that fit entirely on the disk are counted here. The remaining space at the end of the disk is not used by the volume format, except for storing the backup volume header, which is located 1024 bytes before the end of the disk.
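The block count rule above is plain integer division. In this C sketch the helper names are hypothetical; the backup volume header location, 1024 bytes before the end of the disk, follows TN1150.

```c
#include <stdint.h>

/* Number of blocks counted in the volume header: only whole blocks count. */
uint64_t hfsplus_total_number_of_blocks(uint64_t disk_size, uint32_t block_size)
{
    return disk_size / block_size;
}

/* Bytes at the end of the disk that are not covered by a whole block. */
uint64_t hfsplus_unused_tail_size(uint64_t disk_size, uint32_t block_size)
{
    return disk_size % block_size;
}

/* Offset of the backup volume header, 1024 bytes before the end of the
 * disk; the disk is assumed to be at least 1536 bytes in size. */
uint64_t hfsplus_backup_volume_header_offset(uint64_t disk_size)
{
    return disk_size - 1024;
}
```

For a 10 MiB disk plus 512 bytes and 4096-byte blocks, 2560 blocks are counted and the trailing 512 bytes fall outside any block.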

Volume attribute flags

The volume attribute flags are specified as follows.

| Value | Identifier | Description |
| --- | --- | --- |
| 0x00000080 | kHFSVolumeHardwareLockBit | Volume hardware lock, set if the volume is write-protected due to a hardware setting |
| 0x00000100 | kHFSVolumeUnmountedBit | Volume unmounted, set if the volume was correctly flushed before being unmounted or ejected |
| 0x00000200 | kHFSVolumeSparedBlocksBit | Volume spared blocks, set if there are any records in the extents overflow file for bad blocks |
| 0x00000400 | kHFSVolumeNoCacheRequiredBit | Volume no cache required, set if the blocks from this volume should not be cached |
| 0x00000800 | kHFSBootVolumeInconsistentBit | Boot volume inconsistent, set if the volume was mounted for writing |
| 0x00001000 | kHFSCatalogNodeIDsReusedBit | Catalog node identifiers reused, set when the next catalog identifier value overflows 32 bits, forcing smaller catalog node identifiers to be reused |
| 0x00002000 | kHFSVolumeJournaledBit | Journaled, set if the file system uses a journal |
| 0x00004000 | kHFSVolumeInconsistentBit | Unknown (Reserved) |
| 0x00008000 | kHFSVolumeSoftwareLockBit | Volume software lock, set if the volume is write-protected due to a software setting |
| 0x40000000 | kHFSContentProtectionBit | Unknown (Reserved) |
| 0x80000000 | kHFSUnusedNodeFixBit | Unknown (Reserved) |
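Testing the flags is a matter of masking the 32-bit value. The macro and function names in this C sketch are hypothetical; only the values come from the table above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values from the volume attribute flags table (names are illustrative). */
#define HFS_VOLUME_HARDWARE_LOCK 0x00000080u
#define HFS_VOLUME_UNMOUNTED     0x00000100u
#define HFS_VOLUME_JOURNALED     0x00002000u
#define HFS_VOLUME_SOFTWARE_LOCK 0x00008000u

/* True if the volume was flushed before it was unmounted or ejected. */
bool hfs_volume_was_unmounted_cleanly(uint32_t flags)
{
    return (flags & HFS_VOLUME_UNMOUNTED) != 0;
}

/* True if either the hardware or the software lock flag is set. */
bool hfs_volume_is_write_protected(uint32_t flags)
{
    return (flags & (HFS_VOLUME_HARDWARE_LOCK | HFS_VOLUME_SOFTWARE_LOCK)) != 0;
}

/* True if the file system uses a journal. */
bool hfs_volume_is_journaled(uint32_t flags)
{
    return (flags & HFS_VOLUME_JOURNALED) != 0;
}
```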

Last mounted version

| Value | Identifier | Description |
| --- | --- | --- |
| "8.10" | | used by Mac OS 8.1 to 9.2.2 |
| "10.0" | kHFSPlusMountVersion | used by Mac OS X |
| "FSK!" or "fsck" | | used by fsck_hfs on Mac OS X |
| "HFSJ" | kHFSJMountVersion | used by journaled HFS+ or HFSX |

TODO: add text about HFS standard

HFS+ supports both hard links and symbolic links.

Hard links to directories are not supported (or allowed).

Hard links in HFS+/HFSX are represented by two types of file records:

  • one indirect node file record, named “iNode#”, where # is the link reference. This file contains the content of the file shared by the hard links.
  • one or more hard link file records, that reference the indirect node file record.

Indirect node files are stored in a file system metadata directory referred to as the metadata directory with the name “/\u{2400}\u{2400}\u{2400}\u{2400}HFS+ Private Data”.

The link reference corresponds to the catalog node identifier (CNID) of the indirect node file, where 0 is not a valid link reference.

Note that TN1150 states that a new link reference is randomly chosen from the range 100 to 1073741923. However, link references that fall outside of this range have been observed, such as “iNode20”.

The special permission data of the hard link file records contains the link reference if:

  • the catalog file record flag kHFSHasLinkChainMask is set;
  • and the first 8 bytes of the file information contains “hlnkhfs+”

| Value | Identifier | Description |
| --- | --- | --- |
| "hlnk" | kHardLinkFileType | Hard link file type |
| "hfs+" | kHFSPlusCreator | Hard link file creator |

The hard link file’s creation date should be set to the creation date of the metadata directory, but the creation date may also be set to the creation date of the volume’s root directory though this is deprecated.
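The hard link conditions described above can be checked as follows. This C sketch assumes the record flags, the 16-byte file information and the special permission data have already been decoded from an HFS+ catalog file record; the function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* kHFSHasLinkChainMask from the catalog file record flags table. */
#define HFS_HAS_LINK_CHAIN_MASK 0x0020u

/* Returns true and stores the link reference if the decoded catalog file
 * record fields describe a hard link file record. */
bool hfsplus_get_link_reference(uint16_t record_flags,
                                const uint8_t file_information[16],
                                uint32_t special_permission_data,
                                uint32_t *link_reference)
{
    if ((record_flags & HFS_HAS_LINK_CHAIN_MASK) == 0)
        return false;
    /* File type "hlnk" followed by creator "hfs+". */
    if (memcmp(file_information, "hlnkhfs+", 8) != 0)
        return false;
    /* 0 is not a valid link reference. */
    if (special_permission_data == 0)
        return false;
    *link_reference = special_permission_data;
    return true;
}

/* Formats the name of the indirect node file, for example "iNode20". */
void hfsplus_indirect_node_name(uint32_t link_reference, char *name,
                                size_t name_size)
{
    snprintf(name, name_size, "iNode%lu", (unsigned long)link_reference);
}
```

The resulting name would then be looked up in the metadata directory to reach the indirect node file that holds the shared content.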

Device identifier

The special permission data contains the device identifier. The device identifier can be stored in different formats, such as: “native”, “386bsd”, “4bsd”, “bsdos”, “freebsd”, “hpux”, “isc”, “linux”, “netbsd”, “osf1”, “sco”, “solaris”, “sunos”, “svr3”, “svr4” and “ultrix”.

The “native” and “hpux” device identifier is 4 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 1 | | Major device number |
| 1 | 2 | 0 | Unknown |
| 3 | 1 | | Minor device number |

The “386bsd”, “4bsd”, “freebsd”, “isc”, “linux”, “netbsd”, “sco”, “sunos”, “svr3” and “ultrix” device identifier is 4 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 2 | 0 | Unknown |
| 2 | 1 | | Major device number |
| 3 | 1 | | Minor device number |

The “solaris” and “svr4” device identifier is 4 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0.0 | 18 bits | | Minor device number |
| 2.2 | 14 bits | | Major device number |

The “bsdos” and “osf1” device identifier is 4 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0.0 | 20 bits | | Minor device number |
| 2.4 | 12 bits | | Major device number |

The “bsdos” alternative device identifier is 4 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0.0 | 8 bits | | Sub unit number |
| 1.0 | 12 bits | | Unit number |
| 2.4 | 12 bits | | Major device number |
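A sketch of decoding the device identifier in C follows. It assumes the 4 bytes of the special permission data are read as a big-endian 32-bit integer, and that the bit offsets in the last three tables count from the least significant bit; the enumeration and function names are hypothetical, and the alternative “bsdos” layout is not handled.

```c
#include <stdint.h>

/* Hypothetical grouping of the device identifier formats. */
typedef enum {
    DEVICE_FORMAT_NATIVE, /* "native", "hpux" */
    DEVICE_FORMAT_LINUX,  /* "386bsd", "4bsd", "freebsd", "isc", "linux", ... */
    DEVICE_FORMAT_SVR4,   /* "solaris", "svr4" */
    DEVICE_FORMAT_BSDOS   /* "bsdos", "osf1" */
} device_format;

typedef struct {
    uint32_t major;
    uint32_t minor;
} device_number;

/* Splits a big-endian 32-bit device identifier into major and minor. */
device_number decode_device_identifier(uint32_t value, device_format format)
{
    device_number device = {0, 0};

    switch (format) {
    case DEVICE_FORMAT_NATIVE: /* byte 0 major, byte 3 minor */
        device.major = value >> 24;
        device.minor = value & 0xff;
        break;
    case DEVICE_FORMAT_LINUX:  /* byte 2 major, byte 3 minor */
        device.major = (value >> 8) & 0xff;
        device.minor = value & 0xff;
        break;
    case DEVICE_FORMAT_SVR4:   /* 14-bit major, 18-bit minor */
        device.major = value >> 18;
        device.minor = value & 0x3ffff;
        break;
    case DEVICE_FORMAT_BSDOS:  /* 12-bit major, 20-bit minor */
        device.major = value >> 20;
        device.minor = value & 0xfffff;
        break;
    }
    return device;
}
```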

The data fork of a symbolic link contains the path of the directory or file it refers to.

On HFS+/HFSX the symbolic link target contains a POSIX pathname, as used by the Mac OS BSD and Cocoa programming interfaces; not a traditional Mac OS or Carbon path.

The path is stored as a UTF-8 encoded string without an end-of-string character. The length of the path should be 1024 bytes or less. The path may be full or partial, with or without a leading forward slash.

The first 8 bytes of the file information should contain “slnkrhap”.

| Value | Identifier | Description |
| --- | --- | --- |
| "slnk" | kSymLinkFileType | Symbolic link file type |
| "rhap" | kSymLinkCreator | Symbolic link file creator |

The resource fork of a symbolic link is reserved and should be 0 bytes in size.

The catalog file

The catalog file is a B-tree file used to maintain information about the hierarchy of files and directories of a volume.

The block number of the first file extent of the catalog file (the header node) is stored in the master directory block (HFS) or the volume header (HFS+). The B-tree structure is described in section: B-tree files.

Each file and directory in the catalog file is assigned a unique catalog node identifier (CNID). The CNID is used for both directory and file identifiers. For any given file or directory the parent identifier is the CNID of the parent directory. The first 16 CNIDs are reserved for use by Apple and include the following standard assignments:

| CNID | Identifier | Assignment |
| --- | --- | --- |
| 0 | | Unknown (Reserved) |
| 1 | kHFSRootParentID | Parent identifier of the root directory (folder) |
| 2 | kHFSRootFolderID | Directory identifier of the root directory (folder) |
| 3 | kHFSExtentsFileID | Extents overflow file |
| 4 | kHFSCatalogFileID | Catalog file |
| 5 | kHFSBadBlockFileID | Bad allocation block file |
| 6 | kHFSAllocationFileID | Allocation file (HFS+) |
| 7 | kHFSStartupFileID | Startup file (HFS+) |
| 8 | kHFSAttributesFileID | Attributes file (HFS+) |
| 14 | kHFSRepairCatalogFileID | Used temporarily by fsck_hfs when rebuilding the catalog file |
| 15 | kHFSBogusExtentFileID | Bogus extent file, which is used temporarily during exchange files operations |
| 16 | kHFSFirstUserCatalogNodeID | First available CNID for user's files and folders |

Catalog file keys

In a catalog file a key consists of:

  • parent directory identifier
  • (optional) file or directory name

The volume reference number is not included in the search key.

Text encoding hint

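| Encoding type | Value | Encodings bitmap number |
| --- | --- | --- |
| MacRoman | 0 | 0 |
| MacJapanese | 1 | 1 |
| MacChineseTrad | 2 | 2 |
| MacKorean | 3 | 3 |
| MacArabic | 4 | 4 |
| MacHebrew | 5 | 5 |
| MacGreek | 6 | 6 |
| MacCyrillic | 7 | 7 |
| MacDevanagari | 9 | 9 |
| MacGurmukhi | 10 | 10 |
| MacGujarati | 11 | 11 |
| MacOriya | 12 | 12 |
| MacBengali | 13 | 13 |
| MacTamil | 14 | 14 |
| MacTelugu | 15 | 15 |
| MacKannada | 16 | 16 |
| MacMalayalam | 17 | 17 |
| MacSinhalese | 18 | 18 |
| MacBurmese | 19 | 19 |
| MacKhmer | 20 | 20 |
| MacThai | 21 | 21 |
| MacLaotian | 22 | 22 |
| MacGeorgian | 23 | 23 |
| MacArmenian | 24 | 24 |
| MacChineseSimp | 25 | 25 |
| MacTibetan | 26 | 26 |
| MacMongolian | 27 | 27 |
| MacEthiopic | 28 | 28 |
| MacCentralEurRoman | 29 | 29 |
| MacVietnamese | 30 | 30 |
| MacExtArabic | 31 | 31 |
| MacSymbol | 33 | 33 |
| MacDingbats | 34 | 34 |
| MacTurkish | 35 | 35 |
| MacCroatian | 36 | 36 |
| MacIcelandic | 37 | 37 |
| MacRomanian | 38 | 38 |
| MacFarsi | 140 | 49 |
| MacUkrainian | 152 | 48 |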
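Only MacFarsi and MacUkrainian use an encodings bitmap number that differs from their encoding value. The C sketch below (hypothetical names) maps a text encoding hint to its bit in the 64-bit encodings bitmap of the volume header; it assumes that any other value below 64 maps to the bit with the same number, as every other row in the table does.

```c
#include <stdint.h>

/* Returns the encodings bitmap bit for a text encoding hint, or -1 if the
 * hint has no bit in the 64-bit bitmap. */
int hfs_encoding_bitmap_number(uint32_t encoding)
{
    switch (encoding) {
    case 140:
        return 49; /* MacFarsi */
    case 152:
        return 48; /* MacUkrainian */
    default:
        return encoding < 64 ? (int)encoding : -1;
    }
}

/* Sets the bit for the given encoding in the encodings bitmap. */
uint64_t hfs_encodings_bitmap_add(uint64_t bitmap, uint32_t encoding)
{
    int bit = hfs_encoding_bitmap_number(encoding);

    return bit < 0 ? bitmap : bitmap | ((uint64_t)1 << bit);
}
```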

HFS catalog key

The HFS catalog key is of variable size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 1 | | Key data size, in bytes, which consists of a signed 8-bit integer |
| If key data size >= 6 | | | |
| 1 | 1 | | Unknown (Reserved) |
| 2 | 4 | | Parent identifier (CNID) |
| 6 | 1 | | Name size without the end-of-string character |
| 7 | ... | | Name string, which contains a narrow character string without end-of-string character |
| ... | ... | | Unknown (Alignment padding) |

Note that a key data size of 0 indicates a record that is no longer in use.

In an index node the catalog node name is always stored as 32 bytes, therefore the maximum key size within an index node should be 37. In a leaf node the size of the catalog node name varies.

Keys in a leaf node must be stored 16-bit aligned within the node data. The size of the alignment padding is not included in the key data size.

HFS+ and HFSX catalog key

The HFS+ and HFSX catalog key is of variable size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 2 | | Key data size, in bytes |
| If key data size >= 4 | | | |
| 2 | 4 | | Parent identifier, which contains a CNID |
| If key data size >= 6 | | | |
| 6 | 2 | | Number of characters in the name string |
| 8 | ... | | Name string, which contains a UTF-16 big-endian string without end-of-string character |

Note that the characters ‘:’ and U+2400 are stored as ‘/’ and U+0 respectively and must be converted before comparison.
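A C sketch of parsing the HFS+ and HFSX catalog key follows. It assumes the key starts at the beginning of the given buffer, requires the name size field to be present (a key data size of at least 6) and leaves the UTF-16 big-endian name undecoded; the type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

typedef struct {
    uint16_t key_data_size;
    uint32_t parent_identifier;
    uint16_t name_length;        /* number of UTF-16 code units */
    const uint8_t *name_utf16be; /* points into the node data */
} hfsplus_catalog_key;

/* Returns the number of bytes the key occupies (2 + key data size), or 0
 * if the key is truncated or its name does not fit in the key data. */
size_t hfsplus_parse_catalog_key(const uint8_t *data, size_t data_size,
                                 hfsplus_catalog_key *key)
{
    if (data_size < 2)
        return 0;
    key->key_data_size = read_be16(data);
    if (key->key_data_size < 6 || (size_t)key->key_data_size + 2 > data_size)
        return 0;
    key->parent_identifier = read_be32(data + 2);
    key->name_length = read_be16(data + 6);
    /* Parent identifier (4) + name length (2) + 2 bytes per character. */
    if (6 + (size_t)key->name_length * 2 > key->key_data_size)
        return 0;
    key->name_utf16be = data + 8;
    return (size_t)key->key_data_size + 2;
}
```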

The catalog data

A catalog leaf node can contain four different types of records:

  • a folder record, which contains information about a single directory.
  • a file record, which contains information about a single file.
  • a folder thread record, which provides a link between a directory and its parent directory.
  • a file thread record, which provides a link between a file and its parent directory.

The thread records are used to find the name and directory identifier of the parent of a given file or directory.

Each catalog data record consists of:

  • the catalog data record header;
  • the catalog data record data.

The catalog data record header

HFS catalog data record header

The HFS catalog data record header is 2 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 1 | | Record type, which consists of a signed 8-bit integer |
| 1 | 1 | 0x00 | Unknown (Reserved), which consists of a signed 8-bit integer |

Note that to distinguish between HFS and HFS+ record types, record type should be treated as a 16-bit big-endian value.

HFS+ and HFSX catalog data record header

The HFS+ and HFSX catalog data record header is 2 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 2 | | Record type |

The catalog data record types

| Value | Identifier | Description |
| --- | --- | --- |
| 0x0001 | kHFSPlusFolderRecord | HFS+/HFSX Folder record |
| 0x0002 | kHFSPlusFileRecord | HFS+/HFSX File record |
| 0x0003 | kHFSPlusFolderThreadRecord | HFS+/HFSX Folder thread record |
| 0x0004 | kHFSPlusFileThreadRecord | HFS+/HFSX File thread record |
| 0x0100 | kHFSFolderRecord (or cdrDirRec) | HFS Folder record |
| 0x0200 | kHFSFileRecord (or cdrFilRec) | HFS File record |
| 0x0300 | kHFSFolderThreadRecord (or cdrThdRec) | HFS Folder thread record |
| 0x0400 | kHFSFileThreadRecord (or cdrFThdRec) | HFS File thread record |
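Because the HFS and HFS+ record types occupy the same first two bytes, reading them as one 16-bit big-endian value is enough to tell apart both the format and the kind of record. The enumeration and function names in this C sketch are hypothetical.

```c
#include <stdint.h>

typedef enum {
    CATALOG_RECORD_UNKNOWN = 0,
    CATALOG_RECORD_FOLDER,
    CATALOG_RECORD_FILE,
    CATALOG_RECORD_FOLDER_THREAD,
    CATALOG_RECORD_FILE_THREAD
} catalog_record_kind;

/* Classifies a catalog data record from its first two bytes. Sets
 * *is_hfs_plus to 1 for HFS+/HFSX record types and 0 for HFS record types;
 * it is left unchanged for unknown record types. */
catalog_record_kind classify_catalog_record(const uint8_t *record,
                                            int *is_hfs_plus)
{
    uint16_t record_type = (uint16_t)((record[0] << 8) | record[1]);

    switch (record_type) {
    case 0x0001: *is_hfs_plus = 1; return CATALOG_RECORD_FOLDER;
    case 0x0002: *is_hfs_plus = 1; return CATALOG_RECORD_FILE;
    case 0x0003: *is_hfs_plus = 1; return CATALOG_RECORD_FOLDER_THREAD;
    case 0x0004: *is_hfs_plus = 1; return CATALOG_RECORD_FILE_THREAD;
    case 0x0100: *is_hfs_plus = 0; return CATALOG_RECORD_FOLDER;
    case 0x0200: *is_hfs_plus = 0; return CATALOG_RECORD_FILE;
    case 0x0300: *is_hfs_plus = 0; return CATALOG_RECORD_FOLDER_THREAD;
    case 0x0400: *is_hfs_plus = 0; return CATALOG_RECORD_FILE_THREAD;
    default: return CATALOG_RECORD_UNKNOWN;
    }
}
```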

The catalog folder record

HFS catalog folder record

The HFS catalog folder record (cdrDirRec, kHFSFolderRecord) is 70 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 2 | 0x0100 | Record type |
| 2 | 2 | | Folder flags |
| 4 | 2 | | Number of directory entries (valence) |
| 6 | 4 | | Identifier (CNID) |
| 10 | 4 | | Creation time, which contains a HFS timestamp in local time |
| 14 | 4 | | (last) content modification time, which contains a HFS timestamp in local time |
| 18 | 4 | | (last) backup time, which contains a HFS timestamp in local time |
| 22 | 16 | | Folder information |
| 38 | 16 | | Extended folder information |
| 54 | 4 x 4 = 16 | | Unknown (Reserved), which consists of an array of 32-bit integer values |

HFS catalog folder record flags

Not defined. The HFS catalog folder record appears to always have a corresponding folder thread record.

HFS+ and HFSX catalog folder record

The HFS+ and HFSX catalog folder record (HFSPlusCatalogFolder) is 88 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 2 | 0x0001 | Record type |
| 2 | 2 | | Flags |
| 4 | 4 | | Number of directory entries (valence) |
| 8 | 4 | | Identifier (CNID) |
| 12 | 4 | | Creation time, which contains a HFS timestamp in UTC |
| 16 | 4 | | (last) content modification time, which contains a HFS timestamp in UTC |
| 20 | 4 | | (last) record (or attribute) modification (or change) time, which contains a HFS timestamp in UTC |
| 24 | 4 | | (last) access time, which contains a HFS timestamp in UTC |
| 28 | 4 | | (last) backup time, which contains a HFS timestamp in UTC |
| Permissions | | | |
| 32 | 4 | | Owner identifier |
| 36 | 4 | | Group identifier |
| 40 | 1 | | Administration flags |
| 41 | 1 | | Owner flags |
| 42 | 2 | | File mode |
| 44 | 4 | | Special permission data |
| Folder information | | | |
| 48 | 16 | | Folder information |
| Extended folder information | | | |
| 64 | 16 | | Extended folder information |
| 80 | 4 | | Text encoding hint |
| 84 | 4 | 0x00 | Unknown (Reserved) |

The catalog file record

HFS catalog file record

The HFS catalog file record (cdrFilRec, kHFSFileRecord) is 102 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 2 | 0x0200 | Record type |
| 2 | 1 | | Flags, which consists of a signed 8-bit integer |
| 3 | 1 | 0x00 | File type, which consists of a signed 8-bit integer and should contain 0 |
| 4 | 16 | | File information |
| 20 | 4 | | Identifier (CNID) |
| 24 | 2 | | Data fork block number |
| 26 | 4 | | Data fork size |
| 30 | 4 | | Data fork allocated size |
| 34 | 2 | | Resource fork block number |
| 36 | 4 | | Resource fork size |
| 40 | 4 | | Resource fork allocated size |
| 44 | 4 | | Creation time, which contains a HFS timestamp in local time |
| 48 | 4 | | (last) content modification time, which contains a HFS timestamp in local time |
| 52 | 4 | | (last) backup time, which contains a HFS timestamp in local time |
| 56 | 16 | | Extended file information |
| 72 | 2 | | Clump size |
| 74 | 12 | | Data fork extents record |
| 86 | 12 | | Resource fork extents record |
| 98 | 4 | 0x00 | Unknown (Reserved) |

TODO: determine if the data and resource fork block number values are used

HFS catalog file record flags

| Value | Identifier | Description |
| --- | --- | --- |
| 0x0001 | | File is locked and cannot be written to |
| 0x0002 | | Has thread record |
| 0x0080 | kHFSHasDateAddedMask | Has added time |

HFS+ and HFSX catalog file record

The HFS+ and HFSX catalog file record (kHFSPlusFileRecord) is 248 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 2 | 0x0002 | Record type |
| 2 | 2 | | Flags |
| 4 | 4 | 0x00 | Unknown (Reserved) |
| 8 | 4 | | Identifier (CNID) |
| 12 | 4 | | Creation time, which contains a HFS timestamp in UTC |
| 16 | 4 | | (last) content modification time, which contains a HFS timestamp in UTC |
| 20 | 4 | | (last) record (or attribute) modification time, which contains a HFS timestamp in UTC |
| 24 | 4 | | (last) access time, which contains a HFS timestamp in UTC |
| 28 | 4 | | (last) backup time, which contains a HFS timestamp in UTC |
| Permissions | | | |
| 32 | 4 | | Owner identifier |
| 36 | 4 | | Group identifier |
| 40 | 1 | | Administration flags |
| 41 | 1 | | Owner flags |
| 42 | 2 | | File mode |
| 44 | 4 | | Special permission data |
| File information | | | |
| 48 | 16 | | File information (or user information) |
| Extended file information | | | |
| 64 | 16 | | Extended file information (or finder information) |
| 80 | 4 | | Text encoding hint |
| 84 | 4 | 0x00 | Unknown (Reserved) |
| 88 | 80 | | Data fork descriptor |
| 168 | 80 | | Resource fork descriptor |

HFS+ catalog file record flags

| Value | Identifier | Description |
| --- | --- | --- |
| 0x0001 | kHFSFileLockedMask | File is locked and cannot be written to |
| 0x0002 | kHFSThreadExistsMask | Has thread record, which should always be set for a file record on HFS+/HFSX |
| 0x0004 | kHFSHasAttributesMask | Has extended attributes |
| 0x0008 | kHFSHasSecurityMask | Has ACLs |
| 0x0010 | kHFSHasFolderCountMask | Has sub-folder count |
| 0x0020 | kHFSHasLinkChainMask | Has a hard link target (link chain), where the CNID of the hard link target is stored in the special permission data |
| 0x0040 | kHFSHasChildLinkMask | Has a child that is a directory link |
| 0x0080 | kHFSHasDateAddedMask | Has added time, where the extended folder or file information contains the time the folder or file was added (date_added) |
| 0x0100 | kHFSFastDevPinnedMask | Unknown |
| 0x0200 | kHFSDoNotFastDevPinMask | Unknown |
| 0x0400 | kHFSFastDevCandidateMask | Unknown |
| 0x0800 | kHFSAutoCandidateMask | Unknown |

The catalog thread record

The file thread record is similar to the folder thread record except that it refers to a file, instead of a directory.

HFS catalog file thread record

The HFS catalog thread record (kHFSFolderThreadRecord (or cdrThdRec), kHFSFileThreadRecord (or cdrFThdRec)) is of variable size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 2 | 0x0300 or 0x0400 | Record type |
| 2 | 2 x 4 = 8 | 0x00 | Unknown (Reserved), which consists of an array of 32-bit integer values |
| 10 | 4 | | Parent identifier (CNID) |
| 14 | 1 | | Number of characters in the name string, with a maximum of 31 |
| 15 | ... | | Name string, which contains a narrow character string without end-of-string character |

HFS+ and HFSX catalog file thread record

The HFS+ and HFSX catalog thread record (kHFSPlusFolderThreadRecord, kHFSPlusFileThreadRecord) is of variable size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 2 | 0x0003 or 0x0004 | Record type |
| 2 | 2 | 0x00 | Unknown (Reserved), which consists of an unsigned 16-bit integer |
| 4 | 4 | | Parent identifier (CNID) |
| 8 | 2 | | Number of characters in the name string, with a maximum of 255 |
| 10 | ... | | Name string, which contains a UTF-16 big-endian string without end-of-string character |

Permissions

HFS+ maintains a basic access permissions record for each file and folder. These are similar to basic Unix file permissions.

TODO: add note about permissions on HFS

Owner and group identifier

The owner identifier contains the Mac OS X user ID of the owner of the file or folder. Mac OS X versions prior to 10.3 treat user ID 99 as if it was the user ID of the user currently logged in to the console. If no user is logged in to the console, user ID 99 is treated as user ID 0 (root). Mac OS X version 10.3 treats user ID 99 as if it was the user ID of the process making the call (in effect, making it owned by everyone simultaneously). These substitutions happen at run-time. The actual user ID on disk is not changed.

The group identifier contains the Mac OS X group ID of the group associated with the file or folder. Mac OS X typically maps group ID 99 to the group named “unknown”. There is no run-time substitution of group IDs in Mac OS X.

Administration flags

| Value | Identifier | Description |
| --- | --- | --- |
| 0x01 | SF_ARCHIVED | File has been archived |
| 0x02 | SF_IMMUTABLE | File is immutable and may not be changed |
| 0x04 | SF_APPEND | Writes to file may only append |

Owner flags

| Value | Identifier | Description |
| --- | --- | --- |
| 0x01 | UF_NODUMP | Do not backup (dump) this file |
| 0x02 | UF_IMMUTABLE | File is immutable and may not be changed |
| 0x04 | UF_APPEND | Writes to file may only append |
| 0x08 | UF_OPAQUE | Directory is opaque |

File mode

| Value | Identifier | Description |
| --- | --- | --- |
| 0xf000 (0170000) | S_IFMT | File type bitmask |
| 0x1000 (0010000) | S_IFIFO | Named pipe |
| 0x2000 (0020000) | S_IFCHR | Character-special file (Character device) |
| 0x4000 (0040000) | S_IFDIR | Directory |
| 0x6000 (0060000) | S_IFBLK | Block-special file (Block device) |
| 0x8000 (0100000) | S_IFREG | Regular file |
| 0xa000 (0120000) | S_IFLNK | Symbolic link |
| 0xc000 (0140000) | S_IFSOCK | Socket |
| 0xe000 (0160000) | S_IFWHT | Whiteout, which is a file entry that covers up all entries of a particular name from lower branches |

HFS+ uses the BSD file type and mode bits. Note that the constants from the header shown below are in octal (base eight), not hexadecimal.

| Octal value | Identifier | Description |
| --- | --- | --- |
| 0004000 | S_ISUID | Set user identifier on execution |
| 0002000 | S_ISGID | Set group identifier on execution |
| 0001000 | S_ISTXT | Sticky bit |
| 0000700 | S_IRWXU | Read, write and execute access for owner |
| 0000400 | S_IRUSR | Read access for owner |
| 0000200 | S_IWUSR | Write access for owner |
| 0000100 | S_IXUSR | Execute access for owner |
| 0000070 | S_IRWXG | Read, write and execute access for group |
| 0000040 | S_IRGRP | Read access for group |
| 0000020 | S_IWGRP | Write access for group |
| 0000010 | S_IXGRP | Execute access for group |
| 0000007 | S_IRWXO | Read, write and execute access for other |
| 0000004 | S_IROTH | Read access for other |
| 0000002 | S_IWOTH | Write access for other |
| 0000001 | S_IXOTH | Execute access for other |

Note that if the sticky bit is set for a directory, then Mac OS restricts movement, deletion, and renaming of files in that directory. Files may be removed or renamed only if the user has write access to the directory and is the owner of the file or the directory, or is the super-user.
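The file mode can be rendered in the familiar ls style with the masks from the two tables above. This C sketch is illustrative: the function name is hypothetical, and the type letters, including 'w' for a whiteout, follow the BSD strmode() convention rather than anything stored on disk.

```c
#include <stdint.h>

/* Writes an ls-style mode string such as "drwxr-xr-x" into out, which must
 * hold at least 11 bytes including the terminating NUL. */
void hfsplus_file_mode_string(uint16_t mode, char out[11])
{
    static const char rwx[] = "rwxrwxrwx";
    char type;
    int i;

    switch (mode & 0170000) {          /* S_IFMT */
    case 0010000: type = 'p'; break;   /* S_IFIFO */
    case 0020000: type = 'c'; break;   /* S_IFCHR */
    case 0040000: type = 'd'; break;   /* S_IFDIR */
    case 0060000: type = 'b'; break;   /* S_IFBLK */
    case 0100000: type = '-'; break;   /* S_IFREG */
    case 0120000: type = 'l'; break;   /* S_IFLNK */
    case 0140000: type = 's'; break;   /* S_IFSOCK */
    case 0160000: type = 'w'; break;   /* S_IFWHT */
    default: type = '?'; break;
    }
    out[0] = type;

    /* Owner, group and other permission bits, from 0400 down to 0001. */
    for (i = 0; i < 9; i++)
        out[1 + i] = (mode & (0400 >> i)) ? rwx[i] : '-';

    /* Set-user-ID, set-group-ID and sticky bits replace the execute flags. */
    if (mode & 04000)
        out[3] = (mode & 0100) ? 's' : 'S';
    if (mode & 02000)
        out[6] = (mode & 0010) ? 's' : 'S';
    if (mode & 01000)
        out[9] = (mode & 0001) ? 't' : 'T';
    out[10] = '\0';
}
```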

HFS+ file special permission data

The special permission data is used to store the following information:

  • hard link reference (iNodeNum)
  • number of (hard) links (linkCount) in indirect node files
  • device numbers of block (S_IFBLK) and character (S_IFCHR) devices files

File system hierarchy

File and folder records have a search key with a non-empty name string. In thread records the name string in the search key is empty. E.g. to list the file entries in a directory:

  • find all the file or folder records given the parent CNID

Finding a file or directory by its CNID is a two-step process:

  1. use the CNID to look up the thread record for the file or directory
  2. use the thread record to look up the file or folder record
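The two-step lookup and the directory listing can be illustrated with a hypothetical in-memory array of already decoded leaf records. This C sketch uses a linear scan in place of a B-tree search and compares names with strcmp, whereas HFS+ itself compares names case-insensitively (HFSX can be case-sensitive); all type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded catalog leaf record; names are plain C strings here,
 * on disk they are narrow (HFS) or UTF-16 (HFS+) strings. */
typedef struct {
    uint32_t key_parent_identifier;    /* parent identifier of the search key */
    const char *key_name;              /* empty for thread records */
    int is_thread;                     /* 1 for folder or file thread records */
    uint32_t identifier;               /* CNID of a folder or file record */
    uint32_t thread_parent_identifier; /* thread record: parent of the entry */
    const char *thread_name;           /* thread record: name of the entry */
} catalog_record;

static const catalog_record *find_by_key(const catalog_record *records,
                                         size_t count, uint32_t parent,
                                         const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (records[i].key_parent_identifier == parent &&
            strcmp(records[i].key_name, name) == 0)
            return &records[i];
    return NULL;
}

/* Step 1: the thread record key is (CNID, empty name).
 * Step 2: the thread record's parent identifier and name form the key of
 * the folder or file record. */
const catalog_record *lookup_by_cnid(const catalog_record *records,
                                     size_t count, uint32_t cnid)
{
    const catalog_record *thread = find_by_key(records, count, cnid, "");

    if (thread == NULL || !thread->is_thread)
        return NULL;
    return find_by_key(records, count, thread->thread_parent_identifier,
                       thread->thread_name);
}

/* Counts the folder and file records whose key parent is the directory. */
size_t count_directory_entries(const catalog_record *records, size_t count,
                               uint32_t directory_cnid)
{
    size_t number_of_entries = 0;

    for (size_t i = 0; i < count; i++)
        if (records[i].key_parent_identifier == directory_cnid &&
            !records[i].is_thread)
            number_of_entries++;
    return number_of_entries;
}
```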

File forks

Forks in HFS and HFS+ can be compared to data streams in NTFS. In HFS+ the fork values are grouped in a separate fork descriptor structure. HFS+ also defines extended attributes (named forks). These are not stored in the catalog file but in the attributes file.

HFS+ fork descriptor structure

HFS+ maintains information about file contents using the HFS+ fork descriptor structure (HFSPlusForkData).

The fork descriptor structure is 80 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | | Size, in bytes |
| 8 | 4 | | Clump size, in bytes |
| 12 | 4 | | Number of blocks |
| 16 | 64 | | Data extents record |
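A C sketch of decoding the 80-byte fork descriptor follows. It also sums the block counts of the eight inline extents; if that sum is smaller than the number of blocks, the remaining extents are expected in the extents overflow file. The type and function names are hypothetical.

```c
#include <stdint.h>

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t read_be64(const uint8_t *p)
{
    return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

typedef struct {
    uint32_t start_block;
    uint32_t number_of_blocks;
} hfsplus_extent;

typedef struct {
    uint64_t size;
    uint32_t clump_size;
    uint32_t number_of_blocks;
    hfsplus_extent extents[8];
} hfsplus_fork;

/* Decodes a fork descriptor and returns the number of blocks covered by
 * its inline extents; unused extents (all zero) add nothing. */
uint32_t hfsplus_parse_fork(const uint8_t data[80], hfsplus_fork *fork)
{
    uint32_t covered_blocks = 0;

    fork->size = read_be64(data);
    fork->clump_size = read_be32(data + 8);
    fork->number_of_blocks = read_be32(data + 12);

    for (int i = 0; i < 8; i++) {
        const uint8_t *descriptor = data + 16 + i * 8;

        fork->extents[i].start_block = read_be32(descriptor);
        fork->extents[i].number_of_blocks = read_be32(descriptor + 4);
        covered_blocks += fork->extents[i].number_of_blocks;
    }
    return covered_blocks;
}
```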

The extents overflow file

In HFS and HFS+, extents (contiguous ranges of blocks) are used to track which blocks belong to a file. The first three (HFS) or eight (HFS+) extents of each fork are stored in the catalog file. Additional extents are stored in the extents overflow file.

The structure of an extents overflow file is relatively simple compared to that of a catalog file. The function of the extents overflow file is to store those file extents that are not contained in the master directory block (MDB), the volume header or the catalog file.

Note that the file system B-tree files can have additional extents in the extents overflow file. This has been observed with the attributes file. It is currently unknown if the extents (overflow) file itself can have overflow extents.

The extents overflow key (record)

Disks initialized using the enhanced Disk Initialization Manager introduced in system software version might contain extent records for some blocks that do not belong to any actual file in the file system. These extent records are assigned to the bad block file (CNID 5). See the chapter “Disk Initialization Manager” in Inside Macintosh: Files for details on bad block sparing.

The key has been selected so that the extent records for a particular fork are grouped together in the B-tree, right next to all the extent records for the other fork of the file. The fork offset of the preceding extent record is needed to determine the key of the next extent record.

In an extents overflow file the search key consists of:

  • fork type
  • file identifier
  • first block in the extent

HFS extents overflow key (record)

The HFS extents overflow key (record) is 8 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 1 | 7 | Key data size, in bytes, which consists of a signed 8-bit integer |
| 1 | 1 | | Fork type, which consists of a signed 8-bit integer |
| 2 | 4 | | File identifier (CNID) |
| 6 | 2 | | Logical block number |

The first 3 extents in a fork are held in its catalog file record, and each extents overflow record holds another 3. So the number of extent records for a fork is:

(number_of_extents - 3 + 2) / 3

HFS+ and HFSX extents overflow key (record)

The HFS+ and HFSX extents overflow key (record) is 12 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 2 | 10 | Key data size, in bytes, which consists of an unsigned 16-bit integer |
| 2 | 1 | | Fork type, which consists of a signed 8-bit integer |
| 3 | 1 | 0x00 | Unknown (Padding) |
| 4 | 4 | | File identifier (CNID) |
| 8 | 4 | | Logical block number |

The first 8 extents in a fork are held in its catalog file record. So the number of extent records for a fork is:

(number_of_extents - 8 + 7) / 8
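Since an HFS extents record holds three extent descriptors and an HFS+ extents record holds eight, the number of overflow extent records is a rounded-up division. This C sketch (hypothetical function names) returns 0 when all extents fit in the catalog file record.

```c
#include <stdint.h>

/* HFS: 3 extents in the catalog file record, 3 per overflow record. */
uint32_t hfs_number_of_overflow_extent_records(uint32_t number_of_extents)
{
    if (number_of_extents <= 3)
        return 0;
    return (number_of_extents - 3 + 2) / 3;
}

/* HFS+/HFSX: 8 extents in the catalog file record, 8 per overflow record. */
uint32_t hfsplus_number_of_overflow_extent_records(uint32_t number_of_extents)
{
    if (number_of_extents <= 8)
        return 0;
    return (number_of_extents - 8 + 7) / 8;
}
```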

HFS fork types

| Value | Identifier | Description |
| --- | --- | --- |
| -1 (0xff) | | Resource fork |
| 0 (0x00) | | Data fork |

The extent (data) record

An extent is a contiguous range of blocks that have been allocated to an individual file. An extent is represented by an extent descriptor.

HFS extents record

The HFS extents record (HFSExtentRecord) is 12 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 3 x 4 = 12 | | Array of HFS extent descriptors |

HFS extent descriptor

The HFS extents descriptor (HFSExtentDescriptor) is 4 bytes in size and consists of:

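| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 2 | | Physical block number, which contains a block number relative from the start of the data area |
| 2 | 2 | | Number of blocks |

extent_offset = (data_area_block_number * 512) + (extent_block_number * block_size)

Where the data area block number from the MDB is in units of 512 bytes.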

An unused extent descriptor should have both the block number and number of blocks set to 0.

HFS+ and HFSX extents record

The HFS+ and HFSX extents record (HFSPlusExtentRecord) is 64 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 x 8 = 64 | | Array of HFS+ extent descriptors |

HFS+ and HFSX extent descriptor

The HFS+ and HFSX extents descriptor (HFSPlusExtentDescriptor) is 8 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Physical block number, which contains a block number relative from the start of the volume |
| 4 | 4 | | Number of blocks |

extent_offset = extent_block_number * block_size

An unused extent descriptor should have both the block number and number of blocks set to 0.
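The extent offset calculations can be written as follows. This C sketch assumes, as TN1150 does for the HFS wrapper, that the HFS data area block number from the MDB is expressed in 512-byte units, whereas HFS+ extent block numbers are relative to the start of the volume; the function names are hypothetical.

```c
#include <stdint.h>

/* HFS: extent block numbers are relative to the start of the data area. */
uint64_t hfs_extent_offset(uint16_t data_area_block_number,
                           uint16_t extent_block_number, uint32_t block_size)
{
    return (uint64_t)data_area_block_number * 512 +
           (uint64_t)extent_block_number * block_size;
}

/* HFS+/HFSX: extent block numbers are relative to the start of the volume. */
uint64_t hfsplus_extent_offset(uint32_t extent_block_number, uint32_t block_size)
{
    return (uint64_t)extent_block_number * block_size;
}
```

For example, with a data area block number of 16 and a block size of 1024 bytes, HFS block 2 starts at offset 16 * 512 + 2 * 1024 = 10240.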

Bad Block File

The extents overflow file is also used to hold information about bad blocks, referred to as the bad block file. The bad block file is used to mark areas on the disk as bad and unusable for storing data, typically to map out bad sectors on the storage medium.

Typically, blocks are larger than sectors. If a single sector is found to be bad, the entire block is unusable. The bad block file is sometimes used to mark blocks as unusable when they are not bad, e.g. in the HFS wrapper.

Bad block extent records are always assumed to reference the data fork (fork type of 0).

Allocation (bitmap) file

The allocation file is used to keep track of whether each block in a volume is currently allocated to some file system structure or not. The contents of the allocation file are a bitmap. The bitmap contains one bit for each block in the volume.

  • If a bit is set, the corresponding block is currently in use by some file system structure.
  • If a bit is clear, the corresponding block is not currently in use, and is available for allocation.

The size of the allocation file depends on the number of blocks in the volume, which in turn depends both on the size of the disk and on the size of the volume’s blocks. For example, a volume on a 1 GiB disk with a block size of 4 KiB needs an allocation file size of 256 Kbits (32 KiB, or 8 blocks). Since the allocation file itself is allocated using blocks, it always occupies an integral number of blocks (its size may be rounded up).

The allocation file may be larger than the minimum number of bits required for the given volume size. Any unused bits in the bitmap must be set to 0.

Each byte in the allocation file holds the state of eight blocks. The byte at offset N into the file contains the allocation state of blocks (N x 8) through (N x 8 + 7). Within each byte, the most significant bit holds information about the block with the lowest number, the least significant bit holds information about the block with the highest number. The following code shows how to test whether a block is in use, assuming that the entire allocation file has been read into memory.

Determining whether a block is in use.

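#include <stdbool.h>
#include <stdint.h>

/* The caller must ensure thisAllocationBlock / 8 is within the buffer. */
static bool IsAllocationBlockUsed(uint32_t thisAllocationBlock,
                                  const uint8_t *allocationFileContents)
{
    /* Each byte holds 8 blocks; the most significant bit is the lowest block. */
    uint8_t thisByte = allocationFileContents[thisAllocationBlock / 8];

    return (thisByte & (1 << (7 - (thisAllocationBlock % 8)))) != 0;
}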

Attributes file

The attributes file is a B-tree file used to store extended attributes.

The location of the attributes file can be found in the HFS+ and HFSX volume header.

Attributes file keys

An attributes file key is of variable size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 2 | | Key data size, in bytes |
| If key data size >= 12 | | | |
| 2 | 2 | | Unknown |
| 4 | 4 | | Identifier (CNID) |
| 8 | 4 | | Unknown |
| 12 | 2 | | Number of characters in the name string |
| 14 | ... | | Name string, which contains a UTF-16 big-endian string without end-of-string character |

Note that the name of an extended attribute appears to be case-sensitive even on a case-insensitive file system.
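
A sketch of reading the attributes file key described above, assuming big-endian values and that the key data size excludes the 2-byte size value itself; attributes_key_t, read_be16, read_be32 and parse_attributes_key are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure with the values of an attributes file key */
typedef struct {
    uint16_t key_data_size;
    uint32_t identifier;            /* CNID */
    uint16_t number_of_characters;
    const uint8_t *name;            /* UTF-16 big-endian, not terminated */
} attributes_key_t;

static uint16_t read_be16(const uint8_t *data)
{
    return (uint16_t)((data[0] << 8) | data[1]);
}

static uint32_t read_be32(const uint8_t *data)
{
    return ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16) |
           ((uint32_t)data[2] << 8) | (uint32_t)data[3];
}

/* Returns 0 on success or -1 if the key is too small or truncated */
static int parse_attributes_key(const uint8_t *data, size_t data_size,
                                attributes_key_t *key)
{
    if (data_size < 2)
        return -1;

    /* Assumption: the key data size excludes the 2-byte size value itself */
    key->key_data_size = read_be16(data);
    if (key->key_data_size < 12 || (size_t)key->key_data_size + 2 > data_size)
        return -1;

    key->identifier = read_be32(&data[4]);
    key->number_of_characters = read_be16(&data[12]);

    /* Each UTF-16 character occupies 2 bytes */
    if (14 + (size_t)key->number_of_characters * 2 > (size_t)key->key_data_size + 2)
        return -1;

    key->name = &data[14];
    return 0;
}
```

The name is returned as raw UTF-16 big-endian bytes; any comparison should be case-sensitive, as noted above.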

The attributes file data

The attributes file defines two types of attributes:

  1. Fork data attributes, which are used for attributes whose data is large. The attribute’s data is stored in extents on the volume and the attribute merely contains a reference to those extents.
  2. Extension attributes, which are used to augment a fork data attribute, allowing a fork to have more than eight extents.

Attributes file data record header

Each attributes file data record starts with a type value, which describes the type of attribute data record.

The attributes file data record header is 4 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | | Record type

The attributes data record types

Value | Identifier | Description
0x00000010 | kHFSPlusAttrInlineData | Attribute record with inline data
0x00000020 | kHFSPlusAttrForkData | Attribute record with fork descriptor
0x00000030 | kHFSPlusAttrExtents | Attribute record with extents overflow

Note that at the moment it is unclear when an attribute record of type kHFSPlusAttrExtents is created and how it should be handled.

The inline data attribute record

The inline data attribute record is of variable size and consists of:

Offset | Size | Value | Description
0 | 4 | 0x00000010 | Record type
4 | 4 | 0 | Unknown (Reserved)
8 | 4 | | Unknown
12 | 4 | | Attribute data size
16 | ... | | Attribute data
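
A minimal sketch of locating the data of an inline data attribute record, assuming big-endian values. The enumeration reuses the identifiers from the record types table; read_be32 and get_inline_attribute_data are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Record types from the attributes data record types table */
enum {
    kHFSPlusAttrInlineData = 0x00000010,
    kHFSPlusAttrForkData   = 0x00000020,
    kHFSPlusAttrExtents    = 0x00000030
};

static uint32_t read_be32(const uint8_t *data)
{
    return ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16) |
           ((uint32_t)data[2] << 8) | (uint32_t)data[3];
}

/* Returns a pointer to the attribute data and sets *attribute_data_size,
 * or returns NULL if the record is not a valid inline data record. */
static const uint8_t *get_inline_attribute_data(const uint8_t *record,
                                                size_t record_size,
                                                uint32_t *attribute_data_size)
{
    if (record_size < 16 || read_be32(record) != kHFSPlusAttrInlineData)
        return NULL;

    *attribute_data_size = read_be32(&record[12]);

    /* The attribute data must fit within the record */
    if (*attribute_data_size > record_size - 16)
        return NULL;

    return &record[16];
}
```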

The fork descriptor attribute record

The fork descriptor attribute record is 88 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | 0x00000020 | Record type
4 | 4 | 0 | Unknown (Reserved)
8 | 80 | | Attribute fork descriptor

The extents attribute record

The extents attribute record is 72 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | 0x00000030 | Record type
4 | 4 | 0 | Unknown (Reserved)
8 | 64 | | Attribute extents record

Startup file

The startup file is a file system metadata file intended to hold information needed when booting a system that does not have built-in (ROM) support for HFS+ (or HFSX). A boot loader can find the startup file without full knowledge of the format using the first eight extents of the startup file located in the volume header.

Format-wise, it is valid for the startup file to contain more than eight extents, but doing so defeats the purpose of the startup file.

The next block number is used by Mac OS as a hint for where to start searching for available blocks when allocating space for a file.

Metadata zone and hot files

In Mac OS X 10.3 a metadata zone was introduced to store certain file system metadata, such as the allocation bitmap file, the extents overflow file, the catalog file and the journal file, and frequently used small files (also referred to as “hot files”) near each other to reduce seek time for typical accesses.

Hot File B-tree

The hot file B-tree is a file named “.hotfiles.btree” stored in the root directory.

Journal

A HFS+ (or HFSX) volume may have an optional journal to speed recovery when mounting a volume that was not unmounted safely. The purpose of the journal is to ensure that when a group of related changes are being made, that either all of those changes are actually made, or none of them are made. The journal makes it quick and easy to restore the volume structures to a consistent state, without having to scan all of the structures. The journal is used only for the volume structures and metadata; it does not protect the contents of a fork.

The volume header specifies if journalling is activated.

The journal data structures consist of:

  • a journal information block, which contains the location and size of the journal header and journal buffer;
  • a journal header, which describes which part of the journal buffer is active and contains transactions waiting to be committed;
  • a journal buffer, which is a cyclic buffer to hold the file system metadata transactions.

On HFS+ volumes, the journal information block is stored as a file. The name of that file is “.journal_info_block” and it is stored in the volume’s root directory.

The journal header and journal buffer are stored together in a different file named “.journal”, also in the volume’s root directory. Each of these files is contiguous on disk and occupies exactly one extent.

The volume header contains the extent of the journal information block file. The journal information block contains the location of the journal file.

Journal information block

The journal information block describes where the journal header and journal buffer are stored. The journal information block is stored at the start of the block referred to by the volume header.

The journal information block is 180 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | | Journal flags
4 | 8 x 4 = 32 | | Device signature
36 | 8 | | Journal header offset
44 | 8 | | Journal size, in bytes, which includes the size of the journal header and the journal buffer, but not the journal information block
52 | 32 x 4 = 128 | 0x00 | Unknown (Reserved)

Journal flags

The journal flags consist of the following values:

Value(s) | Description
0x00000001 | On volume, where the journal header offset is relative to the start of the volume
0x00000002 | On other device, where the device signature identifies the device containing the journal and the journal header offset is relative to the start of the device
0x00000004 | Needs initialization, which indicates that the journal contains no valid transactions and needs to be initialized

Note that according to TN1150 journals stored on a separate device are not supported.

The journal header

The journal header is 44 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | "\x4a\x4e\x4c\x78" | Signature
4 | 4 | "\x12\x34\x56\x78" | Byte order (or endian) signature
8 | 8 | | First transaction start offset
16 | 8 | | Next transaction start offset
24 | 8 | | Journal size, in bytes, which includes the size of the journal header and buffer
32 | 4 | | Journal block header size, in bytes, typically ranges from 4096 to 16384
36 | 4 | | Checksum
40 | 4 | | Journal header size, in bytes, typically the size of one sector
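
The byte order signature can be used to decide how to read the remaining journal header values. A sketch, assuming that a journal may also be stored in little-endian byte order, in which case the byte order signature reads "\x78\x56\x34\x12"; journal_header_t, read_uint and parse_journal_header are hypothetical names, and the caller is expected to provide at least 44 bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure with the journal header values */
typedef struct {
    bool is_big_endian;
    uint64_t first_transaction_offset;
    uint64_t next_transaction_offset;
    uint64_t journal_size;
    uint32_t block_header_size;
    uint32_t checksum;
    uint32_t header_size;
} journal_header_t;

/* Reads an unsigned integer of 'size' bytes in the given byte order */
static uint64_t read_uint(const uint8_t *data, size_t size, bool is_big_endian)
{
    uint64_t value = 0;

    for (size_t i = 0; i < size; i++) {
        size_t index = is_big_endian ? i : size - 1 - i;
        value = (value << 8) | data[index];
    }
    return value;
}

/* Parses a 44-byte journal header; returns 0 on success or -1 if a
 * signature does not match. */
static int parse_journal_header(const uint8_t *data, journal_header_t *header)
{
    /* The byte order signature determines how to read the other values */
    if (read_uint(&data[4], 4, true) == 0x12345678UL)
        header->is_big_endian = true;
    else if (read_uint(&data[4], 4, false) == 0x12345678UL)
        header->is_big_endian = false;
    else
        return -1;

    bool be = header->is_big_endian;

    /* "\x4a\x4e\x4c\x78" when stored in big-endian */
    if (read_uint(&data[0], 4, be) != 0x4a4e4c78UL)
        return -1;

    header->first_transaction_offset = read_uint(&data[8], 8, be);
    header->next_transaction_offset = read_uint(&data[16], 8, be);
    header->journal_size = read_uint(&data[24], 8, be);
    header->block_header_size = (uint32_t)read_uint(&data[32], 4, be);
    header->checksum = (uint32_t)read_uint(&data[36], 4, be);
    header->header_size = (uint32_t)read_uint(&data[40], 4, be);
    return 0;
}
```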

First and next transaction offset

The first transaction offset contains the offset in bytes from the start of the journal header to the start of the first (oldest) transaction.

The next transaction offset contains the offset in bytes from the start of the journal header to the end of the last (newest) transaction. Note that this field may be less than the start field, indicating that the transactions wrap around the end of the journal’s circular buffer. If end equals start, then the journal is empty, and there are no transactions that need to be replayed.
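
The number of bytes in the active part of the journal buffer follows from the first and next transaction offsets. A sketch, assuming the offsets have already been validated to lie between the journal header size and the journal size; journal_active_size is a hypothetical name.

```c
#include <stdint.h>

/* Returns the number of bytes in the active part of the journal buffer.
 * The journal buffer starts at header_size and ends at journal_size. */
static uint64_t journal_active_size(uint64_t first_transaction_offset,
                                    uint64_t next_transaction_offset,
                                    uint64_t journal_size,
                                    uint64_t header_size)
{
    /* Equal offsets mean the journal is empty */
    if (next_transaction_offset >= first_transaction_offset)
        return next_transaction_offset - first_transaction_offset;

    /* The transactions wrap around the end of the circular buffer */
    return (journal_size - first_transaction_offset) +
           (next_transaction_offset - header_size);
}
```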

Journal transactions

A single transaction is stored in the journal as several blocks. These blocks include both the data to be written and the location where that data is to be written. This is represented on the storage medium by a block list header, which describes the number and sizes of the blocks, immediately followed by the contents of those blocks.

Since block list headers are of limited size, a single transaction may consist of several block list headers and their associated block contents. If the next value in the first block information structure is non-zero, then the next block list header is a continuation of the same transaction.

The journal buffer is treated as a circular buffer. When reading or writing the journal buffer, the I/O operation must stop at the end of the journal buffer and resume (wrap around) immediately following the journal header. Block list headers or the contents of blocks may wrap around in this way. Only a portion of the journal buffer is active at any given time; this portion is indicated by the start and end fields of the journal header. The part of the journal buffer that is not active contains no meaningful data, and must be ignored.
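
The wrap-around rule can be sketched as a read helper that operates on the journal file (journal header plus journal buffer) loaded into memory. This is a sketch under the assumption that the start offset lies within the journal buffer; journal_read is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies 'length' bytes from an in-memory journal starting at 'offset'.
 * Reading stops at the end of the journal buffer and resumes immediately
 * following the journal header. Returns the offset after the last byte read.
 * Assumes header_size <= offset < journal_size. */
static uint64_t journal_read(const uint8_t *journal, uint64_t journal_size,
                             uint64_t header_size, uint64_t offset,
                             uint8_t *buffer, size_t length)
{
    while (length > 0) {
        uint64_t available = journal_size - offset;
        size_t chunk = length < available ? length : (size_t)available;

        memcpy(buffer, &journal[offset], chunk);
        buffer += chunk;
        length -= chunk;
        offset += chunk;

        /* Wrap around to the byte following the journal header */
        if (offset == journal_size)
            offset = header_size;
    }
    return offset;
}
```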

To prevent ambiguity when start equals end, the journal is never allowed to be perfectly full (all of the journal buffer used by block lists and blocks). If the journal were perfectly full, and start was not equal to jhdr_size (the journal header size), then end would be equal to start. You would then be unable to differentiate between an empty and a full journal.

When the journal is not empty (contains transactions), it must be replayed to be sure the volume is consistent. That is, the data from each of the transactions must be written to the correct blocks on disk.

The journal block list header

The block list header describes a list of blocks included in a transaction. A transaction may include several block lists if it modifies more blocks than can be represented in a single block list.

The journal block list header is 16 bytes in size and consists of:

Offset | Size | Value | Description
0 | 2 | | Maximum number of journal blocks
2 | 2 | | Number of journal blocks following the journal block header, typically 1
4 | 4 | | Block list size, in bytes, which includes the size of the header and blocks
8 | 4 | | Checksum
12 | 4 | 0x00 | Unknown (Alignment padding)
16 | ... | | Journal block information array

Note that the number of journal blocks includes the first journal block. The first journal block is reserved to be used when multiple blocks need to be chained, therefore the number of journal blocks that actually contain data is one less than this value.

Journal block information

The journal block information is 16 bytes in size and consists of:

Offset | Size | Value | Description
0 | 8 | | Block sector number
8 | 4 | | Block size, in bytes
12 | 4 | | Next journal block
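
A sketch of walking a single block list header: it skips the reserved first journal block information, sums the sizes of the blocks that contain data and checks the next value of the first block information for a continuation. It assumes the block list header uses the same byte order as the journal header; read_uint and walk_block_list are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reads an unsigned integer of 'size' bytes in the given byte order */
static uint64_t read_uint(const uint8_t *data, size_t size, bool is_big_endian)
{
    uint64_t value = 0;

    for (size_t i = 0; i < size; i++) {
        size_t index = is_big_endian ? i : size - 1 - i;
        value = (value << 8) | data[index];
    }
    return value;
}

/* Sums the sizes of the journal blocks that contain data and reports whether
 * the transaction continues in the next block list header.
 * Returns 0 on success or -1 if the block list header is truncated. */
static int walk_block_list(const uint8_t *data, size_t data_size,
                           bool is_big_endian, uint64_t *total_data_size,
                           bool *has_continuation)
{
    /* 16-byte header followed by at least the reserved block information */
    if (data_size < 32)
        return -1;

    uint16_t number_of_blocks = (uint16_t)read_uint(&data[2], 2, is_big_endian);
    if (number_of_blocks == 0 || 16 + (size_t)number_of_blocks * 16 > data_size)
        return -1;

    /* The next value (offset 12) of the first block information */
    *has_continuation = read_uint(&data[16 + 12], 4, is_big_endian) != 0;

    /* Block information 0 is reserved; the others describe data blocks */
    *total_data_size = 0;
    for (uint16_t index = 1; index < number_of_blocks; index++) {
        const uint8_t *block_information = &data[16 + (size_t)index * 16];

        /* Offset 0: block sector number, offset 8: block size */
        *total_data_size += read_uint(&block_information[8], 4, is_big_endian);
    }
    return 0;
}
```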

Journal checksum

The journal header and block list header both contain checksum values. The checksums are verified as part of a basic consistency check of these journal data structures. To verify the checksum, temporarily set the checksum field to 0 and then call the hfs_plus_calculate_checksum routine as specified below.

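#include <stddef.h>
#include <stdint.h>

/* Calculates the journal checksum of a buffer. The checksum field within
   the buffer must be set to 0 before calling this function. */
uint32_t hfs_plus_calculate_checksum(
          const uint8_t *buffer,
          size_t buffer_size )
{
    size_t buffer_offset = 0;
    uint32_t checksum    = 0;

    for( buffer_offset = 0;
         buffer_offset < buffer_size;
         buffer_offset++)
    {
        checksum = ( checksum << 8 ) ^ ( checksum + buffer[ buffer_offset ] );
    }
    return( ~checksum );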
}
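
Applying the checksum routine to the 44-byte journal header could look as follows. This sketch repeats the routine so that it compiles on its own, and assumes that the checksum covers the entire journal header and is stored in the byte order indicated by the byte order signature; verify_journal_header_checksum is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same checksum calculation as hfs_plus_calculate_checksum above */
static uint32_t hfs_plus_calculate_checksum(const uint8_t *buffer,
                                            size_t buffer_size)
{
    uint32_t checksum = 0;

    for (size_t buffer_offset = 0; buffer_offset < buffer_size; buffer_offset++)
        checksum = (checksum << 8) ^ (checksum + buffer[buffer_offset]);

    return ~checksum;
}

/* Verifies the checksum stored at offset 36 of a 44-byte journal header */
static bool verify_journal_header_checksum(const uint8_t *journal_header,
                                           bool is_big_endian)
{
    uint8_t copy[44];
    uint32_t stored_checksum = 0;

    memcpy(copy, journal_header, sizeof(copy));

    for (int i = 0; i < 4; i++) {
        int index = is_big_endian ? 36 + i : 39 - i;
        stored_checksum = (stored_checksum << 8) | copy[index];
    }
    /* Temporarily set the checksum field to 0 before calculating */
    memset(&copy[36], 0, 4);

    return hfs_plus_calculate_checksum(copy, sizeof(copy)) == stored_checksum;
}
```

Working on a copy leaves the caller's header data unmodified.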

Application specific data structures

HFS, HFS+ and HFSX contain application specific data structures.

Finder information

The finder information in the master directory block (MDB) and volume header consists of an array of 32-bit values. This array contains information used by the Mac OS Finder and the system software boot process.

Array entry | Description
0 | Bootable system directory identifier (CNID), such as "System Folder" in Mac OS 8 or 9, or "/System/Library/CoreServices" in Mac OS X. Typically 3 or 5, or 0 if the volume is not bootable
1 | Parent directory identifier (CNID) of the startup application, such as "Finder", or 0 if the volume is not bootable
2 | Directory identifier (CNID) to display in Finder on mount, or 0 if none
3 | Directory identifier (CNID) of a bootable Mac OS 8 or 9 System Folder, or 0 if none
4 | Unknown (Reserved)
5 | Directory identifier (CNID) of a bootable Mac OS X system, the "/System/Library/CoreServices" directory, or 0 if none
6 and 7 | Mac OS X volume identifier, which consists of a 64-bit integer
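
A sketch of reading the finder information entries, assuming the array is stored as big-endian 32-bit values and that entries 6 and 7 hold the upper and lower 32 bits of the volume identifier; finder_info_entry and finder_info_volume_identifier are hypothetical names.

```c
#include <stdint.h>

/* Returns finder information array entry 'index' (0 to 7) */
static uint32_t finder_info_entry(const uint8_t *finder_info, int index)
{
    const uint8_t *data = &finder_info[index * 4];

    return ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16) |
           ((uint32_t)data[2] << 8) | (uint32_t)data[3];
}

/* Assumes entry 6 holds the upper and entry 7 the lower 32 bits */
static uint64_t finder_info_volume_identifier(const uint8_t *finder_info)
{
    return ((uint64_t)finder_info_entry(finder_info, 6) << 32) |
           finder_info_entry(finder_info, 7);
}
```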

File information

HFS file information

The HFS file information is 16 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 x 1 = 4 | | File type, which consists of an array of unsigned 8-bit integers
4 | 4 x 1 = 4 | | File creator, which consists of an array of unsigned 8-bit integers
8 | 2 | | Finder flags
10 | 4 | | Location within the parent, which contains x and y-coordinate values. If set to {0, 0}, the Finder will place the item automatically
14 | 2 | | File icon window, which contains the window in which the file's icon appears

HFS extended file information

The HFS extended file information is 16 bytes in size and consists of:

Offset | Size | Value | Description
0 | 2 | | Finder icon identifier
2 | 3 x 2 = 6 | | Unknown (Reserved), which consists of an array of signed 16-bit integers
8 | 1 | | Extended finder script code flags
9 | 1 | | Extended finder flags
10 | 2 | | Finder comment identifier, which consists of a signed 16-bit integer
12 | 4 | | Put away folder identifier (CNID)

HFS+ and HFSX file information

The HFS+ and HFSX file information (FileInfo) is 16 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 x 1 = 4 | | File type, which consists of an array of unsigned 8-bit integers
4 | 4 x 1 = 4 | | File creator, which consists of an array of unsigned 8-bit integers
8 | 2 | | Finder flags
10 | 4 | | Location within the parent, which contains x and y-coordinate values. If set to {0, 0}, the Finder will place the item automatically
14 | 2 | | Unknown (Reserved)

HFS+ and HFSX extended file information

The HFS+ and HFSX extended file information (ExtendedFileInfo) is 16 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | | Unknown (Reserved)
If kHFSHasDateAddedMask is not set
4 | 4 | | Unknown (Reserved)
If kHFSHasDateAddedMask is set
4 | 4 | | Added time, which contains a POSIX timestamp in UTC
Common
8 | 2 | | Extended finder flags
10 | 2 | | Unknown (Reserved), which consists of a signed 16-bit integer
12 | 4 | | Put away folder identifier (CNID)

Folder information

HFS folder information

The HFS folder information is 16 bytes in size and consists of:

Offset | Size | Value | Description
0 | 8 | | Window position and dimension (boundaries), which contains the top, left, bottom, right-coordinate values
8 | 2 | | Finder flags
10 | 4 | | Location within the parent, which contains x and y-coordinate values. If set to {0, 0}, the Finder will place the item automatically
14 | 2 | | Folder view

HFS extended folder information

The HFS extended folder information is 16 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | | Scroll position for icon view, which contains x and y-coordinate values
If kHFSHasDateAddedMask is not set
4 | 4 | | Open folder identifier chain, which consists of a signed 32-bit integer
If kHFSHasDateAddedMask is set
4 | 4 | | Added time, which contains a POSIX timestamp in UTC
Common
8 | 1 | | Extended finder script code flags
9 | 1 | | Extended finder flags
10 | 2 | | Finder comment identifier, which consists of a signed 16-bit integer
12 | 4 | | Put away folder identifier (CNID)

HFS+ and HFSX folder information

The HFS+ and HFSX folder information is 16 bytes in size and consists of:

Offset | Size | Value | Description
0 | 8 | | Window position and dimension (boundaries), which contains the top, left, bottom, right-coordinate values
8 | 2 | | Finder flags
10 | 4 | | Location within the parent, which contains x and y-coordinate values. If set to {0, 0}, the Finder will place the item automatically
14 | 2 | | Unknown (Reserved)

HFS+ and HFSX extended folder information

The HFS+ and HFSX extended folder information is 16 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | | Scroll position for icon view, which contains x and y-coordinate values
4 | 4 | | Unknown (Reserved), which consists of a signed 32-bit integer
8 | 2 | | Extended finder flags
10 | 2 | | Unknown (Reserved), which consists of a signed 16-bit integer
12 | 4 | | Put away folder identifier (CNID)

Finder flags

The finder flags consist of the following values:

Value(s) | Applies to | Description
0x0001 | Files and folders | Is on desktop
0x000e | Files and folders | Color
0x0040 | Files | Is shared
0x0080 | Files | Has no INITs
0x0100 | Files | Has been inited
0x0400 | Files and folders | Has custom icon
0x0800 | Files | Is stationary
0x1000 | Files and folders | Name locked
0x2000 | Files | Has bundle
0x4000 | Files and folders | Is invisible
0x8000 | Files | Is alias
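
A sketch of testing the finder flags, assuming the 16-bit value has already been read (big-endian) from the file or folder information; the macro and function names are hypothetical. The color occupies the 3 bits covered by the 0x000e mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* A subset of the finder flags from the table above */
#define FINDER_FLAG_IS_ON_DESKTOP   0x0001
#define FINDER_FLAG_COLOR_MASK      0x000e
#define FINDER_FLAG_HAS_CUSTOM_ICON 0x0400
#define FINDER_FLAG_IS_INVISIBLE    0x4000
#define FINDER_FLAG_IS_ALIAS        0x8000

/* Returns the 3-bit color value stored in bits 1 to 3 */
static unsigned int finder_flags_color(uint16_t finder_flags)
{
    return (finder_flags & FINDER_FLAG_COLOR_MASK) >> 1;
}

/* Returns true if the given flag is set */
static bool finder_flags_has(uint16_t finder_flags, uint16_t flag)
{
    return (finder_flags & flag) != 0;
}
```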

Extended finder flags

The extended finder flags consist of the following values:

Value(s) | Description
0x0004 | Has routing information
0x0100 | Has custom badge resource
0x8000 | Extended flags are invalid, which indicates that the other extended flags should be ignored

Notes

struct Point {
  SInt16              v;
  SInt16              h;
};
typedef struct Point  Point;

struct Rect {
  SInt16              top;
  SInt16              left;
  SInt16              bottom;
  SInt16              right;
};
typedef struct Rect   Rect;

/* OSType is a 32-bit value made by packing four 1-byte characters
   together. */
typedef UInt32        FourCharCode;
typedef FourCharCode  OSType;
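
The packing described in the comment above can be sketched as follows, with the first character in the most significant byte, so that "TEXT" becomes 0x54455854; make_four_char_code and four_char_code_to_string are hypothetical names.

```c
#include <stdint.h>

/* Packs four 1-byte characters into a FourCharCode, first character in the
 * most significant byte. */
static uint32_t make_four_char_code(const char *characters)
{
    return ((uint32_t)(uint8_t)characters[0] << 24) |
           ((uint32_t)(uint8_t)characters[1] << 16) |
           ((uint32_t)(uint8_t)characters[2] << 8) |
           (uint32_t)(uint8_t)characters[3];
}

/* Unpacks a FourCharCode into a string of 4 characters plus terminator */
static void four_char_code_to_string(uint32_t code, char string[5])
{
    for (int i = 0; i < 4; i++)
        string[i] = (char)((code >> (24 - 8 * i)) & 0xff);

    string[4] = '\0';
}
```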

File content

HFS supports multiple ways to store file content:

  • Data fork
  • Compressed data extended attribute
  • Compressed data extended attribute with resource fork
  • Resource fork
  • Extended attribute (named fork)

Data fork

The file content size is stored in the data fork descriptor of the catalog file record.

The extents of the file content are stored in the fork descriptor and extents overflow file.

Compressed data extended attribute

The file has an attribute record with inline data named “com.apple.decmpfs” with compression method 3, 5 or 7.

The file content size is stored in the compressed data header of the extended attribute.

For compression method 3 or 7 the file content data is stored in the extended attribute after the decmpfs compressed data header.

For compression method 5 the file content data contains 0-byte values. There are 12 bytes stored after the decmpfs compressed data header that consist of:

Offset | Size | Value | Description
0 | 4 | | Unknown (Seen: 1)
4 | 4 | | Unknown
8 | 4 | | Unknown (Seen: 0)

Compressed data extended attribute with resource fork

The file has an attribute record with inline data named “com.apple.decmpfs” with compression method 4 or 8.

The file content size is stored in the compressed data header of the extended attribute.

The file content data is stored in a “com.apple.ResourceFork” extended attribute.

The compressed data starts with metadata that contains the offsets of the compressed data blocks.

ZLIB (DEFLATE) compressed data

  • ZLIB (DEFLATE) compressed header
  • Unknown (empty values)
  • ZLIB (DEFLATE) compressed data block offsets and sizes
  • ZLIB (DEFLATE) compressed data blocks
  • ZLIB (DEFLATE) compressed footer
ZLIB (DEFLATE) compressed header

The ZLIB (DEFLATE) compressed header is 16 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | | Compressed data block descriptors offset, where the offset is relative to the start of the ZLIB (DEFLATE) compressed data
4 | 4 | | Compressed footer offset, where the offset is relative to the start of the ZLIB (DEFLATE) compressed data
8 | 4 | | Compressed data block descriptors and data size
12 | 4 | | Compressed footer size

Note that the values in the ZLIB (DEFLATE) compressed header are stored in big-endian.

ZLIB (DEFLATE) compressed data block descriptors

The ZLIB (DEFLATE) compressed data block descriptors are of variable size and consist of:

Offset | Size | Value | Description
0 | 4 | | Compressed data size
4 | 4 | | Number of compressed data block offset and size tuples
8 | 8 x ... | | Array of compressed data block descriptors

ZLIB (DEFLATE) compressed data block descriptor

The ZLIB (DEFLATE) compressed data block descriptor is 8 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | | Compressed block offset, where the offset is relative to the start of the ZLIB (DEFLATE) compressed data + 20
4 | 4 | | Compressed block size

ZLIB (DEFLATE) compressed footer

The ZLIB (DEFLATE) compressed footer is 50 bytes in size and consists of:

Offset | Size | Value | Description
0 | 24 | | Unknown (empty values)
24 | 2 | | Unknown
26 | 2 | | Unknown
28 | 2 | | Unknown
30 | 2 | | Unknown
32 | 4 | "cmpf" | Unknown (signature)
36 | 4 | | Unknown
40 | 4 | | Unknown
44 | 6 | | Unknown (empty values)

Note that the values in the ZLIB (DEFLATE) compressed footer are stored in big-endian.

LZVN compressed data

Offset | Size | Value | Description
0 | 4 x ... | | Array of compressed data block offsets, where an offset is relative to the start of the LZVN compressed data
... | ... | | LZVN compressed data blocks

Note that the compressed data block contains a maximum of 65536 bytes of data. The compressed data block therefore should not exceed 65537 bytes in size.

Resource fork

The file content size is stored in the resource fork descriptor of the catalog file record.

The extents of the file content are stored in the fork descriptor and extents overflow file.

Extended attribute (named fork)

Extended attributes, also referred to as named forks, are stored in the HFS+ attributes file.

HFS wrapper

TODO: complete section

A HFSX volume cannot be wrapped in a HFS volume.


Macintosh File System (MFS)

The Macintosh File System (MFS) is the first file system created for Mac OS, intended for 400 KiB floppy disks.

Overview

An MFS file system consists of:

The backup master directory block (MDB) is stored in the last 2 sectors of the volume.

Characteristics

Characteristics | Description
Byte order | big-endian
Date and time values | TODO
Character strings | Narrow character (Single Byte Character (SBC) or Multi Byte Character (MBC)) stored using a system defined codepage

Terminology

Term | Description
Clump size | Size of the group of (allocation) blocks (or clump), in bytes, to avoid fragmentation

Boot Block

If a volume is bootable, the first 2 blocks of the volume contain the boot block. The boot block consists of:

  • boot block header
  • boot code
  • unknown (filler)

Boot Block Header

The boot block header is 138 or 144 bytes in size and consists of:

Offset | Size | Value | Description
0 | 2 | "LK" (or "\x4c\x4b") | Boot block signature
2 | 4 | | Boot code entry point
6 | 1 | | Flags
7 | 1 | | Format version
8 | 2 | | Page flags (or Secondary Sound and Video Pages)
10 | 1 | | System file name size, with a maximum of 15
11 | 15 | | System file name
26 | 1 | | Finder (or shell) file name size, with a maximum of 15
27 | 15 | | Finder (or shell) file name, typically "Finder"
42 | 1 | | Debugger file name size, with a maximum of 15
43 | 15 | | Debugger file name, typically "Macsbug"
58 | 1 | | Disassembler (or second debugger) file name size, with a maximum of 15
59 | 15 | | Disassembler (or second debugger) file name, typically "Disassembler"
74 | 1 | | Startup screen file name size, with a maximum of 15
75 | 15 | | Startup screen file name, typically "StartUpScreen"
90 | 1 | | Startup (or bootup) file name size, with a maximum of 15
91 | 15 | | Startup (or bootup) file name, typically "Finder"
106 | 1 | | Clipboard (or scrap) file name size, with a maximum of 15
107 | 15 | | Clipboard (or scrap) file name, typically "Clipboard"
122 | 2 | | Number of allocated file control blocks (FCBs)
124 | 2 | | Number of elements in the event queue, typically 20
126 | 4 | | System heap size on Macintosh computers with 128 KiB of RAM
130 | 4 | | System heap size on Macintosh computers with 256 KiB of RAM
134 | 4 | | System heap size on Macintosh computers with 512 KiB or more of RAM
Newer boot block header format
138 | 4 | | Additional system heap space
140 | 4 | | Fraction of available RAM for the system heap

Note that “LK” presumably is short for “Larry Kenyon” who originally designed MFS.

Boot code entry point

The boot code entry point contains machine-language instructions that translate to:

BRA.S *+ 0x90

Or for older versions of the boot block header:

BRA.S *+ 0x88
BRA.W *+ 0x88
BRA     $88(PC)         * $6000,$0086

This instruction jumps to the main boot code following the boot block header.

This field is ignored, however, if bit 6 is clear in the flags (the high-order byte of the boot block version number) or if the format version (the low-order byte) contains 0x0d.

Boot Block Header Flags

Bit(s) | Description
0 - 4 | Unknown (Reserved), should contain 0
5 | Use relative system heap sizing
6 | Execute boot code
7 | Newer boot block header format is used

If bit 7 of the flag byte is clear, then bits 5 and 6 are ignored and the version number is stored in the format version value.

If the format version value is:

  • less than 21, the system heap size values for Macintosh computers with 128 KiB and 256 KiB of RAM should be ignored and the system heap size for Macintosh computers with 512 KiB or more of RAM (the system heap size on all machines) should be used.
  • 13, the boot code should be executed using the value in boot code entry point.
  • greater than or equal to 21, the system heap size on all machines should be used.

If bit 7 of the flag byte is set:

  • bit 6 should be used to determine whether to execute the boot code using the value in boot code entry point.
  • bit 5 should be used to determine whether to use relative system heap sizing. If bit 5 is
    • clear, the system heap size on all machines should be used.
    • set, the system heap is extended by the value in the additional system heap space plus the fraction of available RAM for the system heap.
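
The rules above can be sketched as a function of the flags (offset 6) and format version (offset 7) values. This sketch follows the list as written; boot_block_settings_t and interpret_boot_block_flags are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of how to interpret the boot block header */
typedef struct {
    bool execute_boot_code;     /* use the boot code entry point */
    bool relative_heap_sizing;  /* additional space plus fraction of RAM */
} boot_block_settings_t;

static boot_block_settings_t interpret_boot_block_flags(uint8_t flags,
                                                        uint8_t format_version)
{
    boot_block_settings_t settings = { false, false };

    if ((flags & 0x80) == 0) {
        /* Bit 7 clear: bits 5 and 6 are ignored and the format version
         * decides; version 13 executes the boot code. The system heap size
         * on all machines is used. */
        settings.execute_boot_code = (format_version == 13);
    } else {
        /* Bit 7 set: newer boot block header format */
        settings.execute_boot_code = (flags & 0x40) != 0;
        settings.relative_heap_sizing = (flags & 0x20) != 0;
    }
    return settings;
}
```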

Master Directory Block (MDB)

The Master Directory Block (MDB) is located at offset 1024 of the volume and consists of:

  • master directory block header
  • block map

Master Directory Block (MDB) header

The Master Directory Block (MDB) header is 64 bytes in size and consists of:

Offset | Size | Value | Description
0 | 2 | "\xd2\xd7" | Volume signature
2 | 4 | | Creation date and time, which contains a HFS timestamp in local time
6 | 4 | | Last modification date and time, which contains a HFS timestamp in local time
10 | 2 | | Volume attribute flags
12 | 2 | | Number of files in the root directory
14 | 2 | | File directory area sector number, contains a sector number relative to the start of the volume, where 0 is the first sector number
16 | 2 | | File directory area size, in number of sectors
18 | 2 | | Number of blocks
20 | 4 | | Block size, in bytes, must be a multiple of 512
24 | 4 | | Clump size, in bytes
28 | 2 | | Data area sector number, contains a sector number relative to the start of the volume, where 0 is the first sector number
30 | 4 | | Next available file identifier
34 | 2 | | Number of unused blocks
36 | 1 | | Volume label size, with a maximum of 27
37 | 27 | | Volume label
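
The date and time values can be converted to a POSIX timestamp. A sketch, assuming a HFS timestamp is an unsigned 32-bit number of seconds since January 1, 1904 00:00:00; since MFS stores local time, the result is in local time as well. HFS_TO_POSIX_EPOCH_DELTA and hfs_timestamp_to_posix are hypothetical names.

```c
#include <stdint.h>

/* Seconds between 1904-01-01 and 1970-01-01: (66 x 365 + 17 leap days) x 86400 */
#define HFS_TO_POSIX_EPOCH_DELTA 2082844800LL

/* Converts a HFS timestamp to a POSIX timestamp. The time zone is unchanged,
 * so a local time HFS timestamp yields a local time POSIX timestamp. */
static int64_t hfs_timestamp_to_posix(uint32_t hfs_timestamp)
{
    return (int64_t)hfs_timestamp - HFS_TO_POSIX_EPOCH_DELTA;
}
```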

Block map

TODO: describe similar to FAT-12 block allocation table

File Directory Area

The file directory area consists of:

  • one or more file directory entries, where an individual file directory entry does not span multiple blocks

File Directory Entry

A file directory entry is of variable size and consists of:

Offset | Size | Value | Description
0 | 1 | | Flags, where 0x80 indicates the file directory entry is in use
1 | 1 | 0 | Format version
2 | 4 | "\x3f\x3f\x3f\x3f" | File type
6 | 4 | | File creator
10 | 2 | | Finder flags
12 | 4 | | Window position and dimension (boundaries), which contains the top, left, bottom, right-coordinate values
16 | 2 | | Folder file identifier, where 0 represents the main volume, -2 the desktop, -3 the trash, otherwise, if positive, a file identifier
18 | 4 | | File identifier
22 | 2 | | Data fork block number, contains 0 if the file entry has no data fork
24 | 4 | | Data fork size, in bytes
28 | 4 | | Data fork allocated size, in bytes
32 | 2 | | Resource fork block number, contains 0 if the file entry has no resource fork
34 | 4 | | Resource fork size, in bytes
38 | 4 | | Resource fork allocated size, in bytes
42 | 4 | | Creation date and time, which contains a HFS timestamp in local time
46 | 4 | | (Content) modification date and time, which contains a HFS timestamp in local time
50 | 1 | | File name size, with a maximum of 255
51 | ... | | File name
... | ... | | 16-bit alignment padding
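
Because the file name is variable in size and followed by 16-bit alignment padding, the size of an entry can be computed from the file name size. A sketch, assuming entries start at 16-bit aligned offsets within a block; mfs_file_directory_entry_size is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the size of the file directory entry at 'data', including the
 * 16-bit alignment padding, or 0 if the entry is not in use or does not fit
 * in the remaining 'data_size' bytes. */
static size_t mfs_file_directory_entry_size(const uint8_t *data, size_t data_size)
{
    /* 0x80 in the flags indicates the entry is in use */
    if (data_size < 51 || (data[0] & 0x80) == 0)
        return 0;

    /* Fixed part of 51 bytes plus the file name */
    size_t entry_size = 51 + (size_t)data[50];

    /* Pad to a 16-bit boundary */
    entry_size += entry_size & 1;

    return entry_size <= data_size ? entry_size : 0;
}
```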

New Technologies File System (NTFS) format

The New Technologies File System (NTFS) format is the primary file system for Microsoft Windows versions that are based on Windows NT.

Overview

A New Technologies File System (NTFS) volume consists of:

  • boot record
  • boot loader
  • Master File Table (MFT)
  • Mirror Master File Table (MFT)

Characteristics

Characteristics | Description
Byte order | little-endian
Date and time values | FILETIME in UTC
Character strings | UCS-2 little-endian, which allows for unpaired Unicode surrogates such as "U+d800" and "U+dc00"

Versions

Format version | Remarks
1.0 | Introduced in Windows NT 3.1
1.1 | Introduced in Windows NT 3.5, also seen to be used by Windows NT 3.1
1.2 | Introduced in Windows NT 3.51
3.0 | Introduced in Windows 2000
3.1 | Introduced in Windows XP

Note that the format versions mentioned above are the versions as used by NTFS. Another common versioning scheme uses the Windows version, e.g. NTFS 5.1 is the version of NTFS used on Windows XP, which is version 3.1 in the scheme mentioned above.

Windows does not necessarily use the latest format version, e.g. Windows 10 (1809) has been observed to use NTFS version 1.2 for a 64 KiB cluster block size.

Terminology

Cluster

NTFS refers to its file system blocks as clusters. Note that these are not the same as the physical clusters of a hard disk. For clarity this document will refer to these as cluster blocks. In other sources they are also referred to as logical clusters.

Typically a cluster block is 8 sectors (or 8 x 512 = 4096 bytes) in size. A cluster block number is relative to the start of the boot record.

Virtual cluster

The term virtual cluster refers to cluster blocks which are relative to the start of a data stream.

Long and short (file) name

In Windows terminology the name of a file (or directory) can either be short or long. The short name is an equivalent of the file name in the (DOS) 8.3 format. The long name is actually the (full) name of the file. The term long refers to the aspect that the name is longer than the short variant. Because most documentation refers to the (full) name as the long name, for clarity's sake so will this document.

Metadata files

NTFS uses the Master File Table (MFT) to store information about files and directories. The MFT entries reference the different volume and file system metadata. There are several predefined metadata files.

The following metadata files are predefined and use a fixed MFT entry number.

MFT entry number | File name | Description
0 | "$MFT" | Master File Table
1 | "$MFTMirr" | Backup of the first 4 entries of the Master File Table
2 | "$LogFile" | Metadata transaction journal
3 | "$Volume" | Volume information
4 | "$AttrDef" | MFT entry attribute definitions
5 | "." | Root directory
6 | "$Bitmap" | Cluster block allocation bitmap
7 | "$Boot" | Boot record (or boot code)
8 | "$BadClus" | Bad clusters
Used in NTFS version 1.2 and earlier
9 | "$Quota" | Quota information
Used in NTFS version 3.0 and later
9 | "$Secure" | Security and access control information
Common
10 | "$UpCase" | Case folding mappings
11 | "$Extend" | A directory containing extended metadata files
12-15 | | Unknown (Reserved), which are marked as in-use but are empty
16-23 | | Unused, which are marked as unused
Used in NTFS version 3.0 and later
24 | "$Extend\$Quota" | Quota information
25 | "$Extend\$ObjId" | Unique file identifiers for distributed link tracking
26 | "$Extend\$Reparse" | Backreferences to reparse points
Transactional NTFS metadata files, which have been observed in Windows Vista and later
27 | "$Extend\$RmMetadata" | Resource manager metadata directory
28 | "$Extend\$RmMetadata\$Repair" | Repair information
29 or 30 | "$Extend\$RmMetadata\$TxfLog" | Transactional NTFS (TxF) log metadata directory
30 or 31 | "$Extend\$RmMetadata\$Txf" | Transactional NTFS (TxF) metadata directory
31 or 32 | "$Extend\$RmMetadata\$TxfLog\$Tops" | TxF Old Page Stream (TOPS) file, which is used to store data that has been overwritten inside a currently active transaction
32 or 33 | "$Extend\$RmMetadata\$TxfLog\$TxfLog.blf" | Transactional NTFS (TxF) base log metadata file
Observed in Windows 10 and later
29 | "$Extend\$Deleted" | Temporary location for files that have an open handle but a request has been made to delete them
Common
... | | A file or directory

The following metadata files are predefined, however their MFT entry number, although commonly the same, is not fixed.

MFT entry number | File name | Description
(not fixed) | "$Extend\$UsnJrnl" | USN change journal

The boot record

The boot record is stored at the start of the volume (in the $Boot metadata file) and contains:

  • the file system signature
  • the BIOS parameter block
  • the boot loader

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 3 | | Boot entry point |
| 3 | 8 | "NTFS\x20\x20\x20\x20" | File system signature (also known as OEM identifier or dummy identifier) |
| DOS version 2.0 BIOS parameter block (BPB) | | | |
| 11 | 2 | | Bytes per sector. Note that the following values are supported by mkntfs: 256, 512, 1024, 2048 and 4096 |
| 13 | 1 | | Number of sectors per cluster block |
| 14 | 2 | 0 | Unknown (Reserved Sectors), which is not used by NTFS and must be 0 |
| 16 | 1 | 0 | Number of cluster block allocation tables, which is not used by NTFS and must be 0 |
| 17 | 2 | 0 | Number of root directory entries, which is not used by NTFS and must be 0 |
| 19 | 2 | 0 | Number of sectors (16-bit), which is not used by NTFS and must be 0 |
| 21 | 1 | | Media descriptor |
| 22 | 2 | 0 | Cluster block allocation table size (16-bit) in number of sectors, which is not used by NTFS and must be 0 |
| DOS version 3.4 BIOS parameter block (BPB) | | | |
| 24 | 2 | 0x3f | Sectors per track, which is not used by NTFS |
| 26 | 2 | 0xff | Number of heads, which is not used by NTFS |
| 28 | 4 | 0x3f | Number of hidden sectors, which is not used by NTFS |
| 32 | 4 | 0x00 | Number of sectors (32-bit), which is not used by NTFS and must be 0 |
| NTFS version 8.0 BIOS parameter block (BPB) or extended BPB, which was introduced in Windows NT 3.1 | | | |
| 36 | 1 | 0x80 | Unknown (Disc unit number), which is not used by NTFS |
| 37 | 1 | 0x00 | Unknown (Flags), which is not used by NTFS |
| 38 | 1 | 0x80 | Unknown (BPB version signature byte), which is not used by NTFS |
| 39 | 1 | 0x00 | Unknown (Reserved), which is not used by NTFS |
| 40 | 8 | | Number of sectors (64-bit) |
| 48 | 8 | | Master File Table (MFT) cluster block number |
| 56 | 8 | | Mirror MFT cluster block number |
| 64 | 4 | | MFT entry size |
| 68 | 4 | | Index entry size |
| 72 | 8 | | Volume serial number |
| 80 | 4 | 0 | Checksum, which is not used by NTFS |
| Common | | | |
| 84 | 426 | | Boot code |
| 510 | 2 | "\x55\xaa" | The (boot) signature |
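The boot record layout above maps directly onto Python's struct module. The following is a minimal sketch, assuming the first 512 bytes of the volume have already been read; the class, function and field names are hypothetical, and the sectors per cluster block, MFT entry size and index entry size values are returned raw because they still need decoding as described below.

```python
import struct
from dataclasses import dataclass


@dataclass
class BootRecord:
    """Selected boot record fields (hypothetical names, not a library API)."""
    bytes_per_sector: int
    sectors_per_cluster_block: int  # raw 8-bit value, still needs decoding
    number_of_sectors: int
    mft_cluster_block_number: int
    mirror_mft_cluster_block_number: int
    mft_entry_size: int  # raw value, still needs decoding
    index_entry_size: int  # raw value, still needs decoding
    volume_serial_number: int


def parse_boot_record(data: bytes) -> BootRecord:
    if len(data) < 512:
        raise ValueError("the boot record is 512 bytes in size")
    if data[3:11] != b"NTFS    ":
        raise ValueError("unsupported file system signature")
    if data[510:512] != b"\x55\xaa":
        raise ValueError("missing (boot) signature")
    # Offsets 11 and 13: bytes per sector (16-bit) and sectors per cluster block (8-bit).
    bytes_per_sector, sectors_per_cluster_block = struct.unpack_from("<HB", data, 11)
    # Offsets 40 to 79: the little-endian fields of the NTFS extended BPB.
    (number_of_sectors, mft_cluster, mirror_mft_cluster,
     mft_entry_size, index_entry_size, serial) = struct.unpack_from("<QQQIIQ", data, 40)
    return BootRecord(bytes_per_sector, sectors_per_cluster_block, number_of_sectors,
                      mft_cluster, mirror_mft_cluster, mft_entry_size,
                      index_entry_size, serial)
```

The "<" prefix selects little-endian byte order without alignment padding, which matches the packed on-disk layout.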

Boot entry point

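The boot entry point often contains a short jump instruction to the boot code at offset 84 (0x54), followed by a no-operation (NOP) instruction, e.g.:

eb 52   jmp short 0x54   (relative displacement 0x52 from the next instruction at offset 2)
90      nop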

Number of sectors per cluster block

The number of sectors per cluster block value, as used by mkntfs, is defined as follows:

  • Values 0 to 128 represent sizes of 0 to 128 sectors.
  • Values 244 to 255 represent sizes of 2^(256-n) sectors.
  • Other values are unknown.

Cluster block size

The cluster block size can be determined as follows:

cluster block size = bytes per sector x sectors per cluster block
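The decoding rules and the size formula above can be combined in a small helper. This is a sketch under the mkntfs convention described above; the function names are hypothetical.

```python
def decode_sectors_per_cluster_block(value: int) -> int:
    """Decode the raw 8-bit value stored at offset 13 of the boot record."""
    if 0 <= value <= 128:
        return value  # stored directly (0 does not yield a usable cluster block size)
    if 244 <= value <= 255:
        return 2 ** (256 - value)  # e.g. 244 -> 4096 sectors, 255 -> 2 sectors
    raise ValueError(f"unsupported sectors per cluster block value: {value}")


def get_cluster_block_size(bytes_per_sector: int, sectors_per_cluster_value: int) -> int:
    # cluster block size = bytes per sector x sectors per cluster block
    return bytes_per_sector * decode_sectors_per_cluster_block(sectors_per_cluster_value)


print(get_cluster_block_size(512, 8))    # 4096
print(get_cluster_block_size(512, 244))  # 2097152 (2 MiB)
```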

Different NTFS implementations support different cluster block sizes. Known supported cluster block sizes:

| Cluster block size | Bytes per sector | Supported by |
| --- | --- | --- |
| 256 | 256 | mkntfs |
| 512 | 256 - 512 | mkntfs, ntfs3g, Windows |
| 1024 | 256 - 1024 | mkntfs, ntfs3g, Windows |
| 2048 | 256 - 2048 | mkntfs, ntfs3g, Windows |
| 4096 | 256 - 4096 | mkntfs, ntfs3g, Windows |
| 8192 | 256 - 4096 | mkntfs, ntfs3g, Windows |
| 16K (16384) | 256 - 4096 | mkntfs, ntfs3g, Windows |
| 32K (32768) | 256 - 4096 | mkntfs, ntfs3g, Windows |
| 64K (65536) | 256 - 4096 | mkntfs, ntfs3g, Windows |
| 128K (131072) | 256 - 4096 | mkntfs, ntfs3g, Windows 10 (1903) |
| 256K (262144) | 256 - 4096 | mkntfs, ntfs3g, Windows 10 (1903) |
| 512K (524288) | 256 - 4096 | mkntfs, ntfs3g, Windows 10 (1903) |
| 1M (1048576) | 256 - 4096 | mkntfs, ntfs3g, Windows 10 (1903) |
| 2M (2097152) | 512 - 4096 | mkntfs, ntfs3g, Windows 10 (1903) |

Note that Windows 10 (1903) requires the partition containing the NTFS file system to be aligned with the cluster block size. For example, for a cluster block size of 128 KiB the partition must be 128 KiB aligned. The default partition alignment appears to be 64 KiB.

mkntfs restricts the cluster block size to:

bytes_per_sector <= cluster_block_size <= 4096 * bytes_per_sector

Master File Table (MFT) offset

The Master File Table (MFT) offset can be determined as follows:

mft_offset = boot_record_offset + (mft_cluster_block_number * cluster_block_size)
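As a worked example of the formula above, the following sketch computes the MFT offset; boot_record_offset is the byte offset of the volume (and thus of its boot record) within the image, and the input values are made up for illustration.

```python
def get_mft_offset(boot_record_offset: int, mft_cluster_block_number: int,
                   cluster_block_size: int) -> int:
    """Byte offset of the MFT relative to the start of the image (hypothetical helper)."""
    return boot_record_offset + mft_cluster_block_number * cluster_block_size


# Example values: a volume starting 1 MiB into a disk image, 4096-byte
# cluster blocks and MFT cluster block number 786432.
print(get_mft_offset(1048576, 786432, 4096))  # 3222274048
```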

Volume serial number

The lower 32 bits of the NTFS volume serial number form the Windows API (WINAPI) volume serial number. This can be verified by comparing the output of:

fsutil fsinfo volumeinfo C:
fsutil fsinfo ntfsinfo C:

Backup boot record

The total number of sectors in the boot record is often smaller than the number of sectors of the underlying partition. A (nearly identical) backup of the boot record is stored in the last sector of the cluster block that follows the last cluster block of the volume. This is often the 512 bytes directly after the last sector of the volume, but not necessarily. The backup boot record is not included in the total number of sectors.

Master File Table (MFT) and index entry size

The Master File Table (MFT) entry size and index entry size are defined as follows:

  • Values 0 to 127 represent sizes of 0 to 127 cluster blocks.
  • Values 128 to 255 represent sizes of 2^(256-n) bytes or 2^(-n) if considered as a signed byte.
  • Other values are not considered valid.
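The size rules above can be decoded as in the following sketch. It assumes that "other values" refers to 32-bit field values above 255, which are rejected; the function name is hypothetical.

```python
def decode_entry_size(value: int, cluster_block_size: int) -> int:
    """Decode the MFT entry size (offset 64) or index entry size (offset 68)."""
    if not 0 <= value <= 0xff:
        raise ValueError(f"unsupported entry size value: 0x{value:x}")
    if value <= 127:
        return value * cluster_block_size  # size in number of cluster blocks
    # 128 to 255: as a signed byte n (negative), the size is 2^(-n) bytes.
    return 2 ** (256 - value)


print(decode_entry_size(0xf6, 4096))  # 1024
print(decode_entry_size(0x01, 4096))  # 4096
```

0xf6 is -10 as a signed byte, so both forms of the rule give 2^10 = 1024 bytes, the common MFT entry size.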

BitLocker Drive Encryption (BDE)

BitLocker Drive Encryption (BDE) uses the file system signature “-FVE-FS-”, where FVE is an abbreviation of Full Volume Encryption.

The data structures of BDE on Windows Vista and 7 differ.

A Windows Vista BDE volume starts with:

eb 52 90 2d 46 56 45 2d 46 53 2d

A Windows 7 BDE volume starts with:

eb 58 90 2d 46 56 45 2d 46 53 2d
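The signature at offset 3 and the start bytes listed above are enough to tell an NTFS volume from a BDE volume. The following is a minimal sketch that assumes the jump instruction bytes alone distinguish the Windows Vista and Windows 7 variants; the function name and return strings are hypothetical.

```python
def identify_volume(data: bytes) -> str:
    """Classify a volume by the 8-byte file system signature at offset 3."""
    signature = data[3:11]
    if signature == b"NTFS    ":
        return "NTFS"
    if signature == b"-FVE-FS-":
        # Distinguish BDE variants by the boot entry point bytes listed above.
        if data[0:3] == b"\xeb\x52\x90":
            return "BDE (Windows Vista)"
        if data[0:3] == b"\xeb\x58\x90":
            return "BDE (Windows 7)"
        return "BDE (unknown version)"
    return "unknown"
```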

BDE is largely stand-alone, but has some integration with NTFS.

TODO: link to BDE format documentation

Volume Shadow Snapshots (VSS)

Volume Shadow Snapshots (VSS) uses the GUID 3808876b-c176-4e48-b7ae-04046e6cc752 (stored in little-endian) to identify its data.
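"Stored in little-endian" means the first three GUID fields are byte-swapped on disk. Python's uuid module exposes this byte order as bytes_le, as the following sketch shows. The check function is hypothetical, and the offset at which the identifier is stored is left to the caller because it is not described here.

```python
import uuid

VSS_IDENTIFIER = uuid.UUID("3808876b-c176-4e48-b7ae-04046e6cc752")


def has_vss_identifier(data: bytes, offset: int = 0) -> bool:
    """Check for the VSS identifier at a caller-supplied offset."""
    return data[offset:offset + 16] == VSS_IDENTIFIER.bytes_le


print(VSS_IDENTIFIER.bytes_le.hex())  # 6b87083876c1484eb7ae04046e6cc752
```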

VSS is largely stand-alone, but has some integration with NTFS.

TODO: link to VSS format documentation

Media descriptor

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0.0 | 1 bit | | Sides, where single-sided (0) and double-sided (1) |
| 0.1 | 1 bit | | Track size, where 9 sectors per track (0) and 8 sectors per track (1) |
| 0.2 | 1 bit | | Density, where 80 tracks (0) and 40 tracks (1) |
| 0.3 | 1 bit | | Type, where Fixed disc (0) and Removable disc (1) |
| 0.4 | 4 bits | | Always set to 1 |

The boot loader

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 512 | | | Windows NT (boot) loader (NTLDR/BOOTMGR) |

The Master File Table (MFT)

The MFT consists of an array of MFT entries. The offset of the MFT can be found in the volume header, and the size of the MFT is defined by the MFT entry of the $MFT metadata file.

Note that the MFT can consist of multiple data ranges, defined by the data runs in the $MFT metadata file.

MFT entry

Although the size of an MFT entry is defined in the volume header, it is commonly 1024 bytes. An MFT entry consists of:

  • The MFT entry header
  • The fix-up values
  • An array of MFT attribute values
  • Padding, which should contain 0-byte values

Note that an MFT entry can be filled entirely with 0-byte values. This has been seen in Windows XP for MFT entry numbers 16 to 23.

MFT entry header

The MFT entry header (FILE_RECORD_SEGMENT_HEADER) is 42 or 48 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| MULTI_SECTOR_HEADER | | | |
| 0 | 4 | "BAAD", "FILE" | Signature |
| 4 | 2 | | The fix-up values (or update sequence array) offset, which contains an offset relative from the start of the MFT entry |
| 6 | 2 | | The number of fix-up values (or update sequence array size) |
| Common | | | |
| 8 | 8 | | Metadata transaction journal sequence number, which contains a $LogFile Sequence Number (LSN) |
| 16 | 2 | | Sequence (number) |
| 18 | 2 | | Reference (link) count |
| 20 | 2 | | Attributes offset (or first attribute offset), which contains an offset relative from the start of the MFT entry |
| 22 | 2 | | MFT entry flags |
| 24 | 4 | | Used size in bytes |
| 28 | 4 | | MFT entry size in bytes |
| 32 | 8 | | Base record file reference |
| 40 | 2 | | First available attribute identifier |
| If NTFS version is 3.0 | | | |
| 42 | 2 | | Unknown (wfixupPattern) |
| 44 | 4 | | Unknown |
| If NTFS version is 3.1 | | | |
| 42 | 2 | | Unknown (wfixupPattern) |
| 44 | 4 | | MFT entry number |

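The following sketch reads the MFT entry header fields from the table above. Because the value at offset 44 only contains the MFT entry number in NTFS version 3.1, the caller passes that knowledge in, for example after reading the version from the $VOLUME_INFORMATION attribute; the class, function and parameter names are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class MftEntryHeader:
    signature: bytes
    fixup_values_offset: int
    number_of_fixup_values: int
    journal_sequence_number: int
    sequence: int
    reference_count: int
    attributes_offset: int
    flags: int
    used_size: int
    entry_size: int
    base_record_file_reference: int
    first_available_attribute_identifier: int
    entry_number: Optional[int]  # only present in NTFS version 3.1


def parse_mft_entry_header(data: bytes, ntfs_version_3_1: bool) -> MftEntryHeader:
    signature = data[0:4]
    if signature not in (b"FILE", b"BAAD"):
        raise ValueError(f"unsupported MFT entry signature: {signature!r}")
    # Offsets 4 to 41, all little-endian.
    values = struct.unpack_from("<HHQHHHHIIQH", data, 4)
    entry_number = None
    if ntfs_version_3_1:
        (entry_number,) = struct.unpack_from("<I", data, 44)
    return MftEntryHeader(signature, *values, entry_number)
```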
“BAAD” signature

According to NTFS documentation, when chkdsk finds a multi-sector item where the multi-sector header does not match the values at the end of a sector, it marks the item as “BAAD” and fills it with 0-byte values, except for a fix-up value at the end of the first sector of the item. The “BAAD” signature has been seen to be used on Windows NT4 and XP.

Sequence number

According to the FILE_RECORD_SEGMENT_HEADER structure documentation, the sequence number is incremented each time the file record segment is freed; it is 0 if the segment is not used.

Base record file reference

The base record file reference is used to store additional attributes for another MFT entry, e.g. for attribute lists.

MFT entry flags

| Value | Identifier | Description |
| --- | --- | --- |
| 0x0001 | FILE_RECORD_SEGMENT_IN_USE, MFT_RECORD_IN_USE | In use |
| 0x0002 | FILE_FILE_NAME_INDEX_PRESENT, FILE_NAME_INDEX_PRESENT, MFT_RECORD_IS_DIRECTORY | Has file name (or $I30) index. When this flag is set the file entry represents a directory |
| 0x0004 | MFT_RECORD_IN_EXTEND | Unknown. According to ntfs_layout.h this is set for all system files present in the $Extend directory |
| 0x0008 | MFT_RECORD_IS_VIEW_INDEX | Is index. When this flag is set the file entry represents an index. According to ntfs_layout.h this is set for all indices other than $I30 |

The fix-up values

The fix-up values are of variable size and consist of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 2 | | Fix-up placeholder value |
| 2 | 2 x number of fix-up values | | Fix-up (original) value array |

On disk, the last 2 bytes of each 512-byte block are replaced by the fix-up placeholder value. The original value is stored in the corresponding entry of the fix-up (original) value array.

Note that there can be more fix-up values than the number of 512 byte blocks in the data.
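Applying the fix-up values means checking that each 512-byte block ends with the placeholder and then restoring the original value there. The following is a minimal sketch, assuming the stored number of fix-up values includes the placeholder (for example 3 for a 1024-byte MFT entry: one placeholder and two original values). Surplus fix-up values beyond the end of the data are ignored. The function name is hypothetical.

```python
def apply_fixup_values(data: bytes, fixup_values_offset: int,
                       number_of_fixup_values: int, block_size: int = 512) -> bytes:
    """Return a copy of data with the original values restored at each block end.

    Assumes number_of_fixup_values includes the placeholder value.
    """
    buffer = bytearray(data)
    placeholder = bytes(buffer[fixup_values_offset:fixup_values_offset + 2])
    for index in range(1, number_of_fixup_values):
        block_end = index * block_size
        if block_end > len(buffer):
            break  # more fix-up values than 512-byte blocks: ignore the remainder
        if buffer[block_end - 2:block_end] != placeholder:
            raise ValueError(f"fix-up placeholder mismatch at offset {block_end - 2}")
        value_offset = fixup_values_offset + 2 * index
        buffer[block_end - 2:block_end] = buffer[value_offset:value_offset + 2]
    return bytes(buffer)
```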

According to the MULTI_SECTOR_HEADER structure documentation, the update sequence array must end before the last USHORT value in the first sector. It also states that the update sequence array size value contains the number of bytes, but based on analysis of data samples it more likely contains the number of words.

In NT4 (version 1.2) the MFT entry header is 42 bytes in size and the fix-up values are stored at offset 42. This is likely where the name wfixupPattern originates from.

TODO: provide examples on applying the fix-up values.

The file reference

The file reference (FILE_REFERENCE or MFT_SEGMENT_REFERENCE) is 8 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 6 | | MFT entry number |
| 6 | 2 | | Sequence number |

Note that the MFT entry number stored in the MFT entry header is 32-bit in size.
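Since both parts of the file reference are little-endian, the 8 bytes can be read as one 64-bit integer and split into the 48-bit MFT entry number and the 16-bit sequence number. A minimal sketch with a hypothetical function name:

```python
def decode_file_reference(data: bytes) -> tuple:
    """Split an 8-byte file reference into (MFT entry number, sequence number)."""
    value = int.from_bytes(data[:8], "little")
    mft_entry_number = value & 0xffffffffffff  # offset 0, 6 bytes
    sequence_number = value >> 48  # offset 6, 2 bytes
    return mft_entry_number, sequence_number


print(decode_file_reference(bytes.fromhex("0500000000000500")))  # (5, 5)
```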

MFT attribute

The MFT attribute consists of:

  • the attribute header
  • the attribute resident or non-resident data
  • the attribute name
  • Unknown data, likely alignment padding (4-byte alignment)
  • resident attribute data or non-resident attribute data runs
  • alignment padding (8-byte alignment), can contain remnant data

MFT attribute header

The MFT attribute header (ATTRIBUTE_RECORD_HEADER) is 16 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Attribute type (or type code) |
| 4 | 4 | | Attribute size (or record length), which includes the 8 bytes of the attribute type and size |
| 8 | 1 | | Non-resident flag (or form code), where RESIDENT_FORM (0) and NONRESIDENT_FORM (1) |
| 9 | 1 | | Name size (or name length), which contains the number of characters without the end-of-string character |
| 10 | 2 | | Name offset, which contains an offset relative from the start of the MFT attribute |
| 12 | 2 | | Attribute data flags |
| 14 | 2 | | Attribute identifier (or instance), which contains a unique identifier to distinguish between attributes that contain segmented data |
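The attributes of an MFT entry can be walked using the attribute size to step from one attribute to the next, stopping at the end of attributes marker (0xffffffff). The following sketch assumes the fix-up values have already been applied and that attributes_offset and used_size come from the MFT entry header; the class and function names are hypothetical.

```python
import struct
from dataclasses import dataclass

END_OF_ATTRIBUTES_MARKER = 0xffffffff


@dataclass
class AttributeHeader:
    offset: int  # relative to the start of the MFT entry
    attribute_type: int
    size: int
    is_non_resident: bool
    data_flags: int
    identifier: int
    name: str


def iterate_attributes(entry: bytes, attributes_offset: int, used_size: int):
    """Yield the attribute headers of an MFT entry with fix-up values applied."""
    offset = attributes_offset
    while offset + 4 <= used_size:
        (attribute_type,) = struct.unpack_from("<I", entry, offset)
        if attribute_type == END_OF_ATTRIBUTES_MARKER:
            break
        size, form_code, name_size, name_offset, data_flags, identifier = struct.unpack_from(
            "<IBBHHH", entry, offset + 4)
        if size < 16 or offset + size > used_size:
            raise ValueError(f"invalid attribute size {size} at offset {offset}")
        # Name size is in UCS-2 characters; name offset is relative to the attribute.
        name_start = offset + name_offset
        name = entry[name_start:name_start + 2 * name_size].decode(
            "utf-16-le", errors="surrogatepass")
        yield AttributeHeader(offset, attribute_type, size, form_code == 1,
                              data_flags, identifier, name)
        offset += size
```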

MFT attribute data flags

| Value | Identifier | Description |
| --- | --- | --- |
| 0x0001 | | Is LZNT1 compressed |
| 0x00ff | ATTRIBUTE_FLAG_COMPRESSION_MASK | |
| 0x4000 | ATTRIBUTE_FLAG_ENCRYPTED | Is encrypted |
| 0x8000 | ATTRIBUTE_FLAG_SPARSE | Is sparse |

TODO: determine the meaning of compression flag in the context of resident $INDEX_ROOT. Do the data flags have a different meaning for different attributes?

Resident MFT attribute

The resident MFT attribute data is present when the non-resident flag is not set (0). The resident data is 8 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Data size (or value length) |
| 4 | 2 | | Data offset (or value offset), which contains an offset relative from the start of the MFT attribute |
| 6 | 1 | | Indexed flag |
| 7 | 1 | 0x00 | Unknown (Padding) |

TODO: determine the meaning of indexed flag bits, other than the LSB

Non-resident MFT attribute

The non-resident MFT attribute data is present when the non-resident flag is set (1). The non-resident data is 48 or 56 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | | First (or lowest) Virtual Cluster Number (VCN) of the data |
| 8 | 8 | | Last (or highest) Virtual Cluster Number (VCN) of the data |
| 16 | 2 | | Data runs offset (or mappings pairs offset), which contains an offset relative from the start of the MFT attribute |
| 18 | 2 | | Compression unit size, which contains the compression unit size as 2^(n) number of cluster blocks |
| 20 | 4 | | Unknown (Padding) |
| 24 | 8 | | Allocated data size (or allocated length), which contains the allocated data size in number of bytes. This value is not valid if the first VCN is nonzero |
| 32 | 8 | | Data size (or file size), which contains the data size in number of bytes. This value is not valid if the first VCN is nonzero |
| 40 | 8 | | Valid data size (or valid data length), which contains the valid data size in number of bytes. This value is not valid if the first VCN is nonzero |
| If compression unit size > 0 | | | |
| 48 | 8 | | Compressed data size |

The total size of the data runs should be larger than or equal to the data size.

Note that Windows will fill data beyond the valid data size with 0-byte values. The data size remains unchanged. This applies to compressed and uncompressed data. If the first VCN is zero a valid data size of 0 represents a file entirely filled with 0-byte values.

TODO: determine the meaning of a VCN of -1

For more information about compressed MFT attributes see compression.

Attribute name

The attribute name is of variable size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | ... | | Name, which contains a UCS-2 little-endian string without end-of-string character |

Data runs

The data runs are stored in a variable size (data) runlist. This runlist consists of runlist elements.

A runlist element is of variable size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0.0 | 4 bits | | Number of cluster blocks value size, which contains the number of bytes used to store the data run number of cluster blocks |
| 0.4 | 4 bits | | Cluster block number value size, which contains the number of bytes used to store the data run cluster block number |
| 1 | Number of cluster blocks value size | | Data run number of cluster blocks, which contains the number of cluster blocks |
| ... | Cluster block number value size | | Data run cluster block number |

The data run cluster block number is a signed value, where the MSB is the sign bit, e.g. if the data run cluster block number contains “dbc8” it corresponds to the 64-bit value 0xffffffffffffdbc8.

The cluster block number of the first data run is absolute, whereas the cluster block numbers of successive data runs are relative to the cluster block number of the previous data run.

Note that the cluster block number byte size is the first nibble when reading the byte stream, but here it is represented as the upper nibble of the first byte.

The last runlist element is (0, 0), which is stored as a 0-byte value.

According to NTFS documentation, the size of the runlist is rounded up to the next multiple of 4 bytes, but based on analysis of data samples the trailing data can be larger than 3 bytes and does not always consist of 0-byte values.
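The runlist rules above (size nibbles, signed relative cluster block numbers, the terminating 0-byte and sparse data runs described in the next section) can be combined into a decoder. The following is a minimal sketch with a hypothetical function name; it returns the absolute cluster block number, or None for a sparse data run, together with the number of cluster blocks of each data run.

```python
from typing import List, Optional, Tuple


def decode_data_runs(runlist: bytes) -> List[Tuple[Optional[int], int]]:
    """Return (cluster block number or None if sparse, number of cluster blocks) tuples."""
    data_runs = []
    offset = 0
    cluster_block_number = 0
    while offset < len(runlist):
        header = runlist[offset]
        if header == 0:
            break  # the (0, 0) runlist element terminates the runlist
        count_size = header & 0x0f  # lower nibble: number of cluster blocks value size
        number_size = header >> 4  # upper nibble: cluster block number value size
        offset += 1
        if offset + count_size + number_size > len(runlist):
            raise ValueError("runlist element exceeds the runlist data")
        number_of_cluster_blocks = int.from_bytes(runlist[offset:offset + count_size], "little")
        offset += count_size
        if number_size == 0:
            # Sparse data run: no cluster block number, reads as 0-byte values.
            data_runs.append((None, number_of_cluster_blocks))
            continue
        # Signed value, relative to the previous data run (absolute for the first).
        relative = int.from_bytes(runlist[offset:offset + number_size], "little", signed=True)
        offset += number_size
        cluster_block_number += relative
        data_runs.append((cluster_block_number, number_of_cluster_blocks))
    return data_runs


runlist = bytes.fromhex("31 40 00 00 02 21 10 c8 db 01 08 00")
print(decode_data_runs(runlist))  # [(131072, 64), (121800, 16), (None, 8)]
```

In this made-up runlist the second data run stores “dbc8” (-9272), so it starts 9272 cluster blocks before the first one.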

TODO: provide examples of data runs

Sparse data runs

The MFT attribute data flag ATTRIBUTE_FLAG_SPARSE indicates whether the data stream is sparse. The runlist of a sparse data stream can contain both sparse and non-sparse data runs.

A sparse data run has a cluster block number value size of 0, meaning it has no offset (cluster block number). A sparse data run is filled with 0-byte values.

Compressed data streams also define sparse data runs without setting the ATTRIBUTE_FLAG_SPARSE flag.

Note that $BadClus:$Bad also defines a data run with a cluster block number value size of 0, without setting the ATTRIBUTE_FLAG_SPARSE flag.

Compressed data runs

The compression bits of the MFT attribute data flags (ATTRIBUTE_FLAG_COMPRESSION_MASK, 0x00ff) indicate whether the data stream is compressed.

Windows supports compressed data runs for NTFS file systems with a cluster block size of 4096 bytes or less.

Windows 10 supports Windows Overlay Filter (WOF) compressed data, which stores the LZXPRESS Huffman or LZX compressed data in an alternate data stream named WofCompressedData and links it to the default data stream using a reparse point.

The data is stored in compression unit blocks. A compression unit typically consists of 16 cluster blocks. However, the actual value is stored in the compression unit size of the non-resident MFT attribute.

Also see compression.

The attributes

Known attribute types

The attribute types are stored in the $AttrDef metadata file.

| Value | Identifier | Description |
| --- | --- | --- |
| 0x00000000 | | Unused |
| 0x00000010 | $STANDARD_INFORMATION | Standard information |
| 0x00000020 | $ATTRIBUTE_LIST | Attributes list |
| 0x00000030 | $FILE_NAME | The file or directory name |
| Used in NTFS version 1.2 and earlier | | |
| 0x00000040 | $VOLUME_VERSION | Volume version |
| Used in NTFS version 3.0 and later | | |
| 0x00000040 | $OBJECT_ID | Object identifier |
| Common | | |
| 0x00000050 | $SECURITY_DESCRIPTOR | Security descriptor |
| 0x00000060 | $VOLUME_NAME | Volume label |
| 0x00000070 | $VOLUME_INFORMATION | Volume information |
| 0x00000080 | $DATA | Data stream |
| 0x00000090 | $INDEX_ROOT | Index root |
| 0x000000a0 | $INDEX_ALLOCATION | Index allocation |
| 0x000000b0 | $BITMAP | Bitmap |
| Used in NTFS version 1.2 and earlier | | |
| 0x000000c0 | $SYMBOLIC_LINK | Symbolic link |
| Used in NTFS version 3.0 and later | | |
| 0x000000c0 | $REPARSE_POINT | Reparse point |
| Common | | |
| 0x000000d0 | $EA_INFORMATION | (HPFS) extended attribute information |
| 0x000000e0 | $EA | (HPFS) extended attribute |
| Used in NTFS version 1.2 and earlier | | |
| 0x000000f0 | $PROPERTY_SET | Property set |
| Used in NTFS version 3.0 and later | | |
| 0x00000100 | $LOGGED_UTILITY_STREAM | Logged utility stream |
| Common | | |
| 0x00001000 | | First user defined attribute |
| 0xffffffff | | End of attributes marker |

Attribute chains

Multiple attributes can be chained to make up a single attribute data stream, e.g. the attributes:

  1. $INDEX_ALLOCATION ($I30) VCN: 0
  2. $INDEX_ALLOCATION ($I30) VCN: 596

The first attribute will contain the size of the data defined by all the attributes and successive attributes should have a size of 0.

It is assumed that the attributes in a chain must be contiguous and defined in order.

The standard information attribute

The standard information attribute ($STANDARD_INFORMATION) contains the basic file entry metadata. It is stored as a resident MFT attribute.

The standard information data (STANDARD_INFORMATION) is either 48 or 72 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | | Creation date and time, which contains a FILETIME |
| 8 | 8 | | Last modification (or last written) date and time, which contains a FILETIME |
| 16 | 8 | | MFT entry last modification date and time, which contains a FILETIME |
| 24 | 8 | | Last access date and time, which contains a FILETIME |
| 32 | 4 | | File attribute flags |
| 36 | 4 | | Unknown (Maximum number of versions) |
| 40 | 4 | | Unknown (Version number) |
| 44 | 4 | | Unknown (Class identifier) |
| If NTFS version 3.0 or later | | | |
| 48 | 4 | | Owner identifier |
| 52 | 4 | | Security descriptor identifier, which contains the entry number in the security ID index ($Secure:$SII). Also see Access Control |
| 56 | 8 | | Quota charged |
| 64 | 8 | | Update Sequence Number (USN) |

Note that MFT entries have been observed without a $STANDARD_INFORMATION attribute, but with other attributes such as $FILE_NAME and an $I30 index.

Recent versions of NTFS support case-sensitive file names. If a directory is case-sensitive the corresponding $STANDARD_INFORMATION attribute will have a maximum number of versions of 0 and a version number of 1.
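The following sketch parses the standard information data and converts the FILETIME values, which count 100-nanosecond intervals since January 1, 1601 UTC. It also applies the case-sensitivity observation above, which is only meaningful for directories. The function names and dictionary keys are hypothetical, and the conversion drops sub-microsecond precision because Python's datetime has microsecond resolution.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    # 100-nanosecond intervals since 1601-01-01 00:00:00 UTC, truncated to microseconds.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def parse_standard_information(data: bytes) -> dict:
    if len(data) < 48:
        raise ValueError("standard information data is at least 48 bytes in size")
    (creation, modification, entry_modification, access, file_attribute_flags,
     maximum_number_of_versions, version_number, _class_identifier) = struct.unpack_from(
        "<QQQQIIII", data, 0)
    result = {
        "creation_time": filetime_to_datetime(creation),
        "modification_time": filetime_to_datetime(modification),
        "entry_modification_time": filetime_to_datetime(entry_modification),
        "access_time": filetime_to_datetime(access),
        "file_attribute_flags": file_attribute_flags,
        # Observed marker of a case-sensitive directory.
        "is_case_sensitive": maximum_number_of_versions == 0 and version_number == 1,
    }
    if len(data) >= 72:  # NTFS version 3.0 or later
        (result["owner_identifier"], result["security_descriptor_identifier"],
         result["quota_charged"], result["update_sequence_number"]) = struct.unpack_from(
            "<IIQQ", data, 48)
    return result
```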

The attribute list attribute

The attribute list attribute ($ATTRIBUTE_LIST) is used to store MFT attributes outside the MFT entry, e.g. when the MFT entry is too small to store all the attributes.

The entries in the list reference the location of MFT attributes. The attribute list attribute can be stored as either a resident (for a small amount of data) or non-resident MFT attribute.

Note that MFT entry 0 can also contain an attribute list, which allows listed attributes to be stored beyond the first data run.

The attribute list

An attribute list consists of:

  • one or more attribute list entries

The attribute list entry

An attribute list entry (ATTRIBUTE_LIST_ENTRY) is of variable size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Attribute type (or type code) |
| 4 | 2 | | Size (or record length), which includes the 6 bytes of the attribute type and size |
| 6 | 1 | | Name size (or name length), which contains the number of characters without the end-of-string character |
| 7 | 1 | | Name offset, which contains an offset relative from the start of the attribute list entry |
| 8 | 8 | | Data first (or lowest) VCN |
| 16 | 8 | | File reference (or segment reference), which contains a reference to the MFT entry that contains (part of) the attribute data |
| 24 | 2 | | Attribute identifier (or instance), which contains a unique identifier to distinguish between attributes that contain segmented data |
| 26 | ... | | Name, which contains a UCS-2 little-endian string without end-of-string character |
| ... | ... | | Alignment padding (8-byte alignment), can contain remnant data |

The file name attribute

The file name attribute ($FILE_NAME) contains the basic file system information, like the parent file entry, various date and time values and name. It is stored as a resident MFT attribute.

The file name data (FILE_NAME) is of variable size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | | Parent file reference |
| 8 | 8 | | Creation date and time, which contains a FILETIME |
| 16 | 8 | | Last modification (or last written) date and time, which contains a FILETIME |
| 24 | 8 | | MFT entry last modification date and time, which contains a FILETIME |
| 32 | 8 | | Last access date and time, which contains a FILETIME |
| 40 | 8 | | Allocated (or reserved) file size |
| 48 | 8 | | Data size |
| 56 | 4 | | File attribute flags |
| If FILE_ATTRIBUTE_REPARSE_POINT is set | | | |
| 60 | 4 | | Reparse point tag |
| If FILE_ATTRIBUTE_REPARSE_POINT is not set | | | |
| 60 | 4 | | Unknown (extended attribute data size) |
| Common | | | |
| 64 | 1 | | Name string size, which contains the number of characters without the end-of-string character |
| 65 | 1 | | Namespace of the name string |
| 66 | ... | | Name, which contains an UCS-2 little-endian string without end-of-string character |

An MFT entry can contain multiple file name attributes, e.g. for a separate (long) name and short name.

In several cases on a Vista NTFS volume, the MFT entry contained both a DOS & Windows and a POSIX namespace $FILE_NAME attribute. However, the directory entry index ($I30) of the parent directory only contained the DOS & Windows name.

In case of a hard link the MFT entry will contain additional file name attributes with the parent file reference of each hard link.

Namespace

| Value | Identifier | Description |
| --- | --- | --- |
| 0 | POSIX | Case-sensitive character set that consists of all Unicode characters except for: "\0" (zero character), "/" (forward slash). The ":" (colon) is valid for NTFS but not for Windows |
| 1 | FILE_NAME_NTFS, WINDOWS | Case-insensitive subset of the POSIX character set that consists of all Unicode characters except for: " * / : < > ? \ + and the vertical bar character. Note that names cannot end with a "." (dot) or " " (space) |
| 2 | FILE_NAME_DOS, DOS | Case-insensitive subset of the WINDOWS character set that consists of all upper case ASCII characters except for: " * + , / : ; < = > ? \. Note that the name must follow the 8.3 format |
| 3 | DOS_WINDOWS | Both the DOS and WINDOWS names are identical, which is the same as the DOS character set, with the exception that lower case is used as well |

Note that the Windows API function CreateFile allows creating case-sensitive file names when the flag FILE_FLAG_POSIX_SEMANTICS is set.

Long to short name conversion

A short name can be determined from a long name with the following approach. In the long name:

  • ignore Unicode characters outside the 8-bit (extended ASCII) range
  • ignore control characters and spaces (characters <= 0x20)
  • ignore non-allowed characters " * + , / : ; < = > ? \
  • ignore dots except the last one, which is used for the extension
  • make all letters upper case

Additional observations:

  • [ or ] are replaced by an underscore (_)

Make the name unique:

  1. use characters 1 to 6, append ~1 and, if the long name has an extension, append a dot and the first 3 characters of the extension, e.g. “Program Files” becomes “PROGRA~1” or “ ~PLAYMOVIE.REG“ becomes “~PLAYM~1.REG”
  2. if the name already exists try ~2 up to ~9, e.g. “Program Data”, in the same directory as “Program Files”, becomes “PROGRA~2”
  3. if the name already exists use a 16-bit hexadecimal value for characters 3 to 6 with ~1, e.g. “x86_microsoft-windows-r..ry-editor.resources_31bf3856ad364e35_6.0.6000.16386_en-us_f89a7b0005d42fd4” in a directory with a lot of file names starting with “x86_microsoft”, becomes “X8FCA6~1.163”
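Steps 1 and 2 of the conversion can be sketched as follows. This is a simplified illustration of the observations above, not the Windows algorithm: the hexadecimal variant of step 3 is not implemented, and the function names and the existing_short_names parameter are hypothetical.

```python
# Non-allowed characters from the list above; control characters and spaces
# are filtered by code point.
IGNORED_CHARACTERS = set('"*+,/:;<=>?\\')


def _filter_long_name(long_name: str) -> str:
    characters = []
    for character in long_name:
        code_point = ord(character)
        if code_point > 0xff or code_point <= 0x20 or character in IGNORED_CHARACTERS:
            continue
        if character in "[]":
            character = "_"
        characters.append(character.upper())
    return "".join(characters)


def generate_short_name(long_name: str, existing_short_names: set) -> str:
    """Sketch of steps 1 and 2; step 3 (hexadecimal variant) is not implemented."""
    filtered = _filter_long_name(long_name)
    base, dot, extension = filtered.rpartition(".")
    if not dot:
        base, extension = filtered, ""
    base = base.replace(".", "")  # only the last dot is kept, for the extension
    suffix = f".{extension[:3]}" if extension else ""
    for number in range(1, 10):
        candidate = f"{base[:6]}~{number}{suffix}"
        if candidate not in existing_short_names:
            return candidate
    raise NotImplementedError("the hexadecimal variant (step 3) is not sketched here")


print(generate_short_name("Program Files", set()))  # PROGRA~1
print(generate_short_name("Program Data", {"PROGRA~1"}))  # PROGRA~2
```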

TODO: determine if the behavior is dependent on a setting that can be changed with fsutil

The volume version attribute

The volume version attribute ($VOLUME_VERSION) contains the volume version.

TODO: complete section. Need a pre NTFS version 3.0 volume with this attribute. $AttrDef indicates the attribute to be 8 bytes in size.

The object identifier attribute

The object identifier attribute ($OBJECT_ID) contains distributed link tracker properties. It is stored as a resident MFT attribute.

The object identifier attribute data is either 16 or 64 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 16 | | Droid file identifier, which contains a GUID |
| 16 | 16 | | Birth droid volume identifier, which contains a GUID |
| 32 | 16 | | Birth droid file identifier, which contains a GUID |
| 48 | 16 | | Birth droid domain identifier, which contains a GUID |

Droid in this context refers to CDomainRelativeObjId.

The security descriptor attribute

TODO: determine if this overrides any value in $Secure:$SDS?

The security descriptor attribute ($SECURITY_DESCRIPTOR) contains a Windows NT security descriptor. It can be stored as either a resident (for a small amount of data) or non-resident MFT attribute.

TODO: link to security descriptor format documentation

The volume name attribute

The volume name attribute ($VOLUME_NAME) contains the volume label. It is stored as a resident MFT attribute.

The volume name attribute data is of variable size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | ... | | Volume label, which contains a UCS-2 little-endian string without end-of-string character |

The volume name attribute is used in the $Volume metadata file MFT entry.

The volume information attribute

The volume information attribute ($VOLUME_INFORMATION) contains information about the volume. It is stored as a resident MFT attribute.

The volume information attribute data is 12 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | | Unknown |
| 8 | 1 | | Major format version |
| 9 | 1 | | Minor format version |
| 10 | 2 | | Volume flags |

The volume information attribute is used in the $Volume metadata file MFT entry.

Volume flags

| Value | Identifier | Description |
| --- | --- | --- |
| 0x0001 | VOLUME_IS_DIRTY | Is dirty |
| 0x0002 | VOLUME_RESIZE_LOG_FILE | Re-size journal ($LogFile) |
| 0x0004 | VOLUME_UPGRADE_ON_MOUNT | Upgrade on next mount |
| 0x0008 | VOLUME_MOUNTED_ON_NT4 | Mounted on Windows NT 4 |
| 0x0010 | VOLUME_DELETE_USN_UNDERWAY | Delete USN in progress |
| 0x0020 | VOLUME_REPAIR_OBJECT_ID | Repair object identifiers |
| 0x0080 | | Unknown |
| 0x4000 | VOLUME_CHKDSK_UNDERWAY | chkdsk in progress |
| 0x8000 | VOLUME_MODIFIED_BY_CHKDSK | Modified by chkdsk |
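The following sketch reads the format version and volume flags from the 12-byte volume information attribute data and maps the flags to the identifiers in the table above, reporting bits without an identifier (such as 0x0080) separately. The function name and return layout are hypothetical.

```python
import struct

VOLUME_FLAGS = {
    0x0001: "VOLUME_IS_DIRTY",
    0x0002: "VOLUME_RESIZE_LOG_FILE",
    0x0004: "VOLUME_UPGRADE_ON_MOUNT",
    0x0008: "VOLUME_MOUNTED_ON_NT4",
    0x0010: "VOLUME_DELETE_USN_UNDERWAY",
    0x0020: "VOLUME_REPAIR_OBJECT_ID",
    0x4000: "VOLUME_CHKDSK_UNDERWAY",
    0x8000: "VOLUME_MODIFIED_BY_CHKDSK",
}
KNOWN_FLAGS_MASK = sum(VOLUME_FLAGS)  # the flag bits are distinct, so sum equals OR


def parse_volume_information(data: bytes) -> tuple:
    """Return ("major.minor", [flag identifiers]) from the volume information data."""
    major, minor, flags = struct.unpack_from("<BBH", data, 8)
    names = [name for bit, name in VOLUME_FLAGS.items() if flags & bit]
    unknown = flags & ~KNOWN_FLAGS_MASK
    if unknown:
        names.append(f"unknown (0x{unknown:04x})")
    return f"{major}.{minor}", names


print(parse_volume_information(bytes(8) + b"\x03\x01\x81\x00"))
# ('3.1', ['VOLUME_IS_DIRTY', 'unknown (0x0080)'])
```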

The data stream attribute

The data stream attribute ($DATA) contains the file data. It can be stored as either a resident (for a small amount of data) or non-resident MFT attribute.

Multiple data attributes for the same data stream can be used in the attribute list to define different parts of the data stream data. The first data stream attribute will contain the size of the entire data stream data. Other data stream attributes should have a size of 0. Also see attribute chains.

The index root attribute

The index root attribute ($INDEX_ROOT) contains the root of the index tree. It is stored as a resident MFT attribute.

Also see the index and the index root.

The index allocation attribute

The index allocation attribute ($INDEX_ALLOCATION) contains an array of index entries. It is stored as a non-resident MFT attribute.

The index allocation attribute itself does not define which attribute type it contains in the index value data. For this information it needs the corresponding index root attribute.

Multiple index allocation attributes for the same index can be used in the attribute list to define different parts of the index allocation data. The first index allocation attribute will contain the size of the entire index allocation data. Other index allocation attributes should have a size of 0. Also see attribute chains.

Also see the index.

The bitmap attribute

The bitmap attribute ($BITMAP) contains the allocation bitmap. It can be stored as either a resident (for a small amount of data) or non-resident MFT attribute.

It is used to maintain information about which entries are used and which are not. Every bit in the bitmap represents an entry. The bitmap is stored byte-wise, where the LSB of the first byte corresponds to the first allocation element. The allocation element can represent different things:

  • an MFT entry in the MFT (nameless) bitmap;
  • an index entry in an index ($I30).

The allocation element is allocated if the corresponding bit contains 1 or unallocated if 0.
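Given the LSB-first bit order described above, checking whether an element is allocated takes a division and a bit test. A minimal sketch with a hypothetical function name; treating elements beyond the end of the bitmap as unallocated is an assumption.

```python
def is_allocated(bitmap: bytes, element_index: int) -> bool:
    """Return True if the allocation element (e.g. an MFT entry) is allocated."""
    byte_index, bit_index = divmod(element_index, 8)
    if byte_index >= len(bitmap):
        return False  # assumption: elements beyond the bitmap are unallocated
    # The LSB of each byte corresponds to the lowest element index in that byte.
    return bool((bitmap[byte_index] >> bit_index) & 1)


bitmap = bytes([0b00000101, 0b10000000])
print([index for index in range(16) if is_allocated(bitmap, index)])  # [0, 2, 15]
```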

The symbolic link attribute

The symbolic link attribute ($SYMBOLIC_LINK) contains a symbolic link.

TODO: complete section. Need a pre NTFS version 3.0 volume with this attribute. $AttrDef indicates the attribute is of variable size.

The reparse point attribute

The reparse point attribute ($REPARSE_POINT) contains information about a file system-level link. It is stored as a resident MFT attribute.

Also see the reparse point.

The (HPFS) extended attribute information

The (HPFS) extended attribute information ($EA_INFORMATION) contains information about the extended attribute ($EA).

The extended attribute information data is 8 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 2 | | Size of an extended attribute entry |
| 2 | 2 | | Number of extended attributes which have the NEED_EA flag set |
| 4 | 4 | | Size of the extended attribute ($EA) data |

The (HPFS) extended attribute

The (HPFS) extended attribute ($EA) contains the extended attribute data.

The extended attribute data is of variable size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Offset to next extended attribute entry, where the offset is relative from the start of the extended attribute data |
| 4 | 1 | | Extended attribute flags |
| 5 | 1 | | Number of characters of the extended attribute name |
| 6 | 2 | | Value data size |
| 8 | ... | | The extended attribute name, which contains an ASCII string |
| ... | ... | | Value data |
| ... | ... | | Unknown |

TODO: determine if the name is 2-byte aligned

Extended attribute flags

| Value | Identifier | Description |
| --- | --- | --- |
| 0x80 | NEED_EA | Unknown (Need EA) flag |

TODO: determine what the NEED_EA flag is used for

UNITATTR extended attribute value data

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Unknown (equivalent of st_mode?) |

The property set attribute

The property set attribute ($PROPERTY_SET) contains a property set.

TODO: complete section. Need a pre NTFS version 3.0 volume with this attribute. $AttrDef does not seem to always define this attribute.

The logged utility stream attribute

TODO: complete section

| Value | Identifier | Description |
| --- | --- | --- |
| | $EFS | Encrypted NTFS (EFS) |
| | $TXF_DATA | Transactional NTFS (TxF) |

The attribute types

The attribute types are stored in the $AttrDef metadata file.

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 128 | | Attribute name, which contains a UCS-2 little-endian string with end-of-string character. Unused bytes are filled with 0-byte values |
| 128 | 4 | | Attribute type (or type code) |
| 132 | 8 | | Unknown |
| 140 | 4 | | Unknown (flags?) |
| 144 | 8 | | Unknown (minimum attribute size?) |
| 152 | 8 | | Unknown (maximum attribute size?) |

The index

Index structures are used for various purposes, one of which is storing directory entries.

The root of the index is stored in the index root attribute. The index root attribute defines which type of attribute is stored in the index and contains the root index node.

If the index is too large, part of the index is stored in an index allocation attribute with the same attribute name. The index allocation attribute defines a data stream which contains index entries. Each index entry contains an index node.

An index consists of a tree, where both the branch and the leaf nodes contain the actual data. E.g. in a directory entries index, every node that contains index value data contributes directory entries.

The index value data in a branch node signifies the upper bound of the values in that specific branch. E.g. if a directory entries index branch node contains the name “textfile.txt”, all names in that index branch are smaller than “textfile.txt”.

Note that the actual sort order depends on the collation type defined in the index root attribute.

The index allocation attribute is accompanied by a bitmap attribute with the corresponding attribute name. The bitmap attribute defines the allocation of virtual cluster blocks within the index allocation attribute data stream.

Note that the index allocation attribute can be present even though it is not used.

Commonly used indexes

Indexes commonly used by NTFS are:

| Value | Identifier | Description |
| --- | --- | --- |
| $I30 | | Directory entries (used by directories) |
| $SDH | | Security descriptor hashes (used by $Secure) |
| $SII | | Security descriptor identifiers (used by $Secure) |
| $O | | Object identifiers (used by $ObjId) |
| $O | | Owner identifiers (used by $Quota) |
| $Q | | Quotas (used by $Quota) |
| $R | | Reparse points (used by $Reparse) |

The index root

The index root consists of:

  • index root header
  • index node header
  • an array of index values

The index root header

The index root header is 16 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Attribute type, which contains the type of the indexed attribute or 0 if none |
| 4 | 4 | | Collation type, which contains a value to indicate the ordering of the index entries |
| 8 | 4 | | Index entry size |
| 12 | 4 | | Number of cluster blocks per index entry |

Note that for NTFS version 1.2 the index entry size does not have to match the index entry size in the volume header. The correct size seems to be the value in the index root header.

Collation type

| Value | Identifier | Description |
| --- | --- | --- |
| 0x00000000 | COLLATION_BINARY | Binary, where the first byte is most significant |
| 0x00000001 | COLLATION_FILENAME | UCS-2 strings case-insensitive, where the case folding is stored in $UpCase |
| 0x00000002 | COLLATION_UNICODE_STRING | UCS-2 strings case-sensitive, where upper case letters should come first |
| 0x00000010 | COLLATION_NTOFS_ULONG | Unsigned 32-bit little-endian integer |
| 0x00000011 | COLLATION_NTOFS_SID | NT security identifier (SID) |
| 0x00000012 | COLLATION_NTOFS_SECURITY_HASH | Security hash first, then NT security identifier |
| 0x00000013 | COLLATION_NTOFS_ULONGS | An array of unsigned 32-bit little-endian integer values |

The index entry

The index entry consists of:

  • the index entry header
  • the index node header
  • the fix-up values
  • alignment padding (8-byte alignment), which contains zero-byte values
  • an array of index values

The index entry header

The index entry header is 24 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | "INDX" | Signature
4 | 2 | | The fix-up values offset, which contains an offset relative to the start of the index entry header
6 | 2 | | The number of fix-up values
8 | 8 | | Metadata transaction journal sequence number, which contains a $LogFile Sequence Number (LSN)
16 | 8 | | Virtual Cluster Number (VCN) of the index entry

Note that there can be more fix-up values than supported by the index entry data size.
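The fix-up values have to be applied before the rest of an index entry is parsed. The following Python sketch checks the "INDX" signature and applies the fix-up values, assuming the usual NTFS update sequence scheme: the first fix-up value is written over the last 2 bytes of every 512-byte sector, and the remaining values hold the original bytes for those positions. The 512-byte sector size and the function name are assumptions. Fix-up values for sectors beyond the entry data are ignored, in line with the note above.

```python
import struct

SECTOR_SIZE = 512  # assumption: fix-ups protect the last 2 bytes of each 512-byte sector


def apply_index_entry_fixups(entry: bytes) -> bytearray:
    """Validate the "INDX" signature and return a copy with the fix-up values applied."""
    if entry[0:4] != b"INDX":
        raise ValueError("missing INDX signature")
    fixup_offset, number_of_fixups = struct.unpack_from("<HH", entry, 4)
    data = bytearray(entry)
    if number_of_fixups == 0:
        return data
    # The first value is the placeholder that was written at the end of every sector.
    placeholder = entry[fixup_offset:fixup_offset + 2]
    for index in range(1, number_of_fixups):
        sector_end = index * SECTOR_SIZE
        if sector_end > len(data):
            break  # more fix-up values than the entry data size supports
        if data[sector_end - 2:sector_end] != placeholder:
            raise ValueError(f"fix-up mismatch in sector {index - 1}")
        value_offset = fixup_offset + index * 2
        data[sector_end - 2:sector_end] = entry[value_offset:value_offset + 2]
    return data
```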

The index node header

The index node header is 16 bytes in size and consists of:

Offset | Size | Value | Description
0 | 4 | | Index values offset, where the offset is relative to the start of the index node header
4 | 4 | | Index node size, where the value includes the size of the index node header
8 | 4 | | Allocated index node size, where the value includes the size of the index node header
12 | 4 | | Index node flags

In an index entry (index allocation attribute) the index node size includes the size of the fix-up values and the alignment padding following it.

The remainder of the index node contains remnant data and/or zero-byte values.

The index node flags

Value | Identifier | Description
0x00000001 | | Is branch node, which is used to indicate if the node is a branch node that has sub nodes
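A minimal Python sketch of reading the index node header and testing the branch node flag follows; the names are hypothetical. In an index root the node header follows the 16-byte index root header, and in an index entry it follows the 24-byte index entry header, so the caller passes the corresponding `offset`.

```python
import struct
from typing import NamedTuple

INDEX_NODE_FLAG_IS_BRANCH = 0x00000001


class IndexNodeHeader(NamedTuple):
    index_values_offset: int  # relative to the start of the index node header
    index_node_size: int
    allocated_index_node_size: int
    flags: int

    @property
    def is_branch_node(self) -> bool:
        return bool(self.flags & INDEX_NODE_FLAG_IS_BRANCH)


def parse_index_node_header(data: bytes, offset: int = 0) -> IndexNodeHeader:
    """Parse the 16-byte index node header at offset."""
    return IndexNodeHeader(*struct.unpack_from("<4I", data, offset))
```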

The index value

The index value is of variable size and consists of:

Offset | Size | Value | Description
0 | 8 | | File reference
8 | 2 | | Size, which includes the 10 bytes of the file reference and size
10 | 2 | | Key data size
12 | 4 | | Index value flags
If index key data size > 0
16 | ... | | Key data
... | ... | | Data
If index value flag 0x00000001 (has sub node) is set
... | 8 | | Sub node Virtual Cluster Number (VCN)

The index values are stored 8-byte aligned.

Note that some other sources define the index value flags as a 16-bit value followed by 2 bytes of padding.

The index value flags

Value | Identifier | Description
0x00000001 | | Has sub node, when set the index value contains a sub node Virtual Cluster Number (VCN)
0x00000002 | | Is last, when set the index value is the last in the index values array
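Walking an array of index values can be sketched in Python as follows. The sketch assumes that the size field is the total size of the index value (so it can be used to step to the next value), that the key data starts at offset 16, and that the sub node VCN occupies the last 8 bytes of the index value. These assumptions and the function name are illustrative, not a definitive implementation.

```python
import struct
from typing import Iterator, NamedTuple, Optional

INDEX_VALUE_FLAG_HAS_SUB_NODE = 0x00000001
INDEX_VALUE_FLAG_IS_LAST = 0x00000002


class IndexValue(NamedTuple):
    file_reference: int
    key_data: bytes
    flags: int
    sub_node_vcn: Optional[int]


def iter_index_values(data: bytes, offset: int, end: int) -> Iterator[IndexValue]:
    """Yield index values from offset up to and including the "is last" value."""
    while offset + 16 <= end:
        file_reference, size, key_size, flags = struct.unpack_from("<QHHI", data, offset)
        if size < 16 or offset + size > end:
            raise ValueError(f"invalid index value size {size} at offset {offset}")
        key_data = bytes(data[offset + 16:offset + 16 + key_size])
        sub_node_vcn = None
        if flags & INDEX_VALUE_FLAG_HAS_SUB_NODE:
            # Assumption: the sub node VCN is stored in the last 8 bytes of the value.
            (sub_node_vcn,) = struct.unpack_from("<Q", data, offset + size - 8)
        yield IndexValue(file_reference, key_data, flags, sub_node_vcn)
        if flags & INDEX_VALUE_FLAG_IS_LAST:
            break
        offset += size
```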

Index key and value data

Directory entry index value

The MFT attribute name of the directory entry index is: $I30.

The directory entry index value contains a file name attribute in the index key data.

Note that the index value data can contain remnant data.

The short and long names of the same file have separate index values. The short name uses the DOS name space and the long name the WINDOWS name space. Index values with a single name use either the POSIX or DOS_WINDOWS name space.

A hard link to a file in the same directory has separate index values.

Security descriptor hash index value

The MFT attribute name of the security descriptor hash index is: $SDH. It appears to be used only by the $Secure metadata file.

Also see the security descriptor hash index value.

Security descriptor identifier index value

The MFT attribute name of the security descriptor identifier index is: $SII. It appears to be used only by the $Secure metadata file.

Also see the security descriptor identifier index value.

Compression

Compressed data-runs

NTFS compression groups 16 cluster blocks together. This group of 16 cluster blocks is also called a compression unit, which is either “compressed” or uncompressed.

The term compressed is quoted here because the group of cluster blocks can also contain uncompressed data. A group of cluster blocks is “compressed” when its compressed size is smaller than its uncompressed data size. Within a group of cluster blocks each of the 16 blocks is “compressed” individually.

The compression unit size is stored in the non-resident MFT attribute. The maximum uncompressed data size of a block is always the cluster size (in most cases 4096 bytes).

Note that a resident $DATA attribute with the compression type in the data flags is stored uncompressed.

The data runs in the $DATA attribute define cluster block ranges, e.g.

21 02 35 52

This data run defines 2 data blocks starting at block number 21045 (0x5235). In this example it is followed by a data run of 14 sparse blocks, so the total number of blocks in the compression unit is 16. Compressed data is stored in the first 2 blocks and the 14 sparse blocks are only there to make sure the data runs add up to the compression unit size. They do not define actual sparse data.

Another example:

21 40 37 52

This data run defines 64 data blocks starting at block number 21047 (0x5237). Since this data run is larger than the compression unit size, the data is stored uncompressed.
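The two examples above can be checked with a small Python decoder. It assumes the common NTFS data run encoding: the lower 4 bits of the first byte give the size of the number-of-blocks field, the upper 4 bits give the size of the block number field, the block number is a signed value relative to the previous data run, a block number size of 0 marks a sparse data run, and a 0 byte ends the list. The function name is hypothetical.

```python
from typing import List, Optional, Tuple


def decode_data_runs(data: bytes) -> List[Tuple[int, Optional[int]]]:
    """Decode data runs into (number of blocks, first block number or None if sparse)."""
    runs: List[Tuple[int, Optional[int]]] = []
    offset = 0
    block_number = 0
    while offset < len(data) and data[offset] != 0:
        header = data[offset]
        length_size = header & 0x0f  # lower 4 bits: size of the number of blocks
        offset_size = header >> 4  # upper 4 bits: size of the block number
        offset += 1
        number_of_blocks = int.from_bytes(data[offset:offset + length_size], "little")
        offset += length_size
        if offset_size == 0:
            runs.append((number_of_blocks, None))  # sparse data run
            continue
        relative = int.from_bytes(data[offset:offset + offset_size], "little", signed=True)
        offset += offset_size
        block_number += relative  # relative to the previous data run
        runs.append((number_of_blocks, block_number))
    return runs
```

For example, `decode_data_runs(bytes.fromhex("21023552010e00"))` decodes the first example together with a 14-block sparse data run.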

If the data runs were, for example, 60 data blocks followed by 4 sparse blocks, the first 3 compression units (blocks 1 to 48) would be uncompressed and the last compression unit (blocks 49 to 64) would be compressed.

“Sparse data” and “sparse compression unit” data runs can also be mixed. If in the previous example the 60 data blocks were followed by 20 sparse blocks, the last compression unit (blocks 65 to 80) would be sparse.

A compression unit can consist of multiple compressed data runs, e.g. 1 data block followed by 4 data blocks followed by 11 sparse blocks. Data runs have been observed where the last data run size does not align with the compression unit size.

The sparse blocks data run can be stored in a subsequent attribute in an attribute chain and can be stored in multiple data runs.
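The rules above for deciding whether a compression unit is uncompressed, “compressed” or sparse can be sketched in Python. This is a simplification: it works on (number of blocks, is sparse) pairs rather than raw data runs, assumes 16 blocks per compression unit and ignores the unaligned last data run case mentioned above. The function name is hypothetical.

```python
from typing import List, Tuple

COMPRESSION_UNIT_SIZE = 16  # assumption: number of cluster blocks per compression unit


def classify_compression_units(runs: List[Tuple[int, bool]]) -> List[str]:
    """Classify each compression unit given (number of blocks, is sparse) data runs."""
    # Expand the data runs into one "is sparse" value per block.
    blocks = [is_sparse for count, is_sparse in runs for _ in range(count)]
    kinds = []
    for start in range(0, len(blocks), COMPRESSION_UNIT_SIZE):
        unit = blocks[start:start + COMPRESSION_UNIT_SIZE]
        sparse_blocks = sum(unit)
        if sparse_blocks == 0:
            kinds.append("uncompressed")
        elif sparse_blocks == len(unit):
            kinds.append("sparse")
        else:
            kinds.append("compressed")  # data blocks padded with sparse blocks
    return kinds
```

With the examples above, 60 data blocks followed by 20 sparse blocks yield three uncompressed units, one compressed unit and one sparse unit.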

NTFS compression stores the “compressed” data in blocks. Each block has a 2-byte block header.

The block is of variable size and consists of:

Offset | Size | Value | Description
0 | 2 | | Block size
2 | compressed data size | | Uncompressed or LZNT1 compressed data

The upper 4 bits of the block size are used as flags:

Bit(s) | Description
0 - 11 | Compressed data size
12 - 14 | Unknown
15 | Data is compressed

TODO: link to LZNT1 documentation
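Splitting a compression unit into blocks can be sketched in Python. The sketch assumes, following the LZNT1 description in Microsoft's [MS-XCA] specification, that bits 0 - 11 of the block header store the size of the data that follows the header minus 1, and that a zero block header ends the blocks. It does not decompress the LZNT1 data itself, and the function name is hypothetical.

```python
import struct
from typing import Iterator, Tuple


def iter_compressed_blocks(unit: bytes) -> Iterator[Tuple[bool, bytes]]:
    """Yield (is compressed, block data) for each block in a compression unit."""
    offset = 0
    while offset + 2 <= len(unit):
        (header,) = struct.unpack_from("<H", unit, offset)
        if header == 0:
            break  # assumption: a zero block header marks the end of the blocks
        # Assumption (LZNT1 in [MS-XCA]): bits 0 - 11 store the data size minus 1.
        data_size = (header & 0x0fff) + 1
        is_compressed = bool(header & 0x8000)  # bit 15
        offset += 2
        yield is_compressed, bytes(unit[offset:offset + data_size])
        offset += data_size
```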

Windows Overlay Filter (WOF) compressed data

An MFT entry that contains Windows Overlay Filter (WOF) compressed data has the following attributes:

  • a reparse point attribute with tag 0x80000017, which defines the compression method
  • a nameless data attribute that is sparse and contains the uncompressed data size
  • a data attribute named WofCompressedData that contains LZXPRESS Huffman or LZX compressed data

The WofCompressedData data stream consists of:

Offset | Size | Value | Description
Chunk offset table
0 | ... | | Array of 32-bit or 64-bit compressed data chunk offsets, where the offset is relative to the start of the data chunks
Data chunks
... | ... | | One or more compressed or uncompressed data chunks

Note that if the chunk size equals the size of the uncompressed data, the chunk is stored (as-is) uncompressed.

The size of the chunk offset table is:

number of chunk offsets = (uncompressed size / compression unit size, rounded up) - 1

The offset of the first compressed data chunk is at the end of the chunk offset table and is not stored in the chunk offset table, hence the table contains one offset less than the number of chunks.

If the uncompressed size of a chunk is smaller than the compression unit size, the chunk is stored uncompressed.

Also see Windows Overlay Filter (WOF) compression method.
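Locating the data chunks of a WofCompressedData stream can be sketched in Python. The sketch derives the number of chunk offsets from the uncompressed size as described above and leaves the choice between 32-bit and 64-bit offsets to the caller (`offset_size`), since the rule for selecting it is an assumption here. The function name is hypothetical and decompression is not shown.

```python
import struct
from typing import List, Tuple


def wof_chunk_ranges(stream: bytes, uncompressed_size: int, unit_size: int,
                     offset_size: int = 4) -> List[Tuple[int, int]]:
    """Return (stream offset, size) of each data chunk in a WofCompressedData stream."""
    number_of_chunks = (uncompressed_size + unit_size - 1) // unit_size
    if number_of_chunks == 0:
        return []
    # The offset of the first chunk is not stored, hence one offset less than chunks.
    number_of_offsets = number_of_chunks - 1
    table_size = number_of_offsets * offset_size
    entry_format = "<I" if offset_size == 4 else "<Q"
    starts = [0] + [struct.unpack_from(entry_format, stream, index * offset_size)[0]
                    for index in range(number_of_offsets)]
    # Each chunk ends where the next one starts; the last ends at the end of the stream.
    ends = starts[1:] + [len(stream) - table_size]
    return [(table_size + start, end - start) for start, end in zip(starts, ends)]
```

A chunk whose size equals its uncompressed size would then be copied as-is instead of being decompressed.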

The reparse point

The reparse point is used to create file system-level links. Reparse data is stored in the reparse point attribute. The reparse point data (REPARSE_DATA_BUFFER) is of variable size and consists of:

Offset | Size | Value | Description
0 | 4 | | Reparse point tag
4 | 2 | | Reparse data size
6 | 2 | 0 | Unknown (Reserved)
8 | ... | | Reparse data

TODO: determine if non-native (Microsoft) reparse points are stored with their GUID

The reparse point tag

Offset | Size | Value | Description
0.0 | 16 bits | | Type
2.0 | 12 bits | | Unknown (Reserved)
3.4 | 4 bits | | Flags

Reparse point tag flags

Value | Identifier | Description
0x1 | | Unknown (Reserved)
0x2 | | Is alias (name surrogate bit), when this bit is set, the file or directory represents another named entity in the system
0x4 | | Is high-latency media (Reserved)
0x8 | | Is native (Microsoft bit)
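Reading the reparse point header and splitting the tag into its type and flags can be sketched in Python as follows; the names are hypothetical. The flags are taken from the upper 4 bits of the 32-bit tag, matching the bit layout in the table above.

```python
import struct
from typing import NamedTuple


class ReparsePoint(NamedTuple):
    tag: int
    data: bytes

    @property
    def tag_type(self) -> int:
        return self.tag & 0xffff  # bits 0 - 15

    @property
    def tag_flags(self) -> int:
        return self.tag >> 28  # bits 28 - 31

    @property
    def is_alias(self) -> bool:
        return bool(self.tag_flags & 0x2)  # name surrogate bit

    @property
    def is_native(self) -> bool:
        return bool(self.tag_flags & 0x8)  # Microsoft bit


def parse_reparse_point(buffer: bytes) -> ReparsePoint:
    """Parse the 8-byte reparse point header and return the tag and reparse data."""
    tag, data_size, _reserved = struct.unpack_from("<IHH", buffer, 0)
    if 8 + data_size > len(buffer):
        raise ValueError("reparse data size exceeds the buffer")
    return ReparsePoint(tag, bytes(buffer[8:8 + data_size]))
```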

Known reparse point tags

Value | Identifier | Description
0x00000000 | IO_REPARSE_TAG_RESERVED_ZERO | Unknown (Reserved)
0x00000001 | IO_REPARSE_TAG_RESERVED_ONE | Unknown (Reserved)
0x00000002 | IO_REPARSE_TAG_RESERVED_TWO | Unknown (Reserved)
0x80000005 | IO_REPARSE_TAG_DRIVE_EXTENDER | Used by Home server drive extender
0x80000006 | IO_REPARSE_TAG_HSM2 | Used by Hierarchical Storage Manager Product
0x80000007 | IO_REPARSE_TAG_SIS | Used by single-instance storage (SIS) filter driver
0x80000008 | IO_REPARSE_TAG_WIM | Used by the WIM Mount filter
0x80000009 | IO_REPARSE_TAG_CSV | Used by Cluster Shared Volumes (CSV) version 1
0x8000000a | IO_REPARSE_TAG_DFS | Used by the Distributed File System (DFS)
0x8000000b | IO_REPARSE_TAG_FILTER_MANAGER | Used by filter manager test harness
0x80000012 | IO_REPARSE_TAG_DFSR | Used by the Distributed File System (DFS)
0x80000013 | IO_REPARSE_TAG_DEDUP | Used by Data Deduplication (Dedup)
0x80000014 | IO_REPARSE_TAG_NFS | Used by the Network File System (NFS)
0x80000015 | IO_REPARSE_TAG_FILE_PLACEHOLDER | Used by Windows Shell for placeholder files
0x80000016 | IO_REPARSE_TAG_DFM | Used by Dynamic File filter
0x80000017 | IO_REPARSE_TAG_WOF | Used by Windows Overlay Filter (WOF), for either WIMBoot or compression
0x80000018 | IO_REPARSE_TAG_WCI | Used by Windows Container Isolation (WCI)
0x8000001b | IO_REPARSE_TAG_APPEXECLINK | Used by Universal Windows Platform (UWP) packages to encode information that allows the application to be launched by CreateProcess
0x8000001e | IO_REPARSE_TAG_STORAGE_SYNC | Used by the Azure File Sync (AFS) filter
0x80000020 | IO_REPARSE_TAG_UNHANDLED | Used by Windows Container Isolation (WCI)
0x80000021 | IO_REPARSE_TAG_ONEDRIVE | Unknown (Not used)
0x80000023 | IO_REPARSE_TAG_AF_UNIX | Used by the Windows Subsystem for Linux (WSL) to represent a UNIX domain socket
0x80000024 | IO_REPARSE_TAG_LX_FIFO | Used by the Windows Subsystem for Linux (WSL) to represent a UNIX FIFO (named pipe)
0x80000025 | IO_REPARSE_TAG_LX_CHR | Used by the Windows Subsystem for Linux (WSL) to represent a UNIX character special file
0x80000026 | IO_REPARSE_TAG_LX_BLK | Used by the Windows Subsystem for Linux (WSL) to represent a UNIX block special file
0x9000001a | IO_REPARSE_TAG_CLOUD | Used by the Cloud Files filter, for files managed by a sync engine such as Microsoft OneDrive
0x9000001c | IO_REPARSE_TAG_PROJFS | Used by the Windows Projected File System filter, for files managed by a user mode provider such as VFS for Git
0x90001018 | IO_REPARSE_TAG_WCI_1 | Used by Windows Container Isolation (WCI)
0x9000101a | IO_REPARSE_TAG_CLOUD_1 | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive
0x9000201a | IO_REPARSE_TAG_CLOUD_2 | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive
0x9000301a | IO_REPARSE_TAG_CLOUD_3 | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive
0x9000401a | IO_REPARSE_TAG_CLOUD_4 | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive
0x9000501a | IO_REPARSE_TAG_CLOUD_5 | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive
0x9000601a | IO_REPARSE_TAG_CLOUD_6 | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive
0x9000701a | IO_REPARSE_TAG_CLOUD_7 | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive
0x9000801a | IO_REPARSE_TAG_CLOUD_8 | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive
0x9000901a | IO_REPARSE_TAG_CLOUD_9 | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive
0x9000a01a | IO_REPARSE_TAG_CLOUD_A | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive
0x9000b01a | IO_REPARSE_TAG_CLOUD_B | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive
0x9000c01a | IO_REPARSE_TAG_CLOUD_C | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive
0x9000d01a | IO_REPARSE_TAG_CLOUD_D | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive
0x9000e01a | IO_REPARSE_TAG_CLOUD_E | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive
0x9000f01a | IO_REPARSE_TAG_CLOUD_F | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive
0xa0000003 | IO_REPARSE_TAG_MOUNT_POINT | Junction (or mount point)
0xa000000c | IO_REPARSE_TAG_SYMLINK | Symbolic link
0xa0000010 | IO_REPARSE_TAG_IIS_CACHE | Used by Microsoft Internet Information Services (IIS) caching
0xa0000019 | IO_REPARSE_TAG_GLOBAL_REPARSE | Used by NPFS to indicate a named pipe symbolic link from a server silo into the host silo
0xa000001d | IO_REPARSE_TAG_LX_SYMLINK | Used by the Windows Subsystem for Linux (WSL) to represent a UNIX symbolic link
0xa000001f | IO_REPARSE_TAG_WCI_TOMBSTONE | Used by Windows Container Isolation (WCI)
0xa0000022 | IO_REPARSE_TAG_PROJFS_TOMBSTONE | Used by the Windows Projected File System filter, for files managed by a user mode provider such as VFS for Git
0xa0000027 | IO_REPARSE_TAG_WCI_LINK | Used by Windows Container Isolation (WCI)
0xa0001027 | IO_REPARSE_TAG_WCI_LINK_1 | Used by Windows Container Isolation (WCI)
0xc0000004 | IO_REPARSE_TAG_HSM | Used by Hierarchical Storage Manager Product
0xc0000014 | IO_REPARSE_TAG_APPXSTRM | Unknown (Not used)

Junction or mount point reparse data

A reparse point with tag IO_REPARSE_TAG_MOUNT_POINT (0xa0000003) contains junction or mount point reparse data. The junction or mount point reparse data is of variable size and consists of:

Offset | Size | Value | Description
0 | 2 | | Substitute name offset, where the offset is relative to the start of the reparse name data
2 | 2 | | Substitute name size in bytes, where the size of the end-of-string character is not included
4 | 2 | | Display name offset, where the offset is relative to the start of the reparse name data
6 | 2 | | Display name size in bytes, where the size of the end-of-string character is not included
Reparse name data
8 | ... | | Substitute name, which contains a UCS-2 little-endian string without end-of-string character
... | ... | | Display name, which contains a UCS-2 little-endian string without end-of-string character

Note that it is currently unclear if the names contain an end-of-string character or if they are followed by alignment padding.

TODO: determine what character values like 0x0002 represent in the substitute name

00000010: 5c 00 3f 00 3f 00 02 00  43 00 3a 00 5c 00 55 00   \.?.?... C.:.\.U.
00000020: 73 00 65 00 72 00 73 00  5c 00 74 00 65 00 73 00   s.e.r.s. \.t.e.s.
00000030: 74 00 5c 00 44 00 6f 00  63 00 75 00 6d 00 65 00   t.\.D.o. c.u.m.e.
00000040: 6e 00 74 00 73 00 00 00                            n.t.s...
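Reading junction or mount point reparse data can be sketched in Python, assuming `data` starts directly after the 8-byte reparse point header; the function name is hypothetical. Names are decoded with the "surrogatepass" error handler because UCS-2 strings can contain unpaired surrogates. Control characters, such as the 0x0002 in the sample above, are returned as-is.

```python
import struct
from typing import Tuple


def parse_mount_point_reparse_data(data: bytes) -> Tuple[str, str]:
    """Return (substitute name, display name) of IO_REPARSE_TAG_MOUNT_POINT data."""
    sub_offset, sub_size, display_offset, display_size = struct.unpack_from("<4H", data, 0)
    names = data[8:]  # offsets are relative to the start of the reparse name data
    substitute = names[sub_offset:sub_offset + sub_size]
    display = names[display_offset:display_offset + display_size]
    return (substitute.decode("utf-16-le", errors="surrogatepass"),
            display.decode("utf-16-le", errors="surrogatepass"))
```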

Symbolic link reparse data

A reparse point with tag IO_REPARSE_TAG_SYMLINK (0xa000000c) contains symbolic link reparse data. The symbolic link reparse data is of variable size and consists of:

Offset | Size | Value | Description
0 | 2 | | Substitute name offset, where the offset is relative to the start of the reparse name data
2 | 2 | | Substitute name size in bytes
4 | 2 | | Display name offset, where the offset is relative to the start of the reparse name data
6 | 2 | | Display name size in bytes
8 | 4 | | Symbolic link flags
Reparse name data
12 | ... | | Substitute name, which contains a UCS-2 little-endian string without end-of-string character
... | ... | | Display name, which contains a UCS-2 little-endian string without end-of-string character

Symbolic link flags

Value | Identifier | Description
0x00000001 | SYMLINK_FLAG_RELATIVE | The substitute name is a path name relative to the directory containing the symbolic link
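Symbolic link reparse data differs from junction data only by the 4-byte flags field, so the reparse name data starts at offset 12. A minimal Python sketch follows, again assuming `data` starts after the reparse point header; the names are hypothetical.

```python
import struct
from typing import NamedTuple

SYMLINK_FLAG_RELATIVE = 0x00000001


class SymbolicLink(NamedTuple):
    substitute_name: str
    display_name: str
    is_relative: bool


def parse_symbolic_link_reparse_data(data: bytes) -> SymbolicLink:
    """Parse IO_REPARSE_TAG_SYMLINK reparse data."""
    sub_offset, sub_size, display_offset, display_size, flags = struct.unpack_from(
        "<4HI", data, 0)
    names = data[12:]  # the reparse name data follows the 12-byte header

    def read_name(offset: int, size: int) -> str:
        return names[offset:offset + size].decode("utf-16-le", errors="surrogatepass")

    return SymbolicLink(read_name(sub_offset, sub_size),
                        read_name(display_offset, display_size),
                        bool(flags & SYMLINK_FLAG_RELATIVE))
```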

Windows Overlay Filter (WOF) reparse data

A reparse point with tag IO_REPARSE_TAG_WOF (0x80000017) contains Windows Overlay Filter (WOF) reparse data. The Windows Overlay Filter (WOF) reparse data is 16 bytes in size and consists of:

Offset | Size | Value | Description
External provider information
0 | 4 | 1 | Unknown (WOF version)
4 | 4 | 2 | Unknown (WOF provider)
Internal provider information
8 | 4 | 1 | Unknown (file information version)
12 | 4 | | Compression method

Windows Overlay Filter (WOF) compression method

Value | Identifier | Description
0 | | LZXPRESS Huffman with 4k window (compression unit)
1 | | LZX with 32k window (compression unit)
2 | | LZXPRESS Huffman with 8k window (compression unit)
3 | | LZXPRESS Huffman with 16k window (compression unit)

TODO: link to LZXPRESS Huffman and LZX documentation
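The WOF reparse data can be read in Python as sketched below, with the compression method mapped to a method name and compression unit size in bytes taken from the table above (4k is read as 4096 bytes). The names are hypothetical; the resulting unit size is what is needed to locate the chunks of the WofCompressedData stream.

```python
import struct
from typing import NamedTuple

# Compression method: (method, compression unit size in bytes)
WOF_COMPRESSION_METHODS = {
    0: ("LZXPRESS Huffman", 4096),
    1: ("LZX", 32768),
    2: ("LZXPRESS Huffman", 8192),
    3: ("LZXPRESS Huffman", 16384),
}


class WofReparseData(NamedTuple):
    wof_version: int
    wof_provider: int
    file_information_version: int
    compression_method: int


def parse_wof_reparse_data(data: bytes) -> WofReparseData:
    """Parse the 16-byte IO_REPARSE_TAG_WOF reparse data."""
    if len(data) < 16:
        raise ValueError("WOF reparse data requires 16 bytes")
    return WofReparseData(*struct.unpack_from("<4I", data, 0))
```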

Windows Container Isolation (WCI) reparse data

A reparse point with tag IO_REPARSE_TAG_WCI (0x80000018) contains Windows Container Isolation (WCI) reparse data. The Windows Container Isolation (WCI) reparse data is of variable size and consists of:

Offset | Size | Value | Description
0 | 4 | 1 | Version
4 | 4 | 0 | Unknown (reserved)
8 | 16 | | Look-up identifier, which contains a GUID
24 | 2 | | Name size in bytes
26 | ... | | Name, which contains a UCS-2 little-endian string without end-of-string character
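A Python sketch of reading WCI reparse data follows. It assumes the look-up identifier is stored in the little-endian Windows GUID layout, which is why `uuid.UUID(bytes_le=...)` is used; the names are hypothetical.

```python
import struct
import uuid
from typing import NamedTuple


class WciReparseData(NamedTuple):
    version: int
    lookup_identifier: uuid.UUID
    name: str


def parse_wci_reparse_data(data: bytes) -> WciReparseData:
    """Parse IO_REPARSE_TAG_WCI reparse data."""
    version, _reserved = struct.unpack_from("<II", data, 0)
    # Assumption: the GUID is stored in the little-endian Windows GUID layout.
    lookup_identifier = uuid.UUID(bytes_le=bytes(data[8:24]))
    (name_size,) = struct.unpack_from("<H", data, 24)
    name = data[26:26 + name_size].decode("utf-16-le", errors="surrogatepass")
    return WciReparseData(version, lookup_identifier, name)
```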

The allocation bitmap

The metadata file $Bitmap contains the allocation bitmap.

Every bit in the allocation bitmap represents one cluster block, where the least significant bit (LSB) of a byte represents the first cluster block of that byte.
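Looking up the allocation status of a cluster block in the $Bitmap data can be sketched in Python; the function name is hypothetical.

```python
def is_cluster_allocated(bitmap: bytes, cluster_number: int) -> bool:
    """Return True if the bit of cluster_number is set; the LSB of a byte comes first."""
    byte_index, bit_index = divmod(cluster_number, 8)
    if byte_index >= len(bitmap):
        raise IndexError(f"cluster {cluster_number} is outside the bitmap")
    return bool(bitmap[byte_index] & (1 << bit_index))
```

For example, with a bitmap that starts with the byte 0x05, cluster blocks 0 and 2 are allocated and cluster block 1 is not.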

TODO: describe what the $SRAT data stream is used for.

Access control

The $Secure metadata file contains the security descriptors used for access control.

Type | Name | Description
Data | $SDS | Security descriptor data stream, which contains all the security descriptors on the volume
Index | $SDH | Security descriptor hash index
Index | $SII | Security descriptor identifier index, which contains the mapping of the security descriptor identifier (in $STANDARD_INFORMATION) to the offset of the security descriptor data (in $Secure:$SDS)

Security descriptor hash ($SDH) index

The security descriptor hash index value

Offset | Size | Value | Description
Key data
0 | 4 | | Security descriptor hash
4 | 4 | | Security descriptor identifier
Value data
8 | 4 | | Security descriptor hash
12 | 4 | | Security descriptor identifier
16 | 8 | | Security descriptor data offset (in $SDS)
24 | 4 | | Security descriptor data size (in $SDS)
28 | 4 | | Unknown

Security descriptor identifier ($SII) index

The security descriptor identifier index value

Offset | Size | Value | Description
Key data
0 | 4 | | Security descriptor identifier
Value data
4 | 4 | | Security descriptor hash
8 | 4 | | Security descriptor identifier
12 | 8 | | Security descriptor data offset (in $SDS)
20 | 4 | | Security descriptor data size (in $SDS)

TODO: describe the hash algorithm
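Resolving a security descriptor identifier, such as the one in $STANDARD_INFORMATION, starts with the $SII key and value data. The Python sketch below assumes `data` starts at the key data as laid out in the table above; the names are hypothetical, and rejecting mismatching key and value identifiers is a sanity check chosen for this sketch.

```python
import struct
from typing import NamedTuple


class SecurityDescriptorLocation(NamedTuple):
    descriptor_hash: int
    identifier: int
    data_offset: int  # offset of the entry in $Secure:$SDS
    data_size: int  # size of the entry in $Secure:$SDS


def parse_sii_index_value(data: bytes) -> SecurityDescriptorLocation:
    """Parse $SII key data (identifier) followed by its value data."""
    (key_identifier,) = struct.unpack_from("<I", data, 0)
    location = SecurityDescriptorLocation(*struct.unpack_from("<IIQI", data, 4))
    if location.identifier != key_identifier:
        raise ValueError("key and value security descriptor identifiers differ")
    return location
```

The returned `data_offset` and `data_size` would then be used to read the entry from the $Secure:$SDS data stream.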

Security descriptor ($SDS) data stream

Offset | Size | Value | Description
0 | 4 | | Security descriptor hash
4 | 4 | | Security descriptor identifier
8 | 8 | | Security descriptor data offset (in $SDS)
16 | 4 | | Security descriptor data size (in $SDS)
20 | ... | | Security descriptor data
... | ... | | Alignment padding (2-byte alignment)

TODO: link to security descriptor format documentation
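Reading an entry from the $SDS data stream can be sketched in Python. The sketch assumes a 20-byte entry header (32-bit hash, 32-bit identifier, 64-bit offset and 32-bit size packed back-to-back), that the stored size includes this header, and that the stored offset equals the position of the entry, which it uses as a sanity check. These assumptions and the names are illustrative; parsing the security descriptor itself is not shown.

```python
import struct
from typing import NamedTuple

SDS_ENTRY_HEADER_SIZE = 20  # assumption: hash, identifier, offset and size


class SdsEntry(NamedTuple):
    descriptor_hash: int
    identifier: int
    offset: int
    size: int
    security_descriptor: bytes


def read_sds_entry(sds: bytes, offset: int) -> SdsEntry:
    """Read the $SDS entry at offset, e.g. an offset taken from the $SII index."""
    descriptor_hash, identifier, entry_offset, entry_size = struct.unpack_from(
        "<IIQI", sds, offset)
    if entry_offset != offset or entry_size < SDS_ENTRY_HEADER_SIZE:
        raise ValueError(f"no valid $SDS entry at offset {offset}")
    # Assumption: the data size includes the 20-byte entry header.
    descriptor = bytes(sds[offset + SDS_ENTRY_HEADER_SIZE:offset + entry_size])
    return SdsEntry(descriptor_hash, identifier, entry_offset, entry_size, descriptor)
```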

The object identifiers

$ObjID:$O

Offset | Size | Value | Description
Key data
0 | 16 | | File (or object) identifier, which contains a GUID
Value data
4 | 8 | | File reference
12 | 16 | | Birth droid volume identifier, which contains a GUID
28 | 16 | | Birth droid file (or object) identifier, which contains a GUID
44 | 16 | | Birth droid domain identifier, which contains a GUID

Metadata transaction journal (log file)

TODO: complete section

The metadata file $LogFile contains the metadata transaction journal and consists of:

Log File service restart page header

The Log File service restart page header (LFS_RESTART_PAGE_HEADER) is 30 bytes in size and consists of:

Offset | Size | Value | Description
MULTI_SECTOR_HEADER
0 | 4 | "CHKD", "RCRD", "RSTR" | Signature
4 | 2 | | The fix-up values (or update sequence array) offset, which contains an offset relative to the start of the restart page header
6 | 2 | | The number of fix-up values (or update sequence array size)
Common
8 | 8 | | Checkdisk last LSN
16 | 4 | | System page size
20 | 4 | | Log page size
24 | 2 | | Restart offset
26 | 2 | | Minor format version
28 | 2 | | Major format version

Log File service restart page versions

Major format version | Remarks
-1 | Beta Version
0 | Transition
1 | Update sequence support
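The 30-byte restart page header can be read with a single `struct` format in Python. The sketch reads the format versions as signed 16-bit values so that the beta version is returned as -1, and accepts only the signature values listed in the table above; these choices and the names are assumptions. Applying the fix-up values of the page is not shown.

```python
import struct
from typing import NamedTuple


class RestartPageHeader(NamedTuple):
    signature: bytes
    fixup_values_offset: int
    number_of_fixup_values: int
    checkdisk_last_lsn: int
    system_page_size: int
    log_page_size: int
    restart_offset: int
    minor_format_version: int
    major_format_version: int


def parse_restart_page_header(data: bytes) -> RestartPageHeader:
    """Parse the 30-byte LFS_RESTART_PAGE_HEADER."""
    # Assumption: the format versions are signed, so a beta version reads as -1.
    header = RestartPageHeader(*struct.unpack_from("<4sHHQIIHhh", data, 0))
    if header.signature not in (b"CHKD", b"RCRD", b"RSTR"):
        raise ValueError(f"unsupported signature {header.signature!r}")
    return header
```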

USN change journal

The metadata file $Extend\$UsnJrnl contains the USN change journal. It is a sparse file in which NTFS stores records of changes to files and directories. Applications use the journal to respond to file and directory changes as they occur, for example the Windows File Replication Service (FRS) and the Windows (Desktop) Search service.

The USN change journal consists of:

  • the $UsnJrnl:$Max data stream, containing metadata like the maximum size of the journal
  • the $UsnJrnl:$J data stream, containing the update (or change) entries. The $UsnJrnl:$J data stream is sparse.

USN change journal metadata

The USN change journal metadata is 32 bytes in size and consists of:

Offset | Size | Value | Description
0 | 8 | | Maximum size in bytes
8 | 8 | | Allocation (size) delta in bytes
16 | 8 | | Update (USN) journal identifier, which contains a FILETIME
24 | 8 | | Unknown (empty)
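The $UsnJrnl:$Max metadata can be read in Python as sketched below, together with a helper that converts the FILETIME journal identifier into a date and time. The sketch relies on FILETIME being the number of 100-nanosecond intervals since January 1, 1601 UTC; the names are hypothetical.

```python
import datetime
import struct
from typing import NamedTuple

FILETIME_EPOCH = datetime.datetime(1601, 1, 1, tzinfo=datetime.timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime.datetime:
    """Convert a FILETIME (100-nanosecond intervals since 1601-01-01 UTC)."""
    return FILETIME_EPOCH + datetime.timedelta(microseconds=filetime // 10)


class UsnJournalMetadata(NamedTuple):
    maximum_size: int
    allocation_delta: int
    journal_identifier: int  # FILETIME


def parse_usn_journal_metadata(data: bytes) -> UsnJournalMetadata:
    """Parse the 32-byte $UsnJrnl:$Max data stream."""
    maximum_size, allocation_delta, journal_identifier, _unknown = struct.unpack_from(
        "<4Q", data, 0)
    return UsnJournalMetadata(maximum_size, allocation_delta, journal_identifier)
```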

USN change journal entries

The $UsnJrnl:$J data stream consists of an array of USN change journal entries. The USN change journal entries are stored on a per-block basis and 8-byte aligned. Therefore the remainder of a block can contain 0-byte values.

TODO: describe journal block size

Once the stream reaches maximum size the earliest USN change journal entries are removed from the stream and replaced with a sparse data run.

USN change journal entry

The USN change journal entry (USN_RECORD_V2) is of variable size and consists of:

Offset | Size | Value | Description
0 | 4 | | Entry (or record) size
4 | 2 | 2 | Major format version
6 | 2 | 0 | Minor format version
8 | 8 | | File reference
16 | 8 | | Parent file reference
24 | 8 | | Update sequence number (USN), which contains the file offset of the USN change journal entry which is used as a unique identifier
32 | 8 | | Update date and time, which contains a FILETIME
40 | 4 | | Update reason flags
44 | 4 | | Update source flags
48 | 4 | | Security descriptor identifier, which contains the entry number in the security ID index ($Secure:$SII). Also see Access Control
52 | 4 | | File attribute flags
56 | 2 | | Name size in bytes
58 | 2 | | Name offset, which is relative to the start of the USN change journal entry
60 | (name size) | | Name, which contains a UCS-2 little-endian string without end-of-string character
... | ... | 0x00 | Unknown (Padding)

Update reason flags

Value | Identifier | Description
0x00000001 | USN_REASON_DATA_OVERWRITE | The data in the file or directory is overwritten
0x00000002 | USN_REASON_DATA_EXTEND | The file or directory is extended
0x00000004 | USN_REASON_DATA_TRUNCATION | The file or directory is truncated
0x00000010 | USN_REASON_NAMED_DATA_OVERWRITE | One or more named data streams ($DATA attributes) of a file were overwritten
0x00000020 | USN_REASON_NAMED_DATA_EXTEND | One or more named data streams ($DATA attributes) of a file were extended
0x00000040 | USN_REASON_NAMED_DATA_TRUNCATION | One or more named data streams ($DATA attributes) of a file were truncated
0x00000100 | USN_REASON_FILE_CREATE | The file or directory was created
0x00000200 | USN_REASON_FILE_DELETE | The file or directory was deleted
0x00000400 | USN_REASON_EA_CHANGE | The extended attributes of the file were changed
0x00000800 | USN_REASON_SECURITY_CHANGE | The access rights (security descriptor) of a file or directory were changed
0x00001000 | USN_REASON_RENAME_OLD_NAME | The name changed, where the USN change journal entry contains the old name
0x00002000 | USN_REASON_RENAME_NEW_NAME | The name changed, where the USN change journal entry contains the new name
0x00004000 | USN_REASON_INDEXABLE_CHANGE | Content indexed status changed. The file attribute FILE_ATTRIBUTE_NOT_CONTENT_INDEXED was changed
0x00008000 | USN_REASON_BASIC_INFO_CHANGE | Basic file or directory attributes changed. One or more file or directory attributes were changed, e.g. read-only, hidden, system, archive, or sparse attribute, or one or more time stamps
0x00010000 | USN_REASON_HARD_LINK_CHANGE | A hard link was created or deleted
0x00020000 | USN_REASON_COMPRESSION_CHANGE | The file or directory was compressed or decompressed
0x00040000 | USN_REASON_ENCRYPTION_CHANGE | The file or directory was encrypted or decrypted
0x00080000 | USN_REASON_OBJECT_ID_CHANGE | The object identifier of a file or directory was changed
0x00100000 | USN_REASON_REPARSE_POINT_CHANGE | The reparse point in a file or directory was changed, or a reparse point was added to or deleted from a file or directory
0x00200000 | USN_REASON_STREAM_CHANGE | A named data stream ($DATA attribute) was added to or removed from a file, or a named stream was renamed
0x00400000 | USN_REASON_TRANSACTED_CHANGE | Unknown
0x80000000 | USN_REASON_CLOSE | The file or directory was closed

Update source flags

Value | Identifier | Description
0x00000001 | USN_SOURCE_DATA_MANAGEMENT | The operation added a private data stream to a file or directory. The modifications did not change the application data
0x00000002 | USN_SOURCE_AUXILIARY_DATA | The operation was caused by the operating system. Although a write operation is performed on the item, the data was not changed
0x00000004 | USN_SOURCE_REPLICATION_MANAGEMENT | The operation was caused by file replication
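A USN change journal entry (USN_RECORD_V2) and its update reason flags can be decoded in Python as sketched below. The sketch only accepts major format version 2 and decodes the name with the "surrogatepass" error handler; walking a whole $UsnJrnl:$J block and skipping its 0-byte remainder is left out. The names are hypothetical.

```python
import struct
from typing import List, NamedTuple

USN_REASONS = {
    0x00000001: "USN_REASON_DATA_OVERWRITE",
    0x00000002: "USN_REASON_DATA_EXTEND",
    0x00000004: "USN_REASON_DATA_TRUNCATION",
    0x00000010: "USN_REASON_NAMED_DATA_OVERWRITE",
    0x00000020: "USN_REASON_NAMED_DATA_EXTEND",
    0x00000040: "USN_REASON_NAMED_DATA_TRUNCATION",
    0x00000100: "USN_REASON_FILE_CREATE",
    0x00000200: "USN_REASON_FILE_DELETE",
    0x00000400: "USN_REASON_EA_CHANGE",
    0x00000800: "USN_REASON_SECURITY_CHANGE",
    0x00001000: "USN_REASON_RENAME_OLD_NAME",
    0x00002000: "USN_REASON_RENAME_NEW_NAME",
    0x00004000: "USN_REASON_INDEXABLE_CHANGE",
    0x00008000: "USN_REASON_BASIC_INFO_CHANGE",
    0x00010000: "USN_REASON_HARD_LINK_CHANGE",
    0x00020000: "USN_REASON_COMPRESSION_CHANGE",
    0x00040000: "USN_REASON_ENCRYPTION_CHANGE",
    0x00080000: "USN_REASON_OBJECT_ID_CHANGE",
    0x00100000: "USN_REASON_REPARSE_POINT_CHANGE",
    0x00200000: "USN_REASON_STREAM_CHANGE",
    0x00400000: "USN_REASON_TRANSACTED_CHANGE",
    0x80000000: "USN_REASON_CLOSE",
}


class UsnRecordV2(NamedTuple):
    record_size: int
    file_reference: int
    parent_file_reference: int
    usn: int
    update_time: int  # FILETIME
    reason_flags: int
    source_flags: int
    security_descriptor_identifier: int
    file_attribute_flags: int
    name: str


def parse_usn_record_v2(data: bytes, offset: int = 0) -> UsnRecordV2:
    """Parse a USN_RECORD_V2 change journal entry that starts at offset."""
    (record_size, major_version, minor_version, file_reference, parent_file_reference,
     usn, update_time, reason_flags, source_flags, security_descriptor_identifier,
     file_attribute_flags, name_size, name_offset) = struct.unpack_from(
        "<IHHQQQQIIIIHH", data, offset)
    if major_version != 2:
        raise ValueError(f"unsupported format version {major_version}.{minor_version}")
    name_start = offset + name_offset  # the name offset is relative to the entry
    name = data[name_start:name_start + name_size].decode("utf-16-le", errors="surrogatepass")
    return UsnRecordV2(record_size, file_reference, parent_file_reference, usn,
                       update_time, reason_flags, source_flags,
                       security_descriptor_identifier, file_attribute_flags, name)


def decode_usn_reasons(reason_flags: int) -> List[str]:
    """Return the identifiers of the update reason flags that are set."""
    return [name for value, name in USN_REASONS.items() if reason_flags & value]
```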

Alternate data streams (ADS)

Data stream name | Description
"♣BnhqlkugBim0elg1M1pt2tjdZe", "♣SummaryInformation", "{4c8cc155-6c1e-11d1-8e41-00c04fb9386d}" | Used to store properties, where ♣ (black club) is Unicode character U+2663
"{59828bbb-3f72-4c1b-a420-b51ad66eb5d3}.XPRESS" | Used during remote differential compression
"AFP_AfpInfo", "AFP_Resource" | Used to store Macintosh operating system property lists
"encryptable" | Used to store attributes relating to thumbnails in the thumbnails database
"favicon" | Used to store favorite icons for web pages
"ms-properties" | Used to store properties
"OECustomProperty" | Used to store custom properties related to email files
"Zone.Identifier" | Used to store the Internet Explorer URL security zone of the origin

ms-properties

The ms-properties alternate data stream contains a Windows Serialized Property Store (SPS).

TODO: link to Windows Serialized Property Store (SPS) format documentation

Zone.Identifier

The Zone.Identifier alternate data stream contains ASCII text in the form:

[ZoneTransfer]
ZoneId=3

Where ZoneId refers to the Internet Explorer URL security zone of the origin.
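Since the stream uses INI-style syntax, it can be read with Python's `configparser`. The sketch disables interpolation because other values in the stream may contain "%" characters, tolerates duplicate keys, and decodes the data as UTF-8 with an optional byte-order mark, which is a lenient superset of the ASCII described above. The zone names follow the Internet Explorer URL security zone values; the function and dictionary names are hypothetical.

```python
import configparser

# Internet Explorer URL security zones (URLZONE values)
URL_ZONE_NAMES = {
    0: "Local machine",
    1: "Local intranet",
    2: "Trusted sites",
    3: "Internet",
    4: "Restricted sites",
}


def parse_zone_identifier(data: bytes) -> int:
    """Return the ZoneId value of a Zone.Identifier alternate data stream."""
    text = data.decode("utf-8-sig", errors="replace")
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(text)
    return parser.getint("ZoneTransfer", "ZoneId")
```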

Transactional NTFS (TxF)

Transactional NTFS (TxF) was introduced in Windows Vista.

In TxF the resource manager (RM) keeps track of transactional metadata and log files. The TxF related metadata files are stored in the metadata directory:

$Extend\$RmMetadata

Resource manager repair information

The resource manager repair information metadata file $Extend\$RmMetadata\$Repair consists of the following data streams:

  • the default (unnamed) data stream
  • the $Config data stream, which contains the resource manager repair configuration information

TODO: determine the purpose of the default (unnamed) data stream

Resource manager repair configuration information

TODO: complete section

The $Repair:$Config data stream contains:

Offset | Size | Value | Description
0 | 4 | | Unknown
4 | 4 | | Unknown

Transactional NTFS (TxF) metadata directory

TODO: complete section

The Transactional NTFS (TxF) metadata directory $Extend\$RmMetadata\$Txf is used to isolate files for delete or overwrite operations.

TxF Old Page Stream (TOPS) file

The TxF Old Page Stream (TOPS) file $Extend\$RmMetadata\$TxfLog\$Tops consists of the following data streams:

  • the default (unnamed) data stream, which contains metadata about the resource manager, such as its GUID, its CLFS log policy, and the LSN at which recovery should start
  • the $T data stream, which contains the file data that is partially overwritten by a transaction, as opposed to a full overwrite, which would move the file into the Transactional NTFS (TxF) metadata directory

TxF Old Page Stream (TOPS) metadata

TODO: complete section

The $Tops default (unnamed) data stream contains:

Offset | Size | Value | Description
0 | 2 | | Unknown
2 | 2 | | Size of TOPS metadata
4 | 4 | | Unknown (Number of resource managers/streams?)
8 | 16 | | Resource Manager (RM) identifier, which contains a GUID
24 | 8 | | Unknown (empty)
32 | 8 | | Base (or log start) LSN of TxfLog stream
40 | 8 | | Unknown
48 | 8 | | Last flushed LSN of TxfLog stream
56 | 8 | | Unknown
64 | 8 | | Unknown (empty)
72 | 8 | | Unknown (Restart LSN?)
80 | 20 | | Unknown

TxF Old Page Stream (TOPS) file data

The $Tops:$T data stream contains the file data that is partially overwritten by a transaction. It consists of multiple pending transaction XML documents.

TODO: describe start of each sector containing 0x0001

A pending transaction XML document starts with a UTF-8 byte-order mark. It roughly contains the following data:

<?xml version='1.0' encoding='utf-8'?>
<PendingTransaction Version="2.0" Identifier="...">
   <Transactions>
      <Transaction TransactionId="...">
         <Install Application="..., Culture=..., Version=..., PublicKeyToken=...,
                              ProcessorArchitecture=..., versionScope=..."
                  RefGuid="..."
                  RefIdentifier="..."
                  RefExtra="..."/>
         ...
      </Transaction>
   </Transactions>
   <ChangeList>
      <Change Family="..., Culture=..., PublicKeyToken=...,
                     ProcessorArchitecture=..., versionScope=..."
              New="..."/>
      ...
   </ChangeList>
   <POQ>
      <BeginTransaction id="..."/>

      <CreateFile path="..."
                  fileAttribute="..."/>
      <DeleteFile path="..."/>
      <MoveFile source="..." destination="..."/>
      <HardlinkFile source="..." destination="..."/>
      <SetFileInformation path="..."
                          securityDescriptor="binary base64:..."
                          flags="..."/>

      <CreateKey path="..."/>
      <SetKeyValue path="..."
                   name="..."
                   type="..."
                   encoding="base64"
                   value="..."/>
      <DeleteKeyValue path="..."
                      name="..."/>

      ...
   </POQ>
   <InstallerQueue Length="...">
      <Action Installer="..."
              Mode="..."
              Phase="..."
              Family="..., Culture=..., PublicKeyToken=...,
                     ProcessorArchitecture=..., versionScope=..."
              Old="..."
              New="..."/>

      ...
   </InstallerQueue >
</PendingTransaction>

Transactional NTFS (TxF) Common Log File System (CLFS) files

TxF uses a Common Log File System (CLFS) log store and the logged utility stream attribute named $TXF_DATA.

TODO: link to CLFS format documentation

The base log file (BLF) of the TxF log store is:

$Extend\$RmMetadata\$TxfLog\TxfLog.blf

Commonly the corresponding container files are:

$Extend\$RmMetadata\$TxfLog\TxfLogContainer00000000000000000001
$Extend\$RmMetadata\$TxfLog\TxfLogContainer00000000000000000002

TxF uses a multiplexed log store which contains the following streams:

  • the KtmLog stream used for Kernel Transaction Manager (KTM) metadata records
  • the TxfLog stream, which contains the TxF log records

Transactional data logged utility stream attribute

The transactional data ($TXF_DATA) logged utility stream attribute is 56 bytes in size and consists of:

Offset | Size | Value | Description
0 | 6 | | Unknown (remnant data)
6 | 8 | | Resource manager root file reference, which contains an NTFS file reference that refers to the MFT
14 | 8 | | Unknown (USN index?)
22 | 8 | | File identifier (TxID), which contains a TxF file identifier
30 | 8 | | Data LSN, which contains a CLFS LSN of file data transaction records
38 | 8 | | Metadata LSN, which contains a CLFS LSN of file system metadata transaction records
46 | 8 | | Directory index LSN, which contains a CLFS LSN of directory index transaction records
54 | 2 | | Unknown (Flags?)

Note that a single MFT entry can contain multiple Transactional data logged utility stream attributes.
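Each $TXF_DATA attribute can be read with a single `struct` format in Python, as sketched below. The helper that splits the resource manager root file reference assumes the usual NTFS file reference layout of a 48-bit MFT entry number followed by a 16-bit sequence number. The names are hypothetical.

```python
import struct
from typing import NamedTuple, Tuple


class TxfData(NamedTuple):
    resource_manager_root_reference: int
    file_identifier: int  # TxID
    data_lsn: int
    metadata_lsn: int
    directory_index_lsn: int
    flags: int


def parse_txf_data(data: bytes) -> TxfData:
    """Parse a 56-byte $TXF_DATA logged utility stream attribute."""
    if len(data) < 56:
        raise ValueError("$TXF_DATA requires 56 bytes")
    (_remnant, root_reference, _unknown, file_identifier, data_lsn, metadata_lsn,
     directory_index_lsn, flags) = struct.unpack_from("<6sQQQQQQH", data, 0)
    return TxfData(root_reference, file_identifier, data_lsn, metadata_lsn,
                   directory_index_lsn, flags)


def split_file_reference(reference: int) -> Tuple[int, int]:
    """Split a file reference into (MFT entry number, sequence number)."""
    return reference & 0xffffffffffff, reference >> 48
```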

Windows definitions

File attribute flags

The file attribute flags consist of the following values:

Value | Identifier | Description
0x00000001 | FILE_ATTRIBUTE_READONLY | Is read-only
0x00000002 | FILE_ATTRIBUTE_HIDDEN | Is hidden
0x00000004 | FILE_ATTRIBUTE_SYSTEM | Is a system file or directory
0x00000008 | | Is a volume label, which is not used by NTFS
0x00000010 | FILE_ATTRIBUTE_DIRECTORY | Is a directory, which is not used by NTFS
0x00000020 | FILE_ATTRIBUTE_ARCHIVE | Should be archived
0x00000040 | FILE_ATTRIBUTE_DEVICE | Is a device, which is not used by NTFS
0x00000080 | FILE_ATTRIBUTE_NORMAL | Is a normal file. Note that none of the other flags should be set
0x00000100 | FILE_ATTRIBUTE_TEMPORARY | Is temporary
0x00000200 | FILE_ATTRIBUTE_SPARSE_FILE | Is a sparse file
0x00000400 | FILE_ATTRIBUTE_REPARSE_POINT | Is a reparse point or symbolic link
0x00000800 | FILE_ATTRIBUTE_COMPRESSED | Is compressed
0x00001000 | FILE_ATTRIBUTE_OFFLINE | Is offline. The data of the file is stored on an offline storage
0x00002000 | FILE_ATTRIBUTE_NOT_CONTENT_INDEXED | Do not index content. The content of the file or directory should not be indexed by the indexing service
0x00004000 | FILE_ATTRIBUTE_ENCRYPTED | Is encrypted
0x00008000 | | Unknown (seen on Windows 95 FAT)
0x00010000 | FILE_ATTRIBUTE_VIRTUAL | Is virtual

The following flags are mainly used in the file name ($FILE_NAME) attribute and sparsely in the standard information ($STANDARD_INFORMATION) attribute. It could be that they have a different meaning in the two types of attributes or that the standard information flags are not updated. For now the latter is assumed.

| Value | Identifier | Description |
| --- | --- | --- |
| 0x10000000 | | Unknown (Is directory or has $I30 index? Note that an $Extend directory without this flag has been observed) |
| 0x20000000 | | Is index view |
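
A minimal C sketch that prints the identifiers of the flags that are set, for example 0x00000820 yields FILE_ATTRIBUTE_ARCHIVE and FILE_ATTRIBUTE_COMPRESSED. The function name is hypothetical; flags without an identifier in the tables above, such as 0x10000000, are returned to the caller rather than printed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Flags that have an identifier in the file attribute flags table */
static const struct {
    uint32_t value;
    const char *identifier;
} file_attribute_flags[] = {
    { 0x00000001, "FILE_ATTRIBUTE_READONLY" },
    { 0x00000002, "FILE_ATTRIBUTE_HIDDEN" },
    { 0x00000004, "FILE_ATTRIBUTE_SYSTEM" },
    { 0x00000010, "FILE_ATTRIBUTE_DIRECTORY" },
    { 0x00000020, "FILE_ATTRIBUTE_ARCHIVE" },
    { 0x00000040, "FILE_ATTRIBUTE_DEVICE" },
    { 0x00000080, "FILE_ATTRIBUTE_NORMAL" },
    { 0x00000100, "FILE_ATTRIBUTE_TEMPORARY" },
    { 0x00000200, "FILE_ATTRIBUTE_SPARSE_FILE" },
    { 0x00000400, "FILE_ATTRIBUTE_REPARSE_POINT" },
    { 0x00000800, "FILE_ATTRIBUTE_COMPRESSED" },
    { 0x00001000, "FILE_ATTRIBUTE_OFFLINE" },
    { 0x00002000, "FILE_ATTRIBUTE_NOT_CONTENT_INDEXED" },
    { 0x00004000, "FILE_ATTRIBUTE_ENCRYPTED" },
    { 0x00010000, "FILE_ATTRIBUTE_VIRTUAL" },
};

/* Prints the identifier of every known flag that is set and returns the
 * remaining flags that have no identifier */
uint32_t print_file_attribute_flags(uint32_t flags)
{
    uint32_t remaining_flags = flags;
    size_t number_of_entries = sizeof(file_attribute_flags) / sizeof(file_attribute_flags[0]);

    for (size_t entry_index = 0; entry_index < number_of_entries; entry_index++) {
        if ((flags & file_attribute_flags[entry_index].value) != 0) {
            printf("\t%s\n", file_attribute_flags[entry_index].identifier);
            remaining_flags &= ~file_attribute_flags[entry_index].value;
        }
    }
    return remaining_flags;
}
```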

Corruption scenarios

Data stream with inconsistent data flags

An MFT entry contains an $ATTRIBUTE_LIST attribute that contains multiple $DATA attributes. The $DATA attributes define an LZNT1 compressed data stream, though only the first $DATA attribute (VCN range 0 - 512) has the compressed data flag set.

Note that it is unclear whether this is a corruption scenario.

MFT entry: 220 information:
    Is allocated                   : true
    File reference                 : 220-59
    Base record file reference     : Not set (0)
    Journal sequence number        : 51876429013
    Number of attributes           : 5

Attribute: 1
    Type                           : $STANDARD_INFORMATION (0x00000010)
    Creation time                  : Jun 05, 2019 06:56:26.032730300 UTC
    Modification time              : Oct 05, 2019 06:56:04.150940700 UTC
    Access time                    : Oct 05, 2019 06:56:04.150940700 UTC
    Entry modification time        : Oct 05, 2019 06:56:04.150940700 UTC
    Owner identifier               : 0
    Security descriptor identifier : 5862
    Update sequence number         : 11553149976
    File attribute flags           : 0x00000820
       Should be archived (FILE_ATTRIBUTE_ARCHIVE)
       Is compressed (FILE_ATTRIBUTE_COMPRESSED)

Attribute: 2
    Type                           : $ATTRIBUTE_LIST (0x00000020)

Attribute: 3
    Type                           : $FILE_NAME (0x00000030)
    Parent file reference          : 33996-57
    Creation time                  : Jun 05, 2019 06:56:26.032730300 UTC
    Modification time              : Oct 05, 2019 06:56:03.510061800 UTC
    Access time                    : Oct 05, 2019 06:56:03.510061800 UTC
    Entry modification time        : Oct 05, 2019 06:56:03.510061800 UTC
    File attribute flags           : 0x00000020
       Should be archived (FILE_ATTRIBUTE_ARCHIVE)
    Namespace                      : POSIX (0)
    Name                           : setupapi.dev.20191005_085603.log

Attribute: 4
    Type                           : $DATA (0x00000080)
    Data VCN range                 : 513 - 1103
    Data flags                     : 0x0000

Attribute: 5
    Type                           : $DATA (0x00000080)
    Data VCN range                 : 0 - 512
    Data size                      : 4487594 bytes
    Data flags                     : 0x0001

Directory entry with outdated file reference

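The directory entry \ProgramData\McAfee\Common Framework\Task\5.ini contains the file reference 51106-400, while MFT entry 51106 has sequence number 496 (file reference 51106-496) and a $FILE_NAME attribute with the name 1.ini. The directory entry: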

File entry:
    Path                           : \ProgramData\McAfee\Common Framework\Task\5.ini
    File reference                 : 51106-400
    Name                           : 5.ini
    Parent file reference          : 65804-10
    Size                           : 723
    Creation time                  : Sep 16, 2011 20:47:54.561041200 UTC
    Modification time              : Apr 07, 2012 21:07:02.684060000 UTC
    Access time                    : Apr 07, 2012 21:07:02.652810200 UTC
    Entry modification time        : Apr 07, 2012 21:07:02.684060000 UTC
    File attribute flags           : 0x00002020
       Should be archived (FILE_ATTRIBUTE_ARCHIVE)
       Content should not be indexed (FILE_ATTRIBUTE_NOT_CONTENT_INDEXED)

The corresponding MFT entry:

MFT entry: 51106 information:
    Is allocated                   : true
    File reference                 : 51106-496
    Base record file reference     : Not set (0)
    Journal sequence number        : 0
    Number of attributes           : 3

Attribute: 1
    Type                           : $STANDARD_INFORMATION (0x00000010)
    Creation time                  : Sep 16, 2011 20:47:54.561041200 UTC
    Modification time              : Apr 07, 2012 21:07:02.684060000 UTC
    Access time                    : Apr 07, 2012 21:07:02.652810200 UTC
    Entry modification time        : Apr 07, 2012 21:07:02.684060000 UTC
    Owner identifier               : 0
    Security descriptor identifier : 1368
    Update sequence number         : 1947271600
    File attribute flags           : 0x00002020
       Should be archived (FILE_ATTRIBUTE_ARCHIVE)
       Content should not be indexed (FILE_ATTRIBUTE_NOT_CONTENT_INDEXED)

Attribute: 2
    Type                           : $FILE_NAME (0x00000030)
    Parent file reference          : 65804-10
    Creation time                  : Sep 16, 2011 20:47:54.561041200 UTC
    Modification time              : Apr 07, 2012 21:07:02.652810200 UTC
    Access time                    : Apr 07, 2012 21:07:02.652810200 UTC
    Entry modification time        : Apr 07, 2012 21:07:02.652810200 UTC
    File attribute flags           : 0x00002020
       Should be archived (FILE_ATTRIBUTE_ARCHIVE)
       Content should not be indexed (FILE_ATTRIBUTE_NOT_CONTENT_INDEXED)
    Namespace                      : DOS and Windows (3)
    Name                           : 1.ini

Attribute: 3
    Type                           : $DATA (0x00000080)
    Data size                      : 723 bytes
    Data flags                     : 0x0000

TODO: determine if $LogFile could be used to recover from this corruption scenario

LZNT1 compressed block with data size of 0

It is unclear whether this is a corruption scenario or a data format edge case.

A compression unit (index 30) consisting of the following data runs:

reading data run: 60.
data run:
00000000: 11 01 01                                           ...

value sizes                               : 1, 1
number of cluster blocks                  : 1 (size: 4096)
cluster block number                      : 687143 (1) (offset: 0xa7c27000)

reading data run: 61.
data run:
00000000: 01 0f                                              ..

value sizes                               : 1, 0
number of cluster blocks                  : 15 (size: 61440)
cluster block number                      : 0 (0) (offset: 0x00000000)
        Is sparse

Contains the following data:

a7c27000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
...
a7c27ff0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

This relates to an empty LZNT1 compressed block.

compressed data offset                    : 0 (0x00000000)
compression chunk header                  : 0x0000
compressed chunk size                     : 1
signature value                           : 0
is compressed flag                        : 0

In 2 different NTFS implementations the entire block was observed to be filled with 0-byte values.
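
A minimal C sketch of decoding the 16-bit LZNT1 compression chunk header into the values shown above. It assumes the header is stored little-endian and uses the commonly documented layout: the lower 12 bits contain the chunk data size minus 1, bits 12 to 14 a signature value (normally 3) and bit 15 the is compressed flag. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical representation of a decoded LZNT1 chunk header */
typedef struct {
    uint16_t compressed_chunk_size;
    uint8_t signature_value;
    uint8_t is_compressed;
} lznt1_chunk_header_t;

/* Decodes a 16-bit little-endian LZNT1 chunk header (assumed layout) */
lznt1_chunk_header_t lznt1_read_chunk_header(const uint8_t *data)
{
    uint16_t header = (uint16_t)(data[0] | (data[1] << 8));
    lznt1_chunk_header_t result;

    /* Bits 0 to 11: size of the chunk data minus 1 */
    result.compressed_chunk_size = (uint16_t)((header & 0x0fff) + 1);
    /* Bits 12 to 14: signature value, expected to be 3 for a regular chunk */
    result.signature_value = (uint8_t)((header >> 12) & 0x07);
    /* Bit 15: set if the chunk data is compressed */
    result.is_compressed = (uint8_t)(header >> 15);
    return result;
}
```

With this layout a header of 0x0000 decodes to a compressed chunk size of 1, a signature value of 0 and no compression, which matches the output of the empty block above.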

TODO: verify behavior of Windows NTFS implementation.

Truncated LZNT1 compressed block

It is unclear whether this is a corruption scenario or a data format edge case.

A compression unit (index 0) consisting of the following data runs:

reading data run: 0.
data run:
00000000: 31 08 48 d8 01                                     1.H..

value sizes                               : 1, 3
number of cluster blocks                  : 8 (size: 32768)
cluster block number                      : 120904 (120904) (offset: 0x1d848000)

reading data run: 1.
data run:
00000000: 01 08                                              ..

value sizes                               : 1, 0
number of cluster blocks                  : 8 (size: 32768)
cluster block number                      : 0 (0) (offset: 0x00000000)
        Is sparse

Contains the following data:

1d848000  bd b7 50 44 46 50 00 01  00 01 00 40 e0 00 07 0b  |..PDFP.....@....|
...
1d84c000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
1d84fff0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

This relates to an LZNT1 compressed block that appears to be truncated at offset 16384 (0x00004000).

compressed data offset                    : 16384 (0x00004000)
compression flag byte                     : 0x00

Different behavior was observed in 2 different NTFS implementations:

  • one implementation fills the compressed block with the uncompressed data it could read and the rest with 0-byte values
  • another implementation seems to provide the data that was already in its buffer

TODO: verify behavior of Windows NTFS implementation.
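
The data runs in the scenarios above can be decoded with the following minimal C sketch. It assumes the common NTFS data run layout: the lower 4 bits of the first byte contain the size of the number of cluster blocks value and the upper 4 bits the size of the cluster block number value, both values are stored little-endian, the cluster block number is signed and relative to that of the previous data run, and a cluster block number size of 0 indicates a sparse data run. The type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representation of a decoded data run */
typedef struct {
    uint64_t number_of_cluster_blocks;
    int64_t relative_cluster_block_number; /* relative to the previous data run */
    int is_sparse;
} data_run_t;

/* Decodes the data run at the start of data. Returns the number of bytes
 * consumed, 0 for the terminating 0-byte value or -1 on error. */
int data_run_read(const uint8_t *data, size_t data_size, data_run_t *data_run)
{
    if (data_size < 1) {
        return -1;
    }
    if (data[0] == 0) {
        return 0;
    }
    size_t number_of_blocks_size = data[0] & 0x0f;
    size_t block_number_size = data[0] >> 4;

    if (number_of_blocks_size == 0 || number_of_blocks_size > 8 || block_number_size > 8
     || data_size < 1 + number_of_blocks_size + block_number_size) {
        return -1;
    }
    /* Both values are stored little-endian after the header byte */
    uint64_t number_of_blocks = 0;
    for (size_t byte_index = number_of_blocks_size; byte_index > 0; byte_index--) {
        number_of_blocks = (number_of_blocks << 8) | data[byte_index];
    }
    const uint8_t *block_number_data = &data[1 + number_of_blocks_size];
    uint64_t block_number = 0;
    for (size_t byte_index = block_number_size; byte_index > 0; byte_index--) {
        block_number = (block_number << 8) | block_number_data[byte_index - 1];
    }
    /* The cluster block number is signed: extend the sign of its last byte */
    if (block_number_size > 0 && block_number_size < 8
     && (block_number_data[block_number_size - 1] & 0x80) != 0) {
        block_number |= UINT64_MAX << (8 * block_number_size);
    }
    data_run->number_of_cluster_blocks = number_of_blocks;
    data_run->relative_cluster_block_number = (int64_t)block_number;
    data_run->is_sparse = (block_number_size == 0);
    return (int)(1 + number_of_blocks_size + block_number_size);
}
```

For example the data run 31 08 48 d8 01 decodes to 8 cluster blocks at relative cluster block number 120904, and 01 0f to 15 sparse cluster blocks, as in the output above.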

References

Assorted formats

Property list (plist) format

The property list (plist) formats are used to store various kinds of data, for example configuration data. The format is known to be used stand-alone as well as embedded in other data formats.

Overview

Known plist formats are:

  • ASCII plist format
  • Binary plist format
  • XML plist format

TODO: What about other plist formats like JSON?

Value types

| Type | Description |
| --- | --- |
| array | Collection of plist values without key |
| boolean | Boolean value |
| data | Binary data |
| date | Date and time value |
| dictionary | Collection of plist values with key |
| integer | Signed integer value |
| real | Floating-point value |
| string | String value |

ASCII plist format

TODO: complete section

Binary plist format

A binary plist file consists of:

  • header
  • object table
  • offset table
  • trailer

| Characteristics | Description |
| --- | --- |
| Byte order | big-endian |
| Date and time values | Number of seconds since Jan 1, 2001 00:00:00 UTC |
| Character strings | ASCII or UTF-16 big-endian |
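
A minimal C sketch of converting a binary plist date and time value to a POSIX timestamp. It assumes the host uses IEEE 754 binary64 for double; 978307200 is the number of seconds between Jan 1, 1970 and Jan 1, 2001 00:00:00 UTC. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Number of seconds between Jan 1, 1970 and Jan 1, 2001 00:00:00 UTC */
#define BPLIST_EPOCH_OFFSET 978307200.0

/* Converts the 8-byte big-endian floating-point value that follows a date
 * object marker (0x33) into a POSIX timestamp in seconds */
double bplist_date_to_posix_time(const uint8_t *data)
{
    uint64_t bits = 0;
    double seconds_since_2001 = 0.0;

    for (int byte_index = 0; byte_index < 8; byte_index++) {
        bits = (bits << 8) | data[byte_index];
    }
    /* Assumes double is an IEEE 754 binary64 value with the same byte order as uint64_t */
    memcpy(&seconds_since_2001, &bits, sizeof(seconds_since_2001));
    return seconds_since_2001 + BPLIST_EPOCH_OFFSET;
}
```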

Binary plist header

The binary plist header (CFBinaryPlistHeader) is 8 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 6 | "bplist" | Signature |
| 6 | 2 | | Format version |

Format versions

| Version | Description |
| --- | --- |
| "00" | Supported as of Tiger |
| "01" | Supported as of Leopard |
| "0x" | Supported as of Snow Leopard, where x is any character |
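
A minimal C sketch of checking the binary plist header. It treats any format version starting with "0" as supported, following the table above; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if data starts with the "bplist" signature followed by a format
 * version listed in the table above ("00", "01" or "0x"), 0 otherwise */
int bplist_has_supported_header(const uint8_t *data, size_t data_size)
{
    if (data_size < 8 || memcmp(data, "bplist", 6) != 0) {
        return 0;
    }
    /* All known format versions start with the character '0' */
    return data[6] == '0';
}
```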

Object table

The object table consists of:

  • zero or more objects

Objects are of variable size and consist of:

  • an object marker byte
  • (optional) object data

Object marker byte

| Value | Identifier | Description |
| --- | --- | --- |
| 0x00 | kCFBinaryPlistMarkerNull | Empty value (NULL) |
| 0x08 | kCFBinaryPlistMarkerFalse | Boolean False |
| 0x09 | kCFBinaryPlistMarkerTrue | Boolean True |
| 0x0f | kCFBinaryPlistMarkerFill | Unknown (Fill byte?) |
| 0x1# | kCFBinaryPlistMarkerInt | Integer, where 2^# is the number of bytes |
| 0x2# | kCFBinaryPlistMarkerReal | Floating point, where 2^# is the number of bytes |
| 0x33 | kCFBinaryPlistMarkerDate | Date and time value, which is stored as a 64-bit floating point that contains the number of seconds since Jan 1, 2001 00:00:00 UTC |
| 0x4# | kCFBinaryPlistMarkerData | Binary data, where # is the number of bytes. If # is 15 then the object marker byte is followed by an integer object that contains the size of the data |
| 0x5# | kCFBinaryPlistMarkerASCIIString | ASCII string, where # is the number of characters. If # is 15 then the object marker byte is followed by an integer object that contains the number of characters in the string. The string is stored in ASCII (with codepage?) without an end-of-string marker |
| 0x6# | kCFBinaryPlistMarkerUnicode16String | Unicode string, where # is the number of characters. If # is 15 then the object marker byte is followed by an integer object that contains the number of characters in the string. The string is stored in UTF-16 big-endian without an end-of-string marker |
| 0x7# | | Unused |
| 0x8# | kCFBinaryPlistMarkerUID | UID, where # + 1 is the number of bytes |
| 0x9# | | Unused |
| 0xa# | kCFBinaryPlistMarkerArray | Array of objects, where # is the number of elements. If # is 15 then the object marker byte is followed by an integer object that contains the number of elements in the array |
| 0xb# | | Unused |
| 0xc# | kCFBinaryPlistMarkerSet | Set of objects, where # is the number of elements. If # is 15 then the object marker byte is followed by an integer object that contains the number of elements in the set |
| 0xd# | kCFBinaryPlistMarkerDict | Dictionary of key value pairs, where # is the number of key value pairs. If # is 15 then the object marker byte is followed by an integer object that contains the number of key value pairs in the dictionary |
| 0xe# | | Unused |
| 0xf# | | Unused |
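
For the object types whose marker contains a count, a count of 15 means the actual count is stored in an integer object that directly follows the marker byte. A minimal C sketch of determining the count, assuming the integer object is an unsigned big-endian value of at most 8 bytes; this sketch also assumes the same applies to binary data (0x4#). The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads a big-endian unsigned integer of value_size (1 to 8) bytes */
static uint64_t read_be_uint(const uint8_t *data, size_t value_size)
{
    uint64_t value = 0;

    for (size_t byte_index = 0; byte_index < value_size; byte_index++) {
        value = (value << 8) | data[byte_index];
    }
    return value;
}

/* Determines the count of a data, string, array, set or dictionary object
 * and stores the offset of the data following the marker and count in
 * data_offset. Returns 0 on success or -1 on error. */
int bplist_read_object_count(const uint8_t *data, size_t data_size,
                             uint64_t *count, size_t *data_offset)
{
    if (data_size < 1) {
        return -1;
    }
    uint8_t marker_type = data[0] >> 4;
    uint8_t marker_count = data[0] & 0x0f;

    if (marker_type != 0x4 && marker_type != 0x5 && marker_type != 0x6
     && marker_type != 0xa && marker_type != 0xc && marker_type != 0xd) {
        return -1;
    }
    if (marker_count != 15) {
        *count = marker_count;
        *data_offset = 1;
        return 0;
    }
    /* A count of 15 means an integer object (0x1#) with the count follows */
    if (data_size < 2 || (data[1] >> 4) != 0x1) {
        return -1;
    }
    size_t integer_size = (size_t)1 << (data[1] & 0x0f);

    if (integer_size > 8 || data_size - 2 < integer_size) {
        return -1;
    }
    *count = read_be_uint(&data[2], integer_size);
    *data_offset = 2 + integer_size;
    return 0;
}
```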

Array object

The array object consists of:

  • array object marker with number of elements
  • array of object references that identify the element objects.
  • the element object data

The byte size of the object reference is defined in the trailer. An object reference of 0 refers to the first object in the (object) offset table.

Set object

The set object consists of:

  • set object marker with number of elements
  • array of object references that identify the element objects.
  • the element object data

The byte size of the object reference is defined in the trailer. An object reference of 0 refers to the first object in the (object) offset table.

Dictionary object

The dictionary object consists of:

  • dictionary object marker with number of key and value pairs
  • array of key references that identify key objects.
  • array of object references that identify the value objects.
  • the key/value object data

The byte size of the key and object references is defined in the trailer. A key or object reference of 0 refers to the first object in the (object) offset table.
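
A minimal C sketch of retrieving the key and value references of a single key value pair of a dictionary object. Each reference is treated as an unsigned big-endian integer of the reference byte size from the trailer, which is used as an index into the (object) offset table; the function names are hypothetical and bounds checking is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads a big-endian unsigned integer of value_size (1 to 8) bytes */
static uint64_t read_be_uint(const uint8_t *data, size_t value_size)
{
    uint64_t value = 0;

    for (size_t byte_index = 0; byte_index < value_size; byte_index++) {
        value = (value << 8) | data[byte_index];
    }
    return value;
}

/* Retrieves the key and value references of the pair at pair_index.
 * references points to the first key reference, number_of_pairs comes from
 * the object marker and reference_size from the trailer. The caller must
 * ensure 2 * number_of_pairs * reference_size bytes are available. */
void bplist_get_dictionary_references(const uint8_t *references,
                                      uint64_t number_of_pairs,
                                      uint8_t reference_size,
                                      uint64_t pair_index,
                                      uint64_t *key_reference,
                                      uint64_t *value_reference)
{
    /* All key references are stored first, followed by all value references */
    *key_reference = read_be_uint(&references[pair_index * reference_size], reference_size);
    *value_reference = read_be_uint(&references[(number_of_pairs + pair_index) * reference_size],
                                    reference_size);
}
```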

(Object) offset table

The offset table consists of an array of offsets. The trailer defines:

  • The location of the offset table
  • The offset byte size
  • The number of offsets in the table

The offset values are relative to the start of the file.

Binary plist trailer

The binary plist trailer (CFBinaryPlistTrailer) is 32 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 5 x 1 | 0 | Unknown (0-byte values) |
| 5 | 1 | 0 | Unknown (Sort version) |
| 6 | 1 | | Offset byte size |
| 7 | 1 | | Key and object reference byte size |
| 8 | 8 | | Number of objects |
| 16 | 8 | | Root (or top-level) object |
| 24 | 8 | | Offset table offset, where the offset is relative to the start of the file |
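
A minimal C sketch of reading the trailer from the last 32 bytes of a binary plist and looking up the file offset of an object in the (object) offset table. This sketch uses object references as 0-based indexes into the offset table, which is an assumption; the type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representation of the binary plist trailer */
typedef struct {
    uint8_t offset_size;
    uint8_t reference_size;
    uint64_t number_of_objects;
    uint64_t root_object;
    uint64_t offset_table_offset;
} bplist_trailer_t;

/* Reads a big-endian unsigned integer of value_size (1 to 8) bytes */
static uint64_t read_be_uint(const uint8_t *data, size_t value_size)
{
    uint64_t value = 0;

    for (size_t byte_index = 0; byte_index < value_size; byte_index++) {
        value = (value << 8) | data[byte_index];
    }
    return value;
}

/* Reads the trailer from the last 32 bytes of the file data.
 * Returns 0 on success or -1 on error. */
int bplist_read_trailer(const uint8_t *file_data, size_t file_size, bplist_trailer_t *trailer)
{
    if (file_size < 8 + 32) {
        return -1;
    }
    const uint8_t *trailer_data = &file_data[file_size - 32];

    trailer->offset_size = trailer_data[6];
    trailer->reference_size = trailer_data[7];
    trailer->number_of_objects = read_be_uint(&trailer_data[8], 8);
    trailer->root_object = read_be_uint(&trailer_data[16], 8);
    trailer->offset_table_offset = read_be_uint(&trailer_data[24], 8);

    if (trailer->offset_size < 1 || trailer->offset_size > 8
     || trailer->reference_size < 1 || trailer->reference_size > 8) {
        return -1;
    }
    return 0;
}

/* Looks up the file offset of the object identified by object_reference,
 * assuming the trailer was read successfully from the same file data.
 * Returns 0 on success or -1 on error. */
int bplist_get_object_offset(const uint8_t *file_data, size_t file_size,
                             const bplist_trailer_t *trailer,
                             uint64_t object_reference, uint64_t *object_offset)
{
    uint64_t offset_table_end = file_size - 32;

    if (object_reference >= trailer->number_of_objects
     || trailer->offset_table_offset > offset_table_end
     || object_reference >= (offset_table_end - trailer->offset_table_offset) / trailer->offset_size) {
        return -1;
    }
    *object_offset = read_be_uint(
        &file_data[trailer->offset_table_offset + object_reference * trailer->offset_size],
        trailer->offset_size);
    return 0;
}
```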

XML plist format

An XML plist file consists of:

  • optional XML declaration
  • optional Document Type Definition (DTD)
  • plist root XML element
  • key-value pair XML elements

For example:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist SYSTEM "file://localhost/System/Library/DTDs/PropertyList.dtd">
<plist version="1.0">
...
</plist>

Zlib compressed data

Zlib compression is commonly used in file formats. The zlib compressed data format, as defined in RFC1950, allows for multiple compression methods but only the Deflate compression method, which combines LZ77 with Huffman coding, is used.

Overview

Zlib compressed data consists of:

  • data header
  • compressed data
  • Adler-32 checksum of the uncompressed data

Characteristics

| Characteristics | Description |
| --- | --- |
| Byte order | big-endian |

Data header

The data header is 2 or 6 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| The bit values are stored as 8-bit values | | | |
| 0.0 | 4 bits | | Compression method |
| 0.4 | 4 bits | | Compression information |
| Flags | | | |
| 1.0 | 5 bits | | Check bits |
| 1.5 | 1 bit | | Preset dictionary flag |
| 1.6 | 2 bits | | Compression level. The compression level is used mainly for re-compression |
| If the preset dictionary flag is set | | | |
| 2 | 4 | | Preset dictionary identifier, which contains an Adler-32 used to identify the preset dictionary |
| Common | | | |
| ... | ... | | Compressed data |
| ... | 4 | | Checksum, which contains an Adler-32 of the uncompressed data |

The check bits value must be such that when the first 2 bytes are represented as a 16-bit unsigned integer in big-endian byte order the value is a multiple of 31, such that:

((first * 256) + second) % 31 = 0
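
A minimal C sketch of validating and parsing the data header, including the check bits test above and the window size derived from the compression information. It only accepts compression method 8 (Deflate); the type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representation of a zlib data header */
typedef struct {
    uint8_t compression_method;
    uint32_t window_size;
    uint8_t has_preset_dictionary;
    uint8_t compression_level;
    uint32_t preset_dictionary_identifier;
    size_t header_size;
} zlib_header_t;

/* Parses the zlib data header at the start of data.
 * Returns 0 on success or -1 if the header is not supported or invalid. */
int zlib_read_header(const uint8_t *data, size_t data_size, zlib_header_t *header)
{
    if (data_size < 2) {
        return -1;
    }
    /* The first 2 bytes, as a big-endian 16-bit value, must be a multiple of 31 */
    if ((((uint32_t)data[0] << 8) | data[1]) % 31 != 0) {
        return -1;
    }
    uint8_t compression_information = data[0] >> 4;

    header->compression_method = data[0] & 0x0f;
    if (header->compression_method != 8 || compression_information > 7) {
        return -1;
    }
    header->window_size = (uint32_t)1 << (compression_information + 8);
    header->has_preset_dictionary = (data[1] >> 5) & 0x01;
    header->compression_level = data[1] >> 6;
    header->preset_dictionary_identifier = 0;
    header->header_size = 2;

    if (header->has_preset_dictionary != 0) {
        if (data_size < 6) {
            return -1;
        }
        /* The preset dictionary identifier is a big-endian Adler-32 */
        header->preset_dictionary_identifier = ((uint32_t)data[2] << 24) | ((uint32_t)data[3] << 16)
                                             | ((uint32_t)data[4] << 8) | data[5];
        header->header_size = 6;
    }
    return 0;
}
```

For example the common header bytes 0x78 0x9c pass the check bits test (0x789c = 30876 = 996 x 31) and decode to Deflate with a 32768 bytes window and the default compression level.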

Compression method

| Value | Identifier | Description |
| --- | --- | --- |
| 8 | | Deflate (RFC1951), with a maximum window size of 32 KiB |
| 15 | | Reserved for additional header data |

Note that RFC1950 only defines 8 as a valid compression method.

Compression information

The value of the compression information is dependent on the compression method.

Compression information - compression method 8 (Deflate)

For compression method 8 (Deflate) the compression information contains the base-2 logarithm of the LZ77 window size minus 8.

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0.0 | 4 bits | | Window size, which contains the base-2 logarithm of the window size (2^n) minus 8, with a maximum value of 7 (32 KiB) |

To determine the corresponding window size:

1 << ( compression_information + 8 )

E.g. a compression information value of 7 indicates a window size of 1 << ( 7 + 8 ) = 32768 bytes. Values larger than 7 are not allowed according to RFC1950 and thus the maximum window size is 32768 bytes.

Compression level

| Value | Identifier | Description |
| --- | --- | --- |
| 0 | | Fastest |
| 1 | | Fast |
| 2 | | Default |
| 3 | | Slowest, maximum compression |

Compressed data

Deflate compressed data

The deflate compressed data consists of one or more deflate compressed blocks. Each block consists of:

  • block header
  • block data

Note that a block can refer back to uncompressed data that was produced by a previous block.

Block header

The block header is 3 bits in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 1 bit | | Last block (in stream) marker, where 1 represents the last block and 0 otherwise |
| 0.1 | 2 bits | | Block type |

Block types

| Value | Identifier | Description |
| --- | --- | --- |
| 0 | | Uncompressed (or stored) block |
| 1 | | Fixed Huffman compressed block |
| 2 | | Dynamic Huffman compressed block |
| 3 | | Reserved (not used) |

Uncompressed block data

The uncompressed block data is of variable size and consists of:

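| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0.3 | 5 bits | | Empty values (not used), which pad up to the next byte boundary |
| 1 | 2 | | Uncompressed data size, stored little-endian |
| 3 | 2 | | Copy of uncompressed data size, which contains the ones' complement of the uncompressed data size, stored little-endian |
| 5 | ... | | Uncompressed data |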

The uncompressed data size can range between 0 and 65535 bytes.
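
A minimal C sketch of reading the sizes of an uncompressed block, starting at the first byte after the empty values. It treats both sizes as little-endian (least-significant byte first), which is how RFC1951 stores multi-byte values in Deflate, unlike the big-endian byte order of the zlib header; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads the sizes of an uncompressed block, where data points to the first
 * byte after the empty values. Returns the uncompressed data size, or -1 if
 * the size and its copy do not match or the data is too small. */
long deflate_read_stored_block_size(const uint8_t *data, size_t data_size)
{
    if (data_size < 4) {
        return -1;
    }
    /* Both 16-bit values are stored least-significant byte first */
    uint16_t size = (uint16_t)(data[0] | (data[1] << 8));
    uint16_t size_copy = (uint16_t)(data[2] | (data[3] << 8));

    /* The copy contains the ones' complement of the size */
    if ((uint16_t)~size_copy != size) {
        return -1;
    }
    if (data_size - 4 < size) {
        return -1;
    }
    return (long)size;
}
```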

Huffman compressed block data

The Huffman compressed block data is of variable size and consists of:

  • Optional dynamic Huffman table, which is only present in a dynamic Huffman compressed block
  • Encoded bit-stream
  • End-of-block (or end-of-stream or end-of-data) marker

Dynamic Huffman table

The dynamic Huffman table consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0.3 | 5 bits | | Number of literal codes, which is value + 257. The number of literal codes ranges from 257 to 286 |
| 1.0 | 5 bits | | Number of distance codes, which is value + 1. The number of distance codes ranges from 1 to 32, although distance codes 30 and 31 are not used |
| 1.5 | 4 bits | | The number of Huffman codes for the code sizes, which is value + 4 |
| 2.1 | ... | | The code sizes |
| ... | ... | | Huffman encoded stream of the Huffman codes for the literals |
| ... | ... | | Huffman encoded stream of the Huffman codes for the distances |

A single code size value is 3 bits in size. A value of 0 means the code size is not used in the Huffman encoding of the literal and distance codes.

The code size values are stored in the following sequence:

16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15

The first value applies to a code size of 16, the second to 17, etc. Code sizes that are not stored default to 0.

The code size values are used to construct the code sizes Huffman table. This must be a complete Huffman table, which is used to decode the code sizes of the literal and distance codes. The corresponding code sizes Huffman encoding is defined as:

| Value | Identifier | Description |
| --- | --- | --- |
| 0 - 15 | | Represents a code size of 0 - 15 |
| 16 | | Copy the previous code size 3 - 6 times. The next 2 bits indicate the repeat length (0 = 3, ... , 3 = 6), e.g. codes 8, 16 (+2 bits 11), 16 (+2 bits 10) will expand to 12 code lengths of 8 (1 + 6 + 5) |
| 17 | | Repeat a code length of 0 for 3 - 10 times (3 bits of length) |
| 18 | | Repeat a code length of 0 for 11 - 138 times (7 bits of length) |

The code sizes of both the literal and distance Huffman codes are stored Huffman encoded using the code sizes Huffman table. Code sizes that are not stored default to 0. The code size of literal code 256 (end-of-block) must be set and thus cannot be 0.
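
Turning code sizes into Huffman codes follows the canonical assignment of RFC1951 section 3.2.2: shorter codes sort before longer codes and codes of the same size are assigned consecutively in symbol order. A minimal C sketch, which does not check whether the resulting Huffman table is complete; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAXIMUM_CODE_SIZE 15

/* Assigns canonical Huffman codes based on the code size of every symbol.
 * Symbols with a code size of 0 are not used and get a code of 0.
 * Returns 0 on success or -1 if a code size is larger than 15. */
int huffman_assign_codes(const uint8_t *code_sizes, size_t number_of_symbols, uint16_t *codes)
{
    uint16_t code_size_counts[MAXIMUM_CODE_SIZE + 1] = { 0 };
    uint16_t next_codes[MAXIMUM_CODE_SIZE + 1] = { 0 };
    uint32_t code = 0;

    /* Count how many symbols use each code size */
    for (size_t symbol = 0; symbol < number_of_symbols; symbol++) {
        if (code_sizes[symbol] > MAXIMUM_CODE_SIZE) {
            return -1;
        }
        code_size_counts[code_sizes[symbol]]++;
    }
    code_size_counts[0] = 0;

    /* Determine the first code of each code size */
    for (int code_size = 1; code_size <= MAXIMUM_CODE_SIZE; code_size++) {
        code = (code + code_size_counts[code_size - 1]) << 1;
        next_codes[code_size] = (uint16_t)code;
    }
    /* Assign consecutive codes to symbols of the same code size */
    for (size_t symbol = 0; symbol < number_of_symbols; symbol++) {
        uint8_t code_size = code_sizes[symbol];

        codes[symbol] = (code_size != 0) ? next_codes[code_size]++ : 0;
    }
    return 0;
}
```

For the code sizes 3, 3, 3, 3, 3, 2, 4, 4 this assigns the codes 010, 011, 100, 101, 110, 00, 1110 and 1111 (binary).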

Encoded bit-stream

The encoded bit-stream is stored in 8-bit integers (bytes), where bit values are stored back-to-front: bits are read starting with the least-significant bit (LSB) of each byte. So the 3 least-significant bits of the first byte represent a 3-bit value at the start of the bit-stream. Note that the LSB of the 3-bit value is the LSB of the byte value.
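
A minimal C sketch of an LSB-first bit reader. Values such as the block header, the additional bits and the code sizes can be read this way; Huffman codes themselves are packed starting with their most-significant bit, so a decoder typically reads those one bit at a time. The type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSB-first bit reader over a byte buffer */
typedef struct {
    const uint8_t *data;
    size_t data_size;
    size_t byte_offset;
    uint8_t bit_offset; /* next bit to read in the current byte, 0 is the LSB */
} bit_reader_t;

/* Reads number_of_bits (at most 32) bits, where the first bit read becomes
 * the LSB of the value. Returns 0 on success or -1 at the end of the data. */
int bit_reader_get_bits(bit_reader_t *reader, uint8_t number_of_bits, uint32_t *value)
{
    uint32_t result = 0;

    for (uint8_t bit_index = 0; bit_index < number_of_bits; bit_index++) {
        if (reader->byte_offset >= reader->data_size) {
            return -1;
        }
        uint32_t bit = (reader->data[reader->byte_offset] >> reader->bit_offset) & 0x01;

        result |= bit << bit_index;
        reader->bit_offset++;
        if (reader->bit_offset == 8) {
            reader->bit_offset = 0;
            reader->byte_offset++;
        }
    }
    *value = result;
    return 0;
}
```

For example, for a first byte of 0x8d, reading 1 bit and then 2 bits yields a last block marker of 1 and a block type of 2 (dynamic Huffman compressed block).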

Deflate uses a Huffman tree of 288 Huffman codes (or symbols) where the values:

  • 0 - 255; represent the literal byte values: 0 - 255
  • 256: represents the end of (compressed) stream (or block)
  • 257 - 285 (combined with extra-bits): represent a (size, offset) tuple (or match length) of 3 - 258 bytes
  • 286, 287: are not used (reserved) and their use is considered illegal although the values are still part of the tree

This document refers to this Huffman tree as the literals Huffman tree.

The bits in the encoded bit-stream correspond to values in the literals Huffman tree. If a symbol is found that represents a compression size and offset tuple (or match length code), the bits following the literals symbol contain the additional (or extra) bits that store the length (or size), if the match length code requires any, followed by a distance (Huffman) code.

The distances Huffman tree contains space for 32 symbols. See section Distance codes. The distance code might require additional (or extra) bits to store the distance.

Literal codes

The literal codes consist of:

| Value | Additional bits | Description |
| --- | --- | --- |
| 0x00 – 0xff | | Literal byte values |
| 0x100 | | End-of-block marker |
| 0x101 | 0 | Size of 3 |
| 0x102 | 0 | Size of 4 |
| 0x103 | 0 | Size of 5 |
| 0x104 | 0 | Size of 6 |
| 0x105 | 0 | Size of 7 |
| 0x106 | 0 | Size of 8 |
| 0x107 | 0 | Size of 9 |
| 0x108 | 0 | Size of 10 |
| 0x109 | 1 | Size of 11 to 12 |
| 0x10a | 1 | Size of 13 to 14 |
| 0x10b | 1 | Size of 15 to 16 |
| 0x10c | 1 | Size of 17 to 18 |
| 0x10d | 2 | Size of 19 to 22 |
| 0x10e | 2 | Size of 23 to 26 |
| 0x10f | 2 | Size of 27 to 30 |
| 0x110 | 2 | Size of 31 to 34 |
| 0x111 | 3 | Size of 35 to 42 |
| 0x112 | 3 | Size of 43 to 50 |
| 0x113 | 3 | Size of 51 to 58 |
| 0x114 | 3 | Size of 59 to 66 |
| 0x115 | 4 | Size of 67 to 82 |
| 0x116 | 4 | Size of 83 to 98 |
| 0x117 | 4 | Size of 99 to 114 |
| 0x118 | 4 | Size of 115 to 130 |
| 0x119 | 5 | Size of 131 to 162 |
| 0x11a | 5 | Size of 163 to 194 |
| 0x11b | 5 | Size of 195 to 226 |
| 0x11c | 5 | Size of 227 to 257 |
| 0x11d | 0 | Size of 258 |

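
A minimal C sketch of looking up the base size and number of additional bits of a match length code, based on the table above; the match size is the base size plus the value of the additional bits. The array and function names are hypothetical.

```c
#include <stdint.h>

/* Base sizes of the literal codes 0x101 - 0x11d */
static const uint16_t match_base_sizes[29] = {
    3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
    35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258 };

/* Number of additional bits of the literal codes 0x101 - 0x11d */
static const uint8_t match_additional_bits[29] = {
    0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
    3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0 };

/* Looks up the base size and number of additional bits of a literal code.
 * Returns 0 on success or -1 if the literal code is not a match length code. */
int deflate_get_match_size_code(uint16_t literal_code, uint16_t *base_size,
                                uint8_t *number_of_additional_bits)
{
    if (literal_code < 0x101 || literal_code > 0x11d) {
        return -1;
    }
    *base_size = match_base_sizes[literal_code - 0x101];
    *number_of_additional_bits = match_additional_bits[literal_code - 0x101];
    return 0;
}
```

For example literal code 0x10d has a base size of 19 and 2 additional bits, so additional bits with the value 3 represent a match size of 22.
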
Distance codes

The distance codes consist of:

| Value | Additional bits | Description |
| --- | --- | --- |
| 0 | 0 | Distance of 1 |
| 1 | 0 | Distance of 2 |
| 2 | 0 | Distance of 3 |
| 3 | 0 | Distance of 4 |
| 4 | 1 | Distance of 5 - 6 |
| 5 | 1 | Distance of 7 - 8 |
| 6 | 2 | Distance of 9 - 12 |
| 7 | 2 | Distance of 13 - 16 |
| 8 | 3 | Distance of 17 - 24 |
| 9 | 3 | Distance of 25 - 32 |
| 10 | 4 | Distance of 33 - 48 |
| 11 | 4 | Distance of 49 - 64 |
| 12 | 5 | Distance of 65 - 96 |
| 13 | 5 | Distance of 97 - 128 |
| 14 | 6 | Distance of 129 - 192 |
| 15 | 6 | Distance of 193 - 256 |
| 16 | 7 | Distance of 257 - 384 |
| 17 | 7 | Distance of 385 - 512 |
| 18 | 8 | Distance of 513 - 768 |
| 19 | 8 | Distance of 769 - 1024 |
| 20 | 9 | Distance of 1025 - 1536 |
| 21 | 9 | Distance of 1537 - 2048 |
| 22 | 10 | Distance of 2049 - 3072 |
| 23 | 10 | Distance of 3073 - 4096 |
| 24 | 11 | Distance of 4097 - 6144 |
| 25 | 11 | Distance of 6145 - 8192 |
| 26 | 12 | Distance of 8193 - 12288 |
| 27 | 12 | Distance of 12289 - 16384 |
| 28 | 13 | Distance of 16385 - 24576 |
| 29 | 13 | Distance of 24577 - 32768 |
| 30 - 31 | | Not used, reserved and illegal but still part of the tree |
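
Instead of storing the distance table, the base distance and number of additional bits can be derived from the pattern in the table above: codes 0 to 3 have no additional bits, and every following pair of codes adds one additional bit. A minimal C sketch with a hypothetical function name:

```c
#include <stdint.h>

/* Derives the base distance and number of additional bits of a distance
 * code. Returns 0 on success or -1 for the reserved codes 30 and 31. */
int deflate_get_distance_code(uint8_t code, uint16_t *base_distance,
                              uint8_t *number_of_additional_bits)
{
    if (code > 29) {
        return -1;
    }
    if (code < 4) {
        *base_distance = (uint16_t)(code + 1);
        *number_of_additional_bits = 0;
    } else {
        /* Each pair of codes doubles the base distance of the previous pair */
        uint8_t bits = (uint8_t)(code / 2 - 1);

        *base_distance = (uint16_t)(((2u + (code & 1u)) << bits) + 1u);
        *number_of_additional_bits = bits;
    }
    return 0;
}
```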

TODO: complete this section

Additional bits

The additional bits are stored least-significant bit first (LSB first), unlike the Huffman codes, which are stored most-significant bit first. Their value is added to the base size or distance of the corresponding code (base size + additional size).

| Value | Additional bits | Description |
| --- | --- | --- |
| 0 | 0 | Offset of 1 |
| 1 | 0 | Offset of 2 |
| 2 | 0 | Offset of 3 |
| 3 | 0 | Offset of 4 |
| ... | 1 | ... |

TODO: complete this section

Decompression

The decompression in pseudo code:

do
{
    read block_header from input stream

    if( block_header.type == UNCOMPRESSED )
    {
        align with next byte
        read and check block_header.size and block_header.size_copy
        copy block_header.size bytes from input stream to output stream
    }
    else if( block_header.type == RESERVED )
    {
        error
    }
    else
    {
        if( block_header.type == HUFFMAN_FIXED )
        {
            initialize the fixed Huffman trees
        }
        else
        {
            read the dynamic Huffman trees (see Dynamic Huffman table)
        }
        loop (until end-of-block code recognized)
        {
            decode literal/length value from input stream
            if( value < 256 )
            {
                copy value (literal byte) to output stream
            }
            else if( value == 256 )
            {
                break from loop (end-of-block)
            }
            else (value = 257..285)
            {
                read the additional bits of the length from input stream
                decode distance code and its additional bits from input stream

                move backwards distance bytes in the output
                stream, and copy length bytes from this
                position to the output stream.
            }
        }
    }
}
while( block_header.last_block_flag == 0 );
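
The copy step in the pseudo code needs care when the match size is larger than the distance, for example a distance of 1 with a size of 10 repeats the last byte 10 times. Copying byte by byte, rather than with a single memcpy or memmove, handles this overlap. A minimal C sketch with a hypothetical function name, where output holds the data decompressed so far:

```c
#include <stddef.h>
#include <stdint.h>

/* Appends match_size bytes copied from distance bytes before the end of the
 * output data. output_size is the number of bytes decompressed so far and
 * output_capacity the size of the output buffer. Returns the new output
 * size or -1 if the match is invalid or does not fit. */
long deflate_copy_match(uint8_t *output, size_t output_size, size_t output_capacity,
                        size_t distance, size_t match_size)
{
    if (distance == 0 || distance > output_size || match_size > output_capacity - output_size) {
        return -1;
    }
    /* Copy byte by byte: when match_size exceeds distance, the source range
     * overlaps the bytes written by this loop, which repeats them */
    for (size_t byte_index = 0; byte_index < match_size; byte_index++) {
        output[output_size] = output[output_size - distance];
        output_size++;
    }
    return (long)output_size;
}
```

For example, with the output "ab" a match with distance 2 and size 5 extends the output to "abababa".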

Adler-32 checksum

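Zlib provides a highly optimized version of the algorithm provided below. The Adler-32 of new data is calculated with a previous_key of 1, which is the initial Adler-32 value.

#include <stddef.h>
#include <stdint.h>

/* Calculates the Adler-32 of buffer, continuing from previous_key
 * (use 1 as previous_key for the first buffer) */
uint32_t adler32(
          const uint8_t *buffer,
          size_t buffer_size,
          uint32_t previous_key )
{
    /* 65521 (0xfff1) is the largest prime smaller than 65536 */
    const uint32_t modulus = 0xfff1;

    /* 5552 (0x15b0) is the largest number of bytes that can be summed
     * before upper_word could overflow a 32-bit integer */
    const size_t maximum_block_size = 0x15b0;

    size_t buffer_offset = 0;
    uint32_t lower_word  = previous_key & 0xffff;
    uint32_t upper_word  = ( previous_key >> 16 ) & 0xffff;

    while( buffer_offset < buffer_size )
    {
        size_t block_size = buffer_size - buffer_offset;

        if( block_size > maximum_block_size )
        {
            block_size = maximum_block_size;
        }
        while( block_size > 0 )
        {
            lower_word += buffer[ buffer_offset++ ];
            upper_word += lower_word;
            block_size--;
        }
        /* Reduce both sums after every block, including the last one */
        lower_word %= modulus;
        upper_word %= modulus;
    }
    return( ( upper_word << 16 ) | lower_word );
}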

References