Introduction
Keramics provides read-only access to a collection of data formats.
This document is a working document of specifications of the data formats used by the Keramics project. These specifications are based on available documentation and analysis of data samples.
Note that these specifications might differ from authoritative format specifications and are works in progress.
Storage media image formats
A storage media image format is used to store data from storage media devices such as a hard disk, a floppy disk or an optical disk like a CD-ROM or DVD.
Formats
- Expert Witness Compression Format (EWF)
- Expert Witness Compression Format version 2 (EWF2)
- Mac OS sparse bundle
- Mac OS sparse image
- Parallels Disk Image (PDI)
- QEMU Copy-On-Write (QCOW)
- Universal Disk Image Format (UDIF)
- Virtual Hard Disk (VHD)
- Virtual Hard Disk version 2 (VHDX)
- VMware Virtual Disk Format (VMDK)
Expert Witness Compression Format (EWF)
EWF is short for Expert Witness Compression Format. It is a file format used to store storage media images for digital forensic purposes. It is currently widely used in the field of computer forensics by proprietary tools such as EnCase and FTK. The original specification of the format was provided by ASR Data for the SMART application.
The EWF format was succeeded by the Expert Witness Compression Format version 2 in EnCase 7 (EWF2-Ex01 and EWF2-Lx01). EnCase 7 also uses a different version of EWF-L01 than its predecessors.
Overview
The Expert Witness Compression Format (EWF) is used to store:
- storage media images, such as hard disks, USB sticks, optical disks
- individual volumes or partitions
- “physical” RAM and process memory
EWF can store data compressed or uncompressed, as a single image in one or more segment files. Each segment file consists of a standard header, followed by multiple sections. A single section cannot span multiple files. Sections are arranged back-to-back.
Terminology
In this document, the EWF format refers to the original specification by ASR Data. Newer formats, such as that of EnCase, are derived from the original specification and are referred to as EWF-E01, after their default file extension. The Logical File Evidence (LVF) format introduced in EnCase 5, which is also stored in the EWF format, is referred to as EWF-L01. The SMART format is treated separately, to allow for discussion where its implementation differs from the specification by ASR Data, and is referred to as EWF-S01, after its default file extension.
All offsets are relative to the beginning of an individual section, unless otherwise noted. EnCase limits the maximum size of a segment file to 2000 MiB. This is because the offset of a chunk of media data is stored as a 32-bit value of which the most significant bit (MSB) is used as a compression flag. The remaining 31 bits can therefore address about 2048 MiB. In EnCase 6.7 a base offset was added to the table to allow for segment files larger than 2048 MiB.
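The split of such a 32-bit chunk offset value into its compression flag and 31-bit offset can be sketched as follows. The function name is hypothetical, and adding `base_offset` to the 31-bit value is an assumption about how the EnCase 6.7 base offset is applied; the table layout itself is not described in this part of the document.

```python
from typing import Tuple

def decode_chunk_table_entry(entry: int, base_offset: int = 0) -> Tuple[bool, int]:
    """Split a 32-bit chunk offset value into (is_compressed, offset).

    Hypothetical helper; base_offset handling is an assumption.
    """
    if not 0 <= entry <= 0xFFFFFFFF:
        raise ValueError("entry must be a 32-bit unsigned value")
    is_compressed = bool(entry & 0x80000000)  # MSB is the compression flag
    offset = entry & 0x7FFFFFFF               # remaining 31 bits hold the offset
    return is_compressed, base_offset + offset
```

With 31 bits the largest representable offset is 2^31 - 1 bytes, just under 2048 MiB, which is why segment files stay below that size without a base offset.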
The size of a chunk is the sector size (512 bytes by default) multiplied by the block size, which is the number of sectors per chunk (block) (64 sectors by default); with the default values a chunk is 32768 bytes. The data within the EWF format is stored in little-endian. The terms block and chunk are used interchangeably.
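The chunk size calculation, and how a media offset could be mapped onto a chunk, can be sketched as follows. The helper names are hypothetical, and the mapping assumes that chunk N holds the media bytes starting at N times the chunk size.

```python
from typing import Tuple

DEFAULT_BYTES_PER_SECTOR = 512
DEFAULT_SECTORS_PER_CHUNK = 64

def chunk_size(bytes_per_sector: int = DEFAULT_BYTES_PER_SECTOR,
               sectors_per_chunk: int = DEFAULT_SECTORS_PER_CHUNK) -> int:
    """Size of a chunk in bytes: sector size multiplied by sectors per chunk."""
    return bytes_per_sector * sectors_per_chunk

def locate_media_offset(media_offset: int, size_of_chunk: int) -> Tuple[int, int]:
    """Map a media offset to (chunk index, offset within that chunk).

    Assumes chunks are stored in media order, one after the other.
    """
    if media_offset < 0:
        raise ValueError("media offset must be non-negative")
    return divmod(media_offset, size_of_chunk)
```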
Segment file
EWF stores data in one or more segment files (or segments). Each segment file consists of:
- A file header.
- One or more sections.
File header
Each segment file starts with a file header.
EWF defines that the file header consists of 2 parts, namely:
- a signature part
- a fields part
EWF, EWF-E01 and SMART (EWF-S01)
The file header, used by both the EWF-E01 and SMART (EWF-S01) formats, is 13 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | "EVF\x09\x0d\x0a\xff\x00" | Signature |
| 8 | 1 | 0x01 | Start of fields |
| 9 | 2 | | Segment number, which must be 1 or higher |
| 11 | 2 | 0x0000 | End of fields |
The segment number refers to the number of the segment file, starting with 1 for the first file.
Note that this limits a segment file set to a maximum of 65535 (0xffff) files, assuming the value is unsigned.
EWF-L01
The file header, used by the EWF-L01 format, is 13 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | "LVF\x09\x0d\x0a\xff\x00" | Signature |
| 8 | 1 | 0x01 | Start of fields |
| 9 | 2 | | Segment number, which must be 1 or higher |
| 11 | 2 | 0x0000 | End of fields |
The segment number refers to the number of the segment file, starting with 1 for the first file.
Note that this limits a segment file set to a maximum of 65535 (0xffff) files, assuming the value is unsigned.
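Both file header variants share the same 13-byte layout and differ only in the signature, so a reader could parse them with one routine, sketched below. The function name and returned format labels are hypothetical, and reading the segment number as an unsigned little-endian 16-bit value follows the assumption stated above.

```python
import struct
from typing import Tuple

EVF_SIGNATURE = b"EVF\x09\x0d\x0a\xff\x00"  # EWF, EWF-E01 and SMART (EWF-S01)
LVF_SIGNATURE = b"LVF\x09\x0d\x0a\xff\x00"  # EWF-L01

def read_file_header(data: bytes) -> Tuple[str, int]:
    """Parse a 13-byte segment file header into (format name, segment number)."""
    if len(data) < 13:
        raise ValueError("the file header is 13 bytes in size")
    signature = data[0:8]
    if signature == EVF_SIGNATURE:
        format_name = "EWF"
    elif signature == LVF_SIGNATURE:
        format_name = "EWF-L01"
    else:
        raise ValueError("unsupported file header signature")
    # "<" selects little-endian without alignment padding: 1 + 2 + 2 bytes.
    start_of_fields, segment_number, end_of_fields = struct.unpack_from("<BHH", data, 8)
    if start_of_fields != 0x01 or end_of_fields != 0x0000:
        raise ValueError("unexpected start or end of fields value")
    if segment_number < 1:
        raise ValueError("segment number must be 1 or higher")
    return format_name, segment_number
```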
Segment file extensions
The SMART (EWF-S01) and the EWF-E01 formats use a different naming convention for the segment files.
SMART (EWF-S01)
The SMART (EWF-S01) extension naming has two distinct parts.
- The first segment file has the extension ‘.s01’.
- The next segment file has the extension ‘.s02’.
- This will continue up to ‘.s99’.
- After which the next segment file has the extension ‘.saa’.
- The next segment file has the extension ‘.sab’.
- This will continue up to ‘.saz’.
- The next segment file has the extension ‘.sba’.
- This will continue up to ‘.szz’.
- The next segment file has the extension ‘.taa’.
- This will continue up to ‘.zzz’.
- Not confirmed, but other sources report that the naming continues with the extension ‘.{aa’.
Keramics supports extensions up to ‘.zzz’.
EWF-E01
The EWF-E01 extension naming has two distinct parts.
- The first segment file has the extension ‘.E01’.
- The next segment file has the extension ‘.E02’.
- This will continue up to ‘.E99’.
- After which the next segment file has the extension ‘.EAA’.
- The next segment file has the extension ‘.EAB’.
- This will continue up to ‘.EAZ’.
- The next segment file has the extension ‘.EBA’.
- This will continue up to ‘.EZZ’.
- The next segment file has the extension ‘.FAA’.
- This will continue up to ‘.ZZZ’.
- Not confirmed, but other sources report that the naming continues with the extension ‘.[AA’.
Keramics supports extensions up to ‘.ZZZ’.
EWF-L01
The EWF-L01 extension naming has two distinct parts.
- The first segment file has the extension ‘.L01’.
- The next segment file has the extension ‘.L02’.
- This will continue up to ‘.L99’.
- After which the next segment file has the extension ‘.LAA’.
- The next segment file has the extension ‘.LAB’.
- This will continue up to ‘.LAZ’.
- The next segment file has the extension ‘.LBA’.
- This will continue up to ‘.LZZ’.
- The next segment file has the extension ‘.MAA’.
- This will continue up to ‘.ZZZ’.
- Not confirmed, but other sources report that the naming continues with the extension ‘.[AA’.
Keramics supports extensions up to ‘.ZZZ’.
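The three naming schemes follow one pattern: two digits up to 99, then two letters, with the first character incremented each time the letters wrap past "zz". A minimal sketch of that pattern is shown below; the function is hypothetical and stops at ‘.zzz’ or ‘.ZZZ’, mirroring the range Keramics is described as supporting.

```python
def segment_extension(segment_number: int, first_character: str) -> str:
    """Return the extension, such as ".E01" or ".EAA", of a segment file.

    first_character is "s" for SMART (EWF-S01), "E" for EWF-E01 or "L" for EWF-L01.
    """
    if segment_number < 1:
        raise ValueError("segment number must be 1 or higher")
    if segment_number <= 99:
        return f".{first_character}{segment_number:02d}"
    # From segment 100 onwards the digits are replaced by two letters.
    lower = first_character.islower()
    letter_a = ord("a" if lower else "A")
    letter_z = ord("z" if lower else "Z")
    index = segment_number - 100
    first = ord(first_character) + index // (26 * 26)
    if first > letter_z:
        raise ValueError("segment number beyond the last supported extension")
    second = letter_a + (index // 26) % 26
    third = letter_a + index % 26
    return "." + chr(first) + chr(second) + chr(third)
```

For example, segment 776 is the first to increment the first character, giving ‘.FAA’ for EWF-E01, ‘.MAA’ for EWF-L01 and ‘.taa’ for SMART.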
Segment file set identifier GUID
Segment file sets do not have a strict unique identifier. However, the volume section contains a GUID that can be used for this purpose, where:
- linen 5 to 6 use a time and MAC address based version (1) of the GUID
- EnCase 5 to 7 and linen 6 to 7 use a random based version (4) of the GUID
Note that linen 6 switched from a version 1 to a version 4 GUID somewhere between versions 6.01 and 6.19.
See RFC 4122 for more information about the different GUID versions.
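To tell a time and MAC address based (version 1) GUID from a random based (version 4) one, a reader can inspect the RFC 4122 version field. The sketch below uses Python's standard `uuid` module; the byte order in which the volume section stores the GUID is not described here, so the `little_endian` default (Microsoft-style GUID layout) is an assumption, and the function name is hypothetical.

```python
import uuid
from typing import Optional

def guid_version(data: bytes, little_endian: bool = True) -> Optional[int]:
    """Return the RFC 4122 version of a 16-byte GUID, or None if not RFC 4122."""
    if len(data) != 16:
        raise ValueError("a GUID is 16 bytes in size")
    # Assumption: little-endian (Microsoft-style) storage unless stated otherwise.
    guid = uuid.UUID(bytes_le=data) if little_endian else uuid.UUID(bytes=data)
    return guid.version  # None when the variant is not the RFC 4122 variant
```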
The sections
The remainder of the segment file consists of sections. Every section starts with the same data, which will be referred to as the section header.
Section header
The section header is 76 bytes in size and contains information about a specific section.
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 16 | | Section type, a string containing the section type definition, such as "header" or "volume" |
| 16 | 8 | | Next section offset, where the offset is relative to the start of the segment file |
| 24 | 8 | | Section size |
| 32 | 40 | 0x00 | Unknown (Padding) |
| 72 | 4 | | Checksum, which contains an Adler-32 of all the previous data within the section header |
Some sections contain additional data; see Section types for more information.
Note that Expert Witness 1.35 (for Windows) does not set the section size.
Note that in the DOS version of EnCase 2 the padding does not contain 0-byte values but other data, probably because the memory was not filled with 0-byte values.
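A minimal sketch of reading a section header and following the chain of next section offsets is shown below. The names are hypothetical, and several points are assumptions: the first section directly follows the 13-byte file header, the checksum is a standard Adler-32 (as computed by `zlib.adler32`), and a "next" or "done" section, or a next offset that does not advance, ends the chain within a segment file. The section size is not used, since Expert Witness 1.35 does not set it.

```python
import struct
import zlib
from typing import Iterator, NamedTuple

SECTION_HEADER_SIZE = 76

class SectionHeader(NamedTuple):
    offset: int        # offset of this section header within the segment file
    section_type: str  # for example "header", "volume", "table" or "done"
    next_offset: int   # relative to the start of the segment file
    size: int

def read_section_header(data: bytes, offset: int) -> SectionHeader:
    raw = data[offset:offset + SECTION_HEADER_SIZE]
    if len(raw) != SECTION_HEADER_SIZE:
        raise ValueError("truncated section header")
    type_bytes, next_offset, size = struct.unpack_from("<16sQQ", raw, 0)
    (stored_checksum,) = struct.unpack_from("<I", raw, 72)
    # Adler-32 over the 72 bytes that precede the checksum.
    if zlib.adler32(raw[:72]) != stored_checksum:
        raise ValueError("section header checksum mismatch")
    section_type = type_bytes.split(b"\x00", 1)[0].decode("ascii")
    return SectionHeader(offset, section_type, next_offset, size)

def iter_sections(data: bytes, first_offset: int = 13) -> Iterator[SectionHeader]:
    """Yield the section headers of one segment file, in chain order."""
    offset = first_offset
    while True:
        header = read_section_header(data, offset)
        yield header
        # Assumption: "next" and "done" terminate the chain of a segment file.
        if header.section_type in ("next", "done") or header.next_offset <= offset:
            return
        offset = header.next_offset
```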
Section types
There are multiple section types. ASR Data - E01 Compression Format defines the following:
- Header section
- Volume section
- Table section
- Next and Done section
The following section types were found by analyzing more recent EnCase files (EWF-E01):
- Header2 section
- Disk section
- Sectors section
- Table2 section
- Data section
- Error2 section
- Session section
- Hash section
- Digest section
The following section types were found by analyzing more recent EnCase files (EWF-L01):
- Ltree section
- Ltypes section
Header2 section
The header2 section is identified in the section data type field as “header2”. Some aspects of this section are:
- Found in EWF-E01 in EnCase 4 to 7, and EWF-L01 in EnCase 5 to 7
- Found at the start of the first segment file. Not found in subsequent segment files.
- The same header2 section is stored twice, one directly after the other.
The additional data this section contains is the following:
| Offset | Size | Value | Description |
|---|---|---|---|
| 76 (0x4c) | ... | | Information about the acquired media |
The information about the acquired media consists of zlib compressed data. It contains UTF-16 text specifying information about the acquired media. The text consists of multiple lines separated by end-of-line character(s).
The first 2 bytes of the UTF-16 string are the byte order mark (BOM):
- 0xff 0xfe for UTF-16 little-endian
- 0xfe 0xff for UTF-16 big-endian
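Decoding the header2 information therefore takes two steps: zlib decompression, then UTF-16 decoding in the byte order given by the BOM. A minimal sketch is shown below; the function name is hypothetical and it assumes `section_data` holds the compressed data that follows the 76-byte section header.

```python
import zlib

def decode_header2_text(section_data: bytes) -> str:
    """Decompress header2 data and decode its UTF-16 text using the BOM."""
    text_data = zlib.decompress(section_data)
    if text_data[:2] == b"\xff\xfe":
        encoding = "utf-16-le"
    elif text_data[:2] == b"\xfe\xff":
        encoding = "utf-16-be"
    else:
        raise ValueError("missing UTF-16 byte order mark")
    # Skip the 2-byte BOM; the explicit codec then decodes the remainder.
    return text_data[2:].decode(encoding)
```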
In the next paragraphs the various variants of the header2 section are described.
EnCase 4 (EWF-E01)
In EnCase 4 (EWF-E01) the header2 information consists of 5 lines and contains the same information as the header section.
| Line number | Value | Description |
|---|---|---|
| 1 | 1 | The number of categories provided |
| 2 | main | The name/type of the category provided |
| 3 | | Identifiers for the values in the 4th line |
| 4 | | The data for the different identifiers in the 3rd line |
| 5 | | (an empty line) |
The end of line character(s) is a newline (0x0a).
Note that this end-of-line character differs from the one used in the header section.
The 3rd and the 4th line consist of the following tab (0x09) separated values.
| Identifier number | Character in 3rd line | Value in 4th line |
|---|---|---|
| 1 | a | Unique description |
| 2 | c | Case number |
| 3 | n | Evidence number |
| 4 | e | Examiner name |
| 5 | t | Notes |
| 6 | av | Version, which contains the EnCase version used to acquire the media |
| 7 | ov | Platform, which contains the platform/operating system used to acquire the media |
| 8 | m | Acquisition date and time |
| 9 | u | System date and time |
| 10 | p | Password hash |
Also see header2 values
Note that the password hashing algorithm is the same as for the header section.
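Once the text has been split into lines, the main category can be read by pairing the tab separated identifiers of the 3rd line with the values of the 4th line. The sketch below is hypothetical; because the number of identifiers and values can differ between writers, it relies on `zip()`, which stops at the shorter list and so drops surplus entries at the end of either line.

```python
from typing import Dict, List

def parse_main_category(lines: List[str]) -> Dict[str, str]:
    """Map the identifiers of the 3rd line to the values of the 4th line."""
    if len(lines) < 4 or lines[1] != "main":
        raise ValueError("expected a main category in lines 2 to 4")
    identifiers = lines[2].split("\t")  # line 3
    values = lines[3].split("\t")       # line 4
    return dict(zip(identifiers, values))
```

The same routine applies to the header section, provided any carriage return has been stripped from the end of each line first.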
EnCase 5 to 7 (EWF-E01)
In EnCase 5 to 7 (EWF-E01) the header2 information consists of 17 lines:
| Line number | Value | Description |
|---|---|---|
| 1 | 3 | The number of categories provided |
| 2 | main | The name/type of the category provided |
| 3 | | Identifiers for the values in the category |
| 4 | | The data for the different identifiers in the category |
| 5 | | (an empty line) |
| 6 | srce | The name/type of the category provided, also see Sources category |
| 7 | | Two values, namely “0 1” |
| 8 | | Identifiers for the values in the category |
| 9 | | Two values, namely “0 0” |
| 10 | | The data for the different identifiers in the category |
| 11 | | (an empty line) |
| 12 | sub | The name/type of the category provided, also see Subjects category |
| 13 | | Two values, namely “0 1” |
| 14 | | Identifiers for the values in the category |
| 15 | | Two values, namely “0 0” |
| 16 | | The data for the different identifiers in the category |
| 17 | | (an empty line) |
The end of line character(s) is a newline (0x0a).
Main category
The 3rd and the 4th line consist of the following tab (0x09) separated values.
Note the actual values in this category are dependent on the version of EnCase.
| Identifier number | Character in 3rd line | Value in 4th line |
|---|---|---|
| 1 | a | Unique description |
| 2 | c | Case number |
| 3 | n | Evidence number |
| 4 | e | Examiner name |
| 5 | t | Notes |
| 6 | md | The model of the media, such as hard disk model (introduced in EnCase 6) |
| 7 | sn | The serial number of media (introduced in EnCase 6) |
| 8 | l | The device label (introduced in EnCase 6.19) |
| 9 | av | Version, which contains the EnCase version used to acquire the media. EnCase limits this value to 12 characters |
| 10 | ov | Platform, which contains the platform/operating system used to acquire the media |
| 11 | m | Acquisition date and time |
| 12 | u | System date and time |
| 13 | p | Password hash |
| 14 | pid | Process identifier, which contains the identifier of the process memory acquired (introduced in EnCase 6.12/Winen 6.11) |
| 15 | dc | Unknown |
| 16 | ext | Extents, which contains the extents of the process memory acquired (introduced in EnCase 6.12/Winen 6.11) |
Also see header2 values
Note that both the acquisition and system date and time are empty in a file created by winen.
Note that the date values in the header section (not the header2 section) are set to “Thu Jan 1 00:00:00 1970”, where the time depends on the time zone and daylight saving time.
Note that in a header2 section generated by Logicube Dossier an additional empty value was observed in the 4th line. The number of values in the 3rd and 4th lines can therefore differ.
Sources category
The srce category, starting at line 6, contains information about acquisition sources.
TODO: describe what a source is in the context of EnCase.
Line 7 consists of 2 values, namely “0 1”.
The 8th line consists of the following tab (0x09) separated values.
Note that the actual values in this category are dependent on the version of EnCase.
| Identifier number | Character in 8th line | Meaning |
|---|---|---|
| 1 | p | |
| 2 | n | |
| 3 | id | Identifier, which contains an integer identifying the source |
| 4 | ev | Evidence number, which contains a string |
| 5 | tb | Total bytes, which contains an integer |
| 6 | lo | Logical offset, which contains an integer which is -1 when value is not set |
| 7 | po | Physical offset, which contains an integer which is -1 when value is not set |
| 8 | ah | MD5 hash, which contains a string with the MD5 hash of the source |
| 9 | sh | SHA1 hash, contains a string with the SHA1 hash of the source (introduced in EnCase 6.19) |
| 10 | gu | Device GUID, which contains a string with a GUID or "0" if not set |
| 11 | pgu | Primary device GUID, which contains a string with a GUID or "0" if not set (introduced in EnCase 7) |
| 12 | aq | Acquisition date and time, which contains an integer with a POSIX timestamp |
Line 9 consists of 2 values, namely the values are “0 0”.
Line 10 contains the values defined by line 8.
Note that the default values of some of these values changed around EnCase 6.12.
If the “ah” value contains “00000000000000000000000000000000” the MD5 hash is not set. Likewise, if the “sh” value contains “0000000000000000000000000000000000000000” the SHA1 hash is not set.
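A reader could pair the identifiers of line 8 with the values of line 10 and map the documented "not set" markers (-1 offsets, all-zero hashes and a "0" GUID) to a missing value. The sketch below is hypothetical; it takes the header2 text already split into lines, so line N is at index N - 1.

```python
from typing import Dict, List, Optional

UNSET_MD5 = "0" * 32
UNSET_SHA1 = "0" * 40

def parse_source(lines: List[str]) -> Dict[str, Optional[str]]:
    """Pair the srce identifiers of line 8 with the values of line 10."""
    if len(lines) < 10 or lines[5] != "srce":
        raise ValueError("expected a srce category starting at line 6")
    identifiers = lines[7].split("\t")  # line 8
    values = lines[9].split("\t")       # line 10
    source: Dict[str, Optional[str]] = dict(zip(identifiers, values))
    unset_markers = {"lo": "-1", "po": "-1", "ah": UNSET_MD5, "sh": UNSET_SHA1, "gu": "0"}
    for identifier, marker in unset_markers.items():
        if source.get(identifier) == marker:
            source[identifier] = None  # documented as "not set"
    return source
```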
Subjects category
The sub category, starting at line 12, contains information about subjects.
TODO: describe what a subject is in the context of EnCase.
Line 13 consists of 2 values, namely “0 1”.
The 14th line consists of the following tab (0x09) separated values.
| Identifier number | Character in 14th line | Meaning |
|---|---|---|
| 1 | p | |
| 2 | n | |
| 3 | id | Identifier, which contains an integer identifying the subject |
| 4 | nu | Unknown (Number) |
| 5 | co | Unknown (Comment) |
| 6 | gu | Unknown (GUID) |
Line 15 consists of 2 values, namely the values are “0 0”.
Line 16 contains the values defined by line 14.
Note that the default values of some of these values changed around EnCase 6.12.
EnCase 5 to 7 (EWF-L01)
The EnCase 5 to 7 (EWF-E01) header2 section specification also applies to the EnCase 5 to 7 (EWF-L01) format. However:
- both the acquisition and system date and time are not set
Header2 values
| Identifier | Description | Notes |
|---|---|---|
| a | Unique description | Free form string. Note that EnCase might not respond when this value is large e.g. >= 1 MiB |
| av | Version | Free form string. EnCase limits this string to 12 - 1 characters |
| c | Case number | Free form string. EnCase limits this string to 3000 - 1 characters |
| dc | Unknown | |
| e | Examiner name | Free form string. EnCase limits this string to 3000 - 1 characters |
| ext | Extents | Extents header value |
| l | Device label | Free form string |
| m | Acquisition date and time | String containing POSIX 32-bit epoch timestamp, e.g. "1142163845" which represents the date: March 12 2006, 11:44:05 |
| md | Model | Free form string. EnCase limits this string to 3000 - 1 characters |
| n | Evidence number | Free form string. EnCase limits this string to 3000 - 1 characters |
| ov | Platform | Free form string. EnCase limits this string to 24 - 1 characters |
| pid | Process identifier | String containing the process identifier (pid) number |
| p | Password hash | String containing the password hash. If no password is set it should be simply the character '0' |
| sn | Serial Number | Free form string. EnCase limits this string to 3000 - 1 characters |
| t | Notes | Free form string. EnCase limits this string to 3000 - 1 characters |
| u | System date and time | String containing POSIX 32-bit epoch timestamp, e.g. "1142163845" which represents the date: March 12 2006, 11:44:05 |
Note that these restrictions were tested with EnCase 7.02.01; older versions could have a restriction of 40 characters instead of 3000 characters.
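The "m" and "u" header2 values can be converted with a standard POSIX timestamp conversion, as in the hypothetical helper below. Interpreting the value as UTC is an assumption; it is consistent with the example above, since 1142163845 converts to March 12 2006, 11:44:05 UTC. Empty values, as written by winen, are returned as None.

```python
from datetime import datetime, timezone
from typing import Optional

def parse_header2_timestamp(value: str) -> Optional[datetime]:
    """Convert a header2 "m" or "u" value (POSIX timestamp string) to a datetime."""
    if not value:  # for example winen leaves both values empty
        return None
    # Assumption: the timestamp is interpreted as UTC.
    return datetime.fromtimestamp(int(value), tz=timezone.utc)
```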
Extents header value
An extents header value consists of:
- the number of entries
- entries that consist of: S <1> <2> <3>
Header section
The header section is identified in the section data type field as “header”. Some aspects of this section are:
- Defined in ASR Data - E01 Compression Format
- Found in EWF-E01 in EnCase 1 to 7 or linen 5 to 7 or FTK Imager, EWF-L01 in EnCase 5 to 7, and SMART (EWF-S01)
- Found at the start of the first segment file or in EnCase 4 to 7 after the header2 section in the first segment file. Typically not found in subsequent segment files with the exception of Logicube Dossier generated EWF-E01 files.
The additional data this section contains is the following:
| Offset | Size | Value | Description |
|---|---|---|---|
| 76 (0x4c) | ... | | Information about the acquired media |
The information about the acquired media consists of zlib compressed data. It contains ASCII text specifying information about the acquired media. The text consists of multiple lines separated by end-of-line character(s).
In the next paragraphs the various variants of the header section are described. In all cases the information consists of at least 4 lines:
| Line number | Value | Description |
|---|---|---|
| 1 | 1 | The number of categories provided |
| 2 | main | The name/type of the category provided |
| 3 | | Identifiers for the values in the 4th line |
| 4 | | The data for the different identifiers in the 3rd line |
An additional 5th line is found in FTK Imager and EnCase 1 to 7 (EWF-E01).
| Line number | Value | Description |
|---|---|---|
| 5 | | (an empty line) |
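Since the variants below use either a newline or a carriage return followed by a newline as end-of-line characters, a reader can split on the newline and strip a trailing carriage return from each line. The sketch below is hypothetical and assumes strictly ASCII text, as described above; zlib data written with compression level 0 is decompressed the same way.

```python
import zlib
from typing import List

def decode_header_lines(section_data: bytes) -> List[str]:
    """Decompress header section data and split it into lines."""
    text = zlib.decompress(section_data).decode("ascii")
    # Handles both "\n" and "\r\n" end-of-line characters.
    return [line.rstrip("\r") for line in text.split("\n")]
```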
EWF format
Some aspects of this section are:
- ASR Data - E01 Compression Format specifies the end of line character(s) is a newline (0x0a).
According to ASR Data - E01 Compression Format the 3rd and the 4th line consist of the following tab (0x09) separated values:
| Identifier number | Character in 3rd line | Value in 4th line |
|---|---|---|
| 1 | c | Case number |
| 2 | n | Evidence number |
| 3 | a | Unique description |
| 4 | e | Examiner name |
| 5 | t | Notes |
| 6 | m | Acquisition date and time |
| 7 | u | System date and time |
| 8 | p | Password hash |
| 9 | r | Compression level |
Also see header values
ASR Data - E01 Compression Format states that Expert Witness Compression uses ‘f’ (fastest compression).
EnCase 1 (EWF-E01)
Some aspects of this section are:
- The header section is defined only once.
- It is the first section of the first segment file. It is not found in subsequent segment files.
- The header data itself is compressed using zlib.
- The end of line character(s) is a carriage return (0x0d) followed by a newline (0x0a).
The 3rd and the 4th line consist of the following tab (0x09) separated values:
| Identifier number | Character in 3rd line | Value in 4th line |
|---|---|---|
| 1 | c | Case number |
| 2 | n | Evidence number |
| 3 | a | Unique description |
| 4 | e | Examiner name |
| 5 | t | Notes |
| 6 | m | Acquisition date and time |
| 7 | u | System date and time |
| 8 | p | Password hash |
| 9 | r | Compression level |
Also see header values
SMART (EWF-S01)
Some aspects of this section are:
- The header section is defined once.
- It is the first section of the first segment file. It is not found in subsequent segment files.
- The header data is always processed by zlib, however with the same compression level as used for the chunks. This could mean compression level 0, which is no compression.
The SMART format uses the FTK Imager (EWF-E01) specification for this section. Note that this could be something FTK Imager specific.
EnCase 2 and 3 (EWF-E01)
Some aspects of this section are:
- The same header section is defined twice.
- It is the first and second section of the first segment file. It is not found in subsequent segment files.
- The header data itself is compressed using zlib.
- The end of line character(s) is a carriage return (0x0d) followed by a newline (0x0a).
The 3rd and the 4th line consist of the following tab (0x09) separated values:
| Identifier number | Character in 3rd line | Value in 4th line |
|---|---|---|
| 1 | c | Case number |
| 2 | n | Evidence number |
| 3 | a | Unique description |
| 4 | e | Examiner name |
| 5 | t | Notes |
| 6 | av | Version, which contains the EnCase version used to acquire the media |
| 7 | ov | Platform, which contains the platform/operating system used to acquire the media |
| 8 | m | Acquisition date and time |
| 9 | u | System date and time |
| 10 | p | Password hash |
| 11 | r | Compression level |
Also see header values
EnCase 4 to 7 (EWF-E01)
Some aspects of this section are:
- The header is defined only once.
- It resides after the header2 sections of the first segment file. It is not found in subsequent segment files.
- The header data itself is compressed using zlib.
- The end of line character(s) is a carriage return (0x0d) followed by a newline (0x0a).
The 3rd and the 4th line consist of the following tab (0x09) separated values:
| Identifier number | Character in 3rd line | Value in 4th line |
|---|---|---|
| 1 | c | Case number |
| 2 | n | Evidence number |
| 3 | a | Unique description |
| 4 | e | Examiner name |
| 5 | t | Notes |
| 6 | av | Version, which contains the EnCase version used to acquire the media |
| 7 | ov | Platform, which contains the platform/operating system used to acquire the media |
| 8 | m | Acquisition date and time |
| 9 | u | System date and time |
| 10 | p | Password hash |
Also see header values
linen 5 to 7 (EWF-E01)
Some aspects of this section are:
- The same header section is defined twice.
- It is the first and second section of the first segment file. It is not found in subsequent segment files.
- The header data itself is compressed using zlib.
- The end of line character(s) is a newline (0x0a).
The header information consists of 18 lines.
The remainder of the string contains the following information:
| Line number | Value | Description |
|---|---|---|
| 1 | 3 | The number of categories provided |
| 2 | main | The name/type of the category provided |
| 3 | | Identifiers for the values in the 4th line |
| 4 | | The data for the different identifiers in the 3rd line |
| 5 | | (an empty line) |
| 6 | srce | The name/type of the category provided, also see Sources category |
| 7 | | Two values, namely “0 1” |
| 8 | | Identifiers for the values in the category |
| 9 | | Two values, namely “0 0” |
| 10 | | The data for the different identifiers in the category |
| 11 | | (an empty line) |
| 12 | sub | The name/type of the category provided, also see Subjects category |
| 13 | | Two values, namely “0 1” |
| 14 | | Identifiers for the values in the category |
| 15 | | Two values, namely “0 0” |
| 16 | | The data for the different identifiers in the category |
| 17 | | (an empty line) |
The end of line character(s) is a newline (0x0a).
Main category - linen 5
The 3rd and the 4th line consist of the following tab (0x09) separated values.
Note the actual values in this category are dependent on the version of linen.
| Identifier number | Character in 3rd line | Value in 4th line |
|---|---|---|
| 1 | a | Unique description |
| 2 | c | Case number |
| 3 | n | Evidence number |
| 4 | e | Examiner name |
| 5 | t | Notes |
| 6 | av | Version, which contains the linen version used to acquire the media |
| 7 | ov | Platform, which contains the platform/operating system used to acquire the media |
| 8 | m | Acquisition date and time |
| 9 | u | System date and time |
| 10 | p | Password hash |
Also see header values
Main category - linen 6 to 7
The 3rd and the 4th line consist of the following tab (0x09) separated values.
Note the actual values in this category are dependent on the version of linen.
| Identifier number | Character in 3rd line | Value in 4th line |
|---|---|---|
| 1 | a | Unique description |
| 2 | c | Case number |
| 3 | n | Evidence number |
| 4 | e | Examiner name |
| 5 | t | Notes |
| 6 | md | The model of the media, such as hard disk model (Introduced in linen 6) |
| 7 | sn | The serial number of media (Introduced in linen 6) |
| 8 | l | The device label (Introduced in linen 6.19) |
| 9 | av | Version, which contains the linen version used to acquire the media |
| 10 | ov | Platform, which contains the platform/operating system used to acquire the media |
| 11 | m | Acquisition date and time |
| 12 | u | System date and time |
| 13 | p | Password hash |
| 14 | pid | Process identifier, which contains the identifier of the process memory acquired (Introduced in linen 6.19 or earlier) |
| 15 | dc | Unknown (Introduced in linen 6) |
| 16 | ext | Extents, which contains the extents of the process memory acquired (Introduced in linen 6.19 or earlier) |
Note that as of linen 6.19 the acquisition date and time is in UTC and the system date and time is in local time, whereas before both values were in local time.
Also see header values
Sources category
The srce category, starting at line 6, contains information about acquisition sources.
TODO: describe what a source is in the context of EnCase.
Line 7 consists of 2 values, namely “0 1”.
The 8th line consists of the following tab (0x09) separated values.
| Identifier number | Character in 8th line | Meaning |
|---|---|---|
| 1 | p | |
| 2 | n | |
| 3 | id | Identifier, which contains an integer identifying the source |
| 4 | ev | Evidence number, which contains a string |
| 5 | tb | Total bytes, which contains an integer |
| 6 | lo | Logical offset, which contains an integer which is -1 when value is not set |
| 7 | po | Physical offset, which contains an integer which is -1 when value is not set |
| 8 | ah | Unknown (MD5?), which contains a string |
| 9 | sh | Unknown (SHA1?), which contains a string (Introduced in linen 6.19 or earlier) |
| 10 | gu | Device GUID, which contains a string with a GUID or "0" if not set |
| 11 | aq | Acquisition date and time, which contains an integer with a POSIX timestamp |
Line 9 consists of 2 values, namely the values are “0 0”.
Line 10 contains the values defined by line 8.
Note that the default values of some of these values changed in linen 6.19 or earlier.
Subjects category
The sub category, starting at line 12, contains information about subjects.
TODO: describe what a subject is in the context of EnCase.
Line 13 consists of 2 values, namely “0 1”.
The 14th line consists of the following tab (0x09) separated values.
| Identifier number | Character in 14th line | Meaning |
|---|---|---|
| 1 | p | |
| 2 | n | |
| 3 | id | Identifier, which contains an integer identifying the subject |
| 4 | nu | Unknown (Number) |
| 5 | co | Unknown (Comment) |
| 6 | gu | Unknown (GUID) |
Line 15 consists of 2 values, namely the values are “0 0”.
Line 16 contains the values defined by line 14.
Note that the default values of some of these values changed in linen 6.19 or earlier.
FTK Imager (EWF-E01)
Some aspects of this section are:
- In FTK Imager (EWF-E01) the same header section is defined twice.
- It is the first and second section of the first segment file. It is not found in subsequent segment files.
- The header data itself is compressed using zlib. Note that the compression level can be none and therefore the header looks uncompressed.
- In FTK Imager the end of line character(s) is a newline (0x0a).
The 3rd and the 4th line consist of the following tab (0x09) separated values:
| Identifier number | Character in 3rd line | Value in 4th line |
|---|---|---|
| 1 | c | Case number |
| 2 | n | Evidence number |
| 3 | a | Unique description |
| 4 | e | Examiner name |
| 5 | t | Notes |
| 6 | av | Version, which contains the FTK Imager version used to acquire the media |
| 7 | ov | Platform, which contains the platform/operating system used to acquire the media |
| 8 | m | Acquisition date and time |
| 9 | u | System date and time |
| 10 | p | Password hash |
| 11 | r | Compression level |
Also see header values
EnCase 5 to 7 (EWF-L01)
The EnCase 4 to 7 (EWF-E01) header section specification is also used for the EnCase 5 to 7 (EWF-L01) format, with the following aspects:
- In EnCase 5 both the acquisition and system date and time are set to 0.
- In EnCase 6 and 7 both the acquisition and system date and time are set to Jan 1, 1970 00:00:00 (the time depends on the local time zone and daylight saving time).
Header values
| Identifier | Description | Notes |
|---|---|---|
| a | Unique description | Free form string. Note that EnCase might not respond when this value is large e.g. >= 1 MiB |
| av | Version | Free form string. EnCase limits this string to 12 - 1 characters |
| c | Case number | Free form string. EnCase limits this string to 3000 - 1 characters |
| dc | Unknown | |
| e | Examiner name | Free form string. EnCase limits this string to 3000 - 1 characters |
| ext | Extents | Extents header value |
| l | Device label | Free form string |
| m | Acquisition date and time | Contains a date and time header value |
| md | Model | Free form string. EnCase limits this string to 3000 - 1 characters |
| n | Evidence number | Free form string. EnCase limits this string to 3000 - 1 characters |
| ov | Platform | Free form string. EnCase limits this string to 24 - 1 characters |
| pid | Process identifier | String containing the process identifier (pid) number |
| p | Password hash | String containing the password hash. If no password is set it should be simply the character '0' |
| r | Compression level | Compression header value |
| sn | Serial Number | Free form string. EnCase limits this string to 3000 - 1 characters |
| t | Notes | Free form string. EnCase limits this string to 3000 - 1 characters |
| u | System date and time | Contains a date and time header value |
Note that these restrictions were tested with EnCase 7.02.01; older versions could have a restriction of 40 characters instead of 3000 characters.
Date and time header value
In EnCase a date and time contains a string of individual values separated by a space, e.g. “2002 3 4 10 19 59”, which represents March 4, 2002 10:19:59.
In linen a date and time contains a string with a POSIX 32-bit epoch timestamp, e.g. “1142163845” which represents the date: March 12 2006, 11:44:05
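A reader that handles both forms could distinguish them by the number of space separated values, as in the hypothetical sketch below. The EnCase form carries no time zone, so it is returned as a naive datetime; treating the linen timestamp as UTC is an assumption, since before linen 6.19 the values were in local time. A value of 0, as written by EnCase 5 (EWF-L01), and other forms such as “Thu Jan 1 00:00:00 1970” are returned as None.

```python
from datetime import datetime, timezone
from typing import Optional

def parse_header_date(value: str) -> Optional[datetime]:
    """Parse a header section "m" or "u" value in EnCase or linen form."""
    parts = value.split()
    if len(parts) == 6:
        # EnCase: "year month day hours minutes seconds"; time zone unknown.
        year, month, day, hours, minutes, seconds = (int(part) for part in parts)
        return datetime(year, month, day, hours, minutes, seconds)
    if len(parts) == 1 and parts[0].isdigit():
        timestamp = int(parts[0])
        if timestamp == 0:  # date and time not set
            return None
        # linen: POSIX 32-bit timestamp, interpreted as UTC (an assumption).
        return datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return None  # unsupported form
```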
Extents header value
An extents header value consists of:
- the number of entries
- entries that consist of: S <1> <2> <3>
Compression header value
A compression header value consists of a single character that represents the compression level.
| Character value | Meaning |
|---|---|
| b | Best compression is used |
| f | Fastest compression is used |
| n | No compression is used |
Notes
The text in the 4th line should not contain tab, carriage return or newline characters. Is there a method to escape these characters?
ASR Data - E01 Compression Format states that these characters should not be used in the free form text. This needs to be confirmed, since the specification only mentions the newline character.
Currently the password has no additional value other than allowing an application to check it; the data itself is not protected by the password. The password hashing algorithm is unknown and still needs to be determined. The algorithm does not differ across EnCase 1 to 7. FTK Imager does not use a password.
Volume section
The volume section is identified in the section data type field as “volume”. Some aspects of this section are:
- Defined in ASR Data - E01 Compression Format
- Found in EWF-E01 in EnCase 1 to 7 or linen 5 to 7 or FTK Imager, EWF-L01 in EnCase 5 to 7, and SMART (EWF-S01)
- Found after the header section of the first segment file. Not found in subsequent segment files.
In the next paragraphs the various versions of the volume section are described.
EWF specification
The specification according to ASR Data - E01 Compression Format.
The volume section data is 94 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | 0x01 | Unknown (Reserved) |
| 4 | 4 | | The number of chunks within all segment files |
| 8 | 4 | | The number of sectors per chunk, which contains 64 by default |
| 12 | 4 | | The number of bytes per sector, which contains 512 by default |
| 16 | 4 | | The sectors count, which contains the number of sectors within all segment files |
| 20 | 20 | 0x00 | Unknown (Reserved) |
| 40 | 45 | 0x00 | Unknown (Padding) |
| 85 | 5 | | Signature, which contains the EWF file header signature |
| 90 | 4 | | Checksum, which contains an Adler-32 of all the previous data within the volume section data |
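A minimal Python sketch of reading this layout. It assumes little-endian integers (the byte order is not stated in this section) and a hypothetical `parse_ewf_volume` function that receives the 94 bytes of section data that follow the section header. The signature bytes are returned as-is, so a caller can tell SMART images apart from other writers.

```python
import struct
import zlib
from dataclasses import dataclass

EWF_VOLUME_DATA_SIZE = 94


@dataclass
class EwfVolume:
    number_of_chunks: int
    sectors_per_chunk: int
    bytes_per_sector: int
    number_of_sectors: int
    signature: bytes


def parse_ewf_volume(data: bytes) -> EwfVolume:
    """Parse the 94-byte volume section data of the ASR Data specification.

    Integers are assumed to be stored little-endian.
    """
    if len(data) != EWF_VOLUME_DATA_SIZE:
        raise ValueError(f"expected {EWF_VOLUME_DATA_SIZE} bytes, got {len(data)}")
    # Offset 0 is reserved; offsets 4, 8, 12 and 16 hold four 32-bit values.
    chunks, sectors_per_chunk, bytes_per_sector, sectors = struct.unpack_from("<4I", data, 4)
    (stored_checksum,) = struct.unpack_from("<I", data, 90)
    # The Adler-32 covers all preceding volume section data (offsets 0 to 89).
    if zlib.adler32(data[:90]) != stored_checksum:
        raise ValueError("volume section checksum mismatch")
    return EwfVolume(chunks, sectors_per_chunk, bytes_per_sector, sectors, data[85:90])
```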
The number of chunks is a 32-bit value, which means the maximum number of addressable chunks is 4294967295 (2^32 - 1). For a chunk size of 32768 bytes this amounts to 32768 x 4294967295 bytes, just under 128 TiB. The maximum number of segment files is 2^16 - 1 = 65535. This allows for an equal amount of storage if every segment file is filled to its maximum number of chunks.
However, Keramics is restricted to 14295 segment files due to the extension naming schema of the segment files.
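Written out, the size limit for the default chunk size of 32768 (2^15) bytes is:

$$
32768 \times (2^{32} - 1) = 2^{15} \times (2^{32} - 1) = 2^{47} - 2^{15} \ \text{bytes}
$$

Because 2^47 bytes is exactly 128 TiB (2^40 bytes per TiB), the maximum is 32 KiB short of 128 TiB.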
SMART (EWF-S01)
The SMART format uses the EWF specification for this section.
In SMART the signature (reverse) value is the string “SMART” (0x53 0x4d 0x41 0x52 0x54) instead of the file header signature.
FTK Imager, EnCase 1 to 7 and linen 5 to 7 (EWF-E01)
The specification for FTK Imager, EnCase 1 to 7 and linen 5 to 7.
The volume section data is 1052 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 1 | | Media type |
| 1 | 3 | 0x00 | Unknown (empty values) |
| 4 | 4 | | The number of chunks within all segment files |
| 8 | 4 | | The number of sectors per chunk (or block size), which contains 64 by default. EnCase 5 is the first version that allows this value to be different from 64 |
| 12 | 4 | | The number of bytes per sector |
| 16 | 8 | | The sectors count, which contains the number of sectors within all segment files. This value was probably changed in EnCase 6 from a 32-bit value to a 64-bit value to support media > 2 TiB |
| 24 | 4 | | The number of cylinders of the C:H:S value, which is mostly empty (0x00) |
| 28 | 4 | | The number of heads of the C:H:S value, which is mostly empty (0x00) |
| 32 | 4 | | The number of sectors of the C:H:S value, which is mostly empty (0x00) |
| 36 | 1 | | Media flags |
| 37 | 3 | 0x00 | Unknown (empty values) |
| 40 | 4 | | PALM volume start sector |
| 44 | 4 | 0x00 | Unknown (empty values) |
| 48 | 4 | | SMART logs start sector, which contains an offset relative to the end of the media, e.g. a value of 10 refers to sector = number of sectors - 10 |
| 52 | 1 | | Compression level (introduced in EnCase 5) |
| 53 | 3 | 0x00 | Unknown (empty values, these values seem to be part of the compression level) |
| 56 | 4 | | The sector error granularity, which contains the error block size (introduced in EnCase 5) |
| 60 | 4 | 0x00 | Unknown (empty values) |
| 64 | 16 | | Segment file set identifier, which contains a GUID/UUID generated on the acquisition system, probably used to uniquely identify a set of segment files (introduced in EnCase 5) |
| 80 | 963 | 0x00 | Unknown (empty values) |
| 1043 | 5 | 0x00 | Unknown (Signature) |
| 1048 | 4 | | Checksum, which contains an Adler-32 of all the previous data within the volume section data |
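The EWF-E01 layout can be read in the same way. This Python sketch again assumes little-endian integers and uses hypothetical names. It reads only the 1-byte compression level at offset 52 and ignores the 3 bytes that follow, and it keeps the segment file set identifier as 16 raw bytes because the GUID byte order is not described here.

```python
import struct
import zlib
from dataclasses import dataclass

E01_VOLUME_DATA_SIZE = 1052


@dataclass
class E01Volume:
    media_type: int
    number_of_chunks: int
    sectors_per_chunk: int
    bytes_per_sector: int
    number_of_sectors: int
    media_flags: int
    compression_level: int
    error_granularity: int
    set_identifier: bytes  # 16 raw bytes; GUID byte order is not interpreted


def parse_e01_volume(data: bytes) -> E01Volume:
    """Parse the 1052-byte EWF-E01 volume section data (little-endian assumed)."""
    if len(data) != E01_VOLUME_DATA_SIZE:
        raise ValueError(f"expected {E01_VOLUME_DATA_SIZE} bytes, got {len(data)}")
    (stored_checksum,) = struct.unpack_from("<I", data, 1048)
    if zlib.adler32(data[:1048]) != stored_checksum:
        raise ValueError("volume section checksum mismatch")
    # Offsets 4, 8 and 12 hold 32-bit values; offset 16 holds the 64-bit sectors count.
    chunks, sectors_per_chunk, bytes_per_sector, sectors = struct.unpack_from("<3IQ", data, 4)
    (error_granularity,) = struct.unpack_from("<I", data, 56)
    return E01Volume(
        media_type=data[0],
        number_of_chunks=chunks,
        sectors_per_chunk=sectors_per_chunk,
        bytes_per_sector=bytes_per_sector,
        number_of_sectors=sectors,
        media_flags=data[36],
        compression_level=data[52],
        error_granularity=error_granularity,
        set_identifier=data[64:80],
    )
```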
TODO: a value that could be in the volume is the RAID stripe size
Note that EnCase requires that the "is physical device" media flag is not set for media that contains no partition table, and vice versa. Other tools, like FTK, check the actual storage media data.
EnCase 5 to 7 (EWF-L01)
The EWF-L01 format uses the EnCase 5 (EWF-E01) volume section specification. However:
- the media type contains 0x0e
- the number of chunks is 0
- the number of bytes per sector contains some kind of block size value (4096), perhaps the source file system block size
- the sectors count represents some other value, since sector size x number of sectors != total size. The total size is stored in the ltree section.
Media type
| Value | Identifier | Description |
|---|---|---|
| 0x00 | | A removable storage media device |
| 0x01 | | A fixed storage media device |
| 0x03 | | An optical disc (CD/DVD/BD) |
| 0x0e | | Logical Evidence (LEF or L01) |
| 0x10 | | Physical Memory (RAM) or process memory |
Note that FTK Imager versions before 2.9 set the media type to fixed (0x01). The exact version of FTK Imager in which this behavior changed is unknown.
Media flags
| Value | Identifier | Description |
|---|---|---|
| 0x01 | | Is an image file. In FTK Imager and EnCase 1 to 7 this bit is always set; when it is not set, EnCase seems to treat the image file as a device |
| 0x02 | | Is physical device or device type, where 0 represents a non-physical (logical) device and 1 represents a physical device |
| 0x04 | | Fastbloc write blocker used |
| 0x08 | | Tableau write blocker used. This was added in EnCase 6.13 |
Note that if both the Fastbloc and Tableau write blocker media flags are set, EnCase only shows Fastbloc.
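Because the media flags are a bitmask, several can be set at once. This Python sketch (hypothetical names) lists the set flags and mimics the observed EnCase behavior of showing only Fastbloc when both write blocker flags are set.

```python
from typing import List, Optional

# Media flag bits as listed in the media flags table.
MEDIA_FLAG_DESCRIPTIONS = {
    0x01: "image file",
    0x02: "physical device",
    0x04: "Fastbloc write blocker used",
    0x08: "Tableau write blocker used",
}


def describe_media_flags(flags: int) -> List[str]:
    """Return a description for every known media flag bit that is set."""
    return [text for bit, text in MEDIA_FLAG_DESCRIPTIONS.items() if flags & bit]


def displayed_write_blocker(flags: int) -> Optional[str]:
    """Mimic the observed EnCase display: Fastbloc takes precedence over Tableau."""
    if flags & 0x04:
        return "Fastbloc"
    if flags & 0x08:
        return "Tableau"
    return None
```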
Compression level
| Value | Identifier | Description |
|---|---|---|
| 0x00 | | No compression |
| 0x01 | | Good compression |
| 0x02 | | Best compression |
Note that EnCase 7 no longer provides the fast and best compression options.
Disk section
The disk section is identified in the section data type field as “disk”. Some aspects of this section are:
- Not defined in ASR Data - E01 Compression Format.
- Not found in SMART (EWF-S01).
With a disk section in an FTK Imager 2.3 (EWF-E01) image it was confirmed that the disk section is the same as the volume section.
Note that the disk section was found only in FTK Imager 2.3 when acquiring a physical disk, not a floppy. This requires additional research; it is currently assumed that the disk section is an old method to differentiate between a partition (volume) image and a physical disk image.
Data section
The data section is identified in the section data type field as “data”. Some aspects of this section are:
- Not defined in ASR Data - E01 Compression Format.
- Found in EWF-E01 in EnCase 1 to 7 or linen 5 to 7 or FTK Imager, and EWF-L01 in EnCase 5 to 7. Not found in SMART (EWF-S01).
- For multiple segment files it does not reside in the first segment file. For a single segment file it does.
- Found after the last table2 section in a single segment file, or at the start of every segment file except the first when there are multiple segment files.
- If the data section contains data, it should contain the same information as the volume section.
The data section is a copy of the volume section.
FTK Imager, EnCase 1 to 7 and linen 5 to 7 (EWF-E01)
Note that in Logicube products (Talon (firmware predating April 2013) and Forensic dossier (before version 3.3.3RC16)) the checksum is not calculated and set to 0.
Sectors section
The sectors section is identified in the section data type field as “sectors”. Some aspects of this section are:
- Not defined in ASR Data - E01 Compression Format.
- Found in EWF-E01 in EnCase 2 to 7, or linen 5 to 7 or FTK Imager, EWF-L01 in EnCase 5 to 7. Not found in EnCase 1 (EWF-E01) or SMART (EWF-S01).
- The first sectors section can be found after the volume section in the first segment file, or after the data section in subsequent segment files. Successive sectors sections are found after the table2 section.
The sectors section contains the actual chunks of media data.
- The sectors section can contain multiple chunks.
- The default size of a chunk is 32768 bytes of data (64 standard sectors, with a size of 512 bytes per sector). It is possible in EnCase 5 and 6 and linen 5 and 6 to change the number of sectors per block to 64, 128, 256, 1024, 2048, 4096, 8192, 16384 or 32768. In EnCase 7 and linen 7 this has been reduced to 64, 128, 256, 1024.
Data chunk
The first chunk is often located directly after the section header, although the format does not require this.
When the data is compressed and the compressed data (with checksum) is larger than the uncompressed data (without the checksum) the data chunk is stored uncompressed. The default size of a chunk is 32768 bytes of data (64 standard sectors).
An uncompressed data chunk is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | ... | | Uncompressed chunk data |
| ... | 4 | | Checksum, which contains an Adler-32 of the chunk data |
The compressed data chunk consists of zlib compressed data. The checksum of the compressed data chunk is part of the zlib compressed data format.
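Reading a single chunk therefore depends on whether it is compressed. In this Python sketch (hypothetical `read_chunk`), `is_compressed` would come from the most significant bit of the corresponding table entry, and the caller is assumed to already know how many stored bytes belong to the chunk. The trailing checksum of an uncompressed chunk is assumed to be little-endian.

```python
import struct
import zlib


def read_chunk(stored: bytes, is_compressed: bool) -> bytes:
    """Return the media data of a single stored chunk.

    Compressed chunks are zlib streams whose Adler-32 is verified by zlib
    itself; uncompressed chunks carry a trailing 4-byte Adler-32 that is
    verified here.
    """
    if is_compressed:
        return zlib.decompress(stored)
    if len(stored) < 4:
        raise ValueError("uncompressed chunk is too small to contain a checksum")
    chunk_data = stored[:-4]
    (stored_checksum,) = struct.unpack("<I", stored[-4:])
    if zlib.adler32(chunk_data) != stored_checksum:
        raise ValueError("chunk checksum mismatch")
    return chunk_data
```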
Optical disc images
For a MODE-1 CD-ROM optical disc image EnCase only seems to support 2048 bytes per sector (the data).
The raw sector size of a MODE-1 CD-ROM is 2352 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 16 | | Synchronization bytes |
| 16 | 2048 | | Data |
| 2064 | 4 | | Error detection |
| 2068 | 8 | 0x00 | Unknown (empty values) |
| 2076 | 276 | | Error correction |
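Since EnCase only seems to support the 2048-byte data area, raw MODE-1 sectors would need to be reduced to that area before being stored. A minimal Python sketch with hypothetical names; it discards the error detection and correction bytes without verifying them.

```python
RAW_SECTOR_SIZE = 2352
USER_DATA_OFFSET = 16
USER_DATA_SIZE = 2048


def mode1_user_data(raw: bytes) -> bytes:
    """Extract the 2048-byte data area from consecutive raw MODE-1 sectors."""
    if len(raw) % RAW_SECTOR_SIZE != 0:
        raise ValueError("data is not a multiple of the raw sector size")
    return b"".join(
        raw[offset + USER_DATA_OFFSET : offset + USER_DATA_OFFSET + USER_DATA_SIZE]
        for offset in range(0, len(raw), RAW_SECTOR_SIZE)
    )
```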
TODO: add information about Mode-2 and Mode-XA
Table section
The table section is identified in the section data type field as “table”. Some aspects of this section are:
- Defined in ASR Data - E01 Compression Format.
- Found in EWF-E01 in EnCase 1 to 7 or linen 5 to 7 or FTK Imager, EWF-L01 in EnCase 5 to 7, and SMART (EWF-S01)
Note that the offsets within the section header are 8 bytes (64 bits) in size, while the offsets in the table entry array are 4 bytes (32 bits) in size.
In the next paragraphs the various versions of the table section are described.
EWF specification
Some aspects of the table section according to the EWF specification are:
- The first table section resides after the volume section in the first segment file or after the file header in subsequent segment files.
- It can be found in every segment file.
The table section consists of:
- the table header
- an array of table entries
- the data chunks
Table header
The table header is 24 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | The number of entries |
| 4 | 16 | 0x00 | Unknown (Padding) |
| 20 | 4 | | Checksum, which contains an Adler-32 of all the previous data within the table header data |
According to ASR Data - E01 Compression Format:
- the number of entries contains 0x01
- the table can hold 16375 entries; if more entries are required, an additional table section should be created.
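A Python sketch of reading the table header and of working out how many table sections a given number of chunks needs under the 16375-entry limit. Little-endian integers and the function names are assumptions.

```python
import struct
import zlib

EWF_TABLE_HEADER_SIZE = 24
MAXIMUM_ENTRIES_PER_TABLE = 16375


def parse_table_header(data: bytes) -> int:
    """Return the number of entries from a 24-byte table header."""
    if len(data) != EWF_TABLE_HEADER_SIZE:
        raise ValueError(f"expected {EWF_TABLE_HEADER_SIZE} bytes, got {len(data)}")
    (number_of_entries,) = struct.unpack_from("<I", data, 0)
    (stored_checksum,) = struct.unpack_from("<I", data, 20)
    # The checksum covers the 20 bytes that precede it.
    if zlib.adler32(data[:20]) != stored_checksum:
        raise ValueError("table header checksum mismatch")
    return number_of_entries


def number_of_table_sections(number_of_chunks: int) -> int:
    """Table sections needed when each one holds at most 16375 entries."""
    return (number_of_chunks + MAXIMUM_ENTRIES_PER_TABLE - 1) // MAXIMUM_ENTRIES_PER_TABLE
```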
Table entry
The table entry is 4 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Chunk data offset |
The most significant bit (MSB) in the chunk data offset indicates if the chunk is compressed (1) or uncompressed (0).
A chunk data offset points to the start of the chunk of media data, which resides in the same table section within the segment file. The offset contains a value relative to the start of the file.
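Decoding the entry array comes down to masking off the most significant bit. In this Python sketch (hypothetical names, little-endian assumed) the stored size of each chunk is derived from the offset of the next chunk, and `end_offset` is a caller-supplied assumption for where the last chunk ends.

```python
import struct
from typing import List, Tuple

COMPRESSED_FLAG = 0x80000000


def parse_table_entries(data: bytes) -> List[int]:
    """Read the array of 32-bit table entries."""
    return [value for (value,) in struct.iter_unpack("<I", data)]


def decode_table_entry(value: int) -> Tuple[int, bool]:
    """Split a table entry into (chunk data offset, is compressed)."""
    return value & 0x7FFFFFFF, bool(value & COMPRESSED_FLAG)


def chunk_ranges(entries: List[int], end_offset: int) -> List[Tuple[int, int, bool]]:
    """Return (offset, stored size, is compressed) for each chunk.

    Each chunk is assumed to end where the next one starts; end_offset is
    the (hypothetical) end of the last chunk.
    """
    decoded = [decode_table_entry(value) for value in entries]
    ends = [offset for offset, _ in decoded[1:]] + [end_offset]
    return [(offset, end - offset, is_compressed) for (offset, is_compressed), end in zip(decoded, ends)]
```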
Data chunk
The first chunk is often located directly after the last table entry, although the format does not require this.
A data chunk is always compressed even when no compression is required. This approach provides a checksum for each chunk. The default size of a chunk is 32768 bytes of data (64 standard sectors). The resulting size of the “compressed” chunk can therefore be larger than the default chunk size.
Note that this was deduced from the behavior of FTK Imager for SMART (EWF-S01).
The compressed data chunk consists of zlib compressed data. The checksum of the compressed data chunk is part of the zlib compressed data format.
SMART (EWF-S01)
The table section in the SMART (EWF-S01) format is equivalent to that of the EWF specification.
EnCase 1 (EWF-E01)
Some aspects of this section are:
- The table section resides after the volume section in the first segment file or after the file header in subsequent segment files.
- It can be found in every segment file.
The table section consists of:
- the table header
- an array of table entries
- the table footer
- the data chunks
Table header
The table header is 24 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | The number of entries |
| 4 | 16 | 0x00 | Unknown (Padding) |
| 20 | 4 | | Checksum, which contains an Adler-32 of all the previous data within the table header data |
The table can hold 16375 entries; if more entries are required, an additional table section should be created.
Table entry
The table entry is 4 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Chunk data offset |
The most significant bit (MSB) in the chunk data offset indicates if the chunk is compressed (1) or uncompressed (0).
A chunk data offset points to the start of the chunk of media data, which resides in the same table section within the segment file. The offset contains a value relative to the start of the file.
Table footer
The table footer is 4 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Checksum, which contains an Adler-32 of the offset array |
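Verifying the footer only needs the raw bytes of the entry array. This Python sketch assumes the footer is a little-endian Adler-32 over the entries exactly as stored, that is, before the compressed flag is masked off; the function name is hypothetical.

```python
import struct
import zlib


def verify_table_footer(entries_data: bytes, footer: bytes) -> bool:
    """Check the 4-byte table footer against the raw table entry array."""
    if len(footer) != 4:
        raise ValueError("table footer must be 4 bytes")
    (stored_checksum,) = struct.unpack("<I", footer)
    return zlib.adler32(entries_data) == stored_checksum
```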
Data chunk
The first chunk is often located directly after the table footer, although the format does not require this.
When the data is compressed and the compressed data (with checksum) is larger than the uncompressed data (without the checksum) the data chunk is stored uncompressed. The default size of a chunk is 32768 bytes of data (64 standard sectors).
An uncompressed data chunk is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | ... | | Uncompressed chunk data |
| ... | 4 | | Checksum, which contains an Adler-32 of the chunk data |
The compressed data chunk consists of zlib compressed data. The checksum of the compressed data chunk is part of the zlib compressed data format.
FTK Imager and EnCase 2 to 5 and linen 5 (EWF-E01)
Some aspects of this section are:
- The table section resides after the sectors section.
- It can be found in every segment file.
- The data chunks are no longer stored in this section but in the sectors section instead.
- The table2 section contains a mirror copy of the table section. In EWF-E01 it is always present.
The table section consists of:
- the table header
- an array of table entries
- the table footer
Table header
The sector table header is 24 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | The number of entries |
| 4 | 16 | 0x00 | Unknown (Padding) |
| 20 | 4 | | Checksum, which contains an Adler-32 of all the previous data within the table header data |
The table section can hold 16375 entries. A new table section should be created to hold more entries. Both FTK Imager and EnCase 5 can handle more than 16375 entries; FTK 1 cannot. To store more than 16375 chunks, new sectors, table and table2 sections need to be created after the table2 section.
Table entry
The table entry is 4 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Chunk data offset |
The most significant bit (MSB) in the chunk data offset indicates if the chunk is compressed (1) or uncompressed (0).
A chunk data offset points to the start of the chunk of media data, which resides in the preceding sectors section within the segment file. The offset contains a value relative to the start of the file.
Table footer
The table footer is 4 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Checksum, which contains an Adler-32 of the offset array |
EnCase 6 to 7 and linen 6 to 7 (EWF-E01)
Some aspects of this section are:
- Every segment file contains its own table section.
- It resides after the sectors section.
- The data chunks are no longer stored in this section but in the sectors section instead.
- The table2 section contains a mirror copy of the table section. In EWF-E01 it is always present.
The table section consists of:
- the table header
- an array of table entries
- the table footer
Table header
The sector table header is 24 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | The number of entries |
| 4 | 4 | 0x00 | Unknown (Padding) |
| 8 | 8 | | The table base offset |
| 16 | 4 | 0x00 | Unknown (Padding) |
| 20 | 4 | | Checksum, which contains an Adler-32 of all the previous data within the table header data |
As of EnCase 6 the number of entries is no longer restricted to 16375 entries. The new limit seems to be 65534.
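Compared to the earlier table header, the EnCase 6 header adds a 64-bit table base offset at offset 8. A Python sketch under the same assumptions as before (little-endian, hypothetical names):

```python
import struct
import zlib
from typing import Tuple


def parse_table_header_v6(data: bytes) -> Tuple[int, int]:
    """Return (number of entries, table base offset) from an EnCase 6+ table header."""
    if len(data) != 24:
        raise ValueError(f"expected 24 bytes, got {len(data)}")
    (number_of_entries,) = struct.unpack_from("<I", data, 0)
    (base_offset,) = struct.unpack_from("<Q", data, 8)
    (stored_checksum,) = struct.unpack_from("<I", data, 20)
    if zlib.adler32(data[:20]) != stored_checksum:
        raise ValueError("table header checksum mismatch")
    return number_of_entries, base_offset
```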
Table entry
The table entry is 4 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Chunk data offset |
The most significant bit (MSB) in the chunk data offset indicates if the chunk is compressed (1) or uncompressed (0).
A chunk data offset points to the start of the chunk of media data, which resides in the preceding sectors section within the segment file. The offset contains a value relative to the table base offset.
In EnCase 6.7.1 the sectors section can be larger than 2048 MiB. The table entry offsets are 31-bit values in EnCase 6; however, the offset in a table entry will actually use the full 32 bits once 2048 MiB has been exceeded. This behavior is no longer present in EnCase 6.8, so it is assumed to be a bug. Libewf currently assumes that if the 31-bit value overflows, the following chunks are uncompressed. This allows faulty EWF files created by EnCase 6.7.1 to be converted.
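The following Python sketch resolves table entries relative to the table base offset and applies the "overflow means uncompressed" assumption described above. The detection rule used here, an entry with the most significant bit set whose 31-bit offset is smaller than that of the previous entry, is a hypothetical heuristic and not necessarily the rule that libewf or Keramics implements.

```python
from typing import List, Tuple


def resolve_chunk_offsets(entries: List[int], base_offset: int) -> List[Tuple[int, bool]]:
    """Return (absolute offset, is compressed) per table entry (EnCase 6 and later).

    Hypothetical overflow heuristic: when an entry has the most significant bit
    set but its 31-bit offset goes backwards, the bit is taken to be part of a
    32-bit offset. That chunk and all following chunks are treated as uncompressed.
    """
    resolved = []
    previous_offset = None
    overflow = False
    for value in entries:
        offset = value & 0x7FFFFFFF
        is_compressed = bool(value & 0x80000000)
        if not overflow and is_compressed and previous_offset is not None and offset < previous_offset:
            overflow = True
        if overflow:
            offset, is_compressed = value, False
        resolved.append((base_offset + offset, is_compressed))
        previous_offset = offset
    return resolved
```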
Table footer
The table footer is 4 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Checksum, which contains an Adler-32 of the offset array |
EnCase 6 to 7 (EWF-L01)
The EWF-L01 format uses the EnCase 6 to 7 (EWF-E01) table section specification.
Table2 section
The table2 section is identified in the section data type field as “table2”. Some aspects of this section are:
- Not defined in ASR Data - E01 Compression Format.
- Found in EWF-E01 in EnCase 2 to 7, or linen 5 to 7 or FTK Imager, EWF-L01 in EnCase 5 to 7. Not found in EnCase 1 (EWF-E01) or SMART (EWF-S01).
- Uses the same format as the table section.
- Resides directly after the table section.
FTK Imager and EnCase 2 to 7 and linen 5 to 7 (EWF-E01)
The table2 section contains a mirror copy of the table section. Probably intended for recovery purposes.
EnCase 5 to 7 (EWF-L01)
The EWF-L01 format uses the EWF-E01 table2 section specification.
Next section
The next section is identified in the section data type field as “next”. Some aspects of this section are:
- Defined in ASR Data - E01 Compression Format.
- Found in EWF-E01 in EnCase 1 to 7 or linen 5 to 7 or FTK Imager, EWF-L01 in EnCase 5 to 7, and SMART (EWF-S01)
- It is the last section within a segment file, other than the last segment file.
- The offset to the next section, in the section header of the next section, points to itself (the start of the next section).
SMART (EWF-S01)
It resides after the table or table2 section.
FTK Imager, EnCase and linen (EWF-E01)
It resides after the data section in a single segment file or for multiple segment files after the table2 section.
In the EnCase (EWF-E01) format the size in the section header is 0 instead of 76 (the size of the section header).
Note that FTK Imager versions before 2.9 set the section size to 76. At the moment it is unknown in which version this behavior was changed.
Ltypes section
The ltypes section is identified in the section data type field as “ltypes”. Some aspects of this section are:
- Found in EWF-L01 of EnCase 7.
- Found in the last segment file after the table2 section and before the ltree section.
The additional ltypes section data is 6 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | | Unknown |
| 2 | 2 | | Unknown |
| 4 | 2 | | Unknown |
Ltree section
The ltree section is identified in the section data type field as “ltree”. Some aspects of this section are:
- Found in EWF-L01 of EnCase 5 to 7.
- Found in the last segment file after the ltypes section and before the data section.
The ltree section consists of:
- ltree header
- ltree data
Ltree header
The ltree header is 48 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 16 | | Integrity hash, which contains the MD5 of the ltree data |
| 16 | 8 | | Data size |
| 24 | 4 | | Checksum, which contains an Adler-32 of all the data within the ltree header where the checksum value itself is zeroed out |
| 28 | 20 | | Unknown (empty values) |
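Both integrity values in the ltree header can be checked before the ltree data is parsed. In this Python sketch (hypothetical names, little-endian assumed) the Adler-32 is calculated over the full 48-byte header with the checksum field replaced by zeros. The data size field is returned as-is, since this section does not state its unit.

```python
import hashlib
import struct
import zlib

LTREE_HEADER_SIZE = 48


def verify_ltree_header(header: bytes, ltree_data: bytes) -> int:
    """Verify the ltree header checksum and MD5, and return the data size field."""
    if len(header) != LTREE_HEADER_SIZE:
        raise ValueError(f"expected {LTREE_HEADER_SIZE} bytes, got {len(header)}")
    (data_size,) = struct.unpack_from("<Q", header, 16)
    (stored_checksum,) = struct.unpack_from("<I", header, 24)
    # The checksum is calculated with its own 4 bytes zeroed out.
    zeroed = header[:24] + bytes(4) + header[28:]
    if zlib.adler32(zeroed) != stored_checksum:
        raise ValueError("ltree header checksum mismatch")
    if hashlib.md5(ltree_data).digest() != header[:16]:
        raise ValueError("ltree data MD5 mismatch")
    return data_size
```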
Ltree data
The ltree data string consists of a UTF-16 little-endian encoded string without a byte order mark. The ltree data is not strict UTF-16, since it allows for unpaired surrogates, such as “U+d800” and “U+dc00”.
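A strict UTF-16 decoder rejects unpaired surrogates, so the ltree data needs a lenient decoder. In Python the `surrogatepass` error handler keeps such code units as lone surrogates in the resulting string; the function name below is hypothetical.

```python
def decode_ltree_data(data: bytes) -> str:
    """Decode ltree data as UTF-16 little-endian, keeping unpaired surrogates.

    The "surrogatepass" error handler lets lone surrogates such as U+DC00
    through instead of raising UnicodeDecodeError.
    """
    return data.decode("utf-16-le", errors="surrogatepass")
```

Note that a string decoded this way cannot be encoded as UTF-8 with the default strict error handler, so names containing unpaired surrogates need explicit handling before they are displayed or exported.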
Other observed characteristics where the names in the ltree deviate from the original source:
- [U+0001-U+0008] were converted to U+00ba
- [U+0009, U+000a] were stripped
- [U+000b, U+000c] were converted to U+0020
- U+000d was converted to U+0002
- U+00ba remained the same
Note that this behavior could be related to EnCase as well and might not be specific for EWF-L01.
The ltree data string contains the following information:
| Line number | Value | Description |
|---|---|---|
| 1 | 5 | The number of categories provided |
| 2 | rec | Information about unknown, also see Records category |
| ... | (an empty line) | |
| ... | perm | Information about file permissions, also see Permissions category |
| ... | (an empty line) | |
| ... | srce | Information about acquisition sources, also see Sources category |
| ... | (an empty line) | |
| ... | sub | Information about unknown, also see Subjects category |
| ... | (an empty line) | |
| ... | entry | Information about file entries, also see File entries category |
| ... | (an empty line) |
The end of line character(s) is a newline (0x0a).
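After decoding, the ltree text can be split into its categories. This Python sketch (hypothetical names) assumes that every category is terminated by an empty line and that no line inside a category is itself empty, which has not been confirmed for all writers.

```python
from typing import Dict, List, Tuple


def split_ltree_categories(text: str) -> Tuple[int, Dict[str, List[str]]]:
    """Split decoded ltree text into (number of categories, {name: lines})."""
    lines = text.split("\n")
    number_of_categories = int(lines[0])
    categories: Dict[str, List[str]] = {}
    current: List[str] = []
    for line in lines[1:]:
        if line:
            current.append(line)
        elif current:
            # An empty line ends the category; its first line is the name.
            categories[current[0]] = current[1:]
            current = []
    if current:
        categories[current[0]] = current[1:]
    return number_of_categories, categories
```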
Records category
The rec category contains information about records.
The 1st line of the category contains the string “rec”.
The 2nd line of the category contains tab (0x09) separated type indicators.
| Identifier number | Type indicator | Description |
|---|---|---|
| 1 | tb | Total bytes, which contains an integer with size of the logical file data (media data) |
| 2 | cl | Unknown (Clusters?) |
| 3 | n | Unknown (introduced in EnCase 6.19) |
| 4 | fp | Unknown (introduced in EnCase 7) |
| 5 | pg | Unknown (introduced in EnCase 7) |
| 6 | lg | Unknown (introduced in EnCase 7) |
| 7 | ig | Unknown (introduced in EnCase 7) |
The 3rd line of the category consists of tab (0x09) separated values.
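Because the set of type indicators differs per EnCase version (for example "n" was introduced in EnCase 6.19), values are best matched by indicator rather than by position. A minimal Python sketch with a hypothetical function name:

```python
from typing import Dict, List


def parse_records_category(lines: List[str]) -> Dict[str, str]:
    """Map each rec type indicator to its value.

    lines holds the three lines of the category: "rec", the tab-separated
    type indicators and the tab-separated values.
    """
    if len(lines) < 3 or lines[0] != "rec":
        raise ValueError("not a rec category")
    indicators = lines[1].split("\t")
    values = lines[2].split("\t")
    return dict(zip(indicators, values))
```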
Permissions category
The perm category contains information about file permissions.
The 1st line of the category contains the string “perm”.
The 2nd line consists of the following 2 values:
| Value number | Value | Description |
|---|---|---|
| 1 | | The number of permission groups in the category |
| 2 | 1 | Unknown |
The 3rd line of the category contains tab (0x09) separated type indicators. For more information see the sections below.
The remaining lines in the category consist of:
- category root entry
- zero or more permissions group entries
- zero or more permission entries
- zero or more permissions group entries
Each entry consists of 2 lines:
| Line number | Value | Description |
|---|---|---|
| 1 | | Number of entries |
| 2 | | Tab (0x09) separated values that correspond to the type indicators |
The 1st line of the category root entry consists of the following 2 values:
| Value number | Value | Description |
|---|---|---|
| 1 | 0 | Unknown |
| 2 | | The number of permission groups in the category |
The 1st line of the permission group entry consists of the following 2 values:
| Value number | Value | Description |
|---|---|---|
| 1 | 0 | Unknown |
| 2 | | The number of permissions in the group |
The 1st line of the permission entry consists of the following 2 values:
| Value number | Value | Description |
|---|---|---|
| 1 | 0 | Unknown |
| 2 | 0 | Unknown |
Permission type indicators
| Identifier number | Type indicator | Description |
|---|---|---|
| 1 | p | Is parent, where 1 represents a category root or permissions group and 0 represents a permission |
| 2 | n | Name, which contains a string |
| 3 | s | Security identifier, which contains a string with either a Windows NT security identifier (SID) or a POSIX user (uid) or group identifier (gid) in the format " number:" such as " 99:" |
| 4 | pr | Property type, also see permission types |
| 5 | nta | Access mask |
| 6 | nti | Unknown (Windows NT access control entry (ACE) flags?, which contains an integer with a Windows NT access control entry (ACE) flags) |
| 7 | nts | Unknown (Permission?) (Removed in EnCase 6) |
Permission types
| Value | Identifier | Description |
|---|---|---|
| (empty) | Owner or category root | |
| 1 | Group | |
| 2 | Allow | |
| 6 | Other | |
| 10 | Unknown (permissions group?) | |
Access mask
Access mask seen in combination with property types 0, 1 and 6
| Value | Identifier | Description |
|---|---|---|
| (empty) | Owner or category root | |
| 0x00000001 | [Lst Fldr/Rd Data] | List folder / Read data |
| 0x00000002 | [Crt Fl/W Data] | Create file / Write data |
| 0x00000020 | [Trav Fldr/X Fl] | Traverse folder / Execute file |
Access mask seen in combination with property type 2
- 0x001200a9: [R&X] [R] [Sync]
- 0x001301bf: [M] [R&X] [R] [W] [Sync]
- 0x001f01ff: [FC] [M] [R&X] [R] [W] [Sync]
| Value | Identifier | Description |
|---|---|---|
| (empty) | Owner or category root | |
| 0x00000001 | | |
| 0x00000002 | | |
| 0x00000004 | | |
| 0x00000008 | | |
| 0x00000010 | | |
| 0x00000020 | | |
| 0x00000040 | | |
| 0x00000080 | | |
| 0x00000100 | | |
| 0x00010000 | | |
| 0x00020000 | | |
| 0x00040000 | | |
| 0x00080000 | | |
| 0x00100000 | | |
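The observed values suggest that the "nta" value is a Windows NT file access mask: the descriptions for 0x00000001, 0x00000002 and 0x00000020 above match the standard Windows file access rights, and 0x001200a9 corresponds to read and execute access. Under that assumption, which has not been confirmed for EWF-L01, a mask can be decoded with the standard Windows constant names as in this Python sketch (hypothetical function name):

```python
from typing import List, Tuple

# Standard Windows NT file access rights, assumed to apply to the "nta" value.
ACCESS_RIGHTS = {
    0x00000001: "FILE_READ_DATA / FILE_LIST_DIRECTORY",
    0x00000002: "FILE_WRITE_DATA / FILE_ADD_FILE",
    0x00000004: "FILE_APPEND_DATA / FILE_ADD_SUBDIRECTORY",
    0x00000008: "FILE_READ_EA",
    0x00000010: "FILE_WRITE_EA",
    0x00000020: "FILE_EXECUTE / FILE_TRAVERSE",
    0x00000040: "FILE_DELETE_CHILD",
    0x00000080: "FILE_READ_ATTRIBUTES",
    0x00000100: "FILE_WRITE_ATTRIBUTES",
    0x00010000: "DELETE",
    0x00020000: "READ_CONTROL",
    0x00040000: "WRITE_DAC",
    0x00080000: "WRITE_OWNER",
    0x00100000: "SYNCHRONIZE",
}


def decode_access_mask(mask: int) -> Tuple[List[str], int]:
    """Return the names of the known bits that are set and any remaining bits."""
    names = [name for bit, name in ACCESS_RIGHTS.items() if mask & bit]
    known_bits = 0
    for bit in ACCESS_RIGHTS:
        known_bits |= bit
    return names, mask & ~known_bits
```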
Sources category
The srce category contains information about acquisition sources of the file entries.
TODO: describe what an acquisition source is in the context of EnCase.
The 1st line of the category contains the string “srce”.
The 2nd line consists of 2 values.
| Value index | Value | Description |
|---|---|---|
| 1 | | The number of sources in the category |
| 2 | 1 | Unknown |
The 3rd line of the category contains tab (0x09) separated type indicators. For more information see the sections below.
The remaining lines in the category consist of:
- category root
- zero or more source entries
Each entry consists of 2 lines:
| Line number | Value | Description |
|---|---|---|
| 1 | | Number of entries |
| 2 | | Tab (0x09) separated values that correspond to the type indicators |
The 1st line of the category root entry consists of the following 2 values:
| Value number | Value | Description |
|---|---|---|
| 1 | 0 | Unknown |
| 2 | | The number of sources in the category |
The 1st line of the source entry consists of the following 2 values:
| Value number | Value | Description |
|---|---|---|
| 1 | 0 | Unknown |
| 2 | 0 | Unknown |
Source type indicators
| Identifier number | Type indicator | Description |
|---|---|---|
| 1 | p | |
| 2 | n | |
| 3 | id | Identifier, which contains an integer identifying the source |
| 4 | ev | Evidence number, which contains a string |
| 5 | do | Domain, which contains a string (introduced in EnCase 7.9) |
| 6 | loc | Location, which contains a string (introduced in EnCase 7.9) |
| 7 | se | Serial number, which contains a string (introduced in EnCase 7.9) |
| 8 | mfr | Manufacturer, which contains a string (introduced in EnCase 7.9) |
| 9 | mo | Model, which contains a string (introduced in EnCase 7.9) |
| 10 | tb | Total bytes, which contains an integer |
| 11 | lo | Logical offset, which contains an integer which is -1 when value is not set |
| 12 | po | Physical offset, which contains an integer which is -1 when value is not set |
| 13 | ah | MD5 hash, which contains a string with the MD5 hash of the source |
| 14 | sh | SHA1 hash, which contains a string with the SHA1 hash of the source (introduced in EnCase 6.19) |
| 15 | gu | Device GUID, which contains a string with a GUID or "0" if not set |
| 16 | pgu | Primary device GUID, which contains a string with a GUID or "0" if not set (introduced in EnCase 7) |
| 17 | aq | Acquisition date and time, which contains an integer with a POSIX timestamp |
| 18 | ip | IP address, which contains a string (introduced in EnCase 7.9) |
| 19 | si | Unknown (Static IP address?), Contains 1 if static, empty otherwise (introduced in EnCase 7.9) |
| 20 | ma | MAC address, which contains a string without separator characters (introduced in EnCase 7.9) |
| 21 | dt | Drive type, which contains a single character (introduced in EnCase 7.9) |
The acquisition date and time is in the form of: “1142163845”, which is a POSIX epoch timestamp and represents the date: March 12 2006, 11:44:05.
If the “ah” value contains “00000000000000000000000000000000”, the MD5 hash is not set. The same applies to the “sh” value: when it contains “0000000000000000000000000000000000000000”, the SHA1 hash is not set.
If the “ma” value contains “000000000000”, the MAC address is not set.
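Several source values use a sentinel instead of an empty value when they are not set. This Python sketch (hypothetical names) replaces those sentinels with `None`, using the type indicators from the table above, and converts the offsets and the acquisition timestamp to typed values. It assumes the values have already been mapped from type indicators to strings.

```python
from datetime import datetime, timezone
from typing import Dict

# Values that indicate "not set" according to the source type indicator descriptions.
UNSET_VALUES = {
    "ah": "0" * 32,  # MD5 hash
    "sh": "0" * 40,  # SHA1 hash
    "ma": "0" * 12,  # MAC address
    "gu": "0",  # device GUID
    "pgu": "0",  # primary device GUID
}


def normalize_source(values: Dict[str, str]) -> Dict[str, object]:
    """Replace "not set" markers with None and convert a few typed values."""
    result: Dict[str, object] = dict(values)
    for key, unset in UNSET_VALUES.items():
        if values.get(key) == unset:
            result[key] = None
    for key in ("lo", "po"):  # logical and physical offsets, -1 when not set
        value = values.get(key)
        if value:
            result[key] = None if value == "-1" else int(value)
    if values.get("aq"):  # acquisition date and time as a POSIX timestamp
        result["aq"] = datetime.fromtimestamp(int(values["aq"]), tz=timezone.utc)
    return result
```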
Drive type
| Character value | Meaning |
|---|---|
| f | Fixed drive |
Subjects category
The sub category contains information about TODO
TODO: describe what a subject is in the context of EnCase.
The 1st line of the category contains the string “sub”.
The 2nd line consists of 2 values.
| Value index | Value | Description |
|---|---|---|
| 1 | | The number of subjects in the category |
| 2 | 1 | Unknown |
The 3rd line of the category contains tab (0x09) separated type indicators. For more information see the sections below.
The remaining lines in the category consist of:
- category root
- zero or more subject entries
Each entry consists of 2 lines:
| Line number | Value | Description |
|---|---|---|
| 1 | | Number of entries |
| 2 | | Tab (0x09) separated values that correspond to the type indicators |
The 1st line of the category root entry consists of the following 2 values:
| Value number | Value | Description |
|---|---|---|
| 1 | 0 | Unknown |
| 2 | | The number of subjects in the category |
The 1st line of the subject entry consists of the following 2 values:
| Value number | Value | Description |
|---|---|---|
| 1 | 0 | Unknown |
| 2 | 0 | Unknown |
Subject type indicators
| Identifier number | Type indicator | Description |
|---|---|---|
| 1 | p | |
| 2 | n | |
| 3 | id | Identifier, which contains an integer identifying the subject |
| 4 | nu | Unknown (Number) |
| 5 | co | Unknown (Comment) |
| 6 | gu | Unknown (GUID) |
File entries category
The entry category contains information about the file entries.
The 1st line of the category contains the string “entry”.
The 2nd line consists of 2 values.
| Value index | Value | Description |
|---|---|---|
| 1 | The number of file entries in the category or 1 if unknown | |
| 2 | 1 | Unknown |
The 3rd line of the category contains tab (0x09) separated type indicators. For more information see the sections below.
The remaining lines in the category consist of:
- category root
- zero or more file entries
- zero or more sub file entries
- …
- zero or more sub file entries
- zero or more file entries
Each entry consists of 2 lines:
| Line number | Value | Description |
|---|---|---|
| 1 | Number of entries | |
| 2 | Tab (0x09) separated values that correspond to the type indicators |
The 1st line of the category root entry consists of the following 2 values:
| Value number | Value | Description |
|---|---|---|
| 1 | 0 if not set, or 26 (meaning unknown) | |
| 2 | The number of file entries in the category |
The 1st line of the file entry consists of the following 2 values:
| Value number | Value | Description |
|---|---|---|
| 1 | Number of file entries in the parent file entry or 0 if not set | |
| 2 | The number of sub file entries in the file entry |
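The per-entry counts suggest a depth-first (pre-order) layout, in which each entry is directly followed by its sub entries. Under that assumption, the tree can be rebuilt from the entry lines of a category, starting with the category root. `Entry` and `parse_entry_tree` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entry:
    values: str  # the raw tab-separated values line of the entry
    children: list[Entry] = field(default_factory=list)


def parse_entry_tree(lines: list[str]) -> Entry:
    """Rebuild the entry tree, assuming sub entries directly follow their parent."""
    position = 0

    def parse_entry() -> Entry:
        nonlocal position
        counts_line, values_line = lines[position], lines[position + 1]
        position += 2
        # The 2nd value of the 1st line holds the number of sub entries.
        number_of_children = int(counts_line.split()[1])
        entry = Entry(values_line)
        for _ in range(number_of_children):
            entry.children.append(parse_entry())
        return entry

    return parse_entry()
```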
EnCase 5 and 6 (EWF-L01) file entry type indicators
| Identifier number | Type indicator | Meaning |
|---|---|---|
| 1 | p | Is parent, where "1" represents that the entry is a directory and "" (an empty string) that the entry is a file |
| 2 | n | Name |
| 3 | id | Identifier, contains an integer identifying the file entry |
| 4 | opr | File entry flags |
| 5 | src | Source identifier, which contains an integer that corresponds to an identifier in the Sources category |
| 6 | sub | Subject identifier, which contains an integer that corresponds to an identifier in the Subjects category |
| 7 | cid | Unknown (record type) |
| 8 | jq | Unknown |
| 9 | cr | Creation date and time |
| 10 | ac | Access date and time, which is currently assumed to have a precision of a date only |
| 11 | wr | (File) modification (last written) date and time |
| 12 | mo | (File system) entry modification date and time |
| 13 | dl | Deletion date and time |
| 14 | aq | Acquisition date and time, which contains an integer with a POSIX timestamp |
| 15 | ha | MD5 hash, which contains a string with the MD5 hash of the file data |
| 16 | ls | File size in bytes. If the file size is 0 the data size should be 1 |
| 17 | du | Duplicate data offset, relative from the start of the media data |
| 18 | lo | Logical offset, which contains an integer which is -1 when value is not set |
| 19 | po | Physical offset, which contains an integer which is -1 when value is not set (or does this value contain the segment file in which the start of the data is stored, -1 for a single segment file?) |
| 20 | mid | GUID, which contains a string with a GUID (introduced in EnCase 6.19) |
| 21 | cfi | Unknown (introduced in EnCase 6.14) |
| 22 | be | Binary extents |
| 23 | pm | Permissions group index, which contains an integer that corresponds to an identifier in the Permissions category or -1 if not set. The value is 0 by default |
| 24 | lpt | Unknown (introduced in EnCase 6.19) |
The creation, access and last written date and time values are in the form of: “1142163845”, which is a POSIX epoch timestamp and represents the date and time March 12, 2006, 11:44:05 UTC.
The “ha” (hash) value consists of an MD5 hash string when file entries are hashed. If the “ha” value contains “00000000000000000000000000000000” this means the MD5 hash is not set.
Ltree file entries
The ltree entries of files and directories consist of entries starting with: 0 followed by the number of sub file entries.
The entries of files and directories:
| Line number | Value | Description |
|---|---|---|
| 1 | (empty) | The root directory |
| 2 | The target drive/mount point | |
| 3 | The actual single file entries |
EnCase 7 (EWF-L01) file entry type indicators
| Identifier number | Type indicator | Meaning |
|---|---|---|
| 1 | mid | GUID, which contains a string with a GUID |
| 2 | ls | File size, in bytes. If the file size is 0 the data size should be 1 |
| 3 | be | Binary extents |
| 4 | id | Identifier, which contains an integer identifying the file entry |
| 5 | cr | Creation date and time |
| 6 | ac | Access date and time |
| 7 | wr | (File) modification (last written) date and time |
| 8 | mo | (File system) entry modification date and time |
| 9 | dl | Deletion date and time |
| 10 | sig | Unknown (Introduced in EnCase 7) |
| 11 | ha | MD5 hash, which contains a string with the MD5 hash of the file data |
| 12 | sha | SHA1 hash, which contains a string with the SHA1 hash of the file data. (Introduced in EnCase 7) |
| 13 | ent | Unknown, seen "B" (Introduced in EnCase 7.9) |
| 14 | snh | Short name (or DOS 8.3 name) (Introduced in EnCase 7.9) |
| 15 | p | Is parent, where "1" represents that the entry is a directory and "" (an empty string) that the entry is a file |
| 16 | n | Name |
| 17 | du | Duplicate data offset, relative from the start of the media data |
| 18 | lo | Logical offset, which contains an integer which is -1 when value is not set |
| 19 | po | Physical offset, which contains an integer which is -1 when value is not set (or does this value contain the segment file in which the start of the data is stored, -1 for a single segment file?) |
| 20 | pm | Permissions group index, which contains an integer that corresponds to an identifier in the Permissions category or -1 if not set. The value is 0 by default |
| 21 | oes | Unknown (Original extents?) (Introduced in EnCase 7) |
| 22 | opr | File entry flags |
| 23 | src | Source identifier, which contains an integer that corresponds to an identifier in the Sources category |
| 24 | sub | Subject identifier, which contains an integer that corresponds to an identifier in the Subjects category |
| 25 | cid | Unknown (record type?) |
| 26 | jq | Unknown |
| 27 | alt | Unknown (Introduced in EnCase 7) |
| 28 | ep | Unknown (Introduced in EnCase 7) |
| 29 | aq | Acquisition date and time, which contains an integer with a POSIX timestamp |
| 30 | cfi | Unknown |
| 31 | sg | Unknown (Introduced in EnCase 7) |
| 32 | ea | Extended attributes (Introduced in EnCase 7.9) |
| 33 | lpt | Unknown |
If the “ha” value contains “00000000000000000000000000000000” this means the MD5 hash is not set. The same applies to the “sha” value: when it contains “0000000000000000000000000000000000000000” the SHA1 hash is not set.
File entry name
A file entry name (“n” value):
- can contain path segment separator characters like “\” and “/”
- uses the “MIDDLE DOT” Unicode character (U+00b7) as a (NTFS) alternate data stream (ADS) name separator
Note that a regular “MIDDLE DOT” character in a name is encoded in the same way, so there is no reliable way to tell the two apart.
An empty name has been observed to be represented as “NoName”.
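Because of this ambiguity, splitting a name into a file name and a stream name can only be done heuristically. The sketch below treats the last MIDDLE DOT as the separator. That choice and the function name are assumptions. A reader could also restrict the split to entries that have the Stream flag (0x00002000) set.

```python
from __future__ import annotations

MIDDLE_DOT = "\u00b7"


def split_stream_name(name: str) -> tuple[str, str | None]:
    """Heuristically split an "n" value into (file name, ADS name or None).

    Names that legitimately contain U+00B7 will be split incorrectly.
    """
    base, separator, stream = name.rpartition(MIDDLE_DOT)
    if not separator:
        return name, None
    return base, stream
```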
Short name
The short name (“snh”) value contains 2 values:
| Value number | Value | Description |
|---|---|---|
| 1 | The number of characters in the short name including the end-of-string character | |
| 2 | The short name string, without an end-of-string character |
For example: “13 FILE10~1.TXT”
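Because the stored count includes an end-of-string character that is not part of the string, the two values can be cross-checked. A minimal sketch follows, with a hypothetical function name:

```python
def parse_short_name(value: str) -> str:
    """Parse a "snh" value such as "13 FILE10~1.TXT" into the short name."""
    count_string, _, short_name = value.partition(" ")
    # The count includes the end-of-string character, which is not stored.
    if int(count_string) != len(short_name) + 1:
        raise ValueError(f"short name length mismatch in: {value!r}")
    return short_name
```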
Original extents
TODO: add some text
1 30a555b 30a6000 12011ae00 9008d7 3f 43 1 12011ae00 30a6000 120113 30a6 9008d7 18530
Ltree file entries
The ltree entries of files and directories consist of entries starting with: 26 followed by the number of sub file entries.
The entries of files and directories:
| Line number | Value | Description |
|---|---|---|
| 1 | LogicalEntries | The root directory |
| 2 | The target drive/mount point | |
| 3 | The actual single file entries |
File entry flags
| Value | Identifier | Description |
|---|---|---|
| 0x00000001 | Unknown (Is read-only?) | |
| 0x00000002 | Hidden | Is hidden |
| 0x00000004 | System | Is system |
| 0x00000008 | Archive | Is archive |
| 0x00000010 | Sym Link | Is symbolic link, junction or reparse point |
| 0x00000080 | Deleted | Is deleted |
| 0x00001000 | Hard Linked | Is hard link |
| 0x00002000 | Stream | Is stream |
| 0x00100000 | Internal | Is internal (used in combination with 0x00000006?) |
| 0x00200000 | Unallocated Clusters | Unknown |
| 0x00400000 | Unknown | |
| 0x01000000 | Unknown | |
| 0x02000000 | Folder | Is folder |
| 0x04000000 | | Data is sparse |
If neither 0x00002000 (Stream) nor 0x02000000 (Folder) is set, the file entry is of type “File”.
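The sketch below decodes the “opr” file entry flags and assumes the value has already been parsed into an integer. The flag names follow the table above. Giving Folder precedence over Stream when both are set is an assumption, and bits with an unknown meaning are left unnamed.

```python
from __future__ import annotations

FILE_ENTRY_FLAGS = {
    0x00000002: "hidden",
    0x00000004: "system",
    0x00000008: "archive",
    0x00000010: "symbolic link",
    0x00000080: "deleted",
    0x00001000: "hard linked",
    0x00002000: "stream",
    0x00100000: "internal",
    0x02000000: "folder",
    0x04000000: "sparse data",
}


def file_entry_type(flags: int) -> str:
    """Return "folder", "stream" or, if neither flag is set, "file"."""
    if flags & 0x02000000:
        return "folder"
    if flags & 0x00002000:
        return "stream"
    return "file"


def describe_flags(flags: int) -> list[str]:
    """List the names of the known flags that are set."""
    return [name for bit, name in FILE_ENTRY_FLAGS.items() if flags & bit]
```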
If the sparse data flag is set:
- the data size should be 1 and the data should consist of a single byte value.
If the sparse data flag is not set:
- the data size should be equal to the file size and the data should contain the file data.
If the duplicate data offset value is not set, the single byte value in the data should be used to reconstruct the file data. For example, if the file size is 4096 and the data contains the byte value 0x00, the resulting file should consist of 4096 0x00 byte values.
If the duplicate data offset value is set, the single byte in the data is ignored and the duplicate data offset refers to the location where the data is stored.
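Here is a sketch of reconstructing the data of a sparse file entry. It assumes that “not set” has already been mapped to `None`, that `file_size` bytes are read at the duplicate data offset, and that `read_media` is a hypothetical callable returning bytes from the media data.

```python
from typing import Callable, Optional


def sparse_file_data(
    file_size: int,
    data: bytes,
    duplicate_data_offset: Optional[int],
    read_media: Callable[[int, int], bytes],
) -> bytes:
    """Reconstruct the file data of an entry with the sparse data flag set."""
    if duplicate_data_offset is not None:
        # The stored byte is ignored; read the data at the duplicate offset.
        return read_media(duplicate_data_offset, file_size)
    if len(data) != 1:
        raise ValueError("expected a single byte of sparse data")
    # Repeat the single byte value up to the file size.
    return data * file_size
```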
Binary extents value
The binary extents value contains 3 values separated by a space:
Unknown Offset Size
Where:
- unknown, which is always 1 (could this be the number of extents?)
- extent data offset, relative from the start of the media data
- extent data size
The offset and size are specified in hexadecimal values.
Note that the binary extents value contains only 1 value for the first single file entry.
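A minimal sketch of parsing a “be” value follows. It returns `None` for the single-value form mentioned above. The function name is hypothetical.

```python
from __future__ import annotations


def parse_binary_extents(value: str) -> tuple[int, int] | None:
    """Return (extent data offset, extent data size) from a "be" value."""
    parts = value.split()
    if len(parts) != 3:
        return None  # e.g. the single-value form of the first single file entry
    _unknown, offset, size = (int(part, 16) for part in parts)
    return offset, size
```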
Extended attributes value
The extended attributes value contains base-16 encoded data, which consists of:
- Extended attributes header (stored as an extended attribute)
- One or more extended attributes
Extended attributes header
The extended attributes header is 37 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | 0 | Unknown (0 => root, 1 => otherwise) |
| 4 | 1 | 1 | Unknown (0 => is leaf node, 1 => is branch node?) |
| 5 | 4 | 11 | Number of characters in name string including the end-of-string character |
| 9 | 4 | 1 | Number of characters in value string including the end-of-string character |
| 13 | 22 | "Attributes\0" | Name string, which contains a UTF-16 little-endian encoded string including end-of-string character |
| 35 | 2 | "\0" | Value string, which contains a UTF-16 little-endian encoded string including end-of-string character |
Extended attribute
An extended attribute is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | Unknown (0 => root, 1 => otherwise) | |
| 4 | 1 | Unknown (0 => is leaf node, 1 => is branch node?) | |
| 5 | 4 | Number of characters in name string including the end-of-string character | |
| 9 | 4 | Number of characters in value string including the end-of-string character | |
| 13 | ... | Name string, which contains a UTF-16 little-endian encoded string including end-of-string character | |
| ... | ... | Value string, which contains a UTF-16 little-endian encoded string including end-of-string character | |
TODO: complete section
Note that branch nodes are presumably used to group attributes; however, they are not used consistently and are not shown by EnCase 7.
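The sketch below decodes the “ea” value into (name, value) pairs. It assumes the 32-bit integers are little-endian, since the tables do not state the byte order. It ignores the unknown and leaf/branch values and skips the first record, which is the “Attributes” header.

```python
from __future__ import annotations

import struct


def parse_extended_attributes(value: str) -> list[tuple[str, str]]:
    """Decode a base-16 "ea" value into (name, value) pairs."""
    data = bytes.fromhex(value)
    attributes = []
    offset = 0
    while offset < len(data):
        # 13-byte fixed part: unknown, leaf/branch, name and value character counts.
        _unknown, _node_type, name_size, value_size = struct.unpack_from("<IBII", data, offset)
        offset += 13
        name = data[offset:offset + name_size * 2].decode("utf-16-le").rstrip("\x00")
        offset += name_size * 2
        text = data[offset:offset + value_size * 2].decode("utf-16-le").rstrip("\x00")
        offset += value_size * 2
        attributes.append((name, text))
    # The 1st record is the extended attributes header ("Attributes").
    return attributes[1:]
```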
Map section
Some aspects of this section are:
- Found in EnCase 7 (EWF-L01) files (first seen in EnCase 7.4.1.10).
- Found in the last segment file, after the data section and before the done section.
The map consists of:
- map string
- map entries array
Map string
The map string consists of a UTF-16 little-endian encoded string without a byte order mark.
The map string contains the following information:
| Line number | Value | Description |
|---|---|---|
| 1 | 1 | The number of categories provided |
| 2 | r | Probably the type of information provided |
| 3 | c | Identifier for the values in the 4th line |
| 4 | The data for the different identifiers in the 3rd line | |
| 5 | (an empty line) |
Map string values
| Identifier number | Type indicator | Meaning |
|---|---|---|
| 1 | C | Number of map entries (count) |
The number of map entries should match the number of file entries in the ltree.
Map entry
A map entry is 24 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | Unknown | |
| 4 | 4 | Unknown (empty values or part of previous value) | |
| 8 | 16 | Unknown |
Session section
The session section is identified in the section data type field as “session”. Some aspects of this section are:
- Not defined in ASR Data - E01 Compression Format.
- It is not found in SMART (EWF-S01) and FTK Imager (EWF-E01).
- It is found in EnCase 5 and 6 (EWF-E01) files.
- It is only added to the last segment file for images of optical disc (CD/DVD/BD) media.
- It is found after the data section and before the error2 section.
The session section data consists of:
- The session header
- The session entries array
- The session footer
Session header
The session header is 36 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | Number of sessions | |
| 4 | 28 | Unknown (empty values) | |
| 32 | 4 | Checksum, which contains an Adler-32 of all the previous data within the additional session section data |
Session entry
A session entry is 32 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | Flags | |
| 4 | 4 | Start sector | |
| 8 | 24 | Unknown (empty values) |
EnCase stores audio tracks as 0 byte data with a sector size of 2048.
Note that for a CD the start sector of the first session is stored as 16, although the actual session starts at sector 0. Could this value be overloaded to indicate the size of the reserved space between the start of the session and the ISO 9660 volume descriptor?
Session flags
| Value | Identifier | Description |
|---|---|---|
| 0x00000001 | | If set the track is an audio track, otherwise the track is a data track |
Session footer
The session footer is 4 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | Checksum, which contains an Adler-32 of all the data within the session entries array |
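The sketch below reads session section data and verifies both Adler-32 checksums with Python's `zlib.adler32`. It assumes little-endian integers and that the session entries array directly follows the 36-byte header. Only the audio track flag is interpreted, and the function name is hypothetical.

```python
from __future__ import annotations

import struct
import zlib


def parse_session_section(data: bytes) -> list[tuple[int, bool]]:
    """Return (start sector, is audio track) pairs from session section data."""
    (number_of_sessions,) = struct.unpack_from("<I", data, 0)
    (header_checksum,) = struct.unpack_from("<I", data, 32)
    if zlib.adler32(data[:32]) != header_checksum:
        raise ValueError("session header checksum mismatch")
    entries_end = 36 + number_of_sessions * 32
    entries = data[36:entries_end]
    (footer_checksum,) = struct.unpack_from("<I", data, entries_end)
    if zlib.adler32(entries) != footer_checksum:
        raise ValueError("session footer checksum mismatch")
    sessions = []
    for offset in range(0, len(entries), 32):
        flags, start_sector = struct.unpack_from("<II", entries, offset)
        sessions.append((start_sector, bool(flags & 0x00000001)))
    return sessions
```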
Error2 section
The error2 section is identified in the section data type field as “error2”. Some aspects of this section are:
- Not defined in ASR Data - E01 Compression Format.
- It is not found in SMART (EWF-S01).
- It is found in EnCase 3 to 7 and linen 5 to 7 (EWF-E01) files.
- It is only added to the last segment file when errors were encountered while reading the input.
TODO: check FTK Imager, EnCase 1 and 2 for presence of the error2 section.
It contains the sectors that have read errors. Sectors where a read error occurred are filled with zeros by EnCase during acquisition.
The error2 section data consists of:
- The error2 header
- The error2 entries array
- The error2 footer
Error2 header
The error2 header is 520 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | Number of entries | |
| 4 | 512 | Unknown (empty values) | |
| 516 | 4 | Checksum, which contains an Adler-32 of all the previous data within the error2 header data |
Error2 entry
An error2 entry is 8 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | Start sector | |
| 4 | 4 | The number of sectors |
Error2 footer
The error2 footer is 4 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | Checksum, which contains an Adler-32 of all the data within the error2 entries array |
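Error2 section data can be read and validated in a similar way. This sketch also assumes little-endian integers. It returns (start sector, number of sectors) pairs, and a hypothetical helper tests whether a sector falls in a range that was zero-filled during acquisition.

```python
from __future__ import annotations

import struct
import zlib


def parse_error2_section(data: bytes) -> list[tuple[int, int]]:
    """Return (start sector, number of sectors) pairs from error2 section data."""
    (number_of_entries,) = struct.unpack_from("<I", data, 0)
    (header_checksum,) = struct.unpack_from("<I", data, 516)
    if zlib.adler32(data[:516]) != header_checksum:
        raise ValueError("error2 header checksum mismatch")
    entries_end = 520 + number_of_entries * 8
    entries = data[520:entries_end]
    (footer_checksum,) = struct.unpack_from("<I", data, entries_end)
    if zlib.adler32(entries) != footer_checksum:
        raise ValueError("error2 footer checksum mismatch")
    return [struct.unpack_from("<II", entries, offset) for offset in range(0, len(entries), 8)]


def is_read_error(sector: int, ranges: list[tuple[int, int]]) -> bool:
    """Check whether a sector lies within one of the read error ranges."""
    return any(start <= sector < start + count for start, count in ranges)
```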
Digest section
The digest section is identified in the section data type field as “digest”. Some aspects of this section are:
- It is found in EnCase 6 to 7 files, as of EnCase 6.12 and linen 6.12 (EWF-E01).
The digest section contains an MD5 and/or SHA1 hash of the data within the chunks.
The digest section data is 80 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 16 | MD5 hash of the media data | |
| 16 | 20 | SHA1 hash of the media data | |
| 36 | 40 | 0x00 | Unknown (Padding) |
| 76 | 4 | Checksum, which contains an Adler-32 of all the previous data within the digest section data |
Hash section
The hash section is identified in the section data type field as “hash”. Some aspects of this section are:
- Defined in ASR Data - E01 Compression Format.
- It is found in SMART (EWF-S01) and FTK Imager, EnCase 1 to 7 and linen 5 to 7 (EWF-E01) files.
- It is not found in EnCase 5 (EWF-L01).
- The hash section is optional and does not need to be present. If present, it resides in the last segment file before the done section.
The hash section contains an MD5 hash of the data within the chunks.
The hash section data is 36 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 16 | MD5 hash of the media data | |
| 16 | 16 | Unknown | |
| 32 | 4 | Checksum, which contains an Adler-32 of all the previous data within the additional hash section data |
Notes
Observations regarding the unknown value:
- is zero in SMART
- is zero in EnCase 3 and below
- in EnCase 4 the first 4 bytes are 0, the next 8 bytes seem random, the last 4 bytes seem fixed
- in EnCase 5 and 6 the first 8 bytes seem random, the last 8 bytes equal the file header signature
- in linen 5 the first and last set of 4 bytes seem the same, the second set of 4 bytes seem to be random, the third set of 4 bytes seem to contain a piece of the file header signature
- in linen 6 the first and third set of 4 bytes seem random, the second and last set of 4 bytes seem to be the same
- EnCase 5 seems to contain a GUID of the acquired device?
Tests with EnCase 4 show that the value:
- does not equal the checksum of the media data
- is the same for the same media acquired within the same program session using different formats, but differs for different media and different program sessions
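The sketch below extracts the stored hashes from digest and hash section data and verifies their Adler-32 checksums, assuming little-endian checksum values. The unknown 16 bytes of the hash section are not interpreted. To validate an image, a reader could compare the returned MD5 with `hashlib.md5()` computed over the media data.

```python
from __future__ import annotations

import struct
import zlib


def parse_digest_section(data: bytes) -> tuple[bytes, bytes]:
    """Return the (MD5, SHA1) hashes from 80 bytes of digest section data."""
    (checksum,) = struct.unpack_from("<I", data, 76)
    if zlib.adler32(data[:76]) != checksum:
        raise ValueError("digest section checksum mismatch")
    return data[0:16], data[16:36]


def parse_hash_section(data: bytes) -> bytes:
    """Return the MD5 hash from 36 bytes of hash section data."""
    (checksum,) = struct.unpack_from("<I", data, 32)
    if zlib.adler32(data[:32]) != checksum:
        raise ValueError("hash section checksum mismatch")
    return data[0:16]
```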
Done section
The done section is identified in the section data type field as “done”. Some aspects of this section are:
- Defined in ASR Data - E01 Compression Format.
- It is found in SMART (EWF-S01), FTK Imager, EnCase 1 to 7 and linen 5 to 7 (EWF-E01) and EnCase 5 (EWF-L01) files.
- The done section is the last section within the last segment file.
- The offset to the next section in the section header of the done section points to itself (the start of the done section).
SMART (EWF-S01)
It resides after the table or table2 section.
FTK Imager, EnCase and linen (EWF-E01)
It resides after the data section in a single segment file or for multiple segment files after the table2 section.
In the EnCase (EWF-E01) format the size in the section header is 0 instead of 76 (the size of the section header).
Note that FTK Imager versions before 2.9 set the section size to 76. At the moment it is unknown in which version this behavior was changed.
Incomplete section
The incomplete section is identified in the section data type field as “incomplete”.
This section is seen rarely. It was seen in an EnCase 6.13 (EWF-E01) file as the last section within the last segment file. The incomplete section was preceded by a hash and a digest section, although another hash and digest section were defined later in the set of EWF files.
It is currently assumed that the incomplete section indicates an incomplete image created using remote imaging. The incomplete section contains data but currently there is no indication what purpose the data has.
EWF-X
EWF-X (extended) is an experimental format to enhance the EWF format. EWF-X is based on the EWF-E01 format. EWF-X does not limit the table entries to 16375. EWF-X is not the same as version 2 of EWF.
TODO: add note about the table entry limit.
Sections
Additional sections provided in the EWF-X format are:
- xheader
- xhash
Xheader
The xheader section contains zlib-compressed XML data that holds the header values.
<?xml version="1.0" encoding="UTF-8"?>
<xheader>
<case_number>1</case_number>
<description>Description</description>
<examiner_name>John D.</examiner_name>
<evidence_number>1.1</evidence_number>
<notes>Just a floppy in my system</notes>
<acquiry_operating_system>Linux</acquiry_operating_system>
<acquiry_date>Sat Jan 20 18:32:08 2007 CET</acquiry_date>
<acquiry_software>ewfacquire</acquiry_software>
<acquiry_software_version>20070120</acquiry_software_version>
</xheader>
Xhash
The xhash section contains zlib-compressed XML data that holds the hash values.
<?xml version="1.0" encoding="UTF-8"?>
<xhash>
<md5>ae1ce8f5ac079d3ee93f97fe3792bda3</md5>
<sha1>31a58f090460b92220d724b28eeb2838a1df6184</sha1>
</xhash>
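Both sections can be decoded the same way: decompress the zlib data, then read the child elements of the root. The minimal sketch below uses Python's `zlib` and `xml.etree.ElementTree`. It assumes a flat structure like the examples above, and the function name is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ElementTree
import zlib


def parse_x_section(data: bytes) -> dict[str, str]:
    """Decompress xheader or xhash section data and map element names to text."""
    root = ElementTree.fromstring(zlib.decompress(data))
    return {child.tag: (child.text or "") for child in root}
```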
GUID
EWF-X uses a random-based (version 4) GUID.
Corruption scenarios
This chapter contains several corruption scenarios that have been encountered “in the wild”.
Corrupt uncompressed chunk
TODO: add description
Corrupt compressed chunk
TODO: add description
DEFLATE uncompressed block data with copy of uncompressed data size of 0
Seen in combination with some firmware versions of Tableau TD3 forensic imager.
In this corruption scenario the copy of the uncompressed data size value (NLEN) of the DEFLATE uncompressed block data is set to 0 instead of the ones' complement of the uncompressed data size (LEN).
Libewf currently does not handle this corruption scenario.
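The sketch below is a tolerant decoder for this corruption. It only handles zlib streams whose DEFLATE data consists entirely of stored (uncompressed) blocks, which keeps every block header byte-aligned; compressed blocks would still need a regular inflater. It accepts an NLEN of 0 but still verifies the Adler-32 trailer. The function name is hypothetical, and this is not how libewf or Keramics behave.

```python
import struct
import zlib


def decompress_stored_blocks(data: bytes, tolerate_zero_nlen: bool = True) -> bytes:
    """Decode a zlib stream made up only of DEFLATE stored blocks."""
    offset = 2  # skip the 2-byte zlib header (CMF, FLG)
    output = bytearray()
    while True:
        block_header = data[offset]
        offset += 1
        is_final = block_header & 0x01
        block_type = (block_header >> 1) & 0x03
        if block_type != 0:
            raise ValueError("only stored blocks are supported by this sketch")
        # For a stored block the remaining 5 bits of the header byte are padding.
        length, nlength = struct.unpack_from("<HH", data, offset)
        offset += 4
        if nlength != (~length & 0xFFFF) and not (tolerate_zero_nlen and nlength == 0):
            raise ValueError("invalid stored block lengths")
        output += data[offset:offset + length]
        offset += length
        if is_final:
            break
    (expected,) = struct.unpack_from(">I", data, offset)  # big-endian Adler-32 trailer
    if zlib.adler32(bytes(output)) != expected:
        raise ValueError("Adler-32 mismatch")
    return bytes(output)
```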
Corrupt section header
TODO: add description
reading section header from file IO pool entry: 1 at offset: 415912423
type : table2
next offset : 415978027
size : 65604
checksum : 0xf35f03e0
number of offsets : 16375
base offset : 0x00000000
checksum : 0x180d0137
reading section header from file IO pool entry: 1 at offset: 415978027
type : sectors
next offset : 415978027
size : 0
checksum : 0x1ad00464
Corrupt table section
TODO: add description
Scenarios:
- with and without table2
- corruption in number of entries
- corruption in entry data
Corrupted segment file header
TODO: add description
Partial segment file
TODO: add description
Missing segment file(s)
TODO: add description
Dual image: section size versus offset
The section headers define both the next section offset and the size of the section. If an implementation reads only one of the two to determine the next section, a dual EWF image can be crafted that consists of two separate images including hashes.
Keramics will mark such an image as corrupted.
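The sketch below shows this cross-check. It assumes the section headers have already been read and that, apart from self-referencing sections, the next offset should equal the section offset plus the section size. Treating “next” sections like the done section is an assumption, since only the done section is described here. The check also flags a section that does not advance, as in the corrupt section header log above. All names are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class SectionHeader(NamedTuple):
    offset: int        # offset of the section header within the segment file
    section_type: str  # e.g. "table2", "sectors", "done"
    next_offset: int
    size: int


# Assumption: "next" sections, like the done section, point to themselves.
SELF_REFERENCING_TYPES = ("done", "next")


def section_chain_errors(sections: list[SectionHeader]) -> list[str]:
    """Report sections whose size and next offset disagree."""
    errors = []
    for section in sections:
        label = f"{section.section_type} section at offset {section.offset}"
        if section.section_type in SELF_REFERENCING_TYPES:
            if section.next_offset != section.offset:
                errors.append(f"{label} does not point to itself")
            continue
        if section.next_offset <= section.offset:
            errors.append(f"{label} does not advance")
        elif section.offset + section.size != section.next_offset:
            errors.append(f"{label}: size and next offset disagree")
    return errors
```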
Table entries offset overflow
In EnCase 6.7.1 the sectors section can be larger than 2048 MiB. The table entry offsets are 31-bit values; however, in EnCase 6 the offset in a table entry will actually use the full 32 bits once 2048 MiB has been exceeded. This behavior is no longer present in EnCase 6.8, so it is assumed to be a bug.
Libewf currently assumes that if the 31-bit value overflows, the following chunks are uncompressed. This allows faulty EnCase 6.7.1 EWF files to be converted by Keramics.
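The sketch below implements this workaround. It assumes the most significant bit of a table entry flags a compressed chunk and the lower 31 bits hold the offset. Detecting the overflow at the first point where the 31-bit offset decreases is an assumption of this sketch, not necessarily how libewf detects it. The function name is hypothetical.

```python
from __future__ import annotations


def decode_table_entries(raw_entries: list[int]) -> list[tuple[int, bool]]:
    """Return (chunk offset, is compressed) pairs from raw 32-bit table entries."""
    decoded = []
    overflow = False
    previous_offset = 0
    for raw in raw_entries:
        # A decreasing 31-bit offset is taken as the sign that bit 31 is now
        # part of the offset rather than the compression flag.
        if not overflow and (raw & 0x7FFFFFFF) < previous_offset:
            overflow = True
        if overflow:
            offset, is_compressed = raw, False
        else:
            offset, is_compressed = raw & 0x7FFFFFFF, bool(raw & 0x80000000)
        decoded.append((offset, is_compressed))
        previous_offset = offset
    return decoded
```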
Multiple incomplete segment file set identifiers
Although rare, a set of EWF image files can change its segment file set identifier. This was seen in an image created by EnCase 6.13, presumably using remote imaging. The image contained 3 different segment file set identifiers. The first one changed after an incomplete section. The second one changed without any clear indication. The corresponding data section also changed to some extent, e.g. the compression method and media flags, with the “is physical” flag being dropped. The change was consistent across multiple segment files, so deliberate manipulation is unlikely. EnCase considers the image invalid.
With some tweaking the individual segment file sets could be read, although in this case the data read from them was heavily corrupted. For now Keramics does not support reading multiple segment file sets from a single image, but this might change in the future.
AD encryption
As of version 2.8 FTK Imager supports “AD encryption”. Although the output file uses the EWF extensions, it is actually an AES-256 encrypted container. The image can be encrypted using a pass-phrase or a certificate.
TODO: link to format definition
References
Expert Witness Compression Format version 2 (EWF2)
TODO: add description
Mac OS sparse bundle (.sparsebundle) format
The Mac OS sparse bundle (.sparsebundle) format is one of the disk image formats supported natively by Mac OS.
The sparse bundle disk image was introduced in Mac OS X 10.5.
Overview
A sparse bundle consists of a directory (bundle) with the .sparsebundle suffix containing:
- “Info.bckup” file
- “Info.plist” file
- “token” file
- “bands” directory containing the band files
Characteristics
| Characteristics | Description |
|---|---|
| Byte order | N/A |
| Date and time values | N/A |
| Character strings | N/A |
Info.plist and Info.bckup files
The Info.plist and its backup (Info.bckup) contain an XML plist.
This plist is also referred to as “Information Property List” and contains a single dictionary with the following key-value pairs.
| Identifier | Value | Description |
|---|---|---|
| CFBundleInfoDictionaryVersion | "6.0" | The information property list format version |
| band-size | The maximum size of a band file in bytes | |
| bundle-backingstore-version | 1 | Unknown |
| diskimage-bundle-type | "com.apple.diskimage.sparsebundle" | The bundle type |
| size | The media size in bytes |
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>CFBundleInfoDictionaryVersion</key>
<string>6.0</string>
<key>band-size</key>
<integer>8388608</integer>
<key>bundle-backingstore-version</key>
<integer>1</integer>
<key>diskimage-bundle-type</key>
<string>com.apple.diskimage.sparsebundle</string>
<key>size</key>
<integer>4194304</integer>
</dict>
</plist>
Token file
The token file is empty.
Bands directory
The bands directory contains the band files, which hold the actual data of the bands. The files are named after the zero-based band index in hexadecimal, where “0” is the 1st band, “a” the 11th, “f” the 16th, “10” the 17th, etc.
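Given the “band-size” value from Info.plist, a media offset can be mapped to a band file name and an offset within that file. A minimal sketch follows. The lowercase file names follow the examples above. Treating a missing band file as sparse (0-byte values) is an assumption and is left to the caller.

```python
from __future__ import annotations


def band_location(media_offset: int, band_size: int) -> tuple[str, int]:
    """Map a media offset to (band file name, offset within the band file)."""
    band_index, offset_in_band = divmod(media_offset, band_size)
    return format(band_index, "x"), offset_in_band
```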
Mac OS sparse image (.sparseimage) format
The Mac OS sparse image (.sparseimage) format is one of the disk image formats supported natively by Mac OS.
Overview
A sparse disk image consists of:
- header data
- bands data
Characteristics
| Characteristics | Description |
|---|---|
| Byte order | big-endian |
| Date and time values | N/A |
| Character strings | N/A |
The number of bytes per sector is 512.
Header data
The header data is 4096 bytes in size and consists of:
- file header
- band numbers array
- trailing data, which should be filled with 0-byte values
File header
The file header is 64 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | "sprs" | Signature |
| 4 | 4 | Unknown (format version?), seen 3 | |
| 8 | 4 | Number of sectors per band | |
| 12 | 4 | Unknown, seen 1 | |
| 16 | 4 | The media data size in sectors | |
| 20 | 12 | 0 | Unknown (0-byte values) |
| 32 | 4 | Unknown | |
| 36 | 28 | 0 | Unknown (0-byte values) |
Band numbers array
The band numbers array consists of:
- one or more band numbers
Band number
A band number is 4 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | Band number, where 0 indicates a sparse range and any other value refers to a location in the media data |
The corresponding media offset can be calculated as follows:
media_offset = (band_number - 1) * sectors_per_band * 512
The offset of the band data can be calculated as follows:
band_data_offset = 4096 + (array_index * sectors_per_band * 512)
For example, if the first array entry contains a band number of 4, then the band data is located at offset 4096 and the corresponding media offset is: 3 * sectors_per_band * 512.
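The sketch below inverts these formulas to map a media offset to a file offset, using the big-endian header fields. It assumes the band numbers array fills the header data from offset 64 up to offset 4096 (1008 entries) and that 0 entries are unused. It returns `None` for a sparse range, which a reader would presumably return as 0-byte values. `SparseImage` is a hypothetical name.

```python
from __future__ import annotations

import struct


class SparseImage:
    def __init__(self, header_data: bytes) -> None:
        signature, _version, self.sectors_per_band, _unknown, self.number_of_sectors = (
            struct.unpack_from(">4sIIII", header_data, 0))
        if signature != b"sprs":
            raise ValueError("unsupported signature")
        self.band_size = self.sectors_per_band * 512
        band_numbers = struct.unpack_from(">1008I", header_data, 64)
        # Map the zero-based media band index to the array index of its data.
        self.array_index_by_band = {
            band_number - 1: array_index
            for array_index, band_number in enumerate(band_numbers)
            if band_number != 0
        }

    def file_offset(self, media_offset: int) -> int | None:
        """Return the file offset of a media offset, or None for a sparse range."""
        band_index, offset_in_band = divmod(media_offset, self.band_size)
        array_index = self.array_index_by_band.get(band_index)
        if array_index is None:
            return None
        return 4096 + array_index * self.band_size + offset_in_band
```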
Parallels Disk Image (PDI) format
The Parallels Disk Image format is one of the image formats used by Parallels virtualization products. It is used to store both hard disk images and snapshots.
Overview
A Parallels Disk Image consists of a directory, typically named “{NAME}.hdd” containing:
- Descriptor file (DiskDescriptor.xml) and backup (DiskDescriptor.xml.Backup)
- {NAME}.hdd file
- Storage data file ({NAME}.hdd.0.{GUID}.hds)
- {NAME}.hdd.drh
Where {NAME} is an arbitrary name and {GUID} is a unique identifier.
Disk types
The Parallels Disk Image format supports multiple disk types:
| Identifier | Description |
|---|---|
| Expanding | Disk that consists of a single (dynamic size) sparse storage data file |
| Plain | Disk that consists of a single (fixed size) raw storage data file |
| Split | Disk that consists of one or more split storage data files, either expanding or plain, each holding up to 2G of data |
Characteristics
| Characteristics | Description |
|---|---|
| Byte order | little-endian |
| Character strings | UTF-8 by default, the encoding is defined in the disk descriptor XML file |
The number of bytes per sector is 512.
Descriptor file
The DiskDescriptor.xml and its backup (DiskDescriptor.xml.Backup) contain the “Parallels_disk_image” XML element that consists of the following values:
| Identifier | Description |
|---|---|
| Disk_Parameters | The disk parameters |
| StorageData | Information about the storage data files |
| Snapshots | Information about snapshots |
<?xml version='1.0' encoding='UTF-8'?>
<Parallels_disk_image Version="1.0">
<Disk_Parameters>
<Disk_size>134217728</Disk_size>
<Cylinders>262144</Cylinders>
<PhysicalSectorSize>4096</PhysicalSectorSize>
<LogicSectorSize>512</LogicSectorSize>
<Heads>16</Heads>
<Sectors>32</Sectors>
<Padding>0</Padding>
<Encryption>
<Engine>{00000000-0000-0000-0000-000000000000}</Engine>
<Data></Data>
</Encryption>
<UID>{GUID}</UID>
<Name>{NAME}</Name>
<Miscellaneous>
<CompatLevel>level2</CompatLevel>
<Bootable>1</Bootable>
<ChangeState>0</ChangeState>
<SuspendState>0</SuspendState>
</Miscellaneous>
</Disk_Parameters>
<StorageData>
<Storage>
<Start>0</Start>
<End>134217728</End>
<Blocksize>2048</Blocksize>
<Image>
<GUID>{GUID}</GUID>
<Type>Compressed</Type>
<File>{NAME}.hdd.0.{GUID}.hds</File>
</Image>
...
</Storage>
...
</StorageData>
<Snapshots>
<Shot>
<GUID>{GUID}</GUID>
<ParentGUID>{GUID}</ParentGUID>
</Shot>
...
</Snapshots>
</Parallels_disk_image>
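The sketch below lists the storage extents and their storage data files with Python's `xml.etree.ElementTree`. It assumes the descriptor is passed as bytes and that every “Storage” element has “Start” and “End” values. The function name is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ElementTree


def parse_storage_extents(descriptor_data: bytes) -> list[tuple[int, int, list[str]]]:
    """Return (start sector, end sector, storage data file names) per "Storage"."""
    root = ElementTree.fromstring(descriptor_data)
    extents = []
    for storage in root.iterfind("StorageData/Storage"):
        start = int(storage.findtext("Start"))
        end = int(storage.findtext("End"))
        files = [image.findtext("File") for image in storage.iterfind("Image")]
        extents.append((start, end, files))
    return extents
```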
Disk parameters
The disk parameters are stored in the “Disk_Parameters” XML element, which contains the following values.
| Identifier | Description |
|---|---|
| Cylinders | Number of cylinders |
| Disk_size | Disk size, in number of sectors |
| Encryption | "Encryption" sub XML element |
| Heads | Number of heads |
| Miscellaneous | "Miscellaneous" sub XML element |
| Name | Name of the disk |
| LogicSectorSize | Optional logical sector size, which is 512 bytes by default |
| Padding | Unknown (padding) |
| PhysicalSectorSize | Optional physical sector size, which is 4096 bytes by default |
| Sectors | Number of sectors per track |
| UID | Unknown (identifier) |
Encryption
<Encryption>
<Engine>{00000000-0000-0000-0000-000000000000}</Engine>
<Data></Data>
<Salt></Salt>
</Encryption>
Miscellaneous
<Miscellaneous>
<CompatLevel>level2</CompatLevel>
<Bootable>1</Bootable>
<ChangeState>0</ChangeState>
<SuspendState>0</SuspendState>
<DupBlocksCnt>0</DupBlocksCnt>
<CorruptBlocksCnt>0</CorruptBlocksCnt>
<UnrefBlocksCnt>0</UnrefBlocksCnt>
<OutOfDiskBlocksCnt>0</OutOfDiskBlocksCnt>
<BatOverlapBlocksCnt>0</BatOverlapBlocksCnt>
<BlocksCnt>0</BlocksCnt>
<TruncatedBlocksCnt>0</TruncatedBlocksCnt>
<ReferencedBlocksCnt>0</ReferencedBlocksCnt>
<ShutdownState>0</ShutdownState>
<GuestToolsVersion>17.1.1-51537</GuestToolsVersion>
</Miscellaneous>
CompatLevel
Seen: level0 and level2
Storage data
The “StorageData” XML element contains the following values.
| Identifier | Description |
|---|---|
| Storage | One or more "Storage" XML sub elements |
Note that a split disk contains multiple “Storage” XML sub elements.
Storage
The “Storage” XML element contains the following values.
| Identifier | Description |
|---|---|
| Start | Start sector number of the segment stored in the storage data file |
| End | End sector number of the segment stored in the storage data file |
| Blocksize | Block size, in number of sectors |
| Image | One or more "Image" sub XML elements |
Image
The “Image” XML element contains the following values.
| Identifier | Description |
|---|---|
| GUID | Identifier of snapshot (or layer) |
| Type | Storage data file type |
| File | Name (or path) of the storage data file |
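The “StorageData” layout described above can be read with the Python standard library ElementTree parser. This is a minimal sketch, assuming a well-formed descriptor whose Start, End and Blocksize values are decimal integers; the class and function names are hypothetical and missing elements are not handled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class StorageImage:
    guid: str
    image_type: str  # "Compressed" (sparse) or "Plain" (raw)
    file_name: str


@dataclass
class Storage:
    start_sector: int
    end_sector: int
    block_size: int  # in number of sectors
    images: List[StorageImage] = field(default_factory=list)


def parse_storage_data(xml_text: str) -> List[Storage]:
    """Reads the Storage elements of a descriptor (hypothetical helper)."""
    root = ET.fromstring(xml_text)  # the Parallels_disk_image element
    storages = []
    for storage_element in root.iterfind("StorageData/Storage"):
        storage = Storage(
            start_sector=int(storage_element.findtext("Start")),
            end_sector=int(storage_element.findtext("End")),
            block_size=int(storage_element.findtext("Blocksize")),
        )
        for image_element in storage_element.iterfind("Image"):
            storage.images.append(StorageImage(
                guid=image_element.findtext("GUID"),
                image_type=image_element.findtext("Type"),
                file_name=image_element.findtext("File"),
            ))
        storages.append(storage)
    return storages
```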
Snapshots data
The “Snapshots” XML element contains the following values.
| Identifier | Description |
|---|---|
| Shot | One or more "Shot" sub XML elements |
Shot
The “Shot” XML element contains the following values.
| Identifier | Description |
|---|---|
| GUID | Identifier of snapshot (or layer) |
| ParentGUID | Identifier of parent snapshot (or layer), which contains "{00000000-0000-0000-0000-000000000000}" if not set |
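The “Shot” entries form a chain of parent layers. The following Python sketch resolves that chain for a given snapshot identifier; it assumes that data missing from a layer should be looked up in its parent, in order. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NULL_GUID = "{00000000-0000-0000-0000-000000000000}"


def snapshot_chain(xml_text: str, guid: str) -> list:
    """Returns the GUIDs from the given snapshot up to the root layer."""
    root = ET.fromstring(xml_text)
    parents = {
        shot.findtext("GUID"): shot.findtext("ParentGUID")
        for shot in root.iterfind("Snapshots/Shot")
    }
    chain = []
    # Stop at the "not set" GUID or at an identifier without a Shot entry.
    while guid and guid != NULL_GUID:
        if guid in chain:
            raise ValueError("cycle in snapshot parents")
        chain.append(guid)
        guid = parents.get(guid)
    return chain
```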
Storage data file
Storage data file types
| Value | Description |
|---|---|
| "Compressed" | Sparse storage data file |
| "Plain" | Raw storage data file |
Raw storage data file
The raw (or plain) storage data file contains the disk image data including free space.
Sparse storage data file
The sparse storage data file contains the actual disk image data without free space.
A sparse storage data file consists of:
- file header
- block allocation table (BAT)
- data blocks
Sparse storage data file header
The sparse storage data file header is 64 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 16 | "WithoutFreeSpace" or "WithouFreSpacExt" | Signature |
| 16 | 4 | 2 | Format version |
| 20 | 4 | | Number of heads |
| 24 | 4 | | Number of cylinders |
| 28 | 4 | | Block size (or number of tracks) in number of sectors |
| 32 | 4 | | Number of blocks, which is equivalent to the number of block allocation table entries |
| 36 | 8 | | Number of sectors |
| 44 | 4 | | Unknown (Creator?), seen: "\x00\x00\x00\x00", "pd17", "pd22" |
| 48 | 4 | | Data start sector number, which is relative to the start of the sparse storage data file |
| 52 | 4 | | Unknown (Flags?) |
| 56 | 8 | | Unknown (Features start sector?) |
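A minimal Python sketch of parsing the 64-byte sparse storage data file header with the struct module. The byte order is not stated in this section; the sketch assumes little-endian values. The names are hypothetical and the unknown fields are returned as-is.

```python
import struct
from typing import NamedTuple

SIGNATURES = (b"WithoutFreeSpace", b"WithouFreSpacExt")

# 16s signature, 5 x uint32, uint64, 4s, 2 x uint32, uint64 = 64 bytes.
# Assumption: little-endian byte order.
SPARSE_HEADER = struct.Struct("<16s5IQ4s2IQ")


class SparseHeader(NamedTuple):
    signature: bytes
    format_version: int
    number_of_heads: int
    number_of_cylinders: int
    block_size: int  # in number of sectors
    number_of_blocks: int  # number of block allocation table entries
    number_of_sectors: int
    creator: bytes  # unknown (creator?)
    data_start_sector: int
    flags: int  # unknown (flags?)
    features_start_sector: int  # unknown


def parse_sparse_header(data: bytes) -> SparseHeader:
    header = SparseHeader._make(SPARSE_HEADER.unpack_from(data, 0))
    if header.signature not in SIGNATURES:
        raise ValueError("unsupported sparse storage data file signature")
    return header
```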
Block allocation table (BAT)
The block allocation table consists of 32-bit entries. An entry contains the sector number where the data block starts, or 0 if the block is sparse or stored in the parent disk image.
For example, block allocation table entry 0 corresponds to disk image offset 0. If it contains a value of 0x800, the corresponding data block is stored at file offset 0x100000 (0x800 x 512).
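The block allocation table lookup from the example above can be sketched in Python as follows. It assumes that the block allocation table directly follows the 64-byte file header and that its entries are little-endian; both are assumptions, as are the function names.

```python
import struct

BYTES_PER_SECTOR = 512


def read_bat(data: bytes, number_of_blocks: int) -> tuple:
    """Reads the 32-bit entries, assumed to start right after the 64-byte header."""
    return struct.unpack_from(f"<{number_of_blocks}I", data, 64)


def media_offset_to_file_offset(bat, block_size: int, media_offset: int):
    """Returns the file offset of media_offset, or None if the block is not stored.

    block_size is in number of sectors, as stored in the file header.
    """
    block_size_in_bytes = block_size * BYTES_PER_SECTOR
    block_index, offset_in_block = divmod(media_offset, block_size_in_bytes)
    sector_number = bat[block_index]
    if sector_number == 0:
        return None  # sparse, or stored in the parent disk image
    return sector_number * BYTES_PER_SECTOR + offset_in_block
```

With the example entry value of 0x800, media offset 0 maps to file offset 0x100000 regardless of the block size.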
QEMU Copy-On-Write (QCOW) image file format
The QEMU Copy-On-Write (QCOW) image file format is used by the QEMU Open Source Process Emulator to store disk images (storage media).
Overview
A QCOW image file consists of:
- the file header
- optional file header extensions
- the level 1 table (cluster aligned)
- the reference count table (cluster aligned)
- reference count blocks
- snapshot headers (8-byte aligned on cluster boundary)
- clusters containing:
  - level 2 tables
  - storage media data
The storage media data is stored in clusters. The size of a cluster is a multiple of 512 bytes. The level 1 (L1) table contains references to level 2 (L2) tables. The level 2 tables contain references to the storage media clusters.
There are multiple versions of the QCOW image file format. QCOW (version 1) and QCOW2 (version 2 and later) are sometimes even considered separate image formats. Version 3 is considered an extended version of QCOW2.
Characteristics
| Characteristics | Description |
|---|---|
| Byte order | big-endian in most cases, note that some values are in little-endian |
| Date and time values | Number of seconds since Jan 1, 1970 00:00:00 UTC (POSIX epoch) |
| Character strings | UTF-8 |
Note that this document assumes that character strings are stored in UTF-8.
The number of bytes per sector is 512.
Encryption
The QCOW image format can encrypt the media data stored in the image. Currently supported encryption methods are:
- AES-CBC 128-bit
- Linux Unified Key Setup (LUKS)
If no encryption is used the encryption method in the file header is set to none (0).
Note that it is currently unknown whether the format supports compression and encryption at the same time. This combination does not appear to be supported by qemu-img.
AES-CBC 128-bit
Both encryption and decryption use:
- AES-CBC with a 128-bit key, applied to the sector data
The key is a direct copy of the first 16 characters of a user-provided (narrow character) password. If the password is shorter than 16 characters, the remaining key data is set to 0-byte values.
Note that it is currently unclear which character sets are allowed and how characters outside the 7-bit ASCII set should be handled.
The AES-CBC initialization vector consists of the media data sector number (relative to the start of the disk), stored in little-endian format in the first 64 bits of the 128-bit initialization vector. The remaining initialization vector data is set to 0-byte values. The first sector number is 0 and the number of bytes per sector is 512.
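The key and initialization vector derivation described above can be sketched in Python without an AES implementation. The password is handled as bytes, since the handling of characters outside 7-bit ASCII is unclear; the function names are hypothetical. Decrypting a sector would then use AES-128 in CBC mode with this key and the initialization vector of that sector, for example through a third-party cryptography library, which is not shown here.

```python
import struct

BYTES_PER_SECTOR = 512


def derive_key(password: bytes) -> bytes:
    """Copies the first 16 bytes of the password, padding with 0-byte values."""
    return password[:16].ljust(16, b"\x00")


def sector_iv(sector_number: int) -> bytes:
    """Builds the 16-byte IV: a 64-bit little-endian sector number, then 0-byte values."""
    return struct.pack("<Q", sector_number) + bytes(8)


def sector_number_of(media_offset: int) -> int:
    """Sector number relative to the start of the disk, starting at 0."""
    return media_offset // BYTES_PER_SECTOR
```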
Linux Unified Key Setup (LUKS)
TODO: complete section
File header
File header – version 1
The file header - version 1 is 48 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | "QFI\xfb" or "\x51\x46\x49\xfb" | The signature |
| 4 | 4 | 1 | Format version |
| 8 | 8 | | Backing file name offset |
| 16 | 4 | | Backing file name size |
| 20 | 4 | | Modification date and time, which contains a POSIX timestamp |
| 24 | 8 | | Storage media size |
| 32 | 1 | | Number of cluster block bits |
| 33 | 1 | | Number of level 2 table bits |
| 34 | 2 | | Unknown (empty values) |
| 36 | 4 | | Encryption method |
| 40 | 8 | | Level 1 table offset |
The cluster block size is calculated as:
cluster_block_size = 1 << number_of_cluster_block_bits
The level 2 table size is calculated as:
level2_table_size = (1 << number_of_level2_table_bits) * 8
The level 1 table size is calculated as:
level1_table_entry_size = cluster_block_size * (1 << number_of_level2_table_bits)
level1_table_size = media_size / level1_table_entry_size
if media_size % level1_table_entry_size != 0:
level1_table_size += 1
level1_table_size *= 8
The backing file name is set in snapshot image files and is normally stored after the file header.
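The version 1 size calculations above can be written as a small Python function. It uses ceiling division instead of the explicit remainder check, which gives the same result; the function name is hypothetical.

```python
def qcow1_table_sizes(number_of_cluster_block_bits: int,
                      number_of_level2_table_bits: int,
                      media_size: int) -> tuple:
    """Returns (cluster_block_size, level2_table_size, level1_table_size) in bytes."""
    cluster_block_size = 1 << number_of_cluster_block_bits
    # A level 2 table holds (1 << number_of_level2_table_bits) 64-bit references.
    level2_table_size = (1 << number_of_level2_table_bits) * 8
    # Number of media bytes covered by one level 1 table entry.
    level1_table_entry_size = cluster_block_size * (1 << number_of_level2_table_bits)
    # Ceiling division: a partially covered last range still needs an entry.
    number_of_level1_entries = -(-media_size // level1_table_entry_size)
    level1_table_size = number_of_level1_entries * 8
    return cluster_block_size, level2_table_size, level1_table_size
```

For example, with 12 cluster block bits and 9 level 2 table bits, each level 1 entry covers 2 MiB, so a 10 MiB image needs 5 entries (40 bytes).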
File header – version 2
The file header - version 2 is 72 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | "QFI\xfb" or "\x51\x46\x49\xfb" | The signature |
| 4 | 4 | 2 | Format version |
| 8 | 8 | | Backing file name offset |
| 16 | 4 | | Backing file name size |
| 20 | 4 | | Number of cluster block bits |
| 24 | 8 | | Storage media size |
| 32 | 4 | | Encryption method |
| 36 | 4 | | Number of level 1 table references |
| 40 | 8 | | Level 1 table offset |
| 48 | 8 | | Reference count table offset |
| 56 | 4 | | Reference count table clusters |
| 60 | 4 | | Number of snapshots |
| 64 | 8 | | Snapshots offset |
The cluster block size is calculated as:
cluster_block_size = 1 << number_of_cluster_block_bits
The number of level 2 table bits is calculated as:
number_of_level2_table_bits = number_of_cluster_block_bits - 3
The level 2 table size is calculated as:
level2_table_size = (1 << number_of_level2_table_bits) * 8
The level 1 table size is calculated as:
level1_table_size = number_of_level1_table_references * 8
The backing file name is set in snapshot image files and is normally stored after the file header.
File header – version 3
The file header - version 3 is 104 or 112 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | "QFI\xfb" or "\x51\x46\x49\xfb" | The signature |
| 4 | 4 | 3 | Format version |
| 8 | 8 | | Backing file name offset |
| 16 | 4 | | Backing file name size |
| 20 | 4 | | Number of cluster block bits |
| 24 | 8 | | Storage media size |
| 32 | 4 | | Encryption method |
| 36 | 4 | | Number of level 1 table references |
| 40 | 8 | | Level 1 table offset |
| 48 | 8 | | Reference count table offset |
| 56 | 4 | | Reference count table clusters |
| 60 | 4 | | Number of snapshots |
| 64 | 8 | | Snapshots offset |
| 72 | 8 | | Incompatible feature flags |
| 80 | 8 | | Compatible feature flags |
| 88 | 8 | | Auto-clear feature flags |
| 96 | 4 | | Reference count order |
| 100 | 4 | 104 or 112 | File header size, which contains the size of the file header excluding the file header extensions |
| If file header size equals 112 | | | |
| 104 | 1 | | Compression method |
| 105 | 7 | | Unknown (padding) |
The cluster block size is calculated as:
cluster_block_size = 1 << number_of_cluster_block_bits
The number of level 2 table bits is calculated as:
number_of_level2_table_bits = number_of_cluster_block_bits - 3
The level 2 table size is calculated as:
level2_table_size = (1 << number_of_level2_table_bits) * 8
The level 1 table size is calculated as:
level1_table_size = number_of_level1_table_references * 8
The backing file name is set in snapshot image files and is normally stored after the file header.
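A minimal Python sketch of parsing the version 2 and 3 file headers with the struct module, using big-endian byte order. It keeps only a subset of the fields (the reference count and snapshot fields are read but not returned) and the names are hypothetical. Treating a file header size of 112 or more as containing the compression method is an assumption.

```python
import struct
from typing import NamedTuple

QCOW_SIGNATURE = b"QFI\xfb"
# Fields shared by the version 2 and 3 file headers (72 bytes, big-endian).
HEADER_V2 = struct.Struct(">4sIQIIQIIQQIIQ")
# Version 3 additions at offset 72: feature flags, reference count order
# and file header size (32 bytes).
HEADER_V3_EXTRA = struct.Struct(">QQQII")


class QcowHeader(NamedTuple):
    format_version: int
    backing_file_name_offset: int
    backing_file_name_size: int
    number_of_cluster_block_bits: int
    media_size: int
    encryption_method: int
    number_of_level1_table_references: int
    level1_table_offset: int
    incompatible_feature_flags: int = 0
    compatible_feature_flags: int = 0
    autoclear_feature_flags: int = 0
    file_header_size: int = 72
    compression_method: int = 0  # 0 is zlib


def parse_qcow_header(data: bytes) -> QcowHeader:
    (signature, version, backing_offset, backing_size, cluster_bits,
     media_size, encryption_method, level1_references, level1_offset,
     _refcount_table_offset, _refcount_table_clusters, _number_of_snapshots,
     _snapshots_offset) = HEADER_V2.unpack_from(data, 0)
    if signature != QCOW_SIGNATURE or version not in (2, 3):
        raise ValueError("not a QCOW version 2 or 3 file header")
    header = QcowHeader(version, backing_offset, backing_size, cluster_bits,
                        media_size, encryption_method, level1_references,
                        level1_offset)
    if version == 3:
        incompatible, compatible, autoclear, _refcount_order, header_size = (
            HEADER_V3_EXTRA.unpack_from(data, HEADER_V2.size))
        # Assumption: a header of 112 bytes or more stores the compression method.
        compression_method = data[104] if header_size >= 112 else 0
        header = header._replace(
            incompatible_feature_flags=incompatible,
            compatible_feature_flags=compatible,
            autoclear_feature_flags=autoclear,
            file_header_size=header_size,
            compression_method=compression_method)
    return header
```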
Encryption methods
| Value | Identifier | Description |
|---|---|---|
| 0 | QCOW_CRYPT_NONE | No encryption |
| 1 | QCOW_CRYPT_AES | AES-CBC 128-bits encryption |
| 2 | QCOW_CRYPT_LUKS | Linux Unified Key Setup (LUKS) encryption |
Incompatible feature flags
| Value | Identifier | Description |
|---|---|---|
| 0x00000001 | QCOW2_INCOMPAT_DIRTY | |
| 0x00000002 | QCOW2_INCOMPAT_CORRUPT | |
| 0x00000004 | QCOW2_INCOMPAT_DATA_FILE | |
| 0x00000008 | QCOW2_INCOMPAT_COMPRESSION | |
| 0x00000010 | QCOW2_INCOMPAT_EXTL2 | |
Compatible feature flags
| Value | Identifier | Description |
|---|---|---|
| 0x00000001 | QCOW2_COMPAT_LAZY_REFCOUNTS | |
Auto-clear feature flags
| Value | Identifier | Description |
|---|---|---|
| 0x00000001 | QCOW2_AUTOCLEAR_BITMAPS | |
| 0x00000002 | QCOW2_AUTOCLEAR_DATA_FILE_RAW | |
Compression methods
| Value | Identifier | Description |
|---|---|---|
| 0 | | zlib compression |
File header extensions
A file header extension consists of:
- file header extension header
- file header extension data
File header extension header
The file header extension header is 8 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | The extension type (signature) |
| 4 | 4 | | The extension data size |
File header extension types
| Value | Identifier | Description |
|---|---|---|
| 0x0537be77 | QCOW2_EXT_MAGIC_CRYPTO_HEADER | Crypto header |
| 0x23852875 | QCOW2_EXT_MAGIC_BITMAPS | Bitmaps |
| 0x44415441 or "DATA" | QCOW2_EXT_MAGIC_DATA_FILE | Data-file |
| 0x6803f857 | QCOW2_EXT_MAGIC_FEATURE_TABLE | Feature table |
| 0xe2792aca | QCOW2_EXT_MAGIC_BACKING_FORMAT | Backing format |
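This section does not describe where the file header extensions start or how the list ends. The Python sketch below follows the QEMU qcow2 specification on those points: the extensions start directly after the file header (at the file header size for version 3, or at 72 bytes for version 2), each extension data area is padded to a multiple of 8 bytes, and an extension type of 0 ends the list. The names are hypothetical.

```python
import struct

EXTENSION_TYPES = {
    0x0537BE77: "crypto header",
    0x23852875: "bitmaps",
    0x44415441: "data-file",
    0x6803F857: "feature table",
    0xE2792ACA: "backing format",
}


def iterate_header_extensions(data: bytes, header_size: int):
    """Yields (extension_type, extension_data) pairs (big-endian header fields)."""
    offset = header_size
    while offset + 8 <= len(data):
        extension_type, data_size = struct.unpack_from(">II", data, offset)
        if extension_type == 0:
            break  # end of the file header extensions
        offset += 8
        yield extension_type, data[offset:offset + data_size]
        # Skip the data and its padding up to the next multiple of 8 bytes.
        offset += (data_size + 7) & ~7
```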
Backing format file header extension
The backing format file header extension data is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | ... | | Backing format identifier, which contains a UTF-8 string without end-of-string character |
Bitmaps file header extension
TODO: complete section
Crypto header file header extension
The crypto header file header extension data is 16 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | | The crypto data offset |
| 8 | 8 | | The crypto data size |
Data-file file header extension
The data-file file header extension data is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | ... | | Data-file file name, which contains a UTF-8 string without end-of-string character |
Feature table file header extension
TODO: complete section
Level 1 table
The level 1 table contains level 2 table references.
A reference value of 0 indicates an unused (unallocated) entry; the corresponding data is considered sparse or stored in the corresponding backing file.
Level 2 table reference – version 1
The level 2 table reference - version 1 is 8 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0.0 | 63 bits | | Level 2 table offset, which contains an offset relative to the start of the file |
| 7.7 | 1 bit | QCOW_OFLAG_COMPRESSED | Is compressed flag |
Level 2 table reference – version 2 or 3
The level 2 table reference - version 2 or 3 is 8 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0.0 | 62 bits | | Level 2 table offset, which contains an offset relative to the start of the file |
| 7.6 | 1 bit | QCOW_OFLAG_COMPRESSED | Is compressed flag |
| 7.7 | 1 bit | QCOW_OFLAG_COPIED | Is copied flag |
The is copied flag indicates that the reference count of the corresponding level 2 table is exactly one.
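The bit layouts above can be decoded in Python as shown below, assuming the reference has already been read as a big-endian 64-bit integer, so that bits 7.6 and 7.7 are bits 62 and 63 of that integer. The cluster block references described next use the same layouts, so the same helper applies to them. The function name is hypothetical.

```python
QCOW_OFLAG_COPIED = 1 << 63
QCOW_OFLAG_COMPRESSED_V1 = 1 << 63
QCOW_OFLAG_COMPRESSED = 1 << 62


def decode_reference(reference: int, format_version: int) -> tuple:
    """Splits a 64-bit reference into (offset, is_compressed, is_copied)."""
    if format_version == 1:
        # 63-bit offset; bit 63 is the is compressed flag; no is copied flag.
        return (reference & (QCOW_OFLAG_COMPRESSED_V1 - 1),
                bool(reference & QCOW_OFLAG_COMPRESSED_V1), False)
    # Version 2 or 3: 62-bit offset, bit 62 is compressed, bit 63 is copied.
    return (reference & (QCOW_OFLAG_COMPRESSED - 1),
            bool(reference & QCOW_OFLAG_COMPRESSED),
            bool(reference & QCOW_OFLAG_COPIED))
```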
Level 2 table
The level 2 table contains cluster block references.
The level 2 table size is calculated as:
level2_table_size = (1 << number_of_level2_table_bits) * 8
A reference value of 0 indicates an unused (unallocated) entry; the corresponding data is considered sparse or stored in the corresponding backing file.
Cluster block reference – version 1
The cluster block reference - version 1 is 8 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0.0 | 63 bits | | Cluster block offset, which contains an offset relative to the start of the file |
| 7.7 | 1 bit | QCOW_OFLAG_COMPRESSED | Is compressed flag |
Cluster block reference – version 2 or 3
The cluster block reference - version 2 or 3 is 8 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0.0 | 62 bits | | Cluster block offset, which contains an offset relative to the start of the file |
| 7.6 | 1 bit | QCOW_OFLAG_COMPRESSED | Is compressed flag |
| 7.7 | 1 bit | QCOW_OFLAG_COPIED | Is copied flag |
The is copied flag indicates that the reference count of the corresponding cluster block is exactly one.
Reference count table
The cluster data blocks are reference counted. For every cluster data block, a 16-bit reference count is stored in the reference count table.
The reference count table is stored in cluster-sized blocks. The file header contains the number of these blocks (reference count table clusters).
TODO: complete section
Notes
reference count cluster block offset = cluster data block offset /
reference count table offset = cluster data block /
In order to obtain the reference count of a given cluster, you split the cluster offset into a refcount table offset and refcount block offset. Since a refcount block is a single cluster of 2 byte entries, the lower cluster_size - 1 bits is used as the block offset and the rest of the bits are used as the table offset.
One optimization is that if any cluster pointed to by an L1 or L2 table entry has a refcount exactly equal to one, the most significant bit of the L1/L2 entry is set as a "copied" flag. This indicates that no snapshots are using this cluster and it can be immediately written to without having to make a copy for any snapshots referencing it.
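The index arithmetic from these notes can be sketched in Python. It assumes 16-bit reference counts as described above; for version 3 the reference count size can differ, and deriving it as refcount_bits = 1 << reference_count_order is an assumption taken from the QEMU qcow2 specification. The function name is hypothetical.

```python
def refcount_location(cluster_offset: int, number_of_cluster_block_bits: int,
                      refcount_bits: int = 16) -> tuple:
    """Returns (reference count table index, entry index within the refcount block)."""
    cluster_block_size = 1 << number_of_cluster_block_bits
    # A refcount block is a single cluster of refcount_bits-sized entries.
    entries_per_block = cluster_block_size * 8 // refcount_bits
    cluster_index = cluster_offset >> number_of_cluster_block_bits
    return divmod(cluster_index, entries_per_block)
```

With 64 KiB clusters and 16-bit reference counts, one refcount block covers 32768 clusters.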
Cluster data block
To retrieve the cluster data block corresponding to a certain storage media offset:
Determine the level 1 table index from the offset:
level1_table_index_bit_shift = number_of_cluster_block_bits + number_of_level2_table_bits
For version 1:
level1_table_index = (offset & 0x7fffffffffffffff) >> level1_table_index_bit_shift
For version 2 and 3:
level1_table_index = (offset & 0x3fffffffffffffff) >> level1_table_index_bit_shift
Retrieve the level 2 table offset from the level 1 table. If the level 2 table offset is 0 and the image has a backing file, the cluster data block is stored in the backing file; otherwise the cluster block is considered sparse.
Read the corresponding level 2 table.
Determine the level 2 table index from the offset:
level2_table_index_bit_mask = ~(0xffffffffffffffff << number_of_level2_table_bits)
level2_table_index = (offset >> number_of_cluster_block_bits) & level2_table_index_bit_mask
Retrieve the cluster block offset from the level 2 table. If the cluster block offset is 0 and the image has a backing file, the cluster data block is stored in the backing file; otherwise the cluster block is considered sparse.
Uncompressed cluster data block
If the is compressed flag (QCOW_OFLAG_COMPRESSED) is not set:
cluster_block_bit_mask = ~(0xffffffffffffffff << number_of_cluster_block_bits)
cluster_block_data_offset = (offset & cluster_block_bit_mask) + cluster_block_offset
Note that in version 2 or 3 the last cluster block in the file can be smaller than the cluster block size defined by the number of cluster block bits in the file header. This does not seem to be the case for version 1.
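The lookup steps above can be combined into a single Python function for uncompressed cluster blocks of version 2 or 3 images. It is a sketch: the level 1 table is assumed to be already read into a list of integers, read_level2_table is a hypothetical callback that returns a level 2 table as a list of integers, and (1 << n) - 1 is used as the equivalent of ~(0xffffffffffffffff << n) for 64-bit values.

```python
OFFSET_MASK = 0x3FFFFFFFFFFFFFFF  # 62-bit offset of a version 2 or 3 reference
QCOW_OFLAG_COMPRESSED = 1 << 62


def locate_cluster_data(offset, number_of_cluster_block_bits, level1_table,
                        read_level2_table):
    """Returns the file offset of the media data at offset, or None if the
    cluster is unallocated (sparse or stored in the backing file)."""
    number_of_level2_table_bits = number_of_cluster_block_bits - 3
    level1_shift = number_of_cluster_block_bits + number_of_level2_table_bits
    level1_index = (offset & OFFSET_MASK) >> level1_shift
    level2_table_offset = level1_table[level1_index] & OFFSET_MASK
    if level2_table_offset == 0:
        return None
    level2_table = read_level2_table(level2_table_offset)
    level2_mask = (1 << number_of_level2_table_bits) - 1
    level2_index = (offset >> number_of_cluster_block_bits) & level2_mask
    reference = level2_table[level2_index]
    if reference & QCOW_OFLAG_COMPRESSED:
        raise NotImplementedError("compressed cluster block")
    cluster_block_offset = reference & OFFSET_MASK
    if cluster_block_offset == 0:
        return None
    cluster_mask = (1 << number_of_cluster_block_bits) - 1
    return (offset & cluster_mask) + cluster_block_offset
```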
Compressed cluster data block
If the is compressed flag (QCOW_OFLAG_COMPRESSED) is set, the cluster block data is stored using the compression method defined by the file header, or DEFLATE by default.
Multiple compressed cluster data blocks are stored together in cluster block sizes. The compressed cluster data blocks are sector (512 bytes) aligned.
The compressed data is raw DEFLATE data that is decompressed (inflated) using a window bits value of -12.
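In Python, the zlib module can inflate such raw DEFLATE data when -12 is passed as the window bits value. The sketch below uses a decompression object so that sector padding after the end of the compressed stream is ignored rather than treated as an error; the function name is hypothetical, and limiting the output to the cluster block size is a precaution, not something described in this document.

```python
import zlib


def decompress_cluster(compressed_data: bytes, cluster_block_size: int) -> bytes:
    """Inflates raw DEFLATE data (no zlib header) using window bits -12."""
    decompressor = zlib.decompressobj(-12)
    # Bytes after the end of the DEFLATE stream end up in
    # decompressor.unused_data instead of raising an error.
    return decompressor.decompress(compressed_data, cluster_block_size)
```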
Compressed chunk data block – version 1
compressed_size_bit_shift = 63 - number_of_cluster_block_bits
compressed_block_size = (
(cluster_block_offset & 0x7fffffffffffffff) >> compressed_size_bit_shift)
cluster_block_offset &= ~(0xffffffffffffffff << compressed_size_bit_shift)
Compressed chunk data block – version 2 or 3
compressed_size_bit_shift = 62 - (number_of_cluster_block_bits - 8)
According to “The QCOW2 Image Format” the compressed block size is calculated as follows:
compressed_block_size = (
(((cluster_block_offset & 0x3fffffffffffffff) >> compressed_size_bit_shift) + 1) * 512)
Since the compressed block size is stored in 512-byte sectors, this value does not contain the exact byte size of the compressed cluster block data. It sometimes lacks the size of the last partially filled sector, so one sector should be added if possible within the bounds of the cluster block size and the file size.
cluster_block_offset &= ~(0xffffffffffffffff << compressed_size_bit_shift)
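The version 2 and 3 decoding above can be expressed as a Python function. It is a sketch with hypothetical names that returns the size as stored, in whole sectors.

```python
BYTES_PER_SECTOR = 512
OFFSET_MASK = 0x3FFFFFFFFFFFFFFF  # strips the is compressed and is copied flags


def decode_compressed_reference(reference: int,
                                number_of_cluster_block_bits: int) -> tuple:
    """Returns (compressed data file offset, compressed block size in bytes)."""
    compressed_size_bit_shift = 62 - (number_of_cluster_block_bits - 8)
    # The stored value is the number of additional 512-byte sectors.
    number_of_sectors = ((reference & OFFSET_MASK) >> compressed_size_bit_shift) + 1
    compressed_block_size = number_of_sectors * BYTES_PER_SECTOR
    cluster_block_offset = reference & ((1 << compressed_size_bit_shift) - 1)
    return cluster_block_offset, compressed_block_size
```

As noted above, this size may miss the last partially filled sector; a reader can request one additional sector, within the file size, and let the decompressor ignore any surplus bytes.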
Snapshots
Since version 1, QCOW can use the backing file name in the file header to point to a backing file (or parent image) that contains the snapshot image, while the current image only contains the modifications. Version 2 adds support for storing snapshots inside the image.
Snapshot header - version 2 or 3
An in-image snapshot is created by adding a snapshot header, copying the L1 table and incrementing the reference counts of all L2 tables and data clusters referenced by the L1 table.
The snapshot header is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | | Level 1 table offset |
| 8 | 4 | | Level 1 size |
| 12 | 2 | | Identifier string size |
| 14 | 2 | | Name size |
| 16 | 4 | | Date in seconds |
| 20 | 4 | | Date in nanoseconds |
| 24 | 8 | | VM clock in nanoseconds |
| 32 | 4 | | VM state size |
| 36 | 4 | | Extra data size |
| 40 | ... | | Extra data |
| ... | ... | | Identifier string |
| ... | ... | | Name |
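A Python sketch of parsing one snapshot header at a given file offset. It assumes big-endian values and UTF-8 strings, as stated in the characteristics, and that the next snapshot header starts at the following 8-byte boundary, as suggested in the overview; only some fields are returned and the names are hypothetical.

```python
import struct
from typing import NamedTuple

SNAPSHOT_HEADER = struct.Struct(">QIHHIIQII")  # fixed 40-byte part


class Snapshot(NamedTuple):
    level1_table_offset: int
    level1_size: int
    identifier: str
    name: str
    date_seconds: int
    vm_state_size: int


def parse_snapshot_header(data: bytes, offset: int) -> tuple:
    """Returns (Snapshot, offset of the next snapshot header)."""
    (level1_offset, level1_size, identifier_size, name_size, date_seconds,
     _date_nanoseconds, _vm_clock_nanoseconds, vm_state_size,
     extra_data_size) = SNAPSHOT_HEADER.unpack_from(data, offset)
    # The identifier string and name follow the extra data.
    position = offset + SNAPSHOT_HEADER.size + extra_data_size
    identifier = data[position:position + identifier_size].decode("utf-8")
    position += identifier_size
    name = data[position:position + name_size].decode("utf-8")
    position += name_size
    # Assumption: the next header starts at the following 8-byte boundary.
    next_offset = (position + 7) & ~7
    snapshot = Snapshot(level1_offset, level1_size, identifier, name,
                        date_seconds, vm_state_size)
    return snapshot, next_offset
```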
TODO: complete section
References
- The QCOW Image Format, by Mark McLoughlin
- The QCOW2 Image Format, by Mark McLoughlin
Universal Disk Image Format (UDIF)
The Universal Disk Image Format (UDIF) (.dmg) is one of the disk image formats supported natively by Mac OS.
Overview
Known UDIF image types are:
| Identifier | Description |
|---|---|
| UDBZ | bzip2 compressed UDIF |
| UDCO | Apple Data Compression (ADC) compressed UDIF |
| UDIF | Read-write uncompressed UDIF |
| UDRO | Read-only uncompressed UDIF |
| UDxx | Uncompressed UDIF |
| UDZO | zlib/DEFLATE compressed UDIF |
| ULFO | LZFSE compressed UDIF |
| ULMO | LZMA compressed UDIF |
UDIF images are either uncompressed or compressed.
Uncompressed image format
An uncompressed UDIF image consists of:
- data
- optional file footer
Note that an uncompressed UDIF image without file footer is equivalent to a RAW storage media image (CRawDiskImage).
Compressed image format
A compressed UDIF image consists of:
- data fork
- optional resource fork
- optional XML plist
- file footer at the end of the image file
Characteristics
| Characteristics | Description |
|---|---|
| Byte order | big-endian |
| Date and time values | N/A |
| Character strings | N/A |
The number of bytes per sector is 512.
File footer
The file footer (also known as resource file or metadata) is 512 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | "koly" | Signature |
| 4 | 4 | 4 | Format version |
| 8 | 4 | 512 | File footer size in bytes |
| 12 | 4 | | Image flags |
| 16 | 8 | | Unknown (RunningDataForkOffset) |
| 24 | 8 | | Data fork offset, where the offset is relative to the start of the image file |
| 32 | 8 | | Data fork size |
| 40 | 8 | | Resource fork offset, where the offset is relative to the start of the image file |
| 48 | 8 | | Resource fork size |
| 56 | 4 | | Unknown (SegmentNumber) |
| 60 | 4 | | Number of segments, which contains 0 if not set |
| 64 | 16 | | Segment identifier, which contains a UUID |
| 80 | 4 | | Data checksum type |
| 84 | 4 | | Data checksum size, as number of bits |
| 88 | 128 | | Data checksum |
| 216 | 8 | | XML plist offset, where the offset is relative to the start of the image file |
| 224 | 8 | | XML plist size |
| 232 | 120 | | Unknown (Reserved) |
| 352 | 4 | | Master checksum type |
| 356 | 4 | | Master checksum size, as number of bits |
| 360 | 128 | | Master checksum |
| 488 | 4 | | Image type (or variant) |
| 492 | 8 | | Number of sectors |
| 500 | 4 | | Unknown (reserved) |
| 504 | 4 | | Unknown (reserved) |
| 508 | 4 | | Unknown (reserved) |
Note that the XML plist size can be 0, such as in a UDIF stub (UDxx) image.
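A minimal Python sketch of locating and parsing the “koly” file footer with big-endian values. It reads only a subset of the fields, assumes the whole image is available as bytes (a real reader would seek to the last 512 bytes instead), and uses hypothetical names.

```python
import struct
from typing import NamedTuple


class UdifFooter(NamedTuple):
    format_version: int
    image_flags: int
    data_fork_offset: int
    data_fork_size: int
    xml_plist_offset: int
    xml_plist_size: int
    image_type: int
    number_of_sectors: int


def parse_udif_footer(file_data: bytes) -> UdifFooter:
    """Parses the 512-byte "koly" file footer at the end of the image data."""
    footer = file_data[-512:]
    signature, version, footer_size, flags = struct.unpack_from(">4sIII", footer, 0)
    if signature != b"koly" or footer_size != 512:
        raise ValueError("missing koly file footer")
    data_fork_offset, data_fork_size = struct.unpack_from(">QQ", footer, 24)
    plist_offset, plist_size = struct.unpack_from(">QQ", footer, 216)
    image_type, number_of_sectors = struct.unpack_from(">IQ", footer, 488)
    return UdifFooter(version, flags, data_fork_offset, data_fork_size,
                      plist_offset, plist_size, image_type, number_of_sectors)
```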
Image flags
| Value | Identifier | Description |
|---|---|---|
| 0x00000001 | kUDIFFlagsFlattened | Unknown (flattened?) |
| 0x00000004 | kUDIFFlagsInternetEnabled | Unknown (internet enabled?) |
Image types
| Value | Identifier | Description |
|---|---|---|
| 1 | kUDIFDeviceImageType | Device image |
| 2 | kUDIFPartitionImageType | Partition image |
XML plist
TODO: complete section
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>resource-fork</key>
<dict>
<key>blkx</key>
<array>
<dict>
<key>Attributes</key>
<string>0x0050</string>
<key>CFName</key>
<string>Protective Master Boot Record (MBR : 0)</string>
<key>Data</key>
<data>
bWlzaAAAAAEAAAAAAAAAAAAAAAAAAAABAAAAAAAAAAAA
AAgIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAIAAAAgQfL6MwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAACgAAABQAAAAMAAAAAAAAAAAAAAAAAAAABAAAA
AAAAIA0AAAAAAAAAH/////8AAAAAAAAAAAAAAAEAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAA=
</data>
<key>ID</key>
<string>-1</string>
<key>Name</key>
<string>Protective Master Boot Record (MBR : 0)</string>
</dict>
...
</array>
<key>plst</key>
<array>
<dict>
<key>Attributes</key>
<string>0x0050</string>
<key>Data</key>
<data>
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEAAQAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAA
</data>
<key>ID</key>
<string>0</string>
<key>Name</key>
<string></string>
</dict>
</array>
</dict>
</dict>
</plist>
The XML plist contains the following key-value pairs:
| Identifier | Description |
|---|---|
| resource-fork | dictionary |
XML plist resource-fork dictionary
The resource-fork dictionary contains the following key-value pairs:
| Identifier | Description |
|---|---|
| blkx | array of dictionaries |
| plst | array of dictionaries |
XML plist blkx array entry
A blkx array entry contains the following key-value pairs:
| Identifier | Description |
|---|---|
| Attributes | string that contains a hexadecimal formatted integer value |
| CFName | string |
| Data | string that contains base-64 encoded data of a block table |
| ID | string that contains a decimal formatted integer value |
| Name | string |
Block table
The block table (BLKXTable) is of variable size and consists of:
- block table header
- block table entries
The block table header
The block table header is 204 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | "mish" | Signature |
| 4 | 4 | 1 | Format version |
| 8 | 8 | | Start sector, which contains the sector number relative to the start of the media data |
| 16 | 8 | | Number of sectors |
| 24 | 8 | | Unknown (DataOffset), which seems to be always 0 |
| 32 | 4 | | Unknown (BuffersNeeded) |
| 36 | 4 | | Unknown (BlockDescriptors). Does this value correspond to the number of block table entries? |
| 40 | 4 | 0 | Unknown (reserved) |
| 44 | 4 | 0 | Unknown (reserved) |
| 48 | 4 | 0 | Unknown (reserved) |
| 52 | 4 | 0 | Unknown (reserved) |
| 56 | 4 | 0 | Unknown (reserved) |
| 60 | 4 | 0 | Unknown (reserved) |
| 64 | 4 | | Checksum type |
| 68 | 4 | | Checksum size |
| 72 | 128 | | Checksum |
| 200 | 4 | | Number of entries |
Block table entry
The block table entry (BLKXChunkEntry) is 40 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Entry type |
| 4 | 4 | | Unknown (comment) |
| 8 | 8 | | Start sector, which contains the sector number relative to the start sector of the block table |
| 16 | 8 | | Number of sectors |
| 24 | 8 | | Data offset, which contains the byte offset relative to the start of the UDIF image file |
| 32 | 8 | | Data size, which contains the number of bytes of data stored, which is 0 for sparse data |
UDIF block table entry types
| Value | Identifier | Description |
|---|---|---|
| 0x00000000 | | Unknown (sparse) |
| 0x00000001 | | Uncompressed (raw) data |
| 0x00000002 | | Sparse (used for Apple_Free) |
| 0x7ffffffe | | Comment |
| 0x80000004 | | ADC compressed data |
| 0x80000005 | | zlib compressed data |
| 0x80000006 | | bzip2 compressed data |
| 0x80000007 | | LZFSE compressed data |
| 0x80000008 | | LZMA compressed data |
| 0xffffffff | | Block table entries terminator |
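The block table stored in the “Data” value of a blkx array entry can be decoded with the Python sketch below. It assumes that the base64 text may contain line breaks (which the standard base64 decoder skips by default) and that entry start sectors become absolute sector numbers by adding the start sector of the block table; the names, including the short entry type labels, are hypothetical.

```python
import base64
import struct
from typing import NamedTuple

ENTRY_TYPES = {
    0x00000000: "sparse", 0x00000001: "raw", 0x00000002: "sparse",
    0x7FFFFFFE: "comment", 0x80000004: "adc", 0x80000005: "zlib",
    0x80000006: "bzip2", 0x80000007: "lzfse", 0x80000008: "lzma",
    0xFFFFFFFF: "terminator",
}


class BlockTableEntry(NamedTuple):
    entry_type: int
    start_sector: int  # absolute: block table start sector already added
    number_of_sectors: int
    data_offset: int
    data_size: int


def parse_block_table(encoded: str) -> list:
    data = base64.b64decode(encoded)
    signature, _version, table_start_sector = struct.unpack_from(">4sIQ", data, 0)
    if signature != b"mish":
        raise ValueError("missing mish signature")
    (number_of_entries,) = struct.unpack_from(">I", data, 200)
    entries = []
    for index in range(number_of_entries):
        entry_type, _comment, start, count, offset, size = struct.unpack_from(
            ">IIQQQQ", data, 204 + index * 40)
        entries.append(BlockTableEntry(entry_type, table_start_sector + start,
                                       count, offset, size))
    return entries


def find_entry(entries, sector: int):
    """Returns the entry that contains the sector, or None."""
    for entry in entries:
        if entry.start_sector <= sector < entry.start_sector + entry.number_of_sectors:
            return entry
    return None
```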
UDIF comment
TODO: complete section
UDIF data fork
TODO: complete section
UDIF resource fork
TODO: complete section
Notes
Is the maximum compressed chunk size 2048 sectors?
Comment seems to reference compressed data but has no size or number of sectors value.
Virtual Hard Disk (VHD) image format
The Virtual Hard Disk (VHD) format is used by Microsoft virtualization products as one of their image formats. It is used to store both hard disk images and snapshots.
Overview
There are multiple types of VHD images, namely:
- Fixed-size VHD image
- Dynamic-size (or sparse) VHD image
- Differential (or differencing) VHD image
Fixed-size hard disk image
A fixed-size VHD image consists of:
- data
- file footer
Note that a fixed-size VHD image is equivalent to a raw storage media image with an additional footer.
Dynamic-size (or sparse) hard disk image
A dynamic-size (or sparse) VHD image consists of:
- copy of file footer
- dynamic disk header
- block allocation table
- data in blocks
- file footer
Differential hard disk image
A differential (or differencing) VHD image consists of:
- copy of file footer
- dynamic disk header
- block allocation table
- data in blocks
- file footer
Characteristics
| Characteristics | Description |
|---|---|
| Byte order | big-endian |
| Date and time values | Number of seconds since January 1, 2000 00:00:00 UTC |
| Character strings | UCS-2 big-endian, which allows for unpaired Unicode surrogates such as "U+d800" and "U+dc00" |
The number of bytes per sector is 512.
Undo disk image
Virtual PC has a feature to create “Undo Disks”. This undo disk feature stores a differential hard disk image in files with names similar to:
VirtualPCUndo_<name>_0_0_hhmmssMMDDYYYY.vud
Where the date and time seems to be stored in UTC and <name> represents the name of the parent image.
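A Python sketch that extracts the parent image name and the timestamp from such a file name. It assumes that hh is a 24-hour value, that the parent name itself may contain underscores and, as suggested above, that the time is in UTC; the regular expression and function name are hypothetical.

```python
import re
from datetime import datetime, timezone

UNDO_FILE_NAME = re.compile(
    r"^VirtualPCUndo_(?P<name>.+)_0_0_(?P<timestamp>\d{14})\.vud$")


def parse_undo_file_name(file_name: str):
    """Returns (parent image name, date and time) or None if the name does not match."""
    match = UNDO_FILE_NAME.match(file_name)
    if not match:
        return None
    value = match.group("timestamp")  # hhmmssMMDDYYYY
    timestamp = datetime(
        int(value[10:14]), int(value[6:8]), int(value[8:10]),  # year, month, day
        int(value[0:2]), int(value[2:4]), int(value[4:6]),  # hour, minute, second
        tzinfo=timezone.utc)  # assumption: stored in UTC
    return match.group("name"), timestamp
```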
File footer
The file footer is 512 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | "conectix" | Signature (also referred to as cookie) |
| 8 | 4 | | Features |
| 12 | 4 | 0x00010000 | Format version, where the upper 16 bits are the major version and the lower 16 bits the minor version |
| 16 | 8 | | Next offset, which contains the offset to the next (metadata) structure, relative to the start of the file. It should only be set in dynamic and differential disk images. In fixed disk images it should be set to 0xffffffffffffffff (-1) |
| 24 | 4 | | Modification time, which contains the number of seconds since January 1, 2000 00:00:00 UTC |
| 28 | 4 | | Creator application |
| 32 | 4 | | Creator version, where the upper 16 bits are the major version and the lower 16 bits the minor version |
| 36 | 4 | | Creator (host) operating system |
| 40 | 8 | | Disk size, which contains the size of the disk in bytes |
| 48 | 8 | | Data size, which contains the size of the data in bytes |
| 56 | 4 | | Disk geometry |
| 60 | 4 | | Disk type |
| 64 | 4 | | Checksum, which contains the one's complement of the sum of the file footer bytes, excluding the checksum itself |
| 68 | 16 | | Identifier, which contains a big-endian UUID |
| 84 | 1 | | Saved state, which contains a flag to indicate the image is in saved state |
| 85 | 427 | 0 | Unknown (Reserved, should contain 0-byte values) |
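A minimal Python sketch of validating the file footer checksum and reading a few of its fields. It reads the checksum as the one's complement of the 32-bit sum of all 512 footer bytes with the checksum field counted as 0; the function names and the returned dictionary are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

VHD_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)


def vhd_checksum(structure: bytes, checksum_offset: int) -> int:
    """One's complement of the byte sum, with the checksum field counted as 0."""
    total = sum(structure) - sum(structure[checksum_offset:checksum_offset + 4])
    return ~total & 0xFFFFFFFF


def parse_vhd_footer(footer: bytes) -> dict:
    if footer[0:8] != b"conectix":
        raise ValueError("missing conectix signature")
    (stored_checksum,) = struct.unpack_from(">I", footer, 64)
    if stored_checksum != vhd_checksum(footer, 64):
        raise ValueError("file footer checksum mismatch")
    (modification_time,) = struct.unpack_from(">I", footer, 24)
    disk_size, data_size, _geometry, disk_type = struct.unpack_from(">QQII", footer, 40)
    return {
        "modification_time": VHD_EPOCH + timedelta(seconds=modification_time),
        "disk_size": disk_size,
        "data_size": data_size,
        "disk_type": disk_type,
    }
```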
Features
| Offset | Size | Value | Description |
|---|---|---|---|
| 0.0 | 1 bit | | Is temporary disk, which indicates that this disk is a candidate for deletion on shutdown |
| 0.1 | 1 bit | | Unknown (Reserved, must be set to 1) |
| 0.2 | 30 bits | | Unknown (Reserved, must be set to 0) |
A value of 0 indicates that no features are enabled.
Creator application
| Value | Identifier | Description |
|---|---|---|
| "d2v\x00" | Disk2vhd | |
| "qemu" | Qemu | |
| "vpc\x20" | Virtual PC | |
| "vs\x20\x20" | Virtual Server | |
| "win\x20" | Windows (Disk Management) | |
Creator host operating system
| Value | Identifier | Description |
|---|---|---|
| "Mac\x20" | Macintosh | |
| "Wi2k" | Windows | |
Disk geometry
The disk geometry is 4 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | | Number of cylinders |
| 2 | 1 | | Number of heads |
| 3 | 1 | | Number of sectors per track (cylinder) |
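Decoding the disk geometry in Python takes a single struct call with big-endian values. The capacity helper multiplies cylinders, heads, sectors per track and the 512-byte sector size; this capacity can differ from the disk size in the file footer, so treat it as illustrative. The names are hypothetical.

```python
import struct

BYTES_PER_SECTOR = 512


def parse_disk_geometry(value: bytes) -> tuple:
    """Returns (cylinders, heads, sectors per track) from the 4-byte field."""
    return struct.unpack(">HBB", value)


def geometry_size(cylinders: int, heads: int, sectors_per_track: int) -> int:
    """Number of bytes addressable through the CHS geometry."""
    return cylinders * heads * sectors_per_track * BYTES_PER_SECTOR
```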
Disk type
| Value | Identifier | Description |
|---|---|---|
| 0 | None | |
| 1 | Unknown (Deprecated) | |
| 2 | Fixed hard disk | |
| 3 | Dynamic hard disk | |
| 4 | Differential hard disk | |
| 5 | Unknown (Deprecated) | |
| 6 | Unknown (Deprecated) | |
Dynamic disk header
The dynamic disk header is 1024 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | "cxsparse" | Signature (Cookie) |
| 8 | 8 | | Next offset, which contains the offset to the next (metadata) structure, relative to the start of the file. Currently this is unused and should be set to 0xffffffffffffffff (-1) |
| 16 | 8 | | Block allocation table offset, which contains the offset of the block allocation table structure, relative to the start of the file |
| 24 | 4 | 0x00010000 | Format version, where the upper 16 bits are the major version and the lower 16 bits the minor version |
| 28 | 4 | | Number of blocks, which is equivalent to the number of block allocation table entries |
| 32 | 4 | | Block size. The block size must be a power-of-two multiple of the sector size and does not include the size of the sector bitmap. The default block size is 4096 x 512-byte sectors (2 MiB) |
| 36 | 4 | | Checksum, which contains the one's complement of the sum of the dynamic disk header bytes, excluding the checksum itself |
| 40 | 16 | | Parent identifier, which contains a big-endian UUID that identifies the parent image. Only used by differential hard disk images |
| 56 | 4 | | Parent last modification time, which contains the number of seconds since January 1, 2000 00:00:00 UTC. Only used by differential hard disk images |
| 60 | 4 | 0 | Unknown (Reserved, should contain 0-byte values) |
| 64 | 512 | | Parent name, which contains a UCS-2 big-endian string. Only used by differential hard disk images |
| 576 | 8 x 24 = 192 | | Array of parent locator entries. Only used by differential hard disk images |
| 768 | 256 | 0 | Unknown (Reserved, should contain 0-byte values) |
The maximum number of block allocation table entries should match the maximum possible number of blocks in the disk.
Note that the parent name can also contain a full path, e.g. in .avhd files. The path segments are separated by the \ character.
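The sketch below reads the fixed fields of the dynamic disk header. It assumes big-endian integers, as elsewhere in VHD, and assumes the parent name is terminated or padded by 0-byte values. The function and dictionary key names are hypothetical, and the checksum is returned but not verified.

```python
import struct
from datetime import datetime, timedelta, timezone

VHD_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)


def parse_dynamic_disk_header(data: bytes) -> dict:
    if len(data) < 1024 or data[0:8] != b"cxsparse":
        raise ValueError("not a VHD dynamic disk header")
    # Offsets 8 to 40: next offset, BAT offset, version, number of blocks, block size, checksum.
    next_offset, bat_offset, version, number_of_blocks, block_size, checksum = (
        struct.unpack_from(">QQIIII", data, 8))
    (parent_modification_time,) = struct.unpack_from(">I", data, 56)
    # Parent name: UCS-2 big-endian; unpaired surrogates are passed through.
    parent_name = data[64:576].decode("utf-16-be", errors="surrogatepass").split("\x00", 1)[0]
    return {
        "next_offset": next_offset,
        "bat_offset": bat_offset,
        "major_version": version >> 16,
        "minor_version": version & 0xFFFF,
        "number_of_blocks": number_of_blocks,
        "block_size": block_size,
        "checksum": checksum,
        "parent_identifier": data[40:56],
        "parent_modification_time": VHD_EPOCH + timedelta(seconds=parent_modification_time),
        "parent_name": parent_name,
    }
```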
Parent locator entry
The parent locator entry is 24 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Locator platform code |
| 4 | 4 | | Platform data space, which contains the number of 512-byte sectors needed to store the parent hard disk locator |
| 8 | 4 | | Locator data size |
| 12 | 4 | 0 | Unknown (Reserved, should contain 0-byte values) |
| 16 | 8 | | Locator data offset, which contains the offset to the locator data. The offset is relative to the start of the file |
Locator platform code
| Value | Identifier | Description |
|---|---|---|
| 0 | None | |
| "Mac\x20" | Mac OS alias stored as a blob | |
| "MacX" | File URL with UTF-8 encoding conforming to RFC 2396 | |
| "W2ku" | Absolute Windows path, which contains a UCS-2 big-endian string | |
| "W2ru" | Windows path relative to the differential image, which contains a UCS-2 big-endian string | |
| "Wi2k" | Unknown (Deprecated) | |
| "Wi2r" | Unknown (Deprecated) |
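The following sketch parses a 24-byte parent locator entry and decodes the locator data for the "W2ku", "W2ru" and "MacX" platform codes. It assumes big-endian integers. It also assumes the caller has already read "locator data size" bytes at "locator data offset". The function names are hypothetical, and the other platform codes are not handled.

```python
import struct


def parse_parent_locator_entry(entry: bytes) -> dict:
    # 24 bytes: platform code, data space, data size, reserved, data offset.
    platform_code, data_space, data_size, _reserved, data_offset = struct.unpack(">4sIIIQ", entry)
    return {
        "platform_code": platform_code,
        "data_space": data_space,
        "data_size": data_size,
        "data_offset": data_offset,
    }


def decode_locator_data(platform_code: bytes, data: bytes) -> str:
    if platform_code in (b"W2ku", b"W2ru"):
        # Absolute or relative Windows path, UCS-2 big-endian.
        text = data.decode("utf-16-be", errors="surrogatepass")
    elif platform_code == b"MacX":
        # File URL, UTF-8 encoded.
        text = data.decode("utf-8")
    else:
        raise ValueError(f"unsupported locator platform code: {platform_code!r}")
    # Drop any trailing 0-byte padding.
    return text.split("\x00", 1)[0]
```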
Block allocation table
The block allocation table is only used in dynamic and differential disk images.
The block allocation table consists of 32-bit entries. An entry contains the sector number where the data block starts or is set to 0xffffffff (-1) if the block is sparse or stored in the parent disk image.
if block_allocation_table_entry == 0xffffffff:
    block is sparse or stored in parent
else:
    file_offset = (block_allocation_table_entry * 512) + sector_bitmap_size
Unused blocks in a dynamic disk are sparse and should be filled with 0-byte values. In a differential disk, unused blocks are stored in the parent disk image.
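The lookup above can be sketched as a function that maps a virtual disk offset to a file offset. The function name is hypothetical. The sector bitmap size is passed in by the caller; it is 512 bytes for the default 2 MiB block size. This sketch does not consult the sector bitmap, which a reader would still need for differential images.

```python
from typing import Optional, Sequence

UNUSED_ENTRY = 0xFFFFFFFF


def map_vhd_offset(offset: int, block_size: int, block_allocation_table: Sequence[int],
                   sector_bitmap_size: int) -> Optional[int]:
    """Return the file offset of the byte at a virtual disk offset, or None if its block
    is not stored in this file (sparse in a dynamic disk, in the parent for a differential one)."""
    block_index, offset_in_block = divmod(offset, block_size)
    entry = block_allocation_table[block_index]
    if entry == UNUSED_ENTRY:
        return None
    # The entry contains the sector number of the sector bitmap; the sector data follows it.
    return entry * 512 + sector_bitmap_size + offset_in_block
```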
Data blocks
Data blocks are only used in dynamic and differential disk images.
A data block consists of:
- sector bitmap
- sector data
size_of_bitmap (in bytes) = block_size / (512 * 8)
The size of the bitmap is rounded up to the next multiple of the sector size.
Sector bitmap
In dynamic disk images the sector bitmap indicates which sectors contain data (bit set to 1) or are sparse (bit set to 0).
In differential disk images the sector bitmap indicates which sectors are stored within the image (bit set to 1) or in the parent (bit set to 0).
The bitmap is padded to a 512-byte sector boundary.
The bitmap is stored on a per-byte basis, where the most significant bit (MSB) of each byte represents the first bit of that byte.
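The sector bitmap size and the MSB-first bit order can be sketched as follows, assuming 512-byte sectors; the function names are hypothetical. The size is rounded up to whole bytes before padding, so block sizes smaller than 4096 bytes still yield a 512-byte bitmap.

```python
def vhd_sector_bitmap_size(block_size: int, bytes_per_sector: int = 512) -> int:
    # One bit per sector, rounded up to whole bytes and then to a 512-byte sector boundary.
    size = (block_size // bytes_per_sector + 7) // 8
    return (size + 511) // 512 * 512


def vhd_sector_is_set(bitmap: bytes, sector_index: int) -> bool:
    # MSB-first: bit 7 of byte 0 describes the first sector of the block.
    return bool(bitmap[sector_index // 8] & (0x80 >> (sector_index % 8)))
```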
References
- VHD Specifications, by Microsoft
Virtual Hard Disk version 2 (VHDX) image format
The Virtual Hard Disk version 2 (VHDX) format is used by Microsoft virtualization products as one of its image formats. It is used to store both hard disk images and snapshots.
Overview
A VHDX image file consists of:
- file header
- 2x image headers
- 2x region tables
- log or metadata journal
- block allocation table (BAT) region
- metadata region
- metadata table
- metadata items
- image (content) data
The elements are stored in 64 KiB (65536 bytes) aligned blocks.
Characteristics
| Characteristics | Description |
|---|---|
| Byte order | little-endian |
| Date and time values | N/A |
| Character strings | UCS-2 little-endian, which allows for unpaired Unicode surrogates such as "U+d800" and "U+dc00" |
The number of bytes per sector is 512 or 4096 depending on the logical sector size.
File header
The file header (or file type identifier) is 64 KiB (65536 bytes) in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | "vhdxfile" | Signature |
| 8 | 512 | | Creator application and version, which contains a UCS-2 little-endian string with an end-of-string character |
| 520 | 65016 | | Unknown (reserved) |
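Reading the file type identifier can be sketched as follows. The function name is hypothetical. Unpaired UCS-2 surrogates are passed through rather than rejected, and anything after the first end-of-string character is ignored.

```python
def parse_vhdx_file_header(data: bytes) -> str:
    if data[0:8] != b"vhdxfile":
        raise ValueError("missing vhdxfile signature")
    # 512-byte UCS-2 little-endian creator string, terminated by an end-of-string character.
    creator = data[8:520].decode("utf-16-le", errors="surrogatepass")
    return creator.split("\x00", 1)[0]
```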
Image header
The image header is 4 KiB (4096 bytes) in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | "head" | Signature |
| 4 | 4 | | Checksum |
| 8 | 8 | | Sequence number |
| 16 | 16 | | File write identifier, which contains a GUID |
| 32 | 16 | | Data write identifier, which contains a GUID |
| 48 | 16 | | Log identifier, which contains a GUID |
| 64 | 2 | | Log format version |
| 66 | 2 | 1 | Format version |
| 68 | 4 | | Log size, which according to MS-VHDX must be a multiple of 1 MiB |
| 72 | 8 | | Log offset, which according to MS-VHDX must be a multiple of 1 MiB and greater than or equal to 1 MiB |
| 80 | 4016 | 0 | Unknown (reserved), which according to MS-VHDX must be set to 0 |
Checksum calculation
The CRC32-C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the checksum.
The checksum is calculated over the 4 KiB (4096 bytes) of image header data, where the checksum value is considered to be 0 during calculation.
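The following is a minimal sketch of the checksum verification. The description above gives an initial value of 0. This sketch assumes that refers to the value passed to a CRC routine that applies the usual 0xffffffff pre- and post-inversion internally, which yields the standard CRC-32C (check value 0xe3069283 for the ASCII string "123456789"). The bitwise loop favors clarity over speed, and the function names are hypothetical.

```python
import struct


def crc32c(data: bytes) -> int:
    # Reflected CRC-32C: polynomial 0x1edc6f41, bit-reversed form 0x82f63b78.
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def verify_vhdx_image_header(header: bytes) -> bool:
    if len(header) != 4096 or header[0:4] != b"head":
        raise ValueError("not a 4 KiB VHDX image header")
    (stored_checksum,) = struct.unpack_from("<I", header, 4)
    # The checksum field is considered to be 0 during calculation.
    zeroed = header[:4] + b"\x00\x00\x00\x00" + header[8:]
    return crc32c(zeroed) == stored_checksum
```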
Region table
The region table is stored in a block of 64 KiB (65536 bytes) and consists of:
- region table header
- 0 or more region table entries
- Unknown (reserved)
TODO: determine if 0 entries is actually supported
Region table header
The region table header is 16 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | "regi" | Signature |
| 4 | 4 | | Checksum |
| 8 | 4 | | Number of table entries, which according to MS-VHDX must be less than or equal to 2047 |
| 12 | 4 | 0 | Unknown (reserved), which according to MS-VHDX must be set to 0 |
The CRC32-C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the checksum.
The checksum is calculated over the 64 KiB (65536 bytes) of region table data, where the region table checksum value is considered to be 0 during calculation.
Region table entry
The region table entry is 32 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 16 | | Region type identifier, which contains a GUID |
| 16 | 8 | | Region data offset, which contains an offset relative to the start of the file. According to MS-VHDX this value must be a multiple of 1 MiB and greater than or equal to 1 MiB |
| 24 | 4 | | Region data size, which according to MS-VHDX must be a multiple of 1 MiB |
| 28 | 4 | | Is required flag, which contains 1 to indicate that the region type must be supported |
Region type identifiers
| Value | Identifier | Description |
|---|---|---|
| 2dc27766-f623-4200-9d64-115e9bfd4a08 | Block allocation table (BAT) region | |
| 8b7ca206-4790-4b9a-b8fe-575f050f886e | Metadata region |
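The sketch below reads the region table entries. It assumes little-endian integers and assumes GUIDs are stored in the Microsoft mixed-endian layout that Python's `uuid.UUID(bytes_le=...)` expects. The checksum is not verified here, and the function name is hypothetical.

```python
import struct
import uuid

REGION_TYPES = {
    uuid.UUID("2dc27766-f623-4200-9d64-115e9bfd4a08"): "Block allocation table (BAT) region",
    uuid.UUID("8b7ca206-4790-4b9a-b8fe-575f050f886e"): "Metadata region",
}


def parse_region_table(data: bytes) -> list:
    signature, _checksum, number_of_entries, _reserved = struct.unpack_from("<4sIII", data, 0)
    if signature != b"regi":
        raise ValueError("missing regi signature")
    if number_of_entries > 2047:
        raise ValueError("too many region table entries")
    regions = []
    for index in range(number_of_entries):
        entry_offset = 16 + index * 32
        type_identifier = uuid.UUID(bytes_le=data[entry_offset:entry_offset + 16])
        offset, size, flags = struct.unpack_from("<QII", data, entry_offset + 16)
        regions.append({
            "type": REGION_TYPES.get(type_identifier, str(type_identifier)),
            "offset": offset,
            "size": size,
            "is_required": bool(flags & 1),
        })
    return regions
```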
Metadata region
The metadata region contains:
- metadata table
- metadata items
Metadata table
The metadata table is stored in a block of 64 KiB (65536 bytes) and consists of:
- metadata table header
- 0 or more metadata table entries
- Unknown (reserved)
TODO: determine if 0 entries is actually supported
Metadata table header
The metadata table header is 32 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | "metadata" | Signature |
| 8 | 2 | 0 | Unknown (reserved), which according to MS-VHDX must be set to 0 |
| 10 | 2 | | Number of table entries, which according to MS-VHDX must be less than or equal to 2047 |
| 12 | 20 | 0 | Unknown (reserved), which according to MS-VHDX must be set to 0 |
Metadata table entry
The metadata table entry is 32 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 16 | | Metadata item identifier, which contains a GUID |
| 16 | 4 | | Metadata item offset, which contains an offset relative to the start of the metadata region. According to MS-VHDX this value must be greater than 64 KiB |
| 20 | 4 | | Metadata item size |
| 24 | 8 | | Unknown |
TODO: describe last 8 bytes
| Value | Identifier | Description |
|---|---|---|
| 0x00000001 | IsUser | |
| 0x00000002 | IsVirtualDisk | |
| 0x00000004 | IsRequired |
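The sketch below reads the metadata table entries. It assumes, without confirmation from the table above, that the listed flag values are stored in the first 4 of the 8 unknown bytes at offset 24 of each entry. The function and constant names are hypothetical.

```python
import struct
import uuid

IS_USER = 0x00000001
IS_VIRTUAL_DISK = 0x00000002
IS_REQUIRED = 0x00000004


def parse_metadata_table(data: bytes) -> list:
    if data[0:8] != b"metadata":
        raise ValueError("missing metadata signature")
    (number_of_entries,) = struct.unpack_from("<H", data, 10)
    entries = []
    for index in range(number_of_entries):
        # Entries follow the 32-byte metadata table header.
        entry_offset = 32 + index * 32
        identifier = uuid.UUID(bytes_le=data[entry_offset:entry_offset + 16])
        # Item offset, item size and (assumed) flags.
        item_offset, item_size, flags = struct.unpack_from("<III", data, entry_offset + 16)
        entries.append((identifier, item_offset, item_size, flags))
    return entries
```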
Metadata items
Metadata item identifiers
| Value | Identifier | Description |
|---|---|---|
| 2fa54224-cd1b-4876-b211-5dbed83bf4b8 | Virtual disk size | |
| 8141bf1d-a96f-4709-ba47-f233a8faab5f | Logical sector size | |
| a8d35f2d-b30b-454d-abf7-d3d84834ab0c | Parent locator | |
| beca12ab-b2e6-4523-93ef-c309e000c746 | Virtual disk identifier | |
| caa16737-fa36-4d43-b3b6-33f0aa44e76b | File parameters | |
| cda348c7-445d-4471-9cc9-e9885251c556 | Physical sector size |
File parameters metadata item
The file parameters metadata item is 8 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Block size, which according to MS-VHDX must be a power of 2, greater than or equal to 1 MiB and not greater than 256 MiB |
| 4.0 | 1 bit | | Blocks remain allocated flag, which is used to indicate the file is a fixed-size image |
| 4.1 | 1 bit | | Has parent flag, which indicates if the VHDX file contains a differential image that has a parent image |
| 4.2 | 30 bits | 0 | Unknown (reserved), which according to MS-VHDX must be set to 0 |
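Decoding the file parameters metadata item can be sketched as follows. The sketch assumes little-endian integers, with the flags in bits 0 and 1 of the second 32-bit value. It also checks the block size constraints from MS-VHDX given above. The function name is hypothetical.

```python
import struct


def parse_file_parameters(data: bytes) -> dict:
    block_size, flags = struct.unpack_from("<II", data, 0)
    # Power of 2 between 1 MiB and 256 MiB (inclusive).
    if block_size & (block_size - 1) or not 1024 * 1024 <= block_size <= 256 * 1024 * 1024:
        raise ValueError(f"unsupported block size: {block_size}")
    return {
        "block_size": block_size,
        "blocks_remain_allocated": bool(flags & 0x1),  # bit 0
        "has_parent": bool(flags & 0x2),  # bit 1
    }
```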
Logical sector size metadata item
The logical sector size metadata item is 4 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Logical sector size, which according to MS-VHDX must be either 512 or 4096 |
Parent locator metadata item
The parent locator metadata item is of variable size and consists of:
- parent locator header
- 0 or more parent locator entries
- parent locator key and value data
TODO: determine if 0 entries is actually supported
Parent locator header
The parent locator header is 20 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 16 | | Parent locator type indicator, which contains the GUID: b04aefb7-d19e-4a81-b789-25b8e9445913 |
| 16 | 2 | 0 | Unknown (reserved), which according to MS-VHDX must be set to 0 |
| 18 | 2 | | Number of entries (or key-value pairs) |
Parent locator entry
The parent locator entry is 12 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Key data offset, which contains the offset relative to the start of the parent locator header |
| 4 | 4 | | Value data offset, which contains the offset relative to the start of the parent locator header |
| 8 | 2 | | Key data size |
| 10 | 2 | | Value data size |
Parent locator key and value data
A parent locator key or value is stored as a UCS-2 little-endian string without an end-of-string character.
Known keys are:
| Value | Description |
|---|---|
| absolute_win32_path | The value contains an absolute drive Windows path, such as "\\?\c:\file.vhdx" |
| parent_linkage | The value contains a string of a GUID. This GUID should correspond to the data write identifier of the parent image |
| parent_linkage2 | The value contains a string of a GUID |
| relative_path | The value contains a relative Windows path, such as "..\file.vhdx" |
| volume_path | The value contains an absolute volume Windows path, such as "\\?\Volume{%GUID%}\file.vhdx" |
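The sketch below collects the parent locator key-value pairs. It assumes little-endian integers and a GUID stored in the mixed-endian layout read by `uuid.UUID(bytes_le=...)`. Key and value offsets are taken as relative to the start of the parent locator header, as described above. The function name is hypothetical.

```python
import struct
import uuid

VHDX_PARENT_LOCATOR_TYPE = uuid.UUID("b04aefb7-d19e-4a81-b789-25b8e9445913")


def parse_parent_locator(data: bytes) -> dict:
    locator_type = uuid.UUID(bytes_le=data[0:16])
    if locator_type != VHDX_PARENT_LOCATOR_TYPE:
        raise ValueError(f"unsupported parent locator type: {locator_type}")
    (number_of_entries,) = struct.unpack_from("<H", data, 18)
    pairs = {}
    for index in range(number_of_entries):
        # 12-byte entries follow the 20-byte parent locator header.
        key_offset, value_offset, key_size, value_size = struct.unpack_from(
            "<IIHH", data, 20 + index * 12)
        # UCS-2 little-endian strings without an end-of-string character.
        key = data[key_offset:key_offset + key_size].decode("utf-16-le", errors="surrogatepass")
        value = data[value_offset:value_offset + value_size].decode(
            "utf-16-le", errors="surrogatepass")
        pairs[key] = value
    return pairs
```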
Physical sector size metadata item
The physical sector size metadata item is 4 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Physical sector size, which according to MS-VHDX must be either 512 or 4096 |
Virtual disk identifier metadata item
The virtual disk identifier metadata item is 16 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 16 | | Virtual disk identifier, which contains a GUID |
Note that in contrast to VHD (version 1) the virtual disk identifier does not change between a differential image and its parent. The data write identifier seems to be used instead.
Virtual disk size metadata item
The virtual disk size metadata item is 8 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | | Virtual disk size |
Block allocation table (BAT) region
The block allocation table (BAT) region contains the block allocation table. The entries of this table describe the location of either blocks containing image content data (or payload blocks) or blocks containing a sector bitmap.
The size of an individual sector bitmap block is 1 MiB, which allows for 2^23 sectors to be represented by the bitmap.
Block allocation table (BAT) entries are grouped in chunks. The number of entries per chunk can be calculated as follows:
number_of_entries_per_chunk = (2^23 * logical_sector_size) / block_size
The block allocation table (BAT) consists of:
- one or more chunks containing:
  - number of entries per chunk x BAT entry describing image content data
  - 1 x BAT entry describing a sector bitmap
Unused BAT entries are filled with 0-byte values.
The block allocation table (BAT) of:
- a fixed-size image does not contain sector bitmap entries;
- a dynamic-size image does contain sector bitmap entries, although according to MS-VHDX they are not used;
- a differential image does contain sector bitmap entries.
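The chunk layout above can be turned into BAT index calculations. This sketch assumes the payload and sector bitmap entries are interleaved exactly as described: "entries per chunk" payload entries, followed by 1 sector bitmap entry. It applies to images whose BAT contains sector bitmap entries. The function names are hypothetical.

```python
def chunk_ratio(logical_sector_size: int, block_size: int) -> int:
    # Number of payload BAT entries per chunk: (2^23 * logical sector size) / block size.
    return (2**23 * logical_sector_size) // block_size


def payload_bat_index(block_number: int, ratio: int) -> int:
    # Skip one sector bitmap entry for every completed chunk.
    return block_number + block_number // ratio


def sector_bitmap_bat_index(chunk_number: int, ratio: int) -> int:
    # The sector bitmap entry is the last entry of its chunk.
    return chunk_number * (ratio + 1) + ratio
```

For example, with 512-byte logical sectors and a 32 MiB block size, each chunk holds 128 payload entries. Payload block 128 is therefore stored at BAT index 129, after the first sector bitmap entry at index 128.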
Block allocation table (BAT) entry
The block allocation table (BAT) entry is 64 bits in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0.0 | 3 bits | | Block state |
| 0.3 | 17 bits | 0 | Unknown (reserved), which according to MS-VHDX must be set to 0 |
| 2.4 | 44 bits | | Block offset, which contains the offset relative to the start of the file as a multiple of 1 MiB |
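Decoding a 64-bit BAT entry according to the bit layout above can be sketched as follows. The sketch assumes the entries are stored little-endian, and the function names are hypothetical.

```python
import struct

ONE_MIB = 1024 * 1024


def decode_bat_entry(entry: int) -> tuple:
    block_state = entry & 0x7  # bits 0 to 2
    block_offset = (entry >> 20) * ONE_MIB  # bits 20 to 63, in units of 1 MiB
    return block_state, block_offset


def read_bat_entries(data: bytes) -> list:
    # 64-bit little-endian entries; len(data) must be a multiple of 8.
    return [decode_bat_entry(value) for (value,) in struct.iter_unpack("<Q", data)]
```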
Block states
Payload block states
| Value | Identifier | Description |
|---|---|---|
| 0 | PAYLOAD_BLOCK_NOT_PRESENT | Block is new and therefore not (yet) stored in the file |
| 1 | PAYLOAD_BLOCK_UNDEFINED | Block is not stored in the file |
| 2 | PAYLOAD_BLOCK_ZERO | Block is sparse and therefore filled with 0-byte values |
| 3 | PAYLOAD_BLOCK_UNMAPPED | Block has been unmapped |
| 6 | PAYLOAD_BLOCK_FULLY_PRESENT | Block is stored in the file |
| 7 | PAYLOAD_BLOCK_PARTIALLY_PRESENT | Block is partially stored in the file; the sector bitmap indicates which sectors are stored in the file and which in the parent |
Sector bitmap block states
| Value | Identifier | Description |
|---|---|---|
| 0 | SB_BLOCK_NOT_PRESENT | Block is new and therefore not (yet) stored in the file |
| 6 | SB_BLOCK_PRESENT | Block is stored in the file |
Sector bitmap
In differential disk images the sector bitmap indicates which sectors are stored within the image (bit set to 1) or in the parent (bit set to 0).
The bitmap is stored in a 1 MiB block.
The bitmap is stored on a per-byte basis, where the least significant bit (LSB) of each byte represents the first bit of that byte.
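The sketch below locates a sector in the VHDX sector bitmaps. Note the LSB-first bit order, in contrast to the MSB-first order of VHD (version 1). It assumes each chunk is described by one 1 MiB sector bitmap covering 2^23 logical sectors, and the names are hypothetical.

```python
SECTORS_PER_SECTOR_BITMAP = 2**23  # 1 MiB bitmap, 1 bit per sector


def sector_bitmap_position(sector_number: int) -> tuple:
    # Returns (chunk number, sector index within that chunk's sector bitmap).
    return divmod(sector_number, SECTORS_PER_SECTOR_BITMAP)


def vhdx_sector_is_present(bitmap: bytes, sector_index: int) -> bool:
    # LSB-first: bit 0 of byte 0 describes the first sector.
    return bool(bitmap[sector_index // 8] & (1 << (sector_index % 8)))
```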
Log (metadata journal)
TODO: complete section
The log serves as a metadata journal. It is of variable size and consists of a contiguous circular (ring) buffer that contains log entries.
Log entry
TODO: complete section
4 KiB (4096 bytes) in size
Log entry header
TODO: complete section
Zero descriptor
TODO: complete section
Data descriptor
TODO: complete section
Data sector
TODO: complete section
References
- MS-VHDX: Virtual Hard Disk v2 (VHDX) File Format, by Microsoft
VMware Virtual Disk (VMDK) format
The VMware Virtual Disk (VMDK) format is used by VMware virtualization products as one of its image formats.
Overview
A VMDK disk image can consist of multiple files, such as:
- descriptor file
- extent data files
  - raw extent data file
  - VMDK sparse extent data file
  - COWD sparse extent data file
Characteristics
| Characteristics | Description |
|---|---|
| Byte order | little-endian |
| Date and time values | |
| Character strings | narrow character (Single Byte Character (SBC) or Multi Byte Character (MBC)) stored using a codepage defined in the descriptor file |
The number of bytes per sector is 512.
Disk types
There are multiple types of VMDK images, namely:
The 2GbMaxExtentFlat (or twoGbMaxExtentFlat) disk image, which consists of:
- a descriptor file (<name>.vmdk)
- raw data extent files (<name>-f###.vmdk), where ### contains a decimal value starting with 1.
The 2GbMaxExtentSparse (or twoGbMaxExtentSparse) disk image, which consists of:
- a descriptor file (<name>.vmdk)
- VMDK sparse data extent files (<name>-s###.vmdk), where ### contains a decimal value starting with 1.
The monolithicFlat disk image, which consists of:
- a descriptor file (<name>.vmdk)
- raw data extent file (<name>-f001.vmdk)
The monolithicSparse disk image, which consists of:
- a VMDK sparse data extent file (<name>.vmdk), which also contains the descriptor file data.
The vmfs disk image, which consists of:
- a descriptor file (<name>.vmdk)
- raw data extent file (<name>-flat.vmdk)
The vmfsSparse differential disk image, which consists of:
- a descriptor file (<name>.vmdk)
- COWD sparse data extent files (<name>-delta.vmdk)
TODO: describe more disk types
Delta links
A delta link is similar to a differential image, where the image contains the changes (or delta) compared to a parent image. According to the Virtual Disk Format 5.0 specification, one delta image can chain to another delta image.
TODO: Name <name>-delta.vmdk
Descriptor file
The descriptor file is a case-insensitive text based file that contains the following information:
- optional comment and empty lines
- header
- extent descriptions
- optional change tracking file
- disk data base (DDB)
Note that the descriptor file can contain leading and trailing whitespace, as well as leading comment lines (starting with #) and empty lines. Lines are separated by a line feed character (0x0a).
Header
The header of a descriptor file looks similar to the data below.
# Disk DescriptorFile
version=1
CID=12345678
parentCID=ffffffff
createType="twoGbMaxExtentSparse"
The header consists of the following values:
| Value | Description |
|---|---|
| "# Disk DescriptorFile" | Section header (or file signature) |
| version | Format version |
| encoding | Encoding |
| CID | Content identifier, which contains a random 32-bit value updated the first time the content of the virtual disk is modified after the virtual disk is opened |
| parentCID | The content identifier of the parent, which contains a 32-bit value identifying the parent content, where a value of 'ffffffff' (-1) represents no parent content |
| isNativeSnapshot | TODO: add description. A value of "no" has been observed in a VMWare Player 9 descriptor file |
| createType | Disk type |
| parentFileNameHint | Contains the path to the parent image, which is only present if the image is a differential image (delta link) |
TODO: confirm if a content identifier of ‘fffffffe’ (-2) represents that the long content identifier should be used
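The sketch below reads the key-value lines of an already decoded descriptor file. It lower-cases keys because the descriptor file is case-insensitive, skips comment and empty lines, and treats the content identifiers as hexadecimal. The function names are hypothetical, and values are assumed not to span multiple lines.

```python
def parse_descriptor_values(text: str) -> dict:
    """Collect key=value lines; keys are lower-cased since the descriptor is case-insensitive."""
    values = {}
    for line in text.split("\n"):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # empty and comment lines
        key, separator, value = line.partition("=")
        if separator:
            values[key.strip().lower()] = value.strip().strip('"')
    return values


def has_parent(values: dict) -> bool:
    # A parentCID of ffffffff (-1) represents no parent content.
    return int(values.get("parentcid", "ffffffff"), 16) != 0xFFFFFFFF
```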
Format versions
| Value | Description |
|---|---|
| 1 | TODO: add description |
| 2 | TODO: add description |
| 3 | TODO: add description |
Encodings
Note that it is currently unknown which encodings are supported. It is assumed that at least the Windows codepages are supported and that the default is UTF-8.
| Value | Description |
|---|---|
| Big5 | Big5, which is assumed to be equivalent to Windows codepage 950 |
| GBK | GBK, which is assumed to be equivalent to Windows codepage 936 and was observed in VMWare Workstation for Windows, Chinese edition |
| Shift_JIS | Shift_JIS, which is assumed to be equivalent to Windows codepage 932 and was observed in VMWare Workstation for Windows, Japanese edition |
| UTF-8 | UTF-8 |
| windows-949-2000 | Windows codepage 949, 2000 version |
| windows-1252 | Windows codepage 1252, which was observed in VMWare Player 9 descriptor file |
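The sketch below maps the encoding value to a Python codec, following the equivalences assumed in the table above. The mapping itself is an assumption, in particular `windows-949-2000` to `cp949`. Since the header keywords are plain ASCII, one approach is to read the encoding value first and then decode the full descriptor. The names are hypothetical.

```python
# Hypothetical mapping from descriptor "encoding" values to Python codec names.
DESCRIPTOR_ENCODINGS = {
    "big5": "cp950",
    "gbk": "cp936",
    "shift_jis": "cp932",
    "utf-8": "utf-8",
    "windows-949-2000": "cp949",
    "windows-1252": "cp1252",
}


def decode_descriptor(data: bytes, encoding: str = "UTF-8") -> str:
    codec = DESCRIPTOR_ENCODINGS.get(encoding.lower())
    if codec is None:
        raise ValueError(f"unsupported descriptor encoding: {encoding}")
    return data.decode(codec)
```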
Disk types
| Value | Description |
|---|---|
| 2GbMaxExtentFlat, twoGbMaxExtentFlat | The disk is split into fixed-size extents of maximum 2 GB, which consists of raw extent data files |
| 2GbMaxExtentSparse, twoGbMaxExtentSparse | The disk is split into sparse (dynamic-size) extents of maximum 2 GB, which consists of VMDK sparse extent data files |
| custom | TODO: add description. Descriptor file with arbitrary extents, used to mount v2i-format |
| fullDevice | The disk uses a full physical disk device |
| monolithicFlat | The disk is a single raw extent data file |
| monolithicSparse | The disk is a single VMDK sparse extent data file |
| partitionedDevice | The disk uses a full physical disk device, using access per partition |
| streamOptimized | The disk is a single compressed VMDK sparse extent data file |
| vmfs | The disk is a single raw extent data file, which is similar to the "monolithicFlat" |
| vmfsEagerZeroedThick | The disk is a single raw extent data file |
| vmfsPreallocated | The disk is a single raw extent data file |
| vmfsRaw | The disk uses a full physical disk device |
| vmfsRDM, vmfsRawDeviceMap | The disk uses a full physical disk device, which is also referred to as Raw Device Map (RDM) |
| vmfsRDMP, vmfsPassthroughRawDeviceMap | The disk uses a full physical disk device, which is similar to the Raw Device Map (RDM), but sends SCSI commands to underlying hardware |
| vmfsSparse | The disk is split into COWD sparse (dynamic-size) extents |
| vmfsThin | The disk is split into COWD sparse (dynamic-size) extents |
Extent descriptions
The extent descriptions of a descriptor file look similar to the data below.
# Extent description
RW 4192256 SPARSE "test-s001.vmdk"
# Extent description
RW 1048576 FLAT "test-f001.vmdk" 0
The extent descriptions consist of the following values:
| Value | Description |
|---|---|
| "# Extent description" | Section header |
| Extent descriptors |
Extent descriptor
The extent descriptor consists of the following values:
| Value | Description |
|---|---|
| 1st | Access mode |
| 2nd | The number of sectors |
| 3rd | Extent type |
| If extent type is not ZERO | |
| 4th | Path of the VMDK extent data file, relative to the location of the VMDK descriptor file |
| Optional | |
| 5th | The extent start sector |
| Seen in VMWare Player 9 in combination with a physical device extent on Windows | |
| 6th and 7th | "partitionUUID" followed by a device identifier |
The extent offset (the extent start sector in the table above) is specified only for flat extents and corresponds to the offset in the file or device where the extent data is located. For device-backed virtual disks (physical or raw disks) the extent offset can be non-zero. For raw extent data files the extent offset should be zero.
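The sketch below parses an extent descriptor line. It uses a regular expression rather than a shell-style splitter so that backslashes in Windows device paths are kept as-is, and it assumes the path never contains a double quote. The optional "partitionUUID" values are ignored, and the names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Access mode, number of sectors, extent type, then an optional quoted path and start sector.
_EXTENT_PATTERN = re.compile(r'^\s*(\S+)\s+(\d+)\s+(\S+)(?:\s+"([^"]*)"(?:\s+(\d+))?)?')


class ExtentDescriptor(NamedTuple):
    access_mode: str
    number_of_sectors: int
    extent_type: str
    path: Optional[str]
    start_sector: int


def parse_extent_descriptor(line: str) -> ExtentDescriptor:
    match = _EXTENT_PATTERN.match(line)
    if not match:
        raise ValueError(f"invalid extent descriptor: {line!r}")
    access_mode, number_of_sectors, extent_type, path, start_sector = match.groups()
    return ExtentDescriptor(access_mode.upper(), int(number_of_sectors), extent_type.upper(),
                            path, int(start_sector) if start_sector else 0)
```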
Extent access mode
The extent access mode consists of the following values:
| Value | Description |
|---|---|
| NOACCESS | No access |
| RDONLY | Read only |
| RW | Read write |
Extent types
The extent type consists of the following values:
| Value | Description |
|---|---|
| FLAT | raw extent data file |
| SPARSE | VMDK sparse extent data file |
| ZERO | Sparse extent that consists of 0-byte values |
| VMFS | raw extent data file |
| VMFSSPARSE | COWD sparse extent data file |
| VMFSRDM | Unknown (Physical disk device that uses RDM?) |
| VMFSRAW | Unknown (Physical disk device?) |
Note that VMWare Player 9 has been observed to use “FLAT” for Windows devices
Change tracking file section
The change tracking file section was introduced in version 3 and looks similar to:
# Change Tracking File
changeTrackPath="test-flat.vmdk"
The change tracking file section consists of the following values:
| Value | Description |
|---|---|
| "# Change Tracking File" | Section header |
| changeTrackPath | Unknown (The path to the change tracking file?) |
Disk database
The disk data base of a descriptor file looks similar to the data below.
# The Disk Data Base
#DDB
ddb.virtualHWVersion = "4"
ddb.geometry.cylinders = "16383"
ddb.geometry.heads = "16"
ddb.geometry.sectors = "63"
ddb.adapterType = "ide"
ddb.toolsVersion = "0"
The disk data base consists of the following values:
| Value | Description |
|---|---|
| "# The Disk Data Base" | Section header |
| "#DDB" | Currently assumed to be part of the section header |
| ddb.deletable | Unknown (seen: "true") |
| ddb.virtualHWVersion | The virtual hardware version. For VMWare Player and Workstation this seems to correspond with the application version |
| ddb.longContentID | The long content identifier, which contains a 128-bit base16 encoded value, without spaces |
| ddb.uuid | UUID, which contains a 128-bit base16 encoded value, with spaces between bytes |
| ddb.geometry.cylinders | The number of cylinders |
| ddb.geometry.heads | The number of heads |
| ddb.geometry.sectors | The number of sectors |
| ddb.geometry.biosCylinders | The number of cylinders as reported by the BIOS |
| ddb.geometry.biosHeads | The number of heads as reported by the BIOS |
| ddb.geometry.biosSectors | The number of sectors as reported by the BIOS |
| ddb.adapterType | Disk adapter type |
| ddb.toolsVersion | String containing the version of the installed VMWare tools |
| ddb.thinProvisioned | Unknown (seen: "1") |
VirtualBox has been observed to use a different case for “disk” in the section header:
# The disk Data Base
Virtual hardware version
| Value | Description |
|---|---|
| 4 | TODO: add description |
| 6 | TODO: add description |
| 7 | TODO: add description |
| 9 | VMWare Player/Workstation 9.0 |
Disk adapter types
| Value | Description |
|---|---|
| ide | TODO: add description |
| buslogic | TODO: add description |
| lsilogic | TODO: add description |
| legacyESX | TODO: add description |
The buslogic and lsilogic values are for SCSI disks and show which virtual SCSI adapter is configured for the virtual machine. The legacyESX value is for older ESX Server virtual machines when the adapter type used in creating the virtual machine is not known.
The raw extent data file
The raw extent data file contains the actual disk data. The raw extent data file can be a file or a device.
This type of extent data file is also known as “Simple” or “Flat Extent”.
The VMDK sparse extent data file
The VMDK sparse extent data file contains the actual disk data. A VMDK sparse extent data file consists of:
- file header
- optional embedded descriptor file
- optional secondary grain directory
- optional secondary grain tables
- (primary) grain directory
- (primary) grain tables
- grains
- optional backup file header
This type of extent data file is also known as “Hosted Sparse Extent” or “Stream-Optimized Compressed Sparse Extent” when markers are used.
Note that the actual layout can vary per file; Stream-Optimized Compressed Sparse Extents have been observed to use secondary file headers.
Changes in format version 2:
- added encrypted disk support (though this feature never seems to have been implemented).
Changes in format version 3:
- the size of extent files is no longer limited to 2 GiB;
- added support for persistent changed block tracking (CBT).
Note that for CBT, the changeTrackPath value in the descriptor file references a file that describes changed areas on the virtual disk.
File header
The file header is 512 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | "KDMV" | Signature |
| 4 | 4 | 1, 2 or 3 | Format version |
| 8 | 4 | | Flags |
| 12 | 8 | | Maximum data number of sectors (capacity) |
| 20 | 8 | | Sectors per grain, which must be a power of 2 and greater than 8 |
| 28 | 8 | | Embedded descriptor file start sector, which is relative to the start of the file or 0 if not set |
| 36 | 8 | | Embedded descriptor file size in sectors |
| 44 | 4 | 512 | The number of grain table entries |
| 48 | 8 | | Secondary grain directory start sector, which is relative to the start of the file or 0 if not set |
| 56 | 8 | | Primary grain directory start sector, which is relative to the start of the file, 0 if not set or 0xffffffffffffffff (GD_AT_END) if relative to the end of the file |
| 64 | 8 | | Metadata size in sectors |
| 72 | 1 | | Value to determine if the extent data file was cleanly closed (or dirty flag) |
| 73 | 1 | '\n' | Single end of line character |
| 74 | 1 | ' ' | Non end of line character |
| 75 | 1 | '\r' | First double end of line character |
| 76 | 1 | '\n' | Second double end of line character |
| 77 | 2 | | Compression method |
| 79 | 433 | 0 | Unknown (Padding) |
The end of line characters are used to detect corruption due to file transfers that alter line end characters.
According to the Virtual Disk Format 5.0 specification, the maximum data number of sectors (capacity) should be a multiple of the sectors per grain. Note that it has been observed that this is not always the case.
If the primary grain directory start sector is 0xffffffffffffffff (GD_AT_END) in a Stream-Optimized Compressed Sparse Extent there should be a secondary file header stored at offset -1024 relative from the end of the file (stream) that contains the correct grain directory start sector.
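The sketch below reads the sparse extent file header using the offsets in the table above. It assumes little-endian integers and that a non-zero byte at offset 72 means the file was not cleanly closed. The end of line check is applied only when the "valid new line detection test" flag is set. The function and key names are hypothetical.

```python
import struct

GD_AT_END = 0xFFFFFFFFFFFFFFFF
FLAG_VALID_NEW_LINE_DETECTION = 0x00000001
FLAG_HAS_COMPRESSED_GRAIN_DATA = 0x00010000
FLAG_HAS_MARKERS = 0x00020000


def parse_sparse_file_header(data: bytes) -> dict:
    (signature, version, flags, capacity, sectors_per_grain, descriptor_start_sector,
     descriptor_number_of_sectors, number_of_grain_table_entries,
     secondary_grain_directory_start_sector, primary_grain_directory_start_sector,
     metadata_number_of_sectors, is_dirty) = struct.unpack_from("<4sIIQQQQIQQQB", data, 0)
    if signature != b"KDMV":
        raise ValueError("missing KDMV signature")
    # Detect files whose end of line characters were altered by a file transfer.
    if flags & FLAG_VALID_NEW_LINE_DETECTION and data[73:77] != b"\n \r\n":
        raise ValueError("end of line characters have been altered")
    (compression_method,) = struct.unpack_from("<H", data, 77)
    return {
        "version": version,
        "capacity": capacity,
        "sectors_per_grain": sectors_per_grain,
        "descriptor_start_sector": descriptor_start_sector,
        "descriptor_number_of_sectors": descriptor_number_of_sectors,
        "number_of_grain_table_entries": number_of_grain_table_entries,
        "secondary_grain_directory_start_sector": secondary_grain_directory_start_sector,
        "grain_directory_start_sector": primary_grain_directory_start_sector,
        # GD_AT_END: read the secondary file header 1024 bytes before the end of the file.
        "grain_directory_at_end": primary_grain_directory_start_sector == GD_AT_END,
        "metadata_number_of_sectors": metadata_number_of_sectors,
        "is_dirty": bool(is_dirty),
        "has_compressed_grain_data": bool(flags & FLAG_HAS_COMPRESSED_GRAIN_DATA),
        "has_markers": bool(flags & FLAG_HAS_MARKERS),
        "compression_method": compression_method,
    }
```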
Flags
The flags consist of the following values:
| Value | Identifier | Description |
|---|---|---|
| 0x00000001 | Valid new line detection test | |
| 0x00000002 | Use secondary grain directory. The secondary (redundant) grain directory should be used instead of the primary grain directory | |
| As of format version 2 | ||
| 0x00000004 | Use zeroed-grain table entry. The zeroed-grain table entry overloads grain data sector number 1 to indicate the grain is sparse | |
| Common | ||
| 0x00010000 | Has compressed grain data | |
| 0x00020000 | Contains metadata, where the file contains markers to identify metadata or data blocks | |
Compression method
The compression method consists of the following values:
| Value | Identifier | Description |
|---|---|---|
| 0x00000000 | COMPRESSION_NONE | No compression |
| 0x00000001 | COMPRESSION_DEFLATE | Compression using Deflate (RFC1951) |
Markers
The markers are used in Stream-Optimized Compressed Sparse Extents. The corresponding flag must be set for markers to be present. An example of the layout of a Stream-Optimized Compressed Sparse Extent that uses markers is:
- file header
- embedded descriptor
- compressed grain markers
- grain table marker
- grain table
- grain directory marker
- grain directory
- footer marker
- secondary file header
- end-of-stream marker
The marker
The marker is 512 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | | Value |
| 8 | 4 | | Marker data size |
| If marker data size equals 0 | |||
| 12 | 4 | | Marker type |
| 16 | 496 | 0 | Unknown (Padding) |
| If marker data size > 0 | |||
| 12 | ... | | Compressed grain data |
If the marker data size > 0 the marker is a compressed grain marker.
Marker types
| Value | Identifier | Description |
|---|---|---|
| 0x00000000 | MARKER_EOS | End-of-stream marker |
| 0x00000001 | MARKER_GT | Grain table (metadata) marker |
| 0x00000002 | MARKER_GD | Grain directory (metadata) marker |
| 0x00000003 | MARKER_FOOTER | Footer (metadata) marker |
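The sketch below walks a single marker in a Stream-Optimized Compressed Sparse Extent. The Deflate framing is an assumption. The sketch first tries zlib (RFC 1950) framing and falls back to raw Deflate (RFC 1951). It also assumes compressed grain markers are padded to a 512-byte boundary. The function name and the returned tuple layout are hypothetical.

```python
import struct
import zlib

MARKER_TYPES = {0: "MARKER_EOS", 1: "MARKER_GT", 2: "MARKER_GD", 3: "MARKER_FOOTER"}


def parse_marker(data: bytes, offset: int) -> tuple:
    """Return (kind, value, grain data or None, offset after the marker)."""
    value, marker_data_size = struct.unpack_from("<QI", data, offset)
    if marker_data_size == 0:
        (marker_type,) = struct.unpack_from("<I", data, offset + 12)
        kind = MARKER_TYPES.get(marker_type, f"UNKNOWN({marker_type})")
        # Metadata markers occupy one 512-byte sector; any metadata follows it.
        return kind, value, None, offset + 512
    compressed = data[offset + 12:offset + 12 + marker_data_size]
    try:
        grain = zlib.decompress(compressed)  # assumed: zlib (RFC 1950) framing
    except zlib.error:
        grain = zlib.decompress(compressed, -15)  # fallback: raw Deflate (RFC 1951)
    # Assumed: compressed grain markers are padded to a 512-byte boundary.
    next_offset = offset + (12 + marker_data_size + 511) // 512 * 512
    return "GRAIN", value, grain, next_offset
```

Here `value` is the logical sector number for a compressed grain marker.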
Compressed grain marker
The compressed grain marker indicates that compressed data follows.
| Offset | Size | Value | Description |
|---|---|---|---|
| Compressed grain header | |||
| 0 | 8 | | Logical sector number |
| 8 | 4 | | Compressed data size |
| 12 | ... | | Compressed data, which contains Deflate compressed data |
Note that the compressed grain data can be larger than the grain data size.
End of stream marker
The end-of-stream marker indicates the end of the virtual disk. Effectively the end-of-stream marker is a 512-byte block that contains only zero bytes.
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | 0 | Value |
| 8 | 4 | 0 | Marker data size |
| 12 | 4 | MARKER_EOS | Marker type |
| 16 | 496 | 0 | Unknown (Padding) |
Grain table marker
The grain table marker indicates that a grain table follows the marker sector block.
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | 0 | Value |
| 8 | 4 | 0 | Marker data size |
| 12 | 4 | MARKER_GT | Marker type |
| 16 | 496 | 0 | Unknown (Padding) |
| 512 | ... | | Grain table |
Grain directory marker
The grain directory marker indicates that a grain directory follows the marker sector block.
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | 0 | Value |
| 8 | 4 | 0 | Marker data size |
| 12 | 4 | MARKER_GD | Marker type |
| 16 | 496 | 0 | Unknown (Padding) |
| 512 | ... | | Grain directory |
Footer marker
The footer marker indicates that a footer follows the marker sector block.
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | 0 | Value |
| 8 | 4 | 0 | Marker data size |
| 12 | 4 | MARKER_FOOTER | Marker type |
| 16 | 496 | 0 | Unknown (Padding) |
| 512 | ... | | Footer |
Grain directory
The grain directory is also referred to as level 0 metadata.
The size of the grain directory is dependent on the number of grains in the extent data file. The number of entries in the grain directory can be determined as follows, where grain table size is the amount of data covered by a single grain table and / is an integer division:
grain table size = number of grain table entries * grain size
number of grain directory entries = maximum data size / grain table size
if maximum data size % grain table size > 0:
    number of grain directory entries += 1
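The same calculation as a Python sketch. `number_of_grain_directory_entries` is a hypothetical helper name, and the maximum data size and grain size are assumed to be expressed in the same unit, for example both in sectors.

```python
def number_of_grain_directory_entries(
    maximum_data_size: int, grain_size: int, number_of_grain_table_entries: int
) -> int:
    """Number of grain directory entries needed to cover the maximum data size.

    maximum_data_size and grain_size must use the same unit (e.g. sectors).
    """
    if grain_size <= 0 or number_of_grain_table_entries <= 0:
        raise ValueError("grain size and number of grain table entries must be positive")
    # Amount of data covered by a single grain table.
    grain_table_size = number_of_grain_table_entries * grain_size
    number_of_entries, remainder = divmod(maximum_data_size, grain_table_size)
    if remainder > 0:
        # A partially used last grain table still needs a grain directory entry.
        number_of_entries += 1
    return number_of_entries
```

For example, a maximum data size of 2097152 sectors with a grain size of 128 sectors and 512 grain table entries per grain table requires 32 grain directory entries.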
The grain directory consists of 32-bit grain table offsets:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Grain table start sector, which is relative to the start of the file or 0 if sparse or the sector is stored in the parent image |
The grain directory is stored in a whole number of 512-byte blocks.
Note that as of VMDK sparse extent data file version 2 if the “use zeroed-grain table entry” flag is set, a start sector of 1 indicates the grain table is sparse.
Grain table
The grain table is also referred to as level 1 metadata.
The grain table is of variable size. The number of entries in the grain table is stored in the file header. Note that the number of entries in the last grain table is dependent on the maximum data size and not necessarily the same as the value stored in the file header.
The grain table consists of 32-bit grain data sector numbers:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Grain data sector number, which is relative to the start of the file or 0 if sparse or the sector is stored in the parent image |
The number of entries in a grain table should be 512, therefore the size of a grain table is 512 x 4 = 2048 bytes.
The grain table is stored in a whole number of 512-byte blocks.
Note that as of VMDK sparse extent data file version 2, if the “use zeroed-grain table entry” flag is set, a grain data sector number of 1 indicates the grain is sparse.
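To read a sector, a reader first maps it to a grain directory entry, a grain table entry and a position within the grain, and then interprets the grain table entry. The following Python sketch assumes 512-byte sectors and a grain size and number of grain table entries taken from the file header; `locate_sector` and `grain_data_offset` are hypothetical names.

```python
from typing import NamedTuple, Optional

SECTOR_SIZE = 512  # assumed sector size in bytes


class GrainLocation(NamedTuple):
    grain_directory_index: int
    grain_table_index: int
    sector_in_grain: int


def locate_sector(sector: int, grain_size: int, number_of_grain_table_entries: int) -> GrainLocation:
    """Map a virtual sector number to its grain directory and grain table indexes.

    grain_size is in sectors; both values are expected to come from the file header.
    """
    grain_number, sector_in_grain = divmod(sector, grain_size)
    grain_directory_index, grain_table_index = divmod(grain_number, number_of_grain_table_entries)
    return GrainLocation(grain_directory_index, grain_table_index, sector_in_grain)


def grain_data_offset(grain_table_entry: int, use_zeroed_grain_table_entry: bool) -> Optional[int]:
    """Return the byte offset of the grain data, or None if the grain is sparse."""
    if grain_table_entry == 0:
        return None  # sparse, or stored in the parent image
    if grain_table_entry == 1 and use_zeroed_grain_table_entry:
        return None  # zeroed-grain table entry (format version 2 and later, flag 0x00000004)
    # Grain data sector numbers are relative to the start of the file.
    return grain_table_entry * SECTOR_SIZE
```

In an uncompressed extent the requested sector would then be at `grain_data_offset(...) + sector_in_grain * 512`; in a compressed extent the offset points to compressed grain data instead.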
Grain data
In an uncompressed sparse extent data file the data is stored at the grain data sector number.
In a compressed sparse extent data file every non-sparse grain is assumed to be stored compressed.
Compressed grain data
The compressed grain data is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| Compressed grain header | |||
| 0 | 8 | 0 | Logical sector number |
| 8 | 4 | | Compressed data size |
| 12 | ... | | Compressed data, which contains zlib compressed data |
| ... | ... | | Unknown (Padding) |
The uncompressed data size should be the grain size or less for the last grain.
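A minimal Python sketch of reading compressed grain data, assuming little-endian header values and zlib (RFC 1950) wrapped data as stated above. If a sample turns out to contain raw Deflate (RFC 1951) data instead, `zlib.decompress(..., wbits=-15)` would be needed. `read_compressed_grain` is a hypothetical name.

```python
import struct
import zlib
from typing import Tuple

COMPRESSED_GRAIN_HEADER_SIZE = 12  # 8-byte logical sector number + 4-byte data size


def read_compressed_grain(data: bytes, grain_size_in_bytes: int) -> Tuple[int, bytes]:
    """Return (logical sector number, uncompressed grain data)."""
    if len(data) < COMPRESSED_GRAIN_HEADER_SIZE:
        raise ValueError("truncated compressed grain header")
    logical_sector, compressed_data_size = struct.unpack_from("<QI", data, 0)
    end_offset = COMPRESSED_GRAIN_HEADER_SIZE + compressed_data_size
    if end_offset > len(data):
        raise ValueError("compressed data extends beyond the available data")
    # Any padding after end_offset is ignored.
    grain_data = zlib.decompress(data[COMPRESSED_GRAIN_HEADER_SIZE:end_offset])
    if len(grain_data) > grain_size_in_bytes:
        raise ValueError("uncompressed data is larger than the grain size")
    return logical_sector, grain_data
```

The size check reflects the note above: only the last grain is expected to be smaller than the grain size, and no grain should be larger.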
Footer
The footer is only used in Stream-Optimized Compressed Sparse Extents. The footer is the same as the file header. The footer should be the last block of the disk and immediately followed by the end-of-stream marker so that they together make up the last two sectors of the disk.
The header and footer differ in that the grain directory offset value in the header is set to 0xffffffffffffffff (GD_AT_END) and in the footer to the correct value.
Changed block tracking (CBT)
TODO: complete section
The COWD sparse extent data file
The copy-on-write disk (COWD) sparse extent data file contains the actual disk data. The COWD sparse extent data file consists of:
- file header
- grain directory
- grain tables
- grains
This type of extent data file is also known as ESX Server Sparse Extent.
File header
The file header is 2048 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | "COWD" | Signature |
| 4 | 4 | 1 | Format version |
| 8 | 4 | 0x00000003 | Unknown (Flags) |
| 12 | 4 | | Maximum data number of sectors (capacity) |
| 16 | 4 | | Sectors per grain |
| 20 | 4 | 4 | Grain directory start sector, which is relative from the start of the file or 0 if not set |
| 24 | 4 | | Number of grain directory entries |
| 28 | 4 | | The next free sector |
| In root extent data file | |||
| 32 | 4 | | The number of cylinders |
| 36 | 4 | | The number of heads |
| 40 | 4 | | The number of sectors |
| 44 | 1016 | | Unknown (Empty values) |
| In child extent data files | |||
| 32 | 1024 | | Parent file name |
| 1056 | 4 | | Parent generation |
| Common | |||
| 1060 | 4 | | Generation |
| 1064 | 60 | | Name |
| 1124 | 512 | | Description |
| 1636 | 4 | | Saved generation |
| 1640 | 8 | | Unknown (Reserved) |
| 1648 | 4 | | Value to determine if the extent data file was cleanly closed (or dirty flag) |
| 1652 | 396 | | Unknown (Padding) |
Note that the parent file name seems not to be set in recent delta sparse extent files.
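A Python sketch of reading the common COWD file header fields, assuming little-endian values (the byte order is not stated in this section). `CowdFileHeader` and `parse_cowd_file_header` are hypothetical names; the root and child specific fields are left out because this section does not describe how to tell the two apart.

```python
import struct
from typing import NamedTuple

COWD_FILE_HEADER_SIZE = 2048


class CowdFileHeader(NamedTuple):
    format_version: int
    flags: int
    maximum_number_of_sectors: int
    sectors_per_grain: int
    grain_directory_start_sector: int
    number_of_grain_directory_entries: int
    next_free_sector: int
    generation: int


def parse_cowd_file_header(data: bytes) -> CowdFileHeader:
    """Parse the fields at offsets 4 to 31 and the generation at offset 1060."""
    if len(data) < COWD_FILE_HEADER_SIZE:
        raise ValueError("COWD file header is 2048 bytes in size")
    if data[0:4] != b"COWD":
        raise ValueError("missing COWD signature")
    # Seven consecutive 32-bit values starting at offset 4.
    values = struct.unpack_from("<7I", data, 4)
    (generation,) = struct.unpack_from("<I", data, 1060)
    return CowdFileHeader(*values, generation)
```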
Grain directory
The grain directory is also referred to as level 0 metadata.
The size of the grain directory is dependent on the number of grains in the extent data file. The number of entries in the grain directory is stored in the file header.
The grain directory consists of 32-bit grain table offsets:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Grain table start sector, which is relative to the start of the file or 0 if not set |
The grain directory is stored in a whole number of 512-byte blocks. Unused bytes are set to 0.
Grain table
The grain table is also referred to as level 1 metadata.
The grain table has a fixed number of 4096 entries, hence a grain table is 4096 x 4 = 16384 bytes in size.
The grain table consists of 32-bit grain sector numbers:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Grain sector number, which is relative to the start of the file or 0 if not set |
The grain table is stored in a whole number of 512-byte blocks. Unused bytes are set to 0.
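Because a COWD grain table always has 4096 entries, only the sectors per grain value from the file header is needed to locate a sector. A Python sketch, assuming 512-byte sectors; `cowd_locate_sector` and `cowd_sector_offset` are hypothetical names, and how to read a grain that is not set (for example from a parent) is not covered.

```python
from typing import Optional, Tuple

COWD_NUMBER_OF_GRAIN_TABLE_ENTRIES = 4096  # fixed for COWD grain tables
SECTOR_SIZE = 512  # assumed sector size in bytes


def cowd_locate_sector(sector: int, sectors_per_grain: int) -> Tuple[int, int, int]:
    """Return (grain directory index, grain table index, sector within grain)."""
    grain_number, sector_in_grain = divmod(sector, sectors_per_grain)
    grain_directory_index, grain_table_index = divmod(grain_number, COWD_NUMBER_OF_GRAIN_TABLE_ENTRIES)
    return grain_directory_index, grain_table_index, sector_in_grain


def cowd_sector_offset(grain_sector_number: int, sector_in_grain: int) -> Optional[int]:
    """Return the byte offset of the sector in the file, or None if the grain is not set."""
    if grain_sector_number == 0:
        return None
    # Grain sector numbers are relative to the start of the file.
    return (grain_sector_number + sector_in_grain) * SECTOR_SIZE
```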
Change tracking file
TODO: complete section
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | "\xa2\x72\x19\xf6" | Unknown (signature?) |
| 4 | 4 | 1 | Unknown (version?) |
| 8 | 4 | | Unknown (empty values) |
| 12 | 4 | 0x200 | Unknown |
| 16 | 8 | | Unknown |
| 24 | 8 | | Unknown |
| 32 | 4 | | Unknown |
| 36 | 4 | | Unknown |
| 40 | 4 | | Unknown |
| 44 | 16 | | Unknown (UUID?) |
| 60 | ... | | Unknown (empty values?) |
Corruption scenarios
The total size specified by the number of grain table entries is larger than the size specified by the maximum number of sectors. This has been seen in VMDK images generated by qemu-img.
Notes
The markers can be used to scan for the individual parts of the VMDK sparse extent data file if the stream has been truncated, but note that this can be a very expensive process I/O-wise.
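A minimal Python sketch of such a scan over an in-memory buffer, checking each 512-byte boundary for a grain table, grain directory or footer marker. It assumes little-endian values and does not check the 64-bit value field. The end-of-stream marker is not reported, because it cannot be told apart from an empty sector, and compressed data can occasionally produce false positives. `scan_for_metadata_markers` is a hypothetical name; a real implementation would read the file in chunks.

```python
import struct
from typing import Iterator, Tuple

SECTOR_SIZE = 512
METADATA_MARKER_TYPES = {1: "MARKER_GT", 2: "MARKER_GD", 3: "MARKER_FOOTER"}


def scan_for_metadata_markers(data: bytes) -> Iterator[Tuple[int, str]]:
    """Yield (offset, marker type) of candidate metadata markers at sector boundaries."""
    for offset in range(0, len(data) - SECTOR_SIZE + 1, SECTOR_SIZE):
        _value, data_size, marker_type = struct.unpack_from("<QII", data, offset)
        # The value field is not checked; only the data size, type and padding are.
        if data_size != 0 or marker_type not in METADATA_MARKER_TYPES:
            continue
        if any(data[offset + 16 : offset + SECTOR_SIZE]):
            continue  # the padding of a marker is expected to be 0
        yield offset, METADATA_MARKER_TYPES[marker_type]
```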
References
- Virtual Disk Format 5.0, by VMWare
Volume system formats
A volume (or logical drive) is a single continuous accessible storage area, typically containing a file system. A volume system format is used to manage the storage of one or more volumes.
Although related, a volume is a different concept than a partition.
Formats
Apple Partition Map (APM) format
The Apple Partition Map (APM) format is used on Motorola based Macintosh computers. On Intel based Macintosh computers the GUID Partition Table (GPT) format is used.
Overview
An Apple Partition Map (APM) consists of:
- a drive descriptor
- a partition map entry of type “Apple_partition_map”
- zero or more partition map entries
Characteristics
| Characteristics | Description |
|---|---|
| Byte order | big-endian |
| Date and time values | N/A |
| Character strings | ASCII |
Terminology
| Term | Description |
|---|---|
| Physical block | A fixed location on the storage media defined by the storage media |
| Logical block | An abstract location on the storage media defined by software |
The drive descriptor
The driver descriptor identifies the device drivers installed on a storage medium. The driver descriptor can refer to multiple device drivers. Every device driver is stored in a separate partition.
The drive descriptor is situated in the first block of the storage medium. This block is referred to as the device driver block. The driver descriptor block is not considered part of any partition.
The drive descriptor is 512 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | "\x45\x52" or "ER" | Signature |
| 2 | 2 | | The block size of the device in bytes |
| 4 | 4 | | The number of blocks on the device |
| 8 | 2 | | Device type (Reserved) |
| 10 | 2 | | Device identifier (Reserved) |
| 12 | 4 | | Device data (Reserved) |
| 16 | 2 | | The number of driver descriptors |
| 18 | 8 | | The first device driver descriptor |
| 26 | 486 | | Additional driver descriptors, where unused entries are 16-bit integer values filled with 0 |
The device driver descriptor
The device driver descriptor is 8 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Start block of the device driver |
| 4 | 2 | | Device driver number of blocks |
| 6 | 2 | | Operating system type, where 1 represents "Mac OS" |
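A Python sketch of parsing the drive descriptor and its device driver descriptors, using the big-endian byte order listed in the characteristics. `parse_drive_descriptor` and the tuple types are hypothetical names; the loop stops if the stored count would run past the 512-byte descriptor.

```python
import struct
from typing import List, NamedTuple

DRIVE_DESCRIPTOR_SIZE = 512
DEVICE_DRIVER_DESCRIPTOR_SIZE = 8


class DeviceDriverDescriptor(NamedTuple):
    start_block: int
    number_of_blocks: int
    operating_system_type: int  # 1 represents "Mac OS"


class DriveDescriptor(NamedTuple):
    block_size: int
    number_of_blocks: int
    device_drivers: List[DeviceDriverDescriptor]


def parse_drive_descriptor(block: bytes) -> DriveDescriptor:
    """Parse an APM drive descriptor (big-endian)."""
    if len(block) < DRIVE_DESCRIPTOR_SIZE or block[0:2] != b"ER":
        raise ValueError("not an APM drive descriptor")
    block_size, number_of_blocks = struct.unpack_from(">HI", block, 2)
    (number_of_device_drivers,) = struct.unpack_from(">H", block, 16)
    device_drivers = []
    for index in range(number_of_device_drivers):
        offset = 18 + index * DEVICE_DRIVER_DESCRIPTOR_SIZE
        if offset + DEVICE_DRIVER_DESCRIPTOR_SIZE > DRIVE_DESCRIPTOR_SIZE:
            break  # the stored count does not fit in the drive descriptor
        device_drivers.append(DeviceDriverDescriptor(*struct.unpack_from(">IHH", block, offset)))
    return DriveDescriptor(block_size, number_of_blocks, device_drivers)
```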
The partition map
The partition map is stored after the drive descriptor. The partition map consists of multiple entries that must be stored contiguously. The partition map itself is considered a partition; therefore the first entry in the partition map describes the partition map itself.
The partition map entry
A partition map entry is 512 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | "\x50\x4d" or "PM" | Signature |
| 2 | 2 | 0x00 | Unknown (Reserved) |
| 4 | 4 | | Total number of entries in the partition map |
| 8 | 4 | | Partition start sector |
| 12 | 4 | | Partition number of sectors |
| 16 | 32 | | Partition name, which contains an ASCII string |
| 48 | 32 | | Partition type, which contains an ASCII string |
| 80 | 4 | | Data area start sector |
| 84 | 4 | | Data area number of sectors |
| 88 | 4 | | Status flags |
| 92 | 4 | | Boot code start sector |
| 96 | 4 | | Boot code number of sectors |
| 100 | 4 | | Boot code address |
| 104 | 4 | | Unknown (Reserved) |
| 108 | 4 | | Boot code entry point |
| 112 | 4 | | Unknown (Reserved) |
| 116 | 4 | | Boot code checksum |
| 120 | 16 | | Processor type |
| 136 | 188 x 2 = 376 | 0x00 | Unknown (Reserved) |
Note that the partition name can be empty.
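A Python sketch of parsing the main fields of a partition map entry (big-endian). It assumes the 32-byte name and type strings are terminated or padded with NUL bytes; `parse_partition_map_entry` and `PartitionMapEntry` are hypothetical names.

```python
import struct
from typing import NamedTuple

PARTITION_MAP_ENTRY_SIZE = 512


class PartitionMapEntry(NamedTuple):
    number_of_entries: int
    start_sector: int
    number_of_sectors: int
    name: str
    partition_type: str
    status_flags: int


def _fixed_ascii_string(data: bytes) -> str:
    # Assumes the string is terminated or padded with NUL bytes.
    return data.split(b"\x00", 1)[0].decode("ascii", errors="replace")


def parse_partition_map_entry(block: bytes) -> PartitionMapEntry:
    """Parse the main fields of an APM partition map entry (big-endian)."""
    if len(block) < PARTITION_MAP_ENTRY_SIZE or block[0:2] != b"PM":
        raise ValueError("not an APM partition map entry")
    number_of_entries, start_sector, number_of_sectors = struct.unpack_from(">III", block, 4)
    name = _fixed_ascii_string(block[16:48])
    partition_type = _fixed_ascii_string(block[48:80])
    (status_flags,) = struct.unpack_from(">I", block, 88)
    return PartitionMapEntry(number_of_entries, start_sector, number_of_sectors, name, partition_type, status_flags)
```

Since the first entry describes the partition map itself, its partition type is expected to be "Apple_partition_map", and its total number of entries indicates how many consecutive entries to read.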
Partition types
The partition types consist of the following values:
| Value | Identifier | Description |
|---|---|---|
| "Apple_Boot" | ||
| "Apple_Boot_RAID" | ||
| "Apple_Bootstrap" | ||
| "Apple_Driver" | ||
| "Apple_Driver43" | ||
| "Apple_Driver43_CD" | ||
| "Apple_Driver_ATA" | ||
| "Apple_Driver_ATAPI" | ||
| "Apple_Driver_IOKit" | ||
| "Apple_Driver_OpenFirmware" | ||
| "Apple_Extra" | ||
| "Apple_Free" | ||
| "Apple_FWDriver" | ||
| "Apple_HFS" | ||
| "Apple_HFSX" | ||
| "Apple_Loader" | ||
| "Apple_MDFW" | ||
| "Apple_MFS" | ||
| "Apple_partition_map" | ||
| "Apple_Patches" | ||
| "Apple_PRODOS" | ||
| "Apple_RAID" | ||
| "Apple_Rhapsody_UFS" | ||
| "Apple_Scratch" | ||
| "Apple_Second" | ||
| "Apple_UFS" | ||
| "Apple_UNIX_SVR2" | ||
| "Apple_Void" | ||
| "Be_BFS" | ||
| "MFS" |
Status flags
The partition status flags consist of the following values:
| Value | Identifier | Description |
|---|---|---|
| 0x00000001 | Is valid | |
| 0x00000002 | Is allocated | |
| 0x00000004 | Is in use | |
| 0x00000008 | Contains boot information | |
| 0x00000010 | Is readable | |
| 0x00000020 | Is writable | |
| 0x00000040 | Boot code is position independent | |
| 0x00000100 | Contains a chain-compatible driver | |
| 0x00000200 | Contains a real driver | |
| 0x00000400 | Contains a chain driver | |
| 0x40000000 | Automatic mount at startup | |
| 0x80000000 | Is startup partition |
Note that the “is in use” status flag does not appear to be used consistently.
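A Python sketch that turns a status flags value into the names from the table above, reporting any remaining bits as unknown. `describe_status_flags` is a hypothetical helper.

```python
from typing import Dict, List

APM_STATUS_FLAGS: Dict[int, str] = {
    0x00000001: "Is valid",
    0x00000002: "Is allocated",
    0x00000004: "Is in use",
    0x00000008: "Contains boot information",
    0x00000010: "Is readable",
    0x00000020: "Is writable",
    0x00000040: "Boot code is position independent",
    0x00000100: "Contains a chain-compatible driver",
    0x00000200: "Contains a real driver",
    0x00000400: "Contains a chain driver",
    0x40000000: "Automatic mount at startup",
    0x80000000: "Is startup partition",
}


def describe_status_flags(status_flags: int) -> List[str]:
    """Return the names of the set status flags, in ascending bit order."""
    names = [name for flag, name in sorted(APM_STATUS_FLAGS.items()) if status_flags & flag]
    known_mask = 0
    for flag in APM_STATUS_FLAGS:
        known_mask |= flag
    unknown_flags = status_flags & ~known_mask
    if unknown_flags:
        names.append(f"Unknown (0x{unknown_flags:08x})")
    return names
```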
GUID Partition Table (GPT) format
The GUID Partition Table (GPT) is a partitioning scheme that is the successor to the Master Boot Record (MBR) partition table for Intel x86 based computers.
Overview
A GUID Partition Table (GPT) consists of:
- A protective or hybrid Master Boot Record (MBR) stored in block (LBA) 0
- A GPT partition table header stored in block (LBA) 1
- GPT partition entries stored in blocks (LBA) 2 - 33
- partitions area
  - GPT partitions
  - MBR partitions if hybrid MBR/GPT
- backup GPT partition entries (typically stored in blocks (LBA) -33 to -2, relative to the end of the storage media)
- A backup GPT partition table header (typically stored in the last block (LBA) -1)
The GPT partition table header signature can be used to determine the block (LBA) (or sector) size.
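Because the partition table header is stored in block (LBA) 1, its byte offset equals the block size, so a reader can probe candidate block sizes for the "EFI PART" signature. The Python sketch below assumes a caller-supplied `read_at(offset, size)` function (hypothetical) and the candidate sizes 512, 1024, 2048 and 4096 bytes.

```python
from typing import Callable, Optional, Sequence

GPT_SIGNATURE = b"EFI PART"


def detect_gpt_block_size(
    read_at: Callable[[int, int], bytes],
    candidate_block_sizes: Sequence[int] = (512, 1024, 2048, 4096),
) -> Optional[int]:
    """Return the block size for which block (LBA) 1 starts with the GPT signature."""
    for block_size in candidate_block_sizes:
        # Block (LBA) 1 starts at a byte offset equal to the block size.
        if read_at(block_size, len(GPT_SIGNATURE)) == GPT_SIGNATURE:
            return block_size
    return None
```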
Characteristics
| Characteristics | Description |
|---|---|
| Byte order | little-endian |
| Date and time values | N/A |
| Character strings | UTF-16 little-endian without byte order mark (BOM) |
Master Boot Record (MBR)
Hybrid Master Boot Record (MBR)
In hybrid configuration both GPT and MBR are used concurrently. Depending on the operating system one might have precedence over the other.
Protective Master Boot Record (MBR)
The Protective Master Boot Record (MBR) is an MBR with a single partition of type “EFI GPT protective partition” (0xee) that allocates as much of the drive as possible.
GPT partition table header
The GPT partition table header is 92 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | "EFI PART" | Signature |
| 8 | 2 | 0 | Minor format version |
| 10 | 2 | 1 | Major format version |
| 12 | 4 | 92 | Header data size, which contains the size of the GPT partition table header data |
| 16 | 4 | | Header data checksum |
| 20 | 4 | 0 | Unknown (Reserved) |
| 24 | 8 | | Partition header block number (LBA) |
| 32 | 8 | | Backup partition header block number (LBA) |
| 40 | 8 | | Partitions area start block number (LBA) |
| 48 | 8 | | Partitions area end block number (LBA), where the block number is included in the partitions area block range |
| 56 | 16 | | Disk identifier (GUID) |
| 72 | 8 | | Partition entries start block number (LBA) |
| 80 | 4 | | Number of partition entries |
| 84 | 4 | 128 | Partition entry data size |
| 88 | 4 | | Partition entries data checksum |
| 92 | ... | 0 | Unknown (Reserved) |
The partition entries start block number (LBA) of the backup GPT partition table header points to backup GPT partition entries.
Note that the number of partition entries value contains the number of available partition entries, not the number of used partition entries. Empty partition entries have the unused entry partition type identifier (00000000-0000-0000-0000-000000000000).
Checksum calculation
The CRC-32 algorithm with polynomial 0x04c11db7 and initial value of 0 is used to calculate the checksums.
The checksum is calculated over the 92 bytes of the table header data, where the header data checksum value is considered to be 0 during calculation.
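A Python sketch of verifying the header data checksum. It assumes the CRC-32 described above is the common CRC-32 computed by Python's `zlib.crc32` (whose default starting value is 0), and uses the header data size at offset 12, normally 92, as the number of bytes covered. `gpt_header_checksum_is_valid` is a hypothetical name.

```python
import struct
import zlib

GPT_HEADER_CHECKSUM_OFFSET = 16


def gpt_header_checksum_is_valid(header: bytes) -> bool:
    """Verify the header data checksum of a GPT partition table header."""
    (header_data_size,) = struct.unpack_from("<I", header, 12)
    if header_data_size < 92 or header_data_size > len(header):
        return False
    data = bytearray(header[:header_data_size])
    (stored_checksum,) = struct.unpack_from("<I", data, GPT_HEADER_CHECKSUM_OFFSET)
    # The checksum field itself is treated as 0 during the calculation.
    data[GPT_HEADER_CHECKSUM_OFFSET:GPT_HEADER_CHECKSUM_OFFSET + 4] = bytes(4)
    return (zlib.crc32(bytes(data)) & 0xFFFFFFFF) == stored_checksum
```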
GPT partition entries
GPT Partition entry
The GPT partition entry is 128 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 16 | | Partition type identifier (GUID) |
| 16 | 16 | | Partition identifier (GUID) |
| 32 | 8 | | Partition start block number (LBA) |
| 40 | 8 | | Partition end block number (LBA), where the block number is included in the partition block range |
| 48 | 8 | | Attribute flags |
| 56 | 72 | | Partition name, which contains a UTF-16 little-endian string |
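A Python sketch of parsing a partition entry. GUIDs in the GPT are stored with their first three fields in little-endian byte order, which `uuid.UUID(bytes_le=...)` handles; the name is assumed to be padded with NUL characters. `parse_gpt_partition_entry` is a hypothetical name.

```python
import struct
import uuid
from typing import NamedTuple, Optional

UNUSED_ENTRY_TYPE = uuid.UUID(int=0)


class GptPartitionEntry(NamedTuple):
    type_identifier: uuid.UUID
    identifier: uuid.UUID
    start_block: int
    end_block: int  # included in the partition block range
    attribute_flags: int
    name: str


def parse_gpt_partition_entry(data: bytes) -> Optional[GptPartitionEntry]:
    """Parse a 128-byte GPT partition entry; returns None for an unused entry."""
    if len(data) < 128:
        raise ValueError("GPT partition entry is 128 bytes in size")
    # GUIDs are stored with their first three fields in little-endian byte order.
    type_identifier = uuid.UUID(bytes_le=bytes(data[0:16]))
    if type_identifier == UNUSED_ENTRY_TYPE:
        return None
    identifier = uuid.UUID(bytes_le=bytes(data[16:32]))
    start_block, end_block, attribute_flags = struct.unpack_from("<QQQ", data, 32)
    # The name is UTF-16 little-endian, assumed to be padded with NUL characters.
    name = bytes(data[56:128]).decode("utf-16-le", errors="replace").split("\x00", 1)[0]
    return GptPartitionEntry(type_identifier, identifier, start_block, end_block, attribute_flags, name)
```

Since the end block is included in the range, the number of blocks is `end_block - start_block + 1`, and `str(entry.type_identifier)` yields the lowercase form used in the partition types table.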
Partition types
| Value | Identifier | Description |
|---|---|---|
| 00000000-0000-0000-0000-000000000000 | Unused entry | |
| 024dee41-33e7-11d3-9d69-0008c781f39f | MBR partition scheme | |
| c12a7328-f81f-11d2-ba4b-00a0c93ec93b | EFI System | |
| 21686148-6449-6e6f-744e-656564454649 | BIOS boot partition | |
| d3bfe2de-3daf-11df-ba40-e3a556d89593 | Intel Fast Flash (iFFS) partition (for Intel Rapid Start technology) | |
| f4019732-066e-4e12-8273-346c5641494f | Sony boot partition | |
| bfbfafe7-a34f-448a-9a5b-6213eb736c22 | Lenovo boot partition | |
| Windows | ||
| e3c9e316-0b5c-4db8-817d-f92df00215ae | Microsoft reserved | |
| ebd0a0a2-b9e5-4433-87c0-68b6b72699c7 | (Microsoft) Basic data | |
| 5808c8aa-7e8f-42e0-85d2-e1e90434cfb3 | Logical Disk Manager (LDM) metadata partition | |
| af9b60a0-1431-4f62-bc68-3311714a69ad | Logical Disk Manager data partition | |
| de94bba4-06d1-4d40-a16a-bfd50179d6ac | Windows recovery environment | |
| 37affc90-ef7d-4e96-91c3-2d7ae055b174 | IBM General Parallel File System (GPFS) partition | |
| e75caf8f-f680-4cee-afa3-b001e56efc2d | Storage Spaces partition | |
| HP-UX | ||
| 75894c1e-3aeb-11d3-b7c1-7b03a0000000 | Data partition | |
| e2a1e728-32e3-11d6-a682-7b03a0000000 | Service Partition | |
| Linux | ||
| 0fc63daf-8483-4772-8e79-3d69d8477de4 | Linux filesystem data | |
| a19d880f-05fc-4d3b-a006-743f0f84911e | RAID partition | |
| 44479540-f297-41b2-9af7-d131d5f0458a | Root partition (x86) | |
| 4f68bce3-e8cd-4db1-96e7-fbcaf984b709 | Root partition (x86-64) | |
| 69dad710-2ce4-4e3c-b16c-21a1d49abed3 | Root partition (32-bit ARM) | |
| b921b045-1df0-41c3-af44-4c6f280d3fae | Root partition (64-bit ARM/AArch64) | |
| 0657fd6d-a4ab-43c4-84e5-0933c84b4f4f | Swap partition | |
| e6d6d379-f507-44c2-a23c-238f2a3df928 | Logical Volume Manager (LVM) partition | |
| 933ac7e1-2eb4-4f13-b844-0e14e2aef915 | /home partition | |
| 3b8f8425-20e0-4f3b-907f-1a25a76f98e8 | /srv (server data) partition | |
| 7ffec5c9-2d00-49b7-8941-3ea10a5586b7 | Plain dm-crypt partition | |
| ca7d7ccb-63ed-4c53-861c-1742536059cc | LUKS partition | |
| 8da63339-0007-60c0-c436-083ac8230908 | Reserved | |
| FreeBSD | ||
| 83bd6b9d-7f41-11dc-be0b-001560b84f0f | Boot partition | |
| 516e7cb4-6ecf-11d6-8ff8-00022d09712b | Data partition | |
| 516e7cb5-6ecf-11d6-8ff8-00022d09712b | Swap partition | |
| 516e7cb6-6ecf-11d6-8ff8-00022d09712b | Unix File System (UFS) partition | |
| 516e7cb8-6ecf-11d6-8ff8-00022d09712b | Vinum volume manager partition | |
| 516e7cba-6ecf-11d6-8ff8-00022d09712b | ZFS partition | |
| Darwin / Mac OS | ||
| 48465300-0000-11aa-aa11-00306543ecac | Hierarchical File System Plus (HFS+) partition | |
| 7c3457ef-0000-11aa-aa11-00306543ecac | Apple APFS | |
| 55465300-0000-11aa-aa11-00306543ecac | Apple UFS container | |
| 6a898cc3-1dd2-11b2-99a6-080020736631 | ZFS | |
| 52414944-0000-11aa-aa11-00306543ecac | Apple RAID partition | |
| 52414944-5f4f-11aa-aa11-00306543ecac | Apple RAID partition, offline | |
| 426f6f74-0000-11aa-aa11-00306543ecac | Apple Boot partition (Recovery HD) | |
| 4c616265-6c00-11aa-aa11-00306543ecac | Apple Label | |
| 5265636f-7665-11aa-aa11-00306543ecac | Apple TV Recovery partition | |
| 53746f72-6167-11aa-aa11-00306543ecac | Apple Core Storage (i.e. Lion FileVault) partition | |
| b6fa30da-92d2-4a9a-96f1-871ec6486200 | SoftRAID_Status | |
| 2e313465-19b9-463f-8126-8a7993773801 | SoftRAID_Scratch | |
| fa709c7e-65b1-4593-bfd5-e71d61de9b02 | SoftRAID_Volume | |
| bbba6df5-f46f-4a89-8f59-8765b2727503 | SoftRAID_Cache | |
| Solaris / illumos | ||
| 6a82cb45-1dd2-11b2-99a6-080020736631 | Boot partition | |
| 6a85cf4d-1dd2-11b2-99a6-080020736631 | Root partition | |
| 6a87c46f-1dd2-11b2-99a6-080020736631 | Swap partition | |
| 6a8b642b-1dd2-11b2-99a6-080020736631 | Backup partition | |
| 6a898cc3-1dd2-11b2-99a6-080020736631 | /usr partition | |
| 6a8ef2e9-1dd2-11b2-99a6-080020736631 | /var partition | |
| 6a90ba39-1dd2-11b2-99a6-080020736631 | /home partition | |
| 6a9283a5-1dd2-11b2-99a6-080020736631 | Alternate sector | |
| 6a8d2ac7-1dd2-11b2-99a6-080020736631 | Reserved partition | |
| 6a945a3b-1dd2-11b2-99a6-080020736631 | Reserved partition | |
| 6a96237f-1dd2-11b2-99a6-080020736631 | Reserved partition | |
| 6a9630d1-1dd2-11b2-99a6-080020736631 | Reserved partition | |
| 6a980767-1dd2-11b2-99a6-080020736631 | Reserved partition | |
| NetBSD | ||
| 49f48d32-b10e-11dc-b99b-0019d1879648 | Swap partition | |
| 49f48d5a-b10e-11dc-b99b-0019d1879648 | FFS partition | |
| 49f48d82-b10e-11dc-b99b-0019d1879648 | LFS partition | |
| 49f48daa-b10e-11dc-b99b-0019d1879648 | RAID partition | |
| 2db519c4-b10f-11dc-b99b-0019d1879648 | Concatenated partition | |
| 2db519ec-b10f-11dc-b99b-0019d1879648 | Encrypted partition | |
| Chrome OS | ||
| fe3a2a5d-4f32-41a7-b725-accc3285a309 | Chrome OS kernel | |
| 3cb8e202-3b7e-47dd-8a3c-7ff2a13cfcec | Chrome OS rootfs | |
| 2e0a753d-9e48-43b0-8337-b15192cb1b5e | Chrome OS future use | |
| Container Linux by CoreOS | ||
| 5dfbf5f4-2848-4bac-aa5e-0d9a20b745a6 | /usr partition (coreos-usr) | |
| 3884dd41-8582-4404-b9a8-e9b84f2df50e | Resizable rootfs (coreos-resize) | |
| c95dc21a-df0e-4340-8d7b-26cbfa9a03e0 | OEM customizations (coreos-reserved) | |
| be9067b9-ea49-4f15-b4f6-f36f8c9e1818 | Root filesystem on RAID (coreos-root-raid) | |
| Haiku | ||
| 42465331-3ba3-10f1-802a-4861696b7521 | Haiku BFS | |
| MidnightBSD | ||
| 85d5e45e-237c-11e1-b4b3-e89a8f7fc3a7 | Boot partition | |
| 85d5e45a-237c-11e1-b4b3-e89a8f7fc3a7 | Data partition | |
| 85d5e45b-237c-11e1-b4b3-e89a8f7fc3a7 | Swap partition | |
| 0394ef8b-237e-11e1-b4b3-e89a8f7fc3a7 | Unix File System (UFS) partition | |
| 85d5e45c-237c-11e1-b4b3-e89a8f7fc3a7 | Vinum volume manager partition | |
| 85d5e45d-237c-11e1-b4b3-e89a8f7fc3a7 | ZFS partition | |
| Ceph | ||
| 45b0969e-9b03-4f30-b4c6-b4b80ceff106 | Journal | |
| 45b0969e-9b03-4f30-b4c6-5ec00ceff106 | dm-crypt journal | |
| 4fbd7e29-9d25-41b8-afd0-062c0ceff05d | OSD | |
| 4fbd7e29-9d25-41b8-afd0-5ec00ceff05d | dm-crypt OSD | |
| 89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be | Disk in creation | |
| 89c57f98-2fe5-4dc0-89c1-5ec00ceff2be | dm-crypt disk in creation | |
| cafecafe-9b03-4f30-b4c6-b4b80ceff106 | Block | |
| 30cd0809-c2b2-499c-8879-2d6b78529876 | Block DB | |
| 5ce17fce-4087-4169-b7ff-056cc58473f9 | Block write-ahead log | |
| fb3aabf9-d25f-47cc-bf5e-721d1816496b | Lockbox for dm-crypt keys | |
| 4fbd7e29-8ae0-4982-bf9d-5a8d867af560 | Multipath OSD | |
| 45b0969e-8ae0-4982-bf9d-5a8d867af560 | Multipath journal | |
| cafecafe-8ae0-4982-bf9d-5a8d867af560 | Multipath block | |
| 7f4a666a-16f3-47a2-8445-152ef4d03f6c | Multipath block | |
| ec6d6385-e346-45dc-be91-da2a7c8b3261 | Multipath block DB | |
| 01b41e1b-002a-453c-9f17-88793989ff8f | Multipath block write-ahead log | |
| cafecafe-9b03-4f30-b4c6-5ec00ceff106 | dm-crypt block | |
| 93b0052d-02d9-4d8a-a43b-33a3ee4dfbc3 | dm-crypt block DB | |
| 306e8683-4fe2-4330-b7c0-00a917c16966 | dm-crypt block write-ahead log | |
| 45b0969e-9b03-4f30-b4c6-35865ceff106 | dm-crypt LUKS journal | |
| cafecafe-9b03-4f30-b4c6-35865ceff106 | dm-crypt LUKS block | |
| 166418da-c469-4022-adf4-b30afd37f176 | dm-crypt LUKS block DB | |
| 86a32090-3647-40b9-bbbd-38d8c573aa86 | dm-crypt LUKS block write-ahead log | |
| 4fbd7e29-9d25-41b8-afd0-35865ceff05d | dm-crypt LUKS OSD | |
| OpenBSD | ||
| 824cc7a0-36a8-11e3-890a-952519ad3f61 | Data partition | |
| QNX | ||
| cef5a9ad-73bc-4601-89f3-cdeeeee321a1 | Power-safe (QNX6) file system | |
| Plan 9 | ||
| c91818f9-8025-47af-89d2-f030d7000c2c | Plan 9 partition | |
| VMware ESX | ||
| 9d275380-40ad-11db-bf97-000c2911d1b8 | vmkcore (coredump partition) | |
| aa31e02a-400f-11db-9590-000c2911d1b8 | VMFS filesystem partition | |
| 9198effc-31c0-11db-8f78-000c2911d1b8 | VMware Reserved | |
| Android-IA | ||
| 2568845d-2332-4675-bc39-8fa5a4748d15 | Bootloader | |
| 114eaffe-1552-4022-b26e-9b053604cf84 | Bootloader2 | |
| 49a4d17f-93a3-45c1-a0de-f50b2ebe2599 | Boot | |
| 4177c722-9e92-4aab-8644-43502bfd5506 | Recovery | |
| ef32a33b-a409-486c-9141-9ffb711f6266 | Misc | |
| 20ac26be-20b7-11e3-84c5-6cfdb94711e9 | Metadata | |
| 38f428e6-d326-425d-9140-6e0ea133647c | System | |
| a893ef21-e428-470a-9e55-0668fd91a2d9 | Cache | |
| dc76dda9-5ac1-491c-af42-a82591580c0d | Data | |
| ebc597d0-2053-4b15-8b64-e0aac75f4db1 | Persistent | |
| c5a0aeec-13ea-11e5-a1b1-001e67ca0c3c | Vendor | |
| bd59408b-4514-490d-bf12-9878d963f378 | Config | |
| 8f68cc74-c5e5-48da-be91-a0c8c15e9c80 | Factory | |
| 9fdaa6ef-4b3f-40d2-ba8d-bff16bfb887b | Factory (alt) | |
| 767941d0-2085-11e3-ad3b-6cfdb94711e9 | Fastboot / Tertiary | |
| ac6d7924-eb71-4df8-b48d-e267b27148ff | OEM | |
| Android 6.0+ ARM | ||
| 19a710a2-b3ca-11e4-b026-10604b889dcf | Android Meta | |
| 193d1ea4-b3ca-11e4-b075-10604b889dcf | Android EXT | |
| Open Network Install Environment (ONIE) | ||
| 7412f7d5-a156-4b13-81dc-867174929325 | Boot | |
| d4e6e2cd-4469-46f3-b5cb-1bff57afc149 | Config | |
| PowerPC | ||
| 9e1a2d38-c612-4316-aa26-8b49521e5a8b | PReP boot | |
| freedesktop.org OSes (Linux, etc.) | ||
| bc13c2ff-59e6-4262-a352-b275fd6f7172 | Shared boot loader configuration | |
| Atari TOS | ||
| 734e5afe-f61a-11e6-bc64-92361f002671 | Basic data partition (GEM, BGM, F32) | |
Partition attribute flags
| Offset | Size | Value | Description |
|---|---|---|---|
| 0.0 | 1 bit | | Partition is required by the platform, e.g. an OEM partition |
| 0.1 | 1 bit | | EFI firmware should ignore the content of the partition |
| 0.2 | 1 bit | | Partition is bootable by a legacy BIOS, equivalent to the MBR active flag |
| 0.3 | 45 bits | | Unknown (Reserved) |
| 6.0 | 16 bits | | Flags specific to the partition type |
Microsoft basic partition type attribute flags
| Offset | Size | Value | Description |
|---|---|---|---|
| 7.4 | 1 bit | | Partition is read-only |
| 7.5 | 1 bit | | Partition is a shadow copy (of another partition) |
| 7.6 | 1 bit | | Partition is hidden |
| 7.7 | 1 bit | | Partition should not have a drive letter assigned (no auto-mount) |
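The "byte.bit" offsets in the attribute flag tables map to bit index byte * 8 + bit of the 64-bit little-endian attribute flags value; for example 7.4 is bit 60. A Python sketch of decoding the generic and Microsoft basic partition type flags; the helper names are hypothetical and the caller is assumed to know whether the partition has the Microsoft basic data type.

```python
from typing import Dict, List


def attribute_bit(byte_offset: int, bit_offset: int) -> int:
    """Convert the "byte.bit" offset notation used in the tables to a bit index."""
    return byte_offset * 8 + bit_offset


GPT_ATTRIBUTE_FLAGS: Dict[int, str] = {
    attribute_bit(0, 0): "Required by the platform",
    attribute_bit(0, 1): "Ignored by EFI firmware",
    attribute_bit(0, 2): "Legacy BIOS bootable",
}

MICROSOFT_BASIC_DATA_ATTRIBUTE_FLAGS: Dict[int, str] = {
    attribute_bit(7, 4): "Read-only",
    attribute_bit(7, 5): "Shadow copy",
    attribute_bit(7, 6): "Hidden",
    attribute_bit(7, 7): "No drive letter",
}


def describe_attribute_flags(flags: int, is_microsoft_basic_data: bool = False) -> List[str]:
    """Return the names of the set flags; type-specific flags only if requested."""
    known_flags = dict(GPT_ATTRIBUTE_FLAGS)
    if is_microsoft_basic_data:
        known_flags.update(MICROSOFT_BASIC_DATA_ATTRIBUTE_FLAGS)
    return [name for bit, name in sorted(known_flags.items()) if (flags >> bit) & 1]


def partition_type_specific_flags(flags: int) -> int:
    """Return the 16 bits at offset 6.0 (bits 48 to 63)."""
    return (flags >> 48) & 0xFFFF
```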
ChromeOS partition type attribute flags
| Offset | Size | Value | Description |
|---|---|---|---|
| 6.0 | 4 bits | | Priority, where 15 is the highest priority, 1 is the lowest and 0 indicates the partition is not bootable |
| 6.4 | 4 bits | | Number of tries to attempt to boot from the partition |
| 7.0 | 1 bit | | Partition was previously successfully booted from |
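Using the same byte.bit to bit index conversion, the ChromeOS fields occupy bits 48 to 51 (priority), 52 to 55 (tries) and 56 (successful). A Python sketch with hypothetical names:

```python
from typing import NamedTuple


class ChromeOsAttributes(NamedTuple):
    priority: int
    tries: int
    successful: bool

    @property
    def bootable(self) -> bool:
        # A priority of 0 indicates the partition is not bootable.
        return self.priority != 0


def chromeos_attributes(flags: int) -> ChromeOsAttributes:
    """Extract the ChromeOS fields from the 64-bit attribute flags."""
    priority = (flags >> 48) & 0xF  # offset 6.0, 4 bits
    tries = (flags >> 52) & 0xF  # offset 6.4, 4 bits
    successful = bool((flags >> 56) & 1)  # offset 7.0, 1 bit
    return ChromeOsAttributes(priority, tries, successful)
```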
Master Boot Record (MBR) partition table format
The Master Boot Record (MBR) partition table is mainly used on the family of Intel x86 based computers.
Overview
An MBR partition table consists of:
- Master Boot Record (MBR)
- Extended Partition Records (EPRs)
Characteristics
| Characteristics | Description |
|---|---|
| Byte order | little-endian |
| Date and time values | N/A |
| Character strings | N/A |
Terminology
| Term | Description |
|---|---|
| Physical block | A fixed location on the storage media defined by the storage media |
| Logical block | An abstract location on the storage media defined by software |
Sector size(s)
Traditionally the sector size is 512 bytes, but modern hard disk drives can use 4096-byte sectors. The Linux fdisk utility supports sector sizes of 512, 1024, 2048 and 4096 bytes.
The location of the “boot signature” of the MBR does not indicate the sector size. Methods to derive the sector size from the data are:
- check the “boot signature” of the first EPR, if present
- check the content of well known partition types
Cylinder Head Sector (CHS) address
The Cylinder Head Sector (CHS) address is 24 bits in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0.0 | 8 bits | | Head |
| 1.0 | 6 bits | | Sector |
| 1.6 | 2 bits | | Cylinder upper 2 bits (bits 8 and 9) |
| 2.0 | 8 bits | | Cylinder lower 8 bits (bits 0 to 7) |
The logical block address (LBA) can be determined from the CHS with the following calculation:
lba = (((cylinder * heads_per_cylinder) + head) * sectors_per_track) + sector - 1
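A Python sketch of decoding a 3-byte CHS address and applying the formula above. It assumes the conventional layout in which bits 6 and 7 of the second byte hold the upper two cylinder bits and the third byte holds the lower eight. The geometry (heads per cylinder, sectors per track) is not stored in the partition table entry; 255 heads and 63 sectors per track are common values but should be treated as an assumption. The names are hypothetical.

```python
from typing import NamedTuple


class ChsAddress(NamedTuple):
    cylinder: int
    head: int
    sector: int


def decode_chs(data: bytes) -> ChsAddress:
    """Decode a 3-byte CHS address as stored in a partition table entry."""
    head = data[0]
    sector = data[1] & 0x3F  # bits 0 to 5 of the second byte
    # Bits 6 and 7 of the second byte are cylinder bits 8 and 9; the third byte holds bits 0 to 7.
    cylinder = ((data[1] & 0xC0) << 2) | data[2]
    return ChsAddress(cylinder, head, sector)


def chs_to_lba(chs: ChsAddress, heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Apply lba = ((cylinder * heads_per_cylinder) + head) * sectors_per_track + sector - 1."""
    if chs.sector == 0:
        raise ValueError("CHS sector numbers start at 1")
    return ((chs.cylinder * heads_per_cylinder) + chs.head) * sectors_per_track + chs.sector - 1
```

With 255 heads and 63 sectors per track, the largest encodable address (cylinder 1023, head 254, sector 63) corresponds to LBA 16450559.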
The Master Boot Record (MBR)
The Master Boot Record (MBR) is a data structure that describes the properties of the storage medium and its partitions.
The classical MBR can only contain 4 partition table entries. Additional partition entries must be stored using extended partition records (EPR). The classical MBR has evolved into different variants like:
- The modern MBR
- The Advanced Active Partitions (AAP) MBR
- The NEWLDR MBR
- The AST/NEC MS-DOS and SpeedStor MBR
- The Disk Manager MBR
The classical MBR
The classical MBR is 512 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 446 | | The boot (loader) code |
| 446 | 16 | | Partition table entry 1 |
| 462 | 16 | | Partition table entry 2 |
| 478 | 16 | | Partition table entry 3 |
| 494 | 16 | | Partition table entry 4 |
| 510 | 2 | "\x55\xaa" | The (boot) signature |
The modern MBR
The modern MBR is 512 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 218 | | The first part of the boot (loader) code |
| Disk timestamp used by Microsoft Windows 95, 98 and ME | |||
| 218 | 2 | 0x0000 | Unknown (Reserved) |
| 220 | 1 | | Unknown (Original physical drive), which contains a value that ranges from 0x80 to 0xff, where 0x80 is the first drive, 0x81 the second, etc. |
| 221 | 1 | | Seconds, which contains a value that ranges from 0 to 59 |
| 222 | 1 | | Minutes, which contains a value that ranges from 0 to 59 |
| 223 | 1 | | Hours, which contains a value that ranges from 0 to 23 |
| Without disk identity | |||
| 224 | 222 | | The second part of the boot (loader) code |
| With disk identity, used by UEFI, Microsoft Windows NT or later | |||
| 224 | 216 | | The second part of the boot (loader) code |
| 440 | 4 | | Disk identity (signature) |
| 444 | 2 | 0x0000 or 0x5a5a | Copy-protection marker |
| Common | |||
| 446 | 16 | | Partition table entry 1 |
| 462 | 16 | | Partition table entry 2 |
| 478 | 16 | | Partition table entry 3 |
| 494 | 16 | | Partition table entry 4 |
| 510 | 2 | "\x55\xaa" | The (boot) signature |
The extended partition record
The extended partition record (EPR), also referred to as extended boot record (EBR), has the same 512-byte layout as the MBR and contains a 64-byte partition table. This partition table contains information about the logical partition (volume) and the next extended partition record.
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 446 | 0x00 | Unknown (Unused), which should contain zero bytes |
| 446 | 16 | | Partition table entry 1 |
| 462 | 16 | | Partition table entry 2, which should contain an extended partition |
| 478 | 16 | 0x00 | Partition table entry 3, which should be unused and contain zero bytes |
| 494 | 16 | 0x00 | Partition table entry 4, which should be unused and contain zero bytes |
| 510 | 2 | "\x55\xaa" | Signature |
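The second partition entry contains an extended partition which points to the next EPR. The start LBA of the first partition entry (the logical partition) is relative to the start of the current EPR, whereas the start LBA of the second partition entry (the next EPR) is relative to the start of the first EPR.
The partition entry that refers to the first EPR typically has a partition type of 0x05, but certain versions of Windows, such as Windows 98, are known to use partition type 0x0f.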
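A Python sketch of walking the chain of extended partition records. It follows the common extended boot record convention that the logical partition entry is relative to the current EPR and the next-EPR entry is relative to the first EPR, and it treats partition types 0x05, 0x0f and 0x85 as extended. `read_sector(lba)` is a hypothetical caller-supplied function, and the record limit guards against loops in corrupt chains.

```python
import struct
from typing import Callable, Iterator, Tuple

EXTENDED_PARTITION_TYPES = {0x05, 0x0F, 0x85}  # Extended (CHS), Extended (LBA), Linux extended


def walk_extended_partition_records(
    read_sector: Callable[[int], bytes], first_epr_lba: int, maximum_number_of_records: int = 1024
) -> Iterator[Tuple[int, int, int]]:
    """Yield (partition type, absolute start LBA, number of sectors) per logical partition."""
    epr_lba = first_epr_lba
    for _ in range(maximum_number_of_records):  # guards against loops in corrupt chains
        record = read_sector(epr_lba)
        if len(record) < 512 or record[510:512] != b"\x55\xaa":
            break
        # Partition table entry 1: the logical partition, relative to the current EPR.
        partition_type = record[446 + 4]
        relative_start, number_of_sectors = struct.unpack_from("<II", record, 446 + 8)
        if partition_type != 0x00 and number_of_sectors > 0:
            yield partition_type, epr_lba + relative_start, number_of_sectors
        # Partition table entry 2: the next EPR, relative to the first EPR.
        next_type = record[462 + 4]
        (next_relative_start,) = struct.unpack_from("<I", record, 462 + 8)
        if next_type not in EXTENDED_PARTITION_TYPES or next_relative_start == 0:
            break
        epr_lba = first_epr_lba + next_relative_start
```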
The partition table entry
The partition table entry is 16 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 1 | Partition flags | |
| 1 | 3 | The partition start address, which contains a CHS address relative from the start of the hard disk | |
| 4 | 1 | Partition type | |
| 5 | 3 | The partition end address, which contains a CHS address relative from the start of the hard disk | |
| 8 | 4 | The partition start address, which contains an LBA (in sectors) relative from the start of the hard disk | |
| 12 | 4 | Size of the partition in number of sectors | |
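A partition table entry can be parsed as sketched below. The CHS bit layout used here (head in byte 0, sector in bits 0 to 5 of byte 1, cylinder bits 8 and 9 in bits 6 and 7 of byte 1, and the lower 8 cylinder bits in byte 2) is the conventional one and is an assumption, since the table above does not describe it. The names are illustrative.

```python
import struct
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PartitionEntry:
    flags: int
    start_chs: Tuple[int, int, int]  # (cylinder, head, sector)
    partition_type: int
    end_chs: Tuple[int, int, int]  # (cylinder, head, sector)
    start_lba: int
    number_of_sectors: int

    @property
    def is_bootable(self) -> bool:
        return self.flags & 0x80 != 0


def decode_chs(data: bytes) -> Tuple[int, int, int]:
    """Decode a 3-byte CHS address into (cylinder, head, sector)."""
    head = data[0]
    sector = data[1] & 0x3F  # bits 0-5 of byte 1
    cylinder = ((data[1] & 0xC0) << 2) | data[2]  # bits 6-7 of byte 1 are cylinder bits 8-9
    return cylinder, head, sector


def parse_partition_entry(data: bytes) -> PartitionEntry:
    if len(data) != 16:
        raise ValueError("a partition table entry is 16 bytes in size")
    flags, partition_type = data[0], data[4]
    start_lba, number_of_sectors = struct.unpack_from("<II", data, 8)
    return PartitionEntry(flags, decode_chs(data[1:4]), partition_type,
                          decode_chs(data[5:8]), start_lba, number_of_sectors)
```

An end CHS address of 0xfe 0xff 0xff decodes to cylinder 1023, head 254 and sector 63, the maximum CHS values, which are commonly stored when a partition extends beyond the CHS addressable range.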
Partition flags
The partition flags consist of the following values:
| Value | Identifier | Description |
|---|---|---|
| 0x80 | | Partition is bootable |
Partition types
The partition types consist of the following values:
| Value | Identifier | Description |
|---|---|---|
| 0x00 | Empty | |
| 0x01 | FAT12 (CHS) | |
| 0x02 | XENIX root | |
| 0x03 | XENIX user | |
| 0x04 | FAT16 (16 MiB - 32 MiB CHS) | |
| 0x05 | Extended (CHS) | |
| 0x06 | FAT16 (32 MiB - 2 GiB CHS) | |
| 0x07 | HPFS/NTFS | |
| 0x08 | AIX | |
| 0x09 | AIX bootable | |
| 0x0a | OS/2 Boot Manager | |
| 0x0b | FAT32 (CHS) | |
| 0x0c | FAT32 (LBA) | |
| 0x0e | FAT16 (32 MiB - 2 GiB LBA) | |
| 0x0f | Extended (LBA) | |
| 0x10 | OPUS | |
| 0x11 | Hidden FAT12 (CHS) | |
| 0x12 | Compaq diagnostics | |
| 0x14 | Hidden FAT16 (16 MiB - 32 MiB CHS) | |
| 0x16 | Hidden FAT16 (32 MiB - 2 GiB CHS) | |
| 0x17 | Hidden HPFS/NTFS | |
| 0x18 | AST SmartSleep | |
| 0x1b | Hidden FAT32 (CHS) | |
| 0x1c | Hidden FAT32 (LBA) | |
| 0x1e | Hidden FAT16 (32 MiB - 2 GiB LBA) | |
| 0x24 | NEC DOS | |
| 0x27 | Unknown (PackardBell recovery/installation partition) | |
| 0x39 | Plan 9 | |
| 0x3c | PartitionMagic recovery | |
| 0x40 | Venix 80286 | |
| 0x41 | PPC PReP Boot | |
| 0x42 | SFS or LDM: Microsoft MBR (Dynamic Disk) | |
| 0x4d | QNX4.x | |
| 0x4e | QNX4.x 2nd part | |
| 0x4f | QNX4.x 3rd part | |
| 0x50 | OnTrack DM | |
| 0x51 | OnTrack DM6 Aux1 | |
| 0x52 | CP/M | |
| 0x53 | OnTrack DM6 Aux3 | |
| 0x54 | OnTrackDM6 | |
| 0x55 | EZ-Drive | |
| 0x56 | Golden Bow | |
| 0x5c | Priam Edisk | |
| 0x61 | SpeedStor | |
| 0x63 | GNU HURD or SysV | |
| 0x64 | Novell Netware 286 | |
| 0x65 | Novell Netware 386 | |
| 0x70 | DiskSecure Multi-Boot | |
| 0x75 | PC/IX | |
| 0x78 | XOSL | |
| 0x80 | Old Minix | |
| 0x81 | Minix / old Linux | |
| 0x82 | Solaris x86 or Linux swap | |
| 0x83 | Linux | |
| 0x84 | Hibernation or OS/2 hidden C: drive | |
| 0x85 | Linux extended | |
| 0x86 | NTFS volume set | |
| 0x87 | NTFS volume set | |
| 0x8e | Linux LVM | |
| 0x93 | Amoeba | |
| 0x94 | Amoeba BBT | |
| 0x9f | BSD/OS | |
| 0xa0 | IBM Thinkpad hibernation | |
| 0xa1 | Hibernation | |
| 0xa5 | FreeBSD | |
| 0xa6 | OpenBSD | |
| 0xa7 | NeXTSTEP | |
| 0xa8 | Mac OS X | |
| 0xa9 | NetBSD | |
| 0xab | Mac OS X Boot | |
| 0xaf | Mac OS X | |
| 0xb7 | BSDI | |
| 0xb8 | BSDI swap | |
| 0xbb | Boot Wizard hidden | |
| 0xc1 | DRDOS/sec (FAT-12) | |
| 0xc4 | DRDOS/sec (FAT-16 < 32M) | |
| 0xc6 | DRDOS/sec (FAT-16) | |
| 0xc7 | Syrinx | |
| 0xda | Non-FS data | |
| 0xdb | CP/M / CTOS / ... | |
| 0xde | Dell Utility | |
| 0xdf | BootIt | |
| 0xe1 | DOS access | |
| 0xe3 | DOS R/O | |
| 0xe4 | SpeedStor | |
| 0xeb | BeOS | |
| 0xee | EFI GPT protective partition | |
| 0xef | EFI system partition (FAT) | |
| 0xf0 | Linux/PA-RISC boot | |
| 0xf1 | SpeedStor | |
| 0xf2 | DOS secondary | |
| 0xf4 | SpeedStor | |
| 0xfb | VMWare file system | |
| 0xfc | VMWare swap | |
| 0xfd | Linux RAID auto-detect | |
| 0xfe | LANstep | |
| 0xff | BBT | |
File system formats
A file system format is used to manage the storage of files.
Terminology
- File entry (file system entry): an object that represents an element within the file system, such as a file or directory. A file system typically stores metadata of a file entry, such as the name, size, permissions, date and time values, and location of the content.
- Data fork (or data stream): a file system object that represents the content of a file entry. NTFS and HFS support multiple data forks (or data streams) for an individual file entry.
- Extended attribute: a file system object that represents additional (or extended) metadata of an individual file entry.
- Reparse point: a file system object that redirects to another location or implementation (filter driver), such as Windows Overlay Filter (WOF) compression. NTFS and ReFS support reparse points.
Formats
- Apple File System (APFS)
- Extended File System (ext)
- Extensible File Allocation Table (exFAT)
- File Allocation Table (FAT)
- Hierarchical File System (HFS)
- Macintosh File System (MFS)
- New Technologies File System (NTFS)
Apple File System (APFS)
TODO: add description
Apple File System Compression (decmpfs)
Hierarchical File System (HFS) and Apple File System (APFS) use Apple File System Compression (decmpfs) to compress file contents.
Overview
An Apple File System Compression (decmpfs) compressed file consists of:
- an extended attribute named “com.apple.decmpfs”
Characteristics
| Characteristics | Description |
|---|---|
| Byte order | little-endian |
decmpfs extended attribute
The decmpfs extended attribute consists of:
- decmpfs header
- optional compressed data
decmpfs header
The decmpfs header is 16 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | "fpmc" | Signature |
| 4 | 4 | Compression method | |
| 8 | 8 | Uncompressed data size | |
Note that the signature is likely stored as a little-endian 32-bit value that represents “cmpf”.
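A minimal Python sketch of reading the decmpfs header, with illustrative names. It compares the signature as the 4 bytes "fpmc" and reads the remaining values as little-endian integers.

```python
import struct
from typing import NamedTuple


class DecmpfsHeader(NamedTuple):
    compression_method: int
    uncompressed_data_size: int


def parse_decmpfs_header(data: bytes) -> DecmpfsHeader:
    """Parse the 16-byte decmpfs header at the start of the extended attribute data."""
    if len(data) < 16:
        raise ValueError("the decmpfs header is 16 bytes in size")
    signature, compression_method, uncompressed_data_size = struct.unpack_from("<4sIQ", data, 0)
    # The bytes "fpmc" are "cmpf" when read as a little-endian 32-bit value.
    if signature != b"fpmc":
        raise ValueError("unsupported decmpfs signature")
    return DecmpfsHeader(compression_method, uncompressed_data_size)
```

Any bytes after the 16-byte header are the optional compressed data.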
Compression methods
| Value | Identifier | Description |
|---|---|---|
| 1 | CMP_Type1 | Unknown (uncompressed extended attribute data) |
| 3 | ZLIB (DEFLATE) compressed extended attribute data, where the compressed data is stored in the extended attribute after the compressed data header | |
| 4 | 64k chunked ZLIB (DEFLATE) compressed resource fork, where the compressed data is stored in the resource fork | |
| 5 | Unknown (sparse compressed extended attribute data), where the uncompressed data contains 0-byte values | |
| 6 | Unknown (unused) | |
| 7 | LZVN compressed extended attribute data, where the compressed data is stored in the extended attribute after the compressed data header | |
| 8 | 64k chunked LZVN compressed resource fork, where the compressed data is stored in the resource fork | |
| 9 | Unknown (uncompressed extended attribute data, different than CMP_Type1) | |
| 10 | Unknown (64k chunked uncompressed data resource fork), where the compressed data is stored in the resource fork | |
| 11 | LZFSE compressed extended attribute data, where the compressed data is stored in the extended attribute after the compressed data header | |
| 12 | 64k chunked LZFSE compressed resource fork, where the compressed data is stored in the resource fork | |
| 0x80000001 | | Unknown (faulting file) |
Note that if the ZLIB (DEFLATE) compressed data starts with 0xff, the data is stored uncompressed after the first compressed data byte.
Note that if the LZVN compressed data starts with 0x06 (end of stream opcode), the data is stored uncompressed after the first compressed data byte.
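The notes above on uncompressed data can be sketched as follows for data stored in the extended attribute. This sketch only handles compression methods 3 and 7, does not implement LZVN decompression, and assumes that the ZLIB data is a zlib (RFC 1950) stream that Python's `zlib.decompress` accepts. The function name is illustrative.

```python
import zlib


def decode_inline_data(compression_method: int, compressed_data: bytes, uncompressed_size: int) -> bytes:
    """Decode data stored in the extended attribute after the 16-byte decmpfs header."""
    if compression_method == 3:  # ZLIB (DEFLATE)
        if compressed_data[:1] == b"\xff":
            data = compressed_data[1:]  # stored uncompressed after the first byte
        else:
            data = zlib.decompress(compressed_data)  # assumes a zlib (RFC 1950) stream
    elif compression_method == 7:  # LZVN
        if compressed_data[:1] == b"\x06":
            data = compressed_data[1:]  # end of stream opcode: stored uncompressed
        else:
            raise NotImplementedError("LZVN decompression is out of scope for this sketch")
    else:
        raise NotImplementedError(f"unsupported compression method: {compression_method}")
    if len(data) < uncompressed_size:
        raise ValueError("decoded data is smaller than the uncompressed data size")
    return data[:uncompressed_size]
```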
Extended File System (ext) format
The Extended File System (ext) is one of the more common file systems used in Linux.
There are multiple versions of ext.
| Version | Remarks |
|---|---|
| 1 | Introduced in April 1992 |
| 2 | Introduced in January 1993 |
| 3 | Introduced in November 2001, which featured journaling, dynamic growth and large directory indexing (HTree) |
| 4 | Introduced in October 2006 as unstable and became stable in October 2008, which featured extents and improved timestamps |
Overview
An Extended File System (ext) consists of:
- one or more block groups
Characteristics
| Characteristics | Description |
|---|---|
| Byte order | little-endian, with the exception of UUID values that are stored in big-endian |
| Date and time values | number of seconds since January 1, 1970 00:00:00 (POSIX epoch), disregarding leap seconds. Or number of nanoseconds, when extra precision is enabled. Date and time values are stored in UTC |
| Character strings | UTF-8 or a narrow character (Single Byte Character (SBC) or Multi Byte Character (MBC)) stored using a system defined codepage |
Block group
A block group consists of:
- optional 1024 bytes of boot code or zero bytes (at offset: 0)
- optional superblock
- optional group descriptor table
- block bitmap
- inode bitmap
- allocated and unallocated blocks
The primary superblock is stored at offset 1024 relative from the start of the volume. Backup superblocks are stored at offset 1024 relative from the start of the block group if block size <= 1024 or otherwise at offset 0 from the start of the block group.
The group descriptor table is stored in the block after the superblock.
An ext2 file system with revision 0 stores a copy of the superblock at the start of every block group, along with backups of the group descriptor table. Later revisions reduce the number of backup copies by only storing backups in specific block groups (sparse superblock feature EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER).
Not all values in a backup superblock and backup group descriptor tables match those of the primary superblock and group descriptor table.
Note that backup superblocks can be empty (filled with 0-byte values) or contain remnant data on an Android ext file system with sparse_super.
Flex block groups
Flex (or flexible) block groups are a set of block groups that are treated as a single logical block group. Metadata such as the superblock, group descriptors and data block bitmaps span the entire logical block group and not the individual block groups that are part of the set.
Meta block groups
Meta block groups (META_BG) are a set (or cluster) of block groups whose group descriptors can be stored in a single block.
The first meta block group value in the superblock indicates the first meta block group, where the preceding block groups use the regular group descriptor table. For example, if the first meta block group value is 256 and the number of group descriptors that can be stored in a single block is 64, then the group descriptors for the block groups [0, 16383] are stored in the group descriptor table after the primary superblock and at the corresponding locations of its backups.
Successive group descriptor tables, for example [16384, 16447], are stored in the first block group of a meta block group and backups in the second and last block groups of the meta block group.
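The following sketch determines which block groups store the group descriptors of a given block group when META_BG is used. The parameter names are illustrative; `descriptors_per_block` is assumed to be the block size divided by the group descriptor size, and copies in block groups beyond the end of the file system are assumed to be absent.

```python
from typing import List, Optional


def meta_bg_descriptor_groups(
    group: int, first_meta_bg: int, descriptors_per_block: int, number_of_groups: int
) -> Optional[List[int]]:
    """Return the block groups that store the group descriptor block for `group`.

    Returns None when the group descriptor is stored in the group descriptor
    table after the primary superblock and its backups instead.
    """
    meta_bg = group // descriptors_per_block
    if meta_bg < first_meta_bg:
        return None
    first_group = meta_bg * descriptors_per_block
    candidates = (first_group, first_group + 1, first_group + descriptors_per_block - 1)
    # The first block group holds the primary copy, the second and last the backups;
    # block groups past the end of the file system are skipped.
    return [candidate for candidate in candidates if candidate < number_of_groups]
```

With the example values above, block group 16400 is part of meta block group 256, which stores its group descriptors in block groups 16384, 16385 and 16447.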
Blocks
The volume is divided into blocks:
block offset = block number * block size
The block size is defined in the superblock.
Note that mke2fs indicates the maximum block size is 65536.
The superblock
The ext2 superblock
The ext2 superblock is 208 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | Number of inodes | |
| 4 | 4 | Number of blocks | |
| 8 | 4 | Number of reserved blocks. Reserved blocks are used to prevent the file system from filling up | |
| 12 | 4 | Number of unallocated blocks | |
| 16 | 4 | Number of unallocated inodes | |
| 20 | 4 | First data block number. The block number is relative from the start of the volume | |
| 24 | 4 | Block size, which contains the number of bits to shift 1024 to the MSB (left) | |
| 28 | 4 | Fragment size, which contains the number of bits to shift 1024 to the MSB (left) | |
| 32 | 4 | Number of blocks per block group | |
| 36 | 4 | Number of fragments per block group | |
| 40 | 4 | Number of inodes per block group | |
| 44 | 4 | Last mount time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) | |
| 48 | 4 | Last written time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) | |
| 52 | 2 | The (current) mount count | |
| 54 | 2 | Maximum mount count | |
| 56 | 2 | "\x53\xef" | Signature |
| 58 | 2 | File system state flags | |
| 60 | 2 | Error-handling status | |
| 62 | 2 | Minor format revision | |
| 64 | 4 | Last consistency check time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) | |
| 68 | 4 | Consistency check interval, which contains the maximum number of seconds between consistency checks | |
| 72 | 4 | Creator operating system | |
| 76 | 4 | Format revision | |
| 80 | 2 | Reserved block owner (or user) identifier (UID) | |
| 82 | 2 | Reserved block group identifier (GID) | |
| Dynamic inode information, if major version is EXT2_DYNAMIC_REV | |||
| 84 | 4 | First non-reserved inode | |
| 88 | 2 | Inode size. Note that the inode size must be a power of 2 greater than or equal to 128, and the maximum supported by mke2fs is 1024 | |
| 90 | 2 | Block group number, which contains the number of the block group that stores this superblock | |
| 92 | 4 | Compatible feature flags | |
| 96 | 4 | Incompatible feature flags | |
| 100 | 4 | Read-only compatible feature flags | |
| 104 | 16 | File system identifier, which contains a big-endian UUID | |
| 120 | 16 | Volume label, which contains a narrow character string without end-of-string character | |
| 136 | 64 | Last mount path, which contains a narrow character string without end-of-string character | |
| 200 | 4 | Algorithm usage bitmap | |
| Performance hints, if EXT2_COMPAT_PREALLOC is set | |||
| 204 | 1 | Number of pre-allocated blocks per file | |
| 205 | 1 | Number of pre-allocated blocks per directory | |
| 206 | 2 | Unknown (padding) | |
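A minimal Python sketch that reads a subset of the ext2 superblock values and applies the block size and block offset calculations. The names are illustrative, and the number of block groups is assumed to be the number of blocks after the first data block divided by the number of blocks per block group, rounded up.

```python
import struct
from dataclasses import dataclass


@dataclass
class Ext2Superblock:
    number_of_inodes: int
    number_of_blocks: int
    first_data_block: int
    block_size: int
    blocks_per_group: int
    inodes_per_group: int
    format_revision: int
    inode_size: int

    @property
    def number_of_block_groups(self) -> int:
        remaining_blocks = self.number_of_blocks - self.first_data_block
        return -(-remaining_blocks // self.blocks_per_group)  # ceiling division


def parse_ext2_superblock(data: bytes) -> Ext2Superblock:
    """Parse the superblock data that starts at offset 1024 of the volume."""
    if len(data) < 208 or data[56:58] != b"\x53\xef":
        raise ValueError("missing ext superblock signature")
    (number_of_inodes, number_of_blocks, _reserved_blocks, _unallocated_blocks,
     _unallocated_inodes, first_data_block, block_size_shift) = struct.unpack_from("<7I", data, 0)
    blocks_per_group = struct.unpack_from("<I", data, 32)[0]
    inodes_per_group = struct.unpack_from("<I", data, 40)[0]
    format_revision = struct.unpack_from("<I", data, 76)[0]
    # EXT2_GOOD_OLD_REV (0) has a fixed inode size of 128 bytes.
    inode_size = 128 if format_revision == 0 else struct.unpack_from("<H", data, 88)[0]
    return Ext2Superblock(number_of_inodes, number_of_blocks, first_data_block,
                          1024 << block_size_shift, blocks_per_group,
                          inodes_per_group, format_revision, inode_size)


def block_offset(superblock: Ext2Superblock, block_number: int) -> int:
    """Return the offset of a block relative to the start of the volume."""
    return block_number * superblock.block_size
```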
The ext3 superblock
The ext3 superblock is 336 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | Number of inodes | |
| 4 | 4 | Number of blocks | |
| 8 | 4 | Number of reserved blocks. Reserved blocks are used to prevent the file system from filling up | |
| 12 | 4 | Number of unallocated blocks | |
| 16 | 4 | Number of unallocated inodes | |
| 20 | 4 | First data block number. The block number is relative from the start of the volume | |
| 24 | 4 | Block size, which contains the number of bits to shift 1024 to the MSB (left) | |
| 28 | 4 | Fragment size, which contains the number of bits to shift 1024 to the MSB (left) | |
| 32 | 4 | Number of blocks per block group | |
| 36 | 4 | Number of fragments per block group | |
| 40 | 4 | Number of inodes per block group, which can be 0 in combination with EXT3_FEATURE_INCOMPAT_JOURNAL_DEV | |
| 44 | 4 | Last mount time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) | |
| 48 | 4 | Last written time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) | |
| 52 | 2 | The (current) mount count | |
| 54 | 2 | Maximum mount count | |
| 56 | 2 | "\x53\xef" | Signature |
| 58 | 2 | File system state flags | |
| 60 | 2 | Error-handling status | |
| 62 | 2 | Minor format revision | |
| 64 | 4 | Last consistency check time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) | |
| 68 | 4 | Consistency check interval, which contains the maximum number of seconds between consistency checks | |
| 72 | 4 | Creator operating system | |
| 76 | 4 | Format revision | |
| 80 | 2 | Reserved block owner (or user) identifier (UID) | |
| 82 | 2 | Reserved block group identifier (GID) | |
| Dynamic inode information, if major version is EXT2_DYNAMIC_REV | |||
| 84 | 4 | First non-reserved inode | |
| 88 | 2 | Inode size. Note that the inode size must be a power of 2 greater than or equal to 128, and the maximum supported by mke2fs is 1024 | |
| 90 | 2 | Block group number, which contains the number of the block group that stores this superblock | |
| 92 | 4 | Compatible feature flags | |
| 96 | 4 | Incompatible feature flags | |
| 100 | 4 | Read-only compatible feature flags | |
| 104 | 16 | File system identifier, which contains a big-endian UUID | |
| 120 | 16 | Volume label, which contains a narrow character string without end-of-string character | |
| 136 | 64 | Last mount path, which contains a narrow character string without end-of-string character | |
| 200 | 4 | Algorithm usage bitmap | |
| Performance hints, if EXT2_COMPAT_PREALLOC is set | |||
| 204 | 1 | Number of pre-allocated blocks per file | |
| 205 | 1 | Number of pre-allocated blocks per directory | |
| 206 | 2 | Unknown (padding) | |
| Journalling support, if EXT3_FEATURE_COMPAT_HAS_JOURNAL is set | |||
| 208 | 16 | Journal identifier, which contains a big-endian UUID | |
| 224 | 4 | Journal inode | |
| 228 | 4 | Unknown (Journal device) | |
| 232 | 4 | Unknown (Head of orphan inode list). The orphan inode list is a list of inodes to delete | |
| 236 | 4 x 4 | hash-tree seed | |
| 252 | 1 | Default hash version | |
| 253 | 1 | Journal backup type | |
| 254 | 2 | Group descriptor size | |
| 256 | 4 | Default mount options | |
| 260 | 4 | First meta block group (or metablock) | |
| 264 | 4 | File system creation time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) | |
| 268 | 17 x 4 | Backup journal inodes | |
The ext4 superblock
The ext4 superblock is 1024 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | Number of inodes | |
| 4 | 4 | Number of blocks, which contains the lower 32-bit of the value | |
| 8 | 4 | Number of reserved blocks, which contains the lower 32-bit of the value. Reserved blocks are used to prevent the file system from filling up | |
| 12 | 4 | Number of unallocated blocks, which contains the lower 32-bit of the value | |
| 16 | 4 | Number of unallocated inodes, which contains the lower 32-bit of the value | |
| 20 | 4 | First data block number. The block number is relative from the start of the volume | |
| 24 | 4 | Block size, which contains the number of bits to shift 1024 to the most-significant-bit (MSB) | |
| 28 | 4 | Fragment size, which contains the number of bits to shift 1024 to the most-significant-bit (MSB) | |
| 32 | 4 | Number of blocks per block group | |
| 36 | 4 | Number of fragments per block group | |
| 40 | 4 | Number of inodes per block group, which can be 0 in combination with EXT4_FEATURE_INCOMPAT_JOURNAL_DEV | |
| 44 | 4 | Last mount time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) | |
| 48 | 4 | Last written time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) | |
| 52 | 2 | The (current) mount count | |
| 54 | 2 | Maximum mount count | |
| 56 | 2 | "\x53\xef" | Signature |
| 58 | 2 | File system state flags | |
| 60 | 2 | Error-handling status | |
| 62 | 2 | Minor format revision | |
| 64 | 4 | Last consistency check time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) | |
| 68 | 4 | Consistency check interval, which contains the maximum number of seconds between consistency checks | |
| 72 | 4 | Creator operating system | |
| 76 | 4 | Format revision | |
| 80 | 2 | Reserved block owner (or user) identifier (UID) | |
| 82 | 2 | Reserved block group identifier (GID) | |
| Dynamic inode information, if major version is EXT2_DYNAMIC_REV | |||
| 84 | 4 | First non-reserved inode | |
| 88 | 2 | Inode size. Note that the inode size must be a power of 2 greater than or equal to 128, and the maximum supported by mke2fs is 1024 | |
| 90 | 2 | Block group number, which contains the number of the block group that stores this superblock | |
| 92 | 4 | Compatible feature flags | |
| 96 | 4 | Incompatible feature flags | |
| 100 | 4 | Read-only compatible feature flags | |
| 104 | 16 | File system identifier, which contains a big-endian UUID | |
| 120 | 16 | Volume label, which contains a narrow character string without end-of-string character | |
| 136 | 64 | Last mount path, which contains a narrow character string without end-of-string character | |
| 200 | 4 | Algorithm usage bitmap | |
| Performance hints, if EXT2_COMPAT_PREALLOC is set | |||
| 204 | 1 | Number of pre-allocated blocks per file | |
| 205 | 1 | Number of pre-allocated blocks per directory | |
| 206 | 2 | Unknown (padding) | |
| Journalling support, if EXT3_FEATURE_COMPAT_HAS_JOURNAL is set | |||
| 208 | 16 | Journal identifier, which contains a big-endian UUID | |
| 224 | 4 | Journal inode | |
| 228 | 4 | Unknown (Journal device) | |
| 232 | 4 | Unknown (Head of orphan inode list). The orphan inode list is a list of inodes to delete | |
| 236 | 4 x 4 | hash-tree seed | |
| 252 | 1 | Default hash version | |
| 253 | 1 | Journal backup type | |
| 254 | 2 | Group descriptor size | |
| 256 | 4 | Default mount options | |
| 260 | 4 | First meta block group (or metablock) | |
| 264 | 4 | File system creation time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) | |
| 268 | 17 x 4 | Backup journal inodes | |
| If 64-bit support (EXT4_FEATURE_INCOMPAT_64BIT) is enabled | |||
| 336 | 4 | Number of blocks, which contains the upper 32-bit of the value | |
| 340 | 4 | Number of reserved blocks, which contains the upper 32-bit of the value | |
| 344 | 4 | Number of unallocated blocks, which contains the upper 32-bit of the value | |
| 348 | 2 | Minimum inode size | |
| 350 | 2 | Reserved inode size | |
| 352 | 4 | Miscellaneous flags | |
| 356 | 2 | RAID stride | |
| 358 | 2 | Multiple mount protection (MMP) update interval in seconds | |
| 360 | 8 | Block for multi-mount protection | |
| 368 | 4 | Unknown (blocks on all data disks (N*stride)) | |
| 372 | 1 | Number of block groups per flex block group, which is stored as: 2 ^ value | |
| 373 | 1 | Checksum type | |
| 374 | 1 | Unknown (encryption level) | |
| 375 | 1 | Unknown (padding) | |
| 376 | 8 | Unknown (s_kbytes_written) | |
| 384 | 4 | Inode number of active snapshot | |
| 388 | 4 | Identifier of active snapshot | |
| 392 | 8 | Unknown (reserved s_snapshot_r_blocks_count) | |
| 400 | 4 | Inode number of snapshot list head | |
| 404 | 4 | Unknown (s_error_count) | |
| 408 | 4 | Unknown (s_first_error_time) | |
| 412 | 4 | Unknown (s_first_error_ino) | |
| 416 | 8 | Unknown (s_first_error_block) | |
| 424 | 32 | Unknown (s_first_error_func) | |
| 456 | 4 | Unknown (s_first_error_line) | |
| 460 | 4 | Unknown (s_last_error_time) | |
| 464 | 4 | Unknown (s_last_error_ino) | |
| 468 | 4 | Unknown (s_last_error_line) | |
| 472 | 8 | Unknown (s_last_error_block) | |
| 480 | 32 | Unknown (s_last_error_func) | |
| 512 | 64 | Unknown (s_mount_opts) | |
| 576 | 4 | Unknown (s_usr_quota_inum) | |
| 580 | 4 | Unknown (s_grp_quota_inum) | |
| 584 | 4 | Unknown (s_overhead_clusters) | |
| 588 | 2 x 4 | Unknown (s_backup_bgs) | |
| 596 | 4 | Unknown (s_encrypt_algos) | |
| 600 | 16 | Unknown (s_encrypt_pw_salt) | |
| 616 | 4 | Unknown (s_lpf_ino) | |
| 620 | 4 | Unknown (s_prj_quota_inum) | |
| 624 | 4 | Metadata checksum seed | |
| 628 | 1 | Unknown (s_wtime_hi) | |
| 629 | 1 | Unknown (s_mtime_hi) | |
| 630 | 1 | Unknown (s_mkfs_time_hi) | |
| 631 | 1 | Unknown (s_lastcheck_hi) | |
| 632 | 1 | Unknown (s_first_error_time_hi) | |
| 633 | 1 | Unknown (s_last_error_time_hi) | |
| 634 | 1 | Unknown (s_first_error_errcode) | |
| 635 | 1 | Unknown (s_last_error_errcode) | |
| 636 | 2 | Unknown (s_encoding) | |
| 638 | 2 | Unknown (s_encoding_flags) | |
| 640 | 4 | Unknown (s_orphan_file_inum) | |
| 644 | 94 x 4 = 376 | Unknown (reserved) | |
| 1020 | 4 | Checksum | |
If checksum type is CRC-32C, the checksum is stored as 0xffffffff - CRC-32C.
Note that some versions of mkfs.ext set the file system creation time even for ext2 and when EXT3_FEATURE_COMPAT_HAS_JOURNAL is not set.
TODO: Is the only way to determine the file system version the compatibility and equivalent flags?
Checksum calculation
If checksum type is CRC-32C, the CRC32-C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the checksum.
The checksum is calculated over the first 1020 bytes of the superblock, which is all of the superblock data that precedes the checksum.
Metadata checksum seed calculation
If checksum type is CRC-32C, the CRC32-C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the metadata checksum seed.
The checksum is calculated over:
- the 16 byte file system identifier in the superblock
If EXT4_FEATURE_INCOMPAT_CSUM_SEED is set the metadata checksum seed value stored in the superblock should be used instead of calculating it based on the file system identifier.
If checksum type is CRC-32C, the metadata checksum seed is stored as 0xffffffff - CRC-32C.
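The superblock checksum and metadata checksum seed calculations can be sketched as follows. Python's standard library has no CRC-32C, so this sketch includes a simple, unoptimized bitwise implementation, where the `crc` parameter behaves like the initial value of `zlib.crc32`. The function names are illustrative.

```python
import struct

CRC32C_POLYNOMIAL_REFLECTED = 0x82F63B78  # bit-reflected form of 0x1edc6f41


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli), with a zlib.crc32-style initial value."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLYNOMIAL_REFLECTED if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def superblock_checksum(superblock: bytes) -> int:
    # Stored as 0xffffffff - CRC-32C over the first 1020 bytes.
    return 0xFFFFFFFF - crc32c(superblock[:1020])


def verify_superblock_checksum(superblock: bytes) -> bool:
    stored_checksum = struct.unpack_from("<I", superblock, 1020)[0]
    return stored_checksum == superblock_checksum(superblock)


def metadata_checksum_seed(superblock: bytes) -> int:
    incompatible_features = struct.unpack_from("<I", superblock, 96)[0]
    if incompatible_features & 0x00002000:  # EXT4_FEATURE_INCOMPAT_CSUM_SEED
        return struct.unpack_from("<I", superblock, 624)[0]
    # Calculated over the 16-byte file system identifier at offset 104.
    return 0xFFFFFFFF - crc32c(superblock[104:120])
```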
File system state flags
| Value | Identifier | Description |
|---|---|---|
| 0x0001 | Is clean | |
| 0x0002 | Has errors | |
| 0x0004 | Recovering orphan inodes | |
Error-handling status
| Value | Identifier | Description |
|---|---|---|
| 1 | Continue | |
| 2 | Remount as read-only | |
| 3 | Panic | |
Creator operating system
| Value | Identifier | Description |
|---|---|---|
| 0 | Linux | |
| 1 | GNU Hurd | |
| 2 | Masix | |
| 3 | FreeBSD | |
| 4 | Lites | |
Format revision
| Value | Identifier | Description |
|---|---|---|
| 0 | EXT2_GOOD_OLD_REV | Original version with a fixed inode size of 128 bytes |
| 1 | EXT2_DYNAMIC_REV | Version with dynamic inode size support |
Compatible feature flags
| Value | Identifier | Description |
|---|---|---|
| 0x00000001 | EXT2_COMPAT_PREALLOC | Pre-allocate directory blocks, which is intended to reduce fragmentation |
| 0x00000002 | EXT2_FEATURE_COMPAT_IMAGIC_INODES | Has AFS server inodes |
| 0x00000004 | EXT3_FEATURE_COMPAT_HAS_JOURNAL | Has a journal |
| 0x00000008 | EXT2_FEATURE_COMPAT_EXT_ATTR | Has extended attributes |
| 0x00000010 | EXT2_FEATURE_COMPAT_RESIZE_INO, EXT2_FEATURE_COMPAT_RESIZE_INODE | Is resizeable, the file system has reserved GDT blocks for expansion, which also requires RO_COMPAT_SPARSE_SUPER |
| 0x00000020 | EXT2_FEATURE_COMPAT_DIR_INDEX | Has indexed directories |
| 0x00000040 | COMPAT_LAZY_BG | Unknown (Lazy block group) |
| 0x00000080 | COMPAT_EXCLUDE_INODE | Unknown (Exclude inode), which is not yet implemented and intended for a future file system snapshot feature |
| 0x00000100 | COMPAT_EXCLUDE_BITMAP | Unknown (Exclude bitmap), which is not yet implemented and intended for a future file system snapshot feature |
| 0x00000200 | EXT4_FEATURE_COMPAT_SPARSE_SUPER2 | Has sparse superblock version 2 |
| 0x00000400 | EXT4_FEATURE_COMPAT_FAST_COMMIT | Unknown (fast commit) |
| 0x00000800 | EXT4_FEATURE_COMPAT_STABLE_INODES | Unknown (stable inodes) |
| 0x00001000 | EXT4_FEATURE_COMPAT_ORPHAN_FILE | Has orphan file |
Note that EXT2_FEATURE_COMPAT_, EXT3_FEATURE_COMPAT_, EXT4_FEATURE_COMPAT_ and COMPAT_ can be used interchangeably.
Incompatible feature flags
| Value | Identifier | Description |
|---|---|---|
| 0x00000001 | EXT2_FEATURE_INCOMPAT_COMPRESSION | Has compression, which is not yet implemented |
| 0x00000002 | EXT2_FEATURE_INCOMPAT_FILETYPE | Directory entry has file type |
| 0x00000004 | EXT3_FEATURE_INCOMPAT_RECOVER | Needs recovery |
| 0x00000008 | EXT3_FEATURE_INCOMPAT_JOURNAL_DEV | Journal device |
| 0x00000010 | EXT2_FEATURE_INCOMPAT_META_BG | Has meta (or metadata) block groups |
| 0x00000040 | EXT4_FEATURE_INCOMPAT_EXTENTS | Has extents |
| 0x00000080 | EXT4_FEATURE_INCOMPAT_64BIT | Has 64-bit support, which supports more than 2^32 blocks |
| 0x00000100 | EXT4_FEATURE_INCOMPAT_MMP | Multiple mount protection |
| 0x00000200 | EXT4_FEATURE_INCOMPAT_FLEX_BG | Has flex (or flexible) block groups |
| 0x00000400 | EXT4_FEATURE_INCOMPAT_EA_INODE | Has extended attribute inodes, which are inodes used to store large extended attribute values |
| 0x00001000 | EXT4_FEATURE_INCOMPAT_DIRDATA | Data in directory entry, which is not yet implemented |
| 0x00002000 | EXT4_FEATURE_INCOMPAT_CSUM_SEED, EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM | Initial metadata checksum value (or seed) is stored in the superblock |
| 0x00004000 | EXT4_FEATURE_INCOMPAT_LARGEDIR | Large directory >2GB or 3-level hash tree (HTree) |
| 0x00008000 | EXT4_FEATURE_INCOMPAT_INLINE_DATA | Has data stored in inode |
| 0x00010000 | EXT4_FEATURE_INCOMPAT_ENCRYPT | Has encrypted inodes |
| 0x00020000 | EXT4_FEATURE_INCOMPAT_CASEFOLD | Has case folding |
Note that EXT2_FEATURE_INCOMPAT_, EXT3_FEATURE_INCOMPAT_, EXT4_FEATURE_INCOMPAT_ and INCOMPAT_ can be used interchangeably.
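A sketch of decoding the incompatible feature flags, with illustrative names. It also reports flags that it does not know. Since incompatible features change how the file system must be interpreted, a reader would typically refuse to open a file system with unknown incompatible feature flags.

```python
import functools
import operator
from typing import List, Tuple

INCOMPAT_FEATURE_FLAGS = {
    0x00000001: "EXT2_FEATURE_INCOMPAT_COMPRESSION",
    0x00000002: "EXT2_FEATURE_INCOMPAT_FILETYPE",
    0x00000004: "EXT3_FEATURE_INCOMPAT_RECOVER",
    0x00000008: "EXT3_FEATURE_INCOMPAT_JOURNAL_DEV",
    0x00000010: "EXT2_FEATURE_INCOMPAT_META_BG",
    0x00000040: "EXT4_FEATURE_INCOMPAT_EXTENTS",
    0x00000080: "EXT4_FEATURE_INCOMPAT_64BIT",
    0x00000100: "EXT4_FEATURE_INCOMPAT_MMP",
    0x00000200: "EXT4_FEATURE_INCOMPAT_FLEX_BG",
    0x00000400: "EXT4_FEATURE_INCOMPAT_EA_INODE",
    0x00001000: "EXT4_FEATURE_INCOMPAT_DIRDATA",
    0x00002000: "EXT4_FEATURE_INCOMPAT_CSUM_SEED",
    0x00004000: "EXT4_FEATURE_INCOMPAT_LARGEDIR",
    0x00008000: "EXT4_FEATURE_INCOMPAT_INLINE_DATA",
    0x00010000: "EXT4_FEATURE_INCOMPAT_ENCRYPT",
    0x00020000: "EXT4_FEATURE_INCOMPAT_CASEFOLD",
}
KNOWN_INCOMPAT_MASK = functools.reduce(operator.or_, INCOMPAT_FEATURE_FLAGS, 0)


def decode_incompat_flags(flags: int) -> Tuple[List[str], int]:
    """Return the identifiers of the set flags and a mask of the unknown flags."""
    names = [name for flag, name in INCOMPAT_FEATURE_FLAGS.items() if flags & flag]
    return names, flags & ~KNOWN_INCOMPAT_MASK
```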
Read-only compatible feature flags
| Value | Identifier | Description |
|---|---|---|
| 0x00000001 | EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER | Has sparse superblocks and group descriptor tables. If set a superblock is stored in block groups 0, 1 and those that are powers of 3, 5 and 7. If not set a superblock is stored in every block group |
| 0x00000002 | EXT2_FEATURE_RO_COMPAT_LARGE_FILE | Contains large files |
| 0x00000004 | EXT2_FEATURE_RO_COMPAT_BTREE_DIR | Intended for hash-tree directory (or directory B-tree), which is not yet implemented |
| 0x00000008 | EXT4_FEATURE_RO_COMPAT_HUGE_FILE | Has huge file support |
| 0x00000010 | EXT4_FEATURE_RO_COMPAT_GDT_CSUM | Has group descriptors with checksums |
| 0x00000020 | EXT4_FEATURE_RO_COMPAT_DIR_NLINK | The ext3 32000 subdirectory limit does not apply. A directory's number of links will be set to 1 if it is incremented past 64999 |
| 0x00000040 | EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE | Has large inodes. The size of an inode can be larger than 128 bytes |
| 0x00000080 | EXT4_FEATURE_RO_COMPAT_HAS_SNAPSHOT | Has snapshots, which is not yet implemented and intended for a future file system snapshot feature |
| 0x00000100 | EXT4_FEATURE_RO_COMPAT_QUOTA | Quota is handled transactionally with the journal |
| 0x00000200 | EXT4_FEATURE_RO_COMPAT_BIGALLOC | Has big block allocation bitmaps. Block allocation bitmaps are tracked in units of clusters (of blocks) instead of blocks |
| 0x00000400 | EXT4_FEATURE_RO_COMPAT_METADATA_CSUM | File system metadata has checksums |
| 0x00000800 | EXT4_FEATURE_RO_COMPAT_REPLICA | Supports replicas |
| 0x00001000 | EXT4_FEATURE_RO_COMPAT_READONLY | Read-only file system image |
| 0x00002000 | EXT4_FEATURE_RO_COMPAT_PROJECT | File system tracks project quotas |
| 0x00004000 | EXT4_FEATURE_RO_COMPAT_SHARED_BLOCKS | File system has (read-only) shared blocks |
| 0x00008000 | EXT4_FEATURE_RO_COMPAT_VERITY | Unknown (Verity inodes may be present on the filesystem) |
| 0x00010000 | EXT4_FEATURE_RO_COMPAT_ORPHAN_PRESENT | Orphan file may be non-empty |
Note that EXT2_FEATURE_RO_COMPAT_, EXT3_FEATURE_RO_COMPAT_, EXT4_FEATURE_RO_COMPAT_ and RO_COMPAT_ can be used interchangeably.
Note that in some ext file systems used by ChromeOS, the upper 8 bits of the read-only compatible feature flags have been observed to be set, for example 0xff000003. debugfs identifies these as FEATURE_R24 - FEATURE_R31.
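The sparse superblock rule of EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER can be sketched as follows. The function names are illustrative, and the sketch does not cover EXT4_FEATURE_COMPAT_SPARSE_SUPER2.

```python
def is_power_of(value: int, base: int) -> bool:
    """Determine if value is base raised to a non-negative integer power."""
    while value > 1 and value % base == 0:
        value //= base
    return value == 1


def has_superblock_backup(group: int, sparse_super: bool) -> bool:
    """Determine if a block group stores a superblock and group descriptor table."""
    if not sparse_super or group <= 1:
        return True
    return is_power_of(group, 3) or is_power_of(group, 5) or is_power_of(group, 7)
```

With sparse superblocks, the first 50 block groups contain superblocks in block groups 0, 1, 3, 5, 7, 9, 25, 27 and 49.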
Checksum types
| Value | Identifier | Description |
|---|---|---|
| 1 | EXT4_CRC32C_CHKSUM | CRC-32C (or CRC32-C), which uses the Castagnoli polynomial (0x1edc6f41) |
The group descriptor table
The group descriptor table is stored in the block following the superblock.
The group descriptor table consists of:
- one or more group descriptors
The ext2 and ext3 group descriptor
The ext2 and ext3 group descriptor is 32 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | Block bitmap block number. The block number is relative from the start of the volume | |
| 4 | 4 | Inode bitmap block number. The block number is relative from the start of the volume | |
| 8 | 4 | Inode table block number. The block number is relative from the start of the volume | |
| 12 | 2 | Number of unallocated blocks | |
| 14 | 2 | Number of unallocated inodes | |
| 16 | 2 | Number of directories | |
| 18 | 2 | Unknown (padding) | |
| 20 | 3 x 4 | Unknown (reserved) | |
Note that implementations that support ext4 have been observed to set a value in the padding. It is currently assumed that this value contains block group flags.
The ext4 group descriptor
The ext4 group descriptor is 64 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | Block bitmap block number, which contains the lower 32-bit of the value. The block number is relative from the start of the volume | |
| 4 | 4 | Inode bitmap block number, which contains the lower 32-bit of the value. The block number is relative from the start of the volume | |
| 8 | 4 | Inode table block number, which contains the lower 32-bit of the value. The block number is relative from the start of the volume | |
| 12 | 2 | Number of unallocated blocks, which contains the lower 16-bit of the value | |
| 14 | 2 | Number of unallocated inodes, which contains the lower 16-bit of the value | |
| 16 | 2 | Number of directories, which contains the lower 16-bit of the value | |
| 18 | 2 | Block group flags | |
| 20 | 4 | Exclude bitmap block number, which contains the lower 32-bit of the value. The block number is relative from the start of the volume | |
| 24 | 2 | Block bitmap checksum, which contains the lower 16-bit of the value | |
| 26 | 2 | Inode bitmap checksum, which contains the lower 16-bit of the value | |
| 28 | 2 | Number of unused inodes, which contains the lower 16-bit of the value | |
| 30 | 2 | Checksum | |
| If 64-bit support (EXT4_FEATURE_INCOMPAT_64BIT) is enabled and group descriptor size > 32 | |||
| 32 | 4 | Block bitmap block number, which contains the upper 32-bit of the value. The block number is relative from the start of the volume | |
| 36 | 4 | Inode bitmap block number, which contains the upper 32-bit of the value. The block number is relative from the start of the volume | |
| 40 | 4 | Inode table block number, which contains the upper 32-bit of the value. The block number is relative from the start of the volume | |
| 44 | 2 | Number of unallocated blocks, which contains the upper 16-bit of the value | |
| 46 | 2 | Number of unallocated inodes, which contains the upper 16-bit of the value | |
| 48 | 2 | Number of directories, which contains the upper 16-bit of the value | |
| 50 | 2 | Number of unused inodes, which contains the upper 16-bit of the value | |
| 52 | 4 | Exclude bitmap block number, which contains the upper 32-bit of the value. The block number is relative from the start of the volume | |
| 56 | 2 | Block bitmap checksum, which contains the upper 16-bit of the value | |
| 58 | 2 | Inode bitmap checksum, which contains the upper 16-bit of the value | |
| 60 | 4 | Unknown (padding) | |
If the checksum type is CRC-32C, the checksum is stored as the lower 16 bits of 0xffffffff - CRC-32C; otherwise the checksum is stored as a CRC-16.
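A minimal Python sketch of combining the lower and upper parts of the group descriptor values. It assumes descriptor_size is the group descriptor size from the superblock, which is only greater than 32 if EXT4_FEATURE_INCOMPAT_64BIT is enabled. For ext2 and ext3 descriptors only the first six values are meaningful, since the remaining fields are padding and reserved. The function and key names are illustrative.

```python
import struct

# Names of the values in the first 32 bytes, in on-disk order.
LOWER_NAMES = (
    "block_bitmap_block", "inode_bitmap_block", "inode_table_block",
    "unallocated_blocks", "unallocated_inodes", "directories", "flags",
    "exclude_bitmap_block", "block_bitmap_checksum", "inode_bitmap_checksum",
    "unused_inodes", "checksum",
)
# Names of the values whose upper part is stored from offset 32 onwards.
UPPER_NAMES = (
    "block_bitmap_block", "inode_bitmap_block", "inode_table_block",
    "unallocated_blocks", "unallocated_inodes", "directories",
    "unused_inodes", "exclude_bitmap_block",
    "block_bitmap_checksum", "inode_bitmap_checksum",
)


def parse_group_descriptor(data: bytes, descriptor_size: int = 32) -> dict:
    """Parse a group descriptor; all values are stored little-endian."""
    descriptor = dict(zip(LOWER_NAMES, struct.unpack_from("<IIIHHHHIHHHH", data, 0)))
    if descriptor_size > 32:
        upper_values = struct.unpack_from("<IIIHHHHIHH", data, 32)
        for name, value in zip(UPPER_NAMES, upper_values):
            # Block numbers store their upper 32 bits, the other values their upper 16 bits.
            shift = 32 if name.endswith("_block") else 16
            descriptor[name] |= value << shift
    return descriptor
```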
Checksum calculation
If the checksum type is CRC-32C, the CRC-32C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the checksum.
The checksum is calculated over:
- the 16 byte file system identifier in the superblock
- the group number as a 32-bit little-endian integer
- the data of the group descriptor with the checksum set to 0-byte values
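The checksum calculation above can be sketched in Python as follows. The bitwise CRC-32C uses 0x82f63b78, the bit-reflected form of the Castagnoli polynomial 0x1edc6f41, with the usual pre- and post-inversion of the register; it is assumed that this is what "initial value of 0" refers to. fs_identifier is the 16-byte file system identifier from the superblock. This is a sketch, not a verified implementation.

```python
import struct


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli) with pre- and post-inversion."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def group_descriptor_checksum(fs_identifier: bytes, group_number: int, descriptor: bytes) -> int:
    """Return the 16-bit checksum as stored at offset 30 of the group descriptor."""
    data = bytearray(descriptor)
    data[30:32] = b"\x00\x00"  # the checksum itself is set to 0-byte values
    value = crc32c(fs_identifier + struct.pack("<I", group_number) + bytes(data))
    # Stored as the lower 16 bits of 0xffffffff - CRC-32C.
    return (0xFFFFFFFF - value) & 0xFFFF
```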
TODO: describe the block bitmap checksum calculation: crc32c(s_uuid+grp_num+bbitmap)
TODO: describe the inode bitmap checksum calculation: crc32c(s_uuid+grp_num+ibitmap)
Block group flags
| Value | Identifier | Description |
|---|---|---|
| 0x0001 | EXT4_BG_INODE_UNINIT | The inode table and bitmap are not initialized |
| 0x0002 | EXT4_BG_BLOCK_UNINIT | The block bitmap is not initialized |
| 0x0004 | EXT4_BG_INODE_ZEROED | The inode table is filled with 0 |
Direct and indirect blocks
Direct blocks are blocks that are part of the data stream of a file entry.
A direct block number of 0 within the data stream represents a sparse data block.
Indirect blocks are blocks that contain an array of block numbers, which refer to direct blocks or to lower-level indirect blocks. There are multiple levels of indirect blocks:
- indirect blocks (level 1), that refer to direct blocks
- double indirect blocks (level 2), that refer to indirect blocks
- triple indirect blocks (level 3), that refer to double indirect blocks
An indirect block number of 0 within the data stream represents a range of sparse data blocks, namely all the blocks that would otherwise have been referenced through it.
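A minimal sketch of locating a logical block in the 12 direct block numbers and the 3 levels of indirect blocks. It assumes 32-bit block numbers, so an indirect block of block_size bytes holds block_size / 4 block numbers. It only computes where to look; reading the indirect blocks, and treating a 0 block number at any level as sparse, is left to the caller.

```python
def map_logical_block(logical_block: int, block_size: int) -> tuple:
    """Return (level, indexes) for a logical block.

    Level 0 is a direct block and indexes holds its slot in the 12 direct
    block numbers. Levels 1 to 3 are the indirect, double indirect and triple
    indirect block; indexes then holds the entry to follow at each level,
    outermost first.
    """
    per_block = block_size // 4  # 32-bit block numbers per indirect block
    if logical_block < 12:
        return 0, [logical_block]
    remaining = logical_block - 12
    for level in (1, 2, 3):
        span = per_block ** level  # number of data blocks reachable through this level
        if remaining < span:
            indexes = [(remaining // per_block ** depth) % per_block
                       for depth in range(level - 1, -1, -1)]
            return level, indexes
        remaining -= span
    raise ValueError("logical block out of range for block number mapping")
```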
Extents
Extents were introduced in ext4 and are controlled by EXT4_FEATURE_INCOMPAT_EXTENTS.
Extents form an extent B-Tree, where:
- extent indexes are stored in the branch nodes and
- extent descriptors are stored in the leaf nodes.
An extents B-tree node consists of:
- extents header
- extents entries
- extents footer
Note that inodes can have an implicit last sparse extent if the inode data size is greater than the total data size defined by the extent descriptors.
The ext4 extents header
The ext4 extents header (ext4_extent_header) is 12 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | "\x0a\xf3" | Signature |
| 2 | 2 | | Number of entries |
| 4 | 2 | | Maximum number of entries |
| 6 | 2 | | Depth, where 0 represents a leaf node and 1 to 5 represent the levels of branch nodes |
| 8 | 4 | | Generation, which is used by Lustre, but not by standard ext4 |
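A minimal Python sketch of validating an extents header before reading its entries. The depth limit of 5 follows the table above; the function name and returned keys are illustrative.

```python
import struct

EXTENTS_HEADER_SIGNATURE = b"\x0a\xf3"


def parse_extents_header(data: bytes, offset: int = 0) -> dict:
    """Parse and sanity check a 12-byte ext4_extent_header."""
    if data[offset:offset + 2] != EXTENTS_HEADER_SIGNATURE:
        raise ValueError("invalid extents header signature")
    number_of_entries, maximum_number_of_entries, depth, generation = struct.unpack_from(
        "<HHHI", data, offset + 2)
    if number_of_entries > maximum_number_of_entries:
        raise ValueError("number of entries exceeds maximum number of entries")
    if depth > 5:
        raise ValueError("unsupported extents B-tree depth")
    return {
        "number_of_entries": number_of_entries,
        "maximum_number_of_entries": maximum_number_of_entries,
        "depth": depth,
        "is_leaf": depth == 0,  # leaf nodes contain extent descriptors
        "generation": generation,
    }
```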
The ext4 extent descriptor
The ext4 extent descriptor (ext4_extent) is 12 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Logical block number |
| 4 | 2 | | Number of blocks |
| 6 | 2 | | Upper 16 bits of physical block number |
| 8 | 4 | | Lower 32 bits of physical block number |
If the number of blocks is greater than 32768, the extent is considered “uninitialized”, which, as far as currently known, is comparable to the extent being sparse. The number of blocks of the sparse extent can be determined as follows:
sparse_number_of_blocks = number_of_blocks - 32768
Sparse extents can also exist implicitly between extent descriptors. In such a case the logical block number of an extent descriptor is greater than the logical block number plus the number of blocks of the previous extent descriptor.
Note that the native Linux ext implementation expects the extents to be stored in order of logical block number.
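The following sketch parses extent descriptors and makes the implicit sparse ranges explicit. It assumes the descriptors come from a leaf node (depth 0) and that total_number_of_blocks, if provided, is the inode data size in blocks (rounded up). The Extent type and function names are illustrative. Uninitialized extents are reported as sparse but keep their physical block number.

```python
import struct
from typing import List, NamedTuple, Optional


class Extent(NamedTuple):
    logical_block: int
    physical_block: int  # 0 for an implicit sparse range
    number_of_blocks: int
    is_sparse: bool


def parse_extent_descriptor(data: bytes, offset: int) -> Extent:
    logical_block, number_of_blocks, physical_upper, physical_lower = struct.unpack_from(
        "<IHHI", data, offset)
    is_sparse = False
    if number_of_blocks > 32768:  # uninitialized extent
        number_of_blocks -= 32768
        is_sparse = True
    return Extent(logical_block, (physical_upper << 32) | physical_lower, number_of_blocks, is_sparse)


def with_implicit_gaps(extents: List[Extent], total_number_of_blocks: Optional[int] = None) -> List[Extent]:
    """Insert sparse ranges for gaps between extents and after the last extent."""
    result = []
    next_logical_block = 0
    for extent in sorted(extents, key=lambda extent: extent.logical_block):
        if extent.logical_block > next_logical_block:
            result.append(Extent(next_logical_block, 0, extent.logical_block - next_logical_block, True))
        result.append(extent)
        next_logical_block = extent.logical_block + extent.number_of_blocks
    if total_number_of_blocks is not None and total_number_of_blocks > next_logical_block:
        result.append(Extent(next_logical_block, 0, total_number_of_blocks - next_logical_block, True))
    return result
```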
The ext4 extents index
The ext4 extent index (ext4_extent_idx) is 12 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Logical block number, which contains the first logical block number of the next depth extents block |
| 4 | 4 | | Lower 32 bits of physical block number, which contains the block number of the next depth extents block |
| 8 | 2 | | Upper 16 bits of physical block number, which contains the block number of the next depth extents block |
| 10 | 2 | | Unknown (unused) |
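A sketch of how the extent indexes could be used to descend one level of the extents B-tree. It assumes the indexes are stored in order of logical block number, as the native Linux implementation expects, and that reading the returned block is done by the caller. The function names are illustrative.

```python
import struct


def parse_extent_index(data: bytes, offset: int) -> tuple:
    """Return (first logical block, physical block number of the next depth node)."""
    logical_block, physical_lower, physical_upper, _unused = struct.unpack_from("<IIHH", data, offset)
    return logical_block, (physical_upper << 32) | physical_lower


def find_child_node(node: bytes, logical_block: int) -> int:
    """Return the physical block number of the child node that covers logical_block."""
    if node[0:2] != b"\x0a\xf3":
        raise ValueError("invalid extents header signature")
    number_of_entries, _maximum_number_of_entries, depth = struct.unpack_from("<HHH", node, 2)
    if depth == 0:
        raise ValueError("leaf node does not contain extent indexes")
    child_block = None
    for entry_index in range(number_of_entries):
        # Entries directly follow the 12-byte header and are 12 bytes each.
        first_block, physical_block = parse_extent_index(node, 12 + entry_index * 12)
        if first_block > logical_block:
            break
        child_block = physical_block
    if child_block is None:
        raise ValueError("logical block precedes the first extent index")
    return child_block
```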
The ext4 extents footer
The ext4 extents footer (ext4_extent_tail) is 4 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Checksum of an extents block, which contains a CRC-32C |
The inode
The size of the inode is defined in the superblock when dynamic inode information is present.
Note that the ext4 inode format can be used on an ext2 formatted file system. This was observed in combination with format revision 1 and an inode size > 128 on a file system created by mkfs.ext2.
The ext2 inode
The ext2 inode is 128 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | | File mode, which contains file type and permissions |
| 2 | 2 | | Lower 16 bits of owner (or user) identifier (UID) |
| 4 | 4 | | Data size |
| 8 | 4 | | (last) access time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) |
| 12 | 4 | | (last) inode change (or modification) time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) |
| 16 | 4 | | (last) content modification time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) |
| 20 | 4 | | Deletion time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) |
| 24 | 2 | | Lower 16 bits of group identifier (GID) |
| 26 | 2 | | Number of (hard) links |
| 28 | 4 | | Number of blocks |
| 32 | 4 | | Flags |
| 36 | 4 | | Unknown (reserved) |
| 40 | 12 x 4 | | Array of direct block numbers. A block number is relative from the start of the volume |
| 88 | 4 | | Indirect block number. A block number is relative from the start of the volume |
| 92 | 4 | | Double indirect block number. A block number is relative from the start of the volume |
| 96 | 4 | | Triple indirect block number. A block number is relative from the start of the volume |
| 100 | 4 | | NFS generation number |
| 104 | 4 | | File ACL (or extended attributes) block number |
| 108 | 4 | | Unknown (Directory ACL) |
| 112 | 4 | | Fragment block address |
| 116 | 1 | | Fragment block index |
| 117 | 1 | | Fragment size |
| 118 | 2 | | Unknown (padding) |
| 120 | 2 | | Upper 16 bits of owner (or user) identifier (UID) |
| 122 | 2 | | Upper 16 bits of group identifier (GID) |
| 124 | 4 | | Unknown (reserved) |
Note that for character and block devices the first 2 bytes of the array of direct block numbers contain the minor and major device number respectively.
If the inode size is larger than 128 bytes, the additional data can be stored using an ext4 inode extension.
The ext3 inode
The ext3 inode is 132 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | | File mode, which contains file type and permissions |
| 2 | 2 | | Lower 16 bits of owner (or user) identifier (UID) |
| 4 | 4 | | Data size |
| 8 | 4 | | (last) access time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) |
| 12 | 4 | | (last) inode change (or modification) time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) |
| 16 | 4 | | (last) content modification time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) |
| 20 | 4 | | Deletion time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) |
| 24 | 2 | | Lower 16 bits of group identifier (GID) |
| 26 | 2 | | Number of (hard) links |
| 28 | 4 | | Number of blocks |
| 32 | 4 | | Flags |
| 36 | 4 | | Unknown (reserved) |
| 40 | 12 x 4 | | Array of direct block numbers. A block number is relative from the start of the volume |
| 88 | 4 | | Indirect block number. A block number is relative from the start of the volume |
| 92 | 4 | | Double indirect block number. A block number is relative from the start of the volume |
| 96 | 4 | | Triple indirect block number. A block number is relative from the start of the volume |
| 100 | 4 | | NFS generation number |
| 104 | 4 | | File ACL (or extended attributes) block number |
| 108 | 4 | | Unknown (Directory ACL) |
| 112 | 4 | | Fragment block address |
| 116 | 1 | | Fragment block index |
| 117 | 1 | | Fragment size |
| 118 | 2 | | Unknown (padding) |
| 120 | 2 | | Upper 16 bits of owner (or user) identifier (UID) |
| 122 | 2 | | Upper 16 bits of group identifier (GID) |
| 124 | 4 | | Unknown (reserved) |
| Extension (if inode size > 128) | |||
| 128 | 2 | | Extended inode size |
| 130 | 2 | | Unknown (padding) |
Note that for character and block devices the first 2 bytes of the array of direct block numbers contain the minor and major device number respectively.
If the inode size is larger than 128 bytes, the additional data can be stored using an ext4 inode extension.
The ext4 inode
The ext4 inode is 160 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | | File mode, which contains file type and permissions |
| 2 | 2 | | Lower 16 bits of owner (or user) identifier (UID) |
| 4 | 4 | | Lower 32 bits of data size |
| If EXT4_EA_INODE_FL is not set | |||
| 8 | 4 | | (last) access time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) |
| 12 | 4 | | (last) inode change (or modification) time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) |
| 16 | 4 | | (last) content modification time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) |
| If EXT4_EA_INODE_FL is set | |||
| 8 | 4 | | Unknown (extended attribute value data checksum) |
| 12 | 4 | | Unknown (lower 32 bits of extended attribute reference count) |
| 16 | 4 | | Unknown (inode number that owns the extended attribute) |
| Common | |||
| 20 | 4 | | Deletion time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch) |
| 24 | 2 | | Lower 16 bits of group identifier (GID) |
| 26 | 2 | | Number of (hard) links |
| 28 | 4 | | Lower 32 bits of number of blocks |
| 32 | 4 | | Flags |
| If EXT4_EA_INODE_FL is not set | |||
| 36 | 4 | | Lower 32 bits of version |
| If EXT4_EA_INODE_FL is set | |||
| 36 | 4 | | Unknown (upper 32 bits of extended attribute reference count) |
| If EXT4_EXTENTS_FL and EXT4_INLINE_DATA_FL are not set | |||
| 40 | 12 x 4 | | Array of direct block numbers. A block number is relative from the start of the volume |
| 88 | 4 | | Indirect block number. A block number is relative from the start of the volume |
| 92 | 4 | | Double indirect block number. A block number is relative from the start of the volume |
| 96 | 4 | | Triple indirect block number. A block number is relative from the start of the volume |
| If EXT4_EXTENTS_FL is set | |||
| 40 | 12 | | Extents header |
| 52 | 4 x 12 | | Extent descriptors or extent indexes |
| If EXT4_INLINE_DATA_FL is set | |||
| 40 | 60 | | File content data |
| Common | |||
| 100 | 4 | | NFS generation number |
| 104 | 4 | | Lower 32 bits of file ACL (or extended attributes) block number |
| 108 | 4 | | Upper 32 bits of data size |
| 112 | 4 | | Fragment block address |
| 116 | 2 | | Upper 16 bits of number of blocks |
| 118 | 2 | | Upper 16 bits of file ACL (or extended attributes) block number |
| 120 | 2 | | Upper 16 bits of owner (or user) identifier (UID) |
| 122 | 2 | | Upper 16 bits of group identifier (GID) |
| 124 | 2 | | Lower 16 bits of checksum |
| 126 | 2 | | Unknown (reserved) |
| Extension (if inode size > 128) | |||
| 128 | 2 | | Extended inode size, which can vary; values of 4, 28 and 32 have been observed |
| 130 | 2 | | Upper 16 bits of checksum |
| 132 | 4 | | (last) inode change (or modification) time extra precision |
| 136 | 4 | | (last) content modification time extra precision |
| 140 | 4 | | (last) access time extra precision |
| 144 | 4 | | Creation time |
| 148 | 4 | | Creation time extra precision |
| 152 | 4 | | Upper 32 bits of version |
| 156 | 4 | | Unknown (i_projid) |
If the checksum type is CRC-32C, the checksum is stored as 0xffffffff - CRC-32C.
Note that for character and block devices the first 2 bytes of the array of direct block numbers contain the minor and major device number respectively.
Checksum calculation
If the checksum type is CRC-32C, the CRC-32C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the checksum.
The checksum is calculated from:
- the 16 byte file system identifier in the superblock
- the inode number as a 32-bit little-endian integer
- the NFS generation number in the inode as a 32-bit little-endian integer
- the data of the inode with the lower and upper part of the checksum set to 0-byte values.
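A Python sketch of the inode checksum calculation above. It repeats a bitwise CRC-32C (reflected polynomial 0x82f63b78 with pre- and post-inversion) so that the block is self-contained. It assumes the upper 16 bits at offset 130 are only present when the extended inode size is at least 4, and that only the lower 16 bits are compared otherwise; these rules and the function names are assumptions of this sketch.

```python
import struct


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli) with pre- and post-inversion."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def has_upper_checksum(inode_data: bytes) -> bool:
    # The upper 16 bits only exist if the extended inode size covers offset 130.
    return len(inode_data) >= 132 and struct.unpack_from("<H", inode_data, 128)[0] >= 4


def calculate_inode_checksum(fs_identifier: bytes, inode_number: int, inode_data: bytes) -> int:
    data = bytearray(inode_data)
    data[124:126] = b"\x00\x00"  # lower 16 bits of checksum
    if has_upper_checksum(inode_data):
        data[130:132] = b"\x00\x00"  # upper 16 bits of checksum
    nfs_generation = struct.unpack_from("<I", data, 100)[0]
    value = crc32c(fs_identifier + struct.pack("<II", inode_number, nfs_generation) + bytes(data))
    return 0xFFFFFFFF - value


def verify_inode_checksum(fs_identifier: bytes, inode_number: int, inode_data: bytes) -> bool:
    calculated = calculate_inode_checksum(fs_identifier, inode_number, inode_data)
    stored = struct.unpack_from("<H", inode_data, 124)[0]
    if has_upper_checksum(inode_data):
        stored |= struct.unpack_from("<H", inode_data, 130)[0] << 16
        return calculated == stored
    return (calculated & 0xFFFF) == stored
```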
Extra precision
The ext4 extra precision is 4 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0.0 | 2 bits | | Extra epoch value |
| 0.2 | 30 bits | | Fraction of second in nanoseconds |
The 34-bit extra precision timestamp (in number of seconds) can be calculated as follows:
extra_precision_timestamp = (extra_epoch_value * 0x100000000) + timestamp
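The extra precision value can be decoded as follows. This minimal sketch applies the formula above literally, with timestamp being the 32-bit seconds value from the corresponding inode field, for example the (last) access time at offset 8 together with its extra precision at offset 140.

```python
def decode_extra_precision(timestamp: int, extra_precision: int) -> tuple:
    """Return (seconds since the POSIX epoch, nanoseconds)."""
    extra_epoch_value = extra_precision & 0x3  # bits 0-1
    nanoseconds = extra_precision >> 2  # bits 2-31
    seconds = (extra_epoch_value * 0x100000000) + timestamp
    return seconds, nanoseconds
```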
Notes
It has been observed that when EXT4_EA_INODE_FL is set the (last) modification time can contain a valid timestamp.
According to the Linux kernel documentation:
For backward compatibility with older versions of this feature, the i_mtime/i_generation may store a back-reference to the inode number and i_generation of the one owning inode (in cases where the EA inode is not referenced by multiple inodes) to verify that the EA inode is the correct one being accessed.
File mode
| Value | Identifier | Description |
|---|---|---|
| Access other, Bitmask: 0x0007 (S_IRWXO) | ||
| 0x0001 | S_IXOTH | X-access for other |
| 0x0002 | S_IWOTH | W-access for other |
| 0x0004 | S_IROTH | R-access for other |
| Access group, Bitmask: 0x0038 (S_IRWXG) | ||
| 0x0008 | S_IXGRP | X-access for group |
| 0x0010 | S_IWGRP | W-access for group |
| 0x0020 | S_IRGRP | R-access for group |
| Access owner (or user), Bitmask: 0x01c0 (S_IRWXU) | ||
| 0x0040 | S_IXUSR | X-access for owner (or user) |
| 0x0080 | S_IWUSR | W-access for owner (or user) |
| 0x0100 | S_IRUSR | R-access for owner (or user) |
| Other | ||
| 0x0200 | S_ISTXT | Sticky bit |
| 0x0400 | S_ISGID | Set group identifier (GID) on execution |
| 0x0800 | S_ISUID | Set owner (or user) identifier (UID) on execution |
| Type of file, Bitmask: 0xf000 (S_IFMT) | ||
| 0x1000 | S_IFIFO | Named pipe (FIFO) |
| 0x2000 | S_IFCHR | Character device |
| 0x4000 | S_IFDIR | Directory |
| 0x6000 | S_IFBLK | Block device |
| 0x8000 | S_IFREG | Regular file |
| 0xa000 | S_IFLNK | Symbolic link |
| 0xc000 | S_IFSOCK | Socket |
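As an illustration, a file mode value can be rendered in the familiar ls-style notation using the bitmasks above. The character notation (for example "s" and "t" for the set identifier and sticky bits) is a common convention and not part of the ext format.

```python
# File type characters by S_IFMT value.
TYPE_CHARACTERS = {0x1000: "p", 0x2000: "c", 0x4000: "d", 0x6000: "b",
                   0x8000: "-", 0xA000: "l", 0xC000: "s"}


def format_file_mode(file_mode: int) -> str:
    characters = [TYPE_CHARACTERS.get(file_mode & 0xF000, "?")]
    # (read, write, execute, special bit, special character) per access class
    for read, write, execute, special, special_character in (
            (0x0100, 0x0080, 0x0040, 0x0800, "s"),  # owner, S_ISUID
            (0x0020, 0x0010, 0x0008, 0x0400, "s"),  # group, S_ISGID
            (0x0004, 0x0002, 0x0001, 0x0200, "t")):  # other, sticky bit
        characters.append("r" if file_mode & read else "-")
        characters.append("w" if file_mode & write else "-")
        if file_mode & special:
            # Upper case indicates the special bit is set without execute access.
            characters.append(special_character if file_mode & execute else special_character.upper())
        else:
            characters.append("x" if file_mode & execute else "-")
    return "".join(characters)
```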
Inode flags
| Value | Identifier | Description |
|---|---|---|
| 0x00000001 | EXT2_SECRM_FL, EXT3_SECRM_FL, EXT4_SECRM_FL, EXT4_INODE_SECRM | Secure deletion |
| 0x00000002 | EXT2_UNRM_FL, EXT3_UNRM_FL, EXT4_UNRM_FL, EXT4_INODE_UNRM | Undelete |
| 0x00000004 | EXT2_COMPR_FL, EXT3_COMPR_FL, EXT4_COMPR_FL, EXT4_INODE_COMPR | Compressed file, which is not yet implemented |
| 0x00000008 | EXT2_SYNC_FL, EXT3_SYNC_FL, EXT4_SYNC_FL, EXT4_INODE_SYNC | Synchronous updates |
| 0x00000010 | EXT2_IMMUTABLE_FL, EXT3_IMMUTABLE_FL, EXT4_IMMUTABLE_FL, EXT4_INODE_IMMUTABLE | Immutable file |
| 0x00000020 | EXT2_APPEND_FL, EXT3_APPEND_FL, EXT4_APPEND_FL, EXT4_INODE_APPEND | Writes to file may only append |
| 0x00000040 | EXT2_NODUMP_FL, EXT3_NODUMP_FL, EXT4_NODUMP_FL, EXT4_INODE_NODUMP | Do not dump file |
| 0x00000080 | EXT2_NOATIME_FL, EXT3_NOATIME_FL, EXT4_NOATIME_FL, EXT4_INODE_NOATIME | Do not update access time (atime) |
| 0x00000100 | EXT2_DIRTY_FL, EXT3_DIRTY_FL, EXT4_DIRTY_FL, EXT4_INODE_DIRTY | Dirty compressed file, which is not yet implemented |
| 0x00000200 | EXT2_COMPRBLK_FL, EXT3_COMPRBLK_FL, EXT4_COMPRBLK_FL, EXT4_INODE_COMPRBLK | One or more compressed clusters, which is not yet implemented |
| 0x00000400 | EXT2_NOCOMP_FL, EXT3_NOCOMP_FL, EXT4_NOCOMPR_FL, EXT4_INODE_NOCOMPR | Do not compress, which is not yet implemented |
| ext2 and ext3 | ||
| 0x00000800 | EXT2_ECOMPR_FL, EXT3_ECOMPR_FL | Compression error |
| ext4 | ||
| 0x00000800 | EXT4_ENCRYPT_FL, EXT4_INODE_ENCRYPT | Encrypted file |
| Common | ||
| 0x00001000 | EXT2_BTREE_FL, EXT2_INDEX_FL, EXT3_INDEX_FL, EXT4_INDEX_FL, EXT4_INODE_INDEX | Hash-indexed directory (previously referred to as B-tree format) |
| 0x00002000 | EXT2_IMAGIC_FL, EXT3_IMAGIC_FL, EXT4_IMAGIC_FL, EXT4_INODE_IMAGIC | AFS directory |
| 0x00004000 | EXT2_JOURNAL_DATA_FL, EXT3_JOURNAL_DATA_FL, EXT4_JOURNAL_DATA_FL, EXT4_INODE_JOURNAL_DATA | File data must be written using the journal |
| 0x00008000 | EXT2_NOTAIL_FL, EXT3_NOTAIL_FL, EXT4_NOTAIL_FL, EXT4_INODE_NOTAIL | File tail should not be merged, which is not used by ext4 |
| 0x00010000 | EXT2_DIRSYNC_FL, EXT3_DIRSYNC_FL, EXT4_DIRSYNC_FL, EXT4_INODE_DIRSYNC | Directory entries should be written synchronously (dirsync) |
| 0x00020000 | EXT2_TOPDIR_FL, EXT3_TOPDIR_FL, EXT4_TOPDIR_FL, EXT4_INODE_TOPDIR | Top of directory hierarchy |
| ext4 | ||
| 0x00040000 | EXT4_HUGE_FILE_FL, EXT4_INODE_HUGE_FILE | Is a huge file |
| 0x00080000 | EXT4_EXTENTS_FL, EXT4_INODE_EXTENTS | Inode uses extents |
| 0x00100000 | EXT4_INODE_VERITY | Verity protected inode |
| 0x00200000 | EXT4_EA_INODE_FL, EXT4_INODE_EA_INODE | Inode used for large extended attribute |
| 0x00400000 | EXT4_EOFBLOCKS_FL, EXT4_INODE_EOFBLOCKS | Blocks allocated beyond EOF |
| 0x01000000 | EXT4_SNAPFILE_FL | Inode is a snapshot |
| 0x02000000 | EXT4_INODE_DAX | Inode is direct-access (DAX) |
| 0x04000000 | EXT4_SNAPFILE_DELETED_FL | Snapshot is being deleted |
| 0x08000000 | EXT4_SNAPFILE_SHRUNK_FL | Snapshot shrink has completed |
| 0x10000000 | EXT4_INLINE_DATA_FL, EXT4_INODE_INLINE_DATA | Inode has inline data |
| 0x20000000 | EXT4_PROJINHERIT_FL, EXT4_INODE_PROJINHERIT | Create sub file entries with the same project identifier |
| 0x40000000 | EXT4_INODE_CASEFOLD | Casefolded directory |
| 0x80000000 | EXT4_INODE_RESERVED | Unknown (reserved) |
Reserved inode numbers
| Value | Identifier | Description |
|---|---|---|
| 1 | EXT2_BAD_INO, EXT3_BAD_INO, EXT4_BAD_INO | Bad blocks inode |
| 2 | EXT2_ROOT_INO, EXT3_ROOT_INO, EXT4_ROOT_INO | Root inode |
| 3 | EXT4_USR_QUOTA_INO | Owner (or user) quota inode |
| 4 | EXT4_GRP_QUOTA_INO | Group quota inode |
| 5 | EXT2_BOOT_LOADER_INO, EXT3_BOOT_LOADER_INO, EXT4_BOOT_LOADER_INO | Boot loader inode |
| 6 | EXT2_UNDEL_DIR_INO, EXT3_UNDEL_DIR_INO, EXT4_UNDEL_DIR_INO | Undelete directory inode |
| 7 | EXT3_RESIZE_INO, EXT4_RESIZE_INO | Reserved group descriptors inode |
| 8 | EXT3_JOURNAL_INO, EXT4_JOURNAL_INO | Journal inode |
Inline data
ext4 supports storing file entry data inline when the inode flag EXT4_INLINE_DATA_FL is set.
Note that inodes can have an implicit last sparse extent if the inode data size is greater than 60 bytes.
Huge files
TODO: complete section
Directory entries
Directory entries are stored in the data blocks of a directory inode. The directory entries can be stored in multiple ways:
- as linear directory entries
- as inline data directory entries
- as hash-tree directory entries
Linear directory entries
Linear directory entries are stored in a series of allocation blocks.
Linear directory entries contain:
- directory entry for “.” (self)
- directory entry for “..” (parent)
- directory entries for other file system entries
The directory entry
The directory entry is of variable size, at most 263 bytes, and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Inode number |
| 4 | 2 | | Directory entry size, which must be a multiple of 4 |
| 6 | 1 | | Name size, which contains the size of the name without the end-of-string character and has a maximum of 255 |
| 7 | 1 | | File type |
| 8 | ... | | Name, which contains a narrow character string without end-of-string character |
Older directory entry structures considered the name size a 16-bit value, but the upper byte was never used.
The name can contain any character value except the path segment separator (‘/’) and the NUL-character (‘\0’).
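A minimal sketch of reading the linear directory entries in a single block. It assumes that an inode number of 0 marks an unused entry, which is the usual ext convention, and returns names as raw bytes because no character encoding is defined. Entry sizes are checked against the block bounds so that a corrupted entry does not cause reads past the block.

```python
import struct


def parse_directory_block(block: bytes) -> list:
    """Return (inode number, file type, name) tuples for the used entries in a block."""
    entries = []
    offset = 0
    while offset + 8 <= len(block):
        inode_number, entry_size, name_size, file_type = struct.unpack_from("<IHBB", block, offset)
        if (entry_size < 8 or entry_size % 4 != 0
                or offset + entry_size > len(block) or 8 + name_size > entry_size):
            raise ValueError(f"invalid directory entry at offset {offset}")
        if inode_number != 0:  # assumed: inode number 0 marks an unused entry
            name = block[offset + 8:offset + 8 + name_size]
            entries.append((inode_number, file_type, name))
        offset += entry_size
    return entries
```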
File types
| Value | Identifier | Description |
|---|---|---|
| 0 | EXT2_FT_UNKNOWN | Unknown |
| 1 | EXT2_FT_REG_FILE | Regular file |
| 2 | EXT2_FT_DIR | Directory |
| 3 | EXT2_FT_CHRDEV | Character device |
| 4 | EXT2_FT_BLKDEV | Block device |
| 5 | EXT2_FT_FIFO | FIFO queue |
| 6 | EXT2_FT_SOCK | Socket |
| 7 | EXT2_FT_SYMLINK | Symbolic link |
Inline data directory entries
ext4 supports storing the directory entries as inline data when the inode flag EXT4_INLINE_DATA_FL is set.
The inline data directory entries are of variable size, at most 60 bytes, and consist of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Parent inode number |
| 4 | ... | | Array of directory entries |
Hash tree directory entries
The data of the hash tree (HTree) is stored in the data blocks or extents defined by the directory inode. The hash-indexed directory entries are read-compatible with linear directory entries.
Hash tree root
The hash tree root consists of:
- dx_root
- directory entry for “.” (self)
- directory entry for “..” (parent)
- dx_root_info
- Array of dx_entry
- directory entries for other file system entries
dx_root_info
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | 0 | Unknown (reserved) |
| 4 | 1 | | Hash method (or version) |
| 5 | 1 | 8 | Root information size |
| 6 | 1 | | Number of indirect levels in the hash tree |
| 7 | 1 | | Unknown (unused flags) |
dx_entry
TODO: complete section
struct dx_entry
{
__le32 hash;
__le32 block;
};
Symbolic links
If the target path of a symbolic link is less than 60 characters long, it is stored in the 60 bytes in the inode that are normally used for the 12 direct and 3 indirect block numbers. If the target path is 60 characters or longer, a block is allocated that contains the target path. The inode data size contains the length of the target path.
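A sketch of reading a symbolic link target based on the size rule above. read_first_data_block is a hypothetical callback standing in for the block number or extent lookup that returns the first data block of the inode.

```python
from typing import Callable


def read_symbolic_link_target(inode_data: bytes, data_size: int,
                              read_first_data_block: Callable[[], bytes]) -> bytes:
    if data_size < 60:
        # Target stored in the 60 bytes at inode offset 40 (normally the block numbers).
        return inode_data[40:40 + data_size]
    # Target stored in a separately allocated data block.
    return read_first_data_block()[:data_size]
```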
Extended attributes
Extended attributes can be stored:
- in the inode, after the (extended) inode data
- in the block referenced by the file ACL (or extended attributes) block number, if not 0
Note that both should be read to get all the extended attributes.
Extended attributes consist of:
- An extended attributes header
- Extended attributes entries with a terminator
The extended attributes inode header
The extended attributes inode header (ext2_xattr_ibody_header, ext3_xattr_ibody_header, ext4_xattr_ibody_header) is 4 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | "\x00\x00\x02\xea" | Signature |
The extended attributes block header
The ext2 and ext3 extended attributes block header
The ext2 and ext3 extended attributes block header (ext2_xattr_header, ext3_xattr_header) is 32 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | "\x00\x00\x02\xea" | Signature |
| 4 | 4 | | Unknown (reference count) |
| 8 | 4 | | Number of blocks |
| 12 | 4 | | Attributes hash |
| 16 | 4 x 4 | | Unknown (reserved) |
The ext4 extended attributes block header
The ext4 extended attributes block header (ext4_xattr_header) is 32 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | "\x00\x00\x02\xea" | Signature |
| 4 | 4 | | Unknown (reference count) |
| 8 | 4 | | Number of blocks |
| 12 | 4 | | Attributes hash |
| 16 | 4 | | Checksum |
| 20 | 3 x 4 | | Unknown (reserved) |
The extended attributes entry
The extended attributes entry (ext2_xattr_entry, ext3_xattr_entry, ext4_xattr_entry) is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 1 | | Name size, which contains the size of the name without the end-of-string character |
| 1 | 1 | | Name index |
| 2 | 2 | | Value data offset, which contains the offset of the value data relative from the start of the extended attributes block, or relative from directly after the extended attributes signature in the inode |
| 4 | 4 | | Value data inode number, which contains the number of the inode that contains the value data, or 0 to indicate the current block |
| 8 | 4 | | Value data size |
| 12 | 4 | | Unknown (attribute hash) |
| 16 | ... | | Name string, which contains an ASCII string without end-of-string character and can be empty, for example in combination with a prefix or with an encrypted file |
| ... | ... | | 32-bit alignment padding |
The last extended attributes entry has the first 4 values set to 0 (8 bytes) and is used as a terminator.
Note that some implementations of older Android versions of ext appear to only set the first 4 bytes to 0 for the terminator.
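A minimal sketch of reading extended attributes entries up to the terminator, using the name prefixes from the name index table in this section. region is assumed to be either the extended attributes block (with entries starting after the 32-byte header) or the inode data directly after the 4-byte signature (with entries starting at 0). Values stored in another inode are not resolved, and unknown name indexes are treated as having no prefix; both are simplifications of this sketch.

```python
import struct

# Name prefixes by name index.
NAME_PREFIXES = {
    0: "", 1: "user.", 2: "system.posix_acl_access", 3: "system.posix_acl_default",
    4: "trusted.", 6: "security.", 7: "system.", 8: "system.richacl",
}


def parse_extended_attributes(region: bytes, entries_offset: int) -> list:
    """Return (name, value) pairs; value is None if stored in another inode."""
    attributes = []
    offset = entries_offset
    while offset + 16 <= len(region):
        # Only check the first 4 bytes for the terminator, which also accepts
        # terminators written by some older Android implementations.
        if region[offset:offset + 4] == b"\x00\x00\x00\x00":
            break
        (name_size, name_index, value_offset, value_inode_number,
         value_size, _attribute_hash) = struct.unpack_from("<BBHIII", region, offset)
        name = region[offset + 16:offset + 16 + name_size].decode("ascii", errors="replace")
        prefix = NAME_PREFIXES.get(name_index, "")
        if value_inode_number != 0:
            value = None  # value data is stored in a separate inode, not resolved here
        else:
            value = region[value_offset:value_offset + value_size]
        attributes.append((prefix + name, value))
        # Entries are padded to a 32-bit boundary.
        offset += (16 + name_size + 3) & ~3
    return attributes
```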
The extended attribute name index
The name index indicates the prefix of the extended attribute name.
| Name index | Name prefix | Description |
|---|---|---|
| 0 | "" | No prefix |
| 1 | "user." | |
| 2 | "system.posix_acl_access" | |
| 3 | "system.posix_acl_default" | |
| 4 | "trusted." | |
| 6 | "security." | |
| 7 | "system." | |
| 8 | "system.richacl" |
Journal
The journal was introduced in ext3.
TODO: complete section
Exclude bitmap
TODO: complete section
Note that the exclude bitmap is used for snapshots.
Corruption scenarios
File entry with invalid extents header signature
File content inaccessible but file entry metadata and extended attributes accessible.
References
- ext4 Data Structures and Algorithms, by the Linux kernel documentation
Extensible File Allocation Table (exFAT) file system format
The Extensible File Allocation Table (exFAT) file system format is a successor of the File Allocation Table (FAT) file system format.
Overview
An exFAT file system consists of:
- One or more reserved sectors
- a boot record (or boot sector)
- One or more cluster block allocation tables
- File and directory data
Characteristics
| Characteristics | Description |
|---|---|
| Byte order | little-endian |
| Date and time values | FAT date and time |
| Character strings | UCS-2 little-endian, which allows for unpaired Unicode surrogates such as "U+d800" and "U+dc00" |
Boot record
The boot record is stored in the first sector of the volume.
The boot record is at least 512 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 3 | "\xeb\x76\x90" | Boot entry point (JMP +120, NOP) |
| 3 | 8 | "EXFAT\x20\x20\x20" | File system signature (or OEM name) |
| 11 | 53 | 0 | Unknown (reserved), which must be 0 |
| 64 | 8 | | Partition offset |
| 72 | 8 | | Total number of sectors |
| 80 | 4 | | Cluster block allocation table start sector |
| 84 | 4 | | Cluster block allocation table size in number of sectors, which must be non-zero |
| 88 | 4 | | Data cluster start sector |
| 92 | 4 | | Total number of data clusters |
| 96 | 4 | | Root directory start cluster |
| 100 | 4 | | Volume serial number |
| 104 | 1 | | Format revision minor number |
| 105 | 1 | 1 | Format revision major number |
| 106 | 2 | | Volume flags |
| 108 | 1 | | Bytes per sector, which is stored as 2^n, for example 9 is 2^9 = 512. The bytes per sector value must be 512, 1024, 2048 or 4096 |
| 109 | 1 | | Sectors per cluster block, which is stored as 2^n, for example 3 is 2^3 = 8. The cluster block size (bytes per sector multiplied by sectors per cluster block) must not exceed 32 MiB (2^25 bytes) |
| 110 | 1 | | Number of cluster block allocation tables |
| 111 | 1 | | Drive number |
| 112 | 1 | | Unknown (percent in use), which contains the percentage of allocated cluster blocks in the cluster heap or 0xff if not available |
| 113 | 7 | | Unknown (reserved) |
| 120 | 390 | | Used for boot code |
| 510 | 2 | "\x55\xaa" | Sector signature |
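A minimal Python sketch of parsing the boot record and computing the offset of a cluster block. It assumes that the first data cluster is cluster number 2, consistent with the allocation table values where 0 and 1 do not denote used clusters. The function names and returned keys are illustrative.

```python
import struct


def parse_boot_record(sector: bytes) -> dict:
    if len(sector) < 512:
        raise ValueError("boot record too small")
    if sector[3:11] != b"EXFAT   ":
        raise ValueError("unsupported file system signature")
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing sector signature")
    (_partition_offset, total_sectors, table_start_sector, table_sectors,
     data_start_sector, total_clusters, root_cluster, serial_number,
     minor, major, volume_flags, bytes_per_sector_shift,
     sectors_per_cluster_shift, number_of_tables) = struct.unpack_from(
        "<QQIIIIIIBBHBBB", sector, 64)
    bytes_per_sector = 1 << bytes_per_sector_shift
    if bytes_per_sector not in (512, 1024, 2048, 4096):
        raise ValueError("unsupported bytes per sector")
    cluster_block_size = bytes_per_sector << sectors_per_cluster_shift
    if cluster_block_size > 1 << 25:
        raise ValueError("cluster block size exceeds 32 MiB")
    return {
        "total_sectors": total_sectors,
        "bytes_per_sector": bytes_per_sector,
        "cluster_block_size": cluster_block_size,
        "table_start_sector": table_start_sector,
        "table_sectors": table_sectors,
        "number_of_tables": number_of_tables,
        "data_start_sector": data_start_sector,
        "total_clusters": total_clusters,
        "root_directory_cluster": root_cluster,
        "volume_serial_number": serial_number,
        "format_version": (major, minor),
        "volume_flags": volume_flags,
    }


def cluster_offset(boot_record: dict, cluster: int) -> int:
    """Byte offset of a cluster block relative to the start of the volume."""
    return (boot_record["data_start_sector"] * boot_record["bytes_per_sector"]
            + (cluster - 2) * boot_record["cluster_block_size"])
```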
Volume flags
| Value | Identifier | Description |
|---|---|---|
| 0x0001 | ActiveFat | Active FAT, 0 for the first FAT, 1 for the second FAT |
| 0x0002 | VolumeDirty | Is dirty |
| 0x0004 | MediaFailure | Has media failures |
| 0x0008 | ClearToZero | Must be cleared |
| 0xfff0 | | Unknown (reserved) |
Cluster block allocation table
A cluster block allocation table consists of:
- One or more cluster block allocation table entries
Cluster block allocation table entry
A cluster block allocation table entry is 32 bits in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 32 bits | | Data cluster number |
Where the data cluster number has the following meanings:
| Value(s) | Description |
|---|---|
| 0x00000000 | Unused (free) cluster |
| 0x00000001 | Unknown (invalid) |
| 0x00000002 - 0xffffffef | Used cluster |
| 0xfffffff0 - 0xfffffff6 | Reserved |
| 0xfffffff7 | Bad cluster |
| 0xfffffff8 - 0xffffffff | End of cluster chain |
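Following a cluster chain through an allocation table can be sketched as follows. allocation_table is assumed to contain the raw data of one cluster block allocation table; the value ranges follow the table above, and the loop detection is an added safeguard against corrupted tables.

```python
import struct

END_OF_CHAIN_MINIMUM = 0xFFFFFFF8


def follow_cluster_chain(allocation_table: bytes, first_cluster: int) -> list:
    """Return the cluster numbers of a chain, starting at first_cluster."""
    chain = []
    seen = set()
    cluster = first_cluster
    while True:
        # Only 0x00000002 - 0xffffffef denote used clusters.
        if not 2 <= cluster <= 0xFFFFFFEF:
            raise ValueError(f"invalid cluster number in chain: 0x{cluster:08x}")
        if cluster in seen:
            raise ValueError("cluster chain contains a loop")
        seen.add(cluster)
        chain.append(cluster)
        if (cluster + 1) * 4 > len(allocation_table):
            raise ValueError("cluster number beyond end of allocation table")
        next_cluster = struct.unpack_from("<I", allocation_table, cluster * 4)[0]
        if next_cluster >= END_OF_CHAIN_MINIMUM:
            return chain
        cluster = next_cluster
```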
Directory
A directory consists of:
- Zero or more directory entries
- Terminator directory entry
Directory entry
A directory entry is 32 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 1 | | Entry type |
| 1 | 19 | | Entry data |
| 20 | 4 | | Data stream start cluster |
| 24 | 8 | | Data stream size |
Directory entry type
| Offset | Size | Value | Description |
|---|---|---|---|
| 0.0 | 5 bits | | Type code |
| 0.5 | 1 bit | | Is non-critical (also referred to as type importance) |
| 0.6 | 1 bit | | Is secondary entry (also referred to as type category) |
| 0.7 | 1 bit | | In use |
| Value | Description |
|---|---|
| 0x00 | Terminator directory entry |
| 0x01 - 0x7f | Unused |
| 0x80 | Invalid |
| 0x81 - 0xff | Used |
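The entry type bit fields and the terminator rule can be sketched as follows. Treating 0x80 as an error is an assumption of this sketch; an implementation could also skip such entries.

```python
def decode_entry_type(entry_type: int) -> dict:
    """Split an exFAT directory entry type into its bit fields."""
    return {
        "type_code": entry_type & 0x1F,  # bits 0-4
        "is_non_critical": bool(entry_type & 0x20),  # bit 5
        "is_secondary": bool(entry_type & 0x40),  # bit 6
        "in_use": bool(entry_type & 0x80),  # bit 7
    }


def iter_directory_entries(directory_data: bytes):
    """Yield (offset, entry type) for each used 32-byte directory entry."""
    for offset in range(0, len(directory_data) - 31, 32):
        entry_type = directory_data[offset]
        if entry_type == 0x00:
            return  # terminator directory entry
        if entry_type == 0x80:
            raise ValueError(f"invalid directory entry type at offset {offset}")
        if entry_type & 0x80:
            yield offset, entry_type
        # 0x01 - 0x7f: unused directory entry, skipped
```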
Directory entry type codes
| Value | Description |
|---|---|
| Critical and primary | |
| 0x81 | Allocation bitmap |
| 0x82 | Case folding mappings |
| 0x83 | Volume label |
| 0x85 | File entry |
| Non-critical and primary | |
| 0xa0 | Volume identifier |
| 0xa1 | TexFAT padding |
| Critical and secondary | |
| 0xc0 | Data stream |
| 0xc1 | File entry name |
| Non-critical and secondary | |
| 0xe0 | Vendor extension |
| 0xe1 | Vendor allocation |
Allocation bitmap directory entry
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 1 | 0x81 | Entry type |
| 1 | 1 | | Bitmap flags |
| 2 | 18 | 0 | Unknown (Reserved) |
| 20 | 4 | | Data stream start cluster |
| 24 | 8 | | Data stream size |
Case folding mappings directory entry
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 1 | 0x82 | Entry type |
| 1 | 3 | 0 | Unknown (Reserved) |
| 4 | 4 | | Checksum |
| 8 | 12 | 0 | Unknown (Reserved) |
| 20 | 4 | | Data stream start cluster |
| 24 | 8 | | Data stream size |
Volume label directory entry
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 1 | 0x83 | Entry type |
| 1 | 1 | | Name number of characters |
| 2 | 22 | | Name string, which contains a UCS-2 little-endian string without an end-of-string character |
| 24 | 8 | 0 | Unknown (Reserved) |
Note that the volume label directory entry should only be stored in the first and/or second directory entry of the root directory.
File entry directory entry
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 1 | 0x85 | Entry type |
| 1 | 1 | Unknown (Secondary count) | |
| 2 | 2 | Unknown (Set checksum) | |
| 4 | 2 | File attribute flags | |
| 6 | 2 | 0 | Unknown (Reserved) |
| 8 | 2 | Creation time | |
| 10 | 2 | Creation date | |
| 12 | 2 | Last modification time | |
| 14 | 2 | Last modification date | |
| 16 | 2 | Last access time | |
| 18 | 2 | Last access date | |
| 20 | 1 | Creation time fraction of seconds, which contains fraction of 2-seconds in 10 ms intervals | |
| 21 | 1 | Last modification time fraction of seconds, which contains fraction of 2-seconds in 10 ms intervals | |
| 22 | 1 | Creation time UTC offset, which contains in the lower 7 bits the signed number of 15 minute intervals relative to UTC, where the MSB indicates the offset is valid | |
| 23 | 1 | Last modification time UTC offset, which contains in the lower 7 bits the signed number of 15 minute intervals relative to UTC, where the MSB indicates the offset is valid | |
| 24 | 1 | Last access time UTC offset, which contains in the lower 7 bits the signed number of 15 minute intervals relative to UTC, where the MSB indicates the offset is valid | |
| 25 | 7 | 0 | Unknown (Reserved) |
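Each file entry timestamp combines a 16-bit FAT time and date with the 10 ms fraction and UTC offset fields. The sketch below decodes the creation time. It assumes the standard FAT date and time bit layout: year since 1980, month and day; hours, minutes and a 2-second count. It also assumes that the lower 7 bits of the UTC offset are a signed two's complement value, as the Microsoft exFAT specification describes. The function names are hypothetical.

```python
import datetime
import struct


def decode_fat_date_time(date: int, time: int) -> datetime.datetime:
    """Decode a 16-bit FAT date and a 16-bit FAT time into a naive datetime."""
    return datetime.datetime(
        1980 + (date >> 9),  # bits 9-15: year since 1980
        (date >> 5) & 0x0f,  # bits 5-8: month
        date & 0x1f,  # bits 0-4: day of month
        time >> 11,  # bits 11-15: hours
        (time >> 5) & 0x3f,  # bits 5-10: minutes
        (time & 0x1f) * 2,  # bits 0-4: seconds in 2-second intervals
    )


def decode_creation_time(entry: bytes) -> datetime.datetime:
    """Decode the creation time of a 32-byte exFAT file entry directory entry."""
    time, date = struct.unpack_from("<HH", entry, 8)
    value = decode_fat_date_time(date, time)
    # Offset 20: additional 10 ms intervals within the 2-second granularity.
    value += datetime.timedelta(milliseconds=entry[20] * 10)
    utc_offset = entry[22]
    if utc_offset & 0x80:  # MSB set: the UTC offset is valid
        # Assumption: the lower 7 bits are a signed (two's complement) count
        # of 15 minute intervals.
        intervals = utc_offset & 0x7f
        if intervals & 0x40:
            intervals -= 0x80
        value = value.replace(
            tzinfo=datetime.timezone(datetime.timedelta(minutes=15 * intervals)))
    return value
```

When the MSB of the UTC offset is not set, the sketch returns a naive datetime, since the time zone is then unknown.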
Volume identifier directory entry
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 1 | 0xa0 | Entry type |
| 1 | 1 | Unknown (Secondary count) | |
| 2 | 2 | Unknown (Set checksum) | |
| 4 | 2 | Unknown (Flags) | |
| 6 | 16 | Volume identifier, which contains a GUID | |
| 22 | 10 | 0 | Unknown (Reserved) |
Data stream directory entry
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 1 | 0xc0 | Entry type |
| 1 | 1 | Unknown (Flags) | |
| 2 | 1 | 0 | Unknown (Reserved) |
| 3 | 1 | Name number of characters | |
| 4 | 2 | Name hash | |
| 6 | 2 | 0 | Unknown (Reserved) |
| 8 | 8 | Data stream valid data size | |
| 16 | 4 | 0 | Unknown (Reserved) |
| 20 | 4 | Data stream start cluster | |
| 24 | 8 | Data stream size |
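According to the Microsoft exFAT specification, the name hash is a 16-bit rotate-right-and-add over the up-cased file name, encoded as UTF-16 little-endian bytes. The sketch below uses Python's `str.upper()` as a stand-in for the case folding mappings of the volume. The two only agree for simple names. `exfat_name_hash` is a hypothetical name.

```python
def exfat_name_hash(name: str) -> int:
    """Compute a 16-bit exFAT name hash over the up-cased UTF-16 little-endian name.

    Assumption: str.upper() stands in for the volume's case folding mappings
    (up-case table); the two only agree for simple names.
    """
    value = 0
    for byte in name.upper().encode("utf-16-le"):
        # Rotate the 16-bit value right by 1 bit, then add the byte.
        value = ((((value & 1) << 15) | (value >> 1)) + byte) & 0xffff
    return value
```

Because the name is up-cased first, names that differ only in case produce the same hash.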
File entry name directory entry
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 1 | 0xc1 | Entry type |
| 1 | 1 | Unknown (Flags) | |
| 2 | 30 | Name string, which contains a UCS-2 little-endian string without an end-of-string character |
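The field at offset 2 of the file entry and volume identifier directory entries is assumed here to be the entry set checksum of the Microsoft exFAT specification. That checksum uses the same 16-bit rotate-right-and-add. It is computed over the primary entry and its secondary entries and skips the checksum field itself (bytes 2 and 3). The sketch below reads the secondary count from offset 1. `exfat_entry_set_checksum` is a hypothetical name.

```python
def exfat_entry_set_checksum(entries: bytes) -> int:
    """Compute the 16-bit checksum of an exFAT directory entry set.

    `entries` contains the primary entry followed by its secondary entries,
    32 bytes each; the number of secondary entries is read from offset 1.
    """
    number_of_bytes = (entries[1] + 1) * 32
    if len(entries) < number_of_bytes:
        raise ValueError("entry set is truncated")
    checksum = 0
    for index in range(number_of_bytes):
        if index in (2, 3):  # skip the checksum field itself
            continue
        # Rotate the 16-bit value right by 1 bit, then add the byte.
        checksum = ((((checksum & 1) << 15) | (checksum >> 1)) + entries[index]) & 0xffff
    return checksum
```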
File attribute flags
| Value | Description |
|---|---|
| 0x0001 | Read-only |
| 0x0002 | Hidden |
| 0x0004 | System |
| 0x0008 | Is volume label |
| 0x0010 | Is directory |
| 0x0020 | Archive |
| 0x0040 | Is device |
| 0x0080 | Unused (reserved) |
References
- exFAT file system specification, by Microsoft
- exFAT, by Wikipedia
File Allocation Table (FAT) file system format
The File Allocation Table (FAT) is a widely used file system and was the default file system for DOS and early versions of Windows.
There are multiple known variants or derivatives of FAT, such as:
- (original) 8-bit FAT
- FAT-12
- FAT-16
- FAT-32
- exFAT
Overview
A FAT file system consists of:
- one or more reserved sectors
- a boot record (or boot sector)
- file system information (FSINFO) for FAT-32
- one or more cluster block allocation tables
- root directory data for FAT-12 and FAT-16
- file and directory data
Note that FAT-32 stores the root directory as part of the file and directory data.
Characteristics
| Characteristics | Description |
|---|---|
| Byte order | little-endian |
| Date and time values | FAT date and time |
| Character strings | A narrow character Single Byte Character (SBC) ASCII string |
Terminology
| Term | Description |
|---|---|
| Hidden sectors | The sectors stored before the FAT volume, such as those used to store a partition table |
Determining the FAT format version
To distinguish between FAT-12, FAT-16 and FAT-32, compute the number of clusters in the data area, where all sizes are in number of sectors:
data_area_size = total_number_of_sectors - (number_of_reserved_sectors + (
number_of_allocation_tables * allocation_table_size) + size_of_root_directory)
number_of_clusters = round down (data_area_size / sectors_per_cluster)
Where size_of_root_directory is (number_of_root_directory_entries * 32) / bytes_per_sector, rounded up, which is 0 for FAT-32.
- FAT-12 is used if the number of clusters is less than 4085
- FAT-16 is used if the number of clusters is less than 65525
- FAT-32 is used otherwise
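The calculation above can be sketched in Python as follows. The function name is hypothetical. The root directory size is rounded up to whole sectors, as the Microsoft FAT specification does.

```python
def determine_fat_variant(
    total_number_of_sectors: int,
    number_of_reserved_sectors: int,
    number_of_allocation_tables: int,
    allocation_table_size: int,
    number_of_root_directory_entries: int,
    bytes_per_sector: int,
    sectors_per_cluster: int,
) -> str:
    """Determine FAT-12, FAT-16 or FAT-32 from the number of data clusters."""
    # Root directory size in sectors, rounded up (0 for FAT-32).
    size_of_root_directory = (
        number_of_root_directory_entries * 32 + bytes_per_sector - 1
    ) // bytes_per_sector
    data_area_size = total_number_of_sectors - (
        number_of_reserved_sectors
        + number_of_allocation_tables * allocation_table_size
        + size_of_root_directory
    )
    number_of_clusters = data_area_size // sectors_per_cluster  # round down
    if number_of_clusters < 4085:
        return "FAT-12"
    if number_of_clusters < 65525:
        return "FAT-16"
    return "FAT-32"
```

For a 1.44 MB floppy (2880 sectors, 1 reserved sector, 2 tables of 9 sectors, 224 root directory entries, 1 sector per cluster) the data area contains 2847 clusters, which yields FAT-12.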
Boot record
The boot record is stored in the first sector of the volume.
FAT-12 and FAT-16 boot record
The FAT-12 and FAT-16 boot record is at least 512 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 3 | "\xeb\x3c\x90" | Boot entry point (JMP +62, NOP) |
| 3 | 8 | File system signature (or OEM name) | |
| 11 | 2 | Bytes per sector, which must be 512, 1024, 2048 or 4096 | |
| 13 | 1 | Sectors per cluster block, which must be 1, 2, 4, 8, 16, 32, 64 or 128 | |
| 14 | 2 | Number of reserved sectors (reserved region), which starts at the first sector of the volume (sector 0) and must be 1 or more (typically 1 or 32) | |
| 16 | 1 | Number of cluster block allocation tables, which must be 1 or more (typically 2) | |
| 17 | 2 | Number of root directory entries | |
| 19 | 2 | Total number of sectors (16-bit) | |
| 21 | 1 | Media descriptor | |
| 22 | 2 | Cluster block allocation table size (16-bit) in number of sectors | |
| 24 | 2 | Number of sectors per track | |
| 26 | 2 | Number of heads | |
| 28 | 4 | Number of hidden sectors | |
| 32 | 4 | Total number of sectors (32-bit) | |
| 36 | 1 | Drive number | |
| 37 | 1 | 0 | Unknown (reserved for Windows NT) |
| 38 | 1 | Extended boot signature | |
| If extended boot signature == 0x29 | |||
| 39 | 4 | Volume serial number, which can be derived from the system current date and time | |
| 43 | 11 | Volume label, which contains a narrow character string or "NO\x20NAME\x20\x20\x20\x20" if not set | |
| 54 | 8 | "FAT12\x20\x20\x20" or "FAT16\x20\x20\x20" | File system hint, which is informational and not required |
| If extended boot signature != 0x29 | |||
| 39 | 23 | Unknown | |
| Common | |||
| 62 | 448 | Used for boot code | |
| 510 | 2 | "\x55\xaa" | Sector signature |
Note that the sector signature must be set at offset 510 and, for sectors larger than 512 bytes, can additionally be set in the last 2 bytes of the sector.
FAT-32 boot record
The FAT-32 boot record is at least 512 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 3 | "\xeb\x58\x90" | Boot entry point (JMP +90, NOP) |
| 3 | 8 | File system signature (or OEM name) | |
| 11 | 2 | Bytes per sector, which must be 512, 1024, 2048 or 4096 | |
| 13 | 1 | Sectors per cluster block, which must be 1, 2, 4, 8, 16, 32, 64 or 128 | |
| 14 | 2 | Number of reserved sectors (reserved region), which starts at the first sector of the volume (sector 0) and must be 1 or more (typically 1 or 32) | |
| 16 | 1 | Number of cluster block allocation tables, which must be 1 or more (typically 2) | |
| 17 | 2 | 0 | Number of root directory entries, which must be 0 for FAT-32 |
| 19 | 2 | 0 | Total number of sectors (16-bit), which must be 0 for FAT-32 |
| 21 | 1 | Media descriptor | |
| 22 | 2 | 0 | Cluster block allocation table size (16-bit) in number of sectors, which must be 0 for FAT-32 |
| 24 | 2 | Number of sectors per track | |
| 26 | 2 | Number of heads | |
| 28 | 4 | Number of hidden sectors | |
| 32 | 4 | Total number of sectors (32-bit) | |
| 36 | 4 | Cluster block allocation table size (32-bit) in number of sectors, which must be non-zero for FAT-32 | |
| 40 | 2 | Extended flags | |
| 42 | 1 | 0 | Format revision minor number |
| 43 | 1 | 0 | Format revision major number |
| 44 | 4 | Root directory start cluster | |
| 48 | 2 | File system information (FSINFO) sector number | |
| 50 | 2 | Backup boot record sector number | |
| 52 | 12 | 0 | Unknown (reserved) |
| 64 | 1 | Drive number | |
| 65 | 1 | 0 | Unknown (reserved for Windows NT) |
| 66 | 1 | Extended boot signature | |
| If extended boot signature == 0x29 | |||
| 67 | 4 | Volume serial number, which can be derived from the system current date and time | |
| 71 | 11 | Volume label, which contains a narrow character string or "NO\x20NAME\x20\x20\x20\x20" if not set | |
| 82 | 8 | "FAT32\x20\x20\x20" | File system hint, which is informational and not required |
| If extended boot signature != 0x29 | |||
| 67 | 23 | Unknown | |
| Common | |||
| 90 | 420 | Used for boot code | |
| 510 | 2 | "\x55\xaa" | Sector signature |
Note that the sector signature must be set at offset 510 and, for sectors larger than 512 bytes, can additionally be set in the last 2 bytes of the sector.
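The following is a minimal sketch of reading the boot record fields listed in both tables. As a simplification, it assumes that a 16-bit allocation table size of 0 identifies the FAT-32 layout. The cluster count calculation described earlier remains the way to determine the format version. It also assumes that the 16-bit total number of sectors is 0 when the 32-bit value is used. The function name is hypothetical.

```python
import struct


def parse_boot_record(data: bytes) -> dict:
    """Parse selected FAT-12, FAT-16 and FAT-32 boot record fields."""
    if len(data) < 512 or data[510:512] != b"\x55\xaa":
        raise ValueError("missing sector signature")
    (bytes_per_sector, sectors_per_cluster, number_of_reserved_sectors,
     number_of_allocation_tables, number_of_root_directory_entries,
     total_number_of_sectors_16, media_descriptor, allocation_table_size_16,
     _sectors_per_track, _number_of_heads, _number_of_hidden_sectors,
     total_number_of_sectors_32) = struct.unpack_from("<HBHBHHBHHHII", data, 11)
    record = {
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "number_of_reserved_sectors": number_of_reserved_sectors,
        "number_of_allocation_tables": number_of_allocation_tables,
        "number_of_root_directory_entries": number_of_root_directory_entries,
        "media_descriptor": media_descriptor,
        # Assumption: a 16-bit total of 0 means the 32-bit total is used.
        "total_number_of_sectors": total_number_of_sectors_16 or total_number_of_sectors_32,
        "allocation_table_size": allocation_table_size_16,
    }
    # Simplification: FAT-32 sets the 16-bit allocation table size to 0.
    is_fat32 = allocation_table_size_16 == 0
    if is_fat32:
        allocation_table_size_32, _extended_flags, _minor, _major, root_directory_cluster = (
            struct.unpack_from("<IHBBI", data, 36))
        record["allocation_table_size"] = allocation_table_size_32
        record["root_directory_cluster"] = root_directory_cluster
    # The extended boot signature and volume label move in the FAT-32 layout.
    signature_offset, label_offset = (66, 71) if is_fat32 else (38, 43)
    if data[signature_offset] == 0x29:
        label = data[label_offset:label_offset + 11].decode("ascii", "replace")
        record["volume_label"] = label.rstrip(" ")
    return record
```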
OEM names
| Value | Description |
|---|---|
| "MSWIN4.1" | |
| "MSDOS 5.0" |
Media descriptors
| Value | Identifier | Description |
|---|---|---|
| 0xe5 | ||
| 0xed | ||
| 0xee | ||
| 0xef | ||
| 0xf0 | removable media | |
| 0xf4 | ||
| 0xf5 | ||
| 0xf8 | fixed (non-removable) media | |
| 0xf9 | ||
| 0xfa | ||
| 0xfb | ||
| 0xfc | ||
| 0xfd | ||
| 0xfe | ||
| 0xff |
Cluster block allocation table
A cluster block allocation table consists of:
- one or more cluster block allocation table entries
FAT-12 cluster block allocation table entry
A FAT-12 cluster block allocation table entry is 12 bits in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 12 bits | Data cluster number |
Where the data cluster number has the following meanings:
| Value(s) | Description |
|---|---|
| 0x000 | Unused (free) cluster |
| 0x001 | Unknown (invalid) |
| 0x002 - 0xfef | Used cluster |
| 0xff0 - 0xff6 | Reserved |
| 0xff7 | Bad cluster |
| 0xff8 - 0xfff | End of cluster chain |
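FAT-12 entries are 12 bits in size, so two entries share three bytes. Entry N starts at byte offset N + N / 2 (rounded down). An even entry uses the lower 12 bits of the little-endian 16-bit value read there, and an odd entry uses the upper 12 bits. A sketch with hypothetical function names:

```python
def read_fat12_entry(table: bytes, cluster_number: int) -> int:
    """Read the 12-bit allocation table entry of a cluster."""
    # Two 12-bit entries share three bytes, so entry N starts at N * 1.5.
    offset = cluster_number + (cluster_number // 2)
    value = table[offset] | (table[offset + 1] << 8)
    if cluster_number % 2:
        return value >> 4  # odd entries use the upper 12 bits
    return value & 0x0fff  # even entries use the lower 12 bits


def describe_fat12_entry(value: int) -> str:
    """Map a 12-bit entry value onto the meanings in the table above."""
    if value == 0x000:
        return "free"
    if value == 0x001:
        return "invalid"
    if value <= 0xfef:
        return "used"
    if value <= 0xff6:
        return "reserved"
    if value == 0xff7:
        return "bad"
    return "end of chain"
```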
FAT-16 cluster block allocation table entry
A FAT-16 cluster block allocation table entry is 16 bits in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 16 bits | Data cluster number |
Where the data cluster number has the following meanings:
| Value(s) | Description |
|---|---|
| 0x0000 | Unused (free) cluster |
| 0x0001 | Unknown (invalid) |
| 0x0002 - 0xffef | Used cluster |
| 0xfff0 - 0xfff6 | Reserved |
| 0xfff7 | Bad cluster |
| 0xfff8 - 0xffff | End of cluster chain |
FAT-32 cluster block allocation table entry
A FAT-32 cluster block allocation table entry is 32 bits in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 32 bits | Data cluster number |
Note that only the lower 28 bits are used.
Where the data cluster number has the following meanings:
| Value(s) | Description |
|---|---|
| 0x00000000 | Unused (free) cluster |
| 0x00000001 | Unknown (invalid) |
| 0x00000002 - 0x0fffffef | Used cluster |
| 0x0ffffff0 - 0x0ffffff6 | Reserved |
| 0x0ffffff7 | Bad cluster |
| 0x0ffffff8 - 0x0fffffff | End of cluster chain |
| 0x10000000 - 0xffffffff | Unknown |
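The data of a file or directory occupies a chain of clusters, where each FAT-32 entry holds the number of the next cluster. The sketch below follows such a chain. It masks the upper 4 bits and stops at an end of cluster chain value. Free, reserved and out-of-bounds values are treated as corruption. The function name and the error handling are assumptions for illustration.

```python
import struct


def follow_fat32_chain(table: bytes, start_cluster: int) -> list:
    """Return the cluster numbers of a FAT-32 cluster chain, in order."""
    clusters = []
    seen = set()
    number_of_entries = len(table) // 4
    cluster = start_cluster
    while True:
        if cluster < 2 or cluster >= number_of_entries:
            raise ValueError(f"cluster {cluster} is out of bounds")
        if cluster in seen:  # guard against loops in corrupted tables
            raise ValueError(f"loop detected at cluster {cluster}")
        seen.add(cluster)
        clusters.append(cluster)
        (value,) = struct.unpack_from("<I", table, cluster * 4)
        value &= 0x0fffffff  # only the lower 28 bits are used
        if value >= 0x0ffffff8:
            return clusters  # end of cluster chain
        if value == 0x0ffffff7:
            raise ValueError(f"bad cluster referenced by cluster {cluster}")
        cluster = value
```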
Directory
A directory consists of:
- self (“.”) directory entry (not used in root directory)
- parent (“..”) directory entry (not used in root directory)
- Zero or more directory entries
- Terminator directory entry
Directory entry
Determining the root directory location
first_allocation_table_offset = number_of_reserved_sectors * bytes_per_sector
FAT-12 and FAT-16 root directory
root_directory_start_offset = first_allocation_table_offset + (
number_of_allocation_tables * allocation_table_size * bytes_per_sector)
first_cluster_offset = root_directory_start_offset + (number_of_root_directory_entries * 32)
FAT-32 root directory
first_cluster_offset = first_allocation_table_offset + (
number_of_allocation_tables * allocation_table_size * bytes_per_sector)
root_directory_start_offset = first_cluster_offset + (
(root_directory_cluster - 2) * number_of_sectors_per_cluster * bytes_per_sector)
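The root directory and first cluster offsets can be computed with the function below. This sketch uses hypothetical parameter names and returns byte offsets relative to the start of the volume. Passing `root_directory_cluster` selects the FAT-32 calculation.

```python
from typing import Optional, Tuple


def root_directory_offsets(
    bytes_per_sector: int,
    number_of_reserved_sectors: int,
    number_of_allocation_tables: int,
    allocation_table_size: int,
    number_of_root_directory_entries: int = 0,
    root_directory_cluster: Optional[int] = None,
    number_of_sectors_per_cluster: int = 1,
) -> Tuple[int, int]:
    """Return (root_directory_start_offset, first_cluster_offset) in bytes."""
    first_allocation_table_offset = number_of_reserved_sectors * bytes_per_sector
    allocation_tables_size = (
        number_of_allocation_tables * allocation_table_size * bytes_per_sector)
    if root_directory_cluster is None:
        # FAT-12 and FAT-16: fixed-size root directory after the allocation tables.
        root_directory_start_offset = first_allocation_table_offset + allocation_tables_size
        first_cluster_offset = (
            root_directory_start_offset + number_of_root_directory_entries * 32)
    else:
        # FAT-32: the root directory is stored in the cluster data area.
        first_cluster_offset = first_allocation_table_offset + allocation_tables_size
        root_directory_start_offset = first_cluster_offset + (
            (root_directory_cluster - 2) * number_of_sectors_per_cluster * bytes_per_sector)
    return root_directory_start_offset, first_cluster_offset
```

For a 1.44 MB floppy this yields a root directory at offset 9728 (0x2600) and the first cluster at offset 16896 (0x4200).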
FAT-12 and FAT-16 directory entry
A FAT-12 and FAT-16 directory entry is 32 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | Name, which is padded with spaces and the first character can have a special meaning | |
| 8 | 3 | Extension, which is padded with spaces | |
| 11 | 1 | File attribute flags | |
| 12 | 1 | Flags | |
| 13 | 1 | Creation time fraction of seconds, which contains fraction of 2-seconds in 10 ms intervals | |
| 14 | 2 | Creation time | |
| 16 | 2 | Creation date | |
| 18 | 2 | Last access date | |
| 20 | 2 | Unknown (OS/2 extended attribute) | |
| 22 | 2 | Last modification time | |
| 24 | 2 | Last modification date | |
| 26 | 2 | Data stream start cluster | |
| 28 | 4 | Data stream data size |
FAT-32 directory entry
A FAT-32 directory entry is 32 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | Name, which is padded with spaces and the first character can have a special meaning | |
| 8 | 3 | Extension, which is padded with spaces | |
| 11 | 1 | File attribute flags | |
| 12 | 1 | Flags | |
| 13 | 1 | Creation time fraction of seconds, which contains fraction of 2-seconds in 10 ms intervals | |
| 14 | 2 | Creation time | |
| 16 | 2 | Creation date | |
| 18 | 2 | Last access date | |
| 20 | 2 | Data stream start cluster, which contains the upper 16-bit of the value | |
| 22 | 2 | Last modification time | |
| 24 | 2 | Last modification date | |
| 26 | 2 | Data stream start cluster, which contains the lower 16-bit of the value | |
| 28 | 4 | Data stream data size |
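The sketch below parses a short name directory entry with `struct`. For FAT-32, the 16-bit value at offset 20 holds the upper half of the start cluster, so the sketch combines it with the lower half at offset 26. It assumes the standard FAT date and time bit layout. The function names are hypothetical, and only a subset of the fields is returned.

```python
import datetime
import struct


def decode_fat_date_time(date: int, time: int):
    """Decode FAT date and time values, or return None for an invalid date."""
    try:
        return datetime.datetime(
            1980 + (date >> 9), (date >> 5) & 0x0f, date & 0x1f,
            time >> 11, (time >> 5) & 0x3f, (time & 0x1f) * 2)
    except ValueError:  # for example a date of 0 has month and day 0
        return None


def parse_directory_entry(data: bytes, is_fat32: bool) -> dict:
    """Parse selected fields of a 32-byte FAT short name directory entry."""
    (name, extension, attribute_flags, _flags, _creation_fraction,
     _creation_time, _creation_date, _last_access_date, upper_start_cluster,
     modification_time, modification_date, start_cluster,
     data_size) = struct.unpack("<8s3sBBBHHHHHHHI", data)
    if is_fat32:
        # FAT-32 stores the upper 16 bits of the start cluster at offset 20.
        start_cluster |= upper_start_cluster << 16
    return {
        "name": name,
        "extension": extension,
        "attribute_flags": attribute_flags,
        "is_directory": bool(attribute_flags & 0x10),
        "start_cluster": start_cluster,
        "data_size": data_size,
        "modification_time": decode_fat_date_time(modification_date, modification_time),
    }
```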
Short (or 8.3) file name
A FAT short (or 8.3) file name is stored in an OEM character set (codepage). The first character can have a special meaning.
Valid FAT short file name characters are:
| Value | Description |
|---|---|
| 'A-Z' | Upper case character |
| '0-9' | Numeric character |
| ' ' | Space, where trailing spaces are considered padding and therefore ignored |
| '.' | Dot, with the exception of "." and "..", where trailing dot characters are ignored |
| '!' | Exclamation mark |
| '#' | Hash |
| '$' | Dollar sign |
| '%' | Percent sign |
| '&' | Ampersand |
| ''' | Single quote |
| '(' | Left parenthesis |
| ')' | Right parenthesis |
| '-' | Hyphen |
| '@' | At sign |
| '^' | Caret |
| '_' | Underscore |
| '`' | Grave accent |
| '{' | Left curly brace |
| '}' | Right curly brace |
| '~' | Tilde |
| 0x80 - 0xff | Extended ASCII characters, which are codepage dependent |
Note that other characters such as plus sign (‘+’) have been observed in FAT short file names.
First character
| Value | Description |
|---|---|
| 0x00 | Last (or terminator) directory entry |
| 0x01 - 0x14 | VFAT long file name directory entry |
| 0x05 | Directory entry pending deallocation (deprecated since DOS 3.0) or substitution of a 0xe5 value |
| 0x41 - 0x54 | Last VFAT long file name directory entry |
| 0xe5 | Unallocated directory entry |
File attribute flags
| Value | Description |
|---|---|
| 0x01 | Read-only |
| 0x02 | Hidden |
| 0x04 | System |
| 0x08 | Is volume label |
| 0x10 | Is directory |
| 0x20 | Archive |
| 0x40 | Is device |
| 0x80 | Unused (reserved) |
Flags
| Value | Description |
|---|---|
| 0x01 | Data is EFS encrypted |
| 0x02 | Data contains large EFS header |
| 0x08 | Name should be represented in lower case |
| 0x10 | Extension should be represented in lower case |
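The sketch below builds a display name from the name, extension and flags described above. It assumes codepage 437 as the OEM codepage; in practice the codepage depends on the system that created the volume. The caller is expected to have skipped terminator (0x00) and unallocated (0xe5) entries first. The function name is hypothetical.

```python
def decode_short_name(entry: bytes, codepage: str = "cp437") -> str:
    """Build a display name from a 32-byte FAT short name directory entry.

    Assumption: cp437 is the OEM codepage; the actual codepage is system dependent.
    """
    raw_name = bytearray(entry[0:8])
    if raw_name[0] == 0x05:
        raw_name[0] = 0xe5  # 0x05 substitutes a leading 0xe5 character
    name = bytes(raw_name).decode(codepage).rstrip(" ")  # strip space padding
    extension = entry[8:11].decode(codepage).rstrip(" ")
    flags = entry[12]
    if flags & 0x08:  # name should be represented in lower case
        name = name.lower()
    if flags & 0x10:  # extension should be represented in lower case
        extension = extension.lower()
    if extension:
        return f"{name}.{extension}"
    return name
```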
VFAT long file name entry
VFAT long file name entries are stored in directory entries. Multiple VFAT long file name entries can be used to store a single long file name, where the highest (last) sequence number is stored first. A maximum of 20 VFAT long file name entries can be used to store a long file name of 255 UCS-2 characters.
VFAT long file names are stored using UCS-2 little-endian, which allows for unpaired Unicode surrogates such as “U+d800” and “U+dc00”.
VFAT long file name entries are stored before the directory entry containing the short file name and additional file entry information.
A VFAT long file name entry is 32 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 1 | Sequence number | |
| 1 | 10 | First name segment string, which contains 5 UCS-2 string characters | |
| 11 | 1 | 0x0f | Unknown (attributes) |
| 12 | 1 | 0x00 | Unknown (type) |
| 13 | 1 | Checksum of the short (8.3) file name | |
| 14 | 12 | Second name segment string, which contains 6 UCS-2 string characters | |
| 26 | 2 | 0 | Unknown (first cluster) |
| 28 | 4 | Third name segment string, which contains 2 UCS-2 string characters |
Note that unused characters in the VFAT long file name segment strings after the end-of-string character (0x0000) are padded with 0xffff.
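The checksum at offset 13 links the VFAT long file name entries to their short name entry. The sketch below uses the 8-bit rotate-right-and-add checksum described in the Microsoft FAT specification. It also assembles a long file name from its entries in sequence order. The function names are hypothetical. Python's `surrogatepass` error handler preserves unpaired surrogates.

```python
def short_name_checksum(short_name: bytes) -> int:
    """Compute the 8-bit checksum of an 11-byte short (8.3) file name."""
    checksum = 0
    for byte in short_name:
        # Rotate the 8-bit value right by 1 bit, then add the byte.
        checksum = ((((checksum & 1) << 7) | (checksum >> 1)) + byte) & 0xff
    return checksum


def assemble_long_name(entries: list) -> str:
    """Assemble a long file name from its 32-byte VFAT entries."""
    segments = {}
    for entry in entries:
        sequence_number = entry[0] & 0x1f  # lower 5 bits
        # First (5), second (6) and third (2) name segment strings.
        segments[sequence_number] = entry[1:11] + entry[14:26] + entry[28:32]
    data = b"".join(segments[number] for number in sorted(segments))
    name = data.decode("utf-16-le", "surrogatepass")
    end_of_string = name.find("\x00")
    if end_of_string != -1:
        name = name[:end_of_string]  # drop the terminator and 0xffff padding
    return name
```

A reader would typically compare `short_name_checksum` of the following short name entry with byte 13 of each long file name entry before trusting the assembled name.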
VFAT long file name sequence number
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 5 bits | Number | |
| 0.5 | 1 bit | 0 | Unknown (reserved) |
| 0.6 | 1 bit | | Is last long file name entry, which is the last logical and first physical entry |
| 0.7 | 1 bit | 0 | Unknown |
References
- Microsoft Extensible Firmware Initiative FAT32 File System Specification, by Microsoft
- Design of the FAT file system, by Wikipedia
- File Allocation Table, by Wikipedia
Hierarchical File System (HFS) format
The Hierarchical File System (HFS) was the default file system for Mac OS after Macintosh File System (MFS) and before Apple File System (APFS).
Note that this document uses Mac OS to refer to the Macintosh Operating System in general, instead of specific versions like Mac OS X or macOS. Mac OS X is used to refer to versions of Mac OS 10.0 or later.
There are multiple known variants or derivatives of HFS, such as:
- HFS
- HFS+ 8.10, used by Mac OS 8.1 to 9.2.2
- HFS+ 10.0, introduced in Mac OS 10.0
- HFSX, introduced in Mac OS 10.3
Note that HFS can be referred to as “HFS Standard” and HFS+ or HFSX as “HFS Extended”.
HFSX (or HFS/X) is an extension to HFS+ to allow additional features that are incompatible with HFS+. One such feature is case-sensitive file names. A HFSX volume may be either case-sensitive or case-insensitive. Case sensitivity (or lack thereof) applies to all file and directory names on the volume.
Overview
| Feature | HFS | HFS+ and HFSX |
|---|---|---|
| Maximum file size | 2^31 bytes (2 GiB) | 2^63 bytes (8 EiB) |
| Maximum file name size | 31 characters | 255 characters |
| Maximum number of blocks | 2^16 - 1 (65535) | 2^32 - 1 (4294967295) |
| Character set | narrow character with codepage | Unicode UTF-16 big-endian |
| Time stamps | In local time | In UTC |
| Catalog B-tree file node size | 512 bytes | 4096 bytes |
| File attributes | none | Basic and extended |
HFS
A HFS file system consists of:
- optional MFS boot block
- master directory block (MDB)
- volume bitmap
- extents overflow file
- catalog file
- optional backup (or alternate) master directory block (MDB)
The backup master directory block (MDB) is stored in the last 2 sectors of the volume.
Characteristics
| Characteristics | Description |
|---|---|
| Byte order | big-endian |
| Date and time values | HFS timestamp in local time |
| Character strings | Narrow character (Single Byte Character (SBC) or Multi Byte Character (MBC)) stored using a system defined codepage |
HFS+ and HFSX
A HFS+ or HFSX file system consists of:
- reserved (or unused) blocks
- volume header
- allocation file
- extents overflow file
- catalog file
- optional attributes file
- optional startup file
- optional backup (or alternate) volume header
The backup volume header is stored in the last 1024 bytes of the volume.
Characteristics
| Characteristics | Description |
|---|---|
| Byte order | big-endian |
| Date and time values | HFS timestamp in UTC |
| Character strings | UTF-16 big-endian |
Terminology
| Term | Description |
|---|---|
| Clump size | Size of the group of (allocation) blocks (or clump), in bytes, to avoid fragmentation |
Unicode strings
Unicode strings are stored as UTF-16 big-endian in Normalization Form Canonical Decomposition (NFD) based on Unicode 3.2, with exclusions. Unicode values in the ranges U+2000 - U+2FFF, U+F900 - U+FAFF and U+2F800 - U+2FAFF are not decomposed.
On Mac OS 8.1 through 10.2.x decomposition was based on Unicode 2.1.
TODO: determine what the impact of the different Unicode versions is.
Note that based on observations on Mac OS 10.15.7 on HFS+ the range U+1D000 - U+1D1FF is excluded from decomposition and U+2400 is replaced by U+0.
HFS timestamp
Date and time values are stored as an unsigned 32-bit integer containing the number of seconds since January 1, 1904 at 00:00:00 (midnight), where:
- MFS and HFS use local time;
- HFS+ and HFSX use Coordinated Universal Time (UTC).
This document will refer to both forms as HFS timestamp.
The maximum representable date is February 6, 2040 at 06:28:15 UTC.
The HFS timestamp does not account for leap seconds. It includes a leap day in every year that is evenly divisible by 4. This is sufficient given that the range of representable dates does not contain 1900 or 2100, neither of which is a leap year.
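Converting an HFS timestamp amounts to adding its number of seconds to the 1904 epoch. The sketch below uses hypothetical names. For MFS and HFS the result is in the (unknown) local time of the system that wrote it. The constant 2082844800 is the number of seconds between January 1, 1904 and January 1, 1970.

```python
import datetime

# Number of seconds between 1904-01-01 and 1970-01-01 (the POSIX epoch).
HFS_TO_POSIX_EPOCH_DELTA = 2082844800


def hfs_timestamp_to_datetime(timestamp: int) -> datetime.datetime:
    """Convert an HFS timestamp into a naive datetime.

    The result is in UTC for HFS+ and HFSX, and in local time for MFS and HFS.
    """
    if not 0 <= timestamp <= 0xffffffff:
        raise ValueError("HFS timestamp must be an unsigned 32-bit integer")
    return datetime.datetime(1904, 1, 1) + datetime.timedelta(seconds=timestamp)


def hfs_timestamp_to_posix(timestamp: int) -> int:
    """Convert an HFS timestamp into a POSIX timestamp (seconds since 1970)."""
    return timestamp - HFS_TO_POSIX_EPOCH_DELTA
```

The largest value, 0xffffffff, converts to February 6, 2040 at 06:28:15, which matches the maximum representable date stated above.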
File names
TN1150 states that HFS file names are compared case-insensitively, assuming a MacRoman encoding, where the following upper and lower case characters are considered equivalent:
| Upper case | Lower case |
|---|---|
| 0x41 - 0x5a (A - Z) | 0x61 - 0x7a (a - z) |
| 0x80 (Ä) | 0x8a (ä) |
| 0x81 (Å) | 0x8c (å) |
| 0x82 (Ç) | 0x8d (ç) |
| 0x83 (É) | 0x8e (é) |
| 0x84 (Ñ) | 0x96 (ñ) |
| 0x85 (Ö) | 0x9a (ö) |
| 0x86 (Ü) | 0x9f (ü) |
| 0xae (Æ) | 0xbe (æ) |
| 0xaf (Ø) | 0xbf (ø) |
| 0xcb (À) | 0x88 (à) |
| 0xcc (Ã) | 0x8b (ã) |
| 0xcd (Õ) | 0x9b (õ) |
| 0xce (Œ) | 0xcf (œ) |
| 0xd9 (Ÿ) | 0xd8 (ÿ) |
| 0xe5 (Â) | 0x89 (â) |
| 0xe6 (Ê) | 0x90 (ê) |
| 0xe7 (Á) | 0x87 (á) |
| 0xe8 (Ë) | 0x91 (ë) |
| 0xe9 (È) | 0x8f (è) |
| 0xea (Í) | 0x92 (í) |
| 0xeb (Î) | 0x94 (î) |
| 0xec (Ï) | 0x95 (ï) |
| 0xed (Ì) | 0x93 (ì) |
| 0xee (Ó) | 0x97 (ó) |
| 0xef (Ô) | 0x99 (ô) |
| 0xf1 (Ò) | 0x98 (ò) |
| 0xf2 (Ú) | 0x9c (ú) |
| 0xf3 (Û) | 0x9e (û) |
| 0xf4 (Ù) | 0x9d (ù) |
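The sketch below implements a case-insensitive equality check based on the table above. The names are hypothetical. It only tests for equality. It does not attempt to reproduce the ordering used to sort catalog keys.

```python
# Upper case to lower case MacRoman byte pairs from the table above
# (A - Z is handled separately).
MAC_ROMAN_CASE_PAIRS = {
    0x80: 0x8a, 0x81: 0x8c, 0x82: 0x8d, 0x83: 0x8e, 0x84: 0x96, 0x85: 0x9a,
    0x86: 0x9f, 0xae: 0xbe, 0xaf: 0xbf, 0xcb: 0x88, 0xcc: 0x8b, 0xcd: 0x9b,
    0xce: 0xcf, 0xd9: 0xd8, 0xe5: 0x89, 0xe6: 0x90, 0xe7: 0x87, 0xe8: 0x91,
    0xe9: 0x8f, 0xea: 0x92, 0xeb: 0x94, 0xec: 0x95, 0xed: 0x93, 0xee: 0x97,
    0xef: 0x99, 0xf1: 0x98, 0xf2: 0x9c, 0xf3: 0x9e, 0xf4: 0x9d,
}


def mac_roman_fold(name: bytes) -> bytes:
    """Map each upper case MacRoman character onto its lower case form."""
    folded = bytearray()
    for byte in name:
        if 0x41 <= byte <= 0x5a:  # A - Z
            byte += 0x20
        else:
            byte = MAC_ROMAN_CASE_PAIRS.get(byte, byte)
        folded.append(byte)
    return bytes(folded)


def hfs_names_equal(name1: bytes, name2: bytes) -> bool:
    """Case-insensitive equality of two MacRoman encoded HFS file names."""
    return mac_roman_fold(name1) == mac_roman_fold(name2)
```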
HFS+ allows for the “/” character in file names. In the Mac OS Finder this character is shown as “/”, but in Terminal it is shown as “:”, since “/” is used as the path segment separator. A file name containing “:” created in Terminal is shown with “/” in Finder. Finder does not allow the creation of a file with “:” in its name. A symbolic link created in Terminal to a file with “:” in its name does not convert the “:” character in the link target data. The Linux HFS+ implementation appears to apply conversion logic similar to that of Terminal.
B-tree files
HFS, HFS+ and HFSX use multiple B-tree files.
A B-tree file consists of fixed sized nodes:
- header node
- map nodes
- index (root and branch) nodes
- leaf nodes
Note that only the data fork of a B-tree file is used. The resource fork should be unused.
The size of a B-tree file can be calculated in the following manner:
size = number_of_nodes * node_size
Node size
The node size is determined when the B-tree file is created.
| Feature | HFS | HFS+ and HFSX |
|---|---|---|
| Node size | 512 bytes | A power of 2 in the range 512 - 32768 bytes |
In HFS+ and HFSX the B-tree node size is stored in the header record of the header node.
Default node sizes:
| Feature | HFS | HFS+ and HFSX |
|---|---|---|
| catalog file | 512 | 4 KiB (8 KiB in Mac OS X) |
| extents overflow file | 512 | 1 KiB (4 KiB in Mac OS X) |
| attributes file | N/A | 4 KiB |
B-tree (file) node
A B-tree file node consists of:
- node descriptor
- node records
- node record offsets
The first node in the file is referenced by node number 0.
The node offset, relative to the start of the file, can be calculated in the following manner:
node_offset = node_number * node_size
B-tree node descriptor
The B-tree node descriptor (BTNodeDescriptor) is 14 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | Next tree node number (forward link), which contains 0 if empty | |
| 4 | 4 | Previous tree node number (backward link), which contains 0 if empty | |
| 8 | 1 | Node type, which consists of a signed 8-bit integer | |
| 9 | 1 | Node level, which consists of an unsigned 8-bit integer | |
| 10 | 2 | Number of records | |
| 12 | 2 | 0 | Unknown (Reserved), should contain 0 |
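The header and map nodes have level 0 and leaf nodes have level 1. An index node has a level that is 1 more than that of its child nodes, hence the root node level equals the depth of the tree, with a maximum depth of 8.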
B-tree node types
| Value | Identifier | Description |
|---|---|---|
| -1 | kBTLeafNode | leaf node |
| 0 | kBTIndexNode | index node |
| 1 | kBTHeaderNode | header node |
| 2 | kBTMapNode | map node |
B-tree node record
The B-tree node record contains (leaf) data or a reference to an index node and consists of:
- a key
- value data
B-tree record offsets
The B-tree record offsets are an array of 16-bit integers, relative to the start of the node (the start of the B-tree node descriptor). The first record offset is stored at node size - 2, e.g. 512 - 2 = 510, the second 2 bytes before that, e.g. 508, etc.
An additional record offset is added at the end to signify the start of the free space.
Note that the record offsets are not necessarily stored in linear order.
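The sketch below reads the node descriptor and the record offsets of a B-tree node with `struct`, using big-endian byte order. It reads the node level as an unsigned 8-bit integer, as TN1150 defines it. It returns one more offset than the number of records; the last offset is the start of the free space. The function name is hypothetical.

```python
import struct


def parse_node(node: bytes):
    """Parse the node descriptor and record offsets of a B-tree node."""
    (next_node, previous_node, node_type, node_level, number_of_records,
     _reserved) = struct.unpack_from(">IIbBHH", node, 0)
    descriptor = {
        "next_node": next_node,
        "previous_node": previous_node,
        "node_type": node_type,  # -1 leaf, 0 index, 1 header, 2 map
        "node_level": node_level,
        "number_of_records": number_of_records,
    }
    node_size = len(node)
    # Offsets are stored backwards from the end of the node; the extra
    # offset after the last record marks the start of the free space.
    record_offsets = [
        struct.unpack_from(">H", node, node_size - 2 * (index + 1))[0]
        for index in range(number_of_records + 1)
    ]
    return descriptor, record_offsets
```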
B-tree header node
The B-tree header node is stored in the first node of the B-tree file and contains 3 records:
- the B-tree header record;
- the user data record, which consists of 128 bytes (reserved within HFS);
- the B-tree map record.
Note that the records in the B-tree header node do not have keys.
B-tree header record
The B-tree header record (BTHeaderRec) is 106 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | Depth of the tree | |
| 2 | 4 | Root node number | |
| 6 | 4 | Number of data records contained in leaf nodes | |
| 10 | 4 | First leaf node number | |
| 14 | 4 | Last leaf node number | |
| 18 | 2 | Node size, in bytes, where the value must be a power of 2 in the range 512 - 32768 | |
| 20 | 2 | Maximum key size, in bytes | |
| 22 | 4 | Number of nodes | |
| 26 | 4 | Number of unused nodes | |
| HFS | |||
| 30 | 76 | Unknown (Reserved) | |
| HFS+/HFSX | |||
| 30 | 2 | Unknown (Reserved) | |
| 32 | 4 | Clump size, in bytes | |
| 36 | 1 | B-tree file type | |
| 37 | 1 | Key comparison method | |
| 38 | 4 | Flags (or attributes) | |
| 42 | 16 x 4 = 64 | Unknown (Reserved) | |
TODO: does the number of data records equal the number of leaf nodes?
File type
| Value | Identifier | Description |
|---|---|---|
| 0x00 | Control file | |
| 0x80 | First user B-tree type | |
| 0xff | Reserved B-tree type |
Key comparison method
| Value | Identifier | Description |
|---|---|---|
| 0x00 | Unknown (not set), observed on HFS standard, HFS+ and an empty HFSX file system | |
| 0xbc | Binary compare (case-sensitive) | |
| 0xcf | Unicode case folding (case-insensitive) |
Flags
| Value | Identifier | Description |
|---|---|---|
| 0x00000001 | kBTBadCloseMask | Bad close, which indicates that the B-tree was not closed properly and should be checked for consistency (Not used by HFS+ and HFSX) |
| 0x00000002 | kBTBigKeysMask | Big keys, which indicates the key data size value of the keys in index and leaf nodes is 16-bit integer, otherwise, it is an 8-bit integer (Must be set for HFS+ and HFSX) |
| 0x00000004 | kBTVariableIndexKeysMask | Variable-size (index) keys, which indicates that the keys in index nodes occupy the number of bytes indicated by their key size; otherwise, the keys in index nodes always occupy maximum key size (must be set for the HFS+ and HFSX Catalog B-tree, and cleared for the HFS+ and HFSX Extents overflow B-tree) |
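The sketch below parses the header record from the header node and assumes the HFS+ and HFSX layout; on HFS the values after offset 30 are reserved. The header record is read as the first record, directly after the 14-byte node descriptor. The function name and the returned dictionary keys are hypothetical.

```python
import struct


def parse_header_record(node: bytes) -> dict:
    """Parse the HFS+/HFSX B-tree header record stored in the header node."""
    # The header record directly follows the 14-byte node descriptor.
    (depth, root_node, number_of_leaf_records, first_leaf_node, last_leaf_node,
     node_size, maximum_key_size, number_of_nodes, number_of_unused_nodes,
     _reserved, clump_size, btree_file_type, key_comparison_method,
     flags) = struct.unpack_from(">HIIIIHHIIHIBBI", node, 14)
    # The node size must be a power of 2 in the range 512 - 32768.
    if node_size < 512 or node_size > 32768 or node_size & (node_size - 1):
        raise ValueError(f"unsupported node size: {node_size}")
    return {
        "depth": depth,
        "root_node": root_node,
        "number_of_leaf_records": number_of_leaf_records,
        "first_leaf_node": first_leaf_node,
        "last_leaf_node": last_leaf_node,
        "node_size": node_size,
        "maximum_key_size": maximum_key_size,
        "number_of_nodes": number_of_nodes,
        "number_of_unused_nodes": number_of_unused_nodes,
        "clump_size": clump_size,
        "btree_file_type": btree_file_type,
        "key_comparison_method": key_comparison_method,
        "is_case_sensitive": key_comparison_method == 0xbc,
        "has_big_keys": bool(flags & 0x00000002),
        "has_variable_index_keys": bool(flags & 0x00000004),
    }
```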
B-tree map record
The B-tree map record contains a bitmap that indicates which nodes in the B-tree file are used and which are not. If a bit is set, then the corresponding node in the B-tree file is in use.
In a 512-byte header node the bitmap is 256 bytes in size and can represent a maximum of 2048 nodes. If more nodes are needed, a map node is used to store additional mappings.
The map node
If a B-tree file contains more than 2048 nodes, which are enough for about 8000 files, a map node is used to store additional node-mapping information.
The next tree node value in the B-tree node descriptor of the header node is used to refer to the first map node.
A map node consists of a B-tree node descriptor and one B-tree map record. The map record is 494 bytes in size, 512 - (14 + 2 x 2), since the node also stores the offsets of the map record and of the free space, and can therefore contain mapping information for 3952 nodes.
If a B-tree contains more than 6000 nodes (enough for about 25000 files) a second map node is needed. The next tree node value in the B-tree node descriptor of the first map node is used to refer to the second.
If more map nodes are required, each additional map node is similarly linked to the previous one.
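The sketch below checks whether a node is in use. It takes the bitmaps of the map records in the order they are linked: first the one in the header node, then those in the map nodes. It assumes that the most significant bit of each byte represents the lowest node number, which is how TN1150 describes the allocation file bitmap. The function name is hypothetical.

```python
def is_node_in_use(map_records: list, node_number: int) -> bool:
    """Check whether a B-tree node is marked as used in the map records.

    Assumption: the most significant bit of each byte represents the lowest
    node number of that byte.
    """
    bitmap = b"".join(map_records)
    byte_index, bit_index = divmod(node_number, 8)
    if byte_index >= len(bitmap):
        raise ValueError(f"node {node_number} is not covered by the map records")
    return bool(bitmap[byte_index] & (0x80 >> bit_index))
```

With a 256-byte header node map record, node 2048 is the first node covered by the first map node.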
The root node
The root node is the start of the B-tree structure; usually the root node is an index node, but it might be a leaf node if there are no index nodes.
The root node number is stored in the B-tree header record and is 0 if the B-tree is empty.
The index node
The records stored in an index node are called pointer records. A pointer record consists of a key followed by the node number of the corresponding node. The size of the key varies according to the type of B-tree file.
- In a catalog file, the search key is a combination of the file or directory name and the parent identifier of that file or directory.
- In an extents overflow file, the search key is a combination of the fork type (data or resource), the file identifier and the index of the first block in the extent.
The immediate descendants of an index node are called the children of the index node. An index node can have from 1 to 15 children, depending on the size of the pointer records that the index node contains.
The leaf node
The leaf nodes contain data records. The structure of the leaf node data records varies according to the type of B-tree.
- In an extents overflow file, the leaf node data records consist of a key and an extent record.
- In a catalog file, the leaf node data records can be any one of four kinds of records.
HFS Master Directory Block (MDB)
The primary Master Directory Block (MDB) (or volume information block (VIB)) is located at offset 1024 of the volume.
The MDB is 162 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | "BD" (or "\x42\x44") | Volume signature |
| 2 | 4 | Creation time, which contains a HFS timestamp in local time | |
| 6 | 4 | (last) modification time, which contains a HFS timestamp in local time | |
| 10 | 2 | Volume attribute flags | |
| 12 | 2 | Number of files in the root directory | |
| 14 | 2 | Volume bitmap block number, which contains a 512-byte sector number relative to the start of the volume, where 0 is the first sector, typically 3 | |
| 16 | 2 | Next allocation search block number | |
| 18 | 2 | Number of blocks, where a volume can contain at most 65535 blocks | |
| 20 | 4 | Block size, in bytes, which must be a multiple of 512 | |
| 24 | 4 | Clump size, in bytes | |
| 28 | 2 | Data area block number, which contains the 512-byte sector number of the first allocation block, relative to the start of the volume, where 0 is the first sector | |
| 30 | 4 | Next available catalog node identifier (CNID), which can be a directory or file record identifier | |
| 34 | 2 | Number of unused blocks | |
| 36 | 1 | Volume label size, with a maximum of 27 | |
| 37 | 27 | Volume label | |
| 64 | 4 | (last) backup time, which contains a HFS timestamp in local time | |
| 68 | 2 | Backup sequence number | |
| 70 | 4 | Volume write count, which contains the number of times the volume has been written to | |
| 74 | 4 | Extents overflow file clump size, in bytes | |
| 78 | 4 | Catalog file clump size, in bytes | |
| 82 | 2 | Number of sub directories in the root directory | |
| 84 | 4 | Total number of files, which does not include file system metadata files | |
| 88 | 4 | Total number of directories (folders), which does not include the root folder | |
| 92 | 32 | Finder information | |
| 124 | 2 | Embedded volume signature (drVCSize) | |
| 126 | 4 | Embedded volume extent descriptor (drVBMCSize and drCtlCSize) | |
| 130 | 4 | Extents overflow file size | |
| 134 | 12 | Extents overflow file extents record | |
| 146 | 4 | Catalog file size | |
| 150 | 12 | Catalog file extents record |
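The sketch below parses a few MDB fields and computes the byte offset of an allocation block. It assumes that the data area block number counts 512-byte sectors and that the volume label uses the MacRoman codepage. The function names are hypothetical.

```python
import struct


def parse_master_directory_block(volume: bytes) -> dict:
    """Parse selected fields of the HFS master directory block (MDB)."""
    mdb = volume[1024:1024 + 162]
    if len(mdb) < 162 or mdb[0:2] != b"BD":
        raise ValueError("missing HFS volume signature")
    (number_of_blocks, block_size, _clump_size, data_area_block_number,
     _next_catalog_node_identifier, number_of_unused_blocks,
     volume_label_size) = struct.unpack_from(">HIIHIHB", mdb, 18)
    if block_size == 0 or block_size % 512:
        raise ValueError(f"unsupported block size: {block_size}")
    # Assumption: the volume label codepage is MacRoman.
    volume_label = mdb[37:37 + min(volume_label_size, 27)].decode("mac_roman")
    (catalog_file_size,) = struct.unpack_from(">I", mdb, 146)
    return {
        "number_of_blocks": number_of_blocks,
        "block_size": block_size,
        "data_area_block_number": data_area_block_number,
        "number_of_unused_blocks": number_of_unused_blocks,
        "volume_label": volume_label,
        "catalog_file_size": catalog_file_size,
    }


def block_offset(mdb: dict, block_number: int) -> int:
    """Byte offset of an allocation block relative to the start of the volume.

    Assumption: the data area block number counts 512-byte sectors.
    """
    return mdb["data_area_block_number"] * 512 + block_number * mdb["block_size"]
```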
Note that the volume modification time is not necessarily the date and time when the volume was last flushed.
Notes
TODO: check
- drVCSize => Volume cache block size (16-bit)
- drVBMCSize => Volume bitmap cache block size (16-bit)
- drCtlCSize => Common volume cache block size (16-bit)
HFS Volume Bitmap
The volume bitmap is used to keep track of block allocation. The bitmap contains one bit for each block in the volume.
- If a bit is set, the corresponding block is currently in use by some file.
- If a bit is clear, the corresponding block is not currently in use by any file and is available.
The volume bitmap does not indicate which files occupy which blocks. The actual file-mapping information is maintained in two locations:
- in the corresponding catalog entry;
- in the corresponding extents overflow file entry.
The size of the volume bitmap depends on the number of blocks in the volume.
An 800 KiB floppy disk with a block size of 512 bytes contains 1600 blocks and therefore has a volume bitmap size of:
(800 * 1024) / 512 = 1600 bits (200 bytes).
A 32 MiB volume with a block size of 512 bytes contains 65536 blocks and therefore has a volume bitmap size of:
(32 * 1024 * 1024) / 512 = 65536 bits (8192 bytes).
The number of blocks in the volume is stored in the MDB as a 16-bit integer, so no more than 65535 blocks can be addressed. The volume bitmap is therefore never larger than 8192 bytes (or 16 physical blocks). For volumes containing more than 32 MiB of space, the block size must be increased.
A volume containing 40 MiB of space must have a block size of at least 2 x 512 bytes.
A volume containing 80 MiB of space must have a block size of at least 3 x 512 bytes.
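The arithmetic above can be sketched in C as follows. The function names are hypothetical; the sketch assumes that only whole blocks are counted and that the block size is a multiple of 512 bytes, searching for the smallest such block size that keeps the block count within the 16-bit limit of the MDB.

```c
#include <stdint.h>

/* Size, in bytes, of an HFS volume bitmap: one bit per block,
   rounded up to a whole byte. */
uint32_t hfs_volume_bitmap_size(uint64_t volume_size, uint32_t block_size)
{
    uint64_t number_of_blocks = volume_size / block_size;

    return (uint32_t) ((number_of_blocks + 7) / 8);
}

/* Smallest block size, a multiple of 512 bytes, for which the volume
   contains no more than 65535 blocks. */
uint32_t hfs_minimum_block_size(uint64_t volume_size)
{
    uint32_t block_size = 512;

    while ((volume_size / block_size) > 65535) {
        block_size += 512;
    }
    return block_size;
}
```

For example, a 40 MiB volume yields 1024 bytes and an 80 MiB volume yields 1536 bytes, matching the values above.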
HFS+ and HFSX Volume Header
The volume header (HFSPlusVolumeHeader) replaces the master directory block (MDB). The volume header starts at offset 1024 of the volume.
The blocks containing the first 1536 bytes (reserved space plus volume header) are marked as used in the allocation file.
The volume header is 512 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | "H+" (or "\x48\x2b") or "HX" (or "\x48\x58") | Volume signature, where "H+" (kHFSPlusSigWord) is used for HFS+ and "HX" (kHFSXSigWord) for HFSX |
| 2 | 2 | | Format version, where 4 (kHFSPlusVersion) is used for HFS+ and 5 (kHFSXVersion) for HFSX |
| 4 | 4 | | Volume attribute flags |
| 8 | 4 | | Last mounted version |
| 12 | 4 | | Journal information block number, which contains a block number relative to the start of the volume |
| 16 | 4 | | Creation time, which contains a HFS timestamp in local time |
| 20 | 4 | | (last) content modification time, which contains a HFS timestamp in UTC |
| 24 | 4 | | (last) backup time, which contains a HFS timestamp in UTC |
| 28 | 4 | | Checked time, which contains a HFS timestamp in UTC |
| 32 | 4 | | Total number of files, which does not include file system metadata files |
| 36 | 4 | | Total number of directories (folders), which does not include the root folder |
| 40 | 4 | | Block size, in bytes |
| 44 | 4 | | Total number of blocks |
| 48 | 4 | | Number of unused blocks |
| 52 | 4 | | Next allocation search block number (nextAllocation) |
| 56 | 4 | | Clump size, in bytes, of a resource fork |
| 60 | 4 | | Clump size, in bytes, of a data fork |
| 64 | 4 | | Next available catalog node identifier (CNID), which can be a directory or file record identifier |
| 68 | 4 | | Volume write count, which contains the number of times the volume has been written to |
| 72 | 8 | | Encodings bitmap |
| 80 | 32 | | Finder information |
| 112 | 80 | | Allocation file fork descriptor |
| 192 | 80 | | Extents overflow file fork descriptor |
| 272 | 80 | | Catalog file fork descriptor |
| 352 | 80 | | Attributes file fork descriptor |
| 432 | 80 | | Startup file fork descriptor |
Total number of blocks
For a disk whose size is an exact multiple of the block size, all areas on the disk are included in a block, including the volume header and backup volume header. For a disk whose size is not an exact multiple of the block size, only the blocks that fit entirely on the disk are counted. The remaining space at the end of the disk is not used by the volume format, except for storing the backup volume header, which starts 1024 bytes before the end of the disk.
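A minimal C sketch of these rules, assuming the disk size in bytes is known. The function names are hypothetical; the backup volume header offset is independent of the block size.

```c
#include <stdint.h>

/* Only blocks that fit entirely on the disk are counted. */
uint32_t hfsplus_total_number_of_blocks(uint64_t disk_size, uint32_t block_size)
{
    return (uint32_t) (disk_size / block_size);
}

/* The backup volume header occupies the second-to-last 512-byte sector
   of the disk, regardless of the block size. */
uint64_t hfsplus_backup_volume_header_offset(uint64_t disk_size)
{
    return disk_size - 1024;
}
```

For a disk of 1000 blocks of 4096 bytes plus 2048 trailing bytes, 1000 blocks are counted and the trailing bytes hold the backup volume header.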
Volume attribute flags
The volume attribute flags are specified as follows.
| Value | Identifier | Description |
|---|---|---|
| 0x00000080 | kHFSVolumeHardwareLockBit | Volume hardware lock, set if the volume is write-protected due to a hardware setting |
| 0x00000100 | kHFSVolumeUnmountedBit | Volume unmounted, set if the volume was correctly flushed before being unmounted or ejected |
| 0x00000200 | kHFSVolumeSparedBlocksBit | Volume spared blocks, set if there are any records in the extents overflow file for bad blocks |
| 0x00000400 | kHFSVolumeNoCacheRequiredBit | Volume no cache required, set if the blocks from this volume should not be cached |
| 0x00000800 | kHFSBootVolumeInconsistentBit | Boot volume inconsistent, set if the volume was mounted for writing |
| 0x00001000 | kHFSCatalogNodeIDsReusedBit | Catalog node identifiers reused, set when the next catalog identifier value overflows 32 bits, forcing smaller catalog node identifiers to be reused |
| 0x00002000 | kHFSVolumeJournaledBit | Journaled, set if the file system uses a journal |
| 0x00004000 | kHFSVolumeInconsistentBit | Unknown (Reserved) |
| 0x00008000 | kHFSVolumeSoftwareLockBit | Volume software lock, set if the volume is write-protected due to a software setting |
| 0x40000000 | kHFSContentProtectionBit | Unknown (Reserved) |
| 0x80000000 | kHFSUnusedNodeFixBit | Unknown (Reserved) |
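As an illustration, the following C sketch reads the volume attribute flags from a 512-byte HFS+/HFSX volume header (big-endian, flags at offset 4) and tests a few of the flags above. The function and macro names are hypothetical; only the flag values come from the table.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define HFS_VOLUME_HARDWARE_LOCK 0x00000080UL /* kHFSVolumeHardwareLockBit */
#define HFS_VOLUME_UNMOUNTED     0x00000100UL /* kHFSVolumeUnmountedBit */
#define HFS_VOLUME_JOURNALED     0x00002000UL /* kHFSVolumeJournaledBit */
#define HFS_VOLUME_SOFTWARE_LOCK 0x00008000UL /* kHFSVolumeSoftwareLockBit */

static uint32_t read_uint32_be(const uint8_t *data)
{
    return ((uint32_t) data[0] << 24) | ((uint32_t) data[1] << 16) |
           ((uint32_t) data[2] << 8) | (uint32_t) data[3];
}

/* Returns false if the signature is neither "H+" nor "HX". */
bool hfsplus_read_volume_attribute_flags(const uint8_t *volume_header,
                                         uint32_t *attribute_flags)
{
    if (memcmp(volume_header, "H+", 2) != 0 &&
        memcmp(volume_header, "HX", 2) != 0) {
        return false;
    }
    *attribute_flags = read_uint32_be(&volume_header[4]);
    return true;
}

bool hfsplus_is_journaled(uint32_t attribute_flags)
{
    return (attribute_flags & HFS_VOLUME_JOURNALED) != 0;
}

bool hfsplus_was_cleanly_unmounted(uint32_t attribute_flags)
{
    return (attribute_flags & HFS_VOLUME_UNMOUNTED) != 0;
}

/* Write-protected by either a hardware or a software setting. */
bool hfsplus_is_write_protected(uint32_t attribute_flags)
{
    return (attribute_flags &
            (HFS_VOLUME_HARDWARE_LOCK | HFS_VOLUME_SOFTWARE_LOCK)) != 0;
}
```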
Last mounted version
| Value | Identifier | Description |
|---|---|---|
| "8.10" | | Used by Mac OS 8.1 to 9.2.2 |
| "10.0" | kHFSPlusMountVersion | Used by Mac OS X |
| "FSK!" or "fsck" | | Used by fsck_hfs on Mac OS X |
| "HFSJ" | kHFSJMountVersion | used by journaled HFS+ or HFSX |
Links
TODO: add text about HFS standard
HFS+ supports both hard links and symbolic links.
Hard links to directories are not supported (or allowed).
Hard Links
Hard links in HFS+/HFSX are represented by multiple different types of file records:
- one indirect node file record, named “iNode#”, where # is the link reference. This file contains the content of the file shared by the hard links.
- one or more hard link file records, that reference the indirect node file record.
Indirect node files are stored in a file system metadata directory referred to as the metadata directory with the name “/\u{2400}\u{2400}\u{2400}\u{2400}HFS+ Private Data”.
The link reference corresponds to the catalog node identifier (CNID) of the indirect node file, where 0 is not a valid link reference.
Note that TN1150 states that a new link reference is randomly chosen from the range 100 to 1073741923. However, link references outside of this range have been observed, such as “iNode20”.
The special permission data of the hard link file records contains the link reference if:
- the catalog file record flag kHFSHasLinkChainMask is set;
- and the first 8 bytes of the file information contain “hlnkhfs+”.
| Value | Identifier | Description |
|---|---|---|
| "hlnk" | kHardLinkFileType | Hard link file type |
| "hfs+" | kHFSPlusCreator | Hard link file creator |
The hard link file’s creation date should be set to the creation date of the metadata directory, but the creation date may also be set to the creation date of the volume’s root directory though this is deprecated.
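The hard link conditions above can be checked with a short C sketch, assuming the input is a complete 248-byte HFS+/HFSX catalog file record in big-endian byte order (flags at offset 2, special permission data at offset 44, file information at offset 48). The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define HFS_HAS_LINK_CHAIN_MASK 0x0020 /* kHFSHasLinkChainMask */

static uint16_t read_uint16_be(const uint8_t *data)
{
    return (uint16_t) ((data[0] << 8) | data[1]);
}

static uint32_t read_uint32_be(const uint8_t *data)
{
    return ((uint32_t) data[0] << 24) | ((uint32_t) data[1] << 16) |
           ((uint32_t) data[2] << 8) | (uint32_t) data[3];
}

/* Retrieves the link reference, the CNID of the indirect node file,
   if the file record is a hard link file record. */
bool hfsplus_get_hard_link_reference(const uint8_t *file_record,
                                     uint32_t *link_reference)
{
    uint32_t reference = 0;

    if (read_uint16_be(&file_record[0]) != 0x0002) {
        return false; /* not a kHFSPlusFileRecord */
    }
    if ((read_uint16_be(&file_record[2]) & HFS_HAS_LINK_CHAIN_MASK) == 0) {
        return false;
    }
    /* File type "hlnk" followed by file creator "hfs+" */
    if (memcmp(&file_record[48], "hlnkhfs+", 8) != 0) {
        return false;
    }
    reference = read_uint32_be(&file_record[44]); /* special permission data */
    if (reference == 0) {
        return false; /* 0 is not a valid link reference */
    }
    *link_reference = reference;
    return true;
}
```

The name of the indirect node file in the metadata directory can then be formed from the reference, for example with `snprintf(name, sizeof(name), "iNode%" PRIu32, reference)` using `<inttypes.h>`.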
Device identifier
The special permission data contains the device identifier. The device identifier can be stored in different formats, such as: “native”, “386bsd”, “4bsd”, “bsdos”, “freebsd”, “hpux”, “isc”, “linux”, “netbsd”, “osf1”, “sco”, “solaris”, “sunos”, “svr3”, “svr4” and “ultrix”.
The “native” and “hpux” device identifiers are 4 bytes in size and consist of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 1 | | Major device number |
| 1 | 2 | 0 | Unknown |
| 3 | 1 | | Minor device number |
The “386bsd”, “4bsd”, “freebsd”, “isc”, “linux”, “netbsd”, “sco”, “sunos”, “svr3” and “ultrix” device identifiers are 4 bytes in size and consist of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | 0 | Unknown |
| 2 | 1 | | Major device number |
| 3 | 1 | | Minor device number |
The “solaris” and “svr4” device identifiers are 4 bytes in size and consist of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0.0 | 18 bits | | Minor device number |
| 2.2 | 14 bits | | Major device number |
The “bsdos” and “osf1” device identifiers are 4 bytes in size and consist of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0.0 | 20 bits | | Minor device number |
| 2.4 | 12 bits | | Major device number |
The “bsdos” alternative device identifier is 4 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0.0 | 8 bits | | Sub unit number |
| 1.0 | 12 bits | | Unit number |
| 2.4 | 12 bits | | Major device number |
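The following C sketch decodes the major and minor device numbers for the formats above. It assumes the special permission data is read as a 32-bit big-endian integer, so that byte offset 0 is the most significant byte, and that the bit offsets in the “solaris”/“svr4” and “bsdos”/“osf1” tables count from the least significant bit. The enumeration and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical grouping of the device identifier formats. */
enum hfs_device_format {
    HFS_DEVICE_FORMAT_NATIVE,  /* "native", "hpux" */
    HFS_DEVICE_FORMAT_BSD16,   /* "386bsd", "4bsd", "freebsd", "linux", ... */
    HFS_DEVICE_FORMAT_SOLARIS, /* "solaris", "svr4" */
    HFS_DEVICE_FORMAT_BSDOS    /* "bsdos", "osf1" */
};

struct hfs_device_number {
    uint32_t major;
    uint32_t minor;
};

struct hfs_device_number hfs_decode_device_identifier(uint32_t value,
                                                      enum hfs_device_format format)
{
    struct hfs_device_number device = {0, 0};

    switch (format) {
    case HFS_DEVICE_FORMAT_NATIVE:
        device.major = (value >> 24) & 0xff; /* byte 0 */
        device.minor = value & 0xff;         /* byte 3 */
        break;
    case HFS_DEVICE_FORMAT_BSD16:
        device.major = (value >> 8) & 0xff;  /* byte 2 */
        device.minor = value & 0xff;         /* byte 3 */
        break;
    case HFS_DEVICE_FORMAT_SOLARIS:
        device.major = (value >> 18) & 0x3fff; /* upper 14 bits */
        device.minor = value & 0x3ffff;        /* lower 18 bits */
        break;
    case HFS_DEVICE_FORMAT_BSDOS:
        device.major = (value >> 20) & 0xfff;  /* upper 12 bits */
        device.minor = value & 0xfffff;        /* lower 20 bits */
        break;
    }
    return device;
}
```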
Symbolic Links
The data fork of a symbolic link contains the path of the directory or file it refers to.
On HFS+/HFSX the symbolic link target contains a POSIX pathname, as used by the Mac OS BSD and Cocoa programming interfaces, not a traditional Mac OS (Carbon) path.
The path is stored as a UTF-8 encoded string without an end-of-string character. The length of the path should be 1024 bytes or less. The path may be full or partial, with or without a leading forward slash.
The first 8 bytes of the file information should contain “slnkrhap”.
| Value | Identifier | Description |
|---|---|---|
| "slnk" | kSymLinkFileType | Symbolic link file type |
| "rhap" | kSymLinkCreator | Symbolic link file creator |
The resource fork of a symbolic link is reserved and should be 0 bytes in size.
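A minimal C sketch that recognizes a symbolic link file record and retrieves the size of its target path. It assumes a complete 248-byte HFS+/HFSX catalog file record, where the logical size of the data fork is the first (big-endian 64-bit) value of the data fork descriptor at offset 88. Treating an empty or over-long target as invalid is a choice of this sketch, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define HFS_MAXIMUM_SYMBOLIC_LINK_TARGET_SIZE 1024

static uint64_t read_uint64_be(const uint8_t *data)
{
    uint64_t value = 0;

    for (int index = 0; index < 8; index++) {
        value = (value << 8) | data[index];
    }
    return value;
}

bool hfsplus_get_symbolic_link_target_size(const uint8_t *file_record,
                                           uint64_t *target_size)
{
    uint64_t size = 0;

    if (file_record[0] != 0x00 || file_record[1] != 0x02) {
        return false; /* not a kHFSPlusFileRecord */
    }
    /* File type "slnk" followed by file creator "rhap" */
    if (memcmp(&file_record[48], "slnkrhap", 8) != 0) {
        return false;
    }
    size = read_uint64_be(&file_record[88]); /* data fork size */
    if (size == 0 || size > HFS_MAXIMUM_SYMBOLIC_LINK_TARGET_SIZE) {
        return false;
    }
    *target_size = size;
    return true;
}
```

The target itself is then read from the data fork, and is not terminated by an end-of-string character.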
The catalog file
The catalog file is a B-tree file used to maintain information about the hierarchy of files and directories of a volume.
The block number of the first file extent of the catalog file (the header node) is stored in the master directory block (HFS) or the volume header (HFS+). The B-tree structure is described in section: B-tree files.
Each file and directory (catalog node) in the catalog file is assigned a unique catalog node identifier (CNID). The CNID is used for both directory and file identifiers. For any given file or directory the parent identifier is the CNID of the parent directory. The first 16 CNIDs are reserved for use by Apple and include the following standard assignments:
| CNID | Identifier | Assignment |
|---|---|---|
| 0 | | Unknown (Reserved) |
| 1 | kHFSRootParentID | Parent identifier of the root directory (folder) |
| 2 | kHFSRootFolderID | Directory identifier of the root directory (folder) |
| 3 | kHFSExtentsFileID | Extents overflow file |
| 4 | kHFSCatalogFileID | Catalog file |
| 5 | kHFSBadBlockFileID | Bad allocation block file |
| 6 | kHFSAllocationFileID | Allocation file (HFS+) |
| 7 | kHFSStartupFileID | Startup file (HFS+) |
| 8 | kHFSAttributesFileID | Attributes file (HFS+) |
| 14 | kHFSRepairCatalogFileID | Used temporarily by fsck_hfs when rebuilding the catalog file |
| 15 | kHFSBogusExtentFileID | Bogus extent file, which is used temporarily during exchange files operations |
| 16 | kHFSFirstUserCatalogNodeID | First available CNID for user's files and folders |
Catalog file keys
In a catalog file a key consists of:
- parent directory identifier
- (optional) file or directory name
The volume reference number is not included in the search key.
Text encoding hint
| Encoding type | Value | Encodings bitmap number |
|---|---|---|
| MacRoman | 0 | 0 |
| MacJapanese | 1 | 1 |
| MacChineseTrad | 2 | 2 |
| MacKorean | 3 | 3 |
| MacArabic | 4 | 4 |
| MacHebrew | 5 | 5 |
| MacGreek | 6 | 6 |
| MacCyrillic | 7 | 7 |
| MacDevanagari | 9 | 9 |
| MacGurmukhi | 10 | 10 |
| MacGujarati | 11 | 11 |
| MacOriya | 12 | 12 |
| MacBengali | 13 | 13 |
| MacTamil | 14 | 14 |
| MacTelugu | 15 | 15 |
| MacKannada | 16 | 16 |
| MacMalayalam | 17 | 17 |
| MacSinhalese | 18 | 18 |
| MacBurmese | 19 | 19 |
| MacKhmer | 20 | 20 |
| MacThai | 21 | 21 |
| MacLaotian | 22 | 22 |
| MacGeorgian | 23 | 23 |
| MacArmenian | 24 | 24 |
| MacChineseSimp | 25 | 25 |
| MacTibetan | 26 | 26 |
| MacMongolian | 27 | 27 |
| MacEthiopic | 28 | 28 |
| MacCentralEurRoman | 29 | 29 |
| MacVietnamese | 30 | 30 |
| MacExtArabic | 31 | 31 |
| MacSymbol | 33 | 33 |
| MacDingbats | 34 | 34 |
| MacTurkish | 35 | 35 |
| MacCroatian | 36 | 36 |
| MacIcelandic | 37 | 37 |
| MacRomanian | 38 | 38 |
| MacFarsi | 140 | 49 |
| MacUkrainian | 152 | 48 |
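The mapping from text encoding hint to encodings bitmap number can be sketched in C as follows. The sketch assumes that, apart from MacFarsi and MacUkrainian, any hint below 64 maps to the bit with the same number, as in the table; the function names are hypothetical.

```c
#include <stdint.h>

/* Returns the encodings bitmap number of a text encoding hint,
   or -1 if the hint cannot be represented in the 64-bit bitmap. */
int hfs_encodings_bitmap_number(uint32_t text_encoding_hint)
{
    switch (text_encoding_hint) {
    case 140:
        return 49; /* MacFarsi */
    case 152:
        return 48; /* MacUkrainian */
    default:
        break;
    }
    if (text_encoding_hint < 64) {
        return (int) text_encoding_hint;
    }
    return -1;
}

/* Marks the encoding as used in the encodings bitmap of the volume header. */
uint64_t hfs_mark_encoding_used(uint64_t encodings_bitmap, uint32_t text_encoding_hint)
{
    int number = hfs_encodings_bitmap_number(text_encoding_hint);

    if (number >= 0) {
        encodings_bitmap |= (uint64_t) 1 << number;
    }
    return encodings_bitmap;
}
```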
HFS catalog key
The HFS catalog key is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 1 | | Key data size, in bytes, which consists of a signed 8-bit integer |
| If key data size >= 6 | | | |
| 1 | 1 | | Unknown (Reserved) |
| 2 | 4 | | Parent identifier (CNID) |
| 6 | 1 | | Name size without the end-of-string character |
| 7 | ... | | Name string, which contains a narrow character string without end-of-string character |
| ... | ... | | Unknown (Alignment padding) |
Note that a key data size of 0 indicates a record that is no longer in use.
In an index node the catalog node name is always stored as 32 bytes (a 1-byte name size followed by 31 bytes for the name string), therefore the key data size within an index node should be the maximum of 37. In a leaf node the catalog node name varies in size.
Keys in a leaf node must be stored 16-bit aligned within the node data. The size of the alignment padding is not included in the key data size.
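A C sketch of parsing an HFS catalog key and computing how many bytes it occupies in a leaf node. It assumes the key starts at a 16-bit aligned offset, so that padding the key to an even size keeps the record data aligned, and treats the key data size as unsigned since it never exceeds 37. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t parent_identifier;
    uint8_t name_size;
    const uint8_t *name; /* narrow character string, not terminated */
} hfs_catalog_key;

bool hfs_parse_catalog_key(const uint8_t *data, size_t data_size,
                           hfs_catalog_key *key)
{
    uint8_t key_data_size = 0;

    if (data_size < 1) {
        return false;
    }
    key_data_size = data[0];
    if (key_data_size < 6 || (size_t) key_data_size + 1 > data_size) {
        return false; /* unused record or truncated key */
    }
    key->parent_identifier = ((uint32_t) data[2] << 24) | ((uint32_t) data[3] << 16) |
                             ((uint32_t) data[4] << 8) | (uint32_t) data[5];
    key->name_size = data[6];
    if ((size_t) key->name_size > (size_t) key_data_size - 6) {
        return false; /* name does not fit in the key data */
    }
    key->name = &data[7];
    return true;
}

/* Bytes occupied in a leaf node: the key data size byte, the key data
   and the alignment padding, which is not part of the key data size. */
size_t hfs_catalog_leaf_key_size(const uint8_t *data)
{
    size_t key_size = 1 + (size_t) data[0];

    return (key_size + 1) & ~(size_t) 1;
}
```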
HFS+ and HFSX catalog key
The HFS+ and HFSX catalog key is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | | Key data size, in bytes |
| If key data size >= 4 | | | |
| 2 | 4 | | Parent identifier, which contains a CNID |
| If key data size >= 6 | | | |
| 6 | 2 | | Number of characters in the name string |
| 8 | ... | | Name string, which contains a UTF-16 big-endian string without end-of-string character |
Note that the characters ‘:’ and U+2400 are stored as ‘/’ and U+0000 respectively and must be converted before comparison.
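A C sketch of reading an HFS+/HFSX catalog key and converting the stored name characters back, mapping a stored ‘/’ to ‘:’ and a stored U+0000 to U+2400. It assumes the caller provides room for 255 UTF-16 code units and requires a key data size of at least 6; thread record keys produce an empty name. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t read_uint16_be(const uint8_t *data)
{
    return (uint16_t) ((data[0] << 8) | data[1]);
}

bool hfsplus_read_catalog_key(const uint8_t *data, size_t data_size,
                              uint32_t *parent_identifier,
                              uint16_t *name, uint16_t *name_length)
{
    size_t key_data_size = 0;
    size_t number_of_characters = 0;

    if (data_size < 2) {
        return false;
    }
    key_data_size = read_uint16_be(&data[0]);
    if (key_data_size < 6 || key_data_size + 2 > data_size) {
        return false;
    }
    *parent_identifier = ((uint32_t) read_uint16_be(&data[2]) << 16) |
                         read_uint16_be(&data[4]);
    number_of_characters = read_uint16_be(&data[6]);
    if (number_of_characters > 255 ||
        8 + (number_of_characters * 2) > key_data_size + 2) {
        return false;
    }
    for (size_t index = 0; index < number_of_characters; index++) {
        uint16_t character = read_uint16_be(&data[8 + (index * 2)]);

        if (character == 0x002f) {        /* stored '/' represents ':' */
            character = 0x003a;
        } else if (character == 0x0000) { /* stored U+0000 represents U+2400 */
            character = 0x2400;
        }
        name[index] = character;
    }
    *name_length = (uint16_t) number_of_characters;
    return true;
}
```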
The catalog data
A catalog leaf node can contain four different types of records:
- a folder record, which contains information about a single directory.
- a file record, which contains information about a single file.
- a folder thread record, which provides a link between a directory and its parent directory.
- a file thread record, which provides a link between a file and its parent directory.
The thread records are used to find the name and directory identifier of the parent of a given file or directory.
Each catalog data record consists of:
- the catalog data record header;
- the catalog data record data.
The catalog data record header
HFS catalog data record header
The HFS catalog data record header is 2 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 1 | | Record type, which consists of a signed 8-bit integer |
| 1 | 1 | 0x00 | Unknown (Reserved), which consists of a signed 8-bit integer |
Note that to distinguish between HFS and HFS+ record types, record type should be treated as a 16-bit big-endian value.
HFS+ and HFSX catalog data record header
The HFS+ and HFSX catalog data record header is 2 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | | Record type |
The catalog data record types
| Value | Identifier | Description |
|---|---|---|
| 0x0001 | kHFSPlusFolderRecord | HFS+/HFSX Folder record |
| 0x0002 | kHFSPlusFileRecord | HFS+/HFSX File record |
| 0x0003 | kHFSPlusFolderThreadRecord | HFS+/HFSX Folder thread record |
| 0x0004 | kHFSPlusFileThreadRecord | HFS+/HFSX File thread record |
| 0x0100 | kHFSFolderRecord (or cdrDirRec) | HFS Folder record |
| 0x0200 | kHFSFileRecord (or cdrFilRec) | HFS File record |
| 0x0300 | kHFSFolderThreadRecord (or cdrThdRec) | HFS Folder thread record |
| 0x0400 | kHFSFileThreadRecord (or cdrFThdRec) | HFS File thread record |
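Reading the first 2 bytes as a 16-bit big-endian value distinguishes HFS from HFS+/HFSX record types, as sketched below in C. The enumeration and function names are hypothetical; the numeric values come from the table above.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    CATALOG_RECORD_UNKNOWN = 0,
    CATALOG_RECORD_FOLDER = 1,
    CATALOG_RECORD_FILE = 2,
    CATALOG_RECORD_FOLDER_THREAD = 3,
    CATALOG_RECORD_FILE_THREAD = 4
} catalog_record_kind;

catalog_record_kind catalog_record_get_kind(const uint8_t *record, bool *is_hfs_plus)
{
    uint16_t record_type = (uint16_t) ((record[0] << 8) | record[1]);

    /* HFS+/HFSX: 0x0001 to 0x0004 */
    if (record_type >= 0x0001 && record_type <= 0x0004) {
        *is_hfs_plus = true;
        return (catalog_record_kind) record_type;
    }
    /* HFS: 8-bit record type followed by a reserved 0x00 byte */
    if ((record_type & 0x00ff) == 0 && record_type >= 0x0100 && record_type <= 0x0400) {
        *is_hfs_plus = false;
        return (catalog_record_kind) (record_type >> 8);
    }
    return CATALOG_RECORD_UNKNOWN;
}
```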
The catalog folder record
HFS catalog folder record
The HFS catalog folder record (cdrDirRec, kHFSFolderRecord) is 70 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | 0x0100 | Record type |
| 2 | 2 | | Folder flags |
| 4 | 2 | | Number of directory entries (valence) |
| 6 | 4 | | Identifier (CNID) |
| 10 | 4 | | Creation time, which contains a HFS timestamp in local time |
| 14 | 4 | | (last) content modification time, which contains a HFS timestamp in local time |
| 18 | 4 | | (last) backup time, which contains a HFS timestamp in local time |
| 22 | 16 | | Folder information |
| 38 | 16 | | Extended folder information |
| 54 | 4 x 4 = 16 | | Unknown (Reserved), which consists of an array of 32-bit integer values |
HFS catalog folder record flags
Not defined. The HFS catalog folder record appears to always have a corresponding folder thread record.
HFS+ and HFSX catalog folder record
The HFS+ and HFSX catalog folder record (HFSPlusCatalogFolder) is 88 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | 0x0001 | Record type |
| 2 | 2 | | Flags |
| 4 | 4 | | Number of directory entries (valence) |
| 8 | 4 | | Identifier (CNID) |
| 12 | 4 | | Creation time, which contains a HFS timestamp in UTC |
| 16 | 4 | | (last) content modification time, which contains a HFS timestamp in UTC |
| 20 | 4 | | (last) record (or attribute) modification (or change) time, which contains a HFS timestamp in UTC |
| 24 | 4 | | (last) access time, which contains a HFS timestamp in UTC |
| 28 | 4 | | (last) backup time, which contains a HFS timestamp in UTC |
| Permissions | | | |
| 32 | 4 | | Owner identifier |
| 36 | 4 | | Group identifier |
| 40 | 1 | | Administration flags |
| 41 | 1 | | Owner flags |
| 42 | 2 | | File mode |
| 44 | 4 | | Special permission data |
| Folder information | | | |
| 48 | 16 | | Folder information |
| Extended folder information | | | |
| 64 | 16 | | Extended folder information |
| 80 | 4 | | Text encoding hint |
| 84 | 4 | 0x00 | Unknown (Reserved) |
The catalog file record
HFS catalog file record
The HFS catalog file record (cdrFilRec, kHFSFileRecord) is 102 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | 0x0200 | Record type |
| 2 | 1 | | Flags, which consists of a signed 8-bit integer |
| 3 | 1 | 0x00 | File type, which consists of a signed 8-bit integer and should contain 0 |
| 4 | 16 | | File information |
| 20 | 4 | | Identifier (CNID) |
| 24 | 2 | | Data fork block number |
| 26 | 4 | | Data fork size |
| 30 | 4 | | Data fork allocated size |
| 34 | 2 | | Resource fork block number |
| 36 | 4 | | Resource fork size |
| 40 | 4 | | Resource fork allocated size |
| 44 | 4 | | Creation time, which contains a HFS timestamp in local time |
| 48 | 4 | | (last) content modification time, which contains a HFS timestamp in local time |
| 52 | 4 | | (last) backup time, which contains a HFS timestamp in local time |
| 56 | 16 | | Extended file information |
| 72 | 2 | | Clump size |
| 74 | 12 | | Data fork extents record |
| 86 | 12 | | Resource fork extents record |
| 98 | 4 | 0x00 | Unknown (Reserved) |
TODO: determine if the data and resource fork block number values are used
HFS catalog file record flags
| Value | Identifier | Description |
|---|---|---|
| 0x0001 | | File is locked and cannot be written to |
| 0x0002 | | Has thread record |
| 0x0080 | kHFSHasDateAddedMask | Has added time |
HFS+ and HFSX catalog file record
The HFS+ and HFSX catalog file record (kHFSPlusFileRecord) is 248 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | 0x0002 | Record type |
| 2 | 2 | | Flags |
| 4 | 4 | 0x00 | Unknown (Reserved) |
| 8 | 4 | | Identifier (CNID) |
| 12 | 4 | | Creation time, which contains a HFS timestamp in UTC |
| 16 | 4 | | (last) content modification time, which contains a HFS timestamp in UTC |
| 20 | 4 | | (last) record (or attribute) modification time, which contains a HFS timestamp in UTC |
| 24 | 4 | | (last) access time, which contains a HFS timestamp in UTC |
| 28 | 4 | | (last) backup time, which contains a HFS timestamp in UTC |
| Permissions | | | |
| 32 | 4 | | Owner identifier |
| 36 | 4 | | Group identifier |
| 40 | 1 | | Administration flags |
| 41 | 1 | | Owner flags |
| 42 | 2 | | File mode |
| 44 | 4 | | Special permission data |
| File information | | | |
| 48 | 16 | | File information (or user information) |
| Extended file information | | | |
| 64 | 16 | | Extended file information (or finder information) |
| 80 | 4 | | Text encoding hint |
| 84 | 4 | 0x00 | Unknown (Reserved) |
| 88 | 80 | | Data fork descriptor |
| 168 | 80 | | Resource fork descriptor |
HFS+ catalog file record flags
| Value | Identifier | Description |
|---|---|---|
| 0x0001 | kHFSFileLockedMask | File is locked and cannot be written to |
| 0x0002 | kHFSThreadExistsMask | Has thread record, which should always be set for a file record on HFS+/HFSX |
| 0x0004 | kHFSHasAttributesMask | Has extended attributes |
| 0x0008 | kHFSHasSecurityMask | Has ACLs |
| 0x0010 | kHFSHasFolderCountMask | Has number of sub-folders |
| 0x0020 | kHFSHasLinkChainMask | Has a hard link target (link chain), where the CNID of the hard link target is stored in the special permission data |
| 0x0040 | kHFSHasChildLinkMask | Has a child that is a directory link |
| 0x0080 | kHFSHasDateAddedMask | Has added time, where the extended folder or file information contains the time the folder or file was added (date_added) |
| 0x0100 | kHFSFastDevPinnedMask | Unknown |
| 0x0200 | kHFSDoNotFastDevPinMask | Unknown |
| 0x0400 | kHFSFastDevCandidateMask | Unknown |
| 0x0800 | kHFSAutoCandidateMask | Unknown |
The catalog thread record
The file thread record is similar to the folder thread record except that it refers to a file, instead of a directory.
HFS catalog file thread record
The HFS catalog thread record (kHFSFolderThreadRecord (or cdrThdRec), kHFSFileThreadRecord (or cdrFThdRec)) is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | 0x0300 or 0x0400 | Record type |
| 2 | 2 x 4 = 8 | 0x00 | Unknown (Reserved), which consists of an array of 32-bit integer values |
| 10 | 4 | | Parent identifier (CNID) |
| 14 | 1 | | Number of characters in the name string, with a maximum of 31 |
| 15 | ... | | Name string, which contains a narrow character string without end-of-string character |
HFS+ and HFSX catalog file thread record
The HFS+ and HFSX catalog thread record (kHFSPlusFolderThreadRecord, kHFSPlusFileThreadRecord) is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | 0x0003 or 0x0004 | Record type |
| 2 | 2 | 0x00 | Unknown (Reserved), which consists of an unsigned 16-bit integer |
| 4 | 4 | | Parent identifier (CNID) |
| 8 | 2 | | Number of characters in the name string, with a maximum of 255 |
| 10 | ... | | Name string, which contains a UTF-16 big-endian string without end-of-string character |
Permissions
HFS+ maintains a basic access permissions record for each file and folder. These permissions are similar to basic Unix file permissions.
TODO: add note about permissions on HFS
Owner and group identifier
The owner identifier contains the Mac OS X user ID of the owner of the file or folder. Mac OS X versions prior to 10.3 treat user ID 99 as if it was the user ID of the user currently logged in to the console. If no user is logged in to the console, user ID 99 is treated as user ID 0 (root). Mac OS X version 10.3 treats user ID 99 as if it was the user ID of the process making the call (in effect, making it owned by everyone simultaneously). These substitutions happen at run-time. The actual user ID on disk is not changed.
The group identifier contains the Mac OS X group ID of the group associated with the file or folder. Mac OS X typically maps group ID 99 to the group named “unknown”. There is no run-time substitution of group IDs in Mac OS X.
Administration flags
| Value | Identifier | Description |
|---|---|---|
| 0x01 | SF_ARCHIVED | File has been archived |
| 0x02 | SF_IMMUTABLE | File is immutable and may not be changed |
| 0x04 | SF_APPEND | Writes to file may only append |
Owner flags
| Value | Identifier | Description |
|---|---|---|
| 0x01 | UF_NODUMP | Do not backup (dump) this file |
| 0x02 | UF_IMMUTABLE | File is immutable and may not be changed |
| 0x04 | UF_APPEND | Writes to file may only append |
| 0x08 | UF_OPAQUE | Directory is opaque |
File mode
| Value | Identifier | Description |
|---|---|---|
| 0xf000 (0170000) | S_IFMT | File type bitmask |
| 0x1000 (0010000) | S_IFIFO | Named pipe |
| 0x2000 (0020000) | S_IFCHR | Character-special file (Character device) |
| 0x4000 (0040000) | S_IFDIR | Directory |
| 0x6000 (0060000) | S_IFBLK | Block-special file (Block device) |
| 0x8000 (0100000) | S_IFREG | Regular file |
| 0xa000 (0120000) | S_IFLNK | Symbolic link |
| 0xc000 (0140000) | S_IFSOCK | Socket |
| 0xe000 (0160000) | S_IFWHT | Whiteout, which is a file entry that covers up all entries of a particular name from lower branches |
HFS+ uses the BSD file type and mode bits. Note that the values in the following table are in octal (base eight), not hexadecimal.
| Octal value | Identifier | Description |
|---|---|---|
| 0004000 | S_ISUID | Set user identifier on execution |
| 0002000 | S_ISGID | Set group identifier on execution |
| 0001000 | S_ISTXT | Sticky bit |
| 0000700 | S_IRWXU | Read, write and execute access for owner |
| 0000400 | S_IRUSR | Read access for owner |
| 0000200 | S_IWUSR | Write access for owner |
| 0000100 | S_IXUSR | Execute access for owner |
| 0000070 | S_IRWXG | Read, write and execute access for group |
| 0000040 | S_IRGRP | Read access for group |
| 0000020 | S_IWGRP | Write access for group |
| 0000010 | S_IXGRP | Execute access for group |
| 0000007 | S_IRWXO | Read, write and execute access for other |
| 0000004 | S_IROTH | Read access for other |
| 0000002 | S_IWOTH | Write access for other |
| 0000001 | S_IXOTH | Execute access for other |
Note that if the sticky bit is set for a directory, then Mac OS restricts movement, deletion and renaming of files in that directory. Files may be removed or renamed only if the user has write access to the directory and is the owner of the file or the directory, or is the super-user.
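To show how the file type and permission bits combine, the following C sketch renders a file mode in the style of `ls -l`. The function name is hypothetical, and the type letters, including ‘w’ for whiteout, follow the common BSD convention rather than anything stored on disk.

```c
#include <stdint.h>

/* Writes a 10-character mode string plus terminator into buffer. */
void hfsplus_file_mode_string(uint16_t file_mode, char buffer[11])
{
    static const char permissions[] = "rwxrwxrwx";

    switch (file_mode & 0xf000) { /* S_IFMT */
    case 0x1000: buffer[0] = 'p'; break; /* S_IFIFO */
    case 0x2000: buffer[0] = 'c'; break; /* S_IFCHR */
    case 0x4000: buffer[0] = 'd'; break; /* S_IFDIR */
    case 0x6000: buffer[0] = 'b'; break; /* S_IFBLK */
    case 0x8000: buffer[0] = '-'; break; /* S_IFREG */
    case 0xa000: buffer[0] = 'l'; break; /* S_IFLNK */
    case 0xc000: buffer[0] = 's'; break; /* S_IFSOCK */
    case 0xe000: buffer[0] = 'w'; break; /* S_IFWHT */
    default:     buffer[0] = '?'; break;
    }
    /* Bits 0400 (S_IRUSR) down to 0001 (S_IXOTH) */
    for (int index = 0; index < 9; index++) {
        buffer[1 + index] = (file_mode & (0400 >> index)) ? permissions[index] : '-';
    }
    /* S_ISUID, S_ISGID and S_ISTXT replace the execute positions */
    if (file_mode & 04000) {
        buffer[3] = (file_mode & 0100) ? 's' : 'S';
    }
    if (file_mode & 02000) {
        buffer[6] = (file_mode & 0010) ? 's' : 'S';
    }
    if (file_mode & 01000) {
        buffer[9] = (file_mode & 0001) ? 't' : 'T';
    }
    buffer[10] = '\0';
}
```

For example, a file mode of 0x41ed (octal 040755) renders as “drwxr-xr-x”.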
HFS+ file special permission data
The special permission data is used to store the following information:
- hard link reference (iNodeNum)
- number of (hard) links (linkCount) in indirect node files
- device numbers of block (S_IFBLK) and character (S_IFCHR) devices files
File system hierarchy
File and folder records have a search key with a non-empty name string. In thread records the name string in the search key is empty. E.g. to list the file entries in a directory:
- find all the file or folder records given the parent CNID
Finding a file or directory by its CNID is a two-step process:
- use the CNID to look up the thread record for the file or directory
- use the thread record to look up the file or folder record
File forks
Forks in HFS and HFS+ can be compared to data streams in NTFS. In HFS+ the fork values are grouped in a separate fork descriptor structure. HFS+ also defines extended attributes (named forks). These are not stored in the catalog file but in the attributes file.
HFS+ fork descriptor structure
HFS+ maintains information about file contents using the HFS+ fork descriptor structure (HFSPlusForkData).
The fork descriptor structure is 80 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | | Size, in bytes |
| 8 | 4 | | Clump size, in bytes |
| 12 | 4 | | Number of blocks |
| 16 | 64 | | Data extents record |
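A C sketch of parsing the 80-byte fork descriptor (big-endian), including its 8 extent descriptors. The type and function names are hypothetical. Summing the extent block counts shows whether the fork has additional extents: if the sum is less than the number of blocks of the fork, the remaining extents are expected in the extents overflow file.

```c
#include <stdint.h>

typedef struct {
    uint32_t start_block;
    uint32_t number_of_blocks;
} hfsplus_extent;

typedef struct {
    uint64_t size;
    uint32_t clump_size;
    uint32_t number_of_blocks;
    hfsplus_extent extents[8];
} hfsplus_fork_descriptor;

static uint32_t read_uint32_be(const uint8_t *data)
{
    return ((uint32_t) data[0] << 24) | ((uint32_t) data[1] << 16) |
           ((uint32_t) data[2] << 8) | (uint32_t) data[3];
}

void hfsplus_parse_fork_descriptor(const uint8_t *data, hfsplus_fork_descriptor *fork)
{
    fork->size = ((uint64_t) read_uint32_be(&data[0]) << 32) | read_uint32_be(&data[4]);
    fork->clump_size = read_uint32_be(&data[8]);
    fork->number_of_blocks = read_uint32_be(&data[12]);

    /* Extents record: 8 descriptors of 8 bytes each, starting at offset 16 */
    for (int index = 0; index < 8; index++) {
        fork->extents[index].start_block = read_uint32_be(&data[16 + (index * 8)]);
        fork->extents[index].number_of_blocks = read_uint32_be(&data[20 + (index * 8)]);
    }
}

/* Number of blocks covered by the extents stored in the fork descriptor. */
uint32_t hfsplus_fork_blocks_in_descriptor(const hfsplus_fork_descriptor *fork)
{
    uint32_t number_of_blocks = 0;

    for (int index = 0; index < 8; index++) {
        number_of_blocks += fork->extents[index].number_of_blocks;
    }
    return number_of_blocks;
}
```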
The extents overflow file
In HFS and HFS+, extents (contiguous ranges of blocks) are used to track which blocks belong to a file. The first three (HFS) or eight (HFS+) extents of each fork are stored in the catalog file. Additional extents are stored in the extents overflow file.
The structure of an extents overflow file is relatively simple compared to that of a catalog file. The function of the extents overflow file is to store those file extents that are not contained in the master directory block (MDB), the volume header or the catalog file.
Note that the file system B-tree files can have additional extents in the extents overflow file. This has been observed with the attributes file. It is currently unknown if the extents (overflow) file itself can have overflow extents.
The extents overflow key (record)
Disks initialized using the enhanced Disk Initialization Manager introduced in system software version might contain extent records for some blocks that do not belong to any actual file in the file system. These extent records are assigned to the bad block file (CNID 5). See the chapter “Disk Initialization Manager” in Inside Macintosh: Files for details on bad block sparing.
The key has been selected so that the extent records for a particular fork are grouped together in the B-tree, right next to all the extent records for the other fork of the file. The fork offset of the preceding extent record is needed to determine the key of the next extent record.
In an extents overflow file the search key consists of:
- fork type
- file identifier
- first block in the extent
HFS extents overflow key (record)
The HFS extents overflow key (record) is 8 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 1 | 7 | Key data size, in bytes, which consists of a signed 8-bit integer |
| 1 | 1 | | Fork type, which consists of a signed 8-bit integer |
| 2 | 4 | | File identifier (CNID) |
| 6 | 2 | | Logical block number |
The first 3 extents in a fork are held in its catalog file record and each HFS extents record holds 3 extents. So the number of extent records for a fork with more than 3 extents is:
(number_of_extents - 3 + 2) / 3
HFS+ and HFSX extents overflow key (record)
The HFS+ and HFSX extents overflow key (record) is 12 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | 10 | Key data size, in bytes, which consists of an unsigned 16-bit integer |
| 2 | 1 | | Fork type, which consists of a signed 8-bit integer |
| 3 | 1 | 0x00 | Unknown (Padding) |
| 4 | 4 | | File identifier (CNID) |
| 8 | 4 | | Logical block number |
The first 8 extents in a fork are held in its catalog file record. So the number of extent records for a fork is:
(number_of_extents - 8 + 7) / 8
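Both calculations are a rounded-up division of the extents that do not fit in the catalog file record, since an HFS extents record holds 3 extent descriptors and an HFS+/HFSX extents record holds 8. A C sketch with a hypothetical function name:

```c
#include <stdint.h>

/* extents_per_record is 3 for HFS and 8 for HFS+/HFSX; the catalog file
   record holds the same number of extents as one extents record. */
uint32_t hfs_number_of_overflow_records(uint32_t number_of_extents,
                                        uint32_t extents_per_record)
{
    if (number_of_extents <= extents_per_record) {
        return 0; /* all extents fit in the catalog file record */
    }
    return (number_of_extents - extents_per_record + extents_per_record - 1) /
           extents_per_record;
}
```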
HFS fork types
| Value | Identifier | Description |
|---|---|---|
| -1 (0xff) | | Resource fork |
| 0 (0x00) | | Data fork |
The extent (data) record
An extent is a contiguous range of blocks that have been allocated to an individual file. An extent is represented by an extent descriptor.
HFS extents record
The HFS extents record (HFSExtentRecord) is 12 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 3 x 4 = 12 | | Array of HFS extent descriptors |
HFS extent descriptor
The HFS extent descriptor (HFSExtentDescriptor) is 4 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | | Physical block number, which contains a block number relative to the start of the data area |
| 2 | 2 | | Number of blocks |
extent_offset = (data_area_block_number * 512) + (extent_block_number * block_size)
An unused extent descriptor should have both the block number and number of blocks set to 0.
HFS+ and HFSX extents record
The HFS+ and HFSX extents record (HFSPlusExtentRecord) is 64 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 x 8 = 64 | | Array of HFS+ extent descriptors |
HFS+ and HFSX extent descriptor
The HFS+ and HFSX extent descriptor (HFSPlusExtentDescriptor) is 8 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Physical block number, which contains a block number relative to the start of the volume |
| 4 | 4 | | Number of blocks |
extent_offset = extent_block_number * block_size
An unused extent descriptor should have both the block number and number of blocks set to 0.
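The two offset calculations can be sketched in C as follows. The function names are hypothetical, and the HFS variant assumes that the data area block number from the MDB (drAlBlSt) is expressed in 512-byte units, as described in TN1150, while the extent block number is in units of the volume block size.

```c
#include <stdint.h>

/* HFS: extent blocks are relative to the start of the data area. */
uint64_t hfs_extent_offset(uint16_t data_area_block_number,
                           uint16_t extent_block_number, uint32_t block_size)
{
    return ((uint64_t) data_area_block_number * 512) +
           ((uint64_t) extent_block_number * block_size);
}

/* HFS+/HFSX: extent blocks are relative to the start of the volume. */
uint64_t hfsplus_extent_offset(uint32_t extent_block_number, uint32_t block_size)
{
    return (uint64_t) extent_block_number * block_size;
}
```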
Bad Block File
The extents overflow file is also used to hold information about bad blocks; this is referred to as the bad block file. The bad block file is used to mark areas on the disk as bad and unusable for storing data, typically to map out bad sectors on the storage medium.
Typically, blocks are larger than sectors. If a single sector is found to be bad, the entire block is unusable. The bad block file is sometimes used to mark blocks as unusable when they are not bad, e.g. in the HFS wrapper.
Bad block extent records are always assumed to reference the data fork (fork type of 0).
Allocation (bitmap) file
The allocation file is used to keep track of whether each block in a volume is currently allocated to some file system structure or not. The content of the allocation file is a bitmap, which contains one bit for each block in the volume.
- If a bit is set, the corresponding block is currently in use by some file system structure.
- If a bit is clear, the corresponding block is not currently in use, and is available for allocation.
The size of the allocation file depends on the number of blocks in the volume, which in turn depends both on the size of the disk and on the size of the volume’s blocks. For example, a volume on a 1 GiB disk with a block size of 4 KiB has 262144 blocks and therefore needs an allocation file of 256 Kbits (32 KiB, or 8 blocks). Since the allocation file itself is allocated using blocks, it always occupies an integral number of blocks (its size may be rounded up).
The allocation file may be larger than the minimum number of bits required for the given volume size. Any unused bits in the bitmap must be set to 0.
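The sizing arithmetic above can be sketched as follows. The function name `allocation_file_size` is hypothetical, and counting only whole blocks of the volume is an assumption of this sketch.

```c
#include <stdint.h>

/* Minimum allocation file size, in bytes, for a volume: one bit per block,
 * rounded up to whole bytes and then to whole blocks, since the allocation
 * file itself is allocated in blocks. Assumes block_size is non-zero and
 * that only whole blocks of the volume are counted. */
static uint64_t allocation_file_size(uint64_t volume_size, uint32_t block_size)
{
    uint64_t number_of_blocks = volume_size / block_size;
    uint64_t bitmap_size = (number_of_blocks + 7) / 8;

    return ((bitmap_size + block_size - 1) / block_size) * block_size;
}
```

For a 1 GiB volume with 4096-byte blocks this yields 32768 bytes, or 8 blocks; a real volume may still have a larger allocation file, as noted above.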
Each byte in the allocation file holds the state of eight blocks. The byte at offset N into the file contains the allocation state of blocks (N x 8) through (N x 8 + 7). Within each byte, the most significant bit holds information about the block with the lowest number and the least significant bit holds information about the block with the highest number. The following code shows how to test whether a block is in use, assuming the entire allocation file has been read into memory.
Determining whether a block is in use.
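#include <stdbool.h>
#include <stdint.h>
/* Returns true if the allocation block is in use; allocationFileContents
 * must contain the entire allocation file. The most significant bit of
 * each byte corresponds to the lowest block number. */
static bool IsAllocationBlockUsed(uint32_t thisAllocationBlock,
    const uint8_t *allocationFileContents)
{
    uint8_t thisByte;
    thisByte = allocationFileContents[thisAllocationBlock / 8];
    return (thisByte & (1 << (7 - (thisAllocationBlock % 8)))) != 0;
}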
Attributes file
The attributes file is a B-tree file used to store extended attributes.
The location of the attributes file can be found in the HFS+ and HFSX volume header.
Attributes file keys
An attributes file key is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | | Key data size, in bytes |
| If key data size >= 12 | |||
| 2 | 2 | | Unknown |
| 4 | 4 | | Identifier (CNID) |
| 8 | 4 | | Unknown |
| 12 | 2 | | Number of characters in the name string |
| 14 | ... | | Name string, which contains a UTF-16 big-endian string without end-of-string character |
Note that the name of an extended attribute appears to be case-sensitive, even on a case-insensitive file system.
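A minimal sketch of parsing an attributes file key per the table above. The type `attributes_key_t` and function `parse_attributes_key` are hypothetical, and the assumption that the key data size excludes its own 2-byte field is inferred from the "key data size >= 12" condition rather than stated.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t read_be16(const uint8_t *data)
{
    return (uint16_t) ((data[0] << 8) | data[1]);
}

static uint32_t read_be32(const uint8_t *data)
{
    return ((uint32_t) data[0] << 24) | ((uint32_t) data[1] << 16) |
           ((uint32_t) data[2] << 8) | (uint32_t) data[3];
}

/* Hypothetical parsed form of an attributes file key. */
typedef struct {
    uint32_t identifier;      /* CNID */
    uint16_t name_length;     /* number of UTF-16 characters */
    const uint8_t *name_data; /* UTF-16 big-endian, not terminated */
} attributes_key_t;

/* Returns 0 on success, -1 if the key is truncated or has no name.
 * Assumption: the key data size excludes the 2-byte size field itself. */
static int parse_attributes_key(const uint8_t *data, size_t data_size,
                                attributes_key_t *key)
{
    if (data_size < 2) {
        return -1;
    }
    size_t key_data_size = read_be16(data);

    if (key_data_size < 12 || key_data_size + 2 > data_size) {
        return -1;
    }
    key->identifier = read_be32(&data[4]);
    key->name_length = read_be16(&data[12]);

    /* The name string must fit within the key data. */
    if (14 + (size_t) key->name_length * 2 > key_data_size + 2) {
        return -1;
    }
    key->name_data = &data[14];
    return 0;
}
```

Because names are compared case-sensitively, a lookup could compare the raw UTF-16 code units without case folding.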
The attributes file data
The attributes file defines the following types of attributes:
- Inline data attributes, which store the attribute data in the attribute record itself.
- Fork data attributes, which are used for attributes whose data is large. The attribute’s data is stored in extents on the volume and the attribute record merely contains a reference to those extents.
- Extension attributes, which are used to augment a fork descriptor, allowing a fork to have more than eight extents.
Attributes file data record header
Each attributes file data record starts with a type value, which describes the type of attribute data record.
The attributes file data record header is 4 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Record type |
The attributes data record types
| Value | Identifier | Description |
|---|---|---|
| 0x00000010 | kHFSPlusAttrInlineData | Attribute record with inline data |
| 0x00000020 | kHFSPlusAttrForkData | Attribute record with fork descriptor |
| 0x00000030 | kHFSPlusAttrExtents | Attribute record with extents overflow |
Note that at the moment it is unclear when an attribute record of type kHFSPlusAttrExtents is created and how it should be handled.
The inline data attribute record
The inline data attribute record is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | 0x00000010 | Record type |
| 4 | 4 | 0 | Unknown (Reserved) |
| 8 | 4 | | Unknown |
| 12 | 4 | | Attribute data size |
| 16 | ... | | Attribute data |
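A minimal sketch of locating the data of an inline data attribute record, checking the record type and the attribute data size against the record size. The function `get_inline_attribute_data` is hypothetical; the record type constants reuse the identifiers from the table above.

```c
#include <stddef.h>
#include <stdint.h>

#define kHFSPlusAttrInlineData 0x00000010
#define kHFSPlusAttrForkData   0x00000020
#define kHFSPlusAttrExtents    0x00000030

static uint32_t read_be32(const uint8_t *data)
{
    return ((uint32_t) data[0] << 24) | ((uint32_t) data[1] << 16) |
           ((uint32_t) data[2] << 8) | (uint32_t) data[3];
}

/* Returns a pointer to the attribute data of an inline data attribute
 * record and stores its size, or returns NULL if the record has another
 * type or the attribute data size exceeds the record. */
static const uint8_t *get_inline_attribute_data(const uint8_t *record,
                                                size_t record_size,
                                                uint32_t *data_size)
{
    if (record_size < 16 || read_be32(record) != kHFSPlusAttrInlineData) {
        return NULL;
    }
    uint32_t size = read_be32(&record[12]);

    if (size > record_size - 16) {
        return NULL;
    }
    *data_size = size;
    return &record[16];
}
```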
The fork descriptor attribute record
The fork descriptor attribute record is 88 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | 0x00000020 | Record type |
| 4 | 4 | 0 | Unknown (Reserved) |
| 8 | 80 | | Attribute fork descriptor |
The extents attribute record
The extents attribute record is 72 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | 0x00000030 | Record type |
| 4 | 4 | 0 | Unknown (Reserved) |
| 8 | 64 | | Attribute extents record |
Startup file
The startup file is a file system metadata file intended to hold information needed when booting a system that does not have built-in (ROM) support for HFS+ (or HFSX). A boot loader can find the startup file without full knowledge of the format using the first eight extents of the startup file located in the volume header.
Format-wise, it is valid for the startup file to have more than eight extents, but doing so defeats the purpose of the startup file.
Next allocation search
The next block number is used by Mac OS as a hint for where to start searching for available blocks when allocating space for a file.
Metadata zone and hot files
In Mac OS X 10.3 a metadata zone was introduced to store certain file system metadata, such as the allocation (bitmap) file, the extents overflow file, the catalog file and the journal file, as well as frequently used small files (also referred to as “hot files”), near each other to reduce seek time for typical accesses.
Hot File B-tree
The hot file B-tree is a file named “.hotfiles.btree” stored in the root directory.
Journal
A HFS+ (or HFSX) volume may have an optional journal to speed up recovery when mounting a volume that was not unmounted safely. The purpose of the journal is to ensure that when a group of related changes is being made, either all of those changes are made or none of them are. The journal makes it quick and easy to restore the volume structures to a consistent state, without having to scan all of the structures. The journal is used only for the volume structures and metadata; it does not protect the contents of a fork.
The volume header specifies if journalling is activated.
The journal data structures consist of:
- a journal information block, which contains the location and size of the journal header and journal buffer;
- a journal header, which describes which part of the journal buffer is active and contains transactions waiting to be committed;
- a journal buffer, which is a cyclic buffer that holds the file system metadata transactions.
On HFS+ volumes, the journal information block is stored as a file. The name of that file is “.journal_info_block” and it is stored in the volume’s root directory.
The journal header and journal buffer are stored together in a different file named “.journal”, also in the volume’s root directory. Each of these files is contiguous on disk and occupies exactly one extent.
The volume header contains the extent of the journal information block file. The journal information block contains the location of the journal file.
Journal information block
The journal information block describes where the journal header and journal buffer are stored. The journal information block is stored at the start of the block referred to by the volume header.
The journal information block is 180 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Journal flags |
| 4 | 8 x 4 = 32 | | Device signature |
| 36 | 8 | | Journal header offset |
| 44 | 8 | | Journal size, in bytes, which includes the size of the journal header and the journal buffer, but not the journal information block |
| 52 | 32 x 4 = 128 | 0x00 | Unknown (Reserved) |
Journal flags
The journal flags consist of the following values:
| Value(s) | Description |
|---|---|
| 0x00000001 | On volume, where the journal header offset is relative to the start of the volume |
| 0x00000002 | On other device, where the device signature identifies the device containing the journal and the journal header offset is relative to the start of the device |
| 0x00000004 | Needs initialization, which indicates that the journal contains no valid transactions and needs to be initialized |
Note that according to TN1150 journals stored on a separate device are not supported.
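A minimal sketch that reads the journal flags, journal header offset and journal size, and decides whether the journal can be used for replay. The names are hypothetical, and reading the journal information block as big-endian, like the other HFS+ structures, is an assumption this document does not state explicitly.

```c
#include <stdbool.h>
#include <stdint.h>

#define JOURNAL_ON_VOLUME       0x00000001
#define JOURNAL_ON_OTHER_DEVICE 0x00000002
#define JOURNAL_NEEDS_INIT      0x00000004

/* Hypothetical parsed form of the journal information block. */
typedef struct {
    uint32_t flags;
    uint64_t header_offset; /* relative to the start of the volume */
    uint64_t journal_size;  /* journal header and buffer, in bytes */
} journal_info_t;

static uint32_t read_be32(const uint8_t *data)
{
    return ((uint32_t) data[0] << 24) | ((uint32_t) data[1] << 16) |
           ((uint32_t) data[2] << 8) | (uint32_t) data[3];
}

static uint64_t read_be64(const uint8_t *data)
{
    return ((uint64_t) read_be32(data) << 32) | read_be32(&data[4]);
}

/* Parses the journal information block and returns true if the journal is
 * stored on this volume and may contain transactions to replay.
 * Assumption: the block is big-endian, like other HFS+ structures. */
static bool parse_journal_info_block(const uint8_t data[52], journal_info_t *info)
{
    info->flags = read_be32(data);
    info->header_offset = read_be64(&data[36]);
    info->journal_size = read_be64(&data[44]);

    if ((info->flags & JOURNAL_ON_OTHER_DEVICE) != 0 ||
        (info->flags & JOURNAL_ON_VOLUME) == 0) {
        return false; /* journals on other devices are not supported */
    }
    return (info->flags & JOURNAL_NEEDS_INIT) == 0;
}
```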
The journal header
The journal header is 44 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | "\x4a\x4e\x4c\x78" | Signature |
| 4 | 4 | "\x12\x34\x56\x78" | Byte order (or endian) signature |
| 8 | 8 | | First transaction start offset |
| 16 | 8 | | Next transaction start offset |
| 24 | 8 | | Journal size, in bytes, which includes the size of the journal header and buffer |
| 32 | 4 | | Journal block header size, in bytes, typically ranges from 4096 to 16384 |
| 36 | 4 | | Checksum |
| 40 | 4 | | Journal header size, in bytes, typically the size of one sector |
First and next transaction offset
The first transaction offset contains the offset in bytes from the start of the journal header to the start of the first (oldest) transaction.
The next transaction offset contains the offset in bytes from the start of the journal header to the end of the last (newest) transaction. Note that the next transaction offset may be less than the first transaction offset, which indicates that the transactions wrap around the end of the journal’s circular buffer. If both offsets are equal, the journal is empty and there are no transactions that need to be replayed.
Journal transactions
A single transaction is stored in the journal as several blocks. These blocks include both the data to be written and the location where that data is to be written. This is represented on the storage medium by a block list header, which describes the number and sizes of the blocks, immediately followed by the contents of those blocks.
Since block list headers are of limited size, a single transaction may consist of several block list headers and their associated block contents. If the next value in the first block information structure is non-zero, then the next block list header is a continuation of the same transaction.
The journal buffer is treated as a circular buffer. When reading or writing the journal buffer, the I/O operation must stop at the end of the journal buffer and resume (wrap around) immediately following the journal header. Block list headers or the contents of blocks may wrap around in this way. Only a portion of the journal buffer is active at any given time; this portion is indicated by the first and next transaction offsets in the journal header. The part of the journal buffer that is not active contains no meaningful data and must be ignored.
To prevent ambiguity when the first and next transaction offsets are equal, the journal is never allowed to be perfectly full (all of the journal buffer used by block lists and blocks). If the journal were perfectly full, and the first transaction offset were not equal to the journal header size, then the next transaction offset would be equal to the first transaction offset. It would then be impossible to differentiate between an empty and a full journal.
When the journal is not empty (contains transactions), it must be replayed to be sure the volume is consistent. That is, the data from each of the transactions must be written to the correct blocks on disk.
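The wrap-around rules above can be sketched as two helpers: one that computes how many bytes of the journal buffer are active, and one that copies bytes out of an in-memory copy of the journal while wrapping to just after the journal header. Both function names are hypothetical, and holding the whole journal in memory is a simplification of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of active bytes between the first and next transaction offsets;
 * 0 means the journal is empty. The buffer wraps to just after the
 * journal header, so header_size bytes are skipped when wrapping. */
static uint64_t journal_active_size(uint64_t first_offset, uint64_t next_offset,
                                    uint64_t journal_size, uint64_t header_size)
{
    if (next_offset >= first_offset) {
        return next_offset - first_offset;
    }
    return (journal_size - first_offset) + (next_offset - header_size);
}

/* Copies size bytes starting at offset (relative to the start of the
 * journal header), wrapping around at the end of the journal buffer.
 * Assumes header_size <= offset < journal_size and
 * size <= journal_size - header_size. */
static void journal_read_circular(const uint8_t *journal, uint64_t journal_size,
                                  uint64_t header_size, uint64_t offset,
                                  uint8_t *destination, size_t size)
{
    while (size > 0) {
        uint64_t available = journal_size - offset;
        size_t chunk = size < available ? size : (size_t) available;

        memcpy(destination, &journal[offset], chunk);
        destination += chunk;
        size -= chunk;
        offset += chunk;
        if (offset == journal_size) {
            offset = header_size; /* wrap around */
        }
    }
}
```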
The journal block list header
The block list header describes a list of blocks included in a transaction. A transaction may include several block lists if it modifies more blocks than can be represented in a single block list.
The journal block list header is 16 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | | Maximum number of journal blocks |
| 2 | 2 | | Number of journal blocks following the journal block header, typically 1 |
| 4 | 4 | | Block list size, in bytes, which includes the size of the header and blocks |
| 8 | 4 | | Checksum |
| 12 | 4 | 0x00 | Unknown (Alignment padding) |
| 16 | ... | | Journal block information array |
Note that the number of journal blocks includes the first journal block. The first journal block is reserved to be used when multiple block lists need to be chained, therefore the number of journal blocks that actually contain data is the number of journal blocks minus one.
Journal block information
The journal block information is 16 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | | Block sector number |
| 8 | 4 | | Block size, in bytes |
| 12 | 4 | | Next journal block |
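A minimal sketch of walking a block list header: it skips the reserved first entry, collects the entries that describe data and reports whether the next block list header continues the same transaction. The byte order is passed in, since the journal header carries a byte order signature; the names `journal_block_t` and `read_journal_block_list` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reads an unsigned integer of size bytes in the journal byte order. */
static uint64_t read_journal_uint(const uint8_t *data, size_t size, bool big_endian)
{
    uint64_t value = 0;

    for (size_t index = 0; index < size; index++) {
        size_t byte_index = big_endian ? index : size - 1 - index;
        value = (value << 8) | data[byte_index];
    }
    return value;
}

/* Hypothetical parsed form of a journal block information entry. */
typedef struct {
    uint64_t sector_number;
    uint32_t size;
} journal_block_t;

/* Collects the entries that describe data, skipping the first entry, which
 * is reserved for chaining. Sets *continues when the next block list header
 * continues the same transaction. Returns the number of entries stored, or
 * -1 if the block list header is truncated. */
static int read_journal_block_list(const uint8_t *data, size_t data_size,
                                   bool big_endian, journal_block_t *blocks,
                                   int maximum_blocks, bool *continues)
{
    if (data_size < 32) {
        return -1;
    }
    size_t number_of_blocks = (size_t) read_journal_uint(&data[2], 2, big_endian);

    if (number_of_blocks == 0 || 16 + number_of_blocks * 16 > data_size) {
        return -1;
    }
    /* Next journal block value of the first entry, at offset 16 + 12. */
    *continues = read_journal_uint(&data[28], 4, big_endian) != 0;

    int count = 0;
    for (size_t index = 1; index < number_of_blocks && count < maximum_blocks; index++) {
        const uint8_t *entry = &data[16 + index * 16];

        blocks[count].sector_number = read_journal_uint(entry, 8, big_endian);
        blocks[count].size = (uint32_t) read_journal_uint(&entry[8], 4, big_endian);
        count++;
    }
    return count;
}
```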
Journal checksum
The journal header and block list header both contain checksum values. The checksums are verified as part of a basic consistency check of these journal data structures. To verify the checksum, temporarily set the checksum field to 0 and then call the hfs_plus_calculate_checksum routine as specified below.
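#include <stddef.h>
#include <stdint.h>
/* Calculates the checksum of a journal header or block list header.
 * The checksum field in buffer must be set to 0 before calling. */
uint32_t hfs_plus_calculate_checksum(
          const uint8_t *buffer,
          size_t buffer_size )
{
	size_t buffer_offset = 0;
	uint32_t checksum    = 0;

	for( buffer_offset = 0;
	     buffer_offset < buffer_size;
	     buffer_offset++ )
	{
		checksum = ( checksum << 8 ) ^ ( checksum + buffer[ buffer_offset ] );
	}
	/* The stored checksum is the bitwise complement of the running value */
	return( ~checksum );
}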
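A minimal sketch of verifying a journal header checksum on a copy of the header, so the original data is left untouched. It detects the byte order from the byte order signature at offset 4 and reads the stored checksum at offset 36 accordingly. It assumes the checksum covers the complete 44-byte journal header; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t journal_checksum(const uint8_t *buffer, size_t buffer_size)
{
    uint32_t checksum = 0;

    for (size_t offset = 0; offset < buffer_size; offset++) {
        checksum = (checksum << 8) ^ (checksum + buffer[offset]);
    }
    return ~checksum;
}

/* Returns true if the stored checksum matches. Assumption: the checksum
 * covers all 44 bytes of the journal header. */
static bool journal_header_checksum_is_valid(const uint8_t header[44])
{
    static const uint8_t big_endian_signature[4] = {0x12, 0x34, 0x56, 0x78};
    static const uint8_t little_endian_signature[4] = {0x78, 0x56, 0x34, 0x12};
    bool big_endian;

    if (memcmp(&header[4], big_endian_signature, 4) == 0) {
        big_endian = true;
    } else if (memcmp(&header[4], little_endian_signature, 4) == 0) {
        big_endian = false;
    } else {
        return false; /* unknown byte order signature */
    }
    const uint8_t *field = &header[36];
    uint32_t stored = big_endian
        ? ((uint32_t) field[0] << 24) | ((uint32_t) field[1] << 16) |
          ((uint32_t) field[2] << 8) | field[3]
        : ((uint32_t) field[3] << 24) | ((uint32_t) field[2] << 16) |
          ((uint32_t) field[1] << 8) | field[0];

    /* Zero the checksum field on a copy before calculating. */
    uint8_t copy[44];
    memcpy(copy, header, sizeof(copy));
    memset(&copy[36], 0, 4);
    return journal_checksum(copy, sizeof(copy)) == stored;
}
```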
Application specific data structures
HFS, HFS+ and HFSX contain application specific data structures.
Finder information
The finder information in the master directory block (MDB) and volume header consists of an array of 32-bit values. This array contains information used by the Mac OS Finder and the system software boot process.
| Array entry | Description |
|---|---|
| 0 | Bootable system directory identifier (CNID), i.e. "System Folder" in Mac OS 8 or 9, or "/System/Library/CoreServices" in Mac OS X. Typically 3 or 5, is 0 if the volume is not bootable |
| 1 | Startup application parent identifier (CNID), i.e. "Finder". Is 0 if the volume is not bootable |
| 2 | Directory identifier (CNID) to display in Finder on mount, or 0 if none |
| 3 | Directory identifier (CNID) of a bootable Mac OS 8 or 9 System Folder, or 0 if none |
| 4 | Unknown (Reserved) |
| 5 | Directory identifier (CNID) of a bootable Mac OS X system, the "/System/Library/CoreServices" directory, or 0 if none |
| 6 and 7 | Mac OS X volume identifier, which consists of a 64-bit integer |
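A minimal sketch of reading the finder information array. It assumes the 8 entries are stored as big-endian 32-bit values, like other HFS structures, and that entry 6 holds the upper 32 bits of the volume identifier; the function names are hypothetical.

```c
#include <stdint.h>

/* Reads entry index (0 to 7) of the 32-byte finder information array. */
static uint32_t finder_info_entry(const uint8_t finder_info[32], int index)
{
    const uint8_t *data = &finder_info[index * 4];

    return ((uint32_t) data[0] << 24) | ((uint32_t) data[1] << 16) |
           ((uint32_t) data[2] << 8) | (uint32_t) data[3];
}

/* Combines entries 6 and 7 into the Mac OS X volume identifier.
 * Assumption: entry 6 holds the upper 32 bits (big-endian order). */
static uint64_t finder_info_volume_identifier(const uint8_t finder_info[32])
{
    return ((uint64_t) finder_info_entry(finder_info, 6) << 32) |
           finder_info_entry(finder_info, 7);
}
```

An entry 0 value of 0 would indicate a volume that is not bootable.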
File information
HFS file information
The HFS file information is 16 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 x 1 = 4 | | File type, which consists of an array of unsigned 8-bit integers |
| 4 | 4 x 1 = 4 | | File creator, which consists of an array of unsigned 8-bit integers |
| 8 | 2 | | Finder flags |
| 10 | 4 | | Location within the parent, which contains x and y-coordinate values. If set to {0, 0}, the Finder will place the item automatically |
| 14 | 2 | | File icon window, which contains the window in which the file's icon appears |
HFS extended file information
The HFS extended file information is 16 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | | Finder icon identifier |
| 2 | 3 x 2 = 6 | | Unknown (Reserved), which consists of an array of signed 16-bit integers |
| 8 | 1 | | Extended finder script code flags |
| 9 | 1 | | Extended finder flags |
| 10 | 2 | | Finder comment identifier, which consists of a signed 16-bit integer |
| 12 | 4 | | Put away folder identifier (CNID) |
HFS+ and HFSX file information
The HFS+ and HFSX file information (FileInfo) is 16 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 x 1 = 4 | | File type, which consists of an array of unsigned 8-bit integers |
| 4 | 4 x 1 = 4 | | File creator, which consists of an array of unsigned 8-bit integers |
| 8 | 2 | | Finder flags |
| 10 | 4 | | Location within the parent, which contains x and y-coordinate values. If set to {0, 0}, the Finder will place the item automatically |
| 14 | 2 | | Unknown (Reserved) |
HFS+ and HFSX extended file information
The HFS+ and HFSX extended file information (ExtendedFileInfo) is 16 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Unknown (Reserved) |
| If kHFSHasDateAddedMask is not set | |||
| 4 | 4 | | Unknown (Reserved) |
| If kHFSHasDateAddedMask is set | |||
| 4 | 4 | | Added time, which contains a POSIX timestamp in UTC |
| Common | |||
| 8 | 2 | | Extended finder flags |
| 10 | 2 | | Unknown (Reserved), which consists of a signed 16-bit integer |
| 12 | 4 | | Put away folder identifier (CNID) |
Folder information
HFS folder information
The HFS folder information is 16 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | | Window position and dimension (boundaries), which contains the top, left, bottom, right-coordinate values |
| 8 | 2 | | Finder flags |
| 10 | 4 | | Location within the parent, which contains x and y-coordinate values. If set to {0, 0}, the Finder will place the item automatically |
| 14 | 2 | | Folder view |
HFS extended folder information
The HFS extended folder information is 16 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Scroll position for icon view, which contains x and y-coordinate values |
| If kHFSHasDateAddedMask is not set | |||
| 4 | 4 | | Open folder identifier chain, which consists of a signed 32-bit integer |
| If kHFSHasDateAddedMask is set | |||
| 4 | 4 | | Added time, which contains a POSIX timestamp in UTC |
| Common | |||
| 8 | 1 | | Extended finder script code flags |
| 9 | 1 | | Extended finder flags |
| 10 | 2 | | Finder comment identifier, which consists of a signed 16-bit integer |
| 12 | 4 | | Put away folder identifier (CNID) |
HFS+ and HFSX folder information
The HFS+ and HFSX folder information is 16 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | | Window position and dimension (boundaries), which contains the top, left, bottom, right-coordinate values |
| 8 | 2 | | Finder flags |
| 10 | 4 | | Location within the parent, which contains x and y-coordinate values. If set to {0, 0}, the Finder will place the item automatically |
| 14 | 2 | | Unknown (Reserved) |
HFS+ and HFSX extended folder information
The HFS+ and HFSX extended folder information is 16 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Scroll position for icon view, which contains x and y-coordinate values |
| 4 | 4 | | Unknown (Reserved), which consists of a signed 32-bit integer |
| 8 | 2 | | Extended finder flags |
| 10 | 2 | | Unknown (Reserved), which consists of a signed 16-bit integer |
| 12 | 4 | | Put away folder identifier (CNID) |
Finder flags
The finder flags consist of the following values:
| Value(s) | Applies to | Description |
|---|---|---|
| 0x0001 | Files and folders | Is on desktop |
| 0x000e | Files and folders | Color |
| 0x0040 | Files | Is shared |
| 0x0080 | Files | Has no INITs |
| 0x0100 | Files | Has been inited |
| 0x0400 | Files and folders | Has custom icon |
| 0x0800 | Files | Is stationary |
| 0x1000 | Files and folders | Name locked |
| 0x2000 | Files | Has bundle |
| 0x4000 | Files and folders | Is invisible |
| 0x8000 | Files | Is alias |
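A minimal sketch of decoding finder flags. Since the color value 0x000e spans bits 1 to 3, it is a 3-bit field rather than a single flag and needs to be shifted down. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define FINDER_FLAG_IS_ON_DESKTOP   0x0001
#define FINDER_FLAG_COLOR_MASK      0x000e
#define FINDER_FLAG_HAS_CUSTOM_ICON 0x0400
#define FINDER_FLAG_IS_INVISIBLE    0x4000
#define FINDER_FLAG_IS_ALIAS        0x8000

/* The color occupies bits 1 to 3; shift it down to a value of 0 to 7. */
static unsigned int finder_flags_color(uint16_t finder_flags)
{
    return (finder_flags & FINDER_FLAG_COLOR_MASK) >> 1;
}

static bool finder_flags_is_invisible(uint16_t finder_flags)
{
    return (finder_flags & FINDER_FLAG_IS_INVISIBLE) != 0;
}
```

For example, finder flags 0x400c describe an invisible item with color value 6.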
Extended finder flags
The extended finder flags consist of the following values:
| Value(s) | Description |
|---|---|
| 0x0004 | Has routing information |
| 0x0100 | Has custom badge resource |
| 0x8000 | Extended flags are invalid, which indicates that the other extended flags should be ignored |
Notes
#include <stdint.h>
/* Mac OS integer types, mapped onto <stdint.h> types */
typedef int16_t SInt16;
typedef uint32_t UInt32;
struct Point {
    SInt16 v;
    SInt16 h;
};
typedef struct Point Point;
struct Rect {
    SInt16 top;
    SInt16 left;
    SInt16 bottom;
    SInt16 right;
};
typedef struct Rect Rect;
/* OSType is a 32-bit value made by packing four 1-byte characters
   together. */
typedef UInt32 FourCharCode;
typedef FourCharCode OSType;
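A minimal sketch of packing and unpacking an OSType, assuming the first character goes in the most significant byte, which matches the big-endian on-disk order of the file type and creator fields. The function names are hypothetical.

```c
#include <stdint.h>

/* Packs four 1-byte characters into an OSType, first character in the
 * most significant byte. */
static uint32_t make_os_type(const char characters[4])
{
    return ((uint32_t) (uint8_t) characters[0] << 24) |
           ((uint32_t) (uint8_t) characters[1] << 16) |
           ((uint32_t) (uint8_t) characters[2] << 8) |
           (uint32_t) (uint8_t) characters[3];
}

/* Unpacks an OSType into a NUL-terminated string for display. */
static void os_type_to_string(uint32_t os_type, char string[5])
{
    string[0] = (char) (os_type >> 24);
    string[1] = (char) (os_type >> 16);
    string[2] = (char) (os_type >> 8);
    string[3] = (char) os_type;
    string[4] = '\0';
}
```

For example, the file type "TEXT" packs into 0x54455854.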
File content
HFS supports multiple ways to store file content:
- Data fork
- Compressed data extended attribute
- Compressed data extended attribute with resource fork
- Resource fork
- Extended attribute (named fork)
Data fork
The file content size is stored in the data fork descriptor of the catalog file record.
The extents of the file content are stored in the fork descriptor and extents overflow file.
Compressed data extended attribute
The file has an attribute record with inline data named “com.apple.decmpfs” with compression method 3, 5 or 7.
The file content size is stored in the compressed data header of the extended attribute.
For compression method 3 or 7 the file content data is stored in the extended attribute after the decmpfs compressed data header.
For compression method 5 the file content data contains 0-byte values. There are 12 bytes stored after the decmpfs compressed data header that consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Unknown (Seen: 1) |
| 4 | 4 | | Unknown |
| 8 | 4 | | Unknown (Seen: 0) |
Compressed data extended attribute with resource fork
The file has an attribute record with inline data named “com.apple.decmpfs” with compression method 4 or 8.
The file content size is stored in the compressed data header of the extended attribute.
The file content data is stored in a “com.apple.ResourceFork” extended attribute.
The compressed data starts with metadata that contains the offsets of the compressed data blocks.
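The compression methods described above determine where the file content is stored. A minimal sketch of that mapping, which covers only the methods listed in this document and treats any other value as unsupported; the enum and function names are hypothetical.

```c
/* Hypothetical classification of decmpfs compression methods by where
 * the file content data is stored. */
typedef enum {
    DECMPFS_STORAGE_UNSUPPORTED = 0,
    DECMPFS_STORAGE_INLINE,       /* after the decmpfs header (3 and 7) */
    DECMPFS_STORAGE_ZERO_FILLED,  /* content consists of 0-byte values (5) */
    DECMPFS_STORAGE_RESOURCE_FORK /* "com.apple.ResourceFork" (4 and 8) */
} decmpfs_storage_t;

static decmpfs_storage_t decmpfs_storage_for_method(unsigned int method)
{
    switch (method) {
    case 3:
    case 7:
        return DECMPFS_STORAGE_INLINE;
    case 5:
        return DECMPFS_STORAGE_ZERO_FILLED;
    case 4:
    case 8:
        return DECMPFS_STORAGE_RESOURCE_FORK;
    default:
        return DECMPFS_STORAGE_UNSUPPORTED;
    }
}
```

In every case the file content size comes from the decmpfs compressed data header, not from the data fork descriptor.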
ZLIB (DEFLATE) compressed data
- ZLIB (DEFLATE) compressed header
- Unknown (empty values)
- ZLIB (DEFLATE) compressed data block offsets and sizes
- ZLIB (DEFLATE) compressed data blocks
- ZLIB (DEFLATE) compressed footer
ZLIB (DEFLATE) compressed header
The ZLIB (DEFLATE) compressed header is 16 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Compressed data block descriptors offset, where the offset is relative to the start of the ZLIB (DEFLATE) compressed data |
| 4 | 4 | | Compressed footer offset, where the offset is relative to the start of the ZLIB (DEFLATE) compressed data |
| 8 | 4 | | Compressed data block descriptors and data size |
| 12 | 4 | | Compressed footer size |
Note that the values in the ZLIB (DEFLATE) compressed header are stored in big-endian.
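A minimal sketch of reading the big-endian ZLIB (DEFLATE) compressed header and checking that the regions it describes lie within the compressed data. The struct and function names are hypothetical; the bounds checks are a defensive choice of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed form of the ZLIB (DEFLATE) compressed header. */
typedef struct {
    uint32_t descriptors_offset;
    uint32_t footer_offset;
    uint32_t descriptors_and_data_size;
    uint32_t footer_size;
} zlib_compressed_header_t;

static uint32_t read_be32(const uint8_t *data)
{
    return ((uint32_t) data[0] << 24) | ((uint32_t) data[1] << 16) |
           ((uint32_t) data[2] << 8) | (uint32_t) data[3];
}

/* Returns 0 on success, or -1 if the header is truncated or describes a
 * region beyond the end of the compressed data. */
static int parse_zlib_compressed_header(const uint8_t *data, size_t data_size,
                                        zlib_compressed_header_t *header)
{
    if (data_size < 16) {
        return -1;
    }
    header->descriptors_offset = read_be32(&data[0]);
    header->footer_offset = read_be32(&data[4]);
    header->descriptors_and_data_size = read_be32(&data[8]);
    header->footer_size = read_be32(&data[12]);

    if ((uint64_t) header->descriptors_offset + header->descriptors_and_data_size > data_size ||
        (uint64_t) header->footer_offset + header->footer_size > data_size) {
        return -1;
    }
    return 0;
}
```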
ZLIB (DEFLATE) compressed data block descriptors
The ZLIB (DEFLATE) compressed data block descriptors are of variable size and consist of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Compressed data size |
| 4 | 4 | | Number of compressed data block offset and size tuples |
| 8 | 8 x ... | | Array of compressed data block descriptors |
ZLIB (DEFLATE) compressed data block descriptor
The ZLIB (DEFLATE) compressed data block descriptor is 8 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Compressed block offset, where the offset is relative to the start of the ZLIB (DEFLATE) compressed data + 20 |
| 4 | 4 | | Compressed block size |
ZLIB (DEFLATE) compressed footer
The ZLIB (DEFLATE) compressed footer is 50 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 24 | | Unknown (empty values) |
| 24 | 2 | | Unknown |
| 26 | 2 | | Unknown |
| 28 | 2 | | Unknown |
| 30 | 2 | | Unknown |
| 32 | 4 | "cmpf" | Unknown (signature) |
| 36 | 4 | | Unknown |
| 40 | 4 | | Unknown |
| 44 | 6 | | Unknown (empty values) |
Note that the values in the ZLIB (DEFLATE) compressed footer are stored in big-endian.
LZVN compressed data
The LZVN compressed data is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 x ... | | Array of compressed data block offsets, where an offset is relative to the start of the LZVN compressed data |
| ... | ... | | LZVN compressed data blocks |
Note that the compressed data block contains a maximum of 65536 bytes of data. The compressed data block therefore should not exceed 65537 bytes in size.
Resource fork
The file content size is stored in the resource fork descriptor of the catalog file record.
The extents of the file content are stored in the fork descriptor and extents overflow file.
Extended attribute (named fork)
Extended attributes, also referred to as named forks, are stored in the HFS+ attributes file.
HFS wrapper
TODO: complete section
A HFSX volume cannot be wrapped in a HFS volume.
References
- hfs_format.h
- Data Organization on Volumes, by Apple Inc.
- Technical Note TN1150: HFS Plus Volume Format, by Apple Inc.
Macintosh File System (MFS)
The Macintosh File System (MFS) is the first file system created for Mac OS, intended for 400 KiB floppy disks.
Overview
A MFS file system consists of:
- optional boot block
- master directory block (MDB)
- file directory area
- data area
- optional backup (or alternate) master directory block (MDB)
The backup master directory block (MDB) is stored in the last 2 sectors of the volume.
Characteristics
| Characteristics | Description |
|---|---|
| Byte order | big-endian |
| Date and time values | TODO |
| Character strings | Narrow character (Single Byte Character (SBC) or Multi Byte Character (MBC)) stored using a system defined codepage |
Terminology
| Term | Description |
|---|---|
| Clump size | Size of the group of (allocation) blocks (or clump), in bytes, to avoid fragmentation |
Boot Block
If a volume is bootable, the first 2 blocks of the volume contain the boot block. The boot block consists of:
- boot block header
- boot code
- unknown (filler)
Boot Block Header
The boot block header is 138 or 144 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | "LK" (or "\x4c\x4b") | Boot block signature |
| 2 | 4 | | Boot code entry point |
| 6 | 1 | | Flags |
| 7 | 1 | | Format version |
| 8 | 2 | | Page flags (or Secondary Sound and Video Pages) |
| 10 | 1 | | System file name size, with a maximum of 15 |
| 11 | 15 | | System file name |
| 26 | 1 | | Finder (or shell) file name size, with a maximum of 15 |
| 27 | 15 | | Finder (or shell) file name, typically "Finder" |
| 42 | 1 | | Debugger file name size, with a maximum of 15 |
| 43 | 15 | | Debugger file name, typically "Macsbug" |
| 58 | 1 | | Disassembler (or second debugger) file name size, with a maximum of 15 |
| 59 | 15 | | Disassembler (or second debugger) file name, typically "Disassembler" |
| 74 | 1 | | Startup screen file name size, with a maximum of 15 |
| 75 | 15 | | Startup screen file name, typically "StartUpScreen" |
| 90 | 1 | | Startup (or bootup) file name size, with a maximum of 15 |
| 91 | 15 | | Startup (or bootup) file name, typically "Finder" |
| 106 | 1 | | Clipboard (or scrap) file name size, with a maximum of 15 |
| 107 | 15 | | Clipboard (or scrap) file name, typically "Clipboard" |
| 122 | 2 | | Number of allocated file control blocks (FCBs) |
| 124 | 2 | | Number of elements in the event queue, typically 20 |
| 126 | 4 | | System heap size on Macintosh computers with 128 KiB of RAM |
| 130 | 4 | | System heap size on Macintosh computers with 256 KiB of RAM |
| 134 | 4 | | System heap size on Macintosh computers with 512 KiB or more of RAM |
| Newer boot block header format | |||
| 138 | 4 | | Additional system heap space |
| 140 | 4 | | Fraction of available RAM for the system heap |
Note that “LK” presumably is short for “Larry Kenyon” who originally designed MFS.
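A minimal sketch of checking the boot block signature and reading one of the length-prefixed file names (a 1-byte size followed by a 15-byte name). The function names are hypothetical, and the caller is assumed to pass the name size field offset from the table, e.g. 26 for the Finder file name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static bool boot_block_has_signature(const uint8_t *header)
{
    return header[0] == 'L' && header[1] == 'K';
}

/* Bit 7 of the flags indicates the newer (144-byte) header format. */
static bool boot_block_is_newer_format(const uint8_t *header)
{
    return (header[6] & 0x80) != 0;
}

/* Copies the file name whose size field is at offset into a NUL-terminated
 * string. Returns false if the size exceeds the maximum of 15. */
static bool read_boot_block_name(const uint8_t *header, size_t offset,
                                 char name[16])
{
    uint8_t size = header[offset];

    if (size > 15) {
        return false;
    }
    memcpy(name, &header[offset + 1], size);
    name[size] = '\0';
    return true;
}
```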
Boot code entry point
The boot code entry point contains machine-language instructions that translate to:
BRA.S *+ 0x90
Or for older versions of the boot block header:
BRA.S *+ 0x88
BRA.W *+ 0x88
BRA $88(PC) * $6000,$0086
This instruction jumps to the main boot code following the boot block header.
This field is ignored, however, if bit 6 is clear in the high-order byte of the boot block version number or if the low-order byte contains 0x0d.
Boot Block Header Flags
| Bit(s) | Description |
|---|---|
| 0 - 4 | Unknown (Reserved), should contain 0 |
| 5 | Use relative system heap sizing |
| 6 | Execute boot code |
| 7 | Newer boot block header format is used |
If bit 7 of the flags is clear, bits 5 and 6 are ignored and the version number is stored in the format version value.
If the format version value is:
- less than 21, the system heap size values at offsets 126 and 130 (Macintosh computers with 128 KiB and 256 KiB of RAM) should be ignored and the system heap size at offset 134 should be used;
- 13, the boot code should be executed using the boot code entry point;
- greater than or equal to 21, the system heap size at offset 134 should be used.
If bit 7 of the flags is set:
- bit 6 determines whether to execute the boot code using the boot code entry point;
- bit 5 determines whether to use relative system heap sizing. If bit 5 is clear, the system heap size at offset 134 should be used. If bit 5 is set, the system heap is extended by the additional system heap space plus the fraction of available RAM for the system heap.
Master Directory Block (MDB)
The Master Directory Block (MDB) is located at offset 1024 of the volume and consists of:
- master directory block header
- block map
Master Directory Block (MDB) header
The Master Directory Block (MDB) header is 64 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | "\xd2\xd7" | Volume signature |
| 2 | 4 | | Creation date and time, which contains a HFS timestamp in local time |
| 6 | 4 | | Last modification date and time, which contains a HFS timestamp in local time |
| 10 | 2 | | Volume attribute flags |
| 12 | 2 | | Number of files in the root directory |
| 14 | 2 | | File directory area sector number, which contains a sector number relative to the start of the volume, where 0 is the first sector number |
| 16 | 2 | | File directory area size, in number of sectors |
| 18 | 2 | | Number of blocks |
| 20 | 4 | | Block size, in bytes, which must be a multiple of 512 |
| 24 | 4 | | Clump size, in bytes |
| 28 | 2 | | Data area sector number, which contains a sector number relative to the start of the volume, where 0 is the first sector number |
| 30 | 4 | | Next available file identifier |
| 34 | 2 | | Number of unused blocks |
| 36 | 1 | | Volume label size, with a maximum of 27 |
| 37 | 27 | | Volume label |
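A minimal sketch of parsing a few fields of the MDB header read from offset 1024 of the volume, including the signature, the block size constraint and the volume label. The struct and function names are hypothetical, and rejecting a block size of 0 is an extra assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parsed subset of the MFS MDB header. */
typedef struct {
    uint16_t number_of_blocks;
    uint32_t block_size;
    uint16_t data_area_sector;
    char volume_label[28];
} mfs_mdb_t;

static uint16_t read_be16(const uint8_t *data)
{
    return (uint16_t) ((data[0] << 8) | data[1]);
}

static uint32_t read_be32(const uint8_t *data)
{
    return ((uint32_t) data[0] << 24) | ((uint32_t) data[1] << 16) |
           ((uint32_t) data[2] << 8) | (uint32_t) data[3];
}

/* Returns false if the signature, block size or label size is invalid. */
static bool parse_mfs_mdb_header(const uint8_t data[64], mfs_mdb_t *mdb)
{
    if (data[0] != 0xd2 || data[1] != 0xd7) {
        return false;
    }
    mdb->number_of_blocks = read_be16(&data[18]);
    mdb->block_size = read_be32(&data[20]);
    mdb->data_area_sector = read_be16(&data[28]);

    /* The block size must be a (non-zero) multiple of 512. */
    if (mdb->block_size == 0 || mdb->block_size % 512 != 0) {
        return false;
    }
    uint8_t label_size = data[36];
    if (label_size > 27) {
        return false;
    }
    memcpy(mdb->volume_label, &data[37], label_size);
    mdb->volume_label[label_size] = '\0';
    return true;
}
```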
Block map
TODO: describe similar to FAT-12 block allocation table
File Directory Area
The file directory area consists of:
- one or more file directory entries, where an individual file directory entry does not span multiple blocks
File Directory Entry
A file directory entry is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 1 | | Flags, where 0x80 indicates the file directory entry is in use |
| 1 | 1 | 0 | Format version |
| 2 | 4 | "\x3f\x3f\x3f\x3f" | File type |
| 6 | 4 | | File creator |
| 10 | 2 | | Finder flags |
| 12 | 4 | | Location within the parent, which contains x and y-coordinate values |
| 16 | 2 | | Folder file identifier, where 0 represents the main volume, -2 the desktop, -3 the trash, otherwise, if positive, a file identifier |
| 18 | 4 | | File identifier |
| 22 | 2 | | Data fork block number, which contains 0 if the file entry has no data fork |
| 24 | 4 | | Data fork size, in bytes |
| 28 | 4 | | Data fork allocated size, in bytes |
| 32 | 2 | | Resource fork block number, which contains 0 if the file entry has no resource fork |
| 34 | 4 | | Resource fork size, in bytes |
| 38 | 4 | | Resource fork allocated size, in bytes |
| 42 | 4 | | Creation date and time, which contains a HFS timestamp in local time |
| 46 | 4 | | (Content) modification date and time, which contains a HFS timestamp in local time |
| 50 | 1 | | File name size, with a maximum of 255 |
| 51 | ... | | File name |
| ... | ... | | 16-bit alignment padding |
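Because the file name has a variable size followed by 16-bit alignment padding, the size of each entry has to be calculated to find the next one. A minimal sketch, assuming the padding is relative to the start of the entry; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the size of the file directory entry at data, including the
 * 16-bit alignment padding, or 0 if the entry is not in use (flag 0x80
 * clear) or does not fit in data_size bytes. */
static size_t mfs_directory_entry_size(const uint8_t *data, size_t data_size)
{
    if (data_size < 51 || (data[0] & 0x80) == 0) {
        return 0;
    }
    size_t size = 51 + (size_t) data[50]; /* fixed part plus file name */

    size += size % 2; /* pad to a 16-bit boundary */
    return size <= data_size ? size : 0;
}
```

Since an entry does not span multiple blocks, a reader could iterate entries within each block of the file directory area using this size.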
New Technologies File System (NTFS) format
The New Technologies File System (NTFS) format is the primary file system for Microsoft Windows versions that are based on Windows NT.
Overview
A New Technologies File System (NTFS) consists of:
- boot record
- boot loader
- Master File Table (MFT)
- Mirror Master File Table (MFT)
Characteristics
| Characteristics | Description |
|---|---|
| Byte order | little-endian |
| Date and time values | FILETIME in UTC |
| Character strings | UCS-2 little-endian, which allows for unpaired Unicode surrogates such as "U+d800" and "U+dc00" |
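The date and time and string characteristics can be illustrated with a minimal Python sketch: a FILETIME counts 100-nanosecond intervals since 1601-01-01 00:00:00 UTC, and decoding UCS-2 little-endian with the "surrogatepass" error handler keeps unpaired surrogates instead of rejecting them. The function names are illustrative. Python's datetime has microsecond precision, so the last FILETIME digit is truncated, and very large FILETIME values overflow datetime.
```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals since 1601-01-01 00:00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime):
    # Integer division drops the sub-microsecond part.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def decode_ucs2_string(data):
    # "surrogatepass" keeps unpaired surrogates such as U+d800.
    return data.decode("utf-16-le", errors="surrogatepass")
```
For example, the FILETIME 116444736000000000 corresponds to 1970-01-01 00:00:00 UTC.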
Versions
| Format version | Remarks |
|---|---|
| 1.0 | Introduced in Windows NT 3.1 |
| 1.1 | Introduced in Windows NT 3.5, also seen to be used by Windows NT 3.1 |
| 1.2 | Introduced in Windows NT 3.51 |
| 3.0 | Introduced in Windows 2000 |
| 3.1 | Introduced in Windows XP |
Note that the format versions mentioned above are the versions as used by NTFS. Another common versioning scheme uses the Windows version, e.g. NTFS 5.1 is the version of NTFS used on Windows XP, which is version 3.1 in the scheme mentioned above.
Windows does not necessarily use the latest format version, e.g. Windows 10 (1809) has been observed to use NTFS version 1.2 for a 64 KiB cluster block size.
Terminology
Cluster
NTFS refers to its file system blocks as clusters. Note that these are not the same as the physical clusters of a hard disk. For clarity this document will refer to these as cluster blocks. In other sources they are also referred to as logical clusters.
Typically a cluster block is 8 sectors (or 8 x 512 = 4096 bytes) in size. A cluster block number is relative to the start of the boot record.
Virtual cluster
The term virtual cluster refers to cluster blocks which are relative to the start of a data stream.
Long and short (file) name
In Windows terminology the name of a file (or directory) can either be short or long. The short name is an equivalent of the file name in the (DOS) 8.3 format. The long name is actually the (full) name of the file. The term long refers to the aspect that the name is longer than the short variant. Because most documentation refers to the (full) name as the long name, for clarity's sake so will this document.
Metadata files
NTFS uses the Master File Table (MFT) to store information about files and directories. The MFT entries reference the different volume and file system metadata. There are several predefined metadata files.
The following metadata files are predefined and use a fixed MFT entry number.
| MFT entry number | File name | Description |
|---|---|---|
| 0 | "$MFT" | Master File Table |
| 1 | "$MFTMirr" | Backup of the first 4 entries of the Master File Table |
| 2 | "$LogFile" | Metadata transaction journal |
| 3 | "$Volume" | Volume information |
| 4 | "$AttrDef" | MFT entry attribute definitions |
| 5 | "." | Root directory |
| 6 | "$Bitmap" | Cluster block allocation bitmap |
| 7 | "$Boot" | Boot record (or boot code) |
| 8 | "$BadClus" | Bad clusters |
| Used in NTFS version 1.2 and earlier | | |
| 9 | "$Quota" | Quota information |
| Used in NTFS version 3.0 and later | | |
| 9 | "$Secure" | Security and access control information |
| Common | | |
| 10 | "$UpCase" | Case folding mappings |
| 11 | "$Extend" | A directory containing extended metadata files |
| 12-15 | | Unknown (Reserved), which are marked as in-use but are empty |
| 16-23 | | Unused, which are marked as unused |
| Used in NTFS version 3.0 and later | | |
| 24 | "$Extend\$Quota" | Quota information |
| 25 | "$Extend\$ObjId" | Unique file identifiers for distributed link tracking |
| 26 | "$Extend\$Reparse" | Backreferences to reparse points |
| Transactional NTFS metadata files, which have been observed in Windows Vista and later | | |
| 27 | "$Extend\$RmMetadata" | Resource manager metadata directory |
| 28 | "$Extend\$RmMetadata\$Repair" | Repair information |
| 29 or 30 | "$Extend\$RmMetadata\$TxfLog" | Transactional NTFS (TxF) log metadata directory |
| 30 or 31 | "$Extend\$RmMetadata\$Txf" | Transactional NTFS (TxF) metadata directory |
| 31 or 32 | "$Extend\$RmMetadata\$TxfLog\$Tops" | TxF Old Page Stream (TOPS) file, which is used to store data that has been overwritten inside a currently active transaction |
| 32 or 33 | "$Extend\$RmMetadata\$TxfLog\$TxfLog.blf" | Transactional NTFS (TxF) base log metadata file |
| Observed in Windows 10 and later | | |
| 29 | "$Extend\$Deleted" | Temporary location for files that have an open handle but a request has been made to delete them |
| Common | | |
| ... | | A file or directory |
The following metadata files are predefined, however their MFT entry number is not fixed.
| MFT entry number | File name | Description |
|---|---|---|
| | "$Extend\$UsnJrnl" | USN change journal |
The boot record
The boot record is stored at the start of the volume (in the $Boot metadata file) and contains:
- the file system signature
- the BIOS parameter block
- the boot loader
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 3 | | Boot entry point |
| 3 | 8 | "NTFS\x20\x20\x20\x20" | File system signature (also known as OEM identifier or dummy identifier) |
| DOS version 2.0 BIOS parameter block (BPB) | | | |
| 11 | 2 | | Bytes per sector. Note that the following values are supported by mkntfs: 256, 512, 1024, 2048 and 4096 |
| 13 | 1 | | Number of sectors per cluster block |
| 14 | 2 | 0 | Unknown (Reserved Sectors), which is not used by NTFS and must be 0 |
| 16 | 1 | 0 | Number of cluster block allocation tables, which is not used by NTFS and must be 0 |
| 17 | 2 | 0 | Number of root directory entries, which is not used by NTFS and must be 0 |
| 19 | 2 | 0 | Number of sectors (16-bit), which is not used by NTFS and must be 0 |
| 21 | 1 | | Media descriptor |
| 22 | 2 | 0 | Cluster block allocation table size (16-bit) in number of sectors, which is not used by NTFS and must be 0 |
| DOS version 3.4 BIOS parameter block (BPB) | | | |
| 24 | 2 | 0x3f | Sectors per track, which is not used by NTFS |
| 26 | 2 | 0xff | Number of heads, which is not used by NTFS |
| 28 | 4 | 0x3f | Number of hidden sectors, which is not used by NTFS |
| 32 | 4 | 0x00 | Number of sectors (32-bit), which is not used by NTFS and must be 0 |
| NTFS version 8.0 BIOS parameter block (BPB) or extended BPB, which was introduced in Windows NT 3.1 | | | |
| 36 | 1 | 0x80 | Unknown (Disc unit number), which is not used by NTFS |
| 37 | 1 | 0x00 | Unknown (Flags), which is not used by NTFS |
| 38 | 1 | 0x80 | Unknown (BPB version signature byte), which is not used by NTFS |
| 39 | 1 | 0x00 | Unknown (Reserved), which is not used by NTFS |
| 40 | 8 | | Number of sectors (64-bit) |
| 48 | 8 | | Master File Table (MFT) cluster block number |
| 56 | 8 | | Mirror MFT cluster block number |
| 64 | 4 | | MFT entry size |
| 68 | 4 | | Index entry size |
| 72 | 8 | | Volume serial number |
| 80 | 4 | 0 | Checksum, which is not used by NTFS |
| Common | | | |
| 84 | 426 | | Boot code |
| 510 | 2 | "\x55\xaa" | The (boot) signature |
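The boot record fields above can be extracted with a minimal Python sketch such as the following. The function name and dictionary keys are illustrative; the sectors per cluster block, MFT entry size and index entry size values are returned in their encoded form, since decoding them is described in the following sections.
```python
import struct


def parse_ntfs_boot_record(data):
    if len(data) < 512:
        raise ValueError("boot record data too small")
    if data[3:11] != b"NTFS    ":
        raise ValueError("unsupported file system signature")
    if data[510:512] != b"\x55\xaa":
        raise ValueError("missing boot signature")
    bytes_per_sector, sectors_per_cluster_block = struct.unpack_from("<HB", data, 11)
    (number_of_sectors, mft_cluster_block_number,
     mirror_mft_cluster_block_number, mft_entry_size, index_entry_size,
     volume_serial_number) = struct.unpack_from("<QQQIIQ", data, 40)
    return {
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster_block": sectors_per_cluster_block,  # encoded value
        "media_descriptor": data[21],
        "number_of_sectors": number_of_sectors,
        "mft_cluster_block_number": mft_cluster_block_number,
        "mirror_mft_cluster_block_number": mirror_mft_cluster_block_number,
        "mft_entry_size": mft_entry_size,  # encoded value
        "index_entry_size": index_entry_size,  # encoded value
        "volume_serial_number": volume_serial_number,
    }
```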
Boot entry point
The boot entry point often contains a jump instruction to the boot code at offset 84 followed by a no-operation, e.g.
```
eb52 jmp 0x52
90   nop
```
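The 0x52 in the example is the 8-bit displacement of the short jump (opcode 0xeb), which is relative to the end of the 2-byte instruction, so the jump lands at 2 + 0x52 = 84 (0x54). A minimal Python sketch of that calculation, with an illustrative function name:
```python
def short_jump_target(boot_record):
    """Return the offset a short jump (0xeb rel8) at offset 0 transfers control to."""
    if boot_record[0] != 0xEB:
        raise ValueError("boot entry point does not start with a short jump")
    displacement = int.from_bytes(boot_record[1:2], "little", signed=True)
    # rel8 is relative to the offset following the 2-byte jump instruction.
    return 2 + displacement
```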
Number of sectors per cluster block
The number of sectors per cluster block value as used by mkntfs is defined as follows:
- Values 0 to 128 represent sizes of 0 to 128 sectors.
- Values 244 to 255 represent sizes of 2^(256-n) sectors.
- Other values are unknown.
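The encoding above can be decoded with a minimal Python sketch; the function name is illustrative, and values outside the two known ranges are rejected since their meaning is unknown.
```python
def decode_sectors_per_cluster_block(value):
    if 0 <= value <= 128:
        return value
    if 244 <= value <= 255:
        # For example 0xf4 (244) represents 2^12 = 4096 sectors.
        return 1 << (256 - value)
    raise ValueError(f"unsupported sectors per cluster block value: {value}")
```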
Cluster block size
The cluster block size can be determined as follows:
```
cluster block size = bytes per sector x sectors per cluster block
```
Different NTFS implementations support different cluster block sizes. Known supported cluster block sizes:
| Cluster block size | Bytes per sector | Supported by |
|---|---|---|
| 256 | 256 | mkntfs |
| 512 | 256 - 512 | mkntfs, ntfs3g, Windows |
| 1024 | 256 - 1024 | mkntfs, ntfs3g, Windows |
| 2048 | 256 - 2048 | mkntfs, ntfs3g, Windows |
| 4096 | 256 - 4096 | mkntfs, ntfs3g, Windows |
| 8192 | 256 - 4096 | mkntfs, ntfs3g, Windows |
| 16K (16384) | 256 - 4096 | mkntfs, ntfs3g, Windows |
| 32K (32768) | 256 - 4096 | mkntfs, ntfs3g, Windows |
| 64K (65536) | 256 - 4096 | mkntfs, ntfs3g, Windows |
| 128K (131072) | 256 - 4096 | mkntfs, ntfs3g, Windows 10 (1903) |
| 256K (262144) | 256 - 4096 | mkntfs, ntfs3g, Windows 10 (1903) |
| 512K (524288) | 256 - 4096 | mkntfs, ntfs3g, Windows 10 (1903) |
| 1M (1048576) | 256 - 4096 | mkntfs, ntfs3g, Windows 10 (1903) |
| 2M (2097152) | 512 - 4096 | mkntfs, ntfs3g, Windows 10 (1903) |
Note that Windows 10 (1903) requires the partition containing the NTFS file system to be aligned with the cluster block size. For example, for a cluster block size of 128 KiB the partition must be 128 KiB aligned. The default partition alignment appears to be 64 KiB.
mkntfs restricts the cluster block size to:
```
bytes_per_sector <= cluster_block_size <= 4096 * bytes_per_sector
```
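A minimal Python sketch that computes the cluster block size and checks it against the mkntfs restriction, taken here as bytes per sector <= cluster block size <= 4096 x bytes per sector. This matches the table above: with 256 bytes per sector the largest size is 1M, and 2M requires at least 512 bytes per sector. The function names are illustrative, and the check does not model the Windows limits.
```python
def cluster_block_size(bytes_per_sector, sectors_per_cluster_block):
    return bytes_per_sector * sectors_per_cluster_block


def is_cluster_block_size_supported_by_mkntfs(bytes_per_sector, size):
    # Restriction as observed for mkntfs; other implementations differ.
    return bytes_per_sector <= size <= 4096 * bytes_per_sector
```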
Master File Table (MFT) offset
The Master File Table (MFT) offset can be determined as follows:
```
mft_offset = boot_record_offset + (mft_cluster_block_number * cluster_block_size)
```
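As a worked example, a minimal Python version of this formula (the function name is illustrative) applied to a volume whose boot record starts at 1 MiB, with an MFT cluster block number of 786432 and a cluster block size of 4096 bytes, gives an MFT offset of 3222274048.
```python
def mft_offset(boot_record_offset, mft_cluster_block_number, cluster_block_size):
    # Cluster block numbers are relative to the start of the boot record.
    return boot_record_offset + mft_cluster_block_number * cluster_block_size
```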
The lower 32 bits of the NTFS volume serial number form the Windows API (WINAPI) volume serial number. This can be confirmed by comparing the output of:
```
fsutil fsinfo volumeinfo C:
fsutil fsinfo ntfsinfo C:
```
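A minimal Python sketch that derives the WINAPI volume serial number from the 64-bit NTFS value and formats it as XXXX-XXXX, the way the `vol` command displays it. The function name is illustrative.
```python
def winapi_volume_serial_number(ntfs_volume_serial_number):
    """Return the lower 32 bits formatted as XXXX-XXXX."""
    serial = ntfs_volume_serial_number & 0xFFFFFFFF
    return f"{serial >> 16:04X}-{serial & 0xFFFF:04X}"
```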
Often the total number of sectors stored in the boot record is smaller than the number of sectors of the underlying partition. A (nearly identical) backup of the boot record is stored in the last sector of the cluster block that follows the last cluster block of the volume. Often this is the 512 bytes after the last sector of the volume, but not necessarily. The backup boot record is not included in the total number of sectors.
Master File Table (MFT) and index entry size
The Master File Table (MFT) entry size and index entry size are defined as follows:
- Values 0 to 127 represent sizes of 0 to 127 cluster blocks.
- Values 128 to 255 represent sizes of 2^(256-n) bytes or 2^(-n) if considered as a signed byte.
- Other values are not considered valid.
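The MFT entry size and index entry size encoding can be decoded with a minimal Python sketch; the function name is illustrative and the cluster block size is passed in, since positive values count cluster blocks.
```python
def decode_entry_size(value, cluster_block_size):
    if 0 <= value <= 127:
        return value * cluster_block_size
    if 128 <= value <= 255:
        # As a signed byte the value is -n and the size is 2^n bytes,
        # e.g. 0xf6 (-10) represents 1024 bytes.
        return 1 << (256 - value)
    raise ValueError(f"unsupported entry size value: {value}")
```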
BitLocker Drive Encryption (BDE)
BitLocker Drive Encryption (BDE) uses the file system signature “-FVE-FS-”, where FVE is an abbreviation of Full Volume Encryption.
The data structures of BDE on Windows Vista and 7 differ.
A Windows Vista BDE volume starts with:
```
eb 52 90 2d 46 56 45 2d 46 53 2d
```
A Windows 7 BDE volume starts with:
```
eb 58 90 2d 46 56 45 2d 46 53 2d
```
BDE is largely stand-alone but has some integration with NTFS.
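A minimal Python sketch that distinguishes an NTFS volume from a BDE volume by the “-FVE-FS-” signature at offset 3, and uses the boot entry point bytes to tell the Windows Vista and 7 variants apart. The function name and the returned labels are illustrative.
```python
NTFS_SIGNATURE = b"NTFS    "
BDE_SIGNATURE = b"-FVE-FS-"


def identify_volume(boot_record):
    signature = boot_record[3:11]
    if signature == NTFS_SIGNATURE:
        return "ntfs"
    if signature == BDE_SIGNATURE:
        # The boot entry point differs between Windows Vista and 7 BDE volumes.
        if boot_record[:3] == b"\xeb\x52\x90":
            return "bde (Windows Vista)"
        if boot_record[:3] == b"\xeb\x58\x90":
            return "bde (Windows 7)"
        return "bde"
    return "unknown"
```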
TODO: link to BDE format documentation
Volume Shadow Snapshots (VSS)
Volume Shadow Snapshots (VSS) uses the GUID 3808876b-c176-4e48-b7ae-04046e6cc752 (stored in little-endian) to identify its data.
VSS is largely stand-alone but has some integration with NTFS.
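Because the GUID is stored in little-endian, its on-disk byte sequence differs from the textual form. A minimal Python sketch using `uuid.UUID.bytes_le` to obtain those bytes; the `has_vss_identifier` helper is hypothetical and takes the offset to check as a parameter, since the location of the identifier is not covered here.
```python
import uuid

VSS_IDENTIFIER = uuid.UUID("3808876b-c176-4e48-b7ae-04046e6cc752")
# bytes_le stores the first three GUID fields little-endian, as on disk.
VSS_IDENTIFIER_BYTES = VSS_IDENTIFIER.bytes_le


def has_vss_identifier(data, offset):
    return data[offset:offset + 16] == VSS_IDENTIFIER_BYTES
```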
TODO: link to VSS format documentation
Media descriptor
| Offset | Size | Value | Description |
|---|---|---|---|
| 0.0 | 1 bit | | Sides, where single-sided (0) and double-sided (1) |
| 0.1 | 1 bit | | Track size, where 9 sectors per track (0) and 8 sectors per track (1) |
| 0.2 | 1 bit | | Density, where 80 tracks (0) and 40 tracks (1) |
| 0.3 | 1 bit | | Type, where Fixed disc (0) and Removable disc (1) |
| 0.4 | 4 bits | | Always set to 1 |
The boot loader
| Offset | Size | Value | Description |
|---|---|---|---|
| 512 | | | Windows NT (boot) loader (NTLDR/BOOTMGR) |
The Master File Table (MFT)
The MFT consists of an array of MFT entries. The offset of the MFT can be found in the volume header and the size of the MFT is defined by the MFT entry of the $MFT metadata file.
Note that the MFT can consist of multiple data ranges, defined by the data runs in the $MFT metadata file.
MFT entry
The size of an MFT entry is defined in the volume header, but is commonly 1024 bytes. An MFT entry consists of:
- The MFT entry header
- The fix-up values
- An array of MFT attribute values
- Padding, which should contain 0-byte values
Note that an MFT entry can be filled entirely with 0-byte values. This has been seen in Windows XP for MFT entry numbers 16 to 23.
MFT entry header
The MFT entry header (FILE_RECORD_SEGMENT_HEADER) is 42 or 48 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| MULTI_SECTOR_HEADER | | | |
| 0 | 4 | "BAAD", "FILE" | Signature |
| 4 | 2 | | The fix-up values (or update sequence array) offset, which contains an offset relative from the start of the MFT entry |
| 6 | 2 | | The number of fix-up values (or update sequence array size) |
| Common | | | |
| 8 | 8 | | Metadata transaction journal sequence number, which contains a $LogFile Sequence Number (LSN) |
| 16 | 2 | | Sequence (number) |
| 18 | 2 | | Reference (link) count |
| 20 | 2 | | Attributes offset (or first attribute offset), which contains an offset relative from the start of the MFT entry |
| 22 | 2 | | MFT entry flags |
| 24 | 4 | | Used size in bytes |
| 28 | 4 | | MFT entry size in bytes |
| 32 | 8 | | Base record file reference |
| 40 | 2 | | First available attribute identifier |
| If NTFS version is 3.0 | | | |
| 42 | 2 | | Unknown (wfixupPattern) |
| 44 | 4 | | Unknown |
| If NTFS version is 3.1 | | | |
| 42 | 2 | | Unknown (wfixupPattern) |
| 44 | 4 | | MFT entry number |
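A minimal Python sketch of reading the MFT entry header. The function name and dictionary keys are illustrative. Deciding whether the 32-bit MFT entry number at offset 44 is present is done here with a heuristic assumption: it is only read when the fix-up values start at or beyond offset 48.
```python
import struct

# Offsets 0 to 41 of the MFT entry header (little-endian).
MFT_ENTRY_HEADER = struct.Struct("<4sHHQHHHHIIQH")


def parse_mft_entry_header(data):
    (signature, fixup_offset, fixup_count, journal_sequence_number, sequence,
     reference_count, attributes_offset, flags, used_size, entry_size,
     base_record_file_reference,
     next_attribute_identifier) = MFT_ENTRY_HEADER.unpack_from(data, 0)
    if signature not in (b"FILE", b"BAAD"):
        raise ValueError(f"unsupported MFT entry signature: {signature!r}")
    header = {
        "signature": signature.decode("ascii"),
        "fixup_offset": fixup_offset,
        "fixup_count": fixup_count,
        "journal_sequence_number": journal_sequence_number,
        "sequence": sequence,
        "reference_count": reference_count,
        "attributes_offset": attributes_offset,
        "in_use": bool(flags & 0x0001),
        "is_directory": bool(flags & 0x0002),
        "used_size": used_size,
        "entry_size": entry_size,
        "base_record_file_reference": base_record_file_reference,
        "next_attribute_identifier": next_attribute_identifier,
        "mft_entry_number": None,
    }
    # Heuristic (assumption): NTFS 3.1 entries store the fix-up values at 48.
    if fixup_offset >= 48:
        header["mft_entry_number"] = struct.unpack_from("<I", data, 44)[0]
    return header
```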
“BAAD” signature
According to NTFS documentation, when chkdsk finds a multi-sector item where the multi-sector header does not match the values at the end of a sector, it marks the item as “BAAD” and fills it with 0-byte values, except for a fix-up value at the end of the first sector of the item. The “BAAD” signature has been seen to be used on Windows NT4 and XP.
Sequence number
According to the FILE_RECORD_SEGMENT_HEADER structure documentation, the sequence number is incremented each time that a file record segment is freed; it is 0 if the segment is not used.
Base record file reference
The base record file reference is used when an MFT entry stores additional attributes for another (base) MFT entry, e.g. for attribute lists.
MFT entry flags
| Value | Identifier | Description |
|---|---|---|
| 0x0001 | FILE_RECORD_SEGMENT_IN_USE, MFT_RECORD_IN_USE | In use |
| 0x0002 | FILE_FILE_NAME_INDEX_PRESENT, FILE_NAME_INDEX_PRESENT, MFT_RECORD_IS_DIRECTORY | Has file name (or $I30) index. When this flag is set the file entry represents a directory |
| 0x0004 | MFT_RECORD_IN_EXTEND | Unknown. According to ntfs_layout.h this is set for all system files present in the $Extend directory |
| 0x0008 | MFT_RECORD_IS_VIEW_INDEX | Is index. When this flag is set the file entry represents an index. According to ntfs_layout.h this is set for all indices other than $I30 |
The fix-up values
The fix-up values are of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | | Fix-up placeholder value |
| 2 | 2 x number of fix-up values | | Fix-up (original) value array |
On disk, the last 2 bytes of each 512-byte block are replaced by the fix-up placeholder value. The original value is stored in the corresponding fix-up (original) value array entry.
Note that there can be more fix-up values than the number of 512 byte blocks in the data.
According to the MULTI_SECTOR_HEADER structure documentation, the update sequence array must end before the last USHORT value in the first sector. It also states that the update sequence array size value contains the number of bytes, but based on analysis of data samples it seems more likely to contain the number of words.
In NT4 (version 1.2) the MFT entry header is 42 bytes in size and the fix-up values are stored at offset 42. This is likely where the name wfixupPattern originates from.
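Applying the fix-up values can be sketched in Python as below. The sketch assumes, as is common in data seen in practice, that the number of fix-up values includes the placeholder (e.g. 3 for a 1024-byte MFT entry), and it ignores surplus fix-up values beyond the number of 512-byte blocks. The function name is illustrative, and a mismatch is treated as an error rather than handled like chkdsk would.
```python
def apply_fixups(data, fixup_offset, fixup_count, block_size=512):
    buffer = bytearray(data)
    placeholder = bytes(buffer[fixup_offset:fixup_offset + 2])
    number_of_blocks = len(buffer) // block_size
    # Value 0 is the placeholder; values 1 and up hold the original bytes.
    for index in range(1, min(fixup_count, number_of_blocks + 1)):
        value_offset = fixup_offset + index * 2
        block_end = index * block_size - 2
        if buffer[block_end:block_end + 2] != placeholder:
            raise ValueError(f"fix-up placeholder mismatch in block {index - 1}")
        buffer[block_end:block_end + 2] = buffer[value_offset:value_offset + 2]
    return bytes(buffer)
```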
TODO: provide examples on applying the fix-up values.
The file reference
The file reference (FILE_REFERENCE or MFT_SEGMENT_REFERENCE) is 8 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 6 | | MFT entry number |
| 6 | 2 | | Sequence number |
Note that the MFT entry number stored in the MFT entry header is 32-bit in size.
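Read as a 64-bit little-endian integer, the lower 48 bits of a file reference hold the MFT entry number and the upper 16 bits the sequence number. A minimal Python sketch with an illustrative function name:
```python
from typing import Tuple


def split_file_reference(file_reference: int) -> Tuple[int, int]:
    mft_entry_number = file_reference & 0xFFFFFFFFFFFF
    sequence_number = file_reference >> 48
    return mft_entry_number, sequence_number
```
For example, the bytes 05 00 00 00 00 00 05 00 reference MFT entry 5 (the root directory) with sequence number 5.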
MFT attribute
The MFT attribute consists of:
- the attribute header
- the attribute resident or non-resident data
- the attribute name
- Unknown data, likely alignment padding (4-byte alignment)
- resident attribute data or non-resident attribute data runs
- alignment padding (8-byte alignment), can contain remnant data
MFT attribute header
The MFT attribute header (ATTRIBUTE_RECORD_HEADER) is 16 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Attribute type (or type code) |
| 4 | 4 | | Attribute size (or record length), which includes the 8 bytes of the attribute type and size |
| 8 | 1 | | Non-resident flag (or form code), where RESIDENT_FORM (0) and NONRESIDENT_FORM (1) |
| 9 | 1 | | Name size (or name length), which contains the number of characters without the end-of-string character |
| 10 | 2 | | Name offset, which contains an offset relative from the start of the MFT attribute |
| 12 | 2 | | Attribute data flags |
| 14 | 2 | | Attribute identifier (or instance), which contains a unique identifier to distinguish between attributes that contain segmented data |
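A minimal Python sketch that reads the attribute header and walks the attributes of an MFT entry until the end of attributes marker (0xffffffff). The function names are illustrative; the sketch assumes the fix-up values have already been applied and raises on an attribute size smaller than the header to avoid looping on corrupt data.
```python
import struct

# Common 16-byte MFT attribute header (little-endian).
ATTRIBUTE_HEADER = struct.Struct("<IIBBHHH")
END_OF_ATTRIBUTES = 0xFFFFFFFF


def parse_attribute_header(data, offset):
    (attribute_type, size, non_resident, name_size, name_offset, data_flags,
     identifier) = ATTRIBUTE_HEADER.unpack_from(data, offset)
    name_start = offset + name_offset
    # The name size is in characters; UCS-2 uses 2 bytes per character.
    name = data[name_start:name_start + name_size * 2].decode(
        "utf-16-le", errors="surrogatepass")
    return {
        "type": attribute_type,
        "size": size,
        "non_resident": non_resident == 1,
        "name": name,
        "data_flags": data_flags,
        "identifier": identifier,
    }


def iter_attributes(entry, attributes_offset):
    """Yield (offset, header) for each attribute until the end marker."""
    offset = attributes_offset
    while offset + 4 <= len(entry):
        (attribute_type,) = struct.unpack_from("<I", entry, offset)
        if attribute_type == END_OF_ATTRIBUTES:
            break
        header = parse_attribute_header(entry, offset)
        if header["size"] < ATTRIBUTE_HEADER.size:
            raise ValueError(f"invalid attribute size at offset {offset}")
        yield offset, header
        offset += header["size"]
```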
MFT attribute data flags
| Value | Identifier | Description |
|---|---|---|
| 0x0001 | | Is LZNT1 compressed |
| 0x00ff | ATTRIBUTE_FLAG_COMPRESSION_MASK | |
| 0x4000 | ATTRIBUTE_FLAG_ENCRYPTED | Is encrypted |
| 0x8000 | ATTRIBUTE_FLAG_SPARSE | Is sparse |
TODO: determine the meaning of compression flag in the context of resident $INDEX_ROOT. Do the data flags have a different meaning for different attributes?
Resident MFT attribute
The resident MFT attribute data is present when the non-resident flag is not set (0). The resident data is 8 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Data size (or value length) |
| 4 | 2 | | Data offset (or value offset), which contains an offset relative from the start of the MFT attribute |
| 6 | 1 | | Indexed flag |
| 7 | 1 | 0x00 | Unknown (Padding) |
TODO: determine the meaning of indexed flag bits, other than the LSB
Non-resident MFT attribute
The non-resident MFT attribute data is present when the non-resident flag is set (1). The non-resident data is 48 or 56 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | | First (or lowest) Virtual Cluster Number (VCN) of the data |
| 8 | 8 | | Last (or highest) Virtual Cluster Number (VCN) of the data |
| 16 | 2 | | Data runs offset (or mapping pairs offset), which contains an offset relative from the start of the MFT attribute |
| 18 | 2 | | Compression unit size, which contains the compression unit size as 2^(n) number of cluster blocks |
| 20 | 4 | | Unknown (Padding) |
| 24 | 8 | | Allocated data size (or allocated length), which contains the allocated data size in number of bytes. This value is not valid if the first VCN is nonzero |
| 32 | 8 | | Data size (or file size), which contains the data size in number of bytes. This value is not valid if the first VCN is nonzero |
| 40 | 8 | | Valid data size (or valid data length), which contains the valid data size in number of bytes. This value is not valid if the first VCN is nonzero |
| If compression unit size > 0 | | | |
| 48 | 8 | | Compressed data size |
The total size of the data runs should be larger than or equal to the data size.
Note that Windows will fill data beyond the valid data size with 0-byte values. The data size remains unchanged. This applies to compressed and uncompressed data. If the first VCN is zero a valid data size of 0 represents a file entirely filled with 0-byte values.
TODO: determine the meaning of a VCN of -1
For more information about compressed MFT attributes see compression.
Attribute name
The attribute name is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | ... | | Name, which contains a UCS-2 little-endian string without end-of-string character |
Data runs
The data runs are stored in a variable size (data) runlist. This runlist consists of runlist elements.
A runlist element is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0.0 | 4 bits | | Number of cluster blocks value size, which contains the number of bytes used to store the data run size |
| 0.4 | 4 bits | | Cluster block number value size, which contains the number of bytes used to store the data run cluster block number |
| 1 | Number of cluster blocks value size | | Data run number of cluster blocks, which contains the number of cluster blocks |
| ... | Cluster block number value size | | Data run cluster block number |
The data run cluster block number is a signed value, where the MSB is the sign bit, e.g. if the data run cluster block number contains the 16-bit value 0xdbc8, it corresponds to the 64-bit value 0xffffffffffffdbc8.
The first data run offset contains the absolute cluster block number, whereas successive data run offsets are relative to the previous data run offset.
Note that the cluster block number byte size is the first nibble when reading the byte stream, but here it is represented as the upper nibble of the first byte.
The last runlist element is (0, 0), which is stored as a 0-byte value.
According to NTFS documentation the size of the runlist is rounded up to the next multiple of 4 bytes, but based on analysis of data samples the trailing data can be larger than 3 bytes and does not always consist of 0-byte values.
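A minimal Python sketch of decoding a runlist as described above: the lower nibble of the first byte is the number of cluster blocks value size, the upper nibble the cluster block number value size, and the cluster block number is signed and relative to the previous data run. The function name is illustrative. Sparse data runs are returned with `None` as cluster block number and, as assumed here, do not change the reference for the next relative offset.
```python
def decode_data_runs(data, offset=0):
    """Return (cluster block number or None if sparse, number of cluster blocks) pairs."""
    runs = []
    cluster_block_number = 0
    while offset < len(data) and data[offset] != 0:
        count_size = data[offset] & 0x0F
        number_size = data[offset] >> 4
        offset += 1
        if offset + count_size + number_size > len(data):
            raise ValueError("data run extends beyond the end of the runlist")
        number_of_cluster_blocks = int.from_bytes(
            data[offset:offset + count_size], "little")
        offset += count_size
        if number_size == 0:
            # Sparse data run: no cluster block number, reads as 0-byte values.
            runs.append((None, number_of_cluster_blocks))
            continue
        # Signed and relative to the previous data run cluster block number.
        cluster_block_number += int.from_bytes(
            data[offset:offset + number_size], "little", signed=True)
        offset += number_size
        runs.append((cluster_block_number, number_of_cluster_blocks))
    return runs
```
For example, the runlist `21 18 34 56 11 10 f0 01 08 00` decodes to 24 cluster blocks at 0x5634, 16 cluster blocks at 0x5624 (0x5634 - 16) and a sparse data run of 8 cluster blocks.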
TODO: provide examples of data runs
Sparse data runs
The MFT attribute data flag (ATTRIBUTE_FLAG_SPARSE) indicates if the data stream is sparse or not, where the runlist can contain both sparse and non-sparse data runs.
A sparse data run has a cluster block number value size of 0, which indicates that there is no offset (cluster block number). A sparse data run is filled with 0-byte values.
Compressed data streams also define sparse data runs without setting the ATTRIBUTE_FLAG_SPARSE flag.
Note that $BadClus:$Bad also defines a data run with a cluster block number value size of 0, without setting the ATTRIBUTE_FLAG_SPARSE flag.
Compressed data runs
The MFT attribute data flags (0x00ff) indicate if the data stream is compressed or not.
Windows supports compressed data runs for NTFS file systems with a cluster block size of 4096 bytes or less.
Windows 10 supports Windows Overlay Filter (WOF) compressed data, which stores the LZXPRESS Huffman or LZX compressed data in an alternate data stream named WofCompressedData and links it to the default data stream using a reparse point.
The data is stored in compression unit blocks. A compression unit typically consists of 16 cluster blocks, however the actual size is stored in the non-resident MFT attribute.
Also see compression.
The attributes
Known attribute types
The attribute types are stored in the $AttrDef metadata file.
| Value | Identifier | Description |
|---|---|---|
| 0x00000000 | Unused | |
| 0x00000010 | $STANDARD_INFORMATION | Standard information |
| 0x00000020 | $ATTRIBUTE_LIST | Attributes list |
| 0x00000030 | $FILE_NAME | The file or directory name |
| Used in NTFS version 1.2 and earlier | ||
| 0x00000040 | $VOLUME_VERSION | Volume version |
| Used in NTFS version 3.0 and later | ||
| 0x00000040 | $OBJECT_ID | Object identifier |
| Common | ||
| 0x00000050 | $SECURITY_DESCRIPTOR | Security descriptor |
| 0x00000060 | $VOLUME_NAME | Volume label |
| 0x00000070 | $VOLUME_INFORMATION | Volume information |
| 0x00000080 | $DATA | Data stream |
| 0x00000090 | $INDEX_ROOT | Index root |
| 0x000000a0 | $INDEX_ALLOCATION | Index allocation |
| 0x000000b0 | $BITMAP | Bitmap |
| Used in NTFS version 1.2 and earlier | ||
| 0x000000c0 | $SYMBOLIC_LINK | Symbolic link |
| Used in NTFS version 3.0 and later | ||
| 0x000000c0 | $REPARSE_POINT | Reparse point |
| Common | ||
| 0x000000d0 | $EA_INFORMATION | (HPFS) extended attribute information |
| 0x000000e0 | $EA | (HPFS) extended attribute |
| Used in NTFS version 1.2 and earlier | ||
| 0x000000f0 | $PROPERTY_SET | Property set |
| Used in NTFS version 3.0 and later | ||
| 0x00000100 | $LOGGED_UTILITY_STREAM | Logged utility stream |
| Common | ||
| 0x00001000 | First user defined attribute | |
| 0xffffffff | End of attributes marker | |
Attribute chains
Multiple attributes can be chained to make up a single attribute data stream, e.g. the attributes:
- $INDEX_ALLOCATION ($I30) VCN: 0
- $INDEX_ALLOCATION ($I30) VCN: 596
The first attribute will contain the size of the data defined by all the attributes and successive attributes should have a size of 0.
It is assumed that the attributes in a chain must be contiguous and defined in order.
The standard information attribute
The standard information attribute ($STANDARD_INFORMATION) contains the basic file entry metadata. It is stored as a resident MFT attribute.
The standard information data (STANDARD_INFORMATION) is either 48 or 72 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | | Creation date and time, which contains a FILETIME |
| 8 | 8 | | Last modification (or last written) date and time, which contains a FILETIME |
| 16 | 8 | | MFT entry last modification date and time, which contains a FILETIME |
| 24 | 8 | | Last access date and time, which contains a FILETIME |
| 32 | 4 | | File attribute flags |
| 36 | 4 | | Unknown (Maximum number of versions) |
| 40 | 4 | | Unknown (Version number) |
| 44 | 4 | | Unknown (Class identifier) |
| If NTFS version 3.0 or later | | | |
| 48 | 4 | | Owner identifier |
| 52 | 4 | | Security descriptor identifier, which contains the entry number in the security ID index ($Secure:$SII). Also see Access Control |
| 56 | 8 | | Quota charged |
| 64 | 8 | | Update Sequence Number (USN) |
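A minimal Python sketch that reads both the 48-byte and the 72-byte variant of the standard information data. Date and time values are returned as raw FILETIME integers; the function name, dictionary keys and the `case_sensitive` check (based on the observation about case-sensitive directories below) are illustrative.
```python
import struct


def parse_standard_information(data):
    if len(data) < 48:
        raise ValueError("standard information data too small")
    (creation_time, modification_time, entry_modification_time, access_time,
     file_attribute_flags, maximum_number_of_versions, version_number,
     class_identifier) = struct.unpack_from("<QQQQIIII", data, 0)
    result = {
        "creation_time": creation_time,  # FILETIME values
        "modification_time": modification_time,
        "entry_modification_time": entry_modification_time,
        "access_time": access_time,
        "file_attribute_flags": file_attribute_flags,
        "class_identifier": class_identifier,
        # Observed for case-sensitive directories: 0 versions, version number 1.
        "case_sensitive": maximum_number_of_versions == 0 and version_number == 1,
    }
    if len(data) >= 72:  # NTFS version 3.0 or later
        (owner_identifier, security_descriptor_identifier, quota_charged,
         update_sequence_number) = struct.unpack_from("<IIQQ", data, 48)
        result.update({
            "owner_identifier": owner_identifier,
            "security_descriptor_identifier": security_descriptor_identifier,
            "quota_charged": quota_charged,
            "update_sequence_number": update_sequence_number,
        })
    return result
```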
Note that MFT entries have been observed without a $STANDARD_INFORMATION attribute, but with other attributes such as $FILE_NAME and an $I30 index.
Recent versions of NTFS support case-sensitive file names. If a directory is case-sensitive the corresponding $STANDARD_INFORMATION attribute will have a maximum number of versions of 0 and a version number of 1.
The attribute list attribute
The attribute list attribute ($ATTRIBUTE_LIST) is used to store MFT attributes outside the MFT entry, e.g. when the MFT entry is too small to store all the attributes.
The entries in the list reference the location of MFT attributes. The attribute list attribute can be stored as either a resident (for a small amount of data) or non-resident MFT attribute.
Note that MFT entry 0 can also contain an attribute list, which allows listed attributes to be stored beyond the first data run.
The attribute list
An attribute list consists of:
- one or more attribute list entries
The attribute list entry
An attribute list entry (ATTRIBUTE_LIST_ENTRY) is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Attribute type (or type code) |
| 4 | 2 | | Size (or record length), which includes the 6 bytes of the attribute type and size |
| 6 | 1 | | Name size (or name length), which contains the number of characters without the end-of-string character |
| 7 | 1 | | Name offset, which contains an offset relative from the start of the attribute list entry |
| 8 | 8 | | Data first (or lowest) VCN |
| 16 | 8 | | File reference (or segment reference), which contains a reference to the MFT entry that contains (part of) the attribute data |
| 24 | 2 | | Attribute identifier (or instance), which contains a unique identifier to distinguish between attributes that contain segmented data |
| 26 | ... | | Name, which contains a UCS-2 little-endian string without end-of-string character |
| ... | ... | | Alignment padding (8-byte alignment), can contain remnant data |
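A minimal Python sketch that iterates over the entries of an attribute list, using the entry size to skip the alignment padding. The function name and dictionary keys are illustrative, and an entry size smaller than the fixed part is treated as corruption.
```python
import struct

ATTRIBUTE_LIST_ENTRY = struct.Struct("<IHBBQQH")  # fixed part, 26 bytes


def iter_attribute_list(data):
    offset = 0
    while offset + ATTRIBUTE_LIST_ENTRY.size <= len(data):
        (attribute_type, entry_size, name_size, name_offset, first_vcn,
         file_reference, identifier) = ATTRIBUTE_LIST_ENTRY.unpack_from(data, offset)
        if entry_size < ATTRIBUTE_LIST_ENTRY.size:
            raise ValueError(f"invalid attribute list entry size at offset {offset}")
        name_start = offset + name_offset
        name = data[name_start:name_start + name_size * 2].decode(
            "utf-16-le", errors="surrogatepass")
        yield {
            "type": attribute_type,
            "name": name,
            "first_vcn": first_vcn,
            # Lower 48 bits: MFT entry number, upper 16 bits: sequence number.
            "mft_entry_number": file_reference & 0xFFFFFFFFFFFF,
            "sequence_number": file_reference >> 48,
            "identifier": identifier,
        }
        offset += entry_size  # includes the 8-byte alignment padding
```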
The file name attribute
The file name attribute ($FILE_NAME) contains the basic file system information, like the parent file entry, various date and time values and name. It is stored as a resident MFT attribute.
The file name data (FILE_NAME) is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | | Parent file reference |
| 8 | 8 | | Creation date and time, which contains a FILETIME |
| 16 | 8 | | Last modification (or last written) date and time, which contains a FILETIME |
| 24 | 8 | | MFT entry last modification date and time, which contains a FILETIME |
| 32 | 8 | | Last access date and time, which contains a FILETIME |
| 40 | 8 | | Allocated (or reserved) file size |
| 48 | 8 | | Data size |
| 56 | 4 | | File attribute flags |
| If FILE_ATTRIBUTE_REPARSE_POINT is set | | | |
| 60 | 4 | | Reparse point tag |
| If FILE_ATTRIBUTE_REPARSE_POINT is not set | | | |
| 60 | 4 | | Unknown (extended attribute data size) |
| Common | | | |
| 64 | 1 | | Name string size, which contains the number of characters without the end-of-string character |
| 65 | 1 | | Namespace of the name string |
| 66 | ... | | Name, which contains a UCS-2 little-endian string without end-of-string character |
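A minimal Python sketch of reading the file name data. It uses the Windows file attribute flag FILE_ATTRIBUTE_REPARSE_POINT (0x00000400), which is not listed in this section, to decide how to interpret the value at offset 60. The function name and dictionary keys are illustrative, and date and time values are returned as raw FILETIME integers.
```python
import struct

FILE_NAME_STRUCT = struct.Struct("<QQQQQQQIIBB")  # fixed part, 66 bytes
FILE_ATTRIBUTE_REPARSE_POINT = 0x00000400  # Windows file attribute flag
NAMESPACES = {0: "POSIX", 1: "WINDOWS", 2: "DOS", 3: "DOS_WINDOWS"}


def parse_file_name(data):
    (parent_file_reference, creation_time, modification_time,
     entry_modification_time, access_time, allocated_size, data_size,
     file_attribute_flags, reparse_tag_or_ea_size, name_size,
     namespace) = FILE_NAME_STRUCT.unpack_from(data, 0)
    name_end = FILE_NAME_STRUCT.size + name_size * 2
    result = {
        "parent_mft_entry_number": parent_file_reference & 0xFFFFFFFFFFFF,
        "parent_sequence_number": parent_file_reference >> 48,
        "creation_time": creation_time,  # FILETIME values
        "modification_time": modification_time,
        "entry_modification_time": entry_modification_time,
        "access_time": access_time,
        "allocated_size": allocated_size,
        "data_size": data_size,
        "file_attribute_flags": file_attribute_flags,
        "namespace": NAMESPACES.get(namespace, f"unknown ({namespace})"),
        "name": data[FILE_NAME_STRUCT.size:name_end].decode(
            "utf-16-le", errors="surrogatepass"),
    }
    if file_attribute_flags & FILE_ATTRIBUTE_REPARSE_POINT:
        result["reparse_point_tag"] = reparse_tag_or_ea_size
    return result
```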
An MFT entry can contain multiple file name attributes, e.g. for a separate (long) name and short name.
In several cases on a Windows Vista NTFS volume, the MFT entry contained both a DOS & Windows and a POSIX namespace $FILE_NAME attribute. However, the directory entry index ($I30) of the parent directory only contained the DOS & Windows name.
In case of a hard link the MFT entry will contain additional file name attributes with the parent file reference of each hard link.
Namespace
| Value | Identifier | Description |
|---|---|---|
| 0 | POSIX | Case-sensitive character set that consists of all Unicode characters except for: "\0" (zero character), "/" (forward slash). The ":" (colon) is valid for NTFS but not for Windows |
| 1 | FILE_NAME_NTFS, WINDOWS | Case-insensitive subset of the POSIX character set that consists of all Unicode characters except for: " * / : < > ? \ \| +. Note that names cannot end with a "." (dot) or " " (space) |
| 2 | FILE_NAME_DOS, DOS | Case-insensitive subset of the WINDOWS character set that consists of all upper case ASCII characters except for: " * + , / : ; < = > ? \. Note that the name must follow the 8.3 format |
| 3 | DOS_WINDOWS | Both the DOS and WINDOWS names are identical, which is the same as the DOS character set, with the exception that lower case is used as well |
Note that the Windows API function CreateFile allows case-sensitive file names to be created when the flag FILE_FLAG_POSIX_SEMANTICS is set.
Long to short name conversion
A short name can be determined from a long name with the following approach. In the long name:
- ignore Unicode characters beyond the first 8-bit (extended ASCII)
- ignore control characters and spaces (characters <= 0x20)
- ignore non-allowed characters: " * + , / : ; < = > ? \
- ignore dots except the last one, which is used for the extension
- make all letters upper case
Additional observations:
- "[" or "]" are replaced by an underscore (_)
Make the name unique:
- use characters 1 to 6 of the name, add ~1 and, if the long name has an extension, add a dot and the first 3 letters of the extension, e.g. “Program Files” becomes “PROGRA~1” or “ ~PLAYMOVIE.REG” becomes “~PLAYM~1.REG”
- if the name already exists try ~2 up to ~9, e.g. “Program Data”, in the same directory as “Program Files”, becomes “PROGRA~2”
- if the name already exists use a 16-bit hexadecimal value for characters 3 to 6 with ~1, e.g. “x86_microsoft-windows-r..ry-editor.resources_31bf3856ad364e35_6.0.6000.16386_en-us_f89a7b0005d42fd4” in a directory with a lot of file names starting with “x86_microsoft”, becomes “X8FCA6~1.163”
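The conversion steps above can be sketched in Python as follows. This is an approximation under stated assumptions: characters up to and including 0x20 (space) are ignored, `str.upper()` stands in for the upper case conversion Windows actually uses, and the 16-bit hexadecimal fallback is not implemented because how that value is derived is not described here. The function name is illustrative.
```python
DISALLOWED_CHARACTERS = set('"*+,/:;<=>?\\')


def generate_short_name(long_name, existing_names=()):
    characters = []
    for character in long_name:
        code_point = ord(character)
        # Skip characters beyond 8-bit, control characters, spaces and
        # characters that are not allowed.
        if code_point > 0xFF or code_point <= 0x20 or character in DISALLOWED_CHARACTERS:
            continue
        if character in "[]":
            character = "_"
        characters.append(character)
    cleaned = "".join(characters).upper()  # approximation of Windows upper case
    base, separator, extension = cleaned.rpartition(".")
    if not separator:
        base, extension = cleaned, ""
    base = base.replace(".", "")  # only the last dot is kept
    suffix = f".{extension[:3]}" if extension else ""
    for number in range(1, 10):
        candidate = f"{base[:6]}~{number}{suffix}"
        if candidate not in existing_names:
            return candidate
    raise NotImplementedError("the 16-bit hexadecimal fallback is not implemented")
```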
TODO: determine if the behavior is dependent on a setting that can be changed with fsutil
The volume version attribute
The volume version attribute ($VOLUME_VERSION) contains the volume version.
TODO: complete section. Need a pre NTFS version 3.0 volume with this attribute. $AttrDef indicates the attribute to be 8 bytes in size.
The object identifier attribute
The object identifier attribute ($OBJECT_ID) contains distributed link tracker properties. It is stored as a resident MFT attribute.
The object identifier attribute data is either 16 or 64 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 16 | | Droid file identifier, which contains a GUID |
| 16 | 16 | | Birth droid volume identifier, which contains a GUID |
| 32 | 16 | | Birth droid file identifier, which contains a GUID |
| 48 | 16 | | Birth droid domain identifier, which contains a GUID |
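A minimal Python sketch that reads the 16 or 64 bytes of object identifier data. It assumes the GUIDs use the Windows little-endian GUID layout, which `uuid.UUID(bytes_le=...)` decodes; the function name and dictionary keys are illustrative.
```python
import uuid

FIELD_NAMES = ("droid_file_identifier", "birth_droid_volume_identifier",
               "birth_droid_file_identifier", "birth_droid_domain_identifier")


def parse_object_identifier(data):
    if len(data) not in (16, 64):
        raise ValueError(f"unsupported object identifier data size: {len(data)}")
    return {
        name: uuid.UUID(bytes_le=data[index * 16:(index + 1) * 16])
        for index, name in enumerate(FIELD_NAMES[:len(data) // 16])
    }
```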
Droid in this context refers to CDomainRelativeObjId.
The security descriptor attribute
TODO: determine if this overrides any value in $Secure:$SDS
The security descriptor attribute ($SECURITY_DESCRIPTOR) contains a Windows NT security descriptor. It can be stored as either a resident (for a small amount of data) or non-resident MFT attribute.
TODO: link to security descriptor format documentation
The volume name attribute
The volume name attribute ($VOLUME_NAME) contains the volume label. It is stored as a resident MFT attribute.
The volume name attribute data is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | ... | | Volume label, which contains a UCS-2 little-endian string without end-of-string character |
The volume name attribute is used in the $Volume metadata file MFT entry.
The volume information attribute
The volume information attribute ($VOLUME_INFORMATION) contains information about the volume. It is stored as a resident MFT attribute.
The volume information attribute data is 12 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | | Unknown |
| 8 | 1 | | Major format version |
| 9 | 1 | | Minor format version |
| 10 | 2 | | Volume flags |
The volume information attribute is used in the $Volume metadata file MFT entry.
Volume flags
| Value | Identifier | Description |
|---|---|---|
| 0x0001 | VOLUME_IS_DIRTY | Is dirty |
| 0x0002 | VOLUME_RESIZE_LOG_FILE | Re-size journal ($LogFile) |
| 0x0004 | VOLUME_UPGRADE_ON_MOUNT | Upgrade on next mount |
| 0x0008 | VOLUME_MOUNTED_ON_NT4 | Mounted on Windows NT 4 |
| 0x0010 | VOLUME_DELETE_USN_UNDERWAY | Delete USN in progress |
| 0x0020 | VOLUME_REPAIR_OBJECT_ID | Repair object identifiers |
| 0x0080 | | Unknown |
| 0x4000 | VOLUME_CHKDSK_UNDERWAY | chkdsk in progress |
| 0x8000 | VOLUME_MODIFIED_BY_CHKDSK | Modified by chkdsk |
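As a sketch of how the volume information attribute data and its flags could be read (the function name is hypothetical; the integers are little-endian, as elsewhere in NTFS):

```python
import struct

# Volume flags from the table above; unknown bits such as 0x0080 are ignored.
VOLUME_FLAGS = {
    0x0001: "VOLUME_IS_DIRTY",
    0x0002: "VOLUME_RESIZE_LOG_FILE",
    0x0004: "VOLUME_UPGRADE_ON_MOUNT",
    0x0008: "VOLUME_MOUNTED_ON_NT4",
    0x0010: "VOLUME_DELETE_USN_UNDERWAY",
    0x0020: "VOLUME_REPAIR_OBJECT_ID",
    0x4000: "VOLUME_CHKDSK_UNDERWAY",
    0x8000: "VOLUME_MODIFIED_BY_CHKDSK",
}


def parse_volume_information(data: bytes) -> tuple:
    if len(data) < 12:
        raise ValueError("volume information data is too small")
    # Skip the 8 unknown bytes, then read major, minor version and flags.
    major_version, minor_version, flags = struct.unpack_from("<BBH", data, 8)
    flag_names = [name for bit, name in VOLUME_FLAGS.items() if flags & bit]
    return major_version, minor_version, flag_names
```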
The data stream attribute
The data stream attribute ($DATA) contains the file data. It can be stored as either a resident (for a small amount of data) or a non-resident MFT attribute.
Multiple data attributes for the same data stream can be used in the attribute list to define different parts of the data stream data. The first data stream attribute will contain the size of the entire data stream data. Other data stream attributes should have a size of 0. Also see attribute chains.
The index root attribute
The index root attribute ($INDEX_ROOT) contains the root of the index tree. It is stored as a resident MFT attribute.
Also see the index and the index root.
The index allocation attribute
The index allocation attribute ($INDEX_ALLOCATION) contains an array of index entries. It is stored as a non-resident MFT attribute.
The index allocation attribute itself does not define which attribute type it contains in the index value data. For this information it needs the corresponding index root attribute.
Multiple index allocation attributes for the same index can be used in the attribute list to define different parts of the index allocation data. The first index allocation attribute will contain the size of the entire index allocation data. Other index allocation attributes should have a size of 0. Also see attribute chains.
Also see the index.
The bitmap attribute
The bitmap attribute ($BITMAP) contains the allocation bitmap. It can be stored as either a resident (for a small amount of data) or a non-resident MFT attribute.
It is used to maintain information about which entries are used and which are not. Every bit in the bitmap represents an entry. The bitmap is stored byte-wise, where the least significant bit (LSB) of each byte corresponds to the first allocation element of that byte. The allocation element can represent different things:
- an MFT entry in the MFT (nameless) bitmap;
- an index entry in an index ($I30).
The allocation element is allocated if the corresponding bit contains 1 or unallocated if 0.
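A minimal sketch of testing an allocation element against the bitmap, with the LSB of each byte as the lowest element index. The function names are hypothetical.

```python
def is_allocated(bitmap: bytes, element_index: int) -> bool:
    # The bit for element N is in byte N // 8; the LSB is the lowest index.
    return bool((bitmap[element_index // 8] >> (element_index % 8)) & 1)


def allocated_elements(bitmap: bytes) -> list:
    # Return the indexes of all allocated elements (bit value 1).
    return [index for index in range(len(bitmap) * 8) if is_allocated(bitmap, index)]
```

For example, the bitmap bytes 0x05 0x80 mark elements 0, 2 and 15 as allocated.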
The symbolic link attribute
The symbolic link attribute ($SYMBOLIC_LINK) contains a symbolic link.
TODO: complete section. Need a pre NTFS version 3.0 volume with this attribute. $AttrDef indicates the attribute is of variable size.
The reparse point attribute
The reparse point attribute ($REPARSE_POINT) contains information about a file system-level link. It is stored as a resident MFT attribute.
Also see the reparse point.
The (HPFS) extended attribute information
The (HPFS) extended attribute information ($EA_INFORMATION) contains information about the extended attribute ($EA).
The extended attribute information data is 8 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | | Size of an extended attribute entry |
| 2 | 2 | | Number of extended attributes which have the NEED_EA flag set |
| 4 | 4 | | Size of the extended attribute ($EA) data |
The (HPFS) extended attribute
The (HPFS) extended attribute ($EA) contains the extended attribute data.
The extended attribute data is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Offset to next extended attribute entry, where the offset is relative from the start of the extended attribute data |
| 4 | 1 | | Extended attribute flags |
| 5 | 1 | | Number of characters of the extended attribute name |
| 6 | 2 | | Value data size |
| 8 | ... | | The extended attribute name, which contains an ASCII string |
| ... | ... | | Value data |
| ... | ... | | Unknown |
TODO: determine if the name is 2-byte aligned
Extended attribute flags
| Value | Identifier | Description |
|---|---|---|
| 0x80 | NEED_EA | Unknown (Need EA) flag |
TODO: determine what the NEED_EA flag is used for
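A hypothetical sketch of reading a single extended attribute entry. How the name and value data are separated is not fully determined above; this sketch assumes, as in the Windows FILE_FULL_EA_INFORMATION structure, that the name is followed by a single 0-byte before the value data.

```python
import struct
from typing import NamedTuple

NEED_EA = 0x80


class ExtendedAttribute(NamedTuple):
    next_entry_offset: int
    flags: int
    name: str
    value_data: bytes


def parse_extended_attribute(data: bytes, offset: int = 0) -> ExtendedAttribute:
    # 4-byte next entry offset, 1-byte flags, 1-byte name size, 2-byte value size.
    next_entry_offset, flags, name_size, value_size = struct.unpack_from(
        "<IBBH", data, offset
    )
    name_start = offset + 8
    name = data[name_start : name_start + name_size].decode("ascii")
    # Assumption: a single 0-byte terminates the name before the value data.
    value_start = name_start + name_size + 1
    value_data = data[value_start : value_start + value_size]
    return ExtendedAttribute(next_entry_offset, flags, name, value_data)
```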
UNITATTR extended attribute value data
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Unknown (equivalent of st_mode?) |
The property set attribute
The property set attribute ($PROPERTY_SET) contains a property set.
TODO: complete section. Need a pre NTFS version 3.0 volume with this attribute. $AttrDef does not seem to always define this attribute.
The logged utility stream attribute
TODO: complete section
| Value | Identifier | Description |
|---|---|---|
| $EFS | | Encrypted NTFS (EFS) |
| $TXF_DATA | | Transactional NTFS (TxF) |
The attribute types
The attribute types are stored in the $AttrDef metadata file.
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 128 | | Attribute name, which contains a UCS-2 little-endian string with end-of-string character. Unused bytes are filled with 0-byte values |
| 128 | 4 | | Attribute type (or type code) |
| 132 | 8 | | Unknown |
| 140 | 4 | | Unknown (flags?) |
| 144 | 8 | | Unknown (minimum attribute size?) |
| 152 | 8 | | Unknown (maximum attribute size?) |
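A sketch of reading the 160-byte attribute type entries from the $AttrDef data. The flags and minimum and maximum sizes are tentative, as in the table above, and treating an empty name as the end of the entries is an assumption.

```python
import struct
from typing import NamedTuple

ATTRIBUTE_DEFINITION_SIZE = 160  # 128 + 4 + 8 + 4 + 8 + 8 bytes


class AttributeDefinition(NamedTuple):
    name: str
    attribute_type: int
    flags: int  # tentative, see "Unknown (flags?)"
    minimum_size: int  # tentative
    maximum_size: int  # tentative


def parse_attribute_definitions(data: bytes) -> list:
    definitions = []
    last_offset = len(data) - ATTRIBUTE_DEFINITION_SIZE + 1
    for offset in range(0, last_offset, ATTRIBUTE_DEFINITION_SIZE):
        # The name is a UCS-2 little-endian string padded with 0-byte values.
        name = data[offset : offset + 128].decode("utf-16-le").split("\x00", 1)[0]
        if not name:
            break  # assumption: an empty name marks the end of the entries
        (attribute_type,) = struct.unpack_from("<I", data, offset + 128)
        flags, minimum_size, maximum_size = struct.unpack_from("<IQQ", data, offset + 140)
        definitions.append(
            AttributeDefinition(name, attribute_type, flags, minimum_size, maximum_size)
        )
    return definitions
```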
The index
Index structures are used for various purposes, one of which is storing directory entries.
The root of the index is stored in the index root attribute. The index root attribute defines which type of attribute is stored in the index and contains the root index node.
If the index is too large to fit in the index root, part of the index is stored in an index allocation attribute with the same attribute name. The index allocation attribute defines a data stream which contains index entries. Each index entry contains an index node.
An index is a tree, where both branch and leaf nodes contain actual data. E.g. in a directory entries index, the index values of all nodes, both branch and leaf, make up the directory entries.
An index value in a branch node signifies the upper bound of the values in its sub node. E.g. if an index value in a branch node of a directory entries index contains the name “textfile.txt”, all names in the sub node it refers to are smaller than “textfile.txt”.
Note the actual sorting order is dependent on the collation type defined in the index root attribute.
The index allocation attribute is accompanied by a bitmap attribute with the corresponding attribute name. The bitmap attribute defines the allocation of virtual cluster blocks within the index allocation attribute data stream.
Note that the index allocation attribute can be present even though it is not used.
Commonly used indexes
Indexes commonly used by NTFS are:
| Value | Identifier | Description |
|---|---|---|
| $I30 | | Directory entries (used by directories) |
| $SDH | | Security descriptor hashes (used by $Secure) |
| $SII | | Security descriptor identifiers (used by $Secure) |
| $O | | Object identifiers (used by $ObjId) |
| $O | | Owner identifiers (used by $Quota) |
| $Q | | Quotas (used by $Quota) |
| $R | | Reparse points (used by $Reparse) |
The index root
The index root consists of:
- index root header
- index node header
- an array of index values
The index root header
The index root header is 16 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Attribute type, which contains the type of the indexed attribute or 0 if none |
| 4 | 4 | | Collation type, which contains a value to indicate the ordering of the index entries |
| 8 | 4 | | Index entry size |
| 12 | 4 | | Number of cluster blocks per index entry |
Note that for NTFS version 1.2 the index entry size does not have to match the index entry size in the volume header. The correct size seems to be the value in the index root header.
Collation type
| Value | Identifier | Description |
|---|---|---|
| 0x00000000 | COLLATION_BINARY | Binary, where the first byte is most significant |
| 0x00000001 | COLLATION_FILENAME | UCS-2 strings case-insensitive, where the case folding is stored in $UpCase |
| 0x00000002 | COLLATION_UNICODE_STRING | UCS-2 strings case-sensitive, where upper case letters should come first |
| 0x00000010 | COLLATION_NTOFS_ULONG | Unsigned 32-bit little-endian integer |
| 0x00000011 | COLLATION_NTOFS_SID | NT security identifier (SID) |
| 0x00000012 | COLLATION_NTOFS_SECURITY_HASH | Security hash first, then NT security identifier |
| 0x00000013 | COLLATION_NTOFS_ULONGS | An array of unsigned 32-bit little-endian integer values |
The index entry
The index entry consists of:
- the index entry header
- the index node header
- the fix-up values
- alignment padding (8-byte alignment), contains zero-bytes
- an array of index values
The index entry header
The index entry header is 24 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | "INDX" | Signature |
| 4 | 2 | | The fix-up values offset, which contains an offset relative from the start of the index entry header |
| 6 | 2 | | The number of fix-up values |
| 8 | 8 | | Metadata transaction journal sequence number, which contains a $LogFile Sequence Number (LSN) |
| 16 | 8 | | Virtual Cluster Number (VCN) of the index entry |
Note that there can be more fix-up values than supported by the index entry data size.
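Before an index entry is parsed its fix-up values need to be applied. A minimal sketch, assuming the usual NTFS update sequence scheme: the first fix-up value is a placeholder that must occur in the last 2 bytes of every 512-byte stride of the index entry, and each following value is the original content of those 2 bytes. The 512-byte stride is an assumption and therefore a parameter; surplus fix-up values beyond the entry data are ignored.

```python
def apply_fixups(record: bytearray, stride: int = 512) -> bytearray:
    """Apply the fix-up values of an "INDX" index entry in place."""
    fixups_offset = int.from_bytes(record[4:6], "little")
    number_of_fixups = int.from_bytes(record[6:8], "little")
    # The first value is the placeholder stored at the end of every stride.
    placeholder = bytes(record[fixups_offset : fixups_offset + 2])
    for index in range(1, number_of_fixups):
        stride_end = index * stride
        if stride_end > len(record):
            break  # more fix-up values than the index entry data supports
        if record[stride_end - 2 : stride_end] != placeholder:
            raise ValueError(f"fix-up placeholder mismatch in stride {index - 1}")
        value_offset = fixups_offset + index * 2
        record[stride_end - 2 : stride_end] = record[value_offset : value_offset + 2]
    return record
```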
The index node header
The index node header is 16 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Index values offset, where the offset is relative from the start of the index node header |
| 4 | 4 | | Index node size, where the value includes the size of the index node header |
| 8 | 4 | | Allocated index node size, where the value includes the size of the index node header |
| 12 | 4 | | Index node flags |
In an index entry (index allocation attribute) the index node size includes the size of the fix-up values and the alignment padding following it.
The remainder of the index node contains remnant data and/or zero-byte values.
The index node flags
| Value | Identifier | Description |
|---|---|---|
| 0x00000001 | | Is branch node, which is used to indicate if the node is a branch node that has sub nodes |
The index value
The index value is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | | File reference |
| 8 | 2 | | Size, which includes the 10 bytes of the file reference and size |
| 10 | 2 | | Key data size |
| 12 | 4 | | Index value flags |
| If index key data size > 0 | |||
| 16 | ... | | Key data |
| ... | ... | | Data |
| If index value flag 0x00000001 (has sub node) is set | |||
| ... | 8 | | Sub node Virtual Cluster Number (VCN) |
The index values are stored 8-byte aligned.
Note that some other sources define the index value flags as a 16-bit value followed by 2 bytes of padding.
The index value flags
| Value | Identifier | Description |
|---|---|---|
| 0x00000001 | | Has sub node, when set the index value contains a sub node Virtual Cluster Number (VCN) |
| 0x00000002 | | Is last, when set the index value is the last in the index values array |
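A sketch of walking an array of index values, starting at the index values offset of the index node header. It treats the index value flags as a 32-bit value and reads the sub node VCN from the last 8 bytes of an index value, as the table suggests; the names are hypothetical.

```python
import struct
from typing import NamedTuple, Optional

HAS_SUB_NODE = 0x00000001
IS_LAST = 0x00000002


class IndexValue(NamedTuple):
    file_reference: int
    flags: int
    key_data: bytes
    sub_node_vcn: Optional[int]


def parse_index_values(data: bytes, offset: int = 0) -> list:
    values = []
    while offset + 16 <= len(data):
        file_reference, size, key_size, flags = struct.unpack_from("<QHHI", data, offset)
        if size < 16 or offset + size > len(data):
            raise ValueError(f"invalid index value size: {size}")
        key_data = data[offset + 16 : offset + 16 + key_size]
        sub_node_vcn = None
        if flags & HAS_SUB_NODE:
            # The sub node VCN occupies the last 8 bytes of the index value.
            (sub_node_vcn,) = struct.unpack_from("<Q", data, offset + size - 8)
        values.append(IndexValue(file_reference, flags, key_data, sub_node_vcn))
        if flags & IS_LAST:
            break
        offset += size
    return values
```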
Index key and value data
Directory entry index value
The MFT attribute name of the directory entry index is: $I30.
The directory entry index value contains a file name attribute in the index key data.
Note that the index value data can contain remnant data.
The short and long names of the same file have separate index values. The short name uses the DOS name space and the long name the WINDOWS name space. Index values with a single name use either the POSIX or DOS_WINDOWS name space.
A hard link to a file in the same directory has separate index values.
Security descriptor hash index value
The MFT attribute name of the security descriptor hash index is: $SDH. It appears to be used only by the $Secure metadata file.
Also see the security descriptor hash index value.
Security descriptor identifier index value
The MFT attribute name of the security descriptor identifier index is: $SII. It appears to be used only by the $Secure metadata file.
Also see the security descriptor identifier index value.
Compression
Compressed data-runs
NTFS compression groups 16 cluster blocks together. This group of 16 cluster blocks, also named a compression unit, is either “compressed” or uncompressed.
The term compressed is quoted here because the group of cluster blocks can also contain uncompressed data. A group of cluster blocks is “compressed” when its compressed size is smaller than its uncompressed data size. Within a group of cluster blocks each of the 16 blocks is “compressed” individually.
The compression unit size is stored in the non-resident MFT attribute. The maximum uncompressed data size is always the cluster size (in most cases 4096 bytes).
Note that a resident $DATA attribute with the compression type in the data flags is stored uncompressed.
The data runs in the $DATA attribute define cluster block ranges, e.g.
21 02 35 52
This data run defines 2 (0x02) data blocks starting at block number 21045 (0x5235) and is followed by a data run of 14 sparse blocks. The total number of blocks in the compression unit is 16. Compressed data is stored in the first 2 blocks and the 14 sparse blocks are only there to make sure the data runs add up to the compression unit size. They do not define actual sparse data.
Another example:
21 40 37 52
This data run defines 64 (0x40) data blocks starting at block number 21047 (0x5237). Since this data run is larger than the compression unit size, the data is stored uncompressed.
If the data runs were e.g. 60 data blocks followed by 4 sparse blocks, the first 3 compression units (blocks 1 to 48) would be uncompressed and the last compression unit (blocks 49 to 64) would be compressed.
“Sparse data” and “sparse compression unit” data runs can also be mixed. If in the previous example the 60 data blocks were followed by 20 sparse blocks, the last compression unit (blocks 65 to 80) would be sparse.
A compression unit can consist of multiple compressed data runs, e.g. 1 data block followed by 4 data blocks followed by 11 sparse blocks. Data runs have been observed where the last data run size does not align with the compression unit size.
The sparse blocks data run can be stored in a subsequent attribute in an attribute chain and can be stored in multiple data runs.
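The data run examples above can be decoded with a sketch like the following. It assumes the standard NTFS data run encoding: the lower nibble of the header byte is the size of the block count, the upper nibble the size of a signed block number relative to the previous data run, and a size of 0 means sparse. The classification uses the simplifying assumption that a compression unit is only uncompressed when all 16 of its blocks are stored; the function names are hypothetical.

```python
def decode_data_runs(data: bytes) -> list:
    """Decode data runs into (number of blocks, block number or None if sparse)."""
    runs = []
    offset = 0
    block_number = 0
    while offset < len(data) and data[offset] != 0:
        # Lower nibble: size of the block count, upper nibble: size of the
        # block number, which is signed and relative to the previous run.
        count_size = data[offset] & 0x0F
        number_size = data[offset] >> 4
        offset += 1
        count = int.from_bytes(data[offset : offset + count_size], "little")
        offset += count_size
        if number_size == 0:
            runs.append((count, None))
        else:
            block_number += int.from_bytes(
                data[offset : offset + number_size], "little", signed=True
            )
            offset += number_size
            runs.append((count, block_number))
    return runs


def classify_compression_units(runs: list, unit_size: int = 16) -> list:
    # One flag per block: True if the block is stored, False if sparse.
    blocks = []
    for count, block_number in runs:
        blocks.extend([block_number is not None] * count)
    units = []
    for start in range(0, len(blocks), unit_size):
        stored = sum(blocks[start : start + unit_size])
        if stored == 0:
            units.append("sparse")
        elif stored == unit_size:
            units.append("uncompressed")
        else:
            units.append("compressed")
    return units
```

With the first example followed by a sparse run, `decode_data_runs(bytes([0x21, 0x02, 0x35, 0x52, 0x01, 0x0E, 0x00]))` returns `[(2, 21045), (14, None)]`.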
NTFS compression stores the “compressed” data in blocks. Each block has a 2-byte block header.
The block is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | | Block size |
| 2 | (compressed data size) | | Uncompressed or LZNT1 compressed data |
The upper 4 bits of the block size are used as flags:
| Bit(s) | Description |
|---|---|
| 0 - 11 | Compressed data size |
| 12 - 14 | Unknown |
| 15 | Data is compressed |
TODO: link to LZNT1 documentation
Windows Overlay Filter (WOF) compressed data
An MFT entry that contains Windows Overlay Filter (WOF) compressed data has the following attributes:
- reparse point attribute with tag 0x80000017, which defines the compression method
- a nameless data attribute that is sparse and contains the uncompressed data size
- a data attribute named WofCompressedData that contains LZXPRESS Huffman or LZX compressed data
| Offset | Size | Value | Description |
|---|---|---|---|
| Chunk offset table | |||
| 0 | ... | | Array of 32-bit or 64-bit compressed data chunk offsets, where the offset is relative from the start of the data chunks |
| Data chunks | |||
| ... | ... | | One or more compressed or uncompressed data chunks |
Note that if the compressed size of a chunk equals its uncompressed size, the chunk is stored (as-is) uncompressed.
The size of the chunk offset table is:
number of chunk offsets = uncompressed size / compression unit size
The offset of the first compressed data chunk is at the end of the chunk offset table and is not stored in the chunk offset table.
If the uncompressed size of a chunk is smaller than the compression unit size the chunk is stored uncompressed.
Also see Windows Overlay Filter (WOF) compression method.
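A minimal sketch of locating the data chunks in the WofCompressedData data. It assumes the caller has already determined the number of stored chunk offsets and whether they are 32-bit (entry_size=4) or 64-bit (entry_size=8); both are hypothetical parameters. A chunk whose size equals its uncompressed size is stored as-is, other chunks need to be decompressed with the compression method from the reparse point.

```python
def wof_chunk_ranges(data: bytes, number_of_offsets: int, entry_size: int = 4) -> list:
    """Return (start, end) offsets of the data chunks in WofCompressedData data."""
    table_size = number_of_offsets * entry_size
    # The first chunk starts directly after the table; its offset is not stored.
    chunk_offsets = [0]
    for index in range(number_of_offsets):
        entry = data[index * entry_size : (index + 1) * entry_size]
        chunk_offsets.append(int.from_bytes(entry, "little"))
    chunk_ends = chunk_offsets[1:] + [len(data) - table_size]
    # Offsets are relative to the start of the data chunks, after the table.
    return [
        (table_size + start, table_size + end)
        for start, end in zip(chunk_offsets, chunk_ends)
    ]
```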
The reparse point
The reparse point is used to create file system-level links. Reparse data is stored in the reparse point attribute. The reparse point data (REPARSE_DATA_BUFFER) is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Reparse point tag |
| 4 | 2 | | Reparse data size |
| 6 | 2 | 0 | Unknown (Reserved) |
| 8 | ... | | Reparse data |
TODO: determine if non-native (Microsoft) reparse points are stored with their GUID
The reparse point tag
| Offset | Size | Value | Description |
|---|---|---|---|
| 0.0 | 16 bits | | Type |
| 2.0 | 12 bits | | Unknown (Reserved) |
| 3.4 | 4 bits | | Flags |
Reparse point tag flags
| Value | Identifier | Description |
|---|---|---|
| 0x1 | | Unknown (Reserved) |
| 0x2 | | Is alias (Name surrogate bit), when this bit is set, the file or directory represents another named entity in the system |
| 0x4 | | Is high-latency media (Reserved) |
| 0x8 | | Is native (Microsoft-bit) |
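A sketch of reading the reparse point data header and splitting the tag into its type and flag bits (bits 28 to 31), following the tables above. The function names and flag labels are hypothetical.

```python
import struct

# Flag bits 28 to 31 of the reparse point tag, from the table above.
REPARSE_TAG_FLAGS = {
    0x2: "is alias (name surrogate)",
    0x4: "is high-latency media",
    0x8: "is native (Microsoft)",
}


def parse_reparse_point(data: bytes) -> tuple:
    """Return the reparse point tag and the reparse data."""
    tag, data_size, _reserved = struct.unpack_from("<IHH", data, 0)
    return tag, data[8 : 8 + data_size]


def decode_reparse_tag(tag: int) -> tuple:
    tag_type = tag & 0xFFFF
    flags = (tag >> 28) & 0xF
    return tag_type, [name for bit, name in REPARSE_TAG_FLAGS.items() if flags & bit]
```

For example, IO_REPARSE_TAG_SYMLINK (0xa000000c) has type 0x000c and both the alias and native flags set.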
Known reparse point tags
| Value | Identifier | Description |
|---|---|---|
| 0x00000000 | IO_REPARSE_TAG_RESERVED_ZERO | Unknown (Reserved) |
| 0x00000001 | IO_REPARSE_TAG_RESERVED_ONE | Unknown (Reserved) |
| 0x00000002 | IO_REPARSE_TAG_RESERVED_TWO | Unknown (Reserved) |
| 0x80000005 | IO_REPARSE_TAG_DRIVE_EXTENDER | Used by Home server drive extender |
| 0x80000006 | IO_REPARSE_TAG_HSM2 | Used by Hierarchical Storage Manager Product |
| 0x80000007 | IO_REPARSE_TAG_SIS | Used by single-instance storage (SIS) filter driver |
| 0x80000008 | IO_REPARSE_TAG_WIM | Used by the WIM Mount filter |
| 0x80000009 | IO_REPARSE_TAG_CSV | Used by Clustered Shared Volumes (CSV) version 1 |
| 0x8000000a | IO_REPARSE_TAG_DFS | Used by the Distributed File System (DFS) |
| 0x8000000b | IO_REPARSE_TAG_FILTER_MANAGER | Used by filter manager test harness |
| 0x80000012 | IO_REPARSE_TAG_DFSR | Used by the Distributed File System (DFS) |
| 0x80000013 | IO_REPARSE_TAG_DEDUP | Used by the Data Deduplication (Dedup) |
| 0x80000014 | IO_REPARSE_TAG_NFS | Used by the Network File System (NFS) |
| 0x80000015 | IO_REPARSE_TAG_FILE_PLACEHOLDER | Used by Windows Shell for placeholder files |
| 0x80000016 | IO_REPARSE_TAG_DFM | Used by Dynamic File filter |
| 0x80000017 | IO_REPARSE_TAG_WOF | Used by Windows Overlay Filter (WOF), for either WIMBoot or compression |
| 0x80000018 | IO_REPARSE_TAG_WCI | Used by Windows Container Isolation (WCI) |
| 0x8000001b | IO_REPARSE_TAG_APPEXECLINK | Used by Universal Windows Platform (UWP) packages to encode information that allows the application to be launched by CreateProcess |
| 0x8000001e | IO_REPARSE_TAG_STORAGE_SYNC | Used by the Azure File Sync (AFS) filter |
| 0x80000020 | IO_REPARSE_TAG_UNHANDLED | Used by Windows Container Isolation (WCI) |
| 0x80000021 | IO_REPARSE_TAG_ONEDRIVE | Unknown (Not used) |
| 0x80000023 | IO_REPARSE_TAG_AF_UNIX | Used by the Windows Subsystem for Linux (WSL) to represent a UNIX domain socket |
| 0x80000024 | IO_REPARSE_TAG_LX_FIFO | Used by the Windows Subsystem for Linux (WSL) to represent a UNIX FIFO (named pipe) |
| 0x80000025 | IO_REPARSE_TAG_LX_CHR | Used by the Windows Subsystem for Linux (WSL) to represent a UNIX character special file |
| 0x80000026 | IO_REPARSE_TAG_LX_BLK | Used by the Windows Subsystem for Linux (WSL) to represent a UNIX block special file |
| 0x9000001c | IO_REPARSE_TAG_PROJFS | Used by the Windows Projected File System filter, for files managed by a user mode provider such as VFS for Git |
| 0x90001018 | IO_REPARSE_TAG_WCI_1 | Used by Windows Container Isolation (WCI) |
| 0x9000101a | IO_REPARSE_TAG_CLOUD_1 | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive |
| 0x9000201a | IO_REPARSE_TAG_CLOUD_2 | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive |
| 0x9000301a | IO_REPARSE_TAG_CLOUD_3 | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive |
| 0x9000401a | IO_REPARSE_TAG_CLOUD_4 | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive |
| 0x9000501a | IO_REPARSE_TAG_CLOUD_5 | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive |
| 0x9000601a | IO_REPARSE_TAG_CLOUD_6 | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive |
| 0x9000701a | IO_REPARSE_TAG_CLOUD_7 | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive |
| 0x9000801a | IO_REPARSE_TAG_CLOUD_8 | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive |
| 0x9000901a | IO_REPARSE_TAG_CLOUD_9 | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive |
| 0x9000a01a | IO_REPARSE_TAG_CLOUD_A | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive |
| 0x9000b01a | IO_REPARSE_TAG_CLOUD_B | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive |
| 0x9000c01a | IO_REPARSE_TAG_CLOUD_C | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive |
| 0x9000d01a | IO_REPARSE_TAG_CLOUD_D | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive |
| 0x9000e01a | IO_REPARSE_TAG_CLOUD_E | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive |
| 0x9000f01a | IO_REPARSE_TAG_CLOUD_F | Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive |
| 0xa0000003 | IO_REPARSE_TAG_MOUNT_POINT | Junction (or mount point) |
| 0xa000000c | IO_REPARSE_TAG_SYMLINK | Symbolic link |
| 0xa0000010 | IO_REPARSE_TAG_IIS_CACHE | Used by Microsoft Internet Information Services (IIS) caching |
| 0xa0000019 | IO_REPARSE_TAG_GLOBAL_REPARSE | Used by NPFS to indicate a named pipe symbolic link from a server silo into the host silo |
| 0xa000001a | IO_REPARSE_TAG_CLOUD | Used by the Cloud Files filter, for files managed by a sync engine such as Microsoft OneDrive |
| 0xa000001d | IO_REPARSE_TAG_LX_SYMLINK | Used by the Windows Subsystem for Linux (WSL) to represent a UNIX symbolic link |
| 0xa000001f | IO_REPARSE_TAG_WCI_TOMBSTONE | Used by Windows Container Isolation (WCI) |
| 0xa0000022 | IO_REPARSE_TAG_PROJFS_TOMBSTONE | Used by the Windows Projected File System filter, for files managed by a user mode provider such as VFS for Git |
| 0xa0000027 | IO_REPARSE_TAG_WCI_LINK | Used by Windows Container Isolation (WCI) |
| 0xa0001027 | IO_REPARSE_TAG_WCI_LINK_1 | Used by Windows Container Isolation (WCI) |
| 0xc0000004 | IO_REPARSE_TAG_HSM | Used by Hierarchical Storage Manager Product |
| 0xc0000014 | IO_REPARSE_TAG_APPXSTRM | Unknown (Not used) |
Junction or mount point reparse data
A reparse point with tag IO_REPARSE_TAG_MOUNT_POINT (0xa0000003) contains junction or mount point reparse data. The junction or mount point reparse data is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | | Substitute name offset, where the offset is relative from the start of the reparse name data |
| 2 | 2 | | Substitute name size in bytes, where the size of the end-of-string character is not included |
| 4 | 2 | | Display name offset, where the offset is relative from the start of the reparse name data |
| 6 | 2 | | Display name size in bytes, where the size of the end-of-string character is not included |
| Reparse name data | |||
| 8 | ... | | Substitute name, which contains a UCS-2 little-endian string without end-of-string character |
| ... | ... | | Display name, which contains a UCS-2 little-endian string without end-of-string character |
Note that it is currently unclear if the names contain an end-of-string character or if they are followed by alignment padding.
TODO: determine what character values like 0x0002 represent in the substitute name
00000010: 5c 00 3f 00 3f 00 02 00 43 00 3a 00 5c 00 55 00 \.?.?... C.:.\.U.
00000020: 73 00 65 00 72 00 73 00 5c 00 74 00 65 00 73 00 s.e.r.s. \.t.e.s.
00000030: 74 00 5c 00 44 00 6f 00 63 00 75 00 6d 00 65 00 t.\.D.o. c.u.m.e.
00000040: 6e 00 74 00 73 00 00 00 n.t.s...
Symbolic link reparse data
A reparse point with tag IO_REPARSE_TAG_SYMLINK (0xa000000c) contains symbolic link reparse data. The symbolic link reparse data is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | | Substitute name offset, where the offset is relative from the start of the reparse name data |
| 2 | 2 | | Substitute name size in bytes |
| 4 | 2 | | Display name offset, where the offset is relative from the start of the reparse name data |
| 6 | 2 | | Display name size in bytes |
| 8 | 4 | | Symbolic link flags |
| Reparse name data | |||
| 12 | ... | | Substitute name, which contains a UCS-2 little-endian string without end-of-string character |
| ... | ... | | Display name, which contains a UCS-2 little-endian string without end-of-string character |
Symbolic link flags
| Value | Identifier | Description |
|---|---|---|
| 0x00000001 | SYMLINK_FLAG_RELATIVE | The substitute name is a path name relative to the directory containing the symbolic link |
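The junction (mount point) and symbolic link reparse data share the same name fields; the symbolic link variant adds 4 bytes of flags before the reparse name data. A sketch of reading both, where the function name and result keys are hypothetical:

```python
import struct

IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003
IO_REPARSE_TAG_SYMLINK = 0xA000000C
SYMLINK_FLAG_RELATIVE = 0x00000001


def parse_link_reparse_data(tag: int, data: bytes) -> dict:
    substitute_offset, substitute_size, display_offset, display_size = (
        struct.unpack_from("<HHHH", data, 0)
    )
    if tag == IO_REPARSE_TAG_SYMLINK:
        (flags,) = struct.unpack_from("<I", data, 8)
        names_start = 12
    elif tag == IO_REPARSE_TAG_MOUNT_POINT:
        flags = 0
        names_start = 8
    else:
        raise ValueError(f"unsupported reparse point tag: 0x{tag:08x}")

    def read_name(offset: int, size: int) -> str:
        # Offsets are relative to the start of the reparse name data.
        start = names_start + offset
        return data[start : start + size].decode("utf-16-le")

    return {
        "substitute_name": read_name(substitute_offset, substitute_size),
        "display_name": read_name(display_offset, display_size),
        "is_relative": bool(flags & SYMLINK_FLAG_RELATIVE),
    }
```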
Windows Overlay Filter (WOF) reparse data
A reparse point with tag IO_REPARSE_TAG_WOF (0x80000017) contains Windows Overlay Filter (WOF) reparse data. The Windows Overlay Filter (WOF) reparse data is 16 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| External provider information | |||
| 0 | 4 | 1 | Unknown (WOF version) |
| 4 | 4 | 2 | Unknown (WOF provider) |
| Internal provider information | |||
| 8 | 4 | 1 | Unknown (file information version) |
| 12 | 4 | | Compression method |
Windows Overlay Filter (WOF) compression method
| Value | Identifier | Description |
|---|---|---|
| 0 | | LZXPRESS Huffman with 4k window (compression unit) |
| 1 | | LZX with 32k window (compression unit) |
| 2 | | LZXPRESS Huffman with 8k window (compression unit) |
| 3 | | LZXPRESS Huffman with 16k window (compression unit) |
TODO: link to LZXPRESS Huffman and LZX documentation
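A sketch of reading the WOF reparse data and mapping the compression method to an algorithm and compression unit size, assuming "4k", "8k", "16k" and "32k" denote 4096, 8192, 16384 and 32768 bytes. It only accepts the version and provider values listed in the table above.

```python
import struct

# Compression method: (algorithm, compression unit size in bytes).
WOF_COMPRESSION_METHODS = {
    0: ("LZXPRESS Huffman", 4096),
    1: ("LZX", 32768),
    2: ("LZXPRESS Huffman", 8192),
    3: ("LZXPRESS Huffman", 16384),
}


def parse_wof_reparse_data(data: bytes) -> tuple:
    wof_version, wof_provider, file_information_version, compression_method = (
        struct.unpack_from("<4I", data, 0)
    )
    # Only the values observed in the table above are accepted.
    if (wof_version, wof_provider, file_information_version) != (1, 2, 1):
        raise ValueError("unsupported WOF reparse data")
    return WOF_COMPRESSION_METHODS[compression_method]
```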
Windows Container Isolation (WCI) reparse data
A reparse point with tag IO_REPARSE_TAG_WCI (0x80000018) contains Windows Container Isolation (WCI) reparse data. The Windows Container Isolation (WCI) reparse data is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | 1 | Version |
| 4 | 4 | 0 | Unknown (reserved) |
| 8 | 16 | | Look-up identifier, which contains a GUID |
| 24 | 2 | | Name size in bytes |
| 26 | ... | | Name, which contains a UCS-2 little-endian string without end-of-string character |
The allocation bitmap
The metadata file $Bitmap contains the allocation bitmap.
Every bit in the allocation bitmap represents a cluster block, where the least significant bit (LSB) of a byte represents the first cluster block of that byte.
TODO: describe what the $SRAT data stream is used for.
Access control
The $Secure metadata file contains the security descriptors used for access control.
| Type | Name | Description |
|---|---|---|
| Data | $SDS | Security descriptor data stream, which contains all the security descriptors on the volume |
| Index | $SDH | Security descriptor hash index |
| Index | $SII | Security descriptor identifier index, which contains the mapping of the security descriptor identifier (in $STANDARD_INFORMATION) to the offset of the security descriptor data (in $Secure:$SDS) |
Security descriptor hash ($SDH) index
The security descriptor hash index value
| Offset | Size | Value | Description |
|---|---|---|---|
| Key data | |||
| 0 | 4 | | Security descriptor hash |
| 4 | 4 | | Security descriptor identifier |
| Value data | |||
| 8 | 4 | | Security descriptor hash |
| 12 | 4 | | Security descriptor identifier |
| 16 | 8 | | Security descriptor data offset (in $SDS) |
| 24 | 4 | | Security descriptor data size (in $SDS) |
| 28 | 4 | | Unknown |
Security descriptor identifier ($SII) index
The security descriptor identifier index value
| Offset | Size | Value | Description |
|---|---|---|---|
| Key data | |||
| 0 | 4 | | Security descriptor identifier |
| Value data | |||
| 4 | 4 | | Security descriptor hash |
| 8 | 4 | | Security descriptor identifier |
| 12 | 8 | | Security descriptor data offset (in $SDS) |
| 20 | 4 | | Security descriptor data size (in $SDS) |
TODO: describe the hash algorithm
Security descriptor ($SDS) data stream
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Security descriptor hash |
| 4 | 4 | | Security descriptor identifier |
| 8 | 8 | | Security descriptor data offset (in $SDS) |
| 16 | 4 | | Security descriptor data size (in $SDS) |
| 20 | ... | | Security descriptor data |
| ... | ... | | Alignment padding (2-byte alignment) |
TODO: link to security descriptor format documentation
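A sketch of reading an $SDS entry at an offset obtained from an $SII or $SDH index value, assuming the hash, identifier, 64-bit data offset and 32-bit data size are packed back-to-back in a 20-byte header. It also assumes the data size covers both that header and the security descriptor data, which is not stated above.

```python
import struct
from typing import NamedTuple

SDS_ENTRY_HEADER_SIZE = 20


class SecurityDescriptorEntry(NamedTuple):
    descriptor_hash: int
    identifier: int
    data_offset: int
    data_size: int
    descriptor_data: bytes


def parse_sds_entry(sds: bytes, offset: int) -> SecurityDescriptorEntry:
    descriptor_hash, identifier, data_offset, data_size = struct.unpack_from(
        "<IIQI", sds, offset
    )
    # Assumption: the data size includes the 20-byte entry header.
    descriptor_data = sds[offset + SDS_ENTRY_HEADER_SIZE : offset + data_size]
    return SecurityDescriptorEntry(
        descriptor_hash, identifier, data_offset, data_size, descriptor_data
    )
```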
The object identifiers
$ObjID:$O
| Offset | Size | Value | Description |
|---|---|---|---|
| Key data | |||
| 0 | 16 | | File (or object) identifier, which contains a GUID |
| Value data | |||
| 16 | 8 | | File reference |
| 24 | 16 | | Birth droid volume identifier, which contains a GUID |
| 40 | 16 | | Birth droid file (or object) identifier, which contains a GUID |
| 56 | 16 | | Birth droid domain identifier, which contains a GUID |
Metadata transaction journal (log file)
TODO: complete section
The metadata file $LogFile contains the metadata transaction journal and consists of:
- Log File Service restart page header
- The fix-up values
Log File service restart page header
The Log File service restart page header (LFS_RESTART_PAGE_HEADER) is 30 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| MULTI_SECTOR_HEADER | |||
| 0 | 4 | "CHKD", "RCRD", "RSTR" | Signature |
| 4 | 2 | | The fix-up values (or update sequence array) offset, which contains an offset relative from the start of the restart page header |
| 6 | 2 | | The number of fix-up values (or update sequence array size) |
| Common | |||
| 8 | 8 | | Checkdisk last LSN |
| 16 | 4 | | System page size |
| 20 | 4 | | Log page size |
| 24 | 2 | | Restart offset |
| 26 | 2 | | Minor format version |
| 28 | 2 | | Major format version |
Log File service restart page versions
| Major format version | Remarks |
|---|---|
| -1 | Beta Version |
| 0 | Transition |
| 1 | Update sequence support |
USN change journal
The metadata file $Extend\$UsnJrnl contains the USN change journal. It is a sparse file in which NTFS stores records of changes to files and directories. Applications, such as the Windows File Replication Service (FRS) and the Windows (Desktop) Search service, use the journal to respond to file and directory changes as they occur.
The USN change journal consists of:
- the $UsnJrnl:$Max data stream, containing metadata like the maximum size of the journal
- the $UsnJrnl:$J data stream, containing the update (or change) entries. The $UsnJrnl:$J data stream is sparse.
USN change journal metadata
The USN change journal metadata is 32 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 8 | | Maximum size in bytes |
| 8 | 8 | | Allocation (size) delta in bytes |
| 16 | 8 | | Update (USN) journal identifier, which contains a FILETIME |
| 24 | 8 | | Unknown (empty) |
USN change journal entries
The $UsnJrnl:$J data stream consists of an array of USN change journal entries. The USN change journal entries are stored on a per-block basis and are 8-byte aligned. Therefore the remainder of a block can contain 0-byte values.
TODO: describe journal block size
Once the stream reaches its maximum size, the earliest USN change journal entries are removed from the stream and replaced with a sparse data run.
USN change journal entry
The USN change journal entry (USN_RECORD_V2) is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Entry (or record) size |
| 4 | 2 | 2 | Major format version |
| 6 | 2 | 0 | Minor format version |
| 8 | 8 | | File reference |
| 16 | 8 | | Parent file reference |
| 24 | 8 | | Update sequence number (USN), which contains the file offset of the USN change journal entry and is used as a unique identifier |
| 32 | 8 | | Update date and time, which contains a FILETIME |
| 40 | 4 | | Update reason flags |
| 44 | 4 | | Update source flags |
| 48 | 4 | | Security descriptor identifier, which contains the entry number in the security ID index ($Secure:$SII). Also see Access Control |
| 52 | 4 | | File attribute flags |
| 56 | 2 | | Name size in bytes |
| 58 | 2 | | Name offset, which is relative to the start of the USN change journal entry |
| 60 | (name size) | | Name, which contains a UCS-2 little-endian string without end-of-string character |
| ... | ... | 0x00 | Unknown (Padding) |
Update reason flags
| Value | Identifier | Description |
|---|---|---|
| 0x00000001 | USN_REASON_DATA_OVERWRITE | The data in the file or directory is overwritten |
| 0x00000002 | USN_REASON_DATA_EXTEND | The file or directory is extended |
| 0x00000004 | USN_REASON_DATA_TRUNCATION | The file or directory is truncated |
| 0x00000010 | USN_REASON_NAMED_DATA_OVERWRITE | One or more named data streams ($DATA attributes) of a file were overwritten |
| 0x00000020 | USN_REASON_NAMED_DATA_EXTEND | One or more named data streams ($DATA attributes) of a file were extended |
| 0x00000040 | USN_REASON_NAMED_DATA_TRUNCATION | One or more named data streams ($DATA attributes) of a file were truncated |
| 0x00000100 | USN_REASON_FILE_CREATE | The file or directory was created |
| 0x00000200 | USN_REASON_FILE_DELETE | The file or directory was deleted |
| 0x00000400 | USN_REASON_EA_CHANGE | The extended attributes of the file were changed |
| 0x00000800 | USN_REASON_SECURITY_CHANGE | The access rights (security descriptor) of a file or directory were changed |
| 0x00001000 | USN_REASON_RENAME_OLD_NAME | The name changed, where the USN change journal entry contains the old name |
| 0x00002000 | USN_REASON_RENAME_NEW_NAME | The name changed, where the USN change journal entry contains the new name |
| 0x00004000 | USN_REASON_INDEXABLE_CHANGE | Content indexed status changed. The file attribute FILE_ATTRIBUTE_NOT_CONTENT_INDEXED was changed |
| 0x00008000 | USN_REASON_BASIC_INFO_CHANGE | Basic file or directory attributes changed. One or more file or directory attributes were changed e.g. read-only, hidden, system, archive, or sparse attribute, or one or more time stamps |
| 0x00010000 | USN_REASON_HARD_LINK_CHANGE | A hard link was created or deleted |
| 0x00020000 | USN_REASON_COMPRESSION_CHANGE | The file or directory was compressed or decompressed |
| 0x00040000 | USN_REASON_ENCRYPTION_CHANGE | The file or directory was encrypted or decrypted |
| 0x00080000 | USN_REASON_OBJECT_ID_CHANGE | The object identifier of a file or directory was changed |
| 0x00100000 | USN_REASON_REPARSE_POINT_CHANGE | The reparse point in a file or directory was changed, or a reparse point was added to or deleted from a file or directory |
| 0x00200000 | USN_REASON_STREAM_CHANGE | A named data stream ($DATA attribute) is added to or removed from a file, or a named stream is renamed |
| 0x00400000 | USN_REASON_TRANSACTED_CHANGE | Unknown |
| 0x80000000 | USN_REASON_CLOSE | The file or directory was closed |
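Since a single entry can combine multiple reason flags, decoding them is a matter of testing each bit. The mapping below is copied from the table; the function name is hypothetical, and unknown bits are reported as a hexadecimal value rather than silently dropped.

```python
# Update reason flags from the table above.
USN_REASON_FLAGS = {
    0x00000001: "USN_REASON_DATA_OVERWRITE",
    0x00000002: "USN_REASON_DATA_EXTEND",
    0x00000004: "USN_REASON_DATA_TRUNCATION",
    0x00000010: "USN_REASON_NAMED_DATA_OVERWRITE",
    0x00000020: "USN_REASON_NAMED_DATA_EXTEND",
    0x00000040: "USN_REASON_NAMED_DATA_TRUNCATION",
    0x00000100: "USN_REASON_FILE_CREATE",
    0x00000200: "USN_REASON_FILE_DELETE",
    0x00000400: "USN_REASON_EA_CHANGE",
    0x00000800: "USN_REASON_SECURITY_CHANGE",
    0x00001000: "USN_REASON_RENAME_OLD_NAME",
    0x00002000: "USN_REASON_RENAME_NEW_NAME",
    0x00004000: "USN_REASON_INDEXABLE_CHANGE",
    0x00008000: "USN_REASON_BASIC_INFO_CHANGE",
    0x00010000: "USN_REASON_HARD_LINK_CHANGE",
    0x00020000: "USN_REASON_COMPRESSION_CHANGE",
    0x00040000: "USN_REASON_ENCRYPTION_CHANGE",
    0x00080000: "USN_REASON_OBJECT_ID_CHANGE",
    0x00100000: "USN_REASON_REPARSE_POINT_CHANGE",
    0x00200000: "USN_REASON_STREAM_CHANGE",
    0x00400000: "USN_REASON_TRANSACTED_CHANGE",
    0x80000000: "USN_REASON_CLOSE",
}


def decode_usn_reason_flags(reason_flags: int) -> list:
    """Return the identifiers of the set flags, plus any unknown bits in hex."""
    names = [name for bit, name in USN_REASON_FLAGS.items() if reason_flags & bit]
    known_mask = 0
    for bit in USN_REASON_FLAGS:
        known_mask |= bit
    unknown_bits = reason_flags & ~known_mask
    if unknown_bits:
        names.append(f"0x{unknown_bits:08x}")
    return names
```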
Update source flags
| Value | Identifier | Description |
|---|---|---|
| 0x00000001 | USN_SOURCE_DATA_MANAGEMENT | The operation was caused by the operating system. Although a write operation is performed on the item, the data was not changed |
| 0x00000002 | USN_SOURCE_AUXILIARY_DATA | The operation added a private data stream to a file or directory. The modifications did not change the application data |
| 0x00000004 | USN_SOURCE_REPLICATION_MANAGEMENT | The operation was caused by file replication |
Alternate data streams (ADS)
| Data stream name | Description |
|---|---|
| "♣BnhqlkugBim0elg1M1pt2tjdZe", "♣SummaryInformation", "{4c8cc155-6c1e-11d1-8e41-00c04fb9386d}" | Used to store properties, where ♣ (black club) is Unicode character U+2663 |
| "{59828bbb-3f72-4c1b-a420-b51ad66eb5d3}.XPRESS" | Used during remote differential compression |
| "AFP_AfpInfo", "AFP_Resource" | Used to store Macintosh operating system property lists |
| "encryptable" | Used to store attributes relating to thumbnails in the thumbnails database |
| "favicon" | Used to store favorite icons for web pages |
| "ms-properties" | Used to store properties |
| "OECustomProperty" | Used to store custom properties related to email files |
| "Zone.Identifier" | Used to store the Internet Explorer URL security zone of the origin |
ms-properties
The ms-properties alternate data stream contains a Windows Serialized Property Store (SPS).
TODO: link to Windows Serialized Property Store (SPS) format documentation
Zone.Identifier
The Zone.Identifier alternate data stream contains ASCII text in the form:
[ZoneTransfer]
ZoneId=3
Where ZoneId refers to the Internet Explorer URL security zone of the origin.
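The stream uses INI-style syntax, so a sketch can use Python's `configparser` to read it. The zone names are those of the Internet Explorer URL security zones (URLZONE values 0 to 4); the function name is illustrative, and interpolation is disabled in case other values in the stream contain `%` characters.

```python
import configparser

# Internet Explorer URL security zones (URLZONE values).
URL_SECURITY_ZONES = {
    0: "Local machine",
    1: "Local intranet",
    2: "Trusted sites",
    3: "Internet",
    4: "Restricted sites",
}


def read_zone_identifier(data: bytes) -> int:
    """Return the ZoneId from the contents of a Zone.Identifier stream."""
    parser = configparser.ConfigParser(interpolation=None)
    # The stream contains ASCII text; replace stray non-ASCII bytes.
    parser.read_string(data.decode("ascii", errors="replace"))
    return parser.getint("ZoneTransfer", "ZoneId")
```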
Transactional NTFS (TxF)
Transactional NTFS (TxF) was added in Windows Vista.
In TxF the resource manager (RM) keeps track of transactional metadata and log files. The TxF related metadata files are stored in the metadata directory:
$Extend\$RmMetadata
Resource manager repair information
The resource manager repair information metadata file: $Extend\$RmMetadata\$Repair consists of the following data streams:
- the default (unnamed) data stream
- the $Config data stream, which contains the resource manager repair configuration information
TODO: determine the purpose of the default (unnamed) data stream
Resource manager repair configuration information
TODO: complete section
The $Repair:$Config data stream contains:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 4 | | Unknown |
| 4 | 4 | | Unknown |
Transactional NTFS (TxF) metadata directory
TODO: complete section
The transactional NTFS (TxF) metadata directory: $Extend\$RmMetadata\$Txf is used to isolate files for delete or overwrite operations.
TxF Old Page Stream (TOPS) file
The TxF Old Page Stream (TOPS) file: $Extend\$RmMetadata\$TxfLog\$Tops consists of the following data streams:
- the default (unnamed) data stream, which contains metadata about the resource manager, such as its GUID, its CLFS log policy, and the LSN at which recovery should start
- the $T data stream, which contains the file data that is partially overwritten by a transaction as opposed to a full overwrite, which would move the file into the Transactional NTFS (TxF) metadata directory
TxF Old Page Stream (TOPS) metadata
TODO: complete section
The $Tops default (unnamed) data stream contains:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 2 | | Unknown |
| 2 | 2 | | Size of TOPS metadata |
| 4 | 4 | | Unknown (Number of resource managers/streams?) |
| 8 | 16 | | Resource Manager (RM) identifier, which contains a GUID |
| 24 | 8 | | Unknown (empty) |
| 32 | 8 | | Base (or log start) LSN of TxfLog stream |
| 40 | 8 | | Unknown |
| 48 | 8 | | Last flushed LSN of TxfLog stream |
| 56 | 8 | | Unknown |
| 64 | 8 | | Unknown (empty) |
| 72 | 8 | | Unknown (Restart LSN?) |
| 80 | 20 | | Unknown |
TxF Old Page Stream (TOPS) file data
The $Tops:$T data stream contains the file data that is partially overwritten by a transaction. It consists of multiple pending transaction XML-documents.
TODO: describe start of each sector containing 0x0001
A pending transaction XML-document starts with a UTF-8 byte-order mark. It roughly contains the following data:
<?xml version='1.0' encoding='utf-8'?>
<PendingTransaction Version="2.0" Identifier="...">
<Transactions>
<Transaction TransactionId="...">
<Install Application="..., Culture=..., Version=..., PublicKeyToken=...,
ProcessorArchitecture=..., versionScope=..."
RefGuid="..."
RefIdentifier="..."
RefExtra="..."/>
...
</Transaction>
</Transactions>
<ChangeList>
<Change Family="..., Culture=..., PublicKeyToken=...,
ProcessorArchitecture=..., versionScope=..."
New="..."/>
...
</ChangeList>
<POQ>
<BeginTransaction id="..."/>
<CreateFile path="..."
fileAttribute="..."/>
<DeleteFile path="..."/>
<MoveFile source="..." destination="..."/>
<HardlinkFile source="..." destination="..."/>
<SetFileInformation path="..."
securityDescriptor="binary base64:..."
flags="..."/>
<CreateKey path="..."/>
<SetKeyValue path="..."
name="..."
type="..."
encoding="base64"
value="..."/>
<DeleteKeyValue path="..."
name="..."/>
...
</POQ>
<InstallerQueue Length="...">
<Action Installer="..."
Mode="..."
Phase="..."
Family="..., Culture=..., PublicKeyToken=...,
ProcessorArchitecture=..., versionScope=..."
Old="..."
New="..."/>
...
</InstallerQueue >
</PendingTransaction>
Transactional NTFS (TxF) Common Log File System (CLFS) files
TxF uses a Common Log File System (CLFS) log store and the logged utility stream attribute named $TXF_DATA.
TODO: link to CLFS format documentation
The base log file (BLF) of the TxF log store is:
$Extend\$RmMetadata\$TxfLog\TxfLog.blf
Commonly the corresponding container files are:
$Extend\$RmMetadata\$TxfLog\TxfLogContainer00000000000000000001
$Extend\$RmMetadata\$TxfLog\TxfLogContainer00000000000000000002
TxF uses a multiplexed log store which contains the following streams:
- the KtmLog stream, which is used for Kernel Transaction Manager (KTM) metadata records
- the TxfLog stream, which contains the TxF log records
Transactional data logged utility stream attribute
The transactional data ($TXF_DATA) logged utility stream attribute is 56 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 6 | | Unknown (remnant data) |
| 6 | 8 | | Resource manager root file reference, which contains an NTFS file reference that refers to the MFT |
| 14 | 8 | | Unknown (USN index?) |
| 22 | 8 | | File identifier (TxID), which contains a TxF file identifier |
| 30 | 8 | | Data LSN, which contains a CLFS LSN of file data transaction records |
| 38 | 8 | | Metadata LSN, which contains a CLFS LSN of file system metadata transaction records |
| 46 | 8 | | Directory index LSN, which contains a CLFS LSN of directory index transaction records |
| 54 | 2 | | Unknown (Flags?) |
Note that a single MFT entry can contain multiple Transactional data logged utility stream attributes.
Windows definitions
File attribute flags
The file attribute flags consist of the following values:
| Value | Identifier | Description |
|---|---|---|
| 0x00000001 | FILE_ATTRIBUTE_READONLY | Is read-only |
| 0x00000002 | FILE_ATTRIBUTE_HIDDEN | Is hidden |
| 0x00000004 | FILE_ATTRIBUTE_SYSTEM | Is a system file or directory |
| 0x00000008 | | Is a volume label, which is not used by NTFS |
| 0x00000010 | FILE_ATTRIBUTE_DIRECTORY | Is a directory, which is not used by NTFS |
| 0x00000020 | FILE_ATTRIBUTE_ARCHIVE | Should be archived |
| 0x00000040 | FILE_ATTRIBUTE_DEVICE | Is a device, which is not used by NTFS |
| 0x00000080 | FILE_ATTRIBUTE_NORMAL | Is a normal file. Note that none of the other flags should be set |
| 0x00000100 | FILE_ATTRIBUTE_TEMPORARY | Is temporary |
| 0x00000200 | FILE_ATTRIBUTE_SPARSE_FILE | Is a sparse file |
| 0x00000400 | FILE_ATTRIBUTE_REPARSE_POINT | Is a reparse point or symbolic link |
| 0x00000800 | FILE_ATTRIBUTE_COMPRESSED | Is compressed |
| 0x00001000 | FILE_ATTRIBUTE_OFFLINE | Is offline. The data of the file is stored on an offline storage |
| 0x00002000 | FILE_ATTRIBUTE_NOT_CONTENT_INDEXED | Do not index content. The content of the file or directory should not be indexed by the indexing service |
| 0x00004000 | FILE_ATTRIBUTE_ENCRYPTED | Is encrypted |
| 0x00008000 | | Unknown (seen on Windows 95 FAT) |
| 0x00010000 | FILE_ATTRIBUTE_VIRTUAL | Is virtual |
The following flags are mainly used in the file name attribute and sparsely in the standard information attribute. It could be that they have a different meaning in both types of attributes or that the standard information flags are not updated. For now the latter is assumed.
| Value | Identifier | Description |
|---|---|---|
| 0x10000000 | | Unknown (Is directory or has $I30 index? Note that an $Extend directory without this flag has been observed) |
| 0x20000000 | | Is index view |
Corruption scenarios
Data stream with inconsistent data flags
An MFT entry contains an $ATTRIBUTE_LIST attribute that contains multiple $DATA attributes. The $DATA attributes define an LZNT1 compressed data stream, though only the first $DATA attribute (the one that starts at VCN 0) has the compressed data flag set.
Note that it is unclear whether this is a corruption scenario or not.
MFT entry: 220 information:
Is allocated : true
File reference : 220-59
Base record file reference : Not set (0)
Journal sequence number : 51876429013
Number of attributes : 5
Attribute: 1
Type : $STANDARD_INFORMATION (0x00000010)
Creation time : Jun 05, 2019 06:56:26.032730300 UTC
Modification time : Oct 05, 2019 06:56:04.150940700 UTC
Access time : Oct 05, 2019 06:56:04.150940700 UTC
Entry modification time : Oct 05, 2019 06:56:04.150940700 UTC
Owner identifier : 0
Security descriptor identifier : 5862
Update sequence number : 11553149976
File attribute flags : 0x00000820
Should be archived (FILE_ATTRIBUTE_ARCHIVE)
Is compressed (FILE_ATTRIBUTE_COMPRESSED)
Attribute: 2
Type : $ATTRIBUTE_LIST (0x00000020)
Attribute: 3
Type : $FILE_NAME (0x00000030)
Parent file reference : 33996-57
Creation time : Jun 05, 2019 06:56:26.032730300 UTC
Modification time : Oct 05, 2019 06:56:03.510061800 UTC
Access time : Oct 05, 2019 06:56:03.510061800 UTC
Entry modification time : Oct 05, 2019 06:56:03.510061800 UTC
File attribute flags : 0x00000020
Should be archived (FILE_ATTRIBUTE_ARCHIVE)
Namespace : POSIX (0)
Name : setupapi.dev.20191005_085603.log
Attribute: 4
Type : $DATA (0x00000080)
Data VCN range : 513 - 1103
Data flags : 0x0000
Attribute: 5
Type : $DATA (0x00000080)
Data VCN range : 0 - 512
Data size : 4487594 bytes
Data flags : 0x0001
Directory entry with outdated file reference
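The directory entry \ProgramData\McAfee\Common Framework\Task\5.ini contains file reference 51106-400, whereas the corresponding MFT entry 51106 has sequence number 496 and a $FILE_NAME attribute with the name 1.ini. Hence the file reference in the directory entry is outdated.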
File entry:
Path : \ProgramData\McAfee\Common Framework\Task\5.ini
File reference : 51106-400
Name : 5.ini
Parent file reference : 65804-10
Size : 723
Creation time : Sep 16, 2011 20:47:54.561041200 UTC
Modification time : Apr 07, 2012 21:07:02.684060000 UTC
Access time : Apr 07, 2012 21:07:02.652810200 UTC
Entry modification time : Apr 07, 2012 21:07:02.684060000 UTC
File attribute flags : 0x00002020
Should be archived (FILE_ATTRIBUTE_ARCHIVE)
Content should not be indexed (FILE_ATTRIBUTE_NOT_CONTENT_INDEXED)
The corresponding MFT entry:
MFT entry: 51106 information:
Is allocated : true
File reference : 51106-496
Base record file reference : Not set (0)
Journal sequence number : 0
Number of attributes : 3
Attribute: 1
Type : $STANDARD_INFORMATION (0x00000010)
Creation time : Sep 16, 2011 20:47:54.561041200 UTC
Modification time : Apr 07, 2012 21:07:02.684060000 UTC
Access time : Apr 07, 2012 21:07:02.652810200 UTC
Entry modification time : Apr 07, 2012 21:07:02.684060000 UTC
Owner identifier : 0
Security descriptor identifier : 1368
Update sequence number : 1947271600
File attribute flags : 0x00002020
Should be archived (FILE_ATTRIBUTE_ARCHIVE)
Content should not be indexed (FILE_ATTRIBUTE_NOT_CONTENT_INDEXED)
Attribute: 2
Type : $FILE_NAME (0x00000030)
Parent file reference : 65804-10
Creation time : Sep 16, 2011 20:47:54.561041200 UTC
Modification time : Apr 07, 2012 21:07:02.652810200 UTC
Access time : Apr 07, 2012 21:07:02.652810200 UTC
Entry modification time : Apr 07, 2012 21:07:02.652810200 UTC
File attribute flags : 0x00002020
Should be archived (FILE_ATTRIBUTE_ARCHIVE)
Content should not be indexed (FILE_ATTRIBUTE_NOT_CONTENT_INDEXED)
Namespace : DOS and Windows (3)
Name : 1.ini
Attribute: 3
Type : $DATA (0x00000080)
Data size : 723 bytes
Data flags : 0x0000
TODO: determine if $LogFile could be used to recover from this corruption scenario
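The mismatch can be detected by comparing the sequence number parts of the two file references. The following sketch assumes the usual NTFS file reference layout, where the lower 48 bits contain the MFT entry number and the upper 16 bits contain the sequence number; the function names are hypothetical.

```python
def split_file_reference(file_reference: int) -> tuple:
    """Split a 64-bit NTFS file reference into (MFT entry number, sequence number)."""
    return file_reference & 0xFFFF_FFFF_FFFF, file_reference >> 48


def make_file_reference(entry_number: int, sequence_number: int) -> int:
    """Combine an MFT entry number and a sequence number into a file reference."""
    return (sequence_number << 48) | entry_number


def is_outdated(directory_entry_reference: int, mft_entry_reference: int) -> bool:
    """True if both refer to the same MFT entry but the sequence numbers differ."""
    entry_a, sequence_a = split_file_reference(directory_entry_reference)
    entry_b, sequence_b = split_file_reference(mft_entry_reference)
    if entry_a != entry_b:
        raise ValueError("file references refer to different MFT entries")
    return sequence_a != sequence_b
```

For the example above, `is_outdated(make_file_reference(51106, 400), make_file_reference(51106, 496))` returns `True`.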
LZNT1 compressed block with data size of 0
It is unclear whether this is a corruption scenario or a data format edge case.
A compression unit (index 30) consisting of the following data runs:
reading data run: 60.
data run:
00000000: 11 01 01 ...
value sizes : 1, 1
number of cluster blocks : 1 (size: 4096)
cluster block number : 687143 (1) (offset: 0xa7c27000)
reading data run: 61.
data run:
00000000: 01 0f ..
value sizes : 1, 0
number of cluster blocks : 15 (size: 61440)
cluster block number : 0 (0) (offset: 0x00000000)
Is sparse
Contains the following data:
a7c27000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
...
a7c27ff0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
This relates to an empty LZNT1 compressed block.
compressed data offset : 0 (0x00000000)
compression chunk header : 0x0000
compressed chunk size : 1
signature value : 0
is compressed flag : 0
In 2 different NTFS implementations it was observed that the entire block is filled with 0-byte values.
TODO: verify behavior of Windows NTFS implementation.
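The chunk header fields shown above can be derived from the 16-bit little-endian chunk header. The bit layout in this sketch (lower 12 bits contain the compressed chunk size minus 1, bits 12 to 14 the signature, bit 15 the compressed flag) is an assumption that is consistent with the decoded values of 0x0000 above; the function name is illustrative.

```python
import struct


def parse_lznt1_chunk_header(data: bytes, offset: int = 0) -> dict:
    """Split a 16-bit little-endian LZNT1 chunk header into its fields."""
    (chunk_header,) = struct.unpack_from("<H", data, offset)
    return {
        # Lower 12 bits: compressed chunk size minus 1 (assumed to exclude
        # the 2-byte chunk header itself).
        "compressed_chunk_size": (chunk_header & 0x0FFF) + 1,
        # Bits 12 to 14: signature value.
        "signature_value": (chunk_header >> 12) & 0x07,
        # Bit 15: is compressed flag.
        "is_compressed": bool(chunk_header >> 15),
    }
```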
Truncated LZNT1 compressed block
It is unclear whether this is a corruption scenario or a data format edge case.
A compression unit (index 0) consisting of the following data runs:
reading data run: 0.
data run:
00000000: 31 08 48 d8 01 1.H..
value sizes : 1, 3
number of cluster blocks : 8 (size: 32768)
cluster block number : 120904 (120904) (offset: 0x1d848000)
reading data run: 1.
data run:
00000000: 01 08 ..
value sizes : 1, 0
number of cluster blocks : 8 (size: 32768)
cluster block number : 0 (0) (offset: 0x00000000)
Is sparse
Contains the following data:
1d848000 bd b7 50 44 46 50 00 01 00 01 00 40 e0 00 07 0b |..PDFP.....@....|
...
1d84c000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
1d84fff0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
This relates to an LZNT1 compressed block that appears to be truncated at offset 16384 (0x00004000).
compressed data offset : 16384 (0x00004000)
compression flag byte : 0x00
Different behavior was observed in 2 different NTFS implementations:
- one implementation fills the compressed block with the uncompressed data it could read and the rest with 0-byte values
- another implementation seems to provide the data that was already in its buffer
TODO: verify behavior of Windows NTFS implementation.
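The data runs in the dumps above can be decoded with a short loop. This sketch assumes the NTFS data run encoding where the lower 4 bits of the header byte contain the size of the number of cluster blocks value and the upper 4 bits the size of the cluster block number value, which is stored as a signed offset relative to the previous run; a size of 0 for the latter denotes a sparse run. The function name is hypothetical.

```python
def decode_data_runs(data: bytes) -> list:
    """Decode data runs into (cluster block number or None, number of cluster blocks)."""
    runs = []
    offset = 0
    cluster_block_number = 0
    while offset < len(data) and data[offset] != 0:  # 0 terminates the runs
        header = data[offset]
        length_size = header & 0x0F  # size of the number of cluster blocks value
        offset_size = header >> 4    # size of the cluster block number value
        offset += 1
        number_of_blocks = int.from_bytes(data[offset:offset + length_size], "little")
        offset += length_size
        if offset_size == 0:
            runs.append((None, number_of_blocks))  # sparse: no clusters allocated
            continue
        relative = int.from_bytes(data[offset:offset + offset_size], "little", signed=True)
        offset += offset_size
        cluster_block_number += relative  # relative to the previous non-sparse run
        runs.append((cluster_block_number, number_of_blocks))
    return runs
```

Decoding `31 08 48 d8 01` yields cluster block number 120904 for 8 cluster blocks, which with 4096-byte clusters corresponds to offset 0x1d848000, as shown in the dump.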
References
- How NTFS Works, by Microsoft
- Master File Table, by Microsoft
- NTFS Attribute Types, by Microsoft
- File Attribute Constants, by Microsoft
- Reparse Tags, by Microsoft
- NTFS documentation, by Richard Russon
- ATTRIBUTE_LIST_ENTRY structure, by Microsoft
- ATTRIBUTE_RECORD_HEADER structure, by Microsoft
- FILE_RECORD_SEGMENT_HEADER structure, by Microsoft
- MULTI_SECTOR_HEADER structure, by Microsoft
- REPARSE_DATA_BUFFER structure (ntifs.h), by Microsoft
- REPARSE_DATA_BUFFER_EX structure (ntifs.h), by Microsoft
- USN_RECORD_V2, by Microsoft
- Zone.Identifier Stream Name, by Microsoft
- the Internet Explorer URL security zone, by Microsoft
- ntfs_layout.h, by Anton Altaparmakov
Assorted formats
Property list (plist) format
The property list (plist) formats are used to store various kinds of data, for example configuration data. The format is known to be used stand-alone as well as embedded in other data formats.
Overview
Known plist formats are:
- ASCII plist format
- Binary plist format
- XML plist format
TODO: What about other plist formats like JSON?
Value types
| Type | Description |
|---|---|
| array | Collection of plist values without key |
| boolean | Boolean value |
| data | Binary data |
| date | Date and time value |
| dictionary | Collection of plist values with key |
| integer | Signed integer value |
| real | Floating-point value |
| string | String value |
ASCII plist format
TODO: complete section
Binary plist format
A binary plist file consists of:
- header
- object table
- offset table
- trailer
| Characteristics | Description |
|---|---|
| Byte order | big-endian |
| Date and time values | Number of seconds since Jan 1, 2001 00:00:00 UTC |
| Character strings | UTF-16 big-endian |
Binary plist header
The binary plist header (CFBinaryPlistHeader) is 8 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 6 | "bplist" | Signature |
| 6 | 2 | | Format version |
Format versions
| Version | Description |
|---|---|
| "00" | Supported as of Tiger |
| "01" | Supported as of Leopard |
| "0x" | Supported as of Snow Leopard, where x is any character |
Object table
The object table consists of:
- zero or more objects
Objects are of variable size and consist of:
- an object marker byte
- (optional) object data
Object marker byte
| Value | Identifier | Description |
|---|---|---|
| 0x00 | kCFBinaryPlistMarkerNull | Empty value (NULL) |
| 0x08 | kCFBinaryPlistMarkerFalse | Boolean False |
| 0x09 | kCFBinaryPlistMarkerTrue | Boolean True |
| 0x0f | kCFBinaryPlistMarkerFill | Unknown (Fill byte?) |
| 0x1# | kCFBinaryPlistMarkerInt | Integer, where 2^# is the number of bytes |
| 0x2# | kCFBinaryPlistMarkerReal | Floating point, where 2^# is the number of bytes |
| 0x33 | kCFBinaryPlistMarkerDate | Date and time value, which is stored as a 64-bit floating-point value that contains the number of seconds since Jan 1, 2001 00:00:00 UTC |
| 0x4# | kCFBinaryPlistMarkerData | Binary data, where # is the number of bytes. If # is 15 then the object marker byte is followed by an integer object that contains the size of the data |
| 0x5# | kCFBinaryPlistMarkerASCIIString | ASCII string, where # is the number of characters. If # is 15 then the object marker byte is followed by an integer object that contains the number of characters in the string. The string is stored in ASCII (with codepage?) without an end-of-string marker |
| 0x6# | kCFBinaryPlistMarkerUnicode16String | Unicode string, where # is the number of characters. If # is 15 then the object marker byte is followed by an integer object that contains the number of characters in the string. The string is stored in UTF-16 big-endian without an end-of-string marker |
| 0x7# | Unused | |
| 0x8# | kCFBinaryPlistMarkerUID | UID, where # + 1 is the number of bytes |
| 0x9# | Unused | |
| 0xa# | kCFBinaryPlistMarkerArray | Array of objects, where # is the number of elements. If # is 15 then the object marker byte is followed by an integer object that contains the number of elements in the array |
| 0xb# | Unused | |
| 0xc# | kCFBinaryPlistMarkerSet | Set of objects, where # is the number of elements. If # is 15 then the object marker byte is followed by an integer object that contains the number of elements in the set |
| 0xd# | kCFBinaryPlistMarkerDict | Dictionary of key value pairs, where # is the number of key value pairs. If # is 15 then the object marker byte is followed by an integer object that contains the number of key value pairs in the dictionary |
| 0xe# | Unused | |
| 0xf# | Unused |
Array object
The array object consists of:
- array object marker with number of elements
- array of object references that identify the element objects.
- the element object data
The byte size of the object reference is defined in the trailer. An object reference of 0 refers to the first object in the (object) offset table.
Set object
The set object consists of:
- set object marker with number of elements
- array of object references that identify the element objects.
- the element object data
The byte size of the object reference is defined in the trailer. An object reference of 0 refers to the first object in the (object) offset table.
Dictionary object
The dictionary object consists of:
- dictionary object marker with number of key and value pairs
- array of key references that identify key objects.
- array of object references that identify the value objects.
- the key/value object data
The byte size of the key and object reference is defined in the trailer. A key or object reference of 0 refers to the first object in the (object) offset table.
(Object) offset table
The offset table consists of an array of offsets. The trailer defines:
- The location of the offset table
- The offset byte size
- The number of offsets in the table
The offset values are relative from the start of the file.
Binary plist trailer
The binary plist trailer (CFBinaryPlistTrailer) is 32 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0 | 5 x 1 | 0 | Unknown (0-byte values) |
| 5 | 1 | 0 | Unknown (Sort version) |
| 6 | 1 | | Offset byte size |
| 7 | 1 | | Key and object reference byte size |
| 8 | 8 | | Number of objects |
| 16 | 8 | | Root (or top-level) object |
| 24 | 8 | | Offset table offset, where the offset is relative to the start of the file |
XML plist format
An XML plist file consists of:
- optional XML declaration
- optional Document Type Definition (DTD)
- plist root XML element
- key-value pair XML elements
For example:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist SYSTEM "file://localhost/System/Library/DTDs/PropertyList.dtd">
<plist version="1.0">
...
</plist>
Zlib compressed data
Zlib compression is commonly used in file formats. The zlib compressed data format, as defined in RFC1950, allows for multiple techniques but only the Deflate compression method, a variation of LZ77, is used.
Overview
Zlib compressed data consists of:
- data header
- compressed data
- Adler-32 checksum of the uncompressed data
Characteristics
| Characteristics | Description |
|---|---|
| Byte order | big-endian |
Data header
The data header is 2 or 6 bytes in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| The bit values are stored as 8-bit values | |||
| 0.0 | 4 bits | | Compression method |
| 0.4 | 4 bits | | Compression information |
| Flags | |||
| 1.0 | 5 bits | | Check bits |
| 1.5 | 1 bit | | Preset dictionary flag |
| 1.6 | 2 bits | | Compression level. The compression level is used mainly for re-compression |
| If the preset dictionary flag is set | |||
| 2 | 4 | | Preset dictionary identifier, which contains the Adler-32 of the preset dictionary |
| Common | |||
| ... | ... | | Compressed data |
| ... | 4 | | Checksum, which contains an Adler-32 of the uncompressed data |
The check bits value must be such that when the first 2 bytes are represented as a 16-bit unsigned integer in big-endian byte order the value is a multiple of 31, such that:
((first * 256) + second) % 31 = 0
Compression method
| Value | Identifier | Description |
|---|---|---|
| 8 | | Deflate (RFC1951), with a maximum window size of 32 KiB |
| 15 | | Reserved for additional header data |
Note that RFC1950 only defines 8 as a valid compression method.
Compression information
The value of the compression information is dependent on the compression method.
Compression information - compression method 8 (Deflate)
For compression method 8 (Deflate) the compression information contains the base-2 logarithm of the LZ77 window size minus 8.
| Offset | Size | Value | Description |
|---|---|---|---|
| 0.0 | 4 bits | | Window size, which contains the base-2 logarithm of the window size minus 8, with a maximum value of 7 (32 KiB) |
To determine the corresponding window size:
window size = 1 << (compression information + 8)
E.g. a compression information value of 7 indicates a window size of 32768 bytes (1 << 15). Values larger than 7 are not allowed according to RFC1950 and thus the maximum window size is 32768 bytes.
Compression level
| Value | Identifier | Description |
|---|---|---|
| 0 | | Fastest |
| 1 | | Fast |
| 2 | | Default |
| 3 | | Slowest, maximum compression |
Compressed data
Deflate compressed data
The deflate compressed data consists of one or more deflate compressed blocks. Each block consists of:
- block header
- block data
Note that a block can reference uncompressed data that is stored in a previous block.
Block header
The block header is 3 bits in size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0.0 | 1 bit | | Last block (in stream) marker, where 1 represents the last block and 0 otherwise |
| 0.1 | 2 bits | | Block type |
Block types
| Value | Identifier | Description |
|---|---|---|
| 0 | | Uncompressed (or stored) block |
| 1 | | Fixed Huffman compressed block |
| 2 | | Dynamic Huffman compressed block |
| 3 | | Reserved (not used) |
Uncompressed block data
The uncompressed block data is of variable size and consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0.3 | 5 bits | | Unused bits up to the next byte boundary |
| 1 | 2 | | Uncompressed data size, which is stored in little-endian |
| 3 | 2 | | Copy of uncompressed data size, which contains the ones' complement of the uncompressed data size and is stored in little-endian |
| 5 | ... | | Uncompressed data |
The uncompressed data size can range between 0 and 65535 bytes.
Huffman compressed block data
The Huffman compressed block data is of variable size and consists of:
- Optional dynamic Huffman table
- Encoded bit-stream
- End-of-stream (or end-of-block or end-of-data) marker
Dynamic Huffman table
The dynamic Huffman table consists of:
| Offset | Size | Value | Description |
|---|---|---|---|
| 0.3 | 5 bits | | Number of literal codes, which is value + 257. The number of literal codes must not exceed 286 |
| 1.0 | 5 bits | | Number of distance codes, which is value + 1. The number of distance codes must not exceed 30 |
| 1.5 | 4 bits | | The number of Huffman codes for the code sizes, which is value + 4 |
| 2.1 | ... | | The code sizes |
| ... | ... | | Huffman encoded stream of the Huffman codes for the literals |
| ... | ... | | Huffman encoded stream of the Huffman codes for the distances |
A single code size value is 3 bits in size. A value of 0 means the code size is not used in the Huffman encoding of the literal and distance codes.
The code size values are stored in the following sequence:
16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
The first value applies to a code size of 16, the second to 17, etc. Code sizes that are not stored default to 0.
The code size values are used to construct the code sizes Huffman table. This must be a complete Huffman table, which is used to decode the literal and distance codes. The corresponding code sizes Huffman encoding is defined as:
| Value | Identifier | Description |
|---|---|---|
| 0 - 15 | | Represents a code size of 0 - 15 |
| 16 | | Copy the previous code size 3 - 6 times. The next 2 bits indicate repeat length (0 = 3, ... , 3 = 6), e.g. codes 8, 16 (+2 bits 11), 16 (+2 bits 10) will expand to 12 code lengths of 8 (1 + 6 + 5) |
| 17 | | Repeat a code length of 0 for 3 - 10 times (3 bits of length) |
| 18 | | Repeat a code length of 0 for 11 - 138 times (7 bits of length) |
Both the literal and distance Huffman codes are stored Huffman encoded using the code sizes Huffman table. Code sizes that are not stored default to 0. The code size for the literal code 256 (end-of-block) should be set and thus not 0.
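The Huffman tables are canonical: only the code sizes are stored, and the codes themselves follow from them. This sketch follows the construction described in RFC 1951 section 3.2.2; the function name is illustrative and it returns each code as a bit string for readability.

```python
def canonical_huffman_codes(code_sizes: list) -> list:
    """Assign canonical Huffman codes from code sizes (RFC 1951, section 3.2.2).

    Returns one bit string per symbol, or None for symbols with code size 0.
    """
    max_size = max(code_sizes, default=0)
    # Count the number of codes per code size; size 0 means "not used".
    size_counts = [0] * (max_size + 1)
    for size in code_sizes:
        if size > 0:
            size_counts[size] += 1

    # Determine the smallest code for each code size.
    next_code = [0] * (max_size + 1)
    code = 0
    for size in range(1, max_size + 1):
        code = (code + size_counts[size - 1]) << 1
        next_code[size] = code

    # Assign consecutive codes to symbols of the same size, in symbol order.
    codes = []
    for size in code_sizes:
        if size == 0:
            codes.append(None)
            continue
        codes.append(format(next_code[size], f"0{size}b"))
        next_code[size] += 1
    return codes
```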
Encoded bit-stream
The encoded bit-stream is stored in 8-bit integers, where bit values are stored back-to-front (least-significant bit first). So the 3 least-significant bits (LSB) of the first byte represent a 3-bit value at the start of the bit-stream. Note that the LSB of the 3-bit value is the LSB of the byte value.
Deflate uses a Huffman tree of 288 Huffman codes (or symbols) where the values:
- 0 - 255; represent the literal byte values: 0 - 255
- 256: represents the end of the (compressed) block (end-of-block)
- 257 - 285 (combined with additional bits): represent the size (or match length) of a (size, offset) tuple of 3 - 258 bytes
- 286, 287: are not used (reserved) and their use is considered illegal although the values are still part of the tree
This document refers to this Huffman tree as the literals Huffman tree.
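For blocks with fixed Huffman codes, RFC 1951 (section 3.2.6) assigns a code size to all 288 symbols of the literals Huffman tree, including the reserved 286 and 287, which illustrates why they are still part of the tree. A sketch that sets these code sizes, using the hypothetical name `set_fixed_code_sizes`:

```c
#include <stdint.h>

#define NUMBER_OF_LITERAL_SYMBOLS 288
#define NUMBER_OF_DISTANCE_SYMBOLS 32

/* Sets the code sizes of the fixed Huffman trees as defined in RFC 1951 */
void set_fixed_code_sizes(uint8_t literal_code_sizes[NUMBER_OF_LITERAL_SYMBOLS],
                          uint8_t distance_code_sizes[NUMBER_OF_DISTANCE_SYMBOLS])
{
    int symbol = 0;

    for (symbol = 0; symbol < NUMBER_OF_LITERAL_SYMBOLS; symbol++) {
        if (symbol < 144) {
            literal_code_sizes[symbol] = 8; /* literal byte values 0 - 143 */
        } else if (symbol < 256) {
            literal_code_sizes[symbol] = 9; /* literal byte values 144 - 255 */
        } else if (symbol < 280) {
            literal_code_sizes[symbol] = 7; /* end-of-block and match lengths up to 279 */
        } else {
            literal_code_sizes[symbol] = 8; /* match lengths 280 - 285 and reserved 286, 287 */
        }
    }
    /* All 32 distance symbols, including the reserved 30 and 31, are 5 bits */
    for (symbol = 0; symbol < NUMBER_OF_DISTANCE_SYMBOLS; symbol++) {
        distance_code_sizes[symbol] = 5;
    }
}
```

These code sizes form a complete Huffman table: 144 × 2^-8 + 112 × 2^-9 + 24 × 2^-7 + 8 × 2^-8 = 1.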
The bits in the encoded bit-stream correspond to values in the literals Huffman tree. If a symbol is found that represents a match length code (the size of a (size, offset) tuple), it can be followed by additional (or extra) bits that store the size. These are followed by a distance code, which is Huffman encoded using the distances Huffman tree.
The distances Huffman tree contains space for 32 symbols, see section Distance codes. A distance code can also be followed by additional (or extra) bits that store the distance.
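Because Huffman codes are stored MSB first, a symbol can be decoded one bit at a time. The following sketch keeps the number of codes per code size and a list of symbols sorted by code, similar to the approach of puff.c in the zlib source distribution. It does not check that the table is complete. `huffman_table_t`, `bit_reader_t`, `huffman_table_build` and `huffman_decode` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define MAXIMUM_CODE_SIZE 15
#define MAXIMUM_NUMBER_OF_SYMBOLS 288

/* Hypothetical decoding table: number of codes per code size and symbols sorted by code */
typedef struct {
    uint16_t size_counts[MAXIMUM_CODE_SIZE + 1];
    uint16_t symbols[MAXIMUM_NUMBER_OF_SYMBOLS];
} huffman_table_t;

/* Hypothetical bit reader that returns one bit at a time, LSB of each byte first */
typedef struct {
    const uint8_t *data;
    size_t data_size;
    size_t bit_offset;
} bit_reader_t;

static int read_bit(bit_reader_t *reader)
{
    int bit = 0;

    if (reader->bit_offset >= reader->data_size * 8) {
        return -1;
    }
    bit = (reader->data[reader->bit_offset / 8] >> (reader->bit_offset % 8)) & 1;
    reader->bit_offset++;
    return bit;
}

/* Builds the table; symbols are sorted by code size, then by symbol value */
int huffman_table_build(huffman_table_t *table, const uint8_t *code_sizes, int number_of_symbols)
{
    uint16_t offsets[MAXIMUM_CODE_SIZE + 2] = { 0 };
    int code_size = 0;
    int symbol = 0;

    if (number_of_symbols > MAXIMUM_NUMBER_OF_SYMBOLS) {
        return -1;
    }
    for (code_size = 0; code_size <= MAXIMUM_CODE_SIZE; code_size++) {
        table->size_counts[code_size] = 0;
    }
    for (symbol = 0; symbol < number_of_symbols; symbol++) {
        if (code_sizes[symbol] > MAXIMUM_CODE_SIZE) {
            return -1;
        }
        table->size_counts[code_sizes[symbol]]++;
    }
    /* offsets[code_size] is the index of the first symbol with that code size */
    for (code_size = 1; code_size <= MAXIMUM_CODE_SIZE; code_size++) {
        offsets[code_size + 1] = offsets[code_size] + table->size_counts[code_size];
    }
    for (symbol = 0; symbol < number_of_symbols; symbol++) {
        if (code_sizes[symbol] != 0) {
            table->symbols[offsets[code_sizes[symbol]]++] = (uint16_t) symbol;
        }
    }
    return 0;
}

/* Decodes one symbol, returns the symbol or -1 on error */
int huffman_decode(bit_reader_t *reader, const huffman_table_t *table)
{
    int code = 0;         /* code bits read so far, first bit read is the MSB */
    int first = 0;        /* first code of the current code size */
    int symbol_index = 0; /* index of the first symbol of the current code size */
    int code_size = 0;

    for (code_size = 1; code_size <= MAXIMUM_CODE_SIZE; code_size++) {
        int bit = read_bit(reader);
        int count = table->size_counts[code_size];

        if (bit < 0) {
            return -1;
        }
        code |= bit;
        if (code - first < count) {
            return table->symbols[symbol_index + (code - first)];
        }
        symbol_index += count;
        first = (first + count) << 1;
        code <<= 1;
    }
    return -1;
}
```

With the code sizes 3, 3, 3, 3, 3, 2, 4, 4 and the byte 0x08, the first bits 0, 0 decode to symbol 5 and the following bits 0, 1, 0 decode to symbol 0.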
Literal codes
The literal codes consist of:
| Value | Additional bits | Description |
|---|---|---|
| 0x00 - 0xff | - | Literal byte values |
| 0x100 | - | End-of-block marker |
| 0x101 | 0 | Size of 3 |
| 0x102 | 0 | Size of 4 |
| 0x103 | 0 | Size of 5 |
| 0x104 | 0 | Size of 6 |
| 0x105 | 0 | Size of 7 |
| 0x106 | 0 | Size of 8 |
| 0x107 | 0 | Size of 9 |
| 0x108 | 0 | Size of 10 |
| 0x109 | 1 | Size of 11 to 12 |
| 0x10a | 1 | Size of 13 to 14 |
| 0x10b | 1 | Size of 15 to 16 |
| 0x10c | 1 | Size of 17 to 18 |
| 0x10d | 2 | Size of 19 to 22 |
| 0x10e | 2 | Size of 23 to 26 |
| 0x10f | 2 | Size of 27 to 30 |
| 0x110 | 2 | Size of 31 to 34 |
| 0x111 | 3 | Size of 35 to 42 |
| 0x112 | 3 | Size of 43 to 50 |
| 0x113 | 3 | Size of 51 to 58 |
| 0x114 | 3 | Size of 59 to 66 |
| 0x115 | 4 | Size of 67 to 82 |
| 0x116 | 4 | Size of 83 to 98 |
| 0x117 | 4 | Size of 99 to 114 |
| 0x118 | 4 | Size of 115 to 130 |
| 0x119 | 5 | Size of 131 to 162 |
| 0x11a | 5 | Size of 163 to 194 |
| 0x11b | 5 | Size of 195 to 226 |
| 0x11c | 5 | Size of 227 to 257 |
| 0x11d | 0 | Size of 258 |
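The literal codes table can be expressed as arrays of base sizes and numbers of additional bits, as in the following sketch. `get_size_additional_bits` and `get_size` are hypothetical names.

```c
#include <stdint.h>

/* Base sizes and numbers of additional bits of the match length codes 0x101 - 0x11d */
static const uint16_t size_bases[29] = {
    3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
    35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258 };
static const uint8_t size_additional_bits[29] = {
    0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
    3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0 };

/* Returns the number of additional bits of a match length code or -1 if invalid */
int get_size_additional_bits(uint16_t length_code)
{
    if (length_code < 0x101 || length_code > 0x11d) {
        return -1;
    }
    return size_additional_bits[length_code - 0x101];
}

/* Returns the size for a match length code and the value of its additional bits,
 * or -1 if invalid. The size is the base size plus the additional bits value.
 */
int get_size(uint16_t length_code, uint16_t additional_value)
{
    int number_of_bits = get_size_additional_bits(length_code);

    if (number_of_bits < 0 || additional_value >= (1U << number_of_bits)) {
        return -1;
    }
    return size_bases[length_code - 0x101] + additional_value;
}
```

For example, `get_size(0x10d, 2)` returns 19 + 2 = 21. Note that 5 additional bits allow code 0x11c to represent a size of 258 (value 31), which lies outside the range listed in the table; this sketch does not reject it.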
Distance codes
The distance codes consist of:
| Value | Additional bits | Description |
|---|---|---|
| 0 | 0 | Distance of 1 |
| 1 | 0 | Distance of 2 |
| 2 | 0 | Distance of 3 |
| 3 | 0 | Distance of 4 |
| 4 | 1 | Distance of 5 - 6 |
| 5 | 1 | Distance of 7 - 8 |
| 6 | 2 | Distance of 9 - 12 |
| 7 | 2 | Distance of 13 - 16 |
| 8 | 3 | Distance of 17 - 24 |
| 9 | 3 | Distance of 25 - 32 |
| 10 | 4 | Distance of 33 - 48 |
| 11 | 4 | Distance of 49 - 64 |
| 12 | 5 | Distance of 65 - 96 |
| 13 | 5 | Distance of 97 - 128 |
| 14 | 6 | Distance of 129 - 192 |
| 15 | 6 | Distance of 193 - 256 |
| 16 | 7 | Distance of 257 - 384 |
| 17 | 7 | Distance of 385 - 512 |
| 18 | 8 | Distance of 513 - 768 |
| 19 | 8 | Distance of 769 - 1024 |
| 20 | 9 | Distance of 1025 - 1536 |
| 21 | 9 | Distance of 1537 - 2048 |
| 22 | 10 | Distance of 2049 - 3072 |
| 23 | 10 | Distance of 3073 - 4096 |
| 24 | 11 | Distance of 4097 - 6144 |
| 25 | 11 | Distance of 6145 - 8192 |
| 26 | 12 | Distance of 8193 - 12288 |
| 27 | 12 | Distance of 12289 - 16384 |
| 28 | 13 | Distance of 16385 - 24576 |
| 29 | 13 | Distance of 24577 - 32768 |
| 30 - 31 | - | Not used (reserved) and illegal, but still part of the tree |
TODO: complete this section
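The distance codes table follows a regular pattern: from code 4 onwards, each pair of codes has the same number of additional bits, (code / 2) - 1, and a base distance of ((2 + (code & 1)) << number of additional bits) + 1. A sketch that computes the distance from this pattern instead of a lookup table, using the hypothetical names `get_distance_additional_bits` and `get_distance`:

```c
#include <stdint.h>

/* Returns the number of additional bits of a distance code or -1 if invalid */
int get_distance_additional_bits(uint8_t distance_code)
{
    if (distance_code > 29) {
        return -1; /* 30 and 31 are reserved */
    }
    if (distance_code < 4) {
        return 0;
    }
    return (distance_code / 2) - 1;
}

/* Returns the distance for a distance code and the value of its additional bits,
 * or -1 if invalid.
 */
int32_t get_distance(uint8_t distance_code, uint32_t additional_value)
{
    int number_of_bits = get_distance_additional_bits(distance_code);
    uint32_t base = 0;

    if (number_of_bits < 0 || additional_value >= (UINT32_C(1) << number_of_bits)) {
        return -1;
    }
    if (distance_code < 4) {
        base = (uint32_t) distance_code + 1;
    } else {
        /* Even codes start at 2 << bits, odd codes at 3 << bits, both plus 1 */
        base = ((2U + (distance_code & 1U)) << number_of_bits) + 1;
    }
    return (int32_t) (base + additional_value);
}
```

For example, distance code 29 with the 13 additional bits value 8191 gives 24577 + 8191 = 32768, the maximum distance.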
Additional bits
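The additional bits are stored as an integer starting with the least-significant bit (LSB first), like other values in the bit-stream that are not Huffman codes. They directly follow the Huffman code they belong to and their value is added to the base value of that code. For example, match length code 0x10d (size of 19 to 22) followed by the 2 additional bits value 2 represents a size of 19 + 2 = 21, and distance code 4 (distance of 5 - 6) followed by the 1 additional bit value 1 represents a distance of 6.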
| Value | Additional bits | Description |
|---|---|---|
| 0 | 0 | Offset of 1 |
| 1 | 0 | Offset of 2 |
| 2 | 0 | Offset of 3 |
| 3 | 0 | Offset of 4 |
TODO: complete this section
Decompression
The decompression in pseudo code:
do
{
	read block_header from input stream
	if( block_header.type == UNCOMPRESSED )
	{
		align with next byte
		read and check block_header.size and block_header.size_copy
		copy block_header.size bytes of data from input stream to output stream
	}
	else
	{
		if( block_header.type == HUFFMAN_FIXED )
		{
			initialize the fixed Huffman trees
		}
		else if( block_header.type == HUFFMAN_DYNAMIC )
		{
			read the dynamic Huffman trees (see above)
		}
		else
		{
			error: reserved block type
		}
		loop (until end-of-block code recognized)
		{
			decode literal/length value from input stream using the literals Huffman tree
			if( value < 256 )
			{
				copy value (literal byte) to output stream
			}
			else if( value == 256 )
			{
				break from loop (end-of-block)
			}
			else if( value <= 285 )
			{
				read additional bits and determine size (see section Literal codes)
				decode distance code from input stream using the distances Huffman tree
				read additional bits and determine distance (see section Distance codes)
				move backwards distance bytes in the output stream,
				and copy size bytes from this position to the output stream
			}
			else
			{
				error: invalid value (286 or 287)
			}
		}
	}
}
while( block_header.last_block_flag == 0 );
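The last step of the match handling, moving backwards distance bytes and copying size bytes, overlaps the bytes being written when the size is larger than the distance. A minimal sketch, assuming the whole output is kept in a single buffer (`copy_match` is a hypothetical name):

```c
#include <stddef.h>
#include <stdint.h>

/* Appends size bytes copied from distance bytes back in the output buffer.
 * output_size is the number of bytes already written.
 * Returns the new output size or 0 on error.
 */
size_t copy_match(uint8_t *output, size_t output_size, size_t maximum_output_size,
                  size_t distance, size_t size)
{
    size_t byte_index = 0;

    if (output_size > maximum_output_size || distance == 0 || distance > output_size
     || size > maximum_output_size - output_size) {
        return 0;
    }
    /* Copy byte by byte: when size > distance the source overlaps the bytes
     * being written, which repeats the last distance bytes */
    for (byte_index = 0; byte_index < size; byte_index++) {
        output[output_size + byte_index] = output[output_size + byte_index - distance];
    }
    return output_size + size;
}
```

For example, copying 5 bytes from a distance of 2 after the output "ab" results in "abababa". This is why the copy is done byte by byte: `memcpy` is undefined for overlapping buffers and `memmove` would not produce the repeating pattern.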
Adler-32 checksum
Zlib provides a highly optimized version of the algorithm provided below.
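/* Calculates the Adler-32 checksum of a buffer.
 * previous_key contains the checksum of the preceding data, use 1 for the
 * start of the data.
 */
uint32_t adler32(
          const uint8_t *buffer,
          size_t buffer_size,
          uint32_t previous_key )
{
	size_t buffer_iterator = 0;
	size_t block_size      = 0;
	uint32_t lower_word    = previous_key & 0xffff;
	uint32_t upper_word    = ( previous_key >> 16 ) & 0xffff;

	while( buffer_iterator < buffer_size )
	{
		/* Sum at most 5552 (0x15b0) bytes at a time, which is the largest
		 * number of bytes for which upper_word cannot overflow 32 bits
		 */
		block_size = buffer_size - buffer_iterator;

		if( block_size > 0x15b0 )
		{
			block_size = 0x15b0;
		}
		while( block_size > 0 )
		{
			lower_word += buffer[ buffer_iterator++ ];
			upper_word += lower_word;

			block_size--;
		}
		/* 0xfff1 (65521) is the largest prime smaller than 65536 */
		lower_word %= 0xfff1;
		upper_word %= 0xfff1;
	}
	return( ( upper_word << 16 ) | lower_word );
}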