Introduction

Keramics provides read-only access to a collection of data formats.

This document is intended as a working document of specifications of data formats used by the Keramics project. These specifications are based on available documentation and analysis of data samples.

Note that these might differ from authoritative format specifications and are works in progress.

Storage media image formats

A storage media image format is used to store data from storage media devices such as a hard disk, a floppy or optical disk like CD-ROM or DVD.

Formats

Expert Witness Compression Format (EWF)
Expert Witness Compression Format version 2 (EWF2)
Mac OS sparse bundle
Mac OS sparse image
Parallels Disk Image (PDI)
QEMU Copy-On-Write (QCOW)
Universal Disk Image Format (UDIF)
Virtual Hard Disk (VHD)
Virtual Hard Disk version 2 (VHDX)
VMware Virtual Disk Format (VMDK)

Expert Witness Compression Format (EWF)

EWF is short for Expert Witness Compression Format. It is a file type used to store storage media images for digital forensic purposes. It is widely used in the field of computer forensics in proprietary tooling like EnCase and FTK. The original specification of the format was provided by ASR Data for the SMART application.

The EWF format was succeeded by the Expert Witness Compression Format version 2 in EnCase 7 (EWF2-Ex01 and EWF2-Lx01). EnCase 7 also uses a different version of EWF-L01 than its predecessors.

Overview

The Expert Witness Compression Format (EWF) is used to store:

storage media images, such as hard disks, USB sticks, optical disks
individual volumes or partitions
“physical” RAM and process memory

EWF can store data compressed or uncompressed, as a single image in one or more segment files. Each segment file consists of a standard header, followed by multiple sections. A single section cannot span multiple files. Sections are arranged back-to-back.

Terminology

In this document, EWF refers to the original specification by ASR Data. The newer formats, like that of EnCase, are derived from the original specification and will be referred to as EWF-E01, after the default file extension. The Logical File Evidence (LVF) format introduced in EnCase 5, which is also stored in the EWF format, will be referred to as EWF-L01. The SMART format is treated separately, to allow for discussion where the implementation differs from the specification by ASR Data, and will be referred to as EWF-S01, after the default file extension.

All offsets are relative to the beginning of an individual section, unless otherwise noted. EnCase limits the size of a segment file to 2000 MiB. This has to do with the size of the offset of the chunk of media data. This is a 32-bit value where the most significant bit (MSB) is used as a compression flag. Therefore the remaining 31 bits of the offset can address about 2048 MiB. In EnCase 6.7 a base offset was added to the table to allow for segment files greater than 2048 MiB.
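The split between the compression flag and the 31-bit offset can be sketched as follows. This is a minimal illustration and not the Keramics implementation; the function name decode_chunk_offset and its base_offset parameter are assumptions made for this example.

```python
import struct

COMPRESSION_FLAG = 0x80000000  # most significant bit of the 32-bit value
OFFSET_MASK = 0x7FFFFFFF       # the remaining 31 bits


def decode_chunk_offset(data, base_offset=0):
    """Splits a 4-byte little-endian value into (is_compressed, offset).

    base_offset is a hypothetical parameter that stands in for the base
    offset added in EnCase 6.7.
    """
    (value,) = struct.unpack("<I", data)
    is_compressed = bool(value & COMPRESSION_FLAG)
    return is_compressed, base_offset + (value & OFFSET_MASK)


# The 31-bit offset covers exactly 2048 MiB of address space.
assert OFFSET_MASK + 1 == 2048 * 1024 * 1024
```

For example, the bytes 0x4c 0x00 0x00 0x80 would decode to a compressed chunk at offset 76.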

The size of a chunk is defined as the sector size (512 bytes by default) multiplied by the block size, which is the number of sectors per chunk (64 sectors by default). The data within the EWF format is stored in little-endian byte order. The terms block and chunk are used interchangeably.

Segment file

EWF stores data in one or more segment files (or segments). Each segment file consists of:

A file header.
One or more sections.

File header

Each segment file starts with a file header.

EWF defines that the file header consists of 2 parts, namely:

a signature part
fields part

EWF, EWF-E01 and SMART (EWF-S01)

The file header, used by both the EWF-E01 and SMART (EWF-S01) formats, is 13 bytes in size and consists of:

Offset	Size	Value	Description
0	8	"EVF\x09\x0d\x0a\xff\x00"	Signature
8	1	0x01	Start of fields
9	2		Segment number, which must be 1 or higher
11	2	0x0000	End of fields

The segment number contains a number which refers to the number of the segment file, starting with 1 for the first file.

Note this means there could only be a maximum of 65535 (0xffff) files, if it is an unsigned value.

EWF-L01

The file header, used by the EWF-L01 format, is 13 bytes in size and consists of:

Offset	Size	Value	Description
0	8	"LVF\x09\x0d\x0a\xff\x00"	Signature
8	1	0x01	Start of fields
9	2		Segment number, which must be 1 or higher
11	2	0x0000	End of fields

The segment number contains a number which refers to the number of the segment file, starting with 1 for the first file.

Note this means there could only be a maximum of 65535 (0xffff) files, if it is an unsigned value.
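Both file header layouts above can be read with the same routine. The following is a sketch under stated assumptions: the function name parse_file_header is made up for this example, and the segment number is read as an unsigned 16-bit little-endian value, which the format does not confirm.

```python
import struct

FILE_HEADER_SIZE = 13
SIGNATURES = {
    b"EVF\x09\x0d\x0a\xff\x00": "EWF-E01 or EWF-S01",
    b"LVF\x09\x0d\x0a\xff\x00": "EWF-L01",
}


def parse_file_header(data):
    """Returns (format name, segment number) for a 13-byte file header."""
    if len(data) < FILE_HEADER_SIZE:
        raise ValueError("file header requires 13 bytes")
    # "<" selects little-endian without padding: 8 + 1 + 2 + 2 = 13 bytes.
    signature, fields_start, segment_number, fields_end = struct.unpack(
        "<8sBHH", data[:FILE_HEADER_SIZE])
    if signature not in SIGNATURES:
        raise ValueError("unsupported signature")
    if fields_start != 0x01 or fields_end != 0x0000:
        raise ValueError("unsupported start or end of fields")
    if segment_number < 1:
        raise ValueError("segment number must be 1 or higher")
    return SIGNATURES[signature], segment_number
```

The signature alone cannot tell EWF-E01 and EWF-S01 apart, which is why the sketch reports them together.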

Segment file extensions

The SMART (EWF-S01) and the EWF-E01 formats use a different naming convention for the segment files.

SMART (EWF-S01)

The SMART (EWF-S01) extension naming has two distinct parts.

The first segment file has the extension ‘.s01’.
- The next segment file has the extension ‘.s02’.
- This will continue up to ‘.s99’.
After which the next segment file has the extension ‘.saa’.
- The next segment file has the extension ‘.sab’.
- This will continue up to ‘.saz’.
- The next segment file has the extension ‘.sba’.
- This will continue up to ‘.szz’.
- The next segment file has the extension ‘.taa’.
- This will continue up to ‘.zzz’.
- Not confirmed, but other sources report it will even continue with the extension ‘.{aa’.

Keramics supports extensions up to .zzz.

EWF-E01

The EWF-E01 extension naming has two distinct parts.

The first segment file has the extension ‘.E01’.
- The next segment file has the extension ‘.E02’.
- This will continue up to ‘.E99’.
After which the next segment file has the extension ‘.EAA’.
- The next segment file has the extension ‘.EAB’.
- This will continue up to ‘.EAZ’.
- The next segment file has the extension ‘.EBA’.
- This will continue up to ‘.EZZ’.
- The next segment file has the extension ‘.FAA’.
- This will continue up to ‘.ZZZ’.
- Not confirmed, but other sources report it will even continue with the extension ‘.[AA’.

Keramics supports extensions up to .ZZZ.

EWF-L01

The EWF-L01 extension naming has two distinct parts.

The first segment file has the extension ‘.L01’.
- The next segment file has the extension ‘.L02’.
- This will continue up to ‘.L99’.
After which the next segment file has the extension ‘.LAA’.
- The next segment file has the extension ‘.LAB’.
- This will continue up to ‘.LAZ’.
- The next segment file has the extension ‘.LBA’.
- This will continue up to ‘.LZZ’.
- The next segment file has the extension ‘.MAA’.
- This will continue up to ‘.ZZZ’.
- Not confirmed, but other sources report it will even continue with the extension ‘.[AA’.

Keramics supports extensions up to .ZZZ.
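The naming schemes above follow one pattern: two digits for segments 1 to 99, then three letters that count upwards like a base-26 number. A minimal sketch of that pattern is given below; the function name segment_extension and its first_character parameter are assumptions for illustration and are not taken from Keramics.

```python
def segment_extension(segment_number, first_character="E"):
    """Returns the extension for a segment number, such as '.E01' or '.EAA'.

    first_character is 's' for SMART (EWF-S01), 'E' for EWF-E01 and 'L' for
    EWF-L01. Its case determines the case of the letters that follow.
    """
    if segment_number < 1:
        raise ValueError("segment number must be 1 or higher")
    if segment_number <= 99:
        return ".{0:s}{1:02d}".format(first_character, segment_number)

    base = ord("a") if first_character.islower() else ord("A")
    # Segment 100 maps to 'AA'; every 26 * 26 segments the first letter moves on.
    first_increment, remainder = divmod(segment_number - 100, 26 * 26)
    second, third = divmod(remainder, 26)
    first = ord(first_character) + first_increment
    if first > base + 25:
        raise ValueError("segment number exceeds the supported extension range")
    return "." + chr(first) + chr(base + second) + chr(base + third)


print(segment_extension(100))       # .EAA
print(segment_extension(2, "s"))    # .s02
print(segment_extension(776, "L"))  # .MAA
```

The sketch stops at the last letter of the alphabet, in line with the extensions Keramics supports, instead of continuing into the unconfirmed range.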

Segment file set identifier GUID

Segment file sets do not have a strict unique identifier. However the volume section contains a GUID that can be used for this purpose. Where:

linen 5 to 6 use a time and MAC address based version (1) of the GUID
EnCase 5 to 7 and linen 6 to 7 use a random based version (4) of the GUID

Note that in linen 6 the switch from a version 1 to a version 4 GUID was made somewhere between version 6.01 and 6.19.

See RFC4122 for more information about the different GUID versions.
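The GUID version can be read from the identifier itself. The sketch below relies on the Python uuid module and assumes the 16 bytes are stored in the little-endian (Microsoft) GUID layout, which this document does not confirm; the function name guid_version is hypothetical.

```python
import uuid


def guid_version(guid_bytes):
    """Returns the RFC 4122 version of a 16-byte GUID, or None if undefined.

    Assumption: the first three GUID fields are stored little-endian, which
    is what the bytes_le argument of uuid.UUID expects.
    """
    return uuid.UUID(bytes_le=guid_bytes).version


# Round trip with a freshly generated random (version 4) GUID.
print(guid_version(uuid.uuid4().bytes_le))  # 4
```

A result of 1 would point to a time and MAC address based GUID and 4 to a random based GUID.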

The sections

The remainder of the segment file consists of sections. Every section starts with the same data, which will be referred to as the section header.

Section header

The section header consists of 76 bytes and contains information about a specific section.

Offset	Size	Value	Description
0	16		Section type, a string containing the section type definition, such as "header" or "volume"
16	8		Next section offset, where the offset is relative from the start of the segment file
24	8		Section size
32	40	0x00	Unknown (Padding)
72	4		Checksum, which contains an Adler-32 of all the previous data within the section header

Some sections contain additional data, refer to the section types paragraph for more information.

Note that Expert Witness 1.35 (for Windows) does not set the section size.

Note that in the EnCase 2 DOS version the padding does not contain 0-byte values but data, probably because the memory is not filled with 0-byte values.
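Reading the section header and verifying its checksum can be sketched as follows. The function name parse_section_header is an assumption for this example, as is the use of zlib.adler32 with its default starting value and the treatment of the section type as a string padded with 0-byte values.

```python
import struct
import zlib

SECTION_HEADER_SIZE = 76


def parse_section_header(data):
    """Parses a 76-byte section header into a dictionary."""
    if len(data) < SECTION_HEADER_SIZE:
        raise ValueError("section header requires 76 bytes")
    # 16 + 8 + 8 + 40 + 4 = 76 bytes, little-endian, no alignment padding.
    section_type, next_offset, size, _padding, checksum = struct.unpack(
        "<16sQQ40sI", data[:SECTION_HEADER_SIZE])
    # Assumption: the type string is padded with 0-byte values.
    type_string = section_type.split(b"\x00", 1)[0].decode("ascii")
    # The checksum covers the 72 bytes that precede it. The padding is not
    # validated because it is not always filled with 0-byte values.
    calculated = zlib.adler32(data[:72]) & 0xFFFFFFFF
    return {
        "type": type_string,
        "next_offset": next_offset,
        "size": size,
        "checksum_valid": checksum == calculated,
    }
```

Since some writers leave the section size unset, a reader would probably rely on the next section offset rather than the size to walk through a segment file.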

Section types

There are multiple section types. ASR Data - E01 Compression Format defines the following:

Header section
Volume section
Table section
Next and Done section

The following section types were found when analyzing more recent EnCase files (EWF-E01):

Header2 section
Disk section
Sectors section
Table2 section
Data section
Error2 section
Session section
Hash section
Digest section

The following section types were found when analyzing more recent EnCase files (EWF-L01):

Ltree section
Ltypes section

Header2 section

The header2 section is identified in the section data type field as “header2”. Some aspects of this section are:

Found in EWF-E01 in EnCase 4 to 7, and EWF-L01 in EnCase 5 to 7
Found at the start of the first segment file. Not found in subsequent segment files.
The same header2 section is found twice, one directly after the other.

The additional data this section contains is the following:

Offset	Size	Value	Description
76 (0x4c)	...		Information about the acquired media

The information about the acquired media consists of zlib compressed data. It contains text in UTF-16 format specifying information about the acquired media. The text consists of multiple lines separated by end of line character(s).

The first 2 bytes of the UTF16 string are the byte order mark (BOM):

0xff 0xfe for UTF-16 little-endian
0xfe 0xff for UTF-16 big-endian
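Decompressing the header2 data and selecting the byte order from the byte order mark can be sketched as follows. The function name decode_header2 is hypothetical, and the input is assumed to be the section data that follows the 76-byte section header.

```python
import codecs
import zlib


def decode_header2(compressed_data):
    """Decompresses header2 data and decodes it according to its BOM."""
    data = zlib.decompress(compressed_data)
    if data.startswith(codecs.BOM_UTF16_LE):  # 0xff 0xfe
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):  # 0xfe 0xff
        return data[2:].decode("utf-16-be")
    raise ValueError("missing UTF-16 byte order mark")
```

The sketch rejects data without a byte order mark instead of guessing the byte order.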

In the next paragraphs the various variants of the header2 section are described.

EnCase 4 (EWF-E01)

In EnCase 4 (EWF-E01) the header2 information consists of 5 lines, and contains the same information as the header section.

Line number	Value	Description
1	1	The number of categories provided
2	main	The name/type of the category provided
3		Identifiers for the values in the 4th line
4		The data for the different identifiers in the 3rd line
5		(an empty line)

The end of line character(s) is a newline (0x0a).

Note this end of line character differs from the one used in the header section.

The 3rd and the 4th line consist of the following tab (0x09) separated values.

Identifier number	Character in 3rd line	Value in 4th line
1	a	Unique description
2	c	Case number
3	n	Evidence number
4	e	Examiner name
5	t	Notes
6	av	Version, which contains the EnCase version used to acquire the media
7	ov	Platform, which contains the platform/operating system used to acquire the media
8	m	Acquisition date and time
9	u	System date and time
10	p	Password hash

Also see header2 values

Note the hashing algorithm is the same as for the header section.
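Pairing the identifiers in the 3rd line with the values in the 4th line can be sketched as follows. The function name parse_main_category is an assumption for this example, and the input is assumed to be the already decompressed and decoded header2 text.

```python
def parse_main_category(text):
    """Maps the identifiers in the 3rd line to the values in the 4th line."""
    lines = text.split("\n")
    if len(lines) < 4 or lines[1] != "main":
        raise ValueError("unsupported header2 layout")
    identifiers = lines[2].split("\t")
    values = lines[3].split("\t")
    # zip stops at the shorter of the two lines if their lengths differ.
    return dict(zip(identifiers, values))


example = (
    "1\n"
    "main\n"
    "a\tc\tn\te\tt\tav\tov\tm\tu\tp\n"
    "description\t42\t1\tJ. Doe\t\t4.20\tWindows\t1142163845\t1142163845\t0\n"
    "\n")
print(parse_main_category(example)["e"])  # J. Doe
```

The values in the example are made up and only serve to show the layout.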

EnCase 5 to 7 (EWF-E01)

In EnCase 5 to 7 (EWF-E01) the header2 information consists of 17 lines, and contains:

Line number	Value	Description
1	3	The number of categories provided
2	main	The name/type of the category provided
3		Identifier for the values in the category
4		The data for the different identifiers in the category
5		(an empty line)
6	srce	The name/type of the category provided, also see sources category
7
8		Identifier for the values in the category
9		The data for the different identifiers in the category
10
11		(an empty line)
12	sub	The name/type of the category provided, also see subjects category
13
14		Identifier for the values in the category
15		The data for the different identifiers in the category
16
17		(an empty line)

The end of line character(s) is a newline (0x0a).

Main category

The 3rd and the 4th line consist of the following tab (0x09) separated values.

Note the actual values in this category are dependent on the version of EnCase.

Identifier number	Character in 3rd line	Value in 4th line
1	a	Unique description
2	c	Case number
3	n	Evidence number
4	e	Examiner name
5	t	Notes
6	md	The model of the media, such as hard disk model (introduced in EnCase 6)
7	sn	The serial number of media (introduced in EnCase 6)
8	l	The device label (introduced in EnCase 6.19)
9	av	Version, which contains the EnCase version used to acquire the media. EnCase limits this value to 12 characters
10	ov	Platform, which contains the platform/operating system used to acquire the media
11	m	Acquisition date and time
12	u	System date and time
13	p	Password hash
14	pid	Process identifier, which contains the identifier of the process memory acquired (introduced in EnCase 6.12/Winen 6.11)
15	dc	Unknown
16	ext	Extents, which contains the extents of the process memory acquired (introduced in EnCase 6.12/Winen 6.11)

Also see header2 values

Note that both the acquisition and system date and time are empty in a file created by winen.

Note that the date values in the header section (not the header2 section) are set to: “Thu Jan 1 00:00:00 1970”, where the time depends on the time zone and daylight savings.

Note that in a Logicube Dossier generated header2 section an additional empty value in the 4th line was observed. The number of values in the 3rd and 4th lines can therefore differ.

Sources category

Line 6, the srce category, contains information about acquisition sources.

TODO: describe what a source is in the context of EnCase.

Line 7 consists of 2 values, namely the values are “0 1”.

The 8th line consists of the following tab (0x09) separated values.

Note that the actual values in this category are dependent on the version of EnCase.

Identifier number	Character in 8th line	Meaning
1	p
2	n
3	id	Identifier, which contains an integer identifying the source
4	ev	Evidence number, which contains a string
5	tb	Total bytes, which contains an integer
6	lo	Logical offset, which contains an integer which is -1 when value is not set
7	po	Physical offset, which contains an integer which is -1 when value is not set
8	ah	MD5 hash, which contains a string with the MD5 hash of the source
9	sh	SHA1 hash, contains a string with the SHA1 hash of the source (introduced in EnCase 6.19)
10	gu	Device GUID, which contains a string with a GUID or "0" if not set
11	pgu	Primary device GUID, which contains a string with a GUID or "0" if not set (introduced in EnCase 7)
12	aq	Acquisition date and time, which contains an integer with a POSIX timestamp

Line 9 consists of 2 values, namely the values are “0 0”.

Line 10 contains the values defined by line 8.

Note that the default values of some of these values have changed around EnCase 6.12.

If the “ah” value contains “00000000000000000000000000000000” this means the MD5 hash is not set. The same applies to the “sh” value: when it contains “0000000000000000000000000000000000000000” the SHA1 hash is not set.

Subjects category

Line 12, the sub category, contains information about subjects.

TODO: describe what a subject is in the context of EnCase.

Line 13 consists of 2 values, namely the values are “0 1”.

The 14th line consists of the following tab (0x09) separated values.

Identifier number	Character in 14th line	Meaning
1	p
2	n
3	id	Identifier, which contains an integer identifying the subject
4	nu	Unknown (Number)
5	co	Unknown (Comment)
6	gu	Unknown (GUID)

Line 15 consists of 2 values, namely the values are “0 0”.

Line 16 contains the values defined by line 14.

Note that the default values of some of these values have changed around EnCase 6.12.
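The 17-line layout with its main, srce and sub categories can be walked through by line position. The sketch below assumes exactly one entry per category, as in the layout described above, and does not interpret the extra lines 7, 9, 13 and 15. The names parse_header2_categories and hash_is_set are hypothetical, and the hash check uses the identifiers from the sources table.

```python
def parse_header2_categories(text):
    """Returns a dictionary of categories, each mapping identifier to value."""
    lines = text.split("\n")
    number_of_categories = int(lines[0])
    categories = {}
    index = 1
    for _ in range(number_of_categories):
        name = lines[index]
        if name == "main":
            identifiers, values = lines[index + 1], lines[index + 2]
            index += 4  # name, identifiers, values, empty line
        else:
            # srce and sub have an extra line before the identifiers and
            # another one before the values; both are skipped here.
            identifiers, values = lines[index + 2], lines[index + 4]
            index += 6
        categories[name] = dict(
            zip(identifiers.split("\t"), values.split("\t")))
    return categories


def hash_is_set(value):
    """Returns False for a hash string that consists of zeros only."""
    return value.strip("0") != ""
```

A source whose “ah” value consists of 32 zeros would be reported by hash_is_set as having no MD5 hash.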

EnCase 5 to 7 (EWF-L01)

The EnCase 5 to 7 (EWF-E01) header2 section specification also applies to the EnCase 5 to 7 (EWF-L01) format. However:

both the acquired and system date and time are not set

Header2 values

Identifier	Description	Notes
a	Unique description	Free form string. Note that EnCase might not respond when this value is large e.g. >= 1 MiB
av	Version	Free form string. EnCase limits this string to 12 - 1 characters
c	Case number	Free form string. EnCase limits this string to 3000 - 1 characters
dc	Unknown
e	Examiner name	Free form string. EnCase limits this string to 3000 - 1 characters
ext	Extents	Extents header value
l	Device label	Free form string
m	Acquisition date and time	String containing POSIX 32-bit epoch timestamp, e.g. "1142163845" which represents the date: March 12 2006, 11:44:05
md	Model	Free form string. EnCase limits this string to 3000 - 1 characters
n	Evidence number	Free form string. EnCase limits this string to 3000 - 1 characters
ov	Platform	Free form string. EnCase limits this string to 24 - 1 characters
pid	Process identifier	String containing the process identifier (pid) number
p	Password hash	String containing the password hash. If no password is set it should be simply the character '0'
sn	Serial Number	Free form string. EnCase limits this string to 3000 - 1 characters
t	Notes	Free form string. EnCase limits this string to 3000 - 1 characters
u	System date and time	String containing POSIX 32-bit epoch timestamp, e.g. "1142163845" which represents the date: March 12 2006, 11:44:05

Note the restrictions were tested with EnCase 7.02.01, older versions could have a restriction of 40 characters instead of 3000 characters.

Extents header value

An extents header value consists of:

number of entries
entries that consist of: S <1> <2> <3>

Header section

The header section is identified in the section data type field as “header”. Some aspects of this section are:

Defined in ASR Data - E01 Compression Format
Found in EWF-E01 in EnCase 1 to 7 or linen 5 to 7 or FTK Imager, EWF-L01 in EnCase 5 to 7, and SMART (EWF-S01)
Found at the start of the first segment file or in EnCase 4 to 7 after the header2 section in the first segment file. Typically not found in subsequent segment files with the exception of Logicube Dossier generated EWF-E01 files.

The additional data this section contains is the following:

Offset	Size	Value	Description
76 (0x4c)	...		Information about the acquired media

The information about the acquired media consists of zlib compressed data. It contains text in ASCII format specifying information about the acquired media. The text consists of multiple lines separated by end of line character(s).

In the next paragraphs the various variants of the header section are described. In all cases the information consists of at least 4 lines:

Line number	Value	Description
1	1	The number of categories provided
2	main	The name/type of the category provided
3		Identifiers for the values in the 4th line
4		The data for the different identifiers in the 3rd line

An additional 5th line is found in FTK Imager, EnCase 1 to 7 (EWF-E01).

Line number	Value	Description
5		(an empty line)

EWF format

Some aspects of this section are:

ASR Data - E01 Compression Format specifies the end of line character(s) is a newline (0x0a).

According to ASR Data - E01 Compression Format the 3rd and the 4th line consist of the following tab (0x09) separated values:

Identifier number	Character in 3rd line	Value in 4th line
1	c	Case number
2	n	Evidence number
3	a	Unique description
4	e	Examiner name
5	t	Notes
6	m	Acquisition date and time
7	u	System date and time
8	p	Password hash
9	r	Compression level

Also see header values

ASR Data - E01 Compression Format states that the Expert Witness Compression uses ‘f’, fastest compression.

EnCase 1 (EWF-E01)

Some aspects of this section are:

The header section is defined only once.
It is the first section of the first segment file. It is not found in subsequent segment files.
The header data itself is compressed using zlib.
The end of line character(s) is a carriage return (0x0d) followed by a newline (0x0a).

The 3rd and the 4th line consist of the following tab (0x09) separated values:

Identifier number	Character in 3rd line	Value in 4th line
1	c	Case number
2	n	Evidence number
3	a	Unique description
4	e	Examiner name
5	t	Notes
6	m	Acquisition date and time
7	u	System date and time
8	p	Password hash
9	r	Compression level

Also see header values

SMART (EWF-S01)

Some aspects of this section are:

The header section is defined once.
It is the first section of the first segment file. It is not found in subsequent segment files.
The header data is always processed by zlib, however the same compression level is used as for the chunks. This could mean compression level 0 which is no compression.

The SMART format uses the FTK Imager (EWF-E01) specification for this section. Note that this could be something FTK Imager specific.

EnCase 2 and 3 (EWF-E01)

Some aspects of this section are:

The same header section is defined twice.
It is the first and second section of the first segment file. It is not found in subsequent segment files.
The header data itself is compressed using zlib.
The end of line character(s) is a carriage return (0x0d) followed by a newline (0x0a).

The 3rd and the 4th line consist of the following tab (0x09) separated values:

Identifier number	Character in 3rd line	Value in 4th line
1	c	Case number
2	n	Evidence number
3	a	Unique description
4	e	Examiner name
5	t	Notes
6	av	Version, which contains the EnCase version used to acquire the media
7	ov	Platform, which contains the platform/operating system used to acquire the media
8	m	Acquisition date and time
9	u	System date and time
10	p	Password hash
11	r	Compression level

Also see header values

EnCase 4 to 7 (EWF-E01)

Some aspects of this section are:

The header is defined only once.
It resides after the header2 sections of the first segment file. It is not found in subsequent segment files.
The header data itself is compressed using zlib.
The end of line character(s) is a carriage return (0x0d) followed by a newline (0x0a).

The 3rd and the 4th line consist of the following tab (0x09) separated values:

Identifier number	Character in 3rd line	Value in 4th line
1	c	Case number
2	n	Evidence number
3	a	Unique description
4	e	Examiner name
5	t	Notes
6	av	Version, which contains the EnCase version used to acquire the media
7	ov	Platform, which contains the platform/operating system used to acquire the media
8	m	Acquisition date and time
9	u	System date and time
10	p	Password hash

Also see header values

linen 5 to 7 (EWF-E01)

Some aspects of this section are:

The same header section is defined twice.
It is the first and second section of the first segment file. It is not found in subsequent segment files.
The header data itself is compressed using zlib.
The end of line character(s) is a newline (0x0a).

The header information consists of 18 lines.

The remainder of the string contains the following information:

Line number	Value	Description
1	3	The number of categories provided
2	main	The name/type of the category provided
3		Identifier for the values in the 4th line
4		The data for the different identifiers in the 3rd line
5		(an empty line)
6	srce	The name/type of the section provided, also see Sources category
7
8		Identifier for the values in the section
9
10
11		(an empty line)
12	sub	The name/type of the section provided, also see Subjects category
13
14		Identifier for the values in the section
15
16
17		(an empty line)

The end of line character(s) is a newline (0x0a).

Main category - linen 5

The 3rd and the 4th line consist of the following tab (0x09) separated values.

Note the actual values in this category are dependent on the version of linen.

Identifier number	Character in 3rd line	Value in 4th line
1	a	Unique description
2	c	Case number
3	n	Evidence number
4	e	Examiner name
5	t	Notes
6	av	Version, which contains the linen version used to acquire the media
7	ov	Platform, which contains the platform/operating system used to acquire the media
8	m	Acquisition date and time
9	u	System date and time
10	p	Password hash

Also see header values

Main category - linen 6 to 7

The 3rd and the 4th line consist of the following tab (0x09) separated values.

Note the actual values in this category are dependent on the version of linen.

Identifier number	Character in 3rd line	Value in 4th line
1	a	Unique description
2	c	Case number
3	n	Evidence number
4	e	Examiner name
5	t	Notes
6	md	The model of the media, such as hard disk model (Introduced in linen 6)
7	sn	The serial number of media (Introduced in linen 6)
8	l	The device label (Introduced in linen 6.19)
9	av	Version, which contains the linen version used to acquire the media
10	ov	Platform, which contains the platform/operating system used to acquire the media
11	m	Acquisition date and time
12	u	System date and time
13	p	Password hash
14	pid	Process identifier, which contains the identifier of the process memory acquired (Introduced in linen 6.19 or earlier)
15	dc	Unknown (Introduced in linen 6)
16	ext	Extents, which contains the extents of the process memory acquired (Introduced in linen 6.19 or earlier)

Note that as of linen 6.19 the acquisition date and time is in UTC and the system date and time is in local time, whereas before both values were in local time.

Also see header values

Sources category

Line 6, the srce category, contains information about acquisition sources.

TODO: describe what a source is in the context of EnCase.

Line 7 consists of 2 values, namely the values are “0 1”.

The 8th line consists of the following tab (0x09) separated values.

Identifier number	Character in 8th line	Meaning
1	p
2	n
3	id	Identifier, which contains an integer identifying the source
4	ev	Evidence number, which contains a string
5	tb	Total bytes, which contains an integer
6	lo	Logical offset, which contains an integer which is -1 when value is not set
7	po	Physical offset, which contains an integer which is -1 when value is not set
8	ah	Unknown (MD5?), which contains a string
9	sh	Unknown (SHA1?), which contains a string (Introduced in linen 6.19 or earlier)
10	gu	Device GUID, which contains a string with a GUID or "0" if not set
11	aq	Acquisition date and time, which contains an integer with a POSIX timestamp

Line 9 consists of 2 values, namely the values are “0 0”.

Line 10 contains the values defined by line 8.

Note that the default values of some of these values have changed around linen 6.19 or earlier.

Subjects category

Line 12, the sub category, contains information about subjects.

TODO: describe what a subject is in the context of EnCase.

Line 13 consists of 2 values, namely the values are “0 1”.

The 14th line consists of the following tab (0x09) separated values.

Identifier number	Character in 14th line	Meaning
1	p
2	n
3	id	Identifier, which contains an integer identifying the subject
4	nu	Unknown (Number)
5	co	Unknown (Comment)
6	gu	Unknown (GUID)

Line 15 consists of 2 values, namely the values are “0 0”.

Line 16 contains the values defined by line 14.

Note that the default values of some of these values have changed around linen 6.19 or earlier.

FTK Imager (EWF-E01)

Some aspects of this section are:

In FTK Imager (EWF-E01) the same header section is defined twice.
It is the first and second section of the first segment file. It is not found in subsequent segment files.
The header data itself is compressed using zlib. Note that the compression level can be none and therefore the header looks uncompressed.
In FTK Imager the end of line character(s) is a newline (0x0a).

The 3rd and the 4th line consist of the following tab (0x09) separated values:

Identifier number	Character in 3rd line	Value in 4th line
1	c	Case number
2	n	Evidence number
3	a	Unique description
4	e	Examiner name
5	t	Notes
6	av	Version, which contains the FTK Imager version used to acquire the media
7	ov	Platform, which contains the platform/operating system used to acquire the media
8	m	Acquisition date and time
9	u	System date and time
10	p	Password hash
11	r	Compression level

Also see header values

EnCase 5 to 7 (EWF-L01)

The EnCase 4 to 7 (EWF-E01) header section specification is also used for the EnCase 5 to 7 (EWF-L01) format, with the following aspects:

In EnCase 5 both the acquired and system date and time are set to 0.
In EnCase 6 and 7 both the acquired and system date and time are set to Jan 1, 1970 00:00:00 (the time is dependent on the local timezone and daylight savings)

Header values

Identifier	Description	Notes
a	Unique description	Free form string. Note that EnCase might not respond when this value is large e.g. >= 1 MiB
av	Version	Free form string. EnCase limits this string to 12 - 1 characters
c	Case number	Free form string. EnCase limits this string to 3000 - 1 characters
dc	Unknown
e	Examiner name	Free form string. EnCase limits this string to 3000 - 1 characters
ext	Extents	Extents header value
l	Device label	Free form string
m	Acquisition date and time	Contains a date and time header value
md	Model	Free form string. EnCase limits this string to 3000 - 1 characters
n	Evidence number	Free form string. EnCase limits this string to 3000 - 1 characters
ov	Platform	Free form string. EnCase limits this string to 24 - 1 characters
pid	Process identifier	String containing the process identifier (pid) number
p	Password hash	String containing the password hash. If no password is set it should be simply the character '0'
r	Compression level	Compression header value
sn	Serial Number	Free form string. EnCase limits this string to 3000 - 1 characters
t	Notes	Free form string. EnCase limits this string to 3000 - 1 characters
u	System date and time	Contains a date and time header value

Note the restrictions were tested with EnCase 7.02.01, older versions could have a restriction of 40 characters instead of 3000 characters.

Date and time header value

In EnCase a date and time contains a string of individual values separated by a space, e.g. “2002 3 4 10 19 59”, which represents March 4, 2002 10:19:59.

In linen a date and time contains a string with a POSIX 32-bit epoch timestamp, e.g. “1142163845” which represents the date: March 12 2006, 11:44:05
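Both forms of the date and time header value can be told apart by their number of space separated fields. The following is a sketch with a hypothetical function name; it returns a datetime without time zone information in both cases and converts the POSIX timestamp as UTC, which is an assumption that happens to match the example above.

```python
from datetime import datetime, timezone


def parse_date_time_value(value):
    """Parses an EnCase or linen date and time header value."""
    fields = value.split(" ")
    if len(fields) == 6:
        # EnCase: year, month, day, hours, minutes and seconds.
        return datetime(*(int(field) for field in fields))
    if len(fields) == 1:
        # linen: POSIX timestamp, interpreted as UTC in this sketch.
        timestamp = datetime.fromtimestamp(int(value), tz=timezone.utc)
        return timestamp.replace(tzinfo=None)
    raise ValueError("unsupported date and time header value")


print(parse_date_time_value("2002 3 4 10 19 59"))  # 2002-03-04 10:19:59
print(parse_date_time_value("1142163845"))         # 2006-03-12 11:44:05
```

Whether a value is in local time or in UTC depends on the tool and version that wrote it, so a caller would have to attach the time zone separately.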

Extents header value

An extents header value consists of:

number of entries
entries that consist of: S <1> <2> <3>

Compression header value

A compression header value consists of a single character that represents the compression level.

Character value	Meaning
b	Best compression is used
f	Fastest compression is used
n	No compression is used

Notes

There should not be tab, carriage return or newline characters within the text in the 4th line. Or is there a method to escape these characters?

ASR Data - E01 Compression Format states that these characters should not be used in the free form text. This needs to be confirmed; the specification only speaks of a newline character.

Currently the password has no additional value other than allowing an application to check it. The data itself is not protected using the password. The password hashing algorithm is unknown and still needs to be determined. Does the algorithm differ per EnCase version? Probably not; it does not differ in EnCase 1 to 7. FTK Imager does not bother with a password.

Volume section

The volume section is identified in the section data type field as “volume”. Some aspects of this section are:

Defined in ASR Data - E01 Compression Format
Found in EWF-E01 in EnCase 1 to 7 or linen 5 to 7 or FTK Imager, EWF-L01 in EnCase 5 to 7, and SMART (EWF-S01)
Found after the header section of the first segment file. Not found in subsequent segment files.

In the next paragraphs the various versions of the volume section are described.

EWF specification

The specification according to ASR Data - E01 Compression Format.

The volume section data is 94 bytes in size and consists of:

Offset	Size	Value	Description
0	4	0x01	Unknown (Reserved)
4	4		The number of chunks within all the segment files
8	4		The number of sectors per chunk, which contains 64 per default
12	4		The number of bytes per sector, which contains 512 per default
16	4		The sectors count, the number of sectors within all segment files
20	20	0x00	Unknown (Reserved)
40	45	0x00	Unknown (Padding)
85	5		Signature, which contains the EWF file header signature
90	4		Checksum, which contains an Adler-32 of all the previous data within the volume section data
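The 94-byte layout above can be unpacked in one step. The sketch below uses a hypothetical function name, parse_ewf_volume, and assumes that the checksum is calculated with zlib.adler32 and its default starting value over the first 90 bytes.

```python
import struct
import zlib

VOLUME_DATA_SIZE = 94


def parse_ewf_volume(data):
    """Parses the 94-byte volume section data of the EWF specification."""
    if len(data) < VOLUME_DATA_SIZE:
        raise ValueError("volume section data requires 94 bytes")
    # 5 x 4 + 20 + 45 + 5 + 4 = 94 bytes, little-endian, no padding.
    (_reserved1, number_of_chunks, sectors_per_chunk, bytes_per_sector,
     number_of_sectors, _reserved2, _padding, signature,
     checksum) = struct.unpack("<5I20s45s5sI", data[:VOLUME_DATA_SIZE])
    calculated = zlib.adler32(data[:90]) & 0xFFFFFFFF
    return {
        "number_of_chunks": number_of_chunks,
        "sectors_per_chunk": sectors_per_chunk,
        "bytes_per_sector": bytes_per_sector,
        "number_of_sectors": number_of_sectors,
        "chunk_size": sectors_per_chunk * bytes_per_sector,
        "signature": signature,
        "checksum_valid": checksum == calculated,
    }
```

The signature is returned as raw bytes so that a caller can compare it against the value expected for the format at hand.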

The number of chunks is a 32-bit value, which means the maximum number of addressable chunks is 4294967295 (= 2^32 - 1). With the default chunk size of 32768 bytes this amounts to 32768 x 4294967295 bytes, which is just under 128 TiB. The maximum number of segment files is 2^16 - 1 = 65535. This allows for roughly the same amount of storage if every segment file is filled to its maximum number of chunks.
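The arithmetic behind this limit can be checked directly. The constant names below are chosen for this example only.

```python
BYTES_PER_SECTOR = 512
SECTORS_PER_CHUNK = 64
MAXIMUM_NUMBER_OF_CHUNKS = 2**32 - 1

chunk_size = BYTES_PER_SECTOR * SECTORS_PER_CHUNK
maximum_media_size = chunk_size * MAXIMUM_NUMBER_OF_CHUNKS

print(chunk_size)                  # 32768
print(maximum_media_size)          # 140737488322560
print(maximum_media_size / 2**40)  # size in TiB, marginally below 128
```

The result is exactly one chunk short of 2^47 bytes.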

However Keramics is restricted at 14295 segment files, due to the extension naming schema of the segment files.
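
The chunk-count limit above can be checked with a short calculation. The following is a minimal sketch; the variable names are illustrative and not part of the format.

```python
# Illustrative check of the media size addressable with a 32-bit chunk count.
MAX_CHUNKS = 2**32 - 1   # largest value of the 32-bit number of chunks
CHUNK_SIZE = 64 * 512    # default chunk: 64 sectors of 512 bytes = 32768 bytes

max_media_size = MAX_CHUNKS * CHUNK_SIZE      # size in bytes
max_media_size_tib = max_media_size / 2**40   # size in TiB, just under 128

print(max_media_size)  # 140737488322560
```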

SMART (EWF-S01)

The SMART format uses the EWF specification for this section.

In SMART the signature (reverse) value is the string “SMART” (0x53 0x4d 0x41 0x52 0x54) instead of the file header signature.

FTK Imager, EnCase 1 to 7 and linen 5 to 7 (EWF-E01)

The specification for FTK Imager, EnCase 1 to 7 and linen 5 to 7.

The volume section data is 1052 bytes in size and consists of:

Offset	Size	Value	Description
0	1		Media type
1	3	0x00	Unknown (empty values)
4	4		The number of chunks within all segment files
8	4		The number of sectors per chunk (or block size), which contains 64 per default. EnCase 5 is the first version which allows this value to be different than 64
12	4		The number of bytes per sector
16	8		The sectors count, which contains the number of sectors within all segment files. This value probably has been changed in EnCase 6 from a 32-bit value to a 64-bit value to support media >2TiB
24	4		The number of cylinders of the C:H:S value, which is empty (0x00) most of the time
28	4		The number of heads of the C:H:S value, which is empty (0x00) most of the time
32	4		The number of sectors of the C:H:S value, which is empty (0x00) most of the time
36	1		Media flags
37	3	0x00	Unknown (empty values)
40	4		PALM volume start sector
44	4	0x00	Unknown (empty values)
48	4		SMART logs start sector, which contains an offset relative from the end of media, e.g. a value of 10 would refer to sector = number of sectors - 10
52	1		Compression level (Introduced in EnCase 5)
53	3	0x00	Unknown (empty values, these values seem to be part of the compression level)
56	4		The sector error granularity, which contains the error block size (Introduced in EnCase 5)
60	4	0x00	Unknown (empty values)
64	16		Segment file set identifier, which contains a GUID/UUID generated on the acquisition system, probably used to uniquely identify a set of segment files (Introduced in EnCase 5)
80	963	0x00	Unknown (empty values)
1043	5	0x00	Unknown (Signature)
1048	4		Checksum, which contains an Adler-32 of all the previous data within the volume section data
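
As an illustration, the fields of the table above could be unpacked as follows. This is a minimal sketch that assumes little-endian byte order; the function name parse_e01_volume and the dictionary keys are hypothetical and not taken from Keramics.

```python
import struct
import zlib

# First 80 bytes of the 1052-byte volume section data (little-endian assumed).
# "x" pad bytes skip the fields marked as unknown or empty in the table.
VOLUME_STRUCT = struct.Struct("<B3xIIIQIIIB3xI4xIB3xI4x16s")

FIELD_NAMES = (
    "media_type", "number_of_chunks", "sectors_per_chunk", "bytes_per_sector",
    "number_of_sectors", "chs_cylinders", "chs_heads", "chs_sectors",
    "media_flags", "palm_volume_start_sector", "smart_logs_start_sector",
    "compression_level", "error_granularity", "set_identifier",
)


def parse_e01_volume(data: bytes) -> dict:
    """Validate the Adler-32 checksum and return the known fields."""
    if len(data) != 1052:
        raise ValueError("expected 1052 bytes of volume section data")
    (stored_checksum,) = struct.unpack_from("<I", data, 1048)
    # The checksum covers all data that precedes it.
    if zlib.adler32(data[:1048]) != stored_checksum:
        raise ValueError("volume section checksum mismatch")
    return dict(zip(FIELD_NAMES, VOLUME_STRUCT.unpack_from(data, 0)))
```

The media size in bytes would then follow from number_of_sectors multiplied by bytes_per_sector.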

TODO: a value that could be in the volume is the RAID stripe size

Note that for media that contains no partition table EnCase requires that the “is physical” media flag is not set, and vice versa. Other tools like FTK check the actual storage media data.

EnCase 5 to 7 (EWF-L01)

The EWF-L01 format uses the EnCase 5 (EWF-E01) volume section specification. However:

the media type contains 0x0e
the number of chunks is 0
the number of bytes per sector is some kind of block size value (4096), perhaps the source file system block size
the sectors count, represents some other value because (sector_size x sector_amount != total_size). The total size is in the ltree section.

Media type

Value	Identifier	Description
0x00		A removable storage media device
0x01		A fixed storage media device

0x03		An optical disc (CD/DVD/BD)

0x0e		Logical Evidence (LEF or L01)

0x10		Physical Memory (RAM) or process memory

Note that FTK Imager versions before 2.9 set the media type to fixed (0x01). The exact version of FTK Imager where this behavior changed is unknown.

Media flags

Value	Identifier	Description
0x01		Is an image file. In FTK Imager, EnCase 1 to 7 this bit is always set, when not set EnCase seems to see the image file as a device
0x02		Is physical device or device type, where 0 represents a non physical device (logical) and 1 represents a physical device
0x04		Fastbloc write blocker used
0x08		Tableau write blocker used. This was added in EnCase 6.13

Note that if both the Fastbloc and Tableau write blocker media flags are set EnCase only shows the Fastbloc.
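
The media flags form a bit field, so several flags can be set at once. A minimal sketch of decoding them, where the enumeration and member names are hypothetical and only the values come from the table above:

```python
import enum


class MediaFlags(enum.IntFlag):
    """Media flag bits; the member names are illustrative."""
    IS_IMAGE_FILE = 0x01
    IS_PHYSICAL = 0x02
    FASTBLOC_WRITE_BLOCKER = 0x04
    TABLEAU_WRITE_BLOCKER = 0x08


def describe_media_flags(value: int) -> list:
    """Return the names of all flags that are set in value."""
    return [flag.name for flag in MediaFlags if value & flag]


print(describe_media_flags(0x03))  # ['IS_IMAGE_FILE', 'IS_PHYSICAL']
```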

Compression level

Value	Identifier	Description
0x00		no compression
0x01		good compression
0x02		best compression

Note that EnCase 7 no longer provides the fast and best compression options.

Disk section

The disk section is identified in the section data type field as “disk”. Some aspects of this section are:

Not defined in ASR Data - E01 Compression Format.
Not found in SMART (EWF-S01).

With a disk section in an FTK Imager 2.3 (EWF-E01) image it was confirmed that the disk section is the same as the volume section.

Note that the disk section was found only in FTK Imager 2.3 when acquiring a physical disk, not a floppy. This requires additional research; it is currently assumed that the disk section is some old method to differentiate between a partition (volume) image and a physical disk image.

Data section

The data section is identified in the section data type field as “data”. Some aspects of this section are:

Not defined in ASR Data - E01 Compression Format.
Found in EWF-E01 in EnCase 1 to 7 or linen 5 to 7 or FTK Imager, and EWF-L01 in EnCase 5 to 7. Not found in SMART (EWF-S01).
For multiple segment files it does not reside in the first segment file. For a single segment file it does.
Found after the last table2 section in a single segment file or for multiple segment files at the start of the segment files, except for the first.
If the data section has data, it should contain the same information as the volume section.

The data section is a copy of the volume section.

FTK Imager, EnCase 1 to 7 and linen 5 to 7 (EWF-E01)

Note that in Logicube products (Talon (firmware predating April 2013) and Forensic dossier (before version 3.3.3RC16)) the checksum is not calculated and set to 0.

Sectors section

The sectors section is identified in the section data type field as “sectors”. Some aspects of this section are:

Not defined in ASR Data - E01 Compression Format.
Found in EWF-E01 in EnCase 2 to 7, or linen 5 to 7 or FTK Imager, EWF-L01 in EnCase 5 to 7. Not found in EnCase 1 (EWF-E01) or SMART (EWF-S01).
The first sectors section can be found after the volume section in the first segment file or after the data section in subsequent segment files. Successive sectors sections are found after the table2 section.

The sectors section contains the actual chunks of media data.

The sectors section can contain multiple chunks.
The default size of a chunk is 32768 bytes of data (64 standard sectors, with a size of 512 bytes per sector). It is possible in EnCase 5 and 6 and linen 5 and 6 to change the number of sectors per block to 64, 128, 256, 1024, 2048, 4096, 8192, 16384 or 32768. In EnCase 7 and linen 7 this has been reduced to 64, 128, 256, 1024.

Data chunk

The first chunk is often located directly after the section header, although the format does not require this.

When the data is compressed and the compressed data (with checksum) is larger than the uncompressed data (without the checksum) the data chunk is stored uncompressed. The default size of a chunk is 32768 bytes of data (64 standard sectors).

An uncompressed data chunk is of variable size and consists of:

Offset	Size	Value	Description
0	...		Uncompressed chunk data
...	4		Checksum, which contains an Adler-32 of the chunk data

The compressed data chunk consists of zlib compressed data. The checksum of the compressed data chunk is part of the zlib compressed data format.
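
The two chunk layouts described above can be sketched as follows. This is an illustration only: the function names are hypothetical, little-endian byte order is assumed for the checksum, and the compression flag is assumed to be known from the corresponding table entry.

```python
import struct
import zlib


def pack_chunk(chunk_data: bytes) -> tuple:
    """Return (stored bytes, is_compressed) for one chunk of media data."""
    compressed = zlib.compress(chunk_data)
    # Fall back to uncompressed storage when compression does not pay off.
    if len(compressed) > len(chunk_data):
        checksum = struct.pack("<I", zlib.adler32(chunk_data))
        return chunk_data + checksum, False
    return compressed, True


def read_chunk(stored: bytes, is_compressed: bool) -> bytes:
    """Return the chunk data, verifying its checksum."""
    if is_compressed:
        # zlib checks its embedded Adler-32 and raises zlib.error on mismatch.
        return zlib.decompress(stored)
    chunk_data = stored[:-4]
    (stored_checksum,) = struct.unpack("<I", stored[-4:])
    if zlib.adler32(chunk_data) != stored_checksum:
        raise ValueError("chunk checksum mismatch")
    return chunk_data
```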

Optical disc images

For a MODE-1 CD-ROM optical disc image EnCase only seems to support 2048 bytes per sector (the data).

The raw sector size of a MODE-1 CD-ROM is 2352 bytes in size and consists of:

Offset	Size	Value	Description
0	16		Synchronization bytes
16	2048		Data
2064	4		Error detection
2068	8	0x00	Unknown (Empty values)
2076	276		Error correction
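
To relate the raw sector layout to the 2048 bytes per sector that EnCase supports, the data part of each raw sector can be extracted. A minimal sketch, with a hypothetical function name, that assumes the input is a sequence of complete raw MODE-1 sectors:

```python
RAW_SECTOR_SIZE = 2352  # size of a raw MODE-1 sector
DATA_OFFSET = 16        # the data follows the 16 leading bytes
DATA_SIZE = 2048        # size of the data within a raw sector


def extract_mode1_data(raw_image: bytes) -> bytes:
    """Return the concatenated 2048-byte data parts of raw MODE-1 sectors."""
    if len(raw_image) % RAW_SECTOR_SIZE != 0:
        raise ValueError("input is not a multiple of the raw sector size")
    return b"".join(
        raw_image[offset + DATA_OFFSET:offset + DATA_OFFSET + DATA_SIZE]
        for offset in range(0, len(raw_image), RAW_SECTOR_SIZE)
    )
```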

TODO: add information about Mode-2 and Mode-XA

Table section

The table section is identified in the section data type field as “table”. Some aspects of this section are:

Defined in ASR Data - E01 Compression Format.
Found in EWF-E01 in EnCase 1 to 7 or linen 5 to 7 or FTK Imager, EWF-L01 in EnCase 5 to 7, and SMART (EWF-S01)

Note that the offsets within the section header are 8 bytes (64 bits) in size while the offsets in the table entry array are 4 bytes (32 bits) in size.

In the next paragraphs the various versions of the table section are described.

EWF specification

Some aspects of the table section according to the EWF specification are:

The first table section resides after the volume section in the first segment file or after the file header in subsequent segment files.
It can be found in every segment file.

The table section consists of:

the table header
an array of table entries
the data chunks

Table header

The table header is 24 bytes in size and consists of:

Offset	Size	Value	Description
0	4		The number of entries
4	16	0x00	Unknown (Padding)
20	4		Checksum, which contains an Adler-32 of all the previous data within the table header data

According to ASR Data - E01 Compression Format

the number of entries contains 0x01
the table can hold 16375 entries; if more entries are required an additional table section should be created.

Table entry

The table entry is 4 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Chunk data offset

The most significant bit (MSB) in the chunk data offset indicates if the chunk is compressed (1) or uncompressed (0).

A chunk data offset points to the start of the chunk of media data, which resides in the same table section within the segment file. The offset contains a value relative to the start of the file.

Data chunk

The first chunk is often located directly after the last table entry, although the format does not require this.

A data chunk is always compressed even when no compression is required. This approach provides a checksum for each chunk. The default size of a chunk is 32768 bytes of data (64 standard sectors). The resulting size of the “compressed” chunk can therefore be larger than the default chunk size.

Note that this was deduced from the behavior of FTK Imager for SMART (EWF-S01).

The compressed data chunk consists of zlib compressed data. The checksum of the compressed data chunk is part of the zlib compressed data format.

SMART (EWF-S01)

The table section in the SMART (EWF-S01) format is equivalent to that of the EWF specification.

EnCase 1 (EWF-E01)

Some aspects of this section are:

The table section resides after the volume section in the first segment file or after the file header in subsequent segment files.
It can be found in every segment file.

The table section consists of:

the table header
an array of table entries
the table footer
the data chunks

Table header

The table header is 24 bytes in size and consists of:

Offset	Size	Value	Description
0	4		The number of entries
4	16	0x00	Unknown (Padding)
20	4		Checksum, which contains an Adler-32 of all the previous data within the table header data

The table can hold 16375 entries. If more entries are required an additional table section should be created.

Table entry

The table entry is 4 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Chunk data offset

The most significant bit (MSB) in the chunk data offset indicates if the chunk is compressed (1) or uncompressed (0).

A chunk data offset points to the start of the chunk of media data, which resides in the same table section within the segment file. The offset contains a value relative to the start of the file.

The table footer is 4 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Checksum, which contains an Adler-32 of the offset array

Data chunk

The first chunk is often located directly after the table footer, although the format does not require this.

An uncompressed data chunk is of variable size and consists of:

Offset	Size	Value	Description
0	...		Uncompressed chunk data
...	4		Checksum, which contains an Adler-32 of the chunk data

The compressed data chunk consists of zlib compressed data. The checksum of the compressed data chunk is part of the zlib compressed data format.

FTK Imager and EnCase 2 to 5 and linen 5 (EWF-E01)

Some aspects of this section are:

The table section resides after the sectors section.
It can be found in every segment file.
The data chunks are no longer stored in this section but in the sectors section instead.
The table2 section contains a mirror copy of the table section. In EWF-E01 it is always present.

The table section consists of:

the table header
an array of table entries
the table footer

Table header

The sector table header is 24 bytes in size and consists of:

Offset	Size	Value	Description
0	4		The number of entries
4	16	0x00	Unknown (Padding)
20	4		Checksum, which contains an Adler-32 of all the previous data within the table header data

The table section can hold 16375 entries. A new table section should be created to hold more entries. Both FTK Imager and EnCase 5 can handle more than 16375, FTK 1 cannot. To contain more than 16375 chunks new sectors, table and table2 sections need to be created after the table2 section.

Table entry

The table entry is 4 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Chunk data offset

The most significant bit (MSB) in the chunk data offset indicates if the chunk is compressed (1) or uncompressed (0).

A chunk data offset points to the start of the chunk of media data, which resides in the preceding sectors section within the segment file. The offset contains a value relative to the start of the file.

The table footer is 4 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Checksum, which contains an Adler-32 of the offset array
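
The table header, table entries and table footer described above could be read as in the following sketch. It assumes little-endian byte order and that the input starts at the table header; the function name parse_table is hypothetical.

```python
import struct
import zlib


def parse_table(data: bytes) -> list:
    """Return (chunk data offset, is_compressed) tuples from table data."""
    number_of_entries, header_checksum = struct.unpack_from("<I16xI", data, 0)
    # The header checksum covers the 20 bytes that precede it.
    if zlib.adler32(data[:20]) != header_checksum:
        raise ValueError("table header checksum mismatch")
    entries_end = 24 + 4 * number_of_entries
    entries_data = data[24:entries_end]
    # The footer checksum covers the array of table entries.
    (footer_checksum,) = struct.unpack_from("<I", data, entries_end)
    if zlib.adler32(entries_data) != footer_checksum:
        raise ValueError("table entries checksum mismatch")
    entries = struct.unpack(f"<{number_of_entries}I", entries_data)
    # The MSB is the compression flag, the lower 31 bits are the offset
    # relative to the start of the segment file.
    return [(entry & 0x7FFFFFFF, bool(entry & 0x80000000)) for entry in entries]
```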

EnCase 6 to 7 and linen 6 to 7 (EWF-E01)

Some aspects of this section are:

Every segment file contains its own table section.
It resides after the sectors section.
The data chunks are no longer stored in this section but in the sectors section instead.
The table2 section contains a mirror copy of the table section. In EWF-E01 it is always present.

The table section consists of:

the table header
an array of table entries
the table footer

Table header

The sector table header is 24 bytes in size and consists of:

Offset	Size	Value	Description
0	4		The number of entries
4	4	0x00	Unknown (Padding)
8	8		The table base offset
16	4	0x00	Unknown (Padding)
20	4		Checksum, which contains an Adler-32 of all the previous data within the table header data

As of EnCase 6 the number of entries is no longer restricted to 16375 entries. The new limit seems to be 65534.

Table entry

The table entry is 4 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Chunk data offset

The most significant bit (MSB) in the chunk data offset indicates if the chunk is compressed (1) or uncompressed (0).

In EnCase 6.7.1 the sectors section can be larger than 2048 MiB. The table entry offsets are 31-bit values, but in EnCase 6 the offset in a table entry will actually use the full 32 bits once 2048 MiB has been exceeded. This behavior is no longer present in EnCase 6.8, so it is assumed to be a bug. Libewf currently assumes that if the 31-bit value overflows, the following chunks are uncompressed. This allows faulty EnCase 6.7.1 EWF files to be converted.

The table footer is 4 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Checksum, which contains an Adler-32 of the offset array

EnCase 6 to 7 (EWF-L01)

The EWF-L01 format uses the EnCase 6 to 7 (EWF-E01) table section specification.

Table2 section

The table2 section is identified in the section data type field as “table2”. Some aspects of this section are:

Not defined in ASR Data - E01 Compression Format.
Found in EWF-E01 in EnCase 2 to 7, or linen 5 to 7 or FTK Imager, EWF-L01 in EnCase 5 to 7. Not found in EnCase 1 (EWF-E01) or SMART (EWF-S01).
Uses the same format as the table section.
Resides directly after the table section.

FTK Imager and EnCase 2 to 7 and linen 5 to 7 (EWF-E01)

The table2 section contains a mirror copy of the table section. Probably intended for recovery purposes.

EnCase 5 to 7 (EWF-L01)

The EWF-L01 format uses the EWF-E01 table2 section specification.

Next section

The next section is identified in the section data type field as “next”. Some aspects of this section are:

Defined in ASR Data - E01 Compression Format.
Found in EWF-E01 in EnCase 1 to 7 or linen 5 to 7 or FTK Imager, EWF-L01 in EnCase 5 to 7, and SMART (EWF-S01)
The last section within a segment file, other than the last segment file.
The offset to the next section in the section header of the next section points to itself (the start of the next section).

SMART (EWF-S01)

It resides after the table or table2 section.

FTK Imager, EnCase and linen (EWF-E01)

It resides after the data section in a single segment file or for multiple segment files after the table2 section.

In the EnCase (EWF-E01) format the size in the section header is 0 instead of 76 (the size of the section header).

Note that FTK Imager versions before 2.9 set the section size to 76. At the moment it is unknown in which version this behavior was changed.

Ltypes section

The ltypes section is identified in the section data type field as “ltypes”. Some aspects of this section are:

Found in EWF-L01 in EnCase 7
Found in the last segment file after the table2 section and before the ltree section.

The additional ltypes section data is 6 bytes in size and consists of:

Offset	Size	Description
0	2	Unknown
2	2	Unknown
4	2	Unknown

Ltree section

The ltree section is identified in the section data type field as “ltree”. Some aspects of this section are:

Found in EWF-L01 in EnCase 5 to 7
Found in the last segment file after ltypes section and before data section.

The ltree section consists of:

ltree header
ltree data

Ltree header

The ltree header is 48 bytes in size and consists of:

Offset	Size	Description
0	16	Integrity hash, which contains the MD5 of the ltree data
16	8	Data size
24	4	Checksum, which contains an Adler-32 of all the data within the ltree header where the checksum value itself is zeroed out
28	20	Unknown (empty values)
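
A minimal sketch of validating the ltree header against the ltree data. It assumes little-endian byte order and that the caller has already separated the header from the ltree data; the function name verify_ltree is hypothetical.

```python
import hashlib
import struct
import zlib


def verify_ltree(header: bytes, ltree_data: bytes) -> int:
    """Check the ltree header and return the data size stored in it."""
    if len(header) != 48:
        raise ValueError("expected a 48-byte ltree header")
    integrity_hash, data_size, stored_checksum = struct.unpack_from(
        "<16sQI", header, 0)
    # The checksum covers the whole header with the checksum field zeroed.
    zeroed_header = header[:24] + b"\x00" * 4 + header[28:]
    if zlib.adler32(zeroed_header) != stored_checksum:
        raise ValueError("ltree header checksum mismatch")
    if hashlib.md5(ltree_data).digest() != integrity_hash:
        raise ValueError("ltree data integrity hash mismatch")
    return data_size
```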

Ltree data

The ltree data string consists of a UTF-16 little-endian encoded string without byte order mark. The ltree data is not strict UTF-16 since it allows for unpaired surrogates, such as “U+d800” and “U+dc00”.

Other observed characteristics where the names in the ltree deviate from the original source:

[U+0001-U+0008] were converted to U+00ba
[U+0009, U+000a] were stripped
[U+000b, U+000c] were converted to U+0020
U+000d was converted to U+0002
U+00ba remained the same

Note that this behavior could be related to EnCase as well and might not be specific for EWF-L01.

The ltree data string contains the following information:

Line number	Value	Description
1	5	The number of categories provided
2	rec	Information about unknown, also see Records category
...		(an empty line)
...	perm	Information about file permissions, also see Permissions category
...		(an empty line)
...	srce	Information about acquisition sources, also see sources category
...		(an empty line)
...	sub	Information about unknown, also see subjects category
...		(an empty line)
...	entry	Information about file entries, also see File entries category
...		(an empty line)

The end of line character(s) is a newline (0x0a).
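
The overall layout of the ltree data string can be sketched as follows. This example assumes that categories are terminated by an empty line as in the table above and that no line within a category is empty; the function name is hypothetical. The "surrogatepass" error handler is used so that unpaired surrogates do not cause a decoding error.

```python
def split_ltree_categories(ltree_data: bytes) -> tuple:
    """Return (declared number of categories, {category name: lines})."""
    # "surrogatepass" keeps unpaired surrogates instead of raising an error.
    text = ltree_data.decode("utf-16-le", errors="surrogatepass")
    lines = text.split("\n")
    number_of_categories = int(lines[0])
    categories = {}
    current_name = None
    for line in lines[1:]:
        if line == "":
            # An empty line ends the current category.
            current_name = None
        elif current_name is None:
            # The first line of a category contains its name.
            current_name = line
            categories[current_name] = []
        else:
            categories[current_name].append(line)
    return number_of_categories, categories
```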

Records category

The rec category contains information about records.

The 1st line of the category contains the string “rec”.

The 2nd line of the category contains tab (0x09) separated type indicators.

Identifier number	Type indicator	Description
1	tb	Total bytes, which contains an integer with size of the logical file data (media data)
2	cl	Unknown (Clusters?)
3	n	Unknown (introduced in EnCase 6.19)
4	fp	Unknown (introduced in EnCase 7)
5	pg	Unknown (introduced in EnCase 7)
6	lg	Unknown (introduced in EnCase 7)
7	ig	Unknown (introduced in EnCase 7)

The 3rd line of the category consists of the tab (0x09) separated values.
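
Since the type indicators and the values are both tab separated and in the same order, they can be paired up directly. A minimal sketch with a hypothetical function name, which takes the 2nd and 3rd line of the category as input:

```python
def parse_records_category(lines: list) -> dict:
    """Map type indicators (2nd line) to their values (3rd line)."""
    type_indicators = lines[0].split("\t")
    values = lines[1].split("\t")
    return dict(zip(type_indicators, values))


print(parse_records_category(["tb\tcl", "1024\t1"]))  # {'tb': '1024', 'cl': '1'}
```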

Permissions category

The perm category contains information about file permissions.

The 1st line of the category contains the string “perm”.

The 2nd line consists of the following 2 values:

Value number	Value	Description
1		The number of permission groups in the category
2	1	Unknown

The 3rd line of the category contains tab (0x09) separated type indicators. For more information see the sections below.

The remaining lines in the category consist of:

category root entry
- zero or more permissions group entries
  - zero or more permission entries

Each entry consists of 2 lines:

Line number	Value	Description
1		Number of entries
2		Tab (0x09) separated values that correspond to the type indicators

The 1st line of the category root entry consists of the following 2 values:

Value number	Value	Description
1	0	Unknown
2		The number of permission groups in the category

The 1st line of the permission group entry consists of the following 2 values:

Value number	Value	Description
1	0	Unknown
2		The number of permissions in the group

The 1st line of the permission entry consists of the following 2 values:

Value number	Value	Description
1	0	Unknown
2	0	Unknown

Permission type indicators

Identifier number	Type indicator	Description
1	p	Is parent, where 1 represents if the entry is a category root or permissions group and 0 represents if the entry is a permission
2	n	Name, which contains a string
3	s	Security identifier, which contains a string with either a Windows NT security identifier (SID) or a POSIX user (uid) or group identifier (gid) in the format " number:" such as " 99:"
4	pr	Property type, also see permission types
5	nta	Access mask
6	nti	Unknown (Windows NT access control entry (ACE) flags?), which contains an integer
7	nts	Unknown (Permission?) (Removed in EnCase 6)

Permission types

Value	Identifier	Description
(empty)		Owner or category root
1		Group
2		Allow

6		Other

10		Unknown (permissions group?)

Access mask

Access mask seen in combination with property types 0, 1 and 6

Value	Identifier	Description
(empty)		Owner or category root
0x00000001	`[Lst Fldr/Rd Data]`	List folder / Read data
0x00000002	`[Crt Fl/W Data]`	Create file / Write data

0x00000020	`[Trav Fldr/X Fl]`	Traverse folder / Execute file

Access mask seen in combination with property type 2

[0x001200a9] [R&X] [R] [Sync]
[0x001301bf] [M] [R&X] [R] [W] [Sync]
[0x001f01ff] [FC] [M] [R&X] [R] [W] [Sync]

Value	Identifier	Description
(empty)		Owner or category root
0x00000001
0x00000002
0x00000004
0x00000008
0x00000010
0x00000020
0x00000040
0x00000080
0x00000100

0x00010000
0x00020000
0x00040000
0x00080000
0x00100000

Sources category

The srce category contains information about acquisition sources of the file entries.

TODO: describe what an acquisition source is in the context of EnCase.

The 1st line of the category contains the string “srce”.

The 2nd line consists of 2 values.

Value index	Value	Description
1		The number of sources in the category
2	1	Unknown

The 3rd line of the category contains tab (0x09) separated type indicators. For more information see the sections below.

The remaining lines in the category consist of:

category root
- zero or more source entries

Each entry consists of 2 lines:

Line number	Value	Description
1		Number of entries
2		Tab (0x09) separated values that correspond to the type indicators

The 1st line of the category root entry consists of the following 2 values:

Value number	Value	Description
1	0	Unknown
2		The number of sources in the category

The 1st line of the source entry consists of the following 2 values:

Value number	Value	Description
1	0	Unknown
2	0	Unknown

Source type indicators

Identifier number	Type indicator	Description
1	p
2	n
3	id	Identifier, which contains an integer identifying the source
4	ev	Evidence number, which contains a string
5	do	Domain, which contains a string (introduced in EnCase 7.9)
6	loc	Location, which contains a string (introduced in EnCase 7.9)
7	se	Serial number, which contains a string (introduced in EnCase 7.9)
8	mfr	Manufacturer, which contains a string (introduced in EnCase 7.9)
9	mo	Model, which contains a string (introduced in EnCase 7.9)
10	tb	Total bytes, which contains an integer
11	lo	Logical offset, which contains an integer which is -1 when value is not set
12	po	Physical offset, which contains an integer which is -1 when value is not set
13	ah	MD5 hash, which contains a string with the MD5 hash of the source
14	sh	SHA1 hash, which contains a string with the SHA1 hash of the source (introduced in EnCase 6.19)
15	gu	Device GUID, which contains a string with a GUID or "0" if not set
16	pgu	Primary device GUID, which contains a string with a GUID or "0" if not set (introduced in EnCase 7)
17	aq	Acquisition date and time, which contains an integer with a POSIX timestamp
18	ip	IP address, which contains a string (introduced in EnCase 7.9)
19	si	Unknown (Static IP address?), Contains 1 if static, empty otherwise (introduced in EnCase 7.9)
20	ma	MAC address, which contains a string without separator characters (introduced in EnCase 7.9)
21	dt	Drive type, which contains a single character (introduced in EnCase 7.9)

The acquisition date and time is in the form of: “1142163845”, which is a POSIX epoch timestamp and represents the date: March 12 2006, 11:44:05.
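
The example value can be converted with the Python standard library. This sketch assumes the timestamp is interpreted as UTC, which matches the date and time given above; the function name is illustrative.

```python
from datetime import datetime, timezone


def posix_to_datetime(value: str) -> datetime:
    """Convert a POSIX timestamp string into a UTC date and time."""
    return datetime.fromtimestamp(int(value), tz=timezone.utc)


print(posix_to_datetime("1142163845").isoformat())  # 2006-03-12T11:44:05+00:00
```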

If the “ma” value contains “000000000000” this means the MAC address is not set.

Drive type

Character value	Meaning
f	Fixed drive

Subjects category

The sub category contains information about TODO

TODO: describe what a subject is in the context of EnCase.

The 1st line of the category contains the string “sub”.

The 2nd line consists of 2 values.

Value index	Value	Description
1		The number of subjects in the category
2	1	Unknown

The 3rd line of the category contains tab (0x09) separated type indicators. For more information see the sections below.

The remaining lines in the category consist of:

category root
- zero or more subject entries

Each entry consists of 2 lines:

Line number	Value	Description
1		Number of entries
2		Tab (0x09) separated values that correspond to the type indicators

The 1st line of the category root entry consists of the following 2 values:

Value number	Value	Description
1	0	Unknown
2		The number of subjects in the category

The 1st line of the subject entry consists of the following 2 values:

Value number	Value	Description
1	0	Unknown
2	0	Unknown

Subject type indicators

Identifier number	Type indicator	Description
1	p
2	n
3	id	Identifier, which contains an integer identifying the subject
4	nu	Unknown (Number)
5	co	Unknown (Comment)
6	gu	Unknown (GUID)

File entries category

The entry category contains information about the file entries.

The 1st line of the category contains the string “entry”.

The 2nd line consists of 2 values.

Value index	Value	Description
1		The number of file entries in the category or 1 if unknown
2	1	Unknown

The 3rd line of the category contains tab (0x09) separated type indicators. For more information see the sections below.

The remaining lines in the category consist of:

category root
- zero or more file entries
  - zero or more sub file entries
    - …

Each entry consists of 2 lines:

Line number	Value	Description
1		Number of entries
2		Tab (0x09) separated values that correspond to the type indicators

The 1st line of the category root entry consists of the following 2 values:

Value number	Value	Description
1		0 if not set or 26 if Unknown
2		The number of file entries in the category

The 1st line of the file entry consists of the following 2 values:

Value number	Value	Description
1		Number of file entries in the parent file entry or 0 if not set
2		The number of sub file entries in the file entry

EnCase 5 and 6 (EWF-L01) file entry type indicators

Identifier number	Type indicator	Description
1	p	Is parent, where 1 represents that the entry is a directory and (empty) that the entry is a file
2	n	Name
3	id	Identifier, contains an integer identifying the file entry
4	opr	File entry flags
5	src	Source identifier, which contains an integer that corresponds to an identifier in the Sources category
6	sub	Subject identifier, which contains an integer that corresponds to an identifier in the Subjects category
7	cid	Unknown (record type)
8	jq	Unknown
9	cr	Creation date and time
10	ac	Access date and time, for which currently is assumed the precision is date only
11	wr	(File) modification (last written) date and time
12	mo	(File system) entry modification date and time
13	dl	Deletion date and time
14	aq	Acquisition date and time, which contains an integer with a POSIX timestamp
15	ha	MD5 hash, which contains a string with the MD5 hash of the file data
16	ls	File size in bytes. If the file size is 0 the data size should be 1
17	du	Duplicate data offset, relative from the start of the media data
18	lo	Logical offset, which contains an integer which is -1 when value is not set
19	po	Physical offset, which contains an integer which is -1 when value is not set (or does this value contain the segment file in which the start of the data is stored, -1 for a single segment file?)
20	mid	GUID, which contains a string with a GUID (introduced in EnCase 6.19)
21	cfi	Unknown (introduced in EnCase 6.14)
22	be	Binary extents
23	pm	Permissions group index, which contains an integer that corresponds to an identifier in the Permissions category or -1 if not set. The value is 0 by default
24	lpt	Unknown (introduced in EnCase 6.19)

The creation, access and last written date and time are in the form of: “1142163845”, which is a POSIX epoch timestamp and represents the date: March 12 2006, 11:44:05.

The “ha” value (Hash) consists of an MD5 hash string when file entries are hashed. If the “ha” value contains “00000000000000000000000000000000” this means the MD5 hash is not set.

Ltree file entries

The ltree entries of files and directories consist of entries starting with: 0 followed by the number of sub file entries.

The entries of files and directories:

Line number	Value	Description
1	(empty)	The root directory
2		The target drive/mount point
3		The actual single file entries

EnCase 7 (EWF-L01) file entry type indicators

Identifier number	Type indicator	Description
1	mid	GUID, which contains a string with a GUID
2	ls	File size, in bytes. If the file size is 0 the data size should be 1
3	be	Binary extents
4	id	Identifier, which contains an integer identifying the file entry
5	cr	Creation date and time
6	ac	Access date and time
7	wr	(File) modification (last written) date and time
8	mo	(File system) entry modification date and time
9	dl	Deletion date and time
10	sig	Unknown (Introduced in EnCase 7)
11	ha	MD5 hash, which contains a string with the MD5 hash of the file data
12	sha	SHA1 hash, which contains a string with the SHA1 hash of the file data. (Introduced in EnCase 7)
13	ent	Unknown, seen "B" (Introduced in EnCase 7.9)
14	snh	Short name (or DOS 8.3 name) (Introduced in EnCase 7.9)
15	p	Is parent, where "1" represents that the entry is a directory and "" (an empty string) that the entry is a file
16	n	Name
17	du	Duplicate data offset, relative from the start of the media data
18	lo	Logical offset, which contains an integer which is -1 when value is not set
19	po	Physical offset, which contains an integer which is -1 when value is not set (or does this value contain the segment file in which the start of the data is stored, -1 for a single segment file?)
20	pm	Permissions group index, which contains an integer that corresponds to an identifier in the Permissions category or -1 if not set. The value is 0 by default
21	oes	Unknown (Original extents?) (Introduced in EnCase 7)
22	opr	File entry flags
23	src	Source identifier, which contains an integer that corresponds to an identifier in the Sources category
24	sub	Subject identifier, which contains an integer that corresponds to an identifier in the Subjects category
25	cid	Unknown (record type?)
26	jq	Unknown
27	alt	Unknown (Introduced in EnCase 7)
28	ep	Unknown (Introduced in EnCase 7)
29	aq	Acquisition date and time, which contains an integer with a POSIX timestamp
30	cfi	Unknown
31	sg	Unknown (Introduced in EnCase 7)
32	ea	Extended attributes (Introduced in EnCase 7.9)
33	lpt	Unknown

File entry name

A file entry name (“n” value):

can contain path segment separator characters like “\” and “/”
uses the “MIDDLE DOT” Unicode character (U+00b7) as a (NTFS) alternate data stream (ADS) name separator

Note that a regular “MIDDLE DOT” Unicode character is encoded in the same way, so there is no reliable way to tell the two apart.

An empty name has been observed to be represented as “NoName”.

Short name

The short name (“snh”) value contains 2 values:

Value number	Value	Description
1		The number of characters in the short name including the end-of-string character
2		The short name string, without an end-of-string character

For example: “13 FILE10~1.TXT”
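
The short name value could be split and validated along these lines. This is a minimal sketch, assuming the two values are separated by a single space and the count is a decimal number, as the example above suggests. The function name parse_short_name is hypothetical.

```python
def parse_short_name(value: str) -> str:
    """Split a "snh" value into its character count and short name string."""
    count_string, _, short_name = value.partition(" ")
    number_of_characters = int(count_string)
    # The stored count includes the end-of-string character, the string does not.
    if number_of_characters != len(short_name) + 1:
        raise ValueError("short name size does not match the stored count")
    return short_name
```

For the example above, parse_short_name("13 FILE10~1.TXT") yields the 12 character name without its end-of-string character.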

Original extents

TODO: add some text

1 30a555b 30a6000 12011ae00 9008d7 3f 43 1 12011ae00 30a6000 120113 30a6 9008d7 18530

Ltree file entries

The ltree entries of files and directories consist of entries starting with: 26 followed by the number of sub file entries.

The entries of files and directories:

Line number	Value	Description
1	LogicalEntries	The root directory
2		The target drive/mount point
3		The actual single file entries

File entry flags

Value	Identifier	Description
0x00000001		Unknown (Is read-only?)
0x00000002	Hidden	Is hidden
0x00000004	System	Is system
0x00000008	Archive	Is archive
0x00000010	Sym Link	Is symbolic link, junction or reparse point

0x00000080	Deleted	Is deleted

0x00001000	Hard Linked	Is hard link
0x00002000	Stream	Is stream

0x00100000	Internal	Is internal (used in combination with 0x00000006?)

0x00200000	Unallocated Clusters	Unknown
0x00400000		Unknown

0x01000000		Unknown
0x02000000	Folder	Is folder
0x04000000		Data is sparse

If 0x00002000 or 0x02000000 are not set the file entry is of type “File”.

If the sparse data flag is set:

the data size should be 1 and data should consist of a single byte value.
the data size should be equal to the file size and data should be the same.

If the duplicate data offset value is not set the single byte value in the data should be used to reconstruct the file data. E.g. if the file size is 4096 and the data contains the byte value 0x00 the resulting file should consist of 4096 x 0x00 byte values.

If the duplicate data offset value is set the single byte in the data is ignored and the duplicate data offset refers to the location where the data is stored.
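
The reconstruction rules above can be sketched as follows. This is an illustration only: the function name is hypothetical and an unset duplicate data offset is represented by None, which is an assumption about how a reader would model the value.

```python
FLAG_DATA_IS_SPARSE = 0x04000000


def sparse_file_data(flags, file_size, data, duplicate_data_offset=None):
    """Return the file data, or None if it must be read at the duplicate data offset."""
    if not flags & FLAG_DATA_IS_SPARSE:
        return data
    if duplicate_data_offset is not None:
        # The single byte is ignored, the caller has to read the data
        # at the duplicate data offset in the media data instead.
        return None
    if len(data) == file_size:
        return data
    if len(data) == 1:
        # Repeat the single byte value to fill the whole file.
        return data * file_size
    raise ValueError("unsupported data size for sparse file entry")
```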

Binary extents value

The binary extents value contains 3 values separated by a space:

Unknown Offset Size

Where:

unknown always is 1, could this be the number of extents?
extent data offset, relative from the start of the media data
extent data size

The offset and size are specified in hexadecimal values.

Note that the binary extents value contains only 1 value for the first single file entry.
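
A binary extents value could be parsed as sketched below. The function name and the sample value in the usage note are hypothetical. The first value is returned as text because its meaning is unknown.

```python
def parse_binary_extents(value: str):
    """Return (unknown, offset, size) or None if the value has no extent."""
    parts = value.split(" ")
    if len(parts) != 3:
        # The first single file entry has been observed with only 1 value.
        return None
    unknown, offset, size = parts
    # The offset and size are stored as hexadecimal values.
    return unknown, int(offset, 16), int(size, 16)
```

With a made-up value, parse_binary_extents("1 4000 200") gives an extent at offset 0x4000 of 0x200 bytes, relative to the start of the media data.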

Extended attributes value

The extended attributes value contains base-16 encoded data, which consists of:

Extended attributes header (stored as an extended attribute)
One or more extended attributes

Extended attributes header

The extended attributes header is 37 bytes in size and consists of:

Offset	Size	Value	Description
0	4	0	Unknown (0 => root, 1 => otherwise)
4	1	1	Unknown (0 => is leaf node, 1 => is branch node?)
5	4	11	Number of characters in name string including the end-of-string character
9	4	1	Number of characters in value string including the end-of-string character
13	22	"Attributes\0"	Name string, which contains a UTF-16 little-endian encoded string including end-of-string character
35	2	"\0"	Value string, which contains a UTF-16 little-endian encoded string including end-of-string character

Extended attribute

An extended attribute is of variable size and consists of:

Offset	Size	Description
0	4	Unknown (0 => root, 1 => otherwise)
4	1	Unknown (0 => is leaf node, 1 => is branch node?)
5	4	Number of characters in name string including the end-of-string character
9	4	Number of characters in value string including the end-of-string character
13	...	Name string, which contains a UTF-16 little-endian encoded string including end-of-string character
...	...	Value string, which contains a UTF-16 little-endian encoded string including end-of-string character
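
The layout above could be read as in the following sketch. It assumes the integer values are stored in little-endian byte order, which this section does not state, and that the base-16 encoded value has already been decoded, e.g. with bytes.fromhex(). The function name is hypothetical.

```python
import struct


def parse_extended_attribute(data: bytes, offset: int = 0):
    """Return ((unknown, is_branch, name, value), offset of the next attribute)."""
    unknown, is_branch, name_size, value_size = struct.unpack_from(
        "<IBII", data, offset)
    name_start = offset + 13
    # The sizes are character counts, UTF-16 uses 2 bytes per character here.
    value_start = name_start + name_size * 2
    value_end = value_start + value_size * 2
    name = data[name_start:value_start].decode("utf-16-le").rstrip("\x00")
    value = data[value_start:value_end].decode("utf-16-le").rstrip("\x00")
    return (unknown, is_branch, name, value), value_end
```

Because the extended attributes header is stored as an extended attribute, the same function can be used to read it, after which the returned offset points to the first regular attribute.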

TODO: complete section

Note that branch nodes are presumably used to group attributes, however these are not used consistently and are not shown by EnCase 7.

Map section

Some aspects of this section are:

Found in EWF-L01 of EnCase 7 (First seen in EnCase 7.4.1.10)
Found in the last segment file after data section before done section.

The map consists of:

map string
map entries array

Map string

The map string consists of a UTF-16 little-endian encoded string without the UTF-16 endian byte order mark.

The map string contains the following information:

Line number	Value	Description
1	1	The number of categories provided
2	r	Probably the type of information provided
3	c	Identifier for the values in the 4th line
4		The data for the different identifiers in the 3rd line
5		(an empty line)

Map string values

Identifier number	Character in 29th line	Meaning
1	C	Number of map entries (count)

The number of map entries should match the number of file entries in the ltree.

Map entry

A map entry is 24 bytes in size and consists of:

Offset	Size	Description
0	4	Unknown
4	4	Unknown (empty values or part of previous value)
8	16	Unknown

Session section

The session section is identified in the section data type field as “session”. Some aspects of this section are:

Not defined in ASR Data - E01 Compression Format.
It is not found in SMART (EWF-S01) and FTK Imager (EWF-E01).
It is found in EnCase 5 and 6 (EWF-E01) files.
It is only added to the last segment file for images of optical disc (CD/DVD/BD) media.
It is found after the data section and before the error2 section.

The session section data consists of:

The session header
The session entries array
The session footer

Session header

The session header is 36 bytes in size and consists of:

Offset	Size	Description
0	4	Number of sessions
4	28	Unknown (empty values)
32	4	Checksum, which contains an Adler-32 of all the previous data within the additional session section data

Session entry

A session entry is 32 bytes in size and consists of:

Offset	Size	Description
0	4	Flags
4	4	Start sector
8	24	Unknown (empty values)

EnCase stores audio tracks as 0 byte data with a sector size of 2048.

Note that for a CD the first session sector is stored as 16, although the actual session starts at sector 0. Could this value be overloaded to indicate the size of the reserved space between the start of the session and the ISO 9660 volume descriptor?

Session flags

Value	Identifier	Description
0x00000001		If set the track is an audio track otherwise the track is a data track

The session footer is 4 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Checksum, which contains an Adler-32 of all the data within the session entries array
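
Reading the session section data and verifying both checksums could look like the sketch below. It assumes little-endian integers, which this section does not state, and uses zlib.adler32() from the Python standard library. The function name is hypothetical.

```python
import struct
import zlib


def parse_session_section(data: bytes):
    """Return a list of (start_sector, is_audio_track) tuples."""
    number_of_sessions, header_checksum = struct.unpack_from("<I28xI", data, 0)
    if zlib.adler32(data[:32]) != header_checksum:
        raise ValueError("session header checksum mismatch")

    entries_end = 36 + number_of_sessions * 32
    entries_data = data[36:entries_end]
    (footer_checksum,) = struct.unpack_from("<I", data, entries_end)
    if zlib.adler32(entries_data) != footer_checksum:
        raise ValueError("session entries checksum mismatch")

    sessions = []
    for entry_offset in range(0, len(entries_data), 32):
        flags, start_sector = struct.unpack_from("<II", entries_data, entry_offset)
        # Flag 0x00000001 marks an audio track, otherwise it is a data track.
        sessions.append((start_sector, bool(flags & 0x00000001)))
    return sessions
```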

Error2 section

The error2 section is identified in the section data type field as “error2”. Some aspects of this section are:

Not defined in ASR Data - E01 Compression Format.
It is not found in SMART (EWF-S01).
It is found in EnCase 3 to 7 and linen 5 to 7 (EWF-E01) files.
It is only added to the last segment file when errors were encountered while reading the input.

TODO: check FTK Imager, EnCase 1 and 2 for presence of the error2 section.

It contains the sectors that have read errors. The sectors where a read error occurred are filled with zeros during acquisition by EnCase.

The error2 section data consists of:

The error2 header
The error2 entries array
The error2 footer

Error2 header

The error2 header is 520 bytes in size and consists of:

Offset	Size	Description
0	4	Number of entries
4	512	Unknown (empty values)
516	4	Checksum, which contains an Adler-32 of all the previous data within the error2 header data

Error2 entry

An error2 entry is 8 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Start sector
4	4		The number of sectors

The error2 footer is 4 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Checksum, which contains an Adler-32 of all the data within the error2 entries array

Digest section

The digest section is identified in the section data type field as “digest”. Some aspects of this section are:

It is found in EnCase 6 to 7 files, as of EnCase 6.12 and linen 6.12 (EWF-E01).

The digest section contains an MD5 and/or SHA1 hash of the data within the chunks.

The digest section data is 80 bytes in size and consists of:

Offset	Size	Value	Description
0	16		MD5 hash of the media data
16	20		SHA1 hash of the media data
36	40	0x00	Unknown (Padding)
76	4		Checksum, which contains an Adler-32 of all the previous data within the digest section data
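
A reader could validate the digest section data as sketched below, assuming the checksum is stored as a little-endian 32-bit integer (an assumption, the byte order is not stated in this section). The function name is hypothetical.

```python
import struct
import zlib


def parse_digest_section(data: bytes):
    """Return the MD5 and SHA1 hashes as hexadecimal strings."""
    if len(data) != 80:
        raise ValueError("unsupported digest section data size")
    md5_hash, sha1_hash, checksum = struct.unpack("<16s20s40xI", data)
    # The checksum covers the 76 bytes that precede it.
    if zlib.adler32(data[:76]) != checksum:
        raise ValueError("digest section checksum mismatch")
    return md5_hash.hex(), sha1_hash.hex()
```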

Hash section

The hash section is identified in the section data type field as “hash”. Some aspects of this section are:

Defined in ASR Data - E01 Compression Format.
It is found in SMART (EWF-S01) and FTK Imager, EnCase 1 to 7 and linen 5 to 7 (EWF-E01) files.
It is not found in EnCase 5 (EWF-L01).
The hash section is optional, it does not need to be present. If it is present it resides in the last segment file before the done section.

The hash section contains an MD5 hash of the data within the chunks.

The hash section data is 36 bytes in size and consists of:

Offset	Size	Description
0	16	MD5 hash of the media data
16	16	Unknown
32	4	Checksum, which contains an Adler-32 of all the previous data within the additional hash section data

Notes

Observations regarding the unknown value:

is zero in SMART
is zero in EnCase 3 and below
in EnCase 4 the first 4 bytes are 0, the next 8 bytes seem random, the last 4 bytes seem fixed
in EnCase 5 and 6 the first 8 bytes seem random, the last 8 bytes equal the file header signature
in linen 5 the first and last set of 4 bytes seem the same, the second set of 4 bytes seem to be random, the third set of 4 bytes seem to contain a piece of the file header signature
in linen 6 the first and third set of 4 bytes seem random, the second and last set of 4 bytes seem to be the same
EnCase5 seems to contain a GUID of the acquired device?

Tests with EnCase 4 show that:

The value does not equal the checksum of the media data
Does not differ for the same media acquired within the same program session using different formats, but does differ for different media and different program sessions

Done section

The done section is identified in the section data type field as “done”. Some aspects of this section are:

Defined in ASR Data - E01 Compression Format.
It is found in SMART (EWF-S01), FTK Imager, EnCase 1 to 7 and linen 5 to 7 (EWF-E01) and EnCase 5 (EWF-L01) files.
The done section is the last section within the last segment file.
The offset to the next section in the section header of the done section points to itself (the start of the done section).
It should be the last section in the last segment file.

SMART (EWF-S01)

It resides after the table or table2 section.

FTK Imager, EnCase and linen (EWF-E01)

It resides after the data section in a single segment file or for multiple segment files after the table2 section.

In the EnCase (EWF-E01) format the size in the section header is 0 instead of 76 (the size of the section header).

Note that FTK Imager versions before 2.9 set the section size to 76. At the moment it is unknown in which version this behavior was changed.

Incomplete section

The incomplete section is identified in the section data type field as “incomplete”.

This section is seen rarely. It was seen in an EnCase 6.13 (EWF-E01) file as the last section within the last segment file. The incomplete section was preceded by a hash and digest section, although later in the set of EWF files another hash and digest section were defined.

It is currently assumed that the incomplete section indicates an incomplete image created using remote imaging. The incomplete section contains data but currently there is no indication what purpose the data has.

EWF-X

EWF-X (extended) is an experimental format to enhance the EWF format. EWF-X is based on the EWF-E01 format. EWF-X does not limit the table entries to 16375. EWF-X is not the same as version 2 of EWF.

TODO: add note about the table entry limit.

Sections

Additional sections provided in the EWF-X format are:

xheader
xhash

Xheader

The xheader section contains zlib compressed data containing XML data containing the header values.

<?xml version="1.0" encoding="UTF-8"?>
<xheader>
    <case_number>1</case_number>
    <description>Description</description>
    <examiner_name>John D.</examiner_name>
    <evidence_number>1.1</evidence_number>
    <notes>Just a floppy in my system</notes>
    <acquiry_operating_system>Linux</acquiry_operating_system>
    <acquiry_date>Sat Jan 20 18:32:08 2007 CET</acquiry_date>
    <acquiry_software>ewfacquire</acquiry_software>
    <acquiry_software_version>20070120</acquiry_software_version>
</xheader>
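
Extracting the header values from the xheader section data could be sketched as follows, assuming the section data is a single zlib stream that can be passed to zlib.decompress(). The function name is hypothetical.

```python
import xml.etree.ElementTree as ElementTree
import zlib


def read_xheader_values(compressed_data: bytes) -> dict:
    """Decompress the xheader section data and return its values by element name."""
    xml_data = zlib.decompress(compressed_data)
    root = ElementTree.fromstring(xml_data)
    if root.tag != "xheader":
        raise ValueError("unsupported root element")
    return {element.tag: element.text for element in root}
```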

Xhash

The xhash section contains zlib compressed data containing XML data containing the hash values.

<?xml version="1.0" encoding="UTF-8"?>
<xhash>
    <md5>ae1ce8f5ac079d3ee93f97fe3792bda3</md5>
    <sha1>31a58f090460b92220d724b28eeb2838a1df6184</sha1>
</xhash>

GUID

EWF-X uses a random-based version of the GUID.

Corruption scenarios

This chapter contains several corruption scenarios that have been encountered “in the wild”.

Corrupt uncompressed chunk

TODO: add description

Corrupt compressed chunk

TODO: add description

DEFLATE uncompressed block data with copy of uncompressed data size of 0

Seen in combination with some firmware versions of Tableau TD3 forensic imager.

In this corruption scenario the copy of uncompressed data size value of the DEFLATE uncompressed block data is set to 0 instead of the 1s complement of the uncompressed data size.

Libewf currently does not handle this corruption scenario.
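
Detecting this scenario could be done as in the sketch below. It relies on the DEFLATE layout of an uncompressed block, where the byte-aligned block header is followed by a 16-bit little-endian size and its 1s complement. The function name and the returned strings are hypothetical, and block_data is assumed to start at the size value.

```python
import struct


def check_uncompressed_block_sizes(block_data: bytes) -> str:
    """Classify the size values of a DEFLATE uncompressed block."""
    data_size, copy_of_data_size = struct.unpack_from("<HH", block_data, 0)
    if copy_of_data_size == data_size ^ 0xFFFF:
        return "valid"
    if copy_of_data_size == 0:
        # The corruption scenario described above.
        return "copy of size is 0"
    return "corrupt"
```

A reader that recognizes the "copy of size is 0" case could choose to trust the uncompressed data size and continue, instead of rejecting the chunk.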

Corrupt section header

TODO: add description

reading section header from file IO pool entry: 1 at offset: 415912423
type                      : table2
next offset               : 415978027
size                      : 65604
checksum                  : 0xf35f03e0
number of offsets         : 16375
base offset               : 0x00000000
checksum                  : 0x180d0137

reading section header from file IO pool entry: 1 at offset: 415978027
type                      : sectors
next offset               : 415978027
size                      : 0
checksum                  : 0x1ad00464

Corrupt table section

TODO: add description

Scenarios:

with and without table2
corruption in number of entries
corruption in entry data

Corrupted segment file header

TODO: add description

Partial segment file

TODO: add description

Missing segment file(s)

TODO: add description

Dual image: section size versus offset

The section headers define both the next section offset and the size of the section. If an implementation reads only one of the two to determine the next section, a dual EWF image can be crafted that consists of two separate images including hashes.

Keramics will mark such an image as corrupted.
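
The consistency check itself is small. The sketch below is an illustration and not necessarily how Keramics implements it. The function name is hypothetical. Sections whose next offset points to themselves, such as the done section, need separate handling by the caller.

```python
def section_offsets_agree(section_offset: int, next_offset: int, size: int) -> bool:
    """Check that the next section offset and the section size describe the same location."""
    return next_offset == section_offset + size
```

For the table2 section shown in the debug output of the corrupt section header scenario, section_offsets_agree(415912423, 415978027, 65604) holds, since 415912423 + 65604 equals 415978027.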

Table entries offset overflow

In EnCase 6.7.1 the sectors section can be larger than 2048 MiB. The table entry offsets are 31-bit values; in EnCase 6 the offset in a table entry will actually use the full 32 bits once the 2048 MiB has been exceeded. This behavior is no longer present in EnCase 6.8 so it is assumed to be a bug.

Libewf currently assumes that if the 31-bit value overflows the following chunks are uncompressed. This allows EnCase 6.7.1 faulty EWF files to be converted by Keramics.
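
One possible way to apply this assumption is sketched below. It is a simplified illustration and not the actual libewf or Keramics logic. It assumes that the most significant bit of a table entry is the "is compressed" flag and that an overflow shows up as a 31-bit offset smaller than the previous one. The function name is hypothetical.

```python
def decode_table_entries(raw_entries):
    """Return a list of (offset, is_compressed) tuples for 32-bit table entries."""
    results = []
    overflow = False
    previous_offset = 0
    for raw_entry in raw_entries:
        if not overflow:
            offset = raw_entry & 0x7FFFFFFF
            is_compressed = bool(raw_entry >> 31)
            if offset < previous_offset:
                # The 31-bit offset wrapped around, from here on the full
                # 32 bits are treated as the offset.
                overflow = True
        if overflow:
            offset = raw_entry
            is_compressed = False
        results.append((offset, is_compressed))
        previous_offset = offset
    return results
```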

Multiple incomplete segment file set identifiers

Although rare it can occur that a set of EWF image files changes its segment file set identifier. This was seen in an image created by EnCase 6.13, presumably using remote imaging. The image contained 3 different segment file set identifiers. The first one changed after an incomplete section. The second one changed without any clear indication. The corresponding data section also changed to some extent, e.g. compression method and media flags, with the “is physical” flag being dropped. The change was consistent across multiple segment files. It is unlikely that deliberate manipulation is involved. EnCase considers the image as invalid.

With some tweaking the individual segment file sets could be read, although in this case the data read from the segment file sets was heavily corrupted. For now Keramics does not support reading multiple segment file sets from a single image, but this might change in the future.

AD encryption

As of version 2.8 FTK Imager supports “AD encryption”. Although the output file uses the EWF extensions the file actually is an AES-256 encrypted container. The EWF can be encrypted using a pass-phrase or a certificate.

TODO: link to format definition

References

Expert Witness Compression Format version 2 (EWF2)

TODO: add description

Mac OS sparse bundle (.sparsebundle) format

The Mac OS sparse bundle (.sparsebundle) format is one of the disk image formats supported natively by Mac OS.

The sparse bundle disk image was introduced in Mac OS X 10.5.

Overview

A sparse bundle consists of a directory (bundle) with the .sparsebundle suffix containing:

“Info.bckup” file
“Info.plist” file
“token” file
“bands” directory containing the band files

Characteristics

Characteristics	Description
Byte order	N/A
Date and time values	N/A
Character strings	N/A

Info.plist and Info.bckup files

The Info.plist and its backup (Info.bckup) contain an XML plist.

This plist is also referred to as “Information Property List” and contains a single dictionary with the following key-value pairs.

Identifier	Value	Description
CFBundleInfoDictionaryVersion	"6.0"	The information property list format version
band-size		The maximum size of a band file in bytes
bundle-backingstore-version	1	Unknown
diskimage-bundle-type	"com.apple.diskimage.sparsebundle"	The bundle type
size		The media size in bytes

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>CFBundleInfoDictionaryVersion</key>
    <string>6.0</string>
    <key>band-size</key>
    <integer>8388608</integer>
    <key>bundle-backingstore-version</key>
    <integer>1</integer>
    <key>diskimage-bundle-type</key>
    <string>com.apple.diskimage.sparsebundle</string>
    <key>size</key>
    <integer>4194304</integer>
</dict>
</plist>

Token file

The token file is empty.

Bands directory

The bands directory contains files containing the actual data of the bands. The files are named using a hexadecimal naming scheme where “0” is the 1st band, “a” the 11th, “f” the 16th, “10” the 17th, etc.
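
Locating the band file for a media offset could be sketched as follows. It assumes lowercase hexadecimal names without padding, as in the examples above, and that band number n holds the media data starting at n times the band size, which is an assumption not stated explicitly in this section. The function names are hypothetical.

```python
def band_file_name(band_index: int) -> str:
    """Return the name of the band file for a 0-based band index."""
    return format(band_index, "x")


def locate_media_offset(media_offset: int, band_size: int):
    """Return (band file name, offset within the band file) for a media offset."""
    band_index, band_offset = divmod(media_offset, band_size)
    return band_file_name(band_index), band_offset
```

With the band-size of 8388608 from the Info.plist example, media offset 83886085 maps to offset 5 in the band file named “a”.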

Mac OS sparse image (.sparseimage) format

The Mac OS sparse image (.sparseimage) format is one of the disk image formats supported natively by Mac OS.

Overview

A sparse disk image consists of:

header data
bands data

Characteristics

Characteristics	Description
Byte order	big-endian
Date and time values	N/A
Character strings	N/A

The number of bytes per sector is 512.

Header data

The header data is 4096 bytes in size and consists of:

file header
band numbers array
trailing data, which should be filled with 0-byte values

File header

The file header is 64 bytes in size and consists of:

Offset	Size	Value	Description
0	4	"sprs"	Signature
4	4		Unknown (format version?), seen 3
8	4		Number of sectors per band
12	4		Unknown, seen 1
16	4		The media data size in sectors
20	12	0	Unknown (0-byte values)
32	4		Unknown
36	28	0	Unknown (0-byte values)
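
The file header could be read with the struct module as sketched below, using the big-endian byte order given in the characteristics. The function name and the dictionary keys are hypothetical.

```python
import struct


def parse_sparse_image_header(data: bytes) -> dict:
    """Parse the 64-byte file header of a sparse image."""
    (signature, unknown1, sectors_per_band, unknown2, number_of_sectors,
     unknown3) = struct.unpack_from(">4sIIII12xI28x", data, 0)
    if signature != b"sprs":
        raise ValueError("unsupported signature")
    return {
        "sectors_per_band": sectors_per_band,
        # The media data size is stored in sectors of 512 bytes.
        "media_size": number_of_sectors * 512,
        "unknown_values": (unknown1, unknown2, unknown3),
    }
```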

Band numbers array

The band numbers array consists of:

one or more band numbers

Band number

A band number is 4 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Band number, where 0 indicates a sparse range and any other value refers to a location in the media data

Where the corresponding media offset can be calculated as follows:

media_offset = (band_number - 1) * sectors_per_band * 512

The offset of band data can be calculated as follows:

band_data_offset = 4096 + (array_index * sectors_per_band * 512)

For example if the first array entry contains a band number of 4, then the band data is located at offset 4096 and the corresponding media offset is: 3 * sectors_per_band * 512.
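
The two calculations can be combined into a lookup table from media offset to file offset, as in the sketch below. The function name is hypothetical.

```python
HEADER_DATA_SIZE = 4096
BYTES_PER_SECTOR = 512


def build_band_map(band_numbers, sectors_per_band: int) -> dict:
    """Map the media offset of every stored band to its offset in the file."""
    band_size = sectors_per_band * BYTES_PER_SECTOR
    band_map = {}
    for array_index, band_number in enumerate(band_numbers):
        if band_number == 0:
            # A band number of 0 does not refer to stored band data.
            continue
        media_offset = (band_number - 1) * band_size
        band_map[media_offset] = HEADER_DATA_SIZE + array_index * band_size
    return band_map
```

Media offsets that are not covered by an entry in the map fall in a sparse range and read as 0-byte values.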

Parallels Disk Image (PDI) format

The Parallels Disk Image format is used in Parallels virtualization products as one of its image formats. It is used to store both hard disk images and snapshots.

Overview

A Parallels Disk Image consists of a directory, typically named “{NAME}.hdd” containing:

Descriptor file (DiskDescriptor.xml) and backup (DiskDescriptor.xml.Backup)
{NAME}.hdd file
Storage data file ({NAME}.hdd.0.{GUID}.hds)
{NAME}.hdd.drh

Where {NAME} is an arbitrary name and {GUID} is a unique identifier.

Disk types

The Parallels Disk Image format supports multiple disk types:

Identifier	Description
Expanding	Disk that consists of a single (dynamic size) sparse storage data file
Plain	Disk that consists of a single (fixed size) raw storage data file
Split	Disk that consists of one or more split storage data files, either expanding or plain, holding up to 2G of data

Characteristics

Characteristics	Description
Byte order	little-endian
Character strings	UTF-8 by default, the encoding is defined in the disk descriptor XML file

The number of bytes per sector is 512.

Descriptor file

The DiskDescriptor.xml and its backup (DiskDescriptor.xml.Backup) contain the “Parallels_disk_image” XML element that consists of the following values:

Identifier	Description
Disk_Parameters	The disk parameters
StorageData	Information about the storage data files
Snapshots	Information about snapshots

<?xml version='1.0' encoding='UTF-8'?>
<Parallels_disk_image Version="1.0">
    <Disk_Parameters>
        <Disk_size>134217728</Disk_size>
        <Cylinders>262144</Cylinders>
        <PhysicalSectorSize>4096</PhysicalSectorSize>
        <LogicSectorSize>512</LogicSectorSize>
        <Heads>16</Heads>
        <Sectors>32</Sectors>
        <Padding>0</Padding>
        <Encryption>
            <Engine>{00000000-0000-0000-0000-000000000000}</Engine>
            <Data></Data>
        </Encryption>
        <UID>{GUID}</UID>
        <Name>{NAME}</Name>
        <Miscellaneous>
            <CompatLevel>level2</CompatLevel>
            <Bootable>1</Bootable>
            <ChangeState>0</ChangeState>
            <SuspendState>0</SuspendState>
        </Miscellaneous>
    </Disk_Parameters>
    <StorageData>
        <Storage>
            <Start>0</Start>
            <End>134217728</End>
            <Blocksize>2048</Blocksize>
            <Image>
                <GUID>{GUID}</GUID>
                <Type>Compressed</Type>
                <File>{NAME}.hdd.0.{GUID}.hds</File>
            </Image>
            ...
        </Storage>
        ...
    </StorageData>
    <Snapshots>
        <Shot>
            <GUID>{GUID}</GUID>
            <ParentGUID>{GUID}</ParentGUID>
        </Shot>
        ...
    </Snapshots>
</Parallels_disk_image>

Disk parameters

The disk parameters are stored in the “Disk_Parameters” XML element and contain the following values.

Identifier	Description
Cylinders	Number of cylinders
Disk_size	Disk size, in number of sectors
Encryption	"Encryption" sub XML element
Heads	Number of heads
Miscellaneous	"Miscellaneous" sub XML element
Name	Name of the disk
LogicSectorSize	Optional logical sector size, which is 512 bytes by default
Padding	Unknown (padding)
PhysicalSectorSize	Optional physical sector size, which is 4096 bytes by default
Sectors	Number of sectors per cylinder
UID	Unknown (identifier)

Encryption

<Encryption>
    <Engine>{00000000-0000-0000-0000-000000000000}</Engine>
    <Data></Data>
    <Salt></Salt>
</Encryption>

Miscellaneous

<Miscellaneous>
    <CompatLevel>level2</CompatLevel>
    <Bootable>1</Bootable>
    <ChangeState>0</ChangeState>
    <SuspendState>0</SuspendState>
    <DupBlocksCnt>0</DupBlocksCnt>
    <CorruptBlocksCnt>0</CorruptBlocksCnt>
    <UnrefBlocksCnt>0</UnrefBlocksCnt>
    <OutOfDiskBlocksCnt>0</OutOfDiskBlocksCnt>
    <BatOverlapBlocksCnt>0</BatOverlapBlocksCnt>
    <BlocksCnt>0</BlocksCnt>
    <TruncatedBlocksCnt>0</TruncatedBlocksCnt>
    <ReferencedBlocksCnt>0</ReferencedBlocksCnt>
    <ShutdownState>0</ShutdownState>
    <GuestToolsVersion>17.1.1-51537</GuestToolsVersion>
</Miscellaneous>

CompatLevel

Seen: level0 and level2

Storage data

The “StorageData” XML element contains the following values.

Identifier	Description
Storage	One or more "Storage" XML sub elements

Note that a split disk contains multiple “Storage” XML sub elements.

Storage

The “Storage” XML element contains the following values.

Identifier	Description
Start	Start sector number of the segment stored in the storage data file
End	End sector number of the segment stored in the storage data file
Blocksize	Block size, in number of sectors
Image	One or more "Image" sub XML elements

Image

The “Image” XML element contains the following values.

Identifier	Description
GUID	Identifier of snapshot (or layer)
Type	Storage data file type
File	Name (or path) of the storage data file

Snapshots data

The “Snapshots” XML element contains the following values.

Identifier	Description
Shot	One or more "Shot" sub XML elements

Shot

The “Shot” XML element contains the following values.

Identifier	Description
GUID	Identifier of snapshot (or layer)
ParentGUID	Identifier of parent snapshot (or layer), which contains "{00000000-0000-0000-0000-000000000000}" if not set

Storage data file

Storage data file types

Value	Description
"Compressed"	Sparse storage data file
"Plain"	Raw storage data file

Raw storage data file

The raw (or plain) storage data file contains the disk image data including free space.

Sparse storage data file

The sparse storage data file contains the actual disk image data without free space.

A sparse storage data file consists of:

file header
block allocation table (BAT)
data blocks

Sparse storage data file header

The sparse storage data file header is 64 bytes in size and consists of:

Offset	Size	Value	Description
0	16	"WithoutFreeSpace" or "WithouFreSpacExt"	Signature
16	4	2	Format version
20	4		Number of heads
24	4		Number of cylinders
28	4		Block size (or number of tracks) in number of sectors
32	4		Number of blocks, which is equivalent to the number of block allocation table entries
36	8		Number of sectors
44	4		Unknown (Creator?), seen: "\x00\x00\x00\x00", "pd17", "pd22"
48	4		Data start sector number, which is relative to the start of the sparse storage data file
52	4		Unknown (Flags?)
56	8		Unknown (Features start sector?)

Block allocation table (BAT)

The block allocation table consists of 32-bit entries. An entry contains the sector number where the data block starts, or is set to 0 if the block is sparse or stored in the parent disk image.

For example block allocation table entry 0 corresponds to disk image offset 0. If it contains a value of 0x800 the corresponding data block is stored at file offset 0x100000 (0x800 x 512).
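
Translating a disk image offset to an offset in the sparse storage data file could be sketched as follows, with the block size given in sectors as in the file header. The function name is hypothetical.

```python
BYTES_PER_SECTOR = 512


def data_block_file_offset(block_allocation_table, block_size: int, media_offset: int):
    """Return the file offset for a disk image offset, or None if the block is not stored."""
    block_size_in_bytes = block_size * BYTES_PER_SECTOR
    table_index, block_offset = divmod(media_offset, block_size_in_bytes)
    sector_number = block_allocation_table[table_index]
    if sector_number == 0:
        # The block is sparse or stored in the parent disk image.
        return None
    return sector_number * BYTES_PER_SECTOR + block_offset
```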

QEMU Copy-On-Write (QCOW) image file format

The QEMU Copy-On-Write (QCOW) image file format is used by the QEMU Open Source Processor Emulator to store disk images (storage media).

Overview

A QCOW image file consists of:

the file header
- optional file header extensions
the level 1 table (cluster aligned)
the reference count table (cluster aligned)
reference count blocks
snapshot headers (8-byte aligned on cluster boundary)
clusters containing:
- level 2 tables
- storage media data

The storage media data is stored in clusters. Each cluster is a multiple of 512 bytes. The level 1 (L1) table contains level 1 references to level 2 (L2) tables. The level 2 tables contain level 2 references to the storage media clusters.

There are multiple versions of the QCOW image file format. QCOW (version 1) and QCOW2 (version 2 and later) are sometimes even considered separate image formats. Version 3 is considered an extended version of QCOW2.

Characteristics

Characteristics	Description
Byte order	big-endian in most cases, note that some values are in little-endian
Date and time values	Number of seconds since Jan 1, 1970 00:00:00 UTC (POSIX epoch)
Character strings	UTF-8

Note that this document assumes that character strings are stored in UTF-8.

The number of bytes per sector is 512.

Encryption

The QCOW image format can encrypt the media data stored in the image. Currently supported encryption methods are:

AES-CBC 128-bit
Linux Unified Key Setup (LUKS)

If no encryption is used the encryption method in the file header is set to none (0).

Note it is currently unknown if the format supports compression and encryption at the same time. It does not appear to be supported by qemu-img.

AES-CBC 128-bit

Both encryption and decryption use:

AES-CBC with a 128-bit key, applied to the sector data

The key is a direct copy of the first 16 characters of a user provided (narrow character) password. If the password is smaller than 16 characters the remaining key data is set to 0-byte values.

Note that it is currently unclear which character sets are allowed and how characters outside the 7-bit ASCII set should be handled.

The initialization vector of the AES-CBC uses the media data sector number (relative to the start of the disk) in little-endian format as the first 64 bits of the 128-bit initialization vector. The remaining initialization vector data is set to 0-byte values. The first sector number is 0 and the number of bytes per sector is 512.
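
The key and initialization vector derivation described above can be sketched with the standard library only. The function names are hypothetical, and the password is assumed to be available as a byte string already. The AES-CBC operation itself is left to a cryptographic library.

```python
import struct


def derive_aes_key(password: bytes) -> bytes:
    """Copy the first 16 bytes of the password and pad with 0-byte values."""
    return password[:16].ljust(16, b"\x00")


def sector_initialization_vector(sector_number: int) -> bytes:
    """Build the 128-bit initialization vector for a media data sector."""
    # The first 64 bits hold the little-endian sector number, the rest is 0.
    return struct.pack("<Q", sector_number) + bytes(8)
```

Each 512-byte sector is then encrypted or decrypted independently, using the same key and the initialization vector of its own sector number.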

Linux Unified Key Setup (LUKS)

TODO: complete section

File header

File header – version 1

The file header - version 1 is 48 bytes in size and consists of:

Offset	Size	Value	Description
0	4	"QFI\xfb" or "\x51\x46\x49\xfb"	The signature
4	4	1	Format version
8	8		Backing file name offset
16	4		Backing file name size
20	4		Modification date and time, which contains a POSIX timestamp
24	8		Storage media size
32	1		Number of cluster block bits
33	1		Number of level 2 table bits
34	2		Unknown (empty values)
36	4		Encryption method
40	8		Level 1 table offset

The cluster block size is calculated as:

cluster_block_size = 1 << number_of_cluster_block_bits

The level 2 table size is calculated as:

level2_table_size = (1 << number_of_level2_table_bits) * 8

The level 1 table size is calculated as:

level1_table_entry_size = cluster_block_size * (1 << number_of_level2_table_bits)

level1_table_size = media_size // level1_table_entry_size
if media_size % level1_table_entry_size != 0:
    level1_table_size += 1

level1_table_size *= 8
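
Put together, reading the version 1 file header and deriving the sizes could look like the sketch below. It assumes all header values are stored big-endian. The function name and dictionary keys are hypothetical.

```python
import struct


def parse_qcow1_header(data: bytes) -> dict:
    """Parse a version 1 file header and derive the cluster and table sizes."""
    (signature, format_version, _backing_name_offset, _backing_name_size,
     _modification_time, media_size, cluster_block_bits, level2_table_bits,
     _unknown, encryption_method, level1_table_offset) = struct.unpack_from(
        ">4sIQIIQBBHIQ", data, 0)
    if signature != b"QFI\xfb" or format_version != 1:
        raise ValueError("unsupported signature or format version")

    cluster_block_size = 1 << cluster_block_bits
    level1_table_entry_size = cluster_block_size * (1 << level2_table_bits)
    # Round up so a partially used last entry is counted as well.
    number_of_entries = -(-media_size // level1_table_entry_size)
    return {
        "media_size": media_size,
        "encryption_method": encryption_method,
        "level1_table_offset": level1_table_offset,
        "cluster_block_size": cluster_block_size,
        "level2_table_size": (1 << level2_table_bits) * 8,
        "level1_table_size": number_of_entries * 8,
    }
```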

The backing file name is set in snapshot image files and is normally stored after the file header.

File header – version 2

The file header - version 2 is 72 bytes in size and consists of:

Offset	Size	Value	Description
0	4	"QFI\xfb" or "\x51\x46\x49\xfb"	The signature
4	4	2	Format version
8	8		Backing file name offset
16	4		Backing file name size
20	4		Number of cluster block bits
24	8		Storage media size
32	4		Encryption method
36	4		Number of level 1 table references
40	8		Level 1 table offset
48	8		Reference count table offset
56	4		Reference count table clusters
60	4		Number of snapshots
64	8		Snapshots offset

The cluster block size is calculated as:

cluster_block_size = 1 << number_of_cluster_block_bits

The number of level 2 table bits is calculated as:

number_of_level2_table_bits = number_of_cluster_block_bits - 3

The level 2 table size is calculated as:

level2_table_size = (1 << number_of_level2_table_bits) * 8

The level 1 table size is calculated as:

level1_table_size = number_of_level1_table_references * 8

The backing file name is set in snapshot image files and is normally stored after the file header.

File header – version 3

The file header - version 3 is 104 or 112 bytes in size and consists of:

Offset	Size	Value	Description
0	4	"QFI\xfb" or "\x51\x46\x49\xfb"	The signature
4	4	3	Format version
8	8		Backing file name offset
16	4		Backing file name size
20	4		Number of cluster block bits
24	8		Storage media size
32	4		Encryption method
36	4		Number of level 1 table references
40	8		Level 1 table offset
48	8		Reference count table offset
56	4		Reference count table clusters
60	4		Number of snapshots
64	8		Snapshots offset
72	8		Incompatible feature flags
80	8		Compatible feature flags
88	8		Auto-clear feature flags
96	4		Reference count order
100	4	104 or 112	File header size, which contains the size of the file header. This value does not include the size of the file header extensions
If file header size equals 112
104	1		Compression method
105	7		Unknown (padding)

The cluster block size is calculated as:

cluster_block_size = 1 << number_of_cluster_block_bits

The number of level 2 table bits is calculated as:

number_of_level2_table_bits = number_of_cluster_block_bits - 3

The level 2 table size is calculated as:

level2_table_size = (1 << number_of_level2_table_bits) * 8

The level 1 table size is calculated as:

level1_table_size = number_of_level1_table_references * 8

The backing file name is set in snapshot image files and is normally stored after the file header.

Encryption methods

Value	Identifier	Description
0	QCOW_CRYPT_NONE	No encryption
1	QCOW_CRYPT_AES	AES-CBC 128-bit encryption
2	QCOW_CRYPT_LUKS	Linux Unified Key Setup (LUKS) encryption

Incompatible feature flags

Value	Identifier	Description
0x00000001	QCOW2_INCOMPAT_DIRTY
0x00000002	QCOW2_INCOMPAT_CORRUPT
0x00000004	QCOW2_INCOMPAT_DATA_FILE
0x00000008	QCOW2_INCOMPAT_COMPRESSION
0x00000010	QCOW2_INCOMPAT_EXTL2

Compatible feature flags

Value	Identifier	Description
0x00000001	QCOW2_COMPAT_LAZY_REFCOUNTS

Auto-clear feature flags

Value	Identifier	Description
0x00000001	QCOW2_AUTOCLEAR_BITMAPS
0x00000002	QCOW2_AUTOCLEAR_DATA_FILE_RAW

Compression methods

Value	Identifier	Description
0		ZLIB compression

File header extensions

A file header extension consists of:

file header extension header
file header extension data

File header extension header

The file header extension header is 8 bytes in size and consists of:

Offset	Size	Value	Description
0	4		The extension type (signature)
4	4		The extension data size
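
A sketch of walking the file header extensions. It assumes, based on the QEMU qcow2 documentation rather than on this document, that the extension data is padded to a multiple of 8 bytes and that an extension type of 0 ends the list. The function name is illustrative.

```python
import struct


def read_header_extensions(data, offset):
    """Yields (extension type, extension data) tuples starting at offset."""
    while offset + 8 <= len(data):
        extension_type, data_size = struct.unpack_from(">II", data, offset)
        offset += 8

        if extension_type == 0:
            # Assumption: an extension type of 0 ends the extensions.
            break

        yield extension_type, data[offset:offset + data_size]

        # Assumption: extension data is padded to a multiple of 8 bytes.
        offset += (data_size + 7) & ~7
```

The offset passed in would typically be the end of the file header, for example 72 for version 2 or the file header size value for version 3.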

File header extension types

Value	Identifier	Description
0x0537be77	QCOW2_EXT_MAGIC_CRYPTO_HEADER	Crypto header
0x23852875	QCOW2_EXT_MAGIC_BITMAPS	Bitmaps
0x44415441 or "DATA"	QCOW2_EXT_MAGIC_DATA_FILE	Data-file
0x6803f857	QCOW2_EXT_MAGIC_FEATURE_TABLE	Feature table
0xe2792aca	QCOW2_EXT_MAGIC_BACKING_FORMAT	Backing format

Backing format file header extension

The backing format file header extension is of variable size and consists of:

Offset	Size	Value	Description
0	...		Backing format identifier, which contains a UTF-8 string without end-of-string character

Bitmaps file header extension

TODO: complete section

Crypto header file header extension

The crypto header file header extension is 16 bytes in size and consists of:

Offset	Size	Value	Description
0	8		The crypto data offset
8	8		The crypto data size

Data-file file header extension

The data-file file header extension is of variable size and consists of:

Offset	Size	Value	Description
0	...		Data-file file name, which contains a UTF-8 string without end-of-string character

Feature table file header extension

TODO: complete section

Level 1 table

The level 1 table contains level 2 table references.

A reference value of 0 represents unused or unallocated and is considered as sparse or stored in a corresponding backing file.

Level 2 table reference – version 1

The level 2 table reference is 8 bytes in size and consists of:

Offset	Size	Value	Description
0.0	63 bits		Level 2 table offset, which contains an offset relative from the start of the file
7.7	1 bit	QCOW_OFLAG_COMPRESSED	Is compressed flag

Level 2 table reference – version 2 or 3

The level 2 table reference is 8 bytes in size and consists of:

Offset	Size	Value	Description
0.0	62 bits		Level 2 table offset, which contains an offset relative from the start of the file
7.6	1 bit	QCOW_OFLAG_COMPRESSED	Is compressed flag
7.7	1 bit	QCOW_OFLAG_COPIED	Is copied flag

The is copied flag indicates that the reference count of the corresponding level 2 table is exactly one.

Level 2 table

The level 2 table contains cluster block references.

The level 2 table size is calculated as:

level2_table_size = (1 << number_of_level2_table_bits) * 8

A reference value of 0 represents unused or unallocated and is considered as sparse or stored in a corresponding backing file.

Cluster block reference – version 1

The cluster block reference - version 1 is 8 bytes in size and consists of:

Offset	Size	Value	Description
0.0	63 bits		Cluster block offset, which contains an offset relative from the start of the file
7.7	1 bit	QCOW_OFLAG_COMPRESSED	Is compressed flag

Cluster block reference – version 2 or 3

The cluster block reference - version 2 or 3 is 8 bytes in size and consists of:

Offset	Size	Value	Description
0.0	62 bits		Cluster block offset, which contains an offset relative from the start of the file
7.6	1 bit	QCOW_OFLAG_COMPRESSED	Is compressed flag
7.7	1 bit	QCOW_OFLAG_COPIED	Is copied flag

The is copied flag indicates that the reference count of the corresponding cluster block is exactly one.
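
The flags of a version 2 or 3 reference can be separated from the offset with bit masks. A minimal sketch, assuming the 8-byte reference is read as a big-endian 64-bit integer, so that the is copied flag is the most significant bit (bit 63) and the is compressed flag is bit 62. The function name is illustrative.

```python
import struct

QCOW_OFLAG_COPIED = 1 << 63
QCOW_OFLAG_COMPRESSED = 1 << 62


def decode_reference(entry_data):
    """Splits a reference into (offset, is_compressed, is_copied)."""
    (value,) = struct.unpack(">Q", entry_data)

    is_copied = bool(value & QCOW_OFLAG_COPIED)
    is_compressed = bool(value & QCOW_OFLAG_COMPRESSED)

    # The lower 62 bits contain the offset.
    return value & 0x3fffffffffffffff, is_compressed, is_copied
```

For a compressed cluster block the lower 62 bits also encode the compressed size, so the returned offset needs further decoding in that case.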

Reference count table

The cluster data blocks are reference counted. For every cluster data block a 16-bit reference count is stored in the reference count table.

The reference count table is stored in cluster block sizes. The file header contains the number of blocks (or reference count table clusters).

TODO: complete section

Notes

reference count cluster block offset = cluster data block offset /
reference count table offset = cluster data block /

In order to obtain the reference count of a given cluster, you split the
cluster offset into a refcount table offset and refcount block offset.

Since a refcount block is a single cluster of 2 byte entries, the lower
cluster_size - 1 bits is used as the block offset and the rest of the bits are
used as the table offset.

One optimization is that if any cluster pointed to by an L1 or L2 table entry
has a refcount exactly equal to one, the most significant bit of the L1/L2
entry is set as a "copied" flag. This indicates that no snapshots are using
this cluster and it can be immediately written to without having to make a copy
for any snapshots referencing it.

Cluster data block

To retrieve a cluster data block corresponding to a certain storage media offset:

Determine the level 1 table index from the offset:

level1_table_index_bit_shift = number_of_cluster_block_bits + number_of_level2_table_bits

For version 1:

level1_table_index = (offset & 0x7fffffffffffffff) >> level1_table_index_bit_shift

For version 2 and 3:

level1_table_index = (offset & 0x3fffffffffffffff) >> level1_table_index_bit_shift

Retrieve the level 2 table offset from the level 1 table. If the level 2 table offset is 0 and the image has a backing file, the cluster data block is stored in the backing file; otherwise the cluster block is considered sparse.

Read the corresponding level 2 table.

Determine the level 2 table index from the offset:

level2_table_index_bit_mask = ~(0xffffffffffffffff << number_of_level2_table_bits)

level2_table_index = (offset >> number_of_cluster_block_bits) & level2_table_index_bit_mask

Retrieve the cluster block offset from the level 2 table. If the cluster block offset is 0 and the image has a backing file, the cluster data block is stored in the backing file; otherwise the cluster block is considered sparse.
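
The index calculations of the lookup can be sketched in Python as follows. Note that the level 2 table index is obtained with a bitwise AND of the shifted offset and the bit mask. The function name and the example values are illustrative assumptions.

```python
def table_indexes(offset, number_of_cluster_block_bits,
                  number_of_level2_table_bits, format_version=2):
    """Returns (level 1 table index, level 2 table index) of a media offset."""
    if format_version == 1:
        offset_mask = 0x7fffffffffffffff
    else:
        offset_mask = 0x3fffffffffffffff

    level1_table_index_bit_shift = (
        number_of_cluster_block_bits + number_of_level2_table_bits)
    level1_table_index = (offset & offset_mask) >> level1_table_index_bit_shift

    # Keep only the bits that select an entry within the level 2 table.
    level2_table_index_bit_mask = (1 << number_of_level2_table_bits) - 1
    level2_table_index = (
        (offset >> number_of_cluster_block_bits) & level2_table_index_bit_mask)

    return level1_table_index, level2_table_index


# 64 KiB cluster blocks (16 bits) and therefore 13 level 2 table bits.
indexes = table_indexes((3 << 29) + (5 << 16) + 123, 16, 13)
```

In this example the offset falls in the fourth level 2 table (index 3) and in its sixth cluster block (index 5).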

Uncompressed cluster data block

If the is compressed flag (QCOW_OFLAG_COMPRESSED) is not set:

cluster_block_bit_mask = ~(0xffffffffffffffff << number_of_cluster_block_bits)

cluster_block_data_offset = (offset & cluster_block_bit_mask) + cluster_block_offset

Note that in version 2 or 3 the last cluster block in the file can be smaller than the cluster block size defined by the number of cluster block bits in the file header. This does not seem to be the case for version 1.

Compressed cluster data block

If the is compressed flag (QCOW_OFLAG_COMPRESSED) is set, the cluster block data is stored using the compression method defined by the file header or DEFLATE by default.

Multiple compressed cluster data blocks are stored together in cluster block sizes. The compressed cluster data blocks are sector (512 bytes) aligned.

The compressed data uses a DEFLATE (inflate) window bits value of -12.
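
In Python's zlib module a negative window bits value selects a raw DEFLATE stream without a zlib header or checksum. A minimal sketch of decompressing one cluster block this way; the function name is illustrative.

```python
import zlib


def decompress_cluster_block(compressed_data, cluster_block_size):
    """Decompresses raw DEFLATE data into at most one cluster block."""
    # A window bits value of -12 means raw DEFLATE with a 4 KiB window.
    decompressor = zlib.decompressobj(-12)

    # Limit the output size, data after the end of the stream is ignored.
    return decompressor.decompress(compressed_data, cluster_block_size)
```

Because the compressed size is stored in sectors, the input can contain bytes that follow the end of the DEFLATE stream. A decompression object tolerates such trailing data, which is why it is used here.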

Compressed chunk data block – version 1

compressed_size_bit_shift = 63 - number_of_cluster_block_bits

compressed_block_size = (
    (cluster_block_offset & 0x7fffffffffffffff) >> compressed_size_bit_shift)

cluster_block_offset &= ~(0xffffffffffffffff << compressed_size_bit_shift)

Compressed chunk data block – version 2 or 3

compressed_size_bit_shift = 62 - (number_of_cluster_block_bits - 8)

According to “the QCOW2 Image Format” the compressed block size is calculated as follows:

compressed_block_size = (
    (((cluster_block_offset & 0x3fffffffffffffff) >> compressed_size_bit_shift) + 1) * 512)

Since the compressed block size is stored in 512-byte sectors, this value does not contain the exact byte size of the compressed cluster block data. It sometimes lacks the size of the last partially filled sector, and one sector should be added if possible within the bounds of the cluster block size and the file size.

cluster_block_offset &= ~(0xffffffffffffffff << compressed_size_bit_shift)
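
The version 2 or 3 calculation can be sketched as a single Python function. The function name and the example value are illustrative assumptions. Adding an additional sector, as described above, is left to the caller.

```python
def decode_compressed_reference(cluster_block_offset,
                                number_of_cluster_block_bits):
    """Returns (file offset, compressed block size in bytes)."""
    compressed_size_bit_shift = 62 - (number_of_cluster_block_bits - 8)

    # Remove the is copied and is compressed flags.
    value = cluster_block_offset & 0x3fffffffffffffff

    # The upper bits contain the number of 512-byte sectors minus 1.
    number_of_sectors = (value >> compressed_size_bit_shift) + 1

    # The lower bits contain the offset of the compressed data.
    file_offset = value & ((1 << compressed_size_bit_shift) - 1)

    return file_offset, number_of_sectors * 512


# Example with 64 KiB cluster blocks and the is compressed flag set.
result = decode_compressed_reference((1 << 62) | (3 << 54) | 0x50200, 16)
```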

Snapshots

As of version 1, QCOW can use the backing file name in the file header to point to a backing file (or parent image). The backing file contains the snapshot image and the current image only contains the modifications. Version 2 adds support for storing snapshots inside the image.

Snapshot header - version 2 or 3

An in-image snapshot is created by adding a snapshot header, copying the L1 table and incrementing the reference counts of all L2 tables and data clusters referenced by the L1 table.

The snapshot header is of variable size and consists of:

Offset	Size	Description
0	8	Level 1 table offset
8	4	Level 1 size
12	2	Identifier string size
14	2	Name size
16	4	Date in seconds
20	4	Date in nano seconds
24	8	VM clock in nano seconds
32	4	VM state size
36	4	Extra data size
40	...	Extra data
...	...	Identifier string
...	...	Name
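
A sketch of reading a snapshot header. It assumes big-endian values and that the identifier string and the name directly follow the extra data, in the order of the table above. The function name and dictionary keys are illustrative and the strings are returned as bytes, since their encoding is not described here.

```python
import struct

# Fixed-size part (40 bytes) of the snapshot header.
SNAPSHOT_HEADER = struct.Struct(">QIHHIIQII")


def parse_snapshot_header(data, offset=0):
    """Parses a snapshot header stored at offset in data."""
    (level1_table_offset, level1_size, identifier_size, name_size,
     date_seconds, date_nano_seconds, vm_clock, vm_state_size,
     extra_data_size) = SNAPSHOT_HEADER.unpack_from(data, offset)

    # Skip the fixed-size part and the extra data.
    offset += SNAPSHOT_HEADER.size + extra_data_size

    identifier = data[offset:offset + identifier_size]
    offset += identifier_size
    name = data[offset:offset + name_size]

    return {
        "level1_table_offset": level1_table_offset,
        "level1_size": level1_size,
        "date": (date_seconds, date_nano_seconds),
        "vm_clock": vm_clock,
        "vm_state_size": vm_state_size,
        "identifier": identifier,
        "name": name,
    }
```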

TODO: complete section

References

The QCOW Image Format, by Mark McLoughlin
The QCOW2 Image Format, by Mark McLoughlin

Universal Disk Image Format (UDIF)

The Universal Disk Image Format (UDIF) (.dmg) is one of the disk image formats supported natively by Mac OS.

Overview

Known UDIF image types are:

Identifier	Description
UDBZ	bzip2 compressed UDIF
UDCO	Apple Data Compression (ADC) compressed UDIF
UDIF	Read-write uncompressed UDIF
UDRO	Read-only uncompressed UDIF
UDxx	Uncompressed UDIF
UDZO	zlib/DEFLATE compressed UDIF
ULFO	LZFSE compressed UDIF
ULMO	LZMA compressed UDIF

UDIF images are either uncompressed or compressed.

Uncompressed image format

An uncompressed UDIF image consists of:

data
optional file footer

Note that an uncompressed UDIF image without file footer is equivalent to a RAW storage media image (CRawDiskImage).

Compressed image format

A compressed UDIF image consists of:

Data fork
Optional resource fork
Optional XML plist
File footer at the end of the image file

Characteristics

Characteristics	Description
Byte order	big-endian
Date and time values	N/A
Character strings	N/A

The number of bytes per sector is 512.

File footer

The file footer (also known as resource file or metadata) is 512 bytes in size and consists of:

Offset	Size	Value	Description
0	4	"koly"	Signature
4	4	4	Format version
8	4	512	File footer size in bytes
12	4		Image flags
16	8		Unknown (RunningDataForkOffset)
24	8		Data fork offset, where the offset is relative from the start of the image file
32	8		Data fork size
40	8		Resource fork offset, where the offset is relative from the start of the image file
48	8		Resource fork size
56	4		Unknown (SegmentNumber)
60	4		Number of segments, which contains 0 if not set
64	16		Segment identifier, which contains a UUID
80	4		Data checksum type
84	4		Data checksum size, as number of bits
88	128		Data checksum
216	8		XML plist offset, where the offset is relative from the start of the image file
224	8		XML plist size
232	120		Unknown (Reserved)
352	4		Master checksum type
356	4		Master checksum size, as number of bits
360	128		Master checksum
488	4		Image type (or variant)
492	8		Number of sectors
500	4		Unknown (reserved)
504	4		Unknown (reserved)
508	4		Unknown (reserved)

Note that the XML plist size can be 0, such as in a UDIF stub (UDxx) image.
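
A minimal sketch of reading the file footer from the last 512 bytes of an image using Python's struct module and the big-endian layout of the table above. The function name and the dictionary keys are illustrative.

```python
import struct

# Big-endian layout of the 512-byte file footer.
UDIF_FOOTER = struct.Struct(">4sIIIQQQQQII16sII128sQQ120sII128sIQ12s")


def parse_udif_footer(image_data):
    """Parses the file footer stored at the end of the image data."""
    (signature, format_version, footer_size, image_flags,
     running_data_fork_offset, data_fork_offset, data_fork_size,
     resource_fork_offset, resource_fork_size, segment_number,
     number_of_segments, segment_identifier, data_checksum_type,
     data_checksum_size, data_checksum, xml_plist_offset, xml_plist_size,
     reserved1, master_checksum_type, master_checksum_size, master_checksum,
     image_type, number_of_sectors, reserved2) = UDIF_FOOTER.unpack(
         image_data[-UDIF_FOOTER.size:])

    if signature != b"koly" or footer_size != 512:
        raise ValueError("unsupported file footer")

    return {
        "format_version": format_version,
        "image_flags": image_flags,
        "data_fork": (data_fork_offset, data_fork_size),
        "resource_fork": (resource_fork_offset, resource_fork_size),
        "xml_plist": (xml_plist_offset, xml_plist_size),
        "image_type": image_type,
        "number_of_sectors": number_of_sectors,
    }
```

An image without a valid footer raises an error here, which a caller could use to fall back to treating the image as a raw storage media image.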

Image flags

Value	Identifier	Description
0x00000001	kUDIFFlagsFlattened	Unknown (flattened?)

0x00000004	kUDIFFlagsInternetEnabled	Unknown (internet enabled?)

Image types

Value	Identifier	Description
1	kUDIFDeviceImageType	Device image
2	kUDIFPartitionImageType	Partition image

XML plist

TODO: complete section

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>resource-fork</key>
    <dict>
        <key>blkx</key>
        <array>
            <dict>
                <key>Attributes</key>
                <string>0x0050</string>
                <key>CFName</key>
                <string>Protective Master Boot Record (MBR : 0)</string>
                <key>Data</key>
                <data>
                bWlzaAAAAAEAAAAAAAAAAAAAAAAAAAABAAAAAAAAAAAA
                AAgIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAIAAAAgQfL6MwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAACgAAABQAAAAMAAAAAAAAAAAAAAAAAAAABAAAA
                AAAAIA0AAAAAAAAAH/////8AAAAAAAAAAAAAAAEAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAA=
                </data>
                <key>ID</key>
                <string>-1</string>
                <key>Name</key>
                <string>Protective Master Boot Record (MBR : 0)</string>
            </dict>
            ...
        </array>
        <key>plst</key>
        <array>
            <dict>
                <key>Attributes</key>
                <string>0x0050</string>
                <key>Data</key>
                <data>
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEAAQAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
                AAAAAAAAAAAA
                </data>
                <key>ID</key>
                <string>0</string>
                <key>Name</key>
                <string></string>
            </dict>
        </array>
    </dict>
</dict>
</plist>

The XML plist contains the following key-value pairs:

Identifier	Description
resource-fork	dictionary

XML plist resource-fork dictionary

The resource-fork dictionary contains the following key-value pairs:

Identifier	Description
blkx	array of dictionaries
plst	array of dictionaries

XML plist blkx array entry

A blkx array entry contains the following key-value pairs:

Identifier	Description
Attributes	string that contains a hexadecimal formatted integer value
CFName	string
Data	string that contains base-64 encoded data of a block table
ID	string that contains a decimal formatted integer value
Name	string

Block table

The block table (BLKXTable) is of variable size and consists of:

block table header
block table entries

The block table header

The block table header is 204 bytes in size and consists of:

Offset	Size	Value	Description
0	4	"mish"	Signature
4	4	1	Format version
8	8		Start sector, which contains the sector number relative to the start of the media data
16	8		Number of sectors
24	8		Unknown (DataOffset), which seems to be always 0
32	4		Unknown (BuffersNeeded)
36	4		Unknown (BlockDescriptors). Does this value correspond to the number of block table entries?
40	4	0	Unknown (reserved)
44	4	0	Unknown (reserved)
48	4	0	Unknown (reserved)
52	4	0	Unknown (reserved)
56	4	0	Unknown (reserved)
60	4	0	Unknown (reserved)
64	4		Checksum type
68	4		Checksum size
72	128		Checksum
200	4		Number of entries

Block table entry

The block table entry (BLKXChunkEntry) is 40 bytes in size and consists of:

Offset	Size	Description
0	4	Entry type
4	4	Unknown (comment)
8	8	Start sector, which contains the sector number relative to the start sector of the block table
16	8	Number of sectors
24	8	Data offset, which contains the byte offset relative to the start of the UDIF image file
32	8	Data size, which contains the number of bytes of data stored, which is 0 for sparse data
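
A sketch of decoding the base-64 encoded Data value of a blkx array entry into block table entries, using the big-endian layouts of the block table header and entry described above. The function name and the shape of the returned tuples are illustrative assumptions.

```python
import base64
import struct

BLOCK_TABLE_HEADER = struct.Struct(">4sIQQQII24sII128sI")  # 204 bytes
BLOCK_TABLE_ENTRY = struct.Struct(">IIQQQQ")  # 40 bytes


def parse_block_table(base64_data):
    """Returns (type, start sector, number of sectors, data offset, data size)
    tuples, where the start sector is relative to the start of the media."""
    data = base64.b64decode(base64_data)

    header = BLOCK_TABLE_HEADER.unpack_from(data, 0)
    if header[0] != b"mish":
        raise ValueError("missing mish signature")

    table_start_sector = header[2]
    number_of_entries = header[11]

    entries = []
    for entry_index in range(number_of_entries):
        (entry_type, _comment, start_sector, number_of_sectors, data_offset,
         data_size) = BLOCK_TABLE_ENTRY.unpack_from(
             data, BLOCK_TABLE_HEADER.size + entry_index * 40)

        # Entry start sectors are relative to that of the block table.
        entries.append((entry_type, table_start_sector + start_sector,
                        number_of_sectors, data_offset, data_size))
    return entries
```

Whitespace inside the encoded data, such as the line breaks in the XML plist, is skipped by `base64.b64decode` in its default non-validating mode.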

UDIF block table entry types

Value	Identifier	Description
0x00000000		Unknown (sparse)
0x00000001		Uncompressed (raw) data
0x00000002		Sparse (used for Apple_Free)

0x7ffffffe		Comment

0x80000004		ADC compressed data
0x80000005		zlib compressed data
0x80000006		bzip2 compressed data
0x80000007		LZFSE compressed data
0x80000008		LZMA compressed data

0xffffffff		Block table entries terminator

UDIF comment

TODO: complete section

UDIF data fork

TODO: complete section

UDIF resource fork

TODO: complete section

Notes

Is the maximum compressed chunk size 2048 sectors?

Comment seems to reference compressed data but has no size or number of sectors value.

Virtual Hard Disk (VHD) image format

The Virtual Hard Disk (VHD) format is used by Microsoft virtualization products as one of their image formats. It is used to store both hard disk images and snapshots.

Overview

There are multiple types of VHD images, namely:

Fixed-size VHD image
Dynamic-size (or sparse) VHD image
Differential (or differencing) VHD image

Fixed-size hard disk image

A fixed-size VHD image consists of:

data
file footer

Note that a fixed-size VHD image is equivalent to a raw storage media image with an additional footer.

Dynamic-size (or sparse) hard disk image

A dynamic-size (or sparse) VHD image consists of:

copy of file footer
dynamic disk header
block allocation table
data in blocks
file footer

Differential hard disk image

A differential (or differencing) VHD image consists of:

copy of file footer
dynamic disk header
block allocation table
data in blocks
file footer

Characteristics

Characteristics	Description
Byte order	big-endian
Date and time values	Number of seconds since January 1, 2000 00:00:00 UTC
Character strings	UCS-2 big-endian, which allows for unpaired Unicode surrogates such as "U+d800" and "U+dc00"

The number of bytes per sector is 512.
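
The date and time values can be converted with Python's datetime module. A minimal sketch; the function name is illustrative.

```python
import datetime

# VHD timestamps count seconds since January 1, 2000 00:00:00 UTC.
VHD_EPOCH = datetime.datetime(2000, 1, 1, tzinfo=datetime.timezone.utc)


def vhd_timestamp_to_datetime(timestamp):
    """Converts a VHD timestamp into a timezone-aware datetime."""
    return VHD_EPOCH + datetime.timedelta(seconds=timestamp)
```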

Undo disk image

Virtual PC has a feature to create “Undo Disks”. This undo disk feature stores a differential hard disk image in files with names similar to:

VirtualPCUndo_<name>_0_0_hhmmssMMDDYYYY.vud

Where the date and time seem to be stored in UTC and <name> represents the name of the parent image.

File footer

The file footer is 512 bytes in size and consists of:

Offset	Size	Value	Description
0	8	"conectix"	Signature (also referred to as cookie)
8	4		Features
12	4	0x00010000	Format version, where the upper 16 bits are the major version and the lower 16 bits the minor version
16	8		Next offset, which contains the offset to the next (metadata) structure. The offset is relative from the start of the file. It should only be set in dynamic and differential disk images. In fixed disk images it should be set to 0xffffffffffffffff (-1)
24	4		Modification time, which contains the number of seconds since January 1, 2000 00:00:00 UTC
28	4		Creator application
32	4		Creator version, where the upper 16 bits are the major version and the lower 16 bits the minor version
36	4		Creator (host) operating system
40	8		Disk size, which contains the size of the disk in bytes
48	8		Data size, which contains the size of the data in bytes
56	4		Disk geometry
60	4		Disk type
64	4		Checksum, which contains a one's complement of the sum of the file footer excluding the checksum itself
68	16		Identifier, which contains a big-endian UUID
84	1		Saved state, which contains a flag to indicate the image is in saved state
85	427	0	Unknown (Reserved should contain 0-byte values)
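
A sketch of the file footer checksum calculation. It assumes that "the sum of the file footer" is the sum of the individual byte values, truncated to 32 bits. The function names are illustrative.

```python
import struct


def vhd_footer_checksum(footer_data):
    """Calculates the checksum of a 512-byte file footer."""
    # Sum all bytes except the 4 bytes of the checksum at offset 64.
    total = sum(footer_data[:64]) + sum(footer_data[68:512])

    # One's complement of the sum as a 32-bit value.
    return ~total & 0xffffffff


def vhd_footer_is_valid(footer_data):
    """Checks the signature and the stored big-endian checksum."""
    (stored_checksum,) = struct.unpack_from(">I", footer_data, 64)
    return (footer_data[:8] == b"conectix" and
            stored_checksum == vhd_footer_checksum(footer_data))
```

The same approach would apply to the dynamic disk header, with the checksum at offset 36 and a size of 1024 bytes.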

Features

Offset	Size	Description
0.0	1 bit	Is temporary disk, which indicates that this disk is a candidate for deletion on shutdown
0.1	1 bit	Unknown (Reserved, must be set to 1)
0.2	30 bits	Unknown (Reserved, must be set to 0)

A value of 0 represents no features are enabled.

Creator application

Value	Identifier	Description
"d2v\x00"		Disk2vhd
"qemu"		QEMU
"vpc\x20"		Virtual PC
"vs\x20\x20"		Virtual Server
"win\x20"		Windows (Disk Management)

Creator host operating system

Value	Identifier	Description
"Mac\x20"		Macintosh
"Wi2k"		Windows

Disk geometry

The disk geometry is 4 bytes in size and consists of:

Offset	Size	Description
0	2	Number of cylinders
2	1	Number of heads
3	1	Number of sectors per track (cylinder)

Disk type

Value	Identifier	Description
0		None
1		Unknown (Deprecated)
2		Fixed hard disk
3		Dynamic hard disk
4		Differential hard disk
5		Unknown (Deprecated)
6		Unknown (Deprecated)

Dynamic disk header

The dynamic disk header is 1024 bytes in size and consists of:

Offset	Size	Value	Description
0	8	"cxsparse"	Signature (Cookie)
8	8		Next offset, which contains the offset to the next (metadata) structure. The offset is relative from the start of the file. Currently this is unused and should be set to 0xffffffffffffffff (-1)
16	8		Block allocation table offset, which contains the offset to the block allocation table structure. The offset is relative from the start of the file
24	4	0x00010000	Format version, where the upper 16 bits are the major version and the lower 16 bits the minor version
28	4		Number of blocks, which is equivalent to the number of block allocation table entries
32	4		Block size. The block size must be a power-of-two multiple of the sector size and does not include the size of the sector bitmap. The default block size is 4096 x 512-byte sectors (2 MiB)
36	4		Checksum, which contains a one's complement of the sum of the dynamic disk header excluding the checksum itself
40	16		Parent identifier, which contains a big-endian UUID that identifies the parent image. Only used by differential hard disk images
56	4		Parent last modification time, which contains the number of seconds since January 1, 2000 00:00:00 UTC. Only used by differential hard disk images
60	4	0	Unknown (Reserved should contain 0-byte values)
64	512		Parent name, which contains a UCS-2 big-endian string. Only used by differential hard disk images
576	8 x 24 = 192		Array of parent locator entries. Only used by differential hard disk images
768	256	0	Unknown (Reserved should contain 0-byte values)

The maximum number of block allocation table entries should match the maximum possible number of blocks in the disk.

Note that the parent name can also contain a full path, e.g. in .avhd files. The path segments are separated by the \ character.

Parent locator entry

The parent locator entry is 24 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Locator platform code
4	4		Platform data space, which contains the number of 512-byte sectors needed to store the parent hard disk locator
8	4		Locator data size
12	4	0	Unknown (Reserved should contain 0-byte values)
16	8		Locator data offset, which contains the offset to the locator data. The offset is relative from the start of the file

Locator platform code

Value	Identifier	Description
0		None

"Mac\x20"		Mac OS alias stored as a blob
"MacX"		File URL with UTF-8 encoding conforming to RFC 2396

"W2ku"		Absolute Windows path, which contains a UCS-2 big-endian string
"W2ru"		Windows path relative to the differential image, which contains a UCS-2 big-endian string
"Wi2k"		Unknown (Deprecated)
"Wi2r"		Unknown (Deprecated)

Block allocation table

The block allocation table is only used in dynamic and differential disk images.

The block allocation table consists of 32-bit entries. An entry contains the sector number where the data block starts or is set to 0xffffffff (-1) if the block is sparse or stored in the parent disk image.

if block_allocation_table_entry == 0xffffffff:
    block is sparse or stored in parent
else:
    file_offset = (block_allocation_table_entry * 512 ) + sector_bitmap_size

Unused blocks in a dynamic disk are sparse and should be filled with zero byte values. In a differential disk the block is stored in the parent disk image.
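
The lookup can be sketched in Python as follows. The sketch assumes the block allocation table has already been read into a list of integers and that the sector bitmap size is the block size divided by 4096, rounded up to a multiple of 512 bytes. The function names are illustrative.

```python
def vhd_sector_bitmap_size(block_size):
    """Returns the sector bitmap size, rounded up to a full sector."""
    bitmap_size = block_size // (512 * 8)
    return (bitmap_size + 511) // 512 * 512


def vhd_block_data_offset(block_allocation_table, media_offset, block_size):
    """Returns the file offset of a media offset, or None if the block is
    sparse or stored in the parent."""
    block_index, offset_in_block = divmod(media_offset, block_size)

    entry = block_allocation_table[block_index]
    if entry == 0xffffffff:
        return None

    # The entry is a sector number, the sector bitmap precedes the data.
    return entry * 512 + vhd_sector_bitmap_size(block_size) + offset_in_block
```

This sketch does not consult the sector bitmap, so for differential disk images individual sectors of an allocated block can still be stored in the parent.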

Data blocks

Data blocks are only used in dynamic and differential disk images.

A data block consists of:

sector bitmap
sector data

size_of_bitmap (in bytes) = block_size / (512 * 8)

The size of the bitmap is rounded up to the next multiple of the sector size.

Sector bitmap

In dynamic disk images the sector bitmap indicates which sectors contain data (bit set to 1) or are sparse (bit set to 0).

In differential disk images the sector bitmap indicates which sectors are stored within the image (bit set to 1) or in the parent (bit set to 0).

The bitmap is padded to a 512-byte sector boundary.

The bitmap is stored on a per-byte basis, where the most significant bit (MSB) of each byte represents the first sector covered by that byte.
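
Testing whether a sector is set in the bitmap, with the MSB of every byte corresponding to the lowest sector number, can be sketched as follows. The function name is illustrative.

```python
def sector_is_set(sector_bitmap, sector_index):
    """Returns True if the bit of the sector is set to 1."""
    byte_value = sector_bitmap[sector_index // 8]

    # Bit 7 (MSB) maps to the first sector of the byte, bit 0 to the last.
    return bool(byte_value & (0x80 >> (sector_index % 8)))
```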

References

VHD Specifications, by Microsoft

Virtual Hard Disk version 2 (VHDX) image format

The Virtual Hard Disk version 2 (VHDX) format is used by Microsoft virtualization products as one of their image formats. It is used to store both hard disk images and snapshots.

Overview

A VHDX image file consists of:

file header
2x image headers
2x region tables
log or metadata journal
block allocation table (BAT) region
metadata region
- metadata table
- metadata items
image (content) data

The elements are stored in 64 KiB (65536 bytes) aligned blocks.

Characteristics

Characteristics	Description
Byte order	little-endian
Date and time values	N/A
Character strings	UCS-2 little-endian, which allows for unpaired Unicode surrogates such as "U+d800" and "U+dc00"

The number of bytes per sector is 512 or 4096 depending on the logical sector size.

File header

The file header (or file type identifier) is 64 KiB (65536 bytes) in size and consists of:

Offset	Size	Value	Description
0	8	"vhdxfile"	Signature
8	512		Creator application and version, which contains a UCS-2 little-endian string with end-of-string character
520	65016		Unknown (reserved)

Image header

The image header is 4 KiB (4096 bytes) in size and consists of:

Offset	Size	Value	Description
0	4	"head"	Signature
4	4		Checksum
8	8		Sequence number
16	16		File write identifier, which contains a GUID
32	16		Data write identifier, which contains a GUID
48	16		Log identifier, which contains a GUID
64	2		Log format version
66	2	1	Format version
68	4		Log size, which according to MS-VHDX must be a multiple of 1 MiB
72	8		Log offset, which according to MS-VHDX must be a multiple of 1 MiB and greater than or equal to 1 MiB
80	4016	0	Unknown (reserved), which according to MS-VHDX this value must be set to 0

Checksum calculation

The CRC32-C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the checksum.

The checksum is calculated over the 4 KiB (4096 bytes) of data of the image header, where the image header checksum value is considered to be 0 during calculation.
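
A bitwise sketch of the checksum calculation. It assumes that the "initial value of 0" refers to the conventional CRC-32 calling interface, where the running value is inverted before and after processing the data. The constant 0x82f63b78 is the bit-reflected form of the Castagnoli polynomial 0x1edc6f41. The function names are illustrative.

```python
import struct


def crc32c(data, initial_value=0):
    """Calculates a CRC32-C (Castagnoli) checksum, one bit at a time."""
    checksum = initial_value ^ 0xffffffff
    for byte_value in data:
        checksum ^= byte_value
        for _ in range(8):
            if checksum & 1:
                checksum = (checksum >> 1) ^ 0x82f63b78
            else:
                checksum >>= 1
    return checksum ^ 0xffffffff


def image_header_checksum_is_valid(header_data):
    """Compares the stored little-endian checksum with the calculated one."""
    (stored_checksum,) = struct.unpack_from("<I", header_data, 4)

    # The checksum field at offset 4 is considered 0 during calculation.
    zeroed_data = header_data[:4] + b"\x00" * 4 + header_data[8:4096]
    return stored_checksum == crc32c(zeroed_data)
```

A table-driven or hardware-accelerated implementation would be preferable for larger structures; the bitwise loop is used here only to keep the sketch short.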

Region table

The region table is stored in a block of 64 KiB (65536 bytes) and consists of:

region table header
0 or more region table entries
Unknown (reserved)

TODO: determine if 0 entries is actually supported

Region table header

The region table header is 16 bytes in size and consists of:

Offset	Size	Value	Description
0	4	"regi"	Signature
4	4		Checksum
8	4		Number of table entries, which according to MS-VHDX this value must be less than or equal to 2047
12	4	0	Unknown (reserved), which according to MS-VHDX this value must be set to 0

The CRC32-C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the checksum.

The checksum is calculated over the 64 KiB (65536 bytes) of data of the region table, where the region table header checksum value is considered to be 0 during calculation.

Region table entry

The region table entry is 32 bytes in size and consists of:

Offset	Size	Description
0	16	Region type identifier, which contains a GUID
16	8	Region data offset, which contains an offset relative to the start of the file. According to MS-VHDX this value must be a multiple of 1 MiB and greater than or equal to 1 MiB
24	4	Region data size, which according to MS-VHDX must be a multiple of 1 MiB
28	4	Is required flag, which contains 1 to indicate the region type needs to be supported

Region type identifiers

Value	Identifier	Description
2dc27766-f623-4200-9d64-115e9bfd4a08		Block allocation table (BAT) region
8b7ca206-4790-4b9a-b8fe-575f050f886e		Metadata region
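
A sketch of reading the region table header and entries. It assumes that GUIDs are stored in the usual Microsoft mixed-endian layout, which Python's uuid module exposes as `bytes_le`. The function name, the dictionary and the returned tuples are illustrative, and the checksum is not verified here.

```python
import struct
import uuid

REGION_TYPES = {
    uuid.UUID("2dc27766-f623-4200-9d64-115e9bfd4a08"): "block allocation table",
    uuid.UUID("8b7ca206-4790-4b9a-b8fe-575f050f886e"): "metadata",
}


def parse_region_table(data):
    """Returns (region type, data offset, data size, is required) tuples."""
    signature, _checksum, number_of_entries, _reserved = struct.unpack_from(
        "<4sIII", data, 0)

    if signature != b"regi" or number_of_entries > 2047:
        raise ValueError("unsupported region table header")

    regions = []
    for entry_index in range(number_of_entries):
        guid_data, data_offset, data_size, is_required = struct.unpack_from(
            "<16sQII", data, 16 + entry_index * 32)

        region_type = uuid.UUID(bytes_le=guid_data)

        # Unknown region types are reported by their GUID string.
        regions.append((REGION_TYPES.get(region_type, str(region_type)),
                        data_offset, data_size, bool(is_required)))
    return regions
```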

Metadata region

The metadata region contains:

metadata table
metadata items

Metadata table

The metadata table is stored in a block of 64 KiB (65536 bytes) and consists of:

metadata table header
0 or more metadata table entries
Unknown (reserved)

TODO: determine if 0 entries is actually supported

Metadata table header

The metadata table header is 32 bytes in size and consists of:

Offset	Size	Value	Description
0	8	"metadata"	Signature
8	2	0	Unknown (reserved), which according to MS-VHDX this value must be set to 0
10	2		Number of table entries, which according to MS-VHDX this value must be less than or equal to 2047
12	20	0	Unknown (reserved), which according to MS-VHDX this value must be set to 0

Metadata table entry

The metadata table entry is 32 bytes in size and consists of:

Offset	Size	Description
0	16	Metadata item identifier, which contains a GUID
16	4	Metadata item offset, which contains an offset relative to the start of the metadata region. According to MS-VHDX this value must be greater than 64 KiB
20	4	Metadata item size
24	8	Unknown

TODO: describe last 8 bytes

Value	Identifier	Description
0x00000001	IsUser
0x00000002	IsVirtualDisk
0x00000004	IsRequired

Metadata items

Metadata item identifiers

Value	Identifier	Description
2fa54224-cd1b-4876-b211-5dbed83bf4b8		Virtual disk size
8141bf1d-a96f-4709-ba47-f233a8faab5f		Logical sector size
a8d35f2d-b30b-454d-abf7-d3d84834ab0c		Parent locator
beca12ab-b2e6-4523-93ef-c309e000c746		Virtual disk identifier
caa16737-fa36-4d43-b3b6-33f0aa44e76b		File parameters
cda348c7-445d-4471-9cc9-e9885251c556		Physical sector size

File parameters metadata item

The file parameters metadata item is 8 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Block size. According to MS-VHDX this value must be a power of 2, greater than or equal to 1 MiB and not greater than 256 MiB
4.0	1 bit		Blocks remain allocated flag, which is used to indicate the file is a fixed-size image
4.1	1 bit		Has parent flag, which indicates if the VHDX file contains a differential image that has a parent image
4.2	30 bits	0	Unknown (reserved). According to MS-VHDX this value must be set to 0
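
The bit fields of the file parameters metadata item could be decoded as sketched below. This sketch assumes little-endian byte order and that bit offset 4.0 corresponds to the least significant bit of the second 32-bit value; the function name and dictionary keys are hypothetical.

```python
import struct

ONE_MIB = 1024 * 1024


def parse_file_parameters(data):
    """Parse the 8-byte file parameters metadata item."""
    block_size, flags = struct.unpack_from("<II", data, 0)
    is_power_of_2 = block_size > 0 and block_size & (block_size - 1) == 0
    return {
        "block_size": block_size,
        # Assumption: bit 4.0 is the least significant bit of the flags.
        "blocks_remain_allocated": bool(flags & 0x00000001),
        "has_parent": bool(flags & 0x00000002),
        # MS-VHDX constraint: power of 2 between 1 MiB and 256 MiB.
        "block_size_is_valid": (
            is_power_of_2 and ONE_MIB <= block_size <= 256 * ONE_MIB
        ),
    }
```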

Logical sector size metadata item

The logical sector size metadata item is 4 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Logical sector size. According to MS-VHDX this value must be either 512 or 4096

Parent locator metadata item

The parent locator metadata item is of variable size and consists of:

parent locator header
0 or more parent locator entries
parent locator key and value data

TODO: determine if 0 entries is actually supported

Parent locator header

The parent locator header is 20 bytes in size and consists of:

Offset	Size	Value	Description
0	16		Parent locator type indicator, which contains the GUID: b04aefb7-d19e-4a81-b789-25b8e9445913
16	2	0	Unknown (reserved). According to MS-VHDX this value must be set to 0
18	2		Number of entries (or key-value pairs)

Parent locator entry

The parent locator entry is 12 bytes in size and consists of:

Offset	Size	Description
0	4	Key data offset, which contains the offset relative to the start of the parent locator header
4	4	Value data offset, which contains the offset relative to the start of the parent locator header
8	2	Key data size
10	2	Value data size

Parent locator key and value data

A parent locator key or value is stored as a UCS-2 little-endian string without an end-of-string character.

Known keys are:

Value	Description
absolute_win32_path	The value contains an absolute Windows drive path, for example "\\?\c:\file.vhdx"
parent_linkage	The value contains a string of a GUID. This GUID should correspond to the data write identifier of the parent image
parent_linkage2	The value contains a string of a GUID
relative_path	The value contains a relative Windows path, for example "..\file.vhdx"
volume_path	The value contains an absolute Windows volume path, for example "\\?\Volume{%GUID%}\file.vhdx"
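
Reading the key-value pairs of a parent locator could look like the sketch below. It assumes little-endian byte order, that the key and value data sizes are in bytes, and that decoding UCS-2 as UTF-16 little-endian is acceptable; the function name `parse_parent_locator` is hypothetical.

```python
import struct
import uuid


def parse_parent_locator(data):
    """Return the locator type GUID and a dictionary of key-value pairs."""
    type_guid, _, number_of_entries = struct.unpack_from("<16sHH", data, 0)

    pairs = {}
    for entry_index in range(number_of_entries):
        # The 12-byte entries directly follow the 20-byte header.
        key_offset, value_offset, key_size, value_size = struct.unpack_from(
            "<IIHH", data, 20 + entry_index * 12
        )
        # Offsets are relative to the start of the parent locator header.
        key = data[key_offset:key_offset + key_size].decode("utf-16-le")
        value = data[value_offset:value_offset + value_size].decode("utf-16-le")
        pairs[key] = value

    return str(uuid.UUID(bytes_le=type_guid)), pairs
```

A reader would typically look up "relative_path", "volume_path" and "absolute_win32_path" in the resulting dictionary to locate the parent image.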

Physical sector size metadata item

The physical sector size metadata item is 4 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Physical sector size. According to MS-VHDX this value must be either 512 or 4096

Virtual disk identifier metadata item

The virtual disk identifier metadata item is 16 bytes in size and consists of:

Offset	Size	Value	Description
0	16		Virtual disk identifier, which contains a GUID

Note that in contrast to VHD (version 1) the virtual disk identifier does not change between a differential image and its parent. The data write identifier seems to be used instead.

Virtual disk size metadata item

The virtual disk size metadata item is 8 bytes in size and consists of:

Offset	Size	Value	Description
0	8		Virtual disk size

Block allocation table (BAT) region

The block allocation table (BAT) region contains the block allocation table. The entries of this table describe the location of either blocks containing image content data (or payload blocks) or blocks containing a sector bitmap.

The size of an individual sector bitmap block is 1 MiB which allows for 2^23 sectors to be represented by the bitmap.

Block allocation table (BAT) entries are grouped in chunks. The size of a chunk can be calculated as follows:

number_of_entries_per_chunk = (2^23 * logical_sector_size) / block_size
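
As a worked example of the calculation above, a logical sector size of 512 bytes and a block size of 32 MiB gives (2^23 * 512) / 2^25 = 128 entries per chunk. A minimal sketch of the same arithmetic, where the function name is hypothetical:

```python
def number_of_entries_per_chunk(logical_sector_size, block_size):
    """Number of payload BAT entries that precede each sector bitmap entry."""
    # A 1 MiB sector bitmap block holds 2^23 bits, one per sector.
    sectors_per_bitmap_block = 1 << 23
    return (sectors_per_bitmap_block * logical_sector_size) // block_size
```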

The block allocation table (BAT) consists of:

one or more chunks containing:
- number of entries per chunk x BAT entry describing image content data
- 1 x BAT entry describing a sector bitmap

Unused BAT entries are filled with 0-byte values.

The block allocation table (BAT) of:

a fixed-size image does not contain sector bitmap entries;
a dynamic-size image does contain sector bitmap entries, although according to MS-VHDX these are not used;
a differential image does contain sector bitmap entries.

Block allocation table (BAT) entry

The block allocation table (BAT) entry is 64 bits in size and consists of:

Offset	Size	Value	Description
0.0	3 bits		Block state
0.3	17 bits	0	Unknown (reserved). According to MS-VHDX this value must be set to 0
2.4	44 bits		Block offset, which contains the offset relative to the start of the file as a multiple of 1 MiB
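
The bit layout of a BAT entry could be unpacked as sketched below. This sketch assumes the entry is read as a little-endian 64-bit integer where bit offset 0.0 is the least significant bit, so that the block offset starts at bit 20; the function name is hypothetical.

```python
import struct

ONE_MIB = 1024 * 1024


def parse_bat_entry(data):
    """Split a 64-bit BAT entry into block state and file offset in bytes."""
    (value,) = struct.unpack_from("<Q", data, 0)
    block_state = value & 0x7            # lower 3 bits
    reserved = (value >> 3) & 0x1FFFF    # next 17 bits, expected to be 0
    block_offset = (value >> 20) * ONE_MIB  # upper 44 bits, in units of 1 MiB
    return block_state, reserved, block_offset
```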

Block states

Payload block states

Value	Identifier	Description
0	PAYLOAD_BLOCK_NOT_PRESENT	Block is new and therefore not (yet) stored in the file
1	PAYLOAD_BLOCK_UNDEFINED	Block is not stored in the file
2	PAYLOAD_BLOCK_ZERO	Block is sparse and therefore filled with 0-byte values
3	PAYLOAD_BLOCK_UNMAPPED	Block has been unmapped

6	PAYLOAD_BLOCK_FULLY_PRESENT	Block is stored in the file
7	PAYLOAD_BLOCK_PARTIALLY_PRESENT	Block is stored in the parent

Sector bitmap block states

Value	Identifier	Description
0	SB_BLOCK_NOT_PRESENT	Block is new and therefore not (yet) stored in the file

6	SB_BLOCK_PRESENT	Block is stored in the file

Sector bitmap

In differential disk images the sector bitmap indicates which sectors are stored within the image (bit set to 1) or in the parent (bit set to 0).

The bitmap is stored in a 1 MiB block.

The bitmap is stored on a per-byte basis, where the LSB represents the first bit in the bitmap.
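
Testing a single sector against the bitmap could be done as sketched below. The function name is hypothetical and the sector index is assumed to be relative to the first sector covered by the sector bitmap block.

```python
def sector_is_stored_in_image(bitmap, sector_index):
    """Return True if the bit of the sector is set (stored in the image)."""
    byte_value = bitmap[sector_index // 8]
    # The least significant bit of every byte represents the first sector.
    return bool((byte_value >> (sector_index % 8)) & 1)
```

A bit value of 0 means the sector data has to be read from the parent image instead.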

Log (metadata journal)

TODO: complete section

The log serves as a metadata journal. It is of variable size and consists of a contiguous circular (ring) buffer that contains log entries.

Log entry

TODO: complete section

4 KiB (4096 bytes) in size

Log entry header

TODO: complete section

Zero descriptor

TODO: complete section

Data descriptor

TODO: complete section

Data sector

TODO: complete section

References

MS-VHDX: Virtual Hard Disk v2 (VHDX) File Format, by Microsoft

VMware Virtual Disk (VMDK) format

The VMware Virtual Disk (VMDK) format is used by VMware virtualization products as one of its image formats.

Overview

A VMDK disk image can consist of multiple files, such as:

descriptor file
extent data files
- raw extent data file
- VMDK sparse extent data file
- COWD sparse extent data file

Characteristics

Characteristics	Description
Byte order	little-endian
Date and time values
Character strings	narrow character (Single Byte Character (SBC) or Multi Byte Character (MBC)) stored using a codepage defined in the descriptor file

The number of bytes per sector is 512.

Disk types

There are multiple types of VMDK images, namely:

The 2GbMaxExtentFlat (or twoGbMaxExtentFlat) disk image, which consists of:

a descriptor file (<name>.vmdk)
raw data extent files (<name>-f###.vmdk), where ### contains a decimal value starting with 1.

The 2GbMaxExtentSparse (or twoGbMaxExtentSparse) disk image, which consists of:

a descriptor file (<name>.vmdk)
VMDK sparse data extent files (<name>-s###.vmdk), where ### contains a decimal value starting with 1.

The monolithicFlat disk image, which consists of:

a descriptor file (<name>.vmdk)
raw data extent file (<name>-f001.vmdk)

The monolithicSparse disk image, which consists of:

a VMDK sparse data extent file (<name>.vmdk), which also contains the descriptor file data.

The vmfs disk image, which consists of:

a descriptor file (<name>.vmdk)
raw data extent file (<name>-flat.vmdk)

The vmfsSparse differential disk image, which consists of:

a descriptor file (<name>.vmdk)
COWD sparse data extent files (<name>-delta.vmdk)

TODO: describe more disk types

Delta links

A delta link is similar to a differential image, where the image contains the changes (or delta) in comparison to a parent image. According to the Virtual Disk Format 5.0 specification one delta image can chain to another delta image.

TODO: Name <name>-delta.vmdk

Descriptor file

The descriptor file is a case-insensitive text based file that contains the following information:

optional comment and empty lines
header
extent descriptions
optional change tracking file
disk data base (DDB)

Note that the descriptor file can contain leading and trailing whitespace, as well as leading comment lines (starting with #) and empty lines. Lines are separated by a line feed character (0x0a).

The header of a descriptor file looks similar to the data below.

# Disk DescriptorFile
version=1
CID=12345678
parentCID=ffffffff
createType="twoGbMaxExtentSparse"

The header consists of the following values:

Value	Description
"# Disk DescriptorFile"	Section header (or file signature)
version	Format version
encoding	Encoding
CID	Content identifier, which contains a random 32-bit value updated the first time the content of the virtual disk is modified after the virtual disk is opened
parentCID	The content identifier of the parent, which contains a 32-bit value identifying the parent content, where a value of 'ffffffff' (-1) represents no parent content
isNativeSnapshot	TODO: add description. A value of "no" has been observed in a VMWare Player 9 descriptor file
createType	Disk type
parentFileNameHint	Contains the path to the parent image, which is only present if the image is a differential image (delta link)
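
Reading the key-value lines of a descriptor file could be sketched as follows. This is an illustration only: the function name `parse_descriptor_values` is hypothetical, keys are lowercased as one possible way to handle the case-insensitivity, and lines without "=" (such as extent descriptors) are skipped.

```python
def parse_descriptor_values(text):
    """Collect key=value lines of a descriptor file into a dictionary."""
    values = {}
    for line in text.split("\n"):
        line = line.strip()
        # Skip empty lines and comment lines such as section headers.
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            # Not a key=value line, for example an extent descriptor.
            continue
        # Assumption: surrounding double quotes are not part of the value.
        values[key.strip().lower()] = value.strip().strip('"')
    return values
```

The content identifiers are hexadecimal strings, so a value such as parentCID could be converted with int(value, 16) and compared against 0xffffffff to check whether the image has a parent.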

TODO: confirm if a content identifier of ‘fffffffe’ (-2) represents that the long content identifier should be used

Format versions

Value	Description
1	TODO: add description
2	TODO: add description
3	TODO: add description

Encodings

Note that it is currently unknown which encodings are supported. It is assumed that at least the Windows codepages are supported and that the default is UTF-8.

Value	Description
Big5	Big5 assumed to be equivalent to Windows codepage 950
GBK	GBK assumed to be equivalent to Windows codepage 936, which was observed in VMWare Workstation for Windows, Chinese edition
Shift_JIS	Shift_JIS assumed to be equivalent to Windows codepage 932, which was observed in VMWare Workstation for Windows, Japanese edition
UTF-8	UTF-8

windows-949-2000	Windows codepage 949, 2000 version
windows-1252	Windows codepage 1252, which was observed in VMWare Player 9 descriptor file

Disk types

Value	Description
2GbMaxExtentFlat, twoGbMaxExtentFlat	The disk is split into fixed-size extents of maximum 2 GB, which consists of raw extent data files
2GbMaxExtentSparse, twoGbMaxExtentSparse	The disk is split into sparse (dynamic-size) extents of maximum 2 GB, which consists of VMDK sparse extent data files
custom	TODO: add description. Descriptor file with arbitrary extents, used to mount v2i-format
fullDevice	The disk uses a full physical disk device
monolithicFlat	The disk is a single raw extent data file
monolithicSparse	The disk is a single VMDK sparse extent data file
partitionedDevice	The disk uses a full physical disk device, using access per partition
streamOptimized	The disk is a single compressed VMDK sparse extent data file
vmfs	The disk is a single raw extent data file, which is similar to the "monolithicFlat"
vmfsEagerZeroedThick	The disk is a single raw extent data file
vmfsPreallocated	The disk is a single raw extent data file
vmfsRaw	The disk uses a full physical disk device
vmfsRDM, vmfsRawDeviceMap	The disk uses a full physical disk device, which is also referred to as Raw Device Map (RDM)
vmfsRDMP, vmfsPassthroughRawDeviceMap	The disk uses a full physical disk device, which is similar to the Raw Device Map (RDM), but sends SCSI commands to underlying hardware
vmfsSparse	The disk is split into COWD sparse (dynamic-size) extents
vmfsThin	The disk is split into COWD sparse (dynamic-size) extents

Extent descriptions

The extent descriptions of a descriptor file look similar to the data below.

# Extent description
RW 4192256 SPARSE "test-s001.vmdk"

# Extent description
RW 1048576 FLAT "test-f001.vmdk" 0

The extent descriptions consist of the following values:

Value	Description
"# Extent description"	Section header
	Extent descriptors

Extent descriptor

The extent descriptor consists of the following values:

Value	Description
1st	Access mode
2nd	The number of sectors
3rd	Extent type
If extent type is not ZERO
4th	Path of the VMDK extent data file, relative to the location of the VMDK descriptor file
Optional
5th	The extent start sector
Seen in VMWare Player 9 in combination with a physical device extent on Windows
6th and 7th	"partitionUUID" followed by a device identifier

The extent offset is specified only for flat extents and corresponds to the offset in the file or device where the extent data is located. For device-backed virtual disks (physical or raw disks) the extent offset can be non-zero. For raw extent data files the extent offset should be zero.
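
An extent descriptor line could be split into its values as sketched below. This sketch uses a regular expression of my own choosing rather than a documented grammar, ignores the "partitionUUID" values, and uses hypothetical names throughout.

```python
import re

# Access mode, number of sectors, extent type, then an optional quoted
# path that can be followed by an optional start sector.
EXTENT_DESCRIPTOR = re.compile(
    r'^(?P<access>\S+)\s+(?P<sectors>\d+)\s+(?P<type>\S+)'
    r'(?:\s+"(?P<path>[^"]*)"(?:\s+(?P<start>\d+))?)?'
)


def parse_extent_descriptor(line):
    """Parse a single extent descriptor line into a dictionary."""
    match = EXTENT_DESCRIPTOR.match(line.strip())
    if match is None:
        raise ValueError("unsupported extent descriptor")
    start = match.group("start")
    return {
        "access_mode": match.group("access").upper(),
        "number_of_sectors": int(match.group("sectors")),
        "extent_type": match.group("type").upper(),
        # None for extents without a path, such as ZERO extents.
        "path": match.group("path"),
        "start_sector": int(start) if start is not None else None,
    }
```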

Extent access mode

The extent access mode consists of the following values:

Value	Description
NOACCESS	No access
RDONLY	Read only
RW	Read write

Extent types

The extent type consists of the following values:

Value	Description
FLAT	raw extent data file
SPARSE	VMDK sparse extent data file
ZERO	Sparse extent that consists of 0-byte values
VMFS	raw extent data file
VMFSSPARSE	COWD sparse extent data file
VMFSRDM	Unknown (Physical disk device that uses RDM?)
VMFSRAW	Unknown (Physical disk device?)

Note that VMWare Player 9 has been observed to use “FLAT” for Windows devices

Change tracking file section

The change tracking file section was introduced in version 3 and looks similar to:

# Change Tracking File
changeTrackPath="test-flat.vmdk"

The change tracking file section consists of the following values:

Value	Description
"# Change Tracking File"	Section header
changeTrackPath	Unknown (The path to the change tracking file?)

Disk database

The disk data base of a descriptor file looks similar to the data below.

# The Disk Data Base
#DDB

ddb.virtualHWVersion = "4"
ddb.geometry.cylinders = "16383"
ddb.geometry.heads = "16"
ddb.geometry.sectors = "63"
ddb.adapterType = "ide"
ddb.toolsVersion = "0"

The disk data base consists of the following values:

Value	Description
"# The Disk Data Base"	Section header
"#DDB"	Currently assumed to be part of the section header
ddb.deletable	Unknown (seen: "true")
ddb.virtualHWVersion	The virtual hardware version. For VMWare Player and Workstation this seems to correspond with the application version
ddb.longContentID	The long content identifier, which contains a 128-bit base16 encoded value, without spaces
ddb.uuid	UUID, which contains a 128-bit base16 encoded value, with spaces between bytes
ddb.geometry.cylinders	The number of cylinders
ddb.geometry.heads	The number of heads
ddb.geometry.sectors	The number of sectors
ddb.geometry.biosCylinders	The number of cylinders as reported by the BIOS
ddb.geometry.biosHeads	The number of heads as reported by the BIOS
ddb.geometry.biosSectors	The number of sectors as reported by the BIOS
ddb.adapterType	Disk adapter type
ddb.toolsVersion	String containing the version of the installed VMWare tools
ddb.thinProvisioned	Unknown (seen: "1")

VirtualBox has been observed to use a different case for “disk” in the section header:

# The disk Data Base

Virtual hardware version

Value	Description
4	TODO: add description

6	TODO: add description
7	TODO: add description

9	VMWare Player/Workstation 9.0

Disk adapter types

Value	Description
ide	TODO: add description
buslogic	TODO: add description
lsilogic	TODO: add description
legacyESX	TODO: add description

The buslogic and lsilogic values are for SCSI disks and show which virtual SCSI adapter is configured for the virtual machine. The legacyESX value is for older ESX Server virtual machines when the adapter type used in creating the virtual machine is not known.

The raw extent data file

The raw extent data file contains the actual disk data. The raw extent data file can be a file or a device.

This type of extent data file is also known as “Simple” or “Flat Extent”.

The VMDK sparse extent data file

The VMDK sparse extent data file contains the actual disk data. A VMDK sparse extent data file consists of:

file header
optional embedded descriptor file
optional secondary grain directory
- optional secondary grain tables
(primary) grain directory
- (primary) grain tables
grains
optional backup file header

This type of extent data file is also known as “Hosted Sparse Extent” or “Stream-Optimized Compressed Sparse Extent” when markers are used.

Note that the actual layout can vary per file. Stream-Optimized Compressed Sparse Extents have been observed to use secondary file headers.

Changes in format version 2:

added encrypted disk support (though this feature seems never to have been implemented).

Changes in format version 3:

the size of extent files is no longer limited to 2 GiB;
added support for persistent changed block tracking (CBT).

Note that for changed block tracking (CBT) the changeTrackPath value in the descriptor file references a file that describes changed areas on the virtual disk.

File header

The file header is 512 bytes in size and consists of:

Offset	Size	Value	Description
0	4	"KDMV"	Signature
4	4	1, 2 or 3	Format version
8	4		Flags
12	8		Maximum data number of sectors (capacity)
20	8		Sectors per grain, which must be a power of 2 and > 8
28	8		Embedded descriptor file start sector, which is relative from the start of the file or 0 if not set
36	8		Embedded descriptor file size in sectors
44	4	512	The number of grain table entries
48	8		Secondary grain directory start sector, which is relative from the start of the file or 0 if not set
56	8		Primary grain directory start sector, which is relative from the start of the file, 0 if not set or 0xffffffffffffffff (GD_AT_END) if relative from the end of the file
64	8		Metadata size in sectors
72	1		Value to determine if the extent data file was cleanly closed (or dirty flag)
73	1	'\n'	Single end of line character
74	1	' '	Non end of line character
75	1	'\r'	First double end of line character
76	1	'\n'	Second double end of line character
77	2		Compression method
79	433	0	Unknown (Padding)

The end of line characters are used to detect corruption due to file transfers that alter line end characters.
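
A minimal sketch of reading the file header, including the end of line character test, is shown below. The structure follows the table above; the function name, dictionary keys and the choice to raise ValueError are hypothetical.

```python
import struct

# The first 79 bytes of the 512-byte file header, little-endian.
VMDK_SPARSE_HEADER = struct.Struct("<4sIIQQQQIQQQB4sH")
GD_AT_END = 0xFFFFFFFFFFFFFFFF


def parse_sparse_file_header(data):
    """Parse the VMDK sparse extent data file header."""
    (signature, version, flags, capacity, sectors_per_grain,
     descriptor_start, descriptor_size, entries_per_grain_table,
     secondary_gd_start, primary_gd_start, metadata_size,
     dirty_flag, newline_test, compression_method) = (
        VMDK_SPARSE_HEADER.unpack_from(data, 0)
    )
    if signature != b"KDMV":
        raise ValueError("unsupported signature")
    # Flag 0x00000001 indicates the new line detection test is valid.
    if flags & 0x00000001 and newline_test != b"\n \r\n":
        raise ValueError("end of line characters were altered")
    return {
        "version": version,
        "flags": flags,
        "capacity": capacity,
        "sectors_per_grain": sectors_per_grain,
        "descriptor_start_sector": descriptor_start,
        "descriptor_size": descriptor_size,
        "entries_per_grain_table": entries_per_grain_table,
        "secondary_gd_start_sector": secondary_gd_start,
        "primary_gd_start_sector": primary_gd_start,
        "grain_directory_at_end": primary_gd_start == GD_AT_END,
        "metadata_size": metadata_size,
        "is_dirty": dirty_flag != 0,
        "compression_method": compression_method,
    }
```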

According to the Virtual Disk Format 5.0 specification the maximum data number of sectors (capacity) should be a multiple of the sectors per grain. Note that it has been observed that this is not always the case.

If the primary grain directory start sector is 0xffffffffffffffff (GD_AT_END) in a Stream-Optimized Compressed Sparse Extent there should be a secondary file header stored at offset -1024 relative from the end of the file (stream) that contains the correct grain directory start sector.

Flags

The flags consist of the following values:

Value	Identifier	Description
0x00000001		Valid new line detection test
0x00000002		Use secondary grain directory. The secondary (redundant) grain directory should be used instead of the primary grain directory
As of format version 2
0x00000004		Use zeroed-grain table entry. The zeroed-grain table entry overloads grain data sector number 1 to indicate the grain is sparse
Common
0x00010000		Has compressed grain data
0x00020000		Contains metadata, where the file contains markers to identify metadata or data blocks

Compression method

The compression method consists of the following values:

Value	Identifier	Description
0x00000000	COMPRESSION_NONE	No compression
0x00000001	COMPRESSION_DEFLATE	Compression using Deflate (RFC1951)

Markers

The markers are used in Stream-Optimized Compressed Sparse Extents. The corresponding flag must be set for markers to be present. An example of the layout of a Stream-Optimized Compressed Sparse Extent that uses markers is:

file header
embedded descriptor
compressed grain markers
grain table marker
grain table
grain directory marker
grain directory
footer marker
secondary file header
end-of-stream marker

The marker

The marker is 512 bytes in size and consists of:

Offset	Size	Value	Description
0	8		Value
8	4		Marker data size
If marker data size equals 0
12	4		Marker type
16	496	0	Unknown (Padding)
If marker data size > 0
12	...		Compressed grain data

If the marker data size > 0 the marker is a compressed grain marker.

Marker types

Value	Identifier	Description
0x00000000	MARKER_EOS	End-of-stream marker
0x00000001	MARKER_GT	Grain table (metadata) marker
0x00000002	MARKER_GD	Grain directory (metadata) marker
0x00000003	MARKER_FOOTER	Footer (metadata) marker
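
Distinguishing a compressed grain marker from a metadata marker could be sketched as follows. The function name and the returned dictionary layout are hypothetical; the byte order is little-endian as noted in the characteristics.

```python
import struct

MARKER_TYPES = {
    0: "MARKER_EOS",
    1: "MARKER_GT",
    2: "MARKER_GD",
    3: "MARKER_FOOTER",
}


def parse_marker(sector):
    """Classify the marker stored at the start of a 512-byte sector."""
    value, data_size = struct.unpack_from("<QI", sector, 0)
    if data_size > 0:
        # A non-zero data size indicates a compressed grain marker, where
        # the value contains the logical sector number of the grain.
        return {
            "kind": "COMPRESSED_GRAIN",
            "logical_sector_number": value,
            "compressed_data_size": data_size,
        }
    (marker_type,) = struct.unpack_from("<I", sector, 12)
    return {"kind": MARKER_TYPES.get(marker_type, "UNKNOWN"), "value": value}
```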

Compressed grain marker

The compressed grain marker indicates that compressed data follows.

Offset	Size	Value	Description
Compressed grain header
0	8	0	Logical sector number
8	4		Compressed data size

12	...		Compressed data, which contains Deflate compressed data

Note that the compressed grain data can be larger than the grain data size.

End of stream marker

The end-of-stream marker indicates the end of the virtual disk. Basically the end-of-stream marker is an empty sector block.

Offset	Size	Value	Description
0	8	0	Value
8	4	0	Marker data size
12	4	MARKER_EOS	Marker type
16	496	0	Unknown (Padding)

Grain table marker

The grain table marker indicates that a grain table follows the marker sector block.

Offset	Size	Value	Description
0	8	0	Value
8	4	0	Marker data size
12	4	MARKER_GT	Marker type
16	496	0	Unknown (Padding)
512	...		Grain table

Grain directory marker

The grain directory marker indicates that a grain directory follows the marker sector block.

Offset	Size	Value	Description
0	8	0	Value
8	4	0	Marker data size
12	4	MARKER_GD	Marker type
16	496	0	Unknown (Padding)
512	...		Grain directory

Footer marker

The footer marker indicates that a footer follows the marker sector block.

Offset	Size	Value	Description
0	8	0	Value
8	4	0	Marker data size
12	4	MARKER_FOOTER	Marker type
16	496	0	Unknown (Padding)
512	...		Footer

Grain directory

The grain directory is also referred to as level 0 metadata.

The size of the grain directory is dependent on the number of grains in the extent data file. The number of entries in the grain directory can be determined as follows:

grain table size = number of grain table entries * grain size

number of grain directory entries = maximum data size / grain table size
if maximum data size % grain table size > 0:
	number of grain directory entries += 1
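
The calculation above is a division that rounds up. A minimal sketch, assuming all sizes are expressed in sectors and using a hypothetical function name:

```python
def number_of_grain_directory_entries(
        maximum_data_sectors, sectors_per_grain, entries_per_grain_table=512):
    """Number of grain tables needed to cover the maximum data size."""
    # Number of sectors described by a single grain table.
    grain_table_size = entries_per_grain_table * sectors_per_grain
    number_of_entries, remainder = divmod(maximum_data_sectors, grain_table_size)
    if remainder > 0:
        number_of_entries += 1
    return number_of_entries
```

For example, an extent of 4192256 sectors with 128 sectors per grain and 512 grain table entries has a grain table size of 65536 sectors, which requires 64 grain directory entries.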

The grain directory consists of 32-bit grain table offsets:

Offset	Size	Value	Description
0	4		Grain table start sector, which is relative from the start of the file or 0 if sparse or the sector is stored in the parent image

The grain directory is stored in a multiple of 512-byte blocks.

Note that as of VMDK sparse extent data file version 2 if the “use zeroed-grain table entry” flag is set, a start sector of 1 indicates the grain table is sparse.

Grain table

The grain table is also referred to as level 1 metadata.

The grain table is of variable size. The number of entries in the grain table is stored in the file header. Note that the number of entries in the last grain table is dependent on the maximum data size and not necessarily the same as the value stored in the file header.

The grain table consists of 32-bit grain offsets:

Offset	Size	Value	Description
0	4		Grain data sector number, which is relative from the start of the file or 0 if sparse or the sector is stored in the parent image

The number of entries in a grain table should be 512, therefore the size of the grain table is 512 x 4 = 2048 bytes.

The grain table is stored in a multiple of 512-byte blocks.

Note that as of VMDK sparse extent data file version 2, if the “use zeroed-grain table entry” flag is set, a sector number of 1 indicates the grain is sparse.
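
The two metadata levels can be combined to find the grain of a given sector. The sketch below is an assumption based on the grain table size calculation and is not taken from a specification; the function name is hypothetical.

```python
def locate_grain(sector_number, sectors_per_grain, entries_per_grain_table=512):
    """Map a sector number to grain directory and grain table indexes."""
    grain_index = sector_number // sectors_per_grain
    # Level 0: which grain table describes the grain.
    grain_directory_index = grain_index // entries_per_grain_table
    # Level 1: which entry within that grain table.
    grain_table_index = grain_index % entries_per_grain_table
    sector_in_grain = sector_number % sectors_per_grain
    return grain_directory_index, grain_table_index, sector_in_grain
```

The grain table entry found this way contains the grain data sector number, to which the sector offset within the grain can be added for uncompressed grains.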

Grain data

In an uncompressed sparse extent data file the data is stored at the grain data sector number.

In a compressed sparse extent data file every non-sparse grain is assumed to be stored compressed.

Compressed grain data

The compressed grain data is of variable size and consists of:

Offset	Size	Value	Description
Compressed grain header
0	8	0	Logical sector number
8	4		Compressed data size

12	...		Compressed data, which contains zlib compressed data
...	...		Unknown (Padding)

The uncompressed data size should be the grain size or less for the last grain.
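
Reading a compressed grain could be sketched as follows. This sketch assumes the compressed data is a zlib stream that Python's zlib module can decompress directly; the function name is hypothetical.

```python
import struct
import zlib


def read_compressed_grain(data, grain_size):
    """Return the logical sector number and the uncompressed grain data."""
    logical_sector_number, compressed_size = struct.unpack_from("<QI", data, 0)
    compressed_data = data[12:12 + compressed_size]
    if len(compressed_data) != compressed_size:
        raise ValueError("compressed grain data is truncated")
    # Trailing padding after the compressed data is ignored.
    uncompressed_data = zlib.decompress(compressed_data)
    if len(uncompressed_data) > grain_size:
        raise ValueError("uncompressed data exceeds the grain size")
    return logical_sector_number, uncompressed_data
```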

Footer

The footer is only used in Stream-Optimized Compressed Sparse Extents. The footer is the same as the file header. The footer should be the last block of the disk and immediately followed by the end-of-stream marker so that they together make up the last two sectors of the disk.

The header and footer differ in that the grain directory offset value in the header is set to 0xffffffffffffffff (GD_AT_END) and in the footer to the correct value.

Changed block tracking (CBT)

TODO: complete section

The COWD sparse extent data file

The copy-on-write disk (COWD) sparse extent data file contains the actual disk data. The COWD sparse extent data file consists of:

file header
grain directory
grain tables
grains

This type of extent data file is also known as ESX Server Sparse Extent.

File header

The file header is 2048 bytes in size and consists of:

Offset	Size	Value	Description
0	4	"COWD"	Signature
4	4	1	Format version
8	4	0x00000003	Unknown (Flags)
12	4		Maximum data number of sectors (capacity)
16	4		Sectors per grain
20	4	4	Grain directory start sector, which is relative from the start of the file or 0 if not set
24	4		Number of grain directory entries
28	4		The next free sector
In root extent data file
32	4		The number of cylinders
36	4		The number of heads
40	4		The number of sectors
44	1016		Unknown (Empty values)
In child extent data files
32	1024		Parent file name
1056	4		Parent generation
Common
1060	4		Generation
1064	60		Name
1124	512		Description
1636	4		Saved generation
1640	8		Unknown (Reserved)
1648	4		Value to determine if the extent data file was cleanly closed (or dirty flag)
1652	396		Unknown (Padding)
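
The values common to root and child extent data files could be read as sketched below. The function name and dictionary keys are hypothetical; the name and description are kept as bytes because their encoding is not described here.

```python
import struct


def parse_cowd_file_header(data):
    """Parse the common values of the 2048-byte COWD file header."""
    (signature, version, flags, capacity, sectors_per_grain,
     gd_start_sector, number_of_gd_entries, next_free_sector) = (
        struct.unpack_from("<4sIIIIIII", data, 0)
    )
    if signature != b"COWD":
        raise ValueError("unsupported signature")
    # Offsets 32 to 1059 differ between root and child extent data files
    # and are skipped in this sketch.
    generation, name, description, saved_generation = struct.unpack_from(
        "<I60s512sI", data, 1060
    )
    (dirty_flag,) = struct.unpack_from("<I", data, 1648)
    return {
        "version": version,
        "flags": flags,
        "capacity": capacity,
        "sectors_per_grain": sectors_per_grain,
        "grain_directory_start_sector": gd_start_sector,
        "number_of_grain_directory_entries": number_of_gd_entries,
        "next_free_sector": next_free_sector,
        "generation": generation,
        # Assumption: strings end at the first 0-byte.
        "name": name.split(b"\x00", 1)[0],
        "description": description.split(b"\x00", 1)[0],
        "saved_generation": saved_generation,
        "is_dirty": dirty_flag != 0,
    }
```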

Note that the parent file name seems not to be set in recent delta sparse extent files.

Grain directory

The grain directory is also referred to as level 0 metadata.

The size of the grain directory is dependent on the number of grains in the extent data file. The number of entries in the grain directory is stored in the file header.

The grain directory consists of 32-bit grain table offsets:

Offset	Size	Value	Description
0	4		Grain table start sector, which is relative from the start of the file or 0 if not set

The grain directory is stored in a multiple of 512-byte blocks. Unused bytes are set to 0.

Grain table

The grain table is also referred to as level 1 metadata.

The grain table is of variable size. The number of entries in a grain table is the fixed value of 4096.

The grain table consists of 32-bit grain offsets:

Offset	Size	Value	Description
0	4		Grain sector number, which is relative from the start of the file or 0 if not set

The grain table is stored in a multiple of 512-byte blocks. Unused bytes are set to 0.

Change tracking file

TODO: complete section

Offset	Size	Value	Description
0	4	"\xa2\x72\x19\xf6"	Unknown (signature?)
4	4	1	Unknown (version?)
8	4		Unknown (empty values)
12	4	0x200	Unknown
16	8		Unknown
24	8		Unknown
32	4		Unknown
36	4		Unknown
40	4		Unknown
44	16		Unknown (UUID?)
60	...		Unknown (empty values?)

Corruption scenarios

The total size specified by the number of grain table entries is larger than the size specified by the maximum number of sectors. Seen in VMDK images generated by qemu-img.

Notes

The markers can be used to scan for the individual parts of the VMDK sparse extent data file if the stream has been truncated, but note that this can be a very expensive process IO-wise.

References

Virtual Disk Format 5.0, by VMWare

Volume system formats

A volume (or logical drive) is a single continuous accessible storage area, typically containing a file system. A volume system format is used to manage the storage of one or more volumes.

Although related, a volume is a different concept than a partition.

Apple Partition Map (APM) format

The Apple Partition Map (APM) format is used on Motorola based Macintosh computers. On Intel based Macintosh computers the GUID Partition Table (GPT) format is used.

Overview

An Apple Partition Map (APM) consists of:

a drive descriptor
partition map entry of type “Apple_partition_map”
zero or more partition map entries

Characteristics

Characteristics	Description
Byte order	big-endian
Date and time values	N/A
Character strings	ASCII

Terminology

Term	Description
Physical block	A fixed location on the storage media defined by the storage media
Logical block	An abstract location on the storage media defined by software

The drive descriptor

The driver descriptor identifies the device drivers installed on a storage medium. The driver descriptor can refer to multiple device drivers. Every device driver is stored in a separate partition.

The drive descriptor is situated in the first block of the storage medium. This block is referred to as the device driver block. The driver descriptor block is not considered part of any partition.

The drive descriptor is 512 bytes in size and consists of:

Offset	Size	Value	Description
0	2	"\x45\x52" or "ER"	Signature
2	2		The block size of the device in bytes
4	4		The number of blocks on the device
8	2		Device type (Reserved)
10	2		Device identifier (Reserved)
12	4		Device data (Reserved)
16	2		The number of driver descriptors
18	8		The first device driver descriptor
26	484		Additional driver descriptors, where unused entries are 16-bit integer values filled with 0

The device driver descriptor

The device driver descriptor is 8 bytes in size and consists of:

Offset	Size	Description
0	4	Start block of the device driver
4	2	Device driver number of blocks
6	2	Operating system type, where 1 represents "Mac OS"
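
The drive descriptor and its device driver descriptors could be read as sketched below. Note the big-endian byte order; the function name, dictionary keys and the bounds check are hypothetical.

```python
import struct


def parse_drive_descriptor(block):
    """Parse the drive descriptor stored in the first block."""
    (signature, block_size, number_of_blocks, _, _, _,
     number_of_descriptors) = struct.unpack_from(">2sHIHHIH", block, 0)
    if signature != b"ER":
        raise ValueError("unsupported signature")
    # Hypothetical sanity check: the descriptors must fit in the block.
    if 18 + number_of_descriptors * 8 > len(block):
        raise ValueError("number of driver descriptors out of bounds")

    device_drivers = []
    for descriptor_index in range(number_of_descriptors):
        start_block, driver_blocks, os_type = struct.unpack_from(
            ">IHH", block, 18 + descriptor_index * 8
        )
        device_drivers.append({
            "start_block": start_block,
            "number_of_blocks": driver_blocks,
            "operating_system_type": os_type,
        })
    return {
        "block_size": block_size,
        "number_of_blocks": number_of_blocks,
        "device_drivers": device_drivers,
    }
```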

The partition map

The partition map is stored after the drive descriptor. The partition map consists of multiple entries that must be stored contiguously. The partition map itself is considered a partition, therefore the first entry in the partition map describes the partition map itself.

The partition map entry

A partition map entry is 512 bytes in size and consists of:

Offset	Size	Value	Description
0	2	"\x50\x4d" or "PM"	Signature
2	2	0x00	Unknown (Reserved)
4	4		Total number of entries in the partition map
8	4		Partition start sector
12	4		Partition number of sectors
16	32		Partition name, which contains an ASCII string
48	32		Partition type, which contains an ASCII string
80	4		Data area start sector
84	4		Data area number of sectors
88	4		Status flags
92	4		Boot code start sector
96	4		Boot code number of sectors
100	4		Boot code address
104	4		Unknown (Reserved)
108	4		Boot code entry point
112	4		Unknown (Reserved)
116	4		Boot code checksum
120	16		Processor type
136	188 x 2 = 376	0x00	Unknown (Reserved)

Note that the partition name can be empty.
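
The first values of a partition map entry could be read as sketched below. This sketch stops at the status flags, treats the strings as ending at the first 0-byte (an assumption), and uses hypothetical names.

```python
import struct

# Signature up to and including the status flags (92 bytes), big-endian.
PARTITION_MAP_ENTRY = struct.Struct(">2sHIII32s32sIII")


def parse_partition_map_entry(data):
    """Parse the leading values of a 512-byte partition map entry."""
    (signature, _, number_of_entries, start_sector, number_of_sectors,
     name, type_string, data_start_sector, data_number_of_sectors,
     status_flags) = PARTITION_MAP_ENTRY.unpack_from(data, 0)
    if signature != b"PM":
        raise ValueError("unsupported signature")
    return {
        "number_of_entries": number_of_entries,
        "start_sector": start_sector,
        "number_of_sectors": number_of_sectors,
        # Assumption: ASCII strings end at the first 0-byte.
        "name": name.split(b"\x00", 1)[0].decode("ascii"),
        "type": type_string.split(b"\x00", 1)[0].decode("ascii"),
        "data_start_sector": data_start_sector,
        "data_number_of_sectors": data_number_of_sectors,
        "status_flags": status_flags,
    }
```

Since the first entry describes the partition map itself, its total number of entries value indicates how many entries need to be read.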

Partition types

The partition types consist of the following values:

Value	Identifier	Description
"Apple_Boot"
"Apple_Boot_RAID"
"Apple_Bootstrap"
"Apple_Driver"
"Apple_Driver43"
"Apple_Driver43_CD"
"Apple_Driver_ATA"
"Apple_Driver_ATAPI"
"Apple_Driver_IOKit"
"Apple_Driver_OpenFirmware"
"Apple_Extra"
"Apple_Free"
"Apple_FWDriver"
"Apple_HFS"
"Apple_HFSX"
"Apple_Loader"
"Apple_MDFW"
"Apple_MFS"
"Apple_partition_map"
"Apple_Patches"
"Apple_PRODOS"
"Apple_RAID"
"Apple_Rhapsody_UFS"
"Apple_Scratch"
"Apple_Second"
"Apple_UFS"
"Apple_UNIX_SVR2"
"Apple_Void"
"Be_BFS"
"MFS"

Status flags

The partition status flags consist of the following values:

Value	Identifier	Description
0x00000001		Is valid
0x00000002		Is allocated
0x00000004		Is in use
0x00000008		Contains boot information
0x00000010		Is readable
0x00000020		Is writable
0x00000040		Boot code is position independent

0x00000100		Contains a chain-compatible driver
0x00000200		Contains a real driver
0x00000400		Contains a chain driver

0x40000000		Automatic mount at startup
0x80000000		Is startup partition

Note that the “is in use” status flag does not appear to be used consistently.

GUID Partition Table (GPT) format

The GUID Partition Table (GPT) is a partitioning scheme that is the successor to the Master Boot Record (MBR) Partition Table for Intel x86 based computers.

Overview

A GUID Partition Table (GPT) consists of:

A protective or hybrid Master Boot Record (MBR) stored in block (LBA) 0
A GPT partition table header stored in block (LBA) 1
GPT partition entries stored in blocks (LBA) 2 - 33
partitions area
- GPT partitions
- MBR partitions if hybrid MBR/GPT
backup GPT partition entries (typically stored in the blocks (LBA) -33 through -2, counted from the end, directly before the last block)
A backup GPT partition table header (typically stored in the last block (LBA) -1)

The GPT partition table header signature can be used to determine the block (LBA) (or sector) size.
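
A minimal Python sketch of this approach: because the header is stored in block (LBA) 1, the byte offset at which the signature is found equals the block size. The list of candidate block sizes and the function name are assumptions made for the illustration.

```python
from typing import Optional

GPT_SIGNATURE = b"EFI PART"


def detect_gpt_block_size(
        data: bytes, candidates=(512, 1024, 2048, 4096)) -> Optional[int]:
    """Returns the block size for which block 1 starts with the signature."""
    for block_size in candidates:
        # Block (LBA) 1 starts at byte offset 1 * block_size.
        if data[block_size:block_size + 8] == GPT_SIGNATURE:
            return block_size
    return None
```

The function returns None if no candidate matches, in which case the data likely does not contain a GPT or uses a block size outside the candidate list.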

Characteristics

Characteristics	Description
Byte order	little-endian
Date and time values	N/A
Character strings	UTF-16 little-endian without byte order mark (BOM)

Master Boot Record (MBR)

Hybrid Master Boot Record (MBR)

In hybrid configuration both GPT and MBR are used concurrently. Depending on the operating system one might have precedence over the other.

Protective Master Boot Record (MBR)

The Protective Master Boot Record (MBR) is an MBR with a single partition of type “EFI GPT protective partition” (0xee) that allocates as much of the drive as possible.

GPT partition table header

The GPT partition table header is 92 bytes in size and consists of:

Offset	Size	Value	Description
0	8	"EFI PART"	Signature
8	2	0	Minor format version
10	2	1	Major format version
12	4	92	Header data size, which contains the size of the GPT partition table header data
16	4		Header data checksum
20	4	0	Unknown (Reserved)
24	8		Partition header block number (LBA)
32	8		Backup partition header block number (LBA)
40	8		Partitions area start block number (LBA)
48	8		Partitions area end block number (LBA), where the block number is included in the partitions area block range
56	16		Disk identifier (GUID)
72	8		Partition entries start block number (LBA)
80	4		Number of partition entries
84	4	128	Partition entry data size
88	4		Partition entries data checksum
92	...	0	Unknown (Reserved)

The partition entries start block number (LBA) of the backup GPT partition table header points to backup GPT partition entries.

Note that the number of partition entries value contains the number of available partition entries, not the number of used partition entries. Empty partition entries have an unused entry partition type identifier.

Checksum calculation

The CRC-32 algorithm with polynomial 0x04c11db7 and initial value of 0 is used to calculate the checksums.

The checksum is calculated over the 92 bytes of the table header data, where the header data checksum value is considered to be 0 during calculation.
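
The header checksum verification can be sketched in Python as follows. This assumes that zlib.crc32, called with its default starting value of 0, corresponds to the CRC-32 variant described above. The function name is hypothetical.

```python
import struct
import zlib


def verify_gpt_header_checksum(header: bytes) -> bool:
    """Checks the header data checksum of a GPT partition table header."""
    if len(header) < 92:
        return False

    # Header data size at offset 12, header data checksum at offset 16.
    header_size, stored_checksum = struct.unpack_from("<II", header, 12)
    if header_size < 92 or header_size > len(header):
        return False

    # The checksum field is considered to be 0 during the calculation.
    zeroed = header[:16] + b"\x00\x00\x00\x00" + header[20:header_size]
    return (zlib.crc32(zeroed) & 0xffffffff) == stored_checksum
```

The partition entries data checksum can be checked the same way over the partition entries data, without zeroing any field.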

GPT partition entries

GPT Partition entry

The GPT partition entry is 128 bytes in size and consists of:

Offset	Size	Description
0	16	Partition type identifier (GUID)
16	16	Partition identifier (GUID)
32	8	Partition start block number (LBA)
40	8	Partition end block number (LBA), where the block number is included in the partition block range
48	8	Attribute flags
56	72	Partition name, which contains a UTF-16 little-endian string
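
A minimal Python sketch of reading a partition entry. It assumes the GUID values use the mixed-endian layout that uuid.UUID(bytes_le=...) expects, and that the partition name is padded with 0-byte values. The function name and dictionary keys are hypothetical.

```python
import struct
import uuid


def parse_gpt_partition_entry(data: bytes) -> dict:
    """Parses a 128-byte GPT partition entry."""
    if len(data) < 128:
        raise ValueError("partition entry must be 128 bytes")

    (type_guid, identifier_guid, start_block, end_block, attribute_flags,
     name) = struct.unpack_from("<16s16sQQQ72s", data, 0)

    return {
        # Assumption: GUIDs are stored in mixed-endian (bytes_le) layout.
        "type": uuid.UUID(bytes_le=type_guid),
        "identifier": uuid.UUID(bytes_le=identifier_guid),
        "start_block": start_block,
        # The end block number is included in the partition block range.
        "number_of_blocks": end_block - start_block + 1,
        "attribute_flags": attribute_flags,
        "name": name.decode("utf-16-le", errors="replace").split("\x00", 1)[0],
    }
```

An entry whose type equals uuid.UUID(int=0) is an unused entry and can be skipped.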

Partition types

Value	Identifier	Description
00000000-0000-0000-0000-000000000000		Unused entry
024dee41-33e7-11d3-9d69-0008c781f39f		MBR partition scheme
c12a7328-f81f-11d2-ba4b-00a0c93ec93b		EFI System
21686148-6449-6e6f-744e-656564454649		BIOS boot partition
d3bfe2de-3daf-11df-ba40-e3a556d89593		Intel Fast Flash (iFFS) partition (for Intel Rapid Start technology)
f4019732-066e-4e12-8273-346c5641494f		Sony boot partition
bfbfafe7-a34f-448a-9a5b-6213eb736c22		Lenovo boot partition
Windows
e3c9e316-0b5c-4db8-817d-f92df00215ae		Microsoft reserved
ebd0a0a2-b9e5-4433-87c0-68b6b72699c7		(Microsoft) Basic data
5808c8aa-7e8f-42e0-85d2-e1e90434cfb3		Logical Disk Manager (LDM) metadata partition
af9b60a0-1431-4f62-bc68-3311714a69ad		Logical Disk Manager data partition
de94bba4-06d1-4d40-a16a-bfd50179d6ac		Windows recovery environment
37affc90-ef7d-4e96-91c3-2d7ae055b174		IBM General Parallel File System (GPFS) partition
e75caf8f-f680-4cee-afa3-b001e56efc2d		Storage Spaces partition
HP-UX
75894c1e-3aeb-11d3-b7c1-7b03a0000000		Data partition
e2a1e728-32e3-11d6-a682-7b03a0000000		Service Partition
Linux
0fc63daf-8483-4772-8e79-3d69d8477de4		Linux filesystem data
a19d880f-05fc-4d3b-a006-743f0f84911e		RAID partition
44479540-f297-41b2-9af7-d131d5f0458a		Root partition (x86)
4f68bce3-e8cd-4db1-96e7-fbcaf984b709		Root partition (x86-64)
69dad710-2ce4-4e3c-b16c-21a1d49abed3		Root partition (32-bit ARM)
b921b045-1df0-41c3-af44-4c6f280d3fae		Root partition (64-bit ARM/AArch64)
0657fd6d-a4ab-43c4-84e5-0933c84b4f4f		Swap partition
e6d6d379-f507-44c2-a23c-238f2a3df928		Logical Volume Manager (LVM) partition
933ac7e1-2eb4-4f13-b844-0e14e2aef915		/home partition
3b8f8425-20e0-4f3b-907f-1a25a76f98e8		/srv (server data) partition
7ffec5c9-2d00-49b7-8941-3ea10a5586b7		Plain dm-crypt partition
ca7d7ccb-63ed-4c53-861c-1742536059cc		LUKS partition
8da63339-0007-60c0-c436-083ac8230908		Reserved
FreeBSD
83bd6b9d-7f41-11dc-be0b-001560b84f0f		Boot partition
516e7cb4-6ecf-11d6-8ff8-00022d09712b		Data partition
516e7cb5-6ecf-11d6-8ff8-00022d09712b		Swap partition
516e7cb6-6ecf-11d6-8ff8-00022d09712b		Unix File System (UFS) partition
516e7cb8-6ecf-11d6-8ff8-00022d09712b		Vinum volume manager partition
516e7cba-6ecf-11d6-8ff8-00022d09712b		ZFS partition
Darwin / Mac OS
48465300-0000-11aa-aa11-00306543ecac		Hierarchical File System Plus (HFS+) partition
7c3457ef-0000-11aa-aa11-00306543ecac		Apple APFS
55465300-0000-11aa-aa11-00306543ecac		Apple UFS container
6a898cc3-1dd2-11b2-99a6-080020736631		ZFS
52414944-0000-11aa-aa11-00306543ecac		Apple RAID partition
52414944-5f4f-11aa-aa11-00306543ecac		Apple RAID partition, offline
426f6f74-0000-11aa-aa11-00306543ecac		Apple Boot partition (Recovery HD)
4c616265-6c00-11aa-aa11-00306543ecac		Apple Label
5265636f-7665-11aa-aa11-00306543ecac		Apple TV Recovery partition
53746f72-6167-11aa-aa11-00306543ecac		Apple Core Storage (i.e. Lion FileVault) partition
b6fa30da-92d2-4a9a-96f1-871ec6486200		SoftRAID_Status
2e313465-19b9-463f-8126-8a7993773801		SoftRAID_Scratch
fa709c7e-65b1-4593-bfd5-e71d61de9b02		SoftRAID_Volume
bbba6df5-f46f-4a89-8f59-8765b2727503		SoftRAID_Cache
Solaris / illumos
6a82cb45-1dd2-11b2-99a6-080020736631		Boot partition
6a85cf4d-1dd2-11b2-99a6-080020736631		Root partition
6a87c46f-1dd2-11b2-99a6-080020736631		Swap partition
6a8b642b-1dd2-11b2-99a6-080020736631		Backup partition
6a898cc3-1dd2-11b2-99a6-080020736631		/usr partition
6a8ef2e9-1dd2-11b2-99a6-080020736631		/var partition
6a90ba39-1dd2-11b2-99a6-080020736631		/home partition
6a9283a5-1dd2-11b2-99a6-080020736631		Alternate sector
6a8d2ac7-1dd2-11b2-99a6-080020736631		Reserved partition
6a945a3b-1dd2-11b2-99a6-080020736631		Reserved partition
6a96237f-1dd2-11b2-99a6-080020736631		Reserved partition
6a9630d1-1dd2-11b2-99a6-080020736631		Reserved partition
6a980767-1dd2-11b2-99a6-080020736631		Reserved partition
NetBSD
49f48d32-b10e-11dc-b99b-0019d1879648		Swap partition
49f48d5a-b10e-11dc-b99b-0019d1879648		FFS partition
49f48d82-b10e-11dc-b99b-0019d1879648		LFS partition
49f48daa-b10e-11dc-b99b-0019d1879648		RAID partition
2db519c4-b10f-11dc-b99b-0019d1879648		Concatenated partition
2db519ec-b10f-11dc-b99b-0019d1879648		Encrypted partition
Chrome OS
fe3a2a5d-4f32-41a7-b725-accc3285a309		Chrome OS kernel
3cb8e202-3b7e-47dd-8a3c-7ff2a13cfcec		Chrome OS rootfs
2e0a753d-9e48-43b0-8337-b15192cb1b5e		Chrome OS future use
Container Linux by CoreOS
5dfbf5f4-2848-4bac-aa5e-0d9a20b745a6		/usr partition (coreos-usr)
3884dd41-8582-4404-b9a8-e9b84f2df50e		Resizable rootfs (coreos-resize)
c95dc21a-df0e-4340-8d7b-26cbfa9a03e0		OEM customizations (coreos-reserved)
be9067b9-ea49-4f15-b4f6-f36f8c9e1818		Root filesystem on RAID (coreos-root-raid)
Haiku
42465331-3ba3-10f1-802a-4861696b7521		Haiku BFS
MidnightBSD
85d5e45e-237c-11e1-b4b3-e89a8f7fc3a7		Boot partition
85d5e45a-237c-11e1-b4b3-e89a8f7fc3a7		Data partition
85d5e45b-237c-11e1-b4b3-e89a8f7fc3a7		Swap partition
0394ef8b-237e-11e1-b4b3-e89a8f7fc3a7		Unix File System (UFS) partition
85d5e45c-237c-11e1-b4b3-e89a8f7fc3a7		Vinum volume manager partition
85d5e45d-237c-11e1-b4b3-e89a8f7fc3a7		ZFS partition
Ceph
45b0969e-9b03-4f30-b4c6-b4b80ceff106		Journal
45b0969e-9b03-4f30-b4c6-5ec00ceff106		dm-crypt journal
4fbd7e29-9d25-41b8-afd0-062c0ceff05d		OSD
4fbd7e29-9d25-41b8-afd0-5ec00ceff05d		dm-crypt OSD
89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be		Disk in creation
89c57f98-2fe5-4dc0-89c1-5ec00ceff2be		dm-crypt disk in creation
cafecafe-9b03-4f30-b4c6-b4b80ceff106		Block
30cd0809-c2b2-499c-8879-2d6b78529876		Block DB
5ce17fce-4087-4169-b7ff-056cc58473f9		Block write-ahead log
fb3aabf9-d25f-47cc-bf5e-721d1816496b		Lockbox for dm-crypt keys
4fbd7e29-8ae0-4982-bf9d-5a8d867af560		Multipath OSD
45b0969e-8ae0-4982-bf9d-5a8d867af560		Multipath journal
cafecafe-8ae0-4982-bf9d-5a8d867af560		Multipath block
7f4a666a-16f3-47a2-8445-152ef4d03f6c		Multipath block
ec6d6385-e346-45dc-be91-da2a7c8b3261		Multipath block DB
01b41e1b-002a-453c-9f17-88793989ff8f		Multipath block write-ahead log
cafecafe-9b03-4f30-b4c6-5ec00ceff106		dm-crypt block
93b0052d-02d9-4d8a-a43b-33a3ee4dfbc3		dm-crypt block DB
306e8683-4fe2-4330-b7c0-00a917c16966		dm-crypt block write-ahead log
45b0969e-9b03-4f30-b4c6-35865ceff106		dm-crypt LUKS journal
cafecafe-9b03-4f30-b4c6-35865ceff106		dm-crypt LUKS block
166418da-c469-4022-adf4-b30afd37f176		dm-crypt LUKS block DB
86a32090-3647-40b9-bbbd-38d8c573aa86		dm-crypt LUKS block write-ahead log
4fbd7e29-9d25-41b8-afd0-35865ceff05d		dm-crypt LUKS OSD
OpenBSD
824cc7a0-36a8-11e3-890a-952519ad3f61		Data partition
QNX
cef5a9ad-73bc-4601-89f3-cdeeeee321a1		Power-safe (QNX6) file system
Plan 9
c91818f9-8025-47af-89d2-f030d7000c2c		Plan 9 partition
VMware ESX
9d275380-40ad-11db-bf97-000c2911d1b8		vmkcore (coredump partition)
aa31e02a-400f-11db-9590-000c2911d1b8		VMFS filesystem partition
9198effc-31c0-11db-8f78-000c2911d1b8		VMware Reserved
Android-IA
2568845d-2332-4675-bc39-8fa5a4748d15		Bootloader
114eaffe-1552-4022-b26e-9b053604cf84		Bootloader2
49a4d17f-93a3-45c1-a0de-f50b2ebe2599		Boot
4177c722-9e92-4aab-8644-43502bfd5506		Recovery
ef32a33b-a409-486c-9141-9ffb711f6266		Misc
20ac26be-20b7-11e3-84c5-6cfdb94711e9		Metadata
38f428e6-d326-425d-9140-6e0ea133647c		System
a893ef21-e428-470a-9e55-0668fd91a2d9		Cache
dc76dda9-5ac1-491c-af42-a82591580c0d		Data
ebc597d0-2053-4b15-8b64-e0aac75f4db1		Persistent
c5a0aeec-13ea-11e5-a1b1-001e67ca0c3c		Vendor
bd59408b-4514-490d-bf12-9878d963f378		Config
8f68cc74-c5e5-48da-be91-a0c8c15e9c80		Factory
9fdaa6ef-4b3f-40d2-ba8d-bff16bfb887b		Factory (alt)
767941d0-2085-11e3-ad3b-6cfdb94711e9		Fastboot / Tertiary
ac6d7924-eb71-4df8-b48d-e267b27148ff		OEM
Android 6.0+ ARM
19a710a2-b3ca-11e4-b026-10604b889dcf		Android Meta
193d1ea4-b3ca-11e4-b075-10604b889dcf		Android EXT
Open Network Install Environment (ONIE)
7412f7d5-a156-4b13-81dc-867174929325		Boot
d4e6e2cd-4469-46f3-b5cb-1bff57afc149		Config
PowerPC
9e1a2d38-c612-4316-aa26-8b49521e5a8b		PReP boot
freedesktop.org OSes (Linux, etc.)
bc13c2ff-59e6-4262-a352-b275fd6f7172		Shared boot loader configuration
Atari TOS
734e5afe-f61a-11e6-bc64-92361f002671		Basic data partition (GEM, BGM, F32)

Partition attribute flags

Offset	Size	Description
0.0	1 bit	Partition is required by the platform, e.g. an OEM partition
0.1	1 bit	EFI firmware should ignore the content of the partition
0.2	1 bit	Partition contains bootable legacy BIOS, equivalent to MBR active flag
0.3	45 bits	Unknown (Reserved)
6.0	16 bits	Flags specific to the partition type

Microsoft basic partition type attribute flags

Offset	Size	Description
7.4	1 bit	Partition is read-only
7.5	1 bit	Partition is a shadow copy (of another partition)
7.6	1 bit	Partition is hidden
7.7	1 bit	Partition should not have a drive letter assigned (no auto-mount)

ChromeOS partition type attribute flags

Offset	Size	Description
6.0	4 bits	Priority, where 15 is the highest priority, 1 is the lowest and 0 indicates the partition is not bootable
6.4	4 bits	Number of tries to attempt to boot from the partition
7.0	1 bit	Partition was previously successfully booted from
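
The attribute flag tables above can be applied to the 64-bit attribute flags value as in the following Python sketch. It assumes the "byte.bit" offsets count from the least-significant bit of the little-endian value, so that offset 6.0 corresponds to bit 48. The function names are hypothetical.

```python
def decode_gpt_attribute_flags(attribute_flags: int) -> dict:
    """Decodes the generic GPT partition attribute flags."""
    return {
        "required_by_platform": bool(attribute_flags & (1 << 0)),
        "ignored_by_efi_firmware": bool(attribute_flags & (1 << 1)),
        "legacy_bios_bootable": bool(attribute_flags & (1 << 2)),
        # Offset 6.0, 16 bits: flags specific to the partition type.
        "type_specific_flags": (attribute_flags >> 48) & 0xffff,
    }


def decode_microsoft_basic_data_flags(attribute_flags: int) -> dict:
    """Decodes the Microsoft basic partition type attribute flags."""
    return {
        "read_only": bool(attribute_flags & (1 << 60)),      # offset 7.4
        "shadow_copy": bool(attribute_flags & (1 << 61)),    # offset 7.5
        "hidden": bool(attribute_flags & (1 << 62)),         # offset 7.6
        "no_drive_letter": bool(attribute_flags & (1 << 63)),  # offset 7.7
    }


def decode_chromeos_flags(attribute_flags: int) -> dict:
    """Decodes the ChromeOS partition type attribute flags."""
    return {
        "priority": (attribute_flags >> 48) & 0x0f,           # offset 6.0
        "number_of_tries": (attribute_flags >> 52) & 0x0f,    # offset 6.4
        "successful_boot": bool(attribute_flags & (1 << 56)),  # offset 7.0
    }
```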

Master Boot Record (MBR) partition table format

The Master Boot Record (MBR) partition table is mainly used on the family of Intel x86 based computers.

Overview

A MBR partition table consists of:

Master Boot Record (MBR)
Extended Partition Records (EPRs)

Characteristics

Characteristics	Description
Byte order	little-endian
Date and time values	N/A
Character strings	N/A

Terminology

Term	Description
Physical block	A fixed location on the storage media defined by the storage media
Logical block	An abstract location on the storage media defined by software

Sector size(s)

Traditionally the size of a sector is 512 bytes, but modern hard disk drives use 4096 bytes. The Linux fdisk utility supports sector sizes of: 512, 1024, 2048 and 4096.

The location of the “boot signature” of the MBR does not indicate the sector size. Methods to derive the sector size from the data:

check the “boot signature” of the first EPR, if present
check the content of well known partition types

Cylinder Head Sector (CHS) address

The Cylinder Head Sector (CHS) address is 24 bits in size and consists of:

Offset	Size	Description
0.0	8 bits	Head
1.0	6 bits	Sector
1.6	10 bits	Cylinder

The logical block address (LBA) can be determined from the CHS with the following calculation:

lba = (((cylinder * heads_per_cylinder) + head) * sectors_per_track) + sector - 1
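
A minimal Python sketch of decoding a CHS address and applying the calculation above. The way the 10-bit cylinder value is composed (upper 2 bits from the second byte, lower 8 bits from the third byte) follows the conventional MBR layout and is an assumption, since the table above does not spell it out. The function names are hypothetical.

```python
def decode_chs(data: bytes) -> tuple:
    """Decodes a 3-byte CHS address into (cylinder, head, sector)."""
    if len(data) != 3:
        raise ValueError("CHS address must be 3 bytes")

    head = data[0]
    sector = data[1] & 0x3f
    # Assumption: bits 6-7 of the second byte hold the upper 2 bits of the
    # cylinder and the third byte holds the lower 8 bits.
    cylinder = ((data[1] & 0xc0) << 2) | data[2]
    return cylinder, head, sector


def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Converts a CHS address to a logical block address (LBA)."""
    return (((cylinder * heads_per_cylinder) + head) * sectors_per_track
            + sector - 1)
```

Note that heads_per_cylinder and sectors_per_track are properties of the drive geometry and are not stored in the CHS address itself.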

The Master Boot Record (MBR)

The Master Boot Record (MBR) is a data structure that describes the properties of the storage medium and its partitions.

The classical MBR can only contain 4 partition table entries. Additional partition entries must be stored using extended partition records (EPR). The classical MBR has evolved into different variants like:

The modern MBR
The Advanced Active Partitions (AAP) MBR
The NEWLDR MBR
The AST/NEC MS-DOS and SpeedStor MBR
The Disk Manager MBR

The classical MBR

The classical MBR is 512 bytes in size and consists of:

Offset	Size	Value	Description
0	446		The boot (loader) code
446	16		Partition table entry 1
462	16		Partition table entry 2
478	16		Partition table entry 3
494	16		Partition table entry 4
510	2	"\x55\xaa"	The (boot) signature

The modern MBR

The modern MBR is 512 bytes in size and consists of:

Offset	Size	Value	Description
0	218		The first part of the boot (loader) code
Disk timestamp used by Microsoft Windows 95, 98 and ME
218	2	0x0000	Unknown (Reserved)
220	1		Unknown (Original physical drive), which contains a value that ranges from 0x80 to 0xff, where 0x80 is the first drive, 0x81 the second, etc.
221	1		Seconds, which contains a value that ranges from 0 to 59
222	1		Minutes, which contains a value that ranges from 0 to 59
223	1		Hours, which contains a value that ranges from 0 to 23
Without disk identity
224	222		The second part of the boot (loader) code
With disk identity, used by UEFI, Microsoft Windows NT or later
224	216		The second part of the boot (loader) code
440	4		Disk identity (signature)
444	2	0x0000 or 0x5a5a	copy-protection marker
Common
446	16		Partition table entry 1
462	16		Partition table entry 2
478	16		Partition table entry 3
494	16		Partition table entry 4
510	2	"\x55\xaa"	The (boot) signature

The extended partition record

The extended partition record (EPR), also referred to as extended boot record (EBR), contains a 64 byte partition table like the MBR. This partition table contains information about the logical partition (volume) and additional extended partition tables.

Offset	Size	Value	Description
0	446	0x00	Unknown (Unused), which should contain zero bytes
446	16		Partition table entry 1
462	16		Partition table entry 2, which should contain an extended partition
478	16	0x00	Partition table entry 3, which should be unused and contain zero bytes
494	16	0x00	Partition table entry 4, which should be unused and contain zero bytes
510	2	"\x55\xaa"	Signature

The second partition entry contains an extended partition which points to the next EPR. The LBA addresses in the EPR are relative to the start of the first EPR.

The first EPR typically has a partition type of 0x05, but certain versions of Windows, such as Windows 98, are known to use partition type 0x0f.

The partition table entry

The partition table entry is 16 bytes in size and consists of:

Offset	Size	Description
0	1	Partition flags
1	3	The partition start address, which contains a CHS relative to the start of the hard disk
4	1	Partition type
5	3	The partition end address, which contains a CHS relative to the start of the hard disk
8	4	The partition start address, which contains an LBA (sectors) relative to the start of the hard disk
12	4	Size of the partition in number of sectors
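
The partition table entries of a 512-byte MBR can be read as in the following minimal Python sketch. The function name and dictionary keys are hypothetical; entries with partition type 0x00 (empty) are skipped.

```python
import struct


def parse_mbr_partition_entries(sector: bytes) -> list:
    """Parses the 4 partition table entries of a 512-byte MBR."""
    if len(sector) < 512:
        raise ValueError("MBR must be 512 bytes")
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("unsupported boot signature")

    entries = []
    for index in range(4):
        # Partition table entries start at offset 446 and are 16 bytes each.
        (flags, start_chs, partition_type, end_chs, start_lba,
         number_of_sectors) = struct.unpack_from(
            "<B3sB3sII", sector, 446 + (index * 16))

        if partition_type == 0x00:
            continue

        entries.append({
            "index": index,
            "bootable": bool(flags & 0x80),
            "type": partition_type,
            "start_chs": start_chs,
            "end_chs": end_chs,
            "start_lba": start_lba,
            "number_of_sectors": number_of_sectors,
        })
    return entries
```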

Partition flags

The partition flags consist of the following values:

Value	Identifier	Description
0x80		Partition is bootable

Partition types

The partition types consist of the following values:

Value	Identifier	Description
0x00		Empty
0x01		FAT12 (CHS)
0x02		XENIX root
0x03		XENIX user
0x04		FAT16 (16 MiB - 32 MiB CHS)
0x05		Extended (CHS)
0x06		FAT16 (32 MiB - 2 GiB CHS)
0x07		HPFS/NTFS
0x08		AIX
0x09		AIX bootable
0x0a		OS/2 Boot Manager
0x0b		FAT32 (CHS)
0x0c		FAT32 (LBA)

0x0e		FAT16 (32 MiB - 2 GiB LBA)
0x0f		Extended (LBA)
0x10		OPUS
0x11		Hidden FAT12 (CHS)
0x12		Compaq diagnostics

0x14		Hidden FAT16 (16 MiB - 32 MiB CHS)

0x16		Hidden FAT16 (32 MiB - 2 GiB CHS)
0x17		Hidden HPFS/NTFS
0x18		AST SmartSleep

0x1b		Hidden FAT32 (CHS)
0x1c		Hidden FAT32 (LBA)

0x1e		Hidden FAT16 (32 MiB - 2 GiB LBA)

0x24		NEC DOS

0x27		Unknown (PackardBell recovery/installation partition)

0x39		Plan 9

0x3c		PartitionMagic recovery

0x40		Venix 80286
0x41		PPC PReP Boot
0x42		SFS or LDM: Microsoft MBR (Dynamic Disk)

0x4d		QNX4.x
0x4e		QNX4.x 2nd part
0x4f		QNX4.x 3rd part
0x50		OnTrack DM
0x51		OnTrack DM6 Aux1
0x52		CP/M
0x53		OnTrack DM6 Aux3
0x54		OnTrackDM6
0x55		EZ-Drive
0x56		Golden Bow

0x5c		Priam Edisk

0x61		SpeedStor

0x63		GNU HURD or SysV
0x64		Novell Netware 286
0x65		Novell Netware 386

0x70		DiskSecure Multi-Boot

0x75		PC/IX

0x78		XOSL

0x80		Old Minix
0x81		Minix / old Linux
0x82		Solaris x86 or Linux swap
0x83		Linux
0x84		Hibernation or OS/2 hidden C: drive
0x85		Linux extended
0x86		NTFS volume set
0x87		NTFS volume set

0x8e		Linux LVM

0x93		Amoeba
0x94		Amoeba BBT

0x9f		BSD/OS
0xa0		IBM Thinkpad hibernation
0xa1		Hibernation

0xa5		FreeBSD
0xa6		OpenBSD
0xa7		NeXTSTEP
0xa8		Mac OS X
0xa9		NetBSD

0xab		Mac OS X Boot

0xaf		Mac OS X

0xb7		BSDI
0xb8		BSDI swap

0xbb		Boot Wizard hidden

0xc1		DRDOS/sec (FAT-12)

0xc4		DRDOS/sec (FAT-16 < 32M)

0xc6		DRDOS/sec (FAT-16)
0xc7		Syrinx

0xda		Non-FS data
0xdb		CP/M / CTOS / ...

0xde		Dell Utility
0xdf		BootIt

0xe1		DOS access

0xe3		DOS R/O
0xe4		SpeedStor

0xeb		BeOS

0xee		EFI GPT protective partition
0xef		EFI system partition (FAT)
0xf0		Linux/PA-RISC boot
0xf1		SpeedStor
0xf2		DOS secondary

0xf4		SpeedStor

0xfb		VMWare file system
0xfc		VMWare swap
0xfd		Linux RAID auto-detect
0xfe		LANstep
0xff		BBT

File system formats

A file system format is used to manage the storage of files.

Terminology

File entry (file system entry): an object that represents an element within the file system, such as a file or directory. A file system typically stores metadata of a file entry, such as the name, size, permissions, date and time values, and location of the content.
Data fork (or data stream): a file system object that represents the content of a file entry. NTFS and HFS support multiple data forks (or data streams) for an individual file entry.
Extended attribute: A file system object that represents additional (or extended) metadata of an individual file entry.
Reparse point: a file system object that redirects to another location or implementation (filter driver), such as Windows Overlay Filter (WOF) compression. NTFS and ReFS support reparse points.

Formats

Apple File System (APFS)
Extended File System (ext)
Extensible File Allocation Table (exFAT)
File Allocation Table (FAT)
Hierarchical File System (HFS)
Macintosh File System (MFS)
New Technologies File System (NTFS)

Apple File System (APFS)

TODO: add description

Apple File System Compression (decmpfs)

Hierarchical File System (HFS) and Apple File System (APFS) use Apple File System Compression (decmpfs) to compress file contents.

Overview

An Apple File System Compression (decmpfs) compressed file consists of:

an extended attribute named “com.apple.decmpfs”

Characteristics

Characteristics	Description
Byte order	little-endian

decmpfs extended attribute

The decmpfs extended attribute consists of:

decmpfs header
optional compressed data

decmpfs header

The decmpfs header is 16 bytes in size and consists of:

Offset	Size	Value	Description
0	4	"fpmc"	Signature
4	4		Compression method
8	8		Uncompressed data size

Note that the signature is likely stored in little-endian and represents “cmpf”.
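
A minimal Python sketch of reading the decmpfs header from the extended attribute data. The function name is hypothetical.

```python
import struct


def parse_decmpfs_header(data: bytes) -> tuple:
    """Returns (compression_method, uncompressed_data_size)."""
    if len(data) < 16:
        raise ValueError("decmpfs header must be 16 bytes")

    signature, compression_method, uncompressed_data_size = (
        struct.unpack_from("<4sIQ", data, 0))

    # The signature bytes appear as "fpmc" in the stored data.
    if signature != b"fpmc":
        raise ValueError("unsupported decmpfs signature")

    return compression_method, uncompressed_data_size
```

Any bytes after the first 16 are the optional compressed data, whose interpretation depends on the compression method.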

Compression methods

Value	Identifier	Description
1	CMP_Type1	Unknown (uncompressed extended attribute data)

3		ZLIB (DEFLATE) compressed extended attribute data, where the compressed data is stored in the extended attribute after the compressed data header
4		64k chunked ZLIB (DEFLATE) compressed resource fork, where the compressed data is stored in the resource fork
5		Unknown (sparse compressed extended attribute data), where the uncompressed data contains 0-byte values
6		Unknown (unused)
7		LZVN compressed extended attribute data, where the compressed data is stored in the extended attribute after the compressed data header
8		64k chunked LZVN compressed resource fork, where the compressed data is stored in the resource fork
9		Unknown (uncompressed extended attribute data, different than CMP_Type1)
10		Unknown (64k chunked uncompressed data resource fork), where the compressed data is stored in the resource fork
11		LZFSE compressed extended attribute data, where the compressed data is stored in the extended attribute after the compressed data header
12		64k chunked LZFSE compressed resource fork, where the compressed data is stored in the resource fork

0x80000001		Unknown (faulting file)

Note that if the ZLIB (DEFLATE) compressed data starts with 0xff the data is stored uncompressed after the first compressed data byte.

Note that if the LZVN compressed data starts with 0x06 (end of stream opcode) the data is stored uncompressed after the first compressed data byte.
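
The handling of the 0xff marker for compression method 3 can be sketched in Python as follows. It assumes the compressed data is a stream that zlib.decompress accepts as-is. The function name is hypothetical. LZVN and LZFSE are not covered, since the Python standard library does not provide them.

```python
import zlib


def read_zlib_attribute_data(
        compressed_data: bytes, uncompressed_data_size: int) -> bytes:
    """Reads method 3 data stored after the decmpfs header."""
    if not compressed_data:
        raise ValueError("missing compressed data")

    if compressed_data[0] == 0xff:
        # The data is stored uncompressed after the first byte.
        data = compressed_data[1:]
    else:
        # Assumption: the data is a stream zlib.decompress can read.
        data = zlib.decompress(compressed_data)

    if len(data) != uncompressed_data_size:
        raise ValueError("uncompressed data size mismatch")
    return data
```

Comparing the result against the uncompressed data size from the decmpfs header provides a basic sanity check.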

Extended File System (ext) format

The Extended File System (ext) is one of the more common file systems used on Linux.

There are multiple versions of ext.

Version	Remarks
1	Introduced in April 1992
2	Introduced in January 1993
3	Introduced in November 2001, which featured journaling, dynamic growth and large directory indexing (HTree)
4	Introduced in October 2006 as unstable and became stable in October 2008, which featured extents and improved timestamps

Overview

An Extended File System (ext) consists of:

one or more block groups

Characteristics

Characteristics	Description
Byte order	little-endian, with the exception of UUID values that are stored in big-endian
Date and time values	number of seconds since January 1, 1970 00:00:00 (POSIX epoch), disregarding leap seconds. Or number of nanoseconds, when extra precision is enabled. Date and time values are stored in UTC
Character strings	UTF-8 or a narrow character (Single Byte Character (SBC) or Multi Byte Character (MBC)) stored using a system defined codepage

Block group

A block group consists of:

optional 1024 bytes of boot code or zero bytes (at offset: 0)
optional superblock
optional group descriptor table
block bitmap
inode bitmap
allocated and unallocated blocks

The primary superblock is stored at offset 1024 relative from the start of the volume. Backup superblocks are stored at offset 1024 relative from the start of the block group if block size <= 1024 or otherwise at offset 0 from the start of the block group.

The group descriptor table is stored in the block after the superblock.

An ext2 file system with revision 0 stores a copy of the superblock at the start of every block group, along with backups of the group descriptor table. Later revisions reduce the number of backup copies by only putting backups in specific block groups (sparse superblock feature EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER).

Not all values in a backup superblock and backup group descriptor tables match those of the primary superblock and group descriptor table.

Note that backup superblocks can be empty (filled with 0-byte values) or contain remnant data on an Android ext file system with sparse_super.

Flex block groups

Flex (or flexible) block groups are a set of block groups that are treated as a single logical block group. Metadata such as the superblock, group descriptors and data block bitmaps span the entire logical block group and not the individual block groups that are part of the set.

Meta block groups

Meta block groups (META_BG) are a set (or cluster) of block groups whose group descriptor structures can be stored in a single block.

The first meta block group value in the superblock indicates what the first meta block group is. For example, if the first meta block group value is 256 and the number of group descriptors that can be stored in a single block is 64, then the group descriptors for the block groups [0, 16383] are stored in the group descriptor table after the primary superblock and corresponding locations of backups.

Successive group descriptor tables, for example [16384, 16447], are stored in the first block group of a meta block group and backups in the second and last block groups of the meta block group.
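
The arithmetic of the example above can be sketched in Python as follows. The function name and the "primary" and "meta" labels are hypothetical. The sketch only determines which set of block groups shares a group descriptor table, not the block in which that table is stored.

```python
def group_descriptor_set(
        block_group: int, first_meta_block_group: int,
        descriptors_per_block: int) -> tuple:
    """Returns (kind, first_block_group, last_block_group) of the set."""
    # Block groups before this number use the table after the superblock.
    first_meta_group_number = first_meta_block_group * descriptors_per_block

    if block_group < first_meta_group_number:
        return ("primary", 0, first_meta_group_number - 1)

    # Each meta block group covers as many block groups as there are
    # group descriptors in a single block.
    meta_block_group = block_group // descriptors_per_block
    first_block_group = meta_block_group * descriptors_per_block
    last_block_group = first_block_group + descriptors_per_block - 1
    return ("meta", first_block_group, last_block_group)
```

With a first meta block group value of 256 and 64 group descriptors per block, block group 100 falls in the range [0, 16383] and block group 16400 falls in the range [16384, 16447], which matches the ranges given above.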

Blocks

The volume is divided into blocks:

block offset = block number * block size

The block size is defined in the superblock.

Note that mke2fs indicates the maximum block size is 65536.

The superblock

The ext2 superblock

The ext2 superblock is 208 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Number of inodes
4	4		Number of blocks
8	4		Number of reserved blocks. Reserved blocks are used to prevent the file system from filling up
12	4		Number of unallocated blocks
16	4		Number of unallocated inodes
20	4		First data block number. The block number is relative from the start of the volume
24	4		Block size, which contains the number of bits to shift 1024 to the MSB (left)
28	4		Fragment size, which contains the number of bits to shift 1024 to the MSB (left)
32	4		Number of blocks per block group
36	4		Number of fragments per block group
40	4		Number of inodes per block group
44	4		Last mount time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
48	4		Last written time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
52	2		The (current) mount count
54	2		Maximum mount count
56	2	"\x53\xef"	Signature
58	2		File system state flags
60	2		Error-handling status
62	2		Minor format revision
64	4		Last consistency check time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
68	4		Consistency check interval, which contains the number of seconds between consistency checks
72	4		Creator operating system
76	4		Format revision
80	2		Reserved block owner (or user) identifier (UID)
82	2		Reserved block group identifier (GID)
Dynamic inode information, if major version is EXT2_DYNAMIC_REV
84	4		First non-reserved inode
88	2		Inode size. Note that the inode size must be a power of 2 larger or equal to 128, the maximum supported by mke2fs is 1024
90	2		Block group, which contains a block group number
92	4		Compatible feature flags
96	4		Incompatible feature flags
100	4		Read-only compatible feature flags
104	16		File system identifier, which contains a big-endian UUID
120	16		Volume label, which contains a narrow character string without end-of-string character
136	64		Last mount path, which contains a narrow character string without end-of-string character
200	4		Algorithm usage bitmap
Performance hints, if EXT2_COMPAT_PREALLOC is set
204	1		Number of pre-allocated blocks per file
205	1		Number of pre-allocated blocks per directory
206	2		Unknown (padding)
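
A minimal Python sketch of reading a few superblock values and deriving the block size and a block offset from them. It only covers fields in the first 84 bytes, which the ext2, ext3 and ext4 superblock tables in this document have in common. The function names and dictionary keys are hypothetical.

```python
import struct

EXT_SIGNATURE = b"\x53\xef"


def parse_ext_superblock(data: bytes) -> dict:
    """Parses selected values from the start of an ext superblock."""
    if len(data) < 84:
        raise ValueError("superblock data too small")
    if data[56:58] != EXT_SIGNATURE:
        raise ValueError("unsupported superblock signature")

    number_of_inodes, number_of_blocks = struct.unpack_from("<II", data, 0)
    first_data_block, block_size_shift = struct.unpack_from("<II", data, 20)
    blocks_per_group, = struct.unpack_from("<I", data, 32)
    inodes_per_group, = struct.unpack_from("<I", data, 40)
    format_revision, = struct.unpack_from("<I", data, 76)

    return {
        "number_of_inodes": number_of_inodes,
        "number_of_blocks": number_of_blocks,
        "first_data_block": first_data_block,
        # The stored value is the number of bits to shift 1024 to the left.
        "block_size": 1024 << block_size_shift,
        "blocks_per_group": blocks_per_group,
        "inodes_per_group": inodes_per_group,
        "format_revision": format_revision,
    }


def block_offset(block_number: int, block_size: int) -> int:
    """Returns the offset of a block relative to the start of the volume."""
    return block_number * block_size
```

The data passed to parse_ext_superblock is expected to start at the superblock, for example at offset 1024 of the volume for the primary superblock.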

The ext3 superblock

The ext3 superblock is 336 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Number of inodes
4	4		Number of blocks
8	4		Number of reserved blocks. Reserved blocks are used to prevent the file system from filling up
12	4		Number of unallocated blocks
16	4		Number of unallocated inodes
20	4		First data block number. The block number is relative from the start of the volume
24	4		Block size, which contains the number of bits to shift 1024 to the MSB (left)
28	4		Fragment size, which contains the number of bits to shift 1024 to the MSB (left)
32	4		Number of blocks per block group
36	4		Number of fragments per block group
40	4		Number of inodes per block group, which can be 0 in combination with EXT3_FEATURE_INCOMPAT_JOURNAL_DEV
44	4		Last mount time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
48	4		Last written time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
52	2		The (current) mount count
54	2		Maximum mount count
56	2	"\x53\xef"	Signature
58	2		File system state flags
60	2		Error-handling status
62	2		Minor format revision
64	4		Last consistency check time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
68	4		Consistency check interval, which contains the number of seconds between consistency checks
72	4		Creator operating system
76	4		Format revision
80	2		Reserved block owner (or user) identifier (UID)
82	2		Reserved block group identifier (GID)
Dynamic inode information, if major version is EXT2_DYNAMIC_REV
84	4		First non-reserved inode
88	2		Inode size. Note that the inode size must be a power of 2 larger or equal to 128, the maximum supported by mke2fs is 1024
90	2		Block group, which contains a block group number
92	4		Compatible feature flags
96	4		Incompatible feature flags
100	4		Read-only compatible feature flags
104	16		File system identifier, which contains a big-endian UUID
120	16		Volume label, which contains a narrow character string without end-of-string character
136	64		Last mount path, which contains a narrow character string without end-of-string character
200	4		Algorithm usage bitmap
Performance hints, if EXT2_COMPAT_PREALLOC is set
204	1		Number of pre-allocated blocks per file
205	1		Number of pre-allocated blocks per directory
206	2		Unknown (padding)
Journalling support, if EXT3_FEATURE_COMPAT_HAS_JOURNAL is set
208	16		Journal identifier, which contains a big-endian UUID
224	4		Journal inode
228	4		Unknown (Journal device)
232	4		Unknown (Head of orphan inode list). The orphan inode list is a list of inodes to delete
236	4 x 4		hash-tree seed
252	1		Default hash version
253	1		Journal backup type
254	2		Group descriptor size
256	4		Default mount options
260	4		First meta block group (or metablock)
264	4		File system creation time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
268	17 x 4		Backup journal inodes

The ext4 superblock

The ext4 superblock is 1024 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Number of inodes
4	4		Number of blocks, which contains the lower 32 bits of the value
8	4		Number of reserved blocks, which contains the lower 32 bits of the value. Reserved blocks are used to prevent the file system from filling up
12	4		Number of unallocated blocks, which contains the lower 32 bits of the value
16	4		Number of unallocated inodes, which contains the lower 32 bits of the value
20	4		Root group block number. The block number is relative from the start of the volume
24	4		Block size, which contains the number of bits to shift 1024 to the left, towards the most-significant-bit (MSB). For example a value of 2 represents a block size of 4096
28	4		Fragment size, which contains the number of bits to shift 1024 to the left, towards the most-significant-bit (MSB)
32	4		Number of blocks per block group
36	4		Number of fragments per block group
40	4		Number of inodes per block group, which can be 0 in combination with EXT4_FEATURE_INCOMPAT_JOURNAL_DEV
44	4		Last mount time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
48	4		Last written time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
52	2		The (current) mount count
54	2		Maximum mount count
56	2	"\x53\xef"	Signature
58	2		File system state flags
60	2		Error-handling status
62	2		Minor format revision
64	4		Last consistency check time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
68	4		Consistency check interval, which contains a number of seconds
72	4		Creator operating system
76	4		Format revision
80	2		Reserved block owner (or user) identifier (UID)
82	2		Reserved block group identifier (GID)
Dynamic inode information, if major version is EXT2_DYNAMIC_REV
84	4		First non-reserved inode
88	2		Inode size. Note that the inode size must be a power of 2 greater than or equal to 128, the maximum supported by mke2fs is 1024
90	2		Block group
92	4		Compatible feature flags
96	4		Incompatible feature flags
100	4		Read-only compatible feature flags
104	16		File system identifier, which contains a big-endian UUID
120	16		Volume label, which contains a narrow character string without end-of-string character
136	64		Last mount path, which contains a narrow character string without end-of-string character
200	4		Algorithm usage bitmap
Performance hints, if EXT2_COMPAT_PREALLOC is set
204	1		Number of pre-allocated blocks per file
205	1		Number of pre-allocated blocks per directory
206	2		Unknown (padding)
Journalling support, if EXT3_FEATURE_COMPAT_HAS_JOURNAL is set
208	16		Journal identifier, which contains a big-endian UUID
224	4		Journal inode
228	4		Unknown (Journal device)
232	4		Unknown (Head of orphan inode list). The orphan inode list is a list of inodes to delete
236	4 x 4		Hash-tree seed
252	1		Default hash version
253	1		Journal backup type
254	2		Group descriptor size
256	4		Default mount options
260	4		First meta block group (or metablock)
264	4		File system creation time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
268	17 x 4		Backup journal inodes
If 64-bit support (EXT4_FEATURE_INCOMPAT_64BIT) is enabled
336	4		Number of blocks, which contains the upper 32-bit of the value
340	4		Number of reserved blocks, which contains the upper 32-bit of the value
344	4		Number of unallocated blocks, which contains the upper 32-bit of the value
348	2		Minimum inode size
350	2		Reserved inode size
352	4		Miscellaneous flags
356	2		RAID stride
358	2		Multiple mount protection (MMP) update interval in seconds
360	8		Block for multi-mount protection
368	4		Unknown (blocks on all data disks (N*stride))
372	1		Number of block groups per flex block group, which is stored as: 2 ^ value
373	1		Checksum type
374	1		Unknown (encryption level)
375	1		Unknown (padding)
376	8		Unknown (s_kbytes_written)
384	4		Inode number of active snapshot
388	4		Identifier of active snapshot
392	8		Unknown (reserved s_snapshot_r_blocks_count)
400	4		Inode number of snapshot list head
404	4		Unknown (s_error_count)
408	4		Unknown (s_first_error_time)
412	4		Unknown (s_first_error_ino)
416	8		Unknown (s_first_error_block)
424	32		Unknown (s_first_error_func)
456	4		Unknown (s_first_error_line)
460	4		Unknown (s_last_error_time)
464	4		Unknown (s_last_error_ino)
468	4		Unknown (s_last_error_line)
472	8		Unknown (s_last_error_block)
480	32		Unknown (s_last_error_func)
512	64		Unknown (s_mount_opts)
576	4		Unknown (s_usr_quota_inum)
580	4		Unknown (s_grp_quota_inum)
584	4		Unknown (s_overhead_clusters)
588	2 x 4		Unknown (s_backup_bgs)
596	4		Unknown (s_encrypt_algos)
600	16		Unknown (s_encrypt_pw_salt)
616	4		Unknown (s_lpf_ino)
620	4		Unknown (s_prj_quota_inum)
624	4		Metadata checksum seed
628	1		Unknown (s_wtime_hi)
629	1		Unknown (s_mtime_hi)
630	1		Unknown (s_mkfs_time_hi)
631	1		Unknown (s_lastcheck_hi)
632	1		Unknown (s_first_error_time_hi)
633	1		Unknown (s_last_error_time_hi)
634	1		Unknown (s_first_error_errcode)
635	1		Unknown (s_last_error_errcode)
636	2		Unknown (s_encoding)
638	2		Unknown (s_encoding_flags)
640	4		Unknown (s_orphan_file_inum)
644	94 x 4 = 376		Unknown (reserved)
1020	4		Checksum

If checksum type is CRC-32C, the checksum is stored as 0xffffffff - CRC-32C.

Note that some versions of mkfs.ext set the file system creation time even for ext2 and when EXT3_FEATURE_COMPAT_HAS_JOURNAL is not set.

TODO: Is the only way to determine the file system version the compatibility and equivalent flags?
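As an illustration of how the lower and upper 32-bit values and the block size shift in the table above combine, the following is a minimal sketch. It assumes that all integers are stored little-endian and the function name and dictionary keys are hypothetical, not part of Keramics.

```python
import struct

EXT4_FEATURE_INCOMPAT_64BIT = 0x00000080


def parse_ext4_superblock(data: bytes) -> dict:
    """Extracts a few values from a 1024 byte superblock (sketch)."""
    if len(data) < 1024:
        raise ValueError("superblock data too small")
    if data[56:58] != b"\x53\xef":
        raise ValueError("unsupported superblock signature")

    # Assumption: integer values are stored in little-endian.
    number_of_inodes, blocks_lower = struct.unpack_from("<II", data, 0)
    (block_size_shift,) = struct.unpack_from("<I", data, 24)
    (incompatible_flags,) = struct.unpack_from("<I", data, 96)
    (blocks_upper,) = struct.unpack_from("<I", data, 336)

    number_of_blocks = blocks_lower
    if incompatible_flags & EXT4_FEATURE_INCOMPAT_64BIT:
        # The upper 32 bits are only used with 64-bit support.
        number_of_blocks |= blocks_upper << 32

    return {
        "number_of_inodes": number_of_inodes,
        "number_of_blocks": number_of_blocks,
        "block_size": 1024 << block_size_shift,
    }
```

A reader would typically also validate the block size against an upper bound before using it, which is omitted here.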

Checksum calculation

If checksum type is CRC-32C, the CRC32-C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the checksum.

The checksum is calculated over the first 1020 bytes of data of the superblock.
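The calculation described above can be sketched as follows. This is a bit-wise (slow) CRC-32C for illustration only. It assumes that "initial value of 0" refers to the zlib-style convention where the function itself inverts the value before and after processing, and that the stored checksum is little-endian. The function names are hypothetical.

```python
import struct


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bit-wise CRC-32C, zlib-style: pass a previous result to continue."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # 0x82f63b78 is the bit-reflected form of 0x1edc6f41.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def superblock_checksum(superblock: bytes) -> int:
    """Calculates the value as it is stored at offset 1020."""
    return 0xFFFFFFFF - crc32c(superblock[:1020])


def verify_superblock_checksum(superblock: bytes) -> bool:
    # Assumption: the stored checksum is a little-endian 32-bit integer.
    (stored_checksum,) = struct.unpack_from("<I", superblock, 1020)
    return stored_checksum == superblock_checksum(superblock)
```

With this convention `crc32c(b"123456789")` yields the commonly published CRC-32C check value 0xe3069283.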

Metadata checksum seed calculation

If checksum type is CRC-32C, the CRC32-C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the checksum.

The checksum is calculated over:

the 16 byte file system identifier in the superblock

If EXT4_FEATURE_INCOMPAT_CSUM_SEED is set the metadata checksum seed value stored in the superblock should be used instead of calculating it based on the file system identifier.

If checksum type is CRC-32C, the metadata checksum seed is stored as 0xffffffff - CRC-32C.

File system state flags

Value	Identifier	Description
0x0001		Is clean
0x0002		Has errors
0x0004		Recovering orphan inodes

Error-handling status

Value	Identifier	Description
1		Continue
2		Remount as read-only
3		Panic

Creator operating system

Value	Identifier	Description
0		Linux
1		GNU Hurd
2		Masix
3		FreeBSD
4		Lites

Format revision

Value	Identifier	Description
0	EXT2_GOOD_OLD_REV	Original version with a fixed inode size of 128 bytes
1	EXT2_DYNAMIC_REV	Version with dynamic inode size support

Compatible feature flags

Value	Identifier	Description
0x00000001	EXT2_COMPAT_PREALLOC	Pre-allocate directory blocks, which is intended to reduce fragmentation
0x00000002	EXT2_FEATURE_COMPAT_IMAGIC_INODES	Has AFS server inodes
0x00000004	EXT3_FEATURE_COMPAT_HAS_JOURNAL	Has a journal
0x00000008	EXT2_FEATURE_COMPAT_EXT_ATTR	Has extended attributes
0x00000010	EXT2_FEATURE_COMPAT_RESIZE_INO, EXT2_FEATURE_COMPAT_RESIZE_INODE	Is resizeable, the file system has reserved GDT blocks for expansion, which also requires RO_COMPAT_SPARSE_SUPER
0x00000020	EXT2_FEATURE_COMPAT_DIR_INDEX	Has indexed directories
0x00000040	COMPAT_LAZY_BG	Unknown (Lazy block group)
0x00000080	COMPAT_EXCLUDE_INODE	Unknown (Exclude inode), which is not yet implemented and intended for a future file system snapshot feature
0x00000100	COMPAT_EXCLUDE_BITMAP	Unknown (Exclude bitmap), which is not yet implemented and intended for a future file system snapshot feature
0x00000200	EXT4_FEATURE_COMPAT_SPARSE_SUPER2	Has sparse superblock version 2
0x00000400	EXT4_FEATURE_COMPAT_FAST_COMMIT	Unknown (fast commit)
0x00000800	EXT4_FEATURE_COMPAT_STABLE_INODES	Unknown (stable inodes)
0x00001000	EXT4_FEATURE_COMPAT_ORPHAN_FILE	Has orphan file

Note that EXT2_FEATURE_COMPAT_, EXT3_FEATURE_COMPAT_, EXT4_FEATURE_COMPAT_ and COMPAT_ can be used interchangeably.

Incompatible feature flags

Value	Identifier	Description
0x00000001	EXT2_FEATURE_INCOMPAT_COMPRESSION	Has compression, which is not yet implemented
0x00000002	EXT2_FEATURE_INCOMPAT_FILETYPE	Directory entry has file type
0x00000004	EXT3_FEATURE_INCOMPAT_RECOVER	Needs recovery
0x00000008	EXT3_FEATURE_INCOMPAT_JOURNAL_DEV	Journal device
0x00000010	EXT2_FEATURE_INCOMPAT_META_BG	Has meta (or metadata) block groups

0x00000040	EXT4_FEATURE_INCOMPAT_EXTENTS	Has extents
0x00000080	EXT4_FEATURE_INCOMPAT_64BIT	Has 64-bit support, which supports more than 2^32 blocks
0x00000100	EXT4_FEATURE_INCOMPAT_MMP	Multiple mount protection
0x00000200	EXT4_FEATURE_INCOMPAT_FLEX_BG	Has flex (or flexible) block groups
0x00000400	EXT4_FEATURE_INCOMPAT_EA_INODE	Has inodes that store large extended attribute values

0x00001000	EXT4_FEATURE_INCOMPAT_DIRDATA	Data in directory entry, which is not yet implemented
0x00002000	EXT4_FEATURE_INCOMPAT_CSUM_SEED, EXT4_FEATURE_INCOMPAT_BG_USE_META_CSUM	Initial metadata checksum value (or seed) is stored in the superblock
0x00004000	EXT4_FEATURE_INCOMPAT_LARGEDIR	Large directory >2GB or 3-level hash tree (HTree)
0x00008000	EXT4_FEATURE_INCOMPAT_INLINE_DATA	Has data stored in inode
0x00010000	EXT4_FEATURE_INCOMPAT_ENCRYPT	Has encrypted inodes
0x00020000	EXT4_FEATURE_INCOMPAT_CASEFOLD	Has case folding

Note that EXT2_FEATURE_INCOMPAT_, EXT3_FEATURE_INCOMPAT_, EXT4_FEATURE_INCOMPAT_ and INCOMPAT_ can be used interchangeably.
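A reader of the format typically needs to know which incompatible feature flags are set and whether any of them are unrecognized. The following is a small sketch of such a check, using the values from the table above. The function name is hypothetical.

```python
INCOMPATIBLE_FEATURE_FLAGS = {
    0x00000001: "EXT2_FEATURE_INCOMPAT_COMPRESSION",
    0x00000002: "EXT2_FEATURE_INCOMPAT_FILETYPE",
    0x00000004: "EXT3_FEATURE_INCOMPAT_RECOVER",
    0x00000008: "EXT3_FEATURE_INCOMPAT_JOURNAL_DEV",
    0x00000010: "EXT2_FEATURE_INCOMPAT_META_BG",
    0x00000040: "EXT4_FEATURE_INCOMPAT_EXTENTS",
    0x00000080: "EXT4_FEATURE_INCOMPAT_64BIT",
    0x00000100: "EXT4_FEATURE_INCOMPAT_MMP",
    0x00000200: "EXT4_FEATURE_INCOMPAT_FLEX_BG",
    0x00000400: "EXT4_FEATURE_INCOMPAT_EA_INODE",
    0x00001000: "EXT4_FEATURE_INCOMPAT_DIRDATA",
    0x00002000: "EXT4_FEATURE_INCOMPAT_CSUM_SEED",
    0x00004000: "EXT4_FEATURE_INCOMPAT_LARGEDIR",
    0x00008000: "EXT4_FEATURE_INCOMPAT_INLINE_DATA",
    0x00010000: "EXT4_FEATURE_INCOMPAT_ENCRYPT",
    0x00020000: "EXT4_FEATURE_INCOMPAT_CASEFOLD",
}


def decode_incompatible_feature_flags(flags: int):
    """Returns the names of the known flags and a mask of unknown bits."""
    names = []
    unknown_bits = flags
    for value, name in INCOMPATIBLE_FEATURE_FLAGS.items():
        if flags & value:
            names.append(name)
            unknown_bits &= ~value
    return names, unknown_bits
```

A non-zero mask of unknown bits is a reasonable signal for a read-only implementation to treat the file system as unsupported.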

Read-only compatible feature flags

Value	Identifier	Description
0x00000001	EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER	Has sparse superblocks and group descriptor tables. If set a superblock is stored in block groups 0, 1 and those that are powers of 3, 5 and 7. If not set a superblock is stored in every block group
0x00000002	EXT2_FEATURE_RO_COMPAT_LARGE_FILE	Contains large files
0x00000004	EXT2_FEATURE_RO_COMPAT_BTREE_DIR	Intended for hash-tree directory (or directory B-tree), which is not yet implemented
0x00000008	EXT4_FEATURE_RO_COMPAT_HUGE_FILE	Has huge file support
0x00000010	EXT4_FEATURE_RO_COMPAT_GDT_CSUM	Has group descriptors with checksums
0x00000020	EXT4_FEATURE_RO_COMPAT_DIR_NLINK	The ext3 32000 subdirectory limit does not apply. A directory's number of links will be set to 1 if it is incremented past 64999
0x00000040	EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE	Has large inodes. The size of an inode can be larger than 128 bytes
0x00000080	EXT4_FEATURE_RO_COMPAT_HAS_SNAPSHOT	Has snapshots, which is not yet implemented and intended for a future file system snapshot feature
0x00000100	EXT4_FEATURE_RO_COMPAT_QUOTA	Quota is handled transactionally with the journal
0x00000200	EXT4_FEATURE_RO_COMPAT_BIGALLOC	Has big block allocation bitmaps. Block allocation bitmaps are tracked in units of clusters (of blocks) instead of blocks
0x00000400	EXT4_FEATURE_RO_COMPAT_METADATA_CSUM	File system metadata has checksums
0x00000800	EXT4_FEATURE_RO_COMPAT_REPLICA	Supports replicas
0x00001000	EXT4_FEATURE_RO_COMPAT_READONLY	Read-only file system image
0x00002000	EXT4_FEATURE_RO_COMPAT_PROJECT	File system tracks project quotas
0x00004000	EXT4_FEATURE_RO_COMPAT_SHARED_BLOCKS	File system has (read-only) shared blocks
0x00008000	EXT4_FEATURE_RO_COMPAT_VERITY	Unknown (Verity inodes may be present on the filesystem)
0x00010000	EXT4_FEATURE_RO_COMPAT_ORPHAN_PRESENT	Orphan file may be non-empty

EXT2_FEATURE_RO_COMPAT_, EXT3_FEATURE_RO_COMPAT_, EXT4_FEATURE_RO_COMPAT_ and RO_COMPAT_ are used interchangeably.

Note that in some ext file systems used by ChromeOS it has been observed that the upper 8-bits of the read-only compatible feature flags are set as in 0xff000003. debugfs identifies these as FEATURE_R24 - FEATURE_R31.
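The block group selection described for EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER can be sketched as follows. The function names are hypothetical.

```python
def is_power_of(value: int, base: int) -> bool:
    """Determines if value equals base ^ n for some n >= 0."""
    while value > 1 and value % base == 0:
        value //= base
    return value == 1


def group_has_superblock(group_number: int, sparse_super: bool) -> bool:
    """Determines if a block group stores a superblock (copy)."""
    if not sparse_super or group_number in (0, 1):
        return True
    return any(is_power_of(group_number, base) for base in (3, 5, 7))
```

For the first 30 block groups of a file system with sparse superblocks this selects the groups 0, 1, 3, 5, 7, 9, 25 and 27.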

Checksum types

Value	Identifier	Description
1	EXT4_CRC32C_CHKSUM	CRC-32C (or CRC32-C), which uses the Castagnoli polynomial (0x1edc6f41)

The group descriptor table

The group descriptor table is stored in the block following the superblock.

The group descriptor table consists of:

one or more group descriptors

The ext2 and ext3 group descriptor

The ext2 and ext3 group descriptor is 32 bytes in size and consists of:

Offset	Size	Description
0	4	Block bitmap block number. The block number is relative from the start of the volume
4	4	Inode bitmap block number. The block number is relative from the start of the volume
8	4	Inode table block number. The block number is relative from the start of the volume
12	2	Number of unallocated blocks
14	2	Number of unallocated inodes
16	2	Number of directories
18	2	Unknown (padding)
20	3 x 4	Unknown (reserved)

Note that it has been observed that implementations that support ext4 can set a value in the padding. It is currently assumed that this value contains block group flags.

The ext4 group descriptor

The ext4 group descriptor is 64 bytes in size and consists of:

Offset	Size	Description
0	4	Block bitmap block number, which contains the lower 32-bit of the value. The block number is relative from the start of the volume
4	4	Inode bitmap block number, which contains the lower 32-bit of the value. The block number is relative from the start of the volume
8	4	Inode table block number, which contains the lower 32-bit of the value. The block number is relative from the start of the volume
12	2	Number of unallocated blocks, which contains the lower 16-bit of the value
14	2	Number of unallocated inodes, which contains the lower 16-bit of the value
16	2	Number of directories, which contains the lower 16-bit of the value
18	2	Block group flags
20	4	Exclude bitmap block number, which contains the lower 32-bit of the value. The block number is relative from the start of the volume
24	2	Block bitmap checksum, which contains the lower 16-bit of the value
26	2	Inode bitmap checksum, which contains the lower 16-bit of the value
28	2	Number of unused inodes, which contains the lower 16-bit of the value
30	2	Checksum
If 64-bit support (EXT4_FEATURE_INCOMPAT_64BIT) is enabled and group descriptor size > 32
32	4	Block bitmap block number, which contains the upper 32-bit of the value. The block number is relative from the start of the volume
36	4	Inode bitmap block number, which contains the upper 32-bit of the value. The block number is relative from the start of the volume
40	4	Inode table block number, which contains the upper 32-bit of the value. The block number is relative from the start of the volume
44	2	Number of unallocated blocks, which contains the upper 16-bit of the value
46	2	Number of unallocated inodes, which contains the upper 16-bit of the value
48	2	Number of directories, which contains the upper 16-bit of the value
50	2	Number of unused inodes, which contains the upper 16-bit of the value
52	4	Exclude bitmap block number, which contains the upper 32-bit of the value. The block number is relative from the start of the volume
56	2	Block bitmap checksum, which contains the upper 16-bit of the value
58	2	Inode bitmap checksum, which contains the upper 16-bit of the value
60	4	Unknown (padding)

If checksum type is CRC-32C, the checksum is stored as the lower 16-bits of 0xffffffff - CRC-32C, otherwise the checksum is stored as a CRC-16.
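To illustrate how the lower and upper parts of the ext4 group descriptor values combine, the following is a minimal sketch that only reads the block numbers and counters at offsets 0 to 17 and 32 to 49. It assumes little-endian integers and the function name and dictionary keys are hypothetical.

```python
import struct


def parse_ext4_group_descriptor(data: bytes, has_64bit_support: bool) -> dict:
    """Reads the block numbers and counters of a group descriptor (sketch)."""
    names = (
        "block_bitmap_block_number",
        "inode_bitmap_block_number",
        "inode_table_block_number",
        "number_of_unallocated_blocks",
        "number_of_unallocated_inodes",
        "number_of_directories",
    )
    # Assumption: integer values are stored in little-endian.
    lower_values = struct.unpack_from("<IIIHHH", data, 0)
    upper_values = (0, 0, 0, 0, 0, 0)
    if has_64bit_support and len(data) > 32:
        upper_values = struct.unpack_from("<IIIHHH", data, 32)

    values = {}
    for index, name in enumerate(names):
        # The first 3 values are 32-bit parts, the last 3 are 16-bit parts.
        shift = 32 if index < 3 else 16
        values[name] = (upper_values[index] << shift) | lower_values[index]
    return values
```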

Checksum calculation

If checksum type is CRC-32C, the CRC32-C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the checksum.

The checksum is calculated over:

the 16 byte file system identifier in the superblock
the group number as a 32-bit little-endian integer
the data of the group descriptor with the checksum set to 0-byte values
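The CRC-32C variant of this calculation can be sketched as follows. The bit-wise `crc32c` helper is included only to keep the example self-contained. The sketch assumes that "initial value of 0" refers to the zlib-style convention, where the function inverts the value before and after processing, so that calculating over the concatenated data equals chaining the individual calculations. The function names are hypothetical.

```python
import struct


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bit-wise CRC-32C, zlib-style: pass a previous result to continue."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # 0x82f63b78 is the bit-reflected form of 0x1edc6f41.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def group_descriptor_checksum(
    file_system_identifier: bytes, group_number: int, descriptor: bytes
) -> int:
    """Calculates the 16-bit value as stored at offset 30 (CRC-32C case)."""
    zeroed_descriptor = bytearray(descriptor)
    zeroed_descriptor[30:32] = b"\x00\x00"  # checksum set to 0-byte values

    data = (
        file_system_identifier
        + struct.pack("<I", group_number)
        + bytes(zeroed_descriptor)
    )
    return (0xFFFFFFFF - crc32c(data)) & 0xFFFF
```

The CRC-16 variant, used when the checksum type is not CRC-32C, is not covered by this sketch.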

TODO: describe the block bitmap checksum calculation: crc32c(s_uuid+grp_num+bbitmap)

TODO: describe the inode bitmap checksum calculation: crc32c(s_uuid+grp_num+ibitmap)

Block group flags

Value	Identifier	Description
0x0001	EXT4_BG_INODE_UNINIT	The inode table and bitmap are not initialized
0x0002	EXT4_BG_BLOCK_UNINIT	The block bitmap is not initialized
0x0004	EXT4_BG_INODE_ZEROED	The inode table is filled with 0

Direct and indirect blocks

Direct blocks are blocks that are part of the data stream of a file entry.

A direct block number of 0 that is part of the data stream represents a sparse data block.

Indirect blocks are blocks that refer to blocks containing direct or indirect block numbers. There are multiple levels of indirect blocks:

indirect blocks (level 1), that refer to direct blocks
double indirect blocks (level 2), that refer to indirect blocks
triple indirect blocks (level 3), that refer to double indirect blocks

An indirect block number of 0 that is part of the data stream represents sparse data blocks.
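To make the levels of indirection more concrete, the following sketch maps a logical block index within a file to the indirection level needed to reach it. It assumes 12 direct block numbers in the inode and 32-bit block numbers, so that an indirect block holds block size / 4 block numbers. The function name is hypothetical.

```python
def locate_block(logical_block: int, block_size: int):
    """Returns (level, indexes) for a logical block index within a file.

    Level 0 is a direct block, levels 1 to 3 are (double, triple) indirect.
    The indexes are the positions to follow at each level.
    """
    # Assumption: block numbers are 32-bit, hence 4 bytes per entry.
    entries_per_block = block_size // 4

    if logical_block < 12:
        return 0, [logical_block]

    logical_block -= 12
    if logical_block < entries_per_block:
        return 1, [logical_block]

    logical_block -= entries_per_block
    if logical_block < entries_per_block**2:
        return 2, [
            logical_block // entries_per_block,
            logical_block % entries_per_block,
        ]

    logical_block -= entries_per_block**2
    if logical_block < entries_per_block**3:
        return 3, [
            logical_block // entries_per_block**2,
            (logical_block // entries_per_block) % entries_per_block,
            logical_block % entries_per_block,
        ]
    raise ValueError("logical block index out of bounds")
```

If a block number of 0 is encountered at any level while following the indexes, the remaining range is sparse.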

Extents

Extents were introduced in ext4 and are controlled by EXT4_FEATURE_INCOMPAT_EXTENTS.

Extents form an extent B-tree, where:

extent indexes are stored in the branch nodes and
extent descriptors are stored in the leaf nodes.

An extents B-tree node consists of:

extents header
extents entries
extents footer

Note that inodes can have an implicit last sparse extent if the inode data size is greater than the total data size defined by the extent descriptors.

The ext4 extents header

The ext4 extents header (ext4_extent_header) is 12 bytes in size and consists of:

Offset	Size	Value	Description
0	2	"\x0a\xf3"	Signature
2	2		Number of entries
4	2		Maximum number of entries
6	2		Depth, where 0 represents a leaf node and 1 to 5 different levels of branch nodes
8	4		Generation, which is used by Lustre, but not by standard ext4
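Reading the extents header can be sketched as follows, assuming little-endian integers. The class and function names are hypothetical.

```python
import struct
from typing import NamedTuple


class ExtentsHeader(NamedTuple):
    number_of_entries: int
    maximum_number_of_entries: int
    depth: int
    generation: int


def parse_extents_header(data: bytes) -> ExtentsHeader:
    """Reads the 12 byte extents header (sketch)."""
    if len(data) < 12:
        raise ValueError("extents header data too small")
    if data[0:2] != b"\x0a\xf3":
        raise ValueError("unsupported extents header signature")

    # Assumption: integer values are stored in little-endian.
    header = ExtentsHeader(*struct.unpack_from("<HHHI", data, 2))
    if header.number_of_entries > header.maximum_number_of_entries:
        raise ValueError("number of entries exceeds maximum")
    return header
```

A depth of 0 indicates that the entries that follow are extent descriptors, otherwise they are extent indexes.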

The ext4 extent descriptor

The ext4 extent descriptor (ext4_extent) is 12 bytes in size and consists of:

Offset	Size	Description
0	4	Logical block number
4	2	Number of blocks
6	2	Upper 16-bits of physical block number
8	4	Lower 32-bits of physical block number

If the number of blocks > 32768, the extent is considered “uninitialized”, which is (as far as currently known) comparable to the extent being sparse. The number of blocks of the sparse extent can be determined as follows:

sparse_number_of_blocks = number_of_blocks - 32768

Sparse extents can exist between the extent descriptors. In such a case the logical block number will not align with the information from the previous extent descriptors.

Note that the native Linux ext implementation expects the extents to be stored in order of logical block number.
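The extent descriptor and the "uninitialized" rule above can be sketched as follows, assuming little-endian integers. The class and function names are hypothetical.

```python
import struct
from typing import NamedTuple


class Extent(NamedTuple):
    logical_block_number: int
    physical_block_number: int
    number_of_blocks: int
    is_sparse: bool


def parse_extent_descriptor(data: bytes) -> Extent:
    """Reads a 12 byte extent descriptor (sketch)."""
    # Assumption: integer values are stored in little-endian.
    logical_block_number, number_of_blocks, physical_upper, physical_lower = (
        struct.unpack_from("<IHHI", data, 0)
    )
    is_sparse = number_of_blocks > 32768
    if is_sparse:
        number_of_blocks -= 32768

    return Extent(
        logical_block_number,
        (physical_upper << 32) | physical_lower,
        number_of_blocks,
        is_sparse,
    )
```

An implicit sparse extent between two descriptors could then be detected by comparing the logical block number of a descriptor with the sum of the logical block number and number of blocks of the previous one.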

The ext4 extents index

The ext4 extent index (ext4_extent_idx) is 12 bytes in size and consists of:

Offset	Size	Description
0	4	Logical block number, which contains the first logical block number of next depth extents block
4	4	Lower 32-bits of physical block number, which contains the block number of the next depth extents block
8	2	Upper 16-bits of physical block number, which contains the block number of the next depth extents block
10	2	Unknown (unused)

The ext4 extents footer

The ext4 extents footer (ext4_extent_tail) is 4 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Checksum of an extents block, which contains a CRC32

The inode

The size of the inode is defined in the superblock when dynamic inode information is present.

Note that the ext4 inode format can be used on an ext2 formatted file system. This was observed in combination with format revision 1 and inode size > 128 created by mkfs.ext2.

The ext2 inode

The ext2 inode is 128 bytes in size and consists of:

Offset	Size	Description
0	2	File mode, which contains file type and permissions
2	2	Lower 16-bits of owner (or user) identifier (UID)
4	4	Data size
8	4	(last) access time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
12	4	(last) inode change (or modification) time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
16	4	(last) content modification time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
20	4	Deletion time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
24	2	Lower 16-bits of group identifier (GID)
26	2	Number of (hard) links
28	4	Number of blocks
32	4	Flags
36	4	Unknown (reserved)
40	12 x 4	Array of direct block numbers. A block number is relative from the start of the volume
88	4	Indirect block number. A block number is relative from the start of the volume
92	4	Double indirect block number. A block number is relative from the start of the volume
96	4	Triple indirect block number. A block number is relative from the start of the volume
100	4	NFS generation number
104	4	File ACL (or extended attributes) block number
108	4	Unknown (Directory ACL)
112	4	Fragment block address
116	1	Fragment block index
117	1	Fragment size
118	2	Unknown (padding)
120	2	Upper 16-bits of owner (or user) identifier (UID)
122	2	Upper 16-bits of group identifier (GID)
124	4	Unknown (reserved)

Note that for a character and block device the first 2 bytes of the array of direct block numbers contain the minor and major device number respectively.

If the inode size is larger than 128 bytes, the additional data can be stored using an ext4 inode extension.
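A minimal sketch of reading the ext2 inode, including combining the lower and upper 16 bits of the owner and group identifiers, could look as follows. It assumes little-endian integers and the function name and dictionary keys are hypothetical.

```python
import struct


def parse_ext2_inode(data: bytes) -> dict:
    """Reads a selection of values from a 128 byte ext2 inode (sketch)."""
    if len(data) < 128:
        raise ValueError("inode data too small")

    # Assumption: integer values are stored in little-endian.
    (
        file_mode,
        owner_identifier_lower,
        data_size,
        access_time,
        inode_change_time,
        modification_time,
        deletion_time,
        group_identifier_lower,
        number_of_links,
    ) = struct.unpack_from("<HHIIIIIHH", data, 0)

    direct_block_numbers = struct.unpack_from("<12I", data, 40)
    indirect_block_numbers = struct.unpack_from("<3I", data, 88)
    owner_identifier_upper, group_identifier_upper = struct.unpack_from(
        "<HH", data, 120
    )
    return {
        "file_mode": file_mode,
        "owner_identifier": (owner_identifier_upper << 16) | owner_identifier_lower,
        "group_identifier": (group_identifier_upper << 16) | group_identifier_lower,
        "data_size": data_size,
        "access_time": access_time,
        "inode_change_time": inode_change_time,
        "modification_time": modification_time,
        "deletion_time": deletion_time,
        "number_of_links": number_of_links,
        "direct_block_numbers": list(direct_block_numbers),
        "indirect_block_numbers": list(indirect_block_numbers),
    }
```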

The ext3 inode

The ext3 inode is 132 bytes in size and consists of:

Offset	Size	Description
0	2	File mode, which contains file type and permissions
2	2	Lower 16-bits of owner (or user) identifier (UID)
4	4	Data size
8	4	(last) access time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
12	4	(last) inode change (or modification) time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
16	4	(last) content modification time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
20	4	Deletion time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
24	2	Lower 16-bits of group identifier (GID)
26	2	Number of (hard) links
28	4	Number of blocks
32	4	Flags
36	4	Unknown (reserved)
40	12 x 4	Array of direct block numbers. A block number is relative from the start of the volume
88	4	Indirect block number. A block number is relative from the start of the volume
92	4	Double indirect block number. A block number is relative from the start of the volume
96	4	Triple indirect block number. A block number is relative from the start of the volume
100	4	NFS generation number
104	4	File ACL (or extended attributes) block number
108	4	Unknown (Directory ACL)
112	4	Fragment block address
116	1	Fragment block index
117	1	Fragment size
118	2	Unknown (padding)
120	2	Upper 16-bits of owner (or user) identifier (UID)
122	2	Upper 16-bits of group identifier (GID)
124	4	Unknown (reserved)
Extension (if inode size > 128)
128	2	Extended inode size
130	2	Unknown (padding)

Note that for a character and block device the first 2 bytes of the array of direct block numbers contain the minor and major device number respectively.

If the inode size is larger than 128 bytes, the additional data can be stored using an ext4 inode extension.

The ext4 inode

The ext4 inode is 160 bytes in size and consists of:

Offset	Size	Description
0	2	File mode, which contains file type and permissions
2	2	Lower 16-bits of owner (or user) identifier (UID)
4	4	Lower 32-bits of data size
If EXT4_EA_INODE_FL is not set
8	4	(last) access time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
12	4	(last) inode change (or modification) time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
16	4	(last) content modification time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
If EXT4_EA_INODE_FL is set
8	4	Unknown (extended attribute value data checksum)
12	4	Unknown (lower 32-bits of extended attribute reference count)
16	4	Unknown (inode number that owns the extended attribute)
Common
20	4	Deletion time, which contains the number of seconds since January 1, 1970 00:00:00 UTC (POSIX epoch)
24	2	Lower 16-bits of group identifier (GID)
26	2	Number of (hard) links
28	4	Lower 32-bits of number of blocks
32	4	Flags
If EXT4_EA_INODE_FL is not set
36	4	Lower 32-bits of version
If EXT4_EA_INODE_FL is set
36	4	Unknown (upper 32-bits of extended attribute reference count)
If EXT4_EXTENTS_FL and EXT4_INLINE_DATA_FL are not set
40	12 x 4	Array of direct block numbers. A block number is relative from the start of the volume
88	4	Indirect block number. A block number is relative from the start of the volume
92	4	Double indirect block number. A block number is relative from the start of the volume
96	4	Triple indirect block number. A block number is relative from the start of the volume
If EXT4_EXTENTS_FL is set
40	12	Extents header
52	4 x 12	Extent descriptors or extent indexes
If EXT4_INLINE_DATA_FL is set
40	60	File content data
Common
100	4	NFS generation number
104	4	Lower 32-bits of file ACL (or extended attributes) block number
108	4	Upper 32-bits of data size
112	4	Fragment block address
116	2	Upper 16-bits of number of blocks
118	2	Upper 16-bits of file ACL (or extended attributes) block number
120	2	Upper 16-bits of owner (or user) identifier (UID)
122	2	Upper 16-bits of group identifier (GID)
124	2	Lower 16-bits of checksum
126	2	Unknown (reserved)
Extension (if inode size > 128)
128	2	Extended inode size, which can vary, values of 4, 28 and 32 have been observed
130	2	Upper 16-bits of checksum
132	4	(last) inode change (or modification) time extra precision
136	4	(last) content modification time extra precision
140	4	(last) access time extra precision
144	4	Creation time
148	4	Creation time extra precision
152	4	Upper 32-bits of version
156	4	Unknown (i_projid)

If checksum type is CRC-32C, the checksum is stored as 0xffffffff - CRC-32C.

Note that for a character and block device the first 2 bytes of the array of direct block numbers contain the minor and major device number respectively.

Checksum calculation

If checksum type is CRC-32C, the CRC32-C algorithm with the Castagnoli polynomial (0x1edc6f41) and initial value of 0 is used to calculate the checksum.

The checksum is calculated from:

the 16 byte file system identifier in the superblock
the inode number as a 32-bit little-endian integer
the NFS generation number in the inode as a 32-bit little-endian integer
the data of the inode with the lower and upper part of the checksum set to 0-byte values.

Extra precision

The ext4 extra precision is 4 bytes in size and consists of:

Offset	Size	Value	Description
0.0	2 bits		Extra epoch value
0.2	30 bits		Fraction of second in nanoseconds

The 34-bit extra precision timestamp (in number of seconds) can be calculated as follows:

extra_precision_timestamp = (extra_epoch_value * 0x100000000) + timestamp
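The formula above can be sketched as follows. It assumes that bit offset 0.0 refers to the least-significant bits of the 32-bit extra precision value and it follows the formula as given, treating the timestamp as an unsigned 32-bit value. The function name is hypothetical.

```python
def decode_extra_precision(timestamp: int, extra_precision: int):
    """Returns (seconds, nanoseconds) for a timestamp and its extra precision."""
    # Assumption: the extra epoch value is stored in the lower 2 bits.
    extra_epoch_value = extra_precision & 0x3
    nanoseconds = extra_precision >> 2

    seconds = (extra_epoch_value * 0x100000000) + timestamp
    return seconds, nanoseconds
```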

Notes

It has been observed that when EXT4_EA_INODE_FL is set the (last) modification time can contain a valid timestamp.

According to the Linux kernel documentation:

For backward compatibility with older versions of this feature, the i_mtime/i_generation may store a back-reference to the inode number and i_generation of the one owning inode (in cases where the EA inode is not referenced by multiple inodes) to verify that the EA inode is the correct one being accessed.

File mode

Value	Identifier	Description
Access other, Bitmask: 0x0007 (S_IRWXO)
0x0001	S_IXOTH	X-access for other
0x0002	S_IWOTH	W-access for other
0x0004	S_IROTH	R-access for other
Access group, Bitmask: 0x0038 (S_IRWXG)
0x0008	S_IXGRP	X-access for group
0x0010	S_IWGRP	W-access for group
0x0020	S_IRGRP	R-access for group
Access owner (or user), Bitmask: 0x01c0 (S_IRWXU)
0x0040	S_IXUSR	X-access for owner (or user)
0x0080	S_IWUSR	W-access for owner (or user)
0x0100	S_IRUSR	R-access for owner (or user)
Other
0x0200	S_ISTXT	Sticky bit
0x0400	S_ISGID	Set group identifier (GID) on execution
0x0800	S_ISUID	Set owner (or user) identifier (UID) on execution
Type of file, Bitmask: 0xf000 (S_IFMT)
0x1000	S_IFIFO	Named pipe (FIFO)
0x2000	S_IFCHR	Character device
0x4000	S_IFDIR	Directory
0x6000	S_IFBLK	Block device
0x8000	S_IFREG	Regular file
0xa000	S_IFLNK	Symbolic link
0xc000	S_IFSOCK	Socket
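As an illustration of the bitmasks in the table above, the following sketch formats a file mode in the style of "ls -l". For brevity it ignores the sticky, set GID and set UID bits. The function name is hypothetical.

```python
FILE_TYPE_CHARACTERS = {
    0x1000: "p",  # S_IFIFO
    0x2000: "c",  # S_IFCHR
    0x4000: "d",  # S_IFDIR
    0x6000: "b",  # S_IFBLK
    0x8000: "-",  # S_IFREG
    0xA000: "l",  # S_IFLNK
    0xC000: "s",  # S_IFSOCK
}


def format_file_mode(file_mode: int) -> str:
    """Formats file type and permissions, for example "-rw-r--r--"."""
    result = FILE_TYPE_CHARACTERS.get(file_mode & 0xF000, "?")
    # Owner (or user), group and other access, from high to low bits.
    for shift in (6, 3, 0):
        access = (file_mode >> shift) & 0x7
        result += "r" if access & 0x4 else "-"
        result += "w" if access & 0x2 else "-"
        result += "x" if access & 0x1 else "-"
    return result
```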

Inode flags

Value	Identifier	Description
0x00000001	EXT2_SECRM_FL, EXT3_SECRM_FL, EXT4_SECRM_FL, EXT4_INODE_SECRM	Secure deletion
0x00000002	EXT2_UNRM_FL, EXT3_UNRM_FL, EXT4_UNRM_FL, EXT4_INODE_UNRM	Undelete
0x00000004	EXT2_COMPR_FL, EXT3_COMPR_FL, EXT4_COMPR_FL, EXT4_INODE_COMPR	Compressed file, which is not yet implemented
0x00000008	EXT2_SYNC_FL, EXT3_SYNC_FL, EXT4_SYNC_FL, EXT4_INODE_SYNC	Synchronous updates
0x00000010	EXT2_IMMUTABLE_FL, EXT3_IMMUTABLE_FL, EXT4_IMMUTABLE_FL, EXT4_INODE_IMMUTABLE	Immutable file
0x00000020	EXT2_APPEND_FL, EXT3_APPEND_FL, EXT4_APPEND_FL, EXT4_INODE_APPEND	Writes to file may only append
0x00000040	EXT2_NODUMP_FL, EXT3_NODUMP_FL, EXT4_NODUMP_FL, EXT4_INODE_NODUMP	Do not remove (or dump) file
0x00000080	EXT2_NOATIME_FL, EXT3_NOATIME_FL, EXT4_NOATIME_FL, EXT4_INODE_NOATIME	Do not update access time (atime)
0x00000100	EXT2_DIRTY_FL, EXT3_DIRTY_FL, EXT4_DIRTY_FL, EXT4_INODE_DIRTY	Dirty compressed file, which is not yet implemented
0x00000200	EXT2_COMPRBLK_FL, EXT3_COMPRBLK_FL, EXT4_COMPRBLK_FL, EXT4_INODE_COMPRBLK	One or more compressed clusters, which is not yet implemented
0x00000400	EXT2_NOCOMP_FL, EXT3_NOCOMP_FL, EXT4_NOCOMPR_FL, EXT4_INODE_NOCOMPR	Do not compress, which is not yet implemented
ext2 and ext3
0x00000800	EXT2_ECOMPR_FL, EXT3_ECOMPR_FL	Encrypted Compression error
ext4
0x00000800	EXT4_ENCRYPT_FL, EXT4_INODE_ENCRYPT	Encrypted file
Common
0x00001000	EXT2_BTREE_FL, EXT2_INDEX_FL, EXT3_INDEX_FL, EXT4_INDEX_FL, EXT4_INODE_INDEX	Hash-indexed directory (previously referred to as B-tree format)
0x00002000	EXT2_IMAGIC_FL, EXT3_IMAGIC_FL, EXT4_IMAGIC_FL, EXT4_INODE_IMAGIC	AFS directory
0x00004000	EXT2_JOURNAL_DATA_FL, EXT3_JOURNAL_DATA_FL, EXT4_JOURNAL_DATA_FL, EXT4_INODE_JOURNAL_DATA	File data must be written using the journal
0x00008000	EXT2_NOTAIL_FL, EXT3_NOTAIL_FL, EXT4_NOTAIL_FL, EXT4_INODE_NOTAIL	File tail should not be merged, which is not used by ext4
0x00010000	EXT2_DIRSYNC_FL, EXT3_DIRSYNC_FL, EXT4_DIRSYNC_FL, EXT4_INODE_DIRSYNC	Directory entries should be written synchronously (dirsync)
0x00020000	EXT2_TOPDIR_FL, EXT3_TOPDIR_FL, EXT4_TOPDIR_FL, EXT4_INODE_TOPDIR	Top of directory hierarchy
ext4
0x00040000	EXT4_HUGE_FILE_FL, EXT4_INODE_HUGE_FILE	Is a huge file
0x00080000	EXT4_EXTENTS_FL, EXT4_INODE_EXTENTS	Inode uses extents
0x00100000	EXT4_INODE_VERITY	Verity protected inode
0x00200000	EXT4_EA_INODE_FL, EXT4_INODE_EA_INODE	Inode used for large extended attribute
0x00400000	EXT4_EOFBLOCKS_FL, EXT4_INODE_EOFBLOCKS	Blocks allocated beyond EOF

0x01000000	EXT4_SNAPFILE_FL	Inode is a snapshot
0x02000000	EXT4_INODE_DAX	Inode is direct-access (DAX)
0x04000000	EXT4_SNAPFILE_DELETED_FL	Snapshot is being deleted
0x08000000	EXT4_SNAPFILE_SHRUNK_FL	Snapshot shrink has completed
0x10000000	EXT4_INLINE_DATA_FL, EXT4_INODE_INLINE_DATA	Inode has inline data
0x20000000	EXT4_PROJINHERIT_FL, EXT4_INODE_PROJINHERIT	Create sub file entries with the same project identifier
0x40000000	EXT4_INODE_CASEFOLD	Casefolded directory
0x80000000	EXT4_INODE_RESERVED	Unknown (reserved)

Reserved inode numbers

Value	Identifier	Description
1	EXT2_BAD_INO, EXT3_BAD_INO, EXT4_BAD_INO	Bad blocks inode
2	EXT2_ROOT_INO, EXT3_ROOT_INO, EXT4_ROOT_INO	Root inode
3	EXT4_USR_QUOTA_INO	Owner (or user) quota inode
4	EXT4_GRP_QUOTA_INO	Group quota inode
5	EXT2_BOOT_LOADER_INO, EXT3_BOOT_LOADER_INO, EXT4_BOOT_LOADER_INO	Boot loader inode
6	EXT2_UNDEL_DIR_INO, EXT3_UNDEL_DIR_INO, EXT4_UNDEL_DIR_INO	Undelete directory inode
7	EXT3_RESIZE_INO, EXT4_RESIZE_INO	Reserved group descriptors inode
8	EXT3_JOURNAL_INO, EXT4_JOURNAL_INO	Journal inode

Inline data

ext4 supports storing file entry data inline when the inode flag EXT4_INLINE_DATA_FL is set.

Note that inodes can have an implicit last sparse extent if the inode data size is greater than 60 bytes.

Huge files

TODO: complete section

Directory entries

Directory entries are stored in the data blocks of a directory inode. The directory entries can be stored in multiple ways:

as linear directory entries
as inline data directory entries
as hash-tree directory entries

Linear directory entries

Linear directory entries are stored in a series of allocation blocks.

Linear directory entries contain:

directory entry for “.” (self)
directory entry for “..” (parent)
directory entry for other file system entries

The directory entry

The directory entry is of variable size, at most 263 bytes, and consists of:

Offset	Size	Description
0	4	Inode number
4	2	Directory entry size, which must be a multiple of 4
6	1	Name size, which contains the size of the name without the end-of-string character and has a maximum of 255
7	1	File type
8	...	Name, which contains a narrow character string without end-of-string character

Older directory entry structures considered the name size a 16-bit value, but the upper byte was never used.

The name can contain any character value except the path segment separator (‘/’) and the NUL-character (‘\0’).
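The layout above can be read with a few lines of code. The following is a minimal sketch, assuming little-endian byte order as used elsewhere in ext; the names DirectoryEntry and parse_directory_entry are illustrative and not part of any existing implementation.

```python
import struct
from typing import NamedTuple


class DirectoryEntry(NamedTuple):
    """Hypothetical container for the values of one directory entry."""
    inode_number: int
    entry_size: int
    file_type: int
    name: bytes


def parse_directory_entry(data: bytes, offset: int = 0) -> DirectoryEntry:
    """Parses a single linear directory entry starting at offset."""
    # Inode number (4), entry size (2), name size (1) and file type (1),
    # assumed to be stored little-endian.
    inode_number, entry_size, name_size, file_type = struct.unpack_from(
        "<IHBB", data, offset)

    # The entry must be able to hold its own name and be 4-byte aligned.
    if entry_size < 8 + name_size or entry_size % 4 != 0:
        raise ValueError("invalid directory entry size")

    name = data[offset + 8:offset + 8 + name_size]
    return DirectoryEntry(inode_number, entry_size, file_type, name)
```

The next entry in a block starts at the current offset plus the entry size, so a caller can walk a block by repeatedly adding entry_size to the offset.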

File types

Value	Identifier	Description
0	EXT2_FT_UNKNOWN	Unknown
1	EXT2_FT_REG_FILE	Regular file
2	EXT2_FT_DIR	Directory
3	EXT2_FT_CHRDEV	Character device
4	EXT2_FT_BLKDEV	Block device
5	EXT2_FT_FIFO	FIFO queue
6	EXT2_FT_SOCK	Socket
7	EXT2_FT_SYMLINK	Symbolic link

Inline data directory entries

ext4 supports storing the directory entries as inline data when the inode flag EXT4_INLINE_DATA_FL is set.

The inline data directory entries are of variable size, at most 60 bytes, and consist of:

Offset	Size	Value	Description
0	4		Parent inode number
4	...		Array of directory entries

Hash tree directory entries

The data of the hash tree (HTree) is stored in the data blocks or extents defined by the directory inode. The hash-indexed directory entries are read-compatible with the linear directory entry.

Hash tree root

The hash tree root consists of:

dx_root
- directory entry for “.” (self)
- directory entry for “..” (parent)
- dx_root_info
- Array of dx_entry
directory entry for other file system entries

dx_root_info

Offset	Size	Value	Description
0	4	0	Unknown (reserved)
4	1		Hash method (or version)
5	1	8	Root information size
6	1		Number of indirect levels in the hash tree
7	1		Unknown (unused flags)

dx_entry

TODO: complete section

struct dx_entry
{
        __le32 hash;
        __le32 block;
};

Symbolic links

If the target path of a symbolic link is less than 60 characters long, it is stored in the 60 bytes in the inode that are normally used for the 12 direct and 3 indirect block numbers. If the target path is longer than 60 characters, a block is allocated, and the block contains the target path. The inode data size contains the length of the target path.

Extended attributes

Extended attributes can be stored:

in the inode block after the inode data
in the block referenced by the file ACL (or extended attributes) block number, if not 0

Note that both should be read to get all the extended attributes.

Extended attributes consist of:

An extended attributes header
Extended attributes entries with a terminator

The extended attributes inode header

The extended attributes inode header (ext2_xattr_ibody_header, ext3_xattr_ibody_header, ext4_xattr_ibody_header) is 4 bytes in size and consists of:

Offset	Size	Value	Description
0	4	"\x00\x00\x02\xea"	Signature

The extended attributes block header

The ext2 and ext3 extended attributes block header

The ext2 and ext3 extended attributes block header (ext2_xattr_header, ext3_xattr_header) is 32 bytes in size and consists of:

Offset	Size	Value	Description
0	4	"\x00\x00\x02\xea"	Signature
4	4		Unknown (reference count)
8	4		Number of blocks
12	4		Attributes hash
16	4 x 4		Unknown (reserved)

The ext4 extended attributes block header

The ext4 extended attributes block header (ext4_xattr_header) is 32 bytes in size and consists of:

Offset	Size	Value	Description
0	4	"\x00\x00\x02\xea"	Signature
4	4		Unknown (Reference count)
8	4		Number of blocks
12	4		Attributes hash
16	4		Checksum
20	3 x 4		Unknown (reserved)

The extended attributes entry

The extended attributes entry (ext2_xattr_entry, ext3_xattr_entry, ext4_xattr_entry) is of variable size and consists of:

Offset	Size	Description
0	1	Name size, which contains the size of the name without the end-of-string character
1	1	Name index
2	2	Value data offset, which contains the offset of the value data relative from the start of the extended attributes block or after the extended attributes signature in the inode block data
4	4	Value data inode number, which contains the inode number that contains the value data or 0 to indicate the current block
8	4	Value data size
12	4	Unknown (Attribute hash)
16	...	Name string, which contains an ASCII string without end-of-string character and can be empty, for example in combination with a prefix or with an encrypted file
...	...	32-bit alignment padding

The last extended attributes entry has the first 4 values set to 0 (8 bytes) and is used as a terminator.

Note that some implementations of older Android versions of ext appear to only set the first 4 bytes to 0 for the terminator.

The extended attribute name index

The name index indicates the prefix of the extended attribute name.

Name index	Name prefix	Description
0	""	No prefix
1	"user."
2	"system.posix_acl_access"
3	"system.posix_acl_default"
4	"trusted."

6	"security."
7	"system."
8	"system.richacl"
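The entry layout, the terminator and the name index table can be combined into a small reader. This is a sketch under a few assumptions: values are little-endian, a terminator is detected by its first 4 bytes being 0 (which also covers the older Android behaviour noted above), and an unknown name index is treated as having no prefix. The names ExtendedAttributeEntry and parse_entries are hypothetical.

```python
import struct
from typing import Iterator, NamedTuple

# Name prefixes per name index, taken from the table above.
NAME_PREFIXES = {
    0: "",
    1: "user.",
    2: "system.posix_acl_access",
    3: "system.posix_acl_default",
    4: "trusted.",
    6: "security.",
    7: "system.",
    8: "system.richacl",
}


class ExtendedAttributeEntry(NamedTuple):
    """Hypothetical container for one extended attributes entry."""
    name: str
    value_offset: int
    value_inode_number: int
    value_size: int


def parse_entries(data: bytes, offset: int) -> Iterator[ExtendedAttributeEntry]:
    """Yields entries starting at offset until the terminator is found."""
    while offset + 4 <= len(data):
        # Assumption: the first 4 bytes being 0 marks the terminator.
        if data[offset:offset + 4] == b"\x00\x00\x00\x00":
            break
        if offset + 16 > len(data):
            raise ValueError("truncated extended attributes entry")

        (name_size, name_index, value_offset, value_inode_number,
         value_size, _attribute_hash) = struct.unpack_from(
             "<BBHIII", data, offset)

        name_data = data[offset + 16:offset + 16 + name_size]
        # Assumption: an unknown name index is treated as "no prefix".
        prefix = NAME_PREFIXES.get(name_index, "")
        name = prefix + name_data.decode("ascii", errors="replace")

        yield ExtendedAttributeEntry(
            name, value_offset, value_inode_number, value_size)

        # Skip the name and the 32-bit alignment padding.
        offset += (16 + name_size + 3) & ~3
```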

Journal

The journal was introduced in ext3.

TODO: complete section

Exclude bitmap

TODO: complete section

Note that the exclude bitmap is used for snapshots.

Corruption scenarios

File entry with invalid extents header signature

File content inaccessible but file entry metadata and extended attributes accessible.

References

ext4 Data Structures and Algorithms, by the Linux kernel documentation

Extensible File Allocation Table (exFAT) file system format

The Extensible File Allocation Table (exFAT) file system format is a successor of the File Allocation Table (FAT) file system format.

Overview

An exFAT file system consists of:

One or more reserved sectors
- a boot record (or boot sector)
One or more cluster block allocation tables
File and directory data

Characteristics

Characteristics	Description
Byte order	little-endian
Date and time values	FAT date and time
Character strings	UCS-2 little-endian, which allows for unpaired Unicode surrogates such as "U+d800" and "U+dc00"

Boot record

The boot record is stored in the first sector of the volume.

The boot record is at least 512 bytes in size and consists of:

Offset	Size	Value	Description
0	3	"\xeb\x76\x90"	Boot entry point (JMP +120, NOP)
3	8	"EXFAT\x20\x20\x20"	File system signature (or OEM name)
11	53	0	Unknown (reserved), which must be 0
64	8		Partition offset
72	8		Total number of sectors
80	4		Cluster block allocation table start sector
84	4		Cluster block allocation table size in number of sectors, which must be non 0
88	4		Data cluster start sector
92	4		Total number of data clusters
96	4		Root directory start cluster
100	4		Volume serial number
104	1		Format revision minor number
105	1	1	Format revision major number
106	2		Volume flags
108	1		Bytes per sector, which is stored as 2^n, for example 9 is 2^9 = 512. The bytes per sector value must be 512, 1024, 2048 or 4096
109	1		Sectors per cluster block, which is stored as 2^n, for example 3 is 2^3 = 8. The sectors per cluster block must be 1 up to 32M (2^25)
110	1		Number of cluster block allocation tables
111	1		Drive number
112	1		Unknown (percent in use), which contains the percentage of allocated cluster blocks in the cluster heap or 0xff if not available
113	7		Unknown (reserved)
120	390		Used for boot code
510	2	"\x55\xaa"	Sector signature
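As an illustration of the table above, the following sketch reads the fixed-offset values of an exFAT boot record and expands the 2^n encoded sizes. The function name and the dictionary keys are made up for this example; only the offsets, sizes and limits come from the table.

```python
import struct


def parse_exfat_boot_record(data: bytes) -> dict:
    """Parses the first 512 bytes of an exFAT boot record."""
    if len(data) < 512:
        raise ValueError("boot record data too small")
    if data[3:11] != b"EXFAT\x20\x20\x20":
        raise ValueError("unsupported file system signature")
    if data[510:512] != b"\x55\xaa":
        raise ValueError("unsupported sector signature")

    # Values at offsets 64 through 110, all little-endian.
    (partition_offset, total_sectors, table_start_sector, table_size,
     data_start_sector, total_data_clusters, root_directory_cluster,
     volume_serial_number, revision_minor, revision_major, volume_flags,
     bytes_per_sector_shift, sectors_per_cluster_shift,
     number_of_tables) = struct.unpack_from("<QQIIIIIIBBHBBB", data, 64)

    # Bytes per sector must be 512, 1024, 2048 or 4096.
    if not 9 <= bytes_per_sector_shift <= 12:
        raise ValueError("unsupported bytes per sector")
    # Sectors per cluster block must be 1 up to 2^25.
    if sectors_per_cluster_shift > 25:
        raise ValueError("unsupported sectors per cluster block")
    if table_size == 0:
        raise ValueError("unsupported allocation table size")

    bytes_per_sector = 1 << bytes_per_sector_shift
    sectors_per_cluster = 1 << sectors_per_cluster_shift
    return {
        "total_sectors": total_sectors,
        "table_start_sector": table_start_sector,
        "table_size": table_size,
        "data_start_sector": data_start_sector,
        "total_data_clusters": total_data_clusters,
        "root_directory_cluster": root_directory_cluster,
        "volume_flags": volume_flags,
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "cluster_size": bytes_per_sector * sectors_per_cluster,
    }
```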

Volume flags

Value	Identifier	Description
0x0001	ActiveFat	Active FAT, 0 for the first FAT, 1 for the second FAT
0x0002	VolumeDirty	Is dirty
0x0004	MediaFailure	Has media failures
0x0008	ClearToZero	Must be cleared
0xfff0		Unknown (reserved)

Cluster block allocation table

A cluster block allocation table consists of:

One or more cluster block allocation table entries

Cluster block allocation table entry

A cluster block allocation table entry is 32 bits in size and consists of:

Offset	Size	Value	Description
0	32 bits		Data cluster number

Where the data cluster number has the following meanings:

Value(s)	Description
0x00000000	Unused (free) cluster
0x00000001	Unknown (invalid)
0x00000002 - 0xffffffef	Used cluster
0xfffffff0 - 0xfffffff6	Reserved
0xfffffff7	Bad cluster
0xfffffff8 - 0xffffffff	End of cluster chain

References

exFAT file system specification, by Microsoft
exFAT, by Wikipedia

File Allocation Table (FAT) file system format

The File Allocation Table (FAT) is a widely used file system and is the default file system for DOS and Windows.

There are multiple known variants or derivatives of FAT, such as:

(original) 8-bit FAT
FAT-12
FAT-16
FAT-32
exFAT

Overview

A FAT file system consists of:

One or more reserved sectors
- a boot record (or boot sector)
- file system information for FAT-32
One or more cluster block allocation tables
Root directory data for FAT-12 and FAT-16
File and directory data

Note that FAT-32 stores the root directory as part of the file and directory data.

Characteristics

Characteristics	Description
Byte order	little-endian
Date and time values	FAT date and time
Character strings	A narrow character Single Byte Character (SBC) ASCII string

Terminology

Term	Description
Hidden sectors	The sectors stored before the FAT volume, such as those used to store a partition table

Determining the FAT format version

To distinguish between FAT-12, FAT-16 and FAT-32, compute the number of clusters in the data area:

data_area_size = total_number_of_sectors - (number_of_reserved_sectors + (
    number_of_allocation_tables * allocation_table_size) + size_of_root_directory)

number_of_clusters = round down (data_area_size / sectors_per_cluster)

FAT-12 is used if the number of clusters is less than 4085
FAT-16 is used if the number of clusters is less than 65525
FAT-32 is used otherwise
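The calculation above can be sketched as follows. The size of the root directory is derived here from the number of root directory entries, assuming 32 bytes per entry and rounding up to whole sectors; the function and parameter names are illustrative only.

```python
def determine_fat_format(
        total_number_of_sectors: int,
        number_of_reserved_sectors: int,
        number_of_allocation_tables: int,
        allocation_table_size: int,
        number_of_root_directory_entries: int,
        bytes_per_sector: int,
        sectors_per_cluster: int) -> str:
    """Determines the FAT format version from boot record values."""
    # Assumption: a root directory entry is 32 bytes in size and the
    # root directory occupies a whole number of sectors.
    size_of_root_directory = (
        number_of_root_directory_entries * 32 + bytes_per_sector - 1
    ) // bytes_per_sector

    data_area_size = total_number_of_sectors - (
        number_of_reserved_sectors
        + number_of_allocation_tables * allocation_table_size
        + size_of_root_directory)

    # Integer division rounds down.
    number_of_clusters = data_area_size // sectors_per_cluster

    if number_of_clusters < 4085:
        return "FAT-12"
    if number_of_clusters < 65525:
        return "FAT-16"
    return "FAT-32"
```

For a FAT-32 boot record the number of root directory entries is 0, so the root directory size evaluates to 0 sectors.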

Boot record

The boot record is stored in the first sector of the volume.

FAT-12 and FAT-16 boot record

The FAT-12 and FAT-16 boot record is at least 512 bytes in size and consists of:

Offset	Size	Value	Description
0	3	"\xeb\x3c\x90"	Boot entry point (JMP +62, NOP)
3	8		File system signature (or OEM name)
11	2		Bytes per sector, which must be 512, 1024, 2048 or 4096
13	1		Sectors per cluster block, which must be 1, 2, 4, 8, 16, 32, 64 or 128
14	2		Number of reserved sectors (reserved region), which starts at the first sector of the volume (sector 0) and must be 1 or more (typically 1 or 32)
16	1		Number of cluster block allocation tables, which must be 1 or more (typically 2)
17	2		Number of root directory entries
19	2		Total number of sectors (16-bit)
21	1		Media descriptor
22	2		Cluster block allocation table size (16-bit) in number of sectors
24	2		Number of sectors per track
26	2		Number of heads
28	4		Number of hidden sectors
32	4		Total number of sectors (32-bit)
36	1		Drive number
37	1	0	Unknown (reserved for Windows NT)
38	1		Extended boot signature
If extended boot signature == 0x29
39	4		Volume serial number, which can be derived from the system current date and time
43	11		Volume label, which contains a narrow character string or "NO\x20NAME\x20\x20\x20\x20" if not set
54	8	"FAT12\x20\x20\x20" or "FAT16\x20\x20\x20"	File system hint, which is informational and not required
If extended boot signature != 0x29
39	23		Unknown
Common
62	448		Used for boot code
510	2	"\x55\xaa"	Sector signature

Note that the sector signature must be set at offset 510 but in addition can be set in the last 2 bytes of the sector.

FAT-32 boot record

The FAT-32 boot record is at least 512 bytes in size and consists of:

Offset	Size	Value	Description
0	3	"\xeb\x58\x90"	Boot entry point (JMP +90, NOP)
3	8		File system signature (or OEM name)
11	2		Bytes per sector, which must be 512, 1024, 2048 or 4096
13	1		Sectors per cluster block, which must be 1, 2, 4, 8, 16, 32, 64 or 128
14	2		Number of reserved sectors (reserved region), which starts at the first sector of the volume (sector 0) and must be 1 or more (typically 1 or 32)
16	1		Number of cluster block allocation tables, which must be 1 or more (typically 2)
17	2	0	Number of root directory entries, which must be 0 for FAT-32
19	2	0	Total number of sectors (16-bit), which must be 0 for FAT-32
21	1		Media descriptor
22	2	0	Cluster block allocation table size (16-bit) in number of sectors, which must be 0 for FAT-32
24	2		Number of sectors per track
26	2		Number of heads
28	4		Number of hidden sectors
32	4		Total number of sectors (32-bit)
36	4		Cluster block allocation table size (32-bit) in number of sectors, which must be non 0 for FAT-32
40	2		Extended flags
42	1	0	Format revision minor number
43	1	0	Format revision major number
44	4		Root directory start cluster
48	2		File system information (FSINFO) sector number
50	2		Boot record sector number
52	12	0	Unknown (reserved)
64	1		Drive number
65	1	0	Unknown (reserved for Windows NT)
66	1		Extended boot signature
If extended boot signature == 0x29
67	4		Volume serial number, which can be derived from the system current date and time
71	11		Volume label, which contains a narrow character string or "NO\x20NAME\x20\x20\x20\x20" if not set
82	8	"FAT32\x20\x20\x20"	File system hint, which is informational and not required
If extended boot signature != 0x29
67	23		Unknown
Common
90	420		Used for boot code
510	2	"\x55\xaa"	Sector signature

Note that the sector signature must be set at offset 510 but in addition can be set in the last 2 bytes of the sector.

OEM names

Value	Description
"MSWIN4.1"
"MSDOS 5.0"

Media descriptors

Value	Identifier	Description
0xe5

0xed
0xee
0xef
0xf0		removable media

0xf4
0xf5

0xf8		fixed (non-removable) media
0xf9
0xfa
0xfb
0xfc
0xfd
0xfe
0xff

Cluster block allocation table

A cluster block allocation table consists of:

One or more cluster block allocation table entries

FAT 12 cluster block allocation table entry

A FAT 12 cluster block allocation table entry is 12 bits in size and consists of:

Offset	Size	Value	Description
0	12 bits		Data cluster number

Where the data cluster number has the following meanings:

Value(s)	Description
0x000	Unused (free) cluster
0x001	Unknown (invalid)
0x002 - 0xfef	Used cluster
0xff0 - 0xff6	Reserved
0xff7	Bad cluster
0xff8 - 0xfff	End of cluster chain
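Because a FAT 12 entry is 12 bits in size, two entries share 3 bytes. The sketch below assumes the commonly documented packing, where the entry of an even cluster number uses the lower 12 bits of a little-endian 16-bit value and the entry of an odd cluster number the upper 12 bits. This packing is not described in this section and the function names are hypothetical.

```python
def read_fat12_entry(table: bytes, cluster_number: int) -> int:
    """Reads the 12-bit entry of cluster_number from a FAT 12 table."""
    # Every entry occupies 1.5 bytes.
    offset = cluster_number + cluster_number // 2
    if cluster_number < 0 or offset + 2 > len(table):
        raise ValueError("cluster number out of bounds")

    pair = int.from_bytes(table[offset:offset + 2], "little")
    if cluster_number % 2:
        # Odd cluster numbers use the upper 12 bits.
        return pair >> 4
    # Even cluster numbers use the lower 12 bits.
    return pair & 0x0fff


def is_end_of_chain(entry: int) -> bool:
    """Checks an entry against the end of cluster chain range."""
    return 0xff8 <= entry <= 0xfff
```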

FAT 16 cluster block allocation table entry

A FAT 16 cluster block allocation table entry is 16 bits in size and consists of:

Offset	Size	Value	Description
0	16 bits		Data cluster number

Where the data cluster number has the following meanings:

Value(s)	Description
0x0000	Unused (free) cluster
0x0001	Unknown (invalid)
0x0002 - 0xffef	Used cluster
0xfff0 - 0xfff6	Reserved
0xfff7	Bad cluster
0xfff8 - 0xffff	End of cluster chain

FAT 32 cluster block allocation table entry

A FAT 32 cluster block allocation table entry is 32 bits in size and consists of:

Offset	Size	Value	Description
0	32 bits		Data cluster number

Note that only the lower 28 bits are used.

Where the data cluster number has the following meanings:

Value(s)	Description
0x00000000	Unused (free) cluster
0x00000001	Unknown (invalid)
0x00000002 - 0x0fffffef	Used cluster
0x0ffffff0 - 0x0ffffff6	Reserved
0x0ffffff7	Bad cluster
0x0ffffff8 - 0x0fffffff	End of cluster chain
0x10000000 - 0xffffffff	Unknown
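A reader can apply the value ranges above after masking the entry to its lower 28 bits. The following sketch makes that choice explicit, which means values with any of the upper 4 bits set are classified by their lower 28 bits instead of being reported as unknown. The function name and the returned labels are illustrative.

```python
def classify_fat32_entry(entry: int) -> str:
    """Classifies a FAT 32 cluster block allocation table entry."""
    if not 0 <= entry <= 0xffffffff:
        raise ValueError("entry value out of bounds")

    # Assumption: the upper 4 bits are ignored.
    value = entry & 0x0fffffff

    if value == 0x00000000:
        return "unused"
    if value == 0x00000001:
        return "invalid"
    if value <= 0x0fffffef:
        return "used"
    if value <= 0x0ffffff6:
        return "reserved"
    if value == 0x0ffffff7:
        return "bad"
    return "end-of-chain"
```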

References

Microsoft Extensible Firmware Initiative FAT32 File System Specification, by Microsoft
Design of the FAT file system, by Wikipedia
File Allocation Table, by Wikipedia

Hierarchical File System (HFS) format

The Hierarchical File System (HFS) was the default file system for Mac OS after Macintosh File System (MFS) and before Apple File System (APFS).

Note that this document uses Mac OS to refer to the Macintosh Operating System in general, instead of specific versions like Mac OS X or macOS. Mac OS X is used to refer to versions of Mac OS 10.0 or later.

There are multiple known variants or derivatives of HFS, such as:

HFS
HFS+ 8.10, used by Mac OS 8.1 to 9.2.2
HFS+ 10.0, introduced in Mac OS 10.0
HFSX, introduced in Mac OS 10.3

Note that HFS can be referred to as “HFS Standard” and HFS+ or HFSX as “HFS Extended”.

HFSX (or HFS/X) is an extension to HFS+ to allow additional features that are incompatible with HFS+. One such feature is case-sensitive file names. A HFSX volume may be either case-sensitive or case-insensitive. Case sensitivity (or lack thereof) applies to all file and directory names on the volume.

Overview

Feature	HFS	HFS+ and HFSX
Maximum file size	2^31 (2 GiB)	2^63 (8 EiB)
Maximum file name size	31 characters	255 characters
Maximum number of blocks	2^16 (65535 blocks)	2^32 (4294967296 blocks)
Character set	narrow character with codepage	Unicode UTF-16 big-endian
Time stamps	In local time	In UTC
Catalog B-tree file node size	512 bytes	4096 bytes
File attributes	none	Basic and extended

HFS

A HFS file system consists of:

optional MFS boot block
master directory block (MDB)
volume bitmap
extents overflow file
catalog file
optional backup (or alternate) master directory block (MDB)

The backup master directory block (MDB) is stored in the last 2 sectors of the volume.

Characteristics

Characteristics	Description
Byte order	big-endian
Date and time values	HFS timestamp in local time
Character strings	Narrow character (Single Byte Character (SBC) or Multi Byte Character (MBC)) stored using a system defined codepage

HFS+ and HFSX

A HFS+ or HFSX file system consists of:

reserved (or unused) blocks
volume header
allocation file
extents overflow file
catalog file
optional attributes file
optional startup file
optional backup (or alternate) volume header

The backup volume header is stored in the last 1024 bytes of the volume.

Characteristics

Characteristics	Description
Byte order	big-endian
Date and time values	HFS timestamp in UTC
Character strings	UTF-16 big-endian

Terminology

Term	Description
Clump size	Size of the group of (allocation) blocks (or clump), in bytes, to avoid fragmentation

Unicode strings

Unicode strings are stored as UTF-16 big-endian in Normalization Form Canonical Decomposition (NFD) based on Unicode 3.2, with exclusions. Unicode values in the ranges U+2000 - U+2FFF, U+F900 - U+FAFF and U+2F800 - U+2FAFF are not decomposed.

On Mac OS 8.1 through 10.2.x decomposition was based on Unicode 2.1.

TODO: determine what the impact of the different Unicode versions is.

Note that based on observations on Mac OS 10.15.7 on HFS+ the range U+1D000 - U+1D1FF is excluded from decomposition and U+2400 is replaced by U+0.

HFS timestamp

Date and time values are stored as an unsigned 32-bit integer containing the number of seconds since January 1, 1904 at 00:00:00 (midnight), where:

MFS and HFS use local time;
HFS+ and HFSX use Coordinated Universal Time (UTC).

This document will refer to both forms as HFS timestamp.

The maximum representable date is February 6, 2040 at 06:28:15 UTC.

The HFS timestamp does not account for leap seconds. It includes a leap day in every year that is evenly divisible by 4. This is sufficient given that the range of representable dates does not contain 1900 or 2100, neither of which have leap days.
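As a sketch of the conversion, the function below turns an HFS timestamp into a Python datetime. The function name and the is_utc parameter are made up for this example; a local time value (MFS and HFS) is returned without time zone information, since the time zone is not stored.

```python
from datetime import datetime, timedelta, timezone

# January 1, 1904 at 00:00:00 (midnight).
HFS_EPOCH = datetime(1904, 1, 1)


def hfs_timestamp_to_datetime(timestamp: int, is_utc: bool = True) -> datetime:
    """Converts an unsigned 32-bit HFS timestamp into a datetime."""
    if not 0 <= timestamp <= 0xffffffff:
        raise ValueError("timestamp value out of bounds")

    # Python's datetime does not account for leap seconds either.
    date_time = HFS_EPOCH + timedelta(seconds=timestamp)
    if is_utc:
        # HFS+ and HFSX store the value in UTC.
        date_time = date_time.replace(tzinfo=timezone.utc)
    return date_time
```

Converting the largest 32-bit value yields the maximum representable date mentioned above.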

File names

TN1150 states that HFS file names are compared case-insensitively, assuming a MacRoman encoding.

Upper case	Lower case
0x41 - 0x5a (A - Z)	0x61 - 0x7a (a - z)
0x80 (Ä)	0x8a (ä)
0x81 (Å)	0x8c (å)
0x82 (Ç)	0x8d (ç)
0x83 (É)	0x8e (é)
0x84 (Ñ)	0x96 (ñ)
0x85 (Ö)	0x9a (ö)
0x86 (Ü)	0x9f (ü)
0xae (Æ)	0xbe (æ)
0xaf (Ø)	0xbf (ø)
0xcb (À)	0x88 (à)
0xcc (Ã)	0x8b (ã)
0xcd (Õ)	0x9b (õ)
0xce (Œ)	0xcf (œ)
0xd9 (Ÿ)	0xd8 (ÿ)
0xe5 (Â)	0x89 (â)
0xe6 (Ê)	0x90 (ê)
0xe7 (Á)	0x87 (á)
0xe8 (Ë)	0x91 (ë)
0xe9 (È)	0x8f (è)
0xea (Í)	0x92 (í)
0xeb (Î)	0x94 (î)
0xec (Ï)	0x95 (ï)
0xed (Ì)	0x93 (ì)
0xee (Ó)	0x97 (ó)
0xef (Ô)	0x99 (ô)
0xf1 (Ò)	0x98 (ò)
0xf2 (Ú)	0x9c (ú)
0xf3 (Û)	0x9e (û)
0xf4 (Ù)	0x9d (ù)

HFS+ allows for the “/” character in file names. In Finder on Mac OS this is represented as a “/”, but in Terminal it is replaced by “:” since “/” is used as the path segment separator. A file name with a “:” created in Terminal will be shown as “/” in Finder. Finder does not allow the creation of a file containing “:” in the name. A symbolic link created in Terminal to a file with a “:” in its name will not convert the “:” character in the link target data. The Linux HFS+ implementation appears to apply a conversion logic similar to that of Terminal.

B-tree files

HFS, HFS+ and HFSX use multiple B-tree files.

A B-tree file consists of fixed sized nodes:

header node
map nodes
index (root and branch) nodes
leaf nodes

Note that only the data fork of a B-tree file is used. The resource fork should be unused.

The size of a B-tree file can be calculated in the following manner:

size = number_of_nodes * node_size

Node size

The node size is determined when the B-tree file is created.

Feature	HFS	HFS+ and HFSX
Node size	512 bytes	where the value must be a power of 2 in the range 512 - 32768

In HFS+ the B-tree node size is stored in the header node.

Default node sizes:

Feature	HFS	HFS+ and HFSX
catalog file	512	4 KiB (8 KiB in Mac OS X)
extents overflow file	512	1 KiB (4 KiB in Mac OS X)
attributes file	N/A	4 KiB

B-tree (file) node

A B-tree file node consists of:

node descriptor
node records
node record offsets

The first node in the file is referenced by node number 0.

The node offset is relative to the start of the file and can be calculated in the following manner:

node_offset = node_number * node_size

B-tree node descriptor

The B-tree node descriptor (BTNodeDescriptor) is 14 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Next tree node number (forward link), which contains 0 if empty
4	4		Previous tree node number (backward link), which contains 0 if empty
8	1		Node type, which consists of a signed 8-bit integer
9	1		Node level, which consists of a signed 8-bit integer
10	2		Number of records
12	2	0	Unknown (Reserved), should contain 0

The root node level is 0, with a maximum depth of 8.

B-tree node types

Value	Identifier	Description
-1	kBTLeafNode	leaf node
0	kBTIndexNode	index node
1	kBTHeaderNode	header node
2	kBTMapNode	map node
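The node descriptor and node types above map directly onto a fixed-size big-endian structure. The following is a minimal sketch; NodeDescriptor and parse_node_descriptor are illustrative names.

```python
import struct
from typing import NamedTuple

# Node types, taken from the table above.
NODE_TYPES = {-1: "leaf", 0: "index", 1: "header", 2: "map"}


class NodeDescriptor(NamedTuple):
    """Hypothetical container for the B-tree node descriptor values."""
    next_node_number: int
    previous_node_number: int
    node_type: int
    node_level: int
    number_of_records: int


def parse_node_descriptor(node_data: bytes) -> NodeDescriptor:
    """Parses the 14-byte node descriptor at the start of a node."""
    # Two unsigned 32-bit links, two signed 8-bit values and two
    # unsigned 16-bit values, all big-endian.
    (next_node_number, previous_node_number, node_type, node_level,
     number_of_records, _reserved) = struct.unpack_from(
         ">IIbbHH", node_data, 0)

    if node_type not in NODE_TYPES:
        raise ValueError("unsupported node type")

    return NodeDescriptor(
        next_node_number, previous_node_number, node_type, node_level,
        number_of_records)
```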

B-tree node record

The B-tree node record contains (leaf) data or a reference to an index node and consists of:

a key
value data

B-tree record offsets

The B-tree record offsets are an array of 16-bit integers relative from the start of the B-tree node descriptor. The first record offset is found at node size - 2, e.g. 512 - 2 = 510, the second 2 bytes before that, e.g. 508, etc.

An additional record offset is added at the end to signify the start of the free space.

Note that the record offsets are not necessarily stored in linear order.
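Reading the record offsets from the end of a node can be sketched as follows. The function name is hypothetical; the number of records is taken from the node descriptor and the returned list includes the additional free space offset as its last element.

```python
import struct
from typing import List


def read_record_offsets(node_data: bytes, number_of_records: int) -> List[int]:
    """Reads the record offsets stored at the end of a B-tree node."""
    node_size = len(node_data)
    # One additional offset signifies the start of the free space.
    number_of_offsets = number_of_records + 1

    # The offsets must not overlap the 14-byte node descriptor.
    if 14 + number_of_offsets * 2 > node_size:
        raise ValueError("number of records out of bounds")

    record_offsets = []
    for index in range(number_of_offsets):
        # The first offset is at node size - 2, the second at
        # node size - 4, etc.
        position = node_size - 2 * (index + 1)
        (record_offset,) = struct.unpack_from(">H", node_data, position)
        record_offsets.append(record_offset)
    return record_offsets
```

The size of a record can be estimated from the difference between its offset and the next larger offset, which is why the free space offset is included.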

B-tree header node

The B-tree header node is stored in the first node of the B-tree file and contains 3 records:

the B-tree header record;
the user data record, which consists of 128 bytes (reserved within HFS);
the B-tree map record.

Note that the records in the B-tree header node do not have keys.

B-tree header record

The B-tree header record (BTHeaderRec) is 106 bytes in size and consists of:

Offset	Size	Description
0	2	Depth of the tree
2	4	Root node number
6	4	Number of data records contained in leaf nodes
10	4	First leaf node number
14	4	Last leaf node number
18	2	Node size, in bytes, where the value must be a power of 2 in the range 512 - 32768
20	2	Maximum key size, in bytes
22	4	Number of nodes
26	4	Number of unused nodes
HFS
30	76	Unknown (Reserved)
HFS+/HFSX
30	2	Unknown (Reserved)
32	4	Clump size, in bytes
36	1	B-tree file type
37	1	Key comparison method
38	4	Flags (or attributes)
42	16 x 4 = 64	Unknown (Reserved)
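A sketch of reading the header record is given below. The function name, the is_hfs_plus parameter and the dictionary keys are assumptions made for this example; the offsets and sizes follow the table above.

```python
import struct


def parse_header_record(record_data: bytes, is_hfs_plus: bool) -> dict:
    """Parses the 106-byte B-tree header record."""
    if len(record_data) < 106:
        raise ValueError("header record data too small")

    # Values at offsets 0 through 29, all big-endian.
    (depth, root_node_number, number_of_data_records,
     first_leaf_node_number, last_leaf_node_number, node_size,
     maximum_key_size, number_of_nodes,
     number_of_unused_nodes) = struct.unpack_from(
         ">HIIIIHHII", record_data, 0)

    # The node size must be a power of 2 in the range 512 - 32768.
    if not 512 <= node_size <= 32768 or (node_size & (node_size - 1)) != 0:
        raise ValueError("unsupported node size")

    values = {
        "depth": depth,
        "root_node_number": root_node_number,
        "number_of_data_records": number_of_data_records,
        "first_leaf_node_number": first_leaf_node_number,
        "last_leaf_node_number": last_leaf_node_number,
        "node_size": node_size,
        "maximum_key_size": maximum_key_size,
        "number_of_nodes": number_of_nodes,
        "number_of_unused_nodes": number_of_unused_nodes,
    }
    if is_hfs_plus:
        # HFS+ and HFSX specific values at offsets 30 through 41.
        (_reserved, clump_size, file_type, key_comparison_method,
         flags) = struct.unpack_from(">HIBBI", record_data, 30)
        values.update({
            "clump_size": clump_size,
            "file_type": file_type,
            "key_comparison_method": key_comparison_method,
            "flags": flags,
        })
    return values
```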

TODO: does the number of data records equal the number of leaf nodes?

File type

Value	Identifier	Description
0x00		Control file
0x80		First user B-tree type
0xff		Reserved B-tree type

Key comparison method

Value	Identifier	Description
0x00		Unknown (not set), observed on HFS standard, HFS+ and an empty HFSX file system
0xbc		Binary compare (case-sensitive)
0xcf		Unicode case folding (case-insensitive)

Flags

Value	Identifier	Description
0x00000001	kBTBadCloseMask	Bad close, which indicates that the B-tree was not closed properly and should be checked for consistency (Not used by HFS+ and HFSX)
0x00000002	kBTBigKeysMask	Big keys, which indicates the key data size value of the keys in index and leaf nodes is 16-bit integer, otherwise, it is an 8-bit integer (Must be set for HFS+ and HFSX)
0x00000004	kBTVariableIndexKeysMask	Variable-size (index) keys, which indicates that the keys in index nodes occupy the number of bytes indicated by their key size; otherwise, the keys in index nodes always occupy maximum key size (must be set for the HFS+ and HFSX Catalog B-tree, and cleared for the HFS+ and HFSX Extents overflow B-tree)

B-tree map record

The B-tree map record consists of a bitmap that indicates which nodes in the B-tree file are used and which are not. If a bit is set, then the corresponding node in the B-tree file is in use.

The bitmap is 256 bytes in size and can represent a maximum of 2048 nodes. If more nodes are needed a map node is used to store additional mappings.
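Testing whether a node is in use can be sketched as below. The bit order is an assumption: the most significant bit of each byte is taken to represent the lowest node number, which is how TN1150 describes allocation bitmaps. The function name is hypothetical.

```python
def is_node_used(map_record: bytes, node_number: int) -> bool:
    """Checks if node_number is marked as used in a B-tree map record."""
    byte_index, bit_index = divmod(node_number, 8)
    if node_number < 0 or byte_index >= len(map_record):
        raise ValueError("node number not covered by this map record")

    # Assumption: the most significant bit of a byte corresponds to
    # the lowest node number.
    return bool(map_record[byte_index] & (0x80 >> bit_index))
```

A 256-byte map record covers 256 * 8 = 2048 nodes; node numbers beyond that are looked up in the map record of a map node.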

The map node

If a B-tree file contains more than 2048 nodes, which are enough for about 8000 files, a map node is used to store additional node-mapping information.

The next tree node value in the B-tree node descriptor of the header node is used to refer to the first map node.

A map node consists of a B-tree node descriptor and one B-tree map record. The map record is 494 bytes in size, 512 - (14 + 2 + 2), where the two 2-byte values are the record offset and the free space offset, and can therefore contain mapping information for 3952 nodes.

If a B-tree contains more than 6000 nodes (enough for about 25000 files) a second map node is needed. The next tree node value in the B-tree node descriptor of the first map node is used to refer to the second.

If more map nodes are required, each additional map node is similarly linked to the previous one.

The root node

The root node is the start of the B-tree structure; usually the root node is an index node, but it might be a leaf node if there are no index nodes.

The root node number is stored in the B-tree header record and is 0 if the B-tree is empty.

The index node

The records stored in an index node are called pointer records. A pointer record consists of a key followed by the node number of the corresponding node. The size of the key varies according to the type of B-tree file.

In a catalog file, the search key is a combination of the file or directory name and the parent identifier of that file or directory.
In an extents overflow file, the search key is a combination of that file’s type, its file identifier and the index of the first block in the extent.

The immediate descendants of an index node are called the children of the index node. An index node can have from 1 to 15 children, depending on the size of the pointer records that the index node contains.

The leaf node

The leaf nodes contain data records. The structure of the leaf node data records varies according to the type of B-tree.

In an extents overflow file, the leaf node data records consist of a key and an extent record.
In a catalog file, the leaf node data records can be any one of four kinds of records.

HFS Master Directory Block (MDB)

The primary Master Directory Block (MDB) (or volume information block (VIB)) is located at offset 1024 of the volume.

The MDB is 162 bytes in size and consists of:

Offset	Size	Value	Description
0	2	"BD" (or "\x42\x44")	Volume signature
2	4		Creation time, which contains a HFS timestamp in local time
6	4		(last) modification time, which contains a HFS timestamp in local time
10	2		Volume attribute flags
12	2		Number of files in the root directory
14	2		Volume bitmap block number, contains a block number relative from the start of the volume, where 0 is the first block number, typically 3
16	2		Next allocation search block number
18	2		Number of blocks, where a volume can contain at most 65535 blocks
20	4		Block size, in bytes, must be a multiple of 512
24	4		Clump size, in bytes
28	2		Data area block number, contains a block number relative from the start of the volume, where 0 is the first block number
30	4		Next available catalog node identifier (CNID), which can be a directory or file record identifier
34	2		Number of unused blocks
36	1		Volume label size, with a maximum of 27
37	27		Volume label
64	4		(last) backup time, which contains a HFS timestamp in local time
68	2		Backup sequence number
70	4		Volume write count, which contains the number of times the volume has been written to
74	4		Extents overflow file clump size, in bytes
78	4		Catalog file clump size, in bytes
82	2		Number of sub directories in the root directory
84	4		Total number of files, which does not include file system metadata files
88	4		Total number of directories (folders), which does not include the root folder
92	32		Finder information
124	2		Embedded volume signature (drVCSize)
126	4		Embedded volume extent descriptor (drVBMCSize and drCtlCSize)
130	4		Extents overflow file size
134	12		Extents overflow file extents record
146	4		Catalog file size
150	12		Catalog file extents record

Note that the volume modification time is not necessarily the date and time when the volume was last flushed.

Notes

TODO: check

drVCSize => Volume cache block size (16-bit)
drVBMCSize => Volume bitmap cache block size (16-bit)
drCtlCSize => Common volume cache block size (16-bit)

HFS Volume Bitmap

The volume bitmap is used to keep track of block allocation. The bitmap contains one bit for each block in the volume.

If a bit is set, the corresponding block is currently in use by some file.
If a bit is clear, the corresponding block is not currently in use by any file and is available.

The volume bitmap does not indicate which files occupy which blocks. The actual file-mapping information is maintained in two locations:

in the corresponding catalog entry;
in the corresponding extents overflow file entry.

The size of the volume bitmap depends on the number of blocks in the volume.

An 800 KiB floppy disk with a block size of 512 bytes has a volume bitmap size of:

(800 * 1024) / 512 = 1600 bits (200 bytes).

A 32 MiB volume with a block size of 512 bytes has a volume bitmap size of:

(32 * 1024 * 1024) / 512 = 65536 bits (8192 bytes).

The number of blocks in the volume in the MDB consists of a 16-bit integer, so no more than 65535 blocks can be addressed. The volume bitmap is never larger than 8192 bytes (or 16 physical blocks). For volumes containing more than 32 MiB of space, the block size must be increased.

A volume containing 40 MiB of space must have a block size that is at least 2 x 512 bytes.

A volume containing 80 MiB of space must have a block size that is at least 3 x 512 bytes.
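The arithmetic in this section can be sketched as follows. Both functions are illustrative and treat the whole volume as consisting of blocks, as the examples above do, which ignores the space taken by the boot blocks, the MDB and the bitmap itself.

```python
from typing import Tuple

# The number of blocks in the MDB is a 16-bit integer.
MAXIMUM_NUMBER_OF_BLOCKS = 65535


def volume_bitmap_size(volume_size: int, block_size: int) -> Tuple[int, int]:
    """Returns the volume bitmap size as a (bits, bytes) tuple."""
    # One bit per block, rounded up to whole bytes.
    number_of_blocks = volume_size // block_size
    return number_of_blocks, (number_of_blocks + 7) // 8


def minimum_block_size(volume_size: int) -> int:
    """Returns the smallest block size that can address volume_size."""
    # The block size must be a multiple of 512.
    block_size = 512
    while volume_size // block_size > MAXIMUM_NUMBER_OF_BLOCKS:
        block_size += 512
    return block_size
```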

HFS+ and HFSX Volume Header

The volume header (HFSPlusVolumeHeader) replaces the master directory block (MDB). The volume header starts at offset 1024 of the volume.

The blocks containing the first 1536 bytes (reserved space plus volume header) are marked as used in the allocation file.

The volume header is 512 bytes in size and consists of:

Offset	Size	Value	Description
0	2	"H+" (or "\x48\x2b") or "HX" (or "\x48\x58")	Volume signature, where "H+" (kHFSPlusSigWord) is used for HFS+ and "HX" (kHFSXSigWord) for HFSX
2	2		Format version, where 4 (kHFSPlusVersion) is used for HFS+ and 5 (kHFSXVersion) for HFSX
4	4		Volume attribute flags
8	4		Last mounted version
12	4		Journal information block number, contains a block number relative from the start of the volume
16	4		Creation time, which contains a HFS timestamp in UTC
20	4		(last) content modification time, which contains a HFS timestamp in UTC
24	4		(last) backup time, which contains a HFS timestamp in UTC
28	4		Checked time, which contains a HFS timestamp in UTC
32	4		Total number of files, which does not include file system metadata files
36	4		Total number of directories (folders), which does not include the root folder
40	4		Block size, in bytes
44	4		Total number of blocks
48	4		Number of unused blocks
52	4		Next allocation search block number (nextAllocation)
56	4		Clump size, in bytes, of a resource fork
60	4		Clump size, in bytes, of a data fork
64	4		Next available catalog node identifier (CNID), which can be a directory or file record identifier
68	4		Volume write count, which contains the number of times the volume has been written to
72	8		Encodings bitmap
80	32		Finder information
112	80		Allocation file fork descriptor
192	80		Extents overflow file fork descriptor
272	80		Catalog file fork descriptor
352	80		Attributes file fork descriptor
432	80		Startup file fork descriptor
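
As an illustration, the table above can be mapped onto a fixed-size big-endian structure. The following Python sketch is not part of the format: the names `VolumeHeader` and `parse_volume_header`, as well as the field names, are assumptions chosen for this example. It expects the 512 bytes read from offset 1024 of the volume.

```python
import struct
from collections import namedtuple

# Big-endian layout: signature, version, attribute flags, last mounted
# version, 15 32-bit values, encodings bitmap, Finder information and
# 5 fork descriptors of 80 bytes each.
VOLUME_HEADER = struct.Struct(">2sHI4s15IQ32s80s80s80s80s80s")

# Field names are hypothetical, they follow the order of the table.
VolumeHeader = namedtuple("VolumeHeader", [
    "signature", "version", "attributes", "last_mounted_version",
    "journal_info_block", "creation_time", "modification_time",
    "backup_time", "checked_time", "file_count", "folder_count",
    "block_size", "total_blocks", "free_blocks", "next_allocation",
    "resource_clump_size", "data_clump_size", "next_cnid", "write_count",
    "encodings_bitmap", "finder_info", "allocation_fork", "extents_fork",
    "catalog_fork", "attributes_fork", "startup_fork"])


def parse_volume_header(data):
    """Parses the 512 bytes of a HFS+ or HFSX volume header."""
    header = VolumeHeader._make(VOLUME_HEADER.unpack_from(data, 0))
    if header.signature not in (b"H+", b"HX"):
        raise ValueError("unsupported volume signature")
    return header


# Synthetic header used for demonstration purposes only.
sample = VOLUME_HEADER.pack(
    b"H+", 4, 0x00000100, b"10.0", 0, 0, 0, 0, 0, 10, 3, 4096, 1000, 500,
    0, 65536, 65536, 16, 1, 1, bytes(32), *[bytes(80)] * 5)

header = parse_volume_header(sample)
print(header.block_size, header.total_blocks)
```

The fork descriptors are kept as raw 80-byte values here so that they can be parsed separately.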

Total number of blocks

For a disk whose size is an even multiple of the block size, all areas on the disk are included in a block, including the volume header and backup volume header. For a disk whose size is not an even multiple of the block size, only the blocks that will fit entirely on the disk are counted here. The remaining space at the end of the disk is not used by the volume format (except for storing the backup volume header, as described above).

Volume attribute flags

The volume attribute flags are defined as follows:

Value	Identifier	Description
0x00000080	kHFSVolumeHardwareLockBit	Volume hardware lock, set if the volume is write-protected due to a hardware setting
0x00000100	kHFSVolumeUnmountedBit	Volume unmounted, set if the volume was correctly flushed before being unmounted or ejected
0x00000200	kHFSVolumeSparedBlocksBit	Volume spared blocks, set if there are any records in the extents overflow file for bad blocks
0x00000400	kHFSVolumeNoCacheRequiredBit	Volume no cache required, set if the blocks from this volume should not be cached
0x00000800	kHFSBootVolumeInconsistentBit	Boot volume inconsistent, set if the volume was mounted for writing
0x00001000	kHFSCatalogNodeIDsReusedBit	Catalog node identifiers reused, set when the next catalog identifier value overflows 32 bits, forcing smaller catalog node identifiers to be reused
0x00002000	kHFSVolumeJournaledBit	Journaled, set if the file system uses a journal
0x00004000	kHFSVolumeInconsistentBit	Unknown (Reserved)
0x00008000	kHFSVolumeSoftwareLockBit	Volume software lock, set if the volume is write-protected due to a software setting

0x40000000	kHFSContentProtectionBit	Unknown (Reserved)
0x80000000	kHFSUnusedNodeFixBit	Unknown (Reserved)
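
The flag values in the table above are bitmasks that can be combined. A minimal Python sketch of how the volume attribute flags could be decoded follows; the function name `describe_volume_attributes` is an assumption made for this example.

```python
# Bitmask values and identifiers taken from the table above.
VOLUME_ATTRIBUTE_FLAGS = {
    0x00000080: "kHFSVolumeHardwareLockBit",
    0x00000100: "kHFSVolumeUnmountedBit",
    0x00000200: "kHFSVolumeSparedBlocksBit",
    0x00000400: "kHFSVolumeNoCacheRequiredBit",
    0x00000800: "kHFSBootVolumeInconsistentBit",
    0x00001000: "kHFSCatalogNodeIDsReusedBit",
    0x00002000: "kHFSVolumeJournaledBit",
    0x00004000: "kHFSVolumeInconsistentBit",
    0x00008000: "kHFSVolumeSoftwareLockBit",
    0x40000000: "kHFSContentProtectionBit",
    0x80000000: "kHFSUnusedNodeFixBit",
}


def describe_volume_attributes(flags):
    """Returns the identifiers of the flags that are set, lowest bit first."""
    return [name for mask, name in sorted(VOLUME_ATTRIBUTE_FLAGS.items())
            if flags & mask]


print(describe_volume_attributes(0x00002100))
```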

Last mounted version

Value	Identifier	Description
"8.10"		used by Mac OS 8.1 to 9.2.2
"10.0"	kHFSPlusMountVersion	used by Mac OS X
"FSK!" or "fsck"		used by fsck_hfs on Mac OS X
"HFSJ"	kHFSJMountVersion	used by journaled HFS+ or HFSX

Links

TODO: add text about HFS standard

HFS+ supports both hard links and symbolic links.

Hard links to directories are not supported (allowed).

Hard Links

Hard links in HFS+/HFSX are represented by multiple different types of file records:

one indirect node file record, named “iNode#”, where # is the link reference. This file contains the content of the file shared by the hard links.
one or more hard link file records, that reference the indirect node file record.

Indirect node files are stored in a file system metadata directory referred to as the metadata directory with the name “/\u{2400}\u{2400}\u{2400}\u{2400}HFS+ Private Data”.

The link reference corresponds to the catalog node identifier (CNID) of the indirect node file, where 0 is not a valid link reference.

Note that TN1150 states that a new link reference is randomly chosen from the range 100 to 1073741923. However, link references that fall outside of this range have been observed, such as “iNode20”.

The special permission data of the hard link file records contains the link reference if:

the catalog file record flag kHFSHasLinkChainMask is set;
and the first 8 bytes of the file information contain “hlnkhfs+”

Value	Identifier	Description
"hlnk"	kHardLinkFileType	Hard link file type
"hfs+"	kHFSPlusCreator	Hard link file creator

The hard link file’s creation date should be set to the creation date of the metadata directory. The creation date may also be set to the creation date of the volume’s root directory, though this is deprecated.
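
The conditions above can be sketched as follows. This Python example is only an illustration: the function name `hard_link_target_name` and its parameters are assumptions, where the parameters stand for the flags, file information and special permission data values of a HFS+ catalog file record.

```python
K_HFS_HAS_LINK_CHAIN_MASK = 0x0020


def hard_link_target_name(flags, file_information, special_permission_data):
    """Returns the name of the indirect node file or None if not a hard link."""
    if not flags & K_HFS_HAS_LINK_CHAIN_MASK:
        return None
    # File type "hlnk" followed by file creator "hfs+".
    if file_information[:8] != b"hlnkhfs+":
        return None
    # A link reference of 0 is not valid.
    if special_permission_data == 0:
        return None
    return "iNode{0:d}".format(special_permission_data)


print(hard_link_target_name(0x0022, b"hlnkhfs+" + bytes(8), 20))
```

The resulting name still has to be looked up in the metadata directory to find the indirect node file record.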

Device identifier

The special permission data contains the device identifier. The device identifier can be stored in different formats, such as: “native”, “386bsd”, “4bsd”, “bsdos”, “freebsd”, “hpux”, “isc”, “linux”, “netbsd”, “osf1”, “sco”, “solaris”, “sunos”, “svr3”, “svr4” and “ultrix”.

The “native” and “hpux” device identifier is 4 bytes in size and consists of:

Offset	Size	Value	Description
0	1		Major device number
1	2	0	Unknown
3	1		Minor device number

The “386bsd”, “4bsd”, “freebsd”, “isc”, “linux”, “netbsd”, “sco”, “sunos”, “svr3” and “ultrix” device identifier is 4 bytes in size and consists of:

Offset	Size	Value	Description
0	2	0	Unknown
2	1		Major device number
3	1		Minor device number

The “solaris” and “svr4” device identifier is 4 bytes in size and consists of:

Offset	Size	Value	Description
0.0	18 bits		Minor device number
2.2	14 bits		Major device number

The “bsdos” and “osf1” device identifier is 4 bytes in size and consists of:

Offset	Size	Value	Description
0.0	20 bits		Minor device number
2.4	12 bits		Major device number

The “bsdos” alternative device identifier is 4 bytes in size and consists of:

Offset	Size	Description
0.0	8 bits	Sub unit number
1.0	12 bits	Unit number
2.4	12 bits	Major device number
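
A minimal Python sketch of how the major and minor device numbers could be extracted from these layouts follows. It assumes that the device identifier has been read as a 32-bit big-endian integer, that the byte offsets in the tables refer to the stored bytes and that the bit offsets (such as 0.0 and 2.2) are counted from the least significant bit of that integer. These assumptions and the function name `decode_device_identifier` are not confirmed by the tables themselves. The “bsdos” alternative layout is not covered.

```python
BYTE_BASED_FORMATS = (
    "386bsd", "4bsd", "freebsd", "isc", "linux", "netbsd", "sco", "sunos",
    "svr3", "ultrix")


def decode_device_identifier(value, format_name):
    """Returns a (major, minor) tuple for a 32-bit device identifier."""
    if format_name in ("native", "hpux"):
        # Major in the first stored byte, minor in the last stored byte.
        return (value >> 24) & 0xff, value & 0xff
    if format_name in BYTE_BASED_FORMATS:
        # Major in the third stored byte, minor in the last stored byte.
        return (value >> 8) & 0xff, value & 0xff
    if format_name in ("solaris", "svr4"):
        # 18-bit minor followed by a 14-bit major.
        return (value >> 18) & 0x3fff, value & 0x3ffff
    if format_name in ("bsdos", "osf1"):
        # 20-bit minor followed by a 12-bit major.
        return (value >> 20) & 0xfff, value & 0xfffff
    raise ValueError("unsupported format: {0:s}".format(format_name))


print(decode_device_identifier((14 << 24) | 3, "native"))
```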

Symbolic Links

The data fork of a symbolic link contains the path of the directory or file it refers to.

On HFS+/HFSX the symbolic link target contains a POSIX pathname, as used by the Mac OS BSD and Cocoa programming interfaces, not a traditional Mac OS, or Carbon, path.

The path is stored as a UTF-8 encoded string without an end-of-string character. The length of the path should be 1024 bytes or less. The path may be full or partial, with or without a leading forward slash.

The first 8 bytes of the file information should contain “slnkrhap”.

Value	Identifier	Description
"slnk"	kSymLinkFileType	Symbolic link file type
"rhap"	kSymLinkCreator	Symbolic link file creator

The resource fork of a symbolic link is reserved and should be 0 bytes in size.

The catalog file

The catalog file is a B-tree file used to maintain information about the hierarchy of files and directories of a volume.

The block number of the first file extent of the catalog file (the header node) is stored in the master directory block (HFS) or the volume header (HFS+). The B-tree structure is described in section: B-tree files.

Each file or directory in the catalog file is assigned a unique catalog node identifier (CNID). The CNID is used for both directory and file identifiers. For any given file or directory the parent identifier is the CNID of the parent directory. The first 16 CNIDs are reserved for use by Apple and include the following standard assignments:

CNID	Identifier	Assignment
0		Unknown (Reserved)
1	kHFSRootParentID	Parent identifier of the root directory (folder)
2	kHFSRootFolderID	Directory identifier of the root directory (folder)
3	kHFSExtentsFileID	Extents overflow file
4	kHFSCatalogFileID	Catalog file
5	kHFSBadBlockFileID	Bad allocation block file
6	kHFSAllocationFileID	Allocation file (HFS+)
7	kHFSStartupFileID	Startup file (HFS+)
8	kHFSAttributesFileID	Attributes file (HFS+)

14	kHFSRepairCatalogFileID	Used temporarily by fsck_hfs when rebuilding the catalog file
15	kHFSBogusExtentFileID	Bogus extent file, which is used temporarily during exchange files operations
16	kHFSFirstUserCatalogNodeID	First available CNID for user's files and folders

Catalog file keys

In a catalog file a key consists of:

parent directory identifier
(optional) file or directory name

The volume reference number is not included in the search key.

Text encoding hint

Encoding type	Value	Encodings bitmap number
MacRoman	0	0
MacJapanese	1	1
MacChineseTrad	2	2
MacKorean	3	3
MacArabic	4	4
MacHebrew	5	5
MacGreek	6	6
MacCyrillic	7	7

MacDevanagari	9	9
MacGurmukhi	10	10
MacGujarati	11	11
MacOriya	12	12
MacBengali	13	13
MacTamil	14	14
MacTelugu	15	15
MacKannada	16	16
MacMalayalam	17	17
MacSinhalese	18	18
MacBurmese	19	19
MacKhmer	20	20
MacThai	21	21
MacLaotian	22	22
MacGeorgian	23	23
MacArmenian	24	24
MacChineseSimp	25	25
MacTibetan	26	26
MacMongolian	27	27
MacEthiopic	28	28
MacCentralEurRoman	29	29
MacVietnamese	30	30
MacExtArabic	31	31

MacSymbol	33	33
MacDingbats	34	34
MacTurkish	35	35
MacCroatian	36	36
MacIcelandic	37	37
MacRomanian	38	38

MacFarsi	140	49

MacUkrainian	152	48

HFS catalog key

The HFS catalog key is of variable size and consists of:

Offset	Size	Description
0	1	Key data size, in bytes, which consists of a signed 8-bit integer
If key data size >= 6
1	1	Unknown (Reserved)
2	4	Parent identifier (CNID)
6	1	Name size without the end-of-string character
7	...	Name string, which contains a narrow character string without end-of-string character
...	...	Unknown (Alignment padding)

Note that a key data size of 0 indicates a record that is no longer in use.

In an index node the catalog node name is always stored as 32 bytes and therefore the maximum key data size should be 37. In a leaf node the catalog node name varies in size.

Keys in a leaf node must be stored 16-bit aligned within the node data. The size of the alignment padding is not included in the key data size.

HFS+ and HFSX catalog key

The HFS+ and HFSX catalog key is of variable size and consists of:

Offset	Size	Description
0	2	Key data size, in bytes
If key data size >= 4
2	4	Parent identifier, which contains a CNID
If key data size >= 6
6	2	Number of characters in the name string
8	...	Name string, which contains a UTF-16 big-endian string without end-of-string character

Note that the characters ‘:’ and U+2400 are stored as ‘/’ and U+0 respectively and must be converted before comparison.
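
The key layout and the character conversion described above can be sketched in Python as follows. The function name `parse_hfsplus_catalog_key` is an assumption made for this example and the sketch does not handle names containing unpaired UTF-16 surrogates.

```python
import struct


def parse_hfsplus_catalog_key(data):
    """Returns (parent identifier, name) or None if the key has no data."""
    (key_data_size,) = struct.unpack_from(">H", data, 0)
    if key_data_size < 4:
        return None
    (parent_identifier,) = struct.unpack_from(">I", data, 2)
    name = ""
    if key_data_size >= 6:
        (number_of_characters,) = struct.unpack_from(">H", data, 6)
        name = data[8:8 + number_of_characters * 2].decode("utf-16-be")
        # Convert the stored '/' and U+0 back to ':' and U+2400.
        name = name.replace("/", ":").replace("\u0000", "\u2400")
    return parent_identifier, name


# Synthetic key with parent identifier 2 and the stored name "a/b".
stored_name = "a/b".encode("utf-16-be")
key = struct.pack(">HIH", 6 + len(stored_name), 2, 3) + stored_name
print(parse_hfsplus_catalog_key(key))
```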

The catalog data

A catalog leaf node can contain four different types of records:

a folder record, which contains information about a single directory.
a file record, which contains information about a single file.
a folder thread record, which provides a link between a directory and its parent directory.
a file thread record, which provides a link between a file and its parent directory.

The thread records are used to find the name and directory identifier of the parent of a given file or directory.

Each catalog data record consists of:

the catalog data record header;
the catalog data record data.

The catalog data record header

HFS catalog data record header

The HFS catalog data record header is 2 bytes in size and consists of:

Offset	Size	Value	Description
0	1		Record type, which consists of a signed 8-bit integer
1	1	0x00	Unknown (Reserved), which consists of a signed 8-bit integer

Note that to distinguish between HFS and HFS+ record types, the record type should be treated as a 16-bit big-endian value.

HFS+ and HFSX catalog data record header

The HFS+ and HFSX catalog data record header is 2 bytes in size and consists of:

Offset	Size	Value	Description
0	2		Record type

The catalog data record types

Value	Identifier	Description
0x0001	kHFSPlusFolderRecord	HFS+/HFSX Folder record
0x0002	kHFSPlusFileRecord	HFS+/HFSX File record
0x0003	kHFSPlusFolderThreadRecord	HFS+/HFSX Folder thread record
0x0004	kHFSPlusFileThreadRecord	HFS+/HFSX File thread record

0x0100	kHFSFolderRecord (or cdrDirRec)	HFS Folder record
0x0200	kHFSFileRecord (or cdrFilRec)	HFS File record
0x0300	kHFSFolderThreadRecord (or cdrThdRec)	HFS Folder thread record
0x0400	kHFSFileThreadRecord (or cdrFThdRec)	HFS File thread record
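
Reading the first 2 bytes of a catalog data record as a 16-bit big-endian value is sufficient to tell the HFS and HFS+/HFSX record types apart. A minimal Python sketch follows, where the function name `identify_catalog_record` is an assumption made for this example.

```python
import struct

# Record type values taken from the table above.
CATALOG_RECORD_TYPES = {
    0x0001: ("HFS+/HFSX", "folder"),
    0x0002: ("HFS+/HFSX", "file"),
    0x0003: ("HFS+/HFSX", "folder thread"),
    0x0004: ("HFS+/HFSX", "file thread"),
    0x0100: ("HFS", "folder"),
    0x0200: ("HFS", "file"),
    0x0300: ("HFS", "folder thread"),
    0x0400: ("HFS", "file thread"),
}


def identify_catalog_record(record_data):
    """Returns (format, record kind) or None for an unknown record type."""
    (record_type,) = struct.unpack_from(">H", record_data, 0)
    return CATALOG_RECORD_TYPES.get(record_type)


print(identify_catalog_record(b"\x01\x00"))
```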

The catalog folder record

HFS catalog folder record

The HFS catalog folder record (cdrDirRec, kHFSFolderRecord) is 70 bytes in size and consists of:

Offset	Size	Value	Description
0	2	0x0100	Record type
2	2		Folder flags
4	2		Number of directory entries (valence)
6	4		Identifier (CNID)
10	4		Creation time, which contains a HFS timestamp in local time
14	4		(last) content modification time, which contains a HFS timestamp in local time
18	4		(last) backup time, which contains a HFS timestamp in local time
22	16		Folder information
38	16		Extended folder information
54	4 x 4 = 16		Unknown (Reserved), which consists of an array of 32-bit integer values

HFS catalog folder record flags

Not defined. The HFS catalog folder record appears to always have a corresponding folder thread record.

HFS+ and HFSX catalog folder record

The HFS+ and HFSX catalog folder record (HFSPlusCatalogFolder) is 88 bytes in size and consists of:

Offset	Size	Value	Description
0	2	0x0001	Record type
2	2		Flags
4	4		Number of directory entries (valence)
8	4		Identifier (CNID)
12	4		Creation time, which contains a HFS timestamp in UTC
16	4		(last) content modification time, which contains a HFS timestamp in UTC
20	4		(last) record (or attribute) modification (or change) time, which contains a HFS timestamp in UTC
24	4		(last) access time, which contains a HFS timestamp in UTC
28	4		(last) backup time, which contains a HFS timestamp in UTC
Permissions
32	4		Owner identifier
36	4		Group identifier
40	1		Administration flags
41	1		Owner flags
42	2		File mode
44	4		Special permission data
Folder information
48	16		Folder information
Extended folder information
64	16		Extended folder information

80	4		Text encoding hint
84	4	0x00	Unknown (Reserved)

The catalog file record

HFS catalog file record

The HFS catalog file record (cdrFilRec, kHFSFileRecord) is 102 bytes in size and consists of:

Offset	Size	Value	Description
0	2	0x0200	Record type
2	1		Flags, which consists of a signed 8-bit integer
3	1	0x00	File type, which consists of a signed 8-bit integer and should contain 0
4	16		File information
20	4		Identifier (CNID)
24	2		Data fork block number
26	4		Data fork size
30	4		Data fork allocated size
34	2		Resource fork block number
36	4		Resource fork size
40	4		Resource fork allocated size
44	4		Creation time, which contains a HFS timestamp in local time
48	4		(last) content modification time, which contains a HFS timestamp in local time
52	4		(last) backup time, which contains a HFS timestamp in local time
56	16		Extended file information
72	2		Clump size
74	12		Data fork extents record
86	12		Resource fork extents record
98	4	0x00	Unknown (Reserved)

TODO: determine if the data and resource fork block number values are used

HFS catalog file record flags

Value	Identifier	Description
0x0001		File is locked and cannot be written to
0x0002		Has thread record

0x0080	kHFSHasDateAddedMask	Has added time

HFS+ and HFSX catalog file record

The HFS+ and HFSX catalog file record (kHFSPlusFileRecord) is 248 bytes in size and consists of:

Offset	Size	Value	Description
0	2	0x0002	Record type
2	2		Flags
4	4	0x00	Unknown (Reserved)
8	4		Identifier (CNID)
12	4		Creation time, which contains a HFS timestamp in UTC
16	4		(last) content modification time, which contains a HFS timestamp in UTC
20	4		(last) record (or attribute) modification time, which contains a HFS timestamp in UTC
24	4		(last) access time, which contains a HFS timestamp in UTC
28	4		(last) backup time, which contains a HFS timestamp in UTC
Permissions
32	4		Owner identifier
36	4		Group identifier
40	1		Administration flags
41	1		Owner flags
42	2		File mode
44	4		Special permission data
File information
48	16		File information (or user information)
Extended file information
64	16		Extended file information (or finder information)

80	4		Text encoding hint
84	4	0x00	Unknown (Reserved)
88	80		Data fork descriptor
168	80		Resource fork descriptor

HFS+ catalog file record flags

Value	Identifier	Description
0x0001	kHFSFileLockedMask	File is locked and cannot be written to
0x0002	kHFSThreadExistsMask	Has thread record, which should be always set for a file record on HFS+/HFSX
0x0004	kHFSHasAttributesMask	Has extended attributes
0x0008	kHFSHasSecurityMask	Has ACLs
0x0010	kHFSHasFolderCountMask	Has number of sub-folders
0x0020	kHFSHasLinkChainMask	Has a hard link target (link chain), where the CNID of the hard link target is stored in the special permission data
0x0040	kHFSHasChildLinkMask	Has a child that is a directory link
0x0080	kHFSHasDateAddedMask	Has added time, where the extended folder or file information contains the time the folder or file was added (date_added)
0x0100	kHFSFastDevPinnedMask	Unknown
0x0200	kHFSDoNotFastDevPinMask	Unknown
0x0400	kHFSFastDevCandidateMask	Unknown
0x0800	kHFSAutoCandidateMask	Unknown

The catalog thread record

The file thread record is similar to the folder thread record except that it refers to a file, instead of a directory.

HFS catalog file thread record

The HFS catalog thread record (kHFSFolderThreadRecord (or cdrThdRec), kHFSFileThreadRecord (or cdrFThdRec)) is of variable size and consists of:

Offset	Size	Value	Description
0	2	0x0300 or 0x0400	Record type
2	2 x 4 = 8	0x00	Unknown (Reserved), which consists of an array of 32-bit integer values
10	4		Parent identifier (CNID)
14	1		Number of characters in the name string, with a maximum of 31
15	...		Name string, which contains a narrow character string without end-of-string character

HFS+ and HFSX catalog file thread record

The HFS+ and HFSX catalog thread record (kHFSPlusFolderThreadRecord, kHFSPlusFileThreadRecord) is of variable size and consists of:

Offset	Size	Value	Description
0	2	0x0003 or 0x0004	Record type
2	2	0x00	Unknown (Reserved), which consists of an unsigned 16-bit integer
4	4		Parent identifier (CNID)
8	2		Number of characters in the name string, with a maximum of 255
10	...		Name string, which contains a UTF-16 big-endian string without end-of-string character

Permissions

HFS+ maintains a basic access permissions record for each file and folder. These are similar to basic Unix file permissions.

TODO: add note about permissions on HFS

Owner and group identifier

The owner identifier contains the Mac OS X user ID of the owner of the file or folder. Mac OS X versions prior to 10.3 treat user ID 99 as if it was the user ID of the user currently logged in to the console. If no user is logged in to the console, user ID 99 is treated as user ID 0 (root). Mac OS X version 10.3 treats user ID 99 as if it was the user ID of the process making the call (in effect, making it owned by everyone simultaneously). These substitutions happen at run-time. The actual user ID on disk is not changed.

The group identifier contains the Mac OS X group ID of the group associated with the file or folder. Mac OS X typically maps group ID 99 to the group named “unknown”. There is no run-time substitution of group IDs in Mac OS X.

Administration flags

Value	Identifier	Description
0x01	SF_ARCHIVED	File has been archived
0x02	SF_IMMUTABLE	File is immutable and may not be changed
0x04	SF_APPEND	Writes to file may only append

Owner flags

Value	Identifier	Description
0x01	UF_NODUMP	Do not backup (dump) this file
0x02	UF_IMMUTABLE	File is immutable and may not be changed
0x04	UF_APPEND	Writes to file may only append
0x08	UF_OPAQUE	Directory is opaque

File mode

Value	Identifier	Description
0xf000 (0170000)	S_IFMT	File type bitmask
0x1000 (0010000)	S_IFIFO	Named pipe
0x2000 (0020000)	S_IFCHR	Character-special file (Character device)
0x4000 (0040000)	S_IFDIR	Directory
0x6000 (0060000)	S_IFBLK	Block-special file (Block device)
0x8000 (0100000)	S_IFREG	Regular file
0xa000 (0120000)	S_IFLNK	Symbolic link
0xc000 (0140000)	S_IFSOCK	Socket
0xe000 (0160000)	S_IFWHT	Whiteout, which is a file entry that covers up all entries of a particular name from lower branches

HFS+ uses the BSD file type and mode bits. Note that the constants from the header shown below are in octal (base eight), not hexadecimal.

Octal value	Identifier	Description
0004000	S_ISUID	Set user identifier on execution
0002000	S_ISGID	Set group identifier on execution
0001000	S_ISTXT	Sticky bit

0000700	S_IRWXU	Read, write and execute access for owner
0000400	S_IRUSR	Read access for owner
0000200	S_IWUSR	Write access for owner
0000100	S_IXUSR	Execute access for owner

0000070	S_IRWXG	Read, write and execute access for group
0000040	S_IRGRP	Read access for group
0000020	S_IWGRP	Write access for group
0000010	S_IXGRP	Execute access for group

0000007	S_IRWXO	Read, write and execute access for other
0000004	S_IROTH	Read access for other
0000002	S_IWOTH	Write access for other
0000001	S_IXOTH	Execute access for other

Note that if the sticky bit is set for a directory, then Mac OS restricts movement, deletion, and renaming of files in that directory. Files may be removed or renamed only if the user has write access to the directory; and is the owner of the file or the directory, or is the super-user.
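
The file type and access bits from the tables above can be decoded as in the following Python sketch. The function name `describe_file_mode` is an assumption made for this example and, to keep it short, the set user identifier, set group identifier and sticky bits are ignored.

```python
# File type values (in octal) taken from the file mode table.
FILE_TYPES = {
    0o010000: "named pipe",
    0o020000: "character device",
    0o040000: "directory",
    0o060000: "block device",
    0o100000: "regular file",
    0o120000: "symbolic link",
    0o140000: "socket",
    0o160000: "whiteout",
}


def describe_file_mode(mode):
    """Returns (file type, permissions string) for a 16-bit file mode."""
    file_type = FILE_TYPES.get(mode & 0o170000, "unknown")
    permissions = ""
    # Owner, group and other each use 3 bits: read, write and execute.
    for shift in (6, 3, 0):
        bits = (mode >> shift) & 0o7
        permissions += "r" if bits & 0o4 else "-"
        permissions += "w" if bits & 0o2 else "-"
        permissions += "x" if bits & 0o1 else "-"
    return file_type, permissions


print(describe_file_mode(0o100644))
```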

HFS+ file special permission data

The special permission data is used to store the following information:

hard link reference (iNodeNum)
number of (hard) links (linkCount) in indirect node files
device numbers of block (S_IFBLK) and character (S_IFCHR) device files

File system hierarchy

File and folder records have a search key with a non-empty name string. In thread records the name string in the search key is empty. For example, to list the file entries in a directory:

find all the file or folder records whose key contains the CNID of the directory as parent identifier

Finding a file or directory by its CNID is a two-step process:

use the CNID to look up the thread record for the file or directory
use the thread record to look up the file or folder record
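
The two lookups can be illustrated with a toy catalog. In the following Python sketch a dictionary keyed on (parent identifier, name) stands in for the catalog B-tree; this stand-in, the record values and the function names `find_by_cnid` and `list_directory` are assumptions made for this example, not how the catalog file is actually traversed.

```python
def find_by_cnid(catalog, cnid):
    """Finds a file or folder record by CNID using its thread record."""
    # Step 1: the thread record has the CNID in its key and an empty name.
    thread = catalog.get((cnid, ""))
    if thread is None:
        return None
    # Step 2: the thread record data provides the key of the actual record.
    parent_cnid, name = thread
    return catalog.get((parent_cnid, name))


def list_directory(catalog, directory_cnid):
    """Lists the names of the file and folder records in a directory."""
    return sorted(
        name for (parent_cnid, name) in catalog
        if parent_cnid == directory_cnid and name)


# Toy catalog: thread records are stored as (parent CNID, name) tuples.
catalog = {
    (1, "volume"): {"type": "folder", "cnid": 2},
    (2, ""): (1, "volume"),
    (2, "file.txt"): {"type": "file", "cnid": 16},
    (16, ""): (2, "file.txt"),
}

print(find_by_cnid(catalog, 16))
print(list_directory(catalog, 2))
```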

File forks

Forks in HFS and HFS+ can be compared to data streams in NTFS. In HFS+ the fork values are grouped in a separate fork descriptor structure. HFS+ also defines extended attributes (named forks). These are not stored in the catalog file but in the attributes file.

HFS+ fork descriptor structure

HFS+ maintains information about file contents using the HFS+ fork descriptor structure (HFSPlusForkData).

The fork descriptor structure is 80 bytes in size and consists of:

Offset	Size	Description
0	8	Size, in bytes
8	4	Clump size, in bytes
12	4	Number of blocks
16	64	Data extents record
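
A minimal Python sketch of reading a fork descriptor follows, assuming big-endian values and the extent descriptor layout of 2 x 32-bit values (block number and number of blocks). The function name `parse_fork_descriptor` and the returned dictionary keys are assumptions made for this example.

```python
import struct


def parse_fork_descriptor(data):
    """Parses an 80-byte HFS+ fork descriptor."""
    size, clump_size, number_of_blocks = struct.unpack_from(">QII", data, 0)
    extents = []
    for index in range(8):
        extent = struct.unpack_from(">II", data, 16 + index * 8)
        # Unused extent descriptors have both values set to 0.
        if extent != (0, 0):
            extents.append(extent)
    return {
        "size": size,
        "clump_size": clump_size,
        "number_of_blocks": number_of_blocks,
        "extents": extents,
    }


# Synthetic fork descriptor with a single extent of 3 blocks at block 100.
fork_data = struct.pack(">QII", 10000, 0, 3)
fork_data += struct.pack(">II", 100, 3) + bytes(56)
print(parse_fork_descriptor(fork_data))
```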

The extents overflow file

In HFS and HFS+ extents (contiguous ranges of blocks) are used to track which blocks belong to a file. The first three (HFS) or eight (HFS+) extents of a fork are stored in the catalog file. Additional extents are stored in the extents overflow file.

The structure of an extents overflow file is relatively simple compared to that of a catalog file. The function of the extents overflow file is to store those file extents that are not contained in the master directory block (MDB) or volume header and the catalog file.

Note that the file system B-tree files can have additional extents in the extents overflow file. This has been observed with the attributes file. It is currently unknown if the extents (overflow) file itself can have overflow extents.

The extents overflow key (record)

Disks initialized using the enhanced Disk Initialization Manager introduced in system software version might contain extent records for some blocks that do not belong to any actual file in the file system. These extent records belong to the bad block file (CNID 5). See the chapter “Disk Initialization Manager” in Inside Macintosh: Files for details on bad block sparing.

The key has been selected so that the extent records for a particular fork are grouped together in the B-tree, right next to all the extent records for the other fork of the file. The fork offset of the preceding extent record is needed to determine the key of the next extent record.

In an extents overflow file the search key consists of:

fork type
file identifier
first block in the extent

HFS extents overflow key (record)

The HFS extents overflow key (record) is 8 bytes in size and consists of:

Offset	Size	Value	Description
0	1	7	Key data size, in bytes, which consists of a signed 8-bit integer
1	1		Fork type, which consists of a signed 8-bit integer
2	4		File identifier (CNID)
6	2		Logical block number

The first 3 extents in a fork are held in its catalog file record and an HFS extents record holds 3 extent descriptors. So the number of extent records for a fork is:

(number_of_extents - 3 + 2) / 3

HFS+ and HFSX extents overflow key (record)

The HFS+ and HFSX extents overflow key (record) is 12 bytes in size and consists of:

Offset	Size	Value	Description
0	2	10	Key data size, in bytes, which consists of an unsigned 16-bit integer
2	1		Fork type, which consists of a signed 8-bit integer
3	1	0x00	Unknown (Padding)
4	4		File identifier (CNID)
8	4		Logical block number

The first 8 extents in a fork are held in its catalog file record. So the number of extent records for a fork is:

(number_of_extents - 8 + 7) / 8
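
Both formulas are a rounded-up integer division of the extents that do not fit in the catalog file record. The following Python sketch assumes 3 extent descriptors per record for HFS and 8 for HFS+/HFSX, as given by the sizes of the respective extents records. The function name is an assumption made for this example.

```python
def number_of_overflow_extent_records(number_of_extents, extents_per_record):
    """Returns the number of extents overflow records needed by a fork."""
    # The first record worth of extents is stored in the catalog file record.
    remaining = max(0, number_of_extents - extents_per_record)
    # Round up to a whole number of extents records.
    return (remaining + extents_per_record - 1) // extents_per_record


# HFS+/HFSX fork with 17 extents: 8 in the catalog and 9 in overflow records.
print(number_of_overflow_extent_records(17, 8))
```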

HFS fork types

Value	Identifier	Description
-1 (0xff)		Resource fork
0 (0x00)		Data fork

The extent (data) record

An extent is a contiguous range of blocks that have been allocated to an individual file. An extent is represented by an extent descriptor.

HFS extents record

The HFS extents record (HFSExtentRecord) is 12 bytes in size and consists of:

Offset	Size	Value	Description
0	3 x 4 = 12		Array of HFS extent descriptors

HFS extent descriptor

The HFS extents descriptor (HFSExtentDescriptor) is 4 bytes in size and consists of:

Offset	Size	Value	Description
0	2		Physical block number, which contains a block number relative to the start of the data area
2	2		Number of blocks

extent_offset = (data_area_block_number + extent_block_number) * block_size

An unused extent descriptor should have both the block number and number of blocks set to 0.

HFS+ and HFSX extents record

The HFS+ and HFSX extents record (HFSPlusExtentRecord) is 64 bytes in size and consists of:

Offset	Size	Value	Description
0	8 x 8 = 64		Array of HFS+ extent descriptors

HFS+ and HFSX extent descriptor

The HFS+ and HFSX extents descriptor (HFSPlusExtentDescriptor) is 8 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Physical block number, which contains a block number relative to the start of the volume
4	4		Number of blocks

extent_offset = extent_block_number * block_size

An unused extent descriptor should have both the block number and number of blocks set to 0.
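
As an illustration of the HFS+ and HFSX formula above, the following Python sketch converts extent descriptors into byte ranges relative to the start of the volume. The function name `extent_byte_ranges` is an assumption made for this example.

```python
def extent_byte_ranges(extents, block_size):
    """Returns (offset, size) tuples, in bytes, for the used extents."""
    ranges = []
    for block_number, number_of_blocks in extents:
        # Skip unused extent descriptors.
        if block_number == 0 and number_of_blocks == 0:
            continue
        ranges.append((block_number * block_size, number_of_blocks * block_size))
    return ranges


# One extent of 3 blocks at block 100 and one unused extent descriptor.
print(extent_byte_ranges([(100, 3), (0, 0)], 4096))
```

Note that the sum of the extent sizes can be larger than the fork size, since the last block of a fork does not need to be fully used.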

Bad Block File

The extents overflow file is also used to hold information about the bad blocks, referred to as the bad block file. The bad block file is used to mark areas on the disk as bad and unable to be used for storing data, typically to map out bad sectors on the storage medium.

Typically, blocks are larger than sectors. If a single sector is found to be bad, the entire block is unusable. The bad block file is sometimes used to mark blocks as unusable when they are not bad, e.g. in the HFS wrapper.

Bad block extent records are always assumed to reference the data fork (fork type of 0).

Allocation (bitmap) file

The allocation file is used to keep track of whether each block in a volume is currently allocated to some file system structure or not. The content of the allocation file is a bitmap. The bitmap contains one bit for each block in the volume.

If a bit is set, the corresponding block is currently in use by some file system structure.
If a bit is clear, the corresponding block is not currently in use, and is available for allocation.

The size of the allocation file depends on the number of blocks in the volume, which in turn depends both on the size of the disk and on the size of the volume’s blocks. For example, a volume on a 1 GiB disk with a block size of 4 KiB needs an allocation file size of 256 Kbits (32 KiB, or 8 blocks). Since the allocation file itself is allocated using blocks, it always occupies an integral number of blocks (its size may be rounded up).
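
The arithmetic of this example can be sketched in Python as follows. The function name `minimum_allocation_file_size` is an assumption made for this example and the result is the minimum size, since the allocation file may be larger than required.

```python
def minimum_allocation_file_size(volume_size, block_size):
    """Returns (number of blocks, bitmap size in bytes, bitmap size in blocks)."""
    # Only blocks that fit entirely on the volume are counted.
    number_of_blocks = volume_size // block_size
    # One bit per block, rounded up to whole bytes.
    bitmap_size = (number_of_blocks + 7) // 8
    # The allocation file itself occupies a whole number of blocks.
    bitmap_blocks = (bitmap_size + block_size - 1) // block_size
    return number_of_blocks, bitmap_size, bitmap_blocks


print(minimum_allocation_file_size(1024 * 1024 * 1024, 4096))
```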

The allocation file may be larger than the minimum number of bits required for the given volume size. Any unused bits in the bitmap must be set to 0.

Each byte in the allocation file holds the state of eight blocks. The byte at offset N into the file contains the allocation state of blocks (N x 8) through (N x 8 + 7). Within each byte, the most significant bit holds information about the block with the lowest number and the least significant bit holds information about the block with the highest number. The listing below shows how to test whether a block is in use, assuming that the entire allocation file has been read into memory.

Determining whether a block is in use.

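#include <stdbool.h>
#include <stdint.h>

/* Returns true if the block is marked as in use in the allocation bitmap. */
static bool IsAllocationBlockUsed(uint32_t thisAllocationBlock,
                                  const uint8_t *allocationFileContents)
{
    /* Each byte holds the allocation state of 8 blocks. */
    uint8_t thisByte = allocationFileContents[thisAllocationBlock / 8];

    /* The most significant bit corresponds to the lowest block number. */
    return (thisByte & (1 << (7 - (thisAllocationBlock % 8)))) != 0;
}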

Attributes file

The attributes file is a B-tree file used to store extended attributes.

The location of the attributes file can be found in the HFS+ and HFSX volume header.

Attributes file keys

An attributes file key is of variable size and consists of:

Offset	Size	Description
0	2	Key data size, in bytes
If key data size >= 12
2	2	Unknown
4	4	Identifier (CNID)
8	4	Unknown
12	2	Number of characters in the name string
14	...	Name string, which contains an UTF-16 big-endian string without end-of-string character

Note that the name of an extended attribute appears to be case sensitive even on a case-insensitive file system.

The attributes file data

The attributes file defines two types of attributes:

Fork data attributes, which are used for attributes whose data is large. The attribute’s data is stored in extents on the volume and the attribute merely contains a reference to those extents.
Extension attributes, which are used to augment the fork descriptor structure, allowing a fork to have more than eight extents.

Attributes file data record header

Each attributes file data record starts with a type value, which describes the type of attribute data record.

The attributes file data record header is 4 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Record type

The attributes data record types

Value	Identifier	Description
0x00000010	kHFSPlusAttrInlineData	Attribute record with inline data
0x00000020	kHFSPlusAttrForkData	Attribute record with fork descriptor
0x00000030	kHFSPlusAttrExtents	Attribute record with extents overflow

Note that at the moment it is unclear when an attribute record of type kHFSPlusAttrExtents is created and how it should be handled.

The inline data attribute record

The inline data attribute record is of variable size and consists of:

Offset	Size	Value	Description
0	4	0x00000010	Record type
4	4	0	Unknown (Reserved)
8	4		Unknown
12	4		Attribute data size
16	...		Attribute data

The fork descriptor attribute record

The fork descriptor attribute record is 88 bytes in size and consists of:

Offset	Size	Value	Description
0	4	0x00000020	Record type
4	4	0	Unknown (Reserved)
8	80		Attribute fork descriptor

The extents attribute record

The extents attribute record is 72 bytes in size and consists of:

Offset	Size	Value	Description
0	4	0x00000030	Record type
4	4	0	Unknown (Reserved)
8	64		Attribute extents record
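
The three record layouts above can be told apart by the record type. The following Python sketch assumes big-endian values; the function name `parse_attribute_record` and the returned labels are assumptions made for this example. The fork descriptor and extents record are returned as raw bytes.

```python
import struct


def parse_attribute_record(data):
    """Returns (kind, payload) for an attributes file data record."""
    (record_type,) = struct.unpack_from(">I", data, 0)
    if record_type == 0x00000010:
        # Inline data: the attribute data size is stored at offset 12.
        (data_size,) = struct.unpack_from(">I", data, 12)
        return "inline data", data[16:16 + data_size]
    if record_type == 0x00000020:
        # Fork descriptor of 80 bytes.
        return "fork descriptor", data[8:88]
    if record_type == 0x00000030:
        # Extents record of 64 bytes.
        return "extents", data[8:72]
    raise ValueError("unsupported record type: 0x{0:08x}".format(record_type))


# Synthetic inline data attribute record containing 5 bytes of data.
record = struct.pack(">IIII", 0x00000010, 0, 0, 5) + b"hello"
print(parse_attribute_record(record))
```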

Startup file

The startup file is a file system metadata file intended to hold information needed when booting a system that does not have built-in (ROM) support for HFS+ (or HFSX). A boot loader can find the startup file without full knowledge of the format using the first eight extents of the startup file located in the volume header.

Format-wise it is valid for the startup file to contain more than eight extents, but in doing so the purpose of the startup file is defeated.

Next allocation search

The next allocation search block number (nextAllocation) in the volume header is used by Mac OS as a hint for where to start searching for available blocks when allocating space for a file.

Metadata zone and hot files

In Mac OS X 10.3 a metadata zone was introduced to store certain file system metadata, such as the allocation (bitmap) file, the extents overflow file, the catalog file and the journal file, and frequently used small files (also referred to as “hot files”) near each other to reduce seek time for typical accesses.

Hot File B-tree

The hot file B-tree is a file named “.hotfiles.btree” stored in the root directory.

Journal

A HFS+ (or HFSX) volume may have an optional journal to speed recovery when mounting a volume that was not unmounted safely. The purpose of the journal is to ensure that when a group of related changes are being made, that either all of those changes are actually made, or none of them are made. The journal makes it quick and easy to restore the volume structures to a consistent state, without having to scan all of the structures. The journal is used only for the volume structures and metadata; it does not protect the contents of a fork.

The volume header specifies if journalling is activated.

The journal data structures consist of:

a journal information block, which contains the location and size of the journal header and journal buffer;
a journal header, which describes which part of the journal buffer is active and contains transactions waiting to be committed;
a journal buffer, which is a cyclic buffer that holds the file system metadata transactions.

On HFS+ volumes, the journal information block is stored as a file. The name of that file is “.journal_info_block” and it is stored in the volume’s root directory.

The journal header and journal buffer are stored together in a different file named “.journal”, also in the volume’s root directory. Each of these files is contiguous on disk and occupies exactly one extent.

The volume header contains the extent of the journal information block file. The journal information block contains the location of the journal file.

Journal information block

The journal information block describes where the journal header and journal buffer are stored. The journal information block is stored at the start of the block referred to by the volume header.

The journal information block is 180 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Journal flags
4	8 x 4 = 32		Device signature
36	8		Journal header offset
44	8		Journal size, in bytes, which includes the size of the journal header and the journal buffer, but not the journal information block
52	32 x 4 = 128	0x00	Unknown (Reserved)

Journal flags

The journal flags consist of the following values:

Value(s)	Description
0x00000001	On volume, where the journal header offset is relative to the start of the volume
0x00000002	On other device, where the device signature identifies the device containing the journal and the journal header offset is relative to the start of the device
0x00000004	Needs initialization, which indicates that there are no valid transactions in the journal and that the journal needs to be initialized

Note that according to TN1150 journals stored on a separate device are not supported.

The journal header

The journal header is 44 bytes in size and consists of:

Offset	Size	Value	Description
0	4	"\x4a\x4e\x4c\x78"	Signature
4	4	"\x12\x34\x56\x78"	Byte order (or endian) signature
8	8		First transaction start offset
16	8		Next transaction start offset
24	8		Journal size, in bytes, which includes the size of the journal header and buffer
32	4		Journal block header size, in bytes, typically ranges from 4096 to 16384
36	4		Checksum
40	4		Journal header size, in bytes, typically the size of one sector
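The two signature values can be used to sanity check a journal header and to guess its byte order. The following is a minimal sketch, not a definitive implementation. The enum and function names are hypothetical, and it assumes that a journal written on a little-endian system stores both the signature and the byte order signature byte-swapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes, these are not part of the format */
enum journal_byte_order
{
    JOURNAL_BYTE_ORDER_INVALID = 0,
    JOURNAL_BYTE_ORDER_BIG     = 1,
    JOURNAL_BYTE_ORDER_LITTLE  = 2
};

/* Determines the byte order of a journal header from its first 8 bytes.
 * Assumption: a little-endian journal stores the signature and the
 * byte order signature byte-swapped. */
enum journal_byte_order journal_header_byte_order(
                         const uint8_t *data,
                         size_t data_size )
{
    assert( data != NULL );

    /* The journal header is 44 bytes in size */
    if( data_size < 44 )
    {
        return( JOURNAL_BYTE_ORDER_INVALID );
    }
    if( memcmp( data, "\x4a\x4e\x4c\x78\x12\x34\x56\x78", 8 ) == 0 )
    {
        return( JOURNAL_BYTE_ORDER_BIG );
    }
    if( memcmp( data, "\x78\x4c\x4e\x4a\x78\x56\x34\x12", 8 ) == 0 )
    {
        return( JOURNAL_BYTE_ORDER_LITTLE );
    }
    return( JOURNAL_BYTE_ORDER_INVALID );
}
```

The remaining journal header values would then be read using the detected byte order.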

First and next transaction offset

The first transaction offset contains the offset in bytes from the start of the journal header to the start of the first (oldest) transaction.

The next transaction offset contains the offset in bytes from the start of the journal header to the end of the last (newest) transaction. Note that the next transaction offset (end) may be less than the first transaction offset (start), indicating that the transactions wrap around the end of the journal’s circular buffer. If end equals start, then the journal is empty, and there are no transactions that need to be replayed.

Journal transactions

A single transaction is stored in the journal as several blocks. These blocks include both the data to be written and the location where that data is to be written. This is represented on storage medium by a block list header, which describes the number and sizes of the blocks, immediately followed by the contents of those blocks.

Since block list headers are of limited size, a single transaction may consist of several block list headers and their associated block contents. If the next journal block value of the first journal block information entry is non-zero, then the next block list header is a continuation of the same transaction.

The journal buffer is treated as a circular buffer. When reading or writing the journal buffer, the I/O operation must stop at the end of the journal buffer and resume (wrap around) immediately following the journal header. Block list headers or the contents of blocks may wrap around in this way. Only a portion of the journal buffer is active at any given time; this portion is indicated by the first and next transaction offsets (start and end) of the journal header. The part of the journal buffer that is not active contains no meaningful data, and must be ignored.

To prevent ambiguity when start equals end, the journal is never allowed to be perfectly full (all of the journal buffer used by block lists and blocks). If the journal was perfectly full, and start was not equal to the journal header size, then end would be equal to start. It would then be impossible to differentiate between an empty and a full journal.

When the journal is not empty (contains transactions), it must be replayed to be sure the volume is consistent. That is, the data from each of the transactions must be written to the correct blocks on disk.

The journal block list header

The block list header describes a list of blocks included in a transaction. A transaction may include several block lists if it modifies more blocks than can be represented in a single block list.

The journal block list header is 16 bytes in size and consists of:

Offset	Size	Value	Description
0	2		Maximum number of journal blocks
2	2		Number of journal blocks following the journal block header, typically 1
4	4		Block list size, in bytes, which includes the size of the header and blocks
8	4		Checksum
12	4	0x00	Unknown (Alignment padding)
16	...		Journal block information array

Note that the number of journal blocks includes the first journal block. The first journal block is reserved for chaining multiple block lists, therefore the number of journal blocks that actually contain data is the number of journal blocks minus one (-1).

Journal block information

The journal block information is 16 bytes in size and consists of:

Offset	Size	Description
0	8	Block sector number
8	4	Block size, in bytes
12	4	Next journal block

Journal checksum

The journal header and block list header both contain checksum values. The checksums are verified as part of a basic consistency check of these journal data structures. To verify the checksum, temporarily set the checksum field to 0 and then call the hfs_plus_calculate_checksum routine as specified below.

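#include <stddef.h>
#include <stdint.h>

/* Calculates the checksum of a journal header or block list header.
 * The checksum field inside the buffer must be set to 0 by the caller. */
uint32_t hfs_plus_calculate_checksum(
          const uint8_t *buffer,
          size_t buffer_size )
{
    size_t buffer_offset = 0;
    uint32_t checksum    = 0;

    for( buffer_offset = 0;
         buffer_offset < buffer_size;
         buffer_offset++ )
    {
        /* Unsigned 32-bit arithmetic, where overflow wraps around */
        checksum = ( checksum << 8 ) ^ ( checksum + buffer[ buffer_offset ] );
    }
    /* The result is the bitwise complement of the accumulated value */
    return( ~checksum );
}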
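As an illustration of the verification procedure, the sketch below checks a journal header checksum without modifying the original data. It is not a definitive implementation: the function name journal_header_verify_checksum and the constants are hypothetical, it assumes the checksum covers the 44 bytes of the journal header, and it expects the caller to pass the stored checksum already converted to host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JOURNAL_HEADER_SIZE            44
#define JOURNAL_HEADER_CHECKSUM_OFFSET 36

/* The checksum routine, repeated here to keep the example self-contained */
static uint32_t hfs_plus_calculate_checksum(
                 const uint8_t *buffer,
                 size_t buffer_size )
{
    size_t buffer_offset = 0;
    uint32_t checksum    = 0;

    for( buffer_offset = 0;
         buffer_offset < buffer_size;
         buffer_offset++ )
    {
        checksum = ( checksum << 8 ) ^ ( checksum + buffer[ buffer_offset ] );
    }
    return( ~checksum );
}

/* Verifies the checksum of a journal header.
 * Assumption: the checksum is calculated over the 44-byte header.
 * Returns 1 if the checksum matches or 0 if not. */
int journal_header_verify_checksum(
     const uint8_t *header_data,
     uint32_t stored_checksum )
{
    uint8_t header_copy[ JOURNAL_HEADER_SIZE ];

    assert( header_data != NULL );

    /* Work on a copy so that the checksum field can be set to 0 */
    memcpy( header_copy, header_data, JOURNAL_HEADER_SIZE );
    memset( &( header_copy[ JOURNAL_HEADER_CHECKSUM_OFFSET ] ), 0, 4 );

    return( hfs_plus_calculate_checksum(
             header_copy, JOURNAL_HEADER_SIZE ) == stored_checksum );
}
```

Working on a copy keeps the original header data intact, which is convenient for read-only access.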

Application specific data structures

HFS, HFS+ and HFSX contain application specific data structures.

Finder information

The finder information in the master directory block (MDB) and volume header consists of an array of 32-bit values. This array contains information used by the Mac OS Finder and the system software boot process.

Array entry	Description
0	Bootable system directory identifier (CNID), i.e. "System Folder" in Mac OS 8 or 9, or "/System/Library/CoreServices" in Mac OS X. Typically 3 or 5, is 0 if the volume is not bootable
1	Startup application parent identifier (CNID), i.e. "Finder". Is 0 if the volume is not bootable
2	Directory identifier (CNID) to display in Finder on mount, or 0 if none
3	Directory identifier (CNID) of a bootable Mac OS 8 or 9 System Folder, or 0 if none
4	Unknown (Reserved)
5	Directory identifier (CNID) of a bootable Mac OS X system, the "/System/Library/CoreServices" directory, or 0 if none
6 and 7	Mac OS X volume identifier, consist of a 64-bit integer

File information

HFS file information

The HFS file information is 16 bytes in size and consists of:

Offset	Size	Description
0	4 x 1 = 4	File type, which consists of an array of unsigned 8-bit integers
4	4 x 1 = 4	File creator, which consists of an array of unsigned 8-bit integers
8	2	Finder flags
10	4	Location within the parent, which contains x and y-coordinate values. If set to {0, 0}, the Finder will place the item automatically
14	2	File icon window, which contains the window in which the file's icon appears

HFS extended file information

The HFS extended file information is 16 bytes in size and consists of:

Offset	Size	Description
0	2	Finder icon identifier
2	3 x 2 = 6	Unknown (Reserved), which consists of an array of signed 16-bit integers
8	1	Extended finder script code flags
9	1	Extended finder flags
10	2	Finder comment identifier, which consists of a signed 16-bit integer
12	4	Put away folder identifier (CNID)

HFS+ and HFSX file information

The HFS+ and HFSX file information (FileInfo) is 16 bytes in size and consists of:

Offset	Size	Description
0	4 x 1 = 4	File type, which consists of an array of unsigned 8-bit integers
4	4 x 1 = 4	File creator, which consists of an array of unsigned 8-bit integers
8	2	Finder flags
10	4	Location within the parent, which contains x and y-coordinate values. If set to {0, 0}, the Finder will place the item automatically
14	2	Unknown (Reserved)

HFS+ and HFSX extended file information

The HFS+ and HFSX extended file information (ExtendedFileInfo) is 16 bytes in size and consists of:

Offset	Size	Description
0	4	Unknown (Reserved)
If kHFSHasDateAddedMask is not set
4	4	Unknown (Reserved)
If kHFSHasDateAddedMask is set
4	4	Added time, which contains a POSIX timestamp in UTC
Common
8	2	Extended finder flags
10	2	Unknown (Reserved), which consists of a signed 16-bit integer
12	4	Put away folder identifier (CNID)

Folder information

HFS folder information

The HFS folder information is 16 bytes in size and consists of:

Offset	Size	Description
0	8	Window position and dimension (boundaries), which contains the top, left, bottom, right-coordinate values
8	2	Finder flags
10	4	Location within the parent, which contains x and y-coordinate values. If set to {0, 0}, the Finder will place the item automatically
14	2	Folder view

HFS extended folder information

The HFS extended folder information is 16 bytes in size and consists of:

Offset	Size	Description
0	4	Scroll position for icon view, which contains x and y-coordinate values
If kHFSHasDateAddedMask is not set
4	4	Open folder identifier chain, which consists of a signed 32-bit integer
If kHFSHasDateAddedMask is set
4	4	Added time, which contains a POSIX timestamp in UTC
Common
8	1	Extended finder script code flags
9	1	Extended finder flags
10	2	Finder comment identifier, which consists of a signed 16-bit integer
12	4	Put away folder identifier (CNID)

HFS+ and HFSX folder information

The HFS+ and HFSX folder information is 16 bytes in size and consists of:

Offset	Size	Description
0	8	Window position and dimension (boundaries), which contains the top, left, bottom, right-coordinate values
8	2	Finder flags
10	4	Location within the parent, which contains x and y-coordinate values. If set to {0, 0}, the Finder will place the item automatically
14	2	Unknown (Reserved)

HFS+ and HFSX extended folder information

The HFS+ and HFSX extended folder information is 16 bytes in size and consists of:

Offset	Size	Description
0	4	Scroll position for icon view, which contains x and y-coordinate values
4	4	Unknown (Reserved), which consists of a signed 32-bit integer
8	2	Extended finder flags
10	2	Unknown (Reserved), which consists of a signed 16-bit integer
12	4	Put away folder identifier (CNID)

Finder flags

The finder flags consist of the following values:

Value(s)	Applies to	Description
0x0001	Files and folders	Is on desktop
0x000e	Files and folders	Color
0x0040	Files	Is shared
0x0080	Files	Has no INITs
0x0100	Files	Has been inited
0x0400	Files and folders	Has custom icon
0x0800	Files	Is stationery
0x1000	Files and folders	Name locked
0x2000	Files	Has bundle
0x4000	Files and folders	Is invisible
0x8000	Files	Is alias

Extended finder flags

The extended finder flags consist of the following values:

Value(s)	Description
0x0004	Has routing information
0x0100	Has custom badge resource
0x8000	Extended flags are invalid, which indicates that, if set, the other extended flags should be ignored

Notes

struct Point {
  SInt16              v;
  SInt16              h;
};
typedef struct Point  Point;

struct Rect {
  SInt16              top;
  SInt16              left;
  SInt16              bottom;
  SInt16              right;
};
typedef struct Rect   Rect;

/* OSType is a 32-bit value made by packing four 1-byte characters
   together. */
typedef UInt32        FourCharCode;
typedef FourCharCode  OSType;

File content

HFS supports multiple ways to store file content:

Data fork
Compressed data extended attribute
Compressed data extended attribute with resource fork
Resource fork
Extended attribute (named fork)

Data fork

The file content size is stored in the data fork descriptor of the catalog file record.

The extents of the file content are stored in the fork descriptor and extents overflow file.

Compressed data extended attribute

The file has an attribute record with inline data named “com.apple.decmpfs” with compression method 3, 5 or 7.

The file content size is stored in the compressed data header of the extended attribute.

For compression method 3 or 7 the file content data is stored in the extended attribute after the decmpfs compressed data header.

For compression method 5 the file content data contains 0-byte values. There are 12 bytes stored after the decmpfs compressed data header that consist of:

Offset	Size	Description
0	4	Unknown (Seen: 1)
4	4	Unknown
8	4	Unknown (Seen: 0)

Compressed data extended attribute with resource fork

The file has an attribute record with inline data named “com.apple.decmpfs” with compression method 4 or 8.

The file content size is stored in the compressed data header of the extended attribute.

The file content data is stored in a “com.apple.ResourceFork” extended attribute.

The compressed data starts with metadata that contains the offsets of the compressed data blocks.

ZLIB (DEFLATE) compressed data

ZLIB (DEFLATE) compressed header
Unknown (empty values)
ZLIB (DEFLATE) compressed data block offsets and sizes
ZLIB (DEFLATE) compressed data blocks
ZLIB (DEFLATE) compressed footer

ZLIB (DEFLATE) compressed header

The ZLIB (DEFLATE) compressed header is 16 bytes in size and consists of:

Offset	Size	Description
0	4	Compressed data block descriptors offset, where the offset is relative from the start of the ZLIB (DEFLATE) compressed data
4	4	Compressed footer offset, where the offset is relative from the start of the ZLIB (DEFLATE) compressed data
8	4	Compressed data block descriptors and data size
12	4	Compressed footer size

Note that the values in the ZLIB (DEFLATE) compressed header are stored in big-endian.
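Reading the header could be sketched as follows. The structure, member and function names are hypothetical and only mirror the table above. No validation of the offsets against the size of the resource fork is performed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory representation of the compressed header */
struct zlib_compressed_header
{
    uint32_t descriptors_offset;
    uint32_t footer_offset;
    uint32_t descriptors_and_data_size;
    uint32_t footer_size;
};

/* Reads a 32-bit big-endian value */
static uint32_t read_uint32_big_endian(
                 const uint8_t *data )
{
    return( ( (uint32_t) data[ 0 ] << 24 )
          | ( (uint32_t) data[ 1 ] << 16 )
          | ( (uint32_t) data[ 2 ] << 8 )
          | (uint32_t) data[ 3 ] );
}

/* Reads the 16-byte compressed header.
 * Returns 0 if successful or -1 if the data is too small. */
int zlib_compressed_header_read(
     const uint8_t *data,
     size_t data_size,
     struct zlib_compressed_header *header )
{
    assert( data != NULL );
    assert( header != NULL );

    if( data_size < 16 )
    {
        return( -1 );
    }
    header->descriptors_offset        = read_uint32_big_endian( &( data[ 0 ] ) );
    header->footer_offset             = read_uint32_big_endian( &( data[ 4 ] ) );
    header->descriptors_and_data_size = read_uint32_big_endian( &( data[ 8 ] ) );
    header->footer_size               = read_uint32_big_endian( &( data[ 12 ] ) );

    return( 0 );
}
```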

ZLIB (DEFLATE) compressed data block descriptors

The ZLIB (DEFLATE) compressed data block descriptors are of variable size and consist of:

Offset	Size	Description
0	4	Compressed data size
4	4	Number of compressed data block offset and size tuples
8	8 x ...	Array of compressed data block descriptors

ZLIB (DEFLATE) compressed data block descriptor

The ZLIB (DEFLATE) compressed data block descriptor is 8 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Compressed block offset, where the offset is relative from the start of the ZLIB (DEFLATE) compressed data + 20
4	4		Compressed block size

ZLIB (DEFLATE) compressed footer

The ZLIB (DEFLATE) compressed footer is 50 bytes in size and consists of:

Offset	Size	Value	Description
0	24		Unknown (empty values)
24	2		Unknown
26	2		Unknown
28	2		Unknown
30	2		Unknown
32	4	"cmpf"	Unknown (signature)
36	4		Unknown
40	4		Unknown
44	6		Unknown (empty values)

Note that the values in the ZLIB (DEFLATE) compressed footer are stored in big-endian.

LZVN compressed data

Offset	Size	Value	Description
0	4 x ...		Array of compressed data block offsets, where an offset is relative from the start of the LZVN compressed data
...	...		LZVN compressed data blocks

Note that the compressed data block contains a maximum of 65536 bytes of data. The compressed data block therefore should not exceed 65537 bytes in size.

Resource fork

The file content size is stored in the resource fork descriptor of the catalog file record.

The extents of the file content are stored in the fork descriptor and extents overflow file.

Extended attribute (named fork)

Extended attributes, also referred to as named forks, are stored in the HFS+ attributes file.

HFS wrapper

TODO: complete section

A HFSX volume cannot be wrapped in a HFS volume.

References

hfs_format.h
Data Organization on Volumes, by Apple Inc.
Technical Note TN1150: HFS plus volume format, by Apple Inc.

Macintosh File System (MFS)

The Macintosh File System (MFS) is the first file system created for Mac OS, intended for 400 KiB floppy disks.

Overview

A MFS file system consists of:

optional boot block
master directory block (MDB)
file directory area
data area
optional backup (or alternate) master directory block (MDB)

The backup master directory block (MDB) is stored in the last 2 sectors of the volume.

Characteristics

Characteristics	Description
Byte order	big-endian
Date and time values	TODO
Character strings	Narrow character (Single Byte Character (SBC) or Multi Byte Character (MBC)) stored using a system defined codepage

Terminology

Term	Description
Clump size	Size of the group of (allocation) blocks (or clump), in bytes, to avoid fragmentation

Boot Block

If a volume is bootable, the first 2 blocks of the volume contain the boot block. The boot block consists of:

boot block header
boot code
unknown (filler)

Boot Block Header

The boot block header is 138 or 144 bytes in size and consists of:

Offset	Size	Value	Description
0	2	"LK" (or "\x4c\x4b")	Boot block signature
2	4		Boot code entry point
6	1		Flags
7	1		Format version
8	2		Page flags (or Secondary Sound and Video Pages)
10	1		System file name size, with a maximum of 15
11	15		System file name
26	1		Finder (or shell) file name size, with a maximum of 15
27	15		Finder (or shell) file name, typically "Finder"
42	1		Debugger file name size, with a maximum of 15
43	15		Debugger file name, typically "Macsbug"
58	1		Disassembler (or second debugger) file name size, with a maximum of 15
59	15		Disassembler (or second debugger) file name, typically "Disassembler"
74	1		Startup screen file name size, with a maximum of 15
75	15		Startup screen file name, typically "StartUpScreen"
90	1		Startup (or bootup) file name size, with a maximum of 15
91	15		Startup (or bootup) file name, typically "Finder"
106	1		Clipboard (or scrap) file name size, with a maximum of 15
107	15		Clipboard (or scrap) file name, typically "Clipboard"
122	2		Number of allocated file control blocks (FCBs)
124	2		Number of elements in the event queue, typically 20
126	4		System heap size on Macintosh computer with 128 KiB of RAM
130	4		System heap size on Macintosh computer with 256 KiB of RAM
134	4		System heap size on Macintosh computer with +512 KiB of RAM
Newer boot block header format
138	4		Additional system heap space
140	4		Fraction of available RAM for the system heap

Note that “LK” presumably is short for “Larry Kenyon” who originally designed MFS.

Boot code entry point

The boot code entry point contains machine-language instructions that translate to:

BRA.S *+ 0x90

Or for older versions of the boot block header:

BRA.S *+ 0x88

BRA.W *+ 0x88

BRA     $88(PC)         * $6000,$0086

This instruction jumps to the main boot code following the boot block header.

This field is ignored, however, if bit 6 is clear in the high-order byte of the boot block version number or if the low-order byte contains 0x0d.

Boot Block Header Flags

Bit(s)	Description
0 - 4	Unknown (Reserved), should contain 0
5	Use relative system heap sizing
6	Execute boot code
7	Newer boot block header format is used

If bit 7 of the flag byte is clear, then bits 5 and 6 are ignored and the version number is set in the format version value.

If the format version value is:

less than 21, the values in the system heap size on 128K Mac and 256K Mac should be ignored and the value in system heap size on all machines should be used.
13, the boot code should be executed using the value in boot code entry point.
greater than or equal to 21, the value in system heap size on all machines should be used.

If bit 7 of the flag byte is set

bit 6 should be used to determine whether to execute the boot code using the value in boot code entry point.
bit 5 should be used to determine whether to use relative system heap sizing. If bit 5 is
- clear, the value in system heap size on all machines should be used.
- set, the system heap is extended by the value in the additional system heap space plus the fraction of available RAM for the system heap.

Master Directory Block (MDB)

The Master Directory Block (MDB) is located at offset 1024 of the volume and consists of:

master directory block header
block map

Master Directory Block (MDB) header

The Master Directory Block (MDB) header is 64 bytes in size and consists of:

Offset	Size	Value	Description
0	2	"\xd2\xd7"	Volume signature
2	4		Creation date and time, which contains a HFS timestamp in local time
6	4		Last modification date and time, which contains a HFS timestamp in local time
10	2		Volume attribute flags
12	2		Number of files in the root directory
14	2		File directory area sector number, contains a sector number relative from the start of the volume, where 0 is the first sector number
16	2		File directory area size, in number of sectors
18	2		Number of blocks
20	4		Block size, in bytes, must be a multiple of 512
24	4		Clump size, in bytes
28	2		Data area sector number, contains a sector number relative from the start of the volume, where 0 is the first sector number
30	4		Next available file identifier
34	2		Number of unused blocks
36	1		Volume label size, with a maximum of 27
37	27		Volume label

Block map

TODO: describe similar to FAT-12 block allocation table

File Directory Area

The file directory area consists of:

one or more file directory entries, where an individual file directory entry does not span multiple blocks

File Directory Entry

A file directory entry is of variable size and consists of:

Offset	Size	Value	Description
0	1		Flags, where 0x80 indicates the file directory entry is in use
1	1	0	Format version
2	4	"\x3f\x3f\x3f\x3f"	File type
6	4		File creator
10	2		Finder flags
12	4		Location within the parent, which contains x and y-coordinate values
16	2		Folder file identifier, where 0 represents the main volume, -2 the desktop, -3 the trash, otherwise, if positive, a file identifier
18	4		File identifier
22	2		Data fork block number, contains 0 if the file entry has no data fork
24	4		Data fork size, in bytes
28	4		Data fork allocated size, in bytes
32	2		Resource fork block number, contains 0 if the file entry has no resource fork
34	4		Resource fork size, in bytes
38	4		Resource fork allocated size, in bytes
42	4		Creation date and time, which contains a HFS timestamp in local time
46	4		(Content) modification date and time, which contains a HFS timestamp in local time
50	1		File name size, with a maximum of 255
51	...		File name
...	...		16-bit alignment padding
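Since the file name is of variable size, the size of a file directory entry has to be derived from the file name size. The sketch below assumes that an entry starts on a 16-bit boundary, so the alignment padding is at most 1 byte. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of the fixed-size part of a file directory entry,
 * which includes the file name size value */
#define MFS_FILE_DIRECTORY_ENTRY_FIXED_SIZE 51

/* Determines the size of a file directory entry including padding.
 * Assumption: the entry starts on a 16-bit boundary.
 * Returns the entry size or 0 if the entry does not fit in the data. */
size_t mfs_file_directory_entry_size(
        const uint8_t *entry_data,
        size_t data_size )
{
    size_t entry_size = 0;

    assert( entry_data != NULL );

    if( data_size < MFS_FILE_DIRECTORY_ENTRY_FIXED_SIZE )
    {
        return( 0 );
    }
    /* The file name size is stored at offset 50 */
    entry_size = MFS_FILE_DIRECTORY_ENTRY_FIXED_SIZE
               + (size_t) entry_data[ 50 ];

    /* Add 1 byte of padding if the size is odd */
    entry_size += entry_size & 1;

    if( entry_size > data_size )
    {
        return( 0 );
    }
    return( entry_size );
}
```

The returned size can be used to advance to the next file directory entry within the same block.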

New Technologies File System (NTFS) format

The New Technologies File System (NTFS) format is the primary file system for Microsoft Windows versions that are based on Windows NT.

Overview

A New Technologies File System (NTFS) volume consists of:

boot record
boot loader
Master File Table (MFT)
Mirror Master File Table (MFT)

Characteristics

Characteristics	Description
Byte order	little-endian
Date and time values	FILETIME in UTC
Character strings	UCS-2 little-endian, which allows for unpaired Unicode surrogates such as "U+d800" and "U+dc00"

Versions

Format version	Remarks
1.0	Introduced in Windows NT 3.1
1.1	Introduced in Windows NT 3.5, also seen to be used by Windows NT 3.1
1.2	Introduced in Windows NT 3.51
3.0	Introduced in Windows 2000
3.1	Introduced in Windows XP

Note that the format versions mentioned above are the versions as used by NTFS. Another common versioning schema uses the Windows version, e.g. NTFS 5.0 is the version of NTFS used on Windows XP, which is version 3.1 in the schema mentioned above.

Windows does not necessarily use the latest format version, e.g. Windows 10 (1809) has been observed to use NTFS version 1.2 for 64k cluster block size.

Terminology

Cluster

NTFS refers to its file system blocks as clusters. Note that these are not the same as the physical clusters of a hard disk. For clarity this document will refer to these as cluster blocks. In other sources they are also referred to as logical clusters.

Typically a cluster block is 8 sectors (or 8 x 512 = 4096 bytes) in size. A cluster block number is relative to the start of the boot record.

Virtual cluster

The term virtual cluster refers to cluster blocks which are relative to the start of a data stream.

Long and short (file) name

In Windows terminology the name of a file (or directory) can either be short or long. The short name is an equivalent of the file name in the (DOS) 8.3 format. The long name is actually the (full) name of the file. The term long refers to the aspect that the name is longer than the short variant. Because most documentation refers to the (full) name as the long name, for clarity's sake so will this document.

Metadata files

NTFS uses the Master File Table (MFT) to store information about files and directories. The MFT entries reference the different volume and file system metadata. There are several predefined metadata files.

The following metadata files are predefined and use a fixed MFT entry number.

MFT entry number	File name	Description
0	"$MFT"	Master File Table
1	"$MFTMirr"	Back up of the first 4 entries of the Master File Table
2	"$LogFile"	Metadata transaction journal
3	"$Volume"	Volume information
4	"$AttrDef"	MFT entry attribute definitions
5	"."	Root directory
6	"$Bitmap"	Cluster block allocation bitmap
7	"$Boot"	Boot record (or boot code)
8	"$BadClus"	Bad clusters
Used in NTFS version 1.2 and earlier
9	"$Quota"	Quota information
Used in NTFS version 3.0 and later
9	"$Secure"	Security and access control information
Common
10	"$UpCase"	Case folding mappings
11	"$Extend"	A directory containing extended metadata files
12-15		Unknown (Reserved), which are marked as in-use but are empty
16-23		Unused, which are marked as unused
Used in NTFS version 3.0 and later
24	"$Extend$Quota"	Quota information
25	"$Extend$ObjId"	Unique file identifiers for distributed link tracking
26	"$Extend$Reparse"	Backreferences to reparse points
Transactional NTFS metadata files, which have been observed in Windows Vista and later
27	"$Extend$RmMetadata"	Resource manager metadata directory
28	"$Extend$RmMetadata$Repair"	Repair information
29 or 30	"$Extend$RmMetadata$TxfLog"	Transactional NTFS (TxF) log metadata directory
30 or 31	"$Extend$RmMetadata$Txf"	Transactional NTFS (TxF) metadata directory
31 or 32	"$Extend$RmMetadata$TxfLog$Tops"	TxF Old Page Stream (TOPS) file, which is used to store data that has been overwritten inside a currently active transaction
32 or 33	"$Extend$RmMetadata$TxfLog$TxfLog.blf"	Transactional NTFS (TxF) base log metadata file
Observed in Windows 10 and later
29	"$Extend$Deleted"	Temporary location for files that have an open handle but a request has been made to delete them
Common
	...	A file or directory

The following metadata files are predefined, however the MFT entry number is commonly used but not fixed.

MFT entry number	File name	Description
	"$Extend$UsnJrnl"	USN change journal

The boot record

The boot record is stored at the start of the volume (in the $Boot metadata file) and contains:

the file system signature
the BIOS parameter block
the boot loader

Offset	Size	Value	Description
0	3		Boot entry point
3	8	"NTFS\x20\x20\x20\x20"	File system signature (Also known as OEM identifier or dummy identifier)
DOS version 2.0 BIOS parameter block (BPB)
11	2		Bytes per sector. Note that the following values are supported by mkntfs: 256, 512, 1024, 2048 and 4096
13	1		Number of sectors per cluster block
14	2	0	Unknown (Reserved Sectors), which is not used by NTFS and must be 0
16	1	0	Number of cluster block allocation tables, which is not used by NTFS and must be 0
17	2	0	Number of root directory entries, which is not used by NTFS and must be 0
19	2	0	Number of sectors (16-bit), which is not used by NTFS and must be 0
21	1		Media descriptor
22	2	0	Cluster block allocation table size (16-bit) in number of sectors, which is not used by NTFS and must be 0
DOS version 3.4 BIOS parameter block (BPB)
24	2	0x3f	Sectors per track, which is not used by NTFS
26	2	0xff	Number of heads, which is not used by NTFS
28	4	0x3f	Number of hidden sectors, which is not used by NTFS
32	4	0x00	Number of sectors (32-bit), which is not used by NTFS and must be 0
NTFS version 8.0 BIOS parameter block (BPB) or extended BPB, which was introduced in Windows NT 3.1
36	1	0x80	Unknown (Disc unit number), which is not used by NTFS
37	1	0x00	Unknown (Flags), which is not used by NTFS
38	1	0x80	Unknown (BPB version signature byte), which is not used by NTFS
39	1	0x00	Unknown (Reserved), which is not used by NTFS
40	8		Number of sectors (64-bit)
48	8		Master File Table (MFT) cluster block number
56	8		Mirror MFT cluster block number
64	4		MFT entry size
68	4		Index entry size
72	8		Volume serial number
80	4	0	Checksum, which is not used by NTFS
Common
84	426		Boot code
510	2	"\x55\xaa"	The (boot) signature

Boot entry point

The boot entry point often contains a jump instruction to the boot code at offset 84 followed by a no-operation, e.g.

eb52   jmp 0x52
90     nop

Number of sectors per cluster block

The number of sectors per cluster block value as used by mkntfs is defined as follows:

Values 0 to 128 represent sizes of 0 to 128 sectors.
Values 244 to 255 represent sizes of 2^(256-n) sectors.
Other values are unknown.

Cluster block size

The cluster block size can be determined as follows:

cluster block size = bytes per sector x sectors per cluster block

Different NTFS implementations support different cluster block sizes. Known supported cluster block size:

Cluster block size	Bytes per sector	Supported by
256	256	mkntfs
512	256 - 512	mkntfs, ntfs3g, Windows
1024	256 - 1024	mkntfs, ntfs3g, Windows
2048	256 - 2048	mkntfs, ntfs3g, Windows
4096	256 - 4096	mkntfs, ntfs3g, Windows
8192	256 - 4096	mkntfs, ntfs3g, Windows
16K (16384)	256 - 4096	mkntfs, ntfs3g, Windows
32K (32768)	256 - 4096	mkntfs, ntfs3g, Windows
64K (65536)	256 - 4096	mkntfs, ntfs3g, Windows
128K (131072)	256 - 4096	mkntfs, ntfs3g, Windows 10 (1903)
256K (262144)	256 - 4096	mkntfs, ntfs3g, Windows 10 (1903)
512K (524288)	256 - 4096	mkntfs, ntfs3g, Windows 10 (1903)
1M (1048576)	256 - 4096	mkntfs, ntfs3g, Windows 10 (1903)
2M (2097152)	512 - 4096	mkntfs, ntfs3g, Windows 10 (1903)

Note that Windows 10 (1903) requires the partition containing the NTFS file system to be aligned with the cluster block size. For example for a cluster block size of 128k the partition must be 128 KiB aligned. The default partition alignment appears to be 64 KiB.

mkntfs restricts the cluster size to:

bytes_per_sector <= cluster_block_size <= 4096 * bytes_per_sector

Master File Table (MFT) offset

The Master File Table (MFT) offset can be determined as follows:

mft_offset = boot_record_offset + (mft_cluster_block_number * cluster_block_size)

The lower 32-bit part of the NTFS volume serial number is the Windows API (WINAPI) volume serial number. This can be determined by comparing the output of:

fsutil fsinfo volumeinfo C:
fsutil fsinfo ntfsinfo C:

Often the total number of sectors in the boot record will be smaller than the underlying partition. A (nearly identical) backup of the boot record is stored in the last sector of the cluster block that follows the last cluster block of the volume. Often this is the 512 bytes after the last sector of the volume, but not necessarily. The backup boot record is not included in the total number of sectors.

Master File Table (MFT) and index entry size

The Master File Table (MFT) entry size and index entry size are defined as follows:

Values 0 to 127 represent sizes of 0 to 127 cluster blocks.
Values 128 to 255 represent sizes of 2^(256-n) bytes or 2^(-n) if considered as a signed byte.
Other values are not considered valid.
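
The rule above could be decoded as in the following sketch. The function name is hypothetical, and the raw value is assumed to be the single byte read from the volume header.

```python
def decode_entry_size(value: int, cluster_block_size: int) -> int:
    """Decode an MFT entry size or index entry size value into a size in bytes."""
    if not 0 <= value <= 255:
        raise ValueError("entry size value must fit in a single byte")
    if value < 128:
        # Values 0 to 127 are a number of cluster blocks.
        return value * cluster_block_size
    # Values 128 to 255 are 2^(256 - n) bytes, which is 2^(-n) for a signed byte.
    return 1 << (256 - value)
```

For example, a value of 0xf6 (246, or -10 as a signed byte) decodes to 2^10 = 1024 bytes regardless of the cluster block size.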

BitLocker Drive Encryption (BDE)

BitLocker Drive Encryption (BDE) uses the file system signature: “-FVE-FS-”. Where FVE is an abbreviation of Full Volume Encryption.

The data structures of BDE on Windows Vista and 7 differ.

A Windows Vista BDE volume starts with:

eb 52 90 2d 46 56 45 2d 46 53 2d

A Windows 7 BDE volume starts with:

eb 58 90 2d 46 56 45 2d 46 53 2d

BDE is largely stand-alone but has some integration with NTFS.

TODO: link to BDE format documentation

Volume Shadow Snapshots (VSS)

Volume Shadow Snapshots (VSS) uses the GUID 3808876b-c176-4e48-b7ae-04046e6cc752 (stored in little-endian) to identify its data.

VSS is largely stand-alone but has some integration with NTFS.

TODO: link to VSS format documentation

Media descriptor

Offset	Size	Description
0.0	1 bit	Sides, where single-sided (0) and double-sided (1)
0.1	1 bit	Track size, where 9 sectors per track (0) and 8 sectors per track (1)
0.2	1 bit	Density, where 80 tracks (0) and 40 tracks (1)
0.3	1 bit	Type, where Fixed disc (0) and Removable disc (1)
0.4	4 bits	Always set to 1

The boot loader

Offset	Size	Value	Description
512			Windows NT (boot) loader (NTLDR/BOOTMGR)

The Master File Table (MFT)

The MFT consists of an array of MFT entries. The offset of the MFT can be found in the volume header and the size of the MFT is defined by the MFT entry of the $MFT metadata file.

Note that the MFT can consist of multiple data ranges, defined by the data runs in the $MFT metadata file.

MFT entry

Although the size of an MFT entry is defined in the volume header, it is commonly 1024 bytes in size. An MFT entry consists of:

The MFT entry header
The fix-up values
An array of MFT attribute values
Padding, which should contain 0-byte values

Note that the MFT entry can be filled entirely with 0-byte values. This has been seen in Windows XP for MFT entry numbers 16 - 23.

MFT entry header

The MFT entry header (FILE_RECORD_SEGMENT_HEADER) is 42 or 48 bytes in size and consists of:

Offset	Size	Value	Description
MULTI_SECTOR_HEADER
0	4	"BAAD", "FILE"	Signature
4	2		The fix-up values (or update sequence array) offset, which contains an offset relative from the start of the MFT entry
6	2		The number of fix-up values (or update sequence array size)
Common
8	8		Metadata transaction journal sequence number, which contains a $LogFile Sequence Number (LSN)
16	2		Sequence (number)
18	2		Reference (link) count
20	2		Attributes offset (or first attribute offset), which contains an offset relative from the start of the MFT entry
22	2		MFT entry flags
24	4		Used size in bytes
28	4		MFT entry size in bytes
32	8		Base record file reference
40	2		First available attribute identifier
If NTFS version is 3.0
42	2		Unknown (wfixupPattern)
44	4		Unknown
If NTFS version is 3.1
42	2		Unknown (wfixupPattern)
44	4		MFT entry number
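
The common part of the MFT entry header (the first 42 bytes) maps directly onto a fixed little-endian layout. The sketch below only covers those 42 bytes; the class, field and function names are illustrative and not part of any existing API.

```python
import struct
from typing import NamedTuple


class MftEntryHeader(NamedTuple):
    signature: bytes
    fixup_values_offset: int
    number_of_fixup_values: int
    journal_sequence_number: int
    sequence_number: int
    reference_count: int
    attributes_offset: int
    flags: int
    used_size: int
    entry_size: int
    base_record_file_reference: int
    first_available_attribute_identifier: int


# Little-endian, no padding: 4 + 2 + 2 + 8 + 2 + 2 + 2 + 2 + 4 + 4 + 8 + 2 = 42 bytes.
MFT_ENTRY_HEADER = struct.Struct("<4sHHQHHHHIIQH")


def parse_mft_entry_header(data: bytes) -> MftEntryHeader:
    """Parse the first 42 bytes of an MFT entry."""
    header = MftEntryHeader(*MFT_ENTRY_HEADER.unpack_from(data, 0))
    if header.signature not in (b"FILE", b"BAAD"):
        raise ValueError("unsupported MFT entry signature")
    return header


# Synthetic header, not taken from a real volume.
sample = MFT_ENTRY_HEADER.pack(b"FILE", 48, 3, 0x1234, 1, 1, 56, 0x0001, 416, 1024, 0, 4)
header = parse_mft_entry_header(sample)
```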

“BAAD” signature

According to NTFS documentation, when chkdsk finds a multi-sector item where the multi-sector header does not match the values at the end of the sector, it marks the item as “BAAD” and fills it with 0-byte values, except for a fix-up value at the end of the first sector of the item. The “BAAD” signature has been seen to be used on Windows NT4 and XP.

Sequence number

According to FILE_RECORD_SEGMENT_HEADER structure the sequence number is incremented each time that a file record segment is freed; it is 0 if the segment is not used.

Base record file reference

The base record file reference is used to store additional attributes for another MFT entry, e.g. for attribute lists.

MFT entry flags

Value	Identifier	Description
0x0001	FILE_RECORD_SEGMENT_IN_USE, MFT_RECORD_IN_USE	In use
0x0002	FILE_FILE_NAME_INDEX_PRESENT, FILE_NAME_INDEX_PRESENT, MFT_RECORD_IS_DIRECTORY	Has file name (or $I30) index. When this flag is set the file entry represents a directory
0x0004	MFT_RECORD_IN_EXTEND	Unknown. According to ntfs_layout.h this is set for all system files present in the $Extend directory
0x0008	MFT_RECORD_IS_VIEW_INDEX	Is index. When this flag is set the file entry represents an index. According to ntfs_layout.h this is set for all indices other than $I30

The fix-up values

The fix-up values are of variable size and consist of:

Offset	Size	Value	Description
0	2		Fix-up placeholder value
2	2 x number of fix-up values		Fix-up (original) value array

On disk the last 2 bytes of each 512 byte block are replaced by the fix-up placeholder value. The original value is stored in the corresponding fix-up (original) value array entry.

Note that there can be more fix-up values than the number of 512 byte blocks in the data.

According to the MULTI_SECTOR_HEADER structure documentation the update sequence array must end before the last USHORT value in the first sector. It also states that the update sequence array size value contains the number of bytes, but based on analysis of data samples it seems more likely to be the number of words.

In NT4 (version 1.2) the MFT entry header is 42 bytes in size and the fix-up values are stored at offset 42. This is likely where the name wfixupPattern originates from.
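
Applying the fix-up values can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the function name is hypothetical, and because there can be more fix-up values than 512 byte blocks, the sketch patches at most as many blocks as fit in the data. A placeholder mismatch is treated as an error here, which is a design choice of the sketch.

```python
def apply_fixup_values(entry: bytes, fixup_offset: int, number_of_values: int,
                       block_size: int = 512) -> bytes:
    """Return a copy of the data with the original block-end values restored."""
    data = bytearray(entry)
    placeholder = bytes(data[fixup_offset:fixup_offset + 2])
    # Never patch more blocks than the data contains.
    number_of_blocks = min(len(data) // block_size, number_of_values)
    for index in range(number_of_blocks):
        value_offset = fixup_offset + 2 + (index * 2)
        block_end = (index + 1) * block_size
        if bytes(data[block_end - 2:block_end]) != placeholder:
            raise ValueError(f"fix-up placeholder mismatch in block {index}")
        # Restore the original 2 bytes from the fix-up (original) value array.
        data[block_end - 2:block_end] = data[value_offset:value_offset + 2]
    return bytes(data)


# Synthetic 1024 byte entry with the fix-up values stored at offset 48.
sample = bytearray(1024)
sample[48:50] = b"\x05\x00"      # fix-up placeholder value
sample[50:52] = b"\xaa\xbb"      # original last 2 bytes of block 0
sample[52:54] = b"\xcc\xdd"      # original last 2 bytes of block 1
sample[510:512] = b"\x05\x00"    # placeholder as stored on disk
sample[1022:1024] = b"\x05\x00"  # placeholder as stored on disk
fixed = apply_fixup_values(bytes(sample), 48, 3)
```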

TODO: provide examples on applying the fix-up values.

The file reference

The file reference (FILE_REFERENCE or MFT_SEGMENT_REFERENCE) is 8 bytes in size and consists of:

Offset	Size	Value	Description
0	6		MFT entry number
6	2		Sequence number

Note that the MFT entry number in the MFT entry header is 32-bit in size.
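
A file reference can be split into its two parts by reading the 8 bytes as a little-endian integer. The function name in this sketch is hypothetical.

```python
def parse_file_reference(data: bytes):
    """Return (MFT entry number, sequence number) from an 8-byte file reference."""
    if len(data) != 8:
        raise ValueError("a file reference is 8 bytes in size")
    value = int.from_bytes(data, "little")
    # The lower 48 bits are the MFT entry number, the upper 16 bits the sequence number.
    return value & 0xffffffffffff, value >> 48


entry_number, sequence_number = parse_file_reference(bytes.fromhex("0500000000000500"))
```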

MFT attribute

The MFT attribute consists of:

the attribute header
the attribute resident or non-resident data
the attribute name
Unknown data, likely alignment padding (4-byte alignment)
resident attribute data or non-resident attribute data runs
alignment padding (8-byte alignment), can contain remnant data

MFT attribute header

The MFT attribute header (ATTRIBUTE_RECORD_HEADER) is 16 bytes in size and consists of:

Offset	Size	Description
0	4	Attribute type (or type code)
4	4	Attribute size (or record length), which includes the 8 bytes of the attribute type and size
8	1	Non-resident flag (or form code), where RESIDENT_FORM (0) and NONRESIDENT_FORM (1)
9	1	Name size (or name length), which contains the number of characters without the end-of-string character
10	2	Name offset, which contains an offset relative from the start of the MFT attribute
12	2	Attribute data flags
14	2	Attribute identifier (or instance), which contains a unique identifier to distinguish between attributes that contain segmented data
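
The attribute header is enough to walk the attributes of an MFT entry: each attribute size leads to the next attribute, until the end of attributes marker (0xffffffff) is reached. The following is a sketch only; the names are illustrative and the entry is assumed to already have its fix-up values applied.

```python
import struct

ATTRIBUTE_HEADER = struct.Struct("<IIBBHHH")  # 16 bytes, little-endian
END_OF_ATTRIBUTES = 0xffffffff


def iterate_attributes(entry: bytes, attributes_offset: int):
    """Yield (offset, type, size, is_non_resident, name) for each MFT attribute."""
    offset = attributes_offset
    while offset + 4 <= len(entry):
        (attribute_type,) = struct.unpack_from("<I", entry, offset)
        if attribute_type == END_OF_ATTRIBUTES:
            break
        if offset + ATTRIBUTE_HEADER.size > len(entry):
            raise ValueError("truncated MFT attribute header")
        (_, size, non_resident, name_size, name_offset, _flags,
         _identifier) = ATTRIBUTE_HEADER.unpack_from(entry, offset)
        if size < ATTRIBUTE_HEADER.size or offset + size > len(entry):
            raise ValueError("MFT attribute size out of bounds")
        # The name size is a number of characters, each 2 bytes in size.
        name_start = offset + name_offset
        name_data = entry[name_start:name_start + (name_size * 2)]
        name = name_data.decode("utf-16-le", errors="replace")
        yield offset, attribute_type, size, bool(non_resident), name
        offset += size


# Synthetic data: one nameless resident $DATA attribute and the end marker.
attribute = ATTRIBUTE_HEADER.pack(0x80, 24, 0, 0, 24, 0, 1)
attribute += struct.pack("<IHBB", 0, 24, 0, 0)
sample = attribute + struct.pack("<I", END_OF_ATTRIBUTES)
attributes = list(iterate_attributes(sample, 0))
```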

MFT attribute data flags

Value	Identifier	Description
0x0001		Is LZNT1 compressed

0x00ff	ATTRIBUTE_FLAG_COMPRESSION_MASK

0x4000	ATTRIBUTE_FLAG_ENCRYPTED	Is encrypted
0x8000	ATTRIBUTE_FLAG_SPARSE	Is sparse

TODO: determine the meaning of compression flag in the context of resident $INDEX_ROOT. Do the data flags have a different meaning for different attributes?

Resident MFT attribute

The resident MFT attribute data is present when the non-resident flag is not set (0). The resident data is 8 bytes in size and consists of:

Offset	Size	Value	Description
0	4		Data size (or value length)
4	2		Data offset (or value offset), which contains an offset relative from the start of the MFT attribute
6	1		Indexed flag
7	1	0x00	Unknown (Padding)

TODO: determine the meaning of indexed flag bits, other than the LSB

Non-resident MFT attribute

The non-resident MFT attribute data is present when the non-resident flag is set (1). The non-resident data is 48 or 56 bytes in size and consists of:

Offset	Size	Description
0	8	First (or lowest) Virtual Cluster Number (VCN) of the data
8	8	Last (or highest) Virtual Cluster Number (VCN) of the data
16	2	Data runs offset (or mappings pairs offset), which contains an offset relative from the start of the MFT attribute
18	2	Compression unit size, which contains the compression unit size as `2^(n)` number of cluster blocks
20	4	Unknown (Padding)
24	8	Allocated data size (or allocated length), which contains the allocated data size in number of bytes. This value is not valid if the first VCN is nonzero
32	8	Data size (or file size), which contains the data size in number of bytes. This value is not valid if the first VCN is nonzero
40	8	Valid data size (or valid data length), which contains the valid data size in number of bytes. This value is not valid if the first VCN is nonzero
If compression unit size > 0
48	8	Compressed data size

The total size of the data runs should be larger than or equal to the data size.

Note that Windows will fill data beyond the valid data size with 0-byte values. The data size remains unchanged. This applies to compressed and uncompressed data. If the first VCN is zero a valid data size of 0 represents a file entirely filled with 0-byte values.

TODO: determine the meaning of a VCN of -1

For more information about compressed MFT attributes see compression.

Attribute name

The attribute name is of variable size and consists of:

Offset	Size	Value	Description
0	...		Name, which contains a UCS-2 little-endian string without end-of-string character

Data runs

The data runs are stored in a variable size (data) runlist. This runlist consists of runlist elements.

A runlist element is of variable size and consists of:

Offset	Size	Description
0.0	4 bits	Number of cluster blocks value size, which contains the number of bytes used to store the data run number of cluster blocks
0.4	4 bits	Cluster block number value size, which contains the number of bytes used to store the data run cluster block number
1	Number of cluster blocks value size	Data run number of cluster blocks, which contains the number of cluster blocks
...	Cluster block number value size	Data run cluster block number

The data run cluster block number is a signed value, where the MSB is the sign bit, e.g. if the data run cluster block number contains “dbc8” it corresponds to the 64-bit value 0xffffffffffffdbc8.

The first data run contains an absolute cluster block number. The cluster block number of each successive data run is relative to that of the previous data run.

Note that the cluster block number byte size is the first nibble when reading the byte stream, but here it is represented as the upper nibble of the first byte.

The last runlist element is (0, 0), which is stored as a 0-byte value.

According to NTFS documentation the size of the runlist is rounded up to the next multiple of 4 bytes, but based on analysis of data samples it seems that the size of the trailing data can be even larger than 3 bytes and that it does not always consist of 0-byte values.
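
The runlist layout described above can be decoded as in the following sketch. The function name and the use of None for sparse data runs are choices made for this example. The byte sequence is synthetic and covers an absolute run, a sparse run and a run with a negative relative cluster block number.

```python
def decode_data_runs(runlist: bytes):
    """Return a list of (cluster_block_number, number_of_cluster_blocks).

    The cluster block number is None for a sparse data run.
    """
    runs = []
    offset = 0
    last_cluster_block_number = 0
    while offset < len(runlist):
        value_sizes = runlist[offset]
        offset += 1
        if value_sizes == 0:
            # The last runlist element (0, 0); trailing data is ignored.
            break
        number_of_blocks_size = value_sizes & 0x0f
        block_number_size = value_sizes >> 4
        if offset + number_of_blocks_size + block_number_size > len(runlist):
            raise ValueError("runlist element out of bounds")
        number_of_blocks = int.from_bytes(
            runlist[offset:offset + number_of_blocks_size], "little")
        offset += number_of_blocks_size
        if block_number_size == 0:
            runs.append((None, number_of_blocks))
        else:
            # Signed value, relative to the previous cluster block number.
            relative = int.from_bytes(
                runlist[offset:offset + block_number_size], "little", signed=True)
            offset += block_number_size
            last_cluster_block_number += relative
            runs.append((last_cluster_block_number, number_of_blocks))
    return runs


runs = decode_data_runs(bytes.fromhex("21 18 34 56 01 10 11 30 f0 00"))
```

Here the first element (0x21) uses 1 byte for the number of cluster blocks (0x18) and 2 bytes for the cluster block number (0x5634), the second element (0x01) is sparse, and the third element moves 16 cluster blocks back from the first run.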

TODO: provide examples of data runs

Sparse data runs

The MFT attribute data flag (ATTRIBUTE_FLAG_SPARSE) indicates if the data stream is sparse or not, where the runlist can contain both sparse and non-sparse data runs.

A sparse data run has a cluster block number value size of 0, representing there is no offset (cluster block number). A sparse data run is filled with 0-byte values.

Compressed data streams also define sparse data runs without setting the ATTRIBUTE_FLAG_SPARSE flag.

Note that $BadClus:$Bad also defines a data run with a cluster block number value size of 0, without setting the ATTRIBUTE_FLAG_SPARSE flag.

Compressed data runs

The MFT attribute data flags (0x00ff) indicate if the data stream is compressed or not.

Windows supports compressed data runs for NTFS file systems with a cluster block size of 4096 bytes or less.

Windows 10 supports Windows Overlay Filter (WOF) compressed data, which stores the LZXPRESS Huffman or LZX compressed data in an alternate data stream named WofCompressedData and links it to the default data stream using a reparse point.

The data is stored in compression unit blocks. A compression unit typically consists of 16 cluster blocks. However the actual value is stored in the non-resident MFT attribute.

Also see compression.

The attributes

Known attribute types

The attribute types are stored in the $AttrDef metadata file.

Value	Identifier	Description
0x00000000		Unused
0x00000010	$STANDARD_INFORMATION	Standard information
0x00000020	$ATTRIBUTE_LIST	Attributes list
0x00000030	$FILE_NAME	The file or directory name
Used in NTFS version 1.2 and earlier
0x00000040	$VOLUME_VERSION	Volume version
Used in NTFS version 3.0 and later
0x00000040	$OBJECT_ID	Object identifier
Common
0x00000050	$SECURITY_DESCRIPTOR	Security descriptor
0x00000060	$VOLUME_NAME	Volume label
0x00000070	$VOLUME_INFORMATION	Volume information
0x00000080	$DATA	Data stream
0x00000090	$INDEX_ROOT	Index root
0x000000a0	$INDEX_ALLOCATION	Index allocation
0x000000b0	$BITMAP	Bitmap
Used in NTFS version 1.2 and earlier
0x000000c0	$SYMBOLIC_LINK	Symbolic link
Used in NTFS version 3.0 and later
0x000000c0	$REPARSE_POINT	Reparse point
Common
0x000000d0	$EA_INFORMATION	(HPFS) extended attribute information
0x000000e0	$EA	(HPFS) extended attribute
Used in NTFS version 1.2 and earlier
0x000000f0	$PROPERTY_SET	Property set
Used in NTFS version 3.0 and later
0x00000100	$LOGGED_UTILITY_STREAM	Logged utility stream
Common

0x00001000		First user defined attribute

0xffffffff		End of attributes marker

Attribute chains

Multiple attributes can be chained to make up a single attribute data stream, e.g. the attributes:

$INDEX_ALLOCATION ($I30) VCN: 0
$INDEX_ALLOCATION ($I30) VCN: 596

The first attribute will contain the size of the data defined by all the attributes and successive attributes should have a size of 0.

It is assumed that the attributes in a chain must be continuous and defined in-order.

The standard information attribute

The standard information attribute ($STANDARD_INFORMATION) contains the basic file entry metadata. It is stored as a resident MFT attribute.

The standard information data (STANDARD_INFORMATION) is either 48 or 72 bytes in size and consists of:

Offset	Size	Description
0	8	Creation date and time, which contains a FILETIME
8	8	Last modification (or last written) date and time, which contains a FILETIME
16	8	MFT entry last modification date and time, which contains a FILETIME
24	8	Last access date and time, which contains a FILETIME
32	4	File attribute flags
36	4	Unknown (Maximum number of versions)
40	4	Unknown (Version number)
44	4	Unknown (Class identifier)
If NTFS version 3.0 or later
48	4	Owner identifier
52	4	Security descriptor identifier, which contains the entry number in the security ID index ($Secure:$SII). Also see Access Control
56	8	Quota charged
64	8	Update Sequence Number (USN)

Note that MFT entries have been observed without a $STANDARD_INFORMATION attribute, but with other attributes such as $FILE_NAME and an $I30 index.

Recent versions of NTFS support case-sensitive file names. If a directory is case-sensitive the corresponding $STANDARD_INFORMATION attribute will have a maximum number of versions of 0 and a version number of 1.
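
The standard information data described above could be read as in the following sketch. A FILETIME is a count of 100 nanosecond intervals since January 1, 1601 (UTC). The function and key names are illustrative, and only some of the fields are returned.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a FILETIME value, truncating to microsecond precision.

    Very large values fall outside the datetime range and raise OverflowError.
    """
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def parse_standard_information(data: bytes) -> dict:
    """Parse 48 or 72 bytes of standard information data."""
    if len(data) < 48:
        raise ValueError("standard information data too small")
    fields = struct.unpack_from("<QQQQIIII", data, 0)
    result = {
        "creation_time": filetime_to_datetime(fields[0]),
        "modification_time": filetime_to_datetime(fields[1]),
        "entry_modification_time": filetime_to_datetime(fields[2]),
        "access_time": filetime_to_datetime(fields[3]),
        "file_attribute_flags": fields[4],
    }
    if len(data) >= 72:
        # Fields that are only present for NTFS version 3.0 or later.
        owner, security, _quota, usn = struct.unpack_from("<IIQQ", data, 48)
        result["owner_identifier"] = owner
        result["security_descriptor_identifier"] = security
        result["update_sequence_number"] = usn
    return result


# Synthetic 48 byte data with all timestamps set to 1970-01-01 00:00:00 UTC.
sample = struct.pack("<QQQQIIII", *([116444736000000000] * 4), 0x20, 0, 0, 0)
information = parse_standard_information(sample)
```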

The attribute list attribute

The attribute list attribute ($ATTRIBUTE_LIST) is used to store MFT attributes outside the MFT entry, e.g. when the MFT entry is too small to store all the attributes.

The entries in the list reference the location of MFT attributes. The attribute list attribute can be stored as either a resident (for a small amount of data) or non-resident MFT attribute.

Note that MFT entry 0 can also contain an attribute list, which allows listed attributes to be stored beyond the first data run.

The attribute list

An attribute list consists of:

one or more attribute list entries

The attribute list entry

An attribute list entry (ATTRIBUTE_LIST_ENTRY) is of variable size and consists of:

Offset	Size	Description
0	4	Attribute type (or type code)
4	2	Size (or record length), which includes the 6 bytes of the attribute type and size
6	1	Name size (or name length), which contains the number of characters without the end-of-string character
7	1	Name offset, which contains an offset relative from the start of the attribute list entry
8	8	Data first (or lowest) VCN
16	8	File reference (or segment reference), which contains a reference to the MFT entry that contains (part of) the attribute data
24	2	Attribute identifier (or instance), which contains a unique identifier to distinguish between attributes that contain segmented data
26	...	Name, which contains a UCS-2 little-endian string without end-of-string character
...	...	alignment padding (8-byte alignment), can contain remnant data

The file name attribute

The file name attribute ($FILE_NAME) contains the basic file system information, like the parent file entry, various date and time values and name. It is stored as a resident MFT attribute.

The file name data (FILE_NAME) is of variable size and consists of:

Offset	Size	Description
0	8	Parent file reference
8	8	Creation date and time, which contains a FILETIME
16	8	Last modification (or last written) date and time, which contains a FILETIME
24	8	MFT entry last modification date and time, which contains a FILETIME
32	8	Last access date and time, which contains a FILETIME
40	8	Allocated (or reserved) file size
48	8	Data size
56	4	File attribute flags
If FILE_ATTRIBUTE_REPARSE_POINT is set
60	4	Reparse point tag
If FILE_ATTRIBUTE_REPARSE_POINT is not set
60	4	Unknown (extended attribute data size)
Common
64	1	Name string size, which contains the number of characters without the end-of-string character
65	1	Namespace of the name string
66	...	Name, which contains a UCS-2 little-endian string without end-of-string character

An MFT entry can contain multiple file name attributes, e.g. for a separate (long) name and short name.

In several cases on a Vista NTFS volume the MFT entry contained both a DOS & Windows and POSIX name space $FILE_NAME attribute. However the directory entry index ($I30) of the parent directory only contained the DOS & Windows name.

In case of a hard link the MFT entry will contain additional file name attributes with the parent file reference of each hard link.

Namespace

Value	Identifier	Description
0	POSIX	Case-sensitive character set that consists of all Unicode characters except for: "\0" (zero character), "/" (forward slash). The ":" (colon) is valid for NTFS but not for Windows
1	FILE_NAME_NTFS, WINDOWS	Case-insensitive sub set of the POSIX character set that consists of all Unicode characters except for: `" * / : < > ? \ | +`. Note that names cannot end with a "." (dot) or " " (space)
2	FILE_NAME_DOS, DOS	Case-insensitive sub set of the WINDOWS character set that consists of all upper case ASCII characters except for: `" * + , / : ; < = > ? \`. Note that the name must follow the 8.3 format
3	DOS_WINDOWS	Both the DOS and WINDOWS names are identical, which is the same as the DOS character set, with the exception that lower case is used as well

Note that the Windows API function CreateFile allows creating case-sensitive file names when the flag FILE_FLAG_POSIX_SEMANTICS is set.

Long to short name conversion

A short name can be determined from a long name with the following approach. In the long name:

ignore Unicode characters beyond the first 8-bit (extended ASCII)
ignore control characters and spaces (character <= 0x20)
ignore non-allowed characters " * + , / : ; < = > ? \
ignore dots except the last one, which is used for the extension
make all letters upper case

Additional observations:

[ or ] are replaced by an underscore (_)

Make the name unique:

use characters 1 to 6 and add ~1; if the long name has an extension, add a dot and its first 3 letters, e.g. “Program Files” becomes “PROGRA~1” and “ ~PLAYMOVIE.REG” becomes “~PLAYM~1.REG”
if the name already exists try ~2 up to ~9, e.g. “Program Data”, in the same directory as “Program Files”, becomes “PROGRA~2”
if the name already exists use a 16-bit hexadecimal value for characters 3 to 6 with ~1, e.g. “x86_microsoft-windows-r..ry-editor.resources_31bf3856ad364e35_6.0.6000.16386_en-us_f89a7b0005d42fd4” in a directory with a lot of file names starting with “x86_microsoft”, becomes “X8FCA6~1.163”
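
The conversion steps above can be sketched as follows. This is a partial illustration only: the function names are hypothetical, it covers the ~1 to ~9 candidates but not the hexadecimal fallback, it does not check whether a candidate already exists, and the handling of non-ASCII letters is an assumption.

```python
NOT_ALLOWED_CHARACTERS = set('"*+,/:;<=>?\\')


def _filter_characters(text: str) -> str:
    """Drop ignored characters, replace [ and ] and make letters upper case."""
    result = []
    for character in text:
        value = ord(character)
        # Skip non 8-bit characters, control characters, spaces and dots.
        if value > 0xff or value <= 0x20 or character == ".":
            continue
        if character in NOT_ALLOWED_CHARACTERS:
            continue
        # Note that str.upper() can behave unexpectedly for some non-ASCII letters.
        result.append("_" if character in "[]" else character.upper())
    return "".join(result)


def short_name_candidate(long_name: str, index: int = 1) -> str:
    """Return the short name candidate with the suffix ~index."""
    if not 1 <= index <= 9:
        raise ValueError("only ~1 up to ~9 are covered by this sketch")
    # Only the last dot separates the extension.
    base, dot, extension = long_name.rpartition(".")
    if not dot:
        base, extension = long_name, ""
    short_name = _filter_characters(base)[:6] + f"~{index}"
    extension = _filter_characters(extension)[:3]
    if extension:
        short_name += "." + extension
    return short_name
```

A caller would try index 1 first and increase it while the candidate already exists in the directory.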

TODO: determine if the behavior is dependent on a setting that can be changed with fsutil

The volume version attribute

The volume version attribute ($VOLUME_VERSION) contains the volume version.

TODO: complete section. Need a pre NTFS version 3.0 volume with this attribute. $AttrDef indicates the attribute to be 8 bytes in size.

The object identifier attribute

The object identifier attribute ($OBJECT_ID) contains distributed link tracker properties. It is stored as a resident MFT attribute.

The object identifier attribute data is either 16 or 64 bytes in size and consists of:

Offset	Size	Description
0	16	Droid file identifier, which contains a GUID
16	16	Birth droid volume identifier, which contains a GUID
32	16	Birth droid file identifier, which contains a GUID
48	16	Birth droid domain identifier, which contains a GUID

Droid in this context refers to CDomainRelativeObjId.

The security descriptor attribute

TODO: determine if this overrides any value in $Secure:$SDS?

The security descriptor attribute ($SECURITY_DESCRIPTOR) contains a Windows NT security descriptor. It can be stored as either a resident (for a small amount of data) or non-resident MFT attribute.

TODO: link to security descriptor format documentation

The volume name attribute

The volume name attribute ($VOLUME_NAME) contains the volume label. It is stored as a resident MFT attribute.

The volume name attribute data is of variable size and consists of:

Offset	Size	Value	Description
0	...		Volume label, which contains a UCS-2 little-endian string without end-of-string character

The volume name attribute is used in the $Volume metadata file MFT entry.

The volume information attribute

The volume information attribute ($VOLUME_INFORMATION) contains information about the volume. It is stored as a resident MFT attribute.

The volume information attribute data is 12 bytes in size and consists of:

Offset	Size	Description
0	8	Unknown
8	1	Major format version
9	1	Minor format version
10	2	Volume flags

The volume information attribute is used in the $Volume metadata file MFT entry.

Volume flags

Value	Identifier	Description
0x0001	VOLUME_IS_DIRTY	Is dirty
0x0002	VOLUME_RESIZE_LOG_FILE	Re-size journal ($LogFile)
0x0004	VOLUME_UPGRADE_ON_MOUNT	Upgrade on next mount
0x0008	VOLUME_MOUNTED_ON_NT4	Mounted on Windows NT 4
0x0010	VOLUME_DELETE_USN_UNDERWAY	Delete USN in progress
0x0020	VOLUME_REPAIR_OBJECT_ID	Repair object identifiers

0x0080		Unknown

0x4000	VOLUME_CHKDSK_UNDERWAY	chkdsk in progress
0x8000	VOLUME_MODIFIED_BY_CHKDSK	Modified by chkdsk

The data stream attribute

The data stream attribute ($DATA) contains the file data. It can be stored as either a resident (for a small amount of data) or non-resident MFT attribute.

Multiple data attributes for the same data stream can be used in the attribute list to define different parts of the data stream data. The first data stream attribute will contain the size of the entire data stream data. Other data stream attributes should have a size of 0. Also see attribute chains.

The index root attribute

The index root attribute ($INDEX_ROOT) contains the root of the index tree. It is stored as a resident MFT attribute.

Also see the index and the index root.

The index allocation attribute

The index allocation attribute ($INDEX_ALLOCATION) contains an array of index entries. It is stored as a non-resident MFT attribute.

The index allocation attribute itself does not define which attribute type it contains in the index value data. For this information it needs the corresponding index root attribute.

Multiple index allocation attributes for the same index can be used in the attribute list to define different parts of the index allocation data. The first index allocation attribute will contain the size of the entire index allocation data. Other index allocation attributes should have a size of 0. Also see attribute chains.

Also see the index.

The bitmap attribute

The bitmap attribute ($BITMAP) contains the allocation bitmap. It can be stored as either a resident (for a small amount of data) or non-resident MFT attribute.

It is used to maintain information about which entry is used and which is not. Every bit in the bitmap represents an entry. The bitmap is stored byte-wise, where the LSB of each byte corresponds to the first allocation element of that byte. The allocation element can represent different things:

an MFT entry in the MFT (nameless) bitmap;
an index entry in an index ($I30).

The allocation element is allocated if the corresponding bit contains 1 or unallocated if 0.
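
Testing whether an allocation element is allocated then comes down to a single bit test, as in this sketch. The function name is hypothetical.

```python
def is_allocated(bitmap: bytes, element_index: int) -> bool:
    """Return True if the allocation element is marked as allocated."""
    byte_index, bit_index = divmod(element_index, 8)
    # The LSB of each byte represents the first allocation element of that byte.
    return bool((bitmap[byte_index] >> bit_index) & 1)


# Synthetic bitmap in which the elements 0, 2 and 15 are allocated.
bitmap = bytes([0b00000101, 0b10000000])
```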

The symbolic link attribute

The symbolic link attribute ($SYMBOLIC_LINK) contains a symbolic link.

TODO: complete section. Need a pre NTFS version 3.0 volume with this attribute. $AttrDef indicates the attribute is of variable size.

The reparse point attribute

The reparse point attribute ($REPARSE_POINT) contains information about a file system-level link. It is stored as a resident MFT attribute.

Also see the reparse point.

The (HPFS) extended attribute information

The (HPFS) extended attribute information ($EA_INFORMATION) contains information about the extended attribute ($EA).

The extended attribute information data is 8 bytes in size and consists of:

Offset	Size	Description
0	2	Size of an extended attribute entry
2	2	Number of extended attributes which have the NEED_EA flag set
4	4	Size of the extended attribute ($EA) data

The (HPFS) extended attribute

The (HPFS) extended attribute ($EA) contains the extended attribute data.

The extended attribute data is of variable size and consists of:

Offset	Size	Description
0	4	Offset to next extended attribute entry, where the offset is relative from the start of the extended attribute data
4	1	Extended attribute flags
5	1	Number of characters of the extended attribute name
6	2	Value data size
8	...	The extended attribute name, which contains an ASCII string
...	...	Value data
...	...	Unknown

TODO: determine if the name is 2-byte aligned

Extended attribute flags

Value	Identifier	Description
0x80	NEED_EA	Unknown (Need EA) flag

TODO: determine what the NEED_EA flag is used for

UNITATTR extended attribute value data

Offset	Size	Value	Description
0	4		Unknown (equivalent of st_mode?)

The property set attribute

The property set attribute ($PROPERTY_SET) contains a property set.

TODO: complete section. Need a pre NTFS version 3.0 volume with this attribute. $AttrDef does not seem to always define this attribute.

The logged utility stream attribute

TODO: complete section

Value	Identifier	Description
$EFS		Encrypted NTFS (EFS)
$TXF_DATA		Transactional NTFS (TxF)

The attribute types

The attribute types are stored in the $AttrDef metadata file.

Offset	Size	Description
0	128	Attribute name, which contains a UCS-2 little-endian string with end-of-string character. Unused bytes are filled with 0-byte values
128	4	Attribute type (or type code)
132	8	Unknown
140	4	Unknown (flags?)
144	8	Unknown (minimum attribute size?)
152	8	Unknown (maximum attribute size?)

The index

The index structures are used for various purposes, one of which is directory entries.

The root of the index is stored in the index root attribute. The index root attribute defines which type of attribute is stored in the index and the root index node.

If the index is too large, part of the index is stored in an index allocation attribute with the same attribute name. The index allocation attribute defines a data stream which contains index entries. Each index entry contains an index node.

An index consists of a tree, where both the branch and leaf nodes contain the actual data. E.g. in case of a directory entries index, the index value data of all nodes together makes up the directory entries.

The index value data in a branch node signifies the upper bound of the values in that specific branch. E.g. if a directory entries index branch node contains the name “textfile.txt” all names in that index branch are smaller than “textfile.txt”.

Note the actual sorting order is dependent on the collation type defined in the index root attribute.

The index allocation attribute is accompanied by a bitmap attribute with the corresponding attribute name. The bitmap attribute defines the allocation of virtual cluster blocks within the index allocation attribute data stream.

Note that the index allocation attribute can be present even though it is not used.

Commonly used indexes

Indexes commonly used by NTFS are:

Value	Identifier	Description
$I30		Directory entries (used by directories)
$SDH		Security descriptor hashes (used by $Secure)
$SII		Security descriptor identifiers (used by $Secure)
$O		Object identifiers (used by $ObjId)
$O		Owner identifiers (used by $Quota)
$Q		Quotas (used by $Quota)
$R		Reparse points (used by $Reparse)

The index root

The index root consists of:

index root header
index node header
an array of index values

The index root header

The index root header is 16 bytes in size and consists of:

Offset	Size	Description
0	4	Attribute type, which contains the type of the indexed attribute or 0 if none
4	4	Collation type, which contains a value to indicate the ordering of the index entries
8	4	Index entry size
12	4	Number of cluster blocks per index entry

Note that for NTFS version 1.2 the index entry size does not have to match the index entry size in the volume header. The correct size seems to be the value in the index root header.

Collation type

Value	Identifier	Description
0x00000000	COLLATION_BINARY	Binary, where the first byte is most significant
0x00000001	COLLATION_FILENAME	UCS-2 strings case-insensitive, where the case folding is stored in $UpCase
0x00000002	COLLATION_UNICODE_STRING	UCS-2 strings case-sensitive, where upper case letters should come first

0x00000010	COLLATION_NTOFS_ULONG	Unsigned 32-bit little-endian integer
0x00000011	COLLATION_NTOFS_SID	NT security identifier (SID)
0x00000012	COLLATION_NTOFS_SECURITY_HASH	Security hash first, then NT security identifier
0x00000013	COLLATION_NTOFS_ULONGS	An array of unsigned 32-bit little-endian integer values

The index entry

The index entry consists of:

the index entry header
the index node header
The fix-up values
alignment padding (8-byte alignment), contains zero-bytes
an array of index values

The index entry header

The index entry header is 24 bytes in size and consists of:

Offset	Size	Value	Description
0	4	"INDX"	Signature
4	2		The fix-up values offset, which contains an offset relative from the start of the index entry header
6	2		The number of fix-up values
8	8		Metadata transaction journal sequence number, which contains a $LogFile Sequence Number (LSN)
16	8		Virtual Cluster Number (VCN) of the index entry

Note that there can be more fix-up values than supported by the index entry data size.

The index node header

The index node header is 16 bytes in size and consists of:

Offset	Size	Description
0	4	Index values offset, where the offset is relative from the start of the index node header
4	4	Index node size, where the value includes the size of the index node header
8	4	Allocated index node size, where the value includes the size of the index node header
12	4	Index node flags

In an index entry (index allocation attribute) the index node size includes the size of the fix-up values and the alignment padding following it.

The remainder of the index node contains remnant data and/or zero-byte values.

The index node flags

Value	Identifier	Description
0x00000001		Is branch node, which is used to indicate if the node is a branch node that has sub nodes

The index value

The index value is of variable size and consists of:

Offset	Size	Description
0	8	File reference
8	2	Size, which includes the 10 bytes of the file reference and size
10	2	Key data size
12	4	Index value flags
If index key data size > 0
16	...	Key data
...	...	Data
If index value flag 0x00000001 (has sub node) is set
...	8	Sub node Virtual Cluster Number (VCN)

The index values are stored 8 byte aligned.

Note that some other sources define the index value flags as a 16-bit value followed by 2 bytes of padding.

The index value flags

Value	Identifier	Description
0x00000001		Has sub node, when set the index value contains a sub node Virtual Cluster Number (VCN)
0x00000002		Is last, when set the index value is the last in the index values array
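Walking an array of index values could look like the sketch below. It assumes little-endian integers and assumes that the sub node VCN occupies the last 8 bytes of the index value, as the ordering of the table suggests. The function and constant names are hypothetical.

```python
import struct
from typing import Iterator, NamedTuple, Optional

INDEX_VALUE_FLAG_HAS_SUB_NODE = 0x00000001
INDEX_VALUE_FLAG_IS_LAST = 0x00000002


class IndexValue(NamedTuple):
    file_reference: int
    size: int
    key_data: bytes
    flags: int
    sub_node_vcn: Optional[int]


def iter_index_values(data: bytes, offset: int = 0) -> Iterator[IndexValue]:
    """Yield index values until the "is last" flag or the end of data."""
    while offset + 16 <= len(data):
        file_reference, size, key_size, flags = struct.unpack_from(
            "<QHHI", data, offset)
        has_sub_node = bool(flags & INDEX_VALUE_FLAG_HAS_SUB_NODE)
        minimum_size = 16 + key_size + (8 if has_sub_node else 0)
        if size < minimum_size or offset + size > len(data):
            raise ValueError("index value size out of bounds")
        key_data = data[offset + 16:offset + 16 + key_size]
        sub_node_vcn = None
        if has_sub_node:
            # Assumption: the VCN is stored in the last 8 bytes of the value.
            (sub_node_vcn,) = struct.unpack_from("<Q", data, offset + size - 8)
        yield IndexValue(file_reference, size, key_data, flags, sub_node_vcn)
        if flags & INDEX_VALUE_FLAG_IS_LAST:
            break
        # Index values are stored 8-byte aligned.
        offset += (size + 7) & ~7
```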

Index key and value data

Directory entry index value

The MFT attribute name of the directory entry index is: $I30.

The directory entry index value contains a file name attribute in the index key data.

Note that the index value data can contain remnant data.

The short and long names of the same file have separate index values. The short name uses the DOS name space and the long name the WINDOWS name space. Index values with a single name use either the POSIX or DOS_WINDOWS name space.

A hard link to a file in the same directory has separate index values.

Security descriptor hash index value

The MFT attribute name of the security descriptor hash index is: $SDH. It appears to be used only by the $Secure metadata file.

Also see the security descriptor hash index value.

Security descriptor identifier index value

The MFT attribute name of the security descriptor identifier index is: $SII. It appears to be used only by the $Secure metadata file.

Also see the security descriptor identifier index value.

Compression

Compressed data-runs

NTFS compression groups 16 cluster blocks together. This group of 16 cluster blocks is also named a compression unit, which is either “compressed” or uncompressed.

The term compressed is quoted here because the group of cluster blocks can also contain uncompressed data. A group of cluster blocks is “compressed” when its compressed size is smaller than its uncompressed data size. Within a group of cluster blocks each of the 16 blocks is “compressed” individually.

The compression unit size is stored in the non-resident MFT attribute. The maximum uncompressed data size is always the cluster size (in most cases 4096).

Note that a resident $DATA attribute with the compression type in the data flags is stored uncompressed.

The data runs in the $DATA attribute define cluster block ranges, e.g.

21 02 35 52

This data run defines 2 data blocks starting at block number 21045 followed by 14 sparse blocks. The total number of blocks in the compression unit is 16. Compressed data is stored in the first 2 blocks and the 14 sparse blocks are only there to make sure the data runs add up to the compression unit size. They do not define actual sparse data.

Another example:

21 40 37 52

This data run defines 64 data blocks starting at block number 21047. Since this data run is larger than the compression unit size the data is stored uncompressed.
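The two examples can be reproduced with a small decoder. The sketch below assumes the usual data run encoding, which is not restated in this section: the lower 4 bits of the first byte hold the number of bytes of the block count, the upper 4 bits hold the number of bytes of the block number, the block number is a signed value relative to the previous run, a run without a block number is sparse, and a 0-byte terminates the list. The function name is hypothetical.

```python
from typing import Iterator, Optional, Tuple


def decode_data_runs(data: bytes) -> Iterator[Tuple[int, Optional[int]]]:
    """Yield (number_of_blocks, start_block); start_block is None if sparse."""
    offset = 0
    previous_block = 0
    while offset < len(data) and data[offset] != 0:
        size_length = data[offset] & 0x0F
        offset_length = data[offset] >> 4
        offset += 1
        if offset + size_length + offset_length > len(data):
            raise ValueError("truncated data run")
        number_of_blocks = int.from_bytes(
            data[offset:offset + size_length], "little")
        offset += size_length
        if offset_length == 0:
            # No block number stored: the run is sparse.
            yield number_of_blocks, None
            continue
        relative_block = int.from_bytes(
            data[offset:offset + offset_length], "little", signed=True)
        offset += offset_length
        previous_block += relative_block
        yield number_of_blocks, previous_block
```

With this sketch, the bytes 21 02 35 52 decode to 2 blocks starting at block number 21045 and the bytes 21 40 37 52 decode to 64 blocks starting at block number 21047.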

If the data run was e.g. 60 data blocks followed by 4 sparse blocks the first 3 compression units (blocks 1 to 48) would be uncompressed and the last compression unit (blocks 49 to 64) would be compressed.

Also “sparse data” and “sparse compression unit” data runs can be mixed. If in the previous example the 60 data blocks would be followed by 20 sparse blocks the last compression unit (blocks 65 to 80) would be sparse.
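The rules in the examples above can be expressed as a classification per compression unit. This is a sketch under the assumption that a unit consisting only of data blocks is uncompressed, a unit consisting only of sparse blocks is sparse, and a unit mixing both is compressed. The function name and the tuple layout are hypothetical.

```python
from typing import List, Tuple

COMPRESSION_UNIT_BLOCKS = 16


def classify_compression_units(runs: List[Tuple[int, bool]]) -> List[str]:
    """Classify each compression unit given (number_of_blocks, is_sparse) runs."""
    blocks: List[bool] = []
    for number_of_blocks, is_sparse in runs:
        blocks.extend([is_sparse] * number_of_blocks)
    units = []
    for start in range(0, len(blocks), COMPRESSION_UNIT_BLOCKS):
        unit = blocks[start:start + COMPRESSION_UNIT_BLOCKS]
        if all(unit):
            units.append("sparse")
        elif any(unit):
            # Data blocks followed by sparse filler blocks.
            units.append("compressed")
        else:
            units.append("uncompressed")
    return units
```

For 60 data blocks followed by 4 sparse blocks this gives three uncompressed units and one compressed unit. With 20 sparse blocks instead, a fifth, sparse unit follows. A trailing partial unit is classified by the blocks that are present, which is a simplification.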

A compression unit can consist of multiple compressed data runs, e.g. 1 data block followed by 4 data blocks followed by 11 sparse blocks. Data runs have been observed where the last data run size does not align with the compression unit size.

The sparse blocks data run can be stored in a subsequent attribute in an attribute chain and can be stored in multiple data runs.

NTFS compression stores the “compressed” data in blocks. Each block has a 2-byte block header.

The block is of variable size and consists of:

Offset	Size	Value	Description
0	2		Block size
2	compressed data size		Uncompressed or LZNT1 compressed data

The upper 4 bits of the block size are used as flags:

Bit(s)	Description
0 - 11	Compressed data size
12 - 14	Unknown
15	Data is compressed
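Splitting the 16-bit block header into its bit fields could be sketched as follows, assuming a little-endian value. The size field is returned as stored; whether it needs any adjustment before use is not covered by this sketch. The names are hypothetical.

```python
import struct
from typing import NamedTuple


class CompressedBlockHeader(NamedTuple):
    size_field: int      # bits 0 - 11
    unknown: int         # bits 12 - 14
    is_compressed: bool  # bit 15


def parse_block_header(data: bytes, offset: int = 0) -> CompressedBlockHeader:
    """Split the 2-byte block header into its bit fields."""
    (value,) = struct.unpack_from("<H", data, offset)
    return CompressedBlockHeader(
        size_field=value & 0x0FFF,
        unknown=(value >> 12) & 0x7,
        is_compressed=bool(value & 0x8000),
    )
```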

TODO: link to LZNT1 documentation

Windows Overlay Filter (WOF) compressed data

An MFT entry that contains Windows Overlay Filter (WOF) compressed data has the following attributes:

reparse point attribute with tag 0x80000017, which defines the compression method
a nameless data attribute that is sparse and contains the uncompressed data size
a data attribute named WofCompressedData that contains LZXPRESS Huffman or LZX compressed data

Offset	Size	Description
Chunk offset table
0	...	Array of 32-bit or 64-bit compressed data chunk offsets, where the offset is relative from the start of the data chunks
Data chunks
...	...	One or more compressed or uncompressed data chunks

Note that if the chunk size equals the size of the uncompressed data the chunk is stored (as-is) uncompressed.

The size of the chunk offset table is:

number of chunk offsets = uncompressed size / compression unit size

The offset of the first compressed data chunk is at the end of the chunk offset table and is not stored in the chunk offset table.

If the uncompressed size of a chunk is smaller than the compression unit size the chunk is stored uncompressed.
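Locating the data chunks from the chunk offset table could be sketched as follows. The sketch assumes that the caller already knows the number of stored offsets and their size (4 or 8 bytes), that the offsets are little-endian, and that the last chunk runs to the end of the stream. The function name is hypothetical.

```python
from typing import List, Tuple


def read_chunk_ranges(
        data: bytes, number_of_offsets: int,
        offset_size: int = 4) -> List[Tuple[int, int]]:
    """Return (start, end) byte ranges of the data chunks within data."""
    if offset_size not in (4, 8):
        raise ValueError("chunk offsets are 32-bit or 64-bit")
    table_size = number_of_offsets * offset_size
    if table_size > len(data):
        raise ValueError("chunk offset table exceeds stream size")
    # The first chunk directly follows the table and has no stored offset.
    starts = [0]
    for index in range(number_of_offsets):
        entry = data[index * offset_size:(index + 1) * offset_size]
        starts.append(int.from_bytes(entry, "little"))
    # Assumption: the last chunk ends at the end of the stream.
    ends = starts[1:] + [len(data) - table_size]
    # Stored offsets are relative to the start of the data chunks.
    return [(table_size + start, table_size + end)
            for start, end in zip(starts, ends)]
```

A chunk whose range size equals its uncompressed size would then be copied as-is instead of being decompressed.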

Also see Windows Overlay Filter (WOF) compression method.

The reparse point

The reparse point is used to create file system-level links. Reparse data is stored in the reparse point attribute. The reparse point data (REPARSE_DATA_BUFFER) is of variable size and consists of:

Offset	Size	Value	Description
0	4		Reparse point tag
4	2		Reparse data size
6	2	0	Unknown (Reserved)
8	...		Reparse data

TODO: determine if non-native (Microsoft) reparse points are stored with their GUID

The reparse point tag

Offset	Size	Description
0.0	16 bits	Type
2.0	12 bits	Unknown (Reserved)
3.4	4 bits	Flags

Reparse point tag flags

Value	Identifier	Description
0x1		Unknown (Reserved)
0x2		Is alias (Name surrogate bit), when this bit is set, the file or directory represents another named entity in the system
0x4		Is high-latency media (Reserved)
0x8		Is native (Microsoft-bit)
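As an illustration, the reparse point data and the sub fields of the tag could be read as in the sketch below. It assumes little-endian integers, with the type in the lower 16 bits and the flags in the upper 4 bits of the tag. The names are hypothetical.

```python
import struct
from typing import NamedTuple

REPARSE_TAG_FLAG_IS_ALIAS = 0x2
REPARSE_TAG_FLAG_IS_HIGH_LATENCY = 0x4
REPARSE_TAG_FLAG_IS_NATIVE = 0x8


class ReparsePoint(NamedTuple):
    tag: int
    tag_type: int
    tag_flags: int
    reparse_data: bytes


def parse_reparse_point(data: bytes) -> ReparsePoint:
    """Parse the 8-byte header and slice out the reparse data."""
    tag, data_size, _reserved = struct.unpack_from("<IHH", data, 0)
    if 8 + data_size > len(data):
        raise ValueError("reparse data size exceeds available data")
    return ReparsePoint(
        tag=tag,
        tag_type=tag & 0xFFFF,
        tag_flags=tag >> 28,
        reparse_data=data[8:8 + data_size],
    )
```

For the tag 0xa0000003 this yields type 0x0003 and flags 0xa, which means both the native and the alias flag are set.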

Known reparse point tags

Value	Identifier	Description
0x00000000	IO_REPARSE_TAG_RESERVED_ZERO	Unknown (Reserved)
0x00000001	IO_REPARSE_TAG_RESERVED_ONE	Unknown (Reserved)
0x00000002	IO_REPARSE_TAG_RESERVED_TWO	Unknown (Reserved)

0x80000005	IO_REPARSE_TAG_DRIVE_EXTENDER	Used by Home server drive extender
0x80000006	IO_REPARSE_TAG_HSM2	Used by Hierarchical Storage Manager Product
0x80000007	IO_REPARSE_TAG_SIS	Used by single-instance storage (SIS) filter driver
0x80000008	IO_REPARSE_TAG_WIM	Used by the WIM Mount filter
0x80000009	IO_REPARSE_TAG_CSV	Used by Clustered Shared Volumes (CSV) version 1
0x8000000a	IO_REPARSE_TAG_DFS	Used by the Distributed File System (DFS)
0x8000000b	IO_REPARSE_TAG_FILTER_MANAGER	Used by filter manager test harness

0x80000012	IO_REPARSE_TAG_DFSR	Used by the Distributed File System (DFS)
0x80000013	IO_REPARSE_TAG_DEDUP	Used by the Data Deduplication (Dedup)
0x80000014	IO_REPARSE_TAG_NFS	Used by the Network File System (NFS)
0x80000015	IO_REPARSE_TAG_FILE_PLACEHOLDER	Used by Windows Shell for placeholder files
0x80000016	IO_REPARSE_TAG_DFM	Used by Dynamic File filter
0x80000017	IO_REPARSE_TAG_WOF	Used by Windows Overlay Filter (WOF), for either WIMBoot or compression
0x80000018	IO_REPARSE_TAG_WCI	Used by Windows Container Isolation (WCI)

0x8000001b	IO_REPARSE_TAG_APPEXECLINK	Used by Universal Windows Platform (UWP) packages to encode information that allows the application to be launched by CreateProcess

0x8000001e	IO_REPARSE_TAG_STORAGE_SYNC	Used by the Azure File Sync (AFS) filter

0x80000020	IO_REPARSE_TAG_UNHANDLED	Used by Windows Container Isolation (WCI)
0x80000021	IO_REPARSE_TAG_ONEDRIVE	Unknown (Not used)

0x80000023	IO_REPARSE_TAG_AF_UNIX	Used by the Windows Subsystem for Linux (WSL) to represent a UNIX domain socket
0x80000024	IO_REPARSE_TAG_LX_FIFO	Used by the Windows Subsystem for Linux (WSL) to represent a UNIX FIFO (named pipe)
0x80000025	IO_REPARSE_TAG_LX_CHR	Used by the Windows Subsystem for Linux (WSL) to represent a UNIX character special file
0x80000026	IO_REPARSE_TAG_LX_BLK	Used by the Windows Subsystem for Linux (WSL) to represent a UNIX block special file

0x9000001c	IO_REPARSE_TAG_PROJFS	Used by the Windows Projected File System filter, for files managed by a user mode provider such as VFS for Git

0x90001018	IO_REPARSE_TAG_WCI_1	Used by Windows Container Isolation (WCI)

0x9000101a	IO_REPARSE_TAG_CLOUD_1	Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive

0x9000201a	IO_REPARSE_TAG_CLOUD_2	Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive

0x9000301a	IO_REPARSE_TAG_CLOUD_3	Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive

0x9000401a	IO_REPARSE_TAG_CLOUD_4	Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive

0x9000501a	IO_REPARSE_TAG_CLOUD_5	Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive

0x9000601a	IO_REPARSE_TAG_CLOUD_6	Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive

0x9000701a	IO_REPARSE_TAG_CLOUD_7	Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive

0x9000801a	IO_REPARSE_TAG_CLOUD_8	Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive

0x9000901a	IO_REPARSE_TAG_CLOUD_9	Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive

0x9000a01a	IO_REPARSE_TAG_CLOUD_A	Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive

0x9000b01a	IO_REPARSE_TAG_CLOUD_B	Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive

0x9000c01a	IO_REPARSE_TAG_CLOUD_C	Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive

0x9000d01a	IO_REPARSE_TAG_CLOUD_D	Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive

0x9000e01a	IO_REPARSE_TAG_CLOUD_E	Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive

0x9000f01a	IO_REPARSE_TAG_CLOUD_F	Used by the Cloud Files filter, for files managed by a sync engine such as OneDrive

0xa0000003	IO_REPARSE_TAG_MOUNT_POINT	Junction (or mount point)

0xa000000c	IO_REPARSE_TAG_SYMLINK	Symbolic link

0xa0000010	IO_REPARSE_TAG_IIS_CACHE	Used by Microsoft Internet Information Services (IIS) caching

0xa0000019	IO_REPARSE_TAG_GLOBAL_REPARSE	Used by NPFS to indicate a named pipe symbolic link from a server silo into the host silo
0xa000001a	IO_REPARSE_TAG_CLOUD	Used by the Cloud Files filter, for files managed by a sync engine such as Microsoft OneDrive

0xa000001d	IO_REPARSE_TAG_LX_SYMLINK	Used by the Windows Subsystem for Linux (WSL) to represent a UNIX symbolic link

0xa000001f	IO_REPARSE_TAG_WCI_TOMBSTONE	Used by Windows Container Isolation (WCI)

0xa0000022	IO_REPARSE_TAG_PROJFS_TOMBSTONE	Used by the Windows Projected File System filter, for files managed by a user mode provider such as VFS for Git

0xa0000027	IO_REPARSE_TAG_WCI_LINK	Used by Windows Container Isolation (WCI)

0xa0001027	IO_REPARSE_TAG_WCI_LINK_1	Used by Windows Container Isolation (WCI)

0xc0000004	IO_REPARSE_TAG_HSM	Used by Hierarchical Storage Manager Product

0xc0000014	IO_REPARSE_TAG_APPXSTRM	Unknown (Not used)

Junction or mount point reparse data

A reparse point with tag IO_REPARSE_TAG_MOUNT_POINT (0xa0000003) contains junction or mount point reparse data. The junction or mount point reparse data is of variable size and consists of:

Offset	Size	Description
0	2	Substitute name offset, where the offset is relative from the start of the reparse name data
2	2	Substitute name size in bytes, where the size of the end-of-string character is not included
4	2	Display name offset, where the offset is relative from the start of the reparse name data
6	2	Display name size in bytes, where the size of the end-of-string character is not included
Reparse name data
8	...	Substitute name, which contains an UCS-2 little-endian string without end-of-string character
...	...	Display name, which contains an UCS-2 little-endian string without end-of-string character

Note that it is currently unclear if the names contain an end-of-string character or if they are followed by alignment padding.

TODO: determine what character values like 0x0002 represent in the substitute name

00000010: 5c 00 3f 00 3f 00 02 00  43 00 3a 00 5c 00 55 00   \.?.?... C.:.\.U.
00000020: 73 00 65 00 72 00 73 00  5c 00 74 00 65 00 73 00   s.e.r.s. \.t.e.s.
00000030: 74 00 5c 00 44 00 6f 00  63 00 75 00 6d 00 65 00   t.\.D.o. c.u.m.e.
00000040: 6e 00 74 00 73 00 00 00                            n.t.s...

Symbolic link reparse data

A reparse point with tag IO_REPARSE_TAG_SYMLINK (0xa000000c) contains symbolic link reparse data. The symbolic link reparse data is of variable size and consists of:

Offset	Size	Description
0	2	Substitute name offset, where the offset is relative from the start of the reparse name data
2	2	Substitute name size in bytes
4	2	Display name offset, where the offset is relative from the start of the reparse name data
6	2	Display name size, in bytes
8	4	Symbolic link flags
Reparse name data
12	...	Substitute name, which contains an UCS-2 little-endian string without end-of-string character
...	...	Display name, which contains an UCS-2 little-endian string without end-of-string character

Symbolic link flags

Value	Identifier	Description
0x00000001	SYMLINK_FLAG_RELATIVE	The substitute name is a path name relative to the directory containing the symbolic link
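Extracting both names from symbolic link reparse data could be sketched as follows. The sketch assumes little-endian integers and decodes the names as UTF-16 little-endian, which covers UCS-2. The names of the class and function are hypothetical.

```python
import struct
from typing import NamedTuple

SYMLINK_FLAG_RELATIVE = 0x00000001


class SymbolicLinkReparseData(NamedTuple):
    substitute_name: str
    display_name: str
    flags: int


def parse_symbolic_link_reparse_data(data: bytes) -> SymbolicLinkReparseData:
    """Parse symbolic link reparse data (without the reparse point header)."""
    (substitute_offset, substitute_size,
     display_offset, display_size, flags) = struct.unpack_from("<HHHHI", data, 0)
    # Name offsets are relative to the start of the reparse name data.
    names = data[12:]
    substitute = names[substitute_offset:substitute_offset + substitute_size]
    display = names[display_offset:display_offset + display_size]
    return SymbolicLinkReparseData(
        substitute.decode("utf-16-le", errors="replace"),
        display.decode("utf-16-le", errors="replace"),
        flags,
    )
```

Junction or mount point reparse data has the same shape without the flags, so the reparse name data would start at offset 8 instead of 12.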

Windows Overlay Filter (WOF) reparse data

A reparse point with tag IO_REPARSE_TAG_WOF (0x80000017) contains Windows Overlay Filter (WOF) reparse data. The Windows Overlay Filter (WOF) reparse data is 16 bytes in size and consists of:

Offset	Size	Value	Description
External provider information
0	4	1	Unknown (WOF version)
4	4	2	Unknown (WOF provider)
Internal provider information
8	4	1	Unknown (file information version)
12	4		Compression method

Windows Overlay Filter (WOF) compression method

Value	Identifier	Description
0		LZXPRESS Huffman with 4k window (compression unit)
1		LZX with 32k window (compression unit)
2		LZXPRESS Huffman with 8k window (compression unit)
3		LZXPRESS Huffman with 16k window (compression unit)
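Mapping the compression method to an algorithm and compression unit size could be sketched as follows. The sketch assumes little-endian integers and that "4k" means 4096 bytes. The names are hypothetical.

```python
import struct
from typing import NamedTuple

# Compression method: (algorithm, compression unit size in bytes).
# Assumption: "k" in the table denotes 1024 bytes.
WOF_COMPRESSION_METHODS = {
    0: ("LZXPRESS Huffman", 4096),
    1: ("LZX", 32768),
    2: ("LZXPRESS Huffman", 8192),
    3: ("LZXPRESS Huffman", 16384),
}


class WofReparseData(NamedTuple):
    version: int
    provider: int
    file_information_version: int
    compression_method: int
    algorithm: str
    compression_unit_size: int


def parse_wof_reparse_data(data: bytes) -> WofReparseData:
    """Parse the 16-byte WOF reparse data."""
    if len(data) < 16:
        raise ValueError("WOF reparse data requires 16 bytes")
    version, provider, file_version, method = struct.unpack_from("<IIII", data, 0)
    if method not in WOF_COMPRESSION_METHODS:
        raise ValueError("unsupported compression method: %d" % method)
    algorithm, unit_size = WOF_COMPRESSION_METHODS[method]
    return WofReparseData(
        version, provider, file_version, method, algorithm, unit_size)
```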

TODO: link to LZXPRESS Huffman and LZX documentation

Windows Container Isolation (WCI) reparse data

A reparse point with tag IO_REPARSE_TAG_WCI (0x80000018) contains Windows Container Isolation (WCI) reparse data. The Windows Container Isolation (WCI) reparse data is of variable size and consists of:

Offset	Size	Value	Description
0	4	1	Version
4	4	0	Unknown (reserved)
8	16		Look-up identifier, which contains a GUID
24	2		Name size in bytes
26	...		Name, which contains an UCS-2 little-endian string without end-of-string character
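A sketch of reading the WCI reparse data, assuming little-endian integers and the common mixed-endian on-disk GUID layout (bytes_le in the Python uuid module). The function name is hypothetical.

```python
import struct
import uuid
from typing import Tuple


def parse_wci_reparse_data(data: bytes) -> Tuple[int, uuid.UUID, str]:
    """Return (version, look-up identifier, name)."""
    version, _reserved, guid_bytes, name_size = struct.unpack_from(
        "<II16sH", data, 0)
    if 26 + name_size > len(data):
        raise ValueError("name size exceeds available data")
    name = data[26:26 + name_size].decode("utf-16-le", errors="replace")
    # Assumption: the GUID uses the little-endian (mixed-endian) byte layout.
    return version, uuid.UUID(bytes_le=guid_bytes), name
```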

The allocation bitmap

The metadata file $Bitmap contains the allocation bitmap.

Every bit in the allocation bitmap represents a block the size of the cluster block, where the LSB is the first bit in a byte.
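Testing a single cluster block in the bitmap could be sketched as follows, under the assumption that a set bit marks an allocated cluster block. The function name is hypothetical.

```python
def is_cluster_allocated(bitmap: bytes, cluster_number: int) -> bool:
    """Check the bit for cluster_number, where the LSB is the first bit."""
    byte_index, bit_index = divmod(cluster_number, 8)
    if byte_index >= len(bitmap):
        raise IndexError("cluster number outside of bitmap")
    return bool((bitmap[byte_index] >> bit_index) & 1)
```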

TODO: describe what the $SRAT data stream is used for.

Access control

The $Secure metadata file contains the security descriptors used for access control.

Type	Name	Description
Data	$SDS	Security descriptor data stream, which contains all the security descriptors on the volume
Index	$SDH	Security descriptor hash index
Index	$SII	Security descriptor identifier index, which contains the mapping of the security descriptor identifier (in $STANDARD_INFORMATION) to the offset of the security descriptor data (in $Secure:$SDS)

Security descriptor hash ($SDH) index

The security descriptor hash index value

Offset	Size	Description
Key data
0	4	Security descriptor hash
4	4	Security descriptor identifier
Value data
8	4	Security descriptor hash
12	4	Security descriptor identifier
16	8	Security descriptor data offset (in $SDS)
24	4	Security descriptor data size (in $SDS)
28	4	Unknown

Security descriptor identifier ($SII) index

The security descriptor identifier index value

Offset	Size	Description
Key data
0	4	Security descriptor identifier
Value data
4	4	Security descriptor hash
8	4	Security descriptor identifier
12	8	Security descriptor data offset (in $SDS)
20	4	Security descriptor data size (in $SDS)
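Reading the key and value data of a $SII index value could be sketched as follows, assuming little-endian integers and offsets relative to the start of the key data. The names are hypothetical.

```python
import struct
from typing import NamedTuple


class SecurityDescriptorIdentifierIndexValue(NamedTuple):
    key_identifier: int
    descriptor_hash: int
    identifier: int
    data_offset: int  # offset in $SDS
    data_size: int    # size in $SDS


def parse_sii_index_value(
        data: bytes, offset: int = 0) -> SecurityDescriptorIdentifierIndexValue:
    """Parse 4 bytes of key data followed by 20 bytes of value data."""
    return SecurityDescriptorIdentifierIndexValue(
        *struct.unpack_from("<IIIQI", data, offset))
```

The data offset and data size can then be used to slice the matching entry out of the $SDS data stream.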

TODO: describe the hash algorithm

Security descriptor ($SDS) data stream

Offset	Size	Description
0	4	Security descriptor hash
4	4	Security descriptor identifier
8	8	Security descriptor data offset (in $SDS)
16	4	Security descriptor data size (in $SDS)
20	...	Security descriptor data
...	...	Alignment padding (2-byte alignment)

TODO: link to security descriptor format documentation

The object identifiers

$ObjID:$O

Offset	Size	Description
Key data
0	16	File (or object) identifier, which contains a GUID
Value data
16	8	File reference
24	16	Birth droid volume identifier, which contains a GUID
40	16	Birth droid file (or object) identifier, which contains a GUID
56	16	Birth droid domain identifier, which contains a GUID

Metadata transaction journal (log file)

TODO: complete section

The metadata file $LogFile contains the metadata transaction journal and consists of:

the Log File Service restart page header
the fix-up values

The Log File service restart page header (LFS_RESTART_PAGE_HEADER) is 30 bytes in size and consists of:

Offset	Size	Value	Description
MULTI_SECTOR_HEADER
0	4	"CHKD", "RCRD", "RSTR"	Signature
4	2		The fix-up values (or update sequence array) offset, which contains an offset relative from the start of the restart page header
6	2		The number of fix-up values (or update sequence array size)
Common
8	8		Checkdisk last LSN
16	4		System page size
20	4		Log page size
24	2		Restart offset
26	2		Minor format version
28	2		Major format version

Log File service restart page versions

Major format version	Remarks
-1	Beta Version
0	Transition
1	Update sequence support
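The restart page header could be read as in the sketch below. It assumes little-endian integers and treats both format version fields as signed, since -1 appears as a major format version. The names are hypothetical.

```python
import struct
from typing import NamedTuple

SUPPORTED_SIGNATURES = (b"CHKD", b"RCRD", b"RSTR")


class RestartPageHeader(NamedTuple):
    signature: bytes
    fixup_values_offset: int
    number_of_fixup_values: int
    checkdisk_last_lsn: int
    system_page_size: int
    log_page_size: int
    restart_offset: int
    minor_format_version: int
    major_format_version: int


def parse_restart_page_header(data: bytes) -> RestartPageHeader:
    """Parse the 30-byte Log File Service restart page header."""
    if len(data) < 30:
        raise ValueError("restart page header requires 30 bytes")
    # Assumption: the version fields are signed 16-bit values.
    header = RestartPageHeader(*struct.unpack_from("<4sHHQIIHhh", data, 0))
    if header.signature not in SUPPORTED_SIGNATURES:
        raise ValueError("unsupported restart page signature")
    return header
```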

USN change journal

The metadata file $Extend\$UsnJrnl contains the USN change journal. It is a sparse file in which NTFS stores records of changes to files and directories. Applications, such as the Windows File Replication Service (FRS) and the Windows (Desktop) Search service, make use of the journal to respond to file and directory changes as they occur.

The USN change journal consists of:

the $UsnJrnl:$Max data stream, containing metadata like the maximum size of the journal
the $UsnJrnl:$J data stream, containing the update (or change) entries. The $UsnJrnl:$J data stream is sparse.

USN change journal metadata

The USN change journal metadata is 32 bytes in size and consists of:

Offset	Size	Description
0	8	Maximum size in bytes
8	8	Allocation (size) delta in bytes
16	8	Update (USN) journal identifier, which contains a FILETIME
24	8	Unknown (empty)
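Reading the $UsnJrnl:$Max metadata could be sketched as follows. It assumes little-endian integers and the usual FILETIME definition of 100-nanosecond intervals since January 1, 1601 UTC. The function names and dictionary keys are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a FILETIME to a datetime, truncated to microseconds."""
    epoch = datetime(1601, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=filetime // 10)


def parse_usn_journal_metadata(data: bytes) -> dict:
    """Parse the 32-byte USN change journal metadata."""
    maximum_size, allocation_delta, journal_identifier, _unknown = (
        struct.unpack_from("<QQQQ", data, 0))
    return {
        "maximum_size": maximum_size,
        "allocation_delta": allocation_delta,
        "journal_identifier": filetime_to_datetime(journal_identifier),
    }
```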

USN change journal entries

The $UsnJrnl:$J data stream consists of an array of USN change journal entries. The USN change journal entries are stored on a per-block basis and 8-byte aligned. Therefore the remainder of the block can contain 0-byte values.

TODO: describe journal block size

Once the stream reaches maximum size the earliest USN change journal entries are removed from the stream and replaced with a sparse data run.

USN change journal entry

The USN change journal entry (USN_RECORD_V2) is of variable size and consists of:

Offset	Size	Value	Description
0	4		Entry (or record) size
4	2	2	Major format version
6	2	0	Minor format version
8	8		File reference
16	8		Parent file reference
24	8		Update sequence number (USN), which contains the file offset of the USN change journal entry which is used as a unique identifier
32	8		Update date and time, which contains a FILETIME
40	4		Update reason flags
44	4		Update source flags
48	4		Security descriptor identifier, which contains the entry number in the security ID index ($Secure:$SII). Also see Access Control
52	4		File attribute flags
56	2		Name size in bytes
58	2		Name offset, which is relative from the start of the USN change journal entry
60	(name size)		Name, which contains an UCS-2 little-endian string without end-of-string character
...	...	0x00	Unknown (Padding)
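A single USN change journal entry could be read as in the following sketch. It assumes little-endian integers and decodes the name as UTF-16 little-endian, which covers UCS-2. The names are hypothetical.

```python
import struct
from typing import NamedTuple


class UsnRecordV2(NamedTuple):
    size: int
    major_format_version: int
    minor_format_version: int
    file_reference: int
    parent_file_reference: int
    update_sequence_number: int
    update_time: int  # FILETIME
    update_reason_flags: int
    update_source_flags: int
    security_descriptor_identifier: int
    file_attribute_flags: int
    name: str


def parse_usn_record_v2(data: bytes, offset: int = 0) -> UsnRecordV2:
    """Parse one USN_RECORD_V2 starting at offset."""
    fields = struct.unpack_from("<IHHQQQQIIIIHH", data, offset)
    size, major_format_version = fields[0], fields[1]
    name_size, name_offset = fields[-2:]
    if major_format_version != 2:
        raise ValueError("unsupported major format version")
    if size < 60 or offset + size > len(data):
        raise ValueError("entry size out of bounds")
    if name_offset + name_size > size:
        raise ValueError("name exceeds entry size")
    # The name offset is relative to the start of the entry.
    name_start = offset + name_offset
    name = data[name_start:name_start + name_size]
    return UsnRecordV2(*fields[:-2], name.decode("utf-16-le", errors="replace"))
```

Since the entries are 8-byte aligned, a reader that iterates over a block would round the entry size up to a multiple of 8 to find the next entry, and stop when it reaches 0-byte values.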

Update reason flags

Value	Identifier	Description
0x00000001	USN_REASON_DATA_OVERWRITE	The data in the file or directory is overwritten
0x00000002	USN_REASON_DATA_EXTEND	The file or directory is extended
0x00000004	USN_REASON_DATA_TRUNCATION	The file or directory is truncated

0x00000010	USN_REASON_NAMED_DATA_OVERWRITE	One or more named data streams ($DATA attributes) of a file were overwritten
0x00000020	USN_REASON_NAMED_DATA_EXTEND	One or more named data streams ($DATA attributes) of a file were extended
0x00000040	USN_REASON_NAMED_DATA_TRUNCATION	One or more named data streams ($DATA attributes) of a file were truncated

0x00000100	USN_REASON_FILE_CREATE	The file or directory was created
0x00000200	USN_REASON_FILE_DELETE	The file or directory was deleted
0x00000400	USN_REASON_EA_CHANGE	The extended attributes of the file were changed
0x00000800	USN_REASON_SECURITY_CHANGE	The access rights (security descriptor) of a file or directory were changed
0x00001000	USN_REASON_RENAME_OLD_NAME	The name changed, where the USN change journal entry contains the old name
0x00002000	USN_REASON_RENAME_NEW_NAME	The name changed, where the USN change journal entry contains the new name
0x00004000	USN_REASON_INDEXABLE_CHANGE	Content indexed status changed. The file attribute FILE_ATTRIBUTE_NOT_CONTENT_INDEXED was changed
0x00008000	USN_REASON_BASIC_INFO_CHANGE	Basic file or directory attributes changed. One or more file or directory attributes were changed e.g. read-only, hidden, system, archive, or sparse attribute, or one or more time stamps
0x00010000	USN_REASON_HARD_LINK_CHANGE	A hard link was created or deleted
0x00020000	USN_REASON_COMPRESSION_CHANGE	The file or directory was compressed or decompressed
0x00040000	USN_REASON_ENCRYPTION_CHANGE	The file or directory was encrypted or decrypted
0x00080000	USN_REASON_OBJECT_ID_CHANGE	The object identifier of a file or directory was changed
0x00100000	USN_REASON_REPARSE_POINT_CHANGE	The reparse point in a file or directory was changed, or a reparse point was added to or deleted from a file or directory
0x00200000	USN_REASON_STREAM_CHANGE	A named data stream ($DATA attribute) is added to or removed from a file, or a named stream is renamed
0x00400000	USN_REASON_TRANSACTED_CHANGE	Unknown

0x80000000	USN_REASON_CLOSE	The file or directory was closed
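Since multiple update reason flags are typically combined in one entry, a small helper can turn the value into a list of identifiers. This is a sketch using the table above; the function name is hypothetical.

```python
from typing import List

USN_REASON_FLAGS = {
    0x00000001: "USN_REASON_DATA_OVERWRITE",
    0x00000002: "USN_REASON_DATA_EXTEND",
    0x00000004: "USN_REASON_DATA_TRUNCATION",
    0x00000010: "USN_REASON_NAMED_DATA_OVERWRITE",
    0x00000020: "USN_REASON_NAMED_DATA_EXTEND",
    0x00000040: "USN_REASON_NAMED_DATA_TRUNCATION",
    0x00000100: "USN_REASON_FILE_CREATE",
    0x00000200: "USN_REASON_FILE_DELETE",
    0x00000400: "USN_REASON_EA_CHANGE",
    0x00000800: "USN_REASON_SECURITY_CHANGE",
    0x00001000: "USN_REASON_RENAME_OLD_NAME",
    0x00002000: "USN_REASON_RENAME_NEW_NAME",
    0x00004000: "USN_REASON_INDEXABLE_CHANGE",
    0x00008000: "USN_REASON_BASIC_INFO_CHANGE",
    0x00010000: "USN_REASON_HARD_LINK_CHANGE",
    0x00020000: "USN_REASON_COMPRESSION_CHANGE",
    0x00040000: "USN_REASON_ENCRYPTION_CHANGE",
    0x00080000: "USN_REASON_OBJECT_ID_CHANGE",
    0x00100000: "USN_REASON_REPARSE_POINT_CHANGE",
    0x00200000: "USN_REASON_STREAM_CHANGE",
    0x00400000: "USN_REASON_TRANSACTED_CHANGE",
    0x80000000: "USN_REASON_CLOSE",
}


def describe_update_reason_flags(flags: int) -> List[str]:
    """Return the identifiers of the set flags, lowest value first."""
    names = [name for value, name in USN_REASON_FLAGS.items() if flags & value]
    # Report bits that are not in the table as a hexadecimal remainder.
    unknown = flags & ~sum(USN_REASON_FLAGS)
    if unknown:
        names.append("0x%08x" % unknown)
    return names
```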

Update source flags

Value	Identifier	Description
0x00000001	USN_SOURCE_DATA_MANAGEMENT	The operation added a private data stream to a file or directory. The modifications did not change the application data
0x00000002	USN_SOURCE_AUXILIARY_DATA	The operation was caused by the operating system. Although a write operation is performed on the item, the data was not changed
0x00000004	USN_SOURCE_REPLICATION_MANAGEMENT	The operation was caused by file replication

Alternate data streams (ADS)

Data stream name	Description
"♣BnhqlkugBim0elg1M1pt2tjdZe", "♣SummaryInformation", "{4c8cc155-6c1e-11d1-8e41-00c04fb9386d}"	Used to store properties, where ♣ (black club) is Unicode character U+2663
"{59828bbb-3f72-4c1b-a420-b51ad66eb5d3}.XPRESS"	Used during remote differential compression
"AFP_AfpInfo", "AFP_Resource"	Used to store Macintosh operating system property lists
"encryptable"	Used to store attributes relating to thumbnails in the thumbnails database
"favicon"	Used to store favorite icons for web pages
"ms-properties"	Used to store properties
"OECustomProperty"	Used to store custom properties related to email files
"Zone.Identifier"	Used to store the Internet Explorer URL security zone of the origin

ms-properties

The ms-properties alternate data stream contains a Windows Serialized Property Store (SPS).

TODO: link to Windows Serialized Property Store (SPS) format documentation

Zone.Identifier

The Zone.Identifier alternate data stream contains ASCII text in the form:

[ZoneTransfer]
ZoneId=3

Where ZoneId refers to the Internet Explorer URL security zone of the origin.
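Because the content has the shape of an INI file, it can be read with the Python standard library configparser module. This is a sketch that assumes the stream was already decoded to text; the function name is hypothetical.

```python
import configparser


def parse_zone_identifier(text: str) -> int:
    """Return the ZoneId value from Zone.Identifier stream content."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.getint("ZoneTransfer", "ZoneId")
```

Streams seen in practice may contain additional keys, which this sketch ignores.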

Transactional NTFS (TxF)

Transactional NTFS (TxF) was added as of Windows Vista.

In TxF the resource manager (RM) keeps track of transactional metadata and log files. The TxF related metadata files are stored in the metadata directory:

$Extend\$RmMetadata

Resource manager repair information

The resource manager repair information metadata file: $Extend\$RmMetadata\$Repair consists of the following data streams:

the default (unnamed) data stream
the $Config data stream, contains the resource manager repair configuration information

TODO: determine the purpose of the default (unnamed) data stream

Resource manager repair configuration information

TODO: complete section

The $Repair:$Config data stream contains:

Offset	Size	Value	Description
0	4		Unknown
4	4		Unknown

Transactional NTFS (TxF) metadata directory

TODO: complete section

The transactional NTFS (TxF) metadata directory: $Extend\$RmMetadata\$Txf is used to isolate files for delete or overwrite operations.

TxF Old Page Stream (TOPS) file

The TxF Old Page Stream (TOPS) file: $Extend\$RmMetadata\$TxfLog\$Tops consists of the following data streams:

the default (unnamed) data stream, contains metadata about the resource manager, such as its GUID, its CLFS log policy, and the LSN at which recovery should start
the $T data stream, contains the file data that is partially overwritten by a transaction as opposed to a full overwrite, which would move the file into the Transactional NTFS (TxF) metadata directory

TxF Old Page Stream (TOPS) metadata

TODO: complete section

The $Tops default (unnamed) data stream contains:

Offset	Size	Description
0	2	Unknown
2	2	Size of TOPS metadata
4	4	Unknown (Number of resource managers/streams?)
8	16	Resource Manager (RM) identifier, which contains a GUID
24	8	Unknown (empty)
32	8	Base (or log start) LSN of TxFLog stream
40	8	Unknown
48	8	Last flushed LSN of TxFLog stream
56	8	Unknown
64	8	Unknown (empty)
72	8	Unknown (Restart LSN?)
80	20	Unknown

TxF Old Page Stream (TOPS) file data

The $Tops:$T data stream contains the file data that is partially overwritten by a transaction. It consists of multiple pending transaction XML-documents.

TODO: describe start of each sector containing 0x0001

A pending transaction XML-document starts with a UTF-8 byte-order-mark. It roughly contains the following data:

<?xml version='1.0' encoding='utf-8'?>
<PendingTransaction Version="2.0" Identifier="...">
   <Transactions>
      <Transaction TransactionId="...">
      <Install Application="..., Culture=..., Version=..., PublicKeyToken=...,
                           ProcessorArchitecture=..., versionScope=..."
               RefGuid="..."
               RefIdentifier="..."
               RefExtra="..."/>
      ...
      </Transaction>
   </Transactions>
   <ChangeList>
      <Change Family="..., Culture=..., PublicKeyToken=...,
                     ProcessorArchitecture=..., versionScope=..."
              New="..."/>
      ...
   </ChangeList>
   <POQ>
      <BeginTransaction id="..."/>

      <CreateFile path="..."
                  fileAttribute="..."/>
      <DeleteFile path="..."/>
      <MoveFile source="..." destination="..."/>
      <HardlinkFile source="..." destination="..."/>
      <SetFileInformation path="..."
                          securityDescriptor="binary base64:..."
                          flags="..."/>

       <CreateKey path="..."/>
       <SetKeyValue path="..."
                    name="..."
                    type="..."
                    encoding="base64"
                    value="..."/>
      <DeleteKeyValue path="..."
                      name="..."/>

      ...
   </POQ>
   <InstallerQueue Length="...">
      <Action Installer="..."
              Mode="..."
              Phase="..."
              Family="..., Culture=..., PublicKeyToken=...,
                     ProcessorArchitecture=..., versionScope=..."
              Old="..."
              New="..."/>

      ...
   </InstallerQueue >
</PendingTransaction>

Transactional NTFS (TxF) Common Log File System (CLFS) files

TxF uses a Common Log File System (CLFS) log store and the logged utility stream attribute named $TXF_DATA.

TODO: link to CLFS format documentation

The base log file (BLF) of the TxF log store is:

$Extend\$RmMetadata\$TxfLog\TxfLog.blf

Commonly the corresponding container files are:

$Extend\$RmMetadata\$TxfLog\TxfLogContainer00000000000000000001
$Extend\$RmMetadata\$TxfLog\TxfLogContainer00000000000000000002

TxF uses a multiplexed log store which contains the following streams:

the KtmLog stream used for Kernel Transaction Manager (KTM) metadata records
the TxfLog stream, which contains the TxF log records

Transactional data logged utility stream attribute

The transactional data ($TXF_DATA) logged utility stream attribute is 56 bytes in size and consists of:

Offset	Size	Description
0	6	Unknown (remnant data)
6	8	Resource manager root file reference, which contains an NTFS file reference that refers to the MFT
14	8	Unknown (USN index?)
22	8	File identifier (TxID), which contains a TxF file identifier
30	8	Data LSN, which contains a CLFS LSN of file data transaction records
38	8	Metadata LSN, which contains a CLFS LSN of file system metadata transaction records
46	8	Directory index LSN, which contains a CLFS LSN of directory index transaction records
54	2	Unknown (Flags?)

Note that a single MFT entry can contain multiple Transactional data logged utility stream attributes.
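The 56 bytes of the $TXF_DATA attribute could be read as in the sketch below, assuming little-endian integers without alignment padding between the fields. The names are hypothetical and the unknown fields are kept as raw values.

```python
import struct
from typing import NamedTuple


class TransactionalData(NamedTuple):
    unknown1: bytes  # remnant data
    resource_manager_root_file_reference: int
    unknown2: int    # USN index?
    file_identifier: int
    data_lsn: int
    metadata_lsn: int
    directory_index_lsn: int
    unknown3: int    # flags?


def parse_txf_data(data: bytes) -> TransactionalData:
    """Parse the 56-byte $TXF_DATA logged utility stream attribute data."""
    if len(data) < 56:
        raise ValueError("$TXF_DATA requires 56 bytes")
    # "<" disables alignment, so the 64-bit values start at offset 6.
    return TransactionalData(*struct.unpack_from("<6sQQQQQQH", data, 0))
```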

Windows definitions

File attribute flags

The file attribute flags consist of the following values:

Value	Identifier	Description
0x00000001	FILE_ATTRIBUTE_READONLY	Is read-only
0x00000002	FILE_ATTRIBUTE_HIDDEN	Is hidden
0x00000004	FILE_ATTRIBUTE_SYSTEM	Is a system file or directory
0x00000008		Is a volume label, which is not used by NTFS
0x00000010	FILE_ATTRIBUTE_DIRECTORY	Is a directory, which is not used by NTFS
0x00000020	FILE_ATTRIBUTE_ARCHIVE	Should be archived
0x00000040	FILE_ATTRIBUTE_DEVICE	Is a device, which is not used by NTFS
0x00000080	FILE_ATTRIBUTE_NORMAL	Is a normal file. Note that none of the other flags should be set
0x00000100	FILE_ATTRIBUTE_TEMPORARY	Is temporary
0x00000200	FILE_ATTRIBUTE_SPARSE_FILE	Is a sparse file
0x00000400	FILE_ATTRIBUTE_REPARSE_POINT	Is a reparse point or symbolic link
0x00000800	FILE_ATTRIBUTE_COMPRESSED	Is compressed
0x00001000	FILE_ATTRIBUTE_OFFLINE	Is offline. The data of the file is stored on offline storage
0x00002000	FILE_ATTRIBUTE_NOT_CONTENT_INDEXED	Do not index content. The content of the file or directory should not be indexed by the indexing service
0x00004000	FILE_ATTRIBUTE_ENCRYPTED	Is encrypted
0x00008000		Unknown (seen on Windows 95 FAT)
0x00010000	FILE_ATTRIBUTE_VIRTUAL	Is virtual

The following flags are mainly used in the file name attribute and sparsely in the standard information attribute. It could be that they have a different meaning in both types of attributes or that the standard information flags are not updated. For now the latter is assumed.

Value	Identifier	Description
0x10000000		Unknown (Is directory or has $I30 index? Note that an $Extend directory without this flag has been observed)
0x20000000		Is index view

Corruption scenarios

Data stream with inconsistent data flags

An MFT entry contains an $ATTRIBUTE_LIST attribute that contains multiple $DATA attributes. The $DATA attributes define an LZNT1 compressed data stream though only the first $DATA attribute has the compressed data flag set.

Note that it is unclear if this is a corruption scenario or not.

MFT entry: 220 information:
    Is allocated                   : true
    File reference                 : 220-59
    Base record file reference     : Not set (0)
    Journal sequence number        : 51876429013
    Number of attributes           : 5

Attribute: 1
    Type                           : $STANDARD_INFORMATION (0x00000010)
    Creation time                  : Jun 05, 2019 06:56:26.032730300 UTC
    Modification time              : Oct 05, 2019 06:56:04.150940700 UTC
    Access time                    : Oct 05, 2019 06:56:04.150940700 UTC
    Entry modification time        : Oct 05, 2019 06:56:04.150940700 UTC
    Owner identifier               : 0
    Security descriptor identifier : 5862
    Update sequence number         : 11553149976
    File attribute flags           : 0x00000820
       Should be archived (FILE_ATTRIBUTE_ARCHIVE)
       Is compressed (FILE_ATTRIBUTE_COMPRESSED)

Attribute: 2
    Type                           : $ATTRIBUTE_LIST (0x00000020)

Attribute: 3
    Type                           : $FILE_NAME (0x00000030)
    Parent file reference          : 33996-57
    Creation time                  : Jun 05, 2019 06:56:26.032730300 UTC
    Modification time              : Oct 05, 2019 06:56:03.510061800 UTC
    Access time                    : Oct 05, 2019 06:56:03.510061800 UTC
    Entry modification time        : Oct 05, 2019 06:56:03.510061800 UTC
    File attribute flags           : 0x00000020
       Should be archived (FILE_ATTRIBUTE_ARCHIVE)
    Namespace                      : POSIX (0)
    Name                           : setupapi.dev.20191005_085603.log

Attribute: 4
    Type                           : $DATA (0x00000080)
    Data VCN range                 : 513 - 1103
    Data flags                     : 0x0000

Attribute: 5
    Type                           : $DATA (0x00000080)
    Data VCN range                 : 0 - 512
    Data size                      : 4487594 bytes
    Data flags                     : 0x0001

Directory entry with outdated file reference

The directory entry: \ProgramData\McAfee\Common Framework\Task\5.ini

File entry:
    Path                           : \ProgramData\McAfee\Common Framework\Task\5.ini
    File reference                 : 51106-400
    Name                           : 5.ini
    Parent file reference          : 65804-10
    Size                           : 723
    Creation time                  : Sep 16, 2011 20:47:54.561041200 UTC
    Modification time              : Apr 07, 2012 21:07:02.684060000 UTC
    Access time                    : Apr 07, 2012 21:07:02.652810200 UTC
    Entry modification time        : Apr 07, 2012 21:07:02.684060000 UTC
    File attribute flags           : 0x00002020
       Should be archived (FILE_ATTRIBUTE_ARCHIVE)
       Content should not be indexed (FILE_ATTRIBUTE_NOT_CONTENT_INDEXED)

The corresponding MFT entry:

MFT entry: 51106 information:
    Is allocated                   : true
    File reference                 : 51106-496
    Base record file reference     : Not set (0)
    Journal sequence number        : 0
    Number of attributes           : 3

Attribute: 1
    Type                           : $STANDARD_INFORMATION (0x00000010)
    Creation time                  : Sep 16, 2011 20:47:54.561041200 UTC
    Modification time              : Apr 07, 2012 21:07:02.684060000 UTC
    Access time                    : Apr 07, 2012 21:07:02.652810200 UTC
    Entry modification time        : Apr 07, 2012 21:07:02.684060000 UTC
    Owner identifier               : 0
    Security descriptor identifier : 1368
    Update sequence number         : 1947271600
    File attribute flags           : 0x00002020
       Should be archived (FILE_ATTRIBUTE_ARCHIVE)
       Content should not be indexed (FILE_ATTRIBUTE_NOT_CONTENT_INDEXED)

Attribute: 2
    Type                           : $FILE_NAME (0x00000030)
    Parent file reference          : 65804-10
    Creation time                  : Sep 16, 2011 20:47:54.561041200 UTC
    Modification time              : Apr 07, 2012 21:07:02.652810200 UTC
    Access time                    : Apr 07, 2012 21:07:02.652810200 UTC
    Entry modification time        : Apr 07, 2012 21:07:02.652810200 UTC
    File attribute flags           : 0x00002020
       Should be archived (FILE_ATTRIBUTE_ARCHIVE)
       Content should not be indexed (FILE_ATTRIBUTE_NOT_CONTENT_INDEXED)
    Namespace                      : DOS and Windows (3)
    Name                           : 1.ini

Attribute: 3
    Type                           : $DATA (0x00000080)
    Data size                      : 723 bytes
    Data flags                     : 0x0000
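
The two file references above share the MFT entry number (51106) but differ in sequence number (400 versus 496). The following is a minimal C sketch of how such a mismatch could be detected. It assumes the 64-bit NTFS file reference layout of a 48-bit MFT entry number in the lower bits and a 16-bit sequence number in the upper bits. The function names are hypothetical and not part of any existing library.

```c
#include <stdint.h>

/* Retrieves the MFT entry number, stored in the lower 48 bits of a file reference */
uint64_t file_reference_get_mft_entry( uint64_t file_reference )
{
    return( file_reference & 0x0000ffffffffffffULL );
}

/* Retrieves the sequence number, stored in the upper 16 bits of a file reference */
uint16_t file_reference_get_sequence_number( uint64_t file_reference )
{
    return( (uint16_t) ( file_reference >> 48 ) );
}

/* Returns 1 if both file references refer to the same MFT entry number
 * but with a different sequence number, 0 otherwise
 */
int file_reference_is_outdated(
     uint64_t directory_entry_file_reference,
     uint64_t mft_entry_file_reference )
{
    if( file_reference_get_mft_entry( directory_entry_file_reference )
     != file_reference_get_mft_entry( mft_entry_file_reference ) )
    {
        return( 0 );
    }
    return( file_reference_get_sequence_number( directory_entry_file_reference )
         != file_reference_get_sequence_number( mft_entry_file_reference ) );
}
```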

TODO: determine if $LogFile could be used to recover from this corruption scenario

LZNT1 compressed block with data size of 0

It is unclear if this is a corruption scenario or a data format edge case.

A compression unit (index 30) consisting of the following data runs:

reading data run: 60.
data run:
00000000: 11 01 01                                           ...

value sizes                               : 1, 1
number of cluster blocks                  : 1 (size: 4096)
cluster block number                      : 687143 (1) (offset: 0xa7c27000)

reading data run: 61.
data run:
00000000: 01 0f                                              ..

value sizes                               : 1, 0
number of cluster blocks                  : 15 (size: 61440)
cluster block number                      : 0 (0) (offset: 0x00000000)
        Is sparse

Contains the following data:

a7c27000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
...
a7c27ff0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

This relates to an empty LZNT1 compressed block.

compressed data offset                    : 0 (0x00000000)
compression chunk header                  : 0x0000
compressed chunk size                     : 1
signature value                           : 0
is compressed flag                        : 0
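
The values above can be derived from the 16-bit compression chunk header. The following minimal C sketch assumes the commonly documented LZNT1 chunk header layout, where the lower 12 bits contain the size of the chunk data minus 1, the next 3 bits the signature value and the upper bit the is compressed flag. The structure and function names are hypothetical.

```c
#include <stdint.h>

typedef struct
{
    uint16_t chunk_data_size; /* size of the data that follows the chunk header */
    uint8_t signature_value;
    uint8_t is_compressed;
}
lznt1_chunk_header_t;

/* Splits a 16-bit LZNT1 compression chunk header into its values */
void lznt1_chunk_header_parse(
      uint16_t value,
      lznt1_chunk_header_t *chunk_header )
{
    chunk_header->chunk_data_size = (uint16_t) ( ( value & 0x0fff ) + 1 );
    chunk_header->signature_value = (uint8_t) ( ( value >> 12 ) & 0x07 );
    chunk_header->is_compressed   = (uint8_t) ( value >> 15 );
}
```

For a chunk header of 0x0000 this yields a chunk size of 1, a signature value of 0 and an is compressed flag of 0, which matches the values shown above.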

It was observed in 2 different NTFS implementations that the entire block is filled with 0-byte values.

TODO: verify behavior of Windows NTFS implementation.

Truncated LZNT1 compressed block

It is unclear if this is a corruption scenario or a data format edge case.

A compression unit (index 0) consisting of the following data runs:

reading data run: 0.
data run:
00000000: 31 08 48 d8 01                                     1.H..

value sizes                               : 1, 3
number of cluster blocks                  : 8 (size: 32768)
cluster block number                      : 120904 (120904) (offset: 0x1d848000)

reading data run: 1.
data run:
00000000: 01 08                                              ..

value sizes                               : 1, 0
number of cluster blocks                  : 8 (size: 32768)
cluster block number                      : 0 (0) (offset: 0x00000000)
        Is sparse
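
The data runs shown above can be decoded with the following minimal C sketch. It assumes the data run layout implied by the output: the lower 4 bits of the first byte contain the byte size of the number of cluster blocks value, the upper 4 bits the byte size of the cluster block number value, and both values are stored little-endian. The cluster block number is treated as a signed value relative to the previous data run and a value size of 0 indicates a sparse data run. The structure and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    uint64_t number_of_cluster_blocks;
    int64_t relative_cluster_block_number;
    int is_sparse;
}
ntfs_data_run_t;

/* Decodes a single data run
 * Returns the number of bytes read or 0 on error or end-of-runs marker
 */
size_t ntfs_data_run_decode(
        const uint8_t *data,
        size_t data_size,
        ntfs_data_run_t *data_run )
{
    size_t count_size  = 0;
    size_t offset_size = 0;
    size_t index       = 0;
    uint64_t value     = 0;

    if( ( data_size == 0 ) || ( data[ 0 ] == 0 ) )
    {
        return( 0 );
    }
    count_size  = data[ 0 ] & 0x0f;
    offset_size = data[ 0 ] >> 4;

    if( ( count_size > 8 ) || ( offset_size > 8 )
     || ( ( 1 + count_size + offset_size ) > data_size ) )
    {
        return( 0 );
    }
    /* The values are little-endian, so read from the last byte to the first */
    for( index = count_size; index > 0; index-- )
    {
        value = ( value << 8 ) | data[ index ];
    }
    data_run->number_of_cluster_blocks = value;

    value = 0;
    for( index = offset_size; index > 0; index-- )
    {
        value = ( value << 8 ) | data[ count_size + index ];
    }
    /* Sign-extend the relative cluster block number */
    if( ( offset_size > 0 ) && ( offset_size < 8 )
     && ( ( data[ count_size + offset_size ] & 0x80 ) != 0 ) )
    {
        value |= ~( (uint64_t) 0 ) << ( 8 * offset_size );
    }
    data_run->relative_cluster_block_number = (int64_t) value;
    data_run->is_sparse                     = ( offset_size == 0 );

    return( 1 + count_size + offset_size );
}
```

Decoding the bytes 31 08 48 d8 01 yields 8 cluster blocks at relative cluster block number 120904 (0x01d848), which at a cluster block size of 4096 bytes corresponds to offset 0x1d848000.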

Contains the following data:

1d848000  bd b7 50 44 46 50 00 01  00 01 00 40 e0 00 07 0b  |..PDFP.....@....|
...
1d84c000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
1d84fff0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

This relates to an LZNT1 compressed block that appears to be truncated at offset 16384 (0x00004000).

compressed data offset                    : 16384 (0x00004000)
compression flag byte                     : 0x00

Different behavior was observed in 2 different NTFS implementations:

one implementation fills the compressed block with the uncompressed data it could read and the rest with 0-byte values
another implementation seems to provide the data that was already in its buffer

TODO: verify behavior of Windows NTFS implementation.

References

How NTFS Works, by Microsoft
Master File Table, by Microsoft
NTFS Attribute Types, by Microsoft
File Attribute Constants, by Microsoft
Reparse Tags, by Microsoft
NTFS documentation, by Richard Russon
ATTRIBUTE_LIST_ENTRY structure, by Microsoft
ATTRIBUTE_RECORD_HEADER structure, by Microsoft
FILE_RECORD_SEGMENT_HEADER structure, by Microsoft
MULTI_SECTOR_HEADER structure, by Microsoft
REPARSE_DATA_BUFFER structure (ntifs.h), by Microsoft
REPARSE_DATA_BUFFER_EX structure (ntifs.h), by Microsoft
USN_RECORD_V2, by Microsoft
Zone.Identifier Stream Name, by Microsoft
the Internet Explorer URL security zone, by Microsoft
ntfs_layout.h, by Anton Altaparmakov

Assorted formats

Property list (plist)
Zlib compressed data

Property list (plist) format

The property list (plist) formats are used to store various kinds of data, for example configuration data. The format is known to be used stand-alone as well as embedded in other data formats.

Overview

Known plist formats are:

ASCII plist format
Binary plist format
XML plist format

TODO: What about other plist formats like JSON?

Value types

Type	Description
array	Collection of plist values without key
boolean	Boolean value
data	Binary data
date	Date and time value
dictionary	Collection of plist values with key
integer	Signed integer value
real	Floating-point value
string	String value

ASCII plist format

TODO: complete section

Binary plist format

A binary plist file consists of:

header
object table
offset table
trailer

Characteristics	Description
Byte order	big-endian
Date and time values	Number of seconds since Jan 1, 2001 00:00:00 UTC
Character strings	UTF-16 big-endian
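
A date and time value can be converted to a POSIX timestamp by adding the number of seconds between Jan 1, 1970 and Jan 1, 2001 00:00:00 UTC, which is 978307200. The following minimal C sketch assumes the value is stored as a 64-bit big-endian floating-point value and that the host uses IEEE 754 binary64 for double. The function and constant names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Number of seconds between Jan 1, 1970 and Jan 1, 2001 00:00:00 UTC */
#define PLIST_EPOCH_POSIX_OFFSET 978307200.0

/* Reads a 64-bit big-endian floating-point value
 * Assumes the host uses IEEE 754 binary64 for double
 */
double plist_read_float64_be( const uint8_t data[ 8 ] )
{
    uint64_t bits = 0;
    double value  = 0.0;
    int index     = 0;

    for( index = 0; index < 8; index++ )
    {
        bits = ( bits << 8 ) | data[ index ];
    }
    memcpy( &value, &bits, sizeof( double ) );

    return( value );
}

/* Converts a stored plist date and time value into seconds since Jan 1, 1970 */
double plist_date_to_posix_time( const uint8_t data[ 8 ] )
{
    return( plist_read_float64_be( data ) + PLIST_EPOCH_POSIX_OFFSET );
}
```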

Binary plist header

The binary plist header (CFBinaryPlistHeader) is 8 bytes in size and consists of:

Offset	Size	Value	Description
0	6	"bplist"	Signature
6	2		Format version
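
Checking the header amounts to comparing the signature and copying the 2 format version characters. A minimal C sketch, where the function name is hypothetical and the interpretation of the format version is left to the caller:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Checks the signature of a binary plist header and retrieves the format version
 * as a string of 2 characters with an end-of-string character
 * Returns 1 if the signature matches, 0 otherwise
 */
int bplist_header_parse(
     const uint8_t *data,
     size_t data_size,
     char format_version[ 3 ] )
{
    if( ( data_size < 8 ) || ( memcmp( data, "bplist", 6 ) != 0 ) )
    {
        return( 0 );
    }
    format_version[ 0 ] = (char) data[ 6 ];
    format_version[ 1 ] = (char) data[ 7 ];
    format_version[ 2 ] = 0;

    return( 1 );
}
```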

Format versions

Version	Description
"00"	Supported as of Tiger
"01"	Supported as of Leopard
"0x"	Supported as of Snow Leopard, where x is any character

Object table

The object table consists of:

zero or more objects

Objects are of variable size and consist of:

an object marker byte
(optional) object data

Object marker byte

Value	Identifier	Description
0x00	kCFBinaryPlistMarkerNull	Empty value (NULL)

0x08	kCFBinaryPlistMarkerFalse	Boolean False
0x09	kCFBinaryPlistMarkerTrue	Boolean True

0x0f	kCFBinaryPlistMarkerFill	Unknown (Fill byte?)
0x1#	kCFBinaryPlistMarkerInt	Integer, where 2^# is the number of bytes
0x2#	kCFBinaryPlistMarkerReal	Floating point, where 2^# is the number of bytes

0x33	kCFBinaryPlistMarkerDate	Date and time value, which is stored as a 64-bit floating-point value that contains the number of seconds since Jan 1, 2001 00:00:00 UTC

0x4#	kCFBinaryPlistMarkerData	Binary data, where # is the number of bytes. If # is 15 then the object marker byte is followed by an integer object that contains the size of the data
0x5#	kCFBinaryPlistMarkerASCIIString	ASCII string, where # is the number of characters. If # is 15 then the object marker byte is followed by an integer object that contains the number of characters in the string. The string is stored in ASCII (with codepage?) without an end-of-string marker
0x6#	kCFBinaryPlistMarkerUnicode16String	Unicode string, where # is the number of characters. If # is 15 then the object marker byte is followed by an integer object that contains the number of characters in the string. The string is stored in UTF-16 big-endian without an end-of-string marker
0x7#		Unused
0x8#	kCFBinaryPlistMarkerUID	UID, where # + 1 is the number of bytes
0x9#		Unused
0xa#	kCFBinaryPlistMarkerArray	Array of objects, where # is the number of elements. If # is 15 then the object marker byte is followed by an integer object that contains the number of elements in the array
0xb#		Unused
0xc#	kCFBinaryPlistMarkerSet	Set of objects, where # is the number of elements. If # is 15 then the object marker byte is followed by an integer object that contains the number of elements in the set
0xd#	kCFBinaryPlistMarkerDict	Dictionary of key value pairs, where # is the number of key value pairs. If # is 15 then the object marker byte is followed by an integer object that contains the number of key value pairs in the dictionary
0xe#		Unused
0xf#		Unused
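
The object marker byte can be split into an object type (upper 4 bits) and a type-specific value (lower 4 bits, the # in the table above). A minimal C sketch of this split, where all function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Retrieves the object type from the upper 4 bits of the object marker byte */
uint8_t bplist_marker_get_type( uint8_t marker )
{
    return( (uint8_t) ( marker >> 4 ) );
}

/* Retrieves the type-specific value from the lower 4 bits of the object marker byte */
uint8_t bplist_marker_get_value( uint8_t marker )
{
    return( (uint8_t) ( marker & 0x0f ) );
}

/* Returns 1 if the size or number of elements is stored after the object marker byte,
 * which applies to data, strings, arrays, sets and dictionaries with a value of 15
 */
int bplist_marker_has_extended_size( uint8_t marker )
{
    switch( marker >> 4 )
    {
        case 0x4:
        case 0x5:
        case 0x6:
        case 0xa:
        case 0xc:
        case 0xd:
            return( ( marker & 0x0f ) == 0x0f );

        default:
            return( 0 );
    }
}

/* Retrieves the byte size of an integer (0x1#) or floating point (0x2#) object, which is 2^# */
size_t bplist_marker_get_number_size( uint8_t marker )
{
    return( (size_t) 1 << ( marker & 0x0f ) );
}
```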

Array object

The array object consists of:

array object marker with number of elements
array of object references that identify the element objects
the element object data

The byte size of the object reference is defined in the trailer. An object reference of 1 will refer to the first object in the (object) offset table.

Set object

The set object consists of:

set object marker with number of elements
array of object references that identify the element objects
the element object data

The byte size of the object reference is defined in the trailer. An object reference of 1 will refer to the first object in the (object) offset table.

Dictionary object

The dictionary object consists of:

dictionary object marker with number of key and value pairs
array of key references that identify the key objects
array of object references that identify the value objects
the key/value object data

The byte size of the key and object reference is defined in the trailer. A key and object reference of 1 will refer to the first object in the (object) offset table.

(Object) offset table

The offset table consists of an array of offsets. The trailer defines:

The location of the offset table
The offset byte size
The number of offsets in the table

The offset values are relative to the start of the file.
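
Since the offset byte size is variable, reading an entry of the offset table requires a variable-size big-endian read. A minimal C sketch, where the function names are hypothetical and entry_index is the zero-based position of the entry within the array:

```c
#include <stddef.h>
#include <stdint.h>

/* Reads a big-endian unsigned integer of value_size bytes (1 to 8) */
uint64_t bplist_read_uint_be(
          const uint8_t *data,
          size_t value_size )
{
    uint64_t value = 0;
    size_t index   = 0;

    for( index = 0; index < value_size; index++ )
    {
        value = ( value << 8 ) | data[ index ];
    }
    return( value );
}

/* Retrieves the offset stored in a specific entry of the offset table
 * Returns 1 if successful or 0 if the entry is out of bounds
 */
int bplist_offset_table_get_offset(
     const uint8_t *file_data,
     size_t file_size,
     uint64_t offset_table_offset,
     uint8_t offset_size,
     uint64_t number_of_offsets,
     uint64_t entry_index,
     uint64_t *offset )
{
    if( ( offset_size == 0 ) || ( offset_size > 8 )
     || ( entry_index >= number_of_offsets )
     || ( offset_table_offset > file_size ) )
    {
        return( 0 );
    }
    /* Ensure the entry is stored within the file data */
    if( entry_index >= ( ( file_size - offset_table_offset ) / offset_size ) )
    {
        return( 0 );
    }
    *offset = bplist_read_uint_be(
               &( file_data[ offset_table_offset + ( entry_index * offset_size ) ] ),
               offset_size );

    return( 1 );
}
```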

Binary plist trailer

The binary plist trailer (CFBinaryPlistTrailer) is 32 bytes in size and consists of:

Offset	Size	Value	Description
0	5 x 1	0	Unknown (0-byte values)
5	1	0	Unknown (Sort version)
6	1		Offset byte size
7	1		Key and object reference byte size
8	8		Number of objects
16	8		Root (or top-level) object
24	8		Offset table offset, where the offset is relative to the start of the file
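
The trailer can be read as follows. This is a minimal C sketch that assumes the trailer occupies the last 32 bytes of the file and that the file is at least as large as the header and the trailer combined. The structure and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    uint8_t sort_version;
    uint8_t offset_size;
    uint8_t object_reference_size;
    uint64_t number_of_objects;
    uint64_t root_object;
    uint64_t offset_table_offset;
}
bplist_trailer_t;

/* Reads a 64-bit big-endian unsigned integer */
static uint64_t bplist_read_uint64_be( const uint8_t *data )
{
    uint64_t value = 0;
    int index      = 0;

    for( index = 0; index < 8; index++ )
    {
        value = ( value << 8 ) | data[ index ];
    }
    return( value );
}

/* Reads the binary plist trailer from the last 32 bytes of the file data
 * Returns 1 if successful or 0 if the file data is too small
 */
int bplist_trailer_parse(
     const uint8_t *file_data,
     size_t file_size,
     bplist_trailer_t *trailer )
{
    const uint8_t *data = NULL;

    if( file_size < ( 8 + 32 ) )
    {
        return( 0 );
    }
    data = &( file_data[ file_size - 32 ] );

    trailer->sort_version          = data[ 5 ];
    trailer->offset_size           = data[ 6 ];
    trailer->object_reference_size = data[ 7 ];
    trailer->number_of_objects     = bplist_read_uint64_be( &( data[ 8 ] ) );
    trailer->root_object           = bplist_read_uint64_be( &( data[ 16 ] ) );
    trailer->offset_table_offset   = bplist_read_uint64_be( &( data[ 24 ] ) );

    return( 1 );
}
```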

XML plist format

An XML plist file consists of:

optional XML declaration
optional Document Type Definition (DTD)
plist root XML element
key-value pair XML elements

For example:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist SYSTEM "file://localhost/System/Library/DTDs/PropertyList.dtd">
<plist version="1.0">
...
</plist>

Zlib compressed data

Zlib compression is commonly used in file formats. The zlib compressed data format, as defined in RFC1950, allows for multiple techniques but only the Deflate compression method, a variation of LZ77, is used.

Overview

Zlib compressed data consists of:

data header
compressed data
Adler-32 checksum of the uncompressed data

Characteristics

Characteristics	Description
Byte order	big-endian

Data header

The data header is 2 or 6 bytes in size and consists of:

Offset	Size	Description
The bit values are stored as 8-bit values
0.0	4 bits	Compression method
0.4	4 bits	Compression information
Flags
1.0	5 bits	Check bits
1.5	1 bit	Preset dictionary flag
1.6	2 bits	Compression level. The compression level is used mainly for re-compression
If the preset dictionary flag is set
2	4	Preset dictionary identifier, which contains an Adler-32 used to identify the preset dictionary
Common
...	...	Compressed data
...	4	Checksum, which contains an Adler-32 of the uncompressed data

The check bits value must be such that when the first 2 bytes are represented as a 16-bit unsigned integer in big-endian byte order the value is a multiple of 31, such that:

((first * 256) + second) % 31 = 0
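
The following minimal C sketch applies the check bits test and splits the first 2 bytes of the data header into their values. The window size calculation applies to compression method 8 (Deflate) only. The structure and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    uint8_t compression_method;
    uint8_t compression_information;
    uint8_t has_preset_dictionary;
    uint8_t compression_level;
    uint32_t window_size; /* only set for compression method 8, 0 otherwise */
}
zlib_data_header_t;

/* Reads the first 2 bytes of the zlib data header
 * Returns 1 if successful or 0 if the header is invalid
 */
int zlib_data_header_parse(
     const uint8_t *data,
     size_t data_size,
     zlib_data_header_t *header )
{
    if( data_size < 2 )
    {
        return( 0 );
    }
    /* The first 2 bytes as a 16-bit big-endian value must be a multiple of 31 */
    if( ( ( ( (uint32_t) data[ 0 ] << 8 ) | data[ 1 ] ) % 31 ) != 0 )
    {
        return( 0 );
    }
    header->compression_method      = data[ 0 ] & 0x0f;
    header->compression_information = data[ 0 ] >> 4;
    header->has_preset_dictionary   = ( data[ 1 ] >> 5 ) & 0x01;
    header->compression_level       = data[ 1 ] >> 6;
    header->window_size             = 0;

    if( header->compression_method == 8 )
    {
        if( header->compression_information > 7 )
        {
            return( 0 );
        }
        header->window_size = (uint32_t) 1 << ( header->compression_information + 8 );
    }
    return( 1 );
}
```

For the commonly seen header bytes 78 9c this yields compression method 8, compression information 7 (a window size of 32768 bytes), no preset dictionary and compression level 2.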

Compression method

Value	Identifier	Description
8		Deflate (RFC1951), with a maximum window size of 32 KiB

15		Reserved for additional header data

Note that RFC1950 only defines 8 as a valid compression method.

Compression information

The value of the compression information is dependent on the compression method.

Compression information - compression method 8 (Deflate)

For compression method 8 (Deflate) the compression information contains the base-2 logarithm of the LZ77 window size minus 8.

Offset	Size	Value	Description
0.0	4 bits		Window size, which is stored as the base-2 logarithm of the window size minus 8, with a maximum value of 7 (32 KiB)

To determine the corresponding window size:

1 << (compression information + 8)

E.g. a compression information value of 7 indicates a window size of 1 << (7 + 8) = 32768 bytes. Values larger than 7 are not allowed according to RFC1950 and thus the maximum window size is 32768 bytes.

Compression level

Value	Identifier	Description
0		Fastest
1		Fast
2		Default
3		Slowest, maximum compression

Compressed data

Deflate compressed data

The deflate compressed data consists of one or more deflate compressed blocks. Each block consists of:

block header
block data

Note that a block can reference uncompressed data that is stored in a previous block.

Block header

The block header is 3 bits in size and consists of:

Offset	Size	Value	Description
0	1 bit		Last block (in stream) marker, where 1 represents the last block and 0 otherwise
0.1	2 bits		Block type

Block types

Value	Identifier	Description
0		Uncompressed (or stored) block
1		Fixed Huffman compressed block
2		Dynamic Huffman compressed block
3		Reserved (not used)

Uncompressed block data

The uncompressed block data is of variable size and consists of:

Offset	Size	Description
0.3	5 bits	Empty values (not used)
1	2	Uncompressed data size
3	2	Copy of uncompressed data size, which contains a 1s complement of the uncompressed data size
5	...	Uncompressed data

The uncompressed data size can range between 0 and 65535 bytes.
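
The copy of the uncompressed data size allows for a consistency check. A minimal C sketch, assuming that data points to the first byte after the byte that contains the block header and that both size values are stored little-endian, as RFC1951 specifies for multi-byte values. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads and checks the size of an uncompressed block
 * Returns 1 if successful or 0 if the sizes are inconsistent or the data is too small
 */
int deflate_uncompressed_block_get_size(
     const uint8_t *data,
     size_t data_size,
     uint16_t *uncompressed_data_size )
{
    uint16_t size      = 0;
    uint16_t size_copy = 0;

    if( data_size < 4 )
    {
        return( 0 );
    }
    size      = (uint16_t) ( data[ 0 ] | ( data[ 1 ] << 8 ) );
    size_copy = (uint16_t) ( data[ 2 ] | ( data[ 3 ] << 8 ) );

    /* The copy is the 1s complement, hence every bit must differ */
    if( ( size ^ size_copy ) != 0xffff )
    {
        return( 0 );
    }
    if( (size_t) size > ( data_size - 4 ) )
    {
        return( 0 );
    }
    *uncompressed_data_size = size;

    return( 1 );
}
```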

Huffman compressed block data

The Huffman compressed block data is of variable size and consists of:

Optional dynamic Huffman table
Encoded bit-stream
End-of-stream (or end-of-block or end-of-data) marker

Dynamic Huffman table

The dynamic Huffman table consists of:

Offset	Size	Description
0.3	5 bits	Number of literal codes, which is value + 257. The number of literal codes must not exceed 286
1.0	5 bits	Number of distance codes, which is value + 1. The number of distance codes must not exceed 30
1.5	4 bits	The number of Huffman codes for the code sizes, which is value + 4
2.1	...	The code sizes
...	...	Huffman encoded stream of the Huffman codes for the literals
...	...	Huffman encoded stream of the Huffman codes for the distances

A single code size value is 3 bits in size. A value of 0 means the code size is not used in the Huffman encoding of the literal and distance codes.

The code size values are stored in the following sequence:

16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15

The first value applies to a code size of 16, the second to 17, etc. Code sizes that are not stored default to 0.
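
The reordering can be sketched in C as follows. The sketch assumes the 3-bit values have already been read from the bit-stream into an array in the order they were stored. The function and array names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The sequence in which the code size values are stored */
static const uint8_t deflate_code_size_sequence[ 19 ] = {
    16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15 };

/* Places the stored code size values at the index of the code size they apply to
 * Values that are not stored default to 0
 * Returns 1 if successful or 0 if the number of stored values is out of bounds
 */
int deflate_order_code_sizes(
     const uint8_t *stored_values,
     size_t number_of_stored_values,
     uint8_t code_sizes[ 19 ] )
{
    size_t index = 0;

    if( ( number_of_stored_values < 4 ) || ( number_of_stored_values > 19 ) )
    {
        return( 0 );
    }
    memset( code_sizes, 0, 19 );

    for( index = 0; index < number_of_stored_values; index++ )
    {
        code_sizes[ deflate_code_size_sequence[ index ] ] = stored_values[ index ] & 0x07;
    }
    return( 1 );
}
```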

The code size values are used to construct the code sizes Huffman table. This must be a complete Huffman table which is used to decode the literal and distance codes. The corresponding code size Huffman encoding is defined as:

Value	Identifier	Description
0 - 15		Represents a code size of 0 - 15
16		Copy the previous code size 3 - 6 times. The next 2 bits indicate repeat length (0 = 3, ... , 3 = 6), e.g. codes 8, 16 (+2 bits 11), 16 (+2 bits 10) will expand to 12 code lengths of 8 (1 + 6 + 5)
17		Repeat a code length of 0 for 3 - 10 times (3 bits of length)
18		Repeat a code length of 0 for 11 - 138 times (7 bits of length)

Both the literal and distance Huffman codes are stored Huffman encoded using the code sizes Huffman table. Code sizes that are not stored default to 0. The code size for the literal code 256 (end-of-block) should be set and thus not 0.
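
The expansion of the values 16, 17 and 18 can be sketched in C as follows. To keep the example small it assumes the symbols have already been Huffman decoded and the values of their additional bits have been read into a parallel array. This interface and the function name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Expands decoded code size symbols (0 - 18) into code sizes
 * extra_values contains the value of the additional bits of symbols 16, 17 and 18
 * and is ignored for other symbols
 * Returns the number of code sizes written or -1 on error
 */
int deflate_expand_code_sizes(
     const uint8_t *symbols,
     const uint8_t *extra_values,
     size_t number_of_symbols,
     uint8_t *code_sizes,
     size_t maximum_number_of_code_sizes )
{
    size_t code_size_index = 0;
    size_t repeat_count    = 0;
    size_t symbol_index    = 0;
    uint8_t code_size      = 0;

    for( symbol_index = 0; symbol_index < number_of_symbols; symbol_index++ )
    {
        if( symbols[ symbol_index ] <= 15 )
        {
            code_size    = symbols[ symbol_index ];
            repeat_count = 1;
        }
        else if( symbols[ symbol_index ] == 16 )
        {
            /* Copy the previous code size, which therefore must exist */
            if( code_size_index == 0 )
            {
                return( -1 );
            }
            code_size    = code_sizes[ code_size_index - 1 ];
            repeat_count = 3 + ( extra_values[ symbol_index ] & 0x03 );
        }
        else if( symbols[ symbol_index ] == 17 )
        {
            code_size    = 0;
            repeat_count = 3 + ( extra_values[ symbol_index ] & 0x07 );
        }
        else if( symbols[ symbol_index ] == 18 )
        {
            code_size    = 0;
            repeat_count = 11 + ( extra_values[ symbol_index ] & 0x7f );
        }
        else
        {
            return( -1 );
        }
        if( repeat_count > ( maximum_number_of_code_sizes - code_size_index ) )
        {
            return( -1 );
        }
        while( repeat_count-- > 0 )
        {
            code_sizes[ code_size_index++ ] = code_size;
        }
    }
    return( (int) code_size_index );
}
```

With the symbols 8, 16 (additional bits 11) and 16 (additional bits 10) this produces 12 code sizes of 8, in line with the example in the table above.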

Encoded bit-stream

The encoded bit-stream is stored in 8-bit integers, where bit values are stored back-to-front, so that the 3 least-significant bits (LSB) of the first byte represent a 3-bit value at the start of the bit-stream. Note that the LSB of the 3-bit value is the LSB of the byte value.
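
A minimal C sketch of a bit-stream reader that follows this bit order is given below. It reads plain values such as the block header. Huffman codes need additional handling since RFC1951 packs them starting with the most-significant bit of the code. The structure and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    const uint8_t *data;
    size_t data_size;
    size_t byte_offset;
    uint32_t bit_buffer;
    uint8_t number_of_bits; /* number of bits available in the bit buffer */
}
deflate_bit_stream_t;

/* Reads a value of number_of_bits (0 to 16) bits from the bit-stream
 * Returns 1 if successful or 0 if out of data or bounds
 */
int deflate_bit_stream_get_value(
     deflate_bit_stream_t *bit_stream,
     uint8_t number_of_bits,
     uint32_t *value )
{
    if( number_of_bits > 16 )
    {
        return( 0 );
    }
    while( bit_stream->number_of_bits < number_of_bits )
    {
        if( bit_stream->byte_offset >= bit_stream->data_size )
        {
            return( 0 );
        }
        /* Add the next byte in front of the bits that remain in the buffer */
        bit_stream->bit_buffer |= (uint32_t) bit_stream->data[ bit_stream->byte_offset++ ]
                                  << bit_stream->number_of_bits;
        bit_stream->number_of_bits += 8;
    }
    *value = bit_stream->bit_buffer & ( ( (uint32_t) 1 << number_of_bits ) - 1 );

    bit_stream->bit_buffer    >>= number_of_bits;
    bit_stream->number_of_bits -= number_of_bits;

    return( 1 );
}
```

For example, for a first byte of 0x05 (binary 00000101) reading 1 bit yields a last block marker of 1 and reading the next 2 bits yields a block type of 2.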

Deflate uses a Huffman tree of 288 Huffman codes (or symbols) where the values:

0 - 255: represent the literal byte values: 0 - 255
256: represents the end of (compressed) stream (or block)
257 - 285 (combined with extra-bits): represent a (size, offset) tuple (or match length) of 3 - 258 bytes
286, 287: are not used (reserved) and their use is considered illegal although the values are still part of the tree

This document refers to this Huffman tree as the literals Huffman tree.

The bits in the encoded bit-stream correspond to values in the literals Huffman tree. If a symbol is found that represents a compression size and offset tuple (or match length code) the bits following the literals symbol, and its additional bits if any, contain a distance (Huffman) code. The match length codes might require additional (or extra) bits to store the length (or size).

The distances Huffman tree contains space for 32 symbols. See section Distance codes. The distance code might require additional (or extra) bits to store the distance.

Literal codes

The literal codes consist of:

Value	Identifier	Description
0x00 – 0xff		literal byte values
0x100		end-of-block marker
0 additional bits
0x101		Size of 3
0x102		Size of 4
0x103		Size of 5
0x104		Size of 6
0x105		Size of 7
0x106		Size of 8
0x107		Size of 9
0x108		Size of 10
1 additional bit
0x109		Size of 11 to 12
0x10a		Size of 13 to 14
0x10b		Size of 15 to 16
0x10c		Size of 17 to 18
2 additional bits
0x10d		Size of 19 to 22
0x10e		Size of 23 to 26
0x10f		Size of 27 to 30
0x110		Size of 31 to 34
3 additional bits
0x111		Size of 35 to 42
0x112		Size of 43 to 50
0x113		Size of 51 to 58
0x114		Size of 59 to 66
4 additional bits
0x115		Size of 67 to 82
0x116		Size of 83 to 98
0x117		Size of 99 to 114
0x118		Size of 115 to 130
5 additional bits
0x119		Size of 131 to 162
0x11a		Size of 163 to 194
0x11b		Size of 195 to 226
0x11c		Size of 227 to 257
0 additional bits
0x11d		Size of 258
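
The table above can be represented as a base size and a number of additional bits per code. A minimal C sketch, where the array and function names are hypothetical and the value of the additional bits is assumed to have been read already:

```c
#include <stdint.h>

/* Base size per literal code 0x101 - 0x11d */
static const uint16_t deflate_size_base[ 29 ] = {
    3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
    35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258 };

/* Number of additional bits per literal code 0x101 - 0x11d */
static const uint8_t deflate_size_additional_bits[ 29 ] = {
    0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
    3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0 };

/* Determines the size for a literal code and the value of its additional bits
 * Returns 1 if successful or 0 if the code or additional bits value is out of bounds
 */
int deflate_literal_code_get_size(
     uint16_t code,
     uint16_t additional_bits_value,
     uint16_t *size )
{
    uint8_t number_of_bits = 0;

    if( ( code < 0x101 ) || ( code > 0x11d ) )
    {
        return( 0 );
    }
    number_of_bits = deflate_size_additional_bits[ code - 0x101 ];

    if( additional_bits_value >= ( 1u << number_of_bits ) )
    {
        return( 0 );
    }
    *size = (uint16_t) ( deflate_size_base[ code - 0x101 ] + additional_bits_value );

    return( 1 );
}
```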

Distance codes

The distance codes consist of:

Value	Identifier	Description
0 additional bits
0		distance of 1
1		distance of 2
2		distance of 3
3		distance of 4
1 additional bit
4		distance of 5 - 6
5		distance of 7 - 8
2 additional bits
6		distance of 9 - 12
7		distance of 13 - 16
3 additional bits
8		distance of 17 - 24
9		distance of 25 - 32
4 additional bits
10		distance of 33 - 48
11		distance of 49 - 64
5 additional bits
12		distance of 65 - 96
13		distance of 97 - 128
6 additional bits
14		distance of 129 - 192
15		distance of 193 - 256
7 additional bits
16		distance of 257 - 384
17		distance of 385 - 512
8 additional bits
18		distance of 513 - 768
19		distance of 769 - 1024
9 additional bits
20		distance of 1025 - 1536
21		distance of 1537 - 2048
10 additional bits
22		distance of 2049 - 3072
23		distance of 3073 - 4096
11 additional bits
24		distance of 4097 - 6144
25		distance of 6145 - 8192
12 additional bits
26		distance of 8193 - 12288
27		distance of 12289 - 16384
13 additional bits
28		distance of 16385 - 24576
29		distance of 24577 - 32768
other
30-31		not used, reserved and illegal but still part of the tree
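
The distance codes follow a regular pattern: as of code 4 every pair of codes uses one more additional bit than the previous pair. The following minimal C sketch derives the base distance arithmetically instead of using a lookup table, which is a choice made for this example only. The function name is hypothetical and the value of the additional bits is assumed to have been read already.

```c
#include <stdint.h>

/* Determines the distance for a distance code and the value of its additional bits
 * Returns 1 if successful or 0 if the code or additional bits value is out of bounds
 */
int deflate_distance_code_get_distance(
     uint8_t code,
     uint16_t additional_bits_value,
     uint32_t *distance )
{
    uint32_t base_distance = 0;
    uint8_t number_of_bits = 0;

    /* The codes 30 and 31 are reserved */
    if( code > 29 )
    {
        return( 0 );
    }
    if( code < 4 )
    {
        base_distance = (uint32_t) code + 1;
    }
    else
    {
        number_of_bits = (uint8_t) ( ( code >> 1 ) - 1 );
        base_distance  = ( ( (uint32_t) 2 + ( code & 1 ) ) << number_of_bits ) + 1;
    }
    if( additional_bits_value >= ( 1u << number_of_bits ) )
    {
        return( 0 );
    }
    *distance = base_distance + additional_bits_value;

    return( 1 );
}
```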

TODO: complete this section

Additional bits

The additional bits are stored in big-endian (MSB first) and indicate the index into the corresponding array of size values (or base size + additional size).

Value	Identifier	Description
0 additional bits
0		Offset of 1
1		Offset of 2
2		Offset of 3
3		Offset of 4
1 additional bit

TODO: complete this section

Decompression

The decompression in pseudo code:

do
{
    read block_header from input stream

    if( block_header.type == UNCOMPRESSED )
    {
        align with next byte
        read and check block_header.size and block_header.size_copy
        copy block_header.size bytes of data to the output stream
    }
    else
    {
        if( block_header.type == HUFFMAN_FIXED )
        {
            use the fixed Huffman trees
        }
        else if( block_header.type == HUFFMAN_DYNAMIC )
        {
            read the dynamic Huffman trees (see section Dynamic Huffman table)
        }
        loop (until end of block code recognized)
        {
            decode literal/length value from input stream

            if( value < 256 )
            {
                copy value (literal byte) to output stream
            }
            else if( value == 256 )
            {
                break from loop (end of block)
            }
            else (value = 257..285)
            {
                decode distance from input stream

                move backwards distance bytes in the output
                stream, and copy length bytes from this
                position to the output stream.
            }
        }
    }
}
while( block_header.last_block_flag == 0 );

Adler-32 checksum

Zlib provides a highly optimized version of the algorithm provided below.

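#include <stddef.h>
#include <stdint.h>

/* The largest prime number smaller than 65536 */
#define ADLER32_MODULUS    0xfff1

/* The largest number of bytes that can be summed without overflowing a 32-bit value */
#define ADLER32_BLOCK_SIZE 0x15b0

/* Calculates the Adler-32 of a buffer
 * Use a previous_key of 1 to calculate a new Adler-32
 */
uint32_t adler32(
          const uint8_t *buffer,
          size_t buffer_size,
          uint32_t previous_key )
{
    size_t block_size      = 0;
    size_t buffer_iterator = 0;
    uint32_t lower_word    = previous_key & 0xffff;
    uint32_t upper_word    = ( previous_key >> 16 ) & 0xffff;

    while( buffer_iterator < buffer_size )
    {
        block_size = buffer_size - buffer_iterator;

        if( block_size > ADLER32_BLOCK_SIZE )
        {
            block_size = ADLER32_BLOCK_SIZE;
        }
        while( block_size-- > 0 )
        {
            lower_word += buffer[ buffer_iterator++ ];
            upper_word += lower_word;
        }
        /* Reduce both sums after every block, including the last one */
        lower_word %= ADLER32_MODULUS;
        upper_word %= ADLER32_MODULUS;
    }
    return( ( upper_word << 16 ) | lower_word );
}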
