
VMware Virtual Disk (VMDK) format

The VMware Virtual Disk (VMDK) format is used by VMware virtualization products as one of their image formats.

Overview

A VMDK disk image can consist of multiple files, such as:

  • descriptor file
  • extent data files
  • raw extent data file
  • VMDK sparse extent data file
  • COWD sparse extent data file

Characteristics

| Characteristics | Description |
| --- | --- |
| Byte order | little-endian |
| Date and time values | |
| Character strings | narrow character (Single Byte Character (SBC) or Multi Byte Character (MBC)) stored using a codepage defined in the descriptor file |

The number of bytes per sector is 512.

Disk types

There are multiple types of VMDK images, namely:

The 2GbMaxExtentFlat (or twoGbMaxExtentFlat) disk image, which consists of:

  • a descriptor file (<name>.vmdk)
  • raw data extent files (<name>-f###.vmdk), where ### is a three-digit decimal value starting with 001.

The 2GbMaxExtentSparse (or twoGbMaxExtentSparse) disk image, which consists of:

  • a descriptor file (<name>.vmdk)
  • VMDK sparse data extent files (<name>-s###.vmdk), where ### is a three-digit decimal value starting with 001.

The monolithicFlat disk image, which consists of:

  • a descriptor file (<name>.vmdk)
  • raw data extent file (<name>-f001.vmdk)

The monolithicSparse disk image, which consists of:

  • a VMDK sparse data extent file (<name>.vmdk), which also contains the descriptor file data.

The vmfs disk image, which consists of:

  • a descriptor file (<name>.vmdk)
  • raw data extent file (<name>-flat.vmdk)

The vmfsSparse differential disk image, which consists of:

  • a descriptor file (<name>.vmdk)
  • COWD sparse data extent files (<name>-delta.vmdk)

TODO: describe more disk types

A delta link is similar to a differential image: the image contains the changes (or delta) compared to a parent image. According to the Virtual Disk Format 5.0 specification, one delta image can chain to another delta image.

TODO: Name <name>-delta.vmdk

Descriptor file

The descriptor file is a case-insensitive text-based file that contains the following information:

  • optional comment and empty lines
  • header
  • extent descriptions
  • optional change tracking file section
  • disk data base (DDB)

Note that the descriptor file can contain leading and trailing whitespace, as well as leading comment lines (starting with #) and empty lines. Lines are separated by a line feed character (0x0a).

The header of a descriptor file looks similar to the data below.

# Disk DescriptorFile
version=1
CID=12345678
parentCID=ffffffff
createType="twoGbMaxExtentSparse"

The header consists of the following values:

| Value | Description |
| --- | --- |
| "# Disk DescriptorFile" | Section header (or file signature) |
| version | Format version |
| encoding | Encoding |
| CID | Content identifier, which contains a random 32-bit value updated the first time the content of the virtual disk is modified after the virtual disk is opened |
| parentCID | The content identifier of the parent, which contains a 32-bit value identifying the parent content, where a value of 'ffffffff' (-1) represents no parent content |
| isNativeSnapshot | TODO: add description. A value of "no" has been observed in a VMware Player 9 descriptor file |
| createType | Disk type |
| parentFileNameHint | Contains the path to the parent image, which is only present if the image is a differential image (delta link) |
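The header values are plain key=value lines, so they can be collected with a few lines of code. The following Python sketch is illustrative only: the function names parse_descriptor_header and has_parent_content are hypothetical, and it assumes the header ends at the first line that starts with an extent access mode.

```python
def parse_descriptor_header(text: str) -> dict:
    """Collect the key=value pairs of a VMDK descriptor file header.

    Hypothetical helper: it assumes the header ends at the first extent
    descriptor line, i.e. a line starting with an extent access mode.
    """
    header = {}
    for raw_line in text.split("\n"):
        line = raw_line.strip()  # leading and trailing whitespace is allowed
        if not line or line.startswith("#"):
            continue  # empty lines, comments and the section header
        if line.split(None, 1)[0].upper() in ("RW", "RDONLY", "NOACCESS"):
            break  # first extent descriptor: end of the header
        key, separator, value = line.partition("=")
        if separator:
            # The file is case-insensitive, so keys are normalised to lower case.
            header[key.strip().lower()] = value.strip().strip('"')
    return header


def has_parent_content(header: dict) -> bool:
    # A parentCID of ffffffff (-1) represents no parent content.
    return int(header.get("parentcid", "ffffffff"), 16) != 0xFFFFFFFF
```

Applied to the example header above, this yields createtype "twoGbMaxExtentSparse" and has_parent_content() returns False.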

TODO: confirm if a content identifier of ‘fffffffe’ (-2) represents that the long content identifier should be used

Format versions

| Value | Description |
| --- | --- |
| 1 | TODO: add description |
| 2 | TODO: add description |
| 3 | TODO: add description |

Encodings

Note that it is currently unknown which encodings are supported. It is assumed that at least the Windows codepages are supported and that the default is UTF-8.

| Value | Description |
| --- | --- |
| Big5 | Big5 assumed to be equivalent to Windows codepage 950 |
| GBK | GBK assumed to be equivalent to Windows codepage 936, which was observed in VMware Workstation for Windows, Chinese edition |
| Shift_JIS | Shift_JIS assumed to be equivalent to Windows codepage 932, which was observed in VMware Workstation for Windows, Japanese edition |
| UTF-8 | UTF-8 |
| windows-949-2000 | Windows codepage 949, 2000 version |
| windows-1252 | Windows codepage 1252, which was observed in a VMware Player 9 descriptor file |
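A reader has to map the encoding value to a decoder. The Python sketch below follows the codepage equivalences assumed in the table; the mapping name and decode_descriptor are hypothetical, and mapping windows-949-2000 to Python's cp949 codec is an approximation, not a confirmed equivalence.

```python
# Hypothetical mapping from descriptor "encoding" values to Python codec names,
# following the (assumed) Windows codepage equivalences in the table above.
VMDK_ENCODING_TO_CODEC = {
    "big5": "cp950",
    "gbk": "cp936",
    "shift_jis": "cp932",
    "utf-8": "utf-8",
    "windows-949-2000": "cp949",  # approximation of the 2000 version
    "windows-1252": "cp1252",
}


def decode_descriptor(data: bytes, encoding: str = "UTF-8") -> str:
    """Decode descriptor bytes; UTF-8 is the assumed default."""
    codec = VMDK_ENCODING_TO_CODEC.get(encoding.lower())
    if codec is None:
        raise ValueError(f"unsupported descriptor encoding: {encoding!r}")
    return data.decode(codec)
```

Since the encoding value is stored inside the descriptor itself, a reader would first scan the (assumed ASCII) header lines for encoding= and then decode the whole file with the selected codec.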

Disk types

| Value | Description |
| --- | --- |
| 2GbMaxExtentFlat, twoGbMaxExtentFlat | The disk is split into fixed-size extents of maximum 2 GB, each stored in a raw extent data file |
| 2GbMaxExtentSparse, twoGbMaxExtentSparse | The disk is split into sparse (dynamic-size) extents of maximum 2 GB, each stored in a VMDK sparse extent data file |
| custom | TODO: add description. Descriptor file with arbitrary extents, used to mount v2i-format |
| fullDevice | The disk uses a full physical disk device |
| monolithicFlat | The disk is a single raw extent data file |
| monolithicSparse | The disk is a single VMDK sparse extent data file |
| partitionedDevice | The disk uses a full physical disk device, using access per partition |
| streamOptimized | The disk is a single compressed VMDK sparse extent data file |
| vmfs | The disk is a single raw extent data file, which is similar to "monolithicFlat" |
| vmfsEagerZeroedThick | The disk is a single raw extent data file |
| vmfsPreallocated | The disk is a single raw extent data file |
| vmfsRaw | The disk uses a full physical disk device |
| vmfsRDM, vmfsRawDeviceMap | The disk uses a full physical disk device, which is also referred to as Raw Device Map (RDM) |
| vmfsRDMP, vmfsPassthroughRawDeviceMap | The disk uses a full physical disk device, which is similar to the Raw Device Map (RDM), but sends SCSI commands to the underlying hardware |
| vmfsSparse | The disk is split into COWD sparse (dynamic-size) extents |
| vmfsThin | The disk is split into COWD sparse (dynamic-size) extents |

Extent descriptions

The extent descriptions of a descriptor file look similar to the data below.

# Extent description
RW 4192256 SPARSE "test-s001.vmdk"
# Extent description
RW 1048576 FLAT "test-f001.vmdk" 0

The extent descriptions consist of the following values:

| Value | Description |
| --- | --- |
| "# Extent description" | Section header |
| | Extent descriptors |

Extent descriptor

The extent descriptor consists of the following values:

| Value | Description |
| --- | --- |
| 1st | Access mode |
| 2nd | The number of sectors |
| 3rd | Extent type |
| If extent type is not ZERO | |
| 4th | Path of the VMDK extent data file, relative to the location of the VMDK descriptor file |
| Optional | |
| 5th | The extent start sector |
| Seen in VMware Player 9 in combination with a physical device extent on Windows | |
| 6th and 7th | "partitionUUID" followed by a device identifier |

The extent offset (the 5th value, the extent start sector) is specified only for flat extents and corresponds to the offset in the file or device where the extent data is located. For device-backed virtual disks (physical or raw disks) the extent offset can be non-zero. For raw extent data files the extent offset should be zero.
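An extent descriptor is a whitespace-separated line with a double-quoted path, so a regular expression is enough to split it. The Python sketch below is a hypothetical parser (ExtentDescriptor and parse_extent_descriptor are illustrative names); it deliberately avoids shell-style splitting so that backslashes in Windows device paths are kept as-is, and it ignores any trailing values such as "partitionUUID".

```python
import re
from dataclasses import dataclass
from typing import Optional

# Access mode, number of sectors, extent type, then an optional quoted path
# and an optional start sector; any trailing values are ignored.
EXTENT_RE = re.compile(r'^\s*(\w+)\s+(\d+)\s+(\w+)(?:\s+"([^"]*)"(?:\s+(\d+))?)?')


@dataclass
class ExtentDescriptor:
    access_mode: str            # NOACCESS, RDONLY or RW
    number_of_sectors: int
    extent_type: str            # FLAT, SPARSE, ZERO, VMFS, ...
    path: Optional[str] = None  # not present for ZERO extents
    start_sector: int = 0       # optional 5th value (extent offset)


def parse_extent_descriptor(line: str) -> ExtentDescriptor:
    match = EXTENT_RE.match(line)
    if match is None:
        raise ValueError(f"malformed extent descriptor: {line!r}")
    access_mode, sectors, extent_type, path, start = match.groups()
    return ExtentDescriptor(
        access_mode=access_mode.upper(),
        number_of_sectors=int(sectors),
        extent_type=extent_type.upper(),
        path=path,
        start_sector=int(start) if start else 0,
    )
```

For the line RW 1048576 FLAT "test-f001.vmdk" 0 this yields a read-write FLAT extent of 1048576 sectors stored in test-f001.vmdk at start sector 0.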

Extent access mode

The extent access mode consists of the following values:

| Value | Description |
| --- | --- |
| NOACCESS | No access |
| RDONLY | Read only |
| RW | Read write |

Extent types

The extent type consists of the following values:

| Value | Description |
| --- | --- |
| FLAT | raw extent data file |
| SPARSE | VMDK sparse extent data file |
| ZERO | Sparse extent that consists of 0-byte values |
| VMFS | raw extent data file |
| VMFSSPARSE | COWD sparse extent data file |
| VMFSRDM | Unknown (Physical disk device that uses RDM?) |
| VMFSRAW | Unknown (Physical disk device?) |

Note that VMware Player 9 has been observed to use “FLAT” for Windows devices.

Change tracking file section

The change tracking file section was introduced in version 3 and looks similar to:

# Change Tracking File
changeTrackPath="test-flat.vmdk"

The change tracking file section consists of the following values:

| Value | Description |
| --- | --- |
| "# Change Tracking File" | Section header |
| changeTrackPath | Unknown (The path to the change tracking file?) |

Disk database

The disk data base of a descriptor file looks similar to the data below.

# The Disk Data Base
#DDB

ddb.virtualHWVersion = "4"
ddb.geometry.cylinders = "16383"
ddb.geometry.heads = "16"
ddb.geometry.sectors = "63"
ddb.adapterType = "ide"
ddb.toolsVersion = "0"

The disk data base consists of the following values:

| Value | Description |
| --- | --- |
| "# The Disk Data Base" | Section header |
| "#DDB" | Currently assumed to be part of the section header |
| ddb.deletable | Unknown (seen: "true") |
| ddb.virtualHWVersion | The virtual hardware version. For VMware Player and Workstation this seems to correspond with the application version |
| ddb.longContentID | The long content identifier, which contains a 128-bit base16 encoded value, without spaces |
| ddb.uuid | UUID, which contains a 128-bit base16 encoded value, with spaces between bytes |
| ddb.geometry.cylinders | The number of cylinders |
| ddb.geometry.heads | The number of heads |
| ddb.geometry.sectors | The number of sectors |
| ddb.geometry.biosCylinders | The number of cylinders as reported by the BIOS |
| ddb.geometry.biosHeads | The number of heads as reported by the BIOS |
| ddb.geometry.biosSectors | The number of sectors as reported by the BIOS |
| ddb.adapterType | Disk adapter type |
| ddb.toolsVersion | String containing the version of the installed VMware Tools |
| ddb.thinProvisioned | Unknown (seen: "1") |

VirtualBox has been observed to use a different case for “disk” in the section header:

# The disk Data Base
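Because the section header case differs between producers, a reader should compare it case-insensitively. The Python sketch below is hypothetical (is_ddb_section_header and parse_ddb are illustrative names); it assumes every DDB value is written as ddb.name = "value" on a single line, as in the example above.

```python
import re

# ddb.<name> = "<value>", with optional whitespace around "=".
DDB_LINE_RE = re.compile(r'^\s*(ddb\.[\w.]+)\s*=\s*"([^"]*)"\s*$', re.IGNORECASE)


def is_ddb_section_header(line: str) -> bool:
    # Compare case-insensitively, since VirtualBox writes "# The disk Data Base".
    return line.strip().lower() == "# the disk data base"


def parse_ddb(text: str) -> dict:
    """Hypothetical helper: collect the ddb.* values of a descriptor file."""
    values = {}
    for line in text.split("\n"):
        match = DDB_LINE_RE.match(line)
        if match:
            key, value = match.groups()
            values[key.lower()] = value  # keys normalised to lower case
    return values
```

For the example DDB above, the geometry values multiply to 16383 x 16 x 63 = 16514064 sectors.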

Virtual hardware version

| Value | Description |
| --- | --- |
| 4 | TODO: add description |
| 6 | TODO: add description |
| 7 | TODO: add description |
| 9 | VMware Player/Workstation 9.0 |

Disk adapter types

| Value | Description |
| --- | --- |
| ide | TODO: add description |
| buslogic | TODO: add description |
| lsilogic | TODO: add description |
| legacyESX | TODO: add description |

The buslogic and lsilogic values are for SCSI disks and show which virtual SCSI adapter is configured for the virtual machine. The legacyESX value is for older ESX Server virtual machines when the adapter type used in creating the virtual machine is not known.

The raw extent data file

The raw extent data file contains the actual disk data. The raw extent data file can be a file or a device.

This type of extent data file is also known as “Simple” or “Flat Extent”.

The VMDK sparse extent data file

The VMDK sparse extent data file contains the actual disk data. A VMDK sparse extent data file consists of:

  • file header
  • optional embedded descriptor file
  • optional secondary grain directory
    • optional secondary grain tables
  • (primary) grain directory
    • (primary) grain tables
  • grains
  • optional backup file header

This type of extent data file is also known as “Hosted Sparse Extent” or “Stream-Optimized Compressed Sparse Extent” when markers are used.

Note that the actual layout can vary per file; Stream-Optimized Compressed Sparse Extents have been observed to use secondary file headers.

Changes in format version 2:

  • added encrypted disk support (though this feature appears never to have been implemented).

Changes in format version 3:

  • the size of extent files is no longer limited to 2 GiB;
  • added support for persistent changed block tracking (CBT).

Note that for CBT, the changeTrackPath value in the descriptor file references a file that describes the changed areas of the virtual disk.

File header

The file header is 512 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | "KDMV" | Signature |
| 4 | 4 | 1, 2 or 3 | Format version |
| 8 | 4 | | Flags |
| 12 | 8 | | Maximum data number of sectors (capacity) |
| 20 | 8 | | Sectors per grain, which must be a power of 2 and > 8 |
| 28 | 8 | | Embedded descriptor file start sector, which is relative from the start of the file or 0 if not set |
| 36 | 8 | | Embedded descriptor file size in sectors |
| 44 | 4 | 512 | The number of grain table entries |
| 48 | 8 | | Secondary grain directory start sector, which is relative from the start of the file or 0 if not set |
| 56 | 8 | | Primary grain directory start sector, which is relative from the start of the file, 0 if not set or 0xffffffffffffffff (GD_AT_END) if relative from the end of the file |
| 64 | 8 | | Metadata size in sectors |
| 72 | 1 | | Value to determine if the extent data file was cleanly closed (or dirty flag) |
| 73 | 1 | '\n' | Single end of line character |
| 74 | 1 | ' ' | Non end of line character |
| 75 | 1 | '\r' | First double end of line character |
| 76 | 1 | '\n' | Second double end of line character |
| 77 | 2 | | Compression method |
| 79 | 433 | 0 | Unknown (Padding) |

The end of line characters are used to detect corruption due to file transfers that alter line end characters.

According to the Virtual Disk Format 5.0 specification, the maximum data number of sectors (capacity) should be a multiple of the sectors per grain. Note that this has been observed not to always be the case.

If the primary grain directory start sector is 0xffffffffffffffff (GD_AT_END) in a Stream-Optimized Compressed Sparse Extent, there should be a secondary file header stored 1024 bytes before the end of the file (stream) that contains the correct grain directory start sector.
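Since all fields are little-endian with fixed offsets, the 512-byte file header maps directly onto a struct format string. The following Python sketch (SparseHeader and parse_sparse_header are hypothetical names) unpacks the header following the table above and uses the end of line characters to detect transfer corruption; checking them only when flag 0x00000001 is set is an assumption based on the flags table.

```python
import struct
from typing import NamedTuple

# Little-endian layout of the 512-byte VMDK sparse file header, following the
# table above; "433x" skips the trailing padding.
SPARSE_HEADER = struct.Struct("<4sIIQQQQIQQQB4sH433x")


class SparseHeader(NamedTuple):
    signature: bytes
    version: int
    flags: int
    capacity: int                   # maximum data number of sectors
    sectors_per_grain: int
    descriptor_start_sector: int
    descriptor_size: int            # in sectors
    grain_table_entries: int
    secondary_gd_start_sector: int
    primary_gd_start_sector: int    # may be GD_AT_END (0xffffffffffffffff)
    metadata_size: int              # in sectors
    dirty: int
    end_of_line_test: bytes         # '\n', ' ', '\r', '\n'
    compression_method: int


def parse_sparse_header(data: bytes) -> SparseHeader:
    header = SparseHeader(*SPARSE_HEADER.unpack_from(data))
    if header.signature != b"KDMV":
        raise ValueError("not a VMDK sparse extent data file")
    # Assumption: only validate the end of line characters if flag 0x1 is set.
    if header.flags & 0x00000001 and header.end_of_line_test != b"\n \r\n":
        raise ValueError("end of line characters were altered")
    return header
```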

Flags

The flags consist of the following values:

| Value | Identifier | Description |
| --- | --- | --- |
| 0x00000001 | | Valid new line detection test |
| 0x00000002 | | Use secondary grain directory. The secondary (redundant) grain directory should be used instead of the primary grain directory |
| As of format version 2 | | |
| 0x00000004 | | Use zeroed-grain table entry. The zeroed-grain table entry overloads grain data sector number 1 to indicate the grain is sparse |
| Common | | |
| 0x00010000 | | Has compressed grain data |
| 0x00020000 | | Contains metadata, where the file contains markers to identify metadata or data blocks |

Compression method

The compression method consists of the following values:

| Value | Identifier | Description |
| --- | --- | --- |
| 0x00000000 | COMPRESSION_NONE | No compression |
| 0x00000001 | COMPRESSION_DEFLATE | Compression using Deflate (RFC1951) |
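When inspecting a header it helps to list which flags are set and to report bits that are not in the table. The Python sketch below uses the flag values and compression identifiers from the tables above; the descriptive flag labels and the function names are hypothetical, since the flags have no identifiers in this document.

```python
# Flag values from the table above; the labels are descriptive names chosen
# for this sketch, not identifiers from the specification.
SPARSE_FLAGS = {
    0x00000001: "valid new line detection test",
    0x00000002: "use secondary grain directory",
    0x00000004: "use zeroed-grain table entry",
    0x00010000: "has compressed grain data",
    0x00020000: "contains metadata (markers)",
}
COMPRESSION_METHODS = {0: "COMPRESSION_NONE", 1: "COMPRESSION_DEFLATE"}


def describe_flags(flags: int) -> list:
    """Return the labels of the set flags, plus any unknown bits in hex."""
    labels = [name for bit, name in SPARSE_FLAGS.items() if flags & bit]
    unknown = flags & ~sum(SPARSE_FLAGS)  # sum() adds up the known flag bits
    if unknown:
        labels.append(f"unknown 0x{unknown:08x}")
    return labels


def compression_method_name(value: int) -> str:
    return COMPRESSION_METHODS.get(value, f"unknown {value}")
```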

Markers

The markers are used in Stream-Optimized Compressed Sparse Extents. The corresponding flag (0x00020000, contains metadata) must be set for markers to be present. An example of the layout of a Stream-Optimized Compressed Sparse Extent that uses markers is:

  • file header
  • embedded descriptor
  • compressed grain markers
  • grain table marker
  • grain table
  • grain directory marker
  • grain directory
  • footer marker
  • secondary file header
  • end-of-stream marker

The marker

The marker is 512 bytes in size and consists of:

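| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | | Value |
| 8 | 4 | | Marker data size |
| If marker data size equals 0 | | | |
| 12 | 4 | | Marker type |
| 16 | 496 | 0 | Unknown (Padding) |
| If marker data size > 0 | | | |
| 12 | ... | | Compressed grain data |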

If the marker data size > 0 the marker is a compressed grain marker.

Marker types

| Value | Identifier | Description |
| --- | --- | --- |
| 0x00000000 | MARKER_EOS | End-of-stream marker |
| 0x00000001 | MARKER_GT | Grain table (metadata) marker |
| 0x00000002 | MARKER_GD | Grain directory (metadata) marker |
| 0x00000003 | MARKER_FOOTER | Footer (metadata) marker |
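The marker data size decides how the rest of the marker is interpreted. The Python sketch below (Marker and parse_marker are hypothetical names) reads a marker from an in-memory buffer: a non-zero size means a compressed grain whose data starts directly at offset 12, otherwise the marker type at offset 12 identifies a metadata marker.

```python
import struct
from typing import NamedTuple, Optional

MARKER_TYPES = {0: "MARKER_EOS", 1: "MARKER_GT", 2: "MARKER_GD", 3: "MARKER_FOOTER"}


class Marker(NamedTuple):
    value: int                  # for compressed grains: the logical sector number
    data_size: int
    marker_type: Optional[str]  # None for compressed grain markers
    data: bytes                 # compressed grain data, empty for metadata markers


def parse_marker(buffer: bytes, offset: int = 0) -> Marker:
    value, data_size = struct.unpack_from("<QI", buffer, offset)
    if data_size > 0:
        # Compressed grain marker: the compressed data follows at offset 12.
        start = offset + 12
        return Marker(value, data_size, None, buffer[start:start + data_size])
    (type_value,) = struct.unpack_from("<I", buffer, offset + 12)
    name = MARKER_TYPES.get(type_value, f"unknown {type_value}")
    return Marker(value, data_size, name, b"")
```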

Compressed grain marker

The compressed grain marker indicates that compressed data follows.

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| Compressed grain header | | | |
| 0 | 8 | | Logical sector number |
| 8 | 4 | | Compressed data size |
| 12 | ... | | Compressed data, which contains Deflate compressed data |

Note that the compressed grain data can be larger than the grain data size.

End of stream marker

The end-of-stream marker indicates the end of the virtual disk. In effect, the end-of-stream marker is a sector block filled with 0-byte values.

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | 0 | Value |
| 8 | 4 | 0 | Marker data size |
| 12 | 4 | MARKER_EOS | Marker type |
| 16 | 496 | 0 | Unknown (Padding) |

Grain table marker

The grain table marker indicates that a grain table follows the marker sector block.

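| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | | Value |
| 8 | 4 | 0 | Marker data size |
| 12 | 4 | MARKER_GT | Marker type |
| 16 | 496 | 0 | Unknown (Padding) |
| 512 | ... | | Grain table |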

Grain directory marker

The grain directory marker indicates that a grain directory follows the marker sector block.

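| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | | Value |
| 8 | 4 | 0 | Marker data size |
| 12 | 4 | MARKER_GD | Marker type |
| 16 | 496 | 0 | Unknown (Padding) |
| 512 | ... | | Grain directory |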

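Footer marker

The footer marker indicates that a footer follows the marker sector block.

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 8 | | Value |
| 8 | 4 | 0 | Marker data size |
| 12 | 4 | MARKER_FOOTER | Marker type |
| 16 | 496 | 0 | Unknown (Padding) |
| 512 | ... | | Footer |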

Grain directory

The grain directory is also referred to as level 0 metadata.

The size of the grain directory depends on the number of grains in the extent data file. The number of entries in the grain directory can be determined as follows:

grain table size = number of grain table entries * sectors per grain

number of grain directory entries = maximum data number of sectors / grain table size (integer division)
if maximum data number of sectors % grain table size > 0:
	number of grain directory entries += 1
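The calculation above is a ceiling division, with all sizes counted in sectors. The Python sketch below mirrors it and also rounds the on-disk grain directory up to whole 512-byte blocks; the function names are hypothetical.

```python
def number_of_grain_directory_entries(capacity: int, grain_table_entries: int,
                                      sectors_per_grain: int) -> int:
    """Number of grain directory entries; all sizes are in sectors."""
    grain_table_size = grain_table_entries * sectors_per_grain  # sectors per grain table
    entries, remainder = divmod(capacity, grain_table_size)
    if remainder > 0:
        entries += 1  # a partially used last grain table still needs an entry
    return entries


def grain_directory_size_in_bytes(entries: int) -> int:
    # 32-bit entries, stored in a multiple of 512-byte blocks.
    return -(-entries * 4 // 512) * 512
```

For example, assuming 128 sectors per grain and 512 grain table entries, one grain table covers 65536 sectors, so the 4192256-sector extent from the descriptor example needs 64 grain directory entries (256 bytes, stored in a single 512-byte block).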

The grain directory consists of 32-bit grain table offsets:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Grain table start sector, which is relative from the start of the file or 0 if sparse or the sector is stored in the parent image |

The grain directory is stored in a multiple of 512-byte blocks.

Note that as of VMDK sparse extent data file version 2 if the “use zeroed-grain table entry” flag is set, a start sector of 1 indicates the grain table is sparse.

Grain table

The grain table is also referred to as level 1 metadata.

The grain table is of variable size. The number of entries in the grain table is stored in the file header. Note that the number of entries in the last grain table depends on the maximum data size and is not necessarily the same as the value stored in the file header.

The grain table consists of 32-bit grain data sector numbers:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Grain data sector number, which is relative from the start of the file or 0 if sparse or the sector is stored in the parent image |

The number of entries in a grain table should be 512; therefore the size of a grain table is 512 x 4 = 2048 bytes.

The grain table is stored in a multiple of 512-byte blocks.

Note that as of VMDK sparse extent data file version 2, if the “use zeroed-grain table entry” flag is set, a sector number of 1 indicates the grain is sparse.

Grain data

In an uncompressed sparse extent data file the data is stored at the grain data sector number.

In a compressed sparse extent data file every non-sparse grain is assumed to be stored compressed.
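Resolving a logical sector in an uncompressed sparse extent is a two-level lookup: grain directory, then grain table. The Python sketch below assumes the grain directory is already loaded as a list of integers and that read_grain_table is a hypothetical callback returning the grain table stored at a given sector. It returns None both for sparse grains and for grains stored in the parent image, so a caller working with delta links still has to consult the parent.

```python
from typing import Callable, Optional, Sequence

SECTOR_SIZE = 512


def grain_data_offset(sector: int, grain_directory: Sequence[int],
                      read_grain_table: Callable[[int], Sequence[int]],
                      sectors_per_grain: int, grain_table_entries: int,
                      zeroed_grain_flag: bool = False) -> Optional[int]:
    """Byte offset of a logical sector in an uncompressed sparse extent data
    file, or None if the sector is sparse or stored in the parent image."""
    grain_index, sector_in_grain = divmod(sector, sectors_per_grain)
    directory_index, table_index = divmod(grain_index, grain_table_entries)
    table_sector = grain_directory[directory_index]
    # With the zeroed-grain flag (version 2 and later) a value of 1 is sparse.
    if table_sector == 0 or (zeroed_grain_flag and table_sector == 1):
        return None
    grain_sector = read_grain_table(table_sector)[table_index]
    if grain_sector == 0 or (zeroed_grain_flag and grain_sector == 1):
        return None
    return (grain_sector + sector_in_grain) * SECTOR_SIZE
```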

Compressed grain data

The compressed grain data is of variable size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| Compressed grain header | | | |
| 0 | 8 | | Logical sector number |
| 8 | 4 | | Compressed data size |
| 12 | ... | | Compressed data, which contains zlib compressed data |
| ... | ... | | Unknown (Padding) |

The uncompressed data size should be the grain size or less for the last grain.
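A compressed grain can be decoded with Python's standard zlib module. Because this document describes the payload both as Deflate and as zlib data, the sketch below tries zlib-wrapped data first and falls back to raw Deflate (wbits=-15); that fallback is a defensive assumption, not documented behaviour. decompress_grain is a hypothetical name, and only the uncompressed size is checked, since the compressed data can be larger than the grain.

```python
import struct
import zlib
from typing import Tuple


def decompress_grain(buffer: bytes, offset: int, grain_size_bytes: int) -> Tuple[int, bytes]:
    """Return (logical sector number, uncompressed data) for one compressed grain."""
    logical_sector, data_size = struct.unpack_from("<QI", buffer, offset)
    payload = buffer[offset + 12:offset + 12 + data_size]
    try:
        data = zlib.decompress(payload)  # zlib (RFC 1950) wrapped Deflate
    except zlib.error:
        data = zlib.decompress(payload, wbits=-15)  # raw Deflate (RFC 1951)
    if len(data) > grain_size_bytes:
        raise ValueError("uncompressed data exceeds the grain size")
    return logical_sector, data
```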

Footer

The footer is only used in Stream-Optimized Compressed Sparse Extents. The footer is the same as the file header. The footer should be the last block of the disk and immediately followed by the end-of-stream marker so that they together make up the last two sectors of the disk.

The header and footer differ in that the grain directory offset value in the header is set to 0xffffffffffffffff (GD_AT_END) and in the footer to the correct value.
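Putting the GD_AT_END rule and the footer position together, a reader can resolve the primary grain directory start sector as sketched below in Python. For brevity the sketch holds the whole stream in memory (a real reader would seek instead), and primary_grain_directory_sector is a hypothetical name; offset 56 is the primary grain directory start sector field from the file header table.

```python
import struct

GD_AT_END = 0xFFFFFFFFFFFFFFFF
PRIMARY_GD_OFFSET = 56  # primary grain directory start sector in the file header


def primary_grain_directory_sector(stream: bytes) -> int:
    """Resolve the primary grain directory start sector of a sparse extent.

    If the file header contains GD_AT_END, the value is read from the footer,
    the secondary file header stored 1024 bytes before the end of the stream.
    """
    if stream[:4] != b"KDMV":
        raise ValueError("not a VMDK sparse extent data file")
    (sector,) = struct.unpack_from("<Q", stream, PRIMARY_GD_OFFSET)
    if sector == GD_AT_END:
        footer_offset = len(stream) - 1024
        if footer_offset < 0 or stream[footer_offset:footer_offset + 4] != b"KDMV":
            raise ValueError("missing secondary file header (footer)")
        (sector,) = struct.unpack_from("<Q", stream, footer_offset + PRIMARY_GD_OFFSET)
    return sector
```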

Changed block tracking (CBT)

TODO: complete section

The COWD sparse extent data file

The copy-on-write disk (COWD) sparse extent data file contains the actual disk data. The COWD sparse extent data file consists of:

  • file header
  • grain directory
  • grain tables
  • grains

This type of extent data file is also known as ESX Server Sparse Extent.

File header

The file header is 2048 bytes in size and consists of:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | "COWD" | Signature |
| 4 | 4 | 1 | Format version |
| 8 | 4 | 0x00000003 | Unknown (Flags) |
| 12 | 4 | | Maximum data number of sectors (capacity) |
| 16 | 4 | | Sectors per grain |
| 20 | 4 | 4 | Grain directory start sector, which is relative from the start of the file or 0 if not set |
| 24 | 4 | | Number of grain directory entries |
| 28 | 4 | | The next free sector |
| In root extent data file | | | |
| 32 | 4 | | The number of cylinders |
| 36 | 4 | | The number of heads |
| 40 | 4 | | The number of sectors |
| 44 | 1016 | | Unknown (Empty values) |
| In child extent data files | | | |
| 32 | 1024 | | Parent file name |
| 1056 | 4 | | Parent generation |
| Common | | | |
| 1060 | 4 | | Generation |
| 1064 | 60 | | Name |
| 1124 | 512 | | Description |
| 1636 | 4 | | Saved generation |
| 1640 | 8 | | Unknown (Reserved) |
| 1648 | 4 | | Value to determine if the extent data file was cleanly closed (or dirty flag) |
| 1652 | 396 | | Unknown (Padding) |

Note that the parent file name seems not to be set in recent delta sparse extent files.
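The first 32 bytes of the COWD file header are common to root and child extent data files and can be unpacked with a single struct format. In the Python sketch below, CowdHeader, parse_cowd_header and cowd_parent_file_name are hypothetical names, and treating the parent file name as a NUL-terminated UTF-8 string is an assumption, since its encoding is not documented here.

```python
import struct
from typing import NamedTuple

COWD_HEADER_START = struct.Struct("<4sIIIIIII")  # first 32 bytes of the header


class CowdHeader(NamedTuple):
    signature: bytes
    version: int
    flags: int
    capacity: int                     # maximum data number of sectors
    sectors_per_grain: int
    grain_directory_start_sector: int
    grain_directory_entries: int
    next_free_sector: int


def parse_cowd_header(data: bytes) -> CowdHeader:
    if len(data) < 2048:
        raise ValueError("the COWD file header is 2048 bytes in size")
    header = CowdHeader(*COWD_HEADER_START.unpack_from(data))
    if header.signature != b"COWD":
        raise ValueError("not a COWD sparse extent data file")
    return header


def cowd_parent_file_name(data: bytes) -> str:
    # Only meaningful in child extent data files (offset 32, 1024 bytes);
    # NUL termination and UTF-8 are assumptions.
    return data[32:1056].split(b"\x00", 1)[0].decode("utf-8", errors="replace")
```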

Grain directory

The grain directory is also referred to as level 0 metadata.

The size of the grain directory is dependent on the number of grains in the extent data file. The number of entries in the grain directory is stored in the file header.

The grain directory consists of 32-bit grain table offsets:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Grain table start sector, which is relative from the start of the file or 0 if not set |

The grain directory is stored in a multiple of 512-byte blocks. Unused bytes are set to 0.

Grain table

The grain table is also referred to as level 1 metadata.

The number of entries in a grain table is fixed at 4096, so a grain table is 4096 x 4 = 16384 bytes in size.

The grain table consists of 32-bit grain sector numbers:

| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | | Grain sector number, which is relative from the start of the file or 0 if not set |

The grain table is stored in a multiple of 512-byte blocks. Unused bytes are set to 0.
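With a fixed 4096 entries per grain table, locating a sector in a COWD sparse extent reduces to two divisions. The Python sketch below (hypothetical function names) computes the grain directory index, grain table index and sector within the grain, and converts a grain sector number from the grain table into a byte offset, treating 0 as "not set".

```python
from typing import Optional, Tuple

COWD_GRAIN_TABLE_ENTRIES = 4096  # fixed number of entries per COWD grain table
SECTOR_SIZE = 512


def cowd_grain_position(sector: int, sectors_per_grain: int) -> Tuple[int, int, int]:
    """Return (grain directory index, grain table index, sector within grain)."""
    grain_index, sector_in_grain = divmod(sector, sectors_per_grain)
    directory_index, table_index = divmod(grain_index, COWD_GRAIN_TABLE_ENTRIES)
    return directory_index, table_index, sector_in_grain


def cowd_sector_offset(grain_sector: int, sector_in_grain: int) -> Optional[int]:
    if grain_sector == 0:
        return None  # grain not set
    return (grain_sector + sector_in_grain) * SECTOR_SIZE
```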

Change tracking file

TODO: complete section

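| Offset | Size | Value | Description |
| --- | --- | --- | --- |
| 0 | 4 | "\xa2\x72\x19\xf6" | Unknown (signature?) |
| 4 | 4 | 1 | Unknown (version?) |
| 8 | 4 | | Unknown (empty values) |
| 12 | 4 | 0x200 | Unknown |
| 16 | 8 | | Unknown |
| 24 | 8 | | Unknown |
| 32 | 4 | | Unknown |
| 36 | 4 | | Unknown |
| 40 | 4 | | Unknown |
| 44 | 16 | | Unknown (UUID?) |
| 60 | ... | | Unknown (empty values?) |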

Corruption scenarios

The total size specified by the number of grain table entries is larger than the size specified by the maximum number of sectors. This has been seen in VMDK images generated by qemu-img.

Notes

The markers can be used to scan for the individual parts of a VMDK sparse extent data file if the stream has been truncated, but note that this can be a very expensive process I/O-wise.

References