Connecting the iDOTs
Tuesday, 8 September 2020
Last night's insomnia gave me some time to work through one of the big PNG forensics mysteries in life: What does Apple's iDOT chunk do?
PNG Chunks
Yes, they are really called "chunks" and Apple provides their own proprietary one called an iDOT. As far as I can tell, nobody has managed to reverse-engineer it. (I really wanted to work into this blog a pun about Apple blowing chunks, but I couldn't make it fit the theme.)
The PNG data format divides sections into "chunks". Each chunk has a data length, a four-character name that determines the type of data, and then the data itself. (At the end of each chunk is also a checksum for validation.)
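To make that layout concrete, here is a minimal sketch of a chunk walker. The function name `iter_chunks` is my own label for illustration; the layout it relies on (4-byte big-endian length, 4-byte name, data, 4-byte CRC over the name and data) comes from the PNG standard.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def iter_chunks(png: bytes):
    """Yield (file_offset, name, data, crc_ok) for each chunk in a PNG byte string."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 12 <= len(png):
        # Every chunk starts with a big-endian length and a four-character name.
        length, name = struct.unpack(">I4s", png[pos:pos + 8])
        end = pos + 12 + length
        if end > len(png):
            raise ValueError("truncated chunk")
        data = png[pos + 8:pos + 8 + length]
        (stored_crc,) = struct.unpack(">I", png[end - 4:end])
        # The CRC covers the chunk name and the data, but not the length field.
        crc_ok = zlib.crc32(name + data) == stored_crc
        yield pos, name.decode("latin-1"), data, crc_ok
        pos = end
```

Feeding it the bytes of any PNG and printing the names is usually enough to spot an iDOT sitting right after the IHDR.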
The most common PNG chunks are well-defined by the PNG standard. For example:
- IHDR: Defines the image header.
- PLTE: Defines the optional color palette.
- cHRM: Defines chromaticities and white point for optional color corrections.
- iTXT: Stores international text data.
- pHYs: Stores physical pixel dimensions.
- IDAT: Defines the image data.
- IEND: End of image marker.
This is nowhere near the entire list. There are a few standard-and-required chunks (IHDR, IDAT, and IEND; PLTE is only required if IHDR says there is a palette). There are dozens of standard-but-optional chunks (pHYs, cHRM, iTXT, etc.). On top of this, applications can define non-standard and proprietary chunks. For example:
- meTa: When saving a PNG, Microsoft may include a meTa chunk that contains metadata.
- skMf and skRf: Skitch (software) for the Mac includes these chunks.
- caNv, orNT, vpAg: ImageMagick can generate a range of non-standard chunks. (These tags are for defining a canvas, setting the image orientation, and describing any virtual page information.)
A Brief History of iDOT
Here's what I know about iDOT:
The iDOT chunk is created by Apple devices. It appears to be generated by both mobile and desktop systems.
I don't know when Apple first introduced iDOT. FotoForensics has been around since 2012, and I've seen iDOT chunks since the site first started. However, the volume of PNG images with the iDOT field has dramatically increased. I don't think this is because Apple devices have increased in popularity. Rather, I think that Apple has integrated it into more of their applications and image libraries.
The PNG naming convention (based on the weird capitalizations) notes how the iDOT chunk should be used:
- First letter: A capital denotes a chunk that is required for rendering the picture. Lowercase means ancillary (not critical) for rendering the image. The lowercase "i" means you can ignore this chunk and still render the picture.
- Second letter: A capital denotes a public/registered chunk, while lowercase is for proprietary or non-standard. With this chunk, the capital "D" should denote a public standard, but Apple's iDOT is not defined in any public standard that I can find. (The name should be "idOT", not "iDOT".)
- Third letter: Always a capital letter.
- Fourth letter: A lowercase letter means it is safe to copy if you generate a new PNG. A capital letter (T) means that a new PNG file should not copy over this chunk. (Either re-generate it or delete it when making a derivative PNG file.)
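The four capitalization rules above boil down to a case test on each letter. A small sketch (the function name and dictionary keys are mine, not anything from the PNG standard):

```python
def chunk_properties(name: str) -> dict:
    """Decode the four case bits of a PNG chunk name."""
    if len(name) != 4 or not name.isascii() or not name.isalpha():
        raise ValueError("chunk names are four ASCII letters")
    return {
        "ancillary": name[0].islower(),     # lowercase: not needed for rendering
        "private": name[1].islower(),       # lowercase: proprietary / non-standard
        "reserved_ok": name[2].isupper(),   # third letter should always be a capital
        "safe_to_copy": name[3].islower(),  # lowercase: may be copied to a new PNG
    }
```

For "iDOT" this reports an ancillary chunk that claims to be public and is not safe to copy. For "idOT" (what I think the name should have been), the "private" flag flips to true.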
Reverse-Engineering iDOT
If you search Google for PNG and "iDOT", you'll find lots of people asking about the purpose. For example:
- 2015: "What is the iDOT chunk?"
- 2016: "Apple screenshot PNGs contain non-standard chunk"
- 2016: A user speculates about the iDOT chunk, saying, "The iDOT chunk from Apple appears to be the only metadata which could be used to automate such a size tagging. It is not standardized, will be destroyed by image editors, and doesn’t seem to cater for non-integer pixel densities."
I spent a late night looking over the iDOT structure, and I think I've got it!
The iDOT chunk contains 28 bytes that are divided into seven 4-byte words (in big endian). The fields appear to be:
- uint32_t: Height divisor. Apple divides the image vertically. The only value I have ever seen is "2".
- uint32_t: Unknown. I don't know the purpose, but it's always "0". This might be reserved for flags.
- uint32_t: Divided height. The IHDR block defines the image height. This is half the height. Since I've only ever seen a divisor of 2 and Apple PNG images with an even number of lines, I don't know if this serves some other purpose. It could be the maximum height after dividing the image (rounding up).
- uint32_t: Unknown. The only value I have ever seen is 40 (0x28 in hex). (The iDOT chunk contains 28 bytes, so the hex digits match the chunk length. Maybe this is related?)
- uint32_t: First half height. The divisor splits the image vertically, into two halves. This is the height of the first half.
- uint32_t: Second half height. Since I've only seen a divisor of 2 and images that have an even number of rows, the divided height, first half, and second half are all the same value.
- uint32_t: IDAT restart offset. This is the important value and I think it's the entire purpose of the iDOT block. But to understand it, you have to understand the IDAT chunks.
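As a sketch, the seven words can be unpacked in one call. The field names below are my own labels for the guesses in the list above; Apple has not published names (or meanings) for any of them.

```python
import struct
from typing import NamedTuple

class IDot(NamedTuple):
    # Names are guesses based on observed values, not an official layout.
    height_divisor: int
    reserved: int
    divided_height: int
    unknown: int
    first_half_height: int
    second_half_height: int
    idat_restart_offset: int

def parse_idot(data: bytes) -> IDot:
    """Split the 28-byte iDOT payload into seven big-endian 32-bit words."""
    if len(data) != 28:
        raise ValueError("expected a 28-byte iDOT payload")
    return IDot(*struct.unpack(">7I", data))
```

The payload here is just the chunk data, without the length, name, or CRC.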
A closer look at image data (IDAT)
When a PNG encodes a picture, it stores every pixel in a raster -- from left to right and then top to bottom. This creates a stream of pixels. The stream is then compressed using the zLib library's "deflate" algorithm. (Deflate is the only compression method that the PNG standard defines.) The compressed bytes are then stored in one or more IDAT chunks. Some PNG libraries just write one large IDAT chunk that contains all of the compressed data. Others write a bunch of IDAT chunks. (The segmentation into smaller chunks can result in easier decoding if the library decompresses each IDAT chunk during processing.)
With Apple, every IDAT chunk contains 16,384 bytes. Only the very last chunk can contain fewer bytes (because it just stores whatever is left).
Using this kind of chunking, you can't just point to an IDAT and say "that contains data for row x". Since the data is streamed and compressed, you have no idea what rows are contained in an IDAT until you decompress the data. ... unless there is an iDOT chunk.
The iDOT contains a pointer to the IDAT chunk that holds the start of the second half of the picture. Technically, the pointer is an offset relative to the start of the iDOT chunk, not the start of the file.
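Here is a rough sketch of following that pointer. It assumes that "start of the iDOT chunk" means the chunk's length field (that is my reading, not something Apple has documented), and the function name `idot_target` is hypothetical.

```python
import struct
from typing import Tuple

def idot_target(png: bytes) -> Tuple[int, bool]:
    """Return (absolute_offset, points_at_idat) for the iDOT restart offset."""
    pos = 8  # skip the 8-byte PNG signature
    while pos + 12 <= len(png):
        length, name = struct.unpack(">I4s", png[pos:pos + 8])
        if name == b"iDOT":
            if length != 28:
                raise ValueError("unexpected iDOT length")
            # The restart offset is the last of the seven 32-bit words.
            (relative,) = struct.unpack(">I", png[pos + 32:pos + 36])
            # Assumption: the offset counts from the iDOT's length field.
            target = pos + relative
            # A chunk's name sits 4 bytes after the start of the chunk.
            return target, png[target + 4:target + 8] == b"IDAT"
        pos += 12 + length
    raise ValueError("no iDOT chunk found")
```

If the second value comes back false, the file is in the same state that triggers the "iDOT doesn't point to valid IDAT chunk" complaints.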
For example, here are the chunks from an Apple PNG file that contains an iDOT record:
IHDR: Image Header : This image is 500x408
iDOT: Apple Data : The half-height is 204 rows.
IDAT: Image Data, 16,384 bytes : Start of row #1
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 10,283 bytes : End of row #204
IDAT: Image Data, 16,384 bytes : Start of row #205
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 16,384 bytes
IDAT: Image Data, 489 bytes : End of row #408
IEND: Image Trailer
- This image starts with an IHDR and an iDOT, then has a bunch of IDAT records and the IEND.
- The IDAT records are all 16,384 bytes long, with two exceptions.
- The last IDAT record doesn't have enough data to fill out the full 16,384 bytes, so it's just the remainder. (That's expected with any PNG.)
- There is one IDAT near the middle that is only 10,283 bytes. I may not know what rows are contained in the prior IDAT blocks, but I know that this IDAT contains the end of the first half of the image. If you look at the binary compressed stream, you'll see that it ends with "00 00 ff ff", denoting a zLib sync flush.
- The very next IDAT is the start of the second half of the picture.
Why iDOT?
I can't find any public documents where Apple says how they use this iDOT chunk. However, they are clearly using it. For example:
- Many people have reported seeing an "iDOT doesn't point to valid IDAT chunk" error. I believe this happens when a PNG encoder copies the iDOT chunk to a new PNG. (The name 'iDOT' has a capital for the 4th letter, meaning that the chunk should not be copied to a new PNG.) Since re-encoding the PNG changes the IDAT flow, the iDOT offset to the second half of the picture's IDAT becomes wrong.
- CVE-2016-1811 identifies a bug where a bad iDOT chunk can crash Apple's ImageIO processor.
With JPEG files, there's a property called the 'restart interval' that can permit parallel processes to decode the image stream. (Because sometimes fractions of a second to decode an image isn't fast enough?) Keep in mind, I have never seen any JPEG libraries use parallel processes to decode a JPEG, and I think it's backwards to have the encoder determine the parallel performance used by the decoder.
Anyway, that zLib sync flush could permit parallel decoding. And the iDOT would help the 2nd decoding thread jump to the correct point in the IDAT structure. So maybe Apple wants to use two threads to decode one PNG a little faster. (Given how fast a PNG decodes, I don't see any real speed benefit here.)
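To show why a flush point makes this possible, here is a sketch using Python's zlib module on plain byte strings (no PNG filtering involved). One assumption to flag loudly: I use Z_FULL_FLUSH, which writes the same "00 00 ff ff" marker as a sync flush but also resets the compressor's history, so the second half cannot refer back to the first. I have not confirmed which flush mode Apple's encoder uses, and the function names are mine.

```python
import zlib

def compress_in_two_halves(first: bytes, second: bytes):
    """Compress two halves as one zlib stream with a flush point between them."""
    comp = zlib.compressobj()
    # Z_FULL_FLUSH byte-aligns the output, appends 00 00 ff ff, and resets
    # the history so later data never references earlier data.
    part1 = comp.compress(first) + comp.flush(zlib.Z_FULL_FLUSH)
    part2 = comp.compress(second) + comp.flush()  # finish the stream
    return part1, part2

def decode_second_half_alone(part2: bytes) -> bytes:
    """Decode the bytes after the flush point without touching the first half."""
    # The middle of a zlib stream has no zlib header, so decode it as raw
    # deflate (negative wbits). The trailing Adler-32 lands in unused_data.
    decoder = zlib.decompressobj(-15)
    return decoder.decompress(part2)
```

A second thread only needs to know where `part2` begins in the file, which is the kind of information the iDOT restart offset appears to provide.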
Alternately, the iDOT block can be used for tamper detection.
- If the IHDR image height doesn't match the height found in the iDOT chunk, then someone tampered with the image.
- If the iDOT offset doesn't point to a valid IDAT, then someone tampered with the image. (This includes copying the iDOT chunk to a different PNG file.)
- If the zLib sync flush is missing, or the actual half-way point in the IDAT stream isn't where the iDOT restart offset says it is, then someone tampered with the image.
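These checks can be sketched as a function over values that have already been pulled out of the file. Everything here is hypothetical naming on my part, and the height test assumes that the two half heights (the fifth and sixth iDOT words) should add up to the IHDR height.

```python
from typing import List, Sequence

SYNC_MARKER = b"\x00\x00\xff\xff"

def idot_findings(ihdr_height: int, idot_fields: Sequence[int],
                  target_name: bytes, prev_idat_tail: bytes) -> List[str]:
    """Return a list of inconsistencies; an empty list means nothing looks off.

    idot_fields    -- the seven 32-bit words from the iDOT chunk
    target_name    -- the four-byte chunk name found at the restart offset
    prev_idat_tail -- the last four data bytes of the IDAT before that chunk
    """
    findings = []
    # Assumption: first-half height plus second-half height equals the IHDR height.
    if idot_fields[4] + idot_fields[5] != ihdr_height:
        findings.append("iDOT half heights do not add up to the IHDR height")
    if target_name != b"IDAT":
        findings.append("iDOT restart offset does not point at an IDAT chunk")
    if prev_idat_tail != SYNC_MARKER:
        findings.append("IDAT before the restart point does not end with a sync flush")
    return findings
```

For the 500x408 example above, the heights 204 and 204 with a valid IDAT target and a "00 00 ff ff" tail would produce no findings.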
Now that I've finally worked out what the iDOT chunk is and how it works, perhaps I can sleep better at night.

The reason was indeed performance: on the first retina iPads, decoding PNGs was a huge portion of the total launch times for some apps, while one of the two cores sat completely idle. My guess is that it's still at least somewhat useful today, even with much much faster single-thread performance, because screen sizes have also grown quite a bit.
It makes a huge difference for example for test runs on large data sets if the time for loading test images is larger than the actual machine vision algorithm to be tested.