It's been a while since I wrote about image formats themselves. This time, it's TIFF -- the Tagged Image File Format. If JPEG is the shallow, stuck-up, spoiled cheerleader who makes everyone wonder "why is she so popular?", then TIFF is her creepy older brother.
On the positive side, TIFF is a lossless image format with some compression capabilities. It can also store multiple images per file. But those are really the only positive aspects.
TIFF does support a few well-defined metadata fields (similar to PNG), but they are all associated with predefined two-byte tags (similar to JPEG). This means that TIFF lacks extendable metadata types. If your metadata is anything other than date/time, artist, copyright, software, host computer, make, model, document name, or the image description, then you're out of luck. The format doesn't even have a generic "comment" metadata type.
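For reference, here are the fixed metadata fields listed above with their numeric tags from the TIFF 6.0 specification. The `name_metadata` helper is a hypothetical illustration, not part of any library. It simply shows that a reader can only name the tags it already knows about:

```python
# Numeric tags (TIFF 6.0) for the text metadata fields mentioned in the post.
METADATA_TAGS = {
    269: "DocumentName",
    270: "ImageDescription",
    271: "Make",
    272: "Model",
    305: "Software",
    306: "DateTime",
    315: "Artist",
    316: "HostComputer",
    33432: "Copyright",
}

def name_metadata(entries):
    """Hypothetical helper: map (tag, value) pairs to {field_name: value}.

    Any tag number not in METADATA_TAGS is ignored in this sketch."""
    return {METADATA_TAGS[tag]: value for tag, value in entries if tag in METADATA_TAGS}
```

For example, `name_metadata([(270, "sunset"), (65000, "x")])` keeps only the image description.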
TIFF Format Details
The TIFF file format resembles a file system. There are a series of Image File Directory (IFD) records. The beginning of the TIFF has a pointer to the first IFD record. From there, every IFD points to the next IFD. The IFD structures are strictly a linked list.
Each IFD defines one image. If the TIFF contains only one picture, then there is only one IFD record. Within each IFD are a bunch of Directory Entry (DE) records. Each IFD states how many DE records it holds, and each DE record contains a tag that defines its purpose, the type of data, the number of data values, and an offset to the data itself. There is one DE per image attribute. Image width? One DE. Image height? One DE. Bits per pixel, color space, metadata, etc. -- one DE each. This makes parsing a breeze.
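To make the linked-list structure concrete, here is a minimal Python sketch that follows the IFD chain and lists each 12-byte DE as (tag, type, count, value-or-offset). It assumes a little-endian ("II") file to stay short, and `walk_ifds` is a hypothetical name; a real reader would also handle "MM" files and validate offsets against the file size.

```python
import struct

def walk_ifds(data: bytes):
    """Yield (ifd_offset, entries) for each IFD in a little-endian ("II") TIFF.

    Each entry is a (tag, type, count, value_or_offset) tuple."""
    if data[:2] != b"II":
        raise ValueError("this sketch only handles little-endian (II) TIFFs")
    magic, ifd_offset = struct.unpack_from("<HI", data, 2)  # bytes 2-7 of the header
    if magic != 42:
        raise ValueError("not a TIFF")
    seen = set()
    while ifd_offset != 0:                 # an offset of 0 ends the linked list
        if ifd_offset in seen:             # guard against a malformed, circular chain
            raise ValueError("IFD loop detected")
        seen.add(ifd_offset)
        (n_entries,) = struct.unpack_from("<H", data, ifd_offset)
        entries = [
            # each DE is 12 bytes: tag (2), type (2), count (4), value/offset (4)
            struct.unpack_from("<HHII", data, ifd_offset + 2 + 12 * i)
            for i in range(n_entries)
        ]
        yield ifd_offset, entries
        # the 4 bytes after the last DE hold the offset of the next IFD
        (ifd_offset,) = struct.unpack_from("<I", data, ifd_offset + 2 + 12 * n_entries)

if __name__ == "__main__":
    # Hand-built two-image file: header, IFD at offset 8, IFD at offset 26.
    header = b"II" + struct.pack("<HI", 42, 8)
    ifd0 = struct.pack("<H", 1) + struct.pack("<HHII", 256, 3, 1, 640) + struct.pack("<I", 26)
    ifd1 = struct.pack("<H", 1) + struct.pack("<HHII", 256, 3, 1, 320) + struct.pack("<I", 0)
    for offset, entries in walk_ifds(header + ifd0 + ifd1):
        print(offset, entries)
```

Tag 256 is ImageWidth, so the toy file describes two images, 640 and 320 pixels wide.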
However, while JPEG's creepy older brother may seem to know everything about the image, his room is still a mess. Each DE's offset field is 4 bytes long. If the data is 4 bytes or less, then the data is stored in the offset field itself. If the data is longer, then it is stored wherever the offset points. (Sound familiar? JPEG does this too.)
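The "inline if it fits, otherwise follow the pointer" rule can be sketched like this. It is a minimal Python illustration that handles only a subset of the TIFF 6.0 field types (BYTE, ASCII, SHORT, LONG, RATIONAL); `read_entry_value` is a hypothetical name, and the `endian` argument is assumed to come from the file's "II"/"MM" header.

```python
import struct

# Bytes per value for a subset of TIFF 6.0 field types.
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8}   # BYTE, ASCII, SHORT, LONG, RATIONAL
INT_FORMATS = {1: "B", 3: "H", 4: "I"}          # struct codes for the integer types

def read_entry_value(data: bytes, entry_offset: int, endian: str = "<"):
    """Decode one 12-byte DE starting at entry_offset; return (tag, value).

    endian is "<" for "II" files and ">" for "MM" files."""
    tag, typ, count = struct.unpack_from(endian + "HHI", data, entry_offset)
    if typ not in TYPE_SIZES:
        raise ValueError(f"type {typ} is not handled in this sketch")
    size = TYPE_SIZES[typ] * count
    if size <= 4:
        # Small values are stored directly in the 4-byte offset field.
        value_pos = entry_offset + 8
    else:
        # Larger values live elsewhere; the 4-byte field points to them.
        (value_pos,) = struct.unpack_from(endian + "I", data, entry_offset + 8)
    raw = data[value_pos:value_pos + size]
    if typ == 2:   # ASCII: NUL-terminated text
        return tag, raw.rstrip(b"\x00").decode("ascii")
    if typ == 5:   # RATIONAL: (numerator, denominator) pairs of LONGs
        nums = struct.unpack(f"{endian}{2 * count}I", raw)
        return tag, list(zip(nums[0::2], nums[1::2]))
    return tag, list(struct.unpack(f"{endian}{count}{INT_FORMATS[typ]}", raw))
```

A one-value SHORT (such as an image width) comes straight out of the entry, while a six-byte string like `b"hello\x00"` forces a jump to wherever the offset points.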
A TIFF is a really messy file format. With JPEG, PNG, BMP, GIF, and other image formats, all data is stored sequentially. However, TIFF is based on pointers to offsets in the file. So the first 8 bytes in a TIFF are well-defined -- they identify the TIFF and give the pointer to the first IFD. After that, everything goes to hell. You jump forward to the first IFD, which has you jump forward or back for each DE record's data; then you jump again (forward or backward) to the next IFD and repeat the jumping around. Because of all of the jumping, TIFF cannot be used as a streaming file format. TIFF's random access destroys most of the performance gains from pipelining.
Really Creepy
TIFF predates JPEG, and JPEG actually uses some of TIFF's features. For example, the first two bytes in a TIFF define the byte ordering: "II" (Intel) means little-endian and "MM" (Motorola) means big-endian. If this sounds familiar, it is because JPEG does this too. (Creepy older brother. See? They're related!) JPEG also uses the two-byte tags; JPEG has more predefined tags, but the idea comes from TIFF. And JPEG inherited the predefined data types and the count for data size (but JPEG counts are off by two).
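Here is a short Python sketch of checking those well-defined first 8 bytes: the byte-order mark, the magic number 42, and the offset of the first IFD. The function name `read_tiff_header` is hypothetical. Note that 42 is stored in the file's own byte order, so it reads correctly in either case only after you pick the right endianness.

```python
import struct

def read_tiff_header(data: bytes):
    """Validate the 8-byte TIFF header and return (endian, first_ifd_offset).

    endian is a struct prefix: "<" for "II" (Intel, little-endian) and
    ">" for "MM" (Motorola, big-endian)."""
    if len(data) < 8:
        raise ValueError("too short to be a TIFF")
    order = data[:2]
    if order == b"II":
        endian = "<"
    elif order == b"MM":
        endian = ">"
    else:
        raise ValueError(f"bad byte-order mark {order!r}")
    # Bytes 2-3 hold the magic number 42; bytes 4-7 hold the first IFD offset.
    magic, first_ifd = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError(f"bad magic number {magic}, expected 42")
    return endian, first_ifd
```

Both `b"II*\x00\x08\x00\x00\x00"` and `b"MM\x00*\x00\x00\x00\x08"` decode to a first IFD at offset 8; only the byte order differs.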
However, there is an incestuous relationship between TIFF and JPEG. I call this the "creepy factor". You see, JPEG introduced a novel method for storing lossy images: quantization tables. With TIFF revision 6.0 (June 1992), they added support for lossy JPEG-style compression. So now you have a horrible image format (JPEG) influencing its older brother, who was a mess to start with.
Common Usage
Between the complexity of the JPEG algorithm and TIFF's random-access file format, this is a really ugly implementation. Fortunately, few systems use TIFF for lossy compression. In fact, I have yet to see a system that defaults to anything other than TIFF's lossless compression; TIFF supports a run-length compression scheme, modified Huffman compression, and LZW (GIF-style) compression. In general, TIFF is usually left uncompressed and used in place of bitmaps.
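The run-length scheme in baseline TIFF is PackBits (Compression tag value 32773). Each header byte, read as a signed value n, means: 0 to 127, copy the next n+1 bytes literally; -1 to -127, repeat the next byte 1-n times; -128, do nothing. Below is a minimal Python decoder sketch that assumes well-formed input (a truncated stream will raise an IndexError); `packbits_decode` is a hypothetical name.

```python
def packbits_decode(src: bytes) -> bytes:
    """Decode a PackBits-compressed byte string (sketch; assumes valid input)."""
    out = bytearray()
    i = 0
    while i < len(src):
        n = src[i] - 256 if src[i] > 127 else src[i]  # header byte as signed 8-bit
        i += 1
        if n >= 0:
            out += src[i:i + n + 1]          # literal run: copy n+1 bytes
            i += n + 1
        elif n != -128:
            out += bytes([src[i]]) * (1 - n)  # replicate run: repeat one byte 1-n times
            i += 1
        # n == -128 is a no-op
    return bytes(out)
```

For example, the 15 packed bytes `FE AA 02 80 00 2A FD AA 03 80 00 2A 22 F7 AA` expand to 24 bytes: three `AA`, the literal `80 00 2A`, four `AA`, the literal `80 00 2A 22`, and ten `AA`.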
If most TIFF encoders don't compress, then is TIFF better than a BMP? Well, kind of... BMPs have variable, platform-dependent header formats. In contrast, TIFF is a well-defined format. TIFF also supports RGB, YCbCr, CMYK, and other color space models; BMP only supports RGB (stored in BGR). However, BMP uses a flat data stream, while TIFF makes you jump all over the place. Then again, if you're just storing an uncompressed RGB image, then BMP is more efficient than TIFF.
http://en.wikipedia.org/wiki/Geotiff
GeoTIFF is probably the most common image format for distribution of airphoto and satellite imagery, as well as a few other forms of geospatial raster data.
You can draft JPG, BMP, PNG, and others (which don't have an extended tagset) into crude geospatial duty by adding an external World File:
http://en.wikipedia.org/wiki/World_file
Other formats like JPEG2000
http://en.wikipedia.org/wiki/JPEG_2000
and its proprietary wavelet-based predecessors ECW and MrSID:
http://en.wikipedia.org/wiki/ECW_%28file_format%29
http://en.wikipedia.org/wiki/Mrsid
also incorporate geospatial metadata. But GeoTIFF is still the widest-supported format.
TIFF itself had some thrice-damned offspring (maternity unknown, devilspawn suspected) in the form of Adobe's proprietary but royalty-free Digital Negative (DNG) format:
http://en.wikipedia.org/wiki/Digital_Negative_%28file_format%29
TIFF's genes will be with us for a long time.
JPEG's been sleeping around too; perhaps you haven't seen the family Christmas photo with JPEG-XR (everyone still calls her "HD-Photo"):
http://en.wikipedia.org/wiki/HD_Photo
HD-Photo is not talking about her father, but a chromosome test shows a JPEG compression algorithm in a TIFF container format. Ewww.
Great feedback. I'm still laughing.
TIFF is to images like FTP is to network protocols.
FTP was the basis for email (SMTP), news (NNTP), the web (HTTP), and many other protocols. That "404 File Not Found"? Yeah, it has its basis in FTP. (If you use FTP on a command line, then you've seen those three-digit codes whenever you issue a command. "200 OK" -- yeah, that's FTP. The web stole it.)
In today's network world, FTP has some serious security limitations. Many of those limitations have carried through to email (blame FTP's weak authentication, and its derivative protocol "email", for spam) and the web (cookie theft, session hijacking, etc.). We should replace all of these with new protocols, but they are so ingrained that everyone still uses them as the basis for new protocols.
By the same token, all of TIFF's evil spawn share the same limitations. We should come up with something new, but displacing this historic image format seems more difficult than replacing the Iranian incumbent.
With regards to GeoTIFF: you can put lipstick on a pig...