I use FreeImage for image analysis because it supports loading dozens of different image file formats (and because I like the FIPL licensing). However, FreeImage does not support every format that I need. For example, it has problems with PNG files that use something other than 8-bit data. It also has problems with 16-bit PPM/PGM files and it does not support lossless JPEG compression. So for these formats (and a few others), I use my own rendering systems.
However, I seem to be coming across more odd file formats that FreeImage simply does not recognize. In these cases, I usually implement my own rendering system. For example, I recently encountered the Digital Imaging and Communications in Medicine (DICOM) file format. Wow... I used to think that JPEG was a horrible file format because it is inconsistent and cannot store true color. And TIFF is bad because you jump all over the file. And PDF is bad due to the complexity (objects reference objects reference objects...). But then comes along DICOM, and suddenly I have a newfound respect for JPEG.
Is there a doctor in the house?
Let's face it: hospitals have problems. Many hospitals and doctors' offices still use paper records, and digital records are not necessarily compatible between vendors. If you switch hospitals, then you either need to bring over your medical records or you need to start with no records on file.
So it should come as no surprise that the image file format used for CT scans, MR scans, PET, and other imaging systems appears complete on the surface, yet actually isn't.
Thinking Positively
With JPEG, PNG, and other file formats, the image usually comprises most of the file. However, many DICOM files store much more meta information than image data. Each DICOM file includes information about the patient (name, age, gender), doctor (name, address, hospital, referring physician, etc.), and the image itself (date, time, sequence number, equipment, scanning duration, relative slice position, and much more). For the medical community, this is outstanding! You will never have a case of the image being separated from the relevant medical information.
Although DICOM is capable of storing multiple images in one file, this appears to be uncommon. Instead, each picture in a series is stored in an independent DICOM file. Each file stores one image plus all of the relevant meta data. I view this as a positive thing -- it makes it very easy to send one specific image to a coworker for a second opinion. (Since a DICOM can be a multi-megabyte file, why send a Gig of images when you can send a meg?)
Unlike PNG or JPEG, DICOM has literally thousands of pre-defined meta fields. This is beneficial since it guarantees that everyone will operate with compatible meta data. The list of meta data fields reads like a committee trying to guess about every possible need -- and covering it! For example, there is no generic "comment" field. But there are fields for Patient Comments, Image Comments, Frame Comments, Study Comments, Comments on Radiation Dose, Comments on Schedule Procedure Steps, and much more.
While DICOM overflows with detailed meta data, it totally lacks flexibility. For example, let's say that a new scanner appears on the market tomorrow. The vendor wants to be compatible, so they want to use DICOM to record their medical images. The doctor, patient, and related information is still relevant. However, there is no place to store the new, scanner-specific parameters. At this point, the vendor has three options: (1) use vendor-specific fields (incompatible with other vendors), (2) submit new fields to the Medical Imaging and Technology Alliance (the committee that updates the standard almost annually), or (3) don't record the data (not useful).
Moreover, every meta data field is identified by a 4-byte numeric tag. Thus, if you don't already know what 0018:9090 means, then you are out of luck -- I have yet to find a central repository that details each standard tag and the type of data that goes with it.

Most DICOM files seem to store 12-bit or 16-bit grayscale images. I applied a coloring algorithm.
Unnecessary Procedures
The problem with DICOM really doesn't come from the plethora of meta data. Rather, it comes from the file format itself. The basic DICOM record consists of tag-type-length-data blocks. However, they are a little more complicated.
Tags. The tags actually consist of two fields. The first two bytes define the class of data element. For example 0x0010 defines patient information, 0x0028 describes the image file format, and 0x300A seems to be for radiation therapy. The next two bytes describe the specific meta data. For example, 0028:0010 and 0028:0011 are the image height and width (called "Rows" and "Columns"), 0010:21C0 is the patient's pregnancy status, and 300A:0018 is for radiation dose reference point coordinates.
Type. Called the "Value Representation" (or VR), this two-byte field defines the type of data. Most data types are pretty standard: short string, unsigned integer, date, float, etc. The only real odd-balls are the "other" fields -- other byte (OB), other word (OW), and other float (OF). These define arbitrary data lists.
Length. This is the amount of data following the tag. Unlike JPEG or TIFF, this is actually the number of bytes not including the tag-vr-length. (Excellent! Zero actually means no data! With JPEG, no data is a length of 2.)
This seems simple enough to parse, right? Except that there are a few exceptions... For example:
- All lengths are 16 bits (2 bytes) long. Except for records with a VR of OB, OW, OF, sequences (SQ), unlimited text (UT), and unknown (UN). These exceptions use 4-byte lengths.
- Tag 0002:0010 defines the transfer encoding. One encoding method specifies no VR field. In this case, there may be no VR in records following this record and all lengths for undefined VR fields are 4-bytes long. Also, if you don't know the default VR for the particular record, then you cannot parse the data. (If you don't know that 0008:0092 should be a string, then you won't be able to parse the referring physician's address.)
- Assuming you can parse the lengths, some records have a length of 0xffffffff. This means that the length is unspecified and the subsequent records are nested under this record. (Just keep reading the next records.) Eventually there will be an FFFE:E0DD tag with length zero. This tag marks the end of the nested data sequence. Oh, and these undefined length records can be nested...
- Most tags only store one piece of data, but some have complex data formats. For example, text fields may use "^", "\", or "/" to separate strings. (Doctor names use "^" as in "REIL^TODD^D.^^M.D.", image type uses "\", and addresses use "/".)
- All values are stored in little endian format. Unless Tag 0002:0010 defines big endian. In that case, 16-bit image data should be processed in big endian but all other values remain in little endian.
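To make the tag-VR-length layout and the first exception concrete, here is a minimal sketch of a header parser for the explicit-VR, little-endian case only. The function name `read_element_header` is made up for illustration. Two details are my reading of the standard rather than something stated above: the long-length VRs carry two reserved bytes between the VR and the 4-byte length, and the FFFE delimiter tags have no VR at all. It does not attempt implicit VR, big endian, or nested sequences.

```python
import struct

# VRs that use a 4-byte length (per the list above).
LONG_LENGTH_VRS = {b"OB", b"OW", b"OF", b"SQ", b"UT", b"UN"}
UNDEFINED_LENGTH = 0xFFFFFFFF  # "keep reading nested records" marker


def read_element_header(buf, offset=0):
    """Parse one explicit-VR, little-endian element header (hypothetical helper).

    Returns (group, element, vr, length, value_offset), where value_offset
    is the position of the first data byte.
    """
    group, element = struct.unpack_from("<HH", buf, offset)

    if group == 0xFFFE:
        # Assumption: item/delimiter tags such as FFFE:E0DD carry no VR,
        # only a 4-byte length.
        (length,) = struct.unpack_from("<I", buf, offset + 4)
        return group, element, None, length, offset + 8

    vr = bytes(buf[offset + 4:offset + 6])
    if vr in LONG_LENGTH_VRS:
        # Assumption: 2 reserved bytes follow the VR, then a 4-byte length.
        (length,) = struct.unpack_from("<I", buf, offset + 8)
        return group, element, vr.decode("ascii"), length, offset + 12

    # Everything else: 2-byte length directly after the VR.
    (length,) = struct.unpack_from("<H", buf, offset + 6)
    return group, element, vr.decode("ascii"), length, offset + 8
```

A caller would read `length` bytes starting at `value_offset`, then call the function again at `value_offset + length`, treating `UNDEFINED_LENGTH` as a signal to descend into nested records instead of skipping ahead.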
Frankly, the data fields should be consistent. Either always specify the type of data or require the type to be known. Making the data type optional is just going to lead to parsing problems. While I'm not a big fan of hard-coded tag numbers (0028:0031 is hard-coded as the zoom factor), they could at least embed the type of data in the tag. For example, there are 27 different data types. They could reserve 5 bits in the Tag for specifying the data type, or 1 bit for indicating 2 or 4 bytes for the length.
I should also point out that you must be able to parse the entire file. Image data is almost always found at the very end. So if you cannot parse the file then you cannot find the start of the image data.
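Once a text value has been located, the separators mentioned in the list above still have to be handled. A rough sketch, with hypothetical helper names, of splitting the two cases I am most sure about ("^" inside a name and "\" between multiple values); the image type string in the usage note is an invented sample value, not taken from a real file:

```python
def split_person_name(value):
    """Split a name on '^'; empty components are kept as empty strings."""
    return value.split("^")


def split_multi_value(value):
    """Split a multi-valued text field on the backslash separator."""
    return value.split("\\")


if __name__ == "__main__":
    print(split_person_name("REIL^TODD^D.^^M.D."))
    print(split_multi_value("ORIGINAL\\PRIMARY\\AXIAL"))  # invented sample
```

Keeping the empty component matters: in "REIL^TODD^D.^^M.D." the blank slot between "D." and "M.D." is a positional placeholder, so dropping it would shift the suffix into the wrong field.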
Wrong Prescriptions
So you finally found the image data itself (record 7FE0:0010). The image data is either stored in one big block (OB or OW), or in multiple fragments (frames).
Image data itself is even more complicated. For example, 0028:0100 defines the number of bits allocated per pixel data record (either 8 or 16). (I don't know why they use this when OB/OW also define the data size. But I like using 0028:0100 better.) 0028:0101 defines the number of bits stored, and 0028:0102 defines the position of the most significant bit (MSB). So, for example, they may allocate 16 bits per sample, use 12 bits, and have the MSB at position 14. This means bits 3-14 contain the data, and bits 0, 1, 2, and 15 are unused. (Unless there are overlays... yet more complexity.)
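The bit arithmetic in that example can be sketched in a few lines. This assumes an unsigned sample that has already been read as an integer with the correct byte order, and it ignores overlays entirely; `extract_sample` is a name made up for this illustration.

```python
def extract_sample(raw, bits_stored, high_bit):
    """Pull the stored pixel value out of an allocated sample.

    raw         -- the allocated 8- or 16-bit value as an unsigned integer
    bits_stored -- value of 0028:0101
    high_bit    -- value of 0028:0102 (position of the MSB)
    """
    low_bit = high_bit - bits_stored + 1   # 14 - 12 + 1 = 3 in the example
    mask = (1 << bits_stored) - 1          # 12 bits -> 0x0FFF
    return (raw >> low_bit) & mask


if __name__ == "__main__":
    # 16 bits allocated, 12 stored, MSB at position 14: bits 3-14 hold data.
    print(extract_sample(0xFFFF, bits_stored=12, high_bit=14))
```

With every allocated bit set, the result is 4095 (0x0FFF), since bits 0, 1, 2, and 15 are shifted or masked away.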
And what format is the data in? This is defined by Tag 0002:0010 (Transfer Syntax UID). The value is a sequence of numbers (that looks a lot like an SNMP MIB entry). For example, 1.2.840.10008.1.2 defines an implicit VR (VR field is removed) and the image is a bitmap in little endian. 1.2.840.10008.1.2.2 defines a bitmap in big endian. 1.2.840.10008.1.2.4.50 through 1.2.840.10008.1.2.4.93 are different JPEG formats (lossy and lossless, JPEG and JPEG-2000). 1.2.840.10008.1.2.5 uses a bitmap with run-length encoding while 1.2.840.10008.1.2.1.99 uses zlib compression. Exiftool identifies over 250 different possible values. (I currently support all JPEGs and bitmaps, but nothing else. I'll be adding RLE this weekend.)
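A parser typically has to branch on this value before touching the pixel data. The following is only a sketch of that dispatch, covering the handful of UIDs named above plus 1.2.840.10008.1.2.1 (explicit VR little endian, which is not mentioned above but is the plain counterpart to the implicit form). The function name and the returned labels are invented for illustration, and stripping a trailing NUL pad byte is an assumption about how the value is stored.

```python
# Exact-match UIDs; the labels are illustrative, not official names.
KNOWN_SYNTAXES = {
    "1.2.840.10008.1.2": "implicit-vr-little-endian",
    "1.2.840.10008.1.2.1": "explicit-vr-little-endian",  # not from the post
    "1.2.840.10008.1.2.2": "big-endian",
    "1.2.840.10008.1.2.5": "rle",
    "1.2.840.10008.1.2.1.99": "deflate",
}

JPEG_PREFIX = "1.2.840.10008.1.2.4."


def classify_transfer_syntax(uid):
    """Map a Transfer Syntax UID string to a coarse decoder family."""
    uid = uid.strip("\x00 ")  # assumption: value may carry a pad byte
    if uid in KNOWN_SYNTAXES:
        return KNOWN_SYNTAXES[uid]
    if uid.startswith(JPEG_PREFIX):
        suffix = uid[len(JPEG_PREFIX):]
        # .50 through .93 are the JPEG and JPEG-2000 variants.
        if suffix.isdigit() and 50 <= int(suffix) <= 93:
            return "jpeg-family"
    return "unknown"
```

Anything that comes back as "unknown" is best reported rather than guessed at, since decoding pixel data with the wrong assumptions produces garbage without any error.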
Of course, this isn't even including the 3D image information... There's an entirely different section for storing voxel data.
Feeling Better
Consistency is always a good thing. With image formats, consistency simplifies parsing, reduces potential implementation problems, and increases compatibility. While DICOM includes an amazing number of pre-defined data fields, it lacks generic expansion and the hard-coded tag identifiers restrict the total number of possible tags (there is a finite amount of expansion). DICOM's meta fields are great for today, but could be problematic in the future.
The inconsistent file format options and overly complex 0002:0010 tag mean that a parser must understand hundreds of very different variations before being able to render an arbitrary DICOM file. While I can understand the need for lossy, lossless, and 3D data storage, couldn't they have decided on a single lossless method? I mean, all lossless will look the same anyway since they are lossless. So why provide a dozen different storage methods? (Alright, DICOM dates back to 1985 so some formats may be old residues. Yet none have been obsoleted. And this doesn't explain the dozens of system-specific formats like 1.2.840.10008.5.1.4.1.1.2, which is a CT Image Storage format, or 1.2.840.10008.5.1.4.1.1.1.2, which is Digital Mammography X-Ray Image Storage for presentations as opposed to 1.2.840.10008.5.1.4.1.1.1.2.1 which is the same thing for processing.)
While DICOM is very detailed, it lacks consistency in file structure, meta data format, and image format. All of the problems with the medical community have clearly been passed into their image file format. And with all the implementation complexity, medical software is bound to be more expensive. Perhaps the medical community should consult a computer scientist before creating any future image formats. I don't prescribe medicines and they should not design file formats.
If there are errors, let me know and I will correct them.
You can find the standard at:
ftp://medical.nema.org/medical/dicom/
Section 5 covers the file format. I've been focusing on 2007 and 2009, which are practically identical:
ftp://medical.nema.org/medical/dicom/2007/07_05pu.pdf
ftp://medical.nema.org/medical/dicom/2009/09_05pu.pdf
For the long list of tags and types, see DICOM.pm from the Exiftool package (link is above).
I've currently downloaded about a hundred DICOM example files. (Many are found on the nema.org FTP server.) The example files use everything from RLE to 8- and 16-bit bitmaps, 8 and 16-bit lossless JPEGs, lossy JPEGs, and even 3D data. Most of the samples are grayscale but a few have color bitmaps.
Can you please explain the coloring algorithm you used for the grayscale DICOM images? Or give me a reference to its description? The result looks pretty good and I would like to implement it myself.
Thanks in advance.
I've implemented about a dozen different colorizing algorithms.
The torso was colored using a sinusoid with a double-circle.
(Range goes from 0 to 4*PI.)
The hand was colored using a sinusoid with a single circle.
(Range goes from 0 to 2*PI.)
The basic sinusoid algorithm is described at:
http://www.dca.fee.unicamp.br/dipcourse/html-dip/c4/s10/front-page.html
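For anyone who wants a starting point, here is a generic sinusoidal pseudocolor sketch. It is not the exact code used for the pictures above: the three evenly spaced channel phases, the 12-bit default range, and the function name are all assumptions chosen for illustration. The `cycles` parameter corresponds to the range: 1 sweeps 0 to 2*PI (single circle) and 2 sweeps 0 to 4*PI (double circle).

```python
import math


def sinusoid_color(gray, max_gray=4095, cycles=1.0):
    """Map a gray level to an (R, G, B) tuple with phase-shifted sinusoids.

    Assumption: channels are offset by 0, 2*PI/3, and 4*PI/3; other
    phase choices give different palettes.
    """
    t = max(0.0, min(1.0, gray / max_gray))      # normalize to 0..1
    theta = t * cycles * 2.0 * math.pi           # 0..2*PI per cycle
    phases = (0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0)
    return tuple(
        int(round(255 * 0.5 * (1.0 + math.sin(theta - p)))) for p in phases
    )


if __name__ == "__main__":
    for level in (0, 1024, 2048, 3072, 4095):
        print(level, sinusoid_color(level), sinusoid_color(level, cycles=2.0))
```

Because the sinusoid wraps, the darkest and brightest gray levels land on the same color with a whole number of cycles, so this kind of palette highlights local contrast rather than absolute intensity.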
The hand also looks great when colored using the standard medical coloring, but not when using weathermap or visible spectrum color scales.
-Neal
I just found your site and many interesting things about image processing.
About the FreeImage library, it can load/save 16-bit data (png, pgm/ppm) and also handle many more data formats (e.g. HDR formats). Check the latest documentation at the FreeImage home page and if it's not enough for your experiments, feel free to ask for new features using one of the FreeImage forums.
Also, keep up this good blog
Hervé Drolon
FreeImage Project Manager