Continuing Education
Monday, 8 June 2015
While error level analysis (ELA) seems like a simple enough concept, budding analysts need to understand what the algorithm does, how it works, and how to apply it. By itself, ELA highlights the various compression level potentials across an image (analogous to adding dye to a petri dish). However, the analyst needs to know what to look for in the results. There's the straightforward "significantly different" coloring, the more subtle chrominance separation (rainbowing), and other artifacts that alter the compression rate across the image.
To help with this learning curve, I developed tutorials and challenges. The tutorials describe how the algorithm works and the challenges allow people to test their knowledge. Since different teaching methods work better for different people, it is good to offer a variety of training methods.
I recently ran the stats on the FotoForensics tutorials. I checked the weekly and monthly distributions and the results were consistent: about a third of visitors to the site (35% on average) visit the tutorials page. However, only about a tenth of them actually spend more than a few seconds on the tutorial pages in a given week. The challenges average about 7%, but those users appear to work through at least one challenge puzzle.
I also looked for longer trends. A solid third (34%) of unique network addresses have spent time with the tutorials and/or challenges in the previous year. (I keep thinking: a free site where a third of the users are actually reading the training materials? WOW!)
I even see people applying what they learned. FotoForensics has been very popular with the Reddit community. A few years ago, people gave a lot of bad interpretations (e.g., "white means modified" or "color means fake" or "ELA doesn't work"). However, those ~30% of users who took the time to learn have become a dominant force at Reddit. When someone posts a link to FotoForensics, it is usually followed by someone asking what it means, and someone else giving an intelligent answer.
On one hand, this tells me that the tutorials and challenges are easy enough for users to find on the site. And about 3 in 10 users are interested enough to take the time to learn how it works. (I am open to suggestions for other possible training options that could engage with more of the other 70%.)
Unfortunately, I still see people misapplying the technology or giving really bad advice on how it works.
It's all about compression
ELA does one thing: it quantifies the lossy compression error potential across the image. It returns a map that shows the compression level across the image. It doesn't return a numerical value ("7") or a summary ("green" or "true") because different types of alterations may be considered acceptable. For example, if a picture is 95% unaltered, then would you call it real or fake? With a map of the picture, you can identify the abnormal area.
Over at Reddit, a user named tom_beale posted to "mildly infuriating" some sidewalk covers that were put back wrong. User Afterfx21 "fixed it". The compression map generated by ELA makes it easy to identify how it was "fixed".
But let's go back a moment and talk about compression...
JPEG is based on a lossy compression system. By "lossy", we mean that the decompressed data does not look exactly like the pre-compressed data. What comes out is similar, but not exactly like what went in. Since there's a little difference, it loses quality. Even saving a JPEG at "100%" will result in a little data loss; what most tools call "100%" is actually closer to 99%. The purpose of the lossy compression is to make as many repetitive zeros and small values as possible in the encoded sequence, while remaining visually similar to the source image. More repetition leads to better compression.
The lossy compression works by quantizing the values, effectively turning a smooth curve into stair steps. For example, the quantization value "3" would make the values "40 20 10 5 1" become "13 6 3 1 0". JPEG uses integer math, so the fractions left after dividing by 3 are lost.
To restore the sequence, the values are multiplied by the quantization value: "39 18 9 3 0". Each of these decoded values is close enough to its source value. When talking about pixels, the human eye is unlikely to notice any difference. (The actual JPEG encoding method is a little more complicated and includes 64 quantization values as well as some other compression steps. For much more detail about JPEG encoding, see JPEG Bitstream Bytes.)
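That quantization round trip can be sketched in a few lines of code. This is only the scalar idea, not the codec: real JPEG applies 8x8 tables of quantizers, and the variable names here are my own.

```python
# Toy quantization: divide with integer math to encode, multiply to decode.
# (A sketch of the scalar idea only; real JPEG uses 8x8 quantization tables.)

def quantize(values, q):
    return [v // q for v in values]   # integer division discards the fractions

def dequantize(values, q):
    return [v * q for v in values]    # decoding restores the scale, not the loss

source = [40, 20, 10, 5, 1]
encoded = quantize(source, 3)         # [13, 6, 3, 1, 0]
decoded = dequantize(encoded, 3)      # [39, 18, 9, 3, 0]
print(encoded, decoded)
```

The decoded values land near, but not on, the originals; that gap is the compression loss.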
Additional JPEG compression loss
If we just use one quantization value and repeatedly cycle between encoding and decoding, then the first encoding will cause data loss but the remainder will not:
- Encoding "40 20 10 5 1" with quantizer "3" generates "13 6 3 1 0". (Encoding is a division with integer math.)
- Decoding "13 6 3 1 0" with quantizer "3" generates "39 18 9 3 0". (Decoding is a multiplication.)
- Encoding "39 18 9 3 0" with quantizer "3" generates "13 6 3 1 0". (Same values.)
- Decoding "13 6 3 1 0" with quantizer "3" generates "39 18 9 3 0". (Same values.)
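A short loop makes the fixed point easy to verify. Again, this is a sketch with a single scalar quantizer of my own choosing, not real JPEG:

```python
# Repeated encode/decode with one quantizer: only the first cycle loses data.

def cycle(values, q):
    return [(v // q) * q for v in values]  # encode then decode in one step

values = [40, 20, 10, 5, 1]
history = []
for _ in range(5):
    values = cycle(values, 3)
    history.append(values)

print(history[0])  # [39, 18, 9, 3, 0] -- the first cycle loses data
# Every later cycle leaves the values untouched:
assert all(h == history[0] for h in history)
```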
If JPEG encoded RGB values, then this would be it. The first encoding would generate a little loss, but repeated encoding/decoding cycles would not. Unfortunately, JPEG does not encode RGB values. Instead, it first converts the values from RGB to YUV (an alternate color space). This conversion is lossy and causes values to shift a little. This means two things. First, JPEG cannot store true 24-bit color. Second, the values may shift a little between the first decoding and the second encoding, so the next encoding may produce values that are a little different.
But JPEG doesn't stop there. It also converts the colors from the 8x8 pixel grid into an 8x8 frequency space. This conversion uses a discrete cosine transform (DCT). When you see the word "cosine", you should be thinking "floating point values". Except that JPEG does everything with integers, so fractions get truncated. Repeatedly encoding and decoding the DCT values with integer math results in constant degradation. When combined with the quantization step, it results in significant degradation.
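To see why the truncation matters, here is a self-contained sketch using a one-dimensional orthonormal DCT on a made-up row of pixel values. (Real JPEG uses a two-dimensional 8x8 transform; this is only an illustration of floating point versus integer round trips.)

```python
import math

# 1-D orthonormal DCT-II and its inverse. With floating point the round
# trip is essentially exact; truncating to integers at each step is not.

def dct(xs):
    n = len(xs)
    return [
        (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
        * sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
              for i, x in enumerate(xs))
        for k in range(n)
    ]

def idct(cs):
    n = len(cs)
    return [
        sum((math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
            * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k, c in enumerate(cs))
        for i in range(n)
    ]

block = [52, 55, 61, 66, 70, 61, 64, 73]   # made-up pixel row

# Pure floating point: the transform itself loses nothing measurable.
exact = idct(dct(block))
assert all(abs(a - b) < 1e-6 for a, b in zip(exact, block))

# Now truncate to integers after each transform, as an integer-only codec does:
lossy = [int(v) for v in idct([int(v) for v in dct(block)])]
print(lossy)  # close to the source row, but the fractions are gone
```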
I say that the compression "constantly" degrades, but it really does stop eventually. With JPEG encoding, the first save at a given quality level (e.g., save as 80%) causes the most data loss. Subsequent decode and re-encode cycles at the same quality level will result in less and less loss. The first save causes the most loss. The second causes some loss, but not as much as the first time. The third save causes less loss than the second, etc. You would probably have to resave a JPEG over a dozen times to see it normalize, but it should eventually stop degrading, unless you use Photoshop. With Adobe products, JPEGs may take thousands of resaves to normalize, and they will look very distorted.
Detecting loss
The impact from this lossy compression is detectable. For this example, I'll use a photo that I took yesterday...

[Image and ELA map: camera original.]

[Image and ELA map: resaved with Photoshop CS5 at "high" quality (first resave).]

[Image and ELA map: the first resave, saved again with Photoshop CS5 at "high" quality (second resave).]

With this picture, the second resave is only a little different from the first resave. However, the amount of change between the first and second resaves really depends on the picture. The only consistency is that the second resave will not change more than the first resave.
Because nothing else was altered between saves, the first and second resaves are very similar. The first save removed most of the artifacts and the second save removed a few artifacts. (If you look in the ELA map at the cup's lid, you may notice that some of the small white squares are gone in the second resave.)
While the picture's content may not be very exciting, it does have a couple of great attributes:
- There are large areas of mostly white and mostly black. Solid colors compress very efficiently. As a result, the white on the lid, white sunlight on the floor, and part of the black border on the laptop's monitor all appear solid black under ELA. These areas were so efficiently compressed in the original image that they didn't change between resaves.
- There are visible high-contrast edges. For example, the white cup against the brown table, black laptop against the brown table, and the ribs in the dark brown chairs against the light brown wall. All of these edges have similar ELA intensities.
- There are lots of mostly flat surfaces. The white cup, the lid, most of the black laptop, the wall in the background, and even the low-contrast table (where the sunlight is not bringing out details). These are all surfaces and they are all at the same ELA intensity.
- There are textured surfaces, denoted as small regions with high-detail patterns: the text on the cup, the computer screen, and the keyboard letters are visually similar (white/black) and have similar intensities.
When a picture is edited, the modified areas are likely at a different compression level than the rest of the picture. This is how we know that the sidewalk picture (beginning of this post) was digitally altered. We do not make the determination by saying "white means edited". Instead, we identify that a section of each sidewalk cover is inconsistent with the rest of the picture. This inconsistency permits identifying the edit.
The thing to remember is that ELA maps out the error level potential -- the amount of expected loss during a resave. If a picture is resaved too many times, then the compression level becomes normalized. At that point, subsequent resaves at the same quality level will not alter the picture. This results in a black, or near black ELA map.
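The resave-error idea can be sketched with a toy one-dimensional quantizer (my own illustration, not the FotoForensics implementation): values that already survived a save have nothing left to lose on a resave, while freshly spliced-in values still do.

```python
# Toy ELA on a 1-D "image": per-value error potential of one more resave.

Q = 3  # made-up quantizer standing in for a JPEG quality level

def resave(values):
    return [(v // Q) * Q for v in values]   # one encode/decode cycle

original = [40, 20, 10, 5, 1, 35, 22, 17]
saved = resave(original)          # every value is now a multiple of Q

edited = list(saved)
edited[5:8] = [50, 26, 13]        # splice in fresh, never-compressed values

# ELA: how much would each value change if we saved one more time?
ela = [abs(r - v) for r, v in zip(resave(edited), edited)]
print(ela)  # [0, 0, 0, 0, 0, 2, 2, 1]
```

The untouched region reports zero error potential (fully normalized), while the spliced region still has loss to give up; that inconsistency is what an analyst looks for.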
[Image and ELA map: original resaved at a low quality.]
Unfortunately, it is still common to see people who have not read the tutorials claim that ELA does not work, uploading a low quality picture as their proof. Alternately, they upload a picture that has undergone global modifications (e.g., scaling or recoloring) that change all pixel values, resulting in a higher/whiter ELA compression map. But even in these cases, ELA still functions properly -- it still generates a topology map that represents the potential compression rate across the picture. This may not be useful for identifying whether a picture was spliced, but it is useful for detecting what happened during the last save.
Bad Analysis
A few days ago, a group called "Bellingcat" published a report where they tried to do some digital photo forensics. They were trying to show that some satellite photos were digitally altered. They used FotoForensics to evaluate the picture, but unfortunately ended up misinterpreting the results.
In Bellingcat's analysis, they claim that the picture was altered because the five regions (A-E) look different. However, they failed to compare similar attributes:
- Region "A" shows clouds and is uniformly white. Solid colors compress really well, so the ELA result is solid black. This indicates that the uniformly colored region is already optimally compressed.
- Region "E" has a little noise surrounded by black in the ELA -- just like the lid in the coffee cup example. This is where the colors blend from solid white to near white.
- Region "C" has a consistent texture. It shows land and buildings.
- Region "D" has a different texture from C. It is a smoother surface. Clouds with no texture are relatively smooth and compress better than complex textures. This results in the expected lower error level potential. This area also appears consistent with the lower-left region of "C", where the clouds partially cover the land.
- Region "B" has... well, I see no difference between B and D.
Using ELA, we cannot determine the authenticity of this picture: we cannot tell if it is real, and we cannot tell if it is fake. We can only conclude that this is a low quality picture and that the black text on white annotations were added last. If there was a higher quality version of this picture (without the annotations), then we would have a better chance at detecting any potential alterations.
Everyone's a critic
A number of people have pointed out flaws in the Bellingcat analysis. A forensic examiner in Australia used different tools and methods than I did and found other inconsistencies in the Bellingcat findings. I think Myghty has one of the most thorough debunkings of the Bellingcat report.

Unfortunately, other forensic experts chose to blame the tool rather than the uneducated users (yes Bellingcat, I'm calling you uneducated). For example, Der Spiegel quoted German image forensics expert Jens Kriese as saying:
From the perspective of forensics, the Bellingcat approach is not very robust. The core of what they are doing is based on so-called Error Level Analysis (ELA). The method is subjective and not based entirely on science. This is why there is not a single scientific paper that addresses it.

The ignorance spouted by Kriese offends me. In particular:
- Kriese is correct that the results from the ELA system at FotoForensics are subjective -- it is up to the analyst to draw a conclusion from the generated compression map. However, this is no different than requiring a human to look through a microscope to identify cancer in a tissue sample. The scientific method is both objective and subjective. Tools should be repeatable and predictable -- that is objective. ELA generates a consistent, repeatable, and predictable map of JPEG's lossy compression potential.
In order to interpret results, we typically use one of two types of reasoning: deductive and inductive. Deductive is objective, while inductive is subjective. Inductive reasoning is often used for predicting, forecasting, and behavioral analysis. ("Did someone alter this picture?" or "did a camera generate this?" are behaviors.)
As an example, if you have ever broken a bone then you likely had an X-ray. The X-ray permits an analyst to view details that would otherwise go unseen. The X-ray is objective, not subjective. However, the X-ray image does not draw any conclusions about the subject matter. When the X-ray technician says, "I cannot tell you that it is broken because a diagnosis requires a doctor", then you enter the realm of subjective. (This is why you can ask for a "second opinion" -- opinions are subjective.) Similarly, ELA acts like an X-ray, permitting unseen attributes to become visible. The interpretation of the ELA results is not automated and requires a human to make a subjective determination based on specific factors (inductive reasoning).
- Identifying artifacts is part of the scientific process. In fact, it's the first step: observation. Given that ELA works consistently and predictably, it can also be used to test a hypothesis. Specific tests include: Do similar edges have similar ELA intensities? Do similar surfaces appear similar? And do similar textures appear similar? If the hypothesis is that the picture was altered and the ELA generates consistent error level potentials, then it fails to confirm the hypothesis. An alternative is to hypothesize that the picture is real and see an inconsistent ELA image. Inconsistency would prove the hypothesis is false, enabling an analyst to detect alterations.
For Kriese to question whether ELA is based on science, or to criticize the subjective portion of the evaluation, makes me question his understanding of the scientific method.
- Kriese says that "there is not a single scientific paper" covering ELA. Clearly Kriese has not read my blog. Four years ago I wrote about a Chinese researcher who plagiarized my work and had it published in a scientific journal: Lecture Notes in Computer Science, 2011, Volume 6526/2011, 1-11.
ELA is also mentioned in the "Digital Photo Forensics" section of the Handbook of Digital Imaging (John Wiley & Sons, Ltd). I wrote this encyclopedia's section and it was technically reviewed prior to acceptance.
In fact, ELA was first introduced in a white paper that was presented at the Black Hat Briefings computer security conference in 2007. Since computer forensics is part of computer security, this technology was presented to peers.
That makes three scientific papers that cover ELA. I can only assume that Kriese did not bother looking anything up before making this false claim.
- The entire argument, that research is not scientific unless it is published in a scientific paper, is fundamentally flawed. I have multiple blog entries about various problems with the academic publication process. Journal publication is not timely, authors often leave out critical information necessary to recreate or verify results, papers typically lack readability, trivial alterations are considered novel, and papers frequently discuss positives and omit limitations.
There are also significant flaws with the peer review process. Peer reviews often dismiss new discoveries when they conflict with the peer's personal interests. And if peer review actually worked, then why are plagiarism, false reporting, retractions, and even fake peer reviews so prevalent?
In addition, many companies have proprietary technologies that have not been publicly published. This does not mean that the technologies are unscientific. It only means that the details are not public. (In the case of ELA, the details are public.)
It is extremely myopic for Kriese to (1) believe that something is only scientific if it is published, and (2) attribute more credibility to published science articles than they deserve.
Hany Farid was quoted as saying:

The reliance on error level analysis is fatally flawed as this technique is riddled with problems that mis-characterize authentic images as altered and failed to detect alterations.

As I have repeatedly stated, the automated portion of ELA does not "detect" anything. Detection means drawing a conclusion. ELA highlights artifacts in the image, explicitly quantifies the JPEG error level potential across the image, and does it in a provable, repeatable, predictable way. The resulting compression map generated by ELA is deterministic, idempotent, and independent of personal opinion.
I also find it a little ironic that Farid's statements, "mis-characterize authentic images as altered" and "failed to detect alterations", can be explicitly applied to his own "izitru" and "FourMatch" commercial products. Unless the picture is a camera original, izitru will report that it could be altered. In effect, virtually everything online could be altered.
Both Kriese and Farid are correct that the Bellingcat report is bogus. However, they both incorrectly blame the problem on ELA. It's not the tool that is in error, it's the authors of the Bellingcat report.
Update 2015-06-09: Shortly after I informed Der Spiegel of the flaws in their coverage of Bellingcat's report, SputnikNews reported an apology from Der Spiegel. (They did not link to Der Spiegel, so I cannot authenticate the apology.) They stated that "yet to be verified explosive news, is exactly what Der Spiegel failed to do at least twice during the initial coverage of the Bellingcat MH17 report" and concluded by saying, "Der Spiegel is on its way back to quality journalism." I can only hope that they continue to make fact checking part of their editorial process.
See one, do one, teach one
I do not believe it is possible to teach everyone. Some people have no incentive to learn, while others have ingrained beliefs that are personally biased or based on false premises. However, this does not mean that I will stop trying to help those who want to learn.

Occasionally I debunk algorithms published in scientific journals. In the near future, I'll cover a widely deployed forensic algorithm -- one that was published in a peer-reviewed journal. This algorithm is used by many forensic analysts and even taught in a few classes. But it is so unreliable that it has virtually no practical value.