Lately I have been working on a few new image analysis tools. One of them is based on the "copy-move" algorithm (PDF).
About Copy-Move
The algorithm is kind of neat: a discrete cosine transform (DCT) is used to create a fingerprint over a region, such as a 16x16 square. A DCT is computed for every such region in the picture (that's a lot of DCTs!) and then they are sorted so that identical fingerprints end up next to each other. The basic concept: duplicate DCTs suggest a copy-paste since they have the same results. However, duplicates can also happen by chance. So, the algorithm tracks the vector (direction and length) between all duplicate DCTs. If there are enough duplicates with the same vector, then we can conclude that it is a copy-paste; a region in the picture was cloned to another region in the same picture.
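To make the steps above concrete, here is a minimal sketch in Python. It is not the tool described in this post, and it makes several assumptions for the sake of a small, runnable example: a 4x4 block instead of 16x16, a naive unnormalized DCT, and made-up values for the quantization step (`QUANT`) and the "enough duplicates" threshold (`MIN_PAIRS`). All function names are hypothetical.

```python
import math
import random
from collections import Counter

BLOCK = 4        # illustrative block size; the post talks about 16x16 regions
QUANT = 8.0      # assumed quantization step for the fingerprint
MIN_PAIRS = 10   # assumed threshold for "enough" duplicates with one vector

# Precomputed cosine table for a naive (unnormalized) DCT-II.
COS = [[math.cos(math.pi * (2 * i + 1) * k / (2 * BLOCK)) for i in range(BLOCK)]
       for k in range(BLOCK)]

def fingerprint(img, x0, y0):
    """Quantized 2-D DCT of the BLOCK x BLOCK region with top-left (x0, y0)."""
    coeffs = []
    for u in range(BLOCK):
        for v in range(BLOCK):
            s = 0.0
            for y in range(BLOCK):
                for x in range(BLOCK):
                    s += img[y0 + y][x0 + x] * COS[u][y] * COS[v][x]
            coeffs.append(round(s / QUANT))
    return tuple(coeffs)

def find_clones(img):
    """Return {shift vector: pair count} for vectors seen at least MIN_PAIRS times."""
    h, w = len(img), len(img[0])
    # One fingerprint per block position, sorted so duplicates become neighbors.
    entries = sorted(
        (fingerprint(img, x, y), x, y)
        for y in range(h - BLOCK + 1)
        for x in range(w - BLOCK + 1)
    )
    votes = Counter()
    for (fp_a, xa, ya), (fp_b, xb, yb) in zip(entries, entries[1:]):
        if fp_a == fp_b:
            dx, dy = xb - xa, yb - ya
            if dx < 0 or (dx == 0 and dy < 0):
                dx, dy = -dx, -dy  # treat a vector and its negation as the same
            votes[(dx, dy)] += 1
    return {vec: n for vec, n in votes.items() if n >= MIN_PAIRS}

def make_test_image(size=24, seed=0):
    """Random texture with an 8x8 patch cloned from (2, 2) to (14, 12)."""
    rng = random.Random(seed)
    img = [[rng.randrange(256) for _ in range(size)] for _ in range(size)]
    for y in range(8):
        for x in range(8):
            img[12 + y][14 + x] = img[2 + y][2 + x]
    return img

clones = find_clones(make_test_image())
```

On the synthetic image, the cloned patch shows up as many block pairs that all share the shift vector `(12, 10)`, while any chance duplicates fall below the threshold. A real picture needs a larger block and a carefully chosen quantization, since compression and noise keep copied regions from matching exactly.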
As cool as it is, the algorithm has some big limitations. (Those are what I was fixing by applying additional algorithms and tweaking how the main algorithm works.)
John Graham-Cumming has an implementation of the algorithm. While this is a good proof-of-concept, it has some serious limitations. For example, it allocates over 1032 BYTES PER PIXEL. A small picture that is 300x300 will require more than 92,880,000 bytes (92 Megs)! Small pictures take minutes to analyze. Medium pictures take hours. Large pictures? Segfault due to not enough memory. (While copymove.c is licensed under the GPL, I used no part of this code. My implementation is based on Fridrich's paper with my own optimizations.)
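The memory figure is simple arithmetic. The sketch below assumes that the allocation scales linearly with the pixel count at the quoted 1032 bytes per pixel; the function name and the 3000x2000 example size are illustrative only.

```python
BYTES_PER_PIXEL = 1032  # figure quoted above for copymove.c

def estimated_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Estimated allocation, assuming memory grows linearly with pixel count."""
    return width * height * bytes_per_pixel

small = estimated_bytes(300, 300)      # the 300x300 example from the text
large = estimated_bytes(3000, 2000)    # hypothetical 6-megapixel photo
```

Under that assumption, a 6-megapixel photo would need roughly 6.2 GB, which is consistent with large pictures running out of memory.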
Busting Myths
I was trying my implementation against a test suite of images, looking for any corner issues. What I found was... surprising.
On September 15, 2000, reporter Dan Rodricks wrote in The Baltimore Sun about a boat that had been bisected by a channel marker. In 2007, the TV show MythBusters tested whether a boat traveling at 25 miles per hour could cause that kind of damage; they showed that it could not happen (myth busted).
The picture above claims to show the boat bisected by the pole. But there are some problems:
- The accident reportedly happened around 2:00am. The picture shows bright daylight. Did they really wait most of the day without touching the boat? And in six hours, the tide would have changed by as much as two feet. But there is no sign of scraping on the pillar.
- The center of the boat appears to be under water, but it does not look flooded. Perhaps it is just the angle...
- Is the pillar made out of wood or metal? If it is wood, then where is the damage to the pillar? If it is metal (as reported), then wouldn't it be even a little dented?
- And speaking of the pillar... The seats have very visible shadows, but I don't see a shadow from the pillar onto the boat.
However, observations are only speculations. Let's use the image clone detection tool... (Mouse over the image to see the analysis!)
The analysis shows a large region that is cloned between the red and blue sections. This is a copy-and-paste in the image.
The algorithm normally uses a 16x16 DCT, leading to a very tight alignment with few false matches. However, it can miss things, like matches that are 15x15 or smaller:
The small matches at the top are spurious. However, the big matches in the water below the boat are definite: the water has been cut-and-pasted. Now that you know where to look, you can actually see rectangles where the waves stop abruptly -- the water is fake. In fact, the reflection of the pillar in the water was created through a series of replications.
Given that the water is fake, we need to question whether the boat and the rest of the picture are fake... or did someone try to cover up something in the water? In any case, the photo is not real because the water has been digitally manipulated.
Tracing the Origins
This report has all of the markings of an urban legend: a great story, a fake photo, and implausible physics. Adding to this, the first citation that I found was incorrect. (It identifies the source as "Baltimore Sun, 9/18/2000", but it was actually 9/15/2000.)
However, I did get a reply from the author of the report (Dan Rodricks, columnist). He was very kind and sent me the entire text. Unfortunately, when I asked him for the source of the story, he wrote, "Do some legwork. Contact the Maryland DNR police." The Maryland DNR police have not responded to my query. Right now, I don't know if the incident actually happened, or if the picture is actually from the incident. (It could be an artist's recreation, a sanitized photo, or something else.)
In any case, the picture is fake, the detection algorithm definitely does work, and all of this is very cool. (As an aside, copymove.c took 20 minutes and found some of these matches when using a threshold of 1. My tool took 18 seconds and found many more matches.)
After analysing and implementing it, it would help the student and research community. I promise you it will be used only for research reference purposes. Please do the needful in this regard.
It's a kind request.
While I cannot release my source code, I did a follow-up posting where I described many of the optimizations that made this possible.
http://www.hackerfactor.com/blog/index.php?/archives/308-Send-In-The-Clones.html