One of my favorite TV shows is House. I'm not actually interested in the pseudo-medical aspect or the constant stream of psychological manipulation. Instead, I like the main character (Gregory House, played by Hugh Laurie). His philosophy on life seems spot on: Everybody lies.
Audio Forensics
I've been doing some work on audio analysis over the last few years. Audio forensics is a very different field from other types of digital forensics. For example, digital image analysis is such a new field that there are only a handful of researchers, and nearly all are in academia. And while there are a few linguistic analysis techniques that have been around for decades, it is relatively rare to see brand new things in this field. (As far as I know, I'm the only person to dive into profiling physical attributes based on keyboard usage.)
Exploring audio forensics has been a very interesting change. Unlike linguistic and image analysis, everyone and their dog has tried audio analysis. There are algorithms and techniques for doing everything from enhancing signals to extracting information. Very few algorithms need to be built from scratch.
Having said that, As-Sahab released two audio recordings of "Osama Bin Laden" in the last two days. I've quoted the name because it is not really his voice. In fact, the speech does not even sound like something written by Bin Laden. (Kudos to Laura Mansfield for distributing the audio and video of both recordings.)
Audio Analysis
The audio quality of these latest recordings is very poor. However, it is not poor due to a bad recording. It is poor because someone inserted a bunch of digital noise into the audio stream.
Here are two views of Bin Laden recordings that show the frequency ranges for the recordings. This first sample comes from the 7-Sept-2007 video message. While most of the video does not feature the real Bin Laden's voice, the first two minutes really are his voice. (But the first two minutes do not mention current events and were likely recorded in 2004 since the video matches the audio and can be traced to 2004.)
Notice how the 2007 sample has clean gaps between phonemes, words, and sentences. There is very little ambient noise in this recording.
In contrast, here is a sample from Wednesday's recording:
In this sample, the noise in the recording is obvious.
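For readers who want to produce this kind of frequency view themselves, here is a minimal sketch of a spectrogram in plain Python. It is not the tool used for the views above. The function name `spectrogram`, the frame size, the hop, and the 8 kHz sample rate are all assumptions for illustration, and the input is a synthetic 1 kHz tone rather than either recording.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Return one magnitude spectrum (list of floats) per overlapping frame."""
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT over the non-negative frequency bins only.
        spectrum = []
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames

SAMPLE_RATE = 8000  # Hz, assumed for this demo only
FRAME_SIZE = 64

# Synthetic stand-in for audio: a pure 1 kHz tone.
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(512)]
spec = spectrogram(tone, frame_size=FRAME_SIZE, hop=32)

# Strongest frequency bin in each frame, converted to Hz for the first frame.
peak_bins = [max(range(len(s)), key=s.__getitem__) for s in spec]
peak_hz = peak_bins[0] * SAMPLE_RATE / FRAME_SIZE
```

Each row of `spec` is one time slice and each column is one frequency bin. In a clean recording, the rows that fall in the gaps between words should be close to zero across all bins. In a noisy one they are not.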
Now, let's compare the noise with artificial white noise.
In Gaussian white noise, frequency patterns appear to be random. The Bin Laden sample does not appear to be random noise. Instead, it has larger clusters of no noise (white patches) and patches of all noise (chunks of adjacent samples with the same frequency).
- If it is a Gaussian algorithm, then it is a poor implementation. (I kind of doubt that it is steganography.) A Gaussian algorithm would spread noise all over the place, with no large chunks of silence and no large chunks of solid noise. (The noise pattern should appear uniformly distributed.)
- If this were real ambient noise, then it should interfere with all of the speaker's voice pattern (but it doesn't) and it probably should not be all over the spectrum since the recording device and acoustics limit the spectrum.
- If the noise were natural or mechanical in nature, then it should be restrictive in the frequency range. Also, natural noise should either appear Gaussian or periodic -- this is neither.
- Finally, the ambient noise in the Bin Laden recording does not seem to interfere with the audio speech -- it appears to have been added in and not naturally combined. Moreover, if they had merged the noise and not added it in, then the voice would have been unintelligible.
(NOTE: I do not claim to be an audio expert. Feel free to disagree with me about how the noise was combined or generated. But please provide evidence and not just "you're wrong.")
Alright, so the background noise does not appear to be natural, uniform, frequency-limited, periodic, or blended with the speaker's voice. This leaves digitally added noise.
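One rough way to put a number on "patchy versus evenly spread" noise is to compare the loudness of consecutive short frames. The sketch below is only an illustration on synthetic data. The `burstiness` measure (standard deviation of per-frame RMS divided by its mean) and the frame size are my assumptions, not a standard forensic test, and the "patchy" signal is a made-up alternation of silence and loud noise.

```python
import math
import random

def frame_rms(samples, frame_size=200):
    """RMS level of each consecutive, non-overlapping frame."""
    return [math.sqrt(sum(x * x for x in samples[i:i + frame_size]) / frame_size)
            for i in range(0, len(samples) - frame_size + 1, frame_size)]

def burstiness(samples, frame_size=200):
    """Coefficient of variation (std / mean) of the per-frame RMS levels."""
    levels = frame_rms(samples, frame_size)
    mean = sum(levels) / len(levels)
    variance = sum((v - mean) ** 2 for v in levels) / len(levels)
    return math.sqrt(variance) / mean

rng = random.Random(0)  # fixed seed so the demo is repeatable

# Steady Gaussian white noise: every frame is about equally loud.
steady = [rng.gauss(0, 1) for _ in range(20000)]

# Hypothetical patchy noise: blocks of loud noise alternating with silence.
patchy = []
for block in range(100):
    loud = block % 2 == 0
    patchy.extend(rng.gauss(0, 1) if loud else 0.0 for _ in range(200))

steady_score = burstiness(steady)
patchy_score = burstiness(patchy)
```

Steady Gaussian noise should score close to zero because every frame has nearly the same level, while noise that comes in clusters scores much higher. A real test would first have to separate the noise from the speech, which this sketch does not attempt.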
So why would someone intentionally add in digital noise? They know that people will analyze the audio and try to match the voice print to known samples of the real Bin Laden. Since they know the samples won't match, they attempted to introduce enough noise to defeat the analysis. And while their attempt probably will defeat some analysis methods, other analysis techniques are not impacted by this approach.
An analysis match cannot confirm that it is the real Bin Laden, even though it is a strong indicator. In contrast, an analysis miss can conclusively identify an impersonator. In the case of these recordings, some vocal attributes from the real Bin Laden should be present, but none are present. For example, even though there is noise in the stream, the biometric attributes related to the speaker are intact. These can be used to compare vocal attributes from a known subject with the unknown subject. In this case, there is no match. The absence is not due to the random noise inserted into the stream; this is not the voice of the real Bin Laden.
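To illustrate how a speaker attribute can survive added noise, here is a small sketch that estimates pitch (fundamental frequency) by autocorrelation. Pitch is only an assumed stand-in. I have not said which attributes were compared, and real speaker comparison relies on many more features than this. The function name, the 80 to 400 Hz search range, the sample rate, and the synthetic "voice" are all hypothetical.

```python
import math
import random

def estimate_pitch_hz(samples, sample_rate, min_hz=80, max_hz=400):
    """Return the pitch whose period (lag) has the highest autocorrelation."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the signal with itself at this lag.
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

SAMPLE_RATE = 8000   # Hz, assumed for this demo only
TRUE_PITCH = 125.0   # Hz, so one period is exactly 64 samples

# Synthetic "voiced" signal: a fundamental plus five harmonics.
clean = [sum(math.sin(2 * math.pi * TRUE_PITCH * k * n / SAMPLE_RATE)
             for k in range(1, 7))
         for n in range(2000)]

# The same signal with additive Gaussian noise mixed in.
rng = random.Random(1)
noisy = [x + rng.gauss(0, 0.5) for x in clean]

clean_pitch = estimate_pitch_hz(clean, SAMPLE_RATE)
noisy_pitch = estimate_pitch_hz(noisy, SAMPLE_RATE)
```

Additive noise is largely uncorrelated from one sample to the next, so it averages out in the autocorrelation while the periodic structure of the voice reinforces itself. That is one reason why simply layering noise on top of a recording does not hide every characteristic of the speaker.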
Moreover, if Al Qaeda wanted to prove that it was the real voice of Bin Laden, then they would have tried to make the audio stream as clean as possible for a positive identification. This would have made a much more powerful statement than any content he spoke. However, with the missing vocal qualities, added noise, and intentional attempt to deter confirmation, they have only managed to make one statement: Bin Laden's voice has not been heard since 2004.
Speech Oddities
While the audio clearly shows manipulation (noise was added), the speech itself has oddities that are inconsistent with the real Bin Laden. For example, in the 19-March-2008 recording he says:
"How it saddens us that you target our villages with your bombing: those modest mud villages, which have collapsed onto our women and children. You do that intentionally, and I am witness to that."
and
"Although our tragedy in your killing of our women and children is a very great ones..."
The real Bin Laden was consistent with his rhetoric. The attack on 9/11 did target women and children and non-military installations because, as Al Qaeda has frequently said, there are no innocent people in this war. So for him to change and start addressing the plight of women and children is inconsistent with his speeches up through 2004.
The real Bin Laden is very eloquent. Until he vanished in late 2004, he said many things, but he never outright lied. In the 19-March-2008 recording, the voice lies. The voice said:
"And I bring your attention to a more telling matter, which is that despite your publishing of the insulting drawings, you haven't seen any reaction from the one and a half billion Muslims"
This is provably false. When the cartoons were first published, there were massive riots. People died. And death threats were made against the cartoonists and the publishers. When the cartoons were republished last month, they led to more condemnation and riots. Yet, he says "haven't seen any reaction". This is a lie. Since the real Bin Laden does not directly lie, this voice is not the real Bin Laden.
Flaws and Foes
Good try, As-Sahab. Better luck next time. As for constructive criticism, As-Sahab should teach their English translator not to use contractions (an informal writing style) in the speech subtitles (a formal forum). Meanwhile, there has been no evidence of Bin Laden being seen or heard since 2004. Bin Laden is still as alive as Elvis.
The false representation of Bin Laden is not limited to As-Sahab. There have been recent reports from the White House and CIA that claim the voice is authentic. I don't know how they reached their conclusion, but I do disagree with their findings. As the fictional Dr. Gregory House says, "Everybody lies."
(As an aside, I usually use linguistic, audio, and other forensic techniques on other projects. However, the feedback I have received from my blog postings has turned terrorist media analysis into some kind of hobby. I'm hoping to spend more time on other fun ventures and less time analyzing propaganda from Al Qaeda.)