One Bad Apple
Sunday, 8 August 2021
My in-box has been flooded over the last few days about Apple's CSAM announcement. Everyone seems to want my opinion since I've been deep into photo analysis technologies and the reporting of child exploitation materials. In this blog entry, I'm going to go over what Apple announced, existing technologies, and the impact on end users. Moreover, I'm going to call out some of Apple's questionable claims.
Disclaimer: I'm not an attorney and this is not legal advice. This blog entry includes my non-attorney understanding of these laws.
The Announcement
In an announcement titled "Expanded Protections for Children", Apple explains their focus on preventing child exploitation.

The article starts with Apple pointing out that the spread of Child Sexual Abuse Material (CSAM) is a problem. I agree, it is a problem. At my FotoForensics service, I typically submit a few CSAM reports (or "CP" -- a photo of child pornography) per day to the National Center for Missing and Exploited Children (NCMEC). (It's actually written into Federal law: 18 U.S.C. § 2258A. Only NCMEC can receive CP reports, and 18 U.S.C. § 2258A(e) makes it a felony for a service provider to fail to report CP.) I don't permit porn or nudity on my site because sites that permit that kind of content attract CP. By banning users and blocking content, I currently keep porn to about 2-3% of the uploaded content, and CP at less than 0.06%.
According to NCMEC, I submitted 608 reports to NCMEC in 2019, and 523 reports in 2020. In those same years, Apple submitted 205 and 265 reports (respectively). It isn't that Apple doesn't receive more pictures than my service, or that they don't have more CP than I receive. Rather, it's that they don't seem to notice and, therefore, don't report.
Apple's devices rename pictures in a way that is very distinct. (Filename ballistics spots it really well.) Based on the number of reports that I've submitted to NCMEC, where the image appears to have touched Apple's devices or services, I think that Apple has a very large CP/CSAM problem.
[Revised; thanks CW!] Apple's iCloud service encrypts all data, but Apple has the decryption keys and can use them if there is a warrant. However, nothing in the iCloud terms of service grants Apple access to your pictures for use in research projects, such as developing a CSAM scanner. (Apple can deploy new beta features, but Apple cannot arbitrarily use your data.) In effect, they don't have access to your content for testing their CSAM system.
If Apple wants to crack down on CSAM, then they have to do it on your Apple device. This is what Apple announced: Beginning with iOS 15, Apple will be deploying a CSAM scanner that will run on your device. If it encounters any CSAM content, it will send the file to Apple for confirmation and then they will report it to NCMEC. (Apple wrote in their announcement that their staff "manually reviews each report to confirm there is a match". They cannot manually review it unless they have a copy.)
While I understand the reason for Apple's proposed CSAM solution, there are some serious problems with their implementation.
Problem #1: Detection
There are different ways to detect CP: cryptographic, algorithmic/perceptual, AI/perceptual, and AI/interpretation. Even though there are lots of papers about how good these solutions are, none of these methods are foolproof.

The cryptographic hash solution
The cryptographic hash solution uses a checksum, like MD5 or SHA1, to match against known images. If a new file has the exact same cryptographic checksum as a known file, then it is very likely byte-for-byte identical. If the known checksum is for known CP, then a match identifies CP without a human needing to review the match. (Anything that reduces the amount of these disturbing pictures that a human sees is a good thing.)
In 2014 and 2015, NCMEC stated that they would give MD5 hashes of known CP to service providers for detecting known-bad files. I repeatedly begged NCMEC for a hash set so I could try to automate detection. Eventually (about a year later) they provided me with about 20,000 MD5 hashes that match known CP. In addition, I had about 3 million SHA1 and MD5 hashes from other law enforcement sources. This might sound like a lot, but it really isn't. A single bit change to a file will prevent a CP file from matching a known hash. If a picture is simply re-encoded, it will likely have a different checksum -- even if the content is visually the same.
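In code, this kind of detection is little more than a set lookup, which is also why it is so brittle. A minimal sketch (the hash value below is a placeholder, not real data):

```python
import hashlib

# Placeholder hash set. In practice, this would be loaded from the
# MD5/SHA1 lists provided by NCMEC or law enforcement.
KNOWN_BAD_MD5 = {
    "d41d8cd98f00b204e9800998ecf8427e",  # placeholder value only
}

def md5_of_file(path: str) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_known_bad(path: str) -> bool:
    # Only an exact, byte-for-byte copy matches. A single flipped bit,
    # a resize, or a re-encode produces a completely different digest.
    return md5_of_file(path) in KNOWN_BAD_MD5
```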
In the six years that I've been using these hashes at FotoForensics, I've only matched 5 of these 3 million MD5 hashes. (They really are not that useful.) In addition, one of them was definitely a false-positive. (The false-positive was a fully clothed man holding a monkey -- I think it's a rhesus macaque. No children, no nudity.) Based just on the 5 matches, I am able to theorize that 20% of the cryptographic hashes were likely incorrectly classified as CP. (If I ever give a talk at Defcon, I will make sure to include this picture in the media -- just so CP scanners will incorrectly flag the Defcon DVD as a source for CP. [Sorry, Jeff!])
The perceptual hash solution
Perceptual hashes look for similar picture attributes. If two pictures have similar blobs in similar areas, then the pictures are similar. I have a few blog entries that detail how these algorithms work.
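For readers who want to see the general idea in code, here is a bare-bones "average hash" -- one of the simplest perceptual hash algorithms, and nowhere near as sophisticated as PhotoDNA or NeuralHash; it is only a sketch of the concept:

```python
from PIL import Image  # Pillow

def average_hash(path: str, size: int = 8) -> int:
    """Shrink, grayscale, then set one bit per pixel: 1 if brighter than the mean."""
    img = Image.open(path).convert("L").resize((size, size), Image.LANCZOS)
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits  # a 64-bit fingerprint of the picture's coarse structure

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes; a small distance means similar pictures."""
    return bin(a ^ b).count("1")
```

Because the hash encodes coarse brightness structure rather than exact bytes, a resized or re-encoded copy of a picture lands within a few bits of the original -- which is exactly what a cryptographic checksum cannot do.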
NCMEC uses a perceptual hash algorithm provided by Microsoft called PhotoDNA. NCMEC claims that they share this technology with service providers. However, the acquisition process is complicated:
- Make a request to NCMEC for PhotoDNA.
- If NCMEC approves the initial request, then they send you an NDA.
- You fill out the NDA and return it to NCMEC.
- NCMEC reviews it again, signs it, and returns the fully-executed NDA to you.
- NCMEC reviews your use model and process.
- After the review is completed, you get the code and hashes.
Because of FotoForensics, I have a legitimate use for this code. I want to detect CP during the upload process, immediately block the user, and automatically report them to NCMEC. However, after multiple requests (spanning years), I never got past the NDA step. Twice I was sent the NDA and signed it, but NCMEC never counter-signed it and stopped responding to my status requests. (It's not like I'm a little nobody. If you sort NCMEC's list of reporting providers by the number of submissions in 2020, then I come in at #40 out of 168. For 2019, I'm #31 out of 148.)
Since NCMEC was treating PhotoDNA as a trade secret, I decided to reverse engineer the algorithm using some papers published by Microsoft. (No single paper says how it works, but I cobbled together how it works from a bunch of their marketing blurbs and high-level slides.) I know that I have implemented it correctly because other providers who have the code were able to use my hashes to correctly match pictures.
Perhaps there is a reason that they don't want really technical people looking at PhotoDNA. Microsoft says that the "PhotoDNA hash is not reversible". That's not true. PhotoDNA hashes can be projected into a 26x26 grayscale image that is only a little blurry. 26x26 is larger than most desktop icons; it's enough detail to recognize people and objects. Reversing a PhotoDNA hash is no more complicated than solving a 26x26 Sudoku puzzle, a task well-suited to computers.
I have a whitepaper about PhotoDNA that I have privately circulated to NCMEC, ICMEC (NCMEC's international counterpart), a few ICACs, a few tech vendors, and Microsoft. The few who provided feedback were very concerned about PhotoDNA's limitations that the paper calls out. I have not made my whitepaper public because it describes how to reverse the algorithm (including pseudocode). If someone were to release code that reverses NCMEC hashes into pictures, then everyone in possession of NCMEC's PhotoDNA hashes would be in possession of child pornography.
The AI perceptual hash solution
With perceptual hashes, the algorithm identifies known image attributes. The AI solution is similar, but rather than knowing the attributes a priori, an AI system is used to "learn" the attributes. For example, many years ago there was a Chinese researcher who was using AI to identify poses. (There are some poses that are common in porn, but uncommon in non-porn.) These poses became the attributes. (I never did hear whether his system worked.)
The problem with AI is that you don't know what attributes it finds important. Back in college, some of my friends were trying to teach an AI system to identify male or female from face photos. The main thing it learned? Men have facial hair and women have long hair. It determined that a woman with a fuzzy lip must be "male" and a guy with long hair is female.
Apple says that their CSAM solution uses an AI perceptual hash called a NeuralHash. They include a technical paper and some technical reviews that claim that the software works as advertised. However, I have some serious concerns here:
- The reviewers include cryptography experts (I have no concerns about the cryptography) and a little bit of image analysis expertise. However, none of the reviewers have backgrounds in privacy. Also, although they made statements about the legality, they are not legal experts (and they missed some glaring legal issues; see my next section).
- Apple's technical whitepaper is overly technical -- and yet doesn't give enough information for someone to confirm the implementation. (I cover this type of paper in my blog entry, "Oh Baby, Talk Technical To Me" under "Over-Talk".) In effect, it is a proof by cumbersome notation. This plays to a common fallacy: if it looks really technical, then it must be really good. Similarly, one of Apple's reviewers wrote an entire paper full of mathematical symbols and complex variables. (But the paper looks impressive. Remember kids: a mathematical proof is not the same as a code review.)
- Apple claims that there is a "one in one trillion chance per year of incorrectly flagging a given account". I'm calling bullshit on this.
According to all of the reports I've seen, Facebook has more accessible photos than Apple. Remember: Apple says that they do not have access to users' photos on iCloud, so I do not believe that they have access to 1 trillion pictures for testing. So where else could they get 1 trillion pictures?
- Randomly generated: Testing against randomly generated pictures is not realistic compared to photos by people.
- Videos: Testing against frames from videos means lots of bias from visual similarity.
- Web crawling: Scraping the web would work, but my web logs rarely show Apple's bots doing scrapes. If they are doing this, then they are not harvesting at a fast enough rate to account for a trillion pictures.
- Partnership: They could have some kind of partnership that provides the pictures. However, I haven't seen any such announcements. And the cost for such a large license would probably show up in their annual shareholder's report. (But I haven't seen any disclosure like this.)
- NCMEC: In NCMEC's 2020 summary report, they state that they received 65.4 million files in 2020. NCMEC was founded in 1984. If we assume that they received the same number of files every year (a gross over-estimate), then that means they have around 2.5 billion files. I do not think that NCMEC has 1 trillion examples to share with Apple.
- With cryptographic hashes (MD5, SHA1, etc.), we can use the number of bits to identify the likelihood of a collision. If the odds are "1 in 1 trillion", then it means the algorithm has about 40 bits for the hash. (See the quick calculation after this list.) However, counting the bit size for a hash does not work with perceptual hashes.
- With perceptual hashes, the real question is how often do those specific attributes appear in a photo. This isn't the same as looking at the number of bits in the hash. (Two different pictures of cars will have different perceptual hashes. Two different pictures of similar dogs taken at similar angles will have similar hashes. And two different pictures of white walls will be almost identical.)
- With AI-driven perceptual hashes, including algorithms like Apple's NeuralHash, you don't even know the attributes so you cannot directly test the likelihood. The only real solution is to test by passing through a large number of visually different images. But as I mentioned, I don't think Apple has access to 1 trillion pictures.
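As a quick sanity check on the "about 40 bits" figure mentioned above (this is just back-of-the-envelope arithmetic, not anything from Apple's paper):

```python
import math

# "1 in 1 trillion" random collision odds correspond to roughly a 40-bit space:
print(math.log2(10**12))  # ~39.86 bits
print(2**40)              # 1,099,511,627,776 -- just over one trillion
```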
The AI interpretation solution
An AI-driven interpretation solution tries to use AI to learn contextual elements. Person, dog, adult, child, clothing, etc. While AI systems have come a long way with identification, the technology is nowhere near good enough to identify pictures of CSAM. There are also the extreme resource requirements. If a contextual interpretative CSAM scanner ran on your iPhone, then the battery life would dramatically drop. I suspect that a charged battery would only last a few hours.
Luckily, Apple isn't doing this type of solution. Apple is focusing on the AI-driven perceptual hash solution.
Problem #2: Legal
Since Apple's initial CSAM announcement, I've seen lots of articles that focus on Apple scanning your files or accessing content on your encrypted device. Personally, this doesn't bother me. You have anti-virus (AV) tools that scan your device when your drive is unlocked, and you have file index systems that inventory all of your content. When you search for a file on your device, it accesses the pre-computed file index. (See Apple's Spotlight and Microsoft's Cortana.)

You could argue that you, as the user, have a choice about which AV to use, while Apple isn't giving you a choice. However, Microsoft ships with Defender. (Good luck trying to disable it; it turns on after each update.) Similarly, my Android ships with McAfee. (I can't figure out how to turn it off!)
The thing that I find bothersome about Apple's solution is what they do after they find suspicious content. With indexing services, the index stays on the device. With AV systems, potential malware is isolated -- but stays on the device. But with CSAM? Apple says:
Only when the threshold is exceeded does the cryptographic technology allow Apple to interpret the contents of the safety vouchers associated with the matching CSAM images. Apple then manually reviews each report to confirm there is a match, disables the user’s account, and sends a report to NCMEC.
In order to manually review the match, they must have access to the content. This means that the content must be transferred to Apple. Moreover, as one of Apple's tech reviewers wrote, "Users get no direct feedback from the system and therefore cannot directly learn if any of their photos match the CSAM database." This leads to two big problems: illegal searches and illegal collection of child exploitation material.
Illegal Searches
As noted, Apple says that they will scan your Apple device for CSAM material. If they find something that they think matches, then they will send it to Apple. The problem is that you don't know which pictures will be sent to Apple. You could have corporate confidential information and Apple may quietly take a copy of it. You could be working with the legal authority to investigate a child exploitation case, and Apple will quietly take a copy of the evidence.
To reiterate: scanning your device is not a privacy risk, but copying files from your device without any notice is definitely a privacy issue.
Think of it this way: Your landlord owns your property, but in the United States, he cannot enter any time he wants. In order to enter, the landlord must have permission, give prior notice, or have cause. Any other reason is trespassing. Moreover, if the landlord takes anything, then it's theft. Apple's license agreement says that they own the operating system, but that doesn't give them permission to search whenever they want or to take content.
Illegal Data Collection
The laws related to CSAM are very explicit. 18 U.S. Code § 2252 states that knowingly transferring CSAM material is a felony. (The only exception, in 2258A, is when it is reported to NCMEC.) In this case, Apple has a very strong reason to believe they are transferring CSAM material, and they are sending it to Apple -- not NCMEC.
It does not matter that Apple will then check it and forward it to NCMEC. 18 U.S.C. § 2258A is specific: the data can only be sent to NCMEC. (With 2258A, it is illegal for a service provider to turn over CP photos to the police or the FBI; you can only send it to NCMEC. Then NCMEC will contact the police or FBI.) What Apple has detailed is the intentional distribution (to Apple), collection (at Apple), and access (viewing at Apple) of material that they strongly have reason to believe is CSAM. As it was explained to me by my attorney, that is a felony.
At FotoForensics, we have a simple process:
- People choose to upload pictures. We don't harvest pictures from your device.
- When my admins review the uploaded content, we do not expect to see CP or CSAM. We are not "knowingly" seeing it since it makes up less than 0.06% of the uploads. Moreover, our review catalogs lots of types of pictures for various research projects. CP is not one of the research projects. We do not intentionally look for CP.
- When we see CP/CSAM, we immediately report it to NCMEC, and only to NCMEC.
The Backlash
In the hours and days since Apple made its announcement, there has been a lot of media coverage and feedback from the tech community -- and much of it is negative. A few examples:
- BBC: "Apple criticised for system that detects child abuse"
- Ars Technica: "Apple explains how iPhones will scan photos for child-sexual-abuse images"
- EFF: "Apple's Plan to 'Think Different' About Encryption Opens a Backdoor to Your Private Life"
- The Verge: "WhatsApp lead and other tech experts fire back at Apple's Child Safety plan"
I understand the problems related to CSAM, CP, and child exploitation. I've spoken at conferences on this topic. I am a mandatory reporter; I've submitted more reports to NCMEC than Apple, Digital Ocean, Ebay, Grindr, and the Internet Archive. (It isn't that my service receives more of it; it's that we're more vigilant at detecting and reporting it.) I'm no fan of CP. While I would welcome a better solution, I believe that Apple's solution is too invasive and violates both the letter and the intent of the law. If Apple and NCMEC view me as one of the "screeching voices of the minority", then they are not listening.
Update 2021-08-09: In response to widespread criticism, Apple quickly released an FAQ. This FAQ contradicts their original announcement, contradicts itself, contains doublespeak, and omits important details. For example:
- The FAQ says that they don't access Messages, but also says that they filter Messages and blur images. (How can they know what to filter without accessing the content?)
- The FAQ says that they won't scan all photos for CSAM; only the photos for iCloud. However, Apple does not mention that the default configuration uses iCloud for all photo backups.
- The FAQ says that there will be no falsely identified reports to NCMEC because Apple will have people conduct manual reviews. As if people never make mistakes.

Is this correct?
If you look at the page you linked to, content like photos and videos don't use end-to-end encryption. They're encrypted in transit and on disk, but Apple has the key. In this regard, they don't seem to be any more private than Google Photos, Dropbox, etc. That's also why they're able to give media, iMessages(*), etc, to the authorities when something bad happens.
The section underneath the table lists what's actually hidden from them. Keychain (password manager), health data, etc, are there. There's nothing about media.
If I'm right, it's strange that a smaller service like yours reports more content than Apple. Maybe they don't do any scanning server side and those 523 reports are actually manual reports?
(*) Many don't know this, but as soon as the user logs in to their iCloud account and has iMessages working across devices, it stops being encrypted end-to-end. The decryption key is uploaded to iCloud, which essentially makes iMessages plaintext to Apple.
It was my understanding that Apple didn't have the key.
I just double-checked:
https://www.apple.com/legal/privacy/law-enforcement-guidelines-us.pdf
Section J (iCloud): you are correct (I am wrong). Apple can decrypt the data. However, access requires a warrant. The terms of service do not say that they can use your iCloud data for research. https://www.apple.com/legal/internet-services/icloud/
I'll update the text of my blog. Thank you for pointing this out!
1. The iCloud legal agreement you cite doesn't discuss Apple using the photos for research, but in sections 5C and 5E, it says Apple can screen your material for content that is illegal, objectionable, or violates the legal agreement. It's not like Apple has to wait for a subpoena before Apple can decrypt the photos. They can do it whenever they want. They just won't give it to law enforcement without a subpoena. Unless I'm missing something, there's really no technical or legal reason they can't scan these photos server-side. And from a legal basis, I'm not sure how they can get away with not scanning content they are hosting.
On that point, I find it really bizarre Apple is drawing a distinction between iCloud Photos and the rest of the iCloud service. Surely, Apple is scanning files in iCloud Drive, right? The advantage of iCloud Photos is that when you generate photographic content with iPhone's camera, it automatically goes into the camera roll, which then gets uploaded to iCloud Photos. But I have to imagine most CSAM on iPhones is not generated with the iPhone camera but is redistributed, existing content that has been downloaded directly on the device. It's just as easy to save file sets to iCloud Drive (and then even share that content) as it is to save the files to iCloud Photos. Is Apple really saying that if you save CSAM in iCloud Drive, they'll look the other way? That'd be crazy. But if they aren't going to scan files added to iCloud Drive on the iPhone, the only way to scan that content would be server-side, and iCloud Drive buckets are stored just like iCloud Photos are (encrypted with Apple holding decryption key).
We know that, at least as of Jan. 2020, Jane Horvath (Apple's Chief Privacy Officer) said Apple was using some technologies to screen for CSAM. Apple has never disclosed what content is being screened or how it's happening, nor does the iCloud legal agreement indicate Apple will screen for this material. Maybe that screening is limited to iCloud email, since it is never encrypted. But I still have to assume they're screening iCloud Drive (how is iCloud Drive any different from Dropbox in this respect?). If they are, why not just screen iCloud Photos the same way? Makes no sense. If they aren't screening iCloud Drive and won't under this new scheme, then I still don't understand what they are doing.
There's a little more nuance here. For Apple to have plaintext access to messages, two things have to be true:
1. "Messages in iCloud" is on. Note that this a new feature as of a year or two ago, and is distinct from simply having iMessage working across devices: this feature is only useful for accessing historical messages on a device that wasn't around to receive them when they are initially sent.
2. The user has an iPhone, configured to back up to iCloud.
In that case, yes: the messages are stored in iCloud encrypted, but the user's (unencrypted) backup includes the key.
(Source: https://support.apple.com/en-us/HT202303)
I believe that those two settings are both defaults, but I'm not sure; in particular, because iCloud only gives a 5 GB quota by default, I imagine a large fraction of iOS users don't (successfully) use iCloud backup. But yes, it's bad that that's the default.
I'm not so sure that's accurate. In versions of Apple's privacy policy going back to early May 2019, you can find this (from the Internet Archive):
https://web.archive.org/web/20190514021445/https://www.apple.com/legal/privacy/en-ww/
"We may also use your personal information for account and network security purposes, including in order to protect our services for the benefit of all our users, and pre-screening or scanning uploaded content for potentially illegal content, including child sexual exploitation material."
I'm not a lawyer, but that sounds to me like they started assuming the right to access your pictures more than 2 years ago. Which makes it even more puzzling why they went to the trouble of developing this complicated on-device scheme (and took the PR hit that they surely must have anticipated), instead of simply doing what all the other cloud companies do. The only possible explanation that I can think of is that they are laying the groundwork to be able to also scan in situations where they really can't access the user content in the cloud, i.e. end-to-end encrypted services like iMessages. That would be troubling since it would obviously undermine the whole premise of E2E encryption.
I think you're wrong about this. As far as I can tell from their statements, they're doing this only to photos which are being uploaded to iCloud Photos. So, any photo this is happening to is one that you've already asked Apple to copy to their servers.
> In this case, Apple has a very strong reason to believe they are transferring CSAM material, and they are sending it to Apple -- not NCMEC.
I suspect this is a fuzzy area, and anything legal would depend on when they can actually be said to be certain there's illegal material involved.
Their process seems to be: someone has uploaded photos to iCloud and enough of their photos have tripped this system that they get a human review; if the human agrees it's CSAM, they forward it on to law enforcement. There is a chance of false positives, so the human review step seems necessary.
After all, "Apple has hooked up machine learning to automatically report you to the police for child pornograpy with no human review" would have been a much worse news week for Apple.
Apple doesn't upload to their servers on a match, but Apple is able to decrypt a "visual derivative" (which I considered kinda under-explained in their paper) if there was a match against the blinded (asymmetric crypto) database.
So there's no transmit step here.
If anything, there's the question whether their reviewer is allowed to look at "very likely to be CP" content, or if they'd be in legal trouble for that.
I'd assume their legal teams have checked for that.
At face value it seemed like an interesting topic and I was glad I was pointed to it.
But the deeper I dive into it, the more I get the feeling parts of it are based on wrong assumptions and faulty understandings of the implementation.
The update at the end of the post didn't give me any assurance those errors would be revised. Rather, it seems to cherry-pick discussion points from Apple's FAQ on the matter and seems to contain misleading conclusions.
For instance:
> The FAQ says that they don't access Messages, but also says that they filter Messages and blur images. (How can they know what to filter without accessing the content?)
The sensitive image filter in Messages, part of the Family Sharing parental control feature set, is not to be confused with the iCloud Photos CSAM detection at the center of this blog post.
They -- as in Apple the company -- don't need access to the sent/received images in order for iOS to perform on-device image recognition on them, the same way Apple does not need access to one's local photo library in order for iOS to recognise and categorise people, animals and objects.
> The FAQ says that they won't scan all photos for CSAM; only the photos for iCloud. However, Apple does not mention that the default configuration uses iCloud for all photo backups.
Are you sure about this? What is meant by "default configuration"? As far as I am aware, iCloud is opt-in. I could not find any mention of a default configuration/setting in the linked article to back up your claim.
> The FAQ says that there will be no falsely identified reports to NCMEC because Apple will have people conduct manual reviews. As if people never make mistakes.
I agree! People make mistakes. However, the way you have stated it, it looks like Apple claims there will be no falsely identified reports as a result of the manual reviews it conducts, and that is not how it is phrased in the FAQ.
It states that system errors or attacks will not result in innocent people being reported to NCMEC as a result of 1) human review, in addition to 2) a system designed to be accurate to the point of a one-in-one-trillion-per-year likelihood that any given account would be incorrectly identified (whether this claim holds any water is another topic, and one already addressed in the post and commented on here).
Still, Apple cannot guarantee this.
And
“What Apple is proposing does not follow the law”
Apple is not scanning any images unless your account is syncing them to iCloud - so you as the device owner are transmitting them, not Apple. The scan takes place on device, and they are transmitting the analysis (and a low res version for manual review if required) as part of the image transmission.
Does that bring them into compliance?
Of course, I don't believe it is possible for them to be so confident about their processes. Humans regularly make mistakes, after all.
I paid very close attention to how they worded their "1 in 1 trillion" claim. They are talking about false-positive matches before it gets sent to the human.
Specifically, they wrote that the odds were for "incorrectly flagging a given account". In their description of their workflow, they talk about steps before a human decides to ban and report the account. Before ban/report, it is flagged for evaluation. That's the NeuralHash flagging something for review.
You're talking about combining results in order to reduce false positives. That's an interesting perspective.
If 1 picture has a false-positive rate of x, then the likelihood of falsely matching 2 pictures is x^2. And with enough pictures, we quickly hit 1 in 1 trillion.
There are two problems here.
First, we don't know 'x'. Given any value of x for the per-picture error rate, we can multiply it enough times to reach odds of 1 in 1 trillion. (Basically: x^y, with y being dependent on the value of x, but we don't know what x is.) If the error rate is 50%, then it would take 40 "matches" to cross the "1 in 1 trillion" threshold. If the error rate is 10%, then it would take 12 matches to cross the threshold.
Second, this assumes that all pictures are independent. That usually isn't the case. People often take multiple pictures of the same scene. ("Billy blinked! Everyone hold the pose and we're taking the picture again!") If one picture has a false positive, then multiple pictures from the same photo shoot may have false positives. If it takes 4 pictures to cross the threshold and you have 12 pictures from the same scene, then multiple pictures from the same false match set could easily cross the threshold.
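To make the first point concrete, here is a quick sketch of the arithmetic. It assumes every picture is an independent test, which, as I just noted, real photo collections violate:

```python
from fractions import Fraction

TARGET = Fraction(1, 10**12)  # "1 in 1 trillion"

def matches_needed(per_picture_error: Fraction) -> int:
    """How many independent false matches before the combined odds reach 1 in 1 trillion."""
    count, probability = 0, Fraction(1)
    while probability > TARGET:
        probability *= per_picture_error
        count += 1
    return count

for error in (Fraction(1, 2), Fraction(1, 10), Fraction(1, 100)):
    print(f"error rate {float(error):.0%}: {matches_needed(error)} matches")
# error rate 50%: 40 matches
# error rate 10%: 12 matches
# error rate 1%: 6 matches
```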
It seems like ensuring one distinct NeuralHash output can only ever unlock one piece of the inner secret, no matter how many times it shows up, would be a defense, but they don’t say…
I suspect that distributing CSAM-detecting machine learning models would be dangerous for another reason: they could potentially be re-used as CSAM-authoring tools.
Let's say you start with a machine model that takes an image as input and outputs a number that represents how CSAM-like the image is. (It might be possible to make a model that only outputs a binary yes/no, which would break this method, but I suspect it would be trivial to bypass that by testing to find analog values from inside the model to use as outputs instead). Now write a program that slowly evolves an image and then re-runs the machine model, evolving the image with the aim of getting a higher output number (more CSAM-like). Robots dreaming of CSAM.
The outputs may not look very realistic depending on the complexity of the model (see many "AI dreaming" images on the web), but even if they look at all like an illustration of CSAM then they will probably have the same "uses" & detriments as CSAM. Artistic CSAM is still CSAM.
You might be able to start with non-CSAM sexual images (eg of famous people) and evolve/augment them into CSAM sexual images; this opens a whole other can of worms. It could potentially require less computation time than generating from a random canvas and the results may (?) look more realistic too (if less deviation from the starting image is required?). There are some similarities to deepfaked porn here, but deepfaking using a model that knows only CSAM.
Or in other words: a machine model that identifies CSAM is itself an encoded form of CSAM.
> So where else could they get 1 trillion pictures?
I don't think they're getting 1 trillion images. They estimate this is the average chance of a particular account being flagged in a particular year.
Say Apple has 1 billion existing AppleIDs. That would give them a 1 in 1000 chance of flagging an account incorrectly each year.
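A quick sketch of that arithmetic (the one-billion-account figure is just my guess, not anything Apple has published):

```python
import math

accounts = 1_000_000_000   # assumed number of AppleIDs (my guess)
per_account_odds = 1e-12   # Apple's "1 in 1 trillion per year" claim

expected_false_flags = accounts * per_account_odds
# Probability that at least one account gets falsely flagged in a year:
p_any = -math.expm1(accounts * math.log1p(-per_account_odds))

print(expected_false_flags)  # 0.001 -> about 1 in 1000 per year
print(p_any)                 # ~0.000999..., effectively the same 1-in-1000 odds
```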
I figure their stated figure is an extrapolation, potentially based on multiple concurrent strategies reporting a false positive simultaneously for a given image.
There’s a separate issue of training such a model, which I agree is probably impossible today.
I came to this blog via an aggregation service so I don't know what you do.
You mention a few points in the body.
Apple's reliability claims are statistics, not empirical. To borrow a quote,
"There are lies, damned lies, and statistics."
Their claim is as reliable as a computer hard drive's "mean time between failure" guarantee.
> It would help if you stated your credentials for this opinion.
I can't control the content that you see through a data aggregation service; I don't know what information they provided to you.
You might want to re-read the blog entry (the actual one, not some aggregation service's summary). Throughout it, I list my credentials. (I run FotoForensics, I report CP to NCMEC, I report more CP than Apple, etc.)
For more details about my background, you might click on the "Home" link (top-right of this page). There, you will see a short bio, list of publications, services I run, books I've written, etc.
> Apple's reliability claims are statistics, not empirical.
This is an assumption on your part. Apple does not say how or where this number comes from.
Because the local device has an AI / machine learning model, maybe? Apple the company doesn't need to see the image for the device to be able to identify material that is potentially questionable.
As my attorney described it to me:
It doesn't matter whether the content is reviewed by a human or by an automated process on behalf of a human. It is "Apple" accessing the content.
Think of this this way: When you call Apple's customer support number, it doesn't matter if a human answers the phone or if an automated assistant answers the phone. "Apple" still answered the phone and interacted with you.
To put this into perspective:
My FotoForensics service is nowhere near as large as Apple. At about 1 million pictures per year, I have a staff of 1 part-time person (sometimes me, sometimes an assistant) reviewing content. We categorize pictures for lots of different projects. (FotoForensics is explicitly a research service.) At the rate we process pictures (thumbnail images, usually spending far less than a second on each), we could easily handle 5 million pictures per year before needing a second full-time person.
Of those, we rarely encounter CSAM. (0.056%!) I've semi-automated the reporting process, so it only needs 3 clicks and 3 seconds to submit to NCMEC.
Now, let's scale up to Facebook's size. 36 billion images per year, 0.056% CSAM = about 20 million NCMEC reports per year. Times 20 seconds per submission (assuming they are semi-automated but not as efficient as me), that is about 14,000 hours per year. So that's about 49 full-time staff (47 workers + 1 manager + 1 therapist) just to handle the manual review and reporting to NCMEC.
Update: cramer noted that I did the math wrong.
20 million reports * 20 seconds per report / 3600 seconds per hour / 2000 hours per employee = 56 employees. With manager, therapist, and additional employees to handle the churn, call it 60 employees.
> not economically viable.
Not true. I've known people at Facebook who did this as their full-time job. (They have a high burnout rate.) Facebook has entire departments dedicated to reviewing and reporting.
The math is actually 20mil * 20(s/rep) / 3600 (s/hr) / 2000 (hr/emp), which is 56 employees; let's round up to 60.
Of course there will be churn; no sane person wants to look at CSAM/CP all day, every day.
Yikes! I must have been tired!
You are correct -- it should be 56 people.
And I like the additional people for expected churn.
Ask yourself: would someone who isn't you -- and who is into little kids -- think it is erotica?
Remember: the humans who are reviewing the content before submitting it to NCMEC are biased. They see CP all day, and rarely pictures that are not CP. Any picture that crosses their monitor will be assumed to be CP first, and then needs justification for why it isn't CP. If they think it is CP, it gets submitted. And if a picture is questionable, then it gets submitted.
The basic mentality: submit it to NCMEC and let NCMEC / ICAC / FBI / police sort it out. I've had LEOs reject a few (3?) of my submissions as "not CP". That's fine. (Even if not CP, my site forbids nudity.) But that's rare. I've submitted plenty of suspected CSAM that didn't have nudity, and led to arrests.
(I say "suspected" because I'm not a court of law. Only a judge can make the determination that something is CP. Everyone else can only suspect.)
Then again, I've also been an expert witness on a few cases where one party in a divorce claimed that a nude photo of their infant laughing in a bathtub was CP -- and used it to get custody of the kids. (Divorces get nasty; if you ever get divorced, delete all of your funny-while-nude kid photos first!)
Apple describes NeuralHash as a "perceptual hash". That is a term of art and different from AI interpretation.
As a writer myself, I've had people threaten to report me to the FBI for having teenage romance in one of my online fiction stories -- not even explicit, mind you, just teenagers being in a relationship. Apparently some consider that the equivalent to CP.
I don't trust a database compiled from dozens of different sources, from reports by who knows who around the world, for what is actually a very murky legal area, whose definitions of what constitutes CP are different per nation, state, agency, organization, or individual, and can change on a whim.
When I shared this article with my friend, he said that your 1 in 1 trillion claim is suspect. He said that you don't need 1 trillion images of training data and that you can just extrapolate the image to create more images.
Could this be possible?
Using a picture to create derivatives for training AI is really a bad idea. What you end up doing is training the AI on lots of similar pictures.
AI is really good at picking up on repetition. If variations of the same picture appear over and over, it will memorize the basis (the similarity attributes). It will end up learning "that picture" and not the generalized attributes.
Note: "extrapolate the image" doesn't just mean random crops, blurs, rotations, coloring, and other simple alterations. It can also refer to merges, blends, and image-to-image transformations. All of these are still variations of the same smaller picture set. There are some systems that try to use AI to generate training data for another AI system. (I've seen the 2nd AI memorize some almost imperceptable artifact that the first AI system was unknowingly introducing into the manufactured training data.)
Correct. You don't want to train the AI on a suite of pictures that are variants of the same picture -- unless you specifically want to identify that kind of variant.
For example, if I want to identify pictures of New York drivers licenses, then I could probably get away with artificially generating drivers licenses and inserting them randomly into pictures. The license may change rotation, angle, etc., but they all have the same template. In effect, I would be training the AI to identify the template.
With pictures of people, you do not want to use the exact same cut-out person randomly pasted into different pictures at different angles. The AI will assume the "person" is a fixed template and will memorize the template. If it later sees the same person but in a different pose (e.g., trained on "seen from left, sitting" and tested with "standing up and walking away"), then it's not the same template so the AI will miss it.
Or worse: The AI will make a gross generalization, such as "anyone sitting looks like the template of someone sitting."
I'm pretty sure "V.E. Access to Your Account and Content" would let them do that. It boils down to "Apple can do whatever it wants with your content if it thinks doing so would help enforce the agreement". Naturally, the agreement includes not posting CSAM.
At a high level, you are correct. However, Apple has not provided specific details about how this works. (You may have the specific workflow incorrect.)
Let's use your example that 3 positives would need to trigger a safety review.
Have you ever used burst mode when taking pictures? 3-5 pictures is easy to do. If one picture is a false positive, then all of the pictures in the burst will likely be false positives. It is very easy to accidentally cross an arbitrary threshold like that.
The number we need to know is the false positive rate per picture, not per (unspecified size) set of pictures. Unfortunately, Apple has not released that information.
This also flags another point: What variables are introduced by using non-Apple photographic equipment, and how does that affect detection?
350 million images in a day gives an estimate of 127.75 billion images per year in 2013. Facebook only had 1.15 billion active users in 2013. It now has 2.89 billion active users in 2021.
If 127.75 billion images per year were treated as today's estimate, then it would mean, on average, that each of their current 2.89 billion active users (from a pool of 3.51 billion total users) has posted about 44 images per year. That already seems like an under-estimate.
Going by your estimates it means that each of their 2.89 billion active users only posted 12 images per year in 2020. That seems like a gross under-estimate.
So I don't think your criticism about the 1 trillion number is as valid as you purport it to be, and Facebook users very likely post significantly more than 127.75 billion images per year, which would make it so they could get to 1 trillion in less than 8 years if you start counting now. But that was already 8 years ago, and Facebook has existed for longer than that, so Facebook has very likely surpassed the 1 trillion number a long time ago.
I’d guess most of their reports came from tech support, researchers testing on live data, and reports from customers about items accidentally put into a shared library, whereas Facebook is actively classifying everything they come across and so would automatically detect CSAM along with everything else.
You are absolutely correct that service providers are not required to search for CP. However, there is a catch.
First, the relevant laws (insert "I am not a lawyer" disclaimer):
18 USC § 2258A(f): If you see it or know it exists, then you must report it.
18 USC § 2258A(a)(2): You must report actual and "apparent" violations. Normally the laws say that you only need to report if you "know" (knows, knowingly, etc.) that the content is there. However, that "apparent" can be interpreted as "or have a strong reason to suspect."
18 USC § 2258A(e): Failure to report is a felony, punishable by a large fine.
Not explicit, but 18 USC § 2251 and 2252: If you fail to report, then you must be collecting and/or knowingly in possession, and that means you can be charged with bigger punishments. (If you make it a habit of not reporting, then you can be charged as a child sex offender or trafficker.)
18 USC § 2258A(h): After reporting, they want you to "preserve the content" for 90 days, in case law enforcement has questions. The 90-days is a recommendation, but not mandatory. (Why does FotoForensics ban users who upload any kind of porn for 90 days? I use the same clean-up script!)
18 USC § 2258B(a): This is the get-out-of-jail-free card. If you report to NCMEC then you cannot be charged under any of the other laws for collection, distribution (to NCMEC), viewing it (in order to validate), etc. When you report to NCMEC, you are given a tracking number. That is the proof-of-reporting receipt. (Note: This doesn't mean that you can use the collected CP for any other purpose, like training a new detector.)
18 USC § 2258B(c): As a service provider, a "minimum number of employees" are permitted to see and evaluate for the purpose of reporting. ("minimum" is undefined, but don't show it to your company of 10,000 employees.)
Here's the catch:
If you run a service that receives very little CP/CSAM, then it could very well be that you don't know it's there. (Not "knowingly", not "apparent".)
If you run a service that receives lots of CP/CSAM, then there comes a point where you have to know how people are using your site. With 4chan and craigslist, the volume of CP/CSAM was so large, that the admins must have known it was going on, even if they were not explicitly monitoring the content. With the benchmark of "beyond a reasonable doubt", there was no way a reasonable jury would believe that the admins didn't know it was happening. (They could debate "to what extent", but not that it was happening in high volume.) As the laws say, if you have knowledge that it is happening, then you are mandated to report it.
Initially, 4chan and craigslist were not reporting. After a friendly chat from "someone", they were reminded that they could be charged with aiding and abetting in crimes related to CP/CSAM. In effect: (A) start reporting, or (B) go to jail and permanently register as a sex offender. Thus, both now moderate and report CP/CSAM.
With nearly a third of the pictures I am reporting appearing to come from Apple services and/or devices, there is no way a reasonable person can believe that Apple (corporate / management) does not know how their services are being used. Thus, they have to do something to show a good-faith effort to report.
So how do you stop the apparent illegal use of your service from unknown people? You need some way to detect that it is happening. That means creating a scanner or detector.
Note: Everyone else does server-side scanning. Apple still hasn't provided any reasonable explanation as to why they want to do client-side scanning. Remember: When you buy an iPhone, you own the hardware. Apple has no liability or responsibility for how you use the hardware. I think Apple should be scanning on the server-side and not the client-side since that is where their liability and responsibility for content begins.
You can't report what you don't know and what you don't suspect. So that is the legal loophole. A server owner can't be in violation of the law for the presence of CP on their server, if they literally had ZERO way to know what content was on their server.
There comes a point where the volume of reports are so great that Apple can't pretend that it isn't happening on Apple's system. They must show that they are taking steps to address the problem or they can be held accountable for facilitating distribution.
As I mentioned in my blog, not only do I make more reports than Apple, I've submitted many reports for CP that originated from Apple. I certainly doubt that I'm the only person reporting (especially when Facebook submits millions of reports) and I doubt that I'm the only person seeing CP distributed from Apple's services. This means that Apple has a big problem that they are willfully ignoring.
1) It's not scanned on the cloud, nor is every photo stored locally scanned. They are only scanned locally, and only when you attempt to upload them to the cloud (or when locally stored photos are auto-backed-up, which is a feature you can disable).
2) If the found hash matches a known hash, the actual picture is sent for manual review, and only if the person looking at it determines that it's CP will the authorities be notified. You won't get FBI showing up at your door because a perfectly legal picture got misidentified as illegal by the hash algorithm.
3) I think I know why Apple doesn't scan it on the cloud. Ever since they promised to be committed to privacy and no longer keep copies of decryption keys to your phone (and possibly the cloud too), to avoid being forced by govt to decrypt your stuff, they needed another way to scan possible CP pictures. Without the keys to decrypt your cloud storage, they can't scan the cloud. So instead they have the cloud uploader app itself do the scanning. And so that your privacy isn't violated, they don't scan all locally stored images, only the ones being uploaded to the cloud (and you can disable cloud backups, so that NOTHING gets scanned).
4) All the hashes from the NCMEC are stored on your iPhone already (hashes are very small compared to full pictures, so even a few million of the hashes would only take up to a gigabyte of space), I think in the firmware or a hidden folder in the OS. As such, the picture doesn't need to get sent to anybody to scan, so it truly is local (nobody else sees your pictures, and if hash of these pics don't match the hash of any illegal pics, nobody ever will see your pics).
Second, I think your worry about not distributing the method to reverse these hashes is overblown.
1) The pics are at such a low resolution -- 26x26, smaller than the 32x32 resolution of desktop icons -- that they wouldn't be able to identify an individual by name (no possibility of revealing the identity of a victim of sex abuse), nor identify any of the private body parts that would define the pic as CP.
2) If your concern is correct that such a reversing algorithm would make the hashes into CP themselves, then that means they are already CP. CP law in the US broadly says that CP pics stored in any form are illegal. This means that even if they are scrambled, encrypted, or whatever, if there's any possibility of retrieving a picture from them, then the encryption, obfuscation, etc., is considered just one form of storage of CP. It doesn't matter if the tool to extract the actual image isn't publicly available. Under the law those hashes (due to the fact they CAN be reversed) would already be illegal pictures if your argument is correct (though I don't think it is, based on my explanation above about their low resolution). And therefore your withholding of such software/methods is a useless gesture in terms of keeping the hashes themselves from being defined as illegal pictures.