Fake Photos and Fraud
Friday, 24 August 2018
Pictures have always had power. But with the ubiquity and speed of social media, pictures have more influence on public opinion than ever before. It is very easy for a fake picture to go viral, and very difficult to disseminate any kind of correction.
Lots of people want a one-button solution. Something that will quickly tell them whether a picture is real or fake. Sadly, there is almost never a simple yes/no answer. But this desire for a fast, simplified solution opens the door for lots of snake-oil solutions and charlatan products.
This time, the product is called SurfSafe. This is a brand new Chrome plugin that claims to catch fake news by spotting fake photos. News outlets like Wired, BoingBoing, and Mac Observer recently touted this wonderful, new plugin. (I can only assume that none of those reporters actually reviewed the product before summarizing the PRweb press release.)
TL;DR: In my professional opinion, stay away from the SurfSafe plugin. It has serious privacy violations and does not do what it claims.
How it works (in theory)
According to their web site, snazzy video, and the application (when you install it), SurfSafe helps identify pictures associated with fake news. First, you select the news outlets you trust from a long list of sources. Everything from ABC, NBC, and BBC to Breitbart and Fox is listed. SurfSafe appears to make no assumptions about the validity of information put out by the various sources.

After that, you start surfing the web. Every picture gets a little icon above it. Clicking on the icon queries the SurfSafe servers for information about the picture. It returns the number of sightings, links to the sightings, and the number of people who have reported the image. (Reports identify the number of people who think it is fake.)
Hovering the mouse over each picture also reveals a "Report" button. This is where you can report a picture as being "Propaganda", "Misleading", or "Photoshopped". This way, other people who see the same picture can quickly see how it has been reported.
Privacy issues
The first significant issue I saw is related to how SurfSafe gathers pictures for their collection. I thought that they would just harvest pictures from news outlets. I mean, they have a huge list of news outlets that you can select from as authoritative sources. However, that doesn't seem to be how it works. I went to a couple of news outlets on their list, and most of the pictures were not already indexed by their system.

Then I looked at my web browser. With Chrome, you can press Control-Shift-i and pull up the developer panel. Then you can click on the "Network" tab and see all of the network requests. Here's what my FotoForensics service normally looks like:
Normally, there are 5 network requests: the web page, style sheet, banner picture, fonts, and my little IPv4/IPv6 experiment.
And here's what happens when I have SurfSafe enabled:
Every web page gets passed through the SurfSafe extension. However, that final 'query' is the troubling issue. It submitted the URL for the banner image to SurfSafe. If my web page has 5 pictures, then all five image URLs would be sent from my browser to SurfSafe.
The number of privacy issues here is stunning:
- The folks at SurfSafe know every web page I visit, when I visit it, and every picture on every web page.
- If any of the URLs contain personal information, such as access tokens or keys or names, then they get that, too. Since I tested against my FotoForensics web site, I watched the logs for their accesses. They definitely keep and use all URL parameters when retrieving pictures.
- If any of the URLs point to personal pictures -- things that you don't want other people to see -- then that's too bad. SurfSafe retrieves the pictures and adds them to their collection of known sources. Remember: lots of URLs are not publicly known but are still publicly accessible if you have the right URL-based parameters. For example, Facebook pictures can be marked as private. But the URLs are accessible if you have all of the URL parameters. This means that SurfSafe can access your private Facebook pictures as soon as you visit your private Facebook page.
- As you'll notice from my screenshots, I accessed my site using HTTPS. HTTPS is supposed to be secure, but it's not secure from local browser extensions. SurfSafe grabs your secure URL and passes it to their third-party service.
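To make the URL-parameter problem concrete, Python's standard library shows exactly what rides along with a "private" picture URL. (The URL and token here are made up for illustration; they are not from any real service.)

```python
from urllib.parse import parse_qs, urlsplit

# A hypothetical capability URL: the query string is effectively the credential.
url = "https://photos.example.com/p/1234.jpg?token=SECRET123&uid=alice"

parts = urlsplit(url)
params = parse_qs(parts.query)
# Anyone who logs the full URL also logs the access token.
print(params["token"])   # ['SECRET123']

# Dropping the query string breaks the capability link, but the host, path,
# and page title still reveal where you were.
stripped = parts._replace(query="").geturl()
print(stripped)          # https://photos.example.com/p/1234.jpg
```

That second step is what SurfSafe's public "sources" list does: the parameters are hidden from other users, but they were still transmitted to, and used by, the SurfSafe servers.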
For my own testing, I visited a couple of web pages on the public FotoForensics site. Then I queried SurfSafe about my banner. Here's the result:
My banner logo was found on three sources. Those are the three web pages at FotoForensics that I visited over HTTPS. (I checked my logs: SurfSafe only visited those pages after I went there with my browser.) Granted, on their public "sources" list, they removed the URL parameters from the source URLs (meaning that any URL requiring parameters will be a broken source if you click on it). However, they still list the web page and the page's HTML title; the title alone may give away too much personal information.
In any case, if you use the SurfSafe plugin, then you are feeding the URL of every picture you come across into their service for their collection. This is a huge privacy issue.
Accuracy issues
To test SurfSafe's accuracy, I needed a sample picture. I searched Google for Supreme Court wannabe Brett Kavanaugh and his Wikipedia page came right up. On the Wikipedia page is his picture. According to SurfSafe, the picture had zero sources and zero reports.

I uploaded the picture to FotoForensics. The metadata clearly states that the picture had been processed using Adobe Photoshop CS3. So, I used SurfSafe to 'report' the picture as 'photoshopped'. I reloaded the page and now there is one report for the picture (mine). However, nowhere does SurfSafe mention the cause of the report. There's just "Reports 1".
Clicking on Kavanaugh's Wikipedia picture takes you to Wikimedia Commons, where another copy of the picture is hosted. And when I say it's another copy, I mean it has the exact same SHA1 checksum -- bit-per-bit, it is the same picture. At Wikimedia Commons, SurfSafe reports 4 sources: 3 at Wikipedia (which appear to be the same URLs) and one at Wikimedia. However, the Wikimedia page doesn't have any "reports".
I then went back to the Wikipedia page and reloaded. Now it has 3 sources listed and one report. I waited about a minute and reloaded again: 4 sources and still one report. So even though a photoshopped picture is a photoshopped picture wherever it lives, SurfSafe only tags the photoshopped report to the one URL where I reported it as photoshopped.
What this means: SurfSafe won't flag a picture as being propaganda, misleading, or photoshopped unless someone else already flagged the picture at that specific URL. And since many web sites, including Facebook and Twitter, use user-specific URLs, it's very possible for other people to see the same picture and not be told that it's been altered. For SurfSafe to claim that they help detect fake photos is misleading at best.
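For what it's worth, tying reports to content rather than to URLs is not hard. Here's a minimal sketch (my illustration, not SurfSafe's actual design) that keys reports on an exact content digest; a real system would likely add a perceptual hash to also catch re-encoded copies:

```python
import hashlib

# Reports keyed by the picture's content digest instead of its URL, so a
# flag follows the image to every bit-identical copy, wherever it is hosted.
reports = {}

def report(image_bytes, label):
    key = hashlib.sha1(image_bytes).hexdigest()
    reports.setdefault(key, []).append(label)

def lookup(image_bytes):
    key = hashlib.sha1(image_bytes).hexdigest()
    return reports.get(key, [])

photo = b"...jpeg bytes..."   # placeholder stand-in for a real picture
report(photo, "photoshopped")

# The same bytes fetched from a different URL still show the report.
print(lookup(photo))   # ['photoshopped']
```

With this kind of keying, my Wikipedia report would have appeared on the bit-identical Wikimedia Commons copy automatically.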
Crowdsourcing issues
Having humans evaluate pictures and categorize them as real/correct (not reported) or propaganda/misleading/photoshopped has a name: it's a mechanical turk. The mechanical turk will be just as accurate as the humans who categorize the photos. And the results will be just as consistent as the various human opinions. (Unfortunately, most humans are really bad at evaluating photos just by looking at them. And many people have different opinions.) Crowdsourcing the determination about whether a picture is real or fake (propaganda, misleading, photoshopped, etc.) is neither scientific nor accurate.

Crowdsourcing usually results in the most popular solutions. Consider the case of Boaty McBoatface. In March 2016, the Natural Environment Research Council ran a poll to name one of their ships. Through the miracle of crowdsourcing, the name "Boaty McBoatface" was selected. (This created a controversy when the NERC decided to assign the name to a different vehicle -- one with a less public image.) However, this shows how crowdsourcing works: it's not always the best solution.
There are other issues with this crowdsourcing solution. For example, nobody vetted me. I didn't have to login or create any kind of unique ID to submit my 'report' to SurfSafe. This means that anyone can poison pictures at SurfSafe. It doesn't take much for a bot to submit thousands of reports against lots of pictures. (And you can always use a list of open proxies if you want to come from ten thousand different addresses.) Organizations with specific agendas can easily have real pictures marked as untrusted in order to cause confusion or attempt to discredit negative publicity.
Perhaps SurfSafe should consider a Facebook-style user rating system. Facebook users who forward known-false stories will earn a lower trustworthiness score than people who forward vetted stories. This reputation management system is Facebook's attempt to combat fake news. The idea is that users with low trustworthiness scores won't be able to propagate new stories very far. If you have a reputation for promoting false stories, then any new content from you will be treated with a healthy amount of skepticism. Then again, SurfSafe will still need a way to vet stories before assigning user trustworthiness.
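Reduced to its core, such a reputation system is just weighted voting. The accounts and trust scores below are invented for illustration, but they show the idea: a handful of low-trust (or bot) accounts cannot outvote one trusted user.

```python
# Hypothetical trust scores, as a reputation system might assign them.
trust = {"alice": 0.9, "bot1": 0.1, "bot2": 0.1, "bot3": 0.1}

# Each vote is (user, label). Three bots pile on with "fake" reports.
votes = [("alice", "real"), ("bot1", "fake"), ("bot2", "fake"), ("bot3", "fake")]

def weighted_verdict(votes, trust):
    """Sum each label's votes, weighted by the voter's trust score."""
    tally = {}
    for user, label in votes:
        tally[label] = tally.get(label, 0.0) + trust.get(user, 0.0)
    return max(tally, key=tally.get)

# The bots' combined weight (0.3) loses to the one trusted voter (0.9).
print(weighted_verdict(votes, trust))   # real
```

Without the weights (everyone at 1.0), the same votes come out "fake" -- which is exactly the poisoning problem described above.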
Trusted Sources
One of the first things that the SurfSafe browser extension requires is a list of trusted news outlets. Users select from a preset list of known news entities. At that point, it assumes that pictures from these sources are considered safe. This shows up in the JSON code seen in the network query. I marked CNN as a safe news outlet and then visited CNN.com. Each of the bulk query images uploaded to SurfSafe was automatically marked with the classification of "Safe".

The problem is that no news outlet is perfect. Trusted news outlets may occasionally make mistakes. (Some news outlets have more mistakes than others. Right, Fox?) This is particularly important in this era of rapid news coverage; for many news outlets, speed is more important than accuracy. Without vetted news reporting, we end up with situations like the Boston Marathon bombing, where the wrong person was named by some media outlets.
There are also false-equivalency and false-balance problems at some media outlets. A few journalists think "fair and balanced" means that they must provide equal coverage to both sides of a reported topic. If they say something negative about Nazis, then they must also find at least something positive to say. Let me be blunt: Nazis are bad. Period. Journalists should not be making up a straw-man argument just to appear balanced.
Finally, there's the entire clickbait news issue. Some news outlets intentionally post controversial topics in order to generate clicks and views for advertisers. SurfSafe has no method to represent issues related to bad reporting or ulterior motives.
Attribution issues
Okay, so now that we know how SurfSafe works (or doesn't), we can start looking at who is behind this service.

The domain "getsurfsafe.com" was registered last month, on 2018-07-03. There are various domain reputation reporting systems out there, and one of the big red flags they look for is a brand-new domain name. So this is a red flag. Nothing in their domain name or SSL certificate identifies the owner of this service. (And thanks to the GDPR, there's even less information available.)
How about their web page? They don't list any of their people and there's no "About Us" or "Who we are". The only thing it says at the bottom of the page is that it's powered by Robhat Labs. Robhat Labs is almost as mysterious: no names, no about us, nothing identifiable.
In fact, I found more information about the people behind Robhat Labs via LinkedIn and their press release. When it comes to trustworthiness, I have a lot of issues with site owners who don't identify themselves on their sites.
Misleading at best
So let's look at what SurfSafe really does and compare it to their claims (on their web page):

- Claim: SurfSafe protects users from misleading photoshopped and fake news throughout the Internet.
Fact: Nope. At best, it finds other sources for the picture. It relies on unvetted people to flag pictures as real or not, and those flags do not carry over to other URLs that host the exact same picture. Even as a search-by-picture service, you're better off using Google Images or TinEye.
- Claim: They mark the level of safety of an image or article on the corners of the pictures.
Fact: Nope. They add widgets to the corners, where you can then query their service. The "level of safety" is based on crowdsourcing and not any scientific analysis. Moreover, their crowdsourcing approach is significantly biased and easy for attackers to manipulate.
- Claim: They defend the Internet and bring the world one step closer to a fake-news-free Internet.
Fact: Nope. They violate user privacy by collecting URLs and pictures that may be sensitive in nature. Not only does this service not identify fake news, it permits attackers to mark real news as fake.
- Claim: You choose who to trust by deciding which organizations to base the truth off of.
Fact: You can select organizations that you think are trustworthy, but their selection process does not determine accuracy. Regardless of whether I think the BBC is good journalism or you think Fox is trustworthy, this says nothing about whether the reporting is actually accurate. This is one of the big problems with a silo mentality. Everyone is already using algorithms to show you what you will probably want to see. Facebook shows you articles similar to ones you like. Amazon shows you similar products that you might like. Twitter makes recommendations about who to follow based on your previous actions. This doesn't mean that SurfSafe is showing you the truth; it's only showing you what it thinks you want.
- Claim: "SurfSafe uses image and textual analysis to catch when fake news tries to mislead you."
Fact: Nope. This may be what they think they are doing, but it is definitely not what they are doing.
Sadly, some groups try to take advantage of this void and fill it with bogus analysis tools. SurfSafe is not a solution for identifying fake news.
The only thing worse than fake Fake-News detectors is news outlets that want us to trust them, but promote "new tools" without vetting them first. (I'm talking to you, Wired, BoingBoing, and Mac Observer.)
Thanks to R and M for pointing this app out to me. And thanks to The Boss for the lively discussions.
Did someone say Russia?
Saturday, 11 August 2018
Russia seems to be all over the news these days.
It's like, whenever anything goes bad, it must be "Russia".
The worst whack-a-mole group right now is the Russian Circle Group (RCG). That's our name for them, not their real name. I wrote about them a few weeks ago -- they are a darknet drug distribution group. Showing them a notice about not using my public site for their commercial purpose did nothing. My logs show that they saw the message. Some of their members even paused and a few did stop. But most of their group continued to use my site. A few RCG folks became so engaged with the whack-a-mole avoidance approach that they would change addresses before I could ban them.
After a few months of this, I decided to escalate the issue. I created a tutorial that included their pictures of drugs and drop-off locations. In my automated warning to them (in both English and Russian), I even included a link to the tutorial. Many of their members visited the tutorial and saw their pictures. A few members stopped, but most kept going. They didn't care that their pictures were being made public.
Late last month, I change tactics. I no longer censored their pictures. Besides posting pictures of their drugs and drop-off locations, I also posted pictures of people, receipts, homes, and anything else they uploaded. This had the desired impact. Their upload volume dropped from roughly 20 pictures per day to 1-2 a day, and some days with zero uploads. (The few days with multiple uploads are all from the same person.) I guess that they don't mind people knowing their drug locations as long as they remain anonymous.

Similarly, I run a homegrown honeypot. This system looks for common attack patterns and occasionally warns my other services of changes in the background noise level of constant attacks.
There's always been a steady amount of ongoing attacks. Various bots are continually scanning the Internet for various exploits. However, recently I've been seeing an increase in attacks from Russia. I reached out to Troy Mursch (@bad_packets). Among other things, he runs the Bad Packets Report and his own honeypot. I asked if he's been seeing an increase in attacks from Russia. It took him a few minutes to consult his data, then he came back with a resounding "Yes." For a comparison, he shared his list of attackers over time with me, and I graphed them.
Note: These graphs are from his data, not mine. I have his permission to include these pictures here, but not to distribute the raw data. (My data forms similar graphs, but he has more data.)
I graphed every IP address using a Hilbert curve. This is a cute algorithm that places adjacent subnets into tight squares. (I had to build my own tool because I didn't like the labels used by tools like ipv4-heatmap and glheatmap. I've updated the labels to match the current Class-A allocations.)
You can click on any of these pictures to see the full-sized map. In the full maps, each pixel represents a "/20" subnet -- that's 4,096 IPv4 addresses. Each colored pixel represents one or more attacks during the given month. (I'm using a rainbow heatmap: black=none, red, orange, yellow, ..., blue, white=max seen.)
This first map shows all attacks (from anywhere in the world) received by the honeypot during the first 7 months of this year.

The dots change per month, but the overall intensity doesn't really change. This mainly shows lots of infected hosts, botnet nodes, and hostile cloud nodes coming and going. This is typical for the Internet in general. Similarly, some subnets are more hostile than others. This is expected because some hosting providers do little to prevent outgoing attacks.
The only real surprising things I saw when I generated this graph:
The graph of China's network attacks mainly shows originations from APNIC -- because China is managed by APNIC. While there is steady flow of attacks from the same subnets, there's no significant change pattern per month. (Personally, I think this shows how bad the infection rate is in China. China has a big problem with malware and vulnerable IoT devices.)
Finally, I graphed attacks that geolocated to Russia. (This excludes Ukraine and other former Soviet Republics.)

There are a few random dots from infected hosts. However, unlike the other graphs, this has a steady increase in sightings. Interestingly, many of these come from AS12389. They appear as growing white spots in Net5 (far-left, 4th row), Net31 (4th row, 5th column), and the various white clusters in the bottom half. This particular domain appears to have strong ties to the Russian government, and even hosts domains like rkn.gov.ru (I think this is like their FCC), mchs.gov.ru (Russian Emergency Situations Ministry), and rt.com (Russian TV).
It's one thing to think of these Russian links as coincidental or a convenient excuse for anything bad in the world. It's another thing to see a steady pattern and quantifiable results. These attacks from Russia are attacks; they are real and not a myth. Some appear to be unrestricted organized crime, while others have close links or direct ties to the Russian government.
P.S. Thanks, Troy for your assistance. As you requested, I colorized and animated the data. Woot!
- Politics? Russia.
- Sanctions? Russia.
- Nerve gas attacks? Russia.
- Manafort trial? Russia.
- Fake news? Russia.
- Upcoming elections? Russia.
- Facebook ads? Russia.
- NRA? Russia.
- Cyber attacks? Russia.
- Places for US Senators to have secret meetings over the 4th of July? Russia.
It's like, whenever anything goes bad, it must be "Russia".
FotoForensics
Over at FotoForensics, we get a lot of visitors. Most adhere to the terms of service, but a few do not. One of the biggest abuses comes from people using the public service for commercial purposes. With any other country, we show them a "not for commercial use" notice and they go away. But Russians? They interpret this as "change your IP address and continue where you left off." We call it 'whack-a-mole' when people change their IP addresses to avoid a ban; we've had to develop a lot of anti-whack-a-mole solutions to combat this problem.The worst whack-a-mole group right now is the Russian Circle Group (RCG). That's our name for them, not their real name. I wrote about them a few weeks ago -- they are a darknet drug distribution group. Showing them a notice about not using my public site for their commercial purpose did nothing. My logs show that they saw the message. Some of their members even paused and a few did stop. But most of their group continued to use my site. A few RCG folks became so engaged with the whack-a-mole avoidance approach that they would change addresses before I could ban them.
After a few months of this, I decided to escalate the issue. I created a tutorial that included their pictures of drugs and drop-off locations. In my automated warning to them (in both English and Russian), I even included a link to the tutorial. Many of their members visited the tutorial and saw their pictures. A few members stopped, but most kept going. They didn't care that their pictures were being made public.
Late last month, I change tactics. I no longer censored their pictures. Besides posting pictures of their drugs and drop-off locations, I also posted pictures of people, receipts, homes, and anything else they uploaded. This had the desired impact. Their upload volume dropped from roughly 20 pictures per day to 1-2 a day, and some days with zero uploads. (The few days with multiple uploads are all from the same person.) I guess that they don't mind people knowing their drug locations as long as they remain anonymous.
The Cybers
My web services use a homemade automated attack identification and defense system. Depending on the type of action, you might receive restricted access (e.g., view but no uploads), a blank page, a temporary ban, or a long-term ban. There's even a mode where it will automatically track specific users for a short time, just in case they try to escalate an attack.Similarly, I run a homegrown honeypot. This system looks for common attack patterns and occasionally warns my other services of changes in the background noise level of constant attacks.
There's always been a steady amount of ongoing attacks. Various bots are continually scanning the Internet for various exploits. However, recently I've been seeing an increase in attacks from Russia. I reached out to Troy Mursch (@bad_packets). Among other things, he runs the Bad Packets Report and his own honeypot. I asked if he's been seeing an increase in attacks from Russia. It took him a few minutes to consult his data, then he came back with a resounding "Yes." For a comparison, he shared his list of attackers over time with me, and I graphed them.
Note: These graphs are from his data, not mine. I have his permission to include these pictures here, but not to distribute the raw data. (My data forms similar graphs, but he has more data.)
I graphed every IP address using a Hilbert curve. This is a cute algorithm that places adjacent subnets into tight squares. (I had to build my own tool because I didn't like the labels used by tools like ipv4-heatmap and glheatmap. I've updated the labels to match the current Class-A allocations.)
You can click on any of these pictures to see the full-sized map. In the full maps, each pixel represents a "/20" subnet -- that's 4,096 IPv4 addresses. Each colored pixel represents one or more attacks during the given month. (I'm using a rainbow heatmap: black=none, red, orange, yellow, ..., blue, white=max seen.)
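My plotting tool isn't published, but the cell mapping is just the standard Hilbert curve "d2xy" algorithm. A minimal sketch (function names are mine, not from my tool), mapping an IPv4 address to its /20 cell on a 1024x1024 grid:

```python
def d2xy(n: int, d: int) -> tuple[int, int]:
    """Convert distance d along a Hilbert curve into (x, y) on an
    n-by-n grid. Standard iterative algorithm; n is a power of two."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                       # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def ip_to_pixel(ip: str) -> tuple[int, int]:
    """Map an IPv4 address to its /20 cell on a 1024x1024 Hilbert map.
    2^32 addresses / 2^12 addresses per /20 = 2^20 cells = 1024*1024."""
    a, b, c, d = (int(octet) for octet in ip.split("."))
    addr = (a << 24) | (b << 16) | (c << 8) | d
    return d2xy(1024, addr >> 12)
```

The useful property is that numerically adjacent subnets land on neighboring pixels, so a hostile hosting provider shows up as a tight colored blob rather than a scattered line.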
This first map shows all attacks (from anywhere in the world) received by the honeypot during the first 7 months of this year.
The dots change per month, but the overall intensity doesn't really change. This mainly shows lots of infected hosts, botnet nodes, and hostile cloud nodes coming and going. This is typical for the Internet in general. Similarly, some subnets are more hostile than others. This is expected because some hosting providers do little to prevent outgoing attacks.
The only real surprises I saw when I generated this graph:
- The honeypot received lots of queries from APNIC (Asia-Pacific), AfriNIC (Africa), LACNIC (from Mexico to South America), and RIPE (Europe). In contrast, ARIN (North America: United States and Canada) has relatively few sightings. Considering that the US doesn't censor Internet traffic, I was pleasantly surprised by this low volume of sightings.
- The DoD and US government are eerily silent. But the same can't be said for other governments.
- Apple's network has a few addresses that are regularly scanning the Internet. (I wonder what for?)
The graph of China's network attacks mainly shows originations from APNIC -- because China is managed by APNIC. While there is a steady flow of attacks from the same subnets, there's no significant change pattern per month. (Personally, I think this shows how bad the infection rate is in China. China has a big problem with malware and vulnerable IoT devices.)
Finally, I graphed attacks that geolocated to Russia. (This excludes Ukraine and other former Soviet Republics.)
There are a few random dots from infected hosts. However, unlike the other graphs, this one shows a steady increase in sightings. Interestingly, many of these come from AS12389. They appear as growing white spots in Net5 (far-left, 4th row), Net31 (4th row, 5th column), and the various white clusters in the bottom half. This particular network appears to have strong ties to the Russian government, and even hosts domains like rkn.gov.ru (I think this is like their FCC), mchs.gov.ru (Russian Emergency Situations Ministry), and rt.com (Russian TV).
A well-earned reputation
I don't do elections, I'm not heavily vested in politics, and I'm not affiliated with any news outlet. Yet, in my own little world, I'm seeing an increase in "Russia" explanations. If this is what I'm seeing, then imagine what this looks like for news, governments, and other organizations that have more skin in the game.

It's one thing to think of these Russian links as coincidental or a convenient excuse for anything bad in the world. It's another thing to see a steady pattern and quantifiable results. These attacks from Russia are real and not a myth. Some appear to be unrestricted organized crime, while others have close links or direct ties to the Russian government.
P.S. Thanks, Troy, for your assistance. As you requested, I colorized and animated the data. Woot!
Temporarily Offline
Saturday, 4 August 2018
I've been doing a lot of traveling lately. (In July, I was gone more than I was here.) Oddly, one topic came up repeatedly on each trip: my cellphone. I do have a cellphone and I do use it when I travel. However, I don't have a data plan and I don't have text messaging (SMS). As a result, I get interactions like:
Airlines
Every airline out there wants my phone number. "We want your cellphone number so we can text you about any flight delays."
My reply: I live 2 hours away from the airport, and they don't notify people about flight delays until minutes before the flight is officially "late". By then, I'm already at the airport.
There was one time when a blizzard was rolling in and they were expecting flight cancellations. I received an email notifying me about the issue and giving me the option to reschedule without a change fee. But if they're going to email me, then they don't need my cellphone number. And for me, email is more convenient.
Hotel
Lately, hotels have been emailing me days ahead of time, asking me to check in now so I don't have to do it later. They also want me to use my cellphone to get my room key.
My reply: I don't want to check in days early because the airline might suddenly delay or cancel my flight. If I don't check in days early, then I can still exercise the option to cancel the reservation within 24 hours of check-in without a fee. But if I remotely check in early, then I can't cancel the reservation.
As far as getting my room key early... The only time I've ever had to wait more than a few minutes at the front desk has been when checking in at Las Vegas. (Most Vegas hotels are horribly slow.) And with many/most hotels, checking in remotely doesn't stop you from having that same wait at the reception desk.
Rental Car
The rental car company wants to text me about the parking space where my car is located. Inevitably, the person driving the rental car shuttle asks something like:
Driver: Did you receive your space number?
Me: No.
Driver: They should have texted it to you.
Me: I don't have text messaging on my phone.
Driver: Oh. They should have emailed it to you.
Me: I don't have email on my phone. (No data plan.)
Driver: Oh... Weird. Okay, then your name should be up on the big digital display at the parking garage.
My name always is on the big digital display at the parking garage. (Incidentally, I view this as a privacy issue since, as far as I know, I'm the only "N. Krawetz" in the United States.)
The longest it's ever taken me to find my name on the display and to find the car has been under a minute. Do I really need them to tell me the location in advance? After finding the car, I always walk around with my smartphone's video camera as I document every scratch and dent. (Using the video camera doesn't require Internet access. And I've never been charged for vehicle damage because I can show that any damage was there when I picked up the vehicle.) There's also the delay at the rental car exit and randomness of the highway traffic speeds before I get to the hotel. That extra minute to find the car isn't saving me anything.
In fact, telling the driver your space number may actually take more time. The first thing the drivers do is drop people like me off at the big digital display in the parking garage (so I can find my space number). Then they drive around everyone else to their car locations. I'm usually video recording the scratches on the car before the first person gets dropped off by the rental car shuttle. (If I care about speed, then I don't want to be personally dropped off at the car by the shuttle.)
Restaurants
A lot of crowded restaurants want your cellphone number so they can text you when your table is ready. Oddly, if I tell them that I don't have a cellphone, then they will give me a puzzled look (likely because they can see the phone in my shirt pocket) and then give me one of those light-up coasters. I find that those light-up vibrating coasters work fine. Plus, I don't have to think about how else the restaurant will use my phone number.
Coworkers
Occasionally I get coworkers who mention that they want to email me or text me when I'm traveling. But without a data plan and without text messaging, they have to either call me or wait until I can check my email on my laptop. I think this is more for their convenience than for mine.
I don't use the cellphone while driving. So if they did try to contact me while I'm on the road, I wouldn't notice until I arrived at my destination. On my last trip, I had a coworker tell me that, had I had text messaging, the message would be waiting on my phone when I got there. But this really doesn't speed anything up. If he calls me, my phone will tell me that he called me and I can call him back.
The same goes for airplane mode when I'm flying. When I'm not in front of a computer, I'm temporarily offline.
Driving
I have a lot of friends who use Google Maps to help them drive around town. Having that real-time traffic feature certainly is useful. But me? As a digital luddite, I travel with a portable GPS. I've found that a TomTom or Garmin can find the basic route much faster than a smartphone. One time in Vegas, I had a car full of geeks who were all trying to give me directions. By the time their cellphone data plans loaded Google maps, acquired GPS satellites, and found the directions, I was already pulling into the restaurant's driveway. A standalone GPS always loads faster than Google maps.
The only time Google Maps is more convenient is when there's a lot of unexpectedly bad traffic. As an alert driver, I can usually see the traffic slowing down. And it's easy enough for me to hit the "detour" button on my GPS. Then again, with everyone else already using Google Maps, most other people have already diverted, clogging up the available detours.
All of this comes down to the basic conclusion: I really don't need a data plan or text messaging.
Did someone say cookies?
Most of my friends (and coworkers) think that I don't have a data plan or SMS on my phone because I'm too thrifty. And while I certainly am a miserly cheap-ass, that isn't my main reason. Rather, I'm more concerned about privacy.

A few years ago, I revisited the super-cookie issue. Still today, many ISPs and cellphone providers insert unique tracking tokens into every web request. For example: today I found a half-dozen users who sent their cellphone numbers to my FotoForensics web site. They didn't do it intentionally. They probably didn't know that it even happened. Most of these came from HTTP "X-IMSI" and "Msisdn" headers that were inserted by their service providers. With my own phone, AT&T will happily insert my phone number into every web request. (It's not just AT&T. I saw Proximus NV in Belgium do it the exact same way last month. Do I really want to call someone in Belgium just because they visited my web site?)
These supercookies that identify cellphone numbers only appear when using the phone's data plan. They do not get inserted if you use a wifi access point. Personally, I have no problem with using my phone with a public wifi access point. (Then again, I'm not at Defcon. For folks going to Defcon: the wifi there is too hostile; don't use it with your cellphone.)
One of the things I've noticed with every smartphone I've tested is that they all reach out to various web sites when they connect online. Some sites are related to apps that are running. Some check the time or look for updates. And some are... well, I don't know. If I use a data plan, then my phone is sending its phone number to a bunch of sites that I don't recognize. What are these sites? Why does my cellphone or carrier think these sites need my phone number? Are these sites collecting phone numbers or ignoring the data?
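"X-IMSI" and "Msisdn" are header names I've actually seen; the rest of this sketch (the extra header names and the function itself) is an illustrative assumption about how a server might flag these carrier-injected identifiers:

```python
# Hypothetical server-side check for carrier-injected tracking
# headers. "X-IMSI" and "Msisdn" are mentioned in the text above;
# the other names and this function are illustrative assumptions.

TRACKING_HEADERS = {"x-imsi", "msisdn", "x-msisdn",
                    "x-up-calling-line-id"}

def injected_identifiers(headers: dict[str, str]) -> dict[str, str]:
    """Return any carrier-injected subscriber identifiers found in a
    request's headers (matched case-insensitively by name)."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() in TRACKING_HEADERS
    }
```

A site could log these, strip them, or warn the visitor; the point is that the visitor's browser never chose to send them -- the carrier's gateway added them in transit.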
Trust is earned
I view my smartphone as an untrusted associate. In general, I don't know what information it is giving out and I don't know who is receiving the data. For the network traffic I can monitor, I see way too much data that I can't block. As a result, I try to limit what I do with the phone.

With a desktop computer, I have the option of configuring a firewall or using software to block or filter network traffic. I even figured out how to stop Windows 10 from reporting all of my computer usage to Microsoft. However, I don't have the same level of control over my smartphone. There are apps that I cannot disable. There are permissions that I cannot revoke. And there are default options that are far from secure -- with few mitigation options.
I see lots of people check their email or access banking information from their smartphones. They say that they trust their phones. But I have to ask: why? Why do they trust a device that doesn't disclose what it's doing? Why do they trust their carriers who are obviously tracking them? (E.g., Why are some carriers man-in-the-middling some SSL connections? I can tell this happens when the device's SSL fingerprint doesn't match the device.) Similarly, why are Verizon, AT&T Mobility, AT&T Worldnet, and Time Warner using transparent proxies for mobile devices? (The TCP signatures change when a proxy is used.) I see no reason to trust these devices and no reason to trust these carriers.
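The mismatch check can be sketched simply. (The device names and fingerprint strings below are made-up placeholders; a real check would compare something like a hash of the TLS Client Hello against known-good values for each client type.)

```python
# Illustrative MITM/proxy detection by TLS fingerprint mismatch.
# The fingerprint strings and device names are hypothetical
# placeholders, not real device fingerprints.

EXPECTED_FP = {
    "iphone-safari":  "tls-fp-aaa",
    "android-chrome": "tls-fp-bbb",
}

def looks_proxied(claimed_device: str, observed_fp: str) -> bool:
    """True when the TLS fingerprint seen at the server does not match
    what the claimed device normally produces -- a hint that something
    in the middle re-terminated the connection."""
    expected = EXPECTED_FP.get(claimed_device)
    return expected is not None and observed_fp != expected
```

An unknown device yields no verdict (you can't call a mismatch without a baseline), which is why building the table of known-good fingerprints is the hard part.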
Some political associations are warning people away from certain smartphone brands. However, I think the warning is too weak: no smartphone is secure enough for handling personal or sensitive information. If you have to use email with your smartphone, then use a separate email address that is only used by the smartphone. If you have to use a social app, then create a special account specifically for that social app.
No Data
Since I don't have a data plan, I rarely use my cellphone to browse the web or do other things online. Still, I get at least one spam caller per day. In contrast, my friends who use their data plans often seem to receive many more spam calls. They need their data plan so they can use an app like "Mr. Number" to filter out spam calls.

I have to think that there's a correlation here: the more you and your phone give out the phone number, the more spam calls you receive. By not having a data plan and not giving out the number, I limit the exposure and end up only receiving calls from spammers who war dial every phone number.
Okay, so this explains why I don't have a data plan. But why don't I have text messaging? My carrier tied texting to the data plan. I can't have texting without data. And since I don't have a data plan, I don't have texting.
Just Say No To Drugs
Saturday, 7 July 2018
Over a month ago, I wrote a blog entry about travel companies, bitcoin bankers, and others that use the public FotoForensics site for commercial purposes. Since this is against the terms of service, I've been showing them a notice and blocking their ability to upload content. Most of them have moved their commercial use off of the public site. Some have moved to the private FotoForensics Lab service. (I did have one person write in, asking how he could use the public service to evaluate pictures containing very personal information. When I explained that the public site was public and that other people could see his pictures, he responded with "oh.")
Be Smart, Don't Start
I have a couple of profiling techniques that can identify patterns associated with commercial use. So far, these patterns have been spot on. I don't know the false-negative rate, but there have been zero false positives.

I really only have one group right now that is being a problem child. I had briefly mentioned them in an earlier blog entry. I call them the "Russian circle group" (RCG). But that's my name for them; I don't know what they call themselves. "Russian" because their text is in Russian and their GPS locations are either in Russia or one of the old Russian states (Belarus, Ukraine, etc.). "Circle" because they initially annotated their pictures with big red circles, and "group" because it's more than one person.
RCG uploads pictures of locations with annotations. Many of the pictures contain GPS coordinates. I used to think that it could be a geo-caching group, or maybe spycraft. But more recently they began uploading pictures of drugs. I believe they are a drug group and they are uploading pictures of their drop locations. (That's definitely a commercial use, and should not be on the public site.) Since they started in November 2017, they have uploaded over 400 pictures to the public service. But most of the pictures have been uploaded in the last 3 months; they are increasing their upload rate.
Some of their pictures include GPS locations, annotations, directions, or drugs. For example: (Click on each picture to view it at FotoForensics.)
This is your brain on drugs
What I know about RCG:
- They have multiple members. It's not just one person; it's a group that all use the same basic approach. However, each member has their own slightly different behavioral patterns. There are at least a dozen people.
- They annotate pictures with arrows, dots, or circles to mark locations. However, they also appear to leave empty soda bottles or large rocks as physical markers.
- They practice good operational security (or "OPSEC" as they say in the government). When they see the ban notice that tells them to stop using the public site, they immediately stop for the rest of the day. During that time, they would switch IP addresses and sometimes even switch mobile devices.
- They learn (kind of). They used to upload a bunch of pictures and then get banned. But now that they know they will be banned, each member typically uploads 1-3 pictures before changing devices or IP addresses. They know it is a public site, they know they will be banned, so they jump before they are banned. For me, this becomes a game of whack-a-mole. But it also tells me that they don't care if other people see their content, and they don't care if they get banned.
- Financially, they have deep pockets. They use a variety of network addresses. None appear to be open proxies or part of any popular commercial VPN service. Instead, they are burning through IP addresses -- in a wide range of countries -- that have not been used by other people. That kind of usage costs money, and they don't mind spending it.
- They are a layered organization. One set of people take the photos. A different set annotate the pictures. (The same annotation method is not consistently used on the same kind of pictures.) These annotated pictures are usually uploaded to file sharing sites, like image.ibb.co, s7.uploads.ru, and imagizer.imageshack.us. (I say "usually", because sometimes they use direct file uploads.) Yet another set of people (at least 4 of them) retrieve the pictures and upload them to FotoForensics.
- They may be using FotoForensics as a second blind drop location for anonymously transferring their pictures to other group members. However, most of their pictures have only been accessed by the person who has uploaded it. Since other people do not access the pictures, I doubt that this is the case.
- They could be using FotoForensics to double-check their pictures, making sure they are clean before continuing on. They may be checking for personal information (other than GPS coordinates).
- This could be a double-check to make sure the picture hasn't been altered. This would be something like checking that the picture is valid before providing payment to a middleman. There have been a few cases where the same picture (visually the same but different bits) has been uploaded multiple times. They may be looking for fraud. For example, are their own people trying to scam them? (If this is the case, then the answer is a resounding "yes". Especially from the person who uses the Xiaomi Redmi Note 4 device. He's sent some pictures to a person with an iPhone, and I doubt the iPhone user has noticed the content reuse.)
Become a positive example
As I understand it, there is not much I can do to stop this violation of my terms of service. For example, US law enforcement has no jurisdiction to investigate drug-related activities in foreign countries when it doesn't impact US citizens.

My primary option is to detect and block. For the last 4 months, this group has been seeing a personal notice: I have been telling them to stop using my public site. Rather than stopping, they have spent time and effort to avoid the simple blocks that I've put in place. I could spend time developing more sophisticated methods to automatically detect and ban their usage. However, that's not worth my effort.
Instead... They know that it is a public site and they know that other people can see their pictures. So, I've just created a dedicated web page for their content and turned them into a training example. Anyone can visit and see RCG's most recent content. Some pictures contain instructions, some contain GPS, and some are just interesting. So far, it includes about 400 pictures -- I'm not including the ones that show people, receipts, and financial statements.
From the public site, to the public: have fun!
Tor and Tracking
Friday, 29 June 2018
While I like Tor and use Tor, I do not like how the Tor Project grossly misrepresents their products. Every time they falsely promote their capabilities, it reminds me of snake-oil peddlers. I feel a strong urge to set the record straight and expose the lies.
Honestly, there is no reason for the Tor Project to falsely represent what their products provide. Tor does a few things really well. It provides network route anonymity without the need for trust, and full end-to-end encryption between Tor nodes and hidden services. In contrast, the Tor Browser does a good job of making all browsers have similar signatures. The Tor Browser is not perfect, but it's good enough for most tasks.
Unfortunately, nine days ago, the Tor Project sent out one of their snake-oil tweets:
In this tweet, the Tor Project implies that the Tor Browser is not vulnerable to browser fingerprinting. Moreover, they make the bold claim that they protect against tracking and surveillance.
While the Tor Browser is better than most other browsers, it isn't perfect. If you know what to look for, you can determine much more than just "the user is on the Tor Browser". But the thing that really sets me off is that the Tor Project knew that it was a lie when they wrote their tweet.
For example, there is an old bug for browser fingerprinting based on fonts. The approach is pretty simple:
- Select a font and write some long text in an HTML "span" or "div" element. The element doesn't even need to be visible for this test to work.
- Measure the width and height of the span element.
Then you repeat the measurement with each font in a set of known fonts. If the font exists, then the dimensions will change. (It is unlikely for two different fonts to have the exact same spacing.) If the dimensions match the default font, then you know that the font does not exist.
With normal browsers, an attacker can iterate through a list of a thousand common fonts. If you've ever added fonts to your computer, then your collection is distinct and potentially unique. (And if you've never added fonts, then you're still distinct and probably in the minority.)
To mitigate this attack, the Tor Browser limits fonts to a small list. However, it includes different fonts based on the platform!
- Tor for Linux includes "Cousine", "Noto Emoji", and "Tinos".
- Tor for Windows includes "Noto Sans Buginese", "Noto Sans Khmer", "Noto Sans Lao", "Noto Sans Myanmar", and "Noto Sans Yi".
- Tor for Mac includes "Noto Sans Khmer", "Noto Sans Lao", "Noto Sans Myanmar", "Noto Sans Yi", and "STIX Math".
And keep in mind, this exploit isn't new. It's been a documented, open issue for over two years (correction: three years). Of course, this problem will get worse if the Tor Browser begins adapting fonts based on locale or when Firefox changes fonts. If your Tor Browser doesn't have the exact same signature as everyone else, then I can track you.
Here's some code that actually performs this font test:
And here's the actual test. NOTE: It only works for the Tor Browser.
<blockquote id='FontTestResult'></blockquote>
<span id='TestDiv' style='visibility:hidden'></span>
<script type='text/javascript'>
// Create a hidden div for measuring rendered text dimensions.
// (visibility:hidden still lays out the text, unlike display:none.)
var DivObj = document.createElement("div");
DivObj.style.visibility = "hidden";
DivObj.style.fontSize = "75pt";
document.body.appendChild(DivObj);

// Render three test strings in the named font and return the combined
// width x height measurements as a signature string.
function CheckFont(name)
{
  DivObj.style.fontFamily = '"' + name + '"';
  DivObj.innerHTML = "mmmmmmmmwwwww";
  DivObj.style.display = "inline";
  var S = '';
  S += DivObj.offsetWidth + "x" + DivObj.offsetHeight;
  DivObj.innerHTML = "abcdefghijklmnopqrstuvwxyz";
  S += " " + DivObj.offsetWidth + "x" + DivObj.offsetHeight;
  DivObj.innerHTML = "Quick brown dog";
  S += " " + DivObj.offsetWidth + "x" + DivObj.offsetHeight;
  return(S);
}

var Result=document.getElementById('FontTestResult');
Result.innerHTML="None of the fonts detected; not Tor Browser on Windows, Linux, or Mac.";

// Warm up the layout engine, then record the fallback font's signature.
CheckFont("doesnotexist");
var FontDefault=CheckFont("DefaultDoesNotExist");

// A font is present when its signature differs from the fallback's.
// Each of these fonts ships with exactly one Tor Browser platform.
var t;
t=CheckFont("Noto Sans Buginese");
if (t!=FontDefault) { Result.innerHTML='Tor on Windows: font "Noto Sans Buginese" found'; }
t=CheckFont("STIX Math");
if (t!=FontDefault) { Result.innerHTML='Tor on Mac: font "STIX Math" found'; }
t=CheckFont("Cousine");
if (t!=FontDefault) { Result.innerHTML='Tor on Linux: font "Cousine" found'; }
</script>
Everyone's a critic
I fully expect some people in the Tor community to point out that this exploit only identifies the Tor Browser's platform. However, there are not many Tor Browser users out there. (Most people do not use Tor.) Having a Tor Browser visit a site is rare enough. But being able to subdivide the Tor Browser community into three groups makes it even easier to track.
But if you don't think identifying your platform is enough detail, then let's try a different exploit. This one distinctly identifies attributes about your computer via the Tor Browser. This web page just ran the test on your browser. Here are the results:
(This entire test is written in JavaScript and does not transmit the results to anyone. I don't even see your results.)
What you should see from this test depends on your browser:
- If you're running any Chrome-based browser (Chrome, Chromium, Opera, etc.) or any other non-Firefox browser (e.g., IE or Edge), then you should see a list of broken icons. That's because this test uses a browser attribute that is only supported by Firefox.
- If you are running the latest-greatest Firefox, then you should see text and no images. That's because Firefox 59 (March 2018) disabled the moz-icon attribute. They realized it was a privacy and tracking issue. (See CVE-2018-5140.)
- If you're running an older version of Firefox (anything older than Firefox 59) -- including any stable version of the Tor Browser -- then you should see an icon above each line of text. For example, my browser showed the computer's default file-type icons.
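For reference, a minimal version of such a test page might look like the following sketch. The file extensions are arbitrary choices, and the moz-icon:// scheme only renders in pre-59 Firefox builds (including older Tor Browsers):

```html
<!-- Hypothetical test page: each img asks Firefox (pre-59) to render the
     computer's own icon for a file type via the moz-icon:// scheme.
     Non-Firefox browsers show broken images; Firefox 59+ blocks the scheme. -->
<p><img src="moz-icon://file.pdf?size=32" alt="pdf icon"> .pdf</p>
<p><img src="moz-icon://file.mp3?size=32" alt="mp3 icon"> .mp3</p>
<p><img src="moz-icon://file.zip?size=32" alt="zip icon"> .zip</p>
```

Because the rendered icons come from the operating system's file-type associations, they reflect the installed applications rather than anything about the browser itself.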
The best part is: you will see the same icons with both the Tor Browser and (pre-59) Firefox browser on the same computer. This is because they represent the default icons on the computer, and are not limited to the web browser. As an attacker, I could map a Tor Browser to a Firefox browser. And since Firefox doesn't typically use Tor, it means that I can breach your anonymity.
Of course, one of my critics pointed out that getting the icons from your display to my server requires extracting the icon from the image tag. That's something that the Tor Browser forbids. So really, this is a good exploit in theory, but it doesn't work in practice. (In response to him, I just smiled broadly and said, "I'm not disclosing all of my exploits today." That's not the only way to use an image.)
I had disclosed this moz-icon vulnerability to the Tor Project via their HackerOne bug bounty program 7 months ago. I included working code. The response was that "Yes, moz-icon:// is problematic. This has been known for years" and "It's on our radar but we did not have the time yet to deal with it." So again, the Tor Project knows that they have a vulnerability that can be used for tracking. Yet, they proudly claim on Twitter that they "protect against tracking and surveillance on the web". This is a known-false claim. (Seriously: I'm a good guy. And if I know these exploits, then you know that the bad guys also know them.)
Two days ago, the Tor Project announced that they have a new, alpha version of the Tor Browser. As alpha code, it isn't stable yet and you shouldn't trust it for protecting your privacy. (They are looking for volunteers to test it and work out the bugs.) This alpha release is supposed to be based on Firefox 60, so it might contain this moz-icon fix. (I haven't tested it because I don't have time to test alpha code right now.) I hope it does fix this issue. And when they finally release it as production code, I hope everyone upgrades.
"Protects against tracking"
Another tweet from the Tor Project that really set me off concerns Venezuela:
In their efforts to censor the Internet, Venezuela began blocking access to the Tor network. Personally, I think Tor is great for free speech. It is a valuable tool for avoiding censorship. And I'm glad that the Tor Project is trying to help break the censorship. (The current solution is to use a meek bridge. Don't use the default servers -- those are blocked. Instead, get bridges a different way.)
My issue isn't with the censorship between the Tor Project and Venezuela. Rather, my issue is with the metrics. How do they know how many people use Tor and where they are located? The answer is: some Tor nodes are run by researchers. They collect metrics about who connects to them and report it to the Tor Project. This is explicitly tracking.
As the Tor Project documented in their FAQ:
Q: How do you know which countries users come from?
A: The directories resolve IP addresses to country codes and report these numbers in aggregate form. This is one of the reasons why tor ships with a GeoIP database.
In other words, some Tor nodes collect metrics about Tor clients -- including where they come from. The Tor Project receives aggregate data. So while the Tor Project doesn't receive your IP address, they have requested IP address tracking at some nodes. These nodes convert addresses to countries and then upload the country information to the Tor Project. In order to do this, the nodes must collect information about you -- that's tracking. Moreover, we know that they are collecting information about your country of origin. However, we don't know what else they might be collecting or who is doing the collection or how long they retain the data. (I suspect that most of these Tor nodes don't keep much data. But a few of the research groups could surprise you about what they collect. I still wonder about the nodes operated by Team Cymru, Kaspersky, and other "researchers".)
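The directory-side behavior described in the FAQ amounts to something like this toy sketch: the node sees every client IP address, maps each to a country, and only the aggregate counts get reported. The IP-to-country table here is a hypothetical stand-in for a real GeoIP database:

```javascript
// Toy sketch of directory-style aggregation: resolve each client IP to a
// country code and keep only per-country counts.
// This lookup table is a hypothetical stand-in for a GeoIP database.
const geoip = {
  "203.0.113.5": "ve",
  "203.0.113.9": "ve",
  "198.51.100.2": "us",
};

function aggregateByCountry(clientIps) {
  const counts = {};
  for (const ip of clientIps) {
    const cc = geoip[ip] || "??";
    counts[cc] = (counts[cc] || 0) + 1;
  }
  return counts; // only aggregate numbers leave the node
}

console.log(aggregateByCountry(["203.0.113.5", "203.0.113.9", "198.51.100.2"]));
// e.g. { ve: 2, us: 1 }
```

Even though only country/count pairs are reported, the lookup itself requires handling each visitor's IP address -- which is the collection at issue here.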
I'm all for collecting metrics about Tor's use. I'm even thrilled that they can identify regional outages in order to report on government censorship. This collected information can help direct additional efforts to breach any censorship. But when the Tor Project claims that they don't collect data about users when they are absolutely directing the tracking and collection of user data? That's misleading at best.