The Hacker Factor Blog: Tools, Techniques, and Tangents
Caller ID
Thursday, August 19. 2010
Over the last week, a bunch of friends have forwarded to me stories about the risks of GPS information embedded in pictures. For example, MythBuster Adam Savage apparently took a picture of his car at his home and forgot to disable the GPS information. Rabid fans quickly identified where Adam lived. Granted, I doubt most celebrities have secret homes, but the fact is: pictures tell much more about you than just the photo's content.
The GPS data in JPEGs is nothing new. It was part of the EXIF 2.1 standard back in 1998. (And that may not be the earliest version...) However, it wasn't until the last few years that cameras, cell phones, and other portable devices began to incorporate GPS technologies. Today, it is hard to find a cell phone without a camera, and many of them include GPS as a feature. While GPS information embedded in a picture may tell people where you were, Facebook has decided to use your GPS to tell people where you are. Called Facebook Places, this feature broadcasts your GPS location to all of your Facebook friends. While Facebook does offer options for limiting distribution, it is well-known for abruptly changing policies.
iPhone, iPad, iTouch, iMac, iSpy
Today's ever-smarter portable devices are not designed for privacy-oriented people. While embedding and publishing GPS information may be an overt example, there are many other ways your device can leak information about you.
I've been collecting photos from various hand-held devices. I use them to populate a photo ballistics database. My friend, Bum, recently purchased an iPad and sent me a screenshot from the device. (His iPad doesn't have a camera.) While the picture's ballistics weren't very interesting, the email header was!
From: Bum <b...@...com>
The first thing to notice is the X-Mailer header. It identifies the device (iPad), application (Mail), and version (7B405). This isn't too exciting since most MUAs (mail user agents) include this type of information. However, the content boundary got my attention: Apple-Mail-1-186804698. I dug through my email archives and found a bunch of other examples:
Apple-Mail-11-1034880980
With a little help from the DC3, I finally understand what these non-random numbers describe. The big number is actually the least interesting value. It is the time in milliseconds, stored in a signed 32-bit register. (Negative numbers appear as a double hyphen.)
Since the timestamp is stored in a 32-bit register, the value rolls over about every 24.86 days (2^31 milliseconds). However, the zero date isn't the Unix epoch (00:00:00 on 1970-01-01). If you assume the timestamp represents the time in the email's Date header and repeatedly subtract 2^31 milliseconds until you reach the Unix epoch, you'll notice that it is off... The value closest to the epoch (without going under) is 128397792ms, or Jan 2 11:39:57 1970 UTC. (You might see it vary by a second, 11:39:58, if the clock happened to roll over between generating the Date header and the content boundary.) I'm not sure why Apple chose this date, but it is consistent: the Mail programs on the iPhone, iPad, iTouch, and Mac OS X all use the same date. From a forensics viewpoint, this is useful because it is a quick way to identify forged emails that claim to be from Macs. (I actually had a use for this last week!)
The more interesting number is the smaller value, and it took me a while to identify its purpose. It is the number of attachments sent by the mailer (Apple Mail) since the program was started. If you see "-1-", then you received the first attachment they sent since starting the program. A "-15-" means that the person had started Apple Mail and sent 14 attachments before sending one to me. (Winn Schwartau sent me an email that had "-245-"!) This is very useful, particularly if you receive multiple emails from the person over a short duration. For example, Bum's emails always have "-1-". This means he closes the Mail program frequently. (That makes sense for an iPad that can't multitask.) I also received emails from a friend, M., who clearly loves attachments -- in 30 minutes he went from "--12--" to "--28--".
From a forensics viewpoint, this is awesome. Let's say the person has a couple of different Apple computers. I should be able to look over each computer, see how many attachments he sent on each system, and match the counts to the emails.
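The boundary arithmetic described above can be sketched in a few lines of Python. This is a reading of the description, not Apple's code: it assumes the boundary looks like "Apple-Mail-<count>-<timestamp>" (with a doubled hyphen for a negative timestamp), and it treats "repeatedly subtract 2^31 milliseconds" as reducing (Date minus timestamp) modulo 2^31, which gives the same answer. The function names and the 2-second tolerance are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumed boundary shape: "Apple-Mail-<count>-<timestamp>".
# A negative timestamp produces a doubled hyphen, e.g. "Apple-Mail-3--12345".
BOUNDARY_RE = re.compile(r"^Apple-Mail-(\d+)-(-?\d+)$")

WRAP_MS = 2 ** 31              # 2^31 ms, about 24.86 days
EXPECTED_REF_MS = 128_397_792  # Jan 2 11:39:57.792 1970 UTC, the value from the post
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_boundary(boundary):
    """Return (attachment_count, timestamp_ms), or None if the shape doesn't match."""
    m = BOUNDARY_RE.match(boundary)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def reference_offset_ms(date_header, timestamp_ms):
    """(Date header in ms since the epoch) minus the boundary timestamp, modulo 2^31.

    Python's % returns a non-negative result for a positive modulus, so
    negative (double-hyphen) timestamps need no special casing.
    """
    date_ms = round(parsedate_to_datetime(date_header).timestamp() * 1000)
    return (date_ms - timestamp_ms) % WRAP_MS


def ms_to_utc(ms):
    """Convert milliseconds since the Unix epoch to an aware UTC datetime."""
    return UNIX_EPOCH + timedelta(milliseconds=ms)


def looks_like_apple_mail(date_header, boundary, tolerance_ms=2000):
    """Heuristic: does the recovered reference land near the Jan 2 1970 value?"""
    parsed = parse_boundary(boundary)
    if parsed is None:
        return False
    _, timestamp_ms = parsed
    offset = reference_offset_ms(date_header, timestamp_ms)
    return abs(offset - EXPECTED_REF_MS) <= tolerance_ms
```

Because the Date header only has one-second resolution, the recovered reference is only good to about a second, which is why the check uses a tolerance instead of an exact match.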
Even if you delete a specific email, I can still determine how many attachments were in the deleted messages.
Android Spies
The information leakage is not limited to Apple products. At Defcon, my friend Factor sent me a sample picture from his Android phone. It crashed my analysis tool!
The problem was a poorly formed JPEG. Every JPEG should begin with 0xffd8 (start of image), contain a stream that starts with 0xffda (start of scan), and end with 0xffd9 (end of image). Between the 0xffd8 and 0xffda are various other settings, including APP records (0xffe0 to 0xffef for APP0 to APP15). His Android phone was storing additional APP records after the end of the image (0xffd9). I added a check for this situation, so my code no longer crashes. However, these APP5 records (0xffe5) turned out to be really interesting. They only appear in one type of Android phone: the Motorola Android. I have observed these fields in photos taken with:
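A marker walker that tolerates this layout might look like the following Python sketch. It follows the segment rules described above (0xffd8 first, length-prefixed segments, entropy-coded data after 0xffda, 0xffd9 at the end) and simply keeps reading segments after 0xffd9 instead of stopping or crashing. It is not the analysis tool mentioned above; the function names are hypothetical, and it ignores many oddities a production JPEG parser has to handle.

```python
import struct

SOI, EOI, SOS = 0xD8, 0xD9, 0xDA
# Markers with no 2-byte length field: TEM, SOI, EOI, and RST0-RST7.
STANDALONE = {0x01, SOI, EOI} | set(range(0xD0, 0xD8))


def walk_segments(data):
    """Yield (marker, offset, length, after_eoi) for each JPEG marker.

    `length` is the big-endian length field (it counts its own two bytes but
    not the 0xff/marker pair); standalone markers report 0. Parsing continues
    past EOI so that trailing segments are reported with after_eoi=True.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI (0xffd8)")
    yield SOI, 0, 0, False
    pos, after_eoi, n = 2, False, len(data)
    while pos + 1 < n and data[pos] == 0xFF:
        while pos + 1 < n and data[pos + 1] == 0xFF:
            pos += 1                       # skip 0xff fill bytes
        if pos + 1 >= n:
            break
        marker = data[pos + 1]
        if marker in STANDALONE:
            yield marker, pos, 0, after_eoi
            after_eoi = after_eoi or marker == EOI
            pos += 2
            continue
        if pos + 4 > n:
            break                          # truncated length field
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, pos, length, after_eoi
        pos += 2 + length
        if marker == SOS:
            # Skip entropy-coded data: 0xff00 is a stuffed byte and
            # 0xffd0-0xffd7 are restart markers; any other 0xffXX ends the scan.
            while pos + 1 < n:
                nxt = data[pos + 1]
                if data[pos] == 0xFF and nxt != 0x00 and not 0xD0 <= nxt <= 0xD7:
                    break
                pos += 1


def trailing_app_segments(data):
    """Return (marker, offset, payload) for APP0-APP15 segments found after EOI."""
    return [
        (marker, off, data[off + 4:off + 2 + length])
        for marker, off, length, after_eoi in walk_segments(data)
        if after_eoi and 0xE0 <= marker <= 0xEF
    ]
```

On a well-formed JPEG, `trailing_app_segments` returns an empty list. On a file laid out like the Motorola photos, it returns each APP5 payload that follows 0xffd9; assuming the record name sits at the start of the payload (as the value='HPQ-MetaData' output suggests), checking whether a payload starts with b"HPQ-" is a quick way to flag them.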
They probably appear in other phones as well. However, I have not seen them with any other type of Android phone. These extra APP fields look like:
tag='0xffe5' length='32' field='APP5' value='HPQ-MetaData'
That's right, every picture has over 95K of additional APP5 data after the image! That is as much as 8% of the file size!
So far, I can only decode one of the fields: HPQ-Capture. It has 3-5 records (depending on the version), and the records identify your phone. Here's a decoded block from a Motorola, Droid, 2.2:
field='Build Version' value='4719:5353'
The kernel information is the same as running "uname -r" and "uname -v" from a command prompt. The Build Version looks like an SVN revision string, but it could come from some other source code revision system. I sent an email to "kraigp" asking for more information about these undocumented fields, but it bounced:
This is an automatically generated Delivery Status Notification.
Different Android versions include different information. For example, the Motorola DROIDX 2.1-update1 says:
field='Build Version' value='5476'
All of these HPQ fields appear to be part of the HPAndroidHAL driver. Since only Motorola seems to use this driver, only Motorola photos get tagged. (If I'm wrong here, I hope someone will tell me. I'll be sure to make corrections.) It kind of makes sense that Hewlett-Packard would embed their stock symbol (HPQ) in the APP field...
Most of the HPQ records have fixed lengths. Some values don't change regardless of camera version. Some change between versions but not between cameras, some change with each photo (e.g., white balance and focus), and some seem to change between specific cameras. These last fields are the interesting ones. Not only can I tell what camera took the picture, but I can tell you if two photos were taken by the exact same camera.
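The field-by-field comparison above amounts to grouping decoded values and checking where they stay fixed. The Python sketch below is purely illustrative: the input format (a camera label, a version string, and a dict of decoded fields per photo) and every field name used with it are hypothetical, and with a small sample a "per-camera" label can easily be a coincidence.

```python
from collections import defaultdict


def _constant_within(samples, key):
    """True if every group of samples sharing the same key has a single value."""
    groups = defaultdict(set)
    for sample in samples:
        groups[key(sample)].add(sample["value"])
    return all(len(values) == 1 for values in groups.values())


def classify_fields(photos):
    """Label each decoded field as constant, per-version, per-camera, or per-photo.

    `photos` is a list of dicts with hypothetical keys "camera", "version",
    and "fields" (a dict of field name -> value).
    """
    names = sorted({name for photo in photos for name in photo["fields"]})
    labels = {}
    for name in names:
        samples = [
            {"camera": p["camera"], "version": p["version"], "value": p["fields"].get(name)}
            for p in photos
        ]
        if len({s["value"] for s in samples}) == 1:
            labels[name] = "constant"
        elif _constant_within(samples, lambda s: s["version"]):
            labels[name] = "per-version"
        elif _constant_within(samples, lambda s: s["camera"]):
            labels[name] = "per-camera"    # the potential fingerprint
        else:
            labels[name] = "per-photo"
    return labels


def same_camera(photo_a, photo_b, labels):
    """Guess whether two photos share a camera by comparing the per-camera fields."""
    keys = [name for name, label in labels.items() if label == "per-camera"]
    return bool(keys) and all(
        photo_a["fields"].get(k) == photo_b["fields"].get(k) for k in keys
    )
```

The checks run from coarsest to finest: a field that is fixed within each software version is labeled per-version before it is tested per camera, so only fields that differ between cameras running the same version end up as candidate fingerprints. Telling those apart requires at least two cameras on the same version and at least two photos per camera.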
Unfortunately, I don't know what these fields mean, and with my small sample size, the "changes between cameras" could be coincidental. The only variable-sized field seems to be the HPQ-LRGEBUFF record. It looks like some kind of partial memory dump. (I strongly suspect debugging code that was not disabled before release.) If you have an Android phone and want to know if your pictures have the HPQ tags, then try this:
In any case, until we learn what "HPQ" is embedding in each photo taken by a Motorola Android, I'm going to stay on the paranoid side. If you happen to know how to decode the other fields, please let me know!
The End?
Smarter devices do not mean smarter users or smarter programmers. Unless you know how to disable every undesirable feature (and remember to disable it), you are probably going to leak information. While online anonymity isn't dead, it is getting harder and harder to protect our privacy.
Posted by Dr. Neal Krawetz in Forensics, Image Analysis, Privacy, Programming, Security at 22:25
Deja Vu
Tuesday, July 13. 2010
You know that feeling you get when someone gives you advice that you don't care about at the time but turns out to be prophetic? I just had that experience...
Boxes
Even though my background includes a significant amount of experience with artificial intelligence algorithms, I rarely use AI systems in my day-to-day work. The reason has to do with repeatability and provability. The various types of neural networks are relatively easy to construct and train, but act as black-box systems. You know the input, you see the output, but you don't know how the system generated the output from the input. Moreover, if you train a neural network with different initial weights or a different order through the training set, then it will result in a different learned configuration. While black-box AI systems may generate accurate results, the training process is NP-complete -- you don't know ahead of time how much training it will take or whether it can actually learn. Also, these systems can be very good at memorizing training sets. Don't over-train your black box unless you want it to memorize the training set and completely screw up on the testing set.
In contrast to neural networks, fuzzy logic and genetic algorithms are gray-box systems. You kinda know how they work. Given the input, the system generates output, and you can see how it came up with the output decision. However, barring very simple fuzzy logic systems, you cannot really tell what the output will be until you run the input through the system. You can see how it made the decision, but not before running it.
Finally, there are white-box AI systems like Bayesian networks. You know the input, the output, and how it will make the decision. The only real problem here is configuring the system. Since you need to know the probabilities, you really only have two choices. You could compute the probabilities beforehand, but this requires you to have enough data to statistically compute the probabilities and be able to characterize the various statistical factors.
The other choice is to use a gray-box or black-box system to learn the probabilities, in which case the probabilities may not be provable or optimal.
Dusting Off
I recently had a need for "a solution", where "provable" and "deterministic" are not requirements. This is a perfect situation for using AI. I wrote my own AI library many years ago. Basically, I didn't like any of the existing systems (not flexible enough for my own needs), and it was easier to build my own than adapt around existing systems. However, it has been years since I used it, and I only vaguely remember the configuration options.
A couple of things really surprised me. My AI library was written in 1990 and last maintained in 1996. (The last bug fix was in 1994.) I didn't even know if it would compile with the latest GCC. My first surprise was that it compiled cleanly with "gcc -Wall". It even passed its benchmark and regression tests. As I gawked at the output, I thought, "This is great! I wish I remembered how it worked!" Then I looked at the source code... There are huge paragraphs that describe how every function works and how to use it. Completely documented. Even the variables have reasonable names: no "int i,j" or "float q[12]" or "double phi,theta". Instead, the variables have names like 'CutoffThreshold' and 'float *weights; /* network weight matrix */'. The comments even cite books and pages as references.
Way Back When...
I had a professor back in college who drilled "style" into all of us. He had three basic rules that, if broken, would result in a zero on your homework.
We obeyed because we wanted to pass the class. However, the lesson was never lost on me. I still "over-comment" my code. I looked up my notes and found a great quote from the professor (from notes I took in 1988): "Always comment your code because you never know when you will refer to something you wrote 20 years earlier." Wow -- he even nailed the duration.
After The Fact
Saturday, July 10. 2010
Over the last few months I have had friends and associates contact me about hacked web sites. In each case, someone (or something) planted hostile URLs on their web pages. These URLs would redirect visitors to porn sites or serve up viruses. Worse: these URLs would be embedded everywhere -- in HTML, in PHP, and in back-end databases.
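Before cleaning anything up, it helps to inventory which outside hosts the site's files actually reference. Here is a minimal Python sketch, assuming a local copy of the web root; the extension list and the allow-list are hypothetical, the URL pattern is deliberately rough, and database content has to be exported first (for example, with mysqldump) and scanned as a file.

```python
import os
import re
from urllib.parse import urlsplit

# Rough URL pattern: stops at whitespace, quotes, angle brackets, and ')'.
URL_RE = re.compile(rb"""https?://[^\s"'<>)]+""", re.IGNORECASE)
# Hypothetical choice of file types; .sql covers a database dump saved to disk.
SCAN_EXTENSIONS = {".html", ".htm", ".php", ".js", ".sql", ".txt"}


def external_hosts(data, allowed_hosts):
    """Return the set of URL hostnames in `data` (bytes) that are not allow-listed."""
    hosts = set()
    for raw in URL_RE.findall(data):
        try:
            host = urlsplit(raw.decode("latin-1")).hostname  # lowercased by urlsplit
        except ValueError:
            continue  # e.g. a malformed bracketed IPv6 host
        if host and host not in allowed_hosts:
            hosts.add(host)
    return hosts


def find_unexpected_urls(root, allowed_hosts):
    """Walk `root` and map each unexpected host to the files that mention it."""
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in SCAN_EXTENSIONS:
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                for host in external_hosts(fh.read(), allowed_hosts):
                    hits.setdefault(host, set()).add(path)
    return {host: sorted(paths) for host, paths in sorted(hits.items())}
```

Files are read as bytes and decoded as Latin-1 only for URL parsing, so binary junk or mixed encodings in compromised files won't stop the scan. Anything obfuscated (for example, URLs assembled in JavaScript or base64-encoded in PHP) will slip past a plain pattern match like this.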
The question they always ask me: What should I do? It is easy to tell people that they should have a disaster recovery plan in place. However, few people have one. Other pre-attack advice, like hardening servers, changing defaults, and installing filters, is great advice, but is usually ignored. In my experience, the sites that have taken simple steps and have plans in place are not the ones usually compromised. The common compromises are directed at non-technical users who installed default software and ignored even basic maintenance.
Post-Compromise
So let's say you have a default WordPress or Wiki or Blogger installation. It isn't a question of whether your site will be compromised or infected. The only question is when. And like most people, you haven't maintained your software (applying patches, upgrading as needed), don't have backups (your ISP does that, uh, right?), and haven't removed default files or hardened the system. What should you do after a compromise? There are plenty of good checklists out there. Some examples include:
While each of these sites gives good advice, there is no single consensus regarding appropriate steps. My own checklist is a little more detailed and extreme.
Neal's Post-Compromise Checklist
Nobody wants to have their site compromised. However, like auto accidents, bad things happen. If you were not paying attention (like texting while driving or not applying system patches), then bad things are more likely to happen to you. Here are the steps that I usually recommend to people with compromised web sites:
Having your site compromised isn't fun, but it isn't the end of the world either. Stay calm and address the problem. Treat it as you would any other learning experience.
Posted by Dr. Neal Krawetz in Network, Privacy, Programming, Security at 20:06
Great Firefox Plugins
Tuesday, June 15. 2010
Last week was entertaining. I had the opportunity to assist in an interesting project -- part development, part forensics, and part penetration testing. Fortunately for me, I had a couple of Firefox plugins that really made the work easier. All of these plugins can be found by using the Tools -> Add-Ons menu under the Firefox web browser, or by going to https://addons.mozilla.org/en-US/firefox/.
NoScript
The NoScript plugin is an absolute must-have. As far as I am concerned, it should be part of the default Firefox installation. This plugin stops all JavaScript, Flash, and other objects from automatically starting. You can also block access to some web servers, or, if you really like a site, add it to a white-list of permitted, trusted sites. If there happens to be something you want to run, you can permit it on a case-by-case basis. From a user's viewpoint, this is awesome. You don't have to worry about an unknown site sending malware to your browser. In my case, I didn't want to download videos, Java, and other stuff that would waste my CPU cycles and bandwidth.
Httpfox
When evaluating any kind of web-based service, either as a developer or as an auditor, you need to know what is being transmitted across the network. Usually I use Wireshark or Snort. The problem is, these only work well if you use HTTP and not HTTPS. With HTTPS, you cannot see the traffic inside the tunnel (without compromising the tunnel). Fortunately, I had Httpfox. This plugin is like having Wireshark in the browser! It shows you all data that the browser sends and receives -- the URLs, request and response headers, cookies, post data, and query parameters. This plugin is great for auditing, but does have a few minor limitations. Specifically, if any of the values are longer than the visible fields, you don't get scroll bars. You can work around this by copying values to the clipboard, but that isn't an ideal solution.
Firebug
While Httpfox shows the network traffic, Firebug shows the HTML content. And this isn't just the HTML that was sent to your browser... it is the HTML that is displayed. If the web page includes JavaScript or active CSS content that alters the web page, then Firebug will show you the rendered values. Besides viewing the page, you can also edit the currently-displayed web page.
If you are testing parameters, playing with web forms, or trying out different style sheet settings, then this is a must-have. Finally, you can click on the little arrow icon to enable an inspector. As you hover the mouse over various elements on the web page, Firebug displays the active HTML elements (both HTML code and style sheet values). As a web developer, you've probably had times where you wondered "Where do I define that border?" Well, the inspector quickly answers this.
Add N Edit Cookies
This plugin is an oldie but goodie. Httpfox shows you queries, but does not allow you to edit them. Firebug allows you to change the active HTML, so you can edit query parameters and URLs, but you cannot alter cookies. The "Add N Edit Cookies" plugin completes the set by allowing you to view and edit cookie values. (There are two versions of it: one for older browsers and the other for newer browsers.) There are a couple of other plugins for editing cookies. However, I like this one because it is simple to use.
All Together
With these four plugins, we were able to easily access our web services, debug the network traffic, view and test dynamic web content, and even validate cookie settings. With NoScript, we were able to restrict the content that the server sent to the browser and control exactly when different calls were made. In the old days, we would need to hack the SSL tunnel and use custom scripts to manage queries. Today, we can evaluate and modify the system in real time with just a few plugins.
Posted by Dr. Neal Krawetz in Forensics, Network, Programming, Security at 17:30
Use Your Words
Tuesday, May 11. 2010
No operating system is perfect. In my latest book, Ubuntu: Powerful Hacks and Customizations, I provide tips and tricks to enhance an already awesome operating system. (However, not all hacks, like this one, made it into the book.) Some of these modifications add functionality to make the system better, while others are workarounds for common problems.
Watch Your Language
One such common problem is the built-in spell checker. With Dapper Drake (6.06 LTS) and Hardy Heron (8.04 LTS), the spell checker uses the default system language. If you installed using Australian English, then it will use Australian English. However, this isn't the case with later Ubuntu releases. If you are in the United States and use gedit, then you've probably noticed that the spell checker uses British English instead of American English. (For example, it thinks "color" is spelled "colour".) While you can change this using the Set Language option under the gedit Tools menu, changes are not retained when you open a new document. This problem is actually much larger. Debian users have had this issue since at least 2004, but Ubuntu didn't incorporate this problem until 2008. Intrepid Ibex (8.10), Jaunty Jackalope (9.04), Karmic Koala (9.10), and the newest release -- Lucid Lynx (10.04) -- all have this spell checker problem.
He Who Spelt It...
The core problem actually isn't gedit. This simple notepad uses an open-source package called "enchant" for dictionary support. Enchant is a wrapper around a variety of back-end spell checker systems. The search list is found in /usr/share/enchant/enchant.ordering. The contents should look something like:
*:myspell,aspell,ispell
fi:voikko,ispell,myspell,aspell
This says that the Finnish language (fi) should use voikko as the primary spell checker, then ispell, myspell, and aspell. If no specific language definition line is available, then it will default to the "*" entry: myspell, then aspell, then ispell. And that's the problem. Myspell does not use the system-wide dictionary (see the man page section "Directories Important To Enchant"), so it ignores the default system language. In contrast, aspell does use the system-wide setting, and it is installed by default on Ubuntu.
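To see why the "*" line decides the outcome for en_US, here is a small Python model of the lookup. It assumes enchant tries the full tag (en_US), then the base language (en), then "*"; as I understand it, the real library also skips any backend that lacks a dictionary for the language, which this sketch does not model. The function names are hypothetical.

```python
def parse_ordering(text):
    """Parse enchant.ordering-style lines ("tag:backend,backend,...") into a dict."""
    table = {}
    for line in text.splitlines():
        tag, sep, backends = line.strip().partition(":")
        if not sep or not tag:
            continue  # skip blank or malformed lines
        table[tag.strip()] = [b.strip() for b in backends.split(",") if b.strip()]
    return table


def backends_for(table, lang):
    """Assumed lookup order: full tag (en_US), then base language (en), then '*'."""
    base = lang.split("_")[0]
    for key in (lang, base, "*"):
        if key in table:
            return table[key]
    return []
```

With the default file, en_US matches neither "en_US" nor "en", so it falls through to "*" and myspell is tried first; after moving aspell to the front of the "*" line, aspell is tried first instead.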
Changing the enchant default to use aspell before myspell fixes the problem:
*:aspell,myspell,ispell
Alternately, you can leave the default and specify an alternate order for English:
*:myspell,aspell,ispell
Word Up
Finally, make sure that the language dialect is enabled on the system. For American English (en_US), use:
sudo locale-gen en_US.UTF-8
You might need to log out and log in to refresh the language settings in your various running shells and applications. The locale-gen program can take a specific dialect (en_US or en_US.UTF-8), a base language (en, for generating all English dialects), or no options to use the default base language. The set of installed languages is found under /usr/share/locale/. This trick also works for other languages and dialects, including Canadian English (en_CA), Hong Kong English (en_HK), Korean (ko_KR.eucKR), Lithuanian (lt_LT.ISO-8859-13), and any other language on the system.
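The dialect, base-language, and encoding forms above all follow the usual language[_territory][.codeset][@modifier] naming pattern. This Python sketch only splits such names apart and filters a list by base language; it does not call locale-gen or read /usr/share/locale/, and the function names are hypothetical.

```python
import re

# Common locale naming pattern: language[_territory][.codeset][@modifier]
LOCALE_RE = re.compile(r"^([a-z]{2,3})(?:_([A-Z]{2}))?(?:\.([^@]+))?(?:@(.+))?$")


def split_locale(name):
    """Split a locale name such as "en_US.UTF-8" into its parts (missing parts are None)."""
    m = LOCALE_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized locale name: {name!r}")
    language, territory, codeset, modifier = m.groups()
    return {"language": language, "territory": territory,
            "codeset": codeset, "modifier": modifier}


def dialects_of(names, language):
    """Return the names whose base language matches, e.g. every 'en' dialect."""
    return sorted(n for n in names if split_locale(n)["language"] == language)
```

A bare "en" is only the language part, which is why it stands for every English dialect, while "en_US" adds the territory and "en_US.UTF-8" adds the character encoding as well.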
