How To Win At Java Code Audit

Reviewing Java source code can pose a challenge for a security auditor, as methods used to exploit programs in C or C++, namely memory corruption bugs, are mitigated by Java itself, which hides the details of memory management from the programmer.  This same tendency to hide implementation details with a layer of abstraction leads to an entire class of common Java programming errors which can have a critical impact on the security of the application.

Java vulnerabilities are most commonly found in places where unsanitized user input is passed, directly or indirectly, on to an underlying library or service.  To put it another way, vulnerabilities aren’t found in the Java code itself, they are found by following user input through the Java source and out the other side.

The tendency of Java to hide implementation details from the developer actually creates these vulnerabilities in places where it might not otherwise exist.  Java developers use wrapper libraries for backend services, such as SQL or LDAP, and assume that they automatically sanitize their inputs, when usually they do not.  In most cases, Java wrapper libraries themselves are simply classes that store and manipulate strings which are just passed directly on to the wrapped service.  In many of these implementations, such as the ORM library Hibernate, there are architectural reasons why this behavior can not be changed.

In this post, I will describe a class of extremely common Java vulnerabilities, specifically these “pass-through” bugs, characterized by user input passing directly through Java unexamined.

Continue reading “How To Win At Java Code Audit”

Dehydra-GCC: Static Analysis for Poor People

Over the past few months, I’ve been playing with a new static analysis tool from Mozilla called Dehydra.

Dehydra is a GCC plugin that allows you to write Javascript that can perform queries on the Abstract Syntax Tree (AST) that GCC generates from source files.  This lets you write a script that can notify you when it sees any type of code construct that you can describe in script.

There are a number of code constructs that might be interesting to a code auditor, for example:

  • Calls to asnprintf, malloc, or calloc with unchecked return values.
  • Assignment operations where the datatype of the Left Hand Side is signed and the Right Hand Side is unsigned, or vice versa.
  • Assignment operations where the datatypes of both sides have different bit-lengths.

The possibilities are much greater than my short list of examples!

I will be the first to admit that static analysis has its faults.  For one thing, it has been proven that static analysis cannot discover all possible bugs in any given program.  Commercial static analysis tools, such as Coverity, are expensive and have not proven to be a particularly effective method of finding bugs by themselves.  I have heard many accounts of nasty bugs discovered by code auditors when looking through source code routinely scanned by Coverity.

That said, on Day One of a code audit, 4 out of 5 code auditors find themselves reaching for Grep.

Grep is great, it lets you search for regular expressions across many files very quickly, but Grep has no awareness of the syntax of the C++ programming language.  I’m really more interested in searching for specific code constructs and less interested in searching for substrings, which is Grep’s purpose.

When looking for vulnerabilities, I’m not interested in searching for the string “malloc”.  What I really want to know is more along the lines of “Where are all the calls to malloc where the return value is not checked”.  I don’t want to know all the locations of the string “int” as much as I want to know every location that a variable of type int is implicitly cast to an unsigned int when passed in as a function argument.

This is the great thing about Dehydra.  It lets you query the parsed syntax tree of C++ source code and ask the kinds of questions that can’t be easily answered by Grep.

Scripts for Dehydra are written in Javascript by way of the SpiderMonkey engine.  Javascript is a nice, small language that is good for operations on tree-like data structures.  In a browser, this would mean the DOM, but in GCC this means the AST!

Dehydra is still in development, but the developers have been extremely responsive to feature requests from security auditors ( well, mine anyway… *grin* ).

It would be great to see a bunch of people contribute scripts and build a big set of security scanning scripts to replace the venerable regular-expression-based FlawFinder as the king of no-budget security-oriented static analysis.

Try it out and get back to me.

Setup and Installation Instructions for Dehydra on Linux or OSX

I’ve included a sample Dehydra script below that logs a message anytime it sees certain assignment operations.

The full sample script, along with a test file, is available here.

function assignVisitor(node) {
   for(var i in node.statements) {
      var loc = node.loc
      var lhs = node.statements[i].type
      var rhs = node.statements[i].assign

      if( rhs && lhs ) {
         if( lhs.unsigned ) {
            if(parseInt(rhs[0].value) > 0) {
               print( "ASSIGN: negative to unsigned at:"+loc+"\n" )
            else if(rhs[0].type && !rhs[0].type.unsigned) {
               print( "ASSIGN: signed to unsigned at:"+loc+"\n" )
         else if(rhs[0].type) {// lhs is signed
            if( rhs[0].type.unsigned ) {
               print( "ASSIGN: unsigned to signed at:"+loc+"\n" );

function process_function(decl,body) {
   iter(assignVisitor, body)

Groo: Fully Automated WEP Cracking

Updates Below!

I don’t know about the rest of you, but I have an entire room of my house which is simply a huge pile of electronics scrap.  A hacked Tivo, some chipped XBoxes, an old VCR, a pile of PCI video cards, a full shoebox of 64MB Compact Flash cards…  You get the idea.

One day, I decided to put some of this junk to good use and I wandered into the scrap heap looking for inspiration.

Inspiration came in the form of an Atheros wireless card and an old ITX barebones system that had been picked up from a junk table at DefCon for $60 the year before.  The ITX box has a single PCI slot, perfect for a decent Atheros wireless card with an external SMC antenna connector.  It also runs on 12V DC power, so I can run it off the car battery.

Over the next few weeks, I built a small embedded Linux system for the sole purpose of cracking WEP keys.

First, I added a USB wireless network card to use as a control interface that I could access from my iPhone.

I also built a small web service that completely automates the process using the Python web framework TurboGears, aircrack-ng, and screen.

The web interface is incredibly simple – it uses only a single combo box.  This makes it ideal for using from the iPhone.

Now, instead of being the sketchy guy sitting in my car with a laptop, I’m just another Seattle-ite staring into my iPhone while the computer doing the WEP cracking is running off my car battery halfway down the block.

Everything in the web interface is fire-and-forget.  You can view a list of available networks, select one for cracking, and it will automatically:

  1. Reconfigure the wireless interface to the correct channel
  2. Begin dumping packets with airodump-ng in a screen session
  3. Begin an ARP replay attack with aireplay-ng in a screen session
  4. Automatically kick off the actual WEP cracking by starting aircrack-ng in a screen session
  5. Once the crack has succeeded, save the ESSID, BSSID, and cracked WEP key in a SQLite database

Since each of the aircrack-ng tools are running in a separate screen session, you can disconnect from the control interface as soon as the crack starts.  You can also reconnect at any time during the crack and view each screen session separately.

When close enough to a target for the ARP replay attack to work, this script averages only 3 minutes to crack a WEP key.  This is on an ITX box with a wimpy Cyrix C3 processor with only 256MB of RAM!

My scripts and installation instructions available here.


I have ported these scripts to the EEE pc (I use Ubuntu Netbook Remix on a 900A), available here.

However, I can’t get airodump-ng to actually capture any packets!  I believe this is a problem with my madwifi driver, but I haven’t sorted it out yet.  Hopefully, if I post the scripts one of you can help me out 🙂

Another Update (October 2010):

Hello Hackaday!  Since writing this initial version, I’ve since learned a lot about Python job control.  Check out the Jabbercracky project, also on this site.  I’m planning on a much-improved version of Groo, using what I’ve learned from Jabbercracky, which will also add some new tricks, including some available WPA cracks.  I’d also like to improve the installer, and to also provide builds for Ubiquiti networks hardware.  If anyone is interested in helping out, please email me at awgh at awgh dot org.

Stay tuned…

EFI and Evil

There is a legend you may have heard of a lowly system administrator who notices a bunch of extra network traffic coming from one of his workstations.  It appears that every packet sent from the workstation is copied and forwarded to an IP address in a country with no extradition treaty.  The admin figures that some kind of rootkit is installed, so he completely reformats the hard drive and re-installs everything.  Fires the thing up anew, and sure enough all packets are still being forwarded overseas!  He can’t wrap his brain around that, so he tries completely removing the hard drive and starting with a new one.  When he gets through the second re-install, all sent packets are STILL being sent overseas.

The admin in this story made the same mistake that many people do when thinking about their computer – he assumed that all user-modifiable executable code is only stored on the hard drive.

This story is at least 10 years old, and I’ve heard it in various different forms, but the technological culprit was certainly a BIOS-based rootkit.  A BIOS-based rootkit can hide in one of two places: either embedded in the BIOS itself or in a PCI Option ROM.  As part of BIOS start-up, it will load executable code from a special ROM chip on each PCI device and execute this code with Ring 0 privileges, meaning the code can control absolutely anything the computer can do.  PCI Option ROMs are intended to let peripheral developers include some driver code that it wants to be run at BIOS execution time.  This code needs Ring 0 access, because it might need to interact directly with the ICs on the PCI device itself.  This is a feature, not a bug.

Delivering a BIOS-based rootkit is a bit more complicated than a traditional rootkit.  There are two methods for this.  The first involves gaining root privileges on the target machine by some means and flashing a new BIOS using SPI or tools provided by the motherboard manufacturer.  If your rootkit goes on a PCI Option ROM, you can also flash those with root access.  Many devices can also be reflashed over the network via PXE.  Most PCI cards have read-only Option ROMs, so this will only work for high-end network cards, video cards, or any other kind of PCI device that touts upgradeable firmware.

The second method is more interesting, and involves physically accessing the hardware, installing the rootkit, and then selling or giving the modified hardware to the intended victim.  A variation on this would be perhaps simply leaving a stack of shrink-wrapped network cards in the hallway outside a system administrators office with a Post-It saying “For IT”.  An advantage of this method is that you can flash the Option ROMs using a proper EEPROM programmer, so you can alter PCI cards that could not be altered using a software tool.  There can be several different Option ROM data segments on one physical chip, so you don’t necessarily have to stomp the existing driver code – you can just add another segment.

In the old BIOS standard, now referred to as Legacy BIOS, doing anything useful in a BIOS-based rootkit took an insane amount of time and skill.  Since BIOS runs before the operating system comes up, a rootkit developer would have no access to system libraries, filesystem drivers, or a network stack.  If the rootkit needed any of these features, the developer would have to write everything from scratch and interact with all of the involved chips at the lowest level (try to do TCP with only peeks and pokes using numbers gleaned from blueprints of chips, in short: HARD-FREAKING-CORE).

That was the past.

In 2009, AMD will start shipping EFI-compatable chipsets by default.  Intel, the main proponent of EFI, has been using it in the Itanium series, but will be using it in just about everything by the end of 2009.  Apple has already been using EFI in all of their Intel Macs since 2006.

This will spell the end of Legacy BIOS.  Most will say good riddance.

EFI has a couple of shiny new advantages.  The main reason that EFI was developed in the first place was to overcome several limitations of Legacy BIOS, including the use of 16-bit processor mode and having only one megabyte of addressable space.

As EFI has been developed over the years, it’s turned into something slightly different.  Intel has open-sourced almost all of a firmware development kit called the EFI Developer Kit or EDK, available here.

The EDK includes a large amount of sample code as well as a large set of utility libraries, all written in straight C.  The utility libraries include a variety of networking functions, including an optional full TCP/IP stack, as well as filesystem access libraries for FAT and NTFS.

Although the EDK doesn’t come with everything required to build a complete firmware, it does come with enough to build the two items of interest to an attacker: PCI Option ROMs and EFI Modules.  EFI Modules are distinct modules of firmware code that can be easily combined with each other to produce a firmware.  If you’ve ever installed the rEFIt boot menu for Intel Macs, you’ve flashed an EFI module into your BIOS.

All of these new features are designed to make life easier for firmware developers, and they do.  They also make life easier for attackers who wish to use EFI for evil.  Now, you can write a rootkit in C instead of assembler, and you can make use of pre-made network and file system libraries.

The only trick remaining is how to keep your rootkit running once BIOS execution has ended and the operating system is running.  There are a few methods available.  The easiest, and lamest, method is to simply write rootkit code out to the hard drive at every boot.  Cooler than that is using Intel’s System Management Mode (SMM) or ACPI event handlers to stash code to log keystrokes and then send it out on the network.  I have yet to see a working demo of this, and I’m told that a successful implementation requires specific knowledge of the Southbridge chip on a specifically targeted motherboard.

Update:  From talking to people at 25c3, I’ve learned that setting up a keystroke logger for just PS2 keyboards is pretty easy.  According to Peter Stuge of coreboot, to capture PS2 keystrokes you can use SMM to trap reads from io port 60, which is guaranteed to be the same on all platforms.  When the SMM trap triggers, it can run code you set up from, say, an Option ROM earlier.  There are examples of the use of SMM in the coreboot source!

There is also a common misconception that TPMs somehow prevent malicious PCI Option ROMs, but this isn’t so.

Apparently capturing USB keyboard keystrokes would be quite a bit more difficult.  Once I get PS2 working maybe… 😉