How To Win At Java Code Audit

Reviewing Java source code can pose a challenge for a security auditor, as methods used to exploit programs in C or C++, namely memory corruption bugs, are mitigated by Java itself, which hides the details of memory management from the programmer.  This same tendency to hide implementation details with a layer of abstraction leads to an entire class of common Java programming errors which can have a critical impact on the security of the application.

Java vulnerabilities are most commonly found in places where unsanitized user input is passed, directly or indirectly, on to an underlying library or service.  To put it another way, vulnerabilities aren’t found in the Java code itself; they are found by following user input through the Java source and out the other side.

The tendency of Java to hide implementation details from the developer actually creates these vulnerabilities in places where they might not otherwise exist.  Java developers use wrapper libraries for backend services, such as SQL or LDAP, and assume that these libraries automatically sanitize their inputs, when usually they do not.  In most cases, Java wrapper libraries are simply classes that store and manipulate strings, which are then passed directly on to the wrapped service.  In many of these implementations, such as the ORM library Hibernate, there are architectural reasons why this behavior cannot be changed.
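The pass-through pattern described above is independent of language, so here is a minimal sketch of it in JavaScript, the scripting language used further down this page.  QueryWrapper and its methods are invented for illustration and do not correspond to any real library; the point is only that a class which stores and joins strings hands user input to the backend untouched.

```javascript
// Hypothetical wrapper class: the name and methods are invented for illustration.
class QueryWrapper {
  constructor(table) {
    this.table = table;
    this.conditions = [];
  }

  // Stores the condition as a plain string; nothing is escaped or validated.
  where(column, value) {
    this.conditions.push(column + " = '" + value + "'");
    return this;
  }

  // The stored strings are joined and would be handed to the backend unchanged.
  toQueryString() {
    return "SELECT * FROM " + this.table + " WHERE " + this.conditions.join(" AND ");
  }
}

// User input containing a quote passes straight through the wrapper.
const userInput = "x' OR '1'='1";
const query = new QueryWrapper("users").where("name", userInput).toQueryString();
console.log(query);
```

The wrapper looks like a tidy API from the caller's side, but the string it produces has had its WHERE clause rewritten by the input.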

In this post, I will describe a class of extremely common Java vulnerabilities, specifically these “pass-through” bugs, characterized by user input passing directly through Java unexamined.


Dehydra-GCC: Static Analysis for Poor People

Over the past few months, I’ve been playing with a new static analysis tool from Mozilla called Dehydra.

Dehydra is a GCC plugin that allows you to write Javascript that can perform queries on the Abstract Syntax Tree (AST) that GCC generates from source files.  This lets you write a script that can notify you when it sees any type of code construct that you can describe in script.

There are a number of code constructs that might be interesting to a code auditor, for example:

  • Calls to asnprintf, malloc, or calloc with unchecked return values.
  • Assignment operations where the datatype of the Left Hand Side is signed and the Right Hand Side is unsigned, or vice versa.
  • Assignment operations where the datatypes of both sides have different bit-lengths.

The possibilities are much greater than my short list of examples!
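As a rough illustration of the last two checks in the list, here is a small sketch that compares two type descriptors for signedness and bit-length mismatches.  The descriptors are plain mock objects: the unsigned flag mirrors the one used in the sample script later in this post, while the bits field is an assumption made for this example and is not taken from Dehydra's documentation.

```javascript
// Compare the type on the left-hand side of an assignment with the type on the right.
// `bits` is a hypothetical field invented for this sketch.
function describeMismatch(lhsType, rhsType) {
  const findings = [];
  if (lhsType.unsigned !== rhsType.unsigned) {
    findings.push(rhsType.unsigned ? "unsigned to signed" : "signed to unsigned");
  }
  if (lhsType.bits !== rhsType.bits) {
    findings.push(rhsType.bits > lhsType.bits ? "narrowing" : "widening");
  }
  return findings;
}

// Mock type descriptors standing in for what a syntax-aware tool could report.
const int32 = { name: "int", unsigned: false, bits: 32 };
const uint32 = { name: "unsigned int", unsigned: true, bits: 32 };
const int16 = { name: "short", unsigned: false, bits: 16 };

console.log(describeMismatch(uint32, int32)); // lhs unsigned, rhs signed
console.log(describeMismatch(int16, uint32)); // lhs is a signed short, rhs is a wider unsigned int
console.log(describeMismatch(int32, int32));  // no mismatch
```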

I will be the first to admit that static analysis has its faults.  For one thing, it is provably impossible for static analysis to find every bug in an arbitrary program.  Commercial static analysis tools, such as Coverity, are expensive and have not proven to be a particularly effective method of finding bugs by themselves.  I have heard many accounts of nasty bugs discovered by code auditors when looking through source code routinely scanned by Coverity.

That said, on Day One of a code audit, 4 out of 5 code auditors find themselves reaching for Grep.

Grep is great.  It lets you search for regular expressions across many files very quickly, but it has no awareness of the syntax of the C++ programming language.  I’m really more interested in searching for specific code constructs and less interested in searching for substrings, which is Grep’s purpose.

When looking for vulnerabilities, I’m not interested in searching for the string “malloc”.  What I really want to know is more along the lines of “Where are all the calls to malloc where the return value is not checked?”  I don’t want to know all the locations of the string “int” as much as I want to know every location where a variable of type int is implicitly converted to an unsigned int when passed in as a function argument.
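To make the difference concrete, here is a toy comparison.  The source lines and the call records are both written by hand for this example; the records stand in for what a syntax-aware tool could expose, and the resultChecked field is hypothetical.

```javascript
// A few lines of C, held as strings for the substring search.
const source = [
  "/* malloc wrapper notes */",
  "char *a = malloc(n);",
  "if (a == NULL) return -1;",
  "char *b = malloc(m);",
  "size_t malloc_count = 2;",
];

// Grep-style search: every line that contains the substring, relevant or not.
const grepHits = source.filter(function (line) {
  return /malloc/.test(line);
});

// Hand-written, hypothetical call records for the same code.
const calls = [
  { callee: "malloc", line: 2, resultChecked: true },
  { callee: "malloc", line: 4, resultChecked: false },
];

// The question an auditor actually wants answered.
const unchecked = calls.filter(function (c) {
  return c.callee === "malloc" && !c.resultChecked;
});

console.log("substring hits: " + grepHits.length);
console.log("unchecked malloc calls: " + unchecked.length);
```

The substring search also reports the comment and the malloc_count variable, while the structured query returns only the one call that matters.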

This is the great thing about Dehydra.  It lets you query the parsed syntax tree of C++ source code and ask the kinds of questions that can’t be easily answered by Grep.

Scripts for Dehydra are written in Javascript by way of the SpiderMonkey engine.  Javascript is a nice, small language that is good for operations on tree-like data structures.  In a browser, this would mean the DOM, but in GCC this means the AST!

Dehydra is still in development, but the developers have been extremely responsive to feature requests from security auditors ( well, mine anyway… *grin* ).

It would be great to see a bunch of people contribute scripts and build a big set of security scanning scripts to replace the venerable regular-expression-based FlawFinder as the king of no-budget security-oriented static analysis.

Try it out and get back to me.

Setup and Installation Instructions for Dehydra on Linux or OSX

I’ve included a sample Dehydra script below that logs a message anytime it sees certain assignment operations.

The full sample script, along with a test file, is available here.

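function assignVisitor(node) {
   // Location reported for every finding in this node
   var loc = node.loc;

   for (var i in node.statements) {
      var lhs = node.statements[i].type;    // type of the variable being assigned to
      var rhs = node.statements[i].assign;  // assigned values, if this is an assignment

      if (rhs && lhs) {
         if (lhs.unsigned) {
            // A negative constant assigned to an unsigned variable
            if (parseInt(rhs[0].value) < 0) {
               print("ASSIGN: negative to unsigned at:" + loc + "\n");
            }
            // A signed value assigned to an unsigned variable
            else if (rhs[0].type && !rhs[0].type.unsigned) {
               print("ASSIGN: signed to unsigned at:" + loc + "\n");
            }
         }
         else if (rhs[0].type) { // lhs is signed
            // An unsigned value assigned to a signed variable
            if (rhs[0].type.unsigned) {
               print("ASSIGN: unsigned to signed at:" + loc + "\n");
            }
         }
      }
   }
}

// Dehydra calls process_function for each function body that GCC compiles.
// iter is not defined in this excerpt; it is assumed to come from the full
// sample script and to apply the visitor to each item in body.
function process_function(decl, body) {
   iter(assignVisitor, body);
}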
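The excerpt calls iter without defining it, so here is one guess at what such a helper might look like, together with mock data for trying a visitor outside of GCC.  Both the helper and the mock body are assumptions: the mock objects only copy the fields the sample script reads (loc, statements, type, assign) and are not real Dehydra output.

```javascript
// Assumed helper: apply a visitor function to each item of a function body.
function iter(visitor, body) {
  for (let i = 0; i < body.length; i++) {
    visitor(body[i]);
  }
}

// Mock body shaped like the nodes the sample script inspects.
const mockBody = [
  { loc: "test.cc:3",
    statements: [{ type: { unsigned: true }, assign: [{ value: "-1" }] }] },
  { loc: "test.cc:4",
    statements: [{ type: { unsigned: false }, assign: [{ type: { unsigned: true } }] }] },
];

// A trivial visitor that records the location of every assignment it sees.
const seen = [];
iter(function (node) {
  for (const stmt of node.statements) {
    if (stmt.type && stmt.assign) {
      seen.push(node.loc);
    }
  }
}, mockBody);

console.log(seen);
```

Swapping the trivial visitor for a real one, plus a print function that forwards to console.log, is enough to experiment with the checks before running them under the plugin.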