A black-box approach against obfuscated regular expressions using Pin

After reading the “Regular Expressions Obfuscation Under the Microscope” blogpost by Axel “0vercl0k” Souchet less than two weeks ago, I decided to try the approach detailed by Jonathan Salwan in his blogpost “A binary analysis, count me if you can“.

[NOTE: You probably want to read the two blogposts mentioned above before continuing reading.]

In the latter post, Jonathan explains how to find the right password in a crackme program with a black-box approach, by instrumenting the program with Intel’s Pin, a Dynamic Binary Instrumentation (DBI) framework. The basic idea of the approach is that (depending on how the target program was coded), you can analyze the execution flow of the program by counting the number of instructions executed when processing different input strings, and based on that instruction count, determine whether the program is accepting or not the input characters. Quoting Jonathan, this approach works for programs that rely “on a simple string-compare function that returns as soon as the function sees a difference” between your input string and the correct string; that is, it works for programs in which “as long as the comparison is correct, the number of instruction executed increases“.

Part 1 – Simple, non-obfuscated regular expression

Let’s take a look at the first C program that accepts the regular expression “Hi-[0-9]{4}”, taken from the “Regular Expressions Obfuscation Under the Microscope” post:

Continue reading