If you’ve ever done a lot of work with Regular Expressions, you’re probably familiar with the concept of catastrophic backtracking, which means that the engine is forced to calculate permutations of exponential proportions. For instance, click here run this example and see how long it takes (should be about 5-10 seconds to time out):

However, a small change results in near instantaneous response. Why?  

To quote Andreas Haufler from Dzone, in his article How to Kill Java with a Regular Expression, from where this very concise example came:

 

 What at first glance looks like an infinite loop turns out to be catastrophic backtracking.

What this basically means is that the matcher detects that no “A” was found at the end of the input. Now the outer quantifier goes one step back, the inner one forward, and again, no result.

Therefore, the matcher goes back step by step and tries all combinations to find a match. It will eventually return (without a match), but the complexity (and therefore the runtime) of this is exponential (adding one character to the input doubles the runtime).

So just how might we protect ourselves from such a devastating effect, should we encounter a scenario that could cause the Java process to run amuck for days, or even years?

You might be able to simply take care to avoid this situation in your patterns, if you have that luxury, but if you are hosting an application that accepts regular expressions as input – for example an online java visual regex tester, then you’ll need to protect from this or be vulnerable to denial of service attacks.

Read the full article and solution at:

http://ocpsoft.org/regex/how-to-interrupt-a-long-running-infinite-java-regular-expression/