How Regular Expressions in My IDE Saved Me Time

Recently, I had a use case in one of my projects that looked quite painful but turned out to be really easy thanks to modern Integrated Development Environments (IDEs) and regular expressions. The case was quite specific to my needs, and the solution was to use something I already knew well. You might think the solution is obvious after reading this article, yet I think it is worth detailing.

Before sharing this trick with you, I will explain the basics of regular expressions for those who are not familiar with them.

Definition

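According to Wikipedia, a regular expression (often called regexp) is a sequence of characters that defines a search pattern. Usually this pattern is used by string searching algorithms for “find” or “find and replace” operations on strings, or for input validation. You have probably already used something close to it without knowing it: the wildcards you type when searching for a file.

As an example, in this Unix command:

find . -type f -name 'foo*'

foo* is a shell wildcard (glob) pattern, a simpler cousin of regexps in which * means “any sequence of characters”. GNU and BSD find also accept a real regexp through the -regex option, which is matched against the whole path:

find . -type f -regex '.*/foo[^/]*'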

There are several regexp implementations, but they share most of their syntax, with minor variations.

A sub-pattern that matches one character out of a set, such as [a-z], is called a character class.

Special characters

In regular expressions, some characters are interpreted by the regexp engine and can therefore be considered “keywords”.

Special characters followed by ° are recognized by most implementations but are treated as extended syntax by some basic Unix commands (like grep). Usually you must pass the -E option to enable them.

| Character | Description | Example |
| --- | --- | --- |
| `*` | Matches the preceding expression zero or more times. | `a*` matches “a” and “aaa”, but also “”. |
| `^` | Anchors the match at the start of the sequence (“starts with”). | `^foo` matches “foobar” but not “barfoo” or “whatever foobar”. |
| `$` | Anchors the match at the end of the sequence (“ends with”). | `foo$` matches “barfoo” or “whatever barfoo” but not “foobar”. |
| `.` | Matches any single character. | `.*` matches “foo”, “bar”, “foo bar” and so on. |
| `[]` | Defines a bracket expression, which matches any single character listed between the brackets. A range is written with `-`: for example, `[a-z]` is any lowercase Latin letter. A `-` placed first or last is taken literally. | `[aft]` finds a match in “abc”, “feg” or “rst” but not in “ghi”. `[-aft]` also accepts “-”, so it finds a match in “-ghi” as well. |
| `[^]` | Negated bracket expression: matches any single character that is not listed between the brackets. | `[^aft]` matches “g”, “h” or “i” but not “a”, “f” or “t”. |
| `()` | Defines a group. It is also called a capturing group because its content can be reused later; the syntax depends on the implementation (for example, in Perl, `$n` gives the content of the nth group). | `(ab)*` matches “ab” or “abab”. |
| `+` ° | Matches the preceding expression one or more times. Old Unix commands do not always recognize it, but it is listed here because it is common across implementations. | `a+` matches “a” or “aaa” but not “”. |
| `{n}` | Matches the preceding expression exactly n times; n must be a non-negative integer. | `(a){1}` matches “a” but not “” or “aa”. |
| `{n,k}` | Matches the preceding expression at least n times and at most k times; n and k must be non-negative integers. If k is omitted (`{n,}`), there is no upper bound. | `(a){1,3}` matches “a”, “aa” and “aaa” but not “” or “aaaa”. |
| `\` | The escape character. In front of a special character, it means the literal character. In front of an ordinary character, it can turn it into a special one (see the advanced classes below). To match a literal backslash, precede it with itself or, in POSIX tools such as grep, put it into brackets. | `\(a\)` matches “(a)”. `\\a` matches “\a”. |
| `\|` ° | The “or” operator (alternation). | `(ab\|fg)+` finds a match in “abcd” and “fgcd”. |
| `?` ° | Matches the preceding expression zero or one time. | `(ab)?` matches “ab” and “” but not “abab”. |
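
To make these rules concrete, here is a minimal Java sketch that checks a few of the examples above. The class and method names are made up for illustration, and Java is only one regexp implementation among others. Note that the examples mix two meanings of “match”: finding the pattern somewhere in the input, and matching the whole input.

```java
import java.util.regex.Pattern;

class SpecialCharactersDemo {
    // True when the pattern is found somewhere in the input (search semantics).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // True when the pattern matches the input from its first to its last character.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(found("^foo", "foobar"));          // true
        System.out.println(found("^foo", "barfoo"));          // false
        System.out.println(found("foo$", "barfoo"));          // true
        System.out.println(found("[aft]", "ghi"));            // false
        System.out.println(matchesWhole("a*", ""));           // true
        System.out.println(matchesWhole("a+", ""));           // false
        System.out.println(matchesWhole("(a){1,3}", "aaaa")); // false
        System.out.println(matchesWhole("(ab)?", "abab"));    // false
    }
}
```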

The usual escape sequences such as \t (tabulation), \n (end of line) and \r (carriage return) are also special characters in regexp patterns, with the same meaning.

For example, a\nb will match:
“a
b”

Advanced class

Most modern regexp engines follow the regexp syntax of the Perl programming language, which adds some advanced classes as shorthand.

| Advanced class | Definition | Basic class equivalent |
| --- | --- | --- |
| `\s` | Any whitespace character | `[ \t\r\n\v\f]` |
| `\S` | Negation of the above | `[^ \t\r\n\v\f]` |
| `\w` | Alphanumeric character or “_” | `[a-zA-Z0-9_]` |
| `\W` | Negation of the above | `[^a-zA-Z0-9_]` |
| `\d` | Digit | `[0-9]` |
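
As a quick check of these equivalences, the following Java sketch compares a few advanced classes with their bracket counterparts. The class name and sample strings are invented for the example; keep in mind that a Java string literal needs each regexp backslash doubled.

```java
import java.util.regex.Pattern;

class ShorthandClassesDemo {
    // True when the pattern matches the whole input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String identifier = "foo_Bar42";
        System.out.println(matchesWhole("\\w+", identifier));          // true
        System.out.println(matchesWhole("[a-zA-Z0-9_]+", identifier)); // true
        System.out.println(matchesWhole("\\d+", "2019"));              // true
        System.out.println(matchesWhole("[0-9]+", "2019"));            // true
        // space, tab and newline between "a" and "b" are all whitespace
        System.out.println(matchesWhole("a\\s+b", "a \t\nb"));         // true
        System.out.println(matchesWhole("\\W", "_"));                  // false
    }
}
```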

How Regular expressions can save your life (or at least your time)

Now that you are aware of the basics, you will see how simple it was to solve the problem at the origin of this article.

Use case

Our project is a Spring Cloud microservices project using Spring Cloud OpenFeign.

Each Feign API was declared like this:

@FeignClient(name = "name", path = "/api/foofeign")
public interface FooFeignApi {

The new Spring Cloud “Greenwich” release introduced a “contextId” attribute on the @FeignClient annotation, which makes it possible to declare multiple Feign clients sharing the same name.

The recommendation is to use the bean name as the value of this new attribute. So we have to edit our signature:

@FeignClient(contextId="fooFeignApi", name = "name", path = "/api/foofeign")
public interface FooFeignApi {

Resolution

As we have a lot of microservices, each including many Feign APIs, the brainless way, which consists of:

  1. opening each API file in the IDE
  2. updating the file

would probably take ages. So we need automation.

As developers, the obvious way is to write a script (say in Perl) to do the job. What we want is to:

  1. find each occurrence of the @FeignClient annotation
  2. get the associated class name
  3. lowercase the first letter of the class name
  4. add 'contextId="<updated class name>",' at the beginning of the parenthesis content

Nothing that Perl can’t do. But wait, basically we want to do a simple “search and replace” with a regexp. Don’t all the IDEs in the world offer this kind of functionality?

The only doubt is the lowercasing of the first letter, but a quick look at the documentation of our IDE’s regexp engine tells us that we can use \l in the replace input to lowercase the next character.

So we no longer search for @FeignClient alone but for the pattern:

@FeignClient(<whatever content>)
public interface WhateverFeignApi {

to be replaced by:

@FeignClient(contextId="whateverFeignApi", <whatever content>)
public interface WhateverFeignApi {

A search pattern means a regexp. The first step is to write down the general shape, leaving the content of the parentheses aside for now:

\@FeignClient\(\)\s*\npublic\s*interface\s*\w+

Notes:

  1. we have to escape @ because the IDE regexp engine requires it.
  2. we use \s* because we’re not sure every API is well formatted

Now we want to reuse the name of the interface, so we have to capture it:

\@FeignClient\(\)\s*\npublic\s*interface\s*(\w+)

So now, in the replace output, we have access to the following variable:

| Variable | Content |
| --- | --- |
| `$1` | WhateverFeignApi |
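
Outside the IDE, the same capture can be tried with a few lines of Java. This is only a sketch: the class and method names are hypothetical, and the sample source keeps the parentheses empty because this intermediate regexp does not handle their content yet.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureNameDemo {
    // Same regexp as above; every backslash is doubled inside a Java string literal.
    static final Pattern FEIGN_CLIENT = Pattern.compile(
            "\\@FeignClient\\(\\)\\s*\\npublic\\s*interface\\s*(\\w+)");

    // Returns the content of the first capturing group, or null when nothing matches.
    static String interfaceName(String source) {
        Matcher matcher = FEIGN_CLIENT.matcher(source);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String source = "@FeignClient()\npublic interface WhateverFeignApi {";
        System.out.println(interfaceName(source)); // WhateverFeignApi
    }
}
```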

Now we have to capture the actual content of the parentheses. Because we don’t know exactly what each API declares there, we rely only on what we do know: the content is whatever follows the opening parenthesis up to the closing one, that is, any run of characters other than ), written [^)]*.

That leads to the following regexp:

\@FeignClient\(([^)]*)\)\s*\npublic\s*interface\s*(\w+)

So now, in the replace output, we have access to the following variables:

| Variable | Content |
| --- | --- |
| `$1` | The content of the parentheses (might be empty because of `*`) |
| `$2` | WhateverFeignApi |
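
Here is a small Java sketch of those two groups at work, using this regexp with an optional \s* before the line break so that trailing spaces are tolerated. The class and method names are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureContentDemo {
    // Group 1: content of the parentheses, group 2: interface name.
    static final Pattern FEIGN_CLIENT = Pattern.compile(
            "\\@FeignClient\\(([^)]*)\\)\\s*\\npublic\\s*interface\\s*(\\w+)");

    // Returns {group 1, group 2}, or null when the source does not match.
    static String[] capture(String source) {
        Matcher matcher = FEIGN_CLIENT.matcher(source);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String source = "@FeignClient(name = \"name\", path = \"/api/foofeign\")\n"
                + "public interface FooFeignApi {";
        String[] groups = capture(source);
        System.out.println(groups[0]); // name = "name", path = "/api/foofeign"
        System.out.println(groups[1]); // FooFeignApi
    }
}
```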

The variable number follows the order in which the capture groups open. For example, if we don’t want to retype “public interface” with the correct spacing in the replace input, we can capture it too:

\@FeignClient\(([^)]*)\)\s*\n(public\s*interface\s*(\w+))

or

\@FeignClient\(([^)]*)\)\s*\n(public\s*interface\s*)(\w+)

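So now, in the replace output, we have access to the following variables:

| Variable | Content |
| --- | --- |
| `$1` | The content of the parentheses (may be empty because of `*`) |
| `$2` | The text matched by the second group: “public interface WhateverFeignApi” with the first regexp, or “public interface ” with the second |
| `$3` | WhateverFeignApi |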

Finally, we can set the replace input to what we want. With the first regexp, where $2 already contains the interface name:

@FeignClient(contextId = "\l$3", $1)\n$2

or, with the second regexp, where the interface name is only in $3:

@FeignClient(contextId = "\l$3", $1)\n$2$3
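
For readers who would rather script it, here is a rough Java equivalent of this search and replace, based on the second regexp. It is a sketch under a few assumptions: Java 9 or later (for Matcher.replaceAll with a function), hypothetical class and method names, and a small lowerFirst helper, because Java replacement strings have no \l operator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AddContextIdDemo {
    // Group 1: parentheses content, group 2: "public interface ", group 3: interface name.
    static final Pattern FEIGN_CLIENT = Pattern.compile(
            "\\@FeignClient\\(([^)]*)\\)\\s*\\n(public\\s*interface\\s*)(\\w+)");

    // Stands in for \l: lowercases the first character only.
    static String lowerFirst(String name) {
        return name.isEmpty() ? name : Character.toLowerCase(name.charAt(0)) + name.substring(1);
    }

    static String addContextId(String source) {
        return FEIGN_CLIENT.matcher(source).replaceAll(match -> {
            String replacement = "@FeignClient(contextId = \"" + lowerFirst(match.group(3)) + "\", "
                    + match.group(1) + ")\n" + match.group(2) + match.group(3);
            // Protect any '$' or '\' in the captured text from being read as group references.
            return Matcher.quoteReplacement(replacement);
        });
    }

    public static void main(String[] args) {
        String source = "@FeignClient(name = \"name\", path = \"/api/foofeign\")\n"
                + "public interface FooFeignApi {";
        System.out.println(addContextId(source));
    }
}
```

When the original parentheses are empty, this produces a dangling “, ” before the closing parenthesis, exactly like the IDE replacement does.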

Because $1 could be empty, the first replacement can leave a dangling “, ” before the closing parenthesis. A second replacement cleans it up. We search for:

\@FeignClient\((contextId = "[^"]+"),\s*\)

replaced by

@FeignClient($1)
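
The same cleanup can be sketched in Java with String.replaceAll, which understands $1 in the replacement just like the IDE. The class and method names are hypothetical.

```java
class CleanupDemo {
    // Removes the dangling ", " left when the original parentheses were empty.
    static String removeDanglingComma(String source) {
        return source.replaceAll(
                "\\@FeignClient\\((contextId = \"[^\"]+\"),\\s*\\)",
                "@FeignClient($1)");
    }

    public static void main(String[] args) {
        String source = "@FeignClient(contextId = \"barApi\", )\npublic interface BarApi {";
        System.out.println(removeDanglingComma(source));
    }
}
```

Annotations that still have other attributes after contextId are left untouched, because the regexp requires the closing parenthesis right after the comma and optional whitespace.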

And that’s it. Our painful use case was handled in just about 10 minutes.

Conclusion

As you can see, the solution turned out to be quite obvious but, as a developer who is at ease with batch scripting, I had never considered my IDE as a tool able to do more than autocompletion ^^.

From now on, I will certainly consider this usage as an alternative for this kind of use case.

By the way, I hope I have given you a good preview of what a powerful tool regexps are.

Written by

Simon Henry

Architect, DevOps fan