Creating more readable regular expressions with Simple Regex Language

What Next?

You might be wondering what you can do with the finished SRL expression, since Grep and most other tools only understand conventional regular expressions.

If you just need a regular expression on the fly, you can always enter the SRL expression into the test field on the SRL website (as described earlier in this article) and then copy the resulting regular expression into Grep or another tool.

SRL support has also begun to arrive in programming languages. You will find dedicated SRL libraries for JavaScript, PHP, Python, and C++ on GitHub under the MIT license [3]. Java and C# libraries are still pending.

The functions and classes of these SRL libraries accept an SRL expression, evaluate it, and convert it into a regular expression. In PHP, for example, you just create an SRL object and check the text with its isMatching() method:

$srl = new SRL('one of "eE" literally "rror"');
$srl->isMatching('Error'); // returns true
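To see what such an SRL expression boils down to, the following sketch reproduces the same check in Python with only the standard re module, not an SRL library. It assumes that one of "eE" literally "rror" translates to the conventional pattern [eE](?:rror); is_matching() is a hypothetical helper named after isMatching() that reports whether the pattern occurs anywhere in the text.

```python
import re

# Assumed conventional equivalent of the SRL expression
#   one of "eE" literally "rror"
# "one of" becomes a character class; "literally" becomes escaped
# literal text wrapped in a non-capturing group.
PATTERN = re.compile("[eE]" + "(?:" + re.escape("rror") + ")")


def is_matching(text: str) -> bool:
    """Hypothetical helper: True if the pattern occurs anywhere in text."""
    return PATTERN.search(text) is not None


print(PATTERN.pattern)         # [eE](?:rror)
print(is_matching("Error"))    # True
print(is_matching("error"))    # True
print(is_matching("Warning"))  # False
```

The regular expression printed on the first line is the kind of string you would otherwise copy from the SRL website into Grep.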

Besides keywords such as literally, SRL offers the keywords shown in Tables 1 to 6. You will find a detailed reference and many more examples on the official SRL homepage [1].

Table 1: Character Strings

Keyword | Description
literally "string" | The literal character string string.
one of "abc" | One of the characters a, b, or c.
letter from a to d | One of the characters a, b, c, or d. Without the suffix from ..., letter matches any lowercase letter.
uppercase letter | Any uppercase letter.
any character | An uppercase or lowercase letter from A to Z, a digit from 0 to 9, or an underscore (_).
no character | Any character not covered by any character, that is, all other (special) characters.
digit from 1 to 4 | One of the digits 1, 2, 3, or 4. Without the suffix from ..., digit matches any digit from 0 to 9.
anything | Any character except a line break.
new line | A line break.
whitespace | A whitespace character (including the space, the tab, and the line break).
no whitespace | Any character that is not a whitespace character.
tab | A tab character.
backslash | The backslash character (\).
raw "[a-z]" | Inserts the conventional regular expression [a-z] unchanged.
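The keywords in Table 1 correspond to familiar building blocks of conventional regular expressions. The sketch below pairs each keyword with the Python re syntax we assume it maps to (the mapping is our reading of SRL's documented behavior, not output taken from an SRL library) and tests each one against a few samples of our own choosing.

```python
import re

# Assumed conventional equivalents of the Table 1 keywords (our mapping).
CHARACTER_KEYWORDS = {
    'literally "a.b"': re.escape("a.b"),  # special characters escaped: a\.b
    'one of "abc"': "[abc]",
    "letter from a to d": "[a-d]",
    "letter": "[a-z]",
    "uppercase letter": "[A-Z]",
    "any character": r"\w",   # letters, digits, underscore
    "no character": r"\W",    # everything \w does not match
    "digit from 1 to 4": "[1-4]",
    "digit": "[0-9]",
    "anything": ".",          # any character except a line break
    "new line": r"\n",
    "whitespace": r"\s",
    "no whitespace": r"\S",
    "tab": r"\t",
    "backslash": r"\\",
    'raw "[a-z]"': "[a-z]",   # passed through unchanged
}

SAMPLES = ["a.b", "axb", "b", "C", "_", "#", "3", "9", "\n", "\t", "\\"]


def full_matches(regex: str) -> list[str]:
    """Return the samples that the regex matches in their entirety."""
    return [s for s in SAMPLES if re.fullmatch(regex, s)]


for keyword, regex in CHARACTER_KEYWORDS.items():
    print(f"{keyword:20} {regex:6} {full_matches(regex)!r}")
```

Note that in Python, \w also matches non-ASCII letters unless the re.ASCII flag is set, so the A-to-Z description in the table holds only for ASCII text.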

Table 2: Quantifiers

Keyword | Description
exactly 4 times | The preceding element repeats exactly four times. exactly 1 time can be abbreviated to once, and exactly 2 times to twice.
between 2 and 4 times | The preceding element repeats between two and four times; the trailing keyword times is optional.
optional | The preceding element may occur, but does not have to.
once or more | The preceding element must occur at least once.
never or more | The preceding element may occur any number of times, including not at all.
at least 2 times | The preceding element must occur at least twice.
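Each quantifier in Table 2 has a compact counterpart in conventional syntax. This sketch attaches them to digit ([0-9]) and checks which digit strings match completely; the quantifier mapping is our assumption about what SRL generates, and digit_with() is a hypothetical helper.

```python
import re

# Assumed conventional equivalents of the Table 2 quantifiers (our mapping).
QUANTIFIERS = {
    "exactly 4 times": "{4}",
    "once": "{1}",
    "twice": "{2}",
    "between 2 and 4 times": "{2,4}",
    "optional": "?",
    "once or more": "+",
    "never or more": "*",
    "at least 2 times": "{2,}",
}

SAMPLES = ["", "1", "12", "123", "1234", "12345"]


def digit_with(quantifier: str) -> str:
    """Hypothetical helper: build the regex for 'digit <quantifier>'."""
    return "[0-9]" + QUANTIFIERS[quantifier]


for quantifier in QUANTIFIERS:
    regex = digit_with(quantifier)
    hits = [s for s in SAMPLES if re.fullmatch(regex, s)]
    print(f"digit {quantifier:22} {regex:10} {hits!r}")
```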

Table 3: Groups

Keyword | Description
capture (condition) | Captures the text matched by condition so the engine can return it. If you use capture several times, you can name the individual captured parts: capture (anything once or more) as "first".
any of (condition) | One of the conditions within the parentheses must apply.
capture (condition1) until (condition2) | Captures the text matched by condition1 only up to the point where condition2 applies.
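The group keywords map to capturing groups, alternation, and lazy matching. The sketch below shows the Python equivalents we assume for each, with SRL phrasings of our own in the comments; Python writes named groups as (?P<name>...), whereas PCRE and JavaScript also accept (?<name>...). The greedy pattern is included only for comparison with until.

```python
import re

# Assumed equivalents (our mapping, written in Python's re syntax):
#   capture (letter once or more) as "first"  ->  (?P<first>[a-z]+)
#   any of (literally "cat", literally "dog") ->  (?:cat|dog)
#   capture (anything once or more) until ":" ->  (.+?):   (lazy)
named = re.compile(r"(?P<first>[a-z]+)\s(?P<second>[a-z]+)")
alternation = re.compile(r"(?:cat|dog)")
lazy_until = re.compile(r"(.+?):")
greedy = re.compile(r"(.+):")  # for comparison: no "until"

m = named.search("hello world")
print(m.group("first"), m.group("second"))          # hello world
print(bool(alternation.fullmatch("dog")))           # True
print(bool(alternation.fullmatch("cow")))           # False
print(lazy_until.search("key: value: x").group(1))  # key
print(greedy.search("key: value: x").group(1))      # key: value
```

The last two lines show why until matters: without it, the greedy capture runs on to the last colon in the line.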

Table 4: Lookarounds

Keyword | Description
if followed by | Checks whether a particular element follows (lookahead).
if not followed by | Checks whether a particular element does not follow (negative lookahead).
if already had | Checks whether a particular element precedes (lookbehind).
if not already had | Checks whether a particular element does not precede (negative lookbehind).
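Lookarounds check the surrounding context without including it in the match. The sketch below shows the four conventional lookaround constructs we assume these keywords produce, applied in Python to a sample string of our own; it does not show how SRL orders the keyword relative to the element it checks.

```python
import re

TEXT = "width 10px, price $5, count 7"

# if followed by -> lookahead (?=...)
followed = re.findall(r"[0-9]+(?=px)", TEXT)
# if not followed by -> negative lookahead (?!...); the \b anchors stop
# the engine from backtracking to a partial match such as "1" in "10px"
not_followed = re.findall(r"\b[0-9]+\b(?!px)", TEXT)
# if already had -> lookbehind (?<=...)
preceded = re.findall(r"(?<=\$)[0-9]+", TEXT)
# if not already had -> negative lookbehind (?<!...); the digit class in
# the lookbehind stops matches that start in the middle of a number
not_preceded = re.findall(r"(?<![$0-9])[0-9]+", TEXT)

print(followed)      # ['10']
print(not_followed)  # ['5', '7']
print(preceded)      # ['5']
print(not_preceded)  # ['10', '7']
```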

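Table 5: Flags

Keyword | Description
case insensitive | Matching ignores the difference between uppercase and lowercase.
multi line | The text to be checked spans multiple lines; the anchors apply to each individual line.
all lazy | All quantifiers are evaluated lazily, matching as little text as possible instead of as much as possible (greedy).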
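The flags correspond to pattern modifiers in conventional engines. This sketch assumes case insensitive and multi line map to the i and m modifiers (re.IGNORECASE and re.MULTILINE in Python) and that all lazy corresponds to PCRE's U (ungreedy) modifier, which Python does not offer, so the lazy variant is written out per quantifier.

```python
import re

TEXT = "Error on line 1\nerror on line 2"

# case insensitive -> i modifier (re.IGNORECASE)
print(re.findall(r"error", TEXT, flags=re.IGNORECASE))      # ['Error', 'error']

# multi line -> m modifier (re.MULTILINE): ^ matches at the start of every line
print(re.findall(r"^[a-zA-Z]+", TEXT))                      # ['Error']
print(re.findall(r"^[a-zA-Z]+", TEXT, flags=re.MULTILINE))  # ['Error', 'error']

# all lazy: Python has no global "ungreedy" flag, so each quantifier
# is made lazy individually with a trailing "?"
print(re.search(r"o.+n", TEXT).group())   # greedy: or on lin
print(re.search(r"o.+?n", TEXT).group())  # lazy:   or on
```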

Table 6: Anchors

Keyword | Description
start with | The match must begin at the start of the string.
must end | The match must end at the end of the string.
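Anchors matter most when validating input rather than searching in it. The sketch below assumes start with and must end map to ^ and $ and compares an anchored pattern with the same pattern without anchors; the ID format is a made-up example.

```python
import re

# Assumed regex equivalent of an expression phrased with the Table 6 keywords:
#   start with literally "ID-" digit exactly 4 times must end
anchored = re.compile(r"^(?:ID-)[0-9]{4}$")
# The same expression without "start with" and "must end"
unanchored = re.compile(r"(?:ID-)[0-9]{4}")

for candidate in ["ID-1234", "xID-1234", "ID-12345"]:
    print(candidate, bool(anchored.search(candidate)), bool(unanchored.search(candidate)))
```

Only "ID-1234" passes the anchored check, while the unanchored pattern finds a match in all three strings. One Python-specific detail: $ also matches just before a trailing newline, so "ID-1234\n" passes the anchored check as well; re.fullmatch() or \Z is stricter.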

Conclusions

SRL is built on the philosophy that errors stand out more quickly when regular expressions are easier to read. However, SRL expressions can also become complex and difficult to understand if you aren't accustomed to the syntax. SRL does not currently support comments, which would add clarity, and it also lacks recursion. Classic Unix text tools such as Grep do not yet support SRL; however, you can convert the expression on the SRL website or use an SRL library in some programming languages.

Despite the problems, SRL is still worth a look. If you only use regular expressions occasionally, or if you are using them for the first time, you will get to your destination much faster with SRL. Even regex old-timers might find they can make their expressions more legible with SRL.

In the future, developer Karim Geiger wants to add support for additional programming languages and also standardize the SRL language to define the syntax and commands more clearly. In the long run, he imagines a kind of compiler that translates regular expressions into SRL. He has stated that a Bash version is not planned but is conceivable.

SRL commands are based on English. Geiger has resisted suggestions to translate the natural-language commands into other languages. He fears that this proliferation would introduce complexity that could lead to incompatible versions.

Infos

  1. Simple Regex Language: https://simple-regex.com
  2. Build Tool for the SRL: https://simple-regex.com/build
  3. SRL Libraries: https://github.com/SimpleRegex
