Bash Pattern Matching Operator

The manual page for GNU Bash (4.2 as of this post) says that its pattern matching operator =~, written as

value =~ pattern

uses POSIX extended regular expression (ERE) syntax, the POSIX 1003.2 format also used by regex(3). The =~ operator has the same precedence as the == and != operators.

This is documented in the SHELL GRAMMAR section of the man page, under Compound Commands, in the entry for [[ expression ]].
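As a minimal sketch of the operator in use: the sample value, the pattern and the variable names below are made up for illustration. The pattern is kept in a variable and expanded unquoted, because quoting the right-hand side makes Bash match it as a literal string. Parenthesized groups are read back from the BASH_REMATCH array.

```shell
#!/usr/bin/env bash
# Hypothetical sample value, for illustration only.
value="release-4.2"

# ERE with two capture groups. Single quotes keep the backslash intact.
pattern='^release-([0-9]+)\.([0-9]+)$'

# Leave $pattern unquoted: a quoted right-hand side is matched literally.
if [[ $value =~ $pattern ]]; then
    # BASH_REMATCH[0] is the whole match, [1] and [2] are the groups.
    major=${BASH_REMATCH[1]}
    minor=${BASH_REMATCH[2]}
    echo "major=$major minor=$minor"
fi
```

With this input the script should print major=4 minor=2.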

awk – regexp constants vs. string constants

The guide section linked below (“Using Dynamic Regexps”) is well worth reading to understand the differences between regexp constants and string constants used in matching, and the three main advantages of using regexp constants over string constants in any matching operation:

http://www.chemie.fu-berlin.de/chemnet/use/info/gawk/gawk_5.html#SEC32
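One difference is easy to see in practice: a string constant is processed once as a string and again as a regexp, so a backslash has to be doubled there. The following is only a small sketch with made-up input and variable names. Both commands match a literal dot.

```shell
# Hypothetical two-line input: only the first line contains a literal dot.
input=$(printf 'a.b\naxb\n')

# Regexp constant: one backslash is enough to escape the dot.
from_regexp_constant=$(printf '%s\n' "$input" | awk '$0 ~ /a\.b/')

# String constant used as a dynamic regexp: the backslash must be doubled,
# because the string is scanned first and the result is then read as a regexp.
from_string_constant=$(printf '%s\n' "$input" | awk '$0 ~ "a\\.b"')

echo "$from_regexp_constant"
echo "$from_string_constant"
```

Each command should print only the line a.b.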

Gawk’s gensub Regexp Replacement Function

This is a general substitution function provided as an extension by the GNU Gawk utility.

Syntax:

gensub(regexp, replacement, how[, target])

Unlike sub and gsub, gensub does not modify the target: the target keeps the original text, and the result of the substitution is the return value of the function instead. The regular expression regexp is searched for in target (by default $0, i.e. the entire record), and each match (greedy, i.e. leftmost-longest) is replaced with the replacement text according to the how argument: a string beginning with “g” or “G” replaces all the matches, while a number indicates which single occurrence to replace, counting from one (“1”).

A very interesting feature of this extension is group capturing, done the same way as in most regexp dialects, i.e. with enclosing parentheses. You specify which group to use in the replacement with the \n notation, where n is a digit from one to nine. (Reminder: to put a literal backslash in a string constant you must write it as ‘\\’, so \2 is typed as "\\2".) An example is as follows:

$ gawk '
> BEGIN {
>     a = "dog beautiful"
>     b = gensub(/([^ ]+) ([^ ]+)/, "\\2 \\1", "g", a)
>     print b
> }'
-| beautiful dog

Now an example specifying which match to replace (as opposed to “g” or “G”, denoting global substitution, as a value for the 3rd argument):

$ gawk '
> BEGIN {
>     a = "such a dog beautiful"
>     b = gensub(/([^ ]+) ([^ ]+)/, "\\2 \\1", "2", a)
>     print b
> }'
-| such a beautiful dog