GNU awk manual
Converting awk scripts to Perl scripts with a2p
How awk works
awk is basically the UNIX cut utility on steroids. awk reads its input line by line (a line is called a "record" in awk parlance). It splits each line into "fields" (the default delimiter is whitespace, but this can be changed) and then runs the script against that line and its fields.
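To make the record/field model concrete, here is a small sketch that assumes a POSIX-compatible awk. The file people.txt and its contents are hypothetical sample data. NR (the line number) and NF (the number of fields) are described under Special variables.

```shell
# Hypothetical sample data; the second line has irregular spacing
printf 'alice 30 paris\nbob   25   rome\n' > people.txt

# cut with a single-space delimiter treats every space as a separator,
# but awk's default whitespace splitting collapses runs of blanks,
# so $3 is the city on both lines.
awk '{ print NR ": " $1 " lives in " $3 " (" NF " fields)" }' people.txt
```

Each line is one record, and both lines split into the same three fields despite the extra spaces.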
Invoking awk
Command | Meaning |
---|---|
awk 'script' [list of files] | Run script on the listed files, or on standard input if no files are given. Quote the script with single quotes so that the shell does not expand $0, $1, ... itself. |
awk -Fdelim 'script' [list of files] | Same as above, but split each line into fields using delim as the delimiter. |
awk -f scriptFile [list of files] | Run the awk program stored in scriptFile on the listed files. |
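A sketch of the three invocation forms, assuming a POSIX-compatible awk. The files users.txt and fields.awk are hypothetical names created just for the example.

```shell
# Hypothetical colon-delimited sample file
printf 'root:x:0\nalice:x:1000\n' > users.txt

# Inline script in single quotes: the shell passes $1 through to awk untouched
awk -F: '{ print $1 }' users.txt

# The same kind of program stored in a file and run with -f;
# -F: still sets the field delimiter
printf '{ print $1, $3 }\n' > fields.awk
awk -F: -f fields.awk users.txt
```

With double quotes instead, as in awk -F: "{ print $1 }" users.txt, the shell would expand $1 itself (typically to an empty string) before awk ever saw the script.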
Basic syntax
An awk script consists of a series of rules of the following form:
pattern { procedure }
Both pattern and { procedure } are optional.
If pattern is omitted, awk applies procedure to all lines.
If { procedure } is omitted, awk prints the lines that match pattern; that is, awk behaves as if { procedure } were
{ print $0 }
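The three combinations of pattern and { procedure } can be compared on one input. This sketch assumes a POSIX-compatible awk, and fruit.txt is hypothetical sample data.

```shell
# Hypothetical sample: fruit name and quantity
printf 'apple 3\nbanana 12\ncherry 7\n' > fruit.txt

# pattern only: the implied procedure is { print $0 }
awk '$2 > 5' fruit.txt

# { procedure } only: runs on every line
awk '{ print $1 }' fruit.txt

# both: run the procedure only on lines that match the pattern
awk '$2 > 5 { print $1 }' fruit.txt
```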
pattern syntax
pattern can take the following formats:
Format | Meaning |
---|---|
begin,end | Process a range of lines: from a line that matches begin through the next line that matches end, both inclusive. After the range closes, awk looks for begin again, so the range can occur several times. begin and end are awk patterns too, usually an expression or a /regex/. |
/regex/ | Process lines which match the regular expression regex. Strictly speaking, this format is just shorthand for the awk expression $0 ~ /regex/ |
expression | Process lines for which expression is true (a nonzero number or a non-empty string). |
BEGIN | Run procedure before the first line is read. |
END | Run procedure after the last line is read. |
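Each pattern format can be tried on a small INI-style file. This is a sketch assuming a POSIX-compatible awk; app.ini and its contents are hypothetical.

```shell
# Hypothetical INI-style sample file
printf '[db]\nhost = a\nport = 5\n[web]\nhost = b\n' > app.ini

# begin,end: from the [db] line through the next [web] line, inclusive
awk '/^\[db\]/, /^\[web\]/' app.ini

# /regex/ and its long form $0 ~ /regex/ select the same lines
awk '/host/' app.ini
awk '$0 ~ /host/' app.ini

# expression: lines with exactly three fields
awk 'NF == 3 { print $1 }' app.ini

# BEGIN and END: each runs once, before and after the input
awk 'BEGIN { n = 0 } /host/ { n++ } END { print n, "host lines" }' app.ini
```

The range prints both boundary lines ([db] and [web]), while the final host = b line falls outside it.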
procedure syntax
A procedure is a sequence of statements with a C-like syntax. For example, variables are assigned and referenced the same way as in C, with no declaration needed. Unlike many scripting languages, awk does not prefix variable names with a dollar sign $; the $ is used only for field references such as $0 and $1, which have a special meaning.
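A short sketch of C-like statements in a procedure, assuming a POSIX-compatible awk. The file ages.txt and its contents are hypothetical.

```shell
# Hypothetical sample: name and age
printf 'alice 30\nbob 25\ncarol 41\n' > ages.txt

# total and count are never declared: an unset variable acts as 0 (or "").
# Variable names carry no sigil; $ appears only in field references like $2.
awk '{ total += $2; count++ }
END {
    if (count > 0)
        printf "average age: %.1f\n", total / count
}' ages.txt
```

The statements (+=, ++, if, printf) read as they would in C. Here the sum is 96 over 3 lines.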
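Regular expression constants, when used with the ~ and !~ operators or with the sub, gsub and match functions (and gawk's gensub), are written between slashes /, as in Perl. A string can also be used where a regex is expected; awk then treats it as a dynamic regex.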
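Regex constants with ~, !~, sub and gsub might look like this. The sketch assumes a POSIX-compatible awk (gensub is gawk-only, so it is left out), and contacts.txt is hypothetical sample data.

```shell
# Hypothetical sample data
printf 'alice@example.com\nbob at home\ncarol@test.org\n' > contacts.txt

# ~ and !~ test a field or the whole line against a /regex/ constant
awk '$1 ~ /@/ { print "email:", $1 }' contacts.txt
awk '$0 !~ /@/ { print "other:", $0 }' contacts.txt

# gsub() replaces every match, sub() only the first; both edit $0 by default
awk '{ gsub(/[.]/, "[dot]"); sub(/@/, " [at] "); print }' contacts.txt
```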
Operators
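Most of the operators are similar to those in C. In addition, awk has the following operators, or treats them differently from C:
Operator | Meaning |
---|---|
~ !~ | Match / don't match a regular expression. |
/ | Division. All awk arithmetic is floating-point, so 7 / 2 is 3.5, not 3 as in C. |
(space) | String concatenation: operands written side by side are joined. Concatenation has no operator symbol and binds more loosely than arithmetic, so use parentheses to make the grouping explicit. |
^ ** | Exponentiation. ^ is the POSIX form; ** is an extension accepted by gawk and some other implementations. |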
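A BEGIN-only program needs no input, which makes it a quick way to try these operators. This sketch assumes a POSIX-compatible awk and uses ^ rather than **, since only ^ is in POSIX.

```shell
awk 'BEGIN {
    print 7 / 2                # 3.5: division is never truncated
    print int(7 / 2)           # 3: int() truncates toward zero
    print 2 ^ 10               # 1024
    s = "foo" "bar"            # concatenation: operands side by side
    print s                    # foobar
    print -12 " " -24          # parsed as -12 (" " - 24), prints -12-24
    print -12 " " (-24)        # parentheses give -12 -24
    if ("abc123" ~ /[0-9]+$/) print "ends with digits"
}'
```

The -12 " " -24 line shows why the table recommends parentheses. Because binary minus binds tighter than concatenation, the space is swallowed by a subtraction.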
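Since awk is weakly typed, a value is converted between string and number as each operation requires ("3" + 4 is 7, and an unset variable counts as both "" and 0). When a value may not be numeric, for example an input field, a common idiom to check whether it is a number is
if ((v + 0) == v) print v " is numeric"
If v holds input that looks like a number, the comparison is numeric and succeeds. Otherwise v + 0 is compared with v as text, so values such as "abc" or "12abc" fail the test.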
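The idiom can be checked against a few input fields. This is a sketch assuming a POSIX-compatible awk, where a field that looks like a number compares numerically; the input values are made up for the example.

```shell
# Hypothetical inputs: two numbers and two non-numbers
printf '42\n3.5\nabc\n12abc\n' | awk '{
    v = $1                     # v keeps the "looks numeric" status of the field
    if ((v + 0) == v) print v, "numeric"
    else              print v, "not numeric"
}'
```

For 12abc, v + 0 is 12, and "12" differs from "12abc" as text, so the line is reported as not numeric.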
Special variables
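Variable | Meaning |
---|---|
$n | The n-th word ("field") of the current line. Note that n can be an awk variable or expression, e.g. awk '{ i = 1; print $i }' foo |
$0 | The entire current line. |
NF | Number of words ("fields") in the current line, so $NF is the last field. |
NR | Current line number, counted across all input files. |
FS OFS | Input/output field delimiter. Both default to a single space; as the value of FS, a single space means fields are separated by runs of spaces and tabs, and leading and trailing blanks are ignored. |
RS ORS | Input/output line (record) delimiter. The default is newline. |
FILENAME | The name of the current input file. |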
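Several of these variables can be combined in a single pass. This is a sketch assuming a POSIX-compatible awk; sample.txt is a hypothetical file.

```shell
# Hypothetical colon-delimited sample file
printf 'a:b:c\nd:e\n' > sample.txt

# FS splits input on ":"; OFS joins the comma-separated print arguments
awk 'BEGIN { FS = ":"; OFS = "-" }
{ print FILENAME, NR, NF, $NF }' sample.txt

# $i with a variable index: print each line's fields in reverse order
awk -F: '{ for (i = NF; i > 0; i--) printf "%s%s", $i, (i > 1 ? " " : "\n") }' sample.txt
```

Setting FS in a BEGIN block has the same effect as -F: on the command line, because BEGIN runs before the first line is split.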