GNU awk manual
Converting awk scripts to Perl scripts with a2p
See here
How awk works?
awk is basically the UNIX cut utility on steroids. awk reads the input line by line (a line is called a "record" in awk parlance), splits each line into "fields" (the default delimiter is whitespace, but this can be changed), and runs the script on these fields.
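As a minimal sketch of the record/field model described above (the sample input is made up for illustration), each input line becomes a record and its whitespace-separated words become $1, $2, ...:

```shell
# Two records with three fields each (hypothetical sample data)
out=$(printf 'alice 30 paris\nbob 25 rome\n' | awk '{ print $1, $3 }')
echo "$out"
# prints:
# alice paris
# bob rome
```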
Invoking awk
awk 'script' [list of files] | Run the script on the list of files. Quote the script with single quotes so that the shell does not interpret $ and other special characters. |
awk -Fdelim 'script' [list of files] | Same as above, except that delim is used as the delimiter for tokenizing a line. |
awk -f scriptFile [list of files] | Run the script stored in scriptFile on the list of files. |
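
The invocation forms above could be tried as follows. This is only a sketch: the colon-separated input and the temporary script file are assumptions made for the example.

```shell
# -F: use ":" as the field delimiter (hypothetical passwd-like input)
out=$(printf 'root:x:0:0\ndaemon:x:1:1\n' | awk -F: '{ print $1 }')
echo "$out"

# -f: read the script from a file instead of the command line
script=$(mktemp)
printf '{ print $3 }\n' > "$script"
out_f=$(printf 'root:x:0:0\ndaemon:x:1:1\n' | awk -F: -f "$script")
rm -f "$script"
echo "$out_f"
```

When no file is listed, awk reads standard input, which is what the pipes above rely on.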
Basic syntax
An awk script consists of a series of the following:
pattern { procedure }
Both pattern and { procedure } are optional.
If pattern is omitted, awk applies procedure to all lines.
If { procedure } is omitted, awk prints lines which match pattern, that is, awk will pretend { procedure } is
{ print $0 }
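A small sketch of the two shortened forms, using made-up input: a pattern with no procedure prints the matching lines, and a procedure with no pattern runs on every line.

```shell
input='apple 3
banana 12
cherry 7'

# Pattern only: same as '$2 > 5 { print $0 }'
matched=$(printf '%s\n' "$input" | awk '$2 > 5')
echo "$matched"

# Procedure only: applied to all lines
names=$(printf '%s\n' "$input" | awk '{ print $1 }')
echo "$names"
```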
pattern syntax
pattern can take the following formats:
Format | Meaning |
---|---|
begin,end | Process lines within the range, which starts at a line that matches begin and runs through the next line that matches end, both included. begin and end are awk patterns too, and usually they take one of the following two formats: expression or /regex/. |
/regex/ | Process lines which match the regular expression regex. Strictly speaking, this format is just a shorthand for the following awk expression: $0 ~ /regex/ |
expression | Process lines for which expression evaluates to true (a nonzero number or a non-empty string). |
BEGIN | Run procedure before the first line is read. |
END | Run procedure after the last line is read. |
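
The pattern formats in the table can be combined in one script. The following is a sketch on invented input, with START and STOP as arbitrary marker lines:

```shell
out=$(printf 'a\nSTART\nb\nc\nSTOP\nd\n' | awk '
BEGIN { print "header" }              # runs before the first line is read
/START/,/STOP/ { n++; print }         # range pattern, both ends included
END { print n " lines in range" }     # runs after the last line is read
')
echo "$out"
# prints:
# header
# START
# b
# c
# STOP
# 4 lines in range
```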
procedure syntax
procedure is a list of statements, which have a C-like syntax. For example, variables are assigned and referred to in the same way as in C; there is no need to use the dollar sign $ as in many scripting languages (except $0, $1, ... which have special meaning).
Regular expressions, when used in conjunction with the ~ or !~ operators or the sub/gsub/gensub/match functions, are written between slashes /, as in Perl.
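To illustrate the points above, here is a sketch with made-up log lines. The variable names errors and others are arbitrary; note that they need no declaration and no $ prefix.

```shell
out=$(printf 'error: disk full\ninfo: ok\nerror: timeout\n' | awk '
{
    if ($1 ~ /^error/) {
        errors++                      # plain variable, starts out empty (0)
    } else if ($0 !~ /timeout/) {
        others++
    }
}
END { print "errors=" errors, "others=" others }
')
echo "$out"
# prints: errors=2 others=1
```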
Operators
Most of the operators are similar to those in C. In addition, awk accepts
~ !~ | Match / do not match a regular expression. |
/ | Floating-point division. |
(space) | String concatenation. Put parentheses around concatenations to avoid precedence surprises. |
** ^ | Exponentiation. |
Since awk is a weakly-typed language, it is important to check the type of a variable before doing any operation on it. To check if a variable is a number or not, use
if ((v+0)==v) ...
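A sketch of the numeric check together with concatenation and exponentiation. It applies the same idiom directly to the field $1, and the three input values are assumptions chosen for the example.

```shell
out=$(printf '42\nabc\n3.5\n' | awk '{
    if (($1 + 0) == $1) {
        print $1, "is a number, squared:", $1 ^ 2
    } else {
        print ($1 " is not a number")   # parenthesized concatenation
    }
}')
echo "$out"
# prints:
# 42 is a number, squared: 1764
# abc is not a number
# 3.5 is a number, squared: 12.25
```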
Special variables
$n | n-th word ("field") of the current line. Note that n can be an awk variable, e.g. awk '{ i=1; print $i }' foo |
$0 | The entire current line. |
NF | Number of words ("fields") in the current line. |
NR | Current line number. |
FS OFS | Input/output field delimiter. The default is a space (for FS, a single space means any run of spaces or tabs). |
RS ORS | Input/output line ("record") delimiter. The default is newline. |
FILENAME | The input file name. |
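
A short sketch of several special variables at work, on invented input. Setting OFS in a BEGIN block is one common way to change the output delimiter.

```shell
out=$(printf 'a b c\nd e\n' | awk 'BEGIN { OFS = "-" } { print NR, NF, $NF }')
echo "$out"
# prints:
# 1-3-c
# 2-2-e
```

Here $NF is the last field of the line, since NF holds the number of fields.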
Built-in functions
print item1, item2, ... | Display item1, item2 ... (separated by a space, or by whatever the special variable OFS specifies) followed by a newline. |
printf format, item1, item2, ... | Formatted print. |
sprintf(format, item1, item2, ...) | Get the formatted string. |
atan2 cos exp log sin sqrt | Arithmetic functions. |
and or xor | (GNU awk) Bitwise functions. |
rand srand | Random number generator. |
int | Truncate a number toward 0. |
[g]sub(regex,replacement,target) | Replace the first (sub) or all (gsub) occurrences of regex with replacement in the target string, in place. Returns the number of replacements made. If target is omitted, defaults to $0. |
gensub(regex,replacement,n,target) | (GNU awk) Return a copy of target in which the n-th occurrence of regex is replaced with replacement; pass "g" as n to replace all occurrences. target itself is not modified. If target is omitted, defaults to $0. |
match(target,regex) | Get the beginning position of the leftmost, longest substring which matches regex, or 0 if there is no match. Also sets RSTART and RLENGTH. |
index(haystack,needle) | Get the position of needle in haystack, or 0 if it is not found. |
length(string) | Get the string length. |
substr(string,m,n) | Get the substring starting at position m (counting from 1) with length n. n is optional. |
tolower toupper | Convert to lower/upper case. |
split(string,array,delim) | Tokenize the string using the delimiter delim and put the result in array. Returns the number of elements in array. If delim is omitted, use the value of the special variable FS. |
asort(src,dest) | (GNU awk) Sort the values of the src array and put the result in the dest array. Returns the number of elements in src. dest is optional; if it is omitted, src is sorted in place. Set the special variable IGNORECASE to 1 to enable case-insensitive sorting. |
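system(command) | Execute command and return its exit status. |
systime() | (GNU awk) Get the current UNIX timestamp. |
mktime(datespec) | (GNU awk) Convert datespec, a string of the form "YYYY MM DD HH MM SS", to a UNIX timestamp. See the GNU awk manual for details. |
strftime(format,timestamp) | (GNU awk) Format timestamp as a string according to format, e.g. "%Y-%m-%d". If timestamp is omitted, the current time is used. See the GNU awk manual for details. |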
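
A sketch exercising a few of the string functions from the table. It sticks to functions available in POSIX awk, so the GNU-only ones (gensub, asort, the time functions) are left out; the input string and the variable names s, g, n, k, parts are made up for the example.

```shell
out=$(echo 'hello world, hello awk' | awk '{
    s = $0
    sub(/hello/, "HI", s)             # first occurrence only, modifies s
    g = $0
    n = gsub(/hello/, "HI", g)        # all occurrences, returns the count
    k = split($0, parts, ", ")        # parts[1], parts[2]
    print s
    print g, n
    print k, parts[2], length(parts[2]), index($0, "world")
    print toupper(substr($0, 1, 5)), sprintf("%05.1f", 3.14159)
}')
echo "$out"
# prints:
# HI world, hello awk
# HI world, HI awk 2
# 2 hello awk 9 7
# HELLO 003.1
```

Working on copies (s and g) keeps $0 intact, since sub and gsub change their target in place.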