awk
awk is a programming language for processing text field by field (column by column). It excels at aggregating, extracting, and transforming structured text such as CSV files and log files.
Syntax
awk [options] 'program' [file...]
awk -F delimiter 'program' [file...]
Syntax and Variables
| Syntax / Variable | Description |
|---|---|
| '{print $1}' | Prints the first field ($0 refers to the entire line). |
| -F ':' | Specifies the field delimiter (default is whitespace). |
| NR | The number of records read so far, which is the line number of the current record (counted across all input files). |
| NF | The number of fields in the current line. Use $NF to reference the last field. |
| FS | The input field separator (default is whitespace). The -F option sets it from the command line. |
| OFS | The output field separator, inserted between comma-separated arguments of print (default is a single space). |
| RS | The record separator (default is a newline). |
| BEGIN { } | A block that runs once before any input is read. |
| END { } | A block that runs once after all input has been read. |
| /pattern/ { } | Applies the action only to lines matching the pattern. |
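The table lists NF, $NF, and OFS, but none of the examples on this page exercise them. The following is a minimal sketch of how they interact, using two rows of scores.txt supplied inline so the snippet runs on its own. The shell variable names `sample` and `report` are illustrative only.

```shell
# Two rows from scores.txt, supplied inline (illustrative data).
sample='alice 85 Tokyo
bob 92 Osaka'

# NR = record number, NF = number of fields, $NF = last field.
# OFS is what print places between its comma-separated arguments.
report=$(printf '%s\n' "$sample" |
    awk 'BEGIN { OFS = "," } { print NR, NF, $NF }')

printf '%s\n' "$report"
```

With this input, each output line holds the record number, the field count (3), and the city, joined by commas instead of the default space.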
Sample Code
The following files are used in the examples below.
scores.txt
alice 85 Tokyo
bob 92 Osaka
charlie 78 Tokyo
diana 95 Nagoya
eve 88 Osaka
sales.csv
name,amount,region
alice,1200,east
bob,800,west
charlie,1500,east
Prints the first field of each line.
awk '{print $1}' scores.txt
alice
bob
charlie
diana
eve
Filters rows where the second column is 90 or greater.
awk '$2 >= 90' scores.txt
bob 92 Osaka
diana 95 Nagoya
Prints specific fields only for rows that match the condition.
awk '$2 >= 90 {print $1, $2}' scores.txt
bob 92
diana 95
Uses NR to print each line with its line number.
awk '{print NR": "$0}' scores.txt
1: alice 85 Tokyo
2: bob 92 Osaka
3: charlie 78 Tokyo
4: diana 95 Nagoya
5: eve 88 Osaka
Sums the second column and prints the total from the END block.
awk '{sum += $2} END {print "Total:", sum}' scores.txt
Total: 438
Uses -F to specify a delimiter and process CSV input.
awk -F',' '{print $1, $2}' sales.csv
name amount
alice 1200
bob 800
charlie 1500
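The output above still includes the header row. One common way to handle that, sketched here under the assumption that the first line of the file is always a header, is to add the condition NR > 1 so that only data rows are processed. The data is the same as sales.csv, supplied inline; the variable names `csv` and `total` are illustrative.

```shell
# Same content as sales.csv, supplied inline so the snippet is self-contained.
csv='name,amount,region
alice,1200,east
bob,800,west
charlie,1500,east'

# NR > 1 skips the header line; the amounts in column 2 are summed
# and the total is printed once from the END block.
total=$(printf '%s\n' "$csv" |
    awk -F',' 'NR > 1 { sum += $2 } END { print "Total:", sum }')

printf '%s\n' "$total"
```

Note that splitting on a bare comma does not handle quoted CSV fields that themselves contain commas, so this approach is only suitable for simple files like the sample.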
Uses BEGIN and END to add a header and footer.
awk 'BEGIN{print "=== Results ==="} {print $1, $2} END{print "=== End ==="}' scores.txt
=== Results ===
alice 85
bob 92
charlie 78
diana 95
eve 88
=== End ===
Processes only lines matching a pattern (rows where the third column is Tokyo).
awk '$3 == "Tokyo" {print $1, $2}' scores.txt
alice 85
charlie 78
Uses an associative array to remove duplicates.
awk '!seen[$3]++ {print $3}' scores.txt
Tokyo
Osaka
Nagoya
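The same associative-array idea extends from removing duplicates to grouping. The sketch below totals the scores per city and counts the rows in each group. It assumes the scores.txt layout shown earlier (name, score, city) and supplies the data inline. Because the order of a `for (key in array)` loop is unspecified in awk, the result is piped through sort to make it stable.

```shell
# Same content as scores.txt, supplied inline.
scores='alice 85 Tokyo
bob 92 Osaka
charlie 78 Tokyo
diana 95 Nagoya
eve 88 Osaka'

# total[city] accumulates column 2; count[city] counts rows per city.
# The END block walks the array keys; sort fixes the output order.
by_city=$(printf '%s\n' "$scores" |
    awk '{ total[$3] += $2; count[$3]++ }
         END { for (c in total) print c, total[c], count[c] }' |
    sort)

printf '%s\n' "$by_city"
```

For the sample data this gives one line per city with its summed score and row count, for example Tokyo with 163 across 2 rows.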
Notes
awk is a small programming language that is invoked from the shell. It supports variables, conditionals, loops, arrays, and built-in functions such as gsub, split, substr, and sprintf. For complex processing, Python or Perl may be more readable, but for one-liner aggregations and transformations, awk is often the quickest tool to reach for.
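As a rough illustration of the built-in functions named above, the following sketch applies split, substr, gsub, and sprintf to a single record taken from sales.csv. The array name `parts` and the output layout are arbitrary choices made for this example.

```shell
# One record from sales.csv, supplied inline.
line='alice,1200,east'

result=$(printf '%s\n' "$line" | awk '{
    n = split($0, parts, ",")        # split the record into parts[1..n]
    short = substr(parts[1], 1, 3)   # first three characters of the name
    gsub(/a/, "A", parts[3])         # replace every "a" in the region
    # sprintf formats without printing; %05d zero-pads the amount to 5 digits
    print sprintf("%s|%05d|%s|%d", short, parts[2], parts[3], n)
}')

printf '%s\n' "$result"
```

For this record the formatted line is `ali|01200|eAst|3`: the shortened name, the zero-padded amount, the region after substitution, and the number of pieces returned by split.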
For simple line searching, see grep. For text substitution, see sed.