awk
| Since: | All Linux |
|---|---|
| | macOS (2001 Cheetah) |
| | Bash 1.0 (1989) |
awk is a programming language for processing text field by field (column by column). It excels at aggregating, extracting, and transforming structured text such as CSV files and log files.
Syntax
awk [options] 'program' [file...]
awk -F delimiter 'program' [file...]
Syntax and Variables
| Syntax / Variable | Description |
|---|---|
| '{print $1}' | Prints the first field ($0 refers to the entire line). |
| -F ':' | Specifies the field delimiter (default is whitespace). |
| NR | The line number of the record currently being processed. |
| NF | The number of fields in the current line. Use $NF to reference the last field. |
| FS | The field separator (Field Separator). |
| OFS | The output field separator (Output Field Separator). |
| RS | The record separator (Record Separator; default is a newline). |
| BEGIN { } | A block that runs once before any input is read. |
| END { } | A block that runs once after all input has been read. |
| /pattern/ { } | Applies the action only to lines matching the pattern. |
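The table mentions NF and $NF, which none of the examples on this page exercise. As a minimal sketch, the following pipes two made-up lines (not one of the sample files) into awk and prints the record number, the field count, and the last field of each line. The variable name result is only there for illustration.

```shell
# Made-up input: two lines with different numbers of fields.
result=$(printf 'a b c\nd e\n' | awk '{ print NR, NF, $NF }')
printf '%s\n' "$result"
# 1 3 c
# 2 2 e
```

Because $NF is evaluated per record, it picks the last field even when lines have different lengths.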
Sample Code
The following files are used in the examples below.
scores.txt
Yagami 85 Tokyo
Kusanagi 92 Osaka
Terry 78 Tokyo
Mary 95 Nagoya
Goenitz 88 Osaka
sales.csv
name,amount,region
Yagami Iori,1200,east
Kusanagi Kyo,800,west
Terry Bogard,1500,east
Prints the first field of each line.
awk '{print $1}' scores.txt
Yagami
Kusanagi
Terry
Mary
Goenitz
Filters rows where the second column is 90 or greater.
awk '$2 >= 90' scores.txt
Kusanagi 92 Osaka
Mary 95 Nagoya
Prints specific fields only for rows that match the condition.
awk '$2 >= 90 {print $1, $2}' scores.txt
Kusanagi 92
Mary 95
Uses NR to print each line with its line number.
awk '{print NR": "$0}' scores.txt
1: Yagami 85 Tokyo
2: Kusanagi 92 Osaka
3: Terry 78 Tokyo
4: Mary 95 Nagoya
5: Goenitz 88 Osaka
Uses the END block to sum a column (summing the second column).
awk '{sum += $2} END {print "Total:", sum}' scores.txt
Total: 438
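The same pattern extends to an average, since NR still holds the number of records read when the END block runs. The sketch below recreates scores.txt in a temporary directory so it can be run on its own. The temporary path and the variable names are assumptions made for illustration.

```shell
# Recreate the sample file from this page in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/scores.txt" <<'EOF'
Yagami 85 Tokyo
Kusanagi 92 Osaka
Terry 78 Tokyo
Mary 95 Nagoya
Goenitz 88 Osaka
EOF

# Divide the running total by the record count; skip output on empty input.
average=$(awk '{ sum += $2 } END { if (NR > 0) print "Average:", sum / NR }' "$tmpdir/scores.txt")
printf '%s\n' "$average"
# Average: 87.6
rm -r "$tmpdir"
```

The NR > 0 guard avoids a division by zero when the input file is empty.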
Uses -F to specify a delimiter and process CSV input.
awk -F',' '{print $1, $2}' sales.csv
name amount
Yagami Iori 1200
Kusanagi Kyo 800
Terry Bogard 1500
Uses BEGIN and END to add a header and footer.
awk 'BEGIN{print "=== Results ==="} {print $1, $2} END{print "=== End ==="}' scores.txt
=== Results ===
Yagami 85
Kusanagi 92
Terry 78
Mary 95
Goenitz 88
=== End ===
Processes only lines matching a pattern (rows where the third column is Tokyo).
awk '$3 == "Tokyo" {print $1, $2}' scores.txt
Yagami 85
Terry 78
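The following is a more advanced example. seen is an associative array keyed by the whole line. The expression !seen[$0]++ is true only the first time a given line appears, because the stored count is still 0 when it is tested and is incremented afterwards. With no action given, awk prints the matching line, which makes this a common idiom for removing duplicate lines while preserving their original order.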
Uses an associative array to remove duplicates.
awk '{print $3}' scores.txt | awk '!seen[$0]++'
Tokyo
Osaka
Nagoya
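Associative arrays are also a natural fit for group-by aggregation. As a rough sketch under the same assumptions as the sample data above, the following counts the rows and sums the scores per city. The array names count and total are illustrative, and the output is piped through sort because the iteration order of for (key in array) is unspecified in awk.

```shell
# Recreate the sample file from this page in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/scores.txt" <<'EOF'
Yagami 85 Tokyo
Kusanagi 92 Osaka
Terry 78 Tokyo
Mary 95 Nagoya
Goenitz 88 Osaka
EOF

# Key both arrays by the third field (city), then report once at the end.
by_city=$(awk '{ count[$3]++; total[$3] += $2 }
  END { for (city in count) print city, count[city], total[city] }' "$tmpdir/scores.txt" | sort)
printf '%s\n' "$by_city"
# Nagoya 1 95
# Osaka 2 180
# Tokyo 2 163
rm -r "$tmpdir"
```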
printf and OFS for Formatted Output
awk supports a C-style printf for formatting output with fixed-width columns and custom delimiters.
Use printf to format output with fixed-width columns.
awk '{printf "%-20s %3d pts\n", $1, $2}' scores.txt
Yagami                85 pts
Kusanagi              92 pts
Terry                 78 pts
Mary                  95 pts
Goenitz               88 pts
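Use OFS (Output Field Separator) to convert whitespace-separated input (for example TSV) to CSV. Assigning $1=$1 forces awk to rebuild the record, so print emits the fields joined with the new OFS.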
awk 'BEGIN{OFS=","} {$1=$1; print}' scores.txt
Yagami,85,Tokyo
Kusanagi,92,Osaka
Terry,78,Tokyo
Mary,95,Nagoya
Goenitz,88,Osaka
Initialize a variable in BEGIN and print the aggregated result in END.
awk 'BEGIN{max=0} {if ($2 > max) {max=$2; name=$1}} END{print "Top score:", name, max}' scores.txt
Top score: Mary 95
Common Mistakes
Common mistake 1: string vs. number comparison
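When using == for comparison, be aware of how awk handles strings vs. numbers. A field that looks like a number is compared numerically against a numeric constant, but as a string against a string constant. So $2 == 85 matches 85, 085, and 85.0, whereas $2 == "85" matches only the exact text 85. Use a quoted string for text and an unquoted numeric literal for numbers so the intent is explicit.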
awk '$3 == "Tokyo" {print}' scores.txt
Yagami 85 Tokyo
Terry 78 Tokyo
awk '$2 == 85 {print}' scores.txt
Yagami 85 Tokyo
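To see the difference between the two comparison modes, the sketch below feeds awk three spellings of the same number. The input is made up for illustration and is not one of the sample files.

```shell
# Made-up input: three spellings of the same number.
numeric=$(printf '85\n085\n85.0\n' | awk '$1 == 85')
string=$(printf '85\n085\n85.0\n' | awk '$1 == "85"')

printf 'numeric comparison matched:\n%s\n' "$numeric"
printf 'string comparison matched:\n%s\n' "$string"
```

The numeric comparison should match all three lines, while the string comparison should match only the line that reads exactly 85.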
Common mistake 2: quoted CSV fields
awk -F',' splits fields naively and does not handle quoted commas. Input like "a,b","c" would be split incorrectly. Use gawk's FPAT, or Python/Ruby for quoted CSV. For simple CSV without quoted fields, -F',' works fine.
awk -F',' '{print $1}' sales.csv
name
Yagami Iori
Kusanagi Kyo
Terry Bogard
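The failure mode is easy to reproduce. In this sketch, a made-up record (not part of sales.csv) has a quoted first field containing a comma, and -F',' splits it into four fields instead of three.

```shell
# Made-up record whose first field contains a quoted comma.
line='"Bogard, Terry",1500,east'
fields=$(printf '%s\n' "$line" | awk -F',' '{ print NF }')
first=$(printf '%s\n' "$line" | awk -F',' '{ print $1 }')
printf 'fields: %s\nfirst: %s\n' "$fields" "$first"
# fields: 4
# first: "Bogard
```

The quoted name is cut at its inner comma, so every later field is shifted one position to the right.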
Common mistake 3: missing newline in printf
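printf does not append a newline, so all output runs together on one line. Passing a field directly as the format string is also fragile, because any % character in the data is interpreted as a format specifier.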
awk '{printf $1}' scores.txt
YagamiKusanagiTerryMaryGoenitz
Always include \n explicitly to get the correct output.
awk '{printf "%s\n", $1}' scores.txt
Yagami
Kusanagi
Terry
Mary
Goenitz
Notes
awk is a small programming language available in the shell. It supports variables, conditionals, loops, arrays, and built-in functions such as gsub, split, substr, and sprintf. For complex processing, Python or Perl may be more readable, but for one-liner aggregations and transformations, awk is well-suited for the task.
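As a brief illustration of the built-in functions named above, the following sketch applies split, substr, gsub, and sprintf to a single made-up log-style line. The input text and the variable names are assumptions chosen for the example.

```shell
# Made-up log-style line used only for illustration.
out=$(printf '2024-01-15 ERROR disk full\n' | awk '{
  n = split($1, d, "-")                 # d[1]="2024", d[2]="01", d[3]="15"; n is 3
  level = substr($2, 1, 3)              # first three characters of field 2
  changed = gsub(/l/, "L")              # replace every "l" in $0; returns the count
  label = sprintf("%s/%s", d[2], d[3])  # build a string without printing it
  print n, level, changed, label
  print $0
}')
printf '%s\n' "$out"
# 3 ERR 2 01/15
# 2024-01-15 ERROR disk fuLL
```

Note that gsub without a third argument modifies $0 in place, which is why the second printed line shows the substituted text.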
For simple line searching, see grep. For text substitution, see sed.