awk

Since:		All Linux
		macOS (2001 Cheetah)
		Bash 1.0 (1989)

awk is a programming language for processing text field by field (column by column). It excels at aggregating, extracting, and transforming structured text such as CSV files and log files.

Syntax

awk [options] 'program' [file...]

awk -F delimiter 'program' [file...]

Syntax and Variables

Syntax / Variable	Description
'{print $1}'	Prints the first field ($0 refers to the entire line).
-F ':'	Specifies the field delimiter (default is whitespace).
NR	The number of records (lines) read so far, i.e. the line number of the record currently being processed.
NF	The number of fields in the current line. Use $NF to reference the last field.
FS	The input field separator (Field Separator). Setting it in BEGIN has the same effect as -F.
OFS	The output field separator (Output Field Separator).
RS	The record separator (Record Separator; default is a newline).
BEGIN { }	A block that runs once before any input is read.
END { }	A block that runs once after all input has been read.
/pattern/ { }	Applies the action only to lines matching the pattern.
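
As a small illustration of NF, $NF, FS, and OFS from the table, the following sketch may help. The input lines are made up for this example and are not part of the sample files used later.

```shell
# Hypothetical input: lines with different numbers of fields.
# NF is the field count, $NF is the last field.
counts=$(printf 'a b c\nd e\n' | awk '{print NF, $NF}')
printf '%s\n' "$counts"
# 3 c
# 2 e

# Setting FS in BEGIN behaves like -F ':'; OFS is placed between
# the comma-separated arguments of print.
joined=$(printf 'root:x:0\n' | awk 'BEGIN{FS=":"; OFS="-"} {print $1, $NF}')
printf '%s\n' "$joined"
# root-0
```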

Sample Code

The following files are used in the examples below.

scores.txt

Yagami 85 Tokyo
Kusanagi 92 Osaka
Terry 78 Tokyo
Mary 95 Nagoya
Goenitz 88 Osaka

sales.csv

name,amount,region
Yagami Iori,1200,east
Kusanagi Kyo,800,west
Terry Bogard,1500,east

Prints the first field of each line.

awk '{print $1}' scores.txt
Yagami
Kusanagi
Terry
Mary
Goenitz

Filters rows where the second column is 90 or greater.

awk '$2 >= 90' scores.txt
Kusanagi 92 Osaka
Mary 95 Nagoya

Prints specific fields only for rows that match the condition.

awk '$2 >= 90 {print $1, $2}' scores.txt
Kusanagi 92
Mary 95

Uses NR to print each line with its line number.

awk '{print NR": "$0}' scores.txt
1: Yagami 85 Tokyo
2: Kusanagi 92 Osaka
3: Terry 78 Tokyo
4: Mary 95 Nagoya
5: Goenitz 88 Osaka

Uses the END block to sum a column (summing the second column).

awk '{sum += $2} END {print "Total:", sum}' scores.txt
Total: 438
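
The same pattern extends to an average by dividing the sum by NR in the END block. This is a minimal sketch; the temporary directory and the NR > 0 guard against empty input are additions for this example.

```shell
# Recreate scores.txt from above so the snippet runs on its own.
dir=$(mktemp -d)
cat > "$dir/scores.txt" <<'EOF'
Yagami 85 Tokyo
Kusanagi 92 Osaka
Terry 78 Tokyo
Mary 95 Nagoya
Goenitz 88 Osaka
EOF

# In END, NR still holds the total number of records read.
avg=$(awk '{sum += $2} END {if (NR > 0) print "Average:", sum / NR}' "$dir/scores.txt")
printf '%s\n' "$avg"
# Average: 87.6

rm -r "$dir"
```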

Uses -F to specify a delimiter and process CSV input.

awk -F',' '{print $1, $2}' sales.csv
name amount
Yagami Iori 1200
Kusanagi Kyo 800
Terry Bogard 1500
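
The header row is printed along with the data in the example above. One possible way to skip it is the condition NR > 1, sketched here together with a per-region total. The array name total is chosen for this example, and the output is piped through sort because awk does not guarantee the iteration order of for (key in array).

```shell
# Recreate sales.csv from above so the snippet runs on its own.
dir=$(mktemp -d)
cat > "$dir/sales.csv" <<'EOF'
name,amount,region
Yagami Iori,1200,east
Kusanagi Kyo,800,west
Terry Bogard,1500,east
EOF

# NR > 1 skips the header; total[] is keyed by the region column.
totals=$(awk -F',' 'NR > 1 {total[$3] += $2} END {for (r in total) print r, total[r]}' "$dir/sales.csv" | sort)
printf '%s\n' "$totals"
# east 2700
# west 800

rm -r "$dir"
```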

Uses BEGIN and END to add a header and footer.

awk 'BEGIN{print "=== Results ==="} {print $1, $2} END{print "=== End ==="}' scores.txt
=== Results ===
Yagami 85
Kusanagi 92
Terry 78
Mary 95
Goenitz 88
=== End ===

Processes only lines matching a pattern (rows where the third column is Tokyo).

awk '$3 == "Tokyo" {print $1, $2}' scores.txt
Yagami 85
Terry 78
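
The /pattern/ { } form from the table uses a regular expression rather than a comparison. The sketch below shows a regex matched against the whole line and, with the ~ operator, against a single field. The variable names are chosen for this example.

```shell
# Recreate scores.txt from above so the snippet runs on its own.
dir=$(mktemp -d)
cat > "$dir/scores.txt" <<'EOF'
Yagami 85 Tokyo
Kusanagi 92 Osaka
Terry 78 Tokyo
Mary 95 Nagoya
Goenitz 88 Osaka
EOF

# Regex against the whole line: names on lines containing Osaka.
osaka=$(awk '/Osaka/ {print $1}' "$dir/scores.txt")
printf '%s\n' "$osaka"
# Kusanagi
# Goenitz

# Regex against one field: names starting with K or T.
kt=$(awk '$1 ~ /^[KT]/ {print $1}' "$dir/scores.txt")
printf '%s\n' "$kt"
# Kusanagi
# Terry

rm -r "$dir"
```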

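The following is a more advanced example. seen is an associative array that uses each line as a key and counts how often it has appeared. The expression !seen[$0]++ is true only the first time a line is encountered, because the count is still 0 when it is tested and is incremented afterwards. The idiom therefore means "only process lines not yet seen," and it is the standard way to remove duplicate lines while keeping the original order.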

Uses an associative array to remove duplicates.

awk '{print $3}' scores.txt | awk '!seen[$0]++'
Tokyo
Osaka
Nagoya
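
The two-stage pipeline can likely be collapsed into a single awk call by keying the array on the field itself instead of on $0. This is a sketch of that variation, not a replacement for the idiom above.

```shell
# Recreate scores.txt from above so the snippet runs on its own.
dir=$(mktemp -d)
cat > "$dir/scores.txt" <<'EOF'
Yagami 85 Tokyo
Kusanagi 92 Osaka
Terry 78 Tokyo
Mary 95 Nagoya
Goenitz 88 Osaka
EOF

# seen[$3] is 0 the first time a city appears, so the action runs once per city.
cities=$(awk '!seen[$3]++ {print $3}' "$dir/scores.txt")
printf '%s\n' "$cities"
# Tokyo
# Osaka
# Nagoya

rm -r "$dir"
```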

printf and OFS for Formatted Output

awk supports a C-style printf for formatting output with fixed-width columns and custom delimiters.

Use printf to format output with fixed-width columns.

awk '{printf "%-20s %3d pts\n", $1, $2}' scores.txt
Yagami               85 pts
Kusanagi             92 pts
Terry                78 pts
Mary                 95 pts
Goenitz              88 pts

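Use OFS (Output Field Separator) to convert whitespace-separated input to CSV. The assignment $1=$1 changes nothing in the data, but it forces awk to rebuild the line with OFS between the fields.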

awk 'BEGIN{OFS=","} {$1=$1; print}' scores.txt
Yagami,85,Tokyo
Kusanagi,92,Osaka
Terry,78,Tokyo
Mary,95,Nagoya
Goenitz,88,Osaka
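
To see why the $1=$1 assignment matters, the following sketch runs the same program with and without it on a made-up input line. Without an assignment to a field, print outputs $0 unchanged and OFS is never applied.

```shell
# Without a field assignment, $0 is printed exactly as it was read.
without=$(printf 'a b c\n' | awk 'BEGIN{OFS=","} {print}')
printf '%s\n' "$without"
# a b c

# Assigning to any field rebuilds $0 using OFS.
with=$(printf 'a b c\n' | awk 'BEGIN{OFS=","} {$1=$1; print}')
printf '%s\n' "$with"
# a,b,c
```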

Initialize a variable in BEGIN and print the aggregated result in END.

awk 'BEGIN{max=0} {if ($2 > max) {max=$2; name=$1}} END{print "Top score:", name, max}' scores.txt
Top score: Mary 95

Common Mistakes

Common mistake 1: string vs. number comparison

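When comparing with ==, be aware that awk chooses between a string and a numeric comparison based on the operands. A field that looks like a number is compared numerically against a numeric literal such as 85, but as a string against a string literal such as "85". Use a string literal for text and a numeric literal for numbers so that the intended comparison is the one awk performs.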

awk '$3 == "Tokyo" {print}' scores.txt
Yagami 85 Tokyo
Terry 78 Tokyo

awk '$2 == 85 {print}' scores.txt
Yagami 85 Tokyo
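
The difference shows up when the text of a field is not identical to the literal. In this sketch the input value 85.0 is made up for illustration. It equals 85 numerically but not as a string.

```shell
# Numeric literal: the field is compared as a number, so 85.0 equals 85.
num=$(printf '85.0\n' | awk '$1 == 85 {print "match"}')
printf 'numeric: [%s]\n' "$num"
# numeric: [match]

# String literal: the comparison is character by character, so "85.0" differs from "85".
str=$(printf '85.0\n' | awk '$1 == "85" {print "match"}')
printf 'string:  [%s]\n' "$str"
# string:  []
```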

Common mistake 2: quoted CSV fields

awk -F',' splits on every comma and does not understand quoting. Input like "a,b","c" is split into three fields instead of two. For CSV with quoted fields, use gawk's FPAT, or a CSV-aware tool such as Python or Ruby. For simple CSV without quoted fields, -F',' works fine.

awk -F',' '{print $1}' sales.csv
name
Yagami Iori
Kusanagi Kyo
Terry Bogard
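
The mis-split on quoted input can be reproduced as follows. The first command uses only standard awk. The second part is a sketch that assumes GNU awk is installed as gawk and uses the FPAT pattern given in the gawk manual; it is skipped when gawk is not available.

```shell
# Plain -F',' sees three fields and cuts the first quoted value in half.
naive=$(printf '"a,b","c"\n' | awk -F',' '{print NF": "$1}')
printf '%s\n' "$naive"
# 3: "a

# gawk only: FPAT describes what a field looks like instead of what separates fields.
if command -v gawk >/dev/null 2>&1; then
    printf '"a,b","c"\n' | gawk 'BEGIN{FPAT="([^,]+)|(\"[^\"]+\")"} {print NF": "$1}'
    # 2: "a,b"
fi
```

Note that FPAT keeps the surrounding double quotes as part of the field, so they still need to be stripped if only the value is wanted.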

Common mistake 3: missing newline in printf

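Unlike print, printf does not append a newline, so all output runs together on one line. Passing a field directly as the first argument also makes it the format string, so any % in the data is interpreted as a format specifier.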

awk '{printf $1}' scores.txt
YagamiKusanagiTerryMaryGoenitz

Always include \n explicitly to get the correct output.

awk '{printf "%s\n", $1}' scores.txt
Yagami
Kusanagi
Terry
Mary
Goenitz
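
Keeping the data out of the format string also protects against input that contains a percent sign. A minimal sketch with a made-up input value:

```shell
# The shell printf writes the literal line 100% (%% is an escaped percent sign).
# In awk, the field is passed as an argument to %s, so its % is printed verbatim.
safe=$(printf '100%%\n' | awk '{printf "%s\n", $1}')
printf '%s\n' "$safe"
# 100%
```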

Notes

awk is a small programming language available in the shell. It supports variables, conditionals, loops, arrays, and built-in functions such as gsub, split, substr, and sprintf. For complex processing, Python or Perl may be more readable, but for one-liner aggregations and transformations, awk is well-suited for the task.
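
As a rough illustration of the built-in functions mentioned above, the following sketch applies split, substr, gsub, and sprintf to a single line. The log line and the variable names are hypothetical and chosen only for this example.

```shell
out=$(printf '2024-01-15 error: disk full\n' | awk '{
    n = split($1, d, "-")        # d[1]="2024", d[2]="01", d[3]="15"; n is 3
    level = substr($2, 1, 3)     # first three characters of "error:"
    gsub(/:/, "", $2)            # remove the colon, so $2 becomes "error"
    date = sprintf("%s/%s/%s", d[2], d[3], d[1])
    print date, level, $2, n
}')
printf '%s\n' "$out"
# 01/15/2024 err error 3
```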

For simple line searching, see grep. For text substitution, see sed.
