Linux awk command: pattern scanning and processing language
Linux awk command Function Description
The awk command reads its input line by line, splits each line into fields using whitespace as the default separator, and then processes those fields. awk is a powerful text-analysis tool: where grep searches and sed edits, awk excels at analyzing data and generating reports.
awk is a programming language for working with text and data under Linux/UNIX. Data can come from standard input (stdin), one or more files, or the output of other commands. It supports advanced features such as user-defined functions and dynamic regular expressions. It can be used directly at the command line, but is more often run as a script. awk has many built-in features, such as arrays and functions, with a syntax similar to C. Flexibility is awk's biggest advantage.
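As a minimal sketch of the line-by-line, field-by-field model described above, the following swaps the two fields of each input line. The sample names and numbers are made up for illustration.

```shell
# Two sample records; the names and ages are illustrative only.
sample='alice 30
bob 25'

# awk splits each line on whitespace: $1 is the first field, $2 the second,
# and $0 is the whole line. Here the two fields are printed in swapped order.
swapped=$(printf '%s\n' "$sample" | awk '{ print $2, $1 }')
printf '%s\n' "$swapped"
```

The comma in `print $2, $1` joins the two values with the output field separator, which is a single space unless it is changed.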
Linux awk command Syntax
awk [Option] -f <Program File> [File]
awk [Option] '<Program>' [File]
The meaning of each option in the command is shown in the following table:
Option | Description |
---|---|
-f <Program File> | Read the AWK program source from the specified program file |
-F <Field Separator> | Use the specified value as the input field separator |
-v <variable=value> | Assign a value to a variable before the program starts |
-mf <value> | Set a memory limit: the f flag sets the maximum number of fields |
-mr <value> | Set a memory limit: the r flag sets the maximum record size |
-O | Enable optimization of the program's internal representation |
--compat | Run in compatibility mode |
--dump-variables=<File> | Write a sorted list of global variables, their types, and final values to the specified file |
--exec=<File> | Similar to -f, but this is the last option processed |
--gen-po | Scan and parse the AWK program, and generate a GNU .po file on standard output |
--non-decimal-data | Recognize octal and hexadecimal values in input data |
--profile=<File> | Send profiling data to the specified file. The default is awkprof.out |
--re-interval | Enable interval expressions in regular expression matching |
--source=<Program Text> | Use the specified program text as the source code of the AWK program |
--traditional | Match regular expressions the way traditional UNIX AWK does |
--usage | Print a relatively short summary of the available options on standard output |
--use-lc-numeric | Force the use of the locale's decimal point character when parsing input data |
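The three most frequently used options from the table, -F, -v, and -f, can be sketched as follows. The input line, the variable name `label`, and the temporary program file are assumptions made for the example, not part of any real system.

```shell
# Illustrative colon-separated input (not taken from a real system).
line='alice:x:1001:1001::/home/alice:/bin/bash'

# -F sets the input field separator; -v assigns an awk variable before the
# program starts, so it is already visible in BEGIN and in every rule.
with_opts=$(printf '%s\n' "$line" | awk -F ':' -v label='user' '{ print label "=" $1 }')
printf '%s\n' "$with_opts"

# -f reads the program from a file instead of the command line.
# The temporary file stands in for a hypothetical script file.
prog=$(mktemp)
printf '%s\n' '{ print $NF }' > "$prog"
from_file=$(printf '%s\n' "$line" | awk -F ':' -f "$prog")
rm -f "$prog"
printf '%s\n' "$from_file"
```

The first command prints `user=alice`, and the second prints the last field of the line, `/bin/bash`.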
AWK has a number of built-in variables that hold environment information, and many of them can be changed. The following table lists some of the most commonly used variables:
Built-in variables | Description |
---|---|
ARGC | Number of command-line arguments |
ARGV | Array of command-line arguments |
ENVIRON | An array containing the values of the current environment |
FILENAME | Name of the current input file |
FNR | The input record number within the current input file |
FS | Input field separator, a space by default |
NF | The number of fields in the current input record |
NR | The total number of input records read so far |
OFS | Output field separator |
ORS | Output record separator |
RS | Input record separator, a newline by default |
OFMT | Output format for numbers |
RT | Record terminator |
RSTART | Index of the first character matched by match() |
RLENGTH | Length of the string matched by match() |
SUBSEP | Character used to separate multiple subscripts of an array element, "\034" by default |
TEXTDOMAIN | The text domain of the AWK program |
ARGIND | The index in ARGV of the file currently being processed |
BINMODE | On non-POSIX systems, specifies the use of "binary" mode for all file I/O |
CONVFMT | Conversion format for numbers, "%.6g" by default |
IGNORECASE | Controls the case sensitivity of all regular expression and string operations; a nonzero value makes them case-insensitive |
PROCINFO | An array whose elements provide access to information about the running AWK program |
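A short sketch of how a few of these variables interact, using FS, OFS, NR, and NF on made-up comma-separated input:

```shell
# Illustrative input: two comma-separated records.
data='a,b,c
d,e'

# FS and OFS are set in BEGIN. NR is the running record number and NF is the
# number of fields in the current record. Assigning $1 = $1 forces awk to
# rebuild $0 using OFS.
report=$(printf '%s\n' "$data" | awk 'BEGIN { FS = ","; OFS = "-" } { $1 = $1; print NR, NF, $0 }')
printf '%s\n' "$report"
```

This prints `1-3-a-b-c` and `2-2-d-e`: the record number, the field count, and the rebuilt record, all joined by the output field separator.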
String constants in AWK are sequences of characters enclosed in double quotes. The following table lists the escape sequences commonly used inside string constants:
Escape sequence | Description |
---|---|
\\ | A literal backslash |
\a | The alert character, usually the ASCII BEL character |
\b | Backspace |
\f | Form feed |
\n | Newline |
\r | Carriage return |
\t | Horizontal tab |
\v | Vertical tab |
\xhex digits | The character represented by the hexadecimal digits following \x |
\c | The literal character c |
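To illustrate a few of the backslash sequences from the table above, this small sketch prints a tab-separated line and a line containing a literal backslash. The text being printed is arbitrary.

```shell
# Inside an awk string constant, \t inserts a tab and \\ produces a single
# literal backslash. print appends the newline itself.
escaped=$(awk 'BEGIN { print "name\tshell"; print "back\\slash" }')
printf '%s\n' "$escaped"
```

The program runs entirely in a BEGIN rule, so awk does not wait for any input.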
Linux awk command Example
Show only the last 5 users logged into the system
last -n 5 | awk '{print $1}'
Show only the accounts in the /etc/passwd file
[root@rhel ~]# cat /etc/passwd |awk -F ':' '{print $1}'
root
bin
daemon
adm
lp
sync
shutdown
halt
..........................
Show only the accounts in the /etc/passwd file and their corresponding shells, separated by a tab
[root@rhel ~]# cat /etc/passwd |awk -F ':' '{print $1"\t"$7}'
root /bin/bash
bin /sbin/nologin
daemon /sbin/nologin
adm /sbin/nologin
lp /sbin/nologin
sync /bin/sync
shutdown /sbin/shutdown
halt /sbin/halt
mail /sbin/nologin
uucp /sbin/nologin
........................(Omitted)
Show only the accounts and their corresponding shells in /etc/passwd, separated by a comma, with the header line "name, shell" added before the first line and "blue, /bin/nosh" added after the last line
[root@rhel ~]# cat /etc/passwd |awk -F ':' 'BEGIN {print "name, shell"} {print $1", "$7} \
> END {print "blue, /bin/nosh"}'
name, shell
root, /bin/bash
bin, /sbin/nologin
daemon, /sbin/nologin
adm, /sbin/nologin
lp, /sbin/nologin
sync, /bin/sync
shutdown, /sbin/shutdown
halt, /sbin/halt
........................(Omitted)
tcpdump, /sbin/nologin
radiusd, /sbin/nologin
blue, /bin/nosh
Search for all lines in the /etc/passwd file that have the root keyword
[root@rhel ~]# awk -F: '/root/' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
Search for all lines in the /etc/passwd file that begin with the root keyword
[root@rhel ~]# awk -F: '/^root/' /etc/passwd
root:x:0:0:root:/root:/bin/bash
Search for all lines in the /etc/passwd file that have the root keyword and display the corresponding shell
[root@rhel ~]# awk -F: '/root/{print $7}' /etc/passwd
/bin/bash
/sbin/nologin
Report on the /etc/passwd file, showing the file name, the line number, the number of fields in each line, and the full content of each line
[root@rhel ~]# awk -F ':' '{print "filename:" FILENAME ", linenumber:" NR \
> ", columns:" NF ", linecontent:" $0}' /etc/passwd
filename:/etc/passwd, linenumber:1, columns:7, linecontent:root:x:0:0:root:/root:/bin/bash
filename:/etc/passwd, linenumber:2, columns:7, linecontent:bin:x:1:1:bin:/bin:/sbin/nologin
filename:/etc/passwd, linenumber:3, columns:7, linecontent:daemon:x:2:2:daemon:/sbin:/sbin/nologin
filename:/etc/passwd, linenumber:4, columns:7, linecontent:adm:x:3:4:adm:/var/adm:/sbin/nologin
filename:/etc/passwd, linenumber:5, columns:7, linecontent:lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
filename:/etc/passwd, linenumber:6, columns:7, linecontent:sync:x:5:0:sync:/sbin:/bin/sync
filename:/etc/passwd, linenumber:7, columns:7, linecontent:shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
filename:/etc/passwd, linenumber:8, columns:7, linecontent:halt:x:7:0:halt:/sbin:/sbin/halt
filename:/etc/passwd, linenumber:9, columns:7, linecontent:mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
........................(Omitted)
Count the number of accounts in the /etc/passwd file
[root@rhel ~]# awk '{count++; print $0} END {print "user count is", count}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
........................(Omitted)
cyrus:x:76:12:Cyrus IMAP Server:/var/lib/imap:/sbin/nologin
ldap:x:55:55:LDAP User:/var/lib/ldap:/sbin/nologin
squid:x:23:23::/var/spool/squid:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
radiusd:x:95:95:radiusd user:/home/radiusd:/sbin/nologin
user count is 67
Display the accounts in the /etc/passwd file, showing a sequence number (starting at 0) and the username
[root@rhel ~]# awk -F ':' 'BEGIN {count=0} {name[count]=$1; count++} \
> END {for (i = 0; i < NR; i++) print i, name[i]}' /etc/passwd
0 root
1 bin
2 daemon
3 adm
4 lp
5 sync
6 shutdown
7 halt
8 mail
9 uucp
10 operator
11 games
12 gopher
........................(Omitted)
Count the number of bytes occupied by files in the current directory
[root@rhel ~]# ls -l |awk 'BEGIN {size=0} {size=size+$5} END {print "[end]size is", size}'
[end]size is 170057
Note: the total does not include files inside subdirectories.
Report the size in MB of the files in the current directory
[root@rhel ~]# ls -l |awk 'BEGIN {size=0} {size=size+$5} \
> END {print "[end]size is", size/1024/1024, "MB"}'
[end]size is 0.162179 MB
Report the size in MB of the files in the current directory, excluding entries of exactly 4096 bytes (usually directories)
[root@rhel ~]# ls -l |awk 'BEGIN {size=0; print "[start]size is", size} \
> {if ($5 != 4096) {size=size+$5}} END {print "[end]size is", size/1024/1024, "MB"}'
[start]size is 0
[end]size is 0.130929 MB
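The size test above treats every 4096-byte entry as a directory, which also skips regular files that happen to be exactly 4096 bytes. One possible alternative, sketched here under the assumption that the listing follows the usual `ls -l` layout, is to test the file-type character at the start of the first field instead. The listing below is hand-written sample data, not real command output.

```shell
# Hand-written sample in `ls -l` layout; the names and sizes are illustrative.
listing='total 12
drwxr-xr-x 2 root root 4096 Jan  1 00:00 docs
-rw-r--r-- 1 root root 1000 Jan  1 00:00 a.txt
-rw-r--r-- 1 root root 4096 Jan  1 00:00 b.bin'

# Sum field 5 only for regular files (mode string starts with "-"), so
# directories and the "total" line are skipped regardless of their size.
bytes=$(printf '%s\n' "$listing" | awk '$1 ~ /^-/ { size += $5 } END { print size + 0 }')
printf '%s\n' "$bytes"
```

With this sample the result is 5096: the 4096-byte regular file is counted, while the directory entry is not.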