How do you use AWK to perform advanced searches in Linux?

Do you ever find yourself wanting to do advanced searches in Linux? In the last article we shared how to use grep to find specific text. In this article we’ll introduce AWK.

The awk command is a powerful tool for processing and analyzing text files, particularly data files organized into lines (rows) and columns. Simple awk commands can be run directly from the command line. More complex tasks are better saved to a file as awk programs (so-called awk scripts).
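As a rough sketch of the script-file approach, the snippet below saves a one-rule awk program to a file and runs it with the -f option. The file names and the two log lines are made up for illustration.

```shell
# Work in a throwaway directory so nothing is left behind.
workdir=$(mktemp -d)

# Two made-up log lines, for illustration only.
cat > "$workdir/sample.log" <<'EOF'
10.0.0.1 - - [13/Dec/2015:02:30:28 +0100] "GET /administrator/ HTTP/1.1" 500 88
10.0.0.2 - - [13/Dec/2015:02:31:02 +0100] "GET /index.php HTTP/1.1" 200 512
EOF

# The awk program lives in its own file (first-column.awk is a hypothetical name).
cat > "$workdir/first-column.awk" <<'EOF'
# Print the first column (the client IP address) of every line.
{ print $1 }
EOF

# -f tells awk to read the program from a file rather than the command line.
result=$(awk -f "$workdir/first-column.awk" "$workdir/sample.log")
echo "$result"

rm -rf "$workdir"
```

Keeping the program in a file makes it easier to add comments and more rules as the task grows.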

Let’s assume you are working with an access log file and you want to find all the 500 errors and print only the dates and times. We’ll use this sample access log file.

Here is an example of how you’d run that command:

awk '$9 == 500 { print $4}' /Users/&user/Documents/temp.log

Here is what this is doing:

The condition $9 == 500 tests the 9th whitespace-separated field on each line, which in this log format is the HTTP status code. For example:

83.167.113.100 - - [13/Dec/2015:02:30:28 +0100] "GET /administrator/ HTTP/1.1" 500 88 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"

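The action { print $4} prints the 4th field, which here is the date and time of the request. Because awk splits on whitespace, the value keeps its leading [ and the timezone ends up in the 5th field. If you give print no arguments, or use $0, it prints the entire line.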

For example:

83.167.113.100 - - [13/Dec/2015:02:30:28 +0100] "GET /administrator/ HTTP/1.1" 500 88 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"

The last argument, /Users/&user/Documents/temp.log, is the path of the log file to parse. The output should look something like this:

$ awk '$9 == 500 { print $4}' /Users/&user/Documents/temp.log
[13/Dec/2015:02:30:28
[13/Dec/2015:02:33:01
[14/Dec/2015:02:32:35
[15/Dec/2015:02:35:41
[15/Dec/2015:02:37:15
[16/Dec/2015:02:36:39
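To check which field number holds which value, it can help to feed awk a single line and label a few fields. The sketch below uses the sample line from above. The labels and variable names are only illustrative, and the substr call is one possible way to drop the leading bracket from the timestamp.

```shell
# The sample log line from the article, stored in a variable.
line='83.167.113.100 - - [13/Dec/2015:02:30:28 +0100] "GET /administrator/ HTTP/1.1" 500 88 "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/20100101 Firefox/34.0" "-"'

# Label a few whitespace-separated fields to see what each number refers to.
fields=$(printf '%s\n' "$line" | awk '{ print "ip=" $1, "time=" $4, "path=" $7, "status=" $9 }')
echo "$fields"

# substr($4, 2) returns the 4th field starting at its 2nd character,
# which removes the leading [ from the timestamp.
stamp=$(printf '%s\n' "$line" | awk '{ print substr($4, 2) }')
echo "$stamp"
```

The first command prints ip=83.167.113.100 time=[13/Dec/2015:02:30:28 path=/administrator/ status=500, and the second prints 13/Dec/2015:02:30:28.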

Alternatively, if you want to count how many times someone requested the login page (e.g., administrator) and see whether those requests were successful, you can do something like this:

$ awk '/administrator/ { print $9}' /Users/&user/Documents/temp.log | sort | uniq -c
4159 200
   1 301
   5 500

Note: The search pattern must be enclosed between the two slashes (/ /). This matters when you search for several terms at once, which you separate with the pipe (|) inside the slashes. Example: /administrator|wp-login|admin/
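Here is a small sketch of a multi-term search, run against three made-up log lines rather than the real log. It prints the requested path and status code for every line that matches either term.

```shell
# Three made-up log lines written to a temporary file, for illustration only.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [13/Dec/2015:02:30:28 +0100] "GET /administrator/ HTTP/1.1" 500 88
10.0.0.2 - - [13/Dec/2015:02:31:02 +0100] "GET /wp-login.php HTTP/1.1" 200 512
10.0.0.3 - - [13/Dec/2015:02:31:40 +0100] "GET /index.php HTTP/1.1" 200 1024
EOF

# The whole alternation sits between the slashes; a line matches if it
# contains either "administrator" or "wp-login" anywhere.
matches=$(awk '/administrator|wp-login/ { print $7, $9 }' "$log")
echo "$matches"

rm -f "$log"
```

Only the first two lines match, so the /index.php request is left out.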

In the /administrator search output above, you can see there were 4,159 successful requests (status 200) to the /administrator page, 1 redirect (301), and 5 errors (500). As with grep, you can pipe the results into other commands, which is what you see here with sort and uniq -c.
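The tally can also be done inside awk itself with an associative array, as an alternative to sort and uniq -c. This is a minimal sketch on made-up log lines. The variable names are illustrative, and the final sort is only there because awk does not guarantee the order in which it walks an array.

```shell
# Four made-up log lines written to a temporary file, for illustration only.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [13/Dec/2015:02:30:28 +0100] "GET /administrator/ HTTP/1.1" 200 88
10.0.0.2 - - [13/Dec/2015:02:31:02 +0100] "GET /administrator/ HTTP/1.1" 500 88
10.0.0.3 - - [13/Dec/2015:02:31:40 +0100] "GET /administrator/ HTTP/1.1" 200 88
10.0.0.4 - - [13/Dec/2015:02:32:10 +0100] "GET /index.php HTTP/1.1" 200 1024
EOF

# count[] is keyed by status code; the END rule runs once after the last line.
tally=$(awk '/administrator/ { count[$9]++ }
             END { for (status in count) print count[status], status }' "$log" |
        sort -k2,2n)
echo "$tally"

rm -f "$log"
```

Each output line is a count followed by a status code, the same layout that uniq -c produces.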
