Over the past few months, I've been performing more and more analysis in Linux environments and had the opportunity to refine my go-to commands, picking up a few new (to me) tricks. In this post, I'll share some of the techniques I like to use and encourage you to share other tips/tricks you've used to perform analysis on Linux systems, either as your analysis environment or as your target evidence (or both)!

This is written at the introductory level, to help those who may not have experienced performing analysis within a bash/sh/zsh (or other) command-line environment before.

Warning: this post does not contain Python

Spoiler - I didn't get as far this weekend on this post as I wanted, so I'll keep this one short and put another one out soon with more tips & tricks.

For those newer to the command line

If you are newer to the bash command line, please use the manual (man) pages for documentation. It takes a little while to understand how they are written and where the detail you are looking for lives, so it is best to start using them early. Here's an example of using the man command to learn more about ls:

$ man ls

LS(1)                     BSD General Commands Manual                    LS(1)

     ls -- list directory contents

     ls [-ABCFGHLOPRSTUW@abcdefghiklmnopqrstuwx1] [file ...]

     For each operand that names a file of a type other than directory, ls
     displays its name as well as any requested, associated information.  For
     each operand that names a file of type directory, ls displays the names
     of files contained within that directory, as well as any requested, asso-
     ciated information.

     If no operands are given, the contents of the current directory are dis-
     played.  If more than one operand is given, non-directory operands are
     displayed first; directory and non-directory operands are sorted sepa-
     rately and in lexicographical order.

Common man pages are found on linux.die.net, though they should also be available on the same system where the command is installed, as shown above. There are also great resources such as explainshell.com and Julia Evans's reference illustrations & cheat sheets (such as this one about the man command).

For those experienced with the command line please share your favorite resources!

Working with logs

Log data is not only a very common source of evidence on Linux platforms but is also easier to work with at the command line since it is, generally, semi-structured text data.

Identifying the most interesting content in unknown logs

As part of the DFIR process, we like to preview a log file and get a sense of what is useful versus what is noise. To assist with this data reduction, we can use a few tools and techniques to cut down on review time.

One trick I like to employ is the use of less in combination with grep -v. In the example below, we will be looking at a server's auth.log, a common log file, and we are interested in seeing successful authentications. While some of us may know from experience to start looking for strings such as "Accepted publickey", we will walk through getting to that point using the grep -v method. While grep is a great utility for searching datasets, here we want to find the inverse of our pattern and need to use the -v parameter:
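As a runnable sketch of that iteration, here is the idea over a fabricated auth.log snippet (the messages, hostnames, and PIDs below are invented for illustration, not taken from the original log):

```shell
# Fabricated auth.log snippet so the pipeline is runnable; real messages,
# hostnames, and PIDs will differ in your evidence.
cat > sample_auth.log <<'EOF'
Sep 29 18:40:01 tracker CRON[12000]: pam_unix(cron:session): session opened for user root
Sep 29 18:45:12 tracker sshd[12400]: Disconnected from authenticating user root
Sep 29 18:49:39 tracker sshd[12509]: Accepted publickey for root from port 32852 ssh2
EOF

# Each grep -v pass removes one known-noisy message type; in practice, pipe
# each attempt into less to page through whatever remains.
grep -v "pam_unix" sample_auth.log | grep -v "Disconnected from"
```

With the two noisy message types excluded, only the interesting "Accepted publickey" line survives the pipeline.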


As seen above, we keep adding new log messages to remove from our output until we see something of interest (i.e. the Accepted publickey statement). Now, we can instead grep for Accepted publickey as shown below:

$ fgrep "Accepted publickey" auth.log
Sep 29 18:49:39 tracker sshd[12509]: Accepted publickey for root from port 32852 ssh2: RSA SHA256:+EQAdisZCdb274cIdoykPH9p5DAL/VUHLsiNm63eSiM
Sep 29 18:50:17 tracker sshd[12580]: Accepted publickey for root from port 36726 ssh2: RSA SHA256:+EQAdisZCdb274cIdoykPH9p5DAL/VUHLsiNm63eSiM

A few notes on this method:

  • We can use OR statements (|) to form one larger grep statement, though do what is most comfortable for you
  • If the log dataset is small enough it may be best to scroll through the text file
  • Inversely to this method, we could start by searching for IP addresses, usernames, and timestamps depending on how much of that information is already known (as sometimes we aren't lucky enough to have any of those indicators up-front)
  • After identifying what string is useful, go back and confirm you didn't accidentally over-exclude content through the use of one or more of your patterns
  • We interchangeably use fgrep and egrep in place of grep. These are variations on the standard grep interface
    • fgrep runs much faster as it only searches fixed strings. It is a good default as a fair amount of the time we are searching for a string without any patterns.
    • egrep allows for extended patterns and is a bit slower. It changes the behavior of the patterns, further detailed in man re_format
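To make the fgrep/egrep distinction concrete, here is a small sketch over a single fabricated log line (the PID and IP address are invented). It uses the equivalent grep -F and grep -E spellings, which behave the same as fgrep and egrep:

```shell
# Fabricated log line for illustration (PID, username, and IP are invented).
echo 'Sep 24 06:27:11 tracker sshd[29197]: Invalid user babs from 10.0.0.5' > sample_line.log

# Fixed-string search (fgrep / grep -F): the brackets are matched literally,
# no regex engine runs, so patterns containing regex metacharacters are safe.
grep -F 'sshd[29197]' sample_line.log

# Extended-regex search (egrep / grep -E): character classes and + are available.
grep -E 'Invalid user [a-z]+ from' sample_line.log
```

Both commands match the sample line; the difference is only in how the pattern is interpreted.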

Pulling out useful statistics from log files

Another useful technique is to extract information such as 'how many IP addresses attempted authentication to the machine' and related 'what usernames were they using'. To do this, using the same log as before, we can leverage grep, less, and awk.

Let's use a pattern we discovered previously, "Invalid user", to pull these types of answers. The below output shows the first 5 attempts, using head, where we see the username and IP address in the same message.

$ fgrep Invalid\ user auth.log | head -n 5
Sep 24 06:27:11 tracker sshd[29197]: Invalid user babs from
Sep 24 06:27:13 tracker sshd[29199]: Invalid user hostmaster from
Sep 24 06:30:10 tracker sshd[29265]: Invalid user prova from
Sep 24 06:30:45 tracker sshd[29267]: Invalid user contact from
Sep 24 06:34:20 tracker sshd[29269]: Invalid user contact from

Since we want to only extract the IP address, for the first part of the question, let's use the awk command. This command allows us to process text, in this case, printing selected columns of data. By default, awk will split on spaces though we can change the delimiter if needed. While awk has many functions, we will use the print feature to select the column with the IP address. Since awk will split on spaces, we will select the 10th column (column numbering starts at 1).

$ fgrep Invalid\ user auth.log | head -n 5 | awk '{ print $10 }'

Great - we now have a list of IP addresses. Let's now generate some stats using sort and uniq. As the names suggest, we will generate a unique list of IP addresses and gather a count of how many times they appear in those messages:

$ fgrep Invalid\ user auth.log | head -n 5 | awk '{ print $10 }' | sort | uniq -c | sort -rn

The new portion, sort | uniq -c | sort -rn, is what generates our nicely formatted list. The uniq command requires sorted input to properly deduplicate, uniq -c provides a count in addition to a deduplicated list, and finally sort -rn sorts numerically (-n) in reversed order (-r). Since this is a statement I use fairly often, I have made two aliases that I find useful:

alias usort='sort | uniq -c | sort -n'  # Normal sort
alias ursort='sort | uniq -c | sort -rn'  # Reverse sort

And now I can re-run the prior command using the alias:

$ fgrep Invalid\ user auth.log | head -n 5 | awk '{ print $10 }' | ursort
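Since the alias is just the pipeline spelled out, we can see what it produces with a self-contained run over a fabricated list of IP addresses (values invented for illustration):

```shell
# Fabricated IP list: sort groups duplicate lines together, uniq -c collapses
# them with a count, and sort -rn puts the highest counts first.
printf '10.0.0.5\n10.0.0.9\n10.0.0.5\n10.0.0.5\n10.0.0.9\n10.0.0.1\n' \
    | sort | uniq -c | sort -rn
```

The most frequent address floats to the top of the output alongside its count, which is exactly the "top talkers" view we want from a log.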

A few notes on this method:

  • Using space delimiters is dangerous, especially in log files. Imagine, for example, if a username (somehow) contained a space character. We would no longer be able to use column 10 as our IP address column for that row and would need to employ a different technique
  • The aliases provided only read from stdin. This works for my use case but is an important consideration. Worst-case scenario, we can always run cat <file> | usort to leverage the alias.
  • Adding in the username, or any other field, would be as easy as specifying an additional column number in the awk statement. We would have to reconsider how we generate statistics though, as the usort alias will read the whole line when providing the counts.
  • We can use other tools, such as cut to provide similar functionality to awk. Find the ones you like and can remember and use those :)
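On that last note, here is a quick side-by-side of awk and cut selecting the same column from a fabricated log line (the IP address is invented for illustration):

```shell
# Fabricated log line; the IP address is invented for illustration.
line='Sep 24 06:27:11 tracker sshd[29197]: Invalid user babs from 10.0.0.5'

# awk splits on runs of whitespace; $10 selects the tenth field.
echo "$line" | awk '{ print $10 }'

# cut needs an explicit single-character delimiter (-d' ') and field number (-f10).
echo "$line" | cut -d' ' -f10
```

Both print the IP address, though cut is stricter: consecutive delimiters produce empty fields, where awk would collapse them.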

One last piece on useful statistics: we can quickly generate larger counts using the wc utility. Leveraging the above command, we will use wc to get a count of the number of lines containing "Invalid user":

$ fgrep Invalid\ user auth.log | wc -l

This utility can count other values, such as characters and words, but in this case we specified -l to get only the number of lines.
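The other counting modes work the same way; here is a sketch over a fabricated two-line sample so the numbers are predictable:

```shell
# Fabricated two-line sample so the counts are predictable.
printf 'Invalid user babs\nInvalid user contact\n' > sample_wc.log

wc -l sample_wc.log   # number of lines
wc -w sample_wc.log   # number of words
wc -c sample_wc.log   # number of bytes
```

Reading from stdin (wc -l < file) prints the bare number without the filename, which is handy inside scripts.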

End of part 1

Sorry for the abrupt and early stop, but I wanted to memorialize this before it became another multi-weekend project that took too long to release. I hope to continue putting out smaller posts like this, in the hope that they help someone looking to bring more of the bash/sh/zsh (or other shell) command-line environment into their casework.

Next post ideas:

  • Working with JSON data at the command line
  • Writing useful loops
  • List of useful aliases and one-liners

Thoughts on the above? Let me know @chapindb or mail \at\ chapinb [dot] com

Originally posted 2018-09-30