How to Search and Analyze Text in Linux


Knowing how to search and analyze text in Linux is critical if you want to manage a Linux server, as many of the decisions you make on a day-to-day basis involve either collecting operational data from the server or finding documentation.  There are many structures and tools to search and analyze text in Linux, which can help you uncover the knowledge you seek quickly.  This article adds more items to your Linux command-line tool belt, including:

  • Filtering text
  • Editing text
  • Formatting text
  • Using redirection 

These are concepts that you will use for the rest of your career in Linux and cybersecurity.

How to Process Text Files in Linux

Once in a while, and by that we mean every 10 to 20 minutes, you will find you need to do something with a text file, like extract information from it, format it differently, or transform it in some way.  So the next few sections of this article are going to teach you how to do some of this fancy techno-wizardry.

Filtering Text in Linux

Extracting small data sections with the “cut” command is a handy way to sift through a large text file if you are searching for something specific.  The command allows you to view particular fields within a file’s records. The command’s basic syntax is as follows:

cut OPTION… [FILE]…

Using the cut command requires understanding some basic requirements/nomenclature regarding text files:

  • Text File Records: Using the cut command requires delineation of “text file records”.  As seen in the image below, a “text file record” is a single file line that ends in the ASCII newline linefeed character “LF”.  You can see these characters with the “cat -E” command, which displays every newline linefeed as a “$”.  If your text file records end in the ASCII character “NUL”, you can also use cut on them, but you must use the “-z” option.
Demonstration of the "cat -e" command showing the newline linefeed.
  • Text File Record Delimiter:  In order for some “cut” command options to be used, fields must exist within each text file record. These fields are not database-style fields but instead data that is separated by a delimiter, one or more characters that create a boundary between data items within a record.  For example, the file /etc/passwd uses colons (:) to separate data items within a record.
  • Text File Changes:  The “cut” command does not change any data within the text file, but rather copies the indicated data and displays it to you. 
| Short | Long | Description |
| --- | --- | --- |
| -c nlist | --characters nlist | Display only the record characters in the nlist (e.g., 1-5). |
| -b blist | --bytes blist | Display only the record bytes in the blist (e.g., 1-2). |
| -d d | --delimiter d | Designate the record’s field delimiter as d. This overrides the Tab default delimiter. Put d within quotation marks to avoid unexpected results. |
| -f flist | --fields flist | Display only the record’s fields denoted by flist (e.g., 1,5). |
| -s | --only-delimited | Display only records that contain the designated delimiter. |
| -z | --zero-terminated | Designate the record end-of-line character as the ASCII character NUL. |

The image below shows the structure of the passwd file so we know how to structure the “cut” command in order to extract the data we want. This passwd file uses colons “:” as a field delimiter within the records. The “cut” command designates the colon delimiter with the “-d” option. The “-f” option specifies that the following fields should be displayed:

  1. Username: Used when the user logs in. It should be between 1 and 32 characters in length.
  2. Password: An x character indicates that the encrypted password is stored in the /etc/shadow file. Note that you need to use the passwd command to compute the hash of a password typed at the CLI or to store/update the hash of the password in the /etc/shadow file.
  3. User ID (UID): Each user must be assigned a user ID (UID). UID 0 (zero) is reserved for root, and UIDs 1-99 are reserved for other predefined accounts. UIDs 100-999 are further reserved by the system for administrative and system accounts/groups.
Using the "cut -d -f" options to select the field delimiter as well as which fields to display.
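
As a rough sketch of the command described above, the following builds a small sample file in passwd format and extracts the first three fields. The sample records are made up for illustration; on a real system you would point cut at /etc/passwd instead.

```shell
# Build a throwaway sample in /etc/passwd format (hypothetical records).
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' > "$sample"

# -d sets the colon as the field delimiter; -f selects fields 1 through 3.
fields=$(cut -d ':' -f 1-3 "$sample")
echo "$fields"
```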

In the image below, we used the “-c” option to show the first 7 characters of each line of the file.

cut -c command linux
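
A minimal sketch of the “-c” option, assuming a single made-up record piped in from printf rather than a real file:

```shell
# Keep only characters 1 through 7 of each record.
first7=$(printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' | cut -c 1-7)
echo "$first7"
```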

The “grep” command can be helpful in filtering text files. The table below shows some of the most popular options for the command. 

grep: Basic Regular Expressions

Many “grep” commands use regular expressions: pattern templates that you define and that grep uses to filter text.  Basic regular expressions (BREs) include:

  • A dot followed by an asterisk (.*) to represent any number of characters,
  • A single dot (.) to represent one character,
  • Brackets to represent a choice of characters, such as [aeiou], or a range of characters, such as [A-Z],
  • A caret (^) placed before the characters you want to find at the beginning of a record, and,
  • A dollar sign ($) placed after the characters you want to find at the end of a record.  The “$” goes at the end of the term because, if you place it before the term in an unquoted or double-quoted pattern, the shell treats it as a variable name instead of part of the BRE pattern.
grep linux command

In the image above, the “grep” command is used in conjunction with the regular expression dot and asterisk (“.*”) characters. This command instructs “grep” to search the password file for any instances of the word daemon within a record and display that record if it also contains the word nologin after the word daemon.

The next examples demonstrate searching for instances of the term “root” within the password file. The last use of the command places the caret “^” character before the word root to display just the lines in the password file that begin with the term “root”.

The “-v” option produces a list of text file records that do not contain the pattern.

Using the grep -v command to exclude records that match a pattern.

The image above shows two examples:

  • Finding all the records in the password file that do not end in nologin.
  • Finding all the records that start with the “testuser” prefix.
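
The patterns discussed above can be sketched as follows. The sample file and its three records are hypothetical stand-ins for /etc/passwd, and the patterns are wrapped in single quotes so the shell leaves the “$” alone.

```shell
# Hypothetical stand-in for /etc/passwd.
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'testuser1:x:1001:1001::/home/testuser1:/bin/bash' > "$sample"

# ^ anchors the pattern to the start of a record.
starts_root=$(grep '^root' "$sample")
# .* matches any run of characters between the two words.
daemon_nologin=$(grep 'daemon.*nologin' "$sample")
# -v inverts the match: keep records that do NOT end in nologin.
not_nologin=$(grep -v 'nologin$' "$sample")

printf '%s\n' "$starts_root" "$daemon_nologin" "$not_nologin"
```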

grep: Extended Regular Expressions

Extended regular expressions (EREs) allow more complex patterns to be searched for.

  • The vertical bar symbol (|) specifies two possible words or character sets to match.
  • Parentheses to designate additional subexpressions.
 

In the image below, we use the “grep” command to search for any password file records that start with either the word “root” or “testuser”.  Let’s break down the structure of the instruction:

  • The “-E” option with the “grep” command indicates the pattern is an extended regular expression. 
  • Quotation marks around the ERE pattern protect it from misinterpretation. 
  • As indicated by the caret (^) placed prior to each term, and the vertical bar (|) separating the words, the command indicates that a record can start with either word.   

 

Using the "egrep" command with extended regular expression.

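In the second example in the image above, notice that the “egrep” command is employed rather than the “grep -E” command; the two behave the same, although recent GNU grep releases mark “egrep” as obsolescent in favor of “grep -E”.   The command also uses: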

  • Quotation marks to avoid misinterpretation.
  • Parentheses to make use of a subexpression consisting of a choice, indicated by the vertical bar (|) between “daemon” and the letter “s”.
  • The “.*” symbols to indicate anything can be in between the subexpression choice and the word “nologin” in the text file record.
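
Both extended patterns can be tried with a sketch like the one below. The four sample records are invented for illustration, and “grep -E” is used throughout in place of “egrep”.

```shell
# Hypothetical stand-in for /etc/passwd.
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'sshd:x:110:65534::/run/sshd:/usr/sbin/nologin' \
  'testuser1:x:1001:1001::/home/testuser1:/bin/bash' > "$sample"

# Records that start with either root or testuser.
either=$(grep -E '^root|^testuser' "$sample")
# Subexpression choice (daemon or s), then anything, then nologin.
sub=$(grep -E '(daemon|s).*nologin' "$sample")

printf '%s\n' "$either" "$sub"
```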

Formatting Text

In order to accurately understand and interpret data within text files, you often need to reorganize or reformat the data in some way. We will review a number of ways we can do this.

The "sort" Command

There are a couple of simple utilities you can use to reformat file data. The “sort” command sorts a file’s data for display, but makes no changes to the original file.  The command’s basic syntax is:

sort [OPTION]… [FILE]…

The image below shows a simple use of the sort command to organize a text file.

Using the "sort" command to organize a text file.
When sorting a file with numeric values, add the -n option to the command, as seen in the image below.
sort command linux numbers

The table below shows some options commonly used with the “sort” command.

| Short | Long | Description |
| --- | --- | --- |
| -c | --check | Check if file is already sorted. Produces no output if file is sorted. If file is not sorted, it displays the file name, the line number, the keyword disorder, and the first unordered line’s text. |
| -f | --ignore-case | Consider lowercase characters as uppercase characters when sorting. |
| -k n1[,n2] | --key=n1[,n2] | Sort the file using the data starting at field n1. You may optionally end the sort key at field n2 by following n1 with a comma and specifying n2. Fields are separated by blanks by default. |
| -M | --month-sort | Display text in month of the year order. Months must be listed as standard three-letter abbreviations, such as JAN, FEB, MAR, and so on. |
| -n | --numeric-sort | Display text in numerical order. |
| -o file | --output=file | Write the sorted output to the file named “file”. |
| -r | --reverse | Sort and display the text in reverse order. |
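
The difference between a plain sort and a numeric sort can be sketched as below. The sample numbers and records are made up, LC_ALL=C is set only to make the plain ordering predictable, and the “-t” option (which sets the field delimiter) is a standard sort option that does not appear in the table above.

```shell
nums=$(mktemp)
printf '%s\n' 10 9 100 1 > "$nums"

# Plain sort compares character by character, so 100 lands before 9.
plain=$(LC_ALL=C sort "$nums")
# -n compares the values as numbers.
numeric=$(sort -n "$nums")
# -t sets the delimiter; -k 3,3n sorts on field 3 only, numerically.
by_uid=$(printf '%s\n' 'bob:x:20' 'amy:x:3' 'cal:x:100' | sort -t ':' -k 3,3n)

printf '%s\n' "$plain" "$numeric" "$by_uid"
```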

The "cat" Command

The “cat” command, used to concatenate files for display, is typically used to display a single file.  However, you can also use it to concatenate two text files and display their contents, one after the other.  The table below shows a number of the command’s options.

| Short | Long | Description |
| --- | --- | --- |
| -A | --show-all | Equivalent to using the option -vET combination. |
| -E | --show-ends | Display a $ when a newline linefeed is encountered. |
| -n | --number | Number all text file lines and display that number in the output. |
| -s | --squeeze-blank | Do not display repeated blank empty text file lines. |
| -T | --show-tabs | Display a ^I when a Tab character is encountered. |
| -v | --show-nonprinting | Display nonprinting characters when encountered using either ^ and/or M- notation. |

The image below shows the use of the “cat” command to concatenate two files.

cat command linux
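
A small sketch of concatenation, assuming two throwaway files whose names and contents are invented here:

```shell
dir=$(mktemp -d)
printf 'alpha\n' > "$dir/file1.txt"
printf 'beta\n'  > "$dir/file2.txt"

# Display both files, one after the other.
both=$(cat "$dir/file1.txt" "$dir/file2.txt")
echo "$both"

# -n numbers every line of the combined output.
cat -n "$dir/file1.txt" "$dir/file2.txt"
```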

The printf Command

The purpose of the “printf” utility is formatting and displaying text data; its basic syntax is as follows:

printf FORMAT [ARGUMENT]…

Unlike the “pr” utility, which formats entire text files, the “printf” command formats the output of a single text line. To process a whole text file, you must combine it with other commands, for example in a Bash shell script.  When using this command, you provide text formatting via FORMAT for the ARGUMENT.  The image below provides an example of how to use the command.

printf command

The printf command uses the quote-enclosed “%s\n” as the formatting description.  Let’s break down what this formatting means:

  • %s: Instructs printf to print the characters listed in the ARGUMENT (“Mr Big was Here”).
  •  \n: This portion of the FORMAT tells the printf command to print a newline character after the string. This allows the prompt to display on a new line, instead of at the displayed string’s end as can be seen when we ran the command with and without the “\n” character.

 

The table below shows the most common format settings used with the printf command.

 

| Format | Description |
| --- | --- |
| %c | Display the first ARGUMENT character. |
| %d | Display the ARGUMENT as a decimal integer number. |
| %f | Display the ARGUMENT as a floating-point number. |
| %s | Display the ARGUMENT as a character string. |
| %% | Display a percentage sign. |
| \” | Display a double quotation mark. |
| \\ | Display a backslash. |
| \f | Include a form feed character. |
| \n | Include a newline character. |
| \r | Include a carriage return. |
| \t | Include a horizontal tab. |

As seen in the table above, the “printf” command can print floating-point numbers.  The image below shows how the command can round a number with 5 decimal places to 3 decimal places with the %.3f format.

Using the "printf" command with the "%f" format to format a number.
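
The formats discussed above can be tried with a sketch such as this one. The number 98.76543 is an arbitrary example value, and LC_ALL=C is set only so the decimal point is read the same way in every locale.

```shell
# %s prints the argument as a string; \n ends the line.
greeting=$(printf '%s\n' 'Mr Big was Here')
echo "$greeting"

# %.3f keeps three digits after the decimal point.
rounded=$(LC_ALL=C printf '%.3f' 98.76543)
echo "$rounded"

# %d prints an integer; %% prints a literal percent sign.
percent=$(printf '%d%%' 42)
echo "$percent"
```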

Determining Word Count

The “wc” utility is commonly used for determining counts in a text file.  The command’s syntax is:

wc [OPTION]… [FILE]…

A useful “wc” option for troubleshooting configuration files is the “-L” switch. Configuration file lines are usually shorter than 150 bytes, so if you have just edited a configuration file and run into operational issues, check the file’s longest line length.  A longer-than-usual line indicates a possible merger of two configuration file lines.  If executed with no options, the utility displays the file’s number of lines, words, and bytes, in that order, as seen in the screenshot below.

wc linux

The table below shows a few of the commonly used options for the “wc” command.

| Short | Long | Description |
| --- | --- | --- |
| -c | --bytes | Display the file’s byte count. |
| -L | --max-line-length | Display the length of the file’s longest line. |
| -l | --lines | Display the file’s line count. |
| -m | --chars | Display the file’s character count. |
| -w | --words | Display the file’s word count. |
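
A quick sketch of these counts on a made-up two-line file. Reading the file through STDIN keeps the file name out of the output, and “-L” is assumed to be available (it is in GNU wc).

```shell
f=$(mktemp)
printf 'one two\nthree\n' > "$f"

lines=$(wc -l < "$f")    # number of lines
words=$(wc -w < "$f")    # number of words
bytes=$(wc -c < "$f")    # number of bytes, newlines included
longest=$(wc -L < "$f")  # length of the longest line

echo "lines=$lines words=$words bytes=$bytes longest=$longest"
```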

Redirecting Input and Output

Understanding how to redirect input and output is an important part of knowing how to search and analyze text in Linux.

Handling Standard Output in Linux

If you have been paying attention to any of the Linux resources you read, you will know by now that Linux treats every object as a file, including the output process like displaying a text file on the screen. Every file object has a file descriptor, an identifying integer that classifies a process’s open files. For output from a command or script file, that file descriptor is 1. It is also identified by the abbreviation STDOUT, which describes standard output which directs output to your current terminal, represented by the /dev/tty file.

Executing the “echo” command along with a text string displays the text string on your process’s STDOUT, usually the terminal screen. One of the basic characteristics of STDOUT is that you can redirect it via redirection operators, which let you change, from the command line, the default behavior of where input and output are sent. For STDOUT, use the redirection operator “>” to direct output.  The image below shows how to build a text file of test users who have passwords using the STDOUT redirect.

Redirecting STDOUT to a text file.

In the image above, the password file is being audited for test user accounts via the grep command. The grep command’s STDOUT was redirected to a file, and the data file can now be viewed using the “cat” command.  If you use the “>” redirection operator and direct output to a preexisting file, the file’s current data is deleted.  This situation is avoidable by appending data to a preexisting file with a different redirection operator, “>>”. If the file does not exist, it gets created with the outputted data included.  The image below shows this appending functionality in action.

Append data stdout linux
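
The overwrite and append behavior can be sketched as follows, using a temporary file and placeholder text invented for the example:

```shell
out=$(mktemp)

echo 'first run'  > "$out"   # > creates the file or overwrites it
echo 'second run' > "$out"   # the earlier contents are now gone
echo 'third run' >> "$out"   # >> appends instead of overwriting

cat "$out"
```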

Redirecting Standard Error

From time to time, it is useful to redirect standard error, which is identified by its file descriptor, the number “2”, and by the abbreviation STDERR. STDERR, like STDOUT, is by default sent to your terminal (/dev/tty).  Similar to STDOUT, you redirect STDERR to a file with the “2>” operator (notice the “2” in there).  If you want to append data to the file, use the “2>>” operator.  We demonstrate standard error redirection in the image below, where we try to access files with the hosts directive in the /etc/ directory that we do not have permission to access.

Using "2>" to redirect Standard Error output in Linux to another file.

You can send standard error and standard output to the same file, using the “&>” redirection operator to accomplish the goal.  You can also throw the output away by redirecting STDOUT or STDERR to the /dev/null file.
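
A minimal sketch of these redirections follows. The function noisy is a hypothetical helper that writes one line to each stream, and the portable form “> file 2>&1” is used for the combined case; in Bash, “&> file” is shorthand for the same thing.

```shell
dir=$(mktemp -d)

# Hypothetical helper: one line to STDOUT, one line to STDERR.
noisy() {
  echo 'normal output'
  echo 'error output' >&2
}

noisy 2> "$dir/errors.txt"        # only STDERR goes to the file
noisy > "$dir/all.txt" 2>&1       # both streams go to the same file
quiet=$(noisy 2> /dev/null)       # STDERR is thrown away

cat "$dir/errors.txt" "$dir/all.txt"
```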

Regulating Standard Input

Standard input comes into a Linux system via a keyboard or other input devices and is identified by:

  • The numeral zero “0” (its file descriptor)
  • The abbreviation STDIN

You can redirect STDIN with the redirection operator, the “<” symbol.   You need to redirect STDIN when using some Linux utilities, like the “tr” command, as seen in the image below, where we used the command to change the spaces between the numbers to commas.  It is important to note that the command does not change the underlying file.

Using the "tr" command in conjunction with a redirection of STDIN.
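
The same idea as a short sketch, assuming a made-up file of space-separated numbers:

```shell
nums=$(mktemp)
echo '1 2 3 4' > "$nums"

# tr reads only from STDIN, so the file is fed in with <.
commas=$(tr ' ' ',' < "$nums")
echo "$commas"

# The underlying file still contains spaces.
cat "$nums"
```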

The table below outlines the most commonly used redirection operators.

| Operator | Description |
| --- | --- |
| > | Redirect STDOUT to a file. If file exists, overwrite it. If it does not exist, create it. |
| >> | Redirect STDOUT to a file. If file exists, append to it. If it does not exist, create it. |
| 2> | Redirect STDERR to specified file. If file exists, overwrite it. If it does not exist, create it. |
| 2>> | Redirect STDERR to specified file. If file exists, append to it. If it does not exist, create it. |
| &> | Redirect both STDOUT and STDERR to specified file. If file exists, overwrite it. If it does not exist, create it. |
| &>> | Redirect STDOUT and STDERR to specified file. If file exists, append to it. If it does not exist, create it. |
| < | Redirect STDIN from specified file into command. |
| <> | Open specified file for both reading and writing on STDIN (file descriptor 0). |

You may find yourself applying the redirection of STDOUT/STDIN with the “diff” command, which discovers disparities between two text files and, with the “-e” option, can produce a script that changes one file to match the other.  Let’s work through this process together, you know, as one big happy Linux nerd family ;).  The first screenshot below shows two text files that have some similar entries.   We then make a copy of one of the text files (a good practice when modifying files is to either work with a copy or back up the original document).  We then run the “diff” command with the “-e” option, creating an “ed” script, which we will eventually use to make copyoffile2.txt the same as file1.txt.  After we create the script, which we call “switch.sh”, we add two characters that are used to exit the script.

Using redirection of STDOUT to text files in conjunction with the "diff" command.

The screenshot below shows the following:

  • The text of the script we created.
  • That the copyoffile2.txt and file1.txt are different.
  • After running the script, which redirects the STDIN of the switch.sh file to the copyoffile2.txt with the “ed” command, we run the “diff” command again to demonstrate the files are now the same.
Part 2 of using the "diff" command with redirection of STDIN to change file contents.
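
The whole workflow might look roughly like the sketch below. The file contents are invented, and the two appended characters are assumed here to be the ed commands w (write) and q (quit), since the screenshots are not reproduced. The ed step runs only if the ed editor is installed.

```shell
dir=$(mktemp -d)
printf '%s\n' apple banana cherry      > "$dir/file1.txt"
printf '%s\n' apple berry cherry date  > "$dir/file2.txt"
cp "$dir/file2.txt" "$dir/copyoffile2.txt"

# diff -e writes an ed script that turns the copy into file1.txt.
# diff exits non-zero when the files differ, hence "|| true".
diff -e "$dir/copyoffile2.txt" "$dir/file1.txt" > "$dir/switch.sh" || true

# Assumption: w saves the buffer and q quits ed.
printf '%s\n' w q >> "$dir/switch.sh"
cat "$dir/switch.sh"

if command -v ed > /dev/null; then
  # The script is redirected into ed as STDIN.
  ed -s "$dir/copyoffile2.txt" < "$dir/switch.sh"
  diff "$dir/copyoffile2.txt" "$dir/file1.txt" && echo 'files now match'
fi
```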

Piping Commands

The “pipe” is a simple redirection operator represented by the ASCII character 124 (|) and variously referred to as the:

  • Vertical bar
  • Vertical slash, or
  • Vertical line. 

Using the pipe, STDOUT, STDIN, and STDERR can be directed between multiple commands all on the same command line.  Redirection syntax with the pipe symbol is as follows:

command1 | command2 [| commandN]…

The above shows that the first command, command1, is executed and its STDOUT is redirected as STDIN for command2, and so on: every command in the pipeline has its STDOUT redirected as STDIN to the next command in the pipeline.  The image below shows a number of examples of the pipe in action:

  • The first command in the pipe searches the password file for any records that start with testuser, finding all test user accounts on the system. The output from the first command in the pipe is passed as input into the second command in the pipe. The “wc -l” command counts how many lines have been produced by the grep command, showing that there are 11 test user accounts on this Linux system.
  • The second example shows a pipeline that uses four different utilities to audit test user accounts:
    • Output from the “grep” command is fed as input into the “cut” command.
    • The “cut” utility extracts only the first 3 fields from each password record, which are the account username, password, and UID.
    • Output from the “cut” command serves as the input for the “sort” command, which sorts  usernames in alphabetical order.
    • Output from the “sort” utility serves as input for the “more” command.
  • Make use of the “tee” command if you want to keep a copy of the command pipeline’s output.  The “tee” command allows saving the output to a file as well as displaying it to STDOUT. The “tee” command is handy when you are installing software from the command line and want to see what is happening as well as keep a log file of the transaction for later review.
Numerous examples of using the Pipe command in Linux to channel STDOUT as STDIN to new commands.
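
The pipelines described above can be sketched as follows. The sample records are hypothetical, and the final “more” stage is left out so the example runs without pausing.

```shell
sample=$(mktemp)
log=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'testuser2:x:1002:1002::/home/testuser2:/bin/bash' \
  'testuser1:x:1001:1001::/home/testuser1:/bin/bash' > "$sample"

# grep's STDOUT becomes the STDIN of wc, which counts the lines.
count=$(grep '^testuser' "$sample" | wc -l)
echo "test user accounts: $count"

# grep -> cut -> sort, with tee saving a copy while still displaying it.
grep '^testuser' "$sample" | cut -d ':' -f 1-3 | sort | tee "$log"
```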

Creating Here Documents

STDIN redirection can be accomplished using a “here document” (sometimes called here text or heredoc) which allows the redirection of  multiple items into a command. It can also modify a file using a script, create a script, keep data in a script, and so on.  The here document redirection operator is << followed by a keyword, which can be anything and signals both the beginning of the data as well as the data’s end. In the image below: 

  • The “sort” command is entered, followed by the redirection operator “<<” and a keyword, “stop_wasting_time”. 
  • The Enter key is pressed.
  • A secondary prompt, “>”, appears to indicate additional data can be entered.
  • The words to be sorted are entered.
  • The keyword is entered again to denote that data entry is complete. When this occurs, the sort utility alphabetically organizes the words, displaying the results to STDOUT.
Creating a HERE document using the sort command.
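
In a script, the same here document might look like the sketch below. The three words are arbitrary, and the output is sent to a temporary file only so it can be inspected afterwards.

```shell
out=$(mktemp)

# Everything between the two keyword lines is fed to sort as STDIN.
sort <<stop_wasting_time > "$out"
pear
apple
mango
stop_wasting_time

cat "$out"
```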

Creating Command Lines

There are several different methods you can use to create a command line, including piping STDOUT from other commands into the “xargs” utility as demonstrated in the screenshots below. 

  • The first command finds empty files in the tmp subdirectory. 
  • The second command does the same and then pipes the output as STDIN into the “xargs” utility, which uses the “ls” command to list the files.  The program to run can be given to “xargs” by its location in the virtual directory tree; a program on your PATH can also be given by name alone.
  • The second screenshot shows the use of the “-p” option with the “xargs” utility, causing it to stop and ask permission before executing the command-line command to remove all three empty files.
Using the "Xargs" command to execute the "ls" function.
Using the "Xargs" command in Linux to delete files.
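
A rough sketch of both xargs uses, assuming a scratch directory with two empty files and one non-empty file (all names invented). The “-p” option is mentioned only in a comment because it waits for a keyboard answer.

```shell
dir=$(mktemp -d)
touch "$dir/empty1.txt" "$dir/empty2.txt"
echo 'data' > "$dir/full.txt"

# find prints the empty files; xargs turns them into arguments for ls.
listed=$(find "$dir" -type f -size 0 | xargs ls | wc -l)
echo "empty files listed: $listed"

# Same idea with rm. Adding -p (xargs -p rm) would ask before running.
find "$dir" -type f -size 0 | xargs rm
ls "$dir"
```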

Shell expansion is another method to create command-line commands.   One form, the $() method, puts a command to execute within parentheses and precedes it with a dollar sign.  The screenshot below uses the “find” command to locate empty files in the Document directory.  As the command is within the $() symbols, its output is not displayed to STDOUT; rather, the file names are passed to the “ls” command. 

Another method inserts the command to execute within backticks (`).  The command between the backticks executes and its output is passed as arguments to the ls utility.  Both command-line commands behave exactly alike. 

Using shell expansion to create command-line commands.
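
Both forms side by side, as a sketch that assumes a scratch directory holding one empty and one non-empty file (names invented):

```shell
dir=$(mktemp -d)
touch "$dir/empty.txt"
echo 'data' > "$dir/full.txt"

# $() form: find runs first and its output becomes arguments to ls.
with_dollar=$(ls $(find "$dir" -type f -size 0))

# Backtick form: same behavior, older syntax.
with_backticks=$(ls `find "$dir" -type f -size 0`)

echo "$with_dollar"
echo "$with_backticks"
```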

Editing Text Files

Manipulating text is a large part of learning how to search and analyze text in Linux and is performed via either a regular stream editor or a full-fledged interactive text editor.

How to Use Linux Text Editors

Nano Text Editor

Being able to use interactive text file editors is an important component of knowing how to search and analyze text in Linux.

The nano editor is a good text editor to start using if you have never dealt with an editor or have only used GUI editors. To start using the nano text editor, type nano followed by the name of the file you wish to edit.  If the file doesn’t exist, it gets created.  In the screenshot of the nano editor, seen below, you see four main sections of the editor:

  1. Title Bar: Found at the editor window’s top line; shows the current editor version and the name of the current file. If you simply typed in nano and did not include a file name, you would see New Buffer in the title bar.
  2. Main Body:  Area where editing is  performed. If the file has existing text, its first lines are shown in this area. To view text not in the main body,  use either arrow keys, Page Up or Down key, and/or the page movement shortcut key combinations to move through the body of text.
  3. Status Bar:  Displays status information for certain events. 
  4. Shortcut List: The list at the window’s bottom shows the most common commands and their associated shortcut keys. The caret (^) symbol in this list indicates that the Ctrl key must be used.  To see additional commands, press the Ctrl+G key combination for help.  In the editor’s help system, you’ll see some key combinations denoted by M-k. These are meta-character key combinations, and the M represents the Esc, Alt, or Meta key, depending on your keyboard’s setup. The k simply represents a keyboard key.

 

Screenshot of the Nano text editor for Linux.

The vim Editor

Originally, vi was a Unix text editor; vim, which stands for “vi improved”, is an open-source rewrite of it.  Some distributions, such as Ubuntu, do not have the full vim editor installed by default. Instead, they use an alternative, called vim.tiny, which does not support all of the commands used here.  Some distributions have a vim tutorial installed by default. To use it, type “vimtutor” at the command line.

To start the vim text editor, type vim or vi, depending on your distribution, followed by the name of the file you wish to edit or create.  The vim editor loads the file data in a memory buffer, and this buffer is what is displayed on the screen. Opening vim without a file name, or with the name of a non-existent file, starts a new buffer area for editing.

The vim editor has a message area near the bottom line (as seen in the image below). If opening an existing file, it displays the file name along with the number of lines and characters read into the buffer area. If creating a new file, you see [New File] in the message area.

Screenshot of Linux vim editor.

vim has three standard modes:

  • Command Mode:  When you first enter the buffer area, vim is in this mode, which is sometimes called normal mode.  In this mode, you enter keystrokes to enact commands, and it is the best mode to use for quickly moving around the buffer area.
  • Insert Mode: Also called edit or entry mode, this is the mode where you can perform simple editing. There are not many commands or special mode keystrokes in insert mode.  Enter it from command mode by pressing the i key; the message -- INSERT -- then displays in the message area.  Leave this mode by pressing the Esc key.
  • Ex Mode:  Also called colon commands because every command entered must be preceded with a colon (:).  You must be in command mode to enter into Ex mode and you cannot jump from insert mode to ex mode. If you’re currently in insert mode, press the Esc key to go back to command mode first.

Since you start in command mode when entering the vim editor’s buffer area, it’s good to know the commonly used commands to move around the editing screen and edit data.

| Keystroke | Description |
| --- | --- |
| h | Move cursor left by one character. |
| l | Move cursor right by one character. |
| j | Move cursor down one line. |
| k | Move cursor up one line. |
| w | Move cursor forward one word to the front of the next word. |
| e | Move cursor to the end of the current word. |
| b | Move cursor backward one word. |
| ^ | Move cursor to the beginning of the line (its first non-blank character). |
| $ | Move cursor to end of line. |
| gg | Move cursor to the file’s first line. |
| G | Move cursor to the file’s last line. |
| nG | Move cursor to file line number n. |
| Ctrl+B | Scroll up almost one full screen. |
| Ctrl+F | Scroll down almost one full screen. |
| Ctrl+U | Scroll up half a screen. |
| Ctrl+D | Scroll down half a screen. |
| Ctrl+Y | Scroll up one line. |
| Ctrl+E | Scroll down one line. |
| cw | Move the cursor to a word’s first letter and type cw. The word is deleted, and you are thrown into insert mode. You can then type in the new word and press Esc to leave insert mode. |
| ZZ | Type ZZ in command mode to write the buffer to disk and exit the vim editor. |

Ex mode commands can be seen in the table below.

| Keystrokes | Description |
| --- | --- |
| :x | Write buffer to file and quit editor. |
| :wq | Write buffer to file and quit editor. |
| :wq! | Write buffer to file and quit editor while overriding protection. |
| :w | Write buffer to file and stay in editor. |
| :w! | Write buffer to file, overriding protection, and stay in editor. |
| :q | Quit editor without writing buffer to file. |
| :q! | Quit editor, overriding protection, without writing buffer to file. |
| :! command | Execute the indicated shell command and display the result without quitting the editor. |
| :r! command | Execute shell command and include the results in the editor buffer area. |
| :r file | Read file contents and include them in the editor buffer area. |

How to Use Stream Editors in Linux

A stream editor modifies text passed to it via a file or a pipeline, using special commands to change the text as it “streams” through the editor utility.

sed: Stream Editor

Taking a page from the “blinding flash of the obvious” book of naming conventions, the first stream editor we will take a look at is called “the stream editor” and is invoked with the “sed” command.  The utility is quick because it makes only one pass through the text from STDIN to apply the modifications to a stream of text based on a set of commands supplied ahead of time, either entered directly into the command line or stored in a text file.  The syntax for the utility is as follows:

sed [OPTIONS] [SCRIPT]… [FILENAME]

The editing process is as follows:

  • Reads one text line at a time from the input stream.
  • Matches that text with the supplied editor commands.
  • Modifies the text as specified in the commands. 
  • Outputs the modified text to STDOUT
  • Once sed reaches the end of the text lines, it stops.
sed modify file linux

In the first example in the screenshot above, notice that text output from the echo command is piped into the stream editor. The “sed” utility’s “s” command (substitute) instructs that if the first text string, cake, is found, it is changed to “swedish berries” in the output.   When executing this command, make note of the following:

  • The entire command after sed is considered to be the SCRIPT.
  • The script is encased in single quotation marks. 
  • Forward slashes (/) separate the “s” command, the search text, and the replacement text from one another and from the closing quotation mark.
  • By default, the “s” command does not change all instances of a word within a text stream, only the first instance on each line.  The second and third examples in the screenshot demonstrate this. In the second example, only the first occurrence of the word “cannabis” was modified. In the third, the “g” flag, which stands for global, was added to the end of the sed script, causing all occurrences of “cannabis” to change to “swedish berries”.
  • You can also modify text stored in a file, as seen in the fourth example above.  The file contains text lines that contain the word “cookies”. When the cookies.txt file is added as an argument to the sed command, the data output is modified based on the script, but the data in the file is not modified. The stream editor only displays the modified text to STDOUT.
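
The substitute command and its global flag can be sketched as below. The sentence and the file name cookies_sample.txt are invented for the example.

```shell
text='I like cake and more cake'

# Without g, only the first match on the line is replaced.
first_only=$(echo "$text" | sed 's/cake/swedish berries/')
# With g, every match on the line is replaced.
all=$(echo "$text" | sed 's/cake/swedish berries/g')
echo "$first_only"
echo "$all"

# With a file argument, sed prints the edited text and leaves the file alone.
dir=$(mktemp -d)
echo 'cookies are great' > "$dir/cookies_sample.txt"
edited=$(sed 's/cookies/swedish berries/' "$dir/cookies_sample.txt")
echo "$edited"
```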

The stream editor has some rather useful command options outlined in the table below.

Short       Long                  Description
-e script   --expression=script   Add commands in script to text processing. The script is written as part of the sed command.
-f script   --file=script         Add commands in script to text processing. The script is a file.
-r          --regexp-extended     Use extended regular expressions in script.
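
As a rough illustration of the “-r” option, the sketch below assumes GNU sed and uses a made-up sample string. With extended regular expressions, the parentheses, the pipe, and the question mark act as grouping, alternation, and “optional” operators; without “-r”, GNU sed treats those characters as literal text.

```shell
# Sample text; the words are made up for illustration.
text="cake cakes cookie"

# With -r: match "cake" or "cookie", optionally followed by "s".
extended=$(echo "$text" | sed -r 's/(cake|cookie)s?/treat/g')
echo "$extended"

# Without -r: the same pattern is read as literal characters, so nothing matches.
basic=$(echo "$text" | sed 's/(cake|cookie)s?/treat/g')
echo "$basic"
```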

The examples below show the use of the “-e” and “-f” options. In the first example, the script contains a semicolon (;) between the two script commands. This allows both commands to be processed on the text stream. The second example shows how you can store sed script commands in a file, so you can use the script file over and over again.

Using sed with both the "-e" and "-f" options.
Syntax of a sed script: notice that each sed command is on its own file line and no single quotation marks are used.

When you inspect the sed script in the image above, notice that it:

  • Has a single sed command on each file line.
  • Does not use single quotation marks.
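
A minimal sketch of both options follows. The sample sentence, the replacement words, and the temporary script file are assumptions for illustration and do not reproduce the screenshots.

```shell
# Sample text; the sentence is made up for illustration.
text="The cake and the cookies are fresh."

# -e: two commands in one script, separated by a semicolon.
with_e=$(echo "$text" | sed -e 's/cake/pie/; s/cookies/donuts/')
echo "$with_e"

# -f: the same commands stored in a script file,
# one command per line and without single quotation marks.
script_file=$(mktemp)
cat > "$script_file" <<'EOF'
s/cake/pie/
s/cookies/donuts/
EOF
with_f=$(echo "$text" | sed -f "$script_file")
echo "$with_f"
rm -f "$script_file"
```

Both commands print the same modified sentence; the script file simply makes the commands reusable.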

The gawk Utility

The gawk utility is a more powerful stream editor than sed because of its editing process and its programming language, which allow you to:

  • Define data storage variables.
  • Work on data with arithmetic and string operators.
  • Process your data using programming structures, such as loops.
  • Create formatted reports from data.

The gawk utility is popular for generating reports from large data sets, and gawk programs can be stored as files for repeated use. The basic syntax of gawk is:

gawk [OPTIONS] [PROGRAM]… [FILENAME]

The “gawk” command is similar to “sed” in that:

  • You provide the program on the same command line as the gawk command, and
  • It uses single quotation marks to enclose the script.

The main difference is that “gawk” requires you to put your programming language commands between curly braces, as seen in the screenshot below.

Using the "gawk" command to modify a string of text in Linux.

As seen above, the “echo” command simply prints text to STDOUT, while the “gawk” command demonstrates its ability to define data field variables:

  • The $0 variable represents the entire text line.
  • The $1 variable represents the text line’s first data field.
  •  The $2 variable represents the text line’s second data field. 
  • The $n variable represents the text line’s nth data field.
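
The field variables above can be sketched as follows. The sample line is made up, and the first line of the block is only a convenience assumption that falls back to awk on systems where gawk is not installed.

```shell
# Assumption for portability: use awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Sample text; fields are separated by whitespace by default.
line="Hello World from gawk"

whole=$(echo "$line" | gawk '{print $0}')    # the entire text line
first=$(echo "$line" | gawk '{print $1}')    # first data field
second=$(echo "$line" | gawk '{print $2}')   # second data field
fourth=$(echo "$line" | gawk '{print $4}')   # nth data field, here n = 4

echo "$whole"
echo "$first"
echo "$second"
echo "$fourth"
```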

The gawk utility can also process text data from a file. 

Using "gawk" structured commands to modify output from text files.

The screenshot above demonstrates that gawk’s programming language uses structures typical of other programming languages. In the first example, the text in the first field was changed. The real challenge is dealing with text lines of different lengths, as in the second example, where cookies is in the $4 field rather than the $6 field. The second gawk attempt employs an if statement to check whether data field $6 is equal to the word cookies. If the statement evaluates to “true”, the data field is changed to cakes and the text is displayed via STDOUT.
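
A minimal sketch of the if statement follows. The two sample lines and the temporary file are assumptions for illustration, not the text from the screenshot, and the first line of the block falls back to awk where gawk is not installed.

```shell
# Assumption for portability: use awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Two sample lines of different lengths: "cookies" is field 6 in the
# first line and field 4 in the second.
sample_file=$(mktemp)
cat > "$sample_file" <<'EOF'
Kim bakes very large oatmeal cookies
Sam eats lemon cookies
EOF

# Change and print a line only when its sixth field is "cookies".
matched=$(gawk '{if ($6 == "cookies") {$6 = "cakes"; print $0}}' "$sample_file")
echo "$matched"
rm -f "$sample_file"
```

Only the first line is printed, because the second line has “cookies” in a different field and the if statement evaluates to false for it.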

Some common “gawk” options can be seen in the table below.

Short     Long                  Description
-F d      --field-separator d   Specify the delimiter separating the file’s data fields.
-f file   --file=file           Use program in file for text processing.
-S        --sandbox             Execute gawk program in sandbox mode.
Using gawk to extract user IDs from the passwd file in Linux.
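
The “-F” option can be sketched against a small passwd-style sample instead of the real file, so the output is predictable. The two account lines are made up; on a real system you could point the same command at /etc/passwd, where the third colon-separated field holds the UID. The first line of the block falls back to awk where gawk is not installed.

```shell
# Assumption for portability: use awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

# Made-up entries in passwd format (fields separated by colons).
sample_passwd=$(mktemp)
cat > "$sample_passwd" <<'EOF'
root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/bash
EOF

# -F sets the field delimiter to a colon; $3 is the UID field.
uids=$(gawk -F : '{print $3}' "$sample_passwd")
echo "$uids"
rm -f "$sample_passwd"
```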

 

The screenshot above shows an example of pulling UID data from the password file using the “-F” switch. The screenshot below shows more complex “gawk” programs that have been stored in files. The file, gawkscript.gawk, stores “if” and “else if” statements written in the gawk programming language. Remember that no single quotation marks are needed when the gawk program is stored in a file. Using the “-f” switch, the program is run against the cake.txt file, and the appropriate word is changed. Always remember that the logic in a program file can produce unexpected output.

[Screenshot: gawk programs stored in files and run with the -f option]
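
A program file of this kind might look like the sketch below. The file names gawkscript.gawk and cake.txt come from the article, but their contents here are assumptions for illustration, as is the temporary working directory. The first line of the block falls back to awk where gawk is not installed.

```shell
# Assumption for portability: use awk if gawk is not installed.
command -v gawk >/dev/null 2>&1 || gawk() { awk "$@"; }

work_dir=$(mktemp -d)

# Hypothetical program file: no single quotation marks are needed here.
cat > "$work_dir/gawkscript.gawk" <<'EOF'
{
    if ($6 == "cookies") {
        $6 = "cakes"
    } else if ($4 == "cookies") {
        $4 = "cakes"
    }
    print $0
}
EOF

# Hypothetical data file with lines of different lengths.
cat > "$work_dir/cake.txt" <<'EOF'
Kim bakes very large oatmeal cookies
Sam eats lemon cookies
Lee drinks tea
EOF

# -f tells gawk to read the program from the file.
result=$(gawk -f "$work_dir/gawkscript.gawk" "$work_dir/cake.txt")
echo "$result"
rm -rf "$work_dir"
```

The else if branch handles the shorter line, and a line that matches neither test is printed unchanged.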
As you have now learned, there are a large number of options available to you when it comes to searching and analyzing text in Linux. The best way to learn is to practice over and over again.
