add options to have cat perform minor modifications to the files as it combines them:
Display Line Ends
If you want to see where lines end, add the -E or --show-ends option. The result is a dollar sign ($) at the end of each line.
Number Lines
The -n or --number option adds line numbers to the beginning of every line. The -b or --number-nonblank option is similar, but it numbers only nonempty lines.
Minimize Blank Lines
The -s or --squeeze-blank option compresses groups of blank lines down to a single blank line.
Display Special Characters
The -T or --show-tabs option displays tab characters as ^I. The -v or --show-nonprinting option displays most control and other special characters using caret (^) and M- notations.
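The options above can be tried on a small scratch file. The following is a minimal sketch, assuming GNU cat; the file name sample.txt and its contents are made up for illustration and do not come from the original text.

```shell
# Work in a scratch directory so that no existing file is overwritten.
cd "$(mktemp -d)"

# Hypothetical sample: one line with a tab, three blank lines, one more line.
printf 'one\ttab\n\n\n\ntwo\n' > sample.txt

cat -E sample.txt   # mark each line end with $
cat -n sample.txt   # number every line, including blank ones
cat -b sample.txt   # number only nonempty lines
cat -s sample.txt   # squeeze the three blank lines into one
cat -T sample.txt   # show the tab as ^I
```

Several options can be combined in one invocation, for example cat -sn sample.txt to squeeze blank lines and number what remains.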
The tac command is similar to cat, but it reverses the order of lines in the output:
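As a quick illustration of the line reversal, the sketch below assumes GNU tac and uses a made-up file named order.txt.

```shell
# Work in a scratch directory so that no existing file is overwritten.
cd "$(mktemp -d)"

# Hypothetical three-line file.
printf 'first\nsecond\nthird\n' > order.txt

cat order.txt   # prints first, second, third
tac order.txt   # prints third, second, first
```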
Joining Files by Field with join
The join command combines two files by matching the contents of specified fields within the files. Fields are typically space-separated entries on a line. However, you can specify another character as the field separator with the -t char option, where char is the character you want to use. You can cause join to ignore case when performing comparisons by using the -i option.
The effect of join may best be understood through a demonstration. Consider Listing 1.1 and Listing 1.2, which contain data on telephone numbers. Listing 1.1 shows the names associated with those numbers, and Listing 1.2 shows whether the numbers are listed or unlisted.
Listing 1.1: Demonstration file containing telephone numbers and names
Listing 1.2: Demonstration file containing telephone number listing status
You can display the contents of both files using join:
By default, join uses the first field as the one to match across files. Because Listing 1.1 and Listing 1.2 both place the phone number in this field, it's the key field in the output. You can specify another field by using the -1 or -2 option to indicate the join field for the first or second file, respectively. For example, type join -1 3 -2 2 cameras.txt lenses.txt to join using the third field in cameras.txt and the second field in lenses.txt. The -o FORMAT option enables more complex specifications for the output file's format. You can consult the man page for join for even more details.
The join command can be used at the core of a set of simple customized database-manipulation tools using Linux text-manipulation commands. It's very limited by itself, though. For instance, it requires both files to be sorted on the join field. (You can use the sort command to ensure this is so.)
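The behavior described in this section can be sketched with two small files. The file names and data below are hypothetical stand-ins, not the contents of Listings 1.1 and 1.2, and the example assumes GNU join and sort.

```shell
# Work in a scratch directory so that no existing file is overwritten.
cd "$(mktemp -d)"

# Hypothetical data: phone number first, deliberately out of order in names.txt.
printf '555-9876 Bob Sample\n555-1234 Alice Example\n' > names.txt
printf '555-1234 unlisted\n555-9876 listed\n' > status.txt

# join expects both inputs sorted on the join field, so sort names.txt first.
sort -k 1,1 names.txt > names.sorted.txt

# Match on the first field of each file (the default).
join names.sorted.txt status.txt

# Hypothetical file with the phone number in the second field instead.
printf 'Alice 555-1234\nBob 555-9876\n' > byname.txt

# Use field 2 of the first file and field 1 of the second as the join field.
join -1 2 -2 1 byname.txt status.txt
```

In both cases the join field is printed first, followed by the remaining fields of the first file and then those of the second.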
Merging Lines with paste
The paste command merges files line by line, separating the lines from each file with tabs, as shown in the following example, using Listings 1.1 and 1.2 again:
You can use paste to combine data from files that aren't keyed with fields suitable for use by join. Of course, to be meaningful, the files must correspond line for line: line 1 of one file must belong with line 1 of the other, and so on. Alternatively, you can use paste as a quick way to create a two-column output of textual data; however, the alignment of the second column may not be exact if the first column's line lengths aren't exactly even.
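A minimal sketch of this line-by-line merging follows. The file names left.txt and right.txt and their contents are invented for illustration.

```shell
# Work in a scratch directory so that no existing file is overwritten.
cd "$(mktemp -d)"

# Two hypothetical files with the same number of lines.
printf 'alpha\nbeta\n' > left.txt
printf '1\n2\n' > right.txt

# Line N of left.txt is joined to line N of right.txt with a tab.
paste left.txt right.txt

# The -d option selects a different delimiter, here a comma.
paste -d , left.txt right.txt
```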
File-Transforming Commands
Many of Linux's text-manipulation commands are aimed at transforming the contents of files. These commands don't actually change files' contents but instead send the changed files' contents to standard output. You can then pipe this output to another command or redirect it into a new file.
An important file-transforming command is sed. This command is very complex and is covered later in this chapter in “Using sed.”
Converting Tabs to Spaces with expand
Sometimes text files contain tabs, but programs that need to process the files don't cope well with tabs. In such a case, you may want to convert tabs to spaces. The expand command does this.
By default, expand assumes a tab stop every eight characters. You can change this spacing with the -t num or --tabs=num option, where num is the tab spacing value.
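The effect of the tab spacing value can be seen on a one-line file. This is a small sketch, assuming GNU expand; tabs.txt is a made-up file name.

```shell
# Work in a scratch directory so that no existing file is overwritten.
cd "$(mktemp -d)"

# Hypothetical file: the letter a, a tab, then the letter b.
printf 'a\tb\n' > tabs.txt

# Default tab stops every eight columns: the tab becomes seven spaces.
expand tabs.txt

# Tab stops every four columns: the tab becomes three spaces.
expand -t 4 tabs.txt
```

The number of spaces varies because expand pads to the next tab stop rather than substituting a fixed count for each tab.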
Displaying Files in Octal with od
Some files aren't easily displayed in ASCII. For example, most graphics files, audio files, and so on use non-ASCII characters that look like gibberish. Worse, these characters can do strange things to your display if you try to view such a file with cat or a similar tool. For instance, your font may change, or your console may begin beeping uncontrollably. Nonetheless, you may sometimes want to display such files, particularly if you want to investigate the structure of a data file.
In such a case, od
(whose name stands for octal dump) can help. It displays a file in an unambiguous format – octal (base 8) numbers by default. For instance, consider Listing 1.2 as parsed by od
:
The first field on each line is an index into the file in octal. For instance, the second line begins at octal 20 (16 in base 10) bytes into the file. The remaining numbers on each line represent the bytes in the file. This type of output can be difficult to interpret unless you're well versed in octal notation and perhaps in the ASCII code.
Although od is nominally a tool for generating octal output, it can generate many other output formats, such as hexadecimal (base 16), decimal (base 10), and even ASCII with escaped control characters. Consult the man page for od for details on creating these variants.
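The offset column and the alternative formats can be explored with a short file. The sketch below assumes GNU od and uses a made-up 20-byte file named bytes.txt, so that the output spills onto a second line starting at octal offset 20.

```shell
# Work in a scratch directory so that no existing file is overwritten.
cd "$(mktemp -d)"

# Hypothetical 20-byte file: 19 printable characters plus a newline.
printf '0123456789abcdefXYZ\n' > bytes.txt

od bytes.txt         # default format: octal, with octal offsets on the left
od -c bytes.txt      # characters, with escapes such as \n for control codes
od -t x1 bytes.txt   # one hexadecimal byte per column
od -t d1 bytes.txt   # one decimal byte per column
```

Each form prints 16 bytes per line, so the second line begins at offset 0000020 and the closing line shows the total length, 0000024 (20 in base 10).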
Sorting Files with sort
Sometimes you'll create an output file that you want sorted. To do so, you can use a command that's called, appropriately enough, sort. This command can sort in several ways, including the following:
Ignore Case
Ordinarily, sort sorts by ASCII value, which differentiates between uppercase and lowercase letters. The -f or --ignore-case option causes sort to ignore case.
Month Sort
The -M or --month-sort option causes the program to sort by three-letter month abbreviation (JAN through DEC).
Numeric Sort
You can sort by number by using the -n or --numeric-sort option.
Reverse Sort Order
The