can sort in several ways, including the following:
Ignore Case
Ordinarily, sort sorts by ASCII value, which differentiates between uppercase and lowercase letters. The -f or --ignore-case option causes sort to ignore case.
Month Sort
The -M or --month-sort option causes the program to sort by three-letter month abbreviation (JAN through DEC).
Numeric Sort
You can sort by number by using the -n or --numeric-sort option.
Reverse Sort Order
The -r or --reverse option sorts in reverse order.
Sort Field
By default, sort compares entire lines, beginning with the first field. You can specify another field with the -k field or --key=field option. (The field can be two field numbers separated by a comma, as in -k 2,3, to sort on a key that runs from the first of those fields through the second.)
As an example, suppose you wanted to sort Listing 1.1 by first name. You could do so like this:
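A minimal sketch follows. Listing 1.1 is not reproduced here, so the file contents below are an assumed stand-in (a phone number, a last name, and a first name on each line), and the temporary directory is only for illustration. With that layout, the first name is field 3:

```shell
# Assumed stand-in for Listing 1.1 (hypothetical data).
workdir=$(mktemp -d)
cat > "$workdir/listing1.1.txt" <<'EOF'
555-2397 Beckett, Barry
555-5116 Carter, Gertrude
555-7929 Jones, Theresa
555-9871 Orwen, Samuel
EOF

# -k 3 starts the sort key at the third whitespace-separated field.
# LC_ALL=C keeps the comparison byte-based and predictable.
sorted_by_first=$(LC_ALL=C sort -k 3 "$workdir/listing1.1.txt")
printf '%s\n' "$sorted_by_first"

rm -r "$workdir"
```

In this sample, the Orwen line moves ahead of the Jones line because Samuel sorts before Theresa.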
The sort command supports a large number of additional options, many of them quite exotic. Consult sort's man page for details.
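To see how a few of the sort options described above change the result, the following sketch feeds the same short, made-up input through sort with different options. The variable names are illustrative only.

```shell
# Plain sort compares text, so "10" and "100" land before "9".
text_order=$(printf '%s\n' 10 9 100 | LC_ALL=C sort)

# -n compares by numeric value instead.
numeric_order=$(printf '%s\n' 10 9 100 | sort -n)

# -r reverses whichever ordering is in effect.
reverse_numeric=$(printf '%s\n' 10 9 100 | sort -n -r)

# -M orders lines by three-letter month abbreviation.
month_order=$(printf '%s\n' MAR JAN FEB | LC_ALL=C sort -M)

printf '%s\n' "$text_order" "$numeric_order" "$reverse_numeric" "$month_order"
```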
Breaking a File into Pieces with split
The split command can split a file into two or more files. Unlike most of the text-manipulation commands described in this chapter, this command requires you to enter an output filename or, more precisely, an output filename prefix, to which is added an alphabetic code. You must also normally specify how large you want the individual files to be:
Split by Bytes
The -b size or --bytes=size option breaks the input file into pieces of size bytes. This option can have the usually undesirable consequence of splitting the file mid-line.
Split by Bytes in Line-Sized Chunks
You can break a file into files of no more than a specified size without breaking lines across files by using the -C size or --line-bytes=size option. (Lines will still be broken across files if the line length is greater than size.)
Split by Number of Lines
The -l
lines or -lines=
lines option splits the file into chunks with no more than the specified number of lines.
As an example, consider breaking Listing 1.1 into two parts by number of lines:
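A minimal sketch, assuming a four-line stand-in for listing1.1.txt (the real listing is not reproduced here) and a temporary working directory used only for illustration:

```shell
workdir=$(mktemp -d)

# Hypothetical four-line input file.
printf '%s\n' 'line 1' 'line 2' 'line 3' 'line 4' > "$workdir/listing1.1.txt"

# Split into pieces of at most two lines each, using "numbers" as the prefix.
(cd "$workdir" && split -l 2 listing1.1.txt numbers)

# List the pieces and confirm that together they match the original.
pieces=$(cd "$workdir" && ls numbers*)
original=$(cat "$workdir/listing1.1.txt")
rejoined=$(cat "$workdir/numbersaa" "$workdir/numbersab")
printf '%s\n' "$pieces"

rm -r "$workdir"
```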
The result is two files, numbersaa and numbersab, which together hold the original contents of listing1.1.txt.
If you don't specify any options (as in split listing1.1.txt), the result is output files split into 1,000-line chunks, with names beginning with x (xaa, xab, and so on). If you don't specify an input filename, split uses standard input.
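The default behavior can be sketched as follows. The 2,500-line input generated by seq is an arbitrary choice for illustration, and the temporary directory simply keeps the output files out of the way.

```shell
workdir=$(mktemp -d)

# No options, no prefix, no filename: split reads standard input,
# writes 1,000-line pieces, and names them xaa, xab, xac, ...
default_pieces=$(cd "$workdir" && seq 1 2500 | split && ls)
printf '%s\n' "$default_pieces"

# The last piece holds the 500 lines left over.
last_piece_lines=$(wc -l < "$workdir/xac")

rm -r "$workdir"
```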
Translating Characters with tr
The tr command changes individual characters from standard input. Its syntax is as follows:
tr [options] SET1 [SET2]
You specify the characters you want replaced in a group (SET1) and the characters with which you want them to be replaced as a second group (SET2). Each character in SET1 is replaced with the one at the equivalent position in SET2. Here's an example using Listing 1.1:
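A minimal sketch, assuming a two-line stand-in for Listing 1.1 (the real listing is not reproduced here). SET1 is BCJ and SET2 is the shorter bc:

```shell
workdir=$(mktemp -d)

# Hypothetical stand-in for Listing 1.1.
printf '%s\n' '555-7929 Jones, Theresa' '555-2397 Beckett, Barry' \
    > "$workdir/listing1.1.txt"

# B becomes b and C becomes c. SET2 has no third character, so GNU tr
# reuses its last character and J also becomes c.
translated=$(tr BCJ bc < "$workdir/listing1.1.txt")
printf '%s\n' "$translated"

rm -r "$workdir"
```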
The tr command relies on standard input, which is the reason for the input redirection (<) in this example. Because tr accepts no filename arguments, redirecting or piping input is the only way to pass it a file's contents.
This example translates some, but not all, of the uppercase characters to lowercase. Note that SET2 in this example was shorter than SET1. The result is that tr substitutes the last available letter from SET2 for the missing letters. In this example, the J in Jones became a c. The -t or --truncate-set1 option causes tr to truncate SET1 to the size of SET2 instead.
Another tr option is -d, which causes the program to delete the characters from SET1. When using -d, you omit SET2 entirely.
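The difference between the default padding, -t, and -d can be sketched with a few one-line inputs. The sample strings are made up, and the padding and -t behavior shown is that of GNU tr; other implementations may differ.

```shell
# Default: SET2 is padded with its last character, so J maps to c.
padded=$(printf 'BCJ\n' | tr BCJ bc)

# -t truncates SET1 to the length of SET2, so J is left alone.
truncated=$(printf 'BCJ\n' | tr -t BCJ bc)

# -d deletes every character in SET1; no SET2 is given.
deleted=$(printf 'Jones, Carter\n' | tr -d BCJ)

printf '%s\n' "$padded" "$truncated" "$deleted"
```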
The tr command also accepts a number of shortcuts, such as [:alnum:] (all numbers and letters), [:upper:] (all uppercase letters), [:lower:] (all lowercase letters), and [:digit:] (all digits). You can specify a range of characters by separating them with dashes (-), as in A-M for characters between A and M, inclusive. Consult tr's man page for a complete list of these shortcuts.
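As a brief illustration of these shortcuts, the sketch below applies a pair of character classes, a range, and a class combined with -d to made-up input strings. Quoting the sets keeps the shell from treating the brackets as filename patterns.

```shell
# Convert every lowercase letter to uppercase.
uppercased=$(printf 'Jones, Theresa\n' | tr '[:lower:]' '[:upper:]')

# Lowercase only the letters A through M; O is outside the range.
ranged=$(printf 'Beckett Orwen\n' | tr 'A-M' 'a-m')

# Delete all digits.
digits_removed=$(printf '555-2397 Beckett\n' | tr -d '[:digit:]')

printf '%s\n' "$uppercased" "$ranged" "$digits_removed"
```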
Converting Spaces to Tabs with unexpand
The unexpand command is the logical opposite of expand; it converts multiple spaces to tabs. This can help compress the size of files that contain many spaces and can be helpful if a file is to be processed by a utility that expects tabs in certain locations.
Like expand, unexpand accepts the -t num or --tabs=num option, which sets the tab spacing to once every num characters. If you omit this option, unexpand assumes a tab stop every eight characters.
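A minimal sketch of both cases, using made-up input lines that begin with exactly eight and exactly four spaces:

```shell
# Default tab stops every eight characters: eight leading spaces
# collapse into a single tab.
default_stops=$(printf '        indented\n' | unexpand)

# With -t 4, four leading spaces are enough to reach a tab stop.
four_col_stops=$(printf '    indented\n' | unexpand -t 4)

# od -c shows the tab as \t so the conversion is visible.
printf '%s\n' "$default_stops" "$four_col_stops" | od -c
```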
Deleting Duplicate Lines with uniq
The uniq command removes duplicate lines. It's most likely to be useful if you've sorted a file and don't want duplicate items. For instance, suppose you want to summarize Shakespeare's vocabulary. You might create a file with all of the Bard's works, one word per line. You can then sort this file using sort and pass it through uniq. Using a shorter example file containing the text to be or not to be, that is the question (one word per line), the result looks like this:
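A minimal sketch, assuming the words are supplied one per line with punctuation removed. Here printf stands in for the example file, which is not reproduced in this section.

```shell
# One word per line, sorted so duplicates become adjacent, then
# collapsed by uniq. LC_ALL=C keeps the sort order predictable.
unique_words=$(printf '%s\n' to be or not to be that is the question \
    | LC_ALL=C sort | uniq)
printf '%s\n' "$unique_words"
```

Sorting first matters because uniq only removes duplicates that sit on adjacent lines.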
Note that the words to and be, which appeared in the original file twice, appear only once in the uniq-processed version.
File-Formatting Commands
The next three commands (fmt, nl, and pr) reformat the text in a file. The first of these is designed to reformat text files, such as when a program's README documentation file uses lines that are too long for your display. The nl command numbers the lines of a file, which can be helpful in referring to lines in documentation or correspondence. Finally, pr is a print-processing tool; it formats a document in pages suitable for printing.
Reformatting Paragraphs with fmt
Sometimes text files arrive with outrageously long line lengths, irregular line lengths, or other problems. Depending on the difficulty, you may be able to cope simply by using an appropriate text editor or viewer to read the file. If you want to clean up the file a bit, though, you can do so with fmt. If called with no options (other than the input filename, if you're not having it work on standard input), the program attempts to clean up paragraphs, which it assumes are delimited by blank lines or by changes in indentation. The new paragraph formatting defaults to lines that are no more than 75 characters wide. You can change this with