Richard Blum

LPIC-1 Linux Professional Institute Certification Study Guide



can sort in several ways, including the following:

      Ignore Case

      Ordinarily, sort sorts by ASCII value, which differentiates between uppercase and lowercase letters. The -f or --ignore-case option causes sort to ignore case.

      Month Sort

      The -M or --month-sort option causes the program to sort by three-letter month abbreviation (JAN through DEC).

      Numeric Sort

      You can sort by number by using the -n or --numeric-sort option.

      Reverse Sort Order

      The -r or --reverse option sorts in reverse order.

      Sort Field

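      By default, sort uses the entire line as its sort key, so comparison starts with the first field. You can specify another field with the -k field or --key=field option. (The field can be two field numbers separated by a comma, in which case the key runs from the first of those fields through the second. To sort on multiple keys, repeat the -k option.)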

      As an example, suppose you wanted to sort Listing 1.1 by first name. You could do so like this:
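      The listing that belongs here is missing from this copy. As a minimal sketch, assume listing1.1.txt holds a few hypothetical entries made up of a phone number, a last name, and a first name separated by spaces, so that the first name is field 3:

```shell
# Hypothetical stand-in for Listing 1.1; the real contents are not shown here.
cd "$(mktemp -d)"   # work in a scratch directory
cat > listing1.1.txt <<'EOF'
555-2397 Beckett, Barry
555-5116 Carter, Gertrude
555-7929 Jones, Theresa
555-9871 Orwell, Samuel
EOF

# Sort on the key that starts at field 3 (the first name).
sort -k 3 listing1.1.txt
```

      With this sample data the lines come out ordered Barry, Gertrude, Samuel, Theresa.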

      The sort command supports a large number of additional options, many of them quite exotic. Consult sort's man page for details.
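      Before moving on, the difference between the default ordering and the -n and -r options described earlier can be seen with a few made-up values. This is only an illustrative sketch; the numbers are arbitrary:

```shell
# Default sort compares character by character, so "100" sorts before "9".
printf '10\n9\n100\n' | sort        # 10, 100, 9

# -n compares the values as numbers.
printf '10\n9\n100\n' | sort -n     # 9, 10, 100

# -r reverses whatever ordering is in effect.
printf '10\n9\n100\n' | sort -n -r  # 100, 10, 9
```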

      Breaking a File into Pieces with split

      The split command can split a file into two or more files. Unlike most of the text-manipulation commands described in this chapter, this command requires you to enter an output filename or, more precisely, an output filename prefix, to which is added an alphabetic code. You must also normally specify how large you want the individual files to be:

      Split by Bytes

      The -b size or --bytes=size option breaks the input file into pieces of size bytes. This option can have the usually undesirable consequence of splitting the file mid-line.

      Split by Bytes in Line-Sized Chunks

      You can break a file into files of no more than a specified size without breaking lines across files by using the -C size or --line-bytes=size option. (Lines will still be broken across files if the line length is greater than size.)

      Split by Number of Lines

      The -l lines or --lines=lines option splits the file into chunks with no more than the specified number of lines.

      As an example, consider breaking Listing 1.1 into two parts by number of lines:
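      The command listing is missing from this copy. A minimal sketch, assuming listing1.1.txt contains four hypothetical lines and that numbers is the chosen output prefix:

```shell
# Hypothetical stand-in for Listing 1.1; the real contents are not shown here.
cd "$(mktemp -d)"   # work in a scratch directory
cat > listing1.1.txt <<'EOF'
555-2397 Beckett, Barry
555-5116 Carter, Gertrude
555-7929 Jones, Theresa
555-9871 Orwell, Samuel
EOF

# Split into pieces of at most 2 lines each; "numbers" is the output prefix.
split -l 2 listing1.1.txt numbers

# Concatenating the pieces in order reproduces the original file.
cat numbersaa numbersab
```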

      The result is two files, numbersaa and numbersab, which together hold the original contents of listing1.1.txt.

      If you don't specify any options or an output prefix (as in split listing1.1.txt), the result is output files split into 1,000-line chunks, with names beginning with x (xaa, xab, and so on). If you don't specify an input filename, split uses standard input.

      Translating Characters with tr

      The tr command changes individual characters from standard input. Its syntax is as follows:

      tr [options] SET1 [SET2]

      You specify the characters you want replaced in a group (SET1) and the characters with which you want them to be replaced as a second group (SET2). Each character in SET1 is replaced with the one at the equivalent position in SET2. Here's an example using Listing 1.1:
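      The example listing is missing from this copy. The following sketch is consistent with the discussion that follows (SET1 is BCJ, SET2 is the shorter bc), but the file contents are a hypothetical stand-in, and the padding behavior shown is that of GNU tr:

```shell
# Hypothetical stand-in for Listing 1.1; the real contents are not shown here.
cd "$(mktemp -d)"   # work in a scratch directory
cat > listing1.1.txt <<'EOF'
555-2397 Beckett, Barry
555-5116 Carter, Gertrude
555-7929 Jones, Theresa
555-9871 Orwell, Samuel
EOF

# B -> b and C -> c; SET2 is shorter than SET1, so GNU tr reuses its
# last character and J also becomes c ("Jones" turns into "cones").
tr BCJ bc < listing1.1.txt
```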

      The tr command relies on standard input, which is the reason for the input redirection (<) in this example. Because tr doesn't accept filenames as arguments, redirection (or a pipe from another command) is the only way to pass it a file's contents.

      This example translates some, but not all, of the uppercase characters to lowercase. Note that SET2 in this example was shorter than SET1. The result is that tr substitutes the last available letter from SET2 for the missing letters. In this example, the J in Jones became a c. The -t or --truncate-set1 option causes tr to truncate SET1 to the size of SET2 instead.

      Another tr option is -d, which causes the program to delete the characters from SET1. When using -d, you omit SET2 entirely.

      The tr command also accepts a number of shortcuts, such as [:alnum:] (all numbers and letters), [:upper:] (all uppercase letters), [:lower:] (all lowercase letters), and [:digit:] (all digits). You can specify a range of characters by separating them with dashes (-), as in A-M for characters between A and M, inclusive. Consult tr's man page for a complete list of these shortcuts.
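      A few one-liners illustrate the -d option, the bracketed shortcuts, and ranges. The input strings are made up for the sketch:

```shell
# Character classes: convert every uppercase letter to lowercase.
printf 'Hello, World 42\n' | tr '[:upper:]' '[:lower:]'   # hello, world 42

# -d deletes every character in SET1; no SET2 is given.
printf 'a1b2c3\n' | tr -d '[:digit:]'                     # abc

# A range: only A through M are lowercased, so the W is untouched.
printf 'Hello, World\n' | tr A-M a-m                      # hello, World
```

      Quoting the bracketed classes keeps the shell from treating them as filename patterns.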

      Converting Spaces to Tabs with unexpand

      The unexpand command is the logical opposite of expand; it converts multiple spaces to tabs. This can help compress the size of files that contain many spaces and can be helpful if a file is to be processed by a utility that expects tabs in certain locations.

      Like expand, unexpand accepts the -t num or --tabs=num option, which sets the tab spacing to once every num characters. If you omit this option, unexpand assumes a tab stop every eight characters.
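      As a small sketch of both cases, the commands below feed unexpand a made-up indented line and then pipe the result through tr so that each tab shows up as a visible > character:

```shell
# Default tab stops every 8 columns: eight leading spaces become one tab.
printf '        indented\n' | unexpand | tr '\t' '>'      # >indented

# With tab stops every 4 columns, four leading spaces are enough.
printf '    indented\n' | unexpand -t 4 | tr '\t' '>'     # >indented
```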

      Deleting Duplicate Lines with uniq

      The uniq command removes duplicate lines. It's most likely to be useful if you've sorted a file and don't want duplicate items. For instance, suppose you want to summarize Shakespeare's vocabulary. You might create a file with all of the Bard's works, one word per line. You can then sort this file using sort and pass it through uniq. Using a shorter example file containing the text to be or not to be, that is the question (one word per line), the result looks like this:
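      The output listing is missing from this copy. A minimal sketch that builds the one-word-per-line input with printf (rather than reading it from a file) and runs it through sort and uniq:

```shell
# One word per line, sorted, then de-duplicated.
printf '%s\n' to be or not to be that is the question | sort | uniq
# be
# is
# not
# or
# question
# that
# the
# to
```

      Sorting first matters because uniq only collapses duplicate lines that are adjacent.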

      Note that the words to and be, which appeared in the original file twice, appear only once in the uniq-processed version.

      File-Formatting Commands

      The next three commands (fmt, nl, and pr) reformat the text in a file. The first of these is designed to reformat text files, such as when a program's README documentation file uses lines that are too long for your display. The nl command numbers the lines of a file, which can be helpful in referring to lines in documentation or correspondence. Finally, pr is a print-processing tool; it formats a document in pages suitable for printing.

      Reformatting Paragraphs with fmt

      Sometimes text files arrive with outrageously long line lengths, irregular line lengths, or other problems. Depending on the difficulty, you may be able to cope simply by using an appropriate text editor or viewer to read the file. If you want to clean up the file a bit, though, you can do so with fmt. If called with no options (other than the input filename, if you're not having it work on standard input), the program attempts to clean up paragraphs, which it assumes are delimited by one or more blank lines or by changes in indentation. The new paragraph formatting defaults to paragraphs that are no more than 75 characters wide. You can change this with