Richard Blum

CompTIA Linux+ Powered by Linux Professional Institute Study Guide



You can add options to have cat perform minor modifications to the files as it combines them:

      Display Line Ends

      If you want to see where lines end, add the -E or --show-ends option. The result is a dollar sign ($) at the end of each line.

      Number Lines

      The -n or --number option adds line numbers to the beginning of every line. The -b or --number-nonblank option is similar, but it numbers only lines that contain text.

      Minimize Blank Lines

      The -s or --squeeze-blank option compresses groups of blank lines down to a single blank line.

      Display Special Characters

      The -T or --show-tabs option displays tab characters as ^I. The -v or --show-nonprinting option displays most control and other special characters using caret (^) and M- notations.
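The options above can be tried out with a quick sketch. It assumes GNU coreutils cat, since several of these options are GNU extensions, and uses a throwaway demo file whose contents are invented for illustration:

```shell
#!/bin/sh
# Create a hypothetical demo file: a tab, then a run of three blank lines.
demo=$(mktemp)
printf 'alpha\tbeta\n\n\n\ngamma\n' > "$demo"

cat -n "$demo"   # number all five lines, blank ones included
cat -b "$demo"   # number only the two lines that contain text
cat -s "$demo"   # squeeze the three blank lines down to one
cat -E "$demo"   # append $ to every line so line ends are visible
cat -T "$demo"   # show the tab between alpha and beta as ^I

rm -f "$demo"    # clean up the demo file
```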

      The tac command is similar to cat, but it reverses the order of lines in the output:
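A minimal comparison, assuming GNU coreutils (which provides tac) and a hypothetical three-line file:

```shell
#!/bin/sh
# Hypothetical three-line file used to compare cat and tac.
f=$(mktemp)
printf 'first\nsecond\nthird\n' > "$f"

cat "$f"   # first, second, third
tac "$f"   # third, second, first: same lines, reversed order

rm -f "$f"
```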

      Joining Files by Field with join

      The join command combines two files by matching the contents of specified fields within the files. Fields are typically space-separated entries on a line. However, you can specify another character as the field separator with the -t char option, where char is the character you want to use. You can cause join to ignore case when performing comparisons by using the -i option.
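Here is a short sketch of -t and -i, assuming GNU join. The colon-separated user data is invented for illustration. Both inputs are already sorted on their first field, which join expects.

```shell
#!/bin/sh
dir=$(mktemp -d)

# Hypothetical colon-separated files keyed by user name.
printf 'alice:1001\nbob:1002\n' > "$dir/uids"
printf 'alice:/bin/bash\nbob:/bin/sh\n' > "$dir/shells"

# -t : makes the colon the field separator for input and output.
join -t : "$dir/uids" "$dir/shells"
# alice:1001:/bin/bash
# bob:1002:/bin/sh

# -i lets "Alice" in one file match "alice" in the other.
printf 'Alice 1001\n' > "$dir/upper"
printf 'alice admin\n' > "$dir/lower"
join -i "$dir/upper" "$dir/lower"   # one joined line ending in "1001 admin"

rm -rf "$dir"
```

Without -i, the last command would print nothing, because "Alice" and "alice" differ in case.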

The effect of join may best be understood through a demonstration. Consider Listing 1.1 and Listing 1.2, which contain data on telephone numbers. Listing 1.1 shows the names associated with those numbers, and Listing 1.2 shows whether the numbers are listed or unlisted.

Listing 1.1: Demonstration file containing telephone numbers and names

Listing 1.2: Demonstration file containing telephone number listing status

      You can display the contents of both files using join:
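The listings' contents are not reproduced here. The sketch below therefore uses invented stand-ins with the same layout: a phone number in field 1, followed by a name or a listing status. It assumes GNU join.

```shell
#!/bin/sh
dir=$(mktemp -d)

# Hypothetical stand-ins for Listings 1.1 and 1.2 (invented data, not the
# book's). Both files are sorted on field 1, the phone number.
printf '555-1111 Smith, Ann\n555-2222 Lee, Bo\n' > "$dir/names.txt"
printf '555-1111 unlisted\n555-2222 listed\n' > "$dir/status.txt"

# Match on field 1 (the default). Each output line holds the key, then the
# remaining fields from the first file, then those from the second.
join "$dir/names.txt" "$dir/status.txt"
# 555-1111 Smith, Ann unlisted
# 555-2222 Lee, Bo listed

rm -rf "$dir"
```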

      By default, join uses the first field as the one to match across files. Because Listing 1.1 and Listing 1.2 both place the phone number in this field, it's the key field in the output. You can specify another field by using the -1 or -2 option to indicate the join field for the first or second file, respectively. For example, type join -1 3 -2 2 cameras.txt lenses.txt to join using the third field in cameras.txt and the second field in lenses.txt. The -o FORMAT option enables more complex specifications for the output file's format. You can consult the man page for join for even more details.
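A sketch of the cameras.txt and lenses.txt example follows, assuming GNU join. The file contents are hypothetical: the text gives only the file names. In this sketch, field 3 of cameras.txt and field 2 of lenses.txt both hold a lens-mount name.

```shell
#!/bin/sh
dir=$(mktemp -d)

# Hypothetical data; each file is sorted on its join field (EF before RF).
printf 'Alpha1 Acme EF\nBeta2 Bolt RF\n' > "$dir/cameras.txt"
printf '50mm EF\n35mm RF\n' > "$dir/lenses.txt"

# Join on field 3 of the first file and field 2 of the second.
join -1 3 -2 2 "$dir/cameras.txt" "$dir/lenses.txt"
# EF Alpha1 Acme 50mm
# RF Beta2 Bolt 35mm

# -o selects output fields as FILE.FIELD: here, camera model and lens.
join -1 3 -2 2 -o 1.1,2.1 "$dir/cameras.txt" "$dir/lenses.txt"
# Alpha1 50mm
# Beta2 35mm

rm -rf "$dir"
```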

      The join command can serve as the core of a set of simple customized database-manipulation tools built from Linux text-manipulation commands. It's very limited by itself, though. For instance, it requires both files to be sorted on the join field. (You can use the sort command to ensure this is so.)
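A minimal sort-then-join sketch, using hypothetical unsorted inputs keyed on field 1. It sets LC_ALL=C so that sort and join agree on the collating order. That is a precaution chosen for this example, not a requirement stated in the text.

```shell
#!/bin/sh
dir=$(mktemp -d)
# Use the same collation for sort and join so they agree on the order.
LC_ALL=C
export LC_ALL

# Hypothetical unsorted inputs keyed on field 1.
printf '3 carol\n1 alice\n2 bob\n' > "$dir/names"
printf '2 listed\n3 unlisted\n1 listed\n' > "$dir/status"

# Sort each file on its key field, then join the sorted copies.
sort -k 1,1 "$dir/names" > "$dir/names.sorted"
sort -k 1,1 "$dir/status" > "$dir/status.sorted"
join "$dir/names.sorted" "$dir/status.sorted"
# 1 alice listed
# 2 bob listed
# 3 carol unlisted

rm -rf "$dir"
```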

      Merging Lines with paste

      The paste command merges files line by line, separating the lines from each file with tabs, as shown in the following example, using Listings 1.1 and 1.2 again:
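The sketch below uses two hypothetical two-line files in place of the listings. Piping the result through cat -T makes the separating tab visible.

```shell
#!/bin/sh
dir=$(mktemp -d)
# Hypothetical files: line N of one corresponds to line N of the other.
printf 'one\ntwo\n' > "$dir/english"
printf 'uno\ndos\n' > "$dir/spanish"

paste "$dir/english" "$dir/spanish"            # one<TAB>uno, then two<TAB>dos
paste "$dir/english" "$dir/spanish" | cat -T   # one^Iuno: the tab made visible

rm -rf "$dir"
```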

      You can use paste to combine data from files that aren't keyed with fields suitable for use by join. Of course, to be meaningful, the files' lines must correspond exactly: line 1 of one file must relate to line 1 of the other, and so on. Alternatively, you can use paste as a quick way to create a two-column output of textual data. However, the second column may not line up neatly if the lines in the first column vary in length, because each tab advances only to the next tab stop.

      File-Transforming Commands

      Many of Linux's text-manipulation commands are aimed at transforming the contents of files. These commands don't actually change files' contents but instead send the changed files' contents to standard output. You can then pipe this output to another command or redirect it into a new file.

      An important file-transforming command is sed. This command is very complex and is covered later in this chapter in “Using sed.”

      Converting Tabs to Spaces with expand

      Sometimes text files contain tabs, but programs that need to process the files don't cope well with tabs. In such a case, you may want to convert tabs to spaces. The expand command does this.

      By default, expand assumes a tab stop every eight characters. You can change this spacing with the -t num or --tabs=num option, where num is the tab spacing value.
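A small sketch of the default and custom tab stops, using one hypothetical line with a tab after the letter "a". It also shows redirecting the result to a new file, as with the other file-transforming commands. The temporary file names are illustrative only.

```shell
#!/bin/sh
printf 'a\tb\n' | expand        # "a" plus 7 spaces, padding out to column 8
printf 'a\tb\n' | expand -t 4   # "a" plus 3 spaces, with stops every 4 columns

# expand writes to standard output, so redirect it to save a converted copy.
src=$(mktemp); dst=$(mktemp)
printf 'x\ty\n' > "$src"
expand -t 4 "$src" > "$dst"
cat -T "$dst"   # prints x, 3 spaces, y: no ^I because the tab is gone
rm -f "$src" "$dst"
```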

      Displaying Files in Octal with od

      Some files aren't easily displayed in ASCII. For example, most graphics files, audio files, and so on use non-ASCII characters that look like gibberish. Worse, these characters can do strange things to your display if you try to view such a file with cat or a similar tool. For instance, your font may change, or your console may begin beeping uncontrollably. Nonetheless, you may sometimes want to display such files, particularly if you want to investigate the structure of a data file.

      In such a case, od (whose name stands for octal dump) can help. It displays a file in an unambiguous format: octal (base 8) numbers by default. For instance, consider Listing 1.2 as parsed by od:

      The first field on each line is an index into the file in octal. For instance, the second line begins at octal 20 (16 in base 10) bytes into the file. The remaining numbers on each line represent the bytes in the file. This type of output can be difficult to interpret unless you're well versed in octal notation and perhaps in the ASCII code.
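Listing 1.2's actual od output isn't reproduced here. The sketch below pipes in 19 hypothetical bytes, which is enough to spill onto a second 16-byte line. It assumes GNU od. Plain od groups bytes into two-byte units whose layout depends on byte order, so the second command uses -b to print one octal number per byte.

```shell
#!/bin/sh
# Offsets are octal: 0000020 is byte 16, and the final 0000023 is the size, 19.
printf '0123456789abcdefXYZ' | od -c
# 0000000   0   1   2   3   4   5   6   7   8   9   a   b   c   d   e   f
# 0000020   X   Y   Z
# 0000023

# One octal value per byte: A is 101, B is 102, and newline is 012.
printf 'AB\n' | od -b
# 0000000 101 102 012
# 0000003
```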

      Although od is nominally a tool for generating octal output, it can generate many other output formats, such as hexadecimal (base 16), decimal (base 10), and even ASCII with escaped control characters. Consult the man page for od for details on creating these variants.
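A few of those variants are shown below, using standard od options (-A for the offset radix, -t for the data type) on a hypothetical five-byte input. Treat this as a sampling, not a full list; the man page covers the rest.

```shell
#!/bin/sh
# Hypothetical input: "H", "i", a tab, "!", and a newline.
printf 'Hi\t!\n' | od -A x -t x1   # hexadecimal offsets, hexadecimal bytes
printf 'Hi\t!\n' | od -A d -t u1   # decimal offsets, unsigned decimal bytes
printf 'Hi\t!\n' | od -c           # characters, with \t and \n as escapes
```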

      Sorting Files with sort

      Sometimes you'll create an output file that you want sorted. To do so, you can use a command that's called, appropriately enough, sort. This command can sort in several ways, including the following:

      Ignore Case

      Ordinarily, sort sorts by ASCII value, which differentiates between uppercase and lowercase letters. The -f or --ignore-case option causes sort to ignore case.

      Month Sort

      The -M or --month-sort option causes the program to sort by three-letter month abbreviation (JAN through DEC).

      Numeric Sort

      You can sort by number by using the -n or --numeric-sort option.

      Reverse Sort Order

      The