4.3 Sort lines in a file (sort)



Format

sort [option] file-name


Options

-r
sort in reverse order
-k n
sort by the nth column
-n
sort as numerical values


4.3.1 Prepare file

To see how the sort command works, let's make a sample file. The file contains the names and scores of three people. Each line has the format 'first name, surname, score' from the left, separated by spaces.

FILE

$ cat > score↵
tom cruise 85↵
audrey hepburn 70↵
james dean 100↵
(Enter Control-d)
$
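If you prefer not to type the file interactively, the same content can be written in one step. The following is a minimal sketch, assuming a POSIX shell; it overwrites any existing file named score in the current directory.

```shell
# printf reuses the format '%s\n' for each argument,
# so each person ends up on a separate line.
printf '%s\n' 'tom cruise 85' 'audrey hepburn 70' 'james dean 100' > score

# Show the file to confirm its contents.
cat score
```

The output of cat should match the three lines typed in the listing above.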


4.3.2 Execute sort command

After executing 'sort score', the result is as follows.


Practice: Execute sort command


$ sort score ↵
audrey hepburn 70
james dean 100
tom cruise 85


The lines are sorted in alphabetical order of their first character (a -> j -> t).

Practice: Sort in reverse by using -r option


$ sort -r score ↵
tom cruise 85
james dean 100
audrey hepburn 70


This time the lines are sorted in reverse alphabetical order of their first character (t -> j -> a).

4.3.3 Sort data by column n (-k)

In the previous section, the file was sorted by the first character of each line, that is, by first name. How about sorting by family name?
The -k option specifies the column to sort by. This time we specify 2, because the family name is in the second column.

Practice: Sort data by the second column


$ sort -k 2 score ↵
tom cruise 85
james dean 100
audrey hepburn 70


Practice: Sort data by the second column in reverse


$ sort -k 2 -r score ↵
audrey hepburn 70
james dean 100
tom cruise 85


Figure 4-1: Relationship between Text and option -k


The lines have been sorted by the second column, the surname: c -> d -> h in the first practice, and h -> d -> c in the reverse practice.
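One detail worth knowing: -k 2 defines a sort key that starts at the second column and runs to the end of the line, so the score can affect the order when two surnames are equal. Writing -k 2,2 limits the key to the second column alone. The sketch below illustrates the difference with a hypothetical file score2 (not part of this tutorial's data) in which two people share a surname. LC_ALL=C is an assumption added only to keep the ordering independent of locale settings.

```shell
# Hypothetical sample data: two people share the surname "dean".
printf '%s\n' 'james dean 100' 'anna dean 70' 'tom cruise 85' > score2

# Key = column 2 through end of line: "dean 100" sorts before "dean 70"
# because the scores are compared as text.
by_rest=$(LC_ALL=C sort -k 2 score2)
printf '%s\n' "$by_rest"

# Key = column 2 only: the tie on "dean" is broken by comparing
# the whole lines, so "anna ..." comes before "james ...".
by_surname=$(LC_ALL=C sort -k 2,2 score2)
printf '%s\n' "$by_surname"
```

With the three-line score file used in this section, both forms give the same result because no surname is repeated.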

4.3.4 Sort by number

As in section 4.3.3, let's sort by the score column (the third column). The command is as follows.

Example

$ sort -k 3 score ↵
james dean 100
audrey hepburn 70
tom cruise 85


The result looks a little strange. In descending order it should be 100 -> 85 -> 70, and in ascending order it should be 70 -> 85 -> 100. The reason is that these three numbers are treated as character strings. The sort command compares them character by character from the left, so, just as with alphabetic text, the order is decided by the first character that differs. Here that is the first character of each number (1 -> 7 -> 8).

This is called a lexicographic sort. It is fine for alphabetic text, but it is inappropriate here because the third column is a numerical score. The column needs to be sorted as numbers, not as characters.
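The character-by-character comparison can be seen in isolation by sorting the three scores on their own. This is a small sketch: the scores are fed in with printf rather than read from the score file, and LC_ALL=C is an assumption added to keep the result independent of locale settings.

```shell
# Without -n, sort compares "85", "70" and "100" as text.
# '1' comes before '7', and '7' before '8', so "100" ends up first.
lexical=$(printf '%s\n' 85 70 100 | LC_ALL=C sort)
printf '%s\n' "$lexical"
```

The printed order is 100, 70, 85, the same order seen in the third column above.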

Sorting as numbers is called a numerical sort, and the sort command provides the -n option for it. To sort the third column numerically, enter the following.


Practice: Execution of numerical sort


$ sort -n -k 3 score ↵
audrey hepburn 70
tom cruise 85
james dean 100


Practice: Execution of numerical sort in reverse


$ sort -n -r -k 3 score ↵
james dean 100
tom cruise 85
audrey hepburn 70


Now the score column has been sorted numerically.
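As a side note, the n and r modifiers can also be attached directly to the key definition instead of being given as separate options. The following is a minimal sketch of that alternative form, assuming a POSIX-compatible sort; it recreates the score file first so that it can be run on its own.

```shell
# Recreate the sample file so the example is self-contained.
printf '%s\n' 'tom cruise 85' 'audrey hepburn 70' 'james dean 100' > score

# -k 3,3n: use only column 3 as the key and compare it numerically.
ascending=$(sort -k 3,3n score)
printf '%s\n' "$ascending"

# Adding r to the key reverses the order for that key.
descending=$(sort -k 3,3nr score)
printf '%s\n' "$descending"
```

The two results should match the outputs of the two practices above.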

