神刀安全网

Transposing rows and columns: 3 methods

Swapping the rows and columns in a table on the command line would seem to be a fairly easy thing to do. For example, with this tab-separated table (called ‘tabbed’) as input:

Transposing rows and columns: 3 methods

get this tab-separated table (‘swapped’) as output:

Transposing rows and columns: 3 methods

Datamash

Transposing rows and columns is indeed easy if you have the GNU datamash utility on your system:

Transposing rows and columns: 3 methods

The default separator for columns in datamash is the tab character, so if your table is comma-separated (‘commas’ table in the following example), you need to specify that separator with a -t option:

Transposing rows and columns: 3 methods

Note that datamash needs special instructions if there are blanks anywhere in the table — more on this later.

AWK

The mighty text-processing program AWK can also do transposing, but the only commands I’ve seen are far from simple. Here’s a command used by ‘fedorqui’ in this Stack Overflow post . It puts the whole table in an array before printing the columns as rows:

Transposing rows and columns: 3 methods

AWK’s default column separator is whitespace (including the tab character), so if the separator is a comma you need to specify that:

Transposing rows and columns: 3 methods

DIY

Let’s build something from simple shell tools to do this transposing job. We can use the cut command to cut out the first column in the table:

Transposing rows and columns: 3 methods

and the paste command with its -s option (‘single line’) to make that column into a row:

Transposing rows and columns: 3 methods

We can loop through all 6 columns in the table with a ‘for’ loop, doing that cut-and-paste job on each:

Transposing rows and columns: 3 methods

The cut and paste commands both use the tab character as default separator, so if the table is comma-separated we need to specify that separator for each command with the -d option:

Transposing rows and columns: 3 methods

Scripting

That ‘for’ loop is fine and good, but notice that it specifies the number of columns, 6, in the table. What if we don’t know the number of columns, or can’t be bothered counting them?

We can find the number of columns (‘numc’) by adding 1 to the number of separators with BASH arithmetic. The number can be found with grep and *wc** from the first line of the table, like this for ‘commas’:

Transposing rows and columns: 3 methods

The variable ‘numc’ can then be fed into the ‘for’ loop:

Transposing rows and columns: 3 methods

These two steps would go nicely in a shell script, here called ‘trans’. I’ll feed the name of the table to be transposed into the script as the the first argument ( $1 ) for the ‘trans’ command:

#!/bin/bash  numc=$(($(head -n 1 "$1" | grep -o , | wc -l)+1)) for ((i=1; i<="$numc"; i++)) do cut -d,  -f"$i" "$1" | paste -s -d, done
Transposing rows and columns: 3 methods

What if the separator isn’t a comma? No problem. We can feed the separator to a modified ‘trans’ as a second argument, $2 :

#!/bin/bash  numc=$(($(head -n 1 "$1" | grep -o "$2" | wc -l)+1)) for ((i=1; i<="$numc"; i++)) do cut -d "$2" -f"$i" "$1" | paste -s -d "$2" done

Note that in the improved ‘trans’, a tab character would be fed in as a quoted variable ( $’/t’ ):

Transposing rows and columns: 3 methods

Time trials

Three methods for transposing rows and columns, but which is fastest? To do a test, I’ll use a tab-separated data table called ‘table1’ with 15 columns and 10004 rows. Not a huge table, but big enough to detect speed differences in the methods. Because there’s whitespace within data items in ‘table1’, I’ll need to specify a tab character as separator for AWK.

And here’s the time test:

Transposing rows and columns: 3 methods

Looks like our shell script ‘trans’ is the winner. But is its output correct? Yes:

Transposing rows and columns: 3 methods

Missing data

If there are blank entries in the table, as there are in ‘table2’, datamash complains:

Transposing rows and columns: 3 methods

The solution in datamash version 1.0.6 (the one I got from the Debian 8 repository) is to use the –no-strict option. This is supposed to replace each blank entry with a string, and the default string is ‘N/A’. Unfortunately, this doesn’t work:

Transposing rows and columns: 3 methods

This was a known bug in version 1.0.6 and is said to have been fixed in the next datamash version. Anyway, both AWK and ‘trans’ will handle blank entries without problems, so let’s do a time test again:

Transposing rows and columns: 3 methods

And once again, AWK and ‘trans’ give the same output:

Transposing rows and columns: 3 methods

The shell script

I wrote ‘trans’ as throwaway code for transposing my own tables. It could be extended so that argument 2 specifies the input column separator, and argument 3 a different column separator for the output.

There are other ways to swap rows and columns in tables, for example using Perl, but for the time being I’m happy with the script!

About the Author

Bob Mesibov is Tasmanian, retired and a keen Linux tinkerer.

转载本站任何文章请注明:转载至神刀安全网,谢谢神刀安全网 » Transposing rows and columns: 3 methods

分享到:更多 ()

评论 抢沙发

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址