How to sort or rearrange numbers from multiple columns into multiple rows [fixed into 4 columns]?

I have one text file, test1.txt, which contains the following:
Input:
##[A1] [B1] [T1] [V1] [T2] [V2] [T3] [V3] [T4] [V4]## --> headers
     1 1000    0   100   10   200   20   300   30   400
              40   500   50   600   60   700   70   800
       1010    0   101   10   201   20   301   30   401
              40   501   50   601
     2 1000    0   110   15   210   25   310   35   410
              45   510   55   610   65   710
       1010    0   150   10   250   20   350   30   450
              40   550
Condition:
A1 and B1 -> for each A1 + (B1 + [Tn + Vn])
A1 should be in 1 column.
B1 should be in 1 column.
T1,T2,T3 and T4 should be in 1 column.
V1,V2,V3 and V4 should be in 1 column.
How do I sort it so that it becomes like below?
Desired Output:
## A1 B1 Tn Vn ## --> headers
      1  1000    0   100
                10   200
                20   300
                30   400
                40   500
                50   600
                60   700
                70   800
         1010    0   101
                10   201
                20   301
                30   401
                40   501
                50   601
      2  1000    0   110
                15   210
                25   310
                35   410
                45   510
                55   610
                65   710
         1010    0   150
                10   250
                20   350
                30   450
                40   550
Here is my current code:
First Attempt:
Input
cat test1.txt | awk ' { a=$1; b=$2 } { for(i=1; i<=5; i=i+1) { t=substr($0,11+i*10,5); v=substr($0,16+i*10,5); if( t ~ /^ +[0-9]+$/ || t ~ /^[0-9]+$/ || t ~ /^ +[0-9]+ +$/ ){ printf "%7s %7d %8d %8d \n",a,b,t,v } } }' | less
Output:
      1    1000      400        0
     40     500      800        0
   1010       0      401        0
      2    1000      410        0
   1010       0      450        0
I'm trying to use a simple awk command, but still can't get the result.
Can anyone help me on this?
Thanks,
Am
Any text-processing language can be used, but I prefer awk, sed, or perl, as they are commonly used in the Linux terminal.
– Amz
Jul 3 at 5:41
Do all rows have consistent delimiters, and what are they? So, the second row after the header (which has no 1 1000), does it have two empty fields delimited with the same thing as the first two fields in the first row?– zdim
Jul 3 at 5:52
Yes, it can have consistent delimiters like whitespace, as long as it outputs only 4 columns.
– Amz
Jul 3 at 5:59
What makes this tricky is identifying when you've got a 'first row' (with 1:1000 for example in columns 1 and 2) versus a second row (with no value in columns 1 or 2) or a third row (where you've no value in column 1). The third row may be easiest; the number of values is odd. Is it safe to assume that 10 fields means a 'first row'? Or could you have just 1, 2 or 3 pairs (after the key) on a first row?
– Jonathan Leffler
Jul 3 at 6:02
4 Answers
This is a rather tricky problem that can be handled a number of ways. Whether bash, perl or awk, you will need to handle the number of fields in some semi-generic way instead of just hardcoding values for your example.
Using bash, so long as you can rely on an even number of fields in all lines (except for the lines with the sole initial value, e.g. 1010), you can accommodate the number of fields in a reasonably generic way. For the lines beginning with 1, 2, etc. you know your initial output will contain 4 fields. For the lines beginning with 1010, etc. you know the output will contain an initial 3 fields. For the remaining values you are simply outputting pairs.
The tricky part is handling the alignment. This is where printf helps: it allows you to set the field width with a parameter, using the form "%*s", where the conversion specifier expects the next argument to be an integer giving the field width, followed by the argument for the string conversion itself.
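For instance, a quick standalone illustration of the parameterized width (the width 6 and the two values here are arbitrary, purely to show the mechanism):

$ wd=6
$ printf "%*s%*s\n" "$wd" 40 "$wd" 500
    40   500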
It takes a little gymnastics, but you could do something like the following in bash itself (note: edited to match your output header format):
#!/bin/bash

declare -i nfields wd=6                 ## total no. fields, printf field-width modifier

while read -r line; do                  ## read each line (preserve for header line)
    arr=($line)                         ## separate into array
    first=${arr[0]}                     ## check for '#' in first line for header
    if [ "${first:0:1}" = '#' ]; then
        nfields=$((${#arr[@]} - 2))     ## no. fields in header
        printf "## A1 B1 Tn Vn ## --> headers\n"    ## new header
        continue
    fi
    fields=${#arr[@]}                   ## fields in line
    case "$fields" in
        $nfields )                      ## fields -eq nfields?
            cnt=4                       ## handle 1st 4 values in line
            printf " "
            for ((i=0; i < cnt; i++)); do
                if [ "$i" -eq '2' ]; then
                    printf "%*s" "5" "${arr[i]}"
                else
                    printf "%*s" "$wd" "${arr[i]}"
                fi
            done
            echo
            for ((i = cnt; i < $fields; i += 2)); do        ## handle rest
                printf "%*s%*s%*s\n" "$((2*wd))" " " "$wd" "${arr[i]}" "$wd" "${arr[$((i+1))]}"
            done
            ;;
        $((nfields - 1)) )              ## one less than nfields
            cnt=3                       ## handle 1st 3 values
            printf " %*s%*s" "$wd" " "
            for ((i=0; i < cnt; i++)); do
                if [ "$i" -eq '1' ]; then
                    printf "%*s" "5" "${arr[i]}"
                else
                    printf "%*s" "$wd" "${arr[i]}"
                fi
            done
            echo
            for ((i = cnt; i < $fields; i += 2)); do        ## handle rest
                if [ "$i" -eq '0' ]; then
                    printf "%*s%*s%*s\n" "$((wd+1))" " " "$wd" "${arr[i]}" "$wd" "${arr[$((i+1))]}"
                else
                    printf "%*s%*s%*s\n" "$((2*wd))" " " "$wd" "${arr[i]}" "$wd" "${arr[$((i+1))]}"
                fi
            done
            ;;
        * )                             ## all other lines format as pairs
            for ((i = 0; i < $fields; i += 2)); do
                printf "%*s%*s%*s\n" "$((2*wd))" " " "$wd" "${arr[i]}" "$wd" "${arr[$((i+1))]}"
            done
            ;;
    esac
done
Rather than reading from a file, just use redirection to feed the input file to your script (if you would rather pass a filename, then redirect that file into the while read ... loop yourself).
Example Use/Output
$ bash text1format.sh <dat/text1.txt
## A1 B1 Tn Vn ## --> headers
      1  1000    0   100
                10   200
                20   300
                30   400
                40   500
                50   600
                60   700
                70   800
         1010    0   101
                10   201
                20   301
                30   401
                40   501
                50   601
      2  1000    0   110
                15   210
                25   310
                35   410
                45   510
                55   610
                65   710
         1010    0   150
                10   250
                20   350
                30   450
                40   550
As between awk and bash, awk will generally be faster, but here with formatted output, it may be closer than usual. Look things over and let me know if you have questions.
Hi David, thank you very much for your advice and answer. We will try this code to see whether it can solve the real problem.
– Amz
Jul 3 at 6:43
"There is very little you can't do with bash" There is also very little that you should do with it. It is also possible to persuade a dog to walk on its hind legs. This is an abomination.
– Borodin
Jul 3 at 10:33
Please don't pressure people to select an answer unless it really seems that they have overlooked it. 48 hours is a reasonable period for assessment.
– Borodin
Jul 3 at 10:34
Don't do this. See why-is-using-a-shell-loop-to-process-text-considered-bad-practice for some of the reasons why. wrt the statement at the end, "As between awk and bash, awk will generally be faster, but here with formatted output, it may be closer than usual" - no, that shell script will be several orders of magnitude slower than an awk script for this task. For a moderately large file we're literally talking hours vs seconds.– Ed Morton
Jul 3 at 12:57
You call the read and echo utilities within the loop. The only mention of bash in my comment was your statement which I quoted. You are correct, people can always debate readability, but your shell script is 3-4 times longer than my awk script and contains several non-obvious array manipulation constructs plus nested conditionals within loops within switches within other loops, repeated/redundant code, etc., so I personally don't see any argument for the shell script being clearer than the awk script in this particular case, but YMMV I suppose.– Ed Morton
Jul 3 at 13:08
Unlike what is stated elsewhere, there's nothing tricky about this at all, you're just using fixed width fields in your input instead of char/string separated fields.
With GNU awk for FIELDWIDTHS to handle fixed width fields it really couldn't be much simpler:
$ cat tst.awk
BEGIN {
    # define the width of the input and output fields
    FIELDWIDTHS = "2 4 5 5 6 5 6 5 6 5 6 99"
    ofmt = "%2s%5s%6s%5s%6s%s\n"
}
{
    # strip leading/trailing blanks and square brackets from every field
    for (i=1; i<=NF; i++) {
        gsub(/^[[[:space:]]+|[][:space:]]+$/,"",$i)
    }
}
NR==1 {
    # print the header line
    printf ofmt, $1, $2, $3, "Tn", "Vn", " "$NF
    next
}
{
    # print every other line
    for (i=4; i<NF; i+=2) {
        printf ofmt, $1, $2, $3, $i, $(i+1), ""
        $1 = $2 = $3 = ""
    }
}
$ awk -f tst.awk file
##   A1    B1   Tn    Vn ## --> headers
      1  1000    0   100
                10   200
                20   300
                30   400
                40   500
                50   600
                60   700
                70   800
         1010    0   101
                10   201
                20   301
                30   401
                40   501
                50   601
      2  1000    0   110
                15   210
                25   310
                35   410
                45   510
                55   610
                65   710
         1010    0   150
                10   250
                20   350
                30   450
                40   550
With other awks you'd use a while() { substr() } loop instead of FIELDWIDTHS, so it'd be a couple more lines of code but still trivial (a rough sketch of that approach follows below).
The above will be orders of magnitude faster than an equivalent shell script. See https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice.
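For reference, a rough sketch of what that substr() variant might look like in a POSIX awk (posix-tst.awk is just a name for this sketch; the widths and output format are assumptions that mirror the FIELDWIDTHS/ofmt above, not code from the answer):

$ cat posix-tst.awk
BEGIN {
    # assumed fixed widths, same as the FIELDWIDTHS string above
    nw = split("2 4 5 5 6 5 6 5 6 5 6 99", w, " ")
}
{
    # carve each line into fields by hand
    pos = 1
    for (i = 1; i <= nw; i++) {
        f[i] = substr($0, pos, w[i])
        gsub(/^[[ \t]+|[] \t]+$/, "", f[i])   # strip blanks and square brackets
        pos += w[i]
    }
}
NR == 1 {
    # header line
    printf "%2s%5s%6s%5s%6s %s\n", f[1], f[2], f[3], "Tn", "Vn", f[nw]
    next
}
{
    # every remaining T/V pair on the line
    for (i = 4; i < nw; i += 2) {
        if (f[i] == "") continue              # short (wrapped) lines
        printf "%2s%5s%6s%5s%6s\n", f[1], f[2], f[3], f[i], f[i+1]
        f[1] = f[2] = f[3] = ""               # blank the keys after the first pair
    }
}

It would be run the same way, e.g. awk -f posix-tst.awk file.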
++ve for nice code, cheers.
– RavinderSingh13
Jul 3 at 13:53
Hi Ed, thanks for your help and advice. This code looks really good and useful. I have used it on my real sample and it goes well.
– Amz
Jul 5 at 5:31
This isn't easy because it is hard to identify when you have the different styles of row — those with values in both column 1 and column 2, those with no value in column 1 and a value in column 2, and those with no value in column 1 or 2. A first step is to make this easier — sed to the rescue:
$ sed 's/[[:space:]]\{1,\}$//
       s/^....../&|/
       s/|....../&|/
       :a
       s/|\( *[0-9][0-9]* \)\( *[^|]\)/|\1|\2/
       t a' data
1 | 1000 | 0 | 100 | 10 | 200 | 20 | 300 | 30 | 400
| | 40 | 500 | 50 | 600 | 60 | 700 | 70 | 800
| 1010 | 0 | 101 | 10 | 201 | 20 | 301 | 30 | 401
| | 40 | 501 | 50 | 601
2 | 1000 | 0 | 110 | 15 | 210 | 25 | 310 | 35 | 410
| | 45 | 510 | 55 | 610 | 65 | 710
| 1010 | 0 | 150 | 10 | 250 | 20 | 350 | 30 | 450
| | 40 | 550
$
The first line removes any trailing white space, to avoid confusion. The next two expressions handle the fixed-width columns 1 and 2 (6 characters each). The next line creates a label a; the substitute finds a pipe |, some spaces, some digits, a space, and some trailing material which doesn't include a pipe, and inserts a pipe in the middle. The t a jumps back to the label if a substitution was done.
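As a minimal aside, the label/branch loop on its own (a toy example with made-up data, not part of the script above) simply retries the substitution until it stops matching:

$ printf '  1  2  3\n' | sed ':a
s/\( [0-9]\) /\1|/
t a'
  1| 2| 3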
With that in place, it becomes easy to manage awk with a field separator of |. This is verbose, but seems to do the trick:
awk -F '|' '
$1 > 0 { printf "%5d %4d %3d %3d\n", $1, $2, $3, $4
         for (i = 5; i <= NF; i += 2) { printf "%5s %4s %3d %3d\n", "", "", $i, $(i+1) }
         next
       }
$2 > 0 { printf "%5s %4d %3d %3d\n", "", $2, $3, $4
         for (i = 5; i <= NF; i += 2) { printf "%5s %4s %3d %3d\n", "", "", $i, $(i+1) }
         next
       }
       { for (i = 3; i <= NF; i += 2) { printf "%5s %4s %3d %3d\n", "", "", $i, $(i+1) }
         next
       }'
Output:
    1 1000   0 100
            10 200
            20 300
            30 400
            40 500
            50 600
            60 700
            70 800
      1010   0 101
            10 201
            20 301
            30 401
            40 501
            50 601
    2 1000   0 110
            15 210
            25 310
            35 410
            45 510
            55 610
            65 710
      1010   0 150
            10 250
            20 350
            30 450
            40 550
If you need to remove the headings, add 1d; to the start of the sed script.
You definitely get a bonus for both sed and awk solutions... but 0 0 looks suspicious.– David C. Rankin
Jul 3 at 6:41
Hi Jonathan, you're awesome and thank you very much for your advice and answer. We will try this code to see whether it can solve the real problem.
– Amz
Jul 3 at 6:42
@DavidC.Rankin: Good spot on the 0 0 — the problem was partly some trailing blanks, and partly not starting at field 3 in the 'nothing in column 1 or column 2' lines (starting at field 5 effectively skipped the first values). I think the output is now correct — as expected. (When I take what my code produces, and what is listed in the question, and run diff -b on the two files, I get no output, so there is no significant difference. Phew!)– Jonathan Leffler
Jul 3 at 6:58
Yes, this was one that will leave you scratching your head in a couple of different directions, I had to go add more loops to make bash match the posted output format. (selfishly -- it was kinda fun...)
– David C. Rankin
Jul 3 at 7:01
This is gonna take some time. I will let you know once I have successfully cracked it.
– Amz
Jul 3 at 7:06
This might work for you (GNU sed):
sed -r '1d;s/^(.{11}).{11}/&\n\1/;s/^((.{5}).*\n)\2/\1     /;s/^(.{5}(.{6}).*\n.{5})\2/\1      /;/\S/P;D' file
Delete the first line (if the header is needed see below). The key fields occupy the first 11 (the first key is 5 characters and the second 6) characters and the data fields occupy the next 11. Insert a newline and the key fields before each pair of data fields. Compare the keys on adjacent lines and replace by spaces if they are duplicated. Do not print any blank lines.
If the header is needed, use the following:
sed -r '1{s/\[[^]]+\]\s*//5g;y/[]/  /;s/1/n/3g;s/B/ B/;G;b};s/^(.{11}).{11}/&\n\1/;s/^((.{5}).*\n)\2/\1     /;s/^(.{5}(.{6}).*\n.{5})\2/\1      /;/\S/P;D' file
This does additional formatting on the first line to remove the superfluous headings and the []'s, replace the 1's by n (from the third occurrence on), add an additional space for alignment, and append a following empty line.
Furthermore, by utilising the second line of the input file as a template for the data, a sed script can be created that does not have any hard-coded values:
sed -r '2!d;s/\s*\S*//3g;s/.\>/&\n/;h;s/[^\n]/./g;G;s/[^\n.]/ /g;s#(.*)\n(.*)\n(.*)\n(.*)#1d;s/^(\1\2)\1\2/\&\\n\\1/;s/^((\1).*\\n)\\2/\\1\3/;s/^(\1(\2).*\\n\1)\\2/\\1\4/;/\\S/P;D#' file |
sed -r -f - file
The script created from the template is piped into a second invocation of sed as a file (-f -) and run against the original file to produce the required output.
Likewise the headers may be formatted, if need be, like so:
sed -r '2!d;s/\s*\S*//3g;s/.\>/&\n/;h;s/[^\n]/./g;G;s/[^\n.]/ /g;s#(.*)\n(.*)\n(.*)\n(.*)#s/^(\1\2)\1\2/\&\\n\\1/;s/^((\1).*\\n)\\2/\\1\3/;s/^(\1(\2).*\\n\1)\\2/\\1\4/;/\\S/P;D#' file |
sed -r -e '1{s/\[[^]]+\]\s*//5g;y/[]/  /;s/1/n/3g;s/B/ B/;G;b}' -f - file
By extracting the first four fields from the second line of the input file, four variables can be made: two regexps and two values. These variables can be used to build the sed script.
N.B. The sed script is created from strings extracted from the template, and the variables produced are also strings, so they can be concatenated to produce further new regexps and new values, etc.
Hi potong. This works quite well too, but it is a simplified version. I can use these on other problems too. Thank you very much :)
– Amz
Jul 3 at 8:36
Can you explain your script a little bit?
– Amz
Jul 3 at 8:42
@Amz See my added edit. For a breakdown of the first script, study the crux of the solution: sed -r '1d;s/^(.{11}).{11}/&\n\1/;P;D' file
– potong
Jul 3 at 10:54
sed -r '2!d;s/\s*\S*//3g;s/.\>/&\n/;h;s/[^\n]/./g;G;s/[^\n.]/ /g;s#(.*)\n(.*)\n(.*)\n(.*)#s/^(\1\2)\1\2/\&\\n\\1/;s/^((\1).*\\n)\\2/\\1\3/;s/^(\1(\2).*\\n\1)\\2/\\1\4/;/\\S/P;D#' file | sed -r -e '1{s/\[[^]]+\]\s*//5g;y/[]/  /;s/1/n/3g;s/B/ B/;G;b}' -f - file. Yup, that's definitely what you want to find in your code when you go back to enhance it in 6 months :-).– Ed Morton
Jul 3 at 13:27
Also, do your tags linux, perl, awk, sed mean it doesn't matter which is used? You will only need one for the solution.
– David C. Rankin
Jul 3 at 5:38