Linux – shell scripts: How to partition a file into columns?

I have a file that looks like this:

t1   ATGCGTCCGTAGCAG
t2   ATGCCTAGCTAGGCT

That is, a name is followed by a (DNA) sequence. I want to divide the sequence. For example, the above sequence has a length of 15 and I want to split it into 3 parts with length 5. I want three new files:

File 1

t1   ATGCG
t2   ATGCC

File 2

t1   TCCGT
t2   TAGCT

File 3

t1   AGCAG
t2   AGGCT

I’m trying to write a shell script to do this. One approach is a for loop that uses sed "${N}q;d" to get the Nth line of the file, cuts the piece I need with cut -c, and saves it to a variable. The output files would then be assembled from that variable with cut, head, and tail. However, I wonder whether there is a tidier and faster way to do this.

PS: The actual file will contain 1-10000 lines, and each sequence is 10-50k characters long. I would divide each sequence into pieces of length 1-2k.

Solution

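The following uses bash substring expansion (i.e. ${string:start:length}, with start counted from 0) to produce the requested output. The offsets assume that the name and the spaces after it occupy exactly 5 characters, as in the sample: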

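#!/bin/bash
# Offsets assume the name plus its padding is exactly 5 characters wide.
# Quoting the expansions keeps the spaces between name and sequence intact.
while IFS='' read -r line || [[ -n "$line" ]]; do
    echo "${line:0:10}" >> file1               # name + sequence characters 1-5
    echo "${line:0:5}${line:10:5}" >> file2    # name + sequence characters 6-10
    echo "${line:0:5}${line:15:5}" >> file3    # name + sequence characters 11-15
done < "$1"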

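Save it as myscript.sh, make it executable (chmod +x myscript.sh), and run: ./myscript.sh <input-file>. Because the script appends with >>, delete any existing file1, file2, and file3 before running it again.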
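The script above hard-codes three pieces of length 5. For the real data (sequences of 10-50k split into pieces of 1-2k), the same idea can be generalized. The following is a minimal sketch that swaps in awk rather than bash substring expansion, so it is not the method from the answer above. The function name split_columns and the output prefix argument are hypothetical, and it assumes each line holds a name and a sequence separated by whitespace:

```shell
#!/bin/bash
# split_columns INPUT CHUNK [PREFIX]   (hypothetical helper, for illustration)
# Writes PREFIX1, PREFIX2, ... where file k holds each name followed by the
# k-th CHUNK-character slice of that line's sequence.
split_columns() {
    local input="$1" chunk="$2" prefix="${3:-file}"
    awk -v n="$chunk" -v prefix="$prefix" '
        NF >= 2 {                      # skip blank or malformed lines
            name = $1
            seq  = $2
            k = 0
            # awk strings are 1-based; step through the sequence n characters at a time
            for (start = 1; start <= length(seq); start += n) {
                k++
                out = prefix k
                # ">" truncates each file on first use, then appends for this run
                print name "   " substr(seq, start, n) > out
            }
        }
    ' "$input"
}
```

For example, split_columns input.txt 5 part would write part1, part2, and part3 for the sample above. If the sequence length is not a multiple of the chunk size, the last file simply gets a shorter piece. The whole input is read once by a single awk process, which should be much faster than calling sed and cut for every line. All output files stay open at once (at most about 50 for the sizes mentioned), which may be a problem on very old awk implementations.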
