Shell scripts: how to partition a file into fixed-width columns?
I have a file that looks like this:
t1 ATGCGTCCGTAGCAG
t2 ATGCCTAGCTAGGCT
That is, each name is followed by a (DNA) sequence, and I want to divide each sequence into pieces. For example, the sequences above have a length of 15, and I want to split each one into 3 parts of length 5, giving three new files:
File 1
t1 ATGCG
t2 ATGCC
File 2
t1 TCCGT
t2 TAGCT
File 3
t1 AGCAG
t2 AGGCT
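The requested split is plain fixed-offset slicing: with a chunk length of 5, chunk k (counting from 0) starts at character offset 5k. A minimal bash illustration on the first example sequence, using bash's ${string:offset:length} substring expansion (the variable names are arbitrary):

```bash
#!/usr/bin/env bash
seq=ATGCGTCCGTAGCAG
len=5
# Chunk k (counting from 0) starts at offset k*len; the last chunk may be shorter.
for (( k = 0; k * len < ${#seq}; k++ )); do
    printf 'chunk %d: offset %2d -> %s\n' "$((k + 1))" "$((k * len))" "${seq:k*len:len}"
done
# chunk 1: offset  0 -> ATGCG
# chunk 2: offset  5 -> TCCGT
# chunk 3: offset 10 -> AGCAG
```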
I’m trying to write a shell script to do this. One way is to write a for loop that uses sed "${N}q;d" to get the Nth line of the file, then use cut -c to cut it and save the pieces to variables, and build the output files from there with cut, head, tail and those variables. However, I wonder if there is a better (tidier and faster) way to do this.
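A rough sketch of that approach, hard-coded to the three 5-character chunks of the example. split_with_sed is a hypothetical name, and the input is assumed to be newline-terminated "name sequence" lines. Because sed "${n}q;d" reads the file from the top until line n, an n-line file costs about n²/2 line reads in total, plus several process forks per line, which is what makes this slow for large inputs:

```bash
#!/usr/bin/env bash
# Rough sketch of the sed + cut approach (hypothetical helper name).
split_with_sed() {
    local input=$1 total n line name seq
    : > file1; : > file2; : > file3            # start from empty output files
    total=$(wc -l < "$input")                  # counts newline-terminated lines only
    for (( n = 1; n <= total; n++ )); do
        line=$(sed "${n}q;d" "$input")         # print line n, then quit
        name=$(cut -d' ' -f1 <<< "$line")      # field 1: the name
        seq=$(cut -d' ' -f2 <<< "$line")       # field 2: the sequence
        echo "$name $(cut -c1-5   <<< "$seq")" >> file1
        echo "$name $(cut -c6-10  <<< "$seq")" >> file2
        echo "$name $(cut -c11-15 <<< "$seq")" >> file3
    done
}
```

Usage: split_with_sed input.txt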
PS: The actual file will contain 1 to 10,000 lines, with each sequence 10k to 50k characters long, and I want to split the sequences into chunks of 1k to 2k characters.
Solution
The following uses bash's substring expansion, ${string:offset:length}, to produce the requested output:
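#!/bin/bash
# Split each "name sequence" line into three 5-character chunks:
# chunk k (from 0) is ${seq:k*5:5}, written as "name chunk" to file(k+1).
: > file1; : > file2; : > file3   # start with empty output files
while read -r name seq || [[ -n "$name" ]]; do
    echo "$name ${seq:0:5}"  >> file1
    echo "$name ${seq:5:5}"  >> file2
    echo "$name ${seq:10:5}" >> file3
done < "$1"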
Save it as myscript.sh, make it executable with chmod +x myscript.sh, and run: ./myscript.sh <input-file>
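For the real data (chunks of 1k to 2k characters and an unknown number of chunks per sequence), the same single-pass idea can be generalized. A minimal sketch, assuming names contain no whitespace and the chunk length is a positive integer; split_seqs and its PREFIX argument are hypothetical names. Chunk k of every sequence goes to ${PREFIX}k, and the last chunk is shorter when the length is not a multiple of the chunk size:

```bash
#!/usr/bin/env bash
# split_seqs INPUT CHUNK_LEN [PREFIX]   (hypothetical helper)
# Reads "name sequence" lines from INPUT once and writes chunk k of every
# sequence, prefixed by its name, to the file ${PREFIX}k (default prefix "file").
split_seqs() {
    local input=$1 len=$2 prefix=${3:-file}
    local name seq i k
    local -a started=()    # started[k] is set once ${prefix}k has been truncated
    (( len > 0 )) || return 1
    while read -r name seq || [[ -n $name ]]; do
        k=1
        for (( i = 0; i < ${#seq}; i += len )); do
            # Truncate each output file the first time this run touches it,
            # so re-running does not append to stale results.
            if [[ -z ${started[k]:-} ]]; then
                : > "$prefix$k"
                started[k]=1
            fi
            printf '%s %s\n' "$name" "${seq:i:len}" >> "$prefix$k"
            k=$(( k + 1 ))
        done
    done < "$input"
}
```

Usage: split_seqs input.txt 5 reproduces file1 to file3 from the example, and split_seqs input.txt 2000 part writes part1, part2, and so on. Bash reopens the output file for every printf, so for 10,000 sequences of 50k characters this may still be slow; the same loop translates directly to awk's substr() with print > file, which keeps the output files open.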