Linux - Split a text file into parts using a pattern taken from the file itself

I have a lot of text files with fixed-width data, for example:

$ head model-q-060.txt 
% x                      y                        
15.0                     0.0                      
15.026087                -1.0                     
15.052174                -2.0                     
15.07826                 -3.0                     
15.104348                -4.0                     
15.130435                -5.0                     
15.156522                -6.0                     
15.182609                -6.9999995               
15.208695                -8.0

Each file contains 3 or 4 simulation runs stored one after another with no separator between them: no blank lines, no marker rows, nothing. For example, if each run had only 3 records, three runs would look like this:

$ head model-q-060.txt 
% x                      y                        
15.0                     0.0                      
15.026087                -1.0                     
15.052174                -2.0                     
15.0                     0.0                      
15.038486                -1.0                     
15.066712                -2.0                     
15.0                     0.0                      
15.041089                -1.0                     
15.087612                -2.0

For those interested, this is a COMSOL Multiphysics output file. You can see where each new run starts because its first x value repeats the first x value in the file (in fact, the entire first data row, line 2, may be identical at the start of every run). So I need to open the file, read this x value, save it, and then use it as the split pattern for awk or csplit. I'm still working on this!
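A minimal sketch of that first step, assuming line 1 of every file is the `% x y` header and line 2 is the first data row of the first run: read the x value from line 2 and count how many lines begin with it, which should equal the number of runs (3 or 4 here). `count_runs` is a hypothetical helper name.

```shell
# count_runs FILE: print the run-start x value and the number of runs.
# Assumption: line 1 is the "% x  y" header, line 2 is the first data row.
count_runs() {
    awk 'NR == 2 { delim = $1 }                      # remember the first x value
         NR >= 2 && ($1 "") == (delim "") { runs++ } # string comparison on purpose
         END { printf "run-start x = %s, runs = %d\n", delim, runs }' "$1"
}
```

Usage: `count_runs model-q-060.txt`. Appending an empty string to both sides makes awk compare the values as strings; a numeric comparison could also treat a row starting with `15` as a run start when the delimiter is `15.0`.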

Once the pattern is known, csplit does the job:

$ csplit -z -f 'temp' -b '%02d.txt' model-q-060.txt '/^15\.0\s/' '{*}'

But that requires knowing the split pattern in advance. There is a similar question (Split files based on file content and pattern matching), but each of my text files may have a different pattern.
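Combining the two steps, here is a sketch that reads the run-start value from line 2, escapes regular-expression metacharacters (the `.` in `15.0` would otherwise match any character), and passes the result to csplit. It assumes GNU csplit (for `-z` and `'{*}'`); `split_runs` is a hypothetical helper name, and the prefix defaults to `temp` as in the command above.

```shell
# split_runs FILE [PREFIX]: split FILE before every line whose first field
# equals the first field of line 2. Assumes GNU csplit and a header on line 1.
split_runs() {
    file=$1
    prefix=${2:-temp}
    # First x value of the first run, e.g. "15.0"
    x=$(awk 'NR == 2 { print $1; exit }' "$file")
    # Backslash-escape regex metacharacters so "15.0" matches literally
    re=$(printf '%s\n' "$x" | sed 's/[][\.*^$]/\\&/g')
    # -z drops empty pieces, -s suppresses the byte counts
    csplit -z -s -f "$prefix" -b '%02d.txt' "$file" "/^$re[[:space:]]/" '{*}'
}
```

With the three-run sample, `split_runs model-q-060.txt` should produce `temp00.txt` holding only the header line, then `temp01.txt` to `temp03.txt` with one run each.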

Solution

Here’s a simple awk script that does what you want:

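BEGIN { fn = 0 }                   # number of the next output file
NR == 1 { next }                   # skip the "% x  y" header line
NR == 2 { delim = $1 }             # first x value marks the start of every run
$1 == delim {                      # a new run starts on this line
    if (f) close(f)                # close the previous output file
    f = sprintf("test%02d.txt", fn++)
    print "Creating " f
}

{ print $0 > f }                   # write every data line to the current file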

Initialize the output file number.
Skip the first (header) line.
Take the delimiter from the first field of the second line.
For each input line whose first field matches the delimiter, set a new output file name.
Write every remaining line to the current output file.
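The script above handles one file at a time and always names its output `test00.txt`, `test01.txt`, and so on, so running it on each of many files in turn would overwrite earlier results. Below is a sketch of a variant for many files, assuming every file has the same layout (header on line 1, first run starting on line 2). A single awk call uses `FNR` to restart the logic for each input file, reads each file's own run-start value, and derives output names from the input name. `split_all` and the `-runNN.txt` naming are hypothetical choices.

```shell
# split_all FILE...: split each FILE at its own run-start value.
# Output: model-q-060.txt -> model-q-060-run00.txt, model-q-060-run01.txt, ...
split_all() {
    awk '
        FNR == 1 {                       # new input file: reset per-file state
            fn = 0
            base = FILENAME
            sub(/\.txt$/, "", base)      # strip the .txt extension
            next                         # drop the header line
        }
        FNR == 2 { delim = $1 }          # this file'"'"'s run-start x value
        ($1 "") == (delim "") {          # a new run starts here (string compare)
            if (out) close(out)
            out = sprintf("%s-run%02d.txt", base, fn++)
        }
        { print > out }                  # write the line to the current piece
    ' "$@"
}
```

Example: `split_all model-q-*.txt` would write `model-q-060-run00.txt`, `model-q-060-run01.txt`, and so on next to each input file. Like the original script, it drops the header line.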
