Split a text file into parts based on a pattern taken from the file itself
I have a lot of text files with fixed width data, for example:
$ head model-q-060.txt
% x y
15.0 0.0
15.026087 -1.0
15.052174 -2.0
15.07826 -3.0
15.104348 -4.0
15.130435 -5.0
15.156522 -6.0
15.182609 -6.9999995
15.208695 -8.0
Each file contains 3 or 4 runs of the simulation, stored back to back with no separator between runs: no blank lines or anything else. For example, if there were only 3 “records” per run, 3 runs would look like this:
$ head model-q-060.txt
% x y
15.0 0.0
15.026087 -1.0
15.052174 -2.0
15.0 0.0
15.038486 -1.0
15.066712 -2.0
15.0 0.0
15.041089 -1.0
15.087612 -2.0
For those interested, this is a COMSOL Multiphysics output file. Visually, you can see where a new run starts because the first x value repeats (in fact, the entire second line of the file may repeat at the start of every run). So I need to open the file, read this x value, save it, and then use it as the pattern for awk or csplit to match. I’m working on this!
csplit will do the job:
$ csplit -z -f 'temp' -b '%02d.txt' model-q-060.txt '/^15\.0\s/' '{*}'
But I have to know the pattern to split on in advance. This question is similar, except that each of my text files may need a different pattern: Split files based on file content and pattern matching.
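The step described above (read the first x value, then hand it to csplit) could be sketched roughly as follows. This assumes GNU csplit, as in the command above, and a first field that contains no "/" characters. The file name model-q-demo.txt and its contents are placeholders, not real COMSOL output.

```shell
# Placeholder sample standing in for one export file (hypothetical name and values).
cat > model-q-demo.txt <<'EOF'
% x y
15.0 0.0
15.026087 -1.0
15.052174 -2.0
15.0 0.0
15.038486 -1.0
15.066712 -2.0
15.0 0.0
15.041089 -1.0
15.087612 -2.0
EOF

file=model-q-demo.txt

# Read the first field of line 2: the x value that opens every run.
first=$(awk 'NR == 2 { print $1; exit }' "$file")

# Escape regex metacharacters so that "15.0" becomes "15\.0".
pattern=$(printf '%s\n' "$first" | sed 's/[][\.*^$]/\\&/g')

# Split before every line that starts with that value followed by whitespace.
csplit -z -f temp -b '%02d.txt' "$file" "/^${pattern}[[:space:]]/" '{*}'
```

With this sample, the header line ends up alone in temp00.txt and the runs follow in temp01.txt, temp02.txt and temp03.txt.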
Solution
Here’s a simple awk script that does what you want:
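BEGIN { fn = 0 }                 # output file counter
NR == 1 { next }                 # skip the "% x y" header line
NR == 2 { delim = $1 }           # first x value marks the start of every run
$1 == delim {                    # a new run starts on this line
    if (f != "") close(f)        # finish the previous output file
    f = sprintf("test%02d.txt", fn++)
    print "Creating " f
}
{ print $0 > f }                 # write the line to the current output file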
- Initialize the output file counter
- Skip the first (header) line
- Take the delimiter from the first field of the second line
- For each input line whose first field matches the delimiter, set the next output file name
- Write every line after the header to the current output file
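Since the script above always writes to test00.txt, test01.txt and so on, running it over many files would overwrite earlier results. A possible variant, sketched here under the assumption that the input files sit in the current directory and end in .txt, uses FNR and FILENAME so that each input gets its own set of outputs. The split- prefix and the sample file model-q-demo.txt are made up for illustration.

```shell
# Placeholder sample standing in for one export file (hypothetical name and values).
cat > model-q-demo.txt <<'EOF'
% x y
15.0 0.0
15.026087 -1.0
15.052174 -2.0
15.0 0.0
15.038486 -1.0
15.066712 -2.0
15.0 0.0
15.041089 -1.0
15.087612 -2.0
EOF

# FNR restarts at 1 for each input file, so the delimiter is re-read per file.
awk '
    FNR == 1 { next }                   # skip the header line of each file
    FNR == 2 {                          # first data line of this file
        delim = $1                      # x value that opens every run
        base = FILENAME
        sub(/\.txt$/, "", base)         # strip the extension for output names
        fn = 0                          # restart run numbering per input file
    }
    $1 == delim {                       # a new run starts here
        if (out != "") close(out)       # finish the previous output file
        out = sprintf("split-%s-%02d.txt", base, fn++)
    }
    { print > out }                     # write the line to the current output
' model-q-demo.txt                      # several input files can be listed here
```

For the sample this produces split-model-q-demo-00.txt through split-model-q-demo-02.txt, one file per run. Closing each output before opening the next keeps the number of open files small when many inputs are processed in one call.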