Linux - Merge two files without pseudo-duplicates



I have two text files, file1.txt and file2.txt, each containing a list of words, like this:
fare word phrasing world world
and

fare text uncial Tree-grown world phrasing
Or something like that. By a word I mean a string of letters a-z, possibly with accent marks, and the hyphen symbol -. My question is: from the Linux command line (using awk, sed, etc.), how can I create a third file, output.txt, that combines the two files and meets the following conditions:

If the same word appears in both files, output.txt contains it only once.

If one file contains a hyphenated version of a word (e.g. fa-re in file2.txt) and the other file contains the plain word, only the hyphenated version is kept in output.txt (in this case, only fa-re).

Therefore, output.txt should contain the following text:
fare word phrasing world world text uncial

======================================================================================================================================

I have modified the input files and added the expected output.
I will manually make sure that no word appears with its hyphen in different places (e.g. wod-ed and wo-ded).
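To see why this manual check matters, here is a minimal shell sketch that uses the same keying idea as the awk answer below: each word is stored under its spelling with hyphens removed, so wod-ed and wo-ded collide on the key woded and only one spelling survives (whichever hyphenated form is read last). The temporary files and their contents are hypothetical test data.

```shell
#!/bin/sh
# Sketch: key every word by its spelling with hyphens removed.
# wod-ed and wo-ded share the key "woded", so the hyphenated
# spelling read last overwrites the earlier one.
tmp=$(mktemp -d)
printf 'wod-ed\n' > "$tmp/file1.txt"
printf 'wo-ded\n' > "$tmp/file2.txt"

result=$(awk '
    !($1 in a) || $1 ~ /-/ { key = $1; gsub(/-/, "", key); a[key] = $1 }
    END { for (k in a) print a[k] }
' "$tmp/file1.txt" "$tmp/file2.txt")

echo "$result"    # prints: wo-ded
rm -r "$tmp"
```

Only wo-ded is printed; wod-ed is silently lost, which is why differently hyphenated spellings have to be ruled out beforehand.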

Solution

Another awk solution, saved as npr.awk:

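# Store each word under its hyphen-free spelling. A hyphenated spelling
# always overwrites the stored one; a plain spelling is stored only if new.
!($1 in a) || $1 ~ /-/ {
    key = value = $1; gsub(/-/, "", key); a[key] = value
}
# Print one surviving spelling per key (awk visits keys in no fixed order).
END { for (k in a) print a[k] }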

$ awk -f npr.awk file1.txt file2.txt
text
word-ed
uncial
wor
wo-ded
word
fa-re
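Because the awk loop `for (k in a)` visits array keys in an unspecified order, the output above is not sorted. A minimal sketch of a reusable wrapper that runs the same awk logic and pipes the result through sort is shown below; the function name merge_words and the sample input files are hypothetical, not part of the answer.

```shell
#!/bin/sh
# Hypothetical helper: merge any number of word lists with the same
# awk logic as npr.awk, then sort so the output order is stable.
merge_words() {
    awk '
        # Keep the first plain spelling seen, but let any hyphenated
        # spelling replace what is stored under the same key.
        !($1 in a) || $1 ~ /-/ { key = $1; gsub(/-/, "", key); a[key] = $1 }
        END { for (k in a) print a[k] }
    ' "$@" | sort
}

# Usage with two small hypothetical input files.
tmp=$(mktemp -d)
printf 'fare\nword\ntext\n' > "$tmp/file1.txt"
printf 'fa-re\nword\nuncial\n' > "$tmp/file2.txt"
result=$(merge_words "$tmp/file1.txt" "$tmp/file2.txt")
printf '%s\n' "$result"
rm -r "$tmp"
```

With these inputs it prints fa-re, text, uncial and word, one per line. For the real files, `merge_words file1.txt file2.txt > output.txt` would write the merged list.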
