Pattern Matching: Exclude Lines with Duplicate Characters [Resolved]

Is there a regular expression that matches characters from a character set, but each character at most once? In other words, once a character has been matched, it is removed from the set.

If grep cannot do this, is there a built-in utility which can?

Example:

Characters to match only once:   spine

Input:

spine
spines
spin
pine
seep
spins

Output:

spine
spin
pine

EDIT:
There are many ways to achieve this output (one example below), but I'm looking for a way to do this without having to customize the command for each pattern I want to match.

grep '[spine]' input_file | grep -v 's.*s' | ... | grep -v 'e.*e'


Question Credit: Steven
Asked March 23, 2019
Posted Under: Unix Linux
2 Answers

With regular expressions in the mathematical sense, it's possible, but the size of the regular expressions grows exponentially relative to the size of the alphabet, so it isn't practical.
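
To make the growth concrete, here is a minimal sketch of a backreference-free expression for a set of only two characters, a and b (a toy set chosen for illustration, not the one from the question). The variable name pattern and the sample words are assumptions. Every order in which the set characters can appear has to be written out as its own alternative, so each character added to the set multiplies the number of alternatives.

```shell
# ERE without backreferences for the toy set {a, b}:
# the line must contain at least one of a/b, and each at most once.
# Branch 1: an 'a', optionally followed later by a single 'b'.
# Branch 2: a 'b', optionally followed later by a single 'a'.
pattern='^[^ab]*(a[^ab]*(b[^ab]*)?|b[^ab]*(a[^ab]*)?)$'

# Prints cat, bat and crab; abba repeats both letters, dog has neither.
printf '%s\n' cat bat abba dog crab | grep -E "$pattern"
```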

There's a simple way with negation and backreferences.

grep '[spine]' | grep -Ev '([spine]).*\1'

The first grep selects lines that contain at least one of the characters s, p, i, n, e; the second grep rejects lines in which any one of those characters occurs more than once. Together they allow spinal and spend, but not foobar (no character from the set) or see (e occurs twice).
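
Since the question asks for a command that does not need to be rewritten for each set, the two-grep pipeline above can be wrapped in a small shell function that takes the set as an argument. This is only a sketch: the name match_once is hypothetical, the function reads standard input, and it assumes the set consists of plain letters or digits (characters that are special inside a bracket expression, such as ], ^, - or \, are not handled). It also relies on grep -E accepting backreferences, as the command above does.

```shell
# match_once SET
# Hypothetical helper: read lines from standard input and print those that
# contain at least one character from SET and no character from SET twice.
# Assumption: SET holds only plain letters or digits.
match_once() {
    chars=$1
    # First grep: keep lines with at least one character from the set.
    # Second grep: drop lines where a set character is followed by itself.
    grep "[$chars]" | grep -Ev "([$chars]).*\\1"
}

# Sample input from the question; prints spine, spin and pine.
printf '%s\n' spine spines spin pine seep spins | match_once spine
```

To filter a file instead, redirect it into the function, for example match_once spine < input_file.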


credit: Gilles
Answered March 23, 2019