Java – How to separate multiple different words from a string (Java)

I’ve been struggling to figure out how to extract words of unknown length from a string of unknown length that I’m reading from a file. The words I want to extract are always separated by “.” and/or “&”, and the entire string is enclosed in quotation marks. For example: “.Word.Characters&Numeric&Letters.Typos&Mistypes.” I know the location of every “.” and “&” and the number of times they appear.

I want to enter the words into an array Example[i][j] based on whether they are separated by “.” or “&”. Words delimited by “.” would each be placed in the ith column of the array, and words linked by “&” would go into the jth rows of that column.

The input string can contain a large number of variable words. This means that there may be only one word of interest, or more than a hundred.

I prefer to use arrays to solve this problem. From what I’ve read, regular expressions will be slow, but will work. split() might also work, but I think I have to know in advance which words to look for.

From this string: “.Word.Characters&Numeric&Letters.Typos&Mistypes.” I would expect to get (without worrying about which is a row and which is a column):

[[Word],[null],[null]],
[[Characters],[Numeric],[Letters]],
[[Typos],[Mistypes],[null]]

From this string: “.Alpha.Beta.Zeta&Iota.” I would expect to get:

[[Alpha],[null]],
[[Beta],[null]],
[[Zeta],[Iota]]

// NumberOfPeriods tells me how many word "sections" are in the string
// Stor[] is an array that holds the string index locations of "."
for(int i=0; i<NumberOfPeriods; i++)
{
    int length = Stor[i];
    while(Line.charAt(length) != '"')
    {
        length++;
    }
    Example[i] = Line.substring(Stor[i], length);
}
This code can get the words separated by "." but not by "&"

// Stor[] is an array that holds all string index locations of '.'
// AmpStor[] is an array that holds all string index locations of '&'
int TotalLength = Stor[0];
int InnerLength = 0;
int OuterLength = 0;
while(Line.charAt(TotalLength) != '"')
{
    while(Line.charAt(OuterLength)!='.')
    {
        while(Line.charAt(InnerLength)!='&')
        {
            InnerLength++;
        }
        if(Stor[i] > AmpStor[i])
        {
            Example[i][j] = Line.substring(Stor[i], InnerLength);
        }
        if(Stor[i] < AmpStor[i])
        {
            Example[i][j] = Line.substring(AmpStor[i],InnerLength);
        }
        OuterLength++;
    }
}
Here I run into the issue of indexing into different parts of the array i & j

Solution

Here is how I would solve your problem (it is completely different from your code, but it works).

First, remove quotation marks and non-word characters at the beginning and end. This can be done using replaceAll:

String Formatted = Line.replaceAll( "(^\"[.&]*)|([.&]*\"$)", "" );

The regular expression in the first argument matches the double quotes at both ends together with any leading and trailing . and & characters. Because the second argument is an empty string, every match is replaced with nothing, so the method returns a new string with those characters removed.
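To make the effect of the stripping step concrete, here is a minimal sketch. The class name StripDemo and the helper strip are made up for illustration; only String.replaceAll and String.split are real library calls. Note that the pattern below contains no literal space, so a space inside the input would not be stripped.

```java
import java.util.Arrays;

class StripDemo {
    // Hypothetical helper: removes the enclosing quotes plus any
    // leading or trailing '.' and '&' characters.
    static String strip(String line) {
        return line.replaceAll("(^\"[.&]*)|([.&]*\"$)", "");
    }

    public static void main(String[] args) {
        String line = "\".Alpha.Beta.Zeta&Iota.\"";
        System.out.println(strip(line)); // prints Alpha.Beta.Zeta&Iota

        // Without stripping, a leading '.' produces an empty first element.
        System.out.println(Arrays.toString(".Alpha.Beta".split("\\."))); // prints [, Alpha, Beta]
    }
}
```

The second print shows why the leading delimiter is worth removing first: split keeps leading empty strings (it only discards trailing ones), so the first group would otherwise be empty.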

You can now use the split method to split this string at each . character. The output array can only be defined after this call, because its size depends on how many groups the split produces:

String[] StringGroups = Formatted.split( "\\." );
String[][] Elements = new String[StringGroups.length][];

Put an escaping backslash (\\) before the dot to indicate that the string should be split on the literal . character. This is needed because split takes a regular expression, and an unescaped . matches any character other than a newline.
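The difference between the escaped and unescaped dot is easy to check. The following is a small sketch; the class name DotSplitDemo and the helper parts are hypothetical, while Pattern.quote is the standard library method for quoting a literal.

```java
import java.util.regex.Pattern;

class DotSplitDemo {
    // Hypothetical helper: number of pieces split() returns for a given regex.
    static int parts(String text, String regex) {
        return text.split(regex).length;
    }

    public static void main(String[] args) {
        String text = "Alpha.Beta.Zeta";

        // Unescaped: every character counts as a delimiter, so only empty
        // strings are left, and split() drops trailing empty strings.
        System.out.println(parts(text, "."));                // prints 0

        // Escaped: split on literal dots only.
        System.out.println(parts(text, "\\."));              // prints 3

        // Pattern.quote builds the escaped form for you.
        System.out.println(parts(text, Pattern.quote("."))); // prints 3
    }
}
```

Pattern.quote can be a convenient choice when the delimiter comes from a variable and you are not sure whether it contains regex metacharacters.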

Now use the same split method to split each string in that array at each & . Add the result directly to your Elements array:

// Loop over the array
int MaxLength = 0;
for( int i = 0; i < StringGroups.length; i ++ ) {
   String StrGroup = StringGroups[ i ];
   String[] Group = StrGroup.split( "&" );
   Elements[ i ] = Group;

   // Measure the max length
   if( Group.length > MaxLength ) {
       MaxLength = Group.length;
   }
}

No \\ is needed for this input because & has no special meaning in a regular expression and only matches the & character. Now you just need to fill your data into an array. The MaxLength variable is used to pad the rows with null values. If you don’t need them, just remove it.

However, if you want a null value, iterate through your array of elements and copy the current row into a new array:

for( int i = 0; i < Elements.length; i ++ ) {
    String[] Current = Elements[ i ];
    String[] New = new String[ MaxLength ];

    // Copy existing values into new array, extra values remain null
    System.arraycopy( Current, 0, New, 0, Current.length );
    Elements[ i ] = New;
}

The Elements array now contains what you want.
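As an alternative to the manual System.arraycopy loop, the padding step could also be written with Arrays.copyOf, which pads the copy with null when the new length is larger than the original. This is only a sketch of the same idea; the class name PadDemo and the helper pad are hypothetical and not part of the answer's code.

```java
import java.util.Arrays;

class PadDemo {
    // Hypothetical helper: returns a copy in which every row has the
    // width of the longest row; the added cells are null.
    static String[][] pad(String[][] rows) {
        int width = 0;
        for (String[] row : rows) {
            width = Math.max(width, row.length);
        }
        String[][] padded = new String[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            // Arrays.copyOf fills the extra positions with null.
            padded[i] = Arrays.copyOf(rows[i], width);
        }
        return padded;
    }

    public static void main(String[] args) {
        String[][] rows = { { "Alpha" }, { "Beta" }, { "Zeta", "Iota" } };
        // prints [[Alpha, null], [Beta, null], [Zeta, Iota]]
        System.out.println(Arrays.deepToString(pad(rows)));
    }
}
```

Unlike the loop above, this version leaves the input array untouched and returns a new one, which may or may not matter for your use case.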

The complete executable code is as follows:

public class StringSplitterExample {
    public static void main( String[] args ) {
        test( "\".Word.Characters&Numeric&Letters.Typos&Mistypes.\"" );
        System.out.println(); // Line between
        test( "\".Alpha.Beta.Zeta&Iota.\"" );
    }

    public static void test( String Line ) {
        // Strip the enclosing quotes and any leading/trailing . or &
        String Formatted = Line.replaceAll( "(^\"[.&]*)|([.&]*\"$)", "" );
        String[] StringGroups = Formatted.split( "\\." );
        String[][] Elements = new String[StringGroups.length][];

        // Loop over the array
        int MaxLength = 0;
        for( int i = 0; i < StringGroups.length; i ++ ) {
            String StrGroup = StringGroups[ i ];
            String[] Group = StrGroup.split( "&" );
            Elements[ i ] = Group;

            // Measure the max length
            if( Group.length > MaxLength ) {
                MaxLength = Group.length;
            }
        }

        for( int i = 0; i < Elements.length; i ++ ) {
            String[] Current = Elements[ i ];
            String[] New = new String[ MaxLength ];

            // Copy existing values into new array, extra values remain null
            System.arraycopy( Current, 0, New, 0, Current.length );
            Elements[ i ] = New;
        }

        // Print each group on its own line
        for( String[] Group : Elements ) {
            for( String Element : Group ) {
                System.out.print( Element );
                System.out.print( " " );
            }
            System.out.println();
        }
    }
}

Output of this example:

Word null null 
Characters Numeric Letters 
Typos Mistypes null 

Alpha null 
Beta null 
Zeta Iota 

So this is doable, and you don’t even need to know where the . and & characters are in your string. Java finds them for you.
