Java-Capturing Groups

Capturing Groups

Earlier you used the group() method of a Matcher object to retrieve the subsequence matched by the entire pattern defined by the regular expression. The entire pattern represents what is called a capturing group, because the Matcher object captures the subsequence corresponding to the pattern match. Regular expressions can also define other capturing groups that correspond to parts of the pattern. Each pair of parentheses in a regular expression defines a separate capturing group in addition to the group that the whole expression defines.

For example, the pattern "(AAA)(BBB)(CCC)" defines three capturing groups other than the whole expression: one for the subexpression (AAA), one for the subexpression (BBB), and one for the subexpression (CCC). The Matcher object stores the subsequence matched by each capturing group, and you can retrieve each of them.
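As a minimal sketch of this idea, the snippet below checks that group() with no argument returns the same text as group(0), and then reads the three parenthesized groups one by one. The class name WholeMatchDemo and the helper describe are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeMatchDemo {

    // Hypothetical helper: joins the comparison result, the whole match,
    // and the three groups with '|' so the result is easy to inspect.
    static String describe(String input) {
        Matcher mat = Pattern.compile("(AAA)(BBB)(CCC)").matcher(input);
        if (!mat.find()) {
            return "no match";
        }
        // group() with no argument is equivalent to group(0)
        boolean same = mat.group().equals(mat.group(0));
        return same + "|" + mat.group(0) + "|" + mat.group(1)
                + "|" + mat.group(2) + "|" + mat.group(3);
    }

    public static void main(String[] args) {
        // prints true|AAABBBCCC|AAA|BBB|CCC
        System.out.println(describe("xxAAABBBCCCyy"));
    }
}
```

The surrounding "xx" and "yy" are not part of any group, because find() reports only the text the pattern actually matched.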

To retrieve the text matching a particular capturing group, you need a way to identify the capturing group that you are interested in. To this end, capturing groups are numbered. The capturing group for the whole regular expression is always number 0. The other groups are numbered by counting their opening parentheses from the left in the regular expression. Thus, the first opening parenthesis from the left corresponds to capturing group 1, the second corresponds to capturing group 2, and so on for as many opening parentheses as there are in the whole expression. The following figure illustrates how the groups are numbered in an arbitrary regular expression.
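The numbering rule can be tried out on a small nested pattern. The sketch below uses the pattern "((A)(B(C)))", which is chosen here only for illustration, as are the class name GroupNumberingDemo and the helper nestedGroups.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {

    // Hypothetical helper: returns groups 0..4 of "((A)(B(C)))" for the first match.
    static String[] nestedGroups(String input) {
        // Opening parentheses counted from the left:
        //   1 = ((A)(B(C)))   2 = (A)   3 = (B(C))   4 = (C)
        Matcher mat = Pattern.compile("((A)(B(C)))").matcher(input);
        if (!mat.find()) {
            return new String[0];
        }
        return new String[] {
            mat.group(0), mat.group(1), mat.group(2), mat.group(3), mat.group(4)
        };
    }

    public static void main(String[] args) {
        String[] groups = nestedGroups("ABC");
        for (int i = 0; i < groups.length; i++) {
            System.out.println("Group " + i + " : " + groups[i]);
        }
        // Group 0 : ABC
        // Group 1 : ABC
        // Group 2 : A
        // Group 3 : BC
        // Group 4 : C
    }
}
```

Group 1 has the same text as group 0 here only because the outermost parentheses enclose the entire pattern. An inner group is numbered by where its opening parenthesis appears, not by how deeply it is nested.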

As you see, it's easy to number the capturing groups as long as you can count left parentheses. To retrieve the text matching a particular capturing group after the find() method returns true, you call the group(int groupnum) method for the Matcher object with the group number as the argument. The groupCount() method for the Matcher object returns a value of type int that specifies the number of capturing groups within the pattern, that is, excluding group 0, which corresponds to the whole pattern. Therefore, you have all you need to access the text corresponding to any or all of the capturing groups in a regular expression.
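One way to combine find(), group(int), and groupCount() is to collect every group of every match into a list. The following is only a sketch: the class name AllGroupsDemo, the helper allGroups, and the sample pattern "(\\d+)-(\\d+)" are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllGroupsDemo {

    // Hypothetical helper: one inner list per match, holding groups 0..groupCount().
    static List<List<String>> allGroups(String regex, String input) {
        Matcher mat = Pattern.compile(regex).matcher(input);
        List<List<String>> result = new ArrayList<>();
        while (mat.find()) {
            List<String> groups = new ArrayList<>();
            // groupCount() excludes group 0, so use <= to include the last group
            for (int i = 0; i <= mat.groupCount(); i++) {
                groups.add(mat.group(i)); // null if the group did not take part in the match
            }
            result.add(groups);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [[10-20, 10, 20], [30-40, 30, 40]]
        System.out.println(allGroups("(\\d+)-(\\d+)", "10-20 and 30-40"));
    }
}
```

Note that groupCount() depends only on the pattern, not on the input, so it returns 2 for this pattern even before find() has been called.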


Program Source

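import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Javaapp {

    public static void main(String[] args) {

        // Three sibling groups: 1 = (AAA), 2 = (BBB), 3 = (CCC)
        Pattern pat = Pattern.compile("(AAA)(BBB)(CCC)");
        Matcher mat = pat.matcher("AAABBBCCC AAABBBCCC");

        // The input contains two matches, so the loop body runs twice
        while (mat.find())
        {
            // Group 0 is the whole match; groupCount() excludes it, hence <=
            for (int i = 0; i <= mat.groupCount(); i++)
            {
                System.out.println("Group " + i + " : " + mat.group(i));
            }
        }
    }
}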

Program Two Source

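import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Javaapp {

    public static void main(String[] args) {

        // Nested groups: 1 = (AAA(BBB(CCC))), 2 = (BBB(CCC)), 3 = (CCC)
        Pattern pat = Pattern.compile("(AAA(BBB(CCC)))");
        Matcher mat = pat.matcher("AAABBBCCC AAABBBCCC");

        // The input contains two matches, so the loop body runs twice
        while (mat.find())
        {
            // Group 0 is the whole match; groupCount() excludes it, hence <=
            for (int i = 0; i <= mat.groupCount(); i++)
            {
                System.out.println("Group " + i + " : " + mat.group(i));
            }
        }
    }
}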
