Supplementary Character Handling Methods

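The Character class wraps a value of the primitive type char in an object. In J2SE 5.0, many methods were added to the Character class to support supplementary characters. These are Unicode characters above U+FFFF, which do not fit in a single 16-bit char and are therefore stored in UTF-16 as a pair of surrogate char values (a high surrogate followed by a low surrogate). A few more of these methods, including highSurrogate, lowSurrogate, isBmpCodePoint, and isSurrogate, were added in Java SE 7. The following table lists some of the commonly used methods.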
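
The distinction matters because String.length() counts char values (UTF-16 code units), not characters. The minimal sketch below illustrates this, assuming a hypothetical demo class named SupplementaryBasics and using U+1F600 as an arbitrary supplementary character. It relies only on standard String and Character methods.

```java
// Hypothetical demo class; the name SupplementaryBasics is illustrative only.
public class SupplementaryBasics {

    public static void main(String[] args) {
        // "A" followed by U+1F600, a supplementary character outside the BMP
        String s = "A" + new String(Character.toChars(0x1F600));

        // length() counts UTF-16 code units (chars), not characters
        System.out.println("chars       : " + s.length());                      // 3
        // codePointCount counts code points; the surrogate pair counts once
        System.out.println("code points : " + s.codePointCount(0, s.length())); // 2
        // charAt(1) returns only the high (leading) surrogate of the pair
        System.out.println("charAt(1) is high surrogate: "
                + Character.isHighSurrogate(s.charAt(1)));                      // true
        // codePointAt(1) combines the pair back into the full code point
        System.out.println("codePointAt(1) = U+"
                + Integer.toHexString(s.codePointAt(1)).toUpperCase());         // U+1F600
    }
}
```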


Supplementary Character Handling Methods

Method | Description
static char[] toChars(int codePoint) | Converts the specified character (Unicode code point) to its UTF-16 representation stored in a char array. If the code point is a BMP (Basic Multilingual Plane, or Plane 0) value, the resulting array contains a single char equal to codePoint. If it is a supplementary code point, the array contains the corresponding surrogate pair.
static int toChars(int codePoint, char[] dst, int dstIndex) | Converts the specified character (Unicode code point) to its UTF-16 representation. If the code point is a BMP value, it is stored in dst[dstIndex] and 1 is returned. If it is a supplementary code point, its surrogate values are stored in dst[dstIndex] (high surrogate) and dst[dstIndex+1] (low surrogate), and 2 is returned.
static int toCodePoint(char high, char low) | Converts the specified surrogate pair to its supplementary code point value. The pair is not validated, so call isSurrogatePair first if the input may be malformed.
static int charCount(int codePoint) | Determines the number of char values needed to represent the specified character (Unicode code point). Returns 2 if codePoint is equal to or greater than 0x10000, and 1 otherwise. The argument is not checked for being a valid code point.
static int codePointAt(char[] a, int index) | Returns the code point at the given index of the char array. If the char at index is in the high-surrogate range, index + 1 is less than the length of the array, and the char at index + 1 is in the low-surrogate range, the supplementary code point for that surrogate pair is returned. Otherwise, the char value at index is returned.
static int codePointAt(char[] a, int index, int limit) | Returns the code point at the given index of the char array, where only array elements with an index less than limit can be used. If the char at index is in the high-surrogate range, index + 1 is less than limit, and the char at index + 1 is in the low-surrogate range, the supplementary code point for that surrogate pair is returned. Otherwise, the char value at index is returned.
static int codePointAt(CharSequence seq, int index) | Returns the code point at the given index of the CharSequence. If the char at index is in the high-surrogate range, index + 1 is less than the length of the sequence, and the char at index + 1 is in the low-surrogate range, the supplementary code point for that surrogate pair is returned. Otherwise, the char value at index is returned.
static int codePointBefore(char[] a, int index) | Returns the code point preceding the given index of the char array. If the char at (index - 1) is in the low-surrogate range, (index - 2) is not negative, and the char at (index - 2) is in the high-surrogate range, the supplementary code point for that surrogate pair is returned. Otherwise, the char value at (index - 1) is returned.
static int codePointBefore(char[] a, int index, int start) | Returns the code point preceding the given index of the char array, where only array elements with an index greater than or equal to start can be used. If the char at (index - 1) is in the low-surrogate range, (index - 2) is not less than start, and the char at (index - 2) is in the high-surrogate range, the supplementary code point for that surrogate pair is returned. Otherwise, the char value at (index - 1) is returned.
static int codePointBefore(CharSequence seq, int index) | Returns the code point preceding the given index of the CharSequence. If the char at (index - 1) is in the low-surrogate range, (index - 2) is not negative, and the char at (index - 2) is in the high-surrogate range, the supplementary code point for that surrogate pair is returned. Otherwise, the char value at (index - 1) is returned.
static int codePointCount(char[] a, int offset, int count) | Returns the number of Unicode code points in a subarray of the char array. The offset argument is the index of the first char of the subarray, and count is the length of the subarray in chars. Unpaired surrogates within the subarray count as one code point each.
static int codePointCount(CharSequence seq, int beginIndex, int endIndex) | Returns the number of Unicode code points in the text range of the specified char sequence. The range begins at beginIndex and extends to the char at index endIndex - 1, so its length in chars is endIndex - beginIndex. Unpaired surrogates within the range count as one code point each.
static char highSurrogate(int codePoint) | Returns the leading surrogate (a high-surrogate code unit) of the surrogate pair that represents the specified supplementary character (Unicode code point) in UTF-16. If the specified character is not a supplementary character, an unspecified char is returned.
static char lowSurrogate(int codePoint) | Returns the trailing surrogate (a low-surrogate code unit) of the surrogate pair that represents the specified supplementary character (Unicode code point) in UTF-16. If the specified character is not a supplementary character, an unspecified char is returned.
static boolean isHighSurrogate(char ch) | Determines whether the given char value is a Unicode high-surrogate code unit (also known as a leading-surrogate code unit). Such values do not represent characters by themselves; they are used to represent supplementary characters in UTF-16.
static boolean isLowSurrogate(char ch) | Determines whether the given char value is a Unicode low-surrogate code unit (also known as a trailing-surrogate code unit). Such values do not represent characters by themselves; they are used to represent supplementary characters in UTF-16.
static boolean isBmpCodePoint(int codePoint) | Determines whether the specified character (Unicode code point) is in the Basic Multilingual Plane (BMP), that is, U+0000 to U+FFFF. Such code points can be represented using a single char.
static boolean isSupplementaryCodePoint(int codePoint) | Determines whether the specified character (Unicode code point) is in the supplementary character range, U+10000 to U+10FFFF.
static boolean isSurrogate(char ch) | Determines whether the given char value is a Unicode surrogate code unit. A char value is a surrogate code unit if and only if it is either a low-surrogate or a high-surrogate code unit.
static boolean isSurrogatePair(char high, char low) | Determines whether the specified pair of char values is a valid Unicode surrogate pair (a high surrogate followed by a low surrogate).
static boolean isValidCodePoint(int codePoint) | Determines whether the specified code point is a valid Unicode code point value, that is, within 0x0 to 0x10FFFF inclusive.
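
The program below does not exercise every method in the table. As a rough sketch of some of the remaining ones (charCount, the buffer form of toChars, the limit and start overloads of codePointAt and codePointBefore, and codePointCount on a char array), the hypothetical class MoreSurrogateMethods fills a three-char buffer with 'A' followed by U+1F600 and then queries it. The class name and sample characters are illustrative assumptions; only java.lang.Character methods from the table are called.

```java
// Illustrative class name; only java.lang.Character methods from the table are used.
public class MoreSurrogateMethods {

    public static void main(String[] args) {
        int smile = 0x1F600;   // supplementary code point (needs 2 chars)
        int a = 'A';           // BMP code point (needs 1 char)

        // charCount tells how many chars each code point needs
        System.out.println(Character.charCount(a));      // 1
        System.out.println(Character.charCount(smile));  // 2

        // toChars(codePoint, dst, dstIndex) writes into an existing buffer and
        // returns how many chars it wrote, so calls can be chained to append
        char[] buf = new char[3];
        int pos = 0;
        pos += Character.toChars(a, buf, pos);      // writes buf[0]
        pos += Character.toChars(smile, buf, pos);  // writes buf[1] (high) and buf[2] (low)
        System.out.println(pos);                    // 3

        // With limit = 2, index 2 may not be read, so only the lone
        // high surrogate at index 1 is returned
        System.out.println(Character.codePointAt(buf, 1, 2) == buf[1]);      // true
        // With limit = 3, the full pair is combined
        System.out.println(Character.codePointAt(buf, 1, 3) == smile);       // true

        // With start = 2, index 1 may not be read, so only the low surrogate is returned
        System.out.println(Character.codePointBefore(buf, 3, 2) == buf[2]);  // true
        System.out.println(Character.codePointBefore(buf, 3, 0) == smile);   // true

        // The whole buffer holds 2 code points. The subarray buf[0..1] also counts 2,
        // because 'A' and the unpaired high surrogate each count as one
        System.out.println(Character.codePointCount(buf, 0, 3));  // 2
        System.out.println(Character.codePointCount(buf, 0, 2));  // 2
    }
}
```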


Program

Program Source

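public class Javaapp {

    public static void main(String[] args) {

        // 164119 (U+28117) lies outside the BMP, so toChars returns a surrogate pair
        char[] getchar = Character.toChars(164119);
        System.out.println("164119-> " + String.valueOf(getchar));

        // Build the same surrogate pair by hand, then join it back into a code point
        char[] getsurrogate = new char[2];
        getsurrogate[0] = Character.highSurrogate(164119);
        getsurrogate[1] = Character.lowSurrogate(164119);
        int codepoint = Character.toCodePoint(getsurrogate[0], getsurrogate[1]);
        System.out.print(codepoint + "-> ");
        System.out.println(Character.toChars(codepoint));

        // Index 0 starts a valid pair (164119); index 1 is a lone low surrogate (56599)
        System.out.println("1  : " + Character.codePointAt(getsurrogate, 0));
        System.out.println("2  : " + Character.codePointAt(getsurrogate, 1));

        // Before index 1 is only the high surrogate (55392); before index 2 is the pair (164119)
        System.out.println("3  : " + Character.codePointBefore(getsurrogate, 1));
        System.out.println("4  : " + Character.codePointBefore(getsurrogate, 2));

        // Element 0 is the high surrogate: true, then false
        System.out.println("5  : " + Character.isHighSurrogate(getsurrogate[0]));
        System.out.println("6  : " + Character.isLowSurrogate(getsurrogate[0]));

        // 35000 is at most 0xFFFF (BMP); 100000 is at least 0x10000 (supplementary): true, true
        System.out.println("7  : " + Character.isBmpCodePoint(35000));
        System.out.println("8  : " + Character.isSupplementaryCodePoint(100000));

        // true; then false, because a high surrogate paired with itself is not a valid pair
        System.out.println("9  : " + Character.isSurrogate(getsurrogate[0]));
        System.out.println("10 : " + Character.isSurrogatePair(getsurrogate[0], getsurrogate[0]));

        // Valid code points run from 0 to 0x10FFFF (1114111): true, false
        System.out.println("11 : " + Character.isValidCodePoint(1000000));
        System.out.println("12 : " + Character.isValidCodePoint(1500000));
    }
}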
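
To see where the numbers printed on lines 2 and 3 of the program come from, the UTF-16 surrogate arithmetic can be worked by hand. The sketch below is a minimal illustration, assuming a hypothetical class SurrogateMath with helper methods high, low, and combine (none of these are JDK names); it compares its results with Character.highSurrogate and Character.lowSurrogate. For 164119 (U+28117), subtracting 0x10000 leaves 98583; its top 10 bits are 96 and its bottom 10 bits are 279, giving a high surrogate of 0xD800 + 96 = 55392 (0xD860) and a low surrogate of 0xDC00 + 279 = 56599 (0xDD17).

```java
// Hypothetical helper class; the arithmetic follows the UTF-16 encoding rules
// and only illustrates what the Character methods compute.
public class SurrogateMath {

    // High (leading) surrogate: 0xD800 plus the top 10 bits of (codePoint - 0x10000)
    static char high(int codePoint) {
        return (char) (0xD800 + ((codePoint - 0x10000) >>> 10));
    }

    // Low (trailing) surrogate: 0xDC00 plus the bottom 10 bits of (codePoint - 0x10000)
    static char low(int codePoint) {
        return (char) (0xDC00 + ((codePoint - 0x10000) & 0x3FF));
    }

    // Inverse of high/low, comparable to Character.toCodePoint (no validation)
    static int combine(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        int cp = 164119; // U+28117, the code point used in the program above
        char h = high(cp);
        char l = low(cp);
        System.out.println((int) h + " " + (h == Character.highSurrogate(cp))); // 55392 true
        System.out.println((int) l + " " + (l == Character.lowSurrogate(cp)));  // 56599 true
        System.out.println(combine(h, l));                                       // 164119
    }
}
```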
