Java - Read and Write UTF-8 Encoded Data

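The DataOutputStream class provides a void writeUTF(String str) method that encodes a string in a slightly modified form of UTF-8. It first writes the number of encoded bytes (not the number of characters) as a two-byte unsigned short, and then writes the encoded bytes themselves to the underlying output stream. Because the length must fit in an unsigned short, the encoded form cannot exceed 65,535 bytes; a longer string causes a UTFDataFormatException.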
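The layout described above can be checked by writing into an in-memory buffer and inspecting the bytes. The following is a minimal sketch; the class name WriteUtfLayout and the helper methods encode and lengthPrefix are made up for this illustration and are not part of the Java API.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WriteUtfLayout {

    // Hypothetical helper: returns the raw bytes that writeUTF produces for s.
    static byte[] encode(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // An in-memory buffer does not fail on I/O, but writeUTF still
            // declares IOException (for example, when the string is too long).
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Hypothetical helper: reads the two-byte length prefix, high byte first.
    static int lengthPrefix(byte[] encoded) {
        return ((encoded[0] & 0xFF) << 8) | (encoded[1] & 0xFF);
    }

    public static void main(String[] args) {
        // 'A' takes 1 byte, U+00E8 takes 2 bytes, U+0906 takes 3 bytes.
        byte[] bytes = encode("A\u00E8\u0906");
        System.out.println("prefix = " + lengthPrefix(bytes)); // prints "prefix = 6"
        System.out.println("total  = " + bytes.length);        // prints "total  = 8"
    }
}
```

The prefix counts encoded bytes, so a three-character string yields a prefix of 6 here, and the total output is two bytes longer than the prefix.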

The DataInputStream class provides an instance method, String readUTF(), and a static method, String readUTF(DataInput in), for reading a string in this format from an input stream. Each of these first reads a two-byte unsigned short that tells it how many more bytes to read. These bytes are then read and decoded into a Java String. An EOFException is thrown if the stream ends before all the expected bytes have been read. If the bytes read cannot be interpreted as valid modified UTF-8, a UTFDataFormatException is thrown.

DataInputStream and DataOutputStream actually read and write a slight modification of the official UTF-8 format. They encode the null character (0x00) in two bytes rather than one, and they encode each supplementary character (above U+FFFF) as a surrogate pair of two three-byte sequences rather than as a single four-byte sequence.
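Two of the behaviours described above, the two-byte encoding of the null character and the EOFException on truncated input, can be observed with in-memory streams. This is only a sketch: the class name ReadUtfBehaviour and the helpers encode and decodeOrNull are hypothetical names chosen for the example, and returning null on EOFException is a simplification made here for demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class ReadUtfBehaviour {

    // Hypothetical helper: returns the raw bytes that writeUTF produces for s.
    static byte[] encode(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Hypothetical helper: decodes with readUTF, or returns null if the
    // stream ends before all the announced bytes have been read.
    static String decodeOrNull(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        } catch (EOFException e) {
            return null;
        } catch (IOException e) {
            // Includes UTFDataFormatException for malformed input.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The null character becomes 0xC0 0x80 after the length prefix 0x00 0x02.
        byte[] nul = encode("\0");
        System.out.println(Arrays.toString(nul)); // prints "[0, 2, -64, -128]"

        byte[] full = encode("hello");
        byte[] truncated = Arrays.copyOf(full, full.length - 2); // drop the last two bytes
        System.out.println(decodeOrNull(full));      // prints "hello"
        System.out.println(decodeOrNull(truncated)); // prints "null"
    }
}
```

Java prints bytes as signed values, so 0xC0 and 0x80 appear as -64 and -128 in the first line of output.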

Program Source

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class Javaapp {

    public static void main(String[] args) throws IOException {

        // Write the string as a two-byte length followed by its modified UTF-8 bytes.
        // try-with-resources closes the streams even if writeUTF throws.
        try (DataOutputStream dos = new DataOutputStream(new FileOutputStream("data.da"))) {
            dos.writeUTF("AआBèDशEæ");
        }

        // Read the length prefix, then decode that many bytes back into a String.
        // The console must support these characters for them to display correctly.
        try (DataInputStream dis = new DataInputStream(new FileInputStream("data.da"))) {
            System.out.println(dis.readUTF());
        }
    }
}
