Chapter 13: IO with Compression

13.1: Compression Classes

Java provides a set of classes in the package java.util.zip for writing and reading files and other data (e.g. data to be sent over a network) in compressed form. They all inherit from InputStream (IS) or OutputStream (OS).

During compression and decompression, a checksum is used to validate the data. A checksum is a number computed from a stream of data that acts as a fingerprint of its content. Different data will very probably produce different checksums. Therefore, after the data has been decompressed, a new checksum is computed from the result and compared with the original one. If they are the same, the data was very probably restored without error.

The getValue( ) method of interface java.util.zip.Checksum returns the checksum value. Two classes implement Checksum: java.util.zip.Adler32 and java.util.zip.CRC32. The latter is somewhat slower but more reliable at detecting errors.
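To make the Checksum interface concrete, here is a minimal sketch that feeds the same bytes to both implementations. The class name ChecksumDemo and the helper checksumOf are made up for this illustration; only CRC32, Adler32 and the Checksum methods come from java.util.zip.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ChecksumDemo {
    // Hypothetical helper: feeds the bytes to the given Checksum
    // implementation and returns the resulting value.
    static long checksumOf(Checksum algorithm, byte[] data) {
        algorithm.reset();
        algorithm.update(data, 0, data.length);
        return algorithm.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        // Both classes are used through the same interface.
        System.out.printf("CRC32:   %08x%n", checksumOf(new CRC32(), data));   // cbf43926
        System.out.printf("Adler32: %08x%n", checksumOf(new Adler32(), data)); // 091e01de
    }
}
```

Changing even one byte of the input changes both values, which is what makes the comparison after decompression meaningful.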

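The compression classes use checksums internally (the ZIP and GZIP formats both store a CRC-32 of the data), so you needn't invoke them explicitly. If you want a checksum of your own over everything that passes through a stream, for example over the whole zip file, you can wrap java.util.zip.ZipInputStream or java.util.zip.ZipOutputStream around a java.util.zip.CheckedInputStream or java.util.zip.CheckedOutputStream, which computes the checksum as the data flows through it.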
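The following sketch shows what the checked streams do on their own, without any compression involved. It writes bytes through a CheckedOutputStream into memory, reads them back through a CheckedInputStream, and compares the two checksums. The class name CheckedStreamDemo and the method checksumsMatch are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

public class CheckedStreamDemo {
    // Hypothetical helper: returns true if the checksum computed while
    // writing equals the checksum computed while reading the same bytes.
    static boolean checksumsMatch(byte[] data) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            CheckedOutputStream out = new CheckedOutputStream(sink, new CRC32());
            out.write(data);
            out.close();
            long written = out.getChecksum().getValue();

            CheckedInputStream in = new CheckedInputStream(
                    new ByteArrayInputStream(sink.toByteArray()), new CRC32());
            byte[] buffer = new byte[1024];
            while (in.read(buffer) != -1) {
                // Reading is what updates the checksum; the bytes are discarded.
            }
            in.close();
            return written == in.getChecksum().getValue();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "A test of Java Zipping".getBytes(StandardCharsets.UTF_8);
        System.out.println("Checksums match: " + checksumsMatch(data));
    }
}
```

Note that the checksum only covers bytes that actually passed through the stream, so it should be queried after the stream has been fully written or read.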

There are two pairs of compression classes: a simple pair (GZIP) and a full-featured pair (Zip). The table below lists them together with the checked streams.

   Class                 Function                                       Inherits   Constructor arguments
   CheckedOutputStream   Generate checksum from the underlying stream   OS         OS, Checksum
   CheckedInputStream    Generate checksum from the underlying stream   IS         IS, Checksum
   ZipOutputStream       Compress data into zip file                    OS         OS
   ZipInputStream        Decompress data from zip file                  IS         IS
   GZIPOutputStream      Compress data into GZIP file                   OS         OS
   GZIPInputStream       Decompress data from GZIP file                 IS         IS

* ZipInputStream & ZipOutputStream

ZipInputStream and ZipOutputStream are the full-featured compression classes. They record and verify a CRC-32 checksum for each entry on their own; wrapping them around a CheckedInputStream/CheckedOutputStream is optional and only adds a separate checksum over the raw bytes of the zip file.

ZipOutputStream can compress multiple files together into one zip file. While writing a zip file, it separates the data of the individual files by starting a new ZipEntry with putNextEntry( ). A ZipEntry carries a name, which is normally the file name. Therefore, you can write data from different files into one ZipOutputStream and use the file names to divide the data.

Later, when you use ZipInputStream to read the data from the zip file, you can retrieve the ZipEntry objects one by one using getNextEntry( ). Then you can get filenames from these ZipEntry objects, and write the data after each ZipEntry into separate files.

The end of an entry's data is treated as end of file, so read( ) returning -1 signals the end of one file. Calling getNextEntry( ) then moves on to the next one.

You can also use the setComment( ) method of ZipOutputStream to write a comment for the zip file. When you open the zip file with WinZip, for example, you can see this comment. The comment is not accessible through ZipInputStream, although ZipFile offers getComment( ) since Java 7.

ZipEntry also contains methods which allow you to get and set the name, compressed and uncompressed sizes, date, CRC checksum, extra field data, comment and compression method, and to check whether it is a directory entry. The only checksum it supports is CRC-32.
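The way entries divide the data can be sketched entirely in memory, which avoids any dependence on files on disk. In this illustration the class name ZipEntryDemo and the methods zip and unzip are invented for the example; the map keys play the role of file names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipEntryDemo {
    // Hypothetical helper: writes each map entry as one zip entry.
    static byte[] zip(Map<String, String> files) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(sink)) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                out.putNextEntry(new ZipEntry(file.getKey())); // starts a new divider
                out.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Hypothetical helper: reads the entries back in archive order.
    static Map<String, String> unzip(byte[] archive) {
        Map<String, String> result = new LinkedHashMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            byte[] buffer = new byte[1024];
            while ((entry = in.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int n;
                while ((n = in.read(buffer)) != -1) { // -1 marks the end of this entry
                    content.write(buffer, 0, n);
                }
                result.put(entry.getName(),
                           new String(content.toByteArray(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("f1.txt", "When I was a lad of ten,");
        files.put("f2.txt", "my father said to me:");
        System.out.println(unzip(zip(files)));
    }
}
```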

Example:

   
   import java.io.*;
   import java.util.*;
   import java.util.zip.*;
   
   public class ZipCompress {
      public static void main(String[] args) 
      {
         try {
            // 1. Generating output zip file:
            String[] files = {"f1.txt", "f2.txt", "f3.txt"};
            CheckedOutputStream csum1 = new CheckedOutputStream(
                         new FileOutputStream("test.zip"), new Adler32());
            ZipOutputStream out = new ZipOutputStream(
                                  new BufferedOutputStream(csum1));
            out.setComment("A test of Java Zipping");
   
            // 2. Writing files into zip file. Byte streams are used so that
            //    the content is copied unchanged, whatever its encoding:
            for(int i = 0; i < files.length; i++)
            {
               BufferedInputStream in = new BufferedInputStream(
                                        new FileInputStream(files[i]));
               out.putNextEntry(new ZipEntry(files[i]));
               int c;
               while((c = in.read()) != -1)
                  out.write(c);
               in.close();
               out.closeEntry();
            }
   
            // Closing flushes the buffer, so the checksum is complete only now:
            out.close();
            System.out.println("Checksum after writing into zip file: "
                                + csum1.getChecksum().getValue());
   
            // 3. Connecting to the zip file:
            CheckedInputStream csum2 = new CheckedInputStream(
                                 new FileInputStream("test.zip"), new Adler32());
            ZipInputStream in2 = new ZipInputStream(
                                 new BufferedInputStream(csum2));
   
            // 4. Reading from the zip file and displaying the content:
            ZipEntry ze1;
   
            while((ze1 = in2.getNextEntry()) != null)
            {
               System.out.println("ZipEntry is " + ze1);
               int x;
               while((x = in2.read()) != -1)   // -1 marks the end of this entry
                  System.out.write(x);
               System.out.println("\n");
            }
   
            in2.close();
   
            // 5. Alternative way to get ZipEntry:
            System.out.println("Alternative way to get ZipEntry:");
            ZipFile zf1 = new ZipFile("test.zip");
            Enumeration<? extends ZipEntry> entries = zf1.entries();
   
            while(entries.hasMoreElements())
            {
               ZipEntry ze2 = entries.nextElement();
               System.out.print(ze2 + " ");
            }
            zf1.close();
         }
         catch(Exception e) {
            e.printStackTrace();
         }
      }
   }

Output will be:

   
   Checksum after writing into zip file: 1843876626
   ZipEntry is f1.txt
   When I was a lad of ten,
   ZipEntry is f2.txt
   my father said to me:
   ZipEntry is f3.txt
   Come here and take a lesson
   Alternative way to get ZipEntry:
   f1.txt f2.txt f3.txt

In this example, when ZipInputStream reads out the data separated by the different ZipEntry objects, it does not create a new file for each entry (e.g. with new FileOutputStream(ze1.getName())) and write the data into it, as the WinZip software package does. It only displays the data read from the compressed zip file.
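For comparison, extracting each entry into its own file could be sketched as follows. This is an illustration under stated assumptions rather than a complete unzip tool: the class name ZipExtract and the methods extract and demo are hypothetical, and the path check is a simple guard against entry names that point outside the target directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipExtract {
    // Writes every entry of zipFile to a file of the same name under targetDir.
    static void extract(Path zipFile, Path targetDir) {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(zipFile))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // Copies up to the end of the current entry; the stream stays open.
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical demo: zips one small file into a temporary directory,
    // extracts it again, and returns the extracted text.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("zipextract");
            Path zip = dir.resolve("test.zip");
            try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
                out.putNextEntry(new ZipEntry("sub/f1.txt"));
                out.write("When I was a lad of ten,".getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
            extract(zip, dir.resolve("out"));
            Path extracted = dir.resolve("out").resolve("sub").resolve("f1.txt");
            return new String(Files.readAllBytes(extracted), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```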

* GZIPInputStream & GZIPOutputStream

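GZIPInputStream and GZIPOutputStream are the simple compression classes. As you can see in the following example, they need no explicit checksum stream (the GZIP format carries its own CRC-32, which GZIPInputStream verifies), and they have no data dividers: one GZIP stream holds a single piece of data.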

   
   import java.io.*;
   import java.util.zip.*;
   
   public class GZIPcompress1 {
      public static void main(String[] args) {
         try {
            // Copy bytes, not characters, so the data is compressed unchanged:
            BufferedInputStream in = new BufferedInputStream(
                                     new FileInputStream("source.txt"));
            BufferedOutputStream out = new BufferedOutputStream(
                                       new GZIPOutputStream(
                                       new FileOutputStream("test.gz")));
            System.out.println("Writing file");
            int c;
   
            while((c = in.read()) != -1)
               out.write(c);
   
            in.close();
            out.close();   // also writes the GZIP trailer
            System.out.println("Reading file");
   
            // Decompress and decode as text (platform default charset):
            BufferedReader in2 = new BufferedReader(
                                 new InputStreamReader(
                                 new GZIPInputStream(
                                 new FileInputStream("test.gz"))));
            String s;
   
            while((s = in2.readLine()) != null)
               System.out.println(s);
   
            in2.close();
         }
         catch(Exception e) {
            e.printStackTrace();
         }
      }
   }
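The same round trip can be sketched in memory. The GZIP format stores a CRC-32 of the uncompressed data in its trailer, and GZIPInputStream checks it when it reaches the end of the stream, so the sketch also flips one bit of that stored value to show the check failing. The class name GzipRoundTrip and its methods are assumptions made for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(sink)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    static byte[] decompress(byte[] gz) {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toByteArray();
    }

    // Flips one bit in the CRC-32 stored in the trailer (the last 8 bytes are
    // CRC-32 followed by the uncompressed size) and reports whether
    // decompression then fails.
    static boolean detectsCorruption(byte[] data) {
        byte[] gz = compress(data);
        gz[gz.length - 8] ^= 0x01;
        try {
            decompress(gz);
            return false;
        } catch (UncheckedIOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] data = "Come here and take a lesson".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(decompress(compress(data)), StandardCharsets.UTF_8));
        System.out.println("Corruption detected: " + detectsCorruption(data));
    }
}
```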

13.2: Java Archive (JAR) Utility

Java archive (JAR) is a file format, with a utility of the same name, which packs a group of files into one, just like Zip. The format is cross-platform, so you needn't worry about platform issues. It can hold audio and image files as well as class files.

All of Java's libraries (class files) are provided in JAR files, e.g. in c:\jdk1.3\lib. Several nested directories, i.e. packages, can be packed inside one JAR file. As long as the CLASSPATH system variable points to the JAR file itself, not just to the directory containing it, the JVM will search all the packages and classes inside this JAR file.

A JAR file consists of a group of zipped files plus a "manifest" that describes them. You can designate your own manifest file when you run the JAR tool. If you don't, the JAR tool will generate a default one for you.
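Because a JAR file is a zip file with a manifest, the same structure can also be produced from Java code with the classes in java.util.jar, which extend their java.util.zip counterparts. The sketch below is an illustration only: the class name JarManifestDemo, the entry name hello.txt and the main class name com.example.Main are all made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class JarManifestDemo {
    // Builds a small JAR in memory whose manifest names the given main class.
    static byte[] buildJar(String mainClass) {
        Manifest manifest = new Manifest();
        Attributes attributes = manifest.getMainAttributes();
        // Without a Manifest-Version the main attributes are not written out.
        attributes.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attributes.put(Attributes.Name.MAIN_CLASS, mainClass);

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (JarOutputStream out = new JarOutputStream(sink, manifest)) {
            out.putNextEntry(new JarEntry("hello.txt")); // hypothetical content
            out.write("hello".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Reads the Main-Class attribute back from the manifest, or null if absent.
    static String readMainClass(byte[] jar) {
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jar))) {
            Manifest manifest = in.getManifest();
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readMainClass(buildJar("com.example.Main")));
    }
}
```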

The JAR tool is invoked on the command line:

   
   jar [options] [JAR filename] [manifest filename] [source filenames]

The JAR filename can either be a new file which is to be created, or an existing file which is to be listed or extracted. The following options are available:

   c    create a new archive
   t    list the contents of the following JAR file
   x    extract all files in the following JAR file
   x filename    extract only the named file in the following JAR file
   f    tells JAR that you will provide the source or destination file. Otherwise JAR will assume that its input (when listing or extracting) or output (when creating) is standard input or standard output
   m    tells JAR that you will designate your own manifest file. The manifest and JAR filenames must be given in the same order as the letters m and f
   v    create verbose output describing what JAR is doing
   0    (zero) only store but do not compress the files, used to form a file which you can put in your classpath
   M    tells JAR not to automatically create a manifest file

Examples:

   jar cf Test1.jar *.class    compress all class files into Test1.jar
   jar xf Test1.jar    extract all files in Test1.jar
   jar tf Test1.jar    list the contents of Test1.jar
   jar cfm Test1.jar Manif1.mf *.class    compress all class files into Test1.jar with designated manifest file Manif1.mf
   jar tvf Test1.jar    list the contents of Test1.jar, with more detailed description
   jar cvf Test1.jar dir1 dir2 dir3    combine subdirectories dir1, dir2 and dir3 into Test1.jar