Compression (part 1, basics), GZip

We have done all sorts of things with data now. Compression is next on the agenda. Whether compression is worth applying depends on quantity and speed: the more data you have, or the slower your network or hard disk is, the more attractive compression becomes.
Streaming media would still be in the stone age without sophisticated compression. And what about backup software?

For many local network messages we see a different picture. Compressing data and then sending it can take longer than sending the corresponding raw data straight away. For small packets it is definitely not worth it. You need larger amounts of data to make compression pay off.

Some files like MP3 or JPG should not be compressed. They are compressed already, so these files do not get smaller. You spend a lot of time compressing data just to end up with a file that is no smaller, or even slightly larger.
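Both effects, tiny packets and data that is compressed already, can be checked with a few lines. The following is only a sketch: the class name SizeCheck and its methods are made up for this illustration, and random bytes merely stand in for MP3/JPG content, which behaves similarly because it has little redundancy left. A GZip stream always carries a 10-byte header and an 8-byte trailer, so very small inputs cannot shrink.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class SizeCheck {
   // compresses a byte array in memory and returns the GZip data
   public static byte[] Compress(byte[] xData) {
      using (MemoryStream lTargetStream = new MemoryStream()) {
         using (GZipStream lCompressorStream = new GZipStream(lTargetStream, CompressionMode.Compress)) {
            lCompressorStream.Write(xData, 0, xData.Length);
         } // disposing the GZipStream flushes the remaining data and the trailer
         return lTargetStream.ToArray(); // ToArray() still works on a closed MemoryStream
      }
   } //

   public static void Run() {
      byte[] lSmall = new byte[10];          // tiny packet, all zeros
      byte[] lRepetitive = new byte[10000];  // large and highly redundant, all zeros
      byte[] lRandom = new byte[10000];      // assumption: random bytes stand in for MP3/JPG content
      new Random(1).NextBytes(lRandom);

      foreach (byte[] lData in new[] { lSmall, lRepetitive, lRandom }) {
         Console.WriteLine(lData.Length + " -> " + Compress(lData).Length + " bytes");
      }
   } //
} //
```

Call SizeCheck.Run() from a Main method. Only the large, redundant array should come out smaller; the tiny packet and the random data grow.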

The .NET Framework supports GZip and Zip.
Today I will show you how GZip can be used to compress and decompress data easily with some built-in classes of the .NET Framework. I added a Stopwatch to show how much time you would spend packing data. Do the maths and see how reasonable compression is compared to your device or network speed.

In a real application you obviously would not use a MemoryStream to hold the byte data. You can replace it with e.g. a NetworkStream or a FileStream.
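As a rough sketch of that replacement, the same GZipStream can sit on top of a FileStream. The class GZipFile and its method names are made up for this illustration, and the paths are whatever your application supplies.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class GZipFile {
   // reads xSourcePath and writes a GZip-compressed copy to xTargetPath
   public static void CompressFile(string xSourcePath, string xTargetPath) {
      using (FileStream lSourceStream = File.OpenRead(xSourcePath))
      using (FileStream lTargetStream = File.Create(xTargetPath))
      using (GZipStream lCompressorStream = new GZipStream(lTargetStream, CompressionMode.Compress)) {
         lSourceStream.CopyTo(lCompressorStream);
      }
   } //

   // reads the GZip file xSourcePath and writes the raw data to xTargetPath
   public static void DecompressFile(string xSourcePath, string xTargetPath) {
      using (FileStream lSourceStream = File.OpenRead(xSourcePath))
      using (GZipStream lDecompressorStream = new GZipStream(lSourceStream, CompressionMode.Decompress))
      using (FileStream lTargetStream = File.Create(xTargetPath)) {
         lDecompressorStream.CopyTo(lTargetStream);
      }
   } //
} //
```

CopyTo() moves the data in chunks, so the whole file never has to fit into memory at once.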

//byte[] lData = new byte[cDataSize];
//lDecompressorStream.Read(lData, 0, cDataSize);

The above lines are commented out in the following source code example. The reason is that the program does not use any data header that tells the raw data size in advance, so a buffer for the final decompressed size cannot be allocated up front. The example would run smoothly with these lines, because the size happens to be known in this case, but it would be a questionable example.
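If you do want to allocate the exact buffer, one option is to send such a header yourself. The sketch below assumes a made-up frame layout of a 4-byte raw length followed by the GZip data. This layout is not part of GZip, and LengthPrefixedGZip, Pack and Unpack are names invented for this illustration. Note that Read() may return fewer bytes than requested, hence the loop.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class LengthPrefixedGZip {
   // hypothetical frame layout: [4 bytes raw length][GZip data]
   public static byte[] Pack(byte[] xData) {
      using (MemoryStream lTargetStream = new MemoryStream()) {
         // BitConverter uses the byte order of the machine; a real protocol should fix the order
         byte[] lHeader = BitConverter.GetBytes(xData.Length);
         lTargetStream.Write(lHeader, 0, lHeader.Length);
         using (GZipStream lCompressorStream = new GZipStream(lTargetStream, CompressionMode.Compress)) {
            lCompressorStream.Write(xData, 0, xData.Length);
         }
         return lTargetStream.ToArray();
      }
   } //

   public static byte[] Unpack(byte[] xPacked) {
      int lLength = BitConverter.ToInt32(xPacked, 0); // raw size from the header
      byte[] lData = new byte[lLength];               // exact allocation is possible now
      using (MemoryStream lSourceStream = new MemoryStream(xPacked, 4, xPacked.Length - 4))
      using (GZipStream lDecompressorStream = new GZipStream(lSourceStream, CompressionMode.Decompress)) {
         int lOffset = 0;
         while (lOffset < lLength) {
            // Read() may deliver less than requested, so keep reading until the buffer is full
            int lRead = lDecompressorStream.Read(lData, lOffset, lLength - lOffset);
            if (lRead == 0) throw new EndOfStreamException("compressed data ended early");
            lOffset += lRead;
         }
      }
      return lData;
   } //
} //
```

The example program below avoids the header and copies into a growing MemoryStream instead, which is simpler when the size is unknown.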

using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;
using System.Linq;

// wrapper class so that the snippet compiles; the class name is arbitrary
public static class GZipExample {
   private const int cDataSize = 10000;
   private const int cIterations = 1000;

   public static void Test() {
      Stopwatch lWatch = new Stopwatch();
      lWatch.Start();

      // the data is generated once, so every iteration compresses the same bytes
      byte[] lSource = GenerateRandomData(cDataSize);
      for (int i = 0, n = cIterations; i < n; i++) {
         //byte[] lSource = GenerateRandomData(cDataSize);
         byte[] lCompressed = Compress(lSource);
         byte[] lTarget = Decompress(lCompressed);

         if (i < n - 5) continue; // we just print the last 5 results
         // compare the result with the original data
         Console.WriteLine("before: " + string.Join(" ", lSource.Take(20)) + " ...  ,length: " + lSource.Length);
         Console.WriteLine("after:  " + string.Join(" ", lTarget.Take(20)) + " ...  ,length: " + lTarget.Length);
         Console.WriteLine("compressed size was: " + lCompressed.Length + " = " + (lCompressed.Length/(double)lSource.Length).ToString("0.00%"));
         Console.WriteLine();
      }

      lWatch.Stop();
      Console.WriteLine();
      Console.WriteLine("time elapsed: " + lWatch.ElapsedMilliseconds.ToString("#,##0") + " ms");
      Console.WriteLine("iterations: " + cIterations.ToString("#,##0"));

      Console.ReadLine();
   } //


   // creates random data with frequent repetitions (only the values 0 to 9 occur)
   private static byte[] GenerateRandomData(int xDataSize) {
      byte[] lData = new byte[xDataSize];
      Random lRandom = new Random(); // the parameterless constructor picks its own seed
      for (int i = 0, n = lData.Length; i < n; i++) {
         lData[i] = (byte)lRandom.Next(0, 10); // the upper bound is exclusive
      }

      return lData;
   } //


   private static byte[] Compress(byte[] xData) {
      MemoryStream lTargetStream = new MemoryStream();
      using (GZipStream lCompressorStream = new GZipStream(lTargetStream, CompressionMode.Compress)) {
         lCompressorStream.Write(xData, 0, xData.Length);
      } // disposing the GZipStream flushes the remaining data and closes lTargetStream

      return lTargetStream.ToArray(); // ToArray() still works on a closed MemoryStream
   } //

   private static byte[] Decompress(byte[] xCompressedData) {
      using (MemoryStream lSourceStream = new MemoryStream(xCompressedData))
      using (GZipStream lDecompressorStream = new GZipStream(lSourceStream, CompressionMode.Decompress))
      using (MemoryStream lTargetStream = new MemoryStream()) {
         //byte[] lData = new byte[cDataSize];
         //lDecompressorStream.Read(lData, 0, cDataSize);

         // the decompressed size is unknown, so we copy into a growing stream
         lDecompressorStream.CopyTo(lTargetStream);

         return lTargetStream.ToArray();
      }
   } //
} //

example output:
before: 8 5 6 1 2 3 0 8 8 6 5 5 0 4 4 8 3 2 9 0 … ,length: 10000
after: 8 5 6 1 2 3 0 8 8 6 5 5 0 4 4 8 3 2 9 0 … ,length: 10000
compressed size was: 5089 = 50.89%

before: 8 5 6 1 2 3 0 8 8 6 5 5 0 4 4 8 3 2 9 0 … ,length: 10000
after: 8 5 6 1 2 3 0 8 8 6 5 5 0 4 4 8 3 2 9 0 … ,length: 10000
compressed size was: 5089 = 50.89%

before: 8 5 6 1 2 3 0 8 8 6 5 5 0 4 4 8 3 2 9 0 … ,length: 10000
after: 8 5 6 1 2 3 0 8 8 6 5 5 0 4 4 8 3 2 9 0 … ,length: 10000
compressed size was: 5089 = 50.89%

before: 8 5 6 1 2 3 0 8 8 6 5 5 0 4 4 8 3 2 9 0 … ,length: 10000
after: 8 5 6 1 2 3 0 8 8 6 5 5 0 4 4 8 3 2 9 0 … ,length: 10000
compressed size was: 5089 = 50.89%

before: 8 5 6 1 2 3 0 8 8 6 5 5 0 4 4 8 3 2 9 0 … ,length: 10000
after: 8 5 6 1 2 3 0 8 8 6 5 5 0 4 4 8 3 2 9 0 … ,length: 10000
compressed size was: 5089 = 50.89%

time elapsed: 1,274 ms
iterations: 1,000