Packing and Unpacking: A Look Inside File Archivers

We’ve all probably used a file archiver before. WinRAR, Zip, and the like have become an essential part of our digital lives. But have you ever wondered how these programs work their magic? Suddenly, a huge file is shrunk to a fraction of its original size.

In this article, we dive into the fascinating topic of file compression, that is, shrinking files. We’ll go into the history of file compression, its theory, and its inner workings, plus some finer details. If you are a computer science student, you’ll especially want to stick around.

Let’s dive right in!

What Is File Compression? Plus, the History of File Compression

Most of the files we use, view, or download over the internet are compressed versions of the originals in one way or another. Compression saves us bandwidth, space, and even time, while still giving us more or less the same content as the original. So, you can “zip”, archive, or compress a folder (whatever you want to call it) to make the files inside it smaller. But how does file compression actually work, and what happens behind the scenes?


So, just what is the principle behind file compression? The idea goes back to Morse code, developed in the 1830s and 1840s. Morse code assigned shorter sequences to more frequent letters, which saved time and wire usage.

Compression in computing focuses on reducing data redundancy. As in Morse code, the most frequently occurring symbols can be assigned shorter codes. Huffman coding, developed in the 1950s, applies this idea systematically by building the codes from the actual symbol frequencies in the data. Then, in the late 1970s, the Lempel-Ziv family of algorithms revolutionized compression by identifying repeated patterns in data. That innovation led to PKZIP, the forerunner of modern zip programs.

How Compression Works

At the heart of compression is the basic unit of computing: the bit, a binary digit that is either a one or a zero. Without boring you to death, every file, or component of a file such as a pixel in an image or a sample in an audio file, is represented as a string of ones and zeros. Compression is all about either eliminating redundant ones and zeros from these strings or representing the strings more efficiently, for example with shorter codes.
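
To make the "string of ones and zeros" idea concrete, here is a minimal Python sketch. The helper name to_bits is our own, and we assume the text is stored as UTF-8 bytes; other encodings would give different bit patterns.

```python
def to_bits(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated 8-bit groups."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


# Two characters become sixteen bits.
print(to_bits("Hi"))  # 01001000 01101001
```

Every compression method discussed here is, in the end, a way of writing such bit strings with fewer bits.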

Let’s dive into the two main approaches adopted in the compression of data:

1. Removing Redundant Data

This is a common approach to compression. It can be achieved by finding repeating data patterns and replacing them with a short reference code. Thus, in a Word doc, repeated phrases, runs of spaces and tabs, logos, etc., can each be stored once and then replaced with a shorter code.
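
Here is a toy Python sketch of that idea: one repeated phrase is swapped for a one-character token, and a small lookup table remembers what the token stands for. The function names, the sample text, and the choice of "\x01" as the token are all assumptions made for illustration. Real archivers find the repeats automatically and use far more compact references.

```python
from __future__ import annotations


def substitute(text: str, phrase: str, token: str) -> tuple[str, dict[str, str]]:
    """Replace every occurrence of `phrase` with `token` and return the table."""
    if token in text:
        raise ValueError("token must not already occur in the text")
    return text.replace(phrase, token), {token: phrase}


def restore(packed: str, table: dict[str, str]) -> str:
    """Undo `substitute` by expanding each token back into its phrase."""
    for token, phrase in table.items():
        packed = packed.replace(token, phrase)
    return packed


original = "the quick brown fox; the quick brown dog; the quick brown cat"
packed, table = substitute(original, "the quick brown", "\x01")

assert restore(packed, table) == original  # nothing was lost
print(len(original), "->", len(packed))    # 61 -> 19
```

Note that the table has to be stored alongside the packed text, so the saving only pays off when a pattern repeats often enough.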

This can also be done through statistical methods that analyze the frequency of symbols such as letters and numbers in the data. Frequently occurring symbols are assigned shorter codes, while less frequent symbols are assigned longer codes.
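
Huffman coding is the classic example of this statistical approach: the more often a symbol occurs, the shorter its code. Below is a minimal Python sketch that builds such a code table with a priority queue. The function name huffman_codes and the sample string are ours, and a real compressor would also have to store the table and pack the bits into bytes.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict:
    """Build a prefix code in which frequent symbols get shorter bit strings."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    # The tie breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "aaaaaaaabbbc"  # 'a' is frequent, 'c' is rare
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(codes)
print(len(text) * 8, "bits ->", len(encoded), "bits")  # 96 bits -> 16 bits
```

In this sample, 'a' ends up with a one-bit code while 'b' and 'c' get two bits each, which is where the saving comes from.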

2. Representing Data More Efficiently

This can be achieved in two ways:

  • Quantization: For multimedia files such as images and audio, the number of bits used to represent each element (for example, a pixel in a picture or a sample in audio) can be reduced. Let’s say a pixel is stored in 24 bits. With quantization, some data is discarded so that the pixel takes only 8 bits, which reduces quality but makes the file smaller and cheaper to store and transmit. This is an example of lossy compression.
  • Run-length encoding: This focuses on sequences of identical values, i.e., instead of storing every repeated value, the system stores the value once along with the number of times it appears consecutively. This is useful for data with long stretches of the same value, for example, a monochrome image with large areas of a single color.
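
Both ideas fit in a few lines of Python. This is only a sketch under our own assumptions: the function names are made up, quantization is shown on a single 8-bit color channel cut down to 3 bits, and runs are kept as (value, count) pairs rather than in a packed binary format.

```python
def quantize(value: int, bits_in: int = 8, bits_out: int = 3) -> int:
    """Keep only the top `bits_out` bits of a `bits_in`-bit sample (lossy)."""
    return value >> (bits_in - bits_out)


def rle_encode(data: bytes) -> list:
    """Collapse runs of identical bytes into (value, count) pairs."""
    runs = []
    for b in data:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((b, 1))              # start a new run
    return runs


def rle_decode(runs: list) -> bytes:
    """Expand (value, count) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for value, count in runs)


# Quantization: two different shades collapse to the same level, so the
# original values cannot be recovered.
print(quantize(200), quantize(207))  # 6 6

# Run-length encoding: one row of a black-and-white image.
row = bytes([255] * 6 + [0] * 3 + [255])
runs = rle_encode(row)
print(runs)  # [(255, 6), (0, 3), (255, 1)]
assert rle_decode(runs) == row  # lossless round trip
```

Notice the difference: the run-length round trip gives back exactly what went in, while the quantized values have lost detail for good.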

These lead to another crucial area that you need to be aware of: lossy and lossless compression.

Lossy vs. Lossless

Compression can be lossless or lossy. Lossless means you don’t lose any data in the process of compression: the original can be rebuilt exactly, because the information is only arranged more efficiently. With methods like Lempel-Ziv and Huffman, data integrity is retained, but there is a limit on the compression ratio.

Lossy means that there is a trade-off between data integrity and the compression ratio. Thus, in the 1980s, lossy compression emerged, where multimedia files were compressed with some acceptable loss of data and, therefore, quality. That trade-off works for a high-quality painting or video, but not for something like a Microsoft Word file, where every character has to survive compression intact.

  • Lossless: Huffman Coding, LZW (used in GIF, TIFF), DEFLATE (used in ZIP)
  • Lossy: JPEG (images), MPEG (video), MP3 (audio)
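
You can check the lossless promise yourself with Python’s standard zlib module, which implements DEFLATE. The sample text below is made up and deliberately repetitive, so expect a much smaller ratio on real-world data.

```python
import zlib

# Highly repetitive sample data (an assumption chosen to compress well).
text = ("Compression removes redundancy. " * 200).encode("utf-8")

packed = zlib.compress(text, 9)    # 9 = strongest compression level
unpacked = zlib.decompress(packed)

assert unpacked == text            # lossless: every byte comes back
print(len(text), "bytes ->", len(packed), "bytes")
```

A lossy format offers no such guarantee: decoding a JPEG gives you an image that looks close to the original, not the identical bytes.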

In the 1990s and 2000s, there was a vast improvement in the range and quality of compression algorithms available.

The Impacts of Compression

Most of the time, we think about compression in terms of the documents we use on our PCs. However, compression has an impact well beyond personal computer usage.

For example, streaming services like Netflix and YouTube use aggressive lossy compression to stream efficiently at lower bandwidth. This is why these services can offer you a choice of video quality. However, the quality you actually get might also vary depending on other factors, such as network congestion and automatic bitrate adjustment.

You can now see that this is indeed a topic in computing worthy of research.

So, here’s a quick test question for you. Let’s say you watch a lossy-compressed video, such as a movie, on a 4K or 8K display. How would that impact your viewing experience? That’s some homework to jog your mind on how compression actually works beyond the PC.

Final Words

Fascinating, isn’t it? Bet you didn’t know that compression was such an intriguing subject. If you were curious, we hope you’ve learned a ton from this article.

Remember to use the principles explored here in your academic research on subjects such as lossy versus lossless compression, quantization, data encoding, and run-length encoding. Adios!
