What is File Compression? - Definition & Overview

In this lesson you will learn how file compression is used in different situations, some different types of compression models, and where each might be commonly seen.

The Basics of Compressing Data

Ding! A text message comes in from a family member on the other side of the country. You open the message and see a picture of … a hamburger. It looks really delicious and the picture is so clear that you start to wonder about lunch. You laugh about this photo as you video chat on your phone with your brother. It's amazing how these photos and videos can make their way to your phone from all the way across the country. All of this is made practical by file compression.

Data can be compressed by using statistics to find patterns of repetition in it. Redundant data is identified and replaced with shorter placeholders to reduce the overall size of the file. Let's use an example to make this clearer. Say we put the lyrics to the song 'Happy Birthday' into a file:

Happy birthday to you!

Happy birthday to you!

Happy birthday dear Johnny!

Happy birthday to you!

In this short song, we see a total of 81 characters (not including spaces), many of which belong to words that are repeated throughout the song. Specifically, 'Happy' and 'birthday' appear 4 times, and 'to' and 'you' appear 3 times. If we take these words and place them in a numbered list, we get this catalog:

1 Happy

2 birthday

3 to

4 you

We can use this catalog to replace the original words in our song lyrics, using the numbers as a placeholder. Now our song looks very different:

1 2 3 4!

1 2 3 4!

1 2 dear Johnny!

1 2 3 4!

Of course, the song itself does not make much sense without the catalog to translate the numbers back to the original words, so we need to include that in the new file we have created.

After all, what good is a compressed file if we can't decompress it, right? After replacing the words in the song with numbers, and including the catalog that explains which word corresponds to which number, the number of characters being used (and with it the file size) has been reduced by just over one-third!
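To see the catalog idea in action, here is a minimal Python sketch of the steps above. The names compress, decompress, and count_non_space are made up for this illustration, and the approach only works here because the lyrics contain no digits of their own. Real compression tools are far more sophisticated.

```python
import re

LYRICS = (
    "Happy birthday to you!\n"
    "Happy birthday to you!\n"
    "Happy birthday dear Johnny!\n"
    "Happy birthday to you!"
)

# The catalog from the lesson: each repeated word gets a number.
CATALOG = {"1": "Happy", "2": "birthday", "3": "to", "4": "you"}


def compress(text, catalog):
    """Replace every catalogued word with its number; leave other words alone."""
    word_to_number = {word: number for number, word in catalog.items()}
    return re.sub(
        r"[A-Za-z]+",
        lambda match: word_to_number.get(match.group(), match.group()),
        text,
    )


def decompress(text, catalog):
    """Replace every number with the word it stands for.

    Assumes the original text had no digits of its own.
    """
    return re.sub(r"\d+", lambda match: catalog[match.group()], text)


def count_non_space(text):
    """Count characters, ignoring spaces and line breaks."""
    return sum(1 for ch in text if not ch.isspace())


compressed = compress(LYRICS, CATALOG)
# The catalog has to travel with the compressed song, so it counts too.
catalog_text = "\n".join(f"{number} {word}" for number, word in CATALOG.items())

print(compressed)
print("original characters:  ", count_non_space(LYRICS))
print("compressed + catalog: ", count_non_space(compressed) + count_non_space(catalog_text))
print("round trip matches:   ", decompress(compressed, CATALOG) == LYRICS)
```

Running the sketch prints the numbered version of the song, the two character counts, and a check that decompressing gives back exactly the original lyrics.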

This example is simple, but consider performing this task on a longer document such as a term paper or a novel. More complex techniques exist for identifying patterns beyond just the number and type of words, which can allow for even greater compression of the file. You can see how this technique can take large amounts of data and reduce it by a considerable amount.

Lossless and Lossy Compression

In most cases, the benefit of file compression is only apparent if you can get your original data back after it has been compressed. That is to say, there should be no loss of the original information when compressing the file. This is referred to as lossless compression. It relies on breaking the file into smaller chunks and using catalogs of these chunks to represent the original data perfectly.
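As a quick illustration of lossless compression, the sketch below uses zlib, a module in Python's standard library. The lesson does not name any particular tool, so the choice of zlib and the repeated sample text are assumptions made for this example.

```python
import zlib

# Highly repetitive sample data: the same line of lyrics 200 times.
original = ("Happy birthday to you!\n" * 200).encode("utf-8")

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print("original size:  ", len(original), "bytes")
print("compressed size:", len(compressed), "bytes")
# Lossless means the restored data is identical to the original, byte for byte.
print("identical after decompressing:", restored == original)
```

Because the sample repeats itself so heavily, the compressed version is much smaller, yet decompressing it returns every byte of the original.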

Instances exist, however, where some loss of the original data is acceptable in exchange for a smaller file size, though this often depends on where the file is being used. Step-by-step procedures called algorithms determine what data is statistically similar enough that it can be combined to reduce the size of the file without changing it so much that the end user might notice. Smartphone users often encounter this when taking a photo with the camera and sending it via text message to a friend. The original image is intact on the phone in a high-quality format, but the photo that gets sent is compressed and delivered at a reduced quality to save time and file space. This is referred to as lossy compression. With this method, some of the original data is lost as part of the compression process, but the compressed version is often close enough to the original that you cannot tell the difference.
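The idea of combining values that are similar enough can be shown with a toy Python sketch. It treats a photo as a list of brightness values from 0 to 255 and groups them into buckets of 16. The function names, the sample values, and the bucket size are all assumptions chosen for illustration; this is not how any real photo format works.

```python
def quantize(values, step=16):
    """Keep only which step-sized bucket each value falls into (lossy)."""
    return [value // step for value in values]


def restore(buckets, step=16):
    """Approximate the original values using the middle of each bucket."""
    return [bucket * step + step // 2 for bucket in buckets]


# Hypothetical brightness values for eight neighboring pixels.
pixels = [200, 201, 203, 198, 54, 55, 57, 120]

buckets = quantize(pixels)   # 16 possible values instead of 256
approx = restore(buckets)    # close to the original, but not identical
errors = [abs(a - p) for a, p in zip(approx, pixels)]

print("original:", pixels)
print("restored:", approx)
print("largest error:", max(errors))
```

Each bucket number needs fewer bits to store than a full brightness value, and similar pixels now repeat exactly, which makes them easier to compress further. The price is that the restored values are only approximations, so the exact originals cannot be recovered.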

