Compressing and Archiving Files in Linux


Downloading and installing new software often involves multiple scripts and large files, which are usually combined and compressed into a single, smaller file.  In the Windows ecosystem, the .zip format is commonly used to combine and compress files for transfer over the net.  Linux provides users with a number of related commands that we will explore in this section.

What Is Compression?

The subject of compression could fill an entire book by itself, but for our purposes we only need a rudimentary understanding of the process. Compression, which can be classified as either lossy or lossless, is a process that makes data smaller so that it:

  • Requires less space to store; and
  • Is easier to send across the Internet.

Lossy Compression

Lossy compression is very effective at reducing file size, but at the cost of information integrity: the file after compression is not exactly the same as the original.  This works well for graphics, video, and audio files, where small differences are hardly noticeable; a missing pixel or a slightly changed note isn’t likely to be noticed.  Common lossy formats include .mp3, .mp4, and .jpg.  Because the lossy compression ratio is very high, the resulting file is significantly smaller than the original.  Lossy compression is not suitable for files or software where data integrity is critical.

Lossless Compression

When you send data whose integrity must be preserved exactly upon decompression, you need to use a lossless compression technique. Lossless compression does not shrink files as much as lossy compression, but in these situations integrity is more important than compression ratio.
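To make the idea of "lossless" concrete, here is a minimal sketch of a round trip through gzip. The scratch directory and the file names (sample.txt, original.txt) are made up for illustration; the point is only that the decompressed file matches the original byte for byte.

```shell
#!/bin/sh
# Sketch: show that a lossless round trip restores the original bytes exactly.
# The directory and file names below are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a sample text file with repetitive content.
i=0
while [ "$i" -lt 1000 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > sample.txt

cp sample.txt original.txt      # keep an untouched copy for comparison
gzip sample.txt                 # replaces sample.txt with sample.txt.gz
gunzip sample.txt.gz            # restores sample.txt

# cmp -s exits with status 0 only if the files are byte-for-byte identical.
if cmp -s original.txt sample.txt; then
    result="identical"
else
    result="different"
fi
echo "Round trip result: $result"
```

A lossy format could not pass this kind of comparison, because some of the original information is discarded during compression.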

Tarring Files Together

The first thing to do when compressing files is to assemble them into an archive, which usually involves the “tar” command. The name is a contraction of “tape archive”, a reference to the days when computer systems stored data on tape. The “tar” command creates a single file from many files; the result is referred to as an archive, tar file, or tarball.  For instance, say you had three files (also seen in the screenshot, where we ran the long listing command “ls -la”):

  • file1
  • file2
  • file3
Listing of three files in Desktop directory before running the “tar” command in Linux.

Before we send these three files to another person, let’s combine them into a single archive by running the following command:

tar -cvf newtarfile.tar  file1 file2 file3

Using the “tar” command with the “cvf” switches to create a tar archive.

Let’s review the “tar” command we used above:

  • tar: The archiving command in Linux is “tar”, and it is used here with three options:
    • The “c” option means create;
    • The “v” option (which stands for verbose and is optional) lists the files that tar is dealing with; and
    • The “f” option means write to the following file. This option also works for reading from files.
  • The name of the archive you want to create, in this case “newtarfile.tar”; and
  • The names of the files you want to include in the archive.

When you run the listing command “ls -la” again, you will see the new archive “newtarfile.tar” in the directory.

You should also notice that the archive file (the “tarball”) is 20,480 bytes, which is larger than the 10,240 bytes of the unarchived files. When archiving files, tar adds overhead: a header block for each file, plus padding to fill out fixed-size blocks.  As the size of the archived files increases, this overhead becomes a smaller percentage of the overall size of the archive.
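If you want to see this overhead for yourself, the following sketch compares the total size of three tiny files with the size of the tarball built from them. It runs in a throwaway directory, and the file contents are invented for the demonstration, so the byte counts will differ from the screenshot.

```shell
#!/bin/sh
# Sketch: measure how much larger a tarball is than the files it contains.
# Runs in a scratch directory; the file contents are made up.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

for name in file1 file2 file3; do
    printf 'hello\n' > "$name"          # each file holds 6 bytes
done

tar -cf newtarfile.tar file1 file2 file3

# wc -c counts bytes; tr strips the padding some wc versions add.
content_bytes=$(cat file1 file2 file3 | wc -c | tr -d ' ')
archive_bytes=$(wc -c < newtarfile.tar | tr -d ' ')

echo "file contents: $content_bytes bytes"
echo "tar archive:   $archive_bytes bytes"
```

With files this small, nearly all of the archive is headers and padding. GNU tar, with its default settings, also rounds the archive up to a multiple of 10,240 bytes, which is why small tarballs tend to come in those increments.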

You can display the contents of a tarball without extracting it by using the “tar” command with the “-t” (content list) switch, along with the “v” and “f” switches:

Using the “tar” command with the “tvf” switches to view the contents of a file archive.

The three files included in the archive are displayed and you can see their original sizes.
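Since the command itself only appears in the screenshot, here is a small self-contained sketch of the listing step. It first recreates the three files and the archive in a scratch directory (the file contents are placeholders), then lists the archive with “tvf”.

```shell
#!/bin/sh
# Sketch: list the contents of a tarball without extracting it.
# The scratch directory and file contents are placeholders.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

for name in file1 file2 file3; do
    echo "contents of $name" > "$name"
done
tar -cf newtarfile.tar file1 file2 file3

# t = list contents, v = verbose (permissions, owner, size, date), f = archive file
tar -tvf newtarfile.tar

# Without v, tar prints only the file names, one per line.
listed=$(tar -tf newtarfile.tar)
echo "$listed"
```

Listing an archive before extracting it is a good habit, since it shows exactly which paths will be written to disk.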

Extracting Files from a Tarball

Use the “-x” switch with the “tar” command to extract files, as seen in the screenshot below.  We have also used the “-v” switch to list the files as they are extracted:

Using the “tar” command with the “xvf” flags to extract files from an archive.

The files are extracted into the current directory. After extracting them, run the long listing command “ls -la” to confirm that the operation was successful; you should see that file1, file2, and file3 are back in the directory.
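The extraction step can be tried safely with a sketch like the one below. It builds the archive in a scratch directory, deletes the original files to mimic receiving only the tarball, and then extracts them again. The file contents are placeholders chosen for the demonstration.

```shell
#!/bin/sh
# Sketch: extract files from a tarball into the current directory.
# The scratch directory and file contents are placeholders.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

for name in file1 file2 file3; do
    echo "contents of $name" > "$name"
done
tar -cf newtarfile.tar file1 file2 file3

rm file1 file2 file3        # pretend we only received the tarball

# x = extract, v = list each file as it is extracted, f = archive file
tar -xvf newtarfile.tar

ls -la                      # confirm that the three files are back
```

Note that extraction leaves newtarfile.tar in place, so the listing shows the archive alongside the restored files.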

Compressing Files

Now that a file archive exists, how can we compress it to reduce storage space or improve transmission speed?  Linux has several commands capable of creating compressed files; each uses a different compression algorithm and achieves a different compression ratio.  They include:

  • gzip, which uses the extension .tar.gz or .tgz for a compressed tarball and falls between bzip2 and compress in both speed and resulting file size;
  • bzip2, which uses the extension .tar.bz2.  This is the slowest compression method, but produces the smallest files; and
  • compress, which uses the extension .tar.Z and is the fastest, but produces the largest files.
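To compare the three tools on your own system, you could try a sketch along these lines. It compresses the same tarball with whichever of the three utilities are installed and prints the resulting sizes. The sample data and the output names (compressed-by-gzip and so on) are invented for the comparison and are not the extensions the tools normally produce; the “-c” option, which writes to standard output and leaves the input file untouched, is used so that one tarball can be compressed several times.

```shell
#!/bin/sh
# Sketch: compress one tarball with gzip, bzip2, and compress, then compare sizes.
# Sample data and output file names are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Generate some repetitive sample data and archive it.
i=0
while [ "$i" -lt 2000 ]; do
    echo "record $i: some repetitive sample text for the compression test"
    i=$((i + 1))
done > file1
cp file1 file2
tar -cf newtarfile.tar file1 file2

original_size=$(wc -c < newtarfile.tar | tr -d ' ')
echo "uncompressed: $original_size bytes"

for tool in gzip bzip2 compress; do
    if command -v "$tool" >/dev/null 2>&1; then
        # -c writes the compressed data to stdout and keeps newtarfile.tar
        "$tool" -c newtarfile.tar > "compressed-by-$tool"
        size=$(wc -c < "compressed-by-$tool" | tr -d ' ')
        echo "$tool: $size bytes"
    else
        echo "$tool: not installed, skipped"
    fi
done
```

The exact ratios depend heavily on the data: highly repetitive text like this sample shrinks dramatically, while files that are already compressed (such as .jpg images) barely shrink at all.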

gzip Compression

The “gzip” command (GNU zip) is the most commonly used compression utility in Linux. Compress the newtarfile.tar archive with the following command:

gzip newtarfile.tar

After running the “ls -la” command, notice the changes shown in the screenshot: “newtarfile.tar” has become “newtarfile.tar.gz”, and the file size has fallen from 20,480 bytes to 178 bytes.

Using “gzip” command to compress a “tarball” file.

The “newtarfile.tar.gz” file can be uncompressed with the following command:

gunzip newtarfile.tar.gz

Compressing with bzip2

bzip2 is another commonly used Linux compression tool; it generally achieves better compression ratios than gzip, but is slower.  Run the command followed by the names of the files to compress, for example:

bzip2 newtarfile.tar

This replaces the file with “newtarfile.tar.bz2”.  Use bunzip2, followed by the names of the files to decompress, to uncompress files like so:

bunzip2 newtarfile.tar.bz2

The uncompressed file returns to its original size, and the file extension returns to .tar.

Compressing with compress

The “compress” command is probably the least used of the three compression utilities mentioned here. Use it like so:

compress newtarfile.tar

In addition to compressing the file, the command changes the file extension to .tar.Z.  To decompress a file made with compress, use the “uncompress” command (you can also use the “gunzip” command on files that have been compressed with compress):

uncompress newtarfile.tar.Z

dd: Creating Bit-by-Bit or Physical Copies of Storage Devices

The “dd” command makes a bit-by-bit copy of whatever it is asked to copy: a file, a filesystem, or an entire hard drive.  Unlike a copy made with the “cp” command, a dd copy of a storage device includes deleted files, which allows for easy discovery and recovery.

Forensic examiners and hackers use the “dd” command to copy entire hard drives or other storage devices to their own systems, making sure they capture deleted files and other artifacts that might be useful to them.  Do not use the “dd” command for typical day-to-day copying of files and storage devices, because it is very slow.  Use it only when you need a copy of the raw storage device itself, independent of the filesystem.

The syntax for the “dd” command is as follows: 

dd if=inputfile of=outputfile

To make a physical copy of your flash drive (named “sdb”), you would enter the following:

dd if=/dev/sdb of=/root/flashcopy

  • dd: The physical “copy” command;
  • if: Designates the input file, with /dev/sdb representing the flash drive in the /dev directory; and
  • of: Designates the output file, with /root/flashcopy being the file that will hold the physical copy.
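If you would like to try the “dd” syntax without touching a real device, the sketch below uses an ordinary file as a stand-in for the flash drive. The names fake_device.img and flashcopy are hypothetical, and the “bs” (block size) and “count” operands are standard dd operands that do not appear in the command above; they are used here only to create and copy the stand-in file.

```shell
#!/bin/sh
# Sketch: practise dd on a regular file instead of a real storage device.
# fake_device.img stands in for /dev/sdb; all names here are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a 1 MiB stand-in "device" filled with zero bytes, then append a marker.
dd if=/dev/zero of=fake_device.img bs=1024 count=1024 2>/dev/null
printf 'marker' >> fake_device.img

# Copy it with the same if=/of= syntax used for a real device.
dd if=fake_device.img of=flashcopy bs=4096 2>/dev/null

# cmp -s succeeds only if the copy matches the source byte for byte.
if cmp -s fake_device.img flashcopy; then
    copy_status="exact copy"
else
    copy_status="copy differs"
fi
echo "$copy_status"
```

When you move on to a real device, double-check the if= and of= values before pressing Enter: dd overwrites whatever the of= operand names without asking for confirmation.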
