

Compression in Linux from the command line


Compressing files in Linux can work much like it does in a Windows GUI: you right-click a folder and choose to compress it with something like File Roller, or you double-click a compressed archive to extract it or to pull individual files out of it.

But that is not the point of this article. This article is all about compression from the command line: quick tips to get you up to speed with compressing, slicing, creating multi-part archives, utilizing multiple processors, and so forth.

Two concepts you have to be familiar with are the compression algorithm (for example, bzip2) and the container (for example, .bz2).

But the two do not always match. If a file is bzip2 compressed, it is not necessarily in a .bz2 container; it could be bzip2 compressed but in a .7z container, for example. By convention the file extension tells you what the container is, but some containers can hold files compressed with an arbitrary algorithm (encoding).
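To make the point concrete, here is a small sketch of how the data itself, not the file name, identifies the compression. It assumes bzip2, head and mktemp are available; the file names sample.txt and sample.dat are made up for the illustration.

```shell
#!/bin/sh
# Sketch: the extension is only a name; the bytes inside identify the format.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf 'hello compression\n' > "$workdir/sample.txt"

# Compress with bzip2, but write the result under a misleading name.
bzip2 -c "$workdir/sample.txt" > "$workdir/sample.dat"

# A bzip2 stream always starts with the three bytes "BZh".
magic=$(head -c 3 "$workdir/sample.dat")
echo "first bytes: $magic"

# bzip2 still decompresses it, because it reads the data, not the name.
restored=$(bzip2 -dc "$workdir/sample.dat")
echo "restored: $restored"
```

The same idea is what lets a .7z container carry bzip2 compressed data: the container records which algorithm was used, whatever the file is called.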

With that out of the way, I should mention that newer compression software is starting to support multiple processors, so you can utilize all four processor cores (sometimes 8) to compress a file. If the processors are fast enough or numerous enough, your bottleneck will probably be the disk read/write speed.

Enough already, show us some of that stuff you are talking about.

Using P7ZIP

Lately, I have been using a tool called 7-Zip to create bzip2 compressed archives and put them in .7z containers. This is my favorite, so I will start with it.

To use it, you will need to install p7zip, the Linux port of 7-Zip. Follow the instructions from 7-Zip to install it on your Linux distro.

Now that it is installed, assume I want to use a 7z container, but I would like the compression inside to be bzip2, simply because this way (with bzip2) I can utilize all 8 processor cores on my dual-socket, quad-core Xeon system. Let us also assume I want to split the output file into 80 MB chunks.

 

NOTE: To use the advanced functions below, please make sure you use the p7zip-full package on Linux; otherwise not everything is supported.

1- Create a 7z archive with bzip2 compression: compress the folder /etc/myfolder and all its contents into /home/me/anarchive.7z, using multithreaded bzip2 compression

On Linux: 7z a -t7z /home/me/anarchive.7z /etc/myfolder -v80m -m0=bzip2 -mmt=4
On Windows: 7z a -t7z k:\anarchive.7z C:\Users\user1\Desktop -v80m -m0=bzip2 -mmt=4

Explained for Linux

7z : the 7-Zip program
a : add files to an archive
-t7z : use a 7z container (file extension)
/home/me/anarchive.7z : where the output file will go
/etc/myfolder : the folder to compress, with all subfolders
-v80m : split the archive into 80 MB chunks
-m0=bzip2 : use bzip2 compression
-mmt=4 : utilize 4 processors or processor cores (set this to the number you want to use, for example 8)
2- Extracting a multipart archive with 7-Zip. The .001 suffix is appended when you make a multi-file (spanning) archive

On Linux: 7z x /home/me/anarchive.7z.001
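If you want to try the whole round trip without touching real data, the following is a rough sketch. It assumes the 7z binary from p7zip-full is on your PATH (it only prints a message otherwise), and everything lives in a throwaway temporary directory. The 50 KB volume size and the data.bin file are my own choices, picked just so that several volumes show up quickly.

```shell
#!/bin/sh
# Sketch: create a small multi-part 7z archive, test it, and extract it again.
# All paths are throwaway placeholders inside a temporary directory.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir "$workdir/myfolder" "$workdir/out"
# Random data does not compress, so the archive is sure to span several volumes.
head -c 100000 /dev/urandom > "$workdir/myfolder/data.bin"

if command -v 7z >/dev/null 2>&1; then
    # Run from inside the work directory so the archive stores "myfolder/..." paths.
    (cd "$workdir" && 7z a -t7z anarchive.7z myfolder -v50k -m0=bzip2 -mmt=2 >/dev/null)
    ls "$workdir"

    # Test integrity, then extract into a separate directory (no space after -o).
    7z t "$workdir/anarchive.7z.001" >/dev/null
    7z x "$workdir/anarchive.7z.001" -o"$workdir/out" >/dev/null

    cmp "$workdir/myfolder/data.bin" "$workdir/out/myfolder/data.bin" && echo "extracted copy matches"
else
    echo "7z not found; install p7zip-full first"
fi
```

You only ever point 7z at the .001 file; it looks for the remaining volumes next to it, so keep all the parts in the same folder.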

Now that was easy, was it not?

BZIP2 without 7ZIP

Most Linux installations come with bzip2 preinstalled. You can compress A SINGLE FILE by using a command such as

bzip2 -z /myfolder/myfile.img

bzip2 will create the file /myfolder/myfile.img.bz2 and then delete the source file. If you don't want bzip2 to delete the source file, use the -k switch as follows

bzip2 -z -k /myfolder/myfile.img
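Here is a quick sketch of the full round trip with -k, run against a throwaway file in a temporary directory rather than a real /myfolder. The file contents and the restored.img name are made up for the example.

```shell
#!/bin/sh
# Sketch: compress with -k, check the result, and decompress to a new file.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf 'some disk image data\n' > "$workdir/myfile.img"

# -z compresses, -k keeps myfile.img next to the new myfile.img.bz2.
bzip2 -z -k "$workdir/myfile.img"
ls "$workdir"

# -t tests the compressed file without writing anything.
bzip2 -t "$workdir/myfile.img.bz2"

# -d decompresses; -c sends the output to stdout so we can pick the file name.
bzip2 -d -c "$workdir/myfile.img.bz2" > "$workdir/restored.img"
cmp "$workdir/myfile.img" "$workdir/restored.img" && echo "round trip OK"
```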

But what about folders? How can I compress a folder rather than a file?

In this case we will need to create a .tar file first

tar --create --verbose --file=/mydirectory1/myfile.tar /mydirectory2/directory3

Then, we can execute bzip2 on the file /mydirectory1/myfile.tar in order to get /mydirectory1/myfile.tar.bz2
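The two steps can be sketched like this, using a scratch directory instead of the real /mydirectory1 and /mydirectory2. The short options -c and -f are the same as --create and --file, and the -C switch (my addition) just makes tar store relative paths. The notes.txt file is a placeholder.

```shell
#!/bin/sh
# Sketch: pack a folder into a .tar, then compress the .tar with bzip2.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir "$workdir/directory3"
printf 'hello\n' > "$workdir/directory3/notes.txt"

# Step 1: create the tar file. -C switches into the parent folder first,
# so the archive contains "directory3/..." instead of a full path.
tar -cf "$workdir/myfile.tar" -C "$workdir" directory3

# Step 2: compress it. This replaces myfile.tar with myfile.tar.bz2.
bzip2 -z "$workdir/myfile.tar"

# List what is inside without extracting anything.
listing=$(bzip2 -dc "$workdir/myfile.tar.bz2" | tar -tf -)
echo "$listing"
```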

Then you ask: what about multiprocessor support (multithreading)?

bzip2 is not multithreaded, and will not let you use multiple processors to compress a single file

So you will need to use another tool called pbzip2 and run it as follows

pbzip2 -p8v somefile

in order to utilize 8 processors

Just like in bzip2, k means keep the original file, d means decompress, and v means tell me what is happening. All the other options are shown here

Usage: pbzip2 [-1 .. -9] [-b#cdfklp#qrtV] <filename> <filename2> <filenameN>

-b# : where # is the file block size in 100k (default 9 = 900k)

-c : output to standard out (stdout)

-d : decompress file

-f : force, overwrite existing output file

-k : keep input file, don't delete

-l : load average determines max number processors to use

-p# : where # is the number of processors (default: autodetect)

-r : read entire input file into RAM and split between processors

-t : test compressed file integrity

-v : verbose mode

-V : display version info for pbzip2 then exit

-1 .. -9 : set BWT block size to 100k .. 900k (default 900k)

   Example: pbzip2 -b15vk myfile.tar
   Example: pbzip2 -p4 -r -5 myfile.tar second*.txt
   Example: pbzip2 -d myfile.tar.bz2
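If you do not want to hard-code the processor count, something along these lines should work. Treat it as a sketch: it assumes nproc or getconf can report the core count, it falls back to plain bzip2 when pbzip2 is not installed, and the somefile it compresses is just a megabyte of zeros created for the test.

```shell
#!/bin/sh
# Sketch: compress with pbzip2 on every available core, or fall back to bzip2.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Ask the system how many processors are online; use 1 if we cannot tell.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "using $cores processor(s)"

head -c 1000000 /dev/zero > "$workdir/somefile"

if command -v pbzip2 >/dev/null 2>&1; then
    pbzip2 -p"$cores" -k "$workdir/somefile"
else
    echo "pbzip2 not found, using plain bzip2 on one processor"
    bzip2 -k "$workdir/somefile"
fi

# Either way the result is an ordinary .bz2 file that bzip2 can verify.
bzip2 -t "$workdir/somefile.bz2" && echo "archive is valid"
```

Because pbzip2 writes standard bzip2 data, anyone with plain bzip2 can decompress what you send them.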

What about all the fancy piping and doing everything in one line ?

Well, they are simple really, but the syntax of the Linux command line (the shell, usually the Bourne-again shell, or bash) is beyond the scope of this article. Those are shell tricks you will pick up as you gain familiarity with Linux
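Just to give a taste, here is a minimal sketch of the most common one-liner: tar writes the archive to standard output and bzip2 compresses it on the fly, so no intermediate .tar file is ever written. The folder and file names are placeholders in a temporary directory.

```shell
#!/bin/sh
# Sketch: tar and bzip2 joined by a pipe, in both directions.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir "$workdir/directory3" "$workdir/restore"
printf 'hello\n' > "$workdir/directory3/notes.txt"

# "-f -" makes tar write to stdout; bzip2 without a file name reads stdin.
tar -cf - -C "$workdir" directory3 | bzip2 > "$workdir/myfile.tar.bz2"

# The reverse: decompress to stdout and let tar unpack from stdin.
bzip2 -dc "$workdir/myfile.tar.bz2" | tar -xf - -C "$workdir/restore"

cat "$workdir/restore/directory3/notes.txt"
```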

Although there are many more compression algorithms, formats, and containers, I only covered the two mentioned above because they are the ones you will need in most situations. I will cover more as soon as time permits.

 


Copyright, V.CheapDomains 2010
