Compressing files in Linux with a GUI is much like doing it in Windows: you right-click a folder and choose to compress it with a tool such as File Roller, or you double-click a compressed archive to extract it or pull individual files out of it.
But that is not the point of this article. This article is all about compression from the command line: quick tips to get you up to speed with compressing, slicing, creating multi-part archives, using multiple processors, and so forth.
Two concepts you have to be familiar with are the compression algorithm (for example, bzip2) and the container (for example, .bz2).
But the two do not always match. A bzip2-compressed file is not necessarily in a .bz2 container; it could be bzip2-compressed inside a .7z container, for example. The file extension tells you what the container is, but some containers can hold files compressed with an arbitrary algorithm (encoding).
With that out of the way, I should mention that newer compression software is starting to support multiple processors, so you can use all four processors (sometimes eight) to compress a file. If the processors are fast enough, or there are enough of them, your bottleneck will probably be disk read/write speed instead.
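To find out how many processors your own machine offers (the number you would hand to the thread options of the tools below), a minimal sketch, assuming GNU coreutils (for nproc) and the Linux /proc filesystem. Both count logical processors (hardware threads), and nproc can report fewer if the process has been restricted to a subset of CPUs.

```shell
#!/bin/sh
# Logical processors available to this process (GNU coreutils).
nproc

# Logical processors the kernel lists as online (Linux-specific).
grep -c '^processor' /proc/cpuinfo
```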
Enough already, show us some of that stuff you are talking about!
Using P7ZIP
Lately, I have been using a tool called 7-Zip to create bzip2-compressed archives inside .7z containers. This is my favorite, so I will start with it.
To use it, you will need to install p7zip, the Linux port of 7-Zip. Follow the instructions from the 7-Zip project to install it on your Linux distro.
Now that it is installed, assume I want a 7z container but with bzip2 compression inside, simply because bzip2 lets me use all 8 processors on my dual-socket, quad-core Xeon system. Let us also assume I want to split the output file into 80 MB chunks.
NOTE: To use the advanced functions below, make sure you install the p7zip-full package on Linux; otherwise not everything is supported.

1- Create a 7z archive with bzip2 compression from the folder /etc/myfolder and all its contents into /home/me/anarchive.7z, using multithreading (-mmt) for the bzip2 compression:
On Linux: 7z a -t7z /home/me/anarchive.7z /etc/myfolder -v80m -m0=bzip2 -mmt=4
On Windows: 7z a -t7z k:\anarchive.7z C:\Users\user1\Desktop -v80m -m0=bzip2 -mmt=4
Explained for Linux:
a : add files to an archive
-t7z : use the 7z container (archive type)
/home/me/anarchive.7z : the archive to create
/etc/myfolder : the folder to compress
-v80m : split the output into 80 MB volumes (anarchive.7z.001, anarchive.7z.002, ...)
-m0=bzip2 : use bzip2 as the compression method
-mmt=4 : use 4 threads; raise this (for example, -mmt=8) to match the number of processors you want to use

2- Extract a multipart archive with 7-Zip. The .001 suffix is appended when you create a multi-file (spanning) archive:
On Linux: 7z x /home/me/anarchive.7z.001
x : extract with full paths, starting from the first volume
Now, that was easy, was it not?
BZIP2 without 7ZIP
Most Linux installations come with bzip2 preinstalled. You can compress A SINGLE FILE with a command such as
bzip2 -z /myfolder/myfile.img
and bzip2 will create /myfolder/myfile.img.bz2 and then delete the source file. If you don't want bzip2 to delete the source file, use the -k switch as follows
bzip2 -z -k /myfolder/myfile.img
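A small round trip makes the -z, -k and -d behaviour easy to see. A sketch, assuming bzip2 is installed; /tmp/bzip2-demo and its files are hypothetical stand-ins for /myfolder/myfile.img.

```shell
#!/bin/sh
set -e
dir=/tmp/bzip2-demo
rm -rf "$dir" && mkdir -p "$dir"

# Hypothetical sample file standing in for /myfolder/myfile.img, plus a reference copy.
seq 1 100000 > "$dir/myfile.img"
cp "$dir/myfile.img" "$dir/original.img"

# -z compress, -k keep the source: myfile.img and myfile.img.bz2 both exist afterwards.
bzip2 -z -k "$dir/myfile.img"

# -t tests the integrity of the compressed file without writing anything.
bzip2 -t "$dir/myfile.img.bz2"

# -d decompress; without -k the .bz2 is removed afterwards.
# -f is needed because the myfile.img kept by -k is already there.
bzip2 -d -f "$dir/myfile.img.bz2"

cmp "$dir/original.img" "$dir/myfile.img" && echo "identical"
```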
But what about folders? How can I compress a folder rather than a file?
In this case, we need to create a .tar file first:
tar --create --verbose --file=/mydirectory1/myfile.tar /mydirectory2/directory3
Then we can run bzip2 on /mydirectory1/myfile.tar to get /mydirectory1/myfile.tar.bz2.
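The tar-then-bzip2 steps can be run as written, and GNU tar can also do both in one go with its -j (--bzip2) option. A sketch, assuming GNU tar and bzip2; the /tmp/tar-demo paths are hypothetical, and -C (change directory first) is added here so the archive stores relative names such as directory3/a.txt instead of absolute paths.

```shell
#!/bin/sh
set -e
base=/tmp/tar-demo
rm -rf "$base" && mkdir -p "$base/directory3/sub"
echo "hello" > "$base/directory3/a.txt"
echo "world" > "$base/directory3/sub/b.txt"

# Two steps: archive the folder, then compress the .tar
# (bzip2 replaces myfile.tar with myfile.tar.bz2).
tar --create --verbose --file="$base/myfile.tar" -C "$base" directory3
bzip2 -z "$base/myfile.tar"

# One step: -c create, -j filter through bzip2, -f name of the output file.
tar -cjf "$base/onestep.tar.bz2" -C "$base" directory3

# -t lists the contents; -j decompresses while reading.
tar -tjf "$base/onestep.tar.bz2"
```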
Then you ask: what about multiprocessor support (multithreading)?
bzip2 is not multithreaded and will not let you use multiple processors to compress a single file.
So you will need another tool called pbzip2, run as follows
pbzip2 -p8v somefile
in order to use 8 processors.
Just like with bzip2, -k means keep the original file, -d means decompress, and -v means verbose (tell me what is happening). All the other options are shown here:
Usage: pbzip2 [-1 .. -9] [-b#cdfklp#qrtV] <filename> <filename2> <filenameN>
-b# : where # is the file block size in 100k (default 9 = 900k)
-c : output to standard out (stdout)
-d : decompress file
-f : force, overwrite existing output file
-k : keep input file, don't delete
-l : load average determines max number processors to use
-p# : where # is the number of processors (default: autodetect)
-r : read entire input file into RAM and split between processors
-t : test compressed file integrity
-v : verbose mode
-V : display version info for pbzip2 then exit
-1 .. -9 : set BWT block size to 100k .. 900k (default 900k)
Example: pbzip2 -b15vk myfile.tar
Example: pbzip2 -p4 -r -5 myfile.tar second*.txt
Example: pbzip2 -d myfile.tar.bz2
What about all the fancy piping and doing everything in one line?
Well, it is simple really, but the syntax of the Linux command line (the shell, usually bash, the Bourne Again SHell) is beyond the scope of this article. Those are shell tricks you will pick up as you gain familiarity with Linux.
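For a small taste of those one-liners, the pipeline below streams tar's output straight into the compressor, so no intermediate .tar file is written, and then reverses the process. It is a sketch, assuming tar and bzip2 are installed; the /tmp/pipe-demo paths are hypothetical. Because pbzip2 has a -c (output to stdout) option, it should be able to take bzip2's place in the first pipeline to use several processors.

```shell
#!/bin/sh
set -e
base=/tmp/pipe-demo
rm -rf "$base" && mkdir -p "$base/src" "$base/restore"
echo "some data" > "$base/src/notes.txt"

# "-f -" makes tar write the archive to stdout; bzip2 -c compresses stdin to stdout.
tar -cf - -C "$base" src | bzip2 -c > "$base/src.tar.bz2"

# Reverse: decompress to stdout, and let tar read the archive from stdin into restore/.
bzip2 -dc "$base/src.tar.bz2" | tar -xf - -C "$base/restore"

cmp "$base/src/notes.txt" "$base/restore/src/notes.txt" && echo "restored"
```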
Although there are many more compression algorithms, formats, and containers, I only covered the two above because they are the ones you will need in 90% of situations. I will cover more as soon as time permits.
Copyright, V.CheapDomains 2010