2018 Comparison of Popular Archive Utilities
2018 comparison of popular archive / backup / compression utilities: tar / ZIP / gzip / RAR / 7-zip
Definition of Backup vs Archive
An archive is a collection of historical records that are kept for long-term retention and used for future reference. Typically, archives contain data that is not actively used.
A backup is a copy of a data set, while an archive holds original data that has been removed from its original location (Dorion 2008).
In practice the two overlap a great deal. The main distinction is that an archive holds data that has been removed from its active state, whereas a backup duplicates data that stays in place.
TAR
Tar is an archive utility that simply stacks files one after another into a single file, taking the file list from its arguments or from standard input. Each file is preceded by a small 512-byte header that records its name, size, permissions and other metadata, and the archive ends with zero-filled blocks. It was developed for UNIX in 1979 for tape archive recorders (Deutsch and Aladdin Enterprises 1996). The original tar cannot span files, compress, encrypt, or back up unnamed pipes. Compression is achieved on the fly by piping the output to compress or gzip. Modern versions of tar can call local Unix compressors such as compress or gzip directly, and there are many forks such as gtar (GNU tar). This is confusing because the command usually keeps the same name, tar, so its capabilities cannot be guessed without printing its usage with tar --help.
Option arguments do not always require dashes (“Tar(5) — Format Of Tape Archive Files” 2004).
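In the tar format, each member is stored as a 512-byte header block followed by its data, and the whole archive is padded to whole blocks. The sketch below is one way to check this on a scratch archive. It assumes GNU tar with its default settings, and the file names (foo.txt, archive.tar) are hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory and file names, used only for this demo.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'hello\n' > foo.txt
tar -cf archive.tar foo.txt

# The member name sits at the very start of the first 512-byte header block.
first_name=$(head -c 7 archive.tar)
echo "first header starts with: $first_name"

# The archive is padded to whole 512-byte blocks.
size=$(($(wc -c < archive.tar)))
remainder=$((size % 512))
echo "archive size: $size bytes (remainder modulo 512: $remainder)"
```

Other tar implementations may write extra header records before the first member, so the first bytes can differ there.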
tar usage (truncated)
Usage: tar [OPTION...] [FILE]...
GNU 'tar' saves many files together into a single tape or disk archive, and can restore individual files from the archive.

Examples:
  tar -cf archive.tar foo bar  # Create archive.tar from files foo and bar.
  tar -tvf archive.tar         # List all files in archive.tar verbosely.
  tar -xf archive.tar          # Extract all files from archive.tar.

Main operation mode:
  -A, --catenate, --concatenate   append tar files to an archive
  -c, --create                    create a new archive
  -d, --diff, --compare           find differences between archive and file system
      --delete                    delete from the archive (not on mag tapes!)
  -r, --append                    append files to the end of an archive
  -t, --list                      list the contents of an archive
      --test-label                test the archive volume label and exit
  -u, --update                    only append files newer than copy in archive
  -x, --extract, --get            extract files from an archive

Operation modifiers:
      --check-device              check device numbers when creating incremental archives (default)
  -g, --listed-incremental=FILE   handle new GNU-format incremental backup
  -G, --incremental               handle old GNU-format incremental backup
      --ignore-failed-read        do not exit with nonzero on unreadable files
      --level=NUMBER              dump level for created listed-incremental archive
  -n, --seek                      archive is seekable
      --no-check-device           do not check device numbers when creating incremental archives
      --no-seek                   archive is not seekable
      --occurrence[=NUMBER]       process only the NUMBERth occurrence of each file in the archive; this option is valid only in conjunction with one of the subcommands --delete, --diff, --extract or --list and when a list of files is given either on the command line or via the -T option; NUMBER defaults to 1
      --sparse-version=MAJOR[.MINOR]   set version of the sparse format to use (implies --sparse)
  -S, --sparse                    handle sparse files efficiently
tar examples
Pipe a list of files to tar (-T - reads the file names from standard input):
<command that produces file list> | tar cf backupfile.tar -T -
List the content of a tar file:
tar tvf backupfile.tar
Back up a directory recursively:
tar cf backupfile.tar /path/to/backup
Back up a directory recursively with gzip at maximum compression:
tar cf - /path/to/compress | gzip -9 > backupfile.tar.gz
Back up with gzip and rename directories inside the backup file:
tar zcvf backupfile.tar.gz --transform=s/path2rename/newName/ path2backup
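The commands above can be combined into a small round trip. This is a minimal sketch, assuming a tar that accepts -T - (GNU tar and bsdtar do). The directory and file names are hypothetical and exist only in a scratch directory.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch tree, used only for this demo.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir -p project/docs restore
printf 'alpha\n' > project/a.txt
printf 'beta\n' > project/docs/b.txt
printf 'skip me\n' > project/notes.log

# Archive only the .txt files: find prints the names, tar reads them via -T -.
find project -name '*.txt' -print | tar -cf selected.tar -T -

# Archive the whole directory and compress it on the fly with gzip.
tar -cf - project | gzip -9 > project.tar.gz

# List the selective archive, then restore the compressed one elsewhere.
listing=$(tar -tf selected.tar | sort)
echo "$listing"
gzip -dc project.tar.gz | tar -xf - -C restore
restored=$(cat restore/project/docs/b.txt)
echo "restored content: $restored"
```

Restoring into a separate directory (-C restore) keeps the test extraction from overwriting the original tree.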
ZIP
Zip was created in 1989 by Phil Katz to replace competing formats such as ARC, and traditionally uses DEFLATE compression (just like gzip). The format has been extensively developed, maintained, and openly documented by PKWARE (PKWARE 2017). It is a de facto industry standard that now handles more algorithms and supports file spanning and encryption (ZipCrypto and AES). It is arguably the most widely used archive format in the world, in part because DEFLATE is fast when creating archives. Operating systems such as macOS and Windows have built it into their file explorers to create compressed folders on the fly. Its primary behavior makes it a backup utility (Adler 2008).
Algorithms:
- Deflate Standard LZ77-based algorithm
- Deflate64 Standard LZ77-based algorithm
- BZip2 Standard BWT algorithm
- PPMD Dmitry Shkarin’s PPMdH with small changes
- LZMA Improved and optimized version of LZ77 algorithm
zip usage
On Unix systems the zip utility is split into two binaries: zip to compress and unzip to decompress:
Copyright (c) 1990-2008 Info-ZIP - Type 'zip "-L"' for software license.
Zip 3.0 (July 5th 2008). Usage:
zip [-options] [-b path] [-t mmddyyyy] [-n suffixes] [zipfile list] [-xi list]
  The default action is to add or replace zipfile entries from list, which can include the special name - to compress standard input. If zipfile and list are omitted, zip compresses stdin to stdout.
  -f   freshen: only changed files
  -u   update: only changed or new files
  -d   delete entries in zipfile
  -m   move into zipfile (delete OS files)
  -r   recurse into directories
  -j   junk (don't record) directory names
  -0   store only
  -l   convert LF to CR LF (-ll CR LF to LF)
  -1   compress faster
  -9   compress better
  -q   quiet operation
  -v   verbose operation/print version info
  -c   add one-line comments
  -z   add zipfile comment
  -@   read names from stdin
  -o   make zipfile as old as latest entry
  -x   exclude the following names
  -i   include only the following names
  -F   fix zipfile (-FF try harder)
  -D   do not add directory entries
  -A   adjust self-extracting exe
  -J   junk zipfile prefix (unzipsfx)
  -T   test zipfile integrity
  -X   eXclude eXtra file attributes
  -y   store symbolic links as the link instead of the referenced file
  -e   encrypt
  -n   don't compress these suffixes
  -h2  show more help
unzip usage
UnZip 6.00 of 20 April 2009, by Info-ZIP. Maintained by C. Spieler. Send bug reports using http://www.info-zip.org/zip-bug.html; see README for details.

Usage: unzip [-Z] [-opts[modifiers]] file[.zip] [list] [-x xlist] [-d exdir]
  Default action is to extract files in list, except those in xlist, to exdir; file[.zip] may be a wildcard.
  -Z => ZipInfo mode ("unzip -Z" for usage).

  -p   extract files to pipe, no messages
  -l   list files (short format)
  -f   freshen existing files, create none
  -t   test compressed archive data
  -u   update files, create if necessary
  -z   display archive comment only
  -v   list verbosely/show version info
  -T   timestamp archive to latest
  -x   exclude files that follow (in xlist)
  -d   extract files into exdir
modifiers:
  -n   never overwrite existing files
  -q   quiet mode (-qq => quieter)
  -o   overwrite files WITHOUT prompting
  -a   auto-convert any text files
  -j   junk paths (do not make directories)
  -aa  treat ALL files as text
  -U   use escapes for all non-ASCII Unicode
  -UU  ignore any Unicode fields
  -C   match filenames case-insensitively
  -L   make (some) names lowercase
  -X   restore UID/GID info
  -V   retain VMS version numbers
  -K   keep setuid/setgid/tacky permissions
  -M   pipe through "more" pager
See "unzip -hh" or unzip.txt for more help.

Examples:
  unzip data1 -x joe   => extract all files except joe from zipfile data1.zip
  unzip -p foo | more  => send contents of foo.zip via pipe into program more
  unzip -fo foo ReadMe => quietly replace existing ReadMe if archive file newer
zip examples
Create an archive:
zip archive.zip file(s)
Create an archive recursively:
zip -r archive.zip directory(s)
List archive content:
unzip -l archive.zip
Check integrity of an archive:
zip -T archive.zip
unzip -t archive.zip
Extract an archive:
unzip archive.zip
Update (refresh) files inside an archive:
zip -u archive.zip file(s)
Zip list of files returned from the find command:
find . -name "pattern" -print | zip archive.zip -@
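The zip and unzip examples above can be chained into a round trip that creates, tests, and extracts an archive. This is a sketch only: it assumes the Info-ZIP zip and unzip binaries are installed, and skips itself otherwise. All file names are hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch tree, used only for this demo.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

status=skipped
if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
    mkdir -p site/css out
    printf 'body {}\n' > site/css/main.css
    printf '<p>hi</p>\n' > site/index.html

    # Feed file names from find to zip through -@, as in the example above.
    find site -type f -print | zip -q archive.zip -@

    # Test the archive, then extract it into a separate directory.
    unzip -tq archive.zip
    unzip -q archive.zip -d out

    # Compare one extracted file byte for byte with its original.
    if cmp -s site/css/main.css out/site/css/main.css; then
        status=ok
    else
        status=mismatch
    fi
fi
echo "zip round trip: $status"
```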
gzip
Gzip is an inline compression utility that compresses data streamed through a pipe or redirect, or compresses files in place. It uses the DEFLATE algorithm and was developed in 1992 to replace the compress program from early UNIX systems (J.-L. Gailly 2003). The “g” stands for GNU, the project supported by the free software movement (Free Software Foundation 2018). Because of its on-the-fly abilities, the format is used today as the standard HTTP compression offered by virtually every web server (J. Gailly 2017). It doesn’t support file spanning. Its primary behavior (in-place compression, which removes the original file) makes it an archive utility.
Algorithms:
- Deflate Standard LZ77-based algorithm
gzip Usage
Usage: gzip [OPTION]... [FILE]...
Compress or uncompress FILEs (by default, compress FILES in-place).

Mandatory arguments to long options are mandatory for short options too.

  -c, --stdout       write on standard output, keep original files unchanged
  -d, --decompress   decompress
  -f, --force        force overwrite of output file and compress links
  -h, --help         give this help
  -k, --keep         keep (don't delete) input files
  -l, --list         list compressed file contents
  -L, --license      display software license
  -n, --no-name      do not save or restore the original name and time stamp
  -N, --name         save or restore the original name and time stamp
  -q, --quiet        suppress all warnings
  -r, --recursive    operate recursively on directories
  -S, --suffix=SUF   use suffix SUF on compressed files
  -t, --test         test compressed file integrity
  -v, --verbose      verbose mode
  -V, --version      display version number
  -1, --fast         compress faster
  -9, --best         compress better
      --rsyncable    Make rsync-friendly archive

With no FILE, or when FILE is -, read standard input.
gzip Examples
Compress a file in place and delete original file (produces file.gz):
gzip file
Compress file to file.gz and keep original:
gzip -k file
gzip -c file > file.gz
Decompress an archive and remove it:
gzip -d file.gz
Compress the output of a SQL dump into a gzipped file:
mysqldump --opt <database> | gzip -c > database.sql.gz
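The stream-oriented behavior described above can be checked with a short round trip. In this sketch a small text file stands in for the output of a dump command, so no database is needed. The file names are hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory, used only for this demo.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Stand-in for a dump command such as mysqldump writing to standard output.
printf 'CREATE TABLE t (id INT);\nINSERT INTO t VALUES (1);\n' > dump.sql

# Compress a stream; the original file is left untouched.
gzip -c < dump.sql > dump.sql.gz

# Verify the compressed file, then decompress it to standard output.
gzip -t dump.sql.gz
gzip -dc dump.sql.gz > roundtrip.sql

if cmp -s dump.sql roundtrip.sql; then
    result=identical
else
    result=different
fi
echo "round trip: $result"
```

Running gzip -t before relying on a compressed dump is a cheap way to catch a truncated file.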
RAR
RAR is another backup compression utility, developed by a Russian engineer in 1993, with a proprietary, licensed algorithm (win.rar GmbH 2018). It has been updated over time so that it can decompress a variety of formats. It supports encryption and file spanning (win.rar 2018). On Unix systems the rar utilities are split in two: unrar is publicly available, while installing rar requires an additional proprietary repository, which few users bother with since xz and 7-zip are free and often preferred. Because of its license, rar usage appears to be declining overall.
Algorithms:
- RAR proprietary
- LZSS Lempel-Ziv
- Deflate Standard LZ77-based algorithm
Unrar usage
UNRAR 5.30 beta 4 freeware      Copyright (c) 1993-2015 Alexander Roshal

Usage: unrar <command> -<switch 1> -<switch N> <archive> <files...> <@listfiles...> <path_to_extract\>

<Commands>
  e             Extract files without archived paths
  l[t[a],b]     List archive contents [technical[all], bare]
  p             Print file to stdout
  t             Test archive files
  v[t[a],b]     Verbosely list archive contents [technical[all],bare]
  x             Extract files with full path

<Switches>
  -             Stop switches scanning
  @[+]          Disable [enable] file lists
  ad            Append archive name to destination path
  ag[format]    Generate archive name using the current date
  ai            Ignore file attributes
  ap<path>      Set path inside archive
  c-            Disable comments show
  cfg-          Disable read configuration
  cl            Convert names to lower case
  cu            Convert names to upper case
  dh            Open shared files
  ep            Exclude paths from names
  ep3           Expand paths to full including the drive letter
  f             Freshen files
  id[c,d,p,q]   Disable messages
  ierr          Send all messages to stderr
  inul          Disable all messages
  kb            Keep broken extracted files
  n<file>       Additionally filter included files
  n@            Read additional filter masks from stdin
  n@<list>      Read additional filter masks from list file
  o[+|-]        Set the overwrite mode
  ol[a]         Process symbolic links as the link [absolute paths]
  or            Rename files automatically
  ow            Save or restore file owner and group
  p[password]   Set password
  p-            Do not query password
  r             Recurse subdirectories
  sc<chr>[obj]  Specify the character set
  sl<size>      Process files with size less than specified
  sm<size>      Process files with size more than specified
  ta<date>      Process files modified after <date> in YYYYMMDDHHMMSS format
  tb<date>      Process files modified before <date> in YYYYMMDDHHMMSS format
  tn<time>      Process files newer than <time>
  to<time>      Process files older than <time>
  ts<m,c,a>[N]  Save or restore file time (modification, creation, access)
  u             Update files
  v             List all volumes
  ver[n]        File version control
  vp            Pause before each volume
  x<file>       Exclude specified file
  x@            Read file names to exclude from stdin
  x@<list>      Exclude files listed in specified list file
  y             Assume Yes on all queries
Winrar examples
On Windows, installing WinRAR also gives access to command-line utilities that compress and decompress using the RAR algorithm. The unrar command takes the same arguments as on Unix systems:
Create a zip backup:
winrar a -afzip backup.zip file(s)
Create a rar backup:
winrar a backup.rar file(s)
Create a rar backup recursively:
winrar a -r backup.rar path
Test backup integrity:
unrar t backup.rar
List backup content:
unrar v backup.rar
Extract specific files from a backup, preserving the directory structure:
unrar x backup.rar *.ext [extractfolder\]
Extract backup without directory structure:
unrar e backup.rar [extractfolder\]
7-zip
7-zip is a modern, free (Pavlov 2018a) backup compression utility developed in 1999 by another Russian engineer. It uses a variety of algorithms to compress and decompress many backup formats, including tar. Because of licensing, it can extract RAR archives but not create them (Pavlov 2018b). Its flagship algorithm is LZMA, now succeeded by LZMA2 (Pavlov 2018c). It offers the best compression ratio of the utilities compared here, at the cost of speed. It uses a very large dictionary together with newer coding techniques, which explains both the ratio and the slow compression. Surprisingly, decompression is almost as fast as DEFLATE. It supports file spanning and AES encryption.
One can install p7zip on Linux, but a more common utility called xz also uses LZMA/LZMA2 compression (Collin 2016). Just like gzip, xz compresses a single data stream and cannot store multiple files by itself (Himanshu 2012), so it is usually combined with tar.
Algorithms:
- LZMA2 Improved version of LZMA
- LZMA Improved and optimized version of LZ77 algorithm
- PPMD Dmitry Shkarin’s PPMdH with small changes
- BZip2 Standard BWT algorithm
- BCJ Converter for 32-bit x86 executables
- BCJ2 Converter for 32-bit x86 executables
- Deflate Standard LZ77-based algorithm
p7zip: Differences between 7z, 7za and 7zr binaries
The package p7zip usually includes three binaries, 7z, 7za, and 7zr (Vinet and Griffin 2017). Their differences are:
- 7z: 7z uses plugins to handle archives.
- 7za: is a stand-alone executable. 7za handles fewer archive formats than 7z, but does not need any other plugin.
- 7zr: is a stand-alone executable. 7zr does not need any other plugin, and is a light version of 7za that only handles 7z archives.
7zip Usage
7-Zip (a) 9.38 beta  Copyright (c) 1999-2014 Igor Pavlov  2015-01-03
p7zip Version 9.38.1 (locale=C,Utf16=off,HugeFiles=on,2 CPUs)

Usage: 7za <command> [<switches>...] <archive_name> [<file_names>...] [<@listfiles...>]

<Commands>
  a  : Add files to archive
  b  : Benchmark
  d  : Delete files from archive
  e  : Extract files from archive (without using directory names)
  h  : Calculate hash values for files
  l  : List contents of archive
  rn : Rename files in archive
  t  : Test integrity of archive
  u  : Update files to archive
  x  : eXtract files with full paths

<Switches>
  -- : Stop switches parsing
  -ai[r[-|0]]{@listfile|!wildcard} : Include archives
  -ax[r[-|0]]{@listfile|!wildcard} : eXclude archives
  -bd : Disable percentage indicator
  -i[r[-|0]]{@listfile|!wildcard} : Include filenames
  -m{Parameters} : set compression Method
  -o{Directory} : set Output directory
  -p{Password} : set Password
  -r[-|0] : Recurse subdirectories
  -scs{UTF-8|UTF-16LE|UTF-16BE|WIN|DOS|{id}} : set charset for list files
  -sfx[{name}] : Create SFX archive
  -si[{name}] : read data from stdin
  -slt : show technical information for l (List) command
  -so : write data to stdout
  -ssc[-] : set sensitive case mode
  -t{Type} : Set type of archive
  -u[-][p#][q#][r#][x#][y#][z#][!newArchiveName] : Update options
  -v{Size}[b|k|m|g] : Create volumes
  -w[{path}] : assign Work directory. Empty path means a temporary directory
  -x[r[-|0]]]{@listfile|!wildcard} : eXclude filenames
  -y : assume Yes on all queries
Examples
Create a backup:
7z a backup.7z archiveDir
List backup content:
7z l backup.7z
Check integrity of a backup:
7z t backup.7z
Extract a backup, preserving the directory structure (e would extract every file into a single directory instead):
7z x backup.7z
Update (refresh) files inside a backup:
7z u backup.7z backupDir
Create a max compression LZMA backup with tar and xz on the fly:
tar -cf - path/ | xz -9 -c - > archive.tar.xz
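The tar and xz pipeline can be exercised end to end as follows. This is a sketch that assumes the xz utility is installed, and skips itself otherwise. It uses the default compression level to keep memory use low, but -9 works the same way. The directory names are hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch tree, used only for this demo.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

status=skipped
if command -v xz >/dev/null 2>&1; then
    mkdir -p path/sub restore
    printf 'one\n' > path/file1
    printf 'two\n' > path/sub/file2

    # tar writes the archive to standard output; xz compresses the stream.
    tar -cf - path | xz -c > archive.tar.xz

    # xz -t checks integrity; xz -dc decompresses the stream back into tar.
    xz -t archive.tar.xz
    xz -dc archive.tar.xz | tar -xf - -C restore

    # Compare one restored file byte for byte with its original.
    if cmp -s path/sub/file2 restore/path/sub/file2; then
        status=ok
    else
        status=mismatch
    fi
fi
echo "tar + xz round trip: $status"
```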
Comparison
A thorough comparison can be found on Wikipedia, but here is a quick comparison table based on my personal experience:
[table-wrap bordered=”true” striped=”true”]
Name | Recommended Extension | Algorithm | Cost | Compression speed | Compression ratio |
Tar (original) | tar | None | Free | Fastest | None |
Zip | zip | DEFLATE | Free | Fast | Average |
Gzip | gz | DEFLATE | Free | Fast | Average |
Rar | rar | RAR | Licensed | Slow | High |
7-zip / xz | 7z / xz | LZMA2 | Free | Slowest | Highest |
[/table-wrap]
[callout type=”info” size=”lg”]
Notes:
- Compression ratios depend on the data being compressed: none of these tools does well on media files, for instance.
- The comparison above is valid between the utilities mentioned only.
- tar, and to some extent any compression software, does not actually require a file extension. However, it’s best practice to use one for good file management.
[/callout]
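The first note can be illustrated with gzip alone. The sketch below compresses 100 KiB of repetitive text and 100 KiB of random bytes. The random bytes are an assumed stand-in for already-compressed media files, which behave similarly, and the file names are hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory, used only for this demo.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# 100 KiB of highly repetitive text versus 100 KiB of random bytes.
yes 'the same line again' | head -c 102400 > repetitive.txt
head -c 102400 /dev/urandom > random.bin

gzip -9 -c repetitive.txt > repetitive.txt.gz
gzip -9 -c random.bin > random.bin.gz

# Arithmetic expansion strips any padding that wc may print.
rep_size=$(($(wc -c < repetitive.txt.gz)))
rnd_size=$(($(wc -c < random.bin.gz)))

echo "repetitive text: 102400 -> $rep_size bytes"
echo "random bytes:    102400 -> $rnd_size bytes"
```

The repetitive file shrinks to a tiny fraction of its size, while the random file does not shrink at all, which is why a ratio measured on one kind of data says little about another.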